Research on Key Methods for Efficient Querying in Multidimensional On-Line Analytical Processing
Abstract
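On-line analytical processing (OLAP) is one of the key technologies of business intelligence (BI). It has become an important tool for acquiring knowledge and supporting decision making. Increasing informatization has raised both the volume and the dimensionality of data, and decision support demands high efficiency. Together, these trends keep driving the development of efficient OLAP query techniques. How to improve OLAP query efficiency and shorten query time in multidimensional, massive data environments, so as to support decision making, has therefore become a focus of research.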
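     With efficient multidimensional OLAP querying as its goal, this dissertation studies several key methods for improving OLAP query efficiency. To make full use of data analysis, it borrows the concept of on-line analytical mining (OLAM) and applies data mining and statistical analysis methods throughout the research. On this basis it designs an OLAP query framework that integrates these methods in order to improve query efficiency. Within this framework, three key techniques are studied.

First, the data cube is the data foundation of OLAP queries, and the way it is built directly affects query efficiency, so the dissertation studies how to materialize it. Second, OLAP approximate query methods offer a good trade-off between query time and query accuracy and can markedly improve query efficiency, so they form another main topic. Third, the dissertation studies OLAP query dimension recommendation. Taking decision support as its starting point, this technique offers users the dimensions most closely related to their query target and so shortens OLAP query time. Specifically, the following research was carried out on these efficient OLAP query methods: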
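     (1) The dissertation introduces ideas from data mining into the effort to improve OLAP query efficiency. The idea behind the classic Apriori algorithm for association rule mining is applied to building the data cube, which is the data foundation of OLAP queries. The dissertation proposes a notion of user interest degree and uses it as a constraint. Based on how users actually query the system, it designs an iceberg cube construction algorithm that partially materializes the data cube, together with a method for updating the iceberg cube incrementally. Because cuboids are materialized selectively, the system can answer user queries without on-the-fly computation. And because the method takes actual user queries into account, it greatly reduces storage space while keeping the cube's coverage of user queries high, which further improves OLAP query efficiency.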
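     (2) The dissertation is the first to introduce Copula theory from statistics into the modeling of OLAP approximate queries, which extends the application domain of Copula theory. It builds an OLAP range query model for continuous dimensions. The model extracts summary information from large amounts of data and needs to store only the relevant samples and parameters. This greatly reduces storage space and greatly improves OLAP query efficiency while maintaining query accuracy.

Several measures are taken to make the approximate query model more accurate. First, when fitting the marginal distribution of each dimension's sample data, nonparametric kernel density estimation is used instead of parametric methods that assume a known distribution. This gives a more accurate fit and extends the model's applicability to most kinds of data. Second, the model explicitly allows for correlation among dimensions: a Copula function fits the joint distribution and extracts the dependence structure between dimensions, so the fitted distribution is more accurate. In addition, the model supports OLAP queries directly on continuous dimensions, and drill-down operations do not require dimension hierarchies to be defined in advance, which greatly increases the flexibility of OLAP querying.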
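     (3) For OLAP data sets with many dimensions, the dissertation brings a Pair Copula method based on the "C-vine" structure, which suits OLAP data cubes, into the modeling of OLAP approximate queries. Building on the Copula function, the method further accounts for differences in how strongly each observation dimension correlates with the measure dimension. It selects and constructs the dependence structure freely according to the characteristics of the sample data. This further improves the accuracy of the model's fit and makes the model applicable to high-dimensional data environments.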
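     (4) Continuing to bring data mining into OLAP querying, the dissertation applies variable selection to OLAP query dimension recommendation for high-dimensional OLAP data sets. Such data sets contain a great deal of information, and their dimensions are correlated to varying degrees. This hampers the efficiency of users' OLAP queries and, in turn, the efficiency and accuracy of their decisions. Some dimensions are redundant with respect to the query target, so a dimension selection algorithm that supports OLAP query dimension recommendation is designed. Using the decision-attribute class information provided by the user, the algorithm removes dimensions unrelated to the decision target and, at the same time, identifies sets of linearly correlated dimensions. In doing so it recognizes the relationships among observation dimensions and extracts the set of dimensions most closely associated with the user's query target, which greatly improves the efficiency of users' querying and decision making.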
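Item (1) above uses Apriori-style pruning with a user-interest threshold to choose which cuboids to materialize, and then stores only the iceberg cells of those cuboids. The abstract does not define user interest or the iceberg condition, so the Python sketch below rests on assumptions made here for illustration:

- A cuboid's interest is the fraction of logged queries whose grouping dimensions include all of its dimensions. This definition is anti-monotone, which is what makes Apriori's subset pruning valid.
- A cell is kept when it aggregates at least `min_count` tuples.

The names `select_cuboids`, `materialize_iceberg`, `min_interest` and `min_count` are hypothetical, and the incremental-update method is not shown.

```python
from itertools import combinations


def select_cuboids(query_log, min_interest):
    """Apriori-style, level-wise search for cuboids (sets of dimensions) to materialize.

    Assumption (not from the source): the user interest of a cuboid is the fraction
    of logged queries whose GROUP BY dimensions contain all of the cuboid's dimensions.
    """
    logs = [frozenset(q) for q in query_log]
    n = len(logs)

    def interest(dims):
        return sum(1 for q in logs if dims <= q) / n

    # Level 1: single dimensions that meet the interest threshold.
    level = {frozenset([d]) for q in logs for d in q}
    level = {c for c in level if interest(c) >= min_interest}
    selected = set(level)
    k = 1
    while level:
        # Join step: combine k-dimension cuboids into (k+1)-dimension candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be interesting.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        level = {c for c in candidates if interest(c) >= min_interest}
        selected |= level
        k += 1
    return selected


def materialize_iceberg(rows, dims, measure, min_count):
    """Aggregate rows by `dims` into (count, sum) cells; keep cells with >= min_count tuples."""
    cells = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        count, total = cells.get(key, (0, 0.0))
        cells[key] = (count + 1, total + row[measure])
    return {key: cell for key, cell in cells.items() if cell[0] >= min_count}


if __name__ == "__main__":
    log = [{"region", "year"}, {"region"}, {"region", "year", "product"}, {"year"}]
    cuboids = select_cuboids(log, min_interest=0.5)
    rows = [
        {"region": "N", "year": 2020, "sales": 10.0},
        {"region": "N", "year": 2020, "sales": 5.0},
        {"region": "S", "year": 2020, "sales": 7.0},
    ]
    for cuboid in sorted(cuboids, key=lambda c: (len(c), sorted(c))):
        dims = sorted(cuboid)
        print(dims, materialize_iceberg(rows, dims, "sales", min_count=2))
```

With the four-query log in the demo and `min_interest=0.5`, the selected cuboids are {region}, {year} and {region, year}. The {product} cuboid appears in only one of the four queries, so it is never materialized.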
As one of the key techniques of business intelligence (BI), on-line analytical processing (OLAP) is an important means of knowledge acquisition and decision support. Increasing informatization has raised both data volume and data dimensionality, and decision support requires high efficiency. Together, these trends keep driving research on efficient OLAP querying. With decision support as the aim, an important research topic is how to improve multidimensional OLAP query efficiency over massive high-dimensional data sets, and so shorten query time.
     The purpose of this dissertation is to achieve highly efficient multidimensional OLAP querying by studying key approaches for improving OLAP query efficiency. To bring data analysis into full play, the dissertation draws on the concept of on-line analytical mining (OLAM). Data mining techniques and statistical analysis approaches are integrated into an OLAP query framework, within which the key techniques are then studied.

The data cube is the basis of OLAP querying, and the way it is constructed directly influences OLAP query efficiency, so the materialization of the data cube is studied. OLAP approximate query approaches can trade query time off against query accuracy, which helps improve OLAP query efficiency markedly, so they also form a main part of this dissertation. Recommending OLAP query dimensions improves query efficiency from another perspective. Focusing on decision support, it provides users with the dimensions closely related to the query target so as to shorten query time. The primary work of this dissertation on the approaches above includes:
     (1) The idea of data mining is introduced into research on improving OLAP query efficiency. Apriori theory is applied to building the data cube, the basis of OLAP querying, and user interest is proposed as the constraint for choosing frequent cuboids. Based on frequently issued queries, an iceberg cube construction algorithm is designed, and a method for incrementally updating the iceberg cube is proposed. Through partial materialization, the system can respond to OLAP queries without real-time computation. Because the approach is based on the real query log, the cube strongly supports the queries users actually request. This further improves OLAP query efficiency while drastically reducing data storage space.
     (2) Building models for OLAP approximate queries is another effective way to improve query efficiency. In this research, Copula theory is extended to a new field to build a statistical model on continuous dimensions for range queries. The model extracts a data synopsis that stores the related samples and parameter information. It drastically saves data storage space and improves OLAP query efficiency while keeping accuracy acceptable.

Several methods are used to make the approximate query model more accurate. First, to fit the marginal distribution of each dimension more precisely, nonparametric kernel density estimation is applied instead of parametric models with known distributions, which extends the model's applicability to most types of data. Second, to account for existing correlation between dimensions, a Copula is used to fit the joint distribution, extracting the dependence structure so as to fit the data distribution more accurately. In addition, OLAP operations such as drill-down and roll-up are easy to perform on continuous dimensions without setting up dimension levels in advance, which makes the OLAP query procedure flexible.
     (3) The dissertation also considers the high-dimensional data environment. A “C-vine” Pair Copula is adopted, which takes the application of Copula to OLAP approximate querying a step further. It fits the structure of OLAP data sets and takes into account differences in correlation between the various observation dimensions and the measure dimension, so the dependence structure is constructed according to the features of the samples. This further improves the accuracy of the query results the model produces, especially in high-dimensional data environments.
     (4) Continuing to introduce data mining into OLAP querying, feature selection is applied to OLAP query dimension recommendation in high-dimensional OLAP data sets. Such data sets contain massive data, and their dimensions are correlated with different strengths. This degrades OLAP query efficiency and impairs decision efficiency and accuracy. To remove dimensions that are redundant with respect to the query target, a dimension selection algorithm is designed to support the recommendation of OLAP query dimensions. Based on decision-attribute classification information, the algorithm removes dimensions unrelated to the decision target and also identifies sets of correlated dimensions. The purpose of this work is to provide users with a reference set of query dimensions, so as to improve query and decision efficiency.
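Item (2) describes a model that fits each marginal by kernel density estimation and couples the marginals with a copula, so a range query can be answered from the model instead of the base data. Below is a minimal two-dimensional Python sketch of that idea using only the standard library. The copula family, bandwidth rule and all names are assumptions made for illustration, not the dissertation's choices:

- A Clayton copula is used because its CDF has a closed form. Its parameter is obtained by inverting Kendall's tau, θ = 2τ/(1 − τ).
- The bandwidth follows the normal-reference rule h = 1.06·σ·n^(−1/5).
- The model falls back to the independence copula when τ ≤ 0, because the form used here only handles positive dependence.

The COUNT of tuples in the box (x_lo, x_hi] × (y_lo, y_hi] is estimated as N times the box probability, obtained by inclusion-exclusion on the copula.

```python
import math
import random
from statistics import NormalDist, pstdev

_PHI = NormalDist()  # standard normal, used as the smoothing kernel


def kde_cdf(sample):
    """Return the CDF of a Gaussian kernel density estimate fitted to `sample`."""
    n = len(sample)
    h = max(1.06 * pstdev(sample) * n ** (-0.2), 1e-12)  # normal-reference bandwidth

    def cdf(x):
        return sum(_PHI.cdf((x - xi) / h) for xi in sample) / n

    return cdf


def kendall_tau(xs, ys):
    """Kendall's tau-a, computed in O(n^2)."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), evaluated in log space."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    a = -theta * math.log(min(u, 1.0))
    b = -theta * math.log(min(v, 1.0))
    m = max(a, b)
    log_s = m + math.log(math.exp(a - m) + math.exp(b - m) - math.exp(-m))
    return math.exp(-log_s / theta)


class CopulaRangeModel:
    """Synopsis of two continuous dimensions: KDE marginals plus a Clayton copula."""

    def __init__(self, xs, ys):
        self.n = len(xs)
        self.fx, self.fy = kde_cdf(xs), kde_cdf(ys)
        tau = min(kendall_tau(xs, ys), 0.99)  # cap keeps theta finite
        self.theta = 2 * tau / (1 - tau) if tau > 0 else 0.0

    def copula(self, u, v):
        if self.theta == 0.0:
            return u * v  # independence copula
        return clayton_cdf(u, v, self.theta)

    def estimate_count(self, x_lo, x_hi, y_lo, y_hi):
        """Approximate COUNT(*) for x_lo < X <= x_hi and y_lo < Y <= y_hi."""
        u0, u1 = self.fx(x_lo), self.fx(x_hi)
        v0, v1 = self.fy(y_lo), self.fy(y_hi)
        c = self.copula
        p = c(u1, v1) - c(u0, v1) - c(u1, v0) + c(u0, v0)  # inclusion-exclusion
        return self.n * max(p, 0.0)


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0, 1) for _ in range(300)]
    ys = [x + random.gauss(0, 0.5) for x in xs]
    model = CopulaRangeModel(xs, ys)
    exact = sum(1 for x, y in zip(xs, ys) if 0 < x <= 2 and 0 < y <= 2)
    print("estimate:", round(model.estimate_count(0, 2, 0, 2), 1), "exact:", exact)
```

Once fitted, a range query costs four KDE CDF evaluations and four copula evaluations rather than a scan of the base table. This sketch still keeps the whole sample inside the KDE. The abstract's model instead stores only the relevant samples and parameters.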

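© 2004-2018 China Geological Library. All rights reserved. Beijing ICP Registration No. 05064691; Beijing Public Security Network Registration No. 11010802017129

Address: No. 29 Xueyuan Road, Haidian District, Beijing; Postal code: 100083

Tel.: Office (+86 10) 66554848; document lending, reference services, and sci-tech novelty search: 66554700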