A New Ensemble Technique for Improving Clustering Accuracy and Stability
Abstract
Organizing data into sensible groupings is one of the most fundamental and crucial modes of understanding and learning, whereby similar patterns are grouped together and dissimilar patterns are placed in different groups. This thesis presents a statistical consensus method for cluster ensembles that improves clustering accuracy and stability; in some situations it can also serve distributed data mining, where privacy concerns, massive data volumes that cannot be pooled into a single location for processing, or both prevent centralized clustering. Ensemble methods are well established in supervised learning, where they have been shown to reduce predictive error by a considerable margin compared with the single best classical predictor or learning model. More recently, intensive research on unsupervised ensembles (cluster ensembles) has been producing promising results. The proposed clustering ensemble technique comprises four steps on the way to the final consensus clustering result.
     The first step generates the partitions: the K-means clustering algorithm is run several times with different initializations. K-means is sensitive to its initial parameters, so different initializations yield diverse clustering results on the same dataset. The second step selects the single best clustering among the generated partitions. This is achieved with the objective function of the K-means algorithm, which is treated here as an error measure: the smaller this error, the more compact and well-separated the clusters, which is the essence of clustering. In the absence of labels, this error is a well-proven mathematical measure of clustering quality. The third step selects the consistent clusterings: inconsistent partitions are filtered out of the ensemble, and only the consistent partitions enter the consensus process. Mutual information (MI), from information theory, serves as the criterion for selecting consistent partitions. The fourth step is the consensus function: the final clustering result is obtained by fusing the consistent partitions in the ensemble with a statistical consensus function.
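The four steps can be sketched in code. The following is a minimal illustration, not the thesis's implementation: the plain K-means routine, the inertia used as the error for picking the best partition, the NMI computation, and the 0.5 consistency cutoff are all simplified stand-ins for the procedure described above.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Plain K-means with random initial centers drawn from the data."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
    inertia = ((X - centers[labels]) ** 2).sum()  # the K-means objective ("error")
    return labels, centers, inertia

def nmi(a, b):
    """Normalized mutual information between two label vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):
        joint[i, j] += 1.0
    joint /= len(a)
    pa, pb = joint.sum(1), joint.sum(0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return mi / max(np.sqrt(ha * hb), 1e-12)

rng = np.random.default_rng(0)
# Toy data: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

# Step 1: generate an ensemble of partitions from different initializations.
runs = [kmeans(X, k=2, rng=rng) for _ in range(10)]
# Step 2: the single best clustering is the one with minimum error.
best = min(runs, key=lambda r: r[2])
# Step 3: keep only partitions that agree (high average NMI) with the rest;
# the 0.5 cutoff is an assumed threshold, not the thesis's exact criterion.
avg_agree = [np.mean([nmi(r[0], s[0]) for s in runs if s is not r]) for r in runs]
consistent = [r for r, m in zip(runs, avg_agree) if m >= 0.5]
```

On this toy data nearly every run converges to the same two-blob split, so the consistency filter discards little; on harder data it is the filter that removes partitions produced by unlucky initializations.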
     Our research focuses on cluster ensembles, with the aim of improving both the accuracy and the stability of clustering results. As one of the most influential developments in data mining and machine learning, ensemble techniques combine multiple models into one that is usually of higher quality than the best of its components. Most data mining and knowledge discovery techniques put more effort into model building than into accuracy, for instance in marketing, network intrusion detection, and the like. By contrast, complex business intelligence systems such as auditing, fraud detection, and criminal detection require much more attention to clustering accuracy than to the model itself. Any business intelligence system needs a high-quality clustering at its core, and in most cases it involves huge volumes of data, sometimes in a distributed environment. The issue is that the available classical clustering algorithms are not stable: their instability leads to inaccurate clustering results, and because they assume the data resides at a single location, they are unsuitable for distributed data environments in which the data cannot be pooled into one place for processing.
     Apart from producing stable and accurate clustering results, the proposed consensus approach for cluster ensembles can also cluster distributed datasets. Distributed data mining is one of the most interesting aspects of data mining, especially when a dataset cannot be pooled into a single location because of storage constraints (data mining usually involves massive data) or privacy concerns; as noted above, a single classical clustering technique cannot handle these situations. Our consensus method represents each cluster by its center and its number of patterns, which distinguishes our technique from existing cluster ensemble methods that rely on the labels of individual patterns or data points. Representing clusters by their centers and pattern counts resolves the label correspondence problem directly, without the additional techniques that most existing cluster ensemble methods require. It also saves time and storage, since the only information the consensus needs is the cluster centers and their data-point counts, which is always far less than the actual number of data points in the dataset; this makes our consensus suitable for clustering massive volumes of data in parallel and/or distributed environments. Experimental results on real datasets show that the proposed consensus clustering improves accuracy and stability compared with its component, the classical K-means algorithm, and can handle massive volumes of data in a distributed environment.
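The center-plus-count representation lends itself to a short sketch. The abstract does not spell out the statistical consensus function itself, so the fusion step below is a generic weighted K-means over the pooled cluster centers (weights = pattern counts), an assumed stand-in meant only to show that the consensus needs no raw data points and that permuted labels across partitions cause no correspondence problem.

```python
import numpy as np

def summarize(X, labels, k):
    """Represent a clustering only by its cluster centers and pattern counts --
    the only information the consensus function needs."""
    centers = np.array([X[labels == j].mean(0) for j in range(k)])
    counts = np.bincount(labels, minlength=k)
    return centers, counts

def consensus_centers(summaries, k, n_iter=50):
    """Fuse per-site/per-partition summaries by weighted K-means over the
    pooled centers. Each summary contributes only k centers and k counts,
    so no raw data crosses sites and label permutations are irrelevant."""
    pts = np.vstack([c for c, _ in summaries])
    w = np.concatenate([n for _, n in summaries]).astype(float)
    centers = pts[:k].copy()  # deterministic init from the first summary
    for _ in range(n_iter):
        assign = np.argmin(((pts[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([
            np.average(pts[assign == j], axis=0, weights=w[assign == j])
            if np.any(assign == j) else centers[j]
            for j in range(k)])
    return centers

rng = np.random.default_rng(1)
# Two "sites" hold disjoint samples from the same two blobs.
X1 = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(4, 0.2, (30, 2))])
X2 = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(4, 0.2, (40, 2))])
lab1 = np.array([0] * 30 + [1] * 30)
lab2 = np.array([1] * 40 + [0] * 40)  # labels permuted relative to site 1
fused = consensus_centers([summarize(X1, lab1, 2), summarize(X2, lab2, 2)], k=2)
```

Only four center vectors and four counts reach the consensus step here, regardless of how many points each site holds, which is the storage and communication saving the paragraph above describes.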
     The thesis is organized as follows. Chapter 1 introduces data mining and knowledge discovery, the techniques involved, their applications, and the attendant challenges, with a literature review. Chapter 2 focuses on ensembles and clustering ensembles, reviewing existing ensemble methods and their strengths and weaknesses. Chapter 3 presents the proposed ensemble technique. Chapter 4 covers the experiments undertaken and the evaluation of the results, and Chapter 5 gives the conclusion. The last three sections contain the acknowledgements, the references, and the appendices.
