可适应不良数据的数据分类若干方法研究 (Research on Classification Methods Adaptable to Bad Data)
摘要 (Abstract)
When classifying data, adverse characteristics of the data itself, such as noise, marked density variation between clusters, class imbalance, and unequal variances across feature dimensions, degrade classification performance. Studying classification algorithms that adapt to such bad data therefore has significant theoretical and practical value. Although algorithms such as DBSCAN and trimmed k-means can already handle data with some of these defects, it is unrealistic to hope for one universal algorithm that handles every type of bad data; it has instead become a consensus to design targeted, interference-resistant algorithms based on the characteristics of the data at hand.
     Inspired by molecular dynamics, this dissertation introduces attractive and repulsive interactions between data points and, combining information on distances in the original and iterated feature spaces, cluster density differences, and neighborhood relations, proposes a molecular-dynamics-like data clustering method. Considering neighborhood and per-dimension variance information in the same spirit, it also proposes an ellipsoid-plane classification method and improves a classification algorithm based on kernel density estimation (KDE). Besides adapting well to noisy data and to data with markedly different cluster densities, the new clustering method requires no preset number of clusters, automatically discovers the clusters a dataset may contain, and resolves the "black hole" problem of gravitational models.
     KDE-based classification is common in practice, but with imbalanced classes it tends to misclassify minority-class points into the majority class. To let the method cope with class imbalance, and to remain effective even when the imbalance is severe, this dissertation improves it by introducing a smoothing factor with a smaller search interval, which strengthens its adaptability to imbalanced classes. Experiments confirm that this improvement is effective.
     In fact, methods such as KDE-based classification may involve computation over the entire sample set at prediction time, so prediction can be expensive on large datasets. To reduce prediction cost while still letting the model capture per-dimension variance information, this dissertation proposes a new ellipsoid-plane classification method, a two-stage supervised classifier. It classifies against ellipsoidal and planar reference surfaces; since a test point is compared only with these reference surfaces, its prediction cost is lower than that of distance-based k-nearest-neighbor and KDE-based methods, and the neighborhood principle is reinforced.
     Beyond theoretical analysis, all of the above algorithms were compared experimentally with existing methods on standard datasets, confirming the theoretical derivations and providing a new and valuable exploration of classifying bad data.
Certain characteristics of data can harm classification: noise, density variation between clusters, class imbalance, differing variances across feature dimensions, and so on. Research on classification approaches that adapt to such bad data is therefore valuable in both theory and practice. Although existing approaches such as DBSCAN and trimmed k-means can handle bad data with some of these characteristics, hoping for one general approach that adapts to all kinds of bad data is unrealistic, and designing interference-resistant approaches tailored to the data's characteristics has become the accepted strategy.
     Inspired by molecular kinetic theory and drawing on neighbourhood information, cluster density variation, and distances in both the original and iterated spaces, this dissertation proposes a molecular-dynamics-like data clustering approach. Considering neighbourhood information and feature variance in the same spirit, it also designs an ellipsoid-plane classification approach and improves a KDE-based classification approach. Besides adapting to noise and to large density variation between clusters, the new clustering approach automatically finds the clusters a dataset may contain, without a preset cluster number, and solves the "black hole" problem encountered by gravitational models.
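The abstract does not give the algorithm itself, but the core idea of force-driven clustering can be illustrated with a minimal sketch: points iteratively attract each other with inverse-square weights until they collapse into a few locations, which are then read off as clusters. This is a toy illustration under assumed details (inverse-square weights, a relaxation factor `alpha`, a merge tolerance), not the dissertation's molecular-dynamics-like method, which also uses repulsion, density differences, and neighbourhood information.

```python
import numpy as np

def attraction_step(points, alpha=0.2, eps=1e-9):
    """One gravity-like iteration: pull each point a fraction `alpha`
    toward the inverse-square-weighted mean of all other points."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    w = 1.0 / (dist2 + eps)            # inverse-square attraction weights
    np.fill_diagonal(w, 0.0)           # a point exerts no force on itself
    target = (w @ points) / w.sum(axis=1, keepdims=True)
    return (1 - alpha) * points + alpha * target

def toy_gravity_cluster(points, iters=200, merge_tol=0.5):
    """Iterate the attraction step, then group points that have
    collapsed to (nearly) the same location; no preset cluster count."""
    p = points.astype(float).copy()
    for _ in range(iters):
        p = attraction_step(p)
    labels, centers = [], []
    for x in p:
        for i, c in enumerate(centers):
            if np.linalg.norm(x - c) < merge_tol:
                labels.append(i)
                break
        else:
            centers.append(x)
            labels.append(len(centers) - 1)
    return labels
```

Note how the number of clusters emerges from the dynamics: nearby points dominate each other's weighted means and merge quickly, while distant groups barely interact, mirroring the abstract's claim that no cluster count needs presetting.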
     KDE-based classification is widely used in applications, but on imbalanced data it tends to misclassify minority-class points into the majority class. To enable the approach to cope with class imbalance, and to remain effective even when the imbalance is acute, this dissertation improves it by adding a smoothing factor with a small search interval. Experimental results confirm the effectiveness of the improvement.
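The baseline that the improvement builds on can be sketched as follows: estimate each class's density with a Gaussian kernel and assign a point to the class with the highest density. The per-class bandwidth `h` plays the role of the smoothing factor; a smaller bandwidth for the minority class sharpens its density and counteracts the pull of the majority class. The bandwidth values and search strategy here are illustrative assumptions, not the dissertation's specific small-search-interval scheme.

```python
import numpy as np

def kde_score(x, samples, h):
    """Gaussian kernel density estimate at x from one class's samples."""
    d = samples.shape[1]
    diff = (samples - x) / h
    k = np.exp(-0.5 * (diff ** 2).sum(axis=1))
    return k.sum() / (len(samples) * h ** d)

def kde_classify(x, classes, bandwidths):
    """Assign x to the class with the highest kernel density estimate.
    `classes` maps label -> sample array; `bandwidths` maps label -> h,
    so the minority class can receive its own, smaller smoothing factor."""
    scores = {c: kde_score(x, s, bandwidths[c]) for c, s in classes.items()}
    return max(scores, key=scores.get)
```

In practice the per-class bandwidth would be chosen by searching a small interval of candidate values and keeping the one with the best validation performance, which is the spirit of the improvement described above.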
     At prediction time, methods like the KDE-based approach may involve computation over the whole dataset, so the prediction cost can be high. To reduce this cost while letting the model capture variance information along each feature dimension, a new ellipsoid-plane classification approach is proposed: a two-stage supervised method that uses ellipsoidal and planar surfaces as classification references. Because prediction only compares the test point with these reference surfaces, its cost is lower than that of the distance-based k-nearest-neighbour method and the KDE-based approach; the approach also strengthens the neighbourhood principle.
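The cost argument can be made concrete with a sketch of the ellipsoidal-reference idea: training fits one ellipsoid per class from the class mean and covariance, and prediction compares a point only against these fitted surfaces via squared Mahalanobis distance, so its cost grows with the number of classes rather than the number of training samples. This is a simplified stand-in under assumed details (pure Mahalanobis ellipsoids, no planar stage), not the dissertation's exact two-stage method.

```python
import numpy as np

class EllipsoidClassifier:
    """Toy one-ellipsoid-per-class classifier: the covariance encodes
    each class's per-dimension variance, so elongated classes are
    handled better than by a plain Euclidean nearest-mean rule."""

    def fit(self, X, y):
        self.params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            cov += 1e-6 * np.eye(X.shape[1])   # regularize for invertibility
            self.params[c] = (mu, np.linalg.inv(cov))
        return self

    def predict(self, x):
        def maha2(c):
            mu, prec = self.params[c]
            d = x - mu
            return d @ prec @ d                # squared Mahalanobis distance
        return min(self.params, key=maha2)     # nearest reference ellipsoid
```

A k-nearest-neighbour or KDE prediction for the same data would touch all n training points, whereas this prediction touches only one mean and precision matrix per class, which is the trade-off the abstract describes.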
     Besides theoretical analysis, the approaches above are compared experimentally with existing methods on standard datasets, confirming the correctness of the theoretical derivations and providing a new and valuable exploration of bad-data classification.
