Abstract
[Objective] This paper addresses three weaknesses of the traditional fuzzy C-means (FCM) algorithm: it selects initial cluster centers at random, it is sensitive to noise, and it is only suited to clustering evenly distributed samples. [Methods] We propose a new FCM algorithm based on a Huffman tree. The algorithm builds a dissimilarity matrix over high-density samples, uses it to construct a Huffman tree from which the initial cluster centers are obtained, and then defines a sample membership function without the normalization constraint. [Results] Comparative experiments on synthetic samples, image datasets, and UCI datasets show that the proposed algorithm outperforms both the Gaussian-kernel FCM algorithm and the traditional FCM algorithm on measures such as clustering accuracy and running time. [Limitations] The sample density adjustment factor is set only by experiment or experience and still lacks a theoretical basis. [Conclusions] The proposed algorithm has practical value for clustering real-world datasets that contain many noisy samples or whose samples are unevenly distributed.
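For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional FCM algorithm that the abstract compares against, not the proposed Huffman-tree method. It is included only to show where the two issues named above arise: the random choice of initial centers and the normalization constraint that forces each sample's memberships to sum to 1. The function name `fcm` and its parameters (`m`, `iters`, `tol`, `seed`) are illustrative assumptions and do not come from the paper.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Traditional fuzzy C-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; names and defaults are assumptions.
    Returns (centers, u), where u[j][i] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    # Random initial centers: the step the abstract identifies as a weakness.
    centers = [list(p) for p in rng.sample(points, c)]
    power = 2.0 / (m - 1.0)
    dims = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update. Each row sums to 1 (the normalization constraint),
        # so even a distant noise point receives a total membership of 1.
        u = []
        for p in points:
            d = [
                max(sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5, 1e-12)
                for v in centers
            ]
            u.append(
                [1.0 / sum((d[i] / d[k]) ** power for k in range(c)) for i in range(c)]
            )
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wj * p[k] for wj, p in zip(w, points)) / total for k in range(dims)]
            )
        # Stop once no center coordinate moves by more than tol.
        shift = max(
            abs(a - b) for v, nv in zip(centers, new_centers) for a, b in zip(v, nv)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    found_centers, memberships = fcm(data, 2)
    print(sorted(found_centers))
```

Because every row of the membership matrix is normalized, an outlier far from all centers still pulls on them with full total weight. This is the behavior that a membership function without the normalization constraint, as described in the abstract, is meant to relax.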