摘要
谱聚类将数据聚类问题转化成图划分问题,通过寻找最优的子图,对数据点进行聚类。谱聚类的关键是构造合适的相似矩阵,将数据集的内在结构真实地描述出来。针对传统的谱聚类算法采用高斯核函数来构造相似矩阵时对尺度参数的选择很敏感,而且在聚类阶段需要随机确定初始的聚类中心,聚类性能也不稳定等问题,本文提出了基于消息传递的谱聚类算法。该算法采用密度自适应的相似性度量方法,可以更好地描述数据点之间的关系,然后利用近邻传播(Affinity propagation,AP)聚类中"消息传递"机制获得高质量的聚类中心,提高了谱聚类算法的性能。实验表明,新算法可以有效地处理多尺度数据集的聚类问题,其聚类性能非常稳定,聚类质量也优于传统的谱聚类算法和k-means算法。
Spectral clustering transforms data clustering problem into a graph partitioning problem and classifies data points by finding the optimal sub-graphs. The key to spectral clustering is constructing a suitable similarity matrix,which can truly describe the intrinsic structure of the dataset. However,traditional spectral clustering algorithms adopt Gaussian kernel function to construct the similarity matrix,which results in their sensitivity of selection for scale parameter. In addition,the initial cluster centers need randomly determing at the clustering stage and the clustering performance is not stable. The paper presents an algorithm based on message passing. The algorithm uses a density adaptive similarity measure,which can well describe the relations between data points,and it can obtain high-quality cluster centers through message passing mechanism in affinity propagation(AP) clustering. Moreover, the performance of clustering is optimized by the method. Experiments show that the proposed algorithm can effectively deal with the clustering problem of multi-scale datasets. Its clustering performance is very stable,and the clustering quality is better than traditional spectral clustering algorithm and k-means algorithm.
引文
[1]孙吉贵,刘杰,赵连宇.聚类算法研究[J].软件学报,2008,19(1):48-61.Sun Jigui,Liu Jie,Zhao Lianyu.Clustering algorithms research[J].Journal of Software,2008,19(1):48-61.
[2]Nascimento M C V,Andre C,De Carvalho A.Spectral methods for graph clustering-A survey[J].European Journal of Operational Research,2011,211(2):221-231.
[3]Ding S F,Jia H J,Zhang L W,et al.Research of semi-supervised spectral clustering algorithm based on pairwise constraints[J].Neural Computing&Applications,2014,24(1):211-219.
[4]Chen W F,Feng G C.Spectral clustering:A semi-supervised approach[J].Neurocomputing,2012,77(1):229-242.
[5]Fan N,Pardalos P M.Multi-way clustering and biclustering by the ratio cut and normalized cut in graphs[J].Journal of Combinatorial Optimization,2012,23(2):224-251.
[6]Von Luxburg U.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416.
[7]李建元,周脚根,关佶红,等谱图聚类算法研究进展[J].智能系统学报,2011,6(5):405-414.Li Jianyuan,Zhou Jiaogen,Guan Jihong,et al.A survey of clustering algorithms based on spectra of graphs[J].CAAI Transactions on Intelligent Systems,2011,6(5):405-414.
[8]Tung F,Wong A,Clausi D A.Enabling scalable spectral clustering for image segmentation[J].Pattern Recognition,2010,43(12):4069-4076.
[9]Zhang D P,Chen F Y,Peng H L.Detecting group-level crowd using spectral clustering analysis on particle trajectories[J].Information Technology Journal,2013,12(1):174-179.
[10]Ding L,Gonzalez-Longatt F M,Wall P,et al.Two-step spectral clustering controlled islanding algorithm[J].IEEETransactions on Power Systems,2013,28(1):75-84.
[11]刘馨月,李静伟,于红,等.基于共享近邻的自适应谱聚类[J].小型微型计算机系统,2011,32(9):1876-1880.Liu Xinyue,Li Jingwei,Yu Hong,et al.Adaptive spectral clustering based on shared nearest neighbors[J].Journal of Chinese Computer System,2011,32(9):1876-1880.
[12]Zelnik-Manor L,Perona P.Self-tuning spectral clustering[C]//Proceeding of NIPS.Vancouver,Canada:Neural Information Processing Systems Foundation,2005:1601-1608.
[13]Liu X Y,Li J W,Yu H,et al.Adaptive spectral clustering based on shared nearest neighbors[J].Journal of Chinese Computer Systems,2011,32(9):1876-1880.
[14]陶新民,宋少宇,曹盼东,等.一种基于流形距离核的谱聚类算法[J].信息与控制,2012,41(3):307-313.Tao Xinmin,Song Shaoyu,Cao Pandong,et al.A spectral clustering algorithm based on manifold distance kernel[J].Information and Control,2012,41(3):307-313.
[15]Ding S F,Qi B J,Jia H J,et al.Research of semi-supervised spectral clustering based on constraints expansion[J].Neural Computing and Applications,2013,22(1):405-410.
[16]Hamad D,Biela P.Introduction to spectral clustering[C]//Proceedings of the International Conference on Information and Communication Technologies from Theory to Applications-ICTTA'08.Damascus,Syria:IEEE Computer Society,2008:634-639.
[17]蔡晓妍,戴冠中,杨黎斌.谱聚类算法综述[J].计算机科学,2008,35(7):14-18.Cai Xiaoyan,Dai Guanzhong,Yang Libin.Survey on spectral clustering algorithms[J].Computer Science,2008,35(7):14-18.
[18]Ng A Y,Jordan M I,Weiss Y.On spectral clustering:Analysis and an algorithm[J].Advances in Neural Information Processing Systems,2002,14:849-856.
[19]汪中,刘贵全,陈恩红.基于模糊K-harmonic means的谱聚类算法[J].智能系统学报,2009,4(2):95-99.Wang Zhong,Liu Guiquan,Chen Enhong.A spectral clustering algorithm based on fuzzy K-harmonicmeans[J].CAAI Transactions on Intelligent Systems,2009,4(2):95-99.
[20]Frey B J,Dueek D.Clustering by passing messages between data points[J].Sincence,2007,315(5814):972-976.
[21]董俊,王锁萍,熊范纶.可变相似性度量的近邻传播聚类[J].电子与信息学报,2010,32(3):509-514.Dong Jun,Wang Suoping,Xiong Fanlun.Affinity propagation clustering based on variable-similarity measure[J].Journal of Electronics&Information Technology,2010,32(3):509-514.
[22]Zhang X L,Wang W,N?rvag K,et al.K-AP:Generating specified k clusters by efficient affinity propagation[C]//Proceedings2010 10th IEEE International Conference on Data Mining(ICDM 2010).Sydney,Australia:IEEE,2010:1187-1192.
[23]Zhang X C,Li J W,Yu H.Local density adaptive similarity measurement for spectral clustering[J].Pattern Recognition Letters,2011,32(2):352-358.
[24]Wang Y,Jiang Y,Wu Y,et al.Spectral clustering on multiple manifolds[J].IEEE Transactions on Neural Networks,2011,22(7):1149-1161.
[25]Yang P,Zhu Q S,Huang B.Spectral clustering with density sensitive similarity function[J].Knowledge-Based Systems,2011,24(5):621-628.
[26]王玲,薄列峰,焦李成.密度敏感的谱聚类[J].电子学报,2007,35(8):1577-1581.Wang Ling,Bo Liefeng,Jiao Licheng.Density-sensitive spectral clustering[J].Acta Electronica Sinica,2007,35(8):1577-1581.
[27]Xie J Y,Jiang S,Xie W X,et al.An efficient global k-means clustering algorithm[J].Journal of Computers,2011,6(2):271-279.