Comparative Density Peaks Clustering Based on K-Nearest Neighbors
  • Authors: DU Pei; CHENG Xiaorong
  • Affiliation: School of Control and Computer Engineering, North China Electric Power University
  • Keywords: clustering algorithm; density peaks; K-nearest neighbors; decision graph; cluster centers
  • Journal: Computer Engineering and Applications (计算机工程与应用; abbreviation JSGG)
  • Publication date: 2018-12-29 14:46
  • Publisher: Computer Engineering and Applications
  • Year: 2019
  • Issue: v.55; No.929
  • Funding: Fundamental Research Funds for the Central Universities (No. 2018MS073)
  • Language: Chinese
  • Page: JSGG201910025
  • Page count: 8
  • CN: 10
  • Classification number: 166-173
Abstract
The clustering quality of the fast search and find of density peaks clustering algorithm (CFSFDP) depends heavily on a subjectively chosen truncation distance dc, and the optimal value of dc is hard to determine. In addition, on datasets with complex distributions and large variations in density, the decision graph produced by CFSFDP does not separate cluster centers from non-center points clearly, which makes selecting the cluster centers difficult. To address these problems, the algorithm is optimized and a comparative density peaks clustering algorithm based on K-nearest neighbors (CDPC-KNN) is proposed. The algorithm uses the K-nearest-neighbor concept to redefine how the truncation distance and the local density are measured: it generates the truncation distance adaptively for an arbitrary dataset and makes the computed local density more consistent with the true distribution of the data. It also introduces a distance comparison quantity into the decision graph in place of the original distance parameter, so that cluster centers stand out more clearly. Experiments show that CDPC-KNN clusters better overall than CFSFDP and DBSCAN, and a separation experiment shows that the new algorithm effectively improves the discrimination between cluster centers and non-center points.
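The abstract names the quantities involved but does not give their formulas, so the following Python sketch is only an illustration of the general idea, not the authors' definitions. Everything specific in it is an assumption: the function name `density_peak_scores`, the cutoff taken as the mean distance to the k-th nearest neighbor, the Gaussian kernel summed over the k nearest neighbors as the local density, and the comparison quantity `theta` computed as the distance to the nearest denser point minus the distance to the nearest sparser point. It needs Python 3.8 or later for `math.dist`, distinct points, and k smaller than the number of points.

```python
import math

def density_peak_scores(points, k):
    """Return (rho, delta, theta) for a small list of coordinate tuples.

    Illustrative assumptions only, not the formulas of CDPC-KNN.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Distances from each point to its k nearest neighbors (self excluded).
    knn = [sorted(dist[i][j] for j in range(n) if j != i)[:k] for i in range(n)]
    # Assumed adaptive cutoff: mean distance to the k-th nearest neighbor.
    d_c = sum(nb[-1] for nb in knn) / n
    # Assumed local density: Gaussian kernel over the k nearest neighbors.
    rho = [sum(math.exp(-(d / d_c) ** 2) for d in nb) for nb in knn]
    delta, theta = [], []
    for i in range(n):
        farthest = max(dist[i])
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        lower = [dist[i][j] for j in range(n) if rho[j] < rho[i]]
        # Distance to the nearest denser point (CFSFDP-style delta);
        # a point with no denser neighbor gets its largest distance.
        d_i = min(higher) if higher else farthest
        # Assumed comparison term: distance to the nearest sparser point.
        t_i = min(lower) if lower else farthest
        delta.append(d_i)
        theta.append(d_i - t_i)
    return rho, delta, theta

# Two small, well-separated groups, each with one central point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, theta = density_peak_scores(points, k=3)
# Candidate centers: the points with the largest comparison quantity.
centers = sorted(range(len(points)), key=lambda i: theta[i], reverse=True)[:2]
print(centers)
```

In this toy example the comparison quantity is large only for points that are far from any denser point yet close to sparser ones, which is the sort of contrast a decision graph is meant to expose. A pairwise distance matrix costs quadratic time and memory, so the sketch suits small inputs only.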
References
[1]Fahad A,Alshatri N,Tari Z,et al.A survey of clustering algorithms for big data:taxonomy and empirical analysis[J].IEEE Transactions on Emerging Topics in Computing,2014,2(3):267-279.
    [2]Córdova-Palomera A,Fatjó-Vilas M,Kebir O,et al.Intelligent approaches to interact with machines using hand gesture recognition in natural way:a survey[J].International Journal of Computer Science&Engineering Survey,2011,2(1):122-133.
    [3]Li Z,Liu J,Yang Y,et al.Clustering-guided sparse structural learning for unsupervised feature selection[J].IEEE Transactions on Knowledge&Data Engineering,2014,26(9):2138-2150.
    [4]Wang D,Chai Q,Ng G S.Ovarian cancer diagnosis using a hybrid intelligent system with simple yet convincing rules[J].Applied Soft Computing Journal,2014,20(7):25-39.
    [5]Ryu T W.Discovery of characteristic knowledge in databases using cluster analysis and genetic programming[D].Houston:University of Houston,1998.
    [6]Fu L,Medico E.FLAME,a novel fuzzy clustering method for the analysis of DNA microarray data[J].BMC Bioinformatics,2007,8(1):3.
    [7]Chen T,Zhang N L,Liu T,et al.Model-based multidimensional clustering of categorical data[J].Artificial Intelligence,2012,176(1):2246-2269.
    [8]Parikh M,Varma T.Survey on different grid based clustering algorithms[J].International Journal of Advance Research in Computer Science and Management Studies,2014,2(2):427-430.
    [9]Jain A K.Data clustering:50 years beyond K-means[M]//Machine learning and knowledge discovery in databases.Berlin,Heidelberg:Springer,2008:3-4.
    [10]Ester M,Kriegel H P,Sander J.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining,1996:226-231.
    [11]Feng Shaorong,Xiao Wenjun.A new method for improving the quality of the DBSCAN clustering algorithm[J].Journal of Xidian University,2008,35(3):523-529.
    [12]Rodriguez A,Laio A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
    [13]Wu K L,Yang M S.Mean shift-based clustering[J].Pattern Recognition,2007,40(11):3035-3052.
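    [14]Li Tao,Ge Hongwei,Su Shuzhi.Density peaks clustering based on density-adaptive distance[J].Journal of Chinese Computer Systems,2017,38(6):1347-1352.
    [15]Ma Chunlai,Shan Hong,Ma Tao.A density peaks clustering algorithm based on an automatic cluster center selection strategy[J].Computer Science,2016,43(7):255-258.
    [16]Bao Shuting,Sun Liping,Zheng Xiaoyao,et al.Density peaks clustering algorithm based on shared nearest neighbor similarity[J].Journal of Computer Applications,2018,38(6):1601-1607.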
    [17]Li Zejian,Tang Yongchuan.Comparative density peaks clustering[J].Expert Systems with Applications,2018,95.
    [18]Mehmood R,Bie R,Dawood H,et al.Fuzzy clustering by fast search and find of density peaks[C]//International Conference on Identification,Information,and Knowledge in the Internet of Things,2016:258-261.
    [19]Mehmood R,Zhang G,Bie R,et al.Clustering by fast search and find of density peaks via heat diffusion[J].Neurocomputing,2016,208(C):210-217.
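    [20]Xue Xiaona,Gao Shuping,Peng Hongming,et al.Improved density peaks clustering algorithm combined with K-nearest neighbors[J].Computer Engineering and Applications,2018,54(7):36-43.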
