Improved K-means clustering algorithm based on feature selection and removal on target point
(基于目标特征选择和去除的改进K-means聚类算法)
  • Authors: YANG Hua-hui; MENG Chen; WANG Cheng; YAO Yun-zhi
  • Affiliation: Department of Missile Engineering, Army Engineering University
  • Keywords: K-means algorithm; feature selection; high-dimensional data clustering; feature weighting; data denoising
  • Journal: Control and Decision (控制与决策; journal code KZYC)
  • Online publication date: 2018-05-14
  • Year: 2019; Volume: 34; Issue: 06
  • Pages: 101-108 (8 pages)
  • CN: 21-1124/TP
  • Fund: National Natural Science Foundation of China (61501493)
  • Language: Chinese
  • Record ID: KZYC201906012
Abstract
On high-dimensional data, the K-means algorithm cannot effectively suppress noise features or recover irregularly shaped clusters. To address these weaknesses, an improved K-means clustering algorithm based on feature selection and removal on the target point is proposed. The algorithm adopts the Minkowski metric as the distance measure for classifying target points, adds a weight adjustment parameter a, and resets the weight coefficient α to perform feature selection and removal, which reduces the noise introduced by non-clustering index features. In the validation experiments, UCI real-world datasets and artificial datasets are clustered to verify the improved algorithm's effectiveness in suppressing noise features; comparisons with the WK-means and iMWK-means algorithms are used to analyze the applicability of feature selection during cluster learning, and the optimal distance coefficient β and weight coefficient α are identified.
References
[1] Bai L, Cheng X Q, Liang J Y, et al. Fast density clustering strategies based on the k-means algorithm[J]. Pattern Recognition, 2017, 71(3): 375-386.
[2] Jiang X P, Li C H, Sun J. A modified K-means clustering for mining of multimedia databases based on dimensionality reduction and similarity measures[J]. Cluster Computing, 2017, 20(10): 1-8.
[3] Huang Y, Wu C D, Zhang Y Z, et al. Multi-objective localization method based on K-means clustering in binary sensor network[J]. Control and Decision, 2013, 28(10): 1497-1501. (in Chinese)
[4] Chan E Y, Ching W K, Ng M K, et al. An optimization algorithm for clustering using weighted dissimilarity measures[J]. Pattern Recognition, 2004, 37(5): 943-952.
[5] Huang J Z, Ng M K, Rong H, et al. Automated variable weighting in k-means type clustering[J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2005, 27(5): 657-668.
[6] Chen X, Ye Y, Xu X, et al. A feature group weighting method for subspace clustering of high-dimensional data[J]. Pattern Recognition, 2012, 45(1): 434-446.
[7] Amorim R C, Mirkin B. Minkowski metric, feature weighting and anomalous cluster initializing in K-means clustering[J]. Pattern Recognition, 2012, 45(3): 1061-1075.
[8] Amorim R C D, Komisarczuk P. On initializations for the Minkowski weighted k-means[C]. Int Conf on Advances in Intelligent Data Analysis. Helsinki: Springer, 2012: 45-55.
[9] Amorim R C D, Hennig C. Recovering the number of clusters in data sets with noise features using feature rescaling factors[J]. Information Sciences, 2015, 324: 126-145.
[10] Amorim R C D, Mirkin B. Selecting the Minkowski exponent for intelligent K-means with feature weighting[M]. Clusters, Orders, Trees: Methods and Applications, Optimization and its Applications. Berlin: Springer, 2014: 103-117.
[11] Tsai C Y, Chiu C C. Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm[J]. Computational Statistics & Data Analysis, 2008, 52(10): 4658-4672.
[12] Chen X, Ye Y, Xu X, et al. A feature group weighting method for subspace clustering of high-dimensional data[J]. Pattern Recognition, 2012, 45(1): 434-446.
[13] Ji J C, Bai T, Zhou C G, et al. An improved K-prototypes clustering algorithm for mixed numeric and categorical data[J]. Neurocomputing, 2013, 120(22): 590-596.
[14] Chen A G, Wang S T. Fuzzy clustering algorithm based on multiple medoids for large-scale data[J]. Control and Decision, 2016, 31(12): 2122-2130. (in Chinese)
[15] Li X L, Geng P, Qiu B Z. Clustering boundary detection technology for mixed attribute data set[J]. Control and Decision, 2015, 30(1): 171-175. (in Chinese)
[16] Anaraki F P, Becker S. Preconditioned data sparsification for big data with applications to PCA and K-means[J]. IEEE Trans on Information Theory, 2017, 63(5): 2954-2974.
[17] Chiang M M, Mirkin B. Intelligent choice of the number of clusters in k-means clustering: An experimental study with different cluster spreads[J]. J of Classification, 2010, 27(1): 1-38.
[18] Li W, Zhao J Y, Yan T S. Improved K-means clustering algorithm optimizing initial clustering centers based on average difference degree[J]. Control and Decision, 2017, 32(4): 759-762. (in Chinese)
[19] Wang L, Zhou X Z, Shen J. An improved rough K-means clustering algorithm[J]. Control and Decision, 2012, 27(11): 1711-1719. (in Chinese)
