Feature Selection Algorithm Based on Hierarchical Granulation

English title: Feature Selection Algorithm Based on Hierarchical Granulation
Authors: CHEN Huihuang (陈辉皇); LIN Yaojin (林耀进); WANG Chenxi (王晨曦); TONG Xianqun (童先群); HU Minjie (胡敏杰)
Keywords: feature selection; granular computing; hierarchical granulation; mutual information
Journal code: ZZDZ
Journal: Journal of Zhengzhou University (Natural Science Edition)
Affiliations: School of Computer Science, Minnan Normal University; Department of Computer Engineering, Zhangzhou Institute of Technology
Publication date: 2016-10-17 13:39
Publisher: Journal of Zhengzhou University (Natural Science Edition)
Year: 2016
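Volume: 48
Issue: 03
Pages: 72-77+84
Page count: 7
Funding: National Natural Science Foundation of China (61303131, 61672272); New Century Excellent Talents in Fujian Province Universities; Science and Technology Project of the Fujian Provincial Department of Education (JA14192)
Language: Chinese
Record ID: ZZDZ201603013
CN: 41-1338/N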

Abstract

In many practical applications, the feature space has a hierarchical granular structure. First, hierarchical clustering based on a kernel-method measure is proposed to granulate the feature space hierarchically. Second, within each subspace produced by the granulation, features are ranked at a specified level of the hierarchy using neighborhood mutual information, which is used to assess both the maximum relevance between features and labels and the minimum redundancy among features. On this basis, a few representative features are chosen from each subspace to form the final feature subset. Finally, experiments on six UCI data sets with two different base classifiers demonstrate the effectiveness of the proposed algorithm.
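
The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes average-linkage agglomerative clustering over a distance induced by a Gaussian kernel, and it swaps in plain discrete mutual information for the neighborhood mutual information used in the paper. All function names, the `gamma` and `per_group` parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Discrete mutual information I(X;Y) in nats (stand-in for neighborhood MI)."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def rbf_distance(a, b, gamma=1.0):
    """Distance induced by a Gaussian kernel k: d^2 = 2 - 2*k(a, b)."""
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.exp(-gamma * sq)))

def granulate(columns, n_groups, gamma=1.0):
    """Average-linkage clustering of feature columns, cut at n_groups clusters."""
    groups = [[j] for j in range(len(columns))]

    def link(g, h):
        total = sum(rbf_distance(columns[i], columns[j], gamma)
                    for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > n_groups:
        pairs = ((p, q) for p in range(len(groups))
                 for q in range(p + 1, len(groups)))
        a, b = min(pairs, key=lambda pq: link(groups[pq[0]], groups[pq[1]]))
        merged = groups.pop(b)      # b > a, so index a stays valid
        groups[a].extend(merged)
    return groups

def rank_group(columns, labels, group):
    """Greedy max-relevance / min-redundancy ordering inside one group."""
    remaining, ordered = list(group), []
    while remaining:
        def score(j):
            relevance = mutual_info(columns[j], labels)
            if not ordered:
                return relevance
            redundancy = sum(mutual_info(columns[j], columns[s])
                             for s in ordered) / len(ordered)
            return relevance - redundancy
        best = max(remaining, key=score)
        ordered.append(best)
        remaining.remove(best)
    return ordered

def select_features(columns, labels, n_groups, per_group=1, gamma=1.0):
    """Keep the top-ranked per_group features from every feature group."""
    selected = []
    for group in granulate(columns, n_groups, gamma):
        selected += rank_group(columns, labels, group)[:per_group]
    return sorted(selected)

# Toy data: each inner list is one feature column over 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: matches the labels
    [0, 0, 0, 1, 1, 1, 1, 1],  # f1: near copy of f0
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: unrelated to the labels
    [0, 1, 0, 1, 0, 1, 1, 1],  # f3: near copy of f2, weakly related to labels
]
print(select_features(columns, labels, n_groups=2))  # [0, 3]
```

On the toy data the clustering groups the two near-duplicate pairs together, and one representative is kept per group, so redundant copies are dropped while each group of the feature space stays represented.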
