Abstract
The curse of dimensionality is a common problem in machine learning tasks. Feature selection algorithms address it by selecting an optimal feature subset from the original data set, which reduces the feature dimension. This paper proposes a hybrid feature selection algorithm. First, the chi-square test and a filter method are used to select a subset of important features, which are then standardized. Next, SBS-SVM, a wrapper that combines sequential backward selection (SBS) with a support vector machine (SVM), selects the optimal feature subset, maximizing classification performance while effectively reducing the number of features. In the experiments, the wrapper-stage SBS-SVM is compared with two other algorithms on three classic data sets. The results show that SBS-SVM performs well in both classification performance and generalization ability.
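The two-stage procedure summarized above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. All function names and the toy data are hypothetical. The wrapper's evaluator is a nearest-centroid classifier scored on training accuracy, swapped in for the SVM so that the example stays self-contained. The chi-square statistic assumes non-negative feature values.

```python
import statistics

def chi2_scores(X, y):
    """Chi-square statistic of each (non-negative) feature column against the class labels."""
    classes = sorted(set(y))
    prior = {c: sum(1 for label in y if label == c) / len(y) for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        stat = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * prior[c]
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores

def top_k(scores, k):
    """Filter stage: indices of the k highest-scoring features, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def standardize(X):
    """Z-score every column; constant columns are only centred (divisor 1.0)."""
    cols = list(zip(*X))
    means = [statistics.fmean(col) for col in cols]
    stds = [statistics.pstdev(col) or 1.0 for col in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def centroid_accuracy(X, y, subset):
    """Stand-in evaluator (NOT an SVM): nearest-centroid training accuracy on `subset`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [statistics.fmean([r[j] for r in rows]) for j in subset]
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in subset]
        pred = min(classes, key=lambda c: sum((p - m) ** 2
                                              for p, m in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(y)

def sbs(features, evaluate, min_features=1):
    """Wrapper stage: sequential backward selection driven by `evaluate(subset)`."""
    current = list(features)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > min_features:
        # Try removing each remaining feature and keep the best-scoring reduction.
        candidates = [[f for f in current if f != drop] for drop in current]
        score, current = max(((evaluate(c), c) for c in candidates),
                             key=lambda pair: pair[0])
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Hypothetical toy data: columns 0 and 2 separate the classes,
# column 1 is constant, column 3 is noise.
X = [[5, 2, 0, 3], [6, 2, 1, 0], [5, 2, 0, 3],
     [1, 2, 4, 0], [0, 2, 5, 3], [1, 2, 4, 0]]
y = [0, 0, 0, 1, 1, 1]

kept = top_k(chi2_scores(X, y), k=3)   # filter stage
Xs = standardize(X)                    # scaling stage
subset, score = sbs(kept, lambda s: centroid_accuracy(Xs, y, s))
print(kept, subset, score)
```

In the setting the abstract describes, the evaluator passed to `sbs` would instead train an SVM on the selected columns and return a held-out or cross-validated accuracy. The training accuracy used here is only a simplification for the toy data.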