Research on Feature Selection Algorithms Based on Information Entropy
Abstract
With the continual emergence of new technologies, real-world datasets are growing ever larger while exhibiting characteristics such as few samples and high dimensionality. This poses great challenges to traditional classification learning, and the presence of redundant features indirectly aggravates these adverse effects. How to remove redundant or irrelevant features from high-dimensional data, so as to avoid the curse of dimensionality and allow traditional learning algorithms to continue to train on such data, has therefore become a difficult problem. Feature selection was proposed in this context: it refers to the process of selecting from the original features an optimal subset that retains all or most of the classification information of the original feature set. Feature selection is now one of the important research topics in data mining, statistical pattern recognition, and machine learning, and it has also been widely applied in practice.

This thesis first introduces the classification problem in data mining, then reviews the state of the art and research hot spots of feature selection, with an emphasis on information-based evaluation criteria. Taking information entropy and mutual information from information theory as its foundation, and centering on how to measure the relevance between features, it then carries out a series of studies on eliminating redundant features so as to address different classification and prediction tasks. The main contributions and research content include the following aspects:

1) Following the idea of hierarchical clustering in data mining, a new filter feature selection algorithm, ISFS, is proposed. It uses mutual information and the correlation coefficient to represent the "inter-class distance" and "intra-class distance" between features, respectively, and selects important features according to hierarchical cluster analysis, so that the selected feature subset has minimal redundancy and maximal relevance and classification performance is ultimately improved.

2) A generalized representation is given for the different information measures used in existing feature selection algorithms, and its relationship to other information criteria is discussed in detail. To address the inaccurate estimation of mutual information in existing selection algorithms, the concept of dynamic mutual information is proposed to describe the relevance between features accurately. On this basis, two new feature selection algorithms, based on dynamic mutual information (DMIFS) and conditional dynamic mutual information (CDMI), are proposed to overcome the drawback of traditional mutual-information-based algorithms, namely that they cannot accurately reflect how relevance changes dynamically during the selection process.

3) To address the fact that sample weights remain fixed in existing feature selection algorithms, a new boosting-based feature selection framework is proposed using data sampling techniques. It models how the importance of samples keeps changing during the selection process and thereby avoids the sensitivity of dynamic mutual information to noisy data. Other issues within the framework, such as the weight updating strategy and the error function, are also discussed.

4) For gene expression data in bioinformatics, which have few samples and high dimensionality, genes (i.e., features) are grouped by similarity using the information correlation coefficient and the approximate Markov blanket technique; combined with ensemble learning, an ensemble gene selection algorithm, EGSG, is proposed to improve the recognition and diagnostic capability of classification models.
With the emergence and rapid development of new technologies, large volumes of data, which play an increasingly important role in everyday life, have been accumulated in various fields. Nowadays, how to extract important and useful information from such massive data and exploit it to achieve practical goals has become a pressing problem. In this context, the concept of data mining was introduced: it refers to the process of discovering hidden and potentially useful information or knowledge from massive, noisy data. Data classification is one of the most extensively studied issues in data mining. Its main purpose is to learn patterns or behaviors from known data in order to help users predict or determine the patterns of unknown data. Since the features of these patterns carry much of the important information or knowledge, they form one of the theoretical bases for predicting the behavior of unknown data.

However, many features in real databases are redundant or irrelevant. They pose great challenges to traditional classification algorithms: they reduce the efficiency of learning, distract the learning process, and ultimately yield models of poor quality that overfit the data. Moreover, the more features a database contains, the more noise it tends to harbor, which puts learning algorithms in an even more adverse situation. To tackle this problem, the concept of feature selection was introduced. It refers to the process of identifying, according to some given measure, an optimal subset of the original features that embodies most of the information in the data. Feature selection has become a hot topic in fields such as data mining, statistical pattern recognition, and machine learning, attracting considerable scholarly attention, and it is also widely used in practice.

This thesis focuses on removing redundant or useless features from databases. According to the characteristics of the selection process, the degree of relevance between features is measured by information-based criteria whose foundations are information entropy and related concepts. On this basis, the thesis proposes five feature selection algorithms that use such information measures to handle classification and prediction tasks in different situations. Specifically, the research work is carried out in the following aspects:
Firstly, a new filter feature selection algorithm, called ISFS, is proposed. The idea originates from cluster analysis; unlike traditional clustering algorithms, however, the data items in ISFS are individual features rather than samples. The class feature (i.e., the class labels) of the dataset is treated as a special group, called the label group, while the other features are organized into candidate groups and a selected group, where each candidate group contains a single candidate feature and the selected group contains the features chosen so far. Feature selection is thus transformed into agglomerative hierarchical clustering: at each step, the selected group merges with the candidate group that has the greatest "distance" to the label group and the smallest "distance" to the selected group. This procedure is repeated until the number of features in the selected group exceeds a specified threshold.

To measure the "distance" between data items, ISFS adopts two information criteria: mutual information for the distance to the label group (the inter-class distance) and the correlation coefficient for the distance to the selected group (the intra-class distance). The aim of the clustering is to keep the selected group with the minimal intra-class distance and the maximal inter-class distance to the label group at the same time, so that features with more discriminative power are selected with higher priority throughout the clustering process.
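To make this procedure concrete, here is a minimal sketch of an ISFS-style greedy selection loop. The function name, the use of scikit-learn's mutual_info_score on discretized features, Pearson correlation as the intra-class "distance", and the simple relevance-minus-redundancy score are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def isfs_sketch(X, y, k):
    """Clustering-style greedy selection: at each step, merge into the
    selected group the candidate feature with the largest mutual
    information to the labels (inter-class "distance") and the smallest
    mean absolute correlation to already selected features (intra-class
    "distance").  Assumes the columns of X are discretized features."""
    n_features = X.shape[1]
    candidates = set(range(n_features))
    selected = []
    # Relevance of every feature to the class labels.
    relevance = [mutual_info_score(X[:, j], y) for j in range(n_features)]
    while len(selected) < k and candidates:
        best_j, best_score = None, -np.inf
        for j in candidates:
            if selected:
                redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                      for s in selected])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy  # max relevance, min redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        candidates.remove(best_j)
    return selected
```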
Secondly, a generalized representation of information metrics is introduced. It brings most information metrics used in feature selection algorithms into a unified framework. The relationship between this representation and other measures is discussed in detail, and a unified framework for information-based feature selection is then given.
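As an illustration of what such a unification typically looks like (this is a commonly cited generalized criterion from the feature selection literature, not necessarily the exact form derived in the thesis), many information-based criteria score a candidate feature f, given the selected set S and the class C, as

```latex
J(f) = I(f; C) - \beta \sum_{s \in S} I(f; s) + \gamma \sum_{s \in S} I(f; s \mid C)
```

Setting beta > 0 and gamma = 0 recovers Battiti's MIFS criterion, beta = 1/|S| and gamma = 0 gives the redundancy term used in mRMR, and beta = gamma = 1/|S| corresponds to JMI-style criteria.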
Since information entropy and mutual information can effectively measure the degree of uncertainty of a feature, they have been studied extensively in feature selection. However, the entropy or mutual information values used in existing selection algorithms may contain "false" information and therefore cannot exactly represent the degree of relevance between features. To this end, dynamic mutual information is introduced to accurately capture the relevance between features, which changes dynamically during the selection process. Unlike traditional mutual information, the dynamic value is estimated not on the whole sample space but only on the unrecognized samples. Based on this concept, two feature selection algorithms, one using dynamic mutual information (DMIFS) and one using conditional dynamic mutual information (CDMI), are proposed. These algorithms work somewhat like decision tree induction: the samples that can be identified by the newly selected feature are removed from the sample space, and the mutual information of the candidate features is then estimated on the remaining samples.
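The following sketch illustrates the dynamic mutual information idea under a simplifying assumption about what "recognized" means: a sample counts as recognized once its value of the newly selected feature maps to a single class label. The helper names and this recognition rule are illustrative choices; the thesis's exact definition may differ.

```python
import numpy as np
from collections import defaultdict
from sklearn.metrics import mutual_info_score

def recognized_mask(x, y):
    """Mark samples whose value of feature x occurs with only one class,
    i.e. samples this feature can already classify on its own."""
    classes_per_value = defaultdict(set)
    for v, c in zip(x, y):
        classes_per_value[v].add(c)
    return np.array([len(classes_per_value[v]) == 1 for v in x])

def dmifs_sketch(X, y, k):
    """Greedy selection with dynamic mutual information: MI is always
    re-estimated on the samples that remain unrecognized."""
    remaining = np.ones(len(y), dtype=bool)  # unrecognized samples
    candidates = set(range(X.shape[1]))
    selected = []
    while len(selected) < k and candidates and remaining.any():
        Xr, yr = X[remaining], y[remaining]
        best = max(candidates, key=lambda j: mutual_info_score(Xr[:, j], yr))
        selected.append(best)
        candidates.remove(best)
        # Drop the samples the newly selected feature can recognize.
        remaining[np.where(remaining)[0][recognized_mask(Xr[:, best], yr)]] = False
    return selected
```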
Thirdly, based on data sampling techniques, a new framework for feature selection using boosting is presented, and several related issues within the framework, such as weight updating strategies and error functions, are discussed. The purpose of using boosting is to overcome the negative effects of removing recognized samples when estimating the dynamic mutual information of features, and to alleviate the impact of noise on the whole selection process.

During the selection process, the proposed method estimates mutual information by dynamically adjusting the weights of samples, so that the estimated values accurately measure the correlation between features as it changes along with the selection process. As a result, the feature selected at each step closely reflects the characteristics of the classification task. Unlike other boosting-based selection algorithms, the result of this framework is a feature subset rather than one or several classifiers, and its weight updates are not driven by the error of a base classifier; that is, the information measure in this method is independent of base classifiers.
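A minimal sketch of this weighted variant is shown below. Down-weighting recognized samples (rather than removing them), the fixed down-weighting factor beta, and the weighted contingency-table estimate of mutual information are assumptions made for illustration, not the thesis's exact update rule or error function.

```python
import numpy as np
from collections import defaultdict

def weighted_mutual_info(x, y, w):
    """Estimate I(x; y) from a weighted contingency table."""
    total = w.sum()
    joint, px, py = defaultdict(float), defaultdict(float), defaultdict(float)
    for xv, yv, wv in zip(x, y, w):
        joint[(xv, yv)] += wv / total
        px[xv] += wv / total
        py[yv] += wv / total
    return sum(p * np.log(p / (px[a] * py[b]))
               for (a, b), p in joint.items() if p > 0)

def boosting_selection_sketch(X, y, k, beta=0.5):
    """Greedy selection where sample weights, not sample removal, encode
    how much each sample still matters when estimating relevance."""
    n = len(y)
    w = np.ones(n) / n
    candidates, selected = set(range(X.shape[1])), []
    while len(selected) < k and candidates:
        best = max(candidates,
                   key=lambda j: weighted_mutual_info(X[:, j], y, w))
        selected.append(best)
        candidates.remove(best)
        # Down-weight samples the selected feature already explains
        # (here: samples whose feature value maps to a single class).
        value_classes = defaultdict(set)
        for xv, yv in zip(X[:, best], y):
            value_classes[xv].add(yv)
        explained = np.array([len(value_classes[xv]) == 1 for xv in X[:, best]])
        w[explained] *= beta
        w /= w.sum()
    return selected
```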
Finally, for the small-sample, high-dimensional gene expression datasets encountered in bioinformatics, an ensemble gene selection algorithm named EGSG is proposed. It uses the information correlation coefficient to measure the relevance between genes and the approximate Markov blanket technique to divide the genes into several groups, each representing a different biological function. After the partition stage, a feature subset is formed by picking one representative gene from each group. Since the genes in the subset come from different groups, the subset has prediction capability similar to that of the original genes.

Because a single gene subset has limited capability on high-dimensional gene expression data, EGSG exploits multiple gene subsets to further improve prediction capability in disease diagnosis through ensemble techniques. That is, EGSG obtains several gene subsets in the same way and then aggregates the base classifiers trained on these subsets into an overall classifier by majority voting. Compared with other ensemble methods, the diversity of EGSG comes from combining different gene subsets, where the subsets have similar biological functions to one another and high discriminative capability.
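The sketch below shows the overall shape of such an ensemble under several simplifying assumptions: relevance ranking via mutual_info_score on coarsely rounded expression values stands in for the information correlation coefficient, grouping by a correlation threshold stands in for the approximate Markov blanket, and a Gaussian naive Bayes base classifier with majority voting is used. None of these choices are claimed to be the thesis's exact design.

```python
import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.naive_bayes import GaussianNB

def egsg_sketch(X, y, n_subsets=5, corr_threshold=0.7):
    """Group similar genes, pick one representative per group for each
    subset, train a base classifier per subset, and predict by majority
    vote.  Assumes y holds non-negative integer class labels."""
    n_genes = X.shape[1]
    # Rank genes by relevance to the class labels (coarse rounding is a
    # crude stand-in for discretizing continuous expression values).
    relevance = np.array([mutual_info_score(X[:, j].round(1), y)
                          for j in range(n_genes)])
    order = np.argsort(relevance)[::-1]

    # Greedy grouping: each group is led by its most relevant gene.
    groups, assigned = [], np.zeros(n_genes, dtype=bool)
    for j in order:
        if assigned[j]:
            continue
        members, assigned[j] = [j], True
        for i in order:
            if not assigned[i] and abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) > corr_threshold:
                members.append(i)
                assigned[i] = True
        groups.append(members)

    # The r-th subset takes the r-th best member of every group
    # (falling back to the group leader when the group is small).
    subsets = [[g[min(r, len(g) - 1)] for g in groups] for r in range(n_subsets)]
    classifiers = [GaussianNB().fit(X[:, s], y) for s in subsets]

    def predict(X_new):
        votes = np.array([clf.predict(X_new[:, s])
                          for clf, s in zip(classifiers, subsets)])
        # Majority vote across the base classifiers.
        return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

    return predict
```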
Simulation experiments on public datasets demonstrate the performance and effectiveness of the proposed selection algorithms: in most cases they effectively improve the efficiency and classification performance of learning algorithms. Nevertheless, several problems remain. For example, ISFS is relatively inefficient, DMIFS and CDMI are sensitive to noise, the performance of the boosting algorithm depends on specific parameter values, and the results of EGSG are difficult to interpret. Future work will address these issues to further improve performance and efficiency.
