Research on Efficient Rough Feature Selection Algorithms Based on Granulation Mechanisms
Abstract
Data mining aims to transform data into useful information and is one of the principal means of knowledge discovery in today's information society. With the rapid development of information technology, especially the Internet and database technology, the data available to the information industry are growing explosively and their dimensionality is rising quickly, so the era of "large-scale, high-dimensional" data has arrived. The massive size and high dimensionality of data sets impose a computational load that many traditional mining algorithms cannot cope with, while the demand for information in every industry and in daily life keeps growing. This poses an entirely new and formidable challenge to traditional data mining techniques, and the search for fast and effective data mining algorithms has become a global research hotspot.
     Feature selection is a crucial data preprocessing technique in data mining, and carrying it out efficiently and feasibly on large-scale, high-dimensional data sets is one of the main difficulties currently facing feature selection research. To this end, this thesis takes rough set theory as its research tool and studies feature selection for large-scale data sets systematically. The main results are as follows.
     (1) An efficient feature selection framework for large-scale data sets based on decomposition and fusion is constructed. For a given large-scale data set, drawing on the idea of representing the whole by samples, two core questions are analyzed in depth: how to refine the data set from one large information granule into several small granules that effectively represent the whole, and how to fuse the results obtained on the small granules. On this basis an efficient feature selection framework from a multi-granularity perspective is built, offering a new approach to data analysis in large-scale settings.
     (2) By embedding representative algorithms into the framework, efficient rough feature selection algorithms are developed for large-scale nominal data sets and for hybrid data sets respectively. The developed algorithms can find an effective approximate result efficiently, and their advantage grows with the size of the data set. Experimental results further verify their efficiency and feasibility.
     (3) For dynamic data sets, group-incremental mechanisms, dimension-incremental mechanisms and value-updating mechanisms are constructed for three representative information entropies. For the three main ways in which data are dynamically updated, the changes in the elementary information granules and in the structure of the granular space are analyzed, and the corresponding update mechanisms of the three entropies are established.
     (4) Based on the entropy update mechanisms, measures of feature significance are defined, and a group-incremental, a dimension-incremental and a value-updating rough feature selection algorithm are designed accordingly. Theoretical analysis and experimental results further verify their effectiveness and efficiency. The update principles provide new methods and theoretical support for the analysis of dynamic data, and open a new research path for the information fusion of multi-source data sets.
     After systematically analyzing the limitations of existing feature selection algorithms on large-scale, high-dimensional data sets, this thesis explores, on the basis of rough set theory and by borrowing techniques from other disciplines, how to construct efficient feature selection algorithms, and develops a series of efficient rough feature selection algorithms. The experimental results further verify the feasibility and efficiency of the new algorithms. The main research content and results therefore provide new techniques and ideas for knowledge discovery in large-scale, high-dimensional data sets.
At present, data mining is conceived as a significant approach to knowledge discovery in the information society, aiming to transform data into useful information. With the rapid development of information technology, including the Internet and databases, both the size and the dimensionality of data sets are increasing at an unprecedented rate, which has ushered in the era of "large-scale data with high dimension". Such volume and dimensionality bring great challenges to traditional data mining algorithms, and exploring efficient and effective data mining algorithms has quickly become a global issue in many areas.
     Feature selection is an important data preprocessing technique in data mining. However, existing feature selection algorithms are usually computationally inefficient, especially when dealing with large-scale data sets. In this thesis, on the basis of rough set theory, efficient feature selection for large-scale data sets is studied systematically. The main contributions are as follows.
     1. Based on the idea of decomposition and fusion, an efficient framework for feature selection is constructed. Following the idea of sample estimation, two key steps are discussed. The first is decomposition: a large information granule is decomposed into a family of small granules whose distribution is similar to that of the large one. The second is fusion: the estimates obtained on the small granules are fused to generate a final feature subset of the large data set. The framework provides a new way of analyzing big data.
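The two steps can be sketched as follows; this is a minimal illustration, not the thesis's own algorithm, and the names `decompose`, `fuse` and `framework`, the random splitting, and the majority-vote fusion rule are all simplifying assumptions:

```python
import random
from collections import Counter

def decompose(universe, k, seed=0):
    """Split the universe of objects into k random, roughly equal-sized
    small information granules (a hypothetical decomposition step)."""
    rng = random.Random(seed)
    objs = list(universe)
    rng.shuffle(objs)
    return [objs[i::k] for i in range(k)]

def fuse(feature_subsets):
    """Fuse the feature subsets found on the small granules by majority
    vote: keep a feature selected on more than half of the granules."""
    votes = Counter(f for subset in feature_subsets for f in subset)
    return {f for f, v in votes.items() if v > len(feature_subsets) / 2}

def framework(universe, k, selector):
    """Decomposition-fusion framework: run any base feature selector on
    each small granule, then fuse the partial results."""
    return fuse([selector(granule) for granule in decompose(universe, k)])
```

Any base selector can be plugged in as `selector`, which is what lets the framework host the different representative algorithms discussed below.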
     2. By employing the framework, two efficient rough feature selection algorithms are developed, one for nominal data and one for hybrid data. A typical algorithm for each data type is embedded in the framework, yielding two efficient algorithms that can find an effective result quickly, especially on large-scale data sets. Experiments further illustrate the effectiveness of the two algorithms and of the framework.
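For the nominal-data case, the kind of base algorithm being embedded can be illustrated by the classical positive-region forward greedy reduct from rough set theory; this sketch assumes a decision table stored as a list of dictionaries and does not reproduce the thesis's specific representative algorithms:

```python
def partition(objects, attrs):
    """Group objects into equivalence classes (elementary granules)
    under the chosen attributes."""
    blocks = {}
    for obj in objects:
        blocks.setdefault(tuple(obj[a] for a in attrs), []).append(obj)
    return list(blocks.values())

def positive_region(objects, attrs, decision):
    """Count objects whose equivalence class is consistent on the decision."""
    return sum(len(block) for block in partition(objects, attrs)
               if len({obj[decision] for obj in block}) == 1)

def greedy_reduct(objects, attrs, decision):
    """Forward greedy reduct: repeatedly add the attribute that most
    enlarges the positive region, until it matches the full attribute set."""
    target = positive_region(objects, attrs, decision)
    chosen = []
    while positive_region(objects, chosen, decision) < target:
        best = max(attrs,
                   key=lambda a: positive_region(objects, chosen + [a], decision))
        if positive_region(objects, chosen + [best], decision) == \
                positive_region(objects, chosen, decision):
            break  # no attribute improves further
        chosen.append(best)
    return chosen
```

Inside the framework above, such a selector would be run on each small granule and the resulting subsets fused.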
     3. For dynamic data sets, group-incremental mechanisms, dimension-incremental mechanisms and value-updating mechanisms of three representative information entropies are introduced. Considering the three main situations of data updating in databases, and by analyzing the changes of the elementary granules and of the granular space in dynamic data sets, the corresponding update mechanisms of the three information entropies are proven.
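The flavor of a group-incremental mechanism can be conveyed with Shannon entropy alone: when a group of objects arrives, only the class counts are updated rather than rescanning the whole data set. This is a simplified sketch for a single nominal attribute, not the thesis's mechanisms for all three entropies:

```python
import math

class IncrementalEntropy:
    """Maintain the Shannon entropy of the partition induced by one
    nominal attribute, updating counts group-incrementally."""

    def __init__(self):
        self.counts = {}  # equivalence-class sizes
        self.n = 0        # total number of objects seen

    def add_group(self, values):
        """Absorb a newly arrived group of attribute values in O(|group|)."""
        for v in values:
            self.counts[v] = self.counts.get(v, 0) + 1
            self.n += 1

    def entropy(self):
        """Shannon entropy H = -sum (|X_i|/|U|) log2 (|X_i|/|U|)."""
        if self.n == 0:
            return 0.0
        return -sum((c / self.n) * math.log2(c / self.n)
                    for c in self.counts.values())
```

The point is that after each group arrival the entropy is available from the maintained counts, matching a from-scratch recomputation.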
     4. On the basis of these mechanisms, three efficient rough feature selection algorithms are proposed for dynamic data sets: a group-incremental feature selection algorithm, a dimension-incremental feature selection algorithm and a feature selection algorithm for data sets with varying data values. Both theoretical analysis and experiments illustrate the effectiveness and efficiency of the three algorithms. In addition, the main ideas can be extended to the fusion of two or even multiple data sets; it is our hope that this study provides new approaches to the fusion of multi-source data sets.
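The granular-structure update underlying the dimension-incremental case can be sketched as a refinement step: when a new attribute arrives, each existing equivalence class is split by the new attribute's values, so the finer partition is obtained without repartitioning from scratch. The representation (blocks of object ids, a value map) is a hypothetical simplification:

```python
def refine(blocks, new_attr_values):
    """Split each equivalence class by the values of a newly added
    attribute; significance measures are then recomputed on the
    refined granular structure rather than on the raw data."""
    refined = []
    for block in blocks:
        sub = {}
        for obj in block:
            sub.setdefault(new_attr_values[obj], []).append(obj)
        refined.extend(sub.values())
    return refined
```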
     In this thesis, on the basis of an analysis of the limitations of existing feature selection algorithms on large-scale data sets, several efficient rough feature selection algorithms are introduced. Experiments further illustrate that these algorithms are effective and efficient. Hence, this work contributes new techniques for knowledge discovery in large-scale data sets.
