Research on Ontology-Based Knowledge Base Classification
Abstract
Linguistic knowledge bases are a fundamental resource for natural language processing. The richness of the knowledge they contain, the form in which it is represented, and the way it is organized directly affect the performance of knowledge-based natural language processing applications.
     Most existing domain-classified knowledge bases were built on top of dictionaries intended for human readers. They have low coverage and long update cycles, and storing each domain's knowledge in isolation cannot meet the need for knowledge sharing and reduced redundancy. On the other hand, most existing natural language processing applications use only word-level knowledge and rarely exploit semantic knowledge about concepts and the relations among them, which limits their performance.
     To address these shortcomings in how knowledge bases are built and used, this dissertation proposes a word domain labeling method that automatically extends domain dictionaries from a general-purpose machine-readable dictionary. It also uses the well-defined taxonomy and formal concept descriptions of an ontology knowledge base to improve knowledge representation, storage, and sharing in existing knowledge bases. The main work of this dissertation covers the following four aspects:
     1. A word domain labeling method based on word glosses is proposed. The method uses domain dictionaries and a general-purpose machine-readable dictionary that includes word glosses to train a domain labeling model, which is then used to assign domain labels automatically to new words in the general dictionary. This improves the coverage of the knowledge base while reducing labor cost.
     2. An adaptive method for generating a hierarchical classification system is proposed in Chapter 3, together with a hierarchical domain labeling method built on that system. The method uses the vocabulary information in the domain dictionaries to analyze the relatedness between domains, automatically generates a hierarchical classification tree from it, and then applies the tree in top-down hierarchical domain labeling.
     3. To deal with polysemy and synonymy in domain terminology, an ontology-based conceptualized feature description model, C-VSM, is proposed. Words in a text are mapped to concept nodes in the ontology, which disambiguates word senses and merges synonyms. This reduces the number of text features while increasing the weight of the main features, and so improves the accuracy of text representation. Training documents and documents to be classified are both represented in C-VSM, so the model can be used with traditional text classifiers.
     4. A text classification algorithm based on C-VSM is implemented, including feature selection, feature weighting, and text similarity calculation. To counter the effect of low-frequency features and imbalanced corpora on classification performance, a balanced feature selection method that combines information gain and document frequency is proposed. On this basis, feature weights are adjusted by analyzing the semantic relations between concepts, and a new text similarity calculation method is implemented.
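The concept-mapping idea behind C-VSM described in item 3 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy ontology `WORD_TO_CONCEPTS`, the cue sets in `CONCEPT_CUES`, the concept identifiers, and the overlap-based disambiguation rule are all assumptions made for illustration only. The sketch shows how mapping words to concept nodes merges synonyms into one feature and picks a single sense for an ambiguous word.

```python
from collections import Counter

# Hypothetical toy ontology: each surface word maps to its candidate concept nodes.
WORD_TO_CONCEPTS = {
    "car": ["C_automobile"],
    "automobile": ["C_automobile"],
    "bank": ["C_financial_institution", "C_river_bank"],
    "loan": ["C_loan"],
    "river": ["C_river"],
}

# Hypothetical context cues used by a deliberately naive disambiguation rule.
CONCEPT_CUES = {
    "C_financial_institution": {"loan", "money", "account"},
    "C_river_bank": {"river", "water", "shore"},
}


def disambiguate(word, tokens):
    """Return one concept ID for `word`, or None if the ontology lacks it."""
    candidates = WORD_TO_CONCEPTS.get(word)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    context = set(tokens) - {word}
    # Choose the candidate whose cue words overlap most with the context;
    # on a tie, max() keeps the first candidate listed.
    return max(candidates, key=lambda c: len(CONCEPT_CUES.get(c, set()) & context))


def to_concept_vector(tokens):
    """Count concept features; words outside the ontology stay as word features."""
    counts = Counter()
    for word in tokens:
        concept = disambiguate(word, tokens)
        counts[concept if concept is not None else word] += 1
    return counts


if __name__ == "__main__":
    print(to_concept_vector(["car", "automobile", "bank", "loan"]))
```

In this sketch "car" and "automobile" collapse into one feature with a count of 2, which is the sense in which the number of features shrinks while the weight of a main feature grows. The resulting counts could then be passed to any conventional vector-space classifier.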
