Knowledge Acquisition from Text
Abstract
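Humans use written language to describe the world and express their thoughts, and text is an important medium for passing on human wisdom. With the arrival of the knowledge economy, document knowledge management has attracted wide attention in both academia and industry. Document knowledge management systems nevertheless face several important problems: how to identify the topic of a document and its central words; how to give each user personalized cues about the key content they care about; and how to return precisely the information users want. Keyword extraction and information extraction are important text-processing techniques that can address these problems to some extent. This paper studies single-document keyword extraction based on semantic dictionaries and the rule-generation mechanism of information extraction. The main research work and results are as follows: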
     1) Word sense disambiguation based on semantic networks and the UW-PageRank algorithm
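     A knowledge-based word sense disambiguation algorithm that combines semantic networks with UW-PageRank is proposed. It can disambiguate any word that appears in a document in real time, provided the word is also in the knowledge base. It requires no corpus and no training.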
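     For Chinese text, HowNet serves as the semantic knowledge base. The content of the text is represented as an undirected weighted network whose nodes are sememes and whose edge weights are the relatedness between sememes. UW-PageRank scores each sememe, and these scores are then used to compute the weight of each sense. For every word, the sense with the highest weight is taken as its meaning. The algorithm was evaluated with a full-text sense-tagging experiment and on the SENSEVAL-3 evaluation set.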
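The Python sketch below illustrates the sememe-graph procedure, but only under stated assumptions. The abstract gives neither the UW-PageRank formula nor the rule for turning sememe scores into sense weights. The sketch therefore assumes two things:

- UW-PageRank takes the weighted, undirected TextRank form S(i) = (1 - d) + d * sum_j (w_ij / strength(j)) * S(j), with damping d = 0.85.
- A sense's weight is the mean score of its sememes.

The sememe names, sense labels, and edge weights are hypothetical English placeholders, not HowNet data. The functions `uw_pagerank` and `disambiguate` are illustrative names.

```python
from collections import defaultdict

def uw_pagerank(edges, d=0.85, tol=1e-6, max_iter=100):
    """TextRank-style weighted PageRank on an undirected graph (assumed UW-PageRank form):
    S(i) = (1 - d) + d * sum over neighbours j of (w_ij / strength(j)) * S(j),
    where strength(j) is the total weight of j's edges. `edges` holds (u, v, w) triples."""
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = w
        nbrs[v][u] = w
    if not nbrs:
        return {}
    strength = {n: sum(ws.values()) for n, ws in nbrs.items()}
    score = {n: 1.0 for n in nbrs}
    for _ in range(max_iter):
        new = {i: (1 - d) + d * sum(w / strength[j] * score[j] for j, w in nbrs[i].items())
               for i in nbrs}
        done = max(abs(new[n] - score[n]) for n in nbrs) < tol
        score = new
        if done:
            break
    return score

def disambiguate(word_senses, sememe_score):
    """For each word, pick the sense whose sememes have the highest mean score.
    Mean aggregation is an assumption; the abstract only says sense weights
    are computed from sememe weights."""
    def sense_score(sememes):
        return sum(sememe_score.get(s, 0.0) for s in sememes) / len(sememes)
    return {word: max(senses, key=lambda sid: sense_score(senses[sid]))
            for word, senses in word_senses.items()}

# Hypothetical HowNet-style inventory: placeholder sememes, hand-set relatedness weights.
word_senses = {
    "bank": {"bank#finance": ["institution", "money"], "bank#river": ["land", "river"]},
    "loan": {"loan#1": ["money", "lend"]},
    "deposit": {"deposit#money": ["money"], "deposit#sediment": ["land"]},
}
edges = [
    ("institution", "money", 0.8), ("money", "lend", 0.9), ("institution", "lend", 0.5),
    ("money", "land", 0.3), ("land", "river", 0.3),
]
scores = uw_pagerank(edges)
print(disambiguate(word_senses, scores))
# {'bank': 'bank#finance', 'loan': 'loan#1', 'deposit': 'deposit#money'}
```

In this toy graph, "money" is the hub of the densest cluster, so it receives the highest score. That pulls "bank" and "deposit" toward their financial senses without any training data.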
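     For English text, WordNet serves as the semantic knowledge base. The content of the text is represented as an undirected weighted network whose nodes are synsets and whose edge weights are the relatedness between synsets, and UW-PageRank scores each synset. Word senses are then disambiguated according to the synset weights, combined with factors such as co-referring senses and sense frequency. The algorithm was evaluated on the SemCor dataset.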
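The abstract names the extra factors used for English but not how they are combined. The sketch below is a hypothetical illustration only, and it makes three assumptions:

- Each candidate synset's UW-PageRank score has already been computed.
- That score is multiplied by a sense-frequency prior, 1 + alpha / rank. The prior relies on WordNet listing a word's senses roughly in order of frequency.
- The score is also multiplied by a bonus, 1 + beta * (shared - 1), for synsets that are candidates of several words in the text. This is one possible reading of the co-referring-sense factor.

The synset ids, scores, `alpha`, `beta`, and the `choose_synsets` function are invented for the example.

```python
from collections import Counter

def choose_synsets(candidates, rank_score, alpha=0.5, beta=0.5):
    """Pick one synset per word using a hypothetical combination of factors.

    candidates: word -> candidate synset ids in WordNet sense order (most frequent first)
    rank_score: synset id -> UW-PageRank score, assumed precomputed
    """
    # How many distinct words in the text list each synset as a candidate.
    shared = Counter(s for syns in candidates.values() for s in set(syns))
    chosen = {}
    for word, syns in candidates.items():
        def combined(item):
            rank, syn = item
            freq_prior = 1 + alpha / rank               # favour earlier (more frequent) senses
            coref_bonus = 1 + beta * (shared[syn] - 1)  # favour synsets shared across words
            return rank_score.get(syn, 0.0) * freq_prior * coref_bonus
        _, best = max(enumerate(syns, start=1), key=combined)
        chosen[word] = best
    return chosen

# Illustrative data: "car" and "automobile" share the synset "car.n.01".
candidates = {"car": ["car.n.01", "car.n.02"], "automobile": ["car.n.01"]}
rank_score = {"car.n.01": 0.9, "car.n.02": 1.0}
print(choose_synsets(candidates, rank_score))
# {'car': 'car.n.01', 'automobile': 'car.n.01'}
```

Here the raw UW-PageRank score alone would pick `car.n.02`. The frequency prior and the shared-synset bonus tip the choice to `car.n.01`.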
     2) Keyword extraction based on semantic networks and the UW-PageRank algorithm
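     A single-document keyword extraction algorithm based on semantic networks and UW-PageRank is proposed. After word sense disambiguation, every word in the text has a definite sense. The semantic network is therefore pruned by removing each word's other senses, so that the remaining nodes are exactly the senses the words take in the text. The UW-PageRank formula is then applied to find the important senses, and the words they belong to are the keywords of the text.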
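The following is a minimal sketch of the pruning-and-reranking step, resting on a few assumptions:

- The semantic network is already available as weighted edges between candidate senses.
- Word sense disambiguation has chosen one sense per word.
- For simplicity, nodes are senses rather than sememes.

The sketch repeats the assumed TextRank-style UW-PageRank update as a compact helper. The sense labels, weights, and the `extract_keywords` function are hypothetical.

```python
from collections import defaultdict

def uw_pagerank(edges, d=0.85, tol=1e-6, max_iter=100):
    """TextRank-style weighted PageRank on an undirected graph (assumed UW-PageRank form)."""
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = w
        nbrs[v][u] = w
    if not nbrs:
        return {}
    strength = {n: sum(ws.values()) for n, ws in nbrs.items()}
    score = {n: 1.0 for n in nbrs}
    for _ in range(max_iter):
        new = {i: (1 - d) + d * sum(w / strength[j] * score[j] for j, w in nbrs[i].items())
               for i in nbrs}
        done = max(abs(new[n] - score[n]) for n in nbrs) < tol
        score = new
        if done:
            break
    return score

def extract_keywords(sense_edges, chosen, top_k):
    """Prune the sense graph to the senses chosen by WSD, rerank, map senses back to words."""
    kept = set(chosen.values())
    pruned = [(u, v, w) for u, v, w in sense_edges if u in kept and v in kept]
    scores = uw_pagerank(pruned)  # senses left without edges drop out entirely
    words_of = defaultdict(list)
    for word, sense in chosen.items():
        words_of[sense].append(word)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [word for sense in ranked for word in words_of[sense]][:top_k]

sense_edges = [
    ("bank#finance", "loan#1", 0.9), ("bank#finance", "deposit#money", 0.8),
    ("loan#1", "deposit#money", 0.6), ("bank#finance", "weather#1", 0.1),
    ("bank#river", "river#1", 0.9), ("deposit#sediment", "river#1", 0.7),
]
chosen = {"bank": "bank#finance", "loan": "loan#1", "deposit": "deposit#money",
          "weather": "weather#1", "river": "river#1"}
print(extract_keywords(sense_edges, chosen, top_k=3))
# ['bank', 'loan', 'deposit']
```

`river#1` disappears after pruning because its only links went to senses that disambiguation rejected.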
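     On manually annotated datasets of Chinese and English scientific papers, the algorithm was compared with the Tf method, and the results demonstrate its effectiveness.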
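The abstract does not describe how the Tf baseline was configured or how its output was scored. The minimal sketch below assumes that Tf means ranking words by raw term frequency after stopword removal. It also assumes the output is scored by precision and recall against the manually tagged keywords. The sentence, stopword list, gold keywords, and the names `tf_keywords` and `precision_recall` are invented for illustration.

```python
from collections import Counter

def tf_keywords(tokens, top_k, stopwords=frozenset()):
    """Tf baseline: rank words by raw frequency; ties keep first-occurrence order."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return [w for w, _ in counts.most_common(top_k)]

def precision_recall(predicted, gold):
    """Precision and recall of extracted keywords against manually tagged ones."""
    hits = len(set(predicted) & set(gold))
    return hits / len(predicted), hits / len(gold)

tokens = "the bank approved the loan and the bank raised the deposit rate".split()
predicted = tf_keywords(tokens, top_k=3, stopwords={"the", "and"})
print(predicted)  # ['bank', 'approved', 'loan']
print(precision_recall(predicted, ["bank", "loan", "deposit"]))  # both values are 2/3
```

A frequency-only ranking promotes a content-poor word such as "approved" simply because it appears early. Sense-level ranking over the semantic graph is meant to avoid this kind of error.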
     3) RGA-CIE: a heuristic rule generation algorithm for Chinese information extraction
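     A heuristic rule generation algorithm for Chinese information extraction systems, RGA-CIE (Rule Generation Algorithm for Chinese Information Extraction), is proposed. It uses a supervised, bottom-up rule-learning process that generalizes rules step by step, with heuristics suited to the characteristics of Chinese. The generated rules are evaluated with the Laplacian* operator, which balances the trade-off between coverage and precision well. Semantic extension further improves rule coverage. RGA-CIE was evaluated on an in-house financial news information extraction system. The generated rules achieved a precision of 0.84 and a recall of 0.82, outperforming hand-written rules. In addition, information extraction was applied to acquiring ontology instances and played an important role in building the domain ontology of the Beijing tourism information query system (Traveling in Beijing, TBJ).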
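The abstract does not spell out RGA-CIE's heuristics or the Laplacian* formula. The Python sketch below is therefore only a hypothetical illustration of the general idea. It starts from a rule built from one annotated seed sentence and generalizes it bottom-up, using semantic-class extension. Each candidate rule is scored with the Laplacian expected error (e + 1) / (n + 1) from Soderland's WHISK system, used here as a stand-in for Laplacian*. In that formula, n is the number of extractions a rule makes on the training data and e is the number of wrong ones. The lexicon, sentences, acceptance heuristic, and function names are all invented.

```python
# Hypothetical semantic lexicon used for "semantic extension".
SEM_CLASSES = {
    "@COMPANY": {"Alpha", "Beta", "Gamma"},
    "@UP": {"rose", "grew", "increased"},
    "@PCT": {"12%", "8%", "5%", "20%"},
}

def matches(constraint, token):
    if constraint == "*":
        return True
    if constraint in SEM_CLASSES:
        return token in SEM_CLASSES[constraint]
    return constraint == token

def apply_rule(rule, slot, tokens):
    """Indices extracted by `rule` (a window of constraints) from one sentence."""
    return [start + slot
            for start in range(len(tokens) - len(rule) + 1)
            if all(matches(c, tokens[start + i]) for i, c in enumerate(rule))]

def laplacian_error(rule, slot, data):
    """WHISK-style Laplacian expected error (e + 1) / (n + 1); lower is better."""
    n = e = 0
    for tokens, gold in data:
        for idx in apply_rule(rule, slot, tokens):
            n += 1
            e += idx != gold
    return (e + 1) / (n + 1)

def generalize_once(constraint):
    """One step up: literal -> semantic class (if any) -> wildcard."""
    if constraint == "*":
        return None
    if constraint not in SEM_CLASSES:
        for cls, members in SEM_CLASSES.items():
            if constraint in members:
                return cls
    return "*"

def learn_rule(seed_tokens, slot, data):
    """Greedy bottom-up generalization of one seed (hypothetical heuristic):
    a class step may keep the error equal, a wildcard step must lower it."""
    rule, best = list(seed_tokens), laplacian_error(seed_tokens, slot, data)
    while True:
        candidates = []
        for i, c in enumerate(rule):
            g = generalize_once(c)
            if g is None:
                continue
            cand = rule[:i] + [g] + rule[i + 1:]
            err = laplacian_error(cand, slot, data)
            if err < best or (err == best and g != "*"):
                candidates.append((err, g == "*", i, cand))
        if not candidates:
            return tuple(rule), best
        best, _, _, rule = min(candidates)

# Toy annotated sentences: (tokens, index of the value to extract, or None).
data = [
    (["Alpha", "profit", "rose", "12%"], 3),
    (["Beta", "profit", "grew", "8%"], 3),
    (["Gamma", "profit", "increased", "5%"], 3),
    (["Beta", "revenue", "rose", "20%"], None),
]
rule, err = learn_rule(data[0][0], 3, data)
print(rule, err)  # ('@COMPANY', 'profit', '@UP', '@PCT') 0.25
```

Generalizing "profit" to a wildcard would also match the revenue sentence. That raises the error from 0.25 to 0.4, so the literal is kept. This is the coverage-versus-precision tension that the Laplacian-style score is meant to arbitrate.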
Text is one of the most important media through which people describe the world, express their thoughts, and spread knowledge. With the rise of the knowledge economy, researchers and engineers have paid increasing attention to text knowledge management. Text knowledge management systems still face several problems, however. How can the subject of a text be identified? How can its topic words be extracted? How can important information be highlighted for each user individually? How can users be given exactly the information they need? Keyword extraction and information extraction, two important text-processing technologies, can help solve these problems. This paper focuses on keyword extraction from a single document and on rule generation for information extraction. The main achievements are as follows:
     1) Word sense disambiguation based on semantic networks and UW-PageRank
     This paper proposes a word sense disambiguation method based on semantic networks and UW-PageRank. It can disambiguate all the words in a whole text at once, without a corpus or training.
     For Chinese, we use HowNet as the knowledge base and build an undirected weighted graph that uses sememes as vertices and the relatedness of sememes as edge weights. UW-PageRank is then applied to the graph to score the importance of the sememes. The score of each definition of a word is computed from the scores of the sememes it contains, and the highest-scored definition is assigned to the word. The algorithm is tested with a text indexing experiment and on SENSEVAL-3.
     For English, we use WordNet as the knowledge base and build an undirected weighted graph that uses synsets as vertices and the relatedness of synsets as edge weights. UW-PageRank is then applied to score the importance of the synsets, and the highest-scored synset is assigned to the word. The algorithm is tested on the SemCor corpus.
     2) Keyword extraction based on semantic networks and UW-PageRank
     This paper proposes a keyword extraction method based on semantic networks and UW-PageRank. After word sense disambiguation, each word is assigned one sense, so the semantic graph can be pruned to keep only the "right" senses. UW-PageRank is then applied to mine the most important senses, i.e., the keywords.
     We test our algorithm on manually tagged Chinese and English papers. Compared with the Tf algorithm, our algorithm performs better.
     3) Heuristic rule generation algorithm for Chinese information extraction: RGA-CIE
     This paper proposes RGA-CIE, a heuristic rule generation algorithm for Chinese information extraction that is domain-independent for Chinese free text. RGA-CIE applies supervised learning with a bottom-up strategy. Rules are generalized step by step, with a heuristic method deciding the generalization path and the Laplacian* formula evaluating rule performance. Semantic extension is also applied to improve the flexibility of the rules. The learned rules were tested on the Commercial News Information Extraction System and achieved a precision of 0.84 and a recall of 0.82, better than manually written rules. We also applied information extraction technology to ontology instance learning, which contributed substantially to the Traveling in Beijing System.