Research on Ontology Construction Methods Based on Chinese Text (基于中文文本的本体构建方法研究)


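English title: Research on Ontology Construction Methods Based on Chinese Text
Author: Liu Wei (刘威)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords: ontology (本体); ontology learning (本体学习); automatic ontology population (本体自动扩充); mixed strategy (混合策略)
Degree year: 2008
Advisor: Wang Huiqiang (王慧强)
Discipline code: 081202
Degree-granting institution: Harbin Engineering University (哈尔滨工程大学)
Thesis submission date: 2008-02-01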

Abstract

Formal ontologies are the foundation on which the Semantic Web exists, is studied, and operates. An ontology is a formal, explicit specification of shared concepts: it describes and constrains the concepts of a domain and the relations among them. Since the concept was introduced in the 1990s, ontologies have attracted growing attention in China and abroad, yet ontology research is still at an early stage and its theory and methods need further development. In particular, building an ontology today consumes a great deal of labor, material resources, and money, and takes a long time. Efficient ontology construction has therefore become the bottleneck of ontology research and of Semantic Web research as a whole, and finding an effective way to build domain ontologies is a problem that cannot be avoided.
     This thesis studies methods for constructing ontologies from Chinese text. It first gives a brief introduction to ontologies and ontology learning, surveys the main ontology construction methods and evaluation criteria used in China and abroad, and describes several ontology learning tools in common use.
     Second, to address the shortcomings of traditional ways of building ontology resources, the thesis proposes an ontology acquisition method based on a mixed strategy that combines statistics and rules. It describes the overall framework of the method and the frameworks of its two key submodules, and analyzes the soundness of the approach. It then discusses the key technical problems within this framework, namely corpus acquisition and preprocessing, term extraction, and relation extraction, and presents the solution to each in detail.
     Third, the thesis proposes a decision-tree-based method for automatic ontology population. It treats the main task of automatic population as classifying instances under concepts: instances taken from an existing ontology base serve as training samples for building a decision tree, and the rules the tree encodes can then guide the enrichment of the ontology's knowledge.
     Finally, a preliminary experiment is carried out on the proposed ontology acquisition method; the results are analyzed and the strengths and weaknesses of the method are evaluated.
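
The decision-tree-based population step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features or which tree-induction algorithm the thesis uses, so everything here is an assumption made for illustration: the features `suffix` and `verb`, the concept labels, the toy training instances, and the choice of a plain ID3-style tree with information gain. The function names `build_tree`, `classify`, and `extract_rules` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of concept labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature with the highest information gain (ID3 criterion)."""
    base = entropy(labels)

    def gain(feature):
        remainder = 0.0
        for value in {row[feature] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, labels, features):
    """Recursively build a tree; leaves are concept names, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in sorted({row[feature] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return {"feature": feature, "branches": branches, "default": majority}


def classify(tree, instance):
    """Assign an instance to a concept; unseen values fall back to the majority."""
    while isinstance(tree, dict):
        value = instance.get(tree["feature"])
        tree = tree["branches"].get(value, tree["default"])
    return tree


def extract_rules(tree, path=()):
    """Yield (conditions, concept) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):
        yield list(path), tree
        return
    for value, subtree in tree["branches"].items():
        yield from extract_rules(subtree, path + ((tree["feature"], value),))


# Hypothetical training instances, standing in for instances read from an
# existing ontology: (features of the instance name and context, concept).
TRAINING = [
    ({"suffix": "大学", "verb": "招生"}, "University"),
    ({"suffix": "大学", "verb": "位于"}, "University"),
    ({"suffix": "学院", "verb": "招生"}, "University"),
    ({"suffix": "公司", "verb": "生产"}, "Company"),
    ({"suffix": "公司", "verb": "位于"}, "Company"),
    ({"suffix": "市", "verb": "位于"}, "City"),
]

rows = [features for features, _ in TRAINING]
labels = [concept for _, concept in TRAINING]
tree = build_tree(rows, labels, ["suffix", "verb"])

# A new instance extracted from text is placed under a concept.
print(classify(tree, {"suffix": "公司", "verb": "位于"}))  # Company
```

Each pair yielded by `extract_rules(tree)` reads as an if-then rule, for example "if `suffix` is 公司 then the instance belongs to Company", which is one way to interpret the abstract's remark that the tree yields a set of rules for enriching the ontology.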

