Research on Ontology-Based Knowledge Representation and Information Retrieval
Abstract
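With the rapid growth of online information, traditional keyword-based information retrieval can no longer meet users' needs. Knowledge retrieval, by contrast, focuses on matching knowledge and semantics and achieves higher precision and recall, which has made it a research hotspot in both artificial intelligence and information retrieval.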
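     The well-defined concept hierarchy of an ontology is well suited to knowledge representation: it can fully describe a domain knowledge model, reflect the semantic relations between concepts, and support logical inference. Ontology-based knowledge retrieval can therefore better realize semantic retrieval and improve retrieval accuracy.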
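     Building on an in-depth analysis and comparison of ontology knowledge representation, ontology description languages, editing tools, and modeling methods, this thesis designs a construction process for an ontology of the computer science and technology domain. The process obtains the knowledge representation elements of the ontology in turn (concepts, inheritance relations, property relations, instances, and so on), and the algorithms and techniques involved in each step are analyzed and implemented. First, the open-source ICTCLAS system is extended to segment a batch corpus into words and remove stop words as preprocessing. Second, the TF-IDF term weighting algorithm is used to extract feature words from the computer-domain corpus, yielding a set of candidate concepts. Third, concept vectors are built by computing the relatedness between concepts, the cosine formula is used to compute the similarity between concepts, and manual clustering then yields the inheritance hierarchy of the domain. Finally, based on the concept inheritance relations, the object properties between concepts, the data properties of concepts, and the restrictions on properties are obtained. Once these elements have been acquired, the computer-domain ontology is built in Protégé and evaluated.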
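The feature-word extraction step can be illustrated with a minimal TF-IDF sketch. It assumes the documents are already segmented and stop-word filtered, uses the common tf × log(N/df) weighting (the thesis may use a different variant), and the corpus and the helper names `tf_idf` and `candidate_concepts` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def candidate_concepts(docs, top_k=3):
    """Rank terms by their highest TF-IDF weight in any document."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, 0.0), weight)
    return sorted(best, key=best.get, reverse=True)[:top_k]


# Hypothetical corpus: already segmented, stop words already removed.
corpus = [
    ["ontology", "retrieval", "ontology", "semantic"],
    ["compiler", "parser", "retrieval"],
    ["network", "protocol", "retrieval", "network"],
]
print(candidate_concepts(corpus, top_k=2))
```

A term that occurs in every document, such as "retrieval" here, receives weight zero and drops out of the candidate set, which is the behavior that makes TF-IDF useful for picking domain-specific terms.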
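     Based on the computer-domain ontology, this thesis then examines the key techniques of ontology-based information retrieval. First, it analyzes and compares the characteristics of data retrieval, full-text retrieval, and knowledge retrieval, and points out the advantages of ontology-based knowledge retrieval. Second, on the basis of general ontology inference rules and the inference rules for typical ontology relations, it builds a set of domain inference rules for the computer-domain ontology to support the reasoning function of a knowledge retrieval system. Finally, it proposes an ontology-based heuristic query expansion algorithm and workflow to safeguard the recall of information retrieval.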
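As an illustration of what a general inference rule over typical ontology relations looks like, the sketch below applies two standard rules: transitivity of the subclass relation, and propagation of instance types to superclasses. The abstract does not list the thesis's own domain rules, so the class names, the instance, and the function names are hypothetical.

```python
def infer_superclasses(sub_class_of):
    """Apply: A subClassOf B and B subClassOf C  =>  A subClassOf C."""
    closure = {cls: set(parents) for cls, parents in sub_class_of.items()}
    changed = True
    while changed:  # repeat until no rule application adds a new fact
        changed = False
        for parents in closure.values():
            inherited = set()
            for parent in parents:
                inherited |= closure.get(parent, set())
            if not inherited <= parents:
                parents |= inherited
                changed = True
    return closure


def infer_types(instance_of, closure):
    """Apply: x type A and A subClassOf B  =>  x type B."""
    return {
        inst: {cls} | closure.get(cls, set())
        for inst, cls in instance_of.items()
    }


# Hypothetical fragment of a computer-domain class hierarchy.
sub_class_of = {
    "QuickSort": {"SortingAlgorithm"},
    "SortingAlgorithm": {"Algorithm"},
    "Algorithm": {"ComputerScienceConcept"},
}
closure = infer_superclasses(sub_class_of)
types = infer_types({"hoare_quicksort": "QuickSort"}, closure)
print(sorted(types["hoare_quicksort"]))
```

With these inferred facts, a search for instances of "Algorithm" can also return items that were only asserted to be instances of "QuickSort".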
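     Finally, building on this theoretical and technical work, an experimental prototype of a paper retrieval system based on the computer-domain ontology is designed and implemented. The system offers two retrieval modes, conditional retrieval and navigational retrieval, provides sound semantic reasoning and query expansion, and also validates the theory and techniques proposed in this thesis.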
With the rapid growth of network information, traditional keyword-based information retrieval technology is gradually failing to satisfy people's needs. Knowledge retrieval, which emphasizes knowledge and semantic matching and offers high precision and recall, has become a research hotspot in the fields of artificial intelligence and information retrieval.
     An ontology's good concept hierarchy is suitable for knowledge representation. An ontology can fully describe a domain knowledge model, reflect the semantic relations between concepts, and support logical inference. Knowledge retrieval based on an ontology can therefore better implement semantic retrieval and improve retrieval accuracy.
     Based on an analysis and comparison of ontology knowledge representation, ontology description languages, editing tools, and modeling methods, this thesis presents a process for building a computer science and technology domain ontology. The process obtains the ontology's knowledge representation elements in order: concepts, inheritance relationships, attribute relations, instances, and so on. The algorithms and technologies involved in each step are analyzed and implemented. First, secondary development on the ICTCLAS open-source system implements preprocessing of the corpus, including word segmentation and stop-word removal. Second, feature words are extracted from the computer-domain corpus with the TF-IDF weighting algorithm, and these make up the candidate concepts of the computer science domain. Then a concept vector model is built by calculating the degree of correlation between concepts, and the cosine formula is used to compute concept similarity; after manual clustering, the inheritance hierarchy of the computer domain knowledge is obtained. Finally, based on the concept inheritance relationships, the object attributes between concepts, the data attributes of concepts, and their restrictions are obtained. After these knowledge representation elements are obtained, the computer domain ontology is built with Protégé and evaluated.
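The concept-similarity step can be sketched as follows. Each concept is represented as a sparse vector of correlation scores and pairs are compared with the cosine formula, cos(u, v) = u·v / (|u||v|). The vectors, their scores, and the function names are hypothetical; the abstract does not specify how the correlation values are computed.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


def ranked_pairs(vectors):
    """Return (concept_a, concept_b, similarity), most similar first."""
    pairs = [
        (a, b, cosine(vectors[a], vectors[b]))
        for a, b in combinations(sorted(vectors), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical concept vectors: correlation of each concept with context terms.
vectors = {
    "quicksort": {"sort": 0.9, "array": 0.6, "pivot": 0.8},
    "mergesort": {"sort": 0.9, "array": 0.5, "merge": 0.7},
    "router": {"network": 0.9, "packet": 0.8},
}
for a, b, sim in ranked_pairs(vectors):
    print(f"{a} ~ {b}: {sim:.3f}")
```

A ranked list like this is the kind of input a person could use for the manual clustering step: highly similar concepts are candidates for siblings under a common parent.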
     Based on the computer domain ontology, this thesis studies the key technologies of ontology-based information retrieval. First, it analyzes and compares the retrieval characteristics of data retrieval, full-text retrieval, and knowledge retrieval, and indicates the advantages of ontology-based knowledge retrieval. Second, on the basis of general reasoning rules and reasoning rules for typical relationships, it builds a series of domain reasoning rules for the computer domain ontology, which support the reasoning functions of a knowledge retrieval system. Finally, it proposes an ontology-based heuristic query expansion algorithm and process to ensure the recall of information retrieval.
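One plausible shape for ontology-based query expansion is sketched below. It is not the thesis's algorithm, whose heuristics are not given in the abstract: it simply adds synonyms at full weight and narrower concepts at a weight that decays with hierarchy depth. The `narrower` and `synonyms` tables and the `max_depth` and `decay` parameters are all assumptions made for illustration.

```python
from collections import deque


def expand_query(term, narrower, synonyms, max_depth=2, decay=0.5):
    """Return {term: weight}; weight falls by `decay` per hierarchy level."""
    expanded = {term: 1.0}
    for syn in synonyms.get(term, []):
        expanded.setdefault(syn, 1.0)
    # Breadth-first walk down the concept hierarchy.
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in narrower.get(current, []):
            if child not in expanded:  # also guards against cycles
                expanded[child] = decay ** (depth + 1)
                queue.append((child, depth + 1))
    return expanded


# Hypothetical ontology fragment.
narrower = {
    "algorithm": ["sorting algorithm", "graph algorithm"],
    "sorting algorithm": ["quicksort", "mergesort"],
    "quicksort": ["dual-pivot quicksort"],
}
synonyms = {"sorting algorithm": ["sort method"]}

expanded = expand_query("sorting algorithm", narrower, synonyms)
print(expanded)
```

Expanding the query to narrower concepts raises recall, because documents that mention only "quicksort" can now match a query for "sorting algorithm"; the decayed weights let a ranking function still prefer documents that match the original term.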
     Finally, on the basis of the theoretical and technical research and the computer domain ontology, a paper retrieval system is designed and implemented. The system provides two retrieval methods, conditional retrieval and navigational retrieval. It offers good semantic reasoning and query expansion functions, and also verifies the correctness of the theory and technology.
