Research on Distributed Heterogeneous Database Integration Based on Semantic Metadata
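Abstract
With the rapid development of technologies in bioengineering, biological data is growing exponentially. Querying these distributed, heterogeneous, autonomous biological databases quickly and effectively in an integrated way has become a difficult problem for biology researchers.
     To address the problems currently encountered in integrated queries over biological data, the author's research group proposed a data resource integration scheme based on semantic metadata. The scheme integrates the metadata of each distributed database to be queried into a single metadata repository according to a unified standard, maps a domain ontology to that repository to generate semantic metadata, and uses the semantic metadata to resolve the structural and semantic heterogeneity among the databases, enabling integrated queries across the biological databases. The ultimate goal of the scheme is to solve the common problems of data resource integration and to build a general data sharing and integration platform: a topic-oriented virtual central database with centralized metadata and distributed base data that supports applications in multiple fields.
     The research group has already built the metadata repository and developed tools for importing and managing metadata. On that basis, this thesis studies the following:
     1) Exploiting the similarity between an ontology knowledge base and the database E/R model, it proposes mapping the ontology to the metadata to generate semantic metadata, and using the semantic metadata in data integration to resolve both the structural and the semantic heterogeneity among multiple databases.
     2) It studies how to use the reasoning capability of the ontology-based knowledge base: through the ontology-to-metadata mappings established when the semantic metadata is generated, user queries are expanded by inference, which helps improve the system's recall and precision.
     3) Based on the physical and logical distribution of the data sources, it designs an effective query plan generation algorithm. The algorithm converts a user query into a query plan over multiple data sources; executing the plan ensures both the accuracy and the completeness of the query results.
     Building on this work, the thesis designs and implements SeMDIS, a prototype system for integrating distributed databases based on semantic metadata. With SeMDIS, users can access distributed heterogeneous databases transparently through the ontology. Applying the system showed that the research goals were met and laid a foundation for the next stage of the project.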
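The ontology-to-metadata mapping and the inference-based query expansion described above can be illustrated with a minimal sketch. This is not the SeMDIS implementation: the toy ontology, the concept names, and the database, table, and column names are all hypothetical, and plain subclass traversal stands in for whatever reasoner the system actually uses.

```python
from collections import deque

# Hypothetical toy ontology: concept -> direct sub-concepts.
SUBCLASSES = {
    "Protein": ["Enzyme", "Antibody"],
    "Enzyme": ["Kinase"],
}

# Hypothetical semantic metadata: each ontology concept is mapped to the
# (database, table, column) entries in the metadata repository that store it.
SEMANTIC_METADATA = {
    "Protein": [("db_a", "protein", "name")],
    "Enzyme": [("db_b", "enzymes", "enzyme_name")],
    "Kinase": [("db_c", "kinase_tbl", "kname")],
}


def expand_concept(concept):
    """Return the concept followed by all of its descendants, breadth first."""
    seen = [concept]
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in SUBCLASSES.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def resolve_columns(concept):
    """Map a queried concept to every column that stores it or a sub-concept."""
    columns = []
    for name in expand_concept(concept):
        # Concepts without a mapping (here: "Antibody") contribute nothing.
        columns.extend(SEMANTIC_METADATA.get(name, []))
    return columns
```

Under these assumptions, a query for "Enzyme" also reaches the column that stores kinases, which is how expansion can raise recall without the user knowing any source schema.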
With the rapid development of technologies in biological research, the volume of biological data is increasing exponentially. Integrated querying of distributed, heterogeneous, autonomous databases has become a major problem for biologists.
     To solve the current problems in querying biological data, a data resource integration scheme based on semantic metadata is proposed. In this scheme, the metadata of all the distributed databases is integrated into one metadata database under a unified standard, and semantic metadata is built by mapping an ontology to that metadata database. The semantic metadata is then used to resolve the structural and semantic heterogeneity among the distributed, heterogeneous, autonomous databases. The final goal of the scheme is to solve the common problems of data resource integration through a shared data integration platform. This platform forms a virtual central database that is oriented to a specific topic, with centralized metadata and distributed base data, and thus supports related research in various fields.
     At present, our team has built the corresponding metadata database and developed tools to import and manage metadata. Building on the team project, this thesis focuses on the following aspects.
     1. The similarity between the knowledge base (KB) of the ontology and the E/R model of the relational database is studied, and a mapping between them is proposed to produce semantic metadata. The semantic metadata is used to resolve the structural and semantic heterogeneity among the distributed, heterogeneous, autonomous databases.
     2. The reasoning ability of the KB is used to expand queries in order to improve query recall and precision.
     3. An effective algorithm is designed, based on the physical and logical distribution of the databases, that converts a user query into a query plan and makes the query result complete and accurate.
     The author has designed and implemented SeMDIS (Database Integration System Based on Semantic Metadata). With SeMDIS, users can query the distributed, heterogeneous, autonomous databases transparently. Its application shows that SeMDIS helps solve the data integration problem and lays a good foundation for the team's further work.
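The idea in item 3, turning one user query into a plan over several data sources and then combining the partial results, can be sketched as follows. The abstract does not describe the actual algorithm, so this is only an illustration under stated assumptions: the function names, the `(database, table, column)` target format, the `LIKE` predicate, and the `?` placeholder style are all hypothetical.

```python
from collections import defaultdict


def build_query_plan(targets, keyword):
    """Group (database, table, column) targets into sub-queries per database.

    Returns a dict: database name -> list of (sql_text, parameters) steps.
    """
    plan = defaultdict(list)
    for database, table, column in targets:
        # Parameterized SQL text; the '?' placeholder style is an assumption.
        sql = f"SELECT {column} FROM {table} WHERE {column} LIKE ?"
        plan[database].append((sql, (f"%{keyword}%",)))
    return dict(plan)


def merge_results(partial_results):
    """Union rows from all sources, dropping duplicates in first-seen order."""
    merged = []
    seen = set()
    for rows in partial_results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged
```

In this sketch, covering every mapped source is what stands in for completeness, and removing duplicate rows during the merge keeps the combined answer from being inflated by overlapping sources.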
