Research and Implementation of Ontology-Based Heterogeneous Data Integration Technology
Abstract
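With the development and widespread use of database and network technologies, data storage and representation have become distributed and heterogeneous. Many large enterprises and central agencies need to integrate such distributed, heterogeneous data according to particular requirements and share it, so that users can query and analyze it through a unified view. Resolving the system, syntactic, and semantic heterogeneity among data sources has therefore become an important research topic. Traditional data integration methods lack a formal description of data semantics and thus have difficulty resolving semantic heterogeneity. An ontology is an explicit, formal specification of a shared conceptualization, which makes it suitable as a common semantic model for describing the semantics of heterogeneous data in data integration.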
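     This thesis first analyzes the heterogeneity problems arising from the data integration requirements of the Economic Operation Information Network System of the National Development and Reform Commission. Building on existing data integration techniques, it adopts an ontology-based method to integrate heterogeneous data at the semantic level. Web Service technology wraps the data sources to resolve system heterogeneity; XML unifies the data format to resolve syntactic heterogeneity; and ontologies describe the semantics of the data sources, with mappings between the local ontologies and the global ontology establishing the semantic associations that resolve semantic heterogeneity.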
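     The thesis studies and summarizes the ontology construction, ontology mapping, and query processing techniques used in ontology-based heterogeneous data integration. It focuses on ontology mapping and proposes a mapping method based on combined similarity computation. The method computes the similarity between concepts and between properties in the ontologies using syntactic-distance similarity, semantic-dictionary similarity, structural similarity, and related measures, in order to discover mapping relations and achieve semi-automatic semantic matching between data sources.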
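     Combining these techniques, the thesis designs and implements an ontology-based heterogeneous data integration system (OBDIS). It presents the requirements analysis, describes the system architecture and functional modules, and details the implementation, including the key modules, the key classes and methods, and the user interface.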
With the development and application of database and network technology, massive amounts of data are stored in heterogeneous, distributed databases and files. Many enterprises and central agencies need to integrate this distributed, heterogeneous data in order to share it, so that users can query and analyze it through a unified view. Resolving the system heterogeneity, syntactic heterogeneity, and semantic heterogeneity between data sources has therefore become a critical research subject. Traditional integration methods, however, have difficulty resolving semantic heterogeneity because they cannot formally describe the semantics of data. An ontology, as a formal, explicit specification of a shared conceptualization, can represent the knowledge of a specific domain effectively and can serve as a universal semantic model for expressing the semantics of heterogeneous data. The thesis investigates the data integration requirements of the Economic Operation Information Network System of the National Development and Reform Commission. After analyzing existing data integration technology, it puts forward an ontology-based heterogeneous data integration method. The method uses Web Services to resolve system heterogeneity and XML to resolve syntactic heterogeneity. Ontologies and ontology mapping are then used to establish semantic associations and resolve semantic heterogeneity.
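The role of the XML and mapping layers can be illustrated with a minimal Python sketch. It assumes each source has already been wrapped and returns rows as dictionaries (the Web Service layer is omitted), and it stands in for the ontology mappings with a plain lookup table. The source identifiers, field names, and global property names are hypothetical and do not come from the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings: global-ontology property -> local field name per source.
MAPPINGS = {
    "source_a": {"enterpriseName": "firm_name", "outputValue": "output"},
    "source_b": {"enterpriseName": "company", "outputValue": "gdp_contrib"},
}


def to_unified_xml(source_id, rows):
    """Rename mapped local fields to global terms and emit one XML tree.

    Fields without a mapping are dropped, since the global view cannot
    interpret them.
    """
    local_to_global = {
        local: glob for glob, local in MAPPINGS[source_id].items()
    }
    root = ET.Element("records", attrib={"source": source_id})
    for row in rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            if field in local_to_global:
                ET.SubElement(record, local_to_global[field]).text = str(value)
    return root


def query(global_property, sources):
    """Collect the values of one global property across all sources."""
    results = []
    for source_id, rows in sources.items():
        doc = to_unified_xml(source_id, rows)
        results.extend(el.text for el in doc.iter(global_property))
    return results


# Two sources that name the same concept differently.
example_sources = {
    "source_a": [{"firm_name": "Acme", "output": 10}],
    "source_b": [{"company": "Beta", "gdp_contrib": 7, "extra": 1}],
}
names = query("enterpriseName", example_sources)
```

Here a query phrased in global terms reaches both sources even though one stores the name as `firm_name` and the other as `company`. A real system would derive the lookup table from local-to-global ontology mappings rather than hard-coding it.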
     The thesis studies the critical technologies of the method, including ontology construction, ontology mapping, and query processing, with the main focus on ontology mapping. It proposes a mapping discovery algorithm based on combined concept similarity computation. In the algorithm, the similarity between concepts and between properties in the ontologies is computed from syntactic distance, a semantic dictionary, structural similarity, and constraint computation, so that mapping relations can be found semi-automatically.
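The following Python sketch shows one way such a combined similarity could be computed. It is not the thesis's algorithm: the weights, the threshold, the synonym groups standing in for a semantic dictionary, the Jaccard overlap used as the structural measure, and the representation of an ontology as a dictionary from concept name to property names are all assumptions made for illustration. Constraint computation is omitted.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def name_similarity(a, b):
    """Syntactic similarity in [0, 1] from normalized edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dictionary_similarity(a, b, synonyms):
    """1.0 if both names are equal or share a synonym group, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return 1.0 if any(a in g and b in g for g in synonyms) else 0.0


def structure_similarity(props_a, props_b):
    """Jaccard overlap of property names (an assumed structural measure)."""
    union = props_a | props_b
    return len(props_a & props_b) / len(union) if union else 0.0


def combined_similarity(name_a, props_a, name_b, props_b, synonyms,
                        weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three measures; the weights are assumptions."""
    w_name, w_dict, w_struct = weights
    return (w_name * name_similarity(name_a, name_b)
            + w_dict * dictionary_similarity(name_a, name_b, synonyms)
            + w_struct * structure_similarity(props_a, props_b))


def discover_mappings(local, global_, synonyms, threshold=0.5):
    """Propose the best global concept for each local concept.

    Returns (local_name, global_name, score) candidates at or above the
    threshold, intended for a user to confirm or reject.
    """
    candidates = []
    for l_name, l_props in local.items():
        score, g_name = max(
            (combined_similarity(l_name, l_props, g, g_props, synonyms), g)
            for g, g_props in global_.items()
        )
        if score >= threshold:
            candidates.append((l_name, g_name, score))
    return candidates


# Hypothetical local and global ontologies.
local_onto = {"Firm": {"name", "revenue"}, "Region": {"code"}}
global_onto = {
    "Enterprise": {"name", "revenue", "sector"},
    "Area": {"code", "name"},
}
synonym_groups = [{"firm", "enterprise", "company"}, {"region", "area"}]
proposed = discover_mappings(local_onto, global_onto, synonym_groups)
```

In this example the concept names are almost entirely different strings, so the dictionary and structural measures carry the match. Combining several measures is meant to cover exactly this case, where no single measure is reliable on its own.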
     Finally, the thesis presents the design and implementation of the Ontology Based Data Integration System (OBDIS), which is built on these technologies. It describes the functions, architecture, and modules of the system, and introduces the implementation of the key modules, classes, and user interfaces.
