A Deep Web-Based Book Information Integration and Query System
Abstract
This system is a book search system for mobile phones. It answers basic book-information queries and displays the results on the phone's screen so that they are easy to read. In this paper, the author presents a web crawler based on Deep Web technology that collects and integrates online information on a specific topic. The crawler is designed as a multi-threaded, multi-level queue crawler written in Java; within the queues, the HTMLParser tool and regular expressions are used to process and store the crawled URLs. Berkeley DB is introduced into the design of the URL queue to make queue access efficient, and the crawled data is stored in a MySQL database. The author then uses Lucene to build an index over the processed information. Once the resources have been collected and indexed, the author builds a search interface on Android, which the author regards as the most important mobile development platform today, so that users can search the Web from their phones for resources related to the specific topic.
     The system gives mobile phone users a convenient and fast information service: they can retrieve all kinds of information about the books they are looking for at any time.
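The multi-threaded, multi-level URL queue described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the class name `FrontierSketch` and its methods are hypothetical, plain JDK in-memory queues stand in for the Berkeley DB-backed queue, and a simple regular expression stands in for the HTMLParser-based link extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a thread-safe, multi-level URL frontier (not the thesis code). */
public class FrontierSketch {
    // Naive matcher for absolute http(s) links in href attributes.
    // Assumption: a real crawler would use an HTML parser for robustness.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"'#\\s]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // One queue per level; level 0 is served first.
    private final List<ConcurrentLinkedQueue<String>> levels = new ArrayList<>();
    // URLs already enqueued, shared by all worker threads.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    public FrontierSketch(int levelCount) {
        for (int i = 0; i < levelCount; i++) {
            levels.add(new ConcurrentLinkedQueue<>());
        }
    }

    /** Enqueues url at the given level; returns false if it was already seen. */
    public boolean offer(String url, int level) {
        if (!seen.add(url)) {
            return false;
        }
        // Clamp out-of-range levels to the nearest valid queue.
        int idx = Math.max(0, Math.min(level, levels.size() - 1));
        levels.get(idx).add(url);
        return true;
    }

    /** Returns the next URL from the lowest non-empty level, or null if all are empty. */
    public String poll() {
        for (ConcurrentLinkedQueue<String> queue : levels) {
            String url = queue.poll();
            if (url != null) {
                return url;
            }
        }
        return null;
    }

    /** Extracts absolute http(s) links from an HTML fragment. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws InterruptedException {
        FrontierSketch frontier = new FrontierSketch(3);
        String page = "<a href=\"http://example.com/books?id=1\">A</a>"
                + "<a href='http://example.com/books?id=2'>B</a>";
        // Two workers parse the same page; the shared seen-set removes duplicates.
        Runnable worker = () -> {
            for (String link : extractLinks(page)) {
                frontier.offer(link, 1);
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        frontier.offer("http://example.com/", 0);
        for (String url = frontier.poll(); url != null; url = frontier.poll()) {
            System.out.println(url);
        }
    }
}
```

In this sketch the seed URL at level 0 is always dequeued before the links found at level 1, and each distinct link is enqueued only once even though two threads discover it. A persistent store such as Berkeley DB, as used in the thesis, would let the queue survive restarts and grow beyond memory.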