Research on Building an Internet-Based Dynamic Controlled Vocabulary System
Abstract
In the 21st century, with the rapid development of the Internet, online information has grown exponentially. New disciplines, concepts, and topics keep emerging, while the information itself becomes increasingly complex and uneven in quality. Keyword search remains the mainstream retrieval method, but it returns a large volume of results, few of which meet the user's needs. To locate online resources more accurately and efficiently, retrieval tools dedicated to a specific subject area, namely dynamic controlled vocabularies, have become an urgent need in web resource retrieval.
     Building on an analysis of the problems that traditional vocabularies face today, this paper proposes a method for constructing a web-based dynamic controlled vocabulary. Such a vocabulary adds computing components to an ordinary traditional vocabulary, forming a new human-machine system built with computer assistance. The paper first analyzes the print and electronic editions of the Chinese Classified Thesaurus (《中国分类主题词表》) and compares the strengths and weaknesses of well-known online vocabularies from abroad. Second, it examines the basic theory of dynamic controlled vocabularies, including term sources, term selection criteria, the degree of vocabulary control, and vocabulary functions. Third, it describes how the vocabulary is built through manual management and automatic computer management, at the management level and the technical level respectively; the resulting vocabulary helps users choose suitable search terms and formulate accurate queries. On this basis, the paper builds a system model for the dynamic controlled vocabulary. Finally, it briefly introduces applications: within a web search engine, a dynamic controlled vocabulary can intelligently offer users better search strategies, reducing false hits and inefficient searches and thereby improving search quality and user satisfaction.
