Research on Chinese-English Cross-Language Question Answering Information Retrieval
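Abstract
With the development of computer network technology, the number of Internet users worldwide has grown rapidly and the languages of online information resources have become increasingly diverse, so cross-language information retrieval has become an increasingly important research topic. At the same time, automatic question answering is a very active research direction in natural language processing that draws on a wide range of natural language processing techniques. Cross-language question answering retrieval combines cross-language information retrieval with question answering retrieval, and it has practical significance and value for helping users cross language barriers and communicate without obstacles. A cross-language question answering retrieval system is a new generation of search engine that integrates natural language processing and information retrieval technology. It is intended to provide a more powerful tool for obtaining information, in response to the serious challenges brought by the information explosion.
     Question answering retrieval generally comprises three main components: question analysis, information retrieval, and answer extraction. For Chinese-English cross-language question answering retrieval, this thesis mainly studies the following three problems: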
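     1. Chinese question analysis. Analyzing the Chinese question is the first important task a question answering system performs, and the quality of this analysis has a significant effect on the subsequent processing. Question analysis consists of the following steps: first, the question is segmented into words, named entities are recognized, and parts of speech are tagged; then the question type is determined, the keywords of the question are extracted, and the keywords are expanded appropriately according to factors such as the question type.
     2. Chinese-English cross-language retrieval models. The cross-language model is the bridge that connects information in the two languages. The aim of studying cross-language models is to make it possible to use a query expressed in the questioner's language to retrieve, from an information system, relevant information in multiple languages that meets the requirements. This part covers a cross-language model based on bilingual dictionary statistics and a cross-language model based on machine translation tools.
     3. English answer extraction. English answer extraction is the most important core technique in a cross-language question answering system, and the most critical step in determining the system's usefulness and precision. The study of English answer extraction in this thesis has two parts: the first returns all documents that may contain the answer to the question, that is, preliminary answer selection; the second extracts the correct answer from those returned documents according to the question type, that is, answer extraction.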
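The first two stages can be illustrated with a toy sketch in Python. It is not the thesis's implementation: the cue-word table, stop-word list, bilingual dictionary, weights, and function names are all hypothetical, and the question is assumed to be word-segmented already. It shows question typing from a cue word, keyword extraction, and dictionary-based translation that keeps the highest-weighted English candidate for each keyword.

```python
# Toy sketch: question typing, keyword extraction, and dictionary-based
# query translation. All tables below are invented for illustration.

# Hypothetical cue words mapped to coarse answer types.
QUESTION_CUES = {
    "谁": "PERSON",
    "哪里": "LOCATION",
    "哪年": "DATE",
    "多少": "NUMBER",
}

# Hypothetical stop words to drop when picking keywords.
STOP_WORDS = {"的", "是", "了", "在", "?", "？"}

# Hypothetical bilingual dictionary: Chinese word -> {English word: weight}.
BILINGUAL_DICT = {
    "发明": {"invent": 0.7, "invention": 0.3},
    "电话": {"telephone": 0.8, "phone": 0.2},
}


def classify_question(tokens):
    """Return the answer type of the first cue word found, else 'OTHER'."""
    for token in tokens:
        if token in QUESTION_CUES:
            return QUESTION_CUES[token]
    return "OTHER"


def extract_keywords(tokens):
    """Keep tokens that are neither cue words nor stop words."""
    return [t for t in tokens if t not in QUESTION_CUES and t not in STOP_WORDS]


def translate_keywords(keywords, top_n=1):
    """Map each keyword to its top_n highest-weighted dictionary translations.

    Keywords missing from the dictionary are skipped in this sketch.
    """
    query = []
    for word in keywords:
        candidates = BILINGUAL_DICT.get(word, {})
        ranked = sorted(candidates, key=candidates.get, reverse=True)
        query.extend(ranked[:top_n])
    return query


# The question "谁发明了电话?" is assumed to be segmented already.
tokens = ["谁", "发明", "了", "电话", "?"]
question_type = classify_question(tokens)
english_query = translate_keywords(extract_keywords(tokens))
print(question_type, english_query)
```

Raising `top_n` keeps more translation candidates per keyword, which is one simple way to approximate the keyword expansion step; a real system would also need segmentation, part-of-speech tagging, and named entity recognition before this point.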
With the development of the Internet, the number of Internet users worldwide has increased rapidly, and the languages of Internet information resources are becoming more diverse. Cross-Language Information Retrieval has become an increasingly important research field. Question Answering is also an active research field in Natural Language Processing, one that draws on many kinds of NLP techniques. Cross-Language Question Answering combines Cross-Language Information Retrieval and Question Answering. It helps users cross language barriers, which gives it practical significance. A Cross-Language Question Answering system combines natural language processing and information retrieval technology as the next generation of search engine. It aims to provide more effective tools for dealing with the serious challenges brought by the information explosion.
     Question Answering includes three main components: question analysis, information retrieval, and answer extraction. This thesis focuses on Chinese-English Cross-Language Question Answering and mainly addresses the following three aspects:
     1. Chinese question analysis. Chinese question analysis is the first important step of question answering, and its quality has an important impact on the subsequent processing. It consists of several parts: first, word segmentation, POS tagging, and Named Entity Recognition; then determining the question type and extracting the keywords of the question; and finally expanding the keywords based on factors such as the question type.
     2. Chinese-English Cross-Language retrieval model. The Cross-Language model is a bridge connecting information in the two languages. The purpose of studying it is to let a query in the questioner's language retrieve relevant information, in multiple languages, that satisfies the requirements. This part covers a Cross-Language model based on bilingual dictionary statistics and a Cross-Language model based on machine translation tools.
     3. English answer extraction. English answer extraction is the most important core technique in Cross-Language Question Answering, and it is also the most critical step in determining the effectiveness and precision of the system. In this thesis, English answer extraction has two parts: the first returns all documents that may contain the answer to the question, namely preliminary answer selection; the second extracts the correct answer from the returned documents according to the question type, namely answer extraction.
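The two-part answer extraction described in item 3 can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the thesis: the mini-corpus, the regular expressions per answer type, and the function names `select_candidates` and `extract_answer` are hypothetical. Stage 1 keeps documents that mention at least one query term; stage 2 applies a pattern chosen by question type to those documents.

```python
import re

# Hypothetical mini-corpus standing in for the English document collection.
DOCUMENTS = [
    "Alexander Graham Bell invented the telephone in 1876.",
    "The telephone changed long-distance communication.",
    "Paris is the capital of France.",
]

# Hypothetical patterns: one crude regular expression per answer type.
ANSWER_PATTERNS = {
    "DATE": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
}


def select_candidates(query_terms, documents):
    """Stage 1: keep documents that mention at least one query term,
    ordered by how many distinct query terms they contain."""
    scored = []
    for doc in documents:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        score = sum(1 for term in query_terms if term in words)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def extract_answer(question_type, candidates):
    """Stage 2: return the first span matching the pattern for the question type."""
    pattern = ANSWER_PATTERNS.get(question_type)
    if pattern is None:
        return None
    for doc in candidates:
        match = pattern.search(doc)
        if match:
            return match.group(0)
    return None


# Query terms are assumed to come from an earlier translation step.
candidates = select_candidates(["telephone", "invented"], DOCUMENTS)
print(extract_answer("PERSON", candidates))
print(extract_answer("DATE", candidates))
```

Separating the two stages keeps the expensive, type-specific matching confined to the small set of documents that survive the first pass. Term overlap and regular expressions are stand-ins here; a practical system would use a proper retrieval engine and a named entity recognizer.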
