Research and Implementation of a Mobile Phone Chinese Input System Based on a Dynamic Self-Adaptive Language Model
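Abstract
The widespread use of handheld devices places ever higher demands on Chinese character input technology. Drawing on research into intelligent input technology, this thesis builds a dynamic self-adaptive language model, combines it with input system implementation techniques to realize a Chinese input system based on that model, and finally deploys the system on a mobile phone platform.
     First, the thesis discusses encoding schemes for numeric-keypad input methods, and introduces the n-gram model commonly used in intelligent input technology, the data smoothing algorithms used here to address data sparsity, and the performance metrics used to evaluate the system. On this basis, it proposes a framework for a dynamic self-adaptive language model.
     Second, to give the language model domain characteristics, the thesis proposes a method for constructing domain language models and determines both the strategy for loading a domain language model and its initial size. It also improves the user language model so that it adapts dynamically to different users, and proposes a model fusion method that not only fuses the dynamic self-adaptive language model effectively but also addresses the data sparsity of the domain language model.
     Next, the dynamic self-adaptive language model, with the domain language model fused in, is applied to a Chinese input system, and the effect of loading the domain language model on the system's performance is tested. The results show a clear improvement in performance, indicating that the proposed dynamic self-adaptive language model has good practical value.
     Finally, after analyzing the Android system architecture, its input method interface, and its input characteristics, the thesis describes in detail the process of implementing a Chinese input system on that platform.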
With the rapid development of handheld devices, Chinese input technology is becoming more and more important. In this thesis, we build an intelligent input system based on a dynamic self-adaptive language model by combining the dynamic self-learning language model with digit-key input technology. The system is then ported to a mobile platform.
     Firstly, the thesis discusses the encoding scheme of the Chinese input method and introduces the n-gram model widely used in intelligent input technology, the two smoothing algorithms used in the thesis to address data sparsity in the models, and the performance evaluation criteria. It then presents a dynamic self-adaptive model.
     Secondly, to give the dynamic self-adaptive model special-domain characteristics, the thesis presents a method for constructing a special-domain language model, determines how the special-domain language model is loaded and what its initial scale is, and improves the ability of the user language model to adapt dynamically to different users. It also generates the dynamic self-adaptive language model using the proposed model fusion method, which also addresses the data sparsity of the special-domain language model.
     Thirdly, the dynamic self-adaptive language model containing the special-domain language model is applied to a Chinese input system, and the thesis tests the performance of the input system that uses it. The experimental results show that the system achieves better performance. Applying the dynamic self-adaptive language model to a mobile phone input system is therefore of great practical significance.
     Finally, after analysing the system architecture of Android and introducing the input method interface and the input characteristics of the Android platform, the thesis describes the process of implementing the input system.
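The abstract does not state the fusion formula, so the following is only a minimal sketch of one common way to combine a general, a domain, and a user language model: linear interpolation of smoothed bigram probabilities. The class and function names, the add-k smoothing, the interpolation weights, and the toy data are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Bigram counts with add-k smoothing; counts can be updated online."""

    def __init__(self, vocab_size, k=0.1):
        self.vocab_size = vocab_size      # assumed size of a shared vocabulary
        self.k = k                        # additive smoothing constant (assumption)
        self.bigrams = defaultdict(Counter)
        self.context_totals = Counter()

    def observe(self, words):
        """Add the bigrams of one word sequence to the counts."""
        for prev, word in zip(words, words[1:]):
            self.bigrams[prev][word] += 1
            self.context_totals[prev] += 1

    def prob(self, prev, word):
        """P(word | prev) with add-k smoothing, so unseen pairs are never zero."""
        count = self.bigrams.get(prev, Counter())[word]
        total = self.context_totals[prev]
        return (count + self.k) / (total + self.k * self.vocab_size)


def fused_prob(models, weights, prev, word):
    """Linear interpolation: sum over models of weight * P_model(word | prev)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * m.prob(prev, word) for m, w in zip(models, weights))


def rank_candidates(models, weights, prev, candidates):
    """Order candidate words that share one key code by fused probability."""
    return sorted(
        candidates,
        key=lambda c: fused_prob(models, weights, prev, c),
        reverse=True,
    )


# Toy data: "演示" (demonstration) and "岩石" (rock) share the pinyin "yanshi".
VOCAB_SIZE = 3
general = BigramModel(VOCAB_SIZE)
domain = BigramModel(VOCAB_SIZE)   # e.g. a geology-domain model
user = BigramModel(VOCAB_SIZE)     # starts empty, learns from the user's choices

for _ in range(2):
    general.observe(["这种", "演示"])
general.observe(["这种", "岩石"])
for _ in range(3):
    domain.observe(["这种", "岩石"])

models = [general, domain, user]
candidates = ["演示", "岩石"]

general_only = rank_candidates(models, [1.0, 0.0, 0.0], "这种", candidates)
with_domain = rank_candidates(models, [0.5, 0.3, 0.2], "this".replace("this", "这种"), candidates)

# The user repeatedly picks "演示"; the user model adapts and the ranking flips.
for _ in range(5):
    user.observe(["这种", "演示"])
after_adaptation = rank_candidates(models, [0.5, 0.3, 0.2], "这种", candidates)
```

In this toy setup the general model alone prefers "演示", loading the domain model moves "岩石" to the top, and the user's repeated selections move "演示" back. Because every component model is smoothed and the general model is always part of the mixture, a small domain model with few observed bigrams still yields non-zero fused probabilities, which is one plausible reading of how fusion can ease domain data sparsity.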
