Research on Mongolian Speech Keyword Spotting Technology
Abstract
In recent years, with the rapid development of computer multimedia technology, Mongolian speech data have grown quickly in many fields such as education, film, and culture, forming a valuable ethnic-culture resource. The effective retrieval and classification of these speech documents has therefore become a hot topic in Mongolian information processing. Speech keyword spotting is a technology that, given a user query, returns the matching speech segments from a designated speech dataset. In this thesis we study in depth several technologies involved in Mongolian speech keyword spotting, including Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR), Mongolian keyword spotting based on lattices and confusion networks, and Mongolian grapheme-to-phoneme conversion. These technologies not only have significant academic value, but are also important for maintaining national security and the stability of minority border regions, and for the prosperity and development of minority culture. The main contributions of our research are as follows:
     1. Mongolian is an agglutinative language: new words are formed by attaching multiple suffixes to a root, so a very large number of words can be produced, which makes Mongolian Large Vocabulary Continuous Speech Recognition very difficult. To overcome this difficulty, we propose a segmentation-based LVCSR approach that recognizes Mongolian words according to Mongolian word-formation rules. We describe the basic principles of Mongolian speech recognition and rebuild the corresponding acoustic model and language model for the segmentation-based approach. Experimental results show that the segmentation-based method effectively solves the recognition problem for a very large Mongolian vocabulary, and that correcting the pronunciation of ending suffixes before training the acoustic models improves recognition accuracy. The proposed approach also offers a new idea for speech recognition and keyword spotting research on other agglutinative languages.
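The core idea of segmentation-based recognition can be illustrated with a toy stem+suffix splitter: the lexicon then only needs to cover stems and a closed suffix inventory instead of every inflected word. The suffix list and the example word below are invented for illustration, not the thesis's actual Mongolian suffix inventory.

```python
# Hypothetical sketch: segment an agglutinative word into a stem plus
# bound suffixes by greedily stripping the longest matching suffix from
# the right. The suffix inventory here is invented for illustration.

SUFFIXES = ["iin", "aas", "un", "du", "d"]  # hypothetical suffixes

def segment(word, suffixes=SUFFIXES):
    """Return [stem, '+suf1', '+suf2', ...] for a written word."""
    parts = []
    changed = True
    while changed:
        changed = False
        for suf in sorted(suffixes, key=len, reverse=True):
            # keep at least one character as the stem
            if word.endswith(suf) and len(word) > len(suf):
                parts.insert(0, "+" + suf)  # mark as a bound suffix
                word = word[: -len(suf)]
                changed = True
                break
    return [word] + parts

print(segment("morindu"))  # -> ['morin', '+du']
```

A real system would of course use a data-driven or dictionary-based segmenter, but the recognition units it produces (stems and suffixes) play the same role in the acoustic and language models.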
     2. We are the first to apply lattice- and confusion-network-based keyword spotting to the Mongolian keyword spotting task, and we improve the in-vocabulary spotting method by exploiting Mongolian word-formation rules. First, we describe posterior probability estimation, keyword search, and confidence measure calculation in word-lattice-based Mongolian keyword spotting. Second, we introduce the method for converting lattices into confusion networks, together with the indexing, keyword search, and confirmation schemes used in word-confusion-network-based spotting. Finally, we propose an improved in-vocabulary spotting method that searches on word stems, following Mongolian word-formation rules. Experimental results show that the word-confusion-network-based method outperforms the word-lattice-based method in every respect, and that the improved in-vocabulary method effectively increases system performance.
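To make the confusion-network search concrete, here is a minimal sketch (not the thesis implementation): a word confusion network is modeled as a list of "bins", each mapping candidate words to posterior probabilities, and a multi-word query matches when its words occupy consecutive bins, with confidence taken as the product of the bin posteriors. All words and probabilities below are invented.

```python
# Minimal sketch of keyword search over a word confusion network.
# Each bin maps candidate words to their posterior probabilities.

def search_cn(confusion_network, query_words, threshold=0.1):
    """Return (start_bin, confidence) pairs where the query matches."""
    hits = []
    n = len(query_words)
    for start in range(len(confusion_network) - n + 1):
        conf = 1.0
        for i, w in enumerate(query_words):
            # posterior of the query word in this bin, 0 if absent
            conf *= confusion_network[start + i].get(w, 0.0)
        if conf >= threshold:
            hits.append((start, conf))
    return hits

cn = [
    {"ta": 0.7, "da": 0.3},          # bin 0 (hypothetical words)
    {"mongol": 0.9, "monggol": 0.1}, # bin 1
    {"kele": 0.6, "gele": 0.4},      # bin 2
]
print(search_cn(cn, ["mongol", "kele"]))  # one hit at bin 1
```

Because each bin is a small dictionary, this search is much cheaper than walking a full lattice, which is one reason confusion-network indexing tends to perform well for spotting.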
     3. To detect out-of-vocabulary (OOV) words, we propose a Mongolian keyword spotting method based on phoneme confusion networks. When a speech file is decoded into phoneme form, recognition accuracy drops sharply and many phoneme sequences that violate phonological constraints appear. To improve precision and recall, we propose a confidence calculation method based on a phoneme confusion matrix and obtain satisfactory results. We first introduce the index-building approach for phoneme confusion networks; second, we describe the phoneme confusion matrix; third, we present the phoneme-string search and confirmation procedure over phoneme confusion networks; fourth, we give the framework of the Mongolian keyword spotting system; finally, we compare the related methods experimentally. The results show that the phoneme-confusion-network-based method effectively detects Mongolian OOV words, and that the confusion-matrix-based confidence calculation further improves overall system performance.
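The confusion-matrix idea can be sketched as a soft phoneme match: instead of requiring exact equality, each decoded phoneme contributes P(decoded | query phoneme) taken from a confusion matrix estimated on held-out data. The phonemes and probabilities below are invented, and real systems would align strings of unequal length with dynamic programming rather than rejecting them.

```python
# Hedged sketch of confusion-matrix-based confidence scoring.
# CONFUSION holds P(decoded phoneme | true phoneme); values are invented.

CONFUSION = {
    ("b", "b"): 0.90, ("b", "p"): 0.10,
    ("a", "a"): 0.95, ("a", "e"): 0.05,
    ("t", "t"): 0.85, ("t", "d"): 0.15,
}

def confidence(query, decoded):
    """Geometric-mean confidence of a decoded phoneme string vs. a query."""
    if len(query) != len(decoded):
        return 0.0  # a real system would align the strings with DP first
    score = 1.0
    for q, d in zip(query, decoded):
        score *= CONFUSION.get((q, d), 0.0)
    return score ** (1.0 / len(query))

# "bat" decoded as "pat": the b->p confusion is tolerated with
# a reduced, rather than zero, confidence.
print(round(confidence(["b", "a", "t"], ["p", "a", "t"]), 3))  # 0.432
```

The geometric mean keeps the score length-independent, so a single threshold can be applied to queries of different lengths.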
     4. To convert Mongolian OOV words into phoneme strings, we propose Mongolian grapheme-to-phoneme (G2P) conversion methods. When an OOV keyword is to be detected, it must first be converted into its phoneme-string form and then searched as a phoneme sequence, so a Mongolian G2P system is essential. The written form and the pronunciation of Mongolian are not in one-to-one correspondence: vowels and consonants may be inserted, deleted, or mutated, which makes G2P conversion difficult. To overcome this difficulty, we propose both a rule-based Mongolian G2P method and a statistical method based on the joint sequence model. Experimental results show that the joint-sequence-model method is significantly better than the rule-based one. Our joint-sequence-model G2P system achieves a word error rate of 16.32% and a phoneme error rate of only 3.37%, which basically meets practical requirements.
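A rule-based converter of the kind described above can be sketched as an ordered list of context-sensitive rewrite rules applied to the written form. The letters, phonemes, and rules below are hypothetical, chosen only to show the deletion and mutation phenomena mentioned, not actual Mongolian orthography rules.

```python
# Illustrative rule-based grapheme-to-phoneme sketch: ordered regex
# rewrite rules map a written word to a phoneme string. All rules and
# symbols are invented for illustration.
import re

RULES = [
    (r"a(?=i)", ""),   # hypothetical vowel deletion before "i"
    (r"gh", "G"),      # digraph mutates to a single phoneme
    (r"b$", "p"),      # hypothetical word-final devoicing
]

def g2p(word):
    """Apply the rewrite rules in order, then split into phonemes."""
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return " ".join(word)  # one phoneme per remaining symbol

print(g2p("baighab"))  # -> "b i G a p"
```

Rule ordering matters in such systems (an earlier deletion can feed or bleed a later rule), which is exactly the kind of interaction that makes hand-written rules brittle and motivates the statistical joint-sequence-model alternative.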
