Abstract

The decoder engine is the core module of a speech recognition system, and decoders based on weighted finite-state transducers (WFSTs) are a typical form. We analyze the resource consumption of static WFST decoders in practice and propose a strategy that dynamically reclaims system resources by detecting inactive nodes during decoding and lattice generation. Experiments on the OpenKWS 15 dataset show that a decoder using this strategy consumes about 75% less memory than one that does not reclaim system resources.
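The recycling idea in the abstract can be sketched with a toy reference-counting scheme over lattice back-pointers: a node becomes inactive once no surviving token or successor node refers to it, and reclaiming it may cascade backward along the chain. All names and data structures below are hypothetical illustrations, not the authors' implementation:

```python
# Toy sketch (assumed, not the paper's code): lattice nodes carry a
# reference count; when beam pruning kills a hypothesis, its node chain
# is reclaimed immediately instead of at end of utterance.

class LatticeNode:
    def __init__(self, frame, state):
        self.frame = frame
        self.state = state
        self.refcount = 0   # active tokens / successor nodes referring to us
        self.pred = None    # back-pointer to best predecessor

def link(node, pred):
    """Attach a back-pointer and bump the predecessor's reference count."""
    node.pred = pred
    if pred is not None:
        pred.refcount += 1

def release(node, pool):
    """Drop a node; reclaim it (and cascade backward) while refcount is zero."""
    while node is not None and node.refcount == 0:
        pool.discard(node)          # return the node's memory to the pool
        pred, node.pred = node.pred, None
        if pred is not None:
            pred.refcount -= 1      # predecessor lost one successor
        node = pred

# --- toy usage: one pruned branch, one surviving branch ---
pool = set()

def new_node(frame, state, pred):
    n = LatticeNode(frame, state)
    link(n, pred)
    pool.add(n)
    return n

root = new_node(0, 0, None)   # start-of-utterance node
a = new_node(1, 1, root)      # hypothesis that survives the beam
b = new_node(1, 2, root)      # hypothesis pruned by the beam
release(b, pool)              # b is inactive -> reclaimed; root kept (a refers to it)
survivors = len(pool)         # root and a remain
release(a, pool)              # last reference gone -> whole chain reclaimed
```

The cascade in `release` is what makes the reclamation immediate: dropping the last token on a branch frees every node reachable only through that branch, which is the source of the memory savings reported in the abstract.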