Abstract

The decoder engine is the core module of a speech recognition system, and decoders based on weighted finite-state transducers (WFSTs) are a typical form. We analyze the resource consumption of static WFST decoders in practice and propose a strategy that dynamically reclaims system resources by detecting inactive nodes during decoding and lattice generation. Experiments on the OpenKWS 15 dataset show that a decoder using this strategy consumes about 75% less memory than one that does not reclaim system resources.
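The recycling idea in the abstract can be sketched with a toy reference-counting scheme over lattice back-pointers: a node becomes inactive once no surviving token or successor node refers to it, and reclaiming it may cascade backward along the chain. All names and data structures below are hypothetical illustrations, not the authors' implementation:

```python
# Toy sketch (assumed, not the paper's code): lattice nodes carry a
# reference count; when beam pruning kills a hypothesis, its node chain
# is reclaimed immediately instead of at end of utterance.

class LatticeNode:
    def __init__(self, frame, state):
        self.frame = frame
        self.state = state
        self.refcount = 0   # active tokens / successor nodes referring to us
        self.pred = None    # back-pointer to best predecessor

def link(node, pred):
    """Attach a back-pointer and bump the predecessor's reference count."""
    node.pred = pred
    if pred is not None:
        pred.refcount += 1

def release(node, pool):
    """Drop a node; reclaim it (and cascade backward) while refcount is zero."""
    while node is not None and node.refcount == 0:
        pool.discard(node)          # return the node's memory to the pool
        pred, node.pred = node.pred, None
        if pred is not None:
            pred.refcount -= 1      # predecessor lost one successor
        node = pred

# --- toy usage: one pruned branch, one surviving branch ---
pool = set()

def new_node(frame, state, pred):
    n = LatticeNode(frame, state)
    link(n, pred)
    pool.add(n)
    return n

root = new_node(0, 0, None)   # start-of-utterance node
a = new_node(1, 1, root)      # hypothesis that survives the beam
b = new_node(1, 2, root)      # hypothesis pruned by the beam
release(b, pool)              # b is inactive -> reclaimed; root kept (a refers to it)
survivors = len(pool)         # root and a remain
release(a, pool)              # last reference gone -> whole chain reclaimed
```

The cascade in `release` is what makes the reclamation immediate: dropping the last token on a branch frees every node reachable only through that branch, which is the source of the memory savings reported in the abstract.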