An Improved G.729 Algorithm Based on Adaptive Sub-band Power Spectral Entropy Voice Activity Detection

English title: Improving G.729 by Adaptive Band-partitioning Spectral Entropy Voice Activity Detection
Author: Zhang Dongsheng (章东升)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords: G.729; Voice Activity Detection; adaptive multi-band spectral entropy; robustness
Degree year: 2009
Advisor: Chen Liwei (陈立伟)
Discipline code: 081002
Degree-granting institution: Harbin Engineering University (哈尔滨工程大学)
Thesis submission date: 2008-12-01

Abstract

With the rapid development of digital mobile and multimedia communication, low-bit-rate speech coders are needed to cope with limited bandwidth. Researchers have introduced a succession of medium- and low-rate speech coders based on parametric and hybrid coding, and the International Telecommunication Union (ITU) has issued new speech coding standards in step with these results and with demand, which has greatly promoted the wide adoption of speech coding research.
     G.729 is the ITU-T recommendation, approved in 1996, for coding speech signals at 8 kb/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP). It is widely used in multimedia communication, cellular mobile communication, and IP telephony. The CS-ACELP codec is based on the Code-Excited Linear Prediction (CELP) coding model, whose key techniques include Linear Prediction Coding (LPC) analysis, vector quantization, the conjugate-structure algebraic codebook, and perceptual weighting filtering. G.729 is the research object of this thesis because of its high compression efficiency and the good quality of its synthesized speech. However, the algorithm is fairly complex, it does not exploit the discontinuous nature of conversational speech, and it operates only at a fixed bit rate that cannot be adjusted to actual network conditions.
     To address the fact that G.729 does not exploit the pauses in conversation, this thesis proposes improving it with a Voice Activity Detection (VAD) algorithm based on adaptive sub-band power spectral entropy (also rendered as adaptive multi-band or band-partitioning spectral entropy), without adding to the computational complexity of the algorithm. Adaptive sub-band power spectral entropy is a new endpoint detection method that is robust under different background noise conditions.
     The thesis briefly surveys current research on speech compression coding and describes the encoding and decoding principles of G.729 in detail. Building on a study of classical VAD techniques, it then proposes the adaptive sub-band power spectral entropy method and evaluates it by simulation under laboratory conditions. The results indicate that, without increasing the complexity or computational delay of G.729, the improved coder exploits the pauses in telephone conversation to raise the compression ratio, and that it maintains good speech coding performance in high-noise environments.
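The abstract names sub-band spectral entropy as the detection feature but does not give the procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's algorithm. The function names, the eight equal-width bands, the naive DFT, and the threshold rule (mean entropy of a few leading frames assumed to be noise, minus a margin) are all hypothetical choices made for illustration. The underlying intuition is that noise tends to spread energy evenly across bands (high entropy), while voiced speech concentrates it in a few bands (low entropy).

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def subband_spectral_entropy(frame, num_bands=8):
    """Entropy (in bits) of the normalized energy distribution over sub-bands."""
    spec = power_spectrum(frame)[1:]  # drop the DC bin
    width = len(spec) // num_bands
    if width == 0:
        raise ValueError("frame too short for the requested number of bands")
    energies = [sum(spec[b * width:(b + 1) * width]) for b in range(num_bands)]
    total = sum(energies)
    if total <= 0.0:
        # Assumption: an all-zero frame is treated like a flat spectrum (silence).
        return math.log2(num_bands)
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def detect_speech(frames, noise_frames=5, margin=0.5, num_bands=8):
    """Flag frames whose entropy falls below a threshold learned from lead-in noise.

    Assumption: the first `noise_frames` frames contain only background noise.
    """
    if not frames:
        return []
    entropies = [subband_spectral_entropy(f, num_bands) for f in frames]
    lead = entropies[:noise_frames]
    threshold = sum(lead) / len(lead) - margin
    return [h < threshold for h in entropies]
```

In a coder along the lines the abstract describes, frames flagged as inactive would be the ones not encoded at the full 8 kb/s rate. The adaptation shown here affects only the decision threshold; how the thesis adapts the band partitioning itself is not stated in this record.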

