Studies on Speech Enhancement Based on Multi-band Analysis (基于多频带分析的语音增强研究)

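English title: Studies on Speech Enhancement Based on Analyzing Multi-band
Author: 程正 (Cheng Zheng)
Degree level: Master's
Discipline: Signal and Information Processing
Chinese keywords: 语音增强 ; 多频带 ; 听觉掩蔽 ; 噪声估计
English keywords: Speech enhancement ; Multi-band ; Auditory masking ; Noise estimation
Degree year: 2008
Advisor: 赵鹤鸣 (Zhao Heming)
Discipline code: 081002
Degree-granting institution: 苏州大学 (Soochow University)
Thesis submission date: 2008-05-01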

Abstract

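Speech signals are often corrupted by background noise. Corrupted speech causes listening fatigue and also degrades the performance of speech processing systems such as speech recognition and speech coding. Speech enhancement therefore has broad applications and research significance.
     In real environments, noise is not uniformly distributed across frequency, whereas conventional spectral subtraction processes the entire spectrum with a single subtraction parameter and therefore struggles to achieve good enhancement. To address the residual “musical noise” left by conventional spectral subtraction, this thesis adopts a speech enhancement method based on multi-band spectral subtraction, which adaptively adjusts the subtraction parameter in every frame and every frequency band and effectively reduces the musical noise. Two band-division schemes are used: linear division and nonlinear Bark-scale division. Experiments comparing the two show that both effectively improve speech quality and that the nonlinear Bark-scale division outperforms the linear one.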
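The per-band adaptation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the subtraction rule, so the sketch borrows a Berouti-style linear mapping from band SNR to the over-subtraction factor, a spectral floor of 0.002, and the Traunmüller approximation of the Bark scale. The function names (`hz_to_bark`, `band_edges`, `multiband_subtract`) are hypothetical, and the input is assumed to be one frame's power spectrum computed elsewhere.

```python
import math

def hz_to_bark(f_hz):
    """Traunmüller approximation of the Bark scale (assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def band_edges(n_bins, sample_rate, n_bands, scale="linear"):
    """Split bins 0..n_bins-1 into n_bands bands; return n_bands+1 bin edges.

    Assumes n_bins >= 2 and n_bands much smaller than n_bins.
    """
    if scale == "linear":
        return [i * n_bins // n_bands for i in range(n_bands + 1)]
    # Bark division: bands of equal width on the Bark axis.
    nyquist = sample_rate / 2.0
    z = [hz_to_bark(k * nyquist / (n_bins - 1)) for k in range(n_bins)]
    edges = [0]
    for i in range(1, n_bands):
        target = z[0] + i * (z[-1] - z[0]) / n_bands
        k = next(k for k in range(n_bins) if z[k] >= target)
        edges.append(max(k, edges[-1] + 1))  # keep every band non-empty
    edges.append(n_bins)
    return edges

def multiband_subtract(noisy_power, noise_power, edges, floor=0.002):
    """Spectral subtraction with one over-subtraction factor per band."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sig = sum(noisy_power[lo:hi])
        noi = sum(noise_power[lo:hi])
        snr_db = 10.0 * math.log10(max(sig, 1e-12) / max(noi, 1e-12))
        snr_db = min(max(snr_db, -5.0), 20.0)
        alpha = 4.0 - 0.15 * snr_db  # assumed rule: 4.75 at -5 dB, 1 at 20 dB
        for k in range(lo, hi):
            clean = noisy_power[k] - alpha * noise_power[k]
            # The floor limits isolated spectral peaks ("musical noise").
            out.append(max(clean, floor * noisy_power[k]))
    return out
```

With a 129-bin spectrum at an assumed 8 kHz sampling rate, `band_edges(129, 8000, 4, "bark")` gives narrower bands at low frequencies than the linear split, so low-frequency bands get their own subtraction factor.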
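     Next, the thesis studies the auditory masking properties of the human ear and applies the masking effect to multi-band spectral subtraction. The subtraction parameters are determined from the masking threshold, and the noisy speech is enhanced a second time. Compared with conventional speech enhancement methods, this approach effectively suppresses musical noise and improves listening comfort.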
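One common way to derive a subtraction parameter from a masking threshold is linear interpolation: subtract aggressively where the threshold is low (residual noise would be audible) and gently where it is high (residual noise is masked). The sketch below assumes that rule; the abstract does not state the mapping the thesis uses. The function name and the default range of 1 to 6 are illustrative assumptions, and the masking threshold itself is assumed to be computed elsewhere.

```python
def alpha_from_threshold(t, t_min, t_max, alpha_min=1.0, alpha_max=6.0):
    """Map a masking threshold t to an over-subtraction factor.

    t_min and t_max are the smallest and largest thresholds in the frame.
    A low threshold yields alpha_max, a high threshold yields alpha_min.
    """
    if t_max <= t_min:
        return alpha_max  # degenerate frame: fall back to strong subtraction
    w = (t_max - t) / (t_max - t_min)
    w = min(max(w, 0.0), 1.0)  # clamp thresholds outside [t_min, t_max]
    return alpha_min + w * (alpha_max - alpha_min)
```

The resulting factor could replace the SNR-based factor in a second enhancement pass, which matches the two-stage description in the abstract only in outline.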
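     In addition, to estimate the statistical properties of the noise more accurately, the thesis studies speech endpoint detection and noise estimation in noisy environments. It proposes an endpoint detection method based on tracking low-band energy and improves the noise estimation method. Experimental results show that the method estimates slowly varying non-stationary noise well.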
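The abstract gives no details of the detector, so the following is only a rough sketch of the general idea of tracking low-band energy to find speech pauses and updating the noise estimate during them. Everything specific here is an assumption: the function name `track_noise`, the slowly rising energy floor, the decision ratio, the recursive smoothing constant, and the premise that the first frame contains noise only.

```python
def track_noise(frames_power, low_bins, ratio=2.0, rise=1.05, smooth=0.9):
    """Flag speech pauses from low-band energy and update a noise spectrum.

    frames_power: list of per-frame power spectra (lists of floats).
    low_bins: number of lowest-frequency bins that form the tracked band.
    Returns (is_pause flags per frame, final noise power spectrum).
    """
    noise = list(frames_power[0])            # assumed noise-only first frame
    floor = sum(frames_power[0][:low_bins])  # running low-band energy floor
    is_pause = []
    for frame in frames_power:
        energy = sum(frame[:low_bins])
        # Follow drops at once; otherwise let the floor creep upward so it
        # can adapt to slowly varying noise.
        floor = min(energy, floor * rise)
        pause = energy <= ratio * floor
        is_pause.append(pause)
        if pause:
            # Recursive averaging, applied only while no speech is detected.
            noise = [smooth * n + (1.0 - smooth) * p
                     for n, p in zip(noise, frame)]
    return is_pause, noise
```

Because the floor rises by only a small factor per frame, a sudden jump in low-band energy is treated as speech, while a gradual increase is absorbed into the noise estimate.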
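     Finally, the thesis designs and implements a speech enhancement system based on multi-band spectral subtraction. In computer simulation, speech corrupted by white noise and by factory noise at various signal-to-noise ratios is enhanced. Subjective and objective tests show that the method handles speech corrupted by white noise and slowly varying non-stationary noise well, suppressing the background noise and improving speech intelligibility.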
Speech signals are often accompanied by background noise, which has many negative effects: corrupted speech tires listeners and degrades the performance of speech signal processing systems. Speech enhancement is therefore an important technology in speech signal processing.
     In the real world, noise is mostly colored and does not affect the speech signal uniformly over the entire spectrum. To reduce the “musical noise” produced by basic spectral subtraction, a multi-band spectral subtraction method for enhancing speech corrupted by white and colored noise is studied. The variation of the signal-to-noise ratio is taken into account to determine the subtraction factor in each frequency band. The improvement in speech quality is analyzed for both linear and nonlinear Bark-scale frequency spacing. Experimental results show that both approaches improve speech quality and that the nonlinear Bark-scale spacing performs better than the linear spacing.
     Human auditory masking is then studied, and its characteristics are combined with the multi-band spectral subtraction method. This efficiently reduces the musical noise and improves listening comfort.
     In addition, speech pause detection and noise spectrum estimation are studied in order to estimate the statistical characteristics of the noise accurately. A speech pause detection method based on tracking band power is used to detect speech pauses and to improve the noise estimate.
     Finally, a speech enhancement system based on multi-band spectral subtraction is designed to process speech corrupted by white noise and factory noise at different SNRs. The results show that the method can largely reduce musical noise and improve speech quality.

