Research on a Speech Enhancement Algorithm Based on WGAN
  • English title: Algorithm research of speech enhancement based on WGAN
  • Authors: WANG Yifei; HAN Jungang; FAN Lianghui
  • Affiliation: Xi'an University of Posts & Telecommunications
  • Keywords: speech enhancement; generative adversarial nets; convolutional neural network; deep learning
  • Journal: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
  • Publication date: 2019-02-15
  • Year: 2019
  • Volume: v.31
  • Issue: 01
  • Pages: 140-146 (7 pages)
  • CN: 50-1181/N
  • Citation code: CASH201901018
  • Funding: National Natural Science Foundation of China key project (61136002)
  • Language: Chinese
Abstract
Noisy speech can be viewed as a mixture, in some manner, of an independent noise signal and a clean speech signal. Traditional speech enhancement methods must make assumptions about the independence and feature distributions of the noise and clean speech signals; unreasonable assumptions cause residual noise and speech distortion, degrading enhancement quality. In addition, the randomness and abrupt changes of noise itself reduce the robustness of traditional methods. To address these problems, this paper applies a generative adversarial network to speech enhancement and presents a method based on Wasserstein generative adversarial nets (WGAN), which uses the Wasserstein distance to speed up training and stabilize the training process. The method requires no hand-crafted acoustic features and improves the generalization ability of the enhancement system, achieving good results on both matched and unmatched noise sets. Experimental results show that the trained end-to-end speech enhancement model improves the perceptual evaluation of speech quality (PESQ) score by 23.97% on average.
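The core idea the abstract appeals to can be illustrated with a minimal sketch of the WGAN critic objective: the critic is trained to maximize the gap between its expected score on real (clean) samples and on generated (enhanced) samples, which estimates the Wasserstein-1 distance; weight clipping enforces the Lipschitz constraint of the original WGAN. This is a toy illustration with a linear critic and synthetic 1-D features, not the paper's convolutional end-to-end model.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(w, x):
    # A linear critic for illustration: f_w(x) = w . x
    return x @ w

def wgan_critic_objective(w, real, fake):
    # Critic objective: E[f_w(x_real)] - E[f_w(x_fake)].
    # Maximizing this over Lipschitz critics estimates the
    # Wasserstein-1 distance between the two distributions.
    return critic(w, real).mean() - critic(w, fake).mean()

def clip_weights(w, c=0.01):
    # Weight clipping enforces the Lipschitz constraint (original WGAN).
    return np.clip(w, -c, c)

# Toy stand-ins for "clean" and "generated" feature batches (hypothetical data).
real = rng.normal(loc=1.0, scale=0.2, size=(64, 1))
fake = rng.normal(loc=0.0, scale=0.2, size=(64, 1))

w = rng.normal(size=(1,))
lr = 0.05
for _ in range(200):
    # For a linear critic, the gradient of the objective w.r.t. w
    # is simply mean(real) - mean(fake); ascend and then clip.
    grad = real.mean(axis=0) - fake.mean(axis=0)
    w = clip_weights(w + lr * grad)

gap = wgan_critic_objective(w, real, fake)
print(f"estimated Wasserstein gap: {float(gap):.4f}")
```

Because the critic score is unbounded (no sigmoid) and the objective is a difference of means rather than a log-likelihood, gradients stay informative even when the two distributions are far apart, which is the training-stability property the paper exploits.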
