Research on Speech Enhancement Methods Based on Sparse Representation
Abstract
Speech is inevitably disturbed by ambient noise during communication. When the noise is strong, people not only have trouble understanding each other but also become fatigued and irritated, so the received signal must be denoised; this is the task of speech enhancement, or noise suppression, technology. The goal of speech enhancement is to remove noise effectively while keeping the speech undistorted and without introducing new noise, thereby improving the quality and intelligibility of the speech signal. Better quality reduces listener fatigue, and better intelligibility reduces distortion. Speech enhancement is widely used in speech recognition and speech coding systems, its applications in hands-free devices and hearing aids are increasing, and it plays a growing role in man-machine dialogue, machine translation, Bluetooth, and smart homes. After decades of development, many mature and effective algorithms exist; they fall roughly into spectral-subtraction, statistical-model, signal-subspace, and Wiener-filtering methods.
     Because most of the energy of a Fourier-transformed speech signal concentrates in the low frequencies and the transform decorrelates the signal well, most speech enhancement algorithms operate in the short-time Fourier transform domain. In some applications, however, such as speech coding, a power-spectrum estimator can achieve better performance than an amplitude-spectrum estimator. Both power spectral subtraction and the magnitude-squared spectrum estimator assume that the power spectrum of noisy speech can be written as the sum of the clean-speech and noise power spectra, with the power spectrum approximated by the magnitude-squared spectrum. On this assumption, this thesis proposes a speech enhancement method based on a sparse representation of the power spectrum. A sparse representation expresses all or most of the information in a signal as a linear combination of a few atoms selected from an overcomplete dictionary; the sparsest combination can be found with techniques such as non-negative matrix factorization or compressed sensing. We train a clean-speech power-spectrum dictionary with the approximate K-singular value decomposition (K-SVD) method under a non-negativity constraint, obtain the sparse representation of the clean power spectrum with the least angle regression (LARS) algorithm, apply the reconstructed power spectrum in the signal subspace approach based on short-time spectral amplitude (SSB-STSA), and finally combine the noisy phase with the inverse Fourier transform to obtain a time-domain estimate of the clean speech. The LARS termination rule is a parameter set according to the estimated noise power spectrum: the algorithm stops when the l2 norm of the difference between the noisy power spectrum and the sparsely reconstructed one falls below this parameter. Because the noise power spectrum is estimated by the decision-directed method from the initial and silence segments of the noisy speech, the method achieves good enhancement only in stationary white noise.
     Because the cross term between the clean-speech and noise spectra is not zero, the assumption that the power spectrum of noisy speech equals the sum of the clean-speech and noise power spectra is inaccurate. An estimate of the cross term can be obtained from the vector relationship among the noisy-speech, clean-speech, and noise spectra, and this estimate can be expressed as a function of the instantaneous a priori and a posteriori SNRs. Based on this nonzero-cross-term model, this thesis proposes a new speech enhancement method using a sparse representation of the power spectrum. The noise power spectrum is estimated with the minima controlled recursive averaging (MCRA) method, and the l2 norm of the sum of the estimated noise power spectrum and the cross-term estimate serves as the LARS termination rule, yielding the sparse representation of the clean-speech power spectrum. The dictionary is again trained with the non-negativity-constrained K-SVD algorithm. We also propose estimating the instantaneous SNR from the speech power spectrum of the current frame rather than the previous frame; since speech is time-varying between consecutive frames, this is important for enhancement. With a more reasonable speech model and termination rule, the new method handles more noise types and performs especially well at low SNR.
     Most speech enhancement methods apply a gain function in the frequency domain and must estimate the speech and noise power spectra simultaneously, so system performance depends partly on the accuracy of the noise power spectrum estimate. Traditional estimators use the initial segment or silence segments of the noisy signal, with the silence segments found by a voice activity detection (VAD) algorithm; VAD works well only for stationary noise and degrades at low SNR. For nonstationary noise, the power spectrum changes quickly and the estimate must be updated in time; over- or underestimating the noise power reduces intelligibility or introduces musical noise. Building on the low-complexity, low-delay unbiased minimum mean-square error (MMSE) noise power estimator, this thesis proposes a noise power spectrum estimation method based on the speech presence probability. Under the magnitude-squared spectrum model, it updates the noise estimate with the a posteriori speech presence probability determined by the a posteriori SNR uncertainty. The maximum of the resulting estimate is close to that of the unbiased estimator while the low estimates are raised, so the noise is estimated well without the signal distortion caused by overestimation; the method also tracks changes in the noise power spectrum quickly and performs well in both stationary and nonstationary noise.
     It is generally held that the human ear is insensitive to changes in the phase, or the relative phase, of sinusoidal signals, although some researchers argue that phase changes, or abrupt phase changes, of the sinusoidal components of speech degrade speech quality, and that the phase of a signal carries a great deal of information. Amplitude-spectrum-based enhancement algorithms assume that phase information cannot improve speech quality, so they estimate only the amplitude spectrum and ignore the phase. In recent years, more and more researchers have paid attention to the importance of phase in speech enhancement. Based on the minimum mean-square error (MMSE) amplitude estimator given the phase, this thesis proposes a phase estimation method: a specific expression for the phase difference is derived from the instantaneous a priori and a posteriori SNRs, and the clean-speech phase is then obtained from the noisy phase via the inverse cosine function. The algorithm complements and extends the phase-given MMSE amplitude estimator, and it can be combined with other amplitude estimators to improve the quality of the enhanced speech.
Speech signals are inevitably degraded by ambient noise during communication. At high noise levels, listeners not only struggle to understand each other but also become fatigued and upset. The received noisy speech therefore has to be denoised, which is the task of speech enhancement (noise suppression) technology. The purpose of speech enhancement is to improve the quality and intelligibility of degraded speech by reducing noise effectively, with as little distortion as possible and without introducing new noise. Improved quality reduces listener fatigue, and improved intelligibility reduces the distortion of the speech signal. Speech enhancement is widely used in speech recognition and speech coding systems, and its applications in hands-free devices, hearing aids, and other areas are growing. It also plays an increasingly important role in man-machine dialogue, machine translation, Bluetooth, and smart homes. After decades of development, many mature and efficient speech enhancement algorithms exist; they can be broadly divided into four categories: spectral-subtraction methods, statistical-model-based methods, signal-subspace methods, and Wiener-filtering methods.
     After the Fourier transform, most of the energy of a speech signal concentrates in the low frequencies and the coefficients are largely decorrelated, so most speech enhancement algorithms are realized in the short-time Fourier transform domain. In some applications, however, such as speech coding, a power-spectrum estimator may obtain better performance than an amplitude-spectrum estimator. Both power spectral subtraction and the magnitude-squared spectrum estimator are based on the assumption that the magnitude-squared spectrum of the noisy speech, which approximates its power spectrum, can be expressed as the sum of the clean-speech and noise magnitude-squared spectra. Based on this assumption, we propose a speech enhancement method built on the sparse representation of the power spectrum. A sparse representation is the most compact representation that accounts for most or all of the information in a signal as a linear combination of only a few atoms from an overcomplete dictionary; techniques from non-negative matrix factorization or compressed sensing can be used to find the sparsest such combination. We use the approximate K-singular value decomposition (K-SVD) algorithm with a non-negativity constraint to train the power-spectrum dictionary of clean speech, and the least angle regression (LARS) algorithm to obtain the sparse representation of the clean power spectrum. The reconstructed power spectrum is applied to the signal subspace approach based on short-time spectral amplitude (SSB-STSA), and the enhanced speech is then obtained by combining the noisy phase with the inverse discrete Fourier transform. The termination rule of the LARS algorithm is a parameter set according to the estimated noise power spectrum: the iteration stops once the l2 norm of the difference between the noisy and the reconstructed speech power spectra falls below this parameter. Because the noise power spectrum is estimated by the decision-directed method from the beginning of the noisy speech, the proposed method performs well only in white noise environments.
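The noise-dependent stopping rule described above lends itself to a compact sketch. The block below uses a simple greedy pursuit as a stand-in for LARS, with plain least squares on the active set rather than the thesis' exact solver: atoms from a trained power-spectrum dictionary are added until the l2 norm of the residual between the noisy power spectrum and its sparse reconstruction falls below a threshold derived from the estimated noise power. Function and parameter names are illustrative assumptions.

```python
import numpy as np

def sparse_code_power_spectrum(y, D, noise_psd, max_atoms=20):
    """Greedy sparse coding of one noisy power-spectrum frame y over a
    dictionary D (columns = atoms), stopping once the l2 norm of the
    residual drops below a threshold set from the estimated noise PSD.
    A stand-in for the LARS solver with the same termination rule."""
    residual = y.astype(float).copy()
    threshold = np.linalg.norm(noise_psd)  # noise-dependent stopping rule
    support = []
    coeffs = np.zeros(0)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) < threshold:
            break  # reconstruction error is at the noise level: stop
        # select the atom most correlated with the current residual
        k = int(np.argmax(D.T @ residual))
        if k not in support:
            support.append(k)
        # least-squares fit of y on the active atoms
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x  # sparse coefficient vector; D @ x is the cleaned spectrum
```

With a large noise estimate the threshold is met immediately and the frame is treated as noise-only; with a small one the pursuit keeps adding atoms until the residual reaches the noise floor or the atom budget is spent.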
     Since the cross term between the clean-speech and noise spectra is not zero, the assumption that the power spectrum of the noisy speech is the sum of the clean-speech and noise power spectra is inaccurate. The cross term can be estimated from the vector relationship among the noisy-speech, clean-speech, and noise spectra in the complex plane, and this estimate is a function of the instantaneous a priori and a posteriori signal-to-noise ratios (SNRs). Based on this speech model, we propose a new speech enhancement method using the sparse representation of the power spectrum. The noise power spectrum is estimated with the minima controlled recursive averaging (MCRA) method, and the l2 norm of the sum of the cross term and the noise power spectrum is used as the termination rule of the LARS algorithm, which yields the sparse representation of the clean-speech power spectrum. The dictionary is again trained with the approximate K-SVD method under a non-negativity constraint. Additionally, we present a new estimator of the instantaneous SNR that uses the speech power spectrum of the current frame rather than that of the previous frame. Because the speech signal is time-varying between consecutive frames, estimating the instantaneous SNR from the current frame is important for speech enhancement. With a more reasonable speech model and termination rule, the proposed method adapts to most noise environments and achieves better performance, especially at low SNR.
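The vector relationship behind the cross-term estimate reduces to simple arithmetic on power spectra. As a minimal sketch (names are assumptions, not the thesis' code): writing xi for the instantaneous a priori SNR and gamma for the instantaneous a posteriori SNR, the cross term 2|X||N|cos(theta) normalized by the noise power equals gamma − xi − 1.

```python
import numpy as np

def cross_term_estimate(noisy_ps, clean_ps, noise_ps):
    """Estimate the cross term between the clean-speech and noise spectra
    from the relationship |Y|^2 = |X|^2 + |N|^2 + 2|X||N|cos(theta),
    expressed through the instantaneous a priori SNR (xi) and the
    instantaneous a posteriori SNR (gamma). Sketch of the model in the text."""
    xi = clean_ps / noise_ps      # instantaneous a priori SNR
    gamma = noisy_ps / noise_ps   # instantaneous a posteriori SNR
    # normalized cross term: 2|X||N|cos(theta) / |N|^2 = gamma - xi - 1
    return noise_ps * (gamma - xi - 1.0)
```

The same expression works per frequency bin on numpy arrays, which is how it would be applied frame by frame in an enhancement loop.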
     Most speech enhancement methods are implemented with a gain function in the frequency domain and must estimate the speech and noise power spectra simultaneously, so the performance of a speech enhancement system partly depends on the accuracy of the noise power spectrum estimate. Traditionally, the noise power spectrum is estimated from the beginning or the silence segments of the noisy signal, with the silence segments detected by a voice activity detector (VAD). Detection works well only in stationary noise and becomes unreliable at low SNR. In nonstationary noise, the power spectrum changes rapidly, so the estimate must be updated promptly; using an overestimate or an underestimate of the true noise power reduces intelligibility or produces musical noise. Building on the unbiased minimum mean-square error (MMSE) noise power estimator with low complexity and low tracking delay, we propose a noise power spectrum estimation method based on the speech presence probability. Using the magnitude-squared spectrum model, the new method updates the noise power spectrum estimate with the a posteriori speech presence probability, determined by the a posteriori SNR uncertainty. The maximum of the resulting estimate is close to that of the unbiased estimator, while the lowest values are raised, so the background noise is estimated effectively without introducing speech distortion. The method tracks the noise power spectrum accurately, follows abrupt changes quickly, and improves the quality of the enhanced speech in both stationary and nonstationary noise scenarios.
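The update rule can be sketched per frame and frequency bin. The block below follows the spirit of the speech-presence-probability-driven update built on the unbiased MMSE estimator: a posterior speech presence probability, computed under a fixed a priori SNR and a fixed prior, blends the noisy periodogram with the previous noise estimate. The fixed prior, the 15 dB a priori SNR, and the smoothing constant are illustrative assumptions.

```python
import numpy as np

def update_noise_psd(noisy_ps, noise_psd_prev, prior_snr_db=15.0,
                     p_h1=0.5, alpha=0.8):
    """One recursive update of the noise PSD driven by the a posteriori
    speech presence probability (SPP). A sketch in the spirit of the
    SPP-based estimator described in the text; parameters are assumptions.
    noisy_ps: |Y|^2 of the current frame; noise_psd_prev: last estimate."""
    xi_h1 = 10.0 ** (prior_snr_db / 10.0)   # fixed a priori SNR under H1
    post_snr = noisy_ps / noise_psd_prev    # a posteriori SNR
    # posterior SPP from a complex-Gaussian model with fixed priors
    p = 1.0 / (1.0 + (1.0 - p_h1) / p_h1 * (1.0 + xi_h1)
               * np.exp(-post_snr * xi_h1 / (1.0 + xi_h1)))
    # speech probably present -> keep old estimate; absent -> track |Y|^2
    noise_periodogram = p * noise_psd_prev + (1.0 - p) * noisy_ps
    # recursive smoothing of the periodogram estimate
    return alpha * noise_psd_prev + (1.0 - alpha) * noise_periodogram
```

In a noise-only frame the probability is small and the estimate tracks the periodogram; in a speech-dominated frame it approaches one and the estimate barely moves, which is what keeps speech energy from leaking into the noise estimate.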
     It is generally held that the ear is insensitive to changes in the phase, or in the relative phase, of sinusoidal signals. However, some researchers believe that rapid fluctuations of the relative phase of the sinusoidal components of speech degrade speech quality, and that the phase of a signal contains a great deal of information. Amplitude-spectrum-based speech enhancement methods nevertheless ignore the phase spectrum: they estimate only the amplitude spectrum, on the assumption that phase information cannot improve speech quality. Nowadays, more and more researchers pay attention to the importance of phase in speech enhancement. We propose a phase estimation method based on the MMSE spectral amplitude estimator given the phase. A specific expression for the phase difference is derived from the instantaneous a priori and a posteriori SNRs, and the clean-speech phase is then estimated from the noisy phase using the inverse cosine function. The method complements and extends the MMSE spectral amplitude estimator given the phase, and the proposed phase estimate, combined with other amplitude-spectrum estimators, can improve the quality of the enhanced speech.
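The phase estimate admits a short sketch from the triangle formed by the noisy, clean, and noise spectral vectors: the cosine of the phase difference between the noisy and clean spectra is (gamma + xi − 1) / (2·sqrt(gamma·xi)). The sign of the inverse-cosine term is ambiguous; the block keeps one branch, and the names are illustrative assumptions rather than the thesis' exact derivation.

```python
import numpy as np

def estimate_clean_phase(noisy_phase, xi, gamma):
    """Estimate the clean-speech phase from the noisy phase and the
    instantaneous a priori SNR (xi) and a posteriori SNR (gamma), using
    the vector relationship among noisy, clean, and noise spectra.
    One branch of the sign-ambiguous inverse cosine is kept."""
    # cosine of the phase difference between noisy and clean spectra
    cos_dphi = (gamma + xi - 1.0) / (2.0 * np.sqrt(gamma * xi))
    cos_dphi = np.clip(cos_dphi, -1.0, 1.0)  # guard numerical overshoot
    return noisy_phase - np.arccos(cos_dphi)
```

When the clean spectral vector lags the noisy one, as in the test case below, the minus branch recovers the true phase; in practice the branch would have to be resolved per frequency bin.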
引文
[1] Benesty J, Makino S, Chen J. Speech enhancement [M]. Springer,2005.
    [2] Cvijanovi N, Sadiq O, Srinivasan S. Speech enhancement using a remote wireless microphone[J]. IEEE Transactions on Consumer Electronics,2013,59(1):167-174.
    [3] Hersbach A A, Mauger S J, Grayden D B, et al. Algorithms to improve listening in noise forcochlear implant users [C]. IEEE International Conference on Acoustics, Speech, SignalProcessing,2013:428-432.
    [4] Mirzahasanloo T S, Kehtarnavaz N. A Generalized data-driven speech enhancement frameworkfor bilateral cochlear implants [C]. IEEE International Conference on Acoustics, Speech, SignalProcessing,2013:7269-7273.
    [5] Loizou P C. Speech enhancement: theory and practice [M]. CRC Press,2007.
    [6] Boll S F. Suppression of acoustic noise in speech using spectral subtraction [J]. IEEETransactions on Acoustics, Speech, and Signal Processing,1979,27(2):113-120.
    [7] Berouti M, Schwartz R, Makhoul J. Enhancement of speech corrupted by acoustic noise [C].IEEE International Conference on Acoustics, Speech, Signal Processing,1979,4:208-211.
    [8] Lockwood P, Boudy J. Experiments with a nonlinear spectral subtractor (NSS), Hidden Markovmodels and the projection, for robust speech recognition in cars [J]. Speech Communication,1992,11(2):215-228.
    [9] Sim B L, Tong Y C, Chang J S, et al. A parametric formulation of the generalized spectralsubtraction method [J]. IEEE Transactions on Speech and Audio Processing,1998,6(4):328-337.
    [10] Wang X, Wei J, Zhong X F. An improved spectral subtraction method based on modified MCRAnoise estimate and magnitude compensation [C]. IEEE International Symposium on IndustrialElectronics.
    [11] Miyazaki R, Saruwatari H, Inoue T, et al. Musical-noise-free speech enhancement based onoptimized iterative spectral subtraction [J]. IEEE Transactions on Audio, Speech, and LanguageProcessing,2012,20(7):2080-2094.
    [12] Upadhyay N, Karmakar A. The spectral subtractive-type algorithms for enhancing speech innoisy environments [C].1st International Conference on Recent Advances in InformationTechnology,2012:841-847.
    [13] Lim J S, and Oppenheim A V. All-pole modeling of degraded speech [J]. IEEE Transactions onAcoustics, Speech and Signal Processing,1978,26(3):197-210.
    [14] Lim J S, Oppenheim A V. Enhancement and bandwidth compression of noisy speech [J].Proceedings of the IEEE,1979,67(12):1586-1604.
    [15] Huang F, Lee T, Kleijn W B. Transform-domain wiener filter for speech periodicity enhancement[C]. IEEE International Conference on Acoustics, Speech, Signal Processing,2012:4577-4580.
    [16] Tseng H W, Vishnubhotla S, Hong M Y, et al. A Novel single channel speech enhancementapproach by combining wiener filter and dictionary learning [C]. IEEE International Conferenceon Acoustics, Speech, Signal Processing,2013:8653-8656.
    [17] Almajai I, Milner B. Visually derived Wiener filters for speech enhancement [J]. IEEETransactions on Audio, Speech, and Language Processing,2011,19(6):1642-1651.
    [18] Cornelis B, Moonen M, Wouters J. Performance analysis of multichannel Wiener filter-basednoise reduction in hearing aids under second order statistics estimation errors [J]. IEEETransactions on Audio, Speech, and Language Processing,2011,19(5):1368-1381.
    [19] Kokkinis E K, Reiss J D, Mourjopoulos J. A Wiener filter approach to microphone leakagereduction in close-microphone applications [J]. IEEE Transactions on Audio, Speech, andLanguage Processing,2012,20(3):767-779.
    [20] Ephraim Y, Malah D. Speech enhancement using a minimum mean-square error short-timespectral amplitude estimator [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing,1984,32(6):1109-1121.
    [21] Ephraim Y and Malah D. Speech enhancement using a minimum mean square error log-spectralamplitude estimator [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing,1985,33(2):443-445.
    [22] Martin R. Speech enhancement using MMSE short time spectral estimation with gammadistributed speech priors [C]. IEEE International Conference on Acoustics, Speech, SignalProcessing2002(1):253-256.
    [23] Martin R, Breithaupt C. Speech enhancement in the DFT domain using Laplacian speech priors[C]. International Workshop on Acoustic Echo and Noise Control,2003:87-90.
    [24] Gazor S, Zhang W. Speech enhancement employing Laplacian-Gaussian Mixture [J]. IEEETransactions on Speech and Audio Processing,2005,13(5):896-904.
    [25] Lotter T, Vary P. Speech enhancement by MAP spectral amplitude estimation using asuper-Gaussian speech model [J]. EURASIP Journal on Applied Signal Processing2005(7):1110-1126.
    [26] Martin R. Speech enhancement based on minimum mean-square error estimation andsupergaussian priors [J]. IEEE Transactions on Speech and Audio Processing,2005,13(5):845-856.
    [27] Burshtein D, Gannot S. Speech enhancement using a mixture-maximum model [J]. IEEETransactions on Speech and Audio Processing,2002,10(6):341-351.
    [28] Hao J C, Lee T W, Sejnowski T J. Speech enhancement using Gaussian scale mixture models [J].IEEE Transactions on Audio, Speech, and Language Processing,2010,18(6):1127-1136.
    [29] Ephraim Y, Trees H L V. A signal subspace approach for speech enhancement [J]. IEEETransactions on Speech and Audio Processing,1995,3(4):251-266.
    [30] Hu Y, Loizou P C. A generalized subspace approach for enhancing speech corrupted by colorednoise [J]. IEEE Transactions on Speech and Audio Processing,2003,11(4):334-341.
    [31] Jensen J, Heusdens R. Improved subspace-based single-channel speech enhancement usinggeneralized super-gaussian priors [J]. IEEE Transactions on Audio, Speech, and LanguageProcessing,2007,15(3):862-872.
    [32] Jabloun F, Champagne B. Incorporating the human hearing properties in the signal subspaceapproach for speech enhancement [J]. IEEE Transactions on Speech and Audio Processing,2003,11(6):700-708.
    [33] Guo X. A new speech enhancement method based on LCMP combined with subspace theory [C].International Conference on Computer Application and System Modeling,2010:243-246.
    [34] Cohen I. Relaxed statistical model for speech enhancement and a Priori SNR estimation [J].IEEE Transactions on Speech and Audio Processing,2005,13(5):870-881.
    [35] Plapous C, Marro C, Scalart P. Improved signal-to-noise ratio estimation for speech enhancement[J]. IEEE Transactions on Audio Speech and Language Processing,2006,14(6):2098-2108.
    [36] Alam M J, O'Shaughnessy D D, Selouani S A. Speech enhancement based on novel two-step apriori SNR estimators [C].9th Annual Conference of the International Speech CommunicationAssociation,2008:565-568.
    [37] McKinley B L, Whipple G H. Model based speech pause detection [C].22th IEEE InternationalConference on Acoustics, Speech, Signal Processing,1997:1179–1182.
    [38] Meyer J, Simmer K U, Kammeyer K D. Comparison of one-and two-channel noise-estimationtechniques [C].5th International Workshop on Acoustic Echo and Noise Control,1997:137–145.
    [39] Sohn J, Kim N S, Sung W. A statistical model-based voice activity detector [J]. IEEE SignalProcessing Letters,1999,6(1):1–3.
    [40] Martin R. Noise power spectral density estimation based on optimal smoothing and minimumstatistics [J]. IEEE Transactions on Speech and Audio Processing,2001,9(5):504-512.
    [41] Cohen I, Berdugo B. Noise estimation by minima controlled recursive averaging for robustspeech enhancement [J]. IEEE Signal Processing Letters,2002,9(1):12-15.
    [42] Cohen I. Noise spectrum estimation in adverse environments: improved minima controlledrecursive averaging [J]. IEEE Transactions on Speech and Audio Processing,2003,11(5):466-475.
    [43] Gerkmann T, Hendriks R C. Unbiased MMSE-based noise power estimation with low complexityand low tracking delay [J]. IEEE Transactions on Audio, Speech, and Language Processing,2012,20(4):1383-1392.
    [44] Wang D L, Lim J S. The unimportance of phase in speech enhancement [J]. IEEE Transactionson Acoustics Speech and Signal Processing,1982,30(4):6679-681
    [45] Wolfe P J, Godsill S J. Efficient alternatives to the Ephraim and Malah suppression rule for audiosignal enhancement [J]. EURASIP Journal on Applied Signal Processing,2003(10):1043–1051.
    [46] Trawicki M B, Johnson M T. Distributed multichannel speech enhancement with minimummean-square error short-time spectral amplitude, log-spectral amplitude, and spectral phaseestimation [J]. Signal Processing,2012,92(2):345–356.
    [47] Paliwal K, Wójcicki K, Shannon B. The importance of phase in speech enhancement [J]. SpeechCommunication,2011,53(4):465-494.
    [48] Krawczyk M, Gerkmann T. STFT phase improvement for single channel speech enhancement[C]. International Workshop on Acoustic Signal Enhancement,2012.
    [49] Gerkmann T, Krawczyk M. MMSE-optimal spectral amplitude estimation given the STFT-phase[J]. IEEE Signal Processing Letters,2013,20(2):129-132.
    [50] Ephraim Y. A Bayesian estimation approach for speech enhancement using hidden Markovmodels [J]. IEEE Transactions on Signal Processing,1992,40(4):725–735.
    [51] Zhao D Y, Kleijn W B. HMM-based gain modeling for enhancementof speech in noise [J]. IEEETransactions on Audio, Speech, and Language Processing,2007,15(3):882-892.
    [52] Mohammadiha N, Martin R, Leijon A. Spectral domain speech enhancement using HMMstate-dependent super-gaussian priors [J]. IEEE Signal Processing Letters,2013,20(3):253-256.
    [53] Mallat S G, Zhang Z F. Matching pursuits with time-frequency dictionaries [J]. IEEETransactions on Signal Processing,1993,41(12):3397-3415.
    [54] Donoho D L. Compressed sensing [J]. IEEE Transactions on Information Theory,2006,52(4):1289-1306.
    [55] Gemmeke J F, Virtanen T, Hurmalainen A. Exemplar-based sparse representations for noiserobust automatic speech recognition [J]. IEEE Transactions on Audio, Speech, and LanguageProcessing,2011,19(7):2067-2080.
    [56] Elad M, Aharon M. Image denoising via sparse and redundant representations over learneddictionaries [J]. IEEE Transactions on Image Processing,2006,15(12):3736-3745.
    [57] Mairal J, Elad M, Sapiro G. Sparse representation for color image restoration [J]. IEEETransactions on Image Processing,2008,17(1):53-69.
    [58] Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation [J]. IEEETransactions on Pattern Analysis and Machine Intelligence,2009,31(2):210-227.
    [59] Chen Y, Nasrabadi N M, Tran T D. Sparse representation for target detection in hyperspectralimagery [J]. IEEE Journal of Selected Topics in Signal Processing,2011,5(3):629-640.
    [60] Bofill P, Zibulevsky M. Underdetermined blind source separation using sparse representations [J].Signal Processing,2001,81(11):2353-2362.
    [61] Tian Z, Giannakis G B. Compressed sensing for wideband cognitive radios [C]. IEEEInternational Conference on Acoustics, Speech, Signal Processing,2007:1357-1360.
    [62] Yin J, Chen T Q. Direction-of-arrival estimation using a sparse representation of array covariancevectors [J]. IEEE Transactions on Signal Processing,2011,59(9):4489-4493.
    [63] Deng S W, Han J Q. Statistical voice activity detection based on sparse representation overlearned dictionary [J]. Digital Signal Processing,2013,23(4):1228-1232.
    [64] Lee C T, Yang Y H, Chen H H. Multipitch estimation of piano music by exemplar-based sparserepresentation [J]. IEEE Transactions on Multimedia,2012,14(3):608-618.
    [65] Naseem I, Togneri R, Bennamoun M. Sparse representation for speaker identification [C]. IEEEInternational Conference on Pattern Recognition,2010,4460-4463.
    [66]王天荆,郑宝玉,杨震.基于自适应冗余字典的语音信号稀疏表示算法[J].电子与信息学报,2011,33(10):2372-2377.
    [67]孙慧琳,杨震.基于数据驱动字典和稀疏表示的语音增强[J].信号处理,2011,27(12):1793-1800.
    [68] Zhao N, Xu X, Yang Y. Sparse representations for speech enhancement [J]. Chinese Journal ofElectronics,2011,19(2):268-272.
    [69] Jafari M G, Plumbley M D. Fast dictionary learning for sparse representations of speech signals[J]. IEEE Journal of Selected Topics in Signal Processing,2011,5(5):1025-1031.
    [70] Sigg C D, Dikk T, Joachim M, et al. Speech enhancement with sparse coding in learneddictionaries [C]. IEEE International Conference on Acoustics, Speech, Signal Processing,2010:4758-4761.
    [71] Sigg C D, Dikk T, Buhmann J M. Speech enhancement using generative dictionary learning [J].IEEE Transactions on Audio, Speech, and Language Processing,2012,20(6):1698-1712.
    [72] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement [J]. IEEETransactions on Audio, Speech, and Language Processing,2008,16(1):229-238.
    [73] Lee K. Application of non-negative spectrogram decomposition with sparsity constraints tosingle-channel speech enhancement [J]. Speech Communication,2014,58:69-80.
    [74] Sreenivas T V, Kirnapure P. Codebook constrained Wiener filtering for speech enhancement [J].IEEE Transactions on Speech and Audio Processing,1996:4(5):383-389.
    [75] Lev-Ari H, Ephraim Y. Extension of the signal subspace speech enhancement approach tocolored noise [J]. IEEE Signal Processing Letter,2003,10(4):104-106.
    [76] Ephraim Y. Statistical-model-based speech enhancement systems [J]. Proceedings of the IEEE,1992,80(10):1526-1555.
    [77] Goh Z, Tan K, Tan T G. Postprocessing method for suppressing musical noise generated byspectral subtraction [J]. IEEE Transactions on Speech and Audio Processing,1998,6(3):287-292.
    [78] Yousefian N, Loizou P C. A dual-microphone speech enhancement algorithm based on thecoherence function [J]. IEEE Transactions on Audio, Speech, and Language Processing,2012,20(2):599-609.
    [79] Yousefian N, Loizou P C. A dual-microphone algorithm that can cope with competing-talkerscenarios [J]. IEEE Transactions on Audio, Speech, and Language Processing,2013,21(1):145-155.
    [80] Hu J S, Lee M T, Yang C H. Robust adaptive beamformer for speech enhancement using thesecond-order extended H filter [J]. IEEE Transactions on Audio, Speech, and LanguageProcessing,2013,21(1):39-50.
    [81] Meyer J, Simmer K U. Multi-channel speech enhancement in a car environment using Wienerfiltering and spectral subtraction [C]. IEEE International Conference on Acoustics, Speech, andSignal Processing,1997:1167-1170.
    [82] Nehorai A, Porat B. Adaptive comb filtering for harmonic signal enhancement [J]. IEEETransactions on Acoustics, Speech and Signal Processing,1986,34(5):1124-1138.
    [83] Paliwal K K, Basu A. A speech enhancement method based on Kalman filtering [C]. IEEEInternational Conference on Acoustics, Speech, Signal Processing,1987,12:177-180.
    [84] Hu Y, Loizou P C. Speech enhancement based on wavelet thresholding the multitaper spectrum[J]. IEEE Transactions on Speech and Audio Processing,2004,12(1):59-67,
    [85] Soon Y, Koh S N. Speech enhancement using2-D Fourier transform [J]. IEEE Transactions onSpeech and Audio Processing,2003,11(6):717-724.
    [86] Rezayee A, Gazor S. An adaptive KLT approach for speech enhancement [J]. IEEE Transactionson Speech and Audio Processing,2001,9(2):87-95.
    [87] Soon Y, Koh S N, Yeo C K. Noisy speech enhancement using discrete cosine transform [J].Speech Communication,1998,24(3):249-257.
    [88] Blumensath T, Davies M E. Gradient Pursuits [J]. IEEE Transactions on Signal Processing,2008,56(6):2370-2382.
    [89] Xu Y, Zhang D, Yang J, et al. A two-phase test sample sparse representation method for use withface recognition [J]. IEEE Transactions on Circuits and Systems for Video Technology,2011,21(9):1255-1262.
    [90] Zhang C, Liu J, Tian Q, et al. Image classification by non-negative sparse coding, low-rank andsparse decomposition [C]. IEEE Conference on Computer Vision and Pattern Recognition,2011:1673-1680.
    [91] Fang Y, Lin W, Chen Z, et al. A saliency detection model based on sparse features and visualacuity [J]. IEEE International Symposium on Circuits and Systems,2013:2888-2891.
    [92] Wang B, Liu J, Sun X. Mixed sources localization based on sparse signal reconstruction [J].IEEE Signal Processing Letters,2012,19(8):487-490.
    [93] Casanovas A L, Monaci G, Vandergheynst P, et al. Blind audiovisual source separation based onsparse redundant representations [J]. IEEE Transactions on Multimedia,2010,12(5):358-371.
    [94] Pati Y C, Rezaifar R, Krishnaprasad P S. Orthogonal matching pursuit: Recursive functionapproximation with applications to wavelet decomposition [C].27th Asilomar Conference onSignals, Systems Computer,1993.
    [95] Chen S, Billings S A, Luo W. Orthogonal least squares methods and their application tonon-linear system identification [J]. International Journal of control,1989,50(5):1873–1896.
    [96] Chen S S, Donoho D L, Saunders M A. Atomic decomposition by basis pursuit [J]. SIAM journalon scientific computing,2001,43(1):129–159.
    [97] Tibshirani R. Regression shrinkage and selection via the lasso [J]. Journal of the Royal StatisticalSociety. Series B (Methodological),1996:267-288.
    [98] Efron B, Hastie T, Johnston I, et al. Least angle regression [J]. The Annals of statistics,2004,32(2):407–499.
    [99] Gorodnitsky I F, George J S, Rao B D. Neuromagnetic source imaging with FOCUSS: arecursive weighted minimum norm algorithm [J]. Electroencephalography and clinicalNeurophysiology,1995,95(4):231-251.
    [100] Gorodnitsky I F, Rao B D. Sparse Signal Reconstruction from Limited Data Using FOCUSS: ARe-weighted Minimum Norm Algorithm [J]. IEEE Transactions on Signal Processing,1997,45(3):600-616.
    [101] Olshausen B, Field D. Emergence of simple-cell receptive field properties by learning a sparsecode for natural images [J]. Nature,1996,381(6583):607–609.
    [102] Lewicki M S, Sejnowski T J. Learning overcomplete representations [J]. Neural computation,2000,12(2):337-365.
    [103] Aharon M, Elad M, Bruckstein A. K-SVD: an algorithm for designing overcomplete dictionariesfor sparse representations [J]. IEEE Transactions on Signal Processing,2006,54(11):4311-4322.
    [104] Rubinstein R, Zibulevsky M, Elad M. Efficient implementation of the K-SVD algorithm usingbatch orthogonal matching pursuit [R]. CS Technion,2008:40.
    [105] Rosset S, Zhu J. Piecewise linear regularized solution paths [J]. The Annals of Statistics,2007:1012-1030.
    [106] Sj strand K, Clemmensen L H, Larsen R, et al. SpaSM: A Matlab toolbox for sparse statisticalmodeling [J]. Journal of Statistical Software Accepted for publication,2012.
    [107] McOlash S M, Niederjohn R J, Heinen J A. A spectral subtraction method for the enhancement ofspeech corrupted by nonwhite, nonstationary noise [C]. Conference on Industrial Electronics,Control, and Instrumentation,1995,2:872-877.
    [108] Lu Y, Loizou P C. Estimators of the magnitude-squared spectrum and methods for incorporatingSNR uncertainty [J]. IEEE Transactions on Audio, Speech, and Language Processing,2011,(19)5:1123-1137.
    [109] Lu Y, Loizou P C. A geometric approach to spectral subtraction [J]. Speech Communication,2008,50(6):453-466.
    [110] Charoenruengkit W, Erdol N. The effect of spectral estimation on speech enhancementperformance [J]. IEEE Transactions on Audio, Speech, and Language Processing,2011,19(5):1170-1179.
    [111] Ellis D P W, Weiss R J. Model-based monaural source separation using a vector-qunatizedphase-vocoder representation [C]. IEEE International Conference on Acoustics, Speech, SignalProcessing,2006,5:957-960.
    [112] Srinivasan S, Samuelsson J, Kleijn W. Codebook driven shortterm predictor parameter estimationfor speech enhancement [J]. IEEE Transactions on Audio, Speech, and Language Processing,2006,14(1):163-176.
    [113] Srinivasan S, Samuelsson J, Kleijn W B. Codebook-based bayesian speech enhancement fornonstationary environments [J]. IEEE Transactions on Audio, Speech, and Language Processing,2007,15(2):441-452.
    [114] Rix A W, Beerends J G, Hollier M P, et al. Perceptual evaluation of speech quality (PESQ), andobjective method for end-to-end speech quality assessment of narrowband telephone networksand speech codecs [C]. IEEE International Conference on Acoustics, Speech, and SignalProcessing,2001,2:749-752.
    [115] Hansen J, Pellom B. An effective quality evaluation protocol for speech enhancement algorithms[C]. International Conference on Spoken Language Processing,1998,7:2819-2822.
    [116] Klatt D. Prediction of perceived phonetic distance from critical band spectra: A first step [C].IEEE International Conference on Acoustics, Speech, Signal Processing,1982,7:1278-1281.
    [117] Application guide for objective quality measurement based on recommendations P.862, P.862.1and P.862.2[J]. International Telecommunication Union,2005.
    [118] Tribolet J, Noll P, McDermott B, et al. A study of complexity and quality of speech waveformcoders [C]. IEEE International Conference on Acoustics, Speech, Signal Processing,1978:586-590.
    [119] Ma J F, Hu Y, Loizou P C. Objective measures for predicting speech intelligibility in noisyconditions based on new band-importance functions [J], Journal of the Acoustical Society ofAmerica,2009,125(5):3387-3405.
    [120] Garofolo J S. Getting started with the DARPA TIMIT CD-ROM: An acoustic phoneticcontinuous speech database [J]. National Institute of Standards and Technology (NIST),Gaithersburgh, MD,1988,107.
    [121] Varga A, Steeneken H J M. Assessment for automatic speech recognition: II. NOISEX-92: Adatabase and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication,1993,12(3):247–251.
    [122] Hu Y, Loizou P C. Subjective evaluation and comparison of speech enhancement algorithms [J].Speech Communication,2007,49(7):588-601.
    [123] Ou S F, Zhao X H. Improved a priori SNR estimation algorithm for speech enhancement [J]. Journal of Jilin University (Engineering and Technology Edition), 2009, 39(3): 787-791. (in Chinese)
    [124] Erkelens J S, Hendriks R C, Heusdens R, et al. Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(6): 1741-1752.
    [125] Andrianakis I, White P R. Speech spectral amplitude estimators using optimally shaped Gamma and Chi priors [J]. Speech Communication, 2009, 51(1): 1-14.
    [126] Martin R. Spectral subtraction based on minimum statistics [C]. Eur. Signal Processing Conference, 1994: 1182-1185.
    [127] Hendriks R C, Heusdens R, Jensen J. MMSE based noise PSD tracking with low complexity [C]. IEEE International Conference on Acoustics, Speech, Signal Processing, 2010: 4266-4269.
    [128] Cohen I, Berdugo B. Speech enhancement for nonstationary noise environments [J]. Signal Processing, 2001, 81(11): 2403-2418.
    [129] Yu R. A low-complexity noise estimation algorithm based on smoothing of noise power estimation and estimation bias correction [C]. IEEE International Conference on Acoustics, Speech, Signal Processing, 2009: 4421-4424.
    [130] Gerkmann T, Breithaupt C, Martin R. Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(5): 910-919.
    [131] Gerkmann T, Krawczyk M, Martin R. Speech presence probability estimation based on temporal cepstrum smoothing [C]. IEEE International Conference on Acoustics, Speech, Signal Processing, 2010: 4254-4257.
    [132] Etter W, Moschytz G S. Noise reduction by noise-adaptive spectral magnitude expansion [J]. Journal of the Audio Engineering Society, 1994, 42(5): 341-349.
    [133] Diethorn E J. Subband noise reduction methods for speech enhancement. Acoustic signal processing for telecommunication [M]. Springer US, 2000: 155-178.
    [134] Faller C, Chen J. Suppressing acoustic echo in a spectral envelope space [J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 1048-1062.
    [135] McAulay R, Malpass M. Speech enhancement using a soft-decision noise suppression filter [J]. IEEE Transactions on Acoustics, Speech, Signal Processing, 1980, 28(2): 137-145.
    [136] Faraji N, Hendriks R C. Noise power spectral density estimation for public address systems in noisy reverberant environments [C]. International Workshop on Acoustic Signal Enhancement, 2012: 61-64.
    [137] Lu Y, Loizou P C. Speech enhancement by combining statistical estimators of speech and noise [C]. IEEE International Conference on Acoustics, Speech, Signal Processing, 2010: 4754-4757.
    [138] Lu C T. Enhancement of single channel speech using perceptual-decision-directed approach [J]. Speech Communication, 2011, 53(4): 495-507.
    [139] Chen R F, Chan C F, So H C. Model-based speech enhancement with improved spectral envelope estimation via dynamics tracking [J]. IEEE Transactions on Audio, Speech and Language Processing, 2012, 20(4): 1324-1336.
    [140] Weiss M R, Aschkenasy A E, Parsons T W. Study and development of the INTEL technique for improving speech intelligibility [R]. Nicolet Scientific Corp., Northvale, NJ, 1975.
    [141] Oppenheim A V, Lim J S. The importance of phase in signals [J]. Proceedings of the IEEE, 1981, 69(5): 529-541.
    [142] Breithaupt C, Krawczyk M, Martin R. Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech [C]. IEEE International Conference on Acoustics, Speech, Signal Processing, 2008: 4037-4040.
    [143] You C H, Koh S N, Rahardja S. β-order MMSE spectral amplitude estimation for speech enhancement [J]. IEEE Transactions on Speech Audio Processing, 2005, 13(4): 475-486.
    [144] Andrianakis I, White P R. MMSE speech spectral amplitude estimators with Chi and Gamma speech priors [C]. IEEE International Conference on Acoustics, Speech, Signal Processing, 2006: 1068-1071.
    [145] Madhu N, Spriet A, Jansen S, et al. The potential for speech intelligibility improvement using the ideal binary mask and the ideal Wiener filter in single channel noise reduction systems: application to auditory prostheses [J]. IEEE Transactions on Audio, Speech and Language Processing, 2013, 21(1): 63-72.
