Abstract
Based on a study of glottal stops in cleft palate speech, this paper proposes an automatic glottal stop recognition algorithm that uses five acoustic feature parameters: the spectral energy strengthened segment, Mel-frequency cepstral coefficients (MFCC), the critical band power spectrum, wavelet entropy, and wavelet packet entropy. The extracted features are fed to a K-nearest neighbor (KNN) classifier to detect glottal stops in cleft palate speech automatically. Experimental results show that detection accuracy exceeds 70% for each of the five features, approaches 90% for both wavelet entropy and wavelet packet entropy, and approaches 95% for the critical band power spectrum. The method can therefore give speech therapists effective support in clinical diagnosis.
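The abstract names the pipeline (an acoustic feature followed by a KNN classifier) but does not give the feature definitions or the classifier settings. The following is only a minimal sketch of one such pairing, wavelet entropy with KNN, under several assumptions that do not come from the paper: a Haar wavelet, 4 decomposition levels, Shannon entropy over relative subband energies, Euclidean distance, and k = 3. The function names `haar_dwt`, `wavelet_entropy`, and `knn_predict` are hypothetical.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar wavelet transform: returns (approximation, detail)."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def wavelet_entropy(signal, levels=4):
    """Shannon entropy of the relative energy in each wavelet subband.

    Assumption: entropy is taken over subband energies; the paper's exact
    definition is not stated in the abstract.
    """
    energies = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))  # energy left in the coarsest band
    energies = np.array(energies)
    total = energies.sum()
    if total == 0:
        return 0.0
    p = energies / total
    p = p[p > 0]  # skip empty bands so log() is defined
    return float(-np.sum(p * np.log(p)))

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

In use, each speech frame would be reduced to a feature vector (here a single entropy value, for example `[wavelet_entropy(frame)]`), and `knn_predict` would assign it the majority label of its nearest labeled neighbors. Real recordings, frame lengths, and labels would have to come from the study's own data, which the abstract does not describe.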