Research on a Target Sound Recognition System Based on Auditory Bionics
Abstract
Target sound recognition is an important branch of sound recognition, and its development has greatly improved working efficiency, quality of life and quality of service. However, because sounds vary over a wide range, it is difficult for a recognition system to match them precisely, and recognition performance is easily degraded by variations in volume, timbre and speed and by background noise. It is therefore necessary to study and design a target sound recognition system with a high recognition rate and strong robustness.
     As research on sound signal processing has deepened, the human auditory system has been found to have unique advantages in identifying objects by sound: it extracts the features of a target sound accurately and identifies the direction, category and content of the sound precisely. Target sound recognition based on bionic modeling of the human ear has therefore attracted growing attention. This paper presents a systematic study of target sound recognition based on auditory bionics, covering ear bionic theory, feature extraction, target sound classification and an FPGA-based hardware implementation of the recognition system. The main work and results are as follows:
     1. By analyzing the physiological structure of the human auditory system and how it perceives sound, a fairly complete mathematical model of the auditory system is established to simulate how the ear processes sound. Simulation experiments show that the model reproduces well the frequency-dividing filtering of the cochlear basilar membrane and the energy conversion performed by the inner hair cells.
     2. Several common sound feature extraction methods are analyzed and compared. To address their generally poor robustness, a feature extraction method based on the auditory spectrum is proposed. The method processes the sound signal with the mathematical model of the auditory system, so its principle is consistent with how the human ear processes sound. It extracts the feature quantities of the sound well, avoids losing key information, and improves the noise robustness and recognition rate of the system.
     3. After comparing several common pattern recognition methods and taking into account the nonlinear nature of sound, a BP neural network, which is highly adaptive, is chosen to recognize and classify the target sound signals. The approach is intuitive and has a clear mathematical meaning. Simulation experiments show that the BP-network classifier achieves an average recognition rate of 93.14% over all test samples, which indicates that the method is effective for classifying target sound features.
     4. Building on the auditory system model, the auditory-spectrum feature extraction method and the BP neural network recognition algorithm, and weighing algorithm complexity, required hardware resources and external interfaces, an FPGA embedded development platform is adopted for the hardware design of the target sound recognition system. The basilar membrane filter, which simulates the frequency-dividing function of the cochlear basilar membrane, is designed in the VHDL hardware description language. The inner hair cell model, the cochlear nucleus model, the auditory-spectrum feature extraction algorithm and the BP-network classifier are implemented on a Nios II soft-core processor. Finally, repeated recognition experiments are run on the FPGA-based system for five target sounds: cannon, ambulance, ship, train and taxiing aircraft. Across the five test sets, the ambulance samples have the highest recognition rate, 97.14%, the cannon samples have the lowest, 85.71%, and the average over all test samples is 91.43%. The results show that the FPGA implementation of the auditory bionic system recognizes the target sounds well and that the overall scheme is feasible and effective.
     This paper successfully applies auditory bionics and FPGA hardware technology to a target sound recognition system, providing theoretical support and a technical reference for related research and engineering practice.
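As a rough illustration of the feature-extraction and classification stages summarized in the abstract, the Python sketch below chains a band-energy front end to a small network trained by backpropagation. It is not the auditory model, the feature set or the FPGA implementation from the paper. Log-spaced FFT band energies stand in for the auditory spectrum, and the network size, learning rate, softmax output and the three synthetic sine-plus-noise "classes" are all assumptions made for the example. The function names (`band_energy_features`, `train_bp_network`, `run_demo` and so on) are hypothetical.

```python
import numpy as np


def band_energy_features(signal, sample_rate, n_bands=16, f_min=100.0):
    """Log energy in log-spaced frequency bands.

    Assumption: a crude stand-in for an auditory spectrum, not the paper's model.
    """
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    edges = np.geomspace(f_min, sample_rate / 2.0, n_bands + 1)
    energies = [power[(freqs >= lo) & (freqs < hi)].sum()
                for lo, hi in zip(edges[:-1], edges[1:])]
    return np.log10(np.array(energies) + 1e-10)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_bp_network(x, labels, n_hidden=8, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent with backpropagation."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    targets = np.eye(n_classes)[labels]                # one-hot targets
    w1 = rng.normal(0.0, 0.5, (x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    for _ in range(epochs):
        hidden = sigmoid(x @ w1 + b1)                  # forward pass
        probs = softmax(hidden @ w2 + b2)
        d_out = (probs - targets) / len(x)             # cross-entropy gradient at the logits
        d_hidden = (d_out @ w2.T) * hidden * (1.0 - hidden)  # back through the sigmoid
        w2 -= lr * hidden.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * x.T @ d_hidden
        b1 -= lr * d_hidden.sum(axis=0)
    return w1, b1, w2, b2


def predict(params, x):
    w1, b1, w2, b2 = params
    return np.argmax(sigmoid(x @ w1 + b1) @ w2 + b2, axis=1)


def make_synthetic_set(rng, n_per_class, sample_rate=8000, n_samples=1024):
    """Noisy sine tones as placeholder 'target sounds' (not real recordings)."""
    centre_freqs = [350.0, 900.0, 2200.0]              # hypothetical class frequencies in Hz
    t = np.arange(n_samples) / sample_rate
    feats, labels = [], []
    for label, f0 in enumerate(centre_freqs):
        for _ in range(n_per_class):
            f = f0 * (1.0 + 0.03 * rng.uniform(-1.0, 1.0))   # small frequency jitter
            tone = np.sin(2.0 * np.pi * f * t + rng.uniform(0.0, 2.0 * np.pi))
            noisy = tone + 0.3 * rng.standard_normal(n_samples)
            feats.append(band_energy_features(noisy, sample_rate))
            labels.append(label)
    return np.array(feats), np.array(labels)


def run_demo(seed=0):
    """Train on synthetic tones and return accuracy on a held-out synthetic set."""
    rng = np.random.default_rng(seed)
    x_train, y_train = make_synthetic_set(rng, 30)
    x_test, y_test = make_synthetic_set(rng, 10)
    mean, std = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
    params = train_bp_network((x_train - mean) / std, y_train, seed=seed)
    return float(np.mean(predict(params, (x_test - mean) / std) == y_test))


if __name__ == "__main__":
    print(f"test accuracy on synthetic tones: {run_demo():.2f}")
```

Running the file prints the held-out accuracy on the synthetic tones. That figure says nothing about the recognition rates reported in the abstract, which come from the paper's own model and test data.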
