Pig continuous cough sound recognition based on continuous speech recognition technology (基于连续语音识别技术的猪连续咳嗽声识别)
  • English title: Pig continuous cough sound recognition based on continuous speech recognition technology
  • Authors: Li Xuan (黎煊); Zhao Jian (赵建); Gao Yun (高云); Liu Wanghong (刘望宏); Lei Minggang (雷明刚); Tan Hequn (谭鹤群)
  • Keywords: signal processing; acoustic signal; recognition; pig industry; continuous cough; bidirectional long short-term memory-connectionist temporal classification; acoustic model
  • Journal code: NYGU
  • Journal (English name): Transactions of the Chinese Society of Agricultural Engineering
  • Institutions: College of Engineering, Huazhong Agricultural University; Cooperative Innovation Center for Sustainable Pig Production; College of Animal Science and Technology, College of Animal Medicine, Huazhong Agricultural University
  • Publication date: 2019-03-23
  • Publisher: 农业工程学报
  • Year: 2019
  • Issue: v.35; No.358
  • Funding: National Key Research and Development Program of China (2018YFD0500700); Huazhong Agricultural University Independent Science and Technology Innovation Fund; Huazhong Agricultural University Dabeinong Young Scholars Promotion Program (2017DBN005); China Agriculture Research System (CARS-36); National Undergraduate Innovation and Entrepreneurship Training Program (201810504074)
  • Language: Chinese
  • Record ID: NYGU201906021
  • Page count: 7
  • Issue number: 06
  • CN: 11-2047/S
  • Pages: 182-188
Abstract
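Existing pig cough recognition methods based on isolated-word recognition can identify only a limited set of sound types and cannot reflect the continuous coughing of pigs that are actually sick. To address this, the paper proposes building a pig-sound acoustic model with a bidirectional long short-term memory-connectionist temporal classification (BLSTM-CTC) model to recognize continuous pig coughs in the pig farm environment, as a basis for early warning and diagnosis of respiratory disease in pigs. The time-domain characteristics (duration and energy) of individual cough samples from Landrace pigs weighing about 75 kg were studied, and a threshold range was set: sample duration of 0.24-0.74 s and energy greater than 40.15 V²·s. Within this threshold range, a single-parameter double-threshold endpoint detection algorithm was applied to 30 h of pig farm sound that had been processed by a multitaper-spectrum-based psychoacoustic speech enhancement algorithm, yielding 222 experimental utterances. Sounds in the pig farm environment were divided into pig cough and non-pig cough; these two classes served as the acoustic modeling units and were used to label the corpus. 26-dimensional Mel frequency cepstral coefficients (MFCC) were extracted as the feature parameters of the utterances. The BLSTM network learned the variation patterns of continuous pig sounds, and CTC was used to realize an end-to-end continuous pig sound recognition system. In 5-fold cross-validation, the average pig cough recognition rate reached 92.40%, the false recognition rate was 3.55%, and the total recognition rate reached 93.77%. In addition, an application test on 1 h of corpus from outside the dataset gave a pig cough recognition rate of 94.23%, a false recognition rate of 9.09%, and a total recognition rate of 93.24%. These results indicate that the BLSTM-CTC pig cough recognition model based on continuous speech recognition technology is stable and reliable. The study can provide a reference for recognizing continuous pig coughs and diagnosing disease in healthy pig farming.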
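To make the end-to-end CTC step more concrete, the following is a minimal sketch of CTC best-path (greedy) decoding over the two modeling units named in the abstract. The abstract does not say which decoding strategy the authors used, so greedy decoding, the blank index, the label numbering, and the toy posterior values are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0                               # CTC blank index (assumed convention)
LABELS = {1: "cough", 2: "non-cough"}   # the two modeling units from the abstract


def ctc_greedy_decode(frame_posteriors):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    collapsed = [k for k, _ in groupby(best_path)]
    return [LABELS[k] for k in collapsed if k != BLANK]


# Made-up per-frame posteriors over (blank, cough, non-cough).
posteriors = [
    [0.80, 0.10, 0.10],  # blank
    [0.10, 0.80, 0.10],  # cough
    [0.20, 0.70, 0.10],  # cough (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank separates two distinct coughs
    [0.10, 0.80, 0.10],  # cough
    [0.10, 0.20, 0.70],  # non-cough
    [0.10, 0.10, 0.80],  # non-cough (repeat)
    [0.90, 0.05, 0.05],  # blank
]
decoded = ctc_greedy_decode(posteriors)
print(decoded)
```

The blank frame between the two cough runs is what lets CTC emit two consecutive coughs instead of merging them into one, which is the property that matters for continuous coughing.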
Cough is one of the most frequent symptoms in the early stage of pig respiratory diseases, so these diseases can be monitored and diagnosed by detecting coughs. Existing pig cough recognition methods are based on keyword recognition technology, which cannot recognize sounds that the model has not been trained on; a further drawback is that they target isolated coughs, whereas sick pigs usually cough continuously. This paper aims to recognize continuous pig cough sounds using continuous speech recognition technology. Ten Landrace pigs with a body weight of about 75 kg served as sound collection subjects, and sounds were collected in pig farms during late winter and early spring, when respiratory diseases of pigs were prevalent. The sound collection devices ran continuously all day. By selecting the phases of frequent coughing in the collected signal, a total of 30 h of pig farm sound was obtained as the experimental corpus. First, the sound signals were denoised by a speech enhancement algorithm based on a psychoacoustic model. Then the time-domain characteristics of individual coughs, namely duration and energy, were studied: cough duration ranged from 0.24 to 0.74 s and energy ranged from 40.15 to 822.87 V²·s. The threshold for sound samples was therefore set from the duration range and the lower energy bound of individual coughs. Within this threshold range, a speech endpoint detection algorithm based on short-time energy was applied to the 30 h of pig farm sound that had been preprocessed by the speech enhancement algorithm, and 222 experimental sentences were obtained. The longest was 9.14 s and the shortest was 3.91 s. The 222 sentences contained a total of 1 145 sound samples, including 751 pig coughs and 394 non-pig coughs.
Sounds in the pig farm environment, including pig coughs, sneezes, eating sounds, screams, hums and ear-shaking sounds, as well as dog sounds, metal clanging and other background noise, were divided into pig cough and non-pig cough, and these two classes were chosen as the acoustic modeling units. The experimental sentences were labeled with the help of experts. Then 13-dimensional Mel frequency cepstral coefficients (MFCC), reflecting the static characteristics of pig sound, were extracted, and first-order differential coefficients, reflecting the dynamic characteristics, were added to obtain 26-dimensional MFCC features, which served as the feature parameters of each experimental sentence. Finally, the bidirectional long short-term memory-connectionist temporal classification (BLSTM-CTC) model was selected to recognize continuous pig sounds: the BLSTM network has excellent feature learning ability for continuous pig sounds, and CTC can directly model the alignment between the input continuous pig sound sequence and its labels. Based on 5-fold cross-validation experiments and analysis, the number of hidden-layer neurons in the BLSTM forward propagation process, the backward propagation process and the fully connected layer was set to 300 each, and the learning rate was set to 0.001. Averaged over the 5 groups, the recognition rate, error recognition rate and total recognition rate were 92.40%, 3.55% and 93.77%, respectively. Furthermore, an application test on another 1 h of data gave a recognition rate of 94.23%, an error recognition rate of 9.09% and a total recognition rate of 93.24%. These results indicate that the pig cough sound recognition model based on continuous speech recognition technology is stable and reliable. This paper provides a reference for the recognition of continuous pig cough sounds and for disease judgment in healthy pig breeding.
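The thresholding and endpoint detection described above can be illustrated with a small sketch. It seeds segments on short-time energy with a high threshold, extends them with a low threshold, and then keeps only segments whose duration and energy fall within the reported cough range. The frame length, hop size, the `high` and `low` values, and the synthetic test signal are assumptions for illustration only; the abstract gives only the duration and energy bounds, and this is not claimed to be the authors' exact algorithm.

```python
MIN_DUR_S, MAX_DUR_S = 0.24, 0.74   # cough duration range reported in the abstract
MIN_ENERGY = 40.15                  # lower energy bound (V^2*s) from the abstract


def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each analysis frame."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def double_threshold_segments(energies, high, low):
    """Seed a segment where energy > high, extend both ways while energy > low.

    Returns (start_frame, end_frame_exclusive) pairs.
    """
    segments, i, n = [], 0, len(energies)
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            end = i
            while end < n and energies[end] > low:
                end += 1
            segments.append((start, end))
            i = end
        else:
            i += 1
    return segments


def cough_candidates(signal, fs, frame_len, hop, high, low):
    """Keep detected segments that satisfy the duration and energy bounds."""
    energies = short_time_energy(signal, frame_len, hop)
    kept = []
    for start, end in double_threshold_segments(energies, high, low):
        first = start * hop
        last = (end - 1) * hop + frame_len
        duration = (last - first) / fs
        energy = sum(x * x for x in signal[first:last]) / fs  # integral of x^2 dt
        if MIN_DUR_S <= duration <= MAX_DUR_S and energy >= MIN_ENERGY:
            kept.append((first / fs, last / fs, energy))
    return kept


# Synthetic signal at 1 kHz: a 0.4 s burst and a 0.05 s click, separated by silence.
fs = 1000
signal = [0.0] * 200 + [12.0] * 400 + [0.0] * 200 + [12.0] * 50 + [0.0] * 150
kept = cough_candidates(signal, fs, frame_len=50, hop=50, high=1000.0, low=100.0)
print(kept)
```

On this toy input the 0.4 s burst passes both bounds, while the 0.05 s click is detected as a segment but rejected for being shorter than 0.24 s.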
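The step from 13 static coefficients to 26-dimensional features can be sketched as follows. The abstract states only that first-order differential coefficients were appended; the central-difference formula with edge padding used here is one common choice and is an assumption, as are the placeholder input values, which stand in for real MFCC frames.

```python
def add_first_order_delta(static_frames):
    """Append a first-order difference to every static feature frame.

    Uses a central difference over neighboring frames, repeating the first
    and last frame at the edges (an assumed convention).
    """
    n = len(static_frames)
    out = []
    for t, frame in enumerate(static_frames):
        prev_frame = static_frames[max(t - 1, 0)]
        next_frame = static_frames[min(t + 1, n - 1)]
        delta = [(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)]
        out.append(list(frame) + delta)
    return out


# Placeholder input: 4 frames of 13 values each (not real MFCCs).
static = [[float(t * 13 + k) for k in range(13)] for t in range(4)]
features = add_first_order_delta(static)
print(len(features), len(features[0]))
```

Each output frame keeps the 13 static values unchanged and gains 13 delta values, giving the 26 dimensions per frame that the recognizer takes as input.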
