Landmark-Based Facial Expression Recognition by Joint Training of Multiple Networks
  • English title: Landmark-Based Facial Expression Recognition by Joint Training of Multiple Networks
  • Authors: 夏添 (Xia Tian); 张毅锋 (Zhang Yifeng); 刘袁 (Liu Yuan)
  • Affiliations: School of Information Science and Engineering, Southeast University; Nanjing Institute of Communications Technologies, Southeast University; State Key Laboratory for Novel Software Technology, Nanjing University
  • Keywords: expression recognition; deep learning; joint training; fusion network
  • Journal code: JSJF
  • English journal title: Journal of Computer-Aided Design & Computer Graphics
  • Publication date: 2019-04-15
  • Published in: 计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics)
  • Year: 2019
  • Volume: v.31
  • Issue: 04
  • Funding: National Natural Science Foundation of China (61673108); Natural Science Foundation of Jiangsu Province (BK20151102); Open Project of the Key Laboratory of Machine Perception (Ministry of Education), Peking University (K-2016-03); Open Project of the Key Laboratory of Underwater Acoustic Signal Processing (Ministry of Education), Southeast University (UASP1502)
  • Language: Chinese
  • Citation code: JSJF201904005
  • Pages: 42-49 (8 pages)
  • CN: 11-2925/TP
Abstract
An expression image sequence carries richer information than a single expression image, so sequence-based expression recognition tends to achieve better results. For expression image sequences, this paper proposes a recognition method based solely on facial landmark information and the joint training of two deep neural networks. First, a fixed-size subset of frames that maximizes the differences among frames is extracted from the variable-length image sequence. Second, the landmark coordinates of all images in this subset are extracted and preprocessed. The coordinates are then fed into a microcosmic deep network (MIC-NN) and a macroscopic deep network (MAC-NN), which are trained independently. Finally, the two networks are trained jointly with a loss function that penalizes the discrepancy between MIC-NN and MAC-NN, and their fusion network (FUS-NN) serves as the final prediction model. Experiments on the CK+, Oulu-CASIA, and MMI datasets show that FUS-NN surpasses most known methods in recognition rate by 1%-15%, lagging behind the best model only on MMI, by 2%. Moreover, its time complexity is far lower than that of models with comparable performance, achieving a better balance between recognition accuracy and computational cost.
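The first step described in the abstract, extracting a fixed number of frames that maximize mutual differences from a variable-length sequence, can be approximated by greedy farthest-point sampling over per-frame landmark vectors. The sketch below is an illustrative assumption, not the paper's exact algorithm: the function name `select_frames` and the L2-distance criterion are hypothetical.

```python
import numpy as np

def select_frames(frames, k):
    """Pick k frames (k >= 2) whose mutual L2 distances are large.

    frames: (T, D) array-like of flattened landmark coordinates per frame.
    Returns sorted indices of the chosen frames (greedy approximation).
    """
    frames = np.asarray(frames, dtype=float)
    t = len(frames)
    if k >= t:
        return list(range(t))
    # Pairwise L2 distances between all frames.
    diff = frames[:, None, :] - frames[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Seed with the two most distant frames, then greedily add the frame
    # whose minimum distance to the chosen set is largest.
    chosen = [int(i) for i in np.unravel_index(np.argmax(dist), dist.shape)]
    while len(chosen) < k:
        min_d = dist[:, chosen].min(axis=1)
        min_d[chosen] = -1.0  # exclude frames already chosen
        chosen.append(int(np.argmax(min_d)))
    return sorted(chosen)
```

For a toy sequence of one-dimensional "landmarks" `[[0.0], [0.1], [5.0], [5.1], [10.0]]` and `k = 3`, the greedy procedure keeps the endpoints and the middle cluster, returning `[0, 2, 4]`. The exact spacing criterion used in the paper may differ; this only illustrates the maximize-differences idea.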
