Abstract
To capture the intrinsic spatio-temporal representation of gestures in video-based dynamic hand gesture recognition, this paper proposes a 3D-2D restricted Boltzmann machine (RBM) model that models the spatio-temporal correlations in gesture video data. In particular, to better describe the spatio-temporal characteristics of dynamic gestures, a hybrid feature representation is proposed that combines traditional hand-crafted features with the 3D-2D RBM. The method has three phases. First, Canny-2D HOG appearance features and optical flow-2D HOG motion features are extracted to describe the spatial and temporal information, respectively. Second, a 3D-2D RBM learns the latent high-level spatio-temporal semantic features of the dynamic gesture, which strengthens the descriptive power of the features. Finally, the discrimination results of the appearance channel and the motion channel are fused for recognition, which improves on single-channel classification. Experiments on the public Cambridge Hand Gesture Data Set verify the effectiveness of the proposed method and show that the hybrid 3D-2D RBM outperforms the state of the art.
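The final phase, fusing the appearance-channel and motion-channel decisions, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average of per-class scores used here is an assumption, and the function name `fuse_channel_scores`, the parameter `appearance_weight`, and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_channel_scores(
    appearance: Sequence[float],
    motion: Sequence[float],
    appearance_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Fuse per-class scores from two channels and pick the best class.

    Assumption: fusion is a weighted average of the two score vectors.
    The paper's actual fusion rule is not given in the abstract.
    """
    if len(appearance) != len(motion):
        raise ValueError("both channels must score the same set of classes")
    if not appearance:
        raise ValueError("score vectors must not be empty")
    if not 0.0 <= appearance_weight <= 1.0:
        raise ValueError("appearance_weight must lie in [0, 1]")

    motion_weight = 1.0 - appearance_weight
    fused = [
        appearance_weight * a + motion_weight * m
        for a, m in zip(appearance, motion)
    ]

    # Renormalise so the fused scores sum to 1 when that is possible.
    total = sum(fused)
    if total > 0:
        fused = [score / total for score in fused]

    # The predicted gesture class is the index of the largest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


if __name__ == "__main__":
    # Hypothetical scores for three gesture classes.
    appearance_scores = [0.5, 0.3, 0.2]  # appearance channel prefers class 0
    motion_scores = [0.1, 0.7, 0.2]      # motion channel prefers class 1
    label, fused = fuse_channel_scores(appearance_scores, motion_scores)
    print(label, fused)
```

In this made-up example the appearance channel alone would choose class 0, while the fused decision follows the more confident motion channel. This is one way that combining two channels can change, and possibly correct, a single-channel decision.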