Research on Human Action Recognition Algorithms Based on Tensors and Canonical Correlation Analysis
Abstract
To avoid the curse-of-dimensionality problem caused by image vectorization, and to learn the correlations between different tensor samples, we propose a tensor discriminant method based on canonical correlation analysis (CCA), called MDCC. Experiments on the Weizmann action database show that, compared with existing CCA-based action recognition methods, MDCC achieves a higher recognition rate and is strongly robust to corrupted images.
     Next, we introduce the idea of incremental learning, updating the discriminant projection matrices by adding training samples incrementally. Experiments on the Weizmann action database show that this algorithm achieves a high recognition rate and strong robustness with low time complexity, and that it converges through iterative learning.
     Human actions can be represented in different scale spaces, which describe the role and meaning of a movement at each scale. We propose a tensor discriminant analysis method based on multi-scale features (MSF-TDA) to build the action model. All action samples are organized into one large tensor in which each mode carries a different kind of semantic information, such as the view angle or the performer. Experiments on an ideal database and the KTH action database show a high recognition rate. MSF-TDA is robust to changes in view angle and reduces the time complexity.
     We also propose a tensor-based fused discriminant algorithm (FTDA): linear discriminant analysis (LDA) is first applied to extract image features, and discriminant canonical correlation analysis (DCC) is then applied to obtain the maximum correlation between tensors. On three common databases (the KTH action database, the C-PASCAL database, and the AR face database), FTDA achieves a high recognition rate with low time complexity. FTDA handles corrupted and occluded images well, and it converges during iteration. Moreover, since the discriminant function value is unaffected by the initial transformation matrices, the discriminant function is convex.
In recent years, canonical correlation analysis (CCA) has been widely used in the field of action recognition, for example in discriminant canonical correlation analysis (DCC) and incremental discriminant canonical correlation analysis (IDCC), because it can serve as a similarity measure that reflects the degree of correlation between two image sets. However, in the DCC and IDCC algorithms the original structure of each sample is destroyed, the large dimensionality caused by vectorization leads to the curse-of-dimensionality problem, and the time complexity is very high. In this paper, a CCA-based tensor discriminant analysis method (MDCC) is proposed to learn discriminant correlations between different tensor samples. In experiments on the Weizmann action database, MDCC achieves a higher recognition rate than state-of-the-art CCA-based methods and is strongly robust to corrupted images. MDCC has three novelties:
     1. MDCC employs a tensor representation, avoiding the curse-of-dimensionality problem;
     2. MDCC takes into account the canonical correlations between tensor samples;
     3. MDCC considers not only the correlation between each pair of tensor samples but also the correlations between homogeneous and heterogeneous samples, making the analysis a multilinear discriminant analysis.
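As background for how CCA acts as a similarity measure between two sample sets, the following is a minimal sketch of classical CCA in plain NumPy. It is not the MDCC algorithm itself, and the small regularization term is an added numerical safeguard, not part of the textbook definition:

```python
import numpy as np

def cca_correlations(X, Y, reg=1e-6):
    """Canonical correlations between data matrices X (n x p) and Y (n x q).

    Whitens each side with its own covariance; the singular values of the
    whitened cross-covariance are the canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    # inv(L) with L L^T = S gives a whitening transform: W S W^T = I.
    Wx = np.linalg.inv(np.linalg.cholesky(Sxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Syy))
    return np.linalg.svd(Wx @ Sxy @ Wy.T, compute_uv=False)
```

Two sets that are (nearly) linear transforms of each other yield canonical correlations close to 1, which is why the correlations work as a set-similarity score.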
Incremental learning has been applied to classification problems, for example in incremental linear discriminant analysis (ILDA), incremental discriminant canonical correlation analysis (IDCC), and incremental tensor biased discriminant analysis (ITBDA) for action recognition and tracking. This paper introduces an incremental version of the MDCC method (IMDCC): training samples are added incrementally instead of processing the whole initial training set at once, which reduces the time complexity, and more effective discriminant projection matrices are obtained by iterative updating. In experiments on the Weizmann action database, IMDCC outperforms state-of-the-art action recognition algorithms and maintains a high recognition rate even on corrupted images, so it is more robust. IMDCC has three novelties:
     1. IMDCC has lower computational complexity and can therefore serve as an effective method for large-scale data;
     2. IMDCC converges by iterative learning;
     3. The original data are represented as tensors, avoiding the curse-of-dimensionality problem caused by image vectorization.
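The incremental idea can be illustrated on a simpler statistic: a Welford-style update of a total scatter matrix as samples arrive, which avoids recomputing over the whole training set each time. This is only an analogy for IMDCC's incremental projection-matrix update, not the thesis algorithm:

```python
import numpy as np

class IncrementalScatter:
    """Running mean and total scatter matrix, updated one sample at a time."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))

    def add(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # update the running mean
        # Welford update: accumulates sum of outer products of deviations,
        # matching the batch scatter (X - mean).T @ (X - mean) exactly.
        self.scatter += np.outer(delta, x - self.mean)
```

Each `add` costs O(d^2) regardless of how many samples have been seen, which is the source of the complexity advantage that incremental methods exploit.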
Higher-order tensor analysis, an extension of vector and matrix analysis, has been widely used in biometric recognition. There are two ways to represent data samples. The first is to express each sample independently as a tensor, so that all of the sample's spatial information is retained. In practice, however, there are always interfering factors in action recognition, for example differences caused by the camera angle or by different people. Accordingly, the second representation expresses all samples as a single tensor in which, typically, each mode corresponds to one external influencing factor. This paper focuses on the second representation, mainly to account for multi-view factors in action recognition. In addition, to improve classification ability, this paper combines it with a tensor discriminant method and proposes a tensor-based multi-scale feature discriminant analysis method (MSF-TDA) to build the action model and perform recognition. All action samples are organized into one large tensor in which each mode represents different semantic information, for example the view angle or the performer. Multi-scale features extracted from each sample describe the details of a movement at different scales and are then used for discriminant analysis in the tensor space. The convergence of this iterative learning method is confirmed both theoretically and experimentally. Combined with a nearest neighbor classifier (NNC), the proposed method increases the recognition rate and reduces the time complexity compared with state-of-the-art methods, and it is more robust to changing view angles. The MSF-TDA algorithm has three novelties:
     1. MSF-TDA describes the characteristics of an action at different scales;
     2. MSF-TDA considers and handles all the external factors that may affect the recognition rate;
     3. The time complexity of MSF-TDA is greatly reduced, since the lower-dimensional multi-scale features are used.
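The big-tensor organization can be sketched as follows. The (view, actor, feature) layout is an illustrative assumption, and `unfold` uses one common mode-n unfolding convention (mode fibers become rows of the matrix):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Hypothetical layout: each mode of the big tensor carries one semantic
# factor, as described above (3 views, 5 actors, 8-dim feature vectors).
views, actors, feat = 3, 5, 8
T = np.arange(views * actors * feat, dtype=float).reshape(views, actors, feat)

V = unfold(T, 0)   # rows index views: one row per camera angle
```

Discriminant analysis can then be carried out per mode on these unfoldings, so that each external factor (view, actor) is handled in its own subspace.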
In this paper, a CCA-based tensor discriminant analysis method (MDCC/IMDCC) is proposed that transforms each action sequence into a third-order tensor and computes the discriminant projection matrices from the correlations between tensor samples. Compared with TCCA, MDCC and IMDCC achieve a higher recognition rate because of their inherent discrimination. However, all of these methods apply only one dimensionality reduction method in the tensor space, such as linear discriminant analysis (LDA) or CCA, without considering the practical meaning of each mode of a tensor, for example the temporal correlation of the sequence mode and the pixel characteristics of the image modes. An action sequence can be regarded as a third-order tensor: mode-1 and mode-2 carry the image information of the action, while mode-3 carries the time information. Similarly, a face image may be represented as a third-order tensor, where mode-1 and mode-2 carry the face pixels and mode-3 carries expression, illumination, or color (RGB) factors. The same applies to target images: mode-1 and mode-2 carry a target image, while mode-3 represents changes in illumination, angle, or color. Since the image sequence reflects the temporal information of an action, the expression, illumination, and color information of a face, and the illumination, angle, and color information of a target image, these factors can be used to improve the recognition rate of sequence-based recognition methods. In this paper, a fusion tensor discriminant analysis algorithm (FTDA) is proposed that combines linear discriminant analysis (LDA) with discriminant canonical correlation analysis (DCC) in the tensor space. FTDA first performs LDA in modes 1 through N-1, which carry the feature dimensions, to extract image features, and then performs DCC in mode N, which carries the correlated dimension, to obtain the maximum correlation within the sequence.
     This projection, combining LDA and DCC, differs from traditional tensor analysis methods. Three common databases, the KTH action database, the C-PASCAL database, and the AR face database, are used in the experiments. FTDA performs better than state-of-the-art tensor discriminant methods. FTDA has lower time complexity than the other methods and is therefore suitable for large-scale computation. FTDA deals well with damaged and occluded images, which improves its robustness. In addition, FTDA converges in an iterative procedure, as verified experimentally. The discriminant function value is unique regardless of the initial transformation matrices, so the discriminant function is convex. The FTDA algorithm has three novelties:
     1. Considering the different characteristics of each mode of a tensor, FTDA performs a different projection method on each mode;
     2. FTDA takes into account the correlation of the discriminant information of the image sequence;
     3. FTDA converges by iterative learning.
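A mode-wise projection of this kind can be sketched with the n-mode product, where a different matrix multiplies the tensor along each mode. The matrices below are random stand-ins for the learned LDA/DCC projections, kept only to show the mechanics:

```python
import numpy as np

def mode_multiply(T, U, mode):
    """n-mode product: contract T along `mode` with U (new_dim x T.shape[mode])."""
    Tm = np.moveaxis(T, mode, -1)            # bring the mode to the last axis
    return np.moveaxis(Tm @ U.T, -1, mode)   # contract, then restore axis order

# Hypothetical third-order action tensor: height x width x frames.
rng = np.random.default_rng(2)
T = rng.standard_normal((10, 12, 7))
U1 = rng.standard_normal((4, 10))   # stand-in for an LDA projection on mode-1
U2 = rng.standard_normal((5, 12))   # stand-in for an LDA projection on mode-2

# Modes 1 and 2 (feature modes) are projected; the sequence mode (mode-3)
# would receive a correlation-based projection in the FTDA scheme.
Y = mode_multiply(mode_multiply(T, U1, 0), U2, 1)
```

Because each mode gets its own matrix, a spatial mode and the temporal mode can receive different kinds of projections, which is the structural point the fusion scheme relies on.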