Human Action Recognition
Hong Lin
Univ. of Texas at San Antonio
Outline
• Research Background
• Method
• Experiment
• Current work
Research Background
• Human action recognition:
  • automatically analyzes ongoing activities in an unknown video;
  • is essential for visual surveillance, human-computer interaction, video retrieval, etc.
• Two categories of methods:
  • Single-view: high variation in appearance and shape; potential occlusions;
  • Multi-view: difficulty discovering correlations among multiple views.
Research Background
Roughly, we divide single-view activity recognition techniques into two categories:
• Model-based methods [1][2] rely on human body tracking or pose estimation to model the dynamics of individual body parts for action recognition.
• Appearance-based methods [3][4] use appearance features for action recognition:
  1. global space-time shape templates;
  2. local spatiotemporal interest points.
[1] C. Fanti, L. Zelnik-Manor, and P. Perona, “Hybrid models for human motion recognition,” in Proc. IEEE CVPR, pp. 1166–1173, Jun. 2005.
[2] A. Yilmaz, “Recognizing human actions in videos acquired by uncalibrated moving cameras,” in Proc. IEEE ICCV, pp. 150–157, Oct. 2005.
[3] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, pp. 2247–2253, Dec. 2007.
[4] C. Schuldt, I. Laptev, and B. Caputo, “Recognizing human actions: A local SVM approach,” in Proc. ICPR, pp. 32–36, Aug. 2004.
Research Background: Single-view
• Local space-time feature-based methods:
  • Advantages: capture salient local characteristics of appearance and motion; robust to spatiotemporal shifts and scales, background clutter, and multiple motions.
• Framework of local space-time feature-based methods, local space-time feature extraction:
  1. Detector: selects spatio-temporal interest points in the video by maximizing specific saliency functions;
  2. Descriptor: captures shape and motion in the neighborhoods of the selected points using image measurements.
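As an illustration of the detector step, the sketch below computes a space-time Harris-style saliency response, H = det(mu) - k * trace(mu)^3, where mu is the locally smoothed 3x3 second-moment matrix of spatio-temporal gradients. It is a simplified sketch, not the exact detector used in this work: a box filter stands in for Gaussian smoothing, one radius is used for space and time, k = 0.005 is an assumed constant, and the top-N selection skips non-maximum suppression. The helper names (box_smooth, harris3d_response, top_interest_points) are hypothetical.

```python
import numpy as np


def box_smooth(volume, radius):
    """Separable box filter along every axis (a stand-in for Gaussian smoothing).

    Assumes every axis of `volume` is at least 2 * radius + 1 long.
    """
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    out = volume.astype(float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, out)
    return out


def harris3d_response(video, radius=2, k=0.005):
    """Saliency H = det(mu) - k * trace(mu)**3 for a (T, H, W) grayscale video."""
    # Spatio-temporal gradients along t, y and x.
    lt, ly, lx = np.gradient(video.astype(float))
    grads = (lx, ly, lt)
    # Entries of the second-moment matrix, smoothed over a local window.
    mu = [[box_smooth(gi * gj, radius) for gj in grads] for gi in grads]
    # Stack into shape (T, H, W, 3, 3) so det/trace work per voxel.
    m = np.stack([np.stack(row, axis=-1) for row in mu], axis=-2)
    det = np.linalg.det(m)
    trace = np.trace(m, axis1=-2, axis2=-1)
    return det - k * trace ** 3


def top_interest_points(video, n_points=5, **kwargs):
    """Return the (t, y, x) voxels with the largest responses (no non-max suppression)."""
    response = harris3d_response(video, **kwargs)
    flat = np.argsort(response, axis=None)[::-1][:n_points]
    return [tuple(int(c) for c in np.unravel_index(i, response.shape))
            for i in flat]
```

A static video gives an all-zero response. For a pattern translating at constant velocity, the gradients lie in a plane, so det(mu) is close to zero; the response is largest where appearance varies along all three axes, for example where the motion changes.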
Research Background: Single-view
BoW+SVM framework:
• Bag-of-Words (BoW) [5]: space-time interest point features are quantized into visual words; a video is then represented as a frequency histogram over the visual words.
• SVM classification for modeling and recognition.
• Wang et al. [6] gave a comprehensive evaluation of popular local feature detectors and descriptors within the standard BoW+SVM framework.
References:
[5] A. Kläser, M. Marszałek, C. Schmid, “A spatio-temporal descriptor based on 3D-gradients,” in Proc. BMVC, 2008.
[6] H. Wang, M. M. Ullah, A. Kläser, I. Laptev, C. Schmid, “Evaluation of local spatio-temporal features for action recognition,” in Proc. BMVC, 2009.
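The BoW step can be sketched as nearest-codeword assignment followed by a normalized histogram. This minimal sketch assumes the codebook has already been learned (typically with k-means over training descriptors) and that Euclidean distance is used; the function names are hypothetical.

```python
import numpy as np


def quantize(descriptors, codebook):
    """Assign each descriptor (n, d) to its nearest codeword in codebook (k, d)."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def bow_histogram(descriptors, codebook):
    """L1-normalized frequency histogram of visual words for one video."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

For example, with codewords (0, 0) and (10, 10), the descriptors (1, 0), (0, 1) and (9, 9) map to words 0, 0 and 1, giving the histogram [2/3, 1/3].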
Method
• Motivation: exploit information about the structure of the human body.
Method
• Part-wise BoW + Graph-based Multi-task Learning
  • Part-wise BoW representation: discovers information about human body structure.
  • Multi-task learning: discovers the latent correlations among part-wise visual features.
[Figure: single-task classification vs. part-induced multi-task classification]
Method: Part-wise BoW + graph-based MTL
• Part-wise bag-of-words (PBoW) representation
  • Local space-time feature extraction: Harris3D detector, HOG/HOF descriptors
  • Body part localization: part model, skeleton information
  • PBoW generation, 7 components of PBoW:
    Level 0: limb-wise BoW, head-wise BoW, leg-wise BoW, foot-wise BoW
    Level 1: upper-body-wise BoW, lower-body-wise BoW
    Level 2: full-body-wise BoW
[Figure: part model & skeleton]
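A minimal sketch of PBoW generation, under several assumptions not stated on the slide: each localized body part is an axis-aligned box (x0, y0, x1, y1), each interest point is credited to the first level-0 part box that contains it, points outside every box are dropped, and the level-1 and level-2 histograms are sums of their level-0 parts (upper body = head + limb, lower body = leg + foot). The grouping and the function names are hypothetical.

```python
import numpy as np

# Hypothetical part names and level grouping, chosen for illustration only.
LEVEL0 = ("limb", "head", "leg", "foot")
LEVEL1 = {"upper_body": ("head", "limb"), "lower_body": ("leg", "foot")}


def inside(box, x, y):
    """True if point (x, y) lies in the half-open box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1


def partwise_bow(points, words, part_boxes, codebook_size):
    """Build the 7 PBoW components from point locations and visual-word ids.

    points: list of (x, y); words: visual-word index per point;
    part_boxes: dict part name -> (x0, y0, x1, y1) from a part model or skeleton.
    """
    counts = {p: np.zeros(codebook_size) for p in LEVEL0}
    for (x, y), w in zip(points, words):
        for part in LEVEL0:
            if inside(part_boxes[part], x, y):
                counts[part][w] += 1
                break  # each point is credited to one level-0 part
    components = dict(counts)
    # Level 1 and level 2 pool the counts of their level-0 parts.
    for name, parts in LEVEL1.items():
        components[name] = sum(counts[p] for p in parts)
    components["full_body"] = sum(counts[p] for p in LEVEL0)
    # L1-normalize each component; empty components stay all-zero.
    return {k: v / v.sum() if v.sum() > 0 else v for k, v in components.items()}
```

Pooling raw counts before normalizing keeps the upper-level histograms consistent with the points actually found on each part.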
Method: Part-wise BoW + graph-based MTL
• Graph-based Multi-task Learning (GMTL)
  • Objective: convert individual BoW-based single-task learning into joint multi-task learning over the multiple components of PBoW.
  • Formulation: encode reasonable latent relatedness between part-wise features.
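The exact GMTL formulation is not reproduced on this slide, so the following is only an illustrative stand-in: a generic graph-regularized multi-task least-squares model. Each PBoW component is one task, all tasks share the same labels (one-vs-rest, y in {-1, +1}), and the assumed objective is sum_t ||X_t w_t - y||^2 / n + lam1 * sum_t ||w_t||^2 + lam2 * sum over graph edges (s, t) of ||w_s - w_t||^2, minimized by plain gradient descent. The edge penalty pulls the weights of related parts toward each other. Hyperparameters and names are hypothetical.

```python
import numpy as np


def train_gmtl(Xs, y, edges, lam1=0.1, lam2=1.0, lr=0.01, n_iter=2000):
    """Graph-regularized multi-task least squares (illustrative, not the authors' model).

    Xs: list of T arrays (n, d), one per PBoW component, rows aligned to the same videos.
    y: labels in {-1, +1}, shape (n,). edges: list of (s, t) task-index pairs.
    """
    n, d = Xs[0].shape
    W = np.zeros((len(Xs), d))
    for _ in range(n_iter):
        grad = np.zeros_like(W)
        # Per-task data term and ridge term.
        for t, X in enumerate(Xs):
            grad[t] = 2.0 / n * X.T @ (X @ W[t] - y) + 2.0 * lam1 * W[t]
        # Graph term: each edge pulls its two task weights together.
        for s, t in edges:
            diff = W[s] - W[t]
            grad[s] += 2.0 * lam2 * diff
            grad[t] -= 2.0 * lam2 * diff
        W -= lr * grad
    return W


def predict(Xs, W):
    """Average the per-task scores and take the sign."""
    scores = np.mean([X @ w for X, w in zip(Xs, W)], axis=0)
    return np.where(scores >= 0, 1, -1)
```

Setting lam2 = 0 reduces this to independent single-task ridge regressions; increasing lam2 makes the part-wise classifiers share more knowledge along the graph edges.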
Experimental Results
• Evaluation on KTH
  • KTH dataset:
    • 6 kinds of actions: walking, jogging, running, boxing, hand waving, and hand clapping;
    • each of the 6 actions was performed four times by 25 subjects in 4 different scenarios;
    • all videos were recorded with a static camera at 25 fps; the sequences were downsampled to a spatial resolution of 160×120 pixels and are four seconds long on average.
Experimental Results
• Evaluation on KTH
  • Baseline: BoW+SVM
    We implemented the standard BoW+SVM framework with a kernel SVM on KTH. The best accuracy, 91.0%, was obtained with a 4000-D codebook; the worst results were obtained with a 100-D codebook.
Reference: [7] An-An Liu, Yuting Su, Hong Lin, et al., “Single/Multi-view Human Action Recognition via Regularized Multi-Task Learning,” Neurocomputing, 2013 (under review).
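The specific kernel used in this baseline is not legible on the slide. One common choice for BoW histograms in this literature is the exponential chi-square kernel, K(x, y) = exp(-D(x, y) / A) with D(x, y) = 1/2 * sum_i (x_i - y_i)^2 / (x_i + y_i) and A the mean distance; the sketch below assumes that choice and is not a statement of what was used here.

```python
import numpy as np


def chi2_distance(A, B, eps=1e-10):
    """Pairwise chi-square distance between rows of histograms A (m, k) and B (n, k)."""
    diff = A[:, None, :] - B[None, :, :]
    summ = A[:, None, :] + B[None, :, :]
    # eps avoids 0/0 for bins that are empty in both histograms.
    return 0.5 * (diff ** 2 / (summ + eps)).sum(axis=-1)


def chi2_kernel(A, B, scale=None):
    """Exponential chi-square kernel exp(-D / scale).

    scale defaults to the mean distance; for test data, pass the scale
    computed on the training set so train and test kernels match.
    """
    D = chi2_distance(A, B)
    if scale is None:
        scale = D.mean() if D.mean() > 0 else 1.0
    return np.exp(-D / scale)
```

The resulting Gram matrix can be passed to an SVM that accepts precomputed kernels, for instance scikit-learn's SVC(kernel="precomputed").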
Experimental Results
• Evaluation on MV-TJU
  Further, we implemented the BoW+SVM framework with a kernel SVM on individual part-wise BoW features. The results show that a single 100-D part-wise BoW feature achieves competitive accuracy (89.0%) compared with the best result (91.0%) obtained with the 4000-D feature in the standard BoW+SVM framework.
Reference: [7] An-An Liu, Yuting Su, Hong Lin, et al., “Single/Multi-view Human Action Recognition via Regularized Multi-Task Learning,” Neurocomputing, 2013 (under review).
Experimental Results
• Evaluation on KTH
  • Performance of PBoW (100-D) + GMTL
    Based on the human body structure, we implemented seven kinds of graph structures to formulate the three levels of part-wise BoW features as one multi-task learning problem, with the aim of encoding reasonable latent relatedness between part-wise features.
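One way to represent such a graph structure is an edge list over the seven PBoW components. The seven structures evaluated here (including R6) are not specified on the slides, so the hierarchy below, which links each part to the level that contains it, is purely hypothetical. The sketch also checks a standard identity: the edge penalty sum over (s, t) of ||w_s - w_t||^2 equals trace(W^T L W), where L is the graph Laplacian and the rows of W are the task weight vectors.

```python
import numpy as np

COMPONENTS = ["limb", "head", "leg", "foot", "upper_body", "lower_body", "full_body"]

# Hypothetical hierarchy graph: each part is linked to the level that contains it.
HIERARCHY_EDGES = [("head", "upper_body"), ("limb", "upper_body"),
                   ("leg", "lower_body"), ("foot", "lower_body"),
                   ("upper_body", "full_body"), ("lower_body", "full_body")]


def laplacian(edges, nodes=COMPONENTS):
    """Unweighted graph Laplacian L = D - A for the given edge list."""
    idx = {name: i for i, name in enumerate(nodes)}
    L = np.zeros((len(nodes), len(nodes)))
    for a, b in edges:
        i, j = idx[a], idx[b]
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L


def graph_penalty(W, edges, nodes=COMPONENTS):
    """Sum of ||w_s - w_t||^2 over edges; row k of W belongs to nodes[k]."""
    idx = {name: i for i, name in enumerate(nodes)}
    return sum(np.sum((W[idx[a]] - W[idx[b]]) ** 2) for a, b in edges)
```

Because the penalty equals trace(W^T L W), its gradient with respect to W is 2 L W, so trying a different graph structure only requires swapping the edge list.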
Experimental Results
• Evaluation on MV-TJU
  • Performance of PBoW (100-D) + GMTL
  Analysis:
  • The graph penalty can further facilitate common-knowledge discovery by MTL.
  • The overall accuracy of MTL with graph structure R6 is promising.
  • The R6 structure is important for effective relatedness transfer.
Thank you!
Hong Lin