170 likes | 398 Views
CRF-based Activity Recognition on Manifolds. Presented by Arshad Jamal and Prakhar Banga. Introduction.
E N D
CRF-based Activity Recognition on Manifolds Presented by Arshad Jamal and Prakhar Banga
Introduction • Objective: To explore the STIP based activity analysis by finding a manifold structure for the HoG-HOF descriptor to reduce the dimension of the data and learn a hCRF based discriminative classifier to classify actions • Challenges: Huge diversity in the data (view-points, appearance, motion, lighting etc.) • Applications: video surveillance, video indexing and retrieval and human computer interaction
Related Works • Various variants of spatio-temporal interest points (STIPs) based approaches exist • Generative models like Bayesian Networks to model key pose changes Discriminative models like a CRF network to model the temporal dynamics of silhouettes based features Learning and Matching of Dynamic Shape Manifolds for Human Action Recognition
Proposed Approach Labeled Training Dataset STIP Detector & Descriptor Manifold Learning (LPP) Learn CRF Classifier STIP Detector & Descriptor Dimensionality reduction Classifier Test Video Action Class
Algorithm Details: STIP Detector • Looks for distinctive neighborhood in the video • High image variation in space and time • Describe it using distribution of gradient and optical flow Where Lij is Any (x, y, t) location in the video is STIP if I. Laptev. On Space-Time Interest Points. IJCV, 2005
Algorithm Details: STIP Descriptor • Small spatio-temporal neighborhood extracted • Divided into 3x3x2 tiles I. Laptev. On Space-Time Interest Points. IJCV, 2005
Dimensionality Reduction A broad classification • Linear: • Cannot capture inherent non-linearity in the manifold • Non-linear • May not be defined everywhere • Do not preserve neighborhood
Locality preserving projections (LPP) Concentrates on preserving locality rather than minimizing the Least Square Error Capable of learning the non-linear manifold structure as optimally as possible
Algorithm Details: HCRF based Classifier Output y • HCRF is a discriminative classifier conditioned globally on all the observations • Model parametersare foundby maximizing the conditional log likelihood on the labelled training data • Flexible configuration and connectivity of the Hidden variables Hidden states h1 h2 hm x1 x2 xm Observations Wang, Mori NIPS 2008
Algorithm Details: HCRF based Classifier Given a sequence of STIPs x and it’s class label We wish to find p(y/x) Wang, Mori NIPS 2008
Current Status • Datasets: • KTH: 600 videos of 6 actions by 25 actors in 4 scenarios • UCF-50 dataset • Features: 3D-Harris corner as STIP, 162-dim (72+90) HoG-HoF descriptors computed for the two datasets • Script based approach to add new datasets • Dimensionality reduction using LPP • Learn a low dimensional manifold using all the STIPs obtained from the full dataset • Project the data onto the learned manifold • HCRF model is learned using the training dataset • Initial results obtained
Result: LPP output ~1.2Lac STIPs collected from all action classes in KTH dataset 3-dimensions Plotted for visualization
To be done… • Testing the classifier for different dataset • Compiling the action classification results for different datasets • Code debugging
References • I. Laptev. On Space-Time Interest Points. IJCV, 2005 • I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008 • F. Lv and R. Nevatia. Single view human action recognition using key pose matching and viterbi path searching. In CVPR, 2007 • S. Wang, A. Quattoni, L.-P. Morency, D. Demirdjian, and T. Darrell. Hidden Conditional Random Fields for Gesture Recognition. In CVPR, 2006 • L.-P. Morency, A. Quattoni, and T. Darrell. Latent-Dynamic Discriminative Models for Continuous Gesture recognition. In CVPR, June 2007 • A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell. Hidden conditional random fields. PAMI, Oct. 2007 • C. Sminchisescu. Selection and context for action recognition. In ICCV, 2009 • J. Sun, X. Wu, S. Yan, L. Cheong, T. Chua, and J. Li. Hierarchical spatio-temporal context modelling for action recognition. In CVPR, 2009 • L Wang, D Suter, Learning and Matching of Dynamic Shape Manifolds for Human Action Recognition, IEEE TIP, 2007 Thank You