Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest. Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla. Machine Intelligence Laboratory, Engineering Department, University of Cambridge. Introduction and Motivations. A novel real-time solution for action recognition
Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla Machine Intelligence Laboratory, Engineering Department, University of Cambridge
Introduction and Motivations
• A novel real-time solution for action recognition that utilises local-appearance and structural information.
• Main features / major contributions:
  • Continuous / frame-by-frame recognition
  • Real-time feature extraction and classification
  • Pyramidal spatiotemporal relationship match (PSRM)
• Main objective: efficiency
A short demo
• Please visit http://www.youtube.com/watch?v=eD5b8d7hV6E for the full demo video.
Related Work
• Many current methods focus on accuracy and on the action representation model (feature design) [Schuldt et al. ICPR2004, Niebles et al. BMVC06, Ryoo and Aggarwal ICCV09, Willems BMVC09, Riemenschneider et al. BMVC09]
• Some achieve high accuracies, but take a long time to recognise
• How can we improve efficiency?
• Can we improve codebook learning and feature matching?
Related Work
• Vector quantisation by random forest [Moosmann et al. NIPS06]
  • For image segmentation [Shotton et al. CVPR08]
  • Can we apply it in video analysis?
• Pyramid match kernel [Grauman and Darrell ICCV05]
  • Image recognition [Grauman and Darrell ICCV05], scene classification [Lazebnik et al. CVPR06], etc.
• Spatiotemporal relationship match [Ryoo and Aggarwal ICCV09]
• S. Lazebnik, C. Schmid, and J. Ponce. “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories” CVPR 2006
• K. Grauman and T. Darrell. “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features” ICCV 2005
• F. Moosmann, B. Triggs, and F. Jurie. “Fast discriminative visual codebooks using randomized clustering forests” NIPS 2006
• J. Shotton, M. Johnson, and R. Cipolla. “Semantic texton forests for image categorization and segmentation” CVPR 2008
• M. S. Ryoo and J. K. Aggarwal. “Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities” ICCV 2009
Our Contributions
• Our contribution is three-fold:
  1. V-FAST corner detector
  2. Random forest classifiers
  3. Continuous action recognition
• Spatiotemporal texton forest: image segmentation (2D) → action recognition (3D)
• SRM → PSRM: pyramidal spatiotemporal relationship match
Comparison with existing approaches
• Feature encoding
  • Typical approaches: k-means clustering (slow for large codebooks)
  • Our method: semantic texton forest (efficient)
• Feature matching
  • Typical approaches: the “bag of words” (BOW) model (lacks structural information, quantisation error)
  • Our method: PSRM (structural information, robust, hierarchical matching)
Overview
• Pipeline diagram blocks: V-FAST corner, spatio-temporal cuboids, spatiotemporal semantic texton forest, BOST, PSRM, random forest classifier, k-means forest, results
Feature detection
V-FAST: Spatiotemporal Feature Detection
• A novel spatiotemporal interest point detector
• Inspired by FAST [Rosten and Drummond ECCV2006]
• A cascade of three FAST detectors
• Considers three orthogonal Bresenham circles
• Features:
  • Very fast!
• E. Rosten and T. Drummond. “Machine learning for high-speed corner detection” ECCV 2006
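The cascade of three FAST tests can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide gives no formula here, so the radius-3, 16-pixel circle, the threshold, the run length `n`, and the rule "spatial saliency in the XY plane plus temporal saliency in the XT or YT plane" are all assumed, and the function names are hypothetical.

```python
import numpy as np

# 16 offsets of a radius-3 Bresenham circle, listed in order around the ring.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def has_contiguous_run(flags, n):
    """True if at least n consecutive flags are set, allowing wrap-around."""
    run = 0
    for flag in list(flags) + list(flags):  # doubling handles the wrap
        run = run + 1 if flag else 0
        if run >= n:
            return True
    return False


def segment_test(centre, ring, threshold, n):
    """FAST segment test: n contiguous ring pixels all brighter or all darker."""
    ring = np.asarray(ring, dtype=np.int64)
    brighter = ring > centre + threshold
    darker = ring < centre - threshold
    return has_contiguous_run(brighter, n) or has_contiguous_run(darker, n)


def is_vfast_corner(video, t, y, x, threshold=20, n=9):
    """Assumed cascade: XY test first, then XT or YT (video indexed [t, y, x])."""
    centre = int(video[t, y, x])
    xy = [video[t, y + a, x + b] for a, b in CIRCLE]
    if not segment_test(centre, xy, threshold, n):
        return False  # early exit keeps the cascade cheap
    xt = [video[t + a, y, x + b] for a, b in CIRCLE]
    yt = [video[t + a, y + b, x] for a, b in CIRCLE]
    return (segment_test(centre, xt, threshold, n)
            or segment_test(centre, yt, threshold, n))
```

Under this assumed rule, a dot that stays bright in every frame passes the spatial test but fails both temporal tests, so static corners are rejected.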
Feature extraction
Building a codebook using STF • Extract small video cuboids at detected keypoints • Visual codebook using STF:
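A minimal sketch of these two steps: cutting a cuboid around a keypoint and dropping it down one tree so that the leaf reached acts as the codeword. The cuboid size, the voxel-difference split test, and the heap-style tree layout are assumptions for illustration, not the slide's formulation, and the function names are hypothetical.

```python
import numpy as np


def extract_cuboid(video, t, y, x, half=(2, 4, 4)):
    """Return the cuboid centred on (t, y, x), or None if it crosses a border.

    `video` is indexed [t, y, x]; `half` holds assumed half-sizes per axis.
    """
    ht, hy, hx = half
    n_t, n_y, n_x = video.shape
    inside = (ht <= t < n_t - ht) and (hy <= y < n_y - hy) and (hx <= x < n_x - hx)
    if not inside:
        return None
    return video[t - ht:t + ht + 1, y - hy:y + hy + 1, x - hx:x + hx + 1]


def leaf_index(cuboid, splits, depth):
    """Descend a complete binary tree stored heap-style (root is node 0).

    splits[node] = (p1, p2, tau): go right when cuboid[p1] - cuboid[p2] > tau.
    The voxel-difference test is an assumed split function.
    """
    node = 0
    for _ in range(depth):
        p1, p2, tau = splits[node]
        go_right = float(cuboid[p1]) - float(cuboid[p2]) > tau
        node = 2 * node + (2 if go_right else 1)
    return node - (2 ** depth - 1)  # position among the leaves, left to right
```

Collecting the leaf indices reached in every tree of the forest gives a histogram per video, which is how a forest can stand in for a k-means codebook.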
Feature matching
Pyramidal Spatiotemporal Relationship Match (PSRM)
• A set of “rules” (shown in different colours) is designed to describe the spatiotemporal structure of features.
Pyramidal Spatiotemporal Relationship Match (PSRM)
• [Figure: diagram with panels labelled “Tree N”]
Pyramidal Spatiotemporal Relationship Match (PSRM)
• Typical pyramid match kernel: adjacent bins are merged
• Our pyramid match kernel: children are merged into their parents
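The idea of merging children into parents can be sketched as a pyramid match over a tree hierarchy. The sketch assumes a complete binary tree whose leaf histogram is given in left-to-right order, and it uses the standard pyramid match weighting (new matches at each coarser level count half as much). Neither assumption comes from the slides, and `tree_pyramid_match` is a hypothetical name.

```python
import numpy as np


def tree_pyramid_match(hist_a, hist_b):
    """Pyramid match between two leaf histograms of the same complete binary tree.

    Level 0 is the leaf level. Each coarser level is built by summing sibling
    pairs (children merged into their parent). Matches that first appear at
    level i are weighted by 1 / 2**i.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape or a.size == 0 or a.size & (a.size - 1):
        raise ValueError("need two 1-D histograms of equal power-of-two length")

    score, previous, weight = 0.0, 0.0, 1.0
    while True:
        matched = np.minimum(a, b).sum()       # histogram intersection
        score += weight * (matched - previous)  # count only the new matches
        previous = matched
        if a.size == 1:                         # reached the root
            return score
        a = a.reshape(-1, 2).sum(axis=1)        # merge children into parents
        b = b.reshape(-1, 2).sum(axis=1)
        weight /= 2.0
```

Two features that fall into sibling leaves still score 0.5 here, whereas a flat bag-of-words comparison would score 0. That is the sense in which hierarchical matching softens quantisation error.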
Pyramidal Spatiotemporal Relationship Match (PSRM)
• Multiple structural relationship histograms
• Pyramid match kernel (PMK)
Continuous action recognition
• Our approach: a classification result is produced repeatedly as features arrive (figure: a row of “Features” blocks, each followed by a “Classification” block)
• Typical methods: a single classification after all features have been collected
Classification
Combined Classification
• PSRM and BOST (bag of spatiotemporal textons) are classified independently:
• PSRM: k-means forest
  • Data points are clustered using k-means at the root
  • For each cluster, another k-means is performed recursively
  • At each terminal cluster, a posterior probability distribution is assigned
• M. Muja and D. G. Lowe. “Fast approximate nearest neighbors with automatic algorithm configuration” VISAPP 2009
• K-means tree figure courtesy of David Aldavert Miró: http://www.cvc.uab.cat/~aldavert/plor/
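The recursive clustering described on this slide can be sketched as a single hierarchical k-means tree with class posteriors at the leaves. This is a toy version in Euclidean space: the slides use a PMK-based similarity and a forest of such trees, and the branching factor, stopping rule, and function names here are assumptions.

```python
import numpy as np


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centres, assignment)."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centres[j] = points[assign == j].mean(axis=0)
    dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return centres, dist.argmin(axis=1)


def build_tree(points, labels, n_classes, k=2, min_size=4):
    """Cluster at the root, recurse into each cluster, keep posteriors per node."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    node = {"posterior": np.bincount(labels, minlength=n_classes) / len(labels),
            "centres": None, "children": []}
    if len(points) < max(min_size, k) or len(np.unique(points, axis=0)) < k:
        return node  # terminal cluster
    centres, assign = kmeans(points, k)
    if len(np.unique(assign)) < k:
        return node  # degenerate split, stop here
    node["centres"] = centres
    node["children"] = [build_tree(points[assign == j], labels[assign == j],
                                   n_classes, k, min_size) for j in range(k)]
    return node


def predict(node, x):
    """Follow the nearest centre at every level; return the leaf posterior."""
    x = np.asarray(x, dtype=float)
    while node["centres"] is not None:
        nearest = ((node["centres"] - x) ** 2).sum(axis=1).argmin()
        node = node["children"][nearest]
    return node["posterior"]
```

A forest would repeat `build_tree` with different seeds or data subsets and average the returned posteriors.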
Experiments
• Short video sequences (50 frames, about 2 seconds) are extracted from the input video.
• A subsequence is sampled every 5 frames in the experiments and every frame in the laptop demo (so the demo performs frame-by-frame recognition).
• Two datasets are used for performance evaluation:
  • UT interaction dataset
  • KTH dataset
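The sampling scheme amounts to a sliding window over the frame indices. In this small sketch the defaults (50 frames, step 5) come from the slide, while the function name is hypothetical.

```python
def subsequence_starts(n_frames, length=50, step=5):
    """Start indices of every full-length subsequence, sampled every `step` frames."""
    return list(range(0, n_frames - length + 1, step))
```

With `step=1` a new subsequence starts at every frame, which corresponds to the frame-by-frame setting used in the laptop demo.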
Experiments: Results (KTH dataset)
• snippet: subsequence-level recognition
• sequence: majority voting of subsequence labels
• Evaluated with leave-one-out cross-validation
• Comparable to most state-of-the-art methods.
• Around 3% slower than the top performer
• Is it a sensible trade-off?
• Useful for many more practical applications (surveillance, robotics, etc.)
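The sequence-level label is obtained by majority voting over the subsequence labels, which can be sketched as below. The function name is hypothetical, and the slide does not say how ties are broken.

```python
from collections import Counter


def sequence_label(snippet_labels):
    """Majority vote over the labels predicted for each subsequence."""
    if not snippet_labels:
        raise ValueError("no subsequence labels to vote on")
    return Counter(snippet_labels).most_common(1)[0][0]
```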
Experiments: Results
• Results: UT interaction dataset
  • Performance improved by ~20% by simply combining the class labels!
  • PSRM and BOST gave low accuracies when applied separately.
• Run time performance
  • Can be further optimised (e.g. GPU, multi-core processing)
  • < 25 fps, but enough for most real-time applications
Demo video
• Frame-level recognition
• Potential improvement:
  • Delay (~1 s) in recognition results (depends on the subsequence length)
• Please visit http://www.youtube.com/watch?v=eD5b8d7hV6E for the full demo video.
THANK YOU VERY MUCH THE END
Extra slide • Formulation of V-FAST
Extra slide
• Formulation of STF
• Split function model:
• Split criterion: information gain
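The slide's own formula did not survive extraction. As a reminder of the criterion it names, here is a sketch of the textbook information gain of a binary split, using Shannon entropy in bits; the function names are hypothetical, and this is the standard definition rather than a reconstruction of the slide.

```python
import numpy as np


def entropy(labels, n_classes):
    """Shannon entropy (bits) of the class labels reaching a node."""
    counts = np.bincount(labels, minlength=n_classes)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, go_left, n_classes):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    labels = np.asarray(labels)
    go_left = np.asarray(go_left, dtype=bool)
    left, right = labels[go_left], labels[~go_left]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that sends everything one way gains nothing
    n = len(labels)
    return (entropy(labels, n_classes)
            - len(left) / n * entropy(left, n_classes)
            - len(right) / n * entropy(right, n_classes))
```

During training, a tree node would typically evaluate several randomly drawn split candidates and keep the one with the highest gain.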
Extra slide • Formulation of STF
Extra slide • Formulation of PSRM • Step 1 Feature matching: • Step 2 Semantic PMK over histogram
Extra slide
• Formulation of classifier training
• Optimising the clusters of features that maximise the PMK with the mean.
Extra slide • Experiment parameters
Extra slide • Confusion matrix:
Extra slide
• Diagram labels: PSRM, BOST, kernel k-means forest, random forest, weighted combination, action recognition results (class labels)
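One plausible reading of the "weighted combination" step is a convex combination of the two classifiers' class posteriors followed by an arg-max. The weight `alpha` and the function name are hypothetical. The slides do not state how the weights are chosen, or whether posteriors or hard labels are combined.

```python
import numpy as np


def combine_posteriors(p_psrm, p_bost, alpha=0.5):
    """Blend two class-posterior vectors; return (predicted class, blended vector).

    alpha is an assumed weight on the PSRM classifier; 1 - alpha goes to BOST.
    """
    p_psrm = np.asarray(p_psrm, dtype=float)
    p_bost = np.asarray(p_bost, dtype=float)
    blended = alpha * p_psrm + (1.0 - alpha) * p_bost
    return int(blended.argmax()), blended
```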