Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla • Machine Intelligence Laboratory, Engineering Department, University of Cambridge

Introduction and Motivations • A novel real-time solution for action recognition that utilises local-appearance and structural information. • Main features / major contributions: continuous (frame-by-frame) recognition; real-time feature extraction and classification; pyramidal spatiotemporal relationship match (PSRM). • Main objective: efficiency.

A short demo Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for the full demo video.

Related Work • Many current methods focus on accuracy and on the action representation model (feature design) [Schuldt et al. ICPR2004, Niebles et al. BMVC06, Ryoo and Aggarwal ICCV09, Willems BMVC09, Riemenschneider et al. BMVC09] • Some achieve high accuracies, but take a long time to recognise • How can we improve efficiency? • Can we improve codebook learning and feature matching?

Related Work • Vector quantisation by random forest [Moosmann et al. NIPS06] • For image segmentation [Shotton et al. CVPR08] • Can we apply it in video analysis? • Pyramid match kernel [Grauman and Darrell ICCV05] • Image recognition [Grauman and Darrell ICCV05], scene classification [Lazebnik et al. CVPR06], etc. • Spatiotemporal relationship match [Ryoo and Aggarwal ICCV09] • (Figures: Moosmann et al. NIPS06; Grauman and Darrell ICCV05; Ryoo and Aggarwal ICCV09) • S. Lazebnik, C. Schmid and J. Ponce. “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories” CVPR 2006 • K. Grauman and T. Darrell. “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features” ICCV 2005 • F. Moosmann, B. Triggs and F. Jurie. “Fast discriminative visual codebooks using randomized clustering forests” NIPS 2006 • J. Shotton, M. Johnson and R. Cipolla. “Semantic texton forests for image categorization and segmentation” CVPR 2008 • M. S. Ryoo and J. K. Aggarwal. “Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities” ICCV 2009

Our Contributions • Our contribution is three-fold: • Spatiotemporal Texton Forest: image segmentation (2D) → action recognition (3D) • SRM → PSRM: pyramidal spatiotemporal relationship match • 1. V-FAST corner detector 2. Random forest classifiers 3. Continuous action recognition

Comparison with existing approaches • Feature encoding. Typical approaches: k-means clustering (slow for a large codebook). Our method: semantic texton forest (efficient, robust). • Feature matching. Typical approaches: the “bag of words” (BOW) model (lacks structural information, quantisation error). Our method: PSRM (structural information, hierarchical matching).

Overview • V-FAST corner → spatio-temporal cuboids → spatiotemporal semantic texton forest • PSRM → k-means forest • BOST → random forest classifier • Results

Feature detection

V-FAST: Spatiotemporal Feature Detection • A novel spatiotemporal interest point detector • Inspired by FAST [Rosten and Drummond ECCV2006] • A cascade of three FAST detectors • Considers three orthogonal Bresenham circles • Features: • Very fast! • E. Rosten and T. Drummond. “Machine learning for high-speed corner detection” ECCV 2006
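The idea of running a FAST-style segment test on three orthogonal Bresenham circles can be sketched as follows. This is a minimal illustration, not the authors' implementation: the radius (3), the run length `n`, the threshold, and in particular the decision rule (salient in the XY plane and in at least one of the XT or YT planes) are assumptions made for the example, and the function names are hypothetical.

```python
import numpy as np

# Offsets of the 16-pixel Bresenham circle of radius 3, listed in circular order.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]
RADIUS = 3


def has_contiguous_run(flags, n):
    """True if the circular boolean sequence contains at least n consecutive True values."""
    run = 0
    for flag in list(flags) + list(flags):  # doubling handles wrap-around
        run = run + 1 if flag else 0
        if run >= n:
            return True
    return False


def segment_test(values, centre, threshold, n):
    """FAST segment test: n contiguous circle pixels all brighter or all darker than the centre."""
    values = np.asarray(values, dtype=np.float64)
    brighter = values > centre + threshold
    darker = values < centre - threshold
    return has_contiguous_run(brighter, n) or has_contiguous_run(darker, n)


def vfast_is_keypoint(video, t, y, x, threshold=20.0, n=9):
    """video has shape (T, H, W). Assumed rule: XY saliency and (XT or YT) saliency."""
    T, H, W = video.shape
    if not (RADIUS <= t < T - RADIUS and RADIUS <= y < H - RADIUS
            and RADIUS <= x < W - RADIUS):
        return False  # circle would leave the volume
    centre = float(video[t, y, x])
    xy = [video[t, y + a, x + b] for a, b in CIRCLE]
    if not segment_test(xy, centre, threshold, n):
        return False  # cascade: reject early when there is no spatial corner
    xt = [video[t + a, y, x + b] for a, b in CIRCLE]
    yt = [video[t + a, y + b, x] for a, b in CIRCLE]
    return (segment_test(xt, centre, threshold, n)
            or segment_test(yt, centre, threshold, n))
```

Under these assumptions a bright point that appears in a single frame is detected, while a bright point that stays fixed over time passes the spatial test but fails both temporal tests.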

Feature extraction

Building a codebook using STF • Extract small video cuboids at detected keypoints • Visual codebook using STF:
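How a forest can act as a visual codebook for video cuboids can be sketched as below: each cuboid is passed down every tree and the leaf it reaches is treated as a visual word. The split test used here (difference of two voxels compared with a threshold) and the leaf-only histogram are simplifying assumptions for illustration; `Node`, `quantise` and `bost_histogram` are hypothetical names, not the authors' code.

```python
import numpy as np


class Node:
    """Either a split node (compares two voxels) or a leaf (holds a visual-word id)."""

    def __init__(self, p1=None, p2=None, threshold=0.0,
                 left=None, right=None, leaf_id=None):
        self.p1, self.p2, self.threshold = p1, p2, threshold
        self.left, self.right, self.leaf_id = left, right, leaf_id


def quantise(tree, cuboid):
    """Walk a cuboid of shape (t, y, x) down one tree and return the leaf id reached."""
    node = tree
    while node.leaf_id is None:
        # Assumed split function: difference of two voxel intensities.
        diff = float(cuboid[node.p1]) - float(cuboid[node.p2])
        node = node.left if diff < node.threshold else node.right
    return node.leaf_id


def bost_histogram(forest, leaves_per_tree, cuboids):
    """Concatenate per-tree leaf-count histograms over all cuboids."""
    hist = np.zeros(sum(leaves_per_tree))
    for cuboid in cuboids:
        offset = 0
        for tree, n_leaves in zip(forest, leaves_per_tree):
            hist[offset + quantise(tree, cuboid)] += 1
            offset += n_leaves
    return hist
```

Quantisation costs only a few voxel comparisons per tree, which is the efficiency argument made against k-means codebooks earlier in the talk.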

Feature matching

Pyramidal Spatiotemporal Relationship Match (PSRM) • A set of “rules” (shown in different colours) is designed to describe the spatiotemporal structure of the features.

Pyramidal Spatiotemporal Relationship Match (PSRM) (figure; label: Tree N)

Pyramidal Spatiotemporal Relationship Match (PSRM) • Typical pyramid match kernel: adjacent bins are merged • Our pyramid match kernel: children are merged into their parents
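A pyramid match in which coarser levels are obtained by merging child bins into their parent can be sketched as follows. The example assumes a complete binary tree (so the leaf histogram has a power-of-two length with siblings adjacent) and the usual pyramid-match weighting of 1/2^level for newly found matches; both are assumptions for illustration rather than details taken from the slides.

```python
import numpy as np


def merge_children(hist):
    """Sum each pair of sibling bins into their parent bin (complete binary tree)."""
    return hist.reshape(-1, 2).sum(axis=1)


def tree_pyramid_match(h1, h2):
    """Pyramid match score between two leaf histograms of equal power-of-two length."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    assert h1.shape == h2.shape and h1.size & (h1.size - 1) == 0
    score, previous, level = 0.0, 0.0, 0
    while True:
        intersection = np.minimum(h1, h2).sum()
        # Only matches that are new at this level count, down-weighted by depth.
        score += (intersection - previous) / (2 ** level)
        previous = intersection
        if h1.size == 1:
            break
        h1, h2 = merge_children(h1), merge_children(h2)
        level += 1
    return score
```

Two features falling in the same leaf contribute 1, features in sibling leaves contribute 0.5, and features that only meet at the root of a depth-2 tree contribute 0.25.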

Pyramidal Spatiotemporal Relationship Match (PSRM) Pyramid Match Kernel (PMK) Multiple Structural Relationship Histograms

Continuous action recognition • Our approach: a classification result is produced repeatedly as features arrive • Typical methods: a single classification after all features have been collected

Classification

Combined Classification • PSRM and BOST (bag of spatiotemporal textons) are classified independently • PSRM: k-means forest. Data points are clustered using k-means at the root; for each cluster, k-means is applied again recursively; at each terminal cluster, a posterior probability distribution is assigned • BOST: random forest classifier • M. Muja and D. G. Lowe. “Fast approximate nearest neighbors with automatic algorithm configuration” VISAPP 2009 • K-means tree figure courtesy of David Aldavert Miró: http://www.cvc.uab.cat/~aldavert/plor/
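The recursive k-means construction with class posteriors at the terminal clusters can be sketched as below. This is a small illustration under stated assumptions: plain Euclidean k-means is used (the talk's extra slides mention optimising clusters with respect to the PMK instead), and `k`, `max_depth`, `min_size` and all function names are hypothetical.

```python
import numpy as np


def _assign(X, centres):
    """Index of the nearest centre (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X, k, rng, iters=10):
    """Plain Lloyd iterations started from k distinct random data points."""
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        assign = _assign(X, centres)
        for j in range(k):
            if np.any(assign == j):
                centres[j] = X[assign == j].mean(axis=0)
    return centres, _assign(X, centres)


def build_tree(X, y, n_classes, k=2, max_depth=3, min_size=4, depth=0, rng=None):
    """Cluster recursively; every node stores the class posterior of its data."""
    rng = np.random.default_rng(0) if rng is None else rng
    node = {"posterior": np.bincount(y, minlength=n_classes) / len(y),
            "centres": None, "children": []}
    if depth >= max_depth or len(X) < max(min_size, k):
        return node
    centres, assign = kmeans(X, k, rng)
    if len(np.unique(assign)) < k:
        return node  # degenerate split: keep this node as a terminal cluster
    node["centres"] = centres
    node["children"] = [build_tree(X[assign == j], y[assign == j], n_classes, k,
                                   max_depth, min_size, depth + 1, rng)
                        for j in range(k)]
    return node


def predict_posterior(node, x):
    """Descend to the terminal cluster nearest to x and return its posterior."""
    while node["centres"] is not None:
        j = int(((node["centres"] - x) ** 2).sum(axis=1).argmin())
        node = node["children"][j]
    return node["posterior"]


def forest_posterior(forest, x):
    """Average the posteriors returned by several differently seeded trees."""
    return np.mean([predict_posterior(tree, x) for tree in forest], axis=0)
```

A forest is obtained here simply by building several trees with different random seeds, for example `[build_tree(X, y, n_classes, rng=np.random.default_rng(s)) for s in range(3)]`.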

Experiments • Short video subsequences (50 frames, about 2 seconds) are extracted from the input video. • A subsequence is sampled every 5 frames in the experiments and every frame in the laptop demo (so the demo performs frame-by-frame recognition). • Two datasets are used for performance evaluation: the UT interaction dataset and the KTH dataset.
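The subsequence sampling described on this slide amounts to a sliding window over the frame index. A minimal sketch, using the slide's values (length 50, step 5 for the experiments, step 1 for the demo) as defaults; the function name is hypothetical.

```python
def subsequence_ranges(n_frames, length=50, stride=5):
    """Return (start, end) frame ranges, end exclusive, for each sampled subsequence."""
    return [(start, start + length)
            for start in range(0, n_frames - length + 1, stride)]
```

With `stride=1` one subsequence ends at every frame once the first 50 frames have arrived, which corresponds to the frame-by-frame setting of the demo.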

Experiments: Results (KTH dataset) • Comparable to most state-of-the-art methods. • About 3% lower accuracy than the top performer. • Is it a sensible trade-off? • Useful for many practical applications (surveillance, robotics, etc.). • snippet: subsequence-level recognition • sequence: majority voting of subsequence labels • Evaluation: leave-one-out cross-validation
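The sequence-level result, obtained by majority voting over subsequence labels, can be sketched in a few lines. The tie-breaking behaviour (the label seen first wins) is a property of this sketch, not something stated in the talk.

```python
from collections import Counter


def sequence_label(subsequence_labels):
    """Return the most frequent subsequence label; ties go to the label seen first."""
    return Counter(subsequence_labels).most_common(1)[0][0]
```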

Experiments: Results • UT interaction dataset: performance improved by about 20% simply by combining the class labels; PSRM and BOST gave low accuracies when applied separately. • Run-time performance: below 25 fps, but enough for most real-time applications; can be further optimised (e.g. GPU, multi-core processing).

Demo video • Frame-level recognition • Potential improvement: • Delay (~1 s) in recognition results (depends on the subsequence length) • Please visit “http://www.youtube.com/watch?v=eD5b8d7hV6E” for the full demo video.

Conclusions

THANK YOU VERY MUCH THE END

Extra slide • Formulation of V-FAST

Extra slide • Formulation of STF • Split function model: • Split criteria --- Information gain:
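As an illustration of an information-gain split criterion, the sketch below uses the standard Shannon-entropy definition commonly used when training decision forests. This is an assumption: the exact formula shown on the slide is not reproduced in this transcript, and the function names are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, go_left):
    """Entropy reduction obtained by splitting labels with a boolean mask."""
    labels = np.asarray(labels)
    go_left = np.asarray(go_left, dtype=bool)
    left, right = labels[go_left], labels[~go_left]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that sends everything one way gains nothing
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))
```

During training, candidate split functions would be sampled at random and the one with the largest gain kept at each node.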

Extra slide • Formulation of STF

Extra slide • Formulation of PSRM • Step 1 Feature matching: • Step 2 Semantic PMK over histogram

Extra slide • Formulation of classifier training • Optimise the clusters of features so that they maximise the PMK with the cluster mean.

Extra slide • Experiment parameters

Extra slide • Confusion matrix:

Extra slide • PSRM → kernel k-means forest • BOST → random forest • Weighted combination → action recognition results (class labels)
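One way to realise a weighted combination of the two classifiers is a convex mix of their class posteriors followed by an arg-max. The sketch below assumes both classifiers output a probability vector over the same classes; the weight value and the function name are hypothetical, since the slides do not state how the weights are chosen.

```python
import numpy as np


def combine_posteriors(p_psrm, p_bost, weight=0.5):
    """Mix two class posteriors; weight is the share given to the PSRM classifier."""
    p_psrm = np.asarray(p_psrm, dtype=np.float64)
    p_bost = np.asarray(p_bost, dtype=np.float64)
    mixed = weight * p_psrm + (1.0 - weight) * p_bost
    return mixed / mixed.sum()
```

The predicted class label is then `int(np.argmax(combine_posteriors(p_psrm, p_bost)))`.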
