Making Action Recognition Robust to Occlusions and Viewpoint Changes

Making Action Recognition Robust to Occlusions and Viewpoint Changes Daniel Weinland Mustafa Özuysal Pascal Fua European Conference on Computer Vision, 2010

Outline • Introduction • Block Embedding of 3DHOG • Classifier Combination • Experiments • Conclusion

Introduction • Most state-of-the-art approaches achieve nearly perfect results on KTH and Weizmann datasets • Subjects are seen from similar viewpoints • Uniform backgrounds • The motions in the test and training set look very similar • A novel approach is proposed to provide robustness to both occlusions and viewpoint changes

Introduction • System Overview

Block Embedding of 3DHOG • Histograms of Oriented Gradient(HOG) [2] [2] Dalal N., Triggs, B.: Histograms of Oriented Gradients for Human Detection. In: CVPR (2005)

Block Embedding of 3DHOG • Densely distributed locations within a ROI • 3DHOG [11] as feature descriptor [11] Kläser A., Marszałek, M, Schmid, C.: A Spatio-Temporal Descriptor Based on 3D-Gradients. In: BMVC (2008)

Block Embedding of 3DHOG • Block descriptor • A normalized histogram with clipping applied • Sequence of blocks • : number of overlapping blocks that fit within the duration of the sequence • : number of blocks that fit within the ROI

Block Embedding of 3DHOG • Block Embedding • Create a set of prototype block descriptors • Randomly sampling from training subsequencies • Create a -dimensional vector for each • The dimensionis the -distance from to the closest block in

Classifier Combination • Use L2-regularized logistic regression as local classifier • Estimate for each class • : learned logistic repression weights at position • action classes with represents the probability of region being occluded • Four combination schemes • Product Rule • Sum Rule • Weighted Rule • Hierarchical

Classifier Combination • Product Rule • Sum Rule • Weighted Rule • Hierarchical • Combine the outputs of all local classifiers into a single feature vector and learned a SVM on top of this representation

Experiments • Datasets • Weizmann, KTH, UCF, and IXMAS • Baseline methods • Global SVM • BoW SVM • Parameters • ROI is scaled and concatenated to produce a 48×64×t cube • 6 oriented bins for 3DHOG • 500 prototypes for embedding

Experiments • Weizmann, KTH, and UCF

Experiments • IXMAS

Experiments • IXMAS testing training

Experiments • IXMAS

Conclusion • The authors proposed a new approach based on a local 3DHOG descriptor • The authors demonstrated that the proposed descriptor can perform action recognition from multiple viewpoints and can against partial occlusions • Compared to our work

Making Action Recognition Robust to Occlusions and Viewpoint Changes

Making Action Recognition Robust to Occlusions and Viewpoint Changes

Presentation Transcript

Robust Speech recognition

Action Recognition

Action Recognition

Action Recognition

Action Recognition

Action Recognition

Action Recognition Robust to Occlusion U sing an Efficient Part-Based Approach

Action Recognition

Action Recognition

Human Action Recognition

Making Changes

Action Recognition

Robust Activity Recognition

An Introduction to Action Recognition/Detection

Robust Speaker Recognition

Action Recognition

Cross- Modal Object Recognition is Viewpoint Independent

RGB-D object recognition and localization with clutter and occlusions

Making Changes

Weakly Supervised Action Recognition

Making Connections and Making Changes