Temporal Order-Preserving Dynamic Quantization for Human Action Recognition

Temporal Order-Preserving Dynamic Quantization for Human Action Recognition from Multimodal Sensor Streams • Jun Ye, Kai Li, Guo-Jun Qi, Kien A. Hua • University of Central Florida

Outline • Background • Problem, existing methods, challenges • Our algorithm • Dynamic Temporal Quantization • Multimodal Feature Fusion • Performance study • MSR-Action3D • UTKinect-Action • MSR-ActionPairs • Conclusions

Background • Depth sensors have become affordable and popular • New human-computer interaction • Gesture recognition • Speech recognition • Application domains • Video games, education, business, healthcare

Problem and Challenges • Key problem: modeling the temporal dynamics of 3D human actions/gestures • Existing methods • Histogram-based methods do not preserve temporal order (bag-of-3D-words [5, 21], HOJ3D [16], HON4D [9]) • Temporal modeling methods suffer from video misalignment (motion template [7, 20], temporal pyramid [9, 14]) • Challenge: temporal misalignment due to • Temporal translation • Execution rate variation

Objective • Model the temporal patterns of 3D actions as a transition of sub-actions, satisfying two constraints • Frames with similar postures are clustered together (sub-action constraint) • Temporal order of the sequence must be preserved (order-preserving constraint) • Dynamic Temporal Quantization algorithm

Dynamic Temporal Quantization • Quantization: a video with frames X1, X2, …, Xn of varied length n is quantized into vectors V1, V2, …, Vm of fixed length m • Optimal frame assignment a • Objective function: • Optimal quantization can be obtained by jointly optimizing a and V

Dynamic Temporal Quantization (cont'd) • Nontrivial to jointly solve for the frame assignment a • Initialization: uniform partition • Aggregation step: given a fixed assignment a, vj is computed by the aggregation • Assignment step: with the quantized vectors V fixed, update the assignment a by DTW • Iterate until convergence
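The alternating procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It makes several assumptions: mean pooling is used as the aggregation, squared Euclidean distance is the frame-to-vector cost, and the DTW step is restricted so that each frame maps to exactly one quantized vector and every quantized vector receives at least one frame. All function names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def uniform_assignment(n, m):
    """Initialization: split n frames into m nearly equal, ordered segments."""
    return [i * m // n for i in range(n)]


def aggregate(frames, assign, m):
    """Aggregation step (assumed here to be the mean of the assigned frames)."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for x, j in zip(frames, assign):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return [[s / counts[j] for s in sums[j]] for j in range(m)]


def order_preserving_assignment(frames, V):
    """Assignment step: DTW-style dynamic program over monotone alignments.

    The segment index of consecutive frames may only stay the same or
    increase by one, so the temporal order is preserved.
    """
    n, m = len(frames), len(V)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = sq_dist(frames[0], V[0])
    for i in range(1, n):
        for j in range(min(i, m - 1) + 1):
            stay = cost[i - 1][j]
            advance = cost[i - 1][j - 1] if j > 0 else inf
            if stay <= advance:
                best, back[i][j] = stay, j
            else:
                best, back[i][j] = advance, j - 1
            cost[i][j] = best + sq_dist(frames[i], V[j])
    # Backtrack from the last frame, which must end in the last segment.
    assign = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        assign[i] = j
        j = back[i][j]
    return assign


def dynamic_temporal_quantization(frames, m, max_iter=50):
    """Alternate aggregation and assignment until the assignment is stable."""
    n = len(frames)
    if n < m:
        raise ValueError("need at least as many frames as quantized vectors")
    assign = uniform_assignment(n, m)
    for _ in range(max_iter):
        V = aggregate(frames, assign, m)
        new_assign = order_preserving_assignment(frames, V)
        if new_assign == assign:
            break
        assign = new_assign
    return aggregate(frames, assign, m), assign


if __name__ == "__main__":
    # Two toy 1-D "videos" of the same action performed at different rates.
    fast = [[0.0]] * 6 + [[10.0]] * 2 + [[20.0]] * 4
    slow = [[0.0]] * 3 + [[10.0]] * 6 + [[20.0]] * 3
    print(dynamic_temporal_quantization(fast, 3))
    print(dynamic_temporal_quantization(slow, 3))
```

On these two toy sequences the procedure converges to the same three quantized vectors even though the sub-actions last for different numbers of frames, which is the execution-rate robustness the slides aim for. Like any alternating scheme, it only guarantees a local optimum that depends on the uniform initialization.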

Hierarchical representation • Multiple layers of the Dynamic Quantization • Top layers: global temporal patterns • Bottom layers: local temporal patterns • Concatenate all layers
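The concatenation of layers can be illustrated as follows. This is only a sketch under stated assumptions: the layer sizes (1, 2, 4) are chosen for illustration and are not given on the slide, and a simple uniform-partition mean pooling stands in for the quantizer so that the block is self-contained. Any quantizer that takes (frames, m) and returns m vectors could be passed in instead. All names are hypothetical.

```python
def uniform_quantize(frames, m):
    """Stand-in quantizer: mean of each of m equal, ordered segments.

    Requires len(frames) >= m so that no segment is empty.
    """
    n, dim = len(frames), len(frames[0])
    out = []
    for j in range(m):
        seg = frames[j * n // m:(j + 1) * n // m]
        out.append([sum(x[d] for x in seg) / len(seg) for d in range(dim)])
    return out


def hierarchical_representation(frames, layer_sizes=(1, 2, 4),
                                quantize=uniform_quantize):
    """Concatenate the quantized vectors of every layer into one feature.

    Small layers summarize the whole sequence (global patterns); larger
    layers keep finer, more local temporal detail.
    """
    feature = []
    for m in layer_sizes:
        for v in quantize(frames, m):
            feature.extend(v)
    return feature


if __name__ == "__main__":
    frames = [[0.0], [2.0], [4.0], [6.0]]
    print(hierarchical_representation(frames))
    # → [3.0, 1.0, 5.0, 0.0, 2.0, 4.0, 6.0]
```

The feature length is the sum of the layer sizes times the frame dimension, so it stays fixed regardless of how many frames the video has.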

Multimodal Feature Fusion • Multimodal features: • joint coordinate • pairwise angle • joint offset [21] • histogram of velocity components (HVC) • Supervised learning for all quantized vectors • Multiclass SVM • Fusion by regression (softmax)
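One way to picture the fusion step is as a late fusion of per-modality class scores. The sketch below shows only the forward pass and rests on assumptions the slide does not confirm: each modality's classifier is taken to produce one score per action class, the fusion is a weighted sum of those scores followed by a softmax, and the weights (which would normally be learned by the regression) are hand-picked here. All names and numbers are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(modality_scores, weights):
    """Weighted sum of per-modality class scores, then softmax.

    modality_scores: dict mapping modality name -> list of class scores
    weights:         dict mapping modality name -> fusion weight
    """
    n_classes = len(next(iter(modality_scores.values())))
    combined = [0.0] * n_classes
    for name, scores in modality_scores.items():
        for c in range(n_classes):
            combined[c] += weights[name] * scores[c]
    return softmax(combined)


if __name__ == "__main__":
    # Illustrative scores for three action classes from two modalities.
    scores = {
        "joint_coordinate": [1.0, 0.2, -0.5],
        "pairwise_angle": [0.1, 0.9, -0.2],
    }
    weights = {"joint_coordinate": 1.0, "pairwise_angle": 0.5}
    probs = fuse_scores(scores, weights)
    print(probs, probs.index(max(probs)))
```

The predicted action is the class with the highest fused probability. In a trained system the weights would be fit on held-out classifier outputs rather than set by hand.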

Experiments • Experiments on three public 3D human action datasets • MSR-Action3D • UTKinect-Action • MSR-ActionPairs

Experiment: dynamic quantization vs. deterministic quantization • MSR-Action3D dataset • Dynamic quantization outperforms deterministic quantization • Similar performance can be observed on the other two datasets

Experiment: hierarchical representation • MSR-Action3D dataset with the joint coordinate feature • More layers generally produce higher accuracy, though care is needed to avoid overfitting

Experiment: comparison with state-of-the-art results • MSR-Action3D dataset • MSR-ActionPairs dataset • UTKinect-Action dataset (100% accuracy)

Conclusions • A novel algorithm for 3D human action sequence recognition from the perspective of dynamic temporal quantization. • Extensive experiments on three public datasets demonstrate the effectiveness of the proposed technique for temporal modeling.

Thank you. Questions?
