150 likes | 163 Views
Proposing a novel algorithm for 3D human action recognition using dynamic temporal quantization and multimodal feature fusion. Our method outperforms existing techniques on public datasets.
E N D
Temporal Order-Preserving Dynamic Quantization for Human Action Recognition from Multimodal Sensor StreamsJun Ye Kai LiGuo-Jun Qi Kien A. HuaUniversity of Central Florida
Outline • Background • Problem, existing methods, challenges • Our algorithm • Dynamic Temporal Quantization • Multimodal Feature Fusion • Performance study • MSR-Action3D • UTKinect-Action • MSR-ActionPairs • Conclusions
Background • Depth sensors becomes affordable and popular • New human-computer interaction • Gesture recognition • Speech recognition • Application domain • Video games, education, business, healthcare
Problem and Challenges • Key problem: modeling the temporal dynamics of 3D human action/gestures • Existing methods • Histogram-based methods do not preserve order (bag-of-3d-words [5, 21], HOJ3D[16], HON4D[9] ) • Temporal modeling suffer from video misalignment (motion template[7,20], temporal pyramid[9,14]) • Challenge: temporal misalignment due to • Temporal translation • Execution rate variation
Objective • Modeling the temporal patterns of 3D actions according to the transition of sub-actions satisfying • Frames with similar postures are clustered together (sub-action constraint) • Temporal order of the sequence must be preserved (order-preserving) • Dynamic Temporal Quantization Algorithm
Dynamic Temporal Quantization • Quantization: videos X1,X2,… Xn of varied length n quantizedvector V1,V2,…Vm of fixed length m. • Optimalframeassignmenta • Objective function: • Optimal quantization can be obtained by jointly optimizing a and V
Dynamic Temporal Quantization(cont’d) • Nontrivial tojointlysolvetheframeassignmenta • Initialization:uniform partition • Aggregationstep:givenfixedassignmenta,vjis computed by the aggregation • Assignmentstep: fixed the quantized vector V, update the assignment a by DTW • Iterateuntilconvergence.
Hierarchical representation • MultilayersoftheDynamicQuantization • Toplayers:globaltemporalpatterns • Bottomlayers:localtemporalpatterns • Concatenate all layers
Multimodel Feature Fusion • Multimodalfeatures: • joint coordinate • pairwise angle • joint offset [21] • histogram of velocity components (HVC) • Supervised learning for all quantized vectors • Multiclass SVM • Fusion by regression (softmax)
Experiments • Experiments on three public 3D human action datasets • MSR-Action3D • UTKinect-Action • MSR-ActionPairs
Dynamic quantization outperforms deterministic quantization. Experiment: dynamic quantization VS deterministic quantization MSR-Action3D dataset Similar performances can be observed in the other two datasets.
Experiment: hierarchical representation MSR-Action3D dataset with the joint coordinate feature More layers generally produce higher accuracy though need to take care of the overfitting.
Experiment: Comparison with state-of-the-art results MSR-Action3D dataset MSR-ActionPairs dataset UTKinect-Action dataset (100% accuracy)
Conclusions • A novel algorithm for 3D human action sequence recognition from the perspective of dynamic temporal quantization. • Extensive experiments on three public datasets demonstrate the effectiveness of the proposed technique for temporal modeling.
Thank you. Questions?