Hierarchical Part-Based Human Body Pose Estimation * Ramanan Navaratnam * Arasanathan Thayananthan † Prof. Phil Torr * Prof. Roberto Cipolla * University Of Cambridge † Oxford Brookes University
Introduction Input
Introduction Input Output
Overview • Motivation • Hierarchical parts • Template search • Pose estimation in a single frame • Temporal smoothing • Summary & Future work
Motivation • ‘Real-time Object Detection for Smart Vehicles’ – D. M. Gavrila & V. Philomin (ICCV 1999) • ‘Filtering using a tree-based estimator’ – Stenger et al. (ICCV 2003) • The number of templates grows exponentially with the dimension of the state space
Motivation • ‘Pictorial Structures for Object Recognition’ – P. Felzenszwalb & D. Huttenlocher (IJCV 2005) • ‘Human upper body pose estimation in static images’ – M.W. Lee & I. Cohen (ECCV 2004) • Part-based approach • Assembling the parts into a full body is complex
Motivation • ‘Automatic Annotation of Everyday Movements’ – D. Ramanan & D. A. Forsyth (NIPS 2003) • ‘3-D model-based tracking of humans in action: a multi-view approach’ – D. M. Gavrila & L. S. Davis (CVPR 1996) • ‘State space decomposition’
Hierarchical Parts • Conditional prior p(x_i | x_parent(i)) over the spatial dimensions (translation) and the joint angles
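The tree prior couples each part's state (translation plus joint angle) to that of its parent. A minimal sketch of evaluating p(x_i | x_parent(i)), assuming an independent Gaussian on each component of the child's offset relative to its parent (the Gaussian form, the state layout, and every name below are illustrative assumptions, not the authors' model):

```python
import math

def conditional_log_prior(child_state, parent_state, mean_offset, sigma):
    """log p(x_child | x_parent) under an independent Gaussian on each
    component of the child's offset relative to its parent.
    States are tuples, e.g. (tx, ty, joint_angle)."""
    logp = 0.0
    for c, p, m, s in zip(child_state, parent_state, mean_offset, sigma):
        d = (c - p) - m                  # deviation from the expected offset
        logp += -0.5 * (d / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
    return logp

# A child placed at the expected offset from its parent scores higher
# than one placed elsewhere.
parent = (0.0, 0.0, 0.0)
expected = (10.0, 0.0, 0.5)              # expected (tx, ty, angle) offset
sigma = (2.0, 2.0, 0.3)
good = conditional_log_prior((10.0, 0.0, 0.5), parent, expected, sigma)
bad = conditional_log_prior((20.0, 5.0, -1.0), parent, expected, sigma)
```

During the hierarchical search this prior restricts where child templates are evaluated: only states with non-negligible prior under an accepted parent detection need to be scored.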
Hierarchical Parts Head and torso Upper arm Lower Arm True Positive False Positive
Hierarchical Parts • Part detections at detection threshold 0.81 – Head and torso: 56 / 61 (true / false positives) – Lower arm: 13 199 44 993
Template Search • Features • Chamfer distance • Appearance
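Chamfer matching scores how closely a template's edge points lie to the image edges. A minimal, dependency-light sketch (real systems compute this efficiently with a distance transform of the edge map; the brute-force pairwise version and all names here are illustrative, not the authors' code):

```python
import numpy as np

def chamfer_score(edge_points, template_points):
    """Mean distance from each template point to its nearest image
    edge point (lower = better match). Production code would read
    these distances from a precomputed distance transform instead."""
    diffs = template_points[:, None, :].astype(float) - edge_points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1).mean()

# Toy example: image edges form a vertical line at x = 5.
edge_pts = np.array([[y, 5] for y in range(10)])
on_edge = np.array([[2, 5], [5, 5], [8, 5]])   # template aligned with the edge
off_edge = np.array([[2, 0], [5, 0], [8, 0]])  # template shifted 5 px away
print(chamfer_score(edge_pts, on_edge))   # 0.0
print(chamfer_score(edge_pts, off_edge))  # 5.0
```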
Template Search • Learning appearance • Match the ‘T’ pose using edge likelihood only in the initial frames • Update 3D histograms in RGB space that approximate P(RGB | part) and P(RGB)
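The appearance update above can be sketched as coarse 3D colour histograms plus a per-pixel likelihood ratio P(RGB | part) / P(RGB); the bin count, the ratio scoring, and all names below are assumptions for illustration, not the authors' code:

```python
import numpy as np

BINS = 8                  # 8 bins per channel -> 8x8x8 histogram over RGB
WIDTH = 256 // BINS       # width of one bin

def update_histogram(hist, pixels):
    """Accumulate (N, 3) RGB pixels into a BINS^3 count histogram."""
    idx = np.asarray(pixels, dtype=int) // WIDTH
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)

def likelihood_ratio(part_hist, global_hist, pixel):
    """P(RGB | part) / P(RGB): > 1 suggests the pixel belongs to the part."""
    i = tuple(int(c) // WIDTH for c in pixel)
    p_part = part_hist[i] / max(part_hist.sum(), 1)
    p_all = global_hist[i] / max(global_hist.sum(), 1)
    return p_part / max(p_all, 1e-9)

# Toy data: the part is reddish, the rest of the image bluish.
part_hist = np.zeros((BINS, BINS, BINS))
global_hist = np.zeros((BINS, BINS, BINS))
red = np.tile([200, 10, 10], (50, 1))
blue = np.tile([10, 10, 200], (50, 1))
update_histogram(part_hist, red)
update_histogram(global_hist, np.vstack([red, blue]))
```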
Temporal Smoothing • HMM over per-frame pose states (shown at time T = t)
Temporal Smoothing • Viterbi backtracking through the HMM
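The smoothing step treats the per-frame pose hypotheses as HMM states and recovers the best sequence by Viterbi backtracking. A generic sketch (the toy emission, transition, and initial matrices are illustrative, not the paper's learned model):

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most probable state sequence of an HMM.
    log_emit: (T, S) log-likelihood of each candidate pose per frame;
    log_trans: (S, S) log transition matrix; log_init: (S,) log prior."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]           # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: i at t-1 -> j at t
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]             # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Two pose states over three frames; emissions favour state 0 then state 1.
log_emit = np.log([[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
log_trans = np.log(np.full((2, 2), 0.5))
log_init = np.log(np.full(2, 0.5))
print(viterbi(log_emit, log_trans, log_init))  # [0, 1, 1]
```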
Summary & Future work
Summary • Real-time processing (unoptimized code runs at 1 Hz on a 2.4 GHz machine with 1 GB RAM) • 3D pose • Automatic initialisation and recovery from failure
Future work • Extend robustness to illumination changes • Non-fronto-parallel poses • Poses where the arms lie inside the body silhouette • Simple gesture recognition by assigning semantics to regions of the articulation space