Looking at people and Image-based Localisation. Roberto Cipolla Department of Engineering Research team http://www.eng.cam.ac.uk/~cipolla/people.html. 1. Real-time hand detection and tracking. Why is it hard?. Highly articulated object, 27 model parameters
Looking at people and Image-based Localisation
Roberto Cipolla
Department of Engineering
Research team: http://www.eng.cam.ac.uk/~cipolla/people.html
Why is it hard?
• Highly articulated object, 27 model parameters
• Shape variation and self-occlusions
• Unreliable point features
• Ambiguities in single view lead to multi-modal distributions (local minima)
Why is it hard?
• Background clutter
• Potentially fast motion
• Lighting changes
• Partial / full occlusion
A Solved Problem?
3D tracking, 6/7 DOF
• Model: 3D quadrics
• Cost Function: Edges or colour-edges
• Tracking: Unscented Kalman filtering
• Single or dual view
• Single hypothesis filter, no recovery strategy
A Robust Tracker
• Should work in scenes with complex background and varying illumination
  • Important: Cost function design
  • Optimization strategy
• Should handle multi-modality
  • Examples: Particle filters, multi-hypotheses filters
• Should have a recovery strategy when track is lost
  • Trigger search algorithm
3D Pose Recovery
• 3D hand model constructed from cones and ellipsoids
• Contour projection, handling self-occlusions
• 27 motion parameters
Likelihood: Edges
[Diagram labels: 3D Model; Input Image; Edge Detection; Projected Contours; Robust Edge Matching]
Chamfer Matching
[Figure panels: Input image; Canny edges; Distance transform; Projected Contours]
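The chamfer matching step named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge map is a small binary grid, the distance transform is computed by brute force for clarity, and the function names `distance_transform` and `chamfer_cost` as well as the `truncation` parameter are hypothetical. A real system would typically use a linear-time distance transform from a library (for example `scipy.ndimage.distance_transform_edt`).

```python
import math


def distance_transform(edge_map):
    """Return, for every pixel, the Euclidean distance to the nearest edge pixel.

    Brute force (every pixel against every edge pixel), chosen for clarity only.
    """
    edge_pixels = [(r, c)
                   for r, row in enumerate(edge_map)
                   for c, value in enumerate(row) if value]
    if not edge_pixels:
        raise ValueError("edge map contains no edge pixels")
    return [[min(math.hypot(r - er, c - ec) for er, ec in edge_pixels)
             for c in range(len(row))]
            for r, row in enumerate(edge_map)]


def chamfer_cost(dist, contour_points, truncation=10.0):
    """Mean distance-transform value sampled at the projected contour points.

    Distances are clipped at `truncation` (an assumed robustness measure), and
    points that fall outside the image are charged the truncation value.
    """
    height, width = len(dist), len(dist[0])
    total = 0.0
    for r, c in contour_points:
        inside = 0 <= r < height and 0 <= c < width
        d = dist[r][c] if inside else truncation
        total += min(d, truncation)
    return total / len(contour_points)


# Toy example: a vertical image edge in column 2 of a 5x5 edge map.
edges = [[0, 0, 1, 0, 0] for _ in range(5)]
dist = distance_transform(edges)

aligned = [(r, 2) for r in range(5)]   # contour projected exactly onto the edge
shifted = [(r, 3) for r in range(5)]   # same contour, one pixel to the right

print(chamfer_cost(dist, aligned))  # 0.0
print(chamfer_cost(dist, shifted))  # 1.0
```

Because the distance transform is computed once per frame, each additional template costs only one lookup per contour point, which is what makes matching very many templates affordable.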
Likelihood: Colour
[Diagram labels: 3D Model; Input Image; Projected Silhouette; Skin Colour Model; Template Matching]
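One plausible way to score a projected silhouette against a skin colour model is sketched below. The slide does not give the formula, so the scoring rule here is an assumption: each pixel carries a skin probability, pixels inside the silhouette contribute log p(skin), and pixels outside contribute log(1 - p(skin)). The function name `colour_log_likelihood` and the clamping constant `eps` are hypothetical.

```python
import math


def colour_log_likelihood(skin_prob, silhouette, eps=1e-6):
    """Sum of per-pixel log-probabilities for a silhouette template.

    skin_prob:  2D list of per-pixel skin probabilities in [0, 1].
    silhouette: 2D list of 0/1 values, 1 where the projected model covers the pixel.
    Probabilities are clamped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for prob_row, mask_row in zip(skin_prob, silhouette):
        for p, inside in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)
            total += math.log(p) if inside else math.log(1.0 - p)
    return total


# Toy example: the left column looks like skin, the right column does not.
skin_prob = [[0.9, 0.1],
             [0.9, 0.1]]
good_template = [[1, 0],
                 [1, 0]]   # silhouette covers the skin-coloured column
bad_template = [[0, 1],
                [0, 1]]    # silhouette covers the background column

# The well-placed silhouette receives the higher score.
print(colour_log_likelihood(skin_prob, good_template) >
      colour_log_likelihood(skin_prob, bad_template))  # True
```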
Matching Multiple Templates
• Use a tree structure to efficiently match many templates (>50,000)
• Arrange templates in the tree based on their similarity
• Traverse the tree using breadth-first search; several ‘active’ leaves are possible
[Figure labels: Search Tree; Grid-based partitioning of parameter space]
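The tree search described on this slide can be illustrated with a one-dimensional toy problem. This is a sketch under simplifying assumptions, not the authors' code: the parameter space is a single interval partitioned on a regular grid, each node is represented by the template at the centre of its region, and a subtree is skipped when the node's cost exceeds its half-width plus a tolerance. The names `Node`, `build_tree`, `search` and `tolerance` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    centre: float        # parameter value of the representative template
    half_width: float    # half the extent of the parameter region covered
    children: list = field(default_factory=list)


def build_tree(lo, hi, levels, branching=4):
    """Recursively grid-partition [lo, hi]; leaves hold the finest templates."""
    node = Node(centre=(lo + hi) / 2, half_width=(hi - lo) / 2)
    if levels > 1:
        step = (hi - lo) / branching
        node.children = [
            build_tree(lo + i * step, lo + (i + 1) * step, levels - 1, branching)
            for i in range(branching)
        ]
    return node


def search(root, cost, tolerance):
    """Breadth-first traversal that prunes subtrees with a poor representative.

    Returns the leaves that survive ('active' leaves) and the number of
    templates that were actually evaluated.
    """
    active_leaves, evaluations = [], 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        evaluations += 1
        if cost(node.centre) > node.half_width + tolerance:
            continue  # no template in this region can match; skip the subtree
        if node.children:
            queue.extend(node.children)
        else:
            active_leaves.append(node)
    return active_leaves, evaluations


# Toy example: 64 leaf templates covering [0, 64); the true parameter is 20.3.
root = build_tree(0.0, 64.0, levels=4, branching=4)
leaves, evaluations = search(root, lambda c: abs(c - 20.3), tolerance=0.0)
print([leaf.centre for leaf in leaves], evaluations)  # [20.5] 13
```

In this toy setting only 13 of the 85 nodes are evaluated. Raising `tolerance` lets neighbouring leaves survive as well, which mirrors the several ‘active’ leaves mentioned on the slide.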
Bayesian-Tree
[Figure labels: State space partitioning; Estimation of posterior pdf]
• The search tree is brought into a Bayesian framework by adding prior knowledge from the previous frame.
• The Bayesian-Tree can be thought of as approximating the posterior probability at different resolutions.
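The idea of combining the previous frame's estimate with the current likelihood can be sketched as a grid-based Bayes filter over a partitioned state space. This is a single-resolution illustration only, whereas the Bayesian-Tree works at several resolutions. The dynamics are assumed here to be a Gaussian random walk with standard deviation `sigma`, and the function names `predict` and `update` are hypothetical.

```python
import math


def predict(prev_posterior, centres, sigma):
    """Prior for the current frame: previous posterior pushed through the dynamics.

    Each cell receives mass from every cell of the previous frame, weighted by
    a Gaussian in the distance between cell centres (assumed random-walk model).
    """
    prior = [
        sum(p * math.exp(-0.5 * ((x - x_prev) / sigma) ** 2)
            for p, x_prev in zip(prev_posterior, centres))
        for x in centres
    ]
    total = sum(prior)
    return [mass / total for mass in prior]


def update(prior, likelihoods):
    """Posterior over cells: prior times likelihood, renormalised to sum to one."""
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("all cells have zero posterior mass; track is lost")
    return [u / total for u in unnormalised]


# Toy example: five cells; the previous frame was confidently in cell 2.
centres = [0.0, 1.0, 2.0, 3.0, 4.0]
prev_posterior = [0.0, 0.0, 1.0, 0.0, 0.0]

prior = predict(prev_posterior, centres, sigma=1.0)

# Template-matching likelihoods for the new frame favour cell 3.
likelihoods = [0.1, 0.1, 0.2, 0.9, 0.1]
posterior = update(prior, likelihoods)

print(posterior.index(max(posterior)))  # 3
```

The zero-mass check in `update` is one natural place to trigger a recovery search when the track is lost.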
Experiments: Global Motion
• 3D motions limited to hemisphere
• Dynamics: First-order Gaussian process
• 3-level tree with 16,000 templates at leaf level
• 5 scales, divisions of 15 degrees in 3D rotation and divisions of 10 degrees in image plane rotation
• Translation search at 20, 5, 2-pixel resolution
Experiments: Finger Articulation
• Opening and closing of thumb and fingers approximated by 2 parameters
• Global motion restricted to smaller range, but still with 6 DOF
• 35,000 templates at the leaf level
Ongoing work
• Large number of templates required
  • The examples shown here cover only constrained motion
  • Number of templates required for fully articulated motion?
• Tracking rates from 5 fps to 0.2 fps
  • For 400 to 35,000 templates (on a 2.4 GHz Pentium IV)
• Error introduced by geometric model
  • No palm deformation, no skin deformation, no arm model
Image-based localisation ... ...
Summary and deliverables
• Real-time hand detection in clutter
• 3D models from uncalibrated images
• Image-based localisation for augmented reality