Looking at people and Image-based Localisation

Roberto Cipolla, Department of Engineering
Research team: http://www.eng.cam.ac.uk/~cipolla/people.html

1. Real-time hand detection and tracking

Why is it hard?
• Highly articulated object, 27 model parameters
• Shape variation and self-occlusions
• Unreliable point features
• Ambiguities in a single view lead to multi-modal distributions (local minima)

Why is it hard?
• Background clutter
• Potentially fast motion
• Lighting changes
• Partial / full occlusion

A Solved Problem?
3D tracking, 6/7 DOF
• Model: 3D quadrics
• Cost function: edges or colour-edges
• Tracking: unscented Kalman filtering
• Single or dual view
• Single-hypothesis filter, no recovery strategy

A Robust Tracker
• Should work in scenes with complex background and varying illumination
  • Important: cost function design
  • Optimization strategy
• Should handle multi-modality
  • Examples: particle filters, multi-hypothesis filters
• Should have a recovery strategy when track is lost
  • Trigger search algorithm

3D Pose Recovery
• 3D hand model constructed from cones and ellipsoids
• Contour projection, handling self-occlusions
• 27 motion parameters

Hierarchy of classifiers

Likelihood: Edges
(Figure labels: 3D Model, Input Image, Edge Detection, Projected Contours, Robust Edge Matching)

Chamfer Matching
(Figure panels: Input image, Canny edges, Distance transform, Projected Contours)
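The Chamfer Matching slide lists an edge map, a distance transform and projected model contours. A minimal sketch of how these pieces usually fit together is given here, assuming the edge map is already available as a binary grid (the talk uses Canny edges). The function names, the city-block distance and the truncation threshold are illustrative assumptions, not details taken from the talk; truncating the distance is one common way to make edge matching robust to missing edges.

```python
# Minimal chamfer-matching sketch; all names here are hypothetical.

def distance_transform(edges):
    """Two-pass city-block distance transform of a binary edge map.

    edges: list of rows, 1 where an edge pixel was detected, 0 elsewhere.
    Returns a grid holding each pixel's L1 distance to the nearest edge.
    """
    h, w = len(edges), len(edges[0])
    inf = h + w  # larger than any city-block distance inside the grid
    dist = [[0 if edges[y][x] else inf for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top-left corner.
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom-right corner.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def chamfer_cost(dist, contour, offset=(0, 0), truncate=5):
    """Mean truncated distance from template contour points to image edges.

    contour: list of (row, col) points of the projected model contour.
    Points falling outside the image are charged the truncation value.
    """
    h, w = len(dist), len(dist[0])
    oy, ox = offset
    total = 0
    for y, x in contour:
        yy, xx = y + oy, x + ox
        inside = 0 <= yy < h and 0 <= xx < w
        total += min(dist[yy][xx], truncate) if inside else truncate
    return total / len(contour)


# Toy edge map containing a rectangular outline.
edges = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
dist = distance_transform(edges)

# Template contour: the same outline, expressed in template coordinates.
contour = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 3),
           (2, 0), (2, 1), (2, 2), (2, 3)]

aligned = chamfer_cost(dist, contour, offset=(1, 1))  # template on the edges
shifted = chamfer_cost(dist, contour, offset=(0, 0))  # template misplaced
```

Because the distance transform is computed once per image, each template is scored with one lookup per contour point, which is what makes matching many templates against the same frame affordable.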

Likelihood: Colour
(Figure labels: 3D Model, Input Image, Projected Silhouette, Skin Colour Model, Template Matching)
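One plausible way to combine a skin colour model with a projected silhouette is sketched here. It assumes a per-pixel skin probability map has already been computed, and it treats one minus that probability as the background probability; both are simplifying assumptions for illustration, and the function name is hypothetical. The score rewards skin-coloured pixels inside the silhouette and non-skin pixels outside it.

```python
import math


def colour_log_likelihood(skin_prob, silhouette, eps=1e-6):
    """Log-likelihood of a silhouette hypothesis under a per-pixel skin model.

    skin_prob:  grid of P(skin) values in [0, 1], one per pixel (assumed given).
    silhouette: grid of 0/1 flags, 1 where the projected hand model covers the pixel.
    """
    total = 0.0
    for prob_row, mask_row in zip(skin_prob, silhouette):
        for p, inside in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += math.log(p if inside else 1.0 - p)
    return total


# Toy skin probability map with a 2x2 skin-coloured patch.
skin_prob = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.9, 0.9, 0.1],
]
good_silhouette = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
bad_silhouette = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]

good_score = colour_log_likelihood(skin_prob, good_silhouette)
bad_score = colour_log_likelihood(skin_prob, bad_silhouette)
```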

Tree-based Bayesian filtering

Matching Multiple Templates
• Use a tree structure to efficiently match many templates (>50,000)
• Arrange templates in the tree based on their similarity
• Traverse the tree using breadth-first search; several ‘active’ leaves are possible
(Figure labels: Search Tree, Grid-based partitioning of parameter space)
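The traversal described on the Matching Multiple Templates slide can be sketched on a one-dimensional parameter space. This is a toy illustration under stated assumptions: the `Node` class, the `search` function and the pruning rule are hypothetical, and the matching cost is a stand-in (absolute parameter error) rather than an image-based cost. The pruning rule assumes the cost changes no faster than the distance in parameter space, which holds for this stand-in.

```python
from collections import deque


class Node:
    """A region of parameter space represented by one template at its centre."""

    def __init__(self, lo, hi, depth, max_depth, branching):
        self.centre = 0.5 * (lo + hi)  # parameter of the representative template
        self.radius = 0.5 * (hi - lo)  # half-width of the region covered
        self.children = []
        if depth < max_depth:
            step = (hi - lo) / branching
            self.children = [
                Node(lo + i * step, lo + (i + 1) * step,
                     depth + 1, max_depth, branching)
                for i in range(branching)
            ]


def search(root, cost, tolerance):
    """Breadth-first traversal that skips subtrees unlikely to hold a match.

    Returns the centres of all surviving leaves and the number of
    templates that were actually evaluated.
    """
    active_leaves, evaluated = [], 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        evaluated += 1
        # Prune when no template in this region could score within tolerance.
        if cost(node.centre) > node.radius + tolerance:
            continue
        if node.children:
            queue.extend(node.children)
        else:
            active_leaves.append(node.centre)
    return active_leaves, evaluated


# Rotation angle in degrees, split 4 ways over 3 levels: 64 leaf templates.
root = Node(0.0, 360.0, depth=0, max_depth=3, branching=4)
true_angle = 100.0


def toy_cost(angle):
    """Stand-in matching cost; a real system would compare image features."""
    return abs(angle - true_angle)


active, evaluated = search(root, toy_cost, tolerance=3.0)
print(active, evaluated)
```

In this toy run two neighbouring leaves stay active, and far fewer templates are evaluated than the 64 an exhaustive scan of the leaf level would need.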

Bayesian-Tree
(Figure labels: State space partitioning, Estimation of posterior pdf)
• The search tree is brought into a Bayesian framework by adding the prior knowledge from the previous frame.
• The Bayesian-Tree can be thought of as approximating the posterior probability at different resolutions.
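A minimal sketch of the idea of approximating the posterior at different resolutions follows, on a one-dimensional state. Everything concrete in it is an assumption made for illustration: the Gaussian shapes, the two-peaked likelihood, the point-estimate prior centred on a previous state of 34, and the `keep_fraction` pruning rule. Each level multiplies likelihood by prior per cell, normalises, and refines only the cells that carry a non-negligible share of the posterior.

```python
import math


def gaussian(x, mean, sigma):
    """Density of a 1D normal distribution."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def refine(cells, likelihood, prior, keep_fraction, branching):
    """One level of a coarse-to-fine posterior estimate.

    cells: list of (lo, hi) intervals that are still active.
    Returns the normalised posterior mass per cell and the sub-cells
    of every cell worth refining at the next level.
    """
    weights = []
    for lo, hi in cells:
        centre = 0.5 * (lo + hi)
        # Piecewise-constant approximation: density at the centre times width.
        weights.append(likelihood(centre) * prior(centre) * (hi - lo))
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(posterior)
    children = []
    for (lo, hi), p in zip(cells, posterior):
        if p >= keep_fraction * best:
            step = (hi - lo) / branching
            children.extend(
                (lo + i * step, lo + (i + 1) * step) for i in range(branching)
            )
    return posterior, children


def likelihood(x):
    """Hypothetical ambiguous observation with two equally good explanations."""
    return gaussian(x, 12.0, 2.0) + gaussian(x, 36.0, 2.0)


def prior(x):
    """Hypothetical prior: previous state 34 diffused by Gaussian dynamics."""
    return gaussian(x, 34.0, 6.0)


cells = [(0.0, 16.0), (16.0, 32.0), (32.0, 48.0), (48.0, 64.0)]
for level in range(3):
    posterior, children = refine(cells, likelihood, prior,
                                 keep_fraction=0.01, branching=4)
    map_cell = cells[posterior.index(max(posterior))]
    print(level, len(cells), map_cell)
    cells = children
```

The likelihood alone cannot choose between its two peaks; the prior from the previous frame resolves the ambiguity at the coarsest level, so the finer levels are spent only around the consistent peak. Evaluating a cell at its centre can miss a peak narrower than the cell, which is one reason a real system needs a representative template per region rather than a single sample.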

Experiments: Global Motion
• 3D motions limited to a hemisphere
• Dynamics: first-order Gaussian process
• 3-level tree with 16,000 templates at leaf level
• 5 scales, divisions of 15 degrees in 3D rotation and divisions of 10 degrees in image-plane rotation
• Translation search at 20-, 5- and 2-pixel resolution

Tracking Results

Experiments: Finger Articulation
• Opening and closing of thumb and fingers approximated by 2 parameters
• Global motion restricted to a smaller range, but still with 6 DOF
• 35,000 templates at the leaf level

Opening and closing

Hand detection system

Ongoing work
• Large number of templates required
  • Examples shown here cover only constrained motion
  • Number of templates required for fully articulated motion?
• Tracking rates from 5 fps down to 0.2 fps
  • For 400 to 35,000 templates (on a 2.4 GHz Pentium IV)
• Error introduced by the geometric model
  • No palm deformation, no skin deformation, no arm model

Detecting people

2. Building 3D models of cities

Trumpington Street Data

Camera pose determination

3D reconstruction

Reconstruction texture mapped

3. Where am I?

Image-based localisation

Summary and deliverables
• Real-time hand detection in clutter
• 3D models from uncalibrated images
• Image-based localisation for augmented reality
