K. Ni, A. Kannan, A. Criminisi and J. Winn. Epitomic Location Recognition: A generative approach for location recognition. In Proc. CVPR 2008, Anchorage, Alaska.
Goal Introduction Recognition Enhancements Evaluation
Location Recognition • Where am I? • Instance recognition • Category recognition (more difficult) Lobby? Cubicle? Hallway? Kitchen?
Goal Introduction Recognition Enhancements Evaluation
Geometry-Based Recognition • SLAM & structure from motion • Why do we need metric reconstruction? • We lose the flexibility to do class recognition. [Diagram: Training Images → Local Feature Database → Geometry & Labels; Testing Image → Features] (F. Schaffalitzky and A. Zisserman; G. Schindler, M. Brown, R. Szeliski)
Appearance-Based Recognition • Capture global appearance information • Gaussian mixture model used by A. Torralba et al. [Diagram: Training Images → Preprocessing → Image Vectors → Training → Appearance Model (e.g. PCA)] (A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin; M. Cummins and P. Newman)
Appearance or Geometry? • Can we do better by fusing both kinds of information? A small example with 2 location labels: cubicle and corridor
The Simplest Model • Nearest neighbor classification • Naive but still effective with enough samples. • A small shift may disrupt the recognition. • Does not capture uncertainty.
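A minimal sketch of this nearest-neighbour baseline (not the paper's code; the frame vectors, sizes and labels below are random stand-ins for stored training frames):

```python
# Minimal sketch of the nearest-neighbour "bag of frames" baseline.
# Random vectors stand in for stored training frames; in practice these
# would be downsampled images or global feature vectors.
import numpy as np

rng = np.random.default_rng(0)
train_frames = rng.random((100, 256))    # 100 stored frames, 256-D each
train_labels = rng.integers(0, 4, 100)   # 4 location labels

def nn_classify(test_frame, frames, labels):
    """Return the label of the closest stored frame (L2 distance)."""
    dists = np.linalg.norm(frames - test_frame, axis=1)
    return labels[np.argmin(dists)]

print(nn_classify(rng.random(256), train_frames, train_labels))
```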
How to Incorporate Translation Invariance? • We need something better than a “bag of frames” model Training images Testing image
Panorama • It models both appearance & geometry • Adapts to camera rotation and focal length change • Generative • An image is a patch “extracted” from the panorama M. Brown and D. G. Lowe
Cons of Panoramas • Not easy to build a panorama due to parallax • Do not capture uncertainty • Only work for location instance recognition • No compact representation for repetitive scenes
Gaussian Mixture Model • Six mixtures trained as in Torralba et al.'s paper • Handles uncertainty but no translation invariance [Figure: mixture means and variances; boundaries are removed and the means are much more blurred]
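A small sketch of this appearance-only baseline, assuming scikit-learn's GaussianMixture and random 64-D vectors in place of the preprocessed image descriptors used in the paper:

```python
# Sketch of the appearance-only baseline: a six-component Gaussian mixture
# over global image vectors. Random 64-D vectors stand in for the
# preprocessed descriptors; scikit-learn is assumed for brevity.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
image_vectors = rng.random((500, 64))            # 500 training images, 64-D

gmm = GaussianMixture(n_components=6, covariance_type="diag", random_state=0)
gmm.fit(image_vectors)
print(gmm.means_.shape, gmm.covariances_.shape)  # (6, 64) means and variances
```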
A Weak Panorama • 3D motions can be roughly modeled by 2D translation + scaling [Figure: 2D translation and scaling]
Epitome = Panorama + GMM • Epitome • Generative model for image patches / video frames • Captures repetitive patterns in the original image • Mapping = 2D translation + scaling [Figure: a source image, image patches, and the learned epitome] (N. Jojic et al., ICCV 2003; N. Petrovic et al., CVPR 2006)
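To make the generative view concrete, here is a toy sketch of "extracting" a patch from an epitome mean image at a given translation and scale; the function name, sizes and the crude nearest-neighbour rescaling are illustrative assumptions, not the paper's implementation:

```python
# Toy sketch of the generative view: an observed patch is "extracted" from
# the epitome mean at a 2D translation, after optionally rescaling the
# epitome. Function name, sizes and the crude rescaling are assumptions.
import numpy as np

def generate_patch(epitome_mean, top, left, patch_h, patch_w, scale=1.0):
    """Read a patch out of the (optionally rescaled) epitome mean image."""
    if scale != 1.0:
        # nearest-neighbour rescale; a real system might keep a pyramid
        h, w = epitome_mean.shape
        rows = (np.arange(int(h * scale)) / scale).astype(int)
        cols = (np.arange(int(w * scale)) / scale).astype(int)
        epitome_mean = epitome_mean[np.ix_(rows, cols)]
    return epitome_mean[top:top + patch_h, left:left + patch_w]

epitome = np.random.default_rng(0).random((120, 160))   # toy epitome mean
patch = generate_patch(epitome, top=10, left=20, patch_h=32, patch_w=32)
print(patch.shape)                                       # (32, 32)
```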
Epitome as a Probabilistic Panorama • Models 3D scenes rather than a single 2D image [Figure: location epitome means and variances; environment = virtual panorama]
Learning the Location Epitome • Initialize the epitome randomly • EM iterations • E-step: infer the posteriors over all mappings • M-step: use the posteriors as weights to update the mean and variance of each epitome pixel [Plot: free energy over EM iterations]
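A compact illustration of the E-step/M-step structure described above, restricted to grey-scale patches and translation-only mappings so it stays short; all sizes and the toy data are assumptions, not the paper's settings:

```python
# Minimal EM sketch for learning an epitome from grey-scale patches,
# restricted to 2D translations (no scaling) to keep it short. This
# illustrates the E-step / M-step structure only; sizes and data are toy.
import numpy as np

rng = np.random.default_rng(0)
E_H, E_W, P = 20, 20, 8                      # epitome size, patch size (toy)
patches = rng.random((50, P, P))             # stand-in training patches

mean = rng.random((E_H, E_W))                # random initialisation
var = np.ones((E_H, E_W))

positions = [(r, c) for r in range(E_H - P + 1) for c in range(E_W - P + 1)]

for _ in range(3):                           # EM iterations
    num = np.zeros((E_H, E_W))
    sq = np.zeros((E_H, E_W))
    den = np.zeros((E_H, E_W))
    for x in patches:
        # E-step: posterior over mappings from per-pixel Gaussian likelihoods
        loglik = np.array([
            -0.5 * np.sum((x - mean[r:r+P, c:c+P]) ** 2 / var[r:r+P, c:c+P]
                          + np.log(var[r:r+P, c:c+P]))
            for (r, c) in positions])
        post = np.exp(loglik - loglik.max())
        post /= post.sum()
        # M-step accumulators: posterior-weighted sums over mapped regions
        for w, (r, c) in zip(post, positions):
            num[r:r+P, c:c+P] += w * x
            sq[r:r+P, c:c+P] += w * x ** 2
            den[r:r+P, c:c+P] += w
    mean = num / np.maximum(den, 1e-8)
    var = np.maximum(sq / np.maximum(den, 1e-8) - mean ** 2, 1e-4)

print(mean.shape, var.shape)                 # learned epitome means and variances
```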
Model Comparison • The epitome is a smart Gaussian mixture model with parameter sharing among components • For the same number of parameters, the epitome generalizes better
Goal Introduction Recognition Enhancements Evaluation
Build Label Maps • A label map gives the posterior of the location label given the mapping [Figure: epitome with its cubicle and corridor label maps]
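A toy sketch of accumulating label maps: each training image's posterior over mappings (a random stand-in here) is added into the map of its true label, and normalising across labels gives the posterior of the label given the mapping. Names and sizes are illustrative:

```python
# Toy sketch of building label maps over the epitome: each training image's
# posterior over mappings (a random stand-in here) is accumulated into the
# map of its true label; normalising across labels gives p(label | mapping).
import numpy as np

rng = np.random.default_rng(0)
E_H, E_W, n_labels = 24, 24, 2                 # e.g. cubicle vs. corridor
label_maps = np.zeros((n_labels, E_H, E_W))

for _ in range(100):                           # stand-in "training images"
    label = rng.integers(n_labels)
    post = rng.random((E_H, E_W))              # posterior over mappings (toy)
    post /= post.sum()
    label_maps[label] += post

label_maps /= label_maps.sum(axis=0, keepdims=True)   # posterior of the label
print(label_maps.shape)                                # (2, 24, 24)
```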
Recognition from Location Epitomes • Fast correlation: infer the best mapping region • Sum the pixel-wise label votes • Temporal smoothing using an HMM [Figure: input test image matched to the best region of the location epitome, with the cubicle and corridor label maps]
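A sketch of this recognition pipeline on toy data: an exhaustive negative-SSD score stands in for the fast correlation, label-map votes are summed over the best-matching region, and a single HMM forward step illustrates the temporal smoothing. The transition matrix and all arrays are assumptions for illustration:

```python
# Toy sketch of recognition from a location epitome: correlate the test
# frame with the epitome mean to find the best mapping, sum the label-map
# votes over that region, then apply one HMM forward step for smoothing.
import numpy as np

rng = np.random.default_rng(0)
E_H, E_W, P, n_labels = 24, 24, 8, 2          # e.g. labels: cubicle, corridor
epitome_mean = rng.random((E_H, E_W))
label_maps = rng.random((n_labels, E_H, E_W))
label_maps /= label_maps.sum(axis=0, keepdims=True)   # p(label | position)
frame = rng.random((P, P))                    # incoming test frame

# "fast correlation" stand-in: exhaustive negative-SSD score per translation
scores = np.array([[-np.sum((frame - epitome_mean[r:r+P, c:c+P]) ** 2)
                    for c in range(E_W - P + 1)]
                   for r in range(E_H - P + 1)])
r, c = np.unravel_index(np.argmax(scores), scores.shape)

# sum the pixel-wise label votes over the best-matching region
votes = label_maps[:, r:r+P, c:c+P].sum(axis=(1, 2))
frame_post = votes / votes.sum()

# temporal smoothing: one HMM forward step with a "sticky" transition matrix
A = np.full((n_labels, n_labels), 0.1 / (n_labels - 1))
np.fill_diagonal(A, 0.9)
prev_belief = np.full(n_labels, 1.0 / n_labels)
belief = frame_post * (A.T @ prev_belief)
belief /= belief.sum()
print(belief)
```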
Goal Introduction Recognition Enhancements Evaluation
Color Is Not Always the Best Feature • Other features besides RGB • For example, a stereo feature captures depth information • High stereo accuracy is not needed (an efficient DP stereo method is used here) [Figure: corridor, cubicle and kitchen examples]
Integrating Multiple Features • Stack multiple feature "channels" [Figure: stereo, R, G and B channels stacked]
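A one-line sketch of the channel stacking, with a random disparity map standing in for the stereo feature (sizes are illustrative):

```python
# Sketch of stacking feature "channels": a disparity map (standing in for
# the stereo feature) is appended to the RGB channels before learning.
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((240, 320, 3))          # colour image, 3 channels (toy)
disparity = rng.random((240, 320, 1))    # stereo feature, 1 channel (toy)

features = np.concatenate([rgb, disparity], axis=-1)
print(features.shape)                    # (240, 320, 4)
```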
Local Histograms • Enable better translation invariance and more generalization • Error rate: 0.49 → 0.36 on a 4-class test dataset • Improve efficiency dramatically: a 30× speed-up
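A rough sketch of computing per-cell local histograms (50 bins here, echoing the 50-dimensional histograms mentioned later; the cell size and intensity binning are illustrative, not the paper's exact feature):

```python
# Sketch of per-cell local histograms: the image is divided into
# non-overlapping c1 x c2 cells and each cell is replaced by a normalised
# histogram, giving tolerance to small shifts and far fewer positions to
# convolve over. Cell size and the 50 intensity bins are illustrative.
import numpy as np

def local_histograms(img, c1=8, c2=8, bins=50):
    """Per-cell intensity histograms; output shape (H//c1, W//c2, bins)."""
    H, W = img.shape
    out = np.zeros((H // c1, W // c2, bins))
    for i in range(H // c1):
        for j in range(W // c2):
            cell = img[i * c1:(i + 1) * c1, j * c2:(j + 1) * c2]
            hist, _ = np.histogram(cell, bins=bins, range=(0.0, 1.0))
            out[i, j] = hist / cell.size
    return out

img = np.random.default_rng(0).random((240, 320))
print(local_histograms(img).shape)   # (30, 40, 50)
```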
Supervised Learning • Incorporates training image labels • Helps discriminate images with similar features but different location labels [Figure: an example epitome and an example label feature, with discriminative features such as a microwave in the kitchen and a monitor in the cubicle]
Goal Introduction Recognition Enhancements Evaluation
MIT Image Database • Created by Antonio Torralba et al. • 17 sequences, 62 locations, 7 categories, 72,077 images
Results on Recognizing Location Instances • Location epitome vs. GMM: 10% better on average
Results on Recognizing Location Classes • Location epitome vs. GMM: 10%-20% better
MSRC Data Set • Captured with a stereo camera • 5,409 images collected at 4 fps • 11 sequences and 7 classes: corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st
Integrate Depth Cues [Figure: per-class results for corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd and stairs-2nd-to-1st]
Instance Recognition with Multiple Features • RGB & stereo outperform the other features • Learning: 5.7 fps • Recognition: 116 fps = 29 times the capture speed
Summary • A generative model for the recognition of both location instances and classes • Fast: capable of real-time applications • Flexible: capable of integrating various features • Probabilistic: capable of capturing uncertainties • Future applications • Navigation for visually impaired people • Appearance-based loop closing for SLAM problems
K. Ni, A. Kannan, A. Criminisi and J. Winn. Epitomic Location Recognition: A generative approach for location recognition. Thank you!
Local Histograms (2) • Improves efficiency (both training and testing) • The bottleneck: convolving the epitome with the images • Compression rate: 3·(C1·C2)²/50 = 2400 • Learning: 3 hours → 6 mins, 30 times faster [Figure: convolving 3-dimensional RGB features over the full-resolution epitome (Me × Ne) and image (M × N) vs. convolving 50-dimensional local histograms over the downsampled grids (Me/C1 × Ne/C2 and M/C1 × N/C2)]
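One way to read the stated compression rate, assuming the exhaustive correlation cost scales with the product of the epitome area, the image area and the number of feature channels (a sketch of the reasoning, not taken from the paper):

```latex
% Rough reading of the stated compression rate, assuming exhaustive
% correlation cost ~ (epitome area) x (image area) x (channels):
\frac{\text{cost}_{\text{RGB}}}{\text{cost}_{\text{hist}}}
  \approx \frac{3 \,(M_e N_e)(M N)}
               {50 \,\frac{M_e N_e}{C_1 C_2} \cdot \frac{M N}{C_1 C_2}}
  = \frac{3\,(C_1 C_2)^2}{50}
```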