K. Ni, A. Kannan, A. Criminisi and J. Winn. Epitomic Location Recognition: A generative approach for location recognition. In Proc. CVPR 2008, Anchorage, Alaska.
Goal Introduction Recognition Enhancements Evaluation
Location Recognition • Where am I? • Instance recognition • Category recognition (more difficult) Lobby? Cubicle? Hallway? Kitchen?
Goal Introduction Recognition Enhancements Evaluation
Geometry-Based Recognition • SLAM & structure from motion • Why do we need metric reconstruction? • We lose the flexibility to do class recognition. [Diagram: Training Images → Local Feature Database → Geometry & Labels; Testing Image → Features] (F. Schaffalitzky and A. Zisserman; G. Schindler, M. Brown, R. Szeliski)
Appearance-Based Recognition • Capture global appearance information • Gaussian mixture model used by A. Torralba et al. [Diagram: Training Images → Preprocessing → Image Vectors → Training → Appearance Model (e.g. PCA)] (A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin; M. Cummins and P. Newman)
Appearance or Geometry? • Can we do better by fusing both kinds of information? A small example with 2 location labels: cubicle and corridor
The Simplest Model • Nearest neighbor classification • Naive but still effective with enough samples. • A small shift may disrupt the recognition. • Does not capture uncertainty.
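A minimal sketch of this nearest-neighbour baseline (not the paper's code; the frame vectors, sizes and labels below are random stand-ins for stored training frames):

```python
# Minimal sketch of the nearest-neighbour "bag of frames" baseline.
# Random vectors stand in for stored training frames; in practice these
# would be downsampled images or global feature vectors.
import numpy as np

rng = np.random.default_rng(0)
train_frames = rng.random((100, 256))    # 100 stored frames, 256-D each
train_labels = rng.integers(0, 4, 100)   # 4 location labels

def nn_classify(test_frame, frames, labels):
    """Return the label of the closest stored frame (L2 distance)."""
    dists = np.linalg.norm(frames - test_frame, axis=1)
    return labels[np.argmin(dists)]

print(nn_classify(rng.random(256), train_frames, train_labels))
```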
How to Incorporate Translation Invariance? • We need something better than a “bag of frames” model Training images Testing image
Panorama • It models both appearance & geometry • Adapts to camera rotation and focal length change • Generative • An image is a patch “extracted” from the panorama M. Brown and D. G. Lowe
Cons of Panoramas • Not easy to build a panorama due to parallax • Do not capture uncertainty • Only work for location instance recognition • No compact representation for repetitive scenes
Gaussian Mixture Model • Six mixtures trained as in Torralba et al.'s paper • Handles uncertainty but no translation invariance [Figure: mixture means and variances; boundaries are removed and the means are much more blurred]
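A small sketch of this appearance-only baseline, assuming scikit-learn's GaussianMixture and random 64-D vectors in place of the preprocessed image descriptors used in the paper:

```python
# Sketch of the appearance-only baseline: a six-component Gaussian mixture
# over global image vectors. Random 64-D vectors stand in for the
# preprocessed descriptors; scikit-learn is assumed for brevity.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
image_vectors = rng.random((500, 64))            # 500 training images, 64-D

gmm = GaussianMixture(n_components=6, covariance_type="diag", random_state=0)
gmm.fit(image_vectors)
print(gmm.means_.shape, gmm.covariances_.shape)  # (6, 64) means and variances
```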
A Weak Panorama • 3D motions can be roughly modeled by 2D translation + scaling [Figure: 2D translation and scaling]
Epitome = Panorama + GMM • Epitome • Generative model for image patches / video frames • Captures repetitive patterns in the original image • Mapping = 2D translation + scaling [Figure: a source image, image patches, and the learned epitome] (N. Jojic et al., ICCV 2003; N. Petrovic et al., CVPR 2006)
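To make the generative view concrete, here is a toy sketch of "extracting" a patch from an epitome mean image at a given translation and scale; the function name, sizes and the crude nearest-neighbour rescaling are illustrative assumptions, not the paper's implementation:

```python
# Toy sketch of the generative view: an observed patch is "extracted" from
# the epitome mean at a 2D translation, after optionally rescaling the
# epitome. Function name, sizes and the crude rescaling are assumptions.
import numpy as np

def generate_patch(epitome_mean, top, left, patch_h, patch_w, scale=1.0):
    """Read a patch out of the (optionally rescaled) epitome mean image."""
    if scale != 1.0:
        # nearest-neighbour rescale; a real system might keep a pyramid
        h, w = epitome_mean.shape
        rows = (np.arange(int(h * scale)) / scale).astype(int)
        cols = (np.arange(int(w * scale)) / scale).astype(int)
        epitome_mean = epitome_mean[np.ix_(rows, cols)]
    return epitome_mean[top:top + patch_h, left:left + patch_w]

epitome = np.random.default_rng(0).random((120, 160))   # toy epitome mean
patch = generate_patch(epitome, top=10, left=20, patch_h=32, patch_w=32)
print(patch.shape)                                       # (32, 32)
```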
Epitome as a Probabilistic Panorama • Models 3D scenes rather than a single 2D image [Figure: location epitome means and variances; environment = virtual panorama]
Learning the Location Epitome • Initialize the epitome randomly • EM iterations • E-step: infer the posteriors over all mappings • M-step: use the posteriors as weights to update the mean and variance of each epitome pixel [Plot: free energy over EM iterations]
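A compact illustration of the E-step/M-step structure described above, restricted to grey-scale patches and translation-only mappings so it stays short; all sizes and the toy data are assumptions, not the paper's settings:

```python
# Minimal EM sketch for learning an epitome from grey-scale patches,
# restricted to 2D translations (no scaling) to keep it short. This
# illustrates the E-step / M-step structure only; sizes and data are toy.
import numpy as np

rng = np.random.default_rng(0)
E_H, E_W, P = 20, 20, 8                      # epitome size, patch size (toy)
patches = rng.random((50, P, P))             # stand-in training patches

mean = rng.random((E_H, E_W))                # random initialisation
var = np.ones((E_H, E_W))

positions = [(r, c) for r in range(E_H - P + 1) for c in range(E_W - P + 1)]

for _ in range(3):                           # EM iterations
    num = np.zeros((E_H, E_W))
    sq = np.zeros((E_H, E_W))
    den = np.zeros((E_H, E_W))
    for x in patches:
        # E-step: posterior over mappings from per-pixel Gaussian likelihoods
        loglik = np.array([
            -0.5 * np.sum((x - mean[r:r+P, c:c+P]) ** 2 / var[r:r+P, c:c+P]
                          + np.log(var[r:r+P, c:c+P]))
            for (r, c) in positions])
        post = np.exp(loglik - loglik.max())
        post /= post.sum()
        # M-step accumulators: posterior-weighted sums over mapped regions
        for w, (r, c) in zip(post, positions):
            num[r:r+P, c:c+P] += w * x
            sq[r:r+P, c:c+P] += w * x ** 2
            den[r:r+P, c:c+P] += w
    mean = num / np.maximum(den, 1e-8)
    var = np.maximum(sq / np.maximum(den, 1e-8) - mean ** 2, 1e-4)

print(mean.shape, var.shape)                 # learned epitome means and variances
```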
Model Comparison • The epitome is a smart Gaussian mixture model with parameter sharing among components • For the same number of parameters, the epitome generalizes better
Goal Introduction Recognition Enhancements Evaluation
Build Label Maps • A label map gives the posterior of the location label given the mapping [Figure: epitome with its cubicle and corridor label maps]
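A toy sketch of accumulating label maps: each training image's posterior over mappings (a random stand-in here) is added into the map of its true label, and normalising across labels gives the posterior of the label given the mapping. Names and sizes are illustrative:

```python
# Toy sketch of building label maps over the epitome: each training image's
# posterior over mappings (a random stand-in here) is accumulated into the
# map of its true label; normalising across labels gives p(label | mapping).
import numpy as np

rng = np.random.default_rng(0)
E_H, E_W, n_labels = 24, 24, 2                 # e.g. cubicle vs. corridor
label_maps = np.zeros((n_labels, E_H, E_W))

for _ in range(100):                           # stand-in "training images"
    label = rng.integers(n_labels)
    post = rng.random((E_H, E_W))              # posterior over mappings (toy)
    post /= post.sum()
    label_maps[label] += post

label_maps /= label_maps.sum(axis=0, keepdims=True)   # posterior of the label
print(label_maps.shape)                                # (2, 24, 24)
```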
Recognition from Location Epitomes • Fast correlation: infer the best mapping region • Sum the pixel-wise label votes • Temporal smoothing using an HMM [Figure: input test image matched to the best region of the location epitome, with the cubicle and corridor label maps]
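A sketch of this recognition pipeline on toy data: an exhaustive negative-SSD score stands in for the fast correlation, label-map votes are summed over the best-matching region, and a single HMM forward step illustrates the temporal smoothing. The transition matrix and all arrays are assumptions for illustration:

```python
# Toy sketch of recognition from a location epitome: correlate the test
# frame with the epitome mean to find the best mapping, sum the label-map
# votes over that region, then apply one HMM forward step for smoothing.
import numpy as np

rng = np.random.default_rng(0)
E_H, E_W, P, n_labels = 24, 24, 8, 2          # e.g. labels: cubicle, corridor
epitome_mean = rng.random((E_H, E_W))
label_maps = rng.random((n_labels, E_H, E_W))
label_maps /= label_maps.sum(axis=0, keepdims=True)   # p(label | position)
frame = rng.random((P, P))                    # incoming test frame

# "fast correlation" stand-in: exhaustive negative-SSD score per translation
scores = np.array([[-np.sum((frame - epitome_mean[r:r+P, c:c+P]) ** 2)
                    for c in range(E_W - P + 1)]
                   for r in range(E_H - P + 1)])
r, c = np.unravel_index(np.argmax(scores), scores.shape)

# sum the pixel-wise label votes over the best-matching region
votes = label_maps[:, r:r+P, c:c+P].sum(axis=(1, 2))
frame_post = votes / votes.sum()

# temporal smoothing: one HMM forward step with a "sticky" transition matrix
A = np.full((n_labels, n_labels), 0.1 / (n_labels - 1))
np.fill_diagonal(A, 0.9)
prev_belief = np.full(n_labels, 1.0 / n_labels)
belief = frame_post * (A.T @ prev_belief)
belief /= belief.sum()
print(belief)
```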
Goal Introduction Recognition Enhancements Evaluation
Color Is Not Always the Best Feature • Other features besides RGB • For example, a stereo feature captures depth information • High stereo accuracy is not needed (an efficient DP stereo method is used here) [Figure: corridor, cubicle and kitchen examples]
Integrating Multiple Features • Stack multiple feature "channels" [Figure: stereo, R, G and B channels stacked]
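A one-line sketch of the channel stacking, with a random disparity map standing in for the stereo feature (sizes are illustrative):

```python
# Sketch of stacking feature "channels": a disparity map (standing in for
# the stereo feature) is appended to the RGB channels before learning.
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((240, 320, 3))          # colour image, 3 channels (toy)
disparity = rng.random((240, 320, 1))    # stereo feature, 1 channel (toy)

features = np.concatenate([rgb, disparity], axis=-1)
print(features.shape)                    # (240, 320, 4)
```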
Local Histograms • Enable better translation invariance and more generalization • Error rate: 0.49 → 0.36 on a 4-class test dataset • Improve efficiency dramatically: a 30× speed-up
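A rough sketch of computing per-cell local histograms (50 bins here, echoing the 50-dimensional histograms mentioned later; the cell size and intensity binning are illustrative, not the paper's exact feature):

```python
# Sketch of per-cell local histograms: the image is divided into
# non-overlapping c1 x c2 cells and each cell is replaced by a normalised
# histogram, giving tolerance to small shifts and far fewer positions to
# convolve over. Cell size and the 50 intensity bins are illustrative.
import numpy as np

def local_histograms(img, c1=8, c2=8, bins=50):
    """Per-cell intensity histograms; output shape (H//c1, W//c2, bins)."""
    H, W = img.shape
    out = np.zeros((H // c1, W // c2, bins))
    for i in range(H // c1):
        for j in range(W // c2):
            cell = img[i * c1:(i + 1) * c1, j * c2:(j + 1) * c2]
            hist, _ = np.histogram(cell, bins=bins, range=(0.0, 1.0))
            out[i, j] = hist / cell.size
    return out

img = np.random.default_rng(0).random((240, 320))
print(local_histograms(img).shape)   # (30, 40, 50)
```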
Supervised Learning • Incorporates training image labels • Helps discriminate images with similar features but different location labels [Figure: an example epitome and an example label feature, with discriminative features such as a microwave in the kitchen and a monitor in the cubicle]
Goal Introduction Recognition Enhancements Evaluation
MIT Image Database • Created by Antonio Torralba et al. • 17 sequences, 62 locations, 7 categories, 72,077 images
Results on Recognizing Location Instances • Location epitome vs. GMM: 10% better on average
Results on Recognizing Location Classes • Location epitome vs. GMM: 10%-20% better
MSRC Data Set • Captured with a stereo camera • 5,409 images collected at 4 fps • 11 sequences and 7 classes: corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st
Integrate Depth Cues [Figure: per-class results for corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd and stairs-2nd-to-1st]
Instance Recognition with Multiple Features • RGB & stereo outperform the other features • Learning: 5.7 fps • Recognition: 116 fps = 29 times the capture speed
Summary • A generative model for the recognition of both location instances and classes • Fast: capable of real-time applications • Flexible: capable of integrating various features • Probabilistic: capable of capturing uncertainties • Future applications • Navigation for visually impaired people • Appearance-based loop closing for SLAM problems
K. Ni, A. Kannan, A. Criminisi and J. Winn. Epitomic Location Recognition: A generative approach for location recognition. Thank you!
Local Histograms (2) • Improves efficiency (both training and testing) • The bottleneck: convolving the epitome with the images • Compression rate: 3·(C1·C2)²/50 = 2400 • Learning: 3 hours → 6 mins, 30 times faster [Figure: convolving 3-dimensional RGB features over the full-resolution epitome (Me × Ne) and image (M × N) vs. convolving 50-dimensional local histograms over the downsampled grids (Me/C1 × Ne/C2 and M/C1 × N/C2)]
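One way to read the stated compression rate, assuming the exhaustive correlation cost scales with the product of the epitome area, the image area and the number of feature channels (a sketch of the reasoning, not taken from the paper):

```latex
% Rough reading of the stated compression rate, assuming exhaustive
% correlation cost ~ (epitome area) x (image area) x (channels):
\frac{\text{cost}_{\text{RGB}}}{\text{cost}_{\text{hist}}}
  \approx \frac{3 \,(M_e N_e)(M N)}
               {50 \,\frac{M_e N_e}{C_1 C_2} \cdot \frac{M N}{C_1 C_2}}
  = \frac{3\,(C_1 C_2)^2}{50}
```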