Context-based vision system for place and object recognition

Antonio Torralba, Kevin Murphy, Bill Freeman, Mark Rubin
Presented by David Lee
Some slides borrowed from Kevin Murphy

Object out of context

Object in context

Wearable test-bed

System diagram

Computing the features

24 filtered images → downsample each to 4x4 → 4x4x24 = 384 dim → 80 dim
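
The feature pipeline on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the 24 filter outputs are taken as given, taking their magnitude before averaging is an assumption, and PCA is assumed for the 384 → 80 step because the slide only states the two sizes. The function names are hypothetical.

```python
import numpy as np

def block_average(image, grid=4):
    """Downsample a 2-D array to grid x grid by averaging equal-sized blocks."""
    h, w = image.shape
    # Crop so that height and width divide evenly into the grid.
    h_crop, w_crop = h - h % grid, w - w % grid
    cropped = image[:h_crop, :w_crop]
    blocks = cropped.reshape(grid, h_crop // grid, grid, w_crop // grid)
    return blocks.mean(axis=(1, 3))

def raw_gist(filtered_images, grid=4):
    """Concatenate the downsampled filter outputs: 24 * 4 * 4 = 384 values.

    ASSUMPTION: the magnitude of each filter output is averaged.
    """
    return np.concatenate(
        [block_average(np.abs(f), grid).ravel() for f in filtered_images]
    )

def fit_projection(raw_vectors, n_components=80):
    """Fit a linear projection to n_components dimensions.

    ASSUMPTION: PCA (via SVD) is used for the 384 -> 80 reduction.
    """
    mean = raw_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(raw_vectors - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(raw_vector, mean, components):
    """Map one 384-dim vector to the reduced representation."""
    return components @ (raw_vector - mean)
```

Fitting the projection needs at least as many training frames as output dimensions; otherwise fewer than 80 components are returned.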

Visualizing the filter bank output (figure: images and their 80-dimensional representation)

Place recognition system

Hidden Markov Model
• Hidden states = location (63 values)
• Observations = v_t^G ∈ R^80
• Transition model encodes topology of environment
• Observation model is a mixture of Gaussians (100 views per place)
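
The observation model above could be evaluated along these lines. This is a sketch under stated assumptions: each stored view is treated as the centre of one Gaussian, all components have equal weight, and a single shared spherical variance `sigma**2` is used. None of these details appear on the slide, and the function name is hypothetical.

```python
import numpy as np

def log_obs_likelihood(v, prototypes, sigma):
    """Return log p(v | Q=q) for every place q.

    prototypes: array of shape (n_places, n_views, dim), the stored views.
    ASSUMPTION: equal mixture weights and one shared spherical variance.
    """
    n_places, n_views, dim = prototypes.shape
    sq_dist = ((prototypes - v) ** 2).sum(axis=2)  # (n_places, n_views)
    log_comp = (-0.5 * sq_dist / sigma**2
                - 0.5 * dim * np.log(2 * np.pi * sigma**2))
    # Log of the average over views, computed stably (log-sum-exp trick).
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).mean(axis=1))
```

Working in the log domain matters here: in 80 dimensions the raw densities underflow easily.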

Hidden Markov Model
• Observation likelihood: mixture of Gaussians
• Transition matrix: MLE (counting)
• Prediction prior
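
The terms labelled on this slide fit the standard HMM filtering update, sketched below. The pseudo-count is an assumption added to keep rows from being empty (set it to 0 for the pure counting estimate, provided every state appears as a predecessor), and both function names are hypothetical.

```python
import numpy as np

def count_transitions(label_sequence, n_states, pseudo_count=1e-3):
    """Estimate A[i, j] = P(Q_t = j | Q_{t-1} = i) by counting transitions.

    ASSUMPTION: a small pseudo-count is added so that no row sums to zero.
    """
    counts = np.full((n_states, n_states), float(pseudo_count))
    for prev, cur in zip(label_sequence[:-1], label_sequence[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def forward_step(prev_posterior, A, log_lik):
    """One filtering update.

    The posterior over Q_t is proportional to the observation likelihood
    times the prediction prior, where the prior is A^T @ previous posterior.
    """
    prior = A.T @ prev_posterior
    # Subtracting the max only rescales the likelihood; normalisation fixes it.
    unnorm = np.exp(log_lik - log_lik.max()) * prior
    return unnorm / unnorm.sum()
```

Calling `forward_step` once per frame, feeding each result back in as `prev_posterior`, gives the running estimate of the current place.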

Scene Categorization
• 17 categories (office, corridor, street, etc.)
• Train a separate HMM on category labels

Place recognition demo

Performance on known environment (figure: ground truth vs. system estimate for specific location, location category, and indoor/outdoor)

Performance on new environment

Comparison of features (panels: categorization, recognition)

Effect of HMM on recognition (panels: without HMM, but with temporal smoothing; with HMM)

From place to object recognition

Object priming
• Predict object properties based on context (top-down signals):
• Visual gist, v_t^G
• Specific location, Q_t
• Kind of location, C_t

Object Priming
• Probability of object i in image vi given entire video sequence
• Estimate of current place (output of HMM)
• Probability of object i given current observation and place
• Prior probability of object i being in place q (MLE)
• Observation likelihood (mixture of Gaussians)
• Probability of object i (again…)
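
One way to read the terms on this slide: apply Bayes' rule per place, then average over the place estimate from the HMM. The sketch below assumes a binary present/absent variable for each object and takes the likelihoods as precomputed inputs; the function and argument names are hypothetical.

```python
import numpy as np

def object_given_place(lik_present, lik_absent, prior_present):
    """P(O_i = 1 | Q = q, v) for every place q, by Bayes' rule.

    lik_present[q]   = p(v | O_i = 1, Q = q)  (observation likelihood)
    lik_absent[q]    = p(v | O_i = 0, Q = q)
    prior_present[q] = P(O_i = 1 | Q = q)     (prior of object i in place q)
    """
    num = lik_present * prior_present
    den = num + lik_absent * (1.0 - prior_present)
    return num / den

def object_given_sequence(p_obj_given_place, place_posterior):
    """P(O_i = 1 | v_1:t): sum over q of P(O_i = 1 | q, v_t) * P(Q_t = q | v_1:t)."""
    return float(np.dot(p_obj_given_place, place_posterior))
```

The place posterior is the HMM output, so an uncertain place estimate spreads the object prediction across the candidate places rather than committing to one.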

Predicting object presence

ROC curves for object detection

Predicting object position and scale

Predicting object position and scale
• Estimate of mask
• Probability of object i being present and location being q (output of previous system)
• Estimate of mask given current gist, place, and object (delta / Gaussian)
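
A possible reading of this slide is that the final mask is a weighted average of per-place mask estimates. The sketch below assumes that "delta" means the mask is all zeros when the object is absent, so only the "present" terms contribute; the per-place mask estimates are taken as given, and the function name is hypothetical.

```python
import numpy as np

def expected_mask(joint_present, masks_given_place):
    """Weighted average of per-place mask estimates.

    joint_present[q]     = P(O_i = 1, Q = q | v_1:t)
    masks_given_place[q] = mask estimate of shape (H, W) given the current
                           gist, place q, and the object being present.
    ASSUMPTION: an absent object contributes an all-zero mask.
    """
    # Contract the place axis: (Q,) with (Q, H, W) -> (H, W).
    return np.tensordot(joint_present, masks_given_place, axes=1)
```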

Predicted segmentation

Conclusion
• Real-world problem (and it works!)
• Uses only global features (context)
• How much did {HMM / place prior} affect {place recognition / object detection}? Can we really say “context” did the job?
