Context-based vision system for place and object recognition
Antonio Torralba, Kevin Murphy, Bill Freeman, Mark Rubin
Presented by David Lee. Some slides borrowed from Kevin Murphy.
Opening slides: Object out of context. Object in context. Wearable test-bed. System diagram. Computing the features.
24 filtered images → downsample to 4x4 → 4x4x24 = 384 dim → 80 dim
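The dimension bookkeeping on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the slide does not say how the downsampling or the 384-to-80 reduction is done, so block averaging of response magnitudes and a generic linear projection (for example a PCA basis) are assumptions here, and `downsample_4x4`, `gist_vector`, and the random stand-in data are hypothetical names and values.

```python
import numpy as np

def downsample_4x4(img):
    """Average-pool a 2-D filter response down to a 4x4 grid."""
    h, w = img.shape
    # Crop so height and width divide evenly by 4 (a simplification of this sketch).
    img = img[: h - h % 4, : w - w % 4]
    bh, bw = img.shape[0] // 4, img.shape[1] // 4
    # Split rows and columns into 4 blocks each and average within every block.
    return img.reshape(4, bh, 4, bw).mean(axis=(1, 3))

def gist_vector(filtered, projection):
    """Map 24 filtered images to an 80-dim vector via a 384-dim intermediate."""
    # Using response magnitudes is an assumption, not stated on the slide.
    coarse = np.stack([downsample_4x4(np.abs(f)) for f in filtered])  # (24, 4, 4)
    flat = coarse.reshape(-1)                                          # (384,)
    return projection @ flat                                           # (80,)

rng = np.random.default_rng(0)
filtered = rng.standard_normal((24, 64, 64))   # stand-in filter outputs
projection = rng.standard_normal((80, 384))    # stand-in for a learned projection
v = gist_vector(filtered, projection)
print(v.shape)
```

In a real system the projection would be learned from training images rather than drawn at random.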
Visualizing the filter bank output (figure: images and their 80-dimensional representation)
Hidden Markov Model
• Hidden states = location (63 values)
• Observations = $v_t^G \in \mathbb{R}^{80}$
• Transition model encodes topology of environment
• Observation model is a mixture of Gaussians (100 views per place)
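One way to read "mixture of Gaussians (100 views per place)" is a mixture with one component centred on each stored training view. The sketch below makes that reading concrete; the equal mixture weights, the spherical covariance, the value of `sigma`, and the synthetic data are all assumptions for illustration, not details given on the slide.

```python
import numpy as np

def log_obs_likelihood(v, prototypes, sigma):
    """log p(v | place) for an equal-weight mixture of spherical Gaussians.

    prototypes: (K, D) array, one mixture centre per stored view of the place.
    """
    k, d = prototypes.shape
    sq_dist = np.sum((prototypes - v) ** 2, axis=1)                      # (K,)
    log_comp = -0.5 * sq_dist / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    # Log of the average of the K component densities, computed stably.
    m = log_comp.max()
    return m + np.log(np.mean(np.exp(log_comp - m)))

rng = np.random.default_rng(0)
n_places, n_views, dim = 63, 100, 80
# Synthetic "views": each place has its own centre plus per-view noise.
centres = 3 * rng.standard_normal((n_places, 1, dim))
views = centres + rng.standard_normal((n_places, n_views, dim))

v = views[5, 0]   # query with a view taken from place 5
loglik = np.array([log_obs_likelihood(v, views[q], sigma=1.0) for q in range(n_places)])
best = int(np.argmax(loglik))
print(best)
```

Working in log space matters here because an 80-dimensional Gaussian density underflows quickly for distant views.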
Hidden Markov Model (equation annotations)
• Observation likelihood: mixture of Gaussians
• Prediction prior
• Transition matrix: MLE (counting)
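The annotations above match the standard HMM filtering recursion: the belief over places is the observation likelihood times a prediction prior, and the prediction prior is the previous belief pushed through the transition matrix. The following is a minimal sketch of that recursion, not the authors' code. The function names, the pseudo-count, the uniform initial prior, and the toy numbers are assumptions.

```python
import numpy as np

def transition_matrix_by_counting(label_seq, n_states, pseudo_count=1e-3):
    """Row-normalised transition counts from a labelled place sequence.

    pseudo_count is an assumption of this sketch, used to avoid zero rows.
    """
    counts = np.full((n_states, n_states), pseudo_count)
    for prev, cur in zip(label_seq[:-1], label_seq[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def forward_filter(log_lik, A, prior):
    """Return P(Q_t = q | v_1..t) for every frame t.

    log_lik: (T, S) per-frame log observation likelihoods.
    A[i, j] = P(Q_t = j | Q_{t-1} = i).
    """
    T, S = log_lik.shape
    belief = np.empty((T, S))
    pred = prior
    for t in range(T):
        lik = np.exp(log_lik[t] - log_lik[t].max())  # rescale for stability
        post = lik * pred                            # likelihood x prediction prior
        belief[t] = post / post.sum()
        pred = belief[t] @ A                         # prediction prior for frame t+1
    return belief

labels = [0, 0, 1, 1, 2, 2, 0, 0]                    # toy ground-truth place labels
A = transition_matrix_by_counting(labels, n_states=3)
log_lik = np.log(np.array([[0.90, 0.05, 0.05],
                           [0.40, 0.30, 0.30],
                           [0.10, 0.80, 0.10]]))
belief = forward_filter(log_lik, A, prior=np.full(3, 1 / 3))
print(belief.round(3))
```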
Scene Categorization
• 17 categories (Office, Corridor, Street, etc.)
• Train a separate HMM on category labels
Performance on known environment (figure: ground truth vs. system estimate for specific location, location category, and indoor/outdoor)
Comparison of features (figure panels: categorization, recognition)
Effect of HMM on recognition (figure panels: without HMM, but with temporal smoothing; with HMM)
Object priming
• Predict object properties based on context (top-down signals):
  • Visual gist, $v_t^G$
  • Specific location, $Q_t$
  • Kind of location, $C_t$
Object Priming (equation annotations)
• Probability of object i in image vi given entire video sequence
• Estimate of current place (output of HMM)
• Probability of object i given current observation & place
• Prior probability of object i being in place q: MLE
• Observation likelihood: mixture of Gaussians
• Probability of object i
Again…
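Read together, these annotations describe marginalising over the place estimate: the probability of an object given the video is the sum, over places q, of the probability of the object given the current observation and place, weighted by the HMM's belief in q. The per-place term follows from Bayes' rule using the place-conditional prior and the observation likelihood. The sketch below works through that arithmetic; the function names and all numbers are hypothetical.

```python
import numpy as np

def object_given_place(lik_present, lik_absent, prior_present):
    """P(object present | observation, place q) for every place q, by Bayes' rule.

    lik_present[q] = p(observation | object present, place q)
    lik_absent[q]  = p(observation | object absent,  place q)
    prior_present[q] = P(object present | place q)
    """
    num = lik_present * prior_present
    den = num + lik_absent * (1.0 - prior_present)
    return num / den

def prime_object(place_belief, lik_present, lik_absent, prior_present):
    """P(object present | video): marginalise the per-place posterior over places."""
    per_place = object_given_place(lik_present, lik_absent, prior_present)
    return float(np.sum(place_belief * per_place))

place_belief = np.array([0.7, 0.2, 0.1])    # HMM output for 3 toy places
prior_present = np.array([0.9, 0.1, 0.5])   # e.g. estimated by counting
lik_present = np.array([2.0, 1.0, 1.0])     # toy observation likelihoods
lik_absent = np.array([1.0, 1.0, 1.0])

p = prime_object(place_belief, lik_present, lik_absent, prior_present)
print(round(p, 4))  # 0.7332
```

With these toy numbers the first place dominates: its per-place posterior is 1.8 / 1.9, and it carries 70% of the place belief.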
Predicting object position and scale (equation annotations)
• Probability of object i being present and location being q (output of previous system)
• Estimate of mask
• Estimate of mask given current gist, place, and object (delta / Gaussian)
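A hedged sketch of how these pieces could combine: the expected mask is a weighted sum of per-place conditional mask estimates, with weights given by the joint probability that the object is present and the place is q. The sketch assumes that an absent object contributes an all-zero mask, which is one reading of the "delta" label, and that the per-place conditional masks have already been computed. The names and numbers are hypothetical.

```python
import numpy as np

def expected_mask(joint_present, cond_masks):
    """Expected object mask given the video.

    joint_present: (S,) P(object present, place = q | video).
    cond_masks:    (S, H, W) estimate of the mask given gist, place q, object present.
    Absent objects are assumed to contribute an all-zero mask.
    """
    # Weighted sum over the place axis, leaving an (H, W) mask.
    return np.tensordot(joint_present, cond_masks, axes=1)

joint_present = np.array([0.6, 0.1])
cond_masks = np.array([
    [[1.0, 1.0], [1.0, 1.0]],   # place 0: object could be anywhere
    [[0.0, 0.0], [1.0, 1.0]],   # place 1: object expected in the bottom row
])
mask = expected_mask(joint_present, cond_masks)
print(mask)
```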
Conclusion
• Real-world problem (and it works!)
• Uses only global features (context)
• How much did {HMM / place prior} affect {place recognition / object detection}? Can we really say “context” did the job?