Explore the development of a mobile vision system that can determine its location and identify objects by incorporating contextual clues. This project delves into scene analysis, object recognition, and place identification using state-of-the-art techniques like Hidden Markov Models and object priming. The system aims to predict object properties, locations, and categories based on contextual information, improving recognition accuracy and enabling seamless integration of local and global features. Future work includes enhancing object detection/localization with discriminative feature selection and scaling up place recognition to larger environments. Other related projects in the pipeline involve automatic topological map building, hierarchical POMDPs for localization, hierarchical abstraction for factored MDPs, and learning object segmentation from video data.
Adaptive Intelligent Mobile Robots
Kevin Murphy
PI: Leslie Pack Kaelbling
Artificial Intelligence Laboratory, MIT
Outline • Towards a mobile vision system that knows where it is & what it is looking at • Brief overview of other projects
Context-based vision system for place and object recognition
Antonio Torralba, Kevin Murphy, Bill Freeman, Mark Rubin
Submitted to ICCV ’03
What is context? • What kind of location? (indoors/outdoors, office/corridor) • Which location? (Kevin’s office, Leslie’s office) • Viewing direction (facing the window) • Global scene factors (illumination) • Current activity (moving, sitting, talking)
Low-dimensional representation for scenes • Compute image intensity (no color) • Pipe image through steerable filter bank (6 orientations, 4 scales) • Compute magnitude of response • Downsample to 4 x 4 • PCA to 80 dimensions
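A minimal sketch of this gist-style pipeline, assuming Gabor filters as a stand-in for the steerable filter bank and a PCA basis learned offline from a set of training images; the kernel sizes, scales, and function names below are illustrative choices, not the original system's exact implementation.

```python
# Sketch of the low-dimensional scene ("gist") representation described above.
# Assumptions: Gabor filters approximate the steerable filter bank; PCA is fit
# on raw gist vectors from training images. Names/parameters are illustrative.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(scale, theta, size=32):
    """Oriented Gabor kernel at the given scale (pixels) and orientation."""
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * scale**2))
    carrier = np.cos(2 * np.pi * xr / scale)
    return envelope * carrier

def gist_features(gray_img, n_orient=6, n_scales=4, grid=4):
    """Filter-bank response magnitudes averaged over a grid x grid spatial grid."""
    feats = []
    for s in range(n_scales):
        scale = 4 * 2**s                      # 4 scales
        for o in range(n_orient):             # 6 orientations
            theta = np.pi * o / n_orient
            resp = np.abs(fftconvolve(gray_img, gabor_kernel(scale, theta), mode='same'))
            h, w = resp.shape                 # downsample by block averaging to 4 x 4
            blocks = resp[:h - h % grid, :w - w % grid].reshape(grid, h // grid, grid, w // grid)
            feats.append(blocks.mean(axis=(1, 3)).ravel())
    return np.concatenate(feats)              # 6 * 4 * 16 = 384-dim raw vector

def fit_pca(raw_gists, k=80):
    """Mean and top-k principal directions of raw gist vectors from training images."""
    mu = raw_gists.mean(axis=0)
    _, _, vt = np.linalg.svd(raw_gists - mu, full_matrices=False)
    return mu, vt[:k]

def project(raw_gist, mu, basis):
    """80-dimensional scene vector v_t^G."""
    return basis @ (raw_gist - mu)
```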
Visualizing the filter bank output [figure: example images and their 80-dimensional representations]
Hidden Markov Model • Hidden states = location (63 values) • Observations = v_t^G ∈ R^80 • Transition model encodes topology of environment • Observation model is a mixture of Gaussians (100 views per place)
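A hedged sketch of the filtering step such an HMM performs, assuming a row-stochastic transition matrix A over the 63 places and per-place Gaussian-mixture parameters fit offline from the stored views; the function and variable names are illustrative, not the paper's code.

```python
# HMM filtering over places: predict with the topology-encoding transition
# matrix, then reweight by the mixture-of-Gaussians likelihood of the gist
# vector v_t^G. Mixture structure (components, covariances) is an assumption.
import numpy as np
from scipy.stats import multivariate_normal

def mog_likelihood(v, weights, means, covs):
    """p(v | place) under a mixture of Gaussians."""
    return sum(w * multivariate_normal.pdf(v, m, c)
               for w, m, c in zip(weights, means, covs))

def forward_step(belief, A, v, obs_models):
    """One filtering step: belief is p(Q_{t-1} | v_{1:t-1}), A[i, j] = p(Q_t=j | Q_{t-1}=i)."""
    predicted = A.T @ belief                                   # p(Q_t | v_{1:t-1})
    liks = np.array([mog_likelihood(v, *obs_models[q])
                     for q in range(len(belief))])             # p(v_t | Q_t = q)
    posterior = predicted * liks
    return posterior / posterior.sum()                         # p(Q_t | v_{1:t})
```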
Performance on a known environment [figure: ground truth vs. system estimate for specific location, location category, and indoor/outdoor]
Comparison of features [figure: categorization vs. recognition accuracy]
Effect of HMM on recognition [figure: results with vs. without the HMM]
Object priming • Predict object properties based on context (top-down signals): • Visual gist, v_t^G • Specific location, Q_t • Kind of location, C_t • Assume objects are independent conditional on context: p(O_1, …, O_N | v_t^G, Q_t, C_t) = ∏_i p(O_i | v_t^G, Q_t, C_t)
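A hedged sketch of how such top-down predictions could be computed, assuming a simple table p(object i present | place q) estimated from labeled training frames and marginalized over the HMM's place posterior; the table and function names are illustrative, not the original model.

```python
# Illustrative object-priming step: combine the HMM's place posterior with
# per-place object statistics to obtain a contextual prior for each object.
# presence_given_place[i, q] = p(object i present | Q = q) is an assumed,
# illustratively-named table, not the paper's exact conditional model.
import numpy as np

def object_presence_priors(place_posterior, presence_given_place):
    """p(object i present | v_{1:t}) = sum_q p(o_i | Q=q) p(Q=q | v_{1:t})."""
    return presence_given_place @ place_posterior   # shape: (num_objects,)
```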
Closing the loop • Integrate local features (bottom-up likelihood) with global features (top-down prior)
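One way to read this combination, shown as a hedged sketch below: treat the local detector score as a likelihood and the context-based prediction as a prior, then renormalize. The detector and variable names are placeholders, not the system's actual components.

```python
# Combine bottom-up and top-down signals for object detection/localization.
# local_likelihoods and contextual_priors are illustrative placeholders for
# a local detector's scores and the object-priming predictions, respectively.
import numpy as np

def combine_local_and_global(local_likelihoods, contextual_priors):
    """Posterior over candidate detections: bottom-up likelihood x top-down prior."""
    joint = np.asarray(local_likelihoods) * np.asarray(contextual_priors)
    return joint / joint.sum()
```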
Future work • Add local features (bottom-up signal) for object detection/localization • Model dependencies between objects • Scale up place recognition to campus • Discriminative feature selection • Use a head tracker (view angle) • Recognize movemes (motion clips) • Online, unsupervised map and object class learning
Some other projects • Automatic topological map building – Temizer • Hierarchical POMDPs for multi-scale localization – Theocharous & Murphy • Hierarchical abstraction for factored MDPs – Steinkraus • Learning object segmentation from video – Ross
Automatic topological map building • Previous system learned a topological map offline from labeled data • Goal: do online, unsupervised learning • “Rooms” (states) are regions for which local visual navigation suffices
Hierarchical POMDPs • Hierarchical model supports more efficient learning, inference (state estimation), and planning [diagram: state spaces of 600 and 1200 states with vertical and horizontal transitions]
Hierarchical abstraction for factored MDPs • Decompose domain using different abstractions • Dynamically adjust levels of abstraction based on current state and goal • Make decisions at highest possible level [diagram: perception, action, and the current planning problem]
Learning object segmentation from video data • Videos contain moving objects, which are easy to segment from the background. • Goal: learn a model (MRF) to infer object boundaries in static images.
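A minimal sketch of the idea, assuming motion masks from simple frame differencing serve as free training labels for a static-image segmentation model; the thresholds, smoothing choices, and helper names are illustrative assumptions, and the project itself uses an MRF rather than any particular downstream learner implied here.

```python
# Hedged sketch: use motion to get approximate object masks from video, which
# can then serve as training labels for a static-image segmentation model
# (an MRF in the project above). Thresholds/filters below are assumptions.
import numpy as np
from scipy.ndimage import binary_opening, uniform_filter

def motion_mask(prev_frame, frame, thresh=15.0):
    """Binary foreground mask from frame differencing on grayscale frames."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    smoothed = uniform_filter(diff, size=5)        # suppress pixel noise
    mask = smoothed > thresh
    return binary_opening(mask, iterations=2)      # remove small speckles

def training_pairs(frames):
    """Yield (image, approximate object mask) pairs for training a segmenter."""
    for prev, cur in zip(frames[:-1], frames[1:]):
        yield cur, motion_mask(prev, cur)
```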