240 likes | 470 Views
3D Object Recognition Pipeline. Morgan Quigley, Stephen Gould Stanford Marius Muja UBC. Kurt Konolige , Radu Rusu , Victor Eruhmov , Suat Gedikli Willow Garage Stefan Holzer , Stefan Hinterstoisser TUM. 3D and Object Recognition. Provides more info than just visual texture
E N D
3D Object Recognition Pipeline Morgan Quigley, Stephen Gould Stanford Marius Muja UBC Kurt Konolige, RaduRusu, Victor Eruhmov, SuatGedikli Willow Garage Stefan Holzer, Stefan Hinterstoisser TUM
3D and Object Recognition • Provides more info than just visual texture • Good for scale and segmentation • Verification Need a good device for 3D info
3D Cameras • Tabletop manipulation: • Short range • High resolution • High range accuracy • Real-time
WG Projected Texture Stereo Device • Paint the scene with texture from a projector • vs. single camera with structured light • Advantages: • Simple projector • Standard algorithms • Full frame rates (640x480) • Dynamic scenes
WG project texture device • Projector • Red LED • Eye safe • Synchronized to cameras 3D Fly-thru
Object Recognition Pipeline Pre-filter Detect Verify • Textured objects via keypoints[Victor Eruhimov, SuatGedikli] • Untextured objects via DOT [Stefan Holzer, Stefan Hinterstoisser] • Simple 3D model matching [Marius Muja] • STAIR 2D/3D features [Stephen Gould]
MOPED – Textured object recognition with pose • Model: Stereo view of an object at a known pose • Extract keypoints and features • For a new scene, match keypoints to each model • Run SfM geometric check to verify and recover pose Torres, Romea, Srinivasa ICRA 2010
Need texture • Need high res camera
Dominant Orientation Templates (DOT) Stefan Hinterstoisser, Stefan Holzer (TUM; CVPR 2010, ECCV 2010) DOT is a template matching based approach template current scene - Template is slid over the image to compute the response for each image position - If response is above a threshold it is considered as detection of the template
DOT – Basic Principle DOT uses gradients instead of color or gray values template current scene - Gradients are less sensitive to illumination changes - Gradients have orientation and magnitude
Offline Learning Good learning is necessary to reduce false-positive rate We try to use all available information to segment the object: Point cloud from narrow stereo is used to detect the table and segment the point cloud of the object Object point cloud is used to create an initial mask Mask is refined using GrabCut (see OpenCV)
False-Positive Rejection Two more precise templates for validation: more precise and not discretized gradient template disparity template to compare expected with real disparities
False-Positive Rejection Compute error between reference point cloud and point cloud at detected position Optimize initial 3D point cloud pose given from the detection Directly gives object pose if model is associated with learned point clouds
STAIR Vision Library (SVL)Stanford STAIR project [Andrew Ng, Stephen Gould] • Initially developed to support the Stanford AI Robot (STAIR) project • Builds on top of OpenCV computer vision library and Eigen matrix library • Provides a range of software infrastructure for • computer vision • machine learning • probabilistic graphical models • Hosted on SourceForge
Sliding-window object detector Features are extracted from a local window Learned boosted decision-tree classifier scores each window Image is scanned at multiple resolutions to detect objects at different scales Object Detection in SVL
Image decomposed into multiple channels Depth at each pixel, obtained from a laser scanner, can be thought of as an additional channel [Quigley et al., ICRA 2009] intensity image edge map depth map Image Channels
Learn a “patch” dictionary over intensity, edge and depth channels Patches encode localized templates for matching Depth patches capture shape; intensity and edge patches capture appearance Patch responses (over entire dictionary) are combined to form the feature vector [Quigley et al., ICRA 2009] Object Detection Features
150 images of cluttered indoor scenes 5-fold cross-validation Depth information provides significant improvement in area under precision-recall curve [Quigley et al., ICRA 2009] Results 8% improvement 3% improvement 38% improvement
Conclusions • Realtime, accurate 3D devices are becoming available • 3D can help in object detection for untextured objects • Combo of visual and 3D features best • 3D is useful for verification • Check out the PR2 Grasping Demo!