20 likes | 310 Views
A Large-Scale Hierarchical Multi-View RGB-D Object Dataset. Kevin Lai 1 Liefeng Bo 1 Xiaofeng Ren 2 Dieter Fox 1,2 1 University of Washington, Seattle WA 2 Intel Labs Seattle, Seattle WA. Summary
E N D
A Large-Scale Hierarchical Multi-View RGB-D Object Dataset Kevin Lai1 Liefeng Bo1 Xiaofeng Ren2 Dieter Fox1,2 1University of Washington, Seattle WA 2Intel Labs Seattle, Seattle WA • Summary • A large-scale, hierarchical multi-view object dataset collected using a Kinect-style (RGB-D) camera that provides both RGB and depth images. • Contains 300 objects organized into 51 categories according to WordNet hypernym / hyponym relations. • 250,000 640x480 RGB and depth image pairs. • 8 home and office scenes with ground truth object annotations. • Experiments demonstrate that combining color and depth information substantially improves quality of object recognition and localization. • Available at http://www.cs.washington.edu/rgbd-dataset Fig. 1. Sample objects from the RGB-D Object Dataset. • 2. RGB-D Object Dataset • Video sequences of each object being rotated on a motorized turntable. • Camera mounted at three different heights, making angles 30˚, 45˚, and 60˚ with the horizon. • Objects are segmented using a combination of depth and vision. • A video annotation tool that uses 3D scenes reconstructed from videos (Henry et al. ISER’10) to enable easy labeling of objects. Fig. 2. Labeling two objects in a kitchen scene using a 3D reconstruction. • 3. Object Recognition Experiments • Category (e.g. coffee mug vs. soda can) and Instance (Amelia’s coffee mug vs. Bob’s coffee mug). • Compare performance of shape features (Spin images, bounding box dimensions), visual features (SIFT, color histograms, texton histograms), and a combination of both. • Benchmark state-of-the-art classifiers on the RGB-D Object Dataset: linear SVM (linSVM), Gaussian kernel SVM (kSVM), and Random Forest (RF). Table 1. Category and instance recognition performance of various classifiers. Classifier Shape Vision All Category Recognition LinSVM 53.1±1.7 74.3±3.3 81.9±2.8 kSVM 64.7±2.2 74.5±3.1 83.8±3.5 RF 66.8±2.5 74.7±3.6 79.6±4.0 Instance (Alternating contiguous frames) LinSVM 32.4±0.5 90.9±0.5 90.2±0.6 kSVM 51.2±0.8 91.0±0.5 90.6±0.6 RF 52.7±1.0 90.1±0.8 90.5±0.4 Instance (Leave-sequence-out) LinSVM 32.3 59.3 73.9 kSVM 46.2 60.7 74.8 RF 45.5 59.9 73.1 • 4. Object Detection Experiments • Sliding window detector over image pyramid: • Features: Histogram of Oriented Gradients (HOG) on color image, HOG on depth image, and normalized depth histograms. • Linear SVM detectors trained on the RGB-D Object Dataset: Fig. 4. Object detection in a very complex scene. Contact email: kevinlai@cs.washington.edu Paper ID: 1661 This work was funded in part by an Intel grant, by ONR MURI grants N00014-07-1-0749 and N00014-09-1-1052, by the NSF under contract IIS-0812671, and through the Robotics Consortium sponsored by the U.S. Army Research Laboratory under Cooperative Agreement W911NF-10-2-0016.