Semantic Parsing for Priming Object Detection in RGB-D Scenes

Cesar Cadena and Jana Kosecka • 3rd Workshop on Semantic Perception, Mapping and Exploration (SPME) • Karlsruhe, Germany, 2013

Motivation
• Long-term robotic operation
• Semantic information about the surrounding environment is important for high-level robotic tasks.
• It is difficult to know a priori all the possible instances or classes of objects that the robot will find in real operation.
• Even if we know many of them, it is unreasonable and expensive to run all the specific object detectors at the same time.

Motivation
• However:
  • There are things we can assume to be present (almost) always.
  • Generic “detachable” objects also share some characteristics.
• Urban: Ground, Buildings, Sky, Objects
• Indoors: Ground, Walls, Ceiling, Objects
• Today: Ground, Structure, Furniture, Props

Our Problem
• Efficiently segment RGB+3D scenes into these general classes (Ground, Structure, Furniture, Props) so that the result can be used as a prior for task-specific detectors.

NYU Depth v2
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, in ECCV, 2012.
• 1449 labeled frames.
• 26 scene classes.
• Labeling spans 894 different classes.
Thanks to N. Silberman for providing the mapping from 894 to 4 classes.

The System [diagram labels: Semantic Segmentation, MAP, Marginals]

Different approaches
• N. Silberman et al., ECCV 2012
• C. Couprie et al., CoRR 2013
• X. Ren et al., CVPR 2012
• D. Munoz et al., ECCV 2010
• I. Endres and D. Hoiem, ECCV 2010
Each of them has at least one of:
• Expensive over-segmentation
• Expensive features
• Expensive inference

Our approach [diagram labels: Semantic Segmentation, MAP, Marginals, Conditional Random Fields, Graph Structure, Preprocessing, Inference, Potentials]

Outline
(1) Preprocessing
(2) Graph Structure
(3) Potentials
(4) Inference
(5) Results
(6) Conclusions

Preprocessing: Over-segmentation
• SLIC superpixels
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, PAMI, 2012.

Graph Structure: Classical choice on images

Graph Structure: Our choice
• Minimum Spanning Tree over 3D
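
A minimal sketch of building such a tree, assuming each superpixel is summarised by one 3D centroid and that candidate edges are weighted by Euclidean distance between centroids. Neither detail is stated on the slide, and the function name `minimum_spanning_tree` is hypothetical. It uses Prim's algorithm on the complete graph, which is simple but quadratic in the number of superpixels.

```python
import math

def minimum_spanning_tree(centroids):
    """Prim's algorithm on the complete graph over 3D points.

    centroids: list of (x, y, z) tuples, one per superpixel (assumed input).
    Returns a list of (parent, child) index pairs, len(centroids) - 1 of them.
    """
    n = len(centroids)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n    # cheapest known connection to the tree
    best_parent = [-1] * n
    best_dist[0] = 0.0            # grow the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update connection costs using the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centroids[u], centroids[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges

edges = minimum_spanning_tree([(0, 0, 0), (1, 0, 0), (1, 0, 0.1), (5, 0, 0)])
```

A tree over N nodes always has N - 1 edges and no cycles, which is the property that makes exact inference on it tractable.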

Potentials: Pairwise CRFs
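
The equation on this slide is not preserved in this transcript. As a reference point, a pairwise CRF is commonly written as below. The symbols (labels x, observations z, node set N, edge set E, unary potential φ, pairwise potential ψ, partition function Z) and the negative-energy sign convention are assumptions, not taken from the slide.

```latex
p(\mathbf{x} \mid \mathbf{z}) =
  \frac{1}{Z(\mathbf{z})}
  \exp\!\Big(
    -\sum_{i \in \mathcal{N}} \phi(x_i, \mathbf{z})
    -\sum_{(i,j) \in \mathcal{E}} \psi(x_i, x_j, \mathbf{z})
  \Big)
```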

Potentials: unary
• Terms used: the frequency of label j in a k-NN query, and the frequency of label j in the database.
• The database is a kd-tree of features from the training data.
J. Tighe and S. Lazebnik, Superparsing: Scalable nonparametric image parsing with superpixels, ECCV 2010.
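
A rough sketch of a k-NN unary term built from those two frequencies. The exact formula is not recoverable from the slide, so the ratio used here (frequency in the query divided by frequency in the database, normalised over labels, then negated log) is an assumption. The function name `knn_unary` and the smoothing constant `eps` are hypothetical, and a brute-force search stands in for the kd-tree.

```python
import math
from collections import Counter

def knn_unary(query, train_feats, train_labels, labels, k=3, eps=1e-6):
    """Return {label: -log P(label | query)} from k-NN label frequencies."""
    # Brute-force neighbour search; a kd-tree would return the same
    # neighbours with less work on a large database.
    order = sorted(range(len(train_feats)),
                   key=lambda i: math.dist(query, train_feats[i]))
    in_query = Counter(train_labels[i] for i in order[:k])
    in_db = Counter(train_labels)
    # Assumed score: frequency in the query relative to frequency in the
    # database, so frequent classes do not dominate by sheer count.
    scores = {j: (in_query[j] + eps) / max(in_db[j], 1) for j in labels}
    total = sum(scores.values())
    return {j: -math.log(scores[j] / total) for j in labels}

feats = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labs = ["ground"] * 3 + ["props"] * 3
unary = knn_unary((0.05,), feats, labs, ["ground", "props"], k=3)
```

A lower value means the label is more compatible with the query feature; here the query sits among the "ground" samples, so "ground" gets the lower cost.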

Features (12D)
• From image:
  • mean of Lab color space (3D)
  • vertical pixel location (1D)
  • entropy from vanishing points (1D)
• From 3D:
  • height and depth (2D)
  • mean and std of differences on depth (2D)
  • local planarity (1D)
  • neighboring planarity (1D)
  • vertical orientation (1D)

Features
• From image:
  • entropy from vanishing points

Features
• From 3D:
  • mean and std of differences on depth

Features
• From 3D:
  • mean and std of differences on depth
  • local planarity
  • neighboring planarity
  • vertical orientation

Potentials: pairwise
• Lab color
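
The slide only says that the pairwise term depends on Lab color. One common way to realise that is a contrast-sensitive cost on the color distance between the two superpixels joined by an edge. The sketch below is an assumption along those lines, not the slide's formula; `pairwise_potential` and `scale` are hypothetical names.

```python
import math

def pairwise_potential(lab_i, lab_j, same_label, scale=10.0):
    """Contrast-sensitive cost for one edge between two superpixels.

    lab_i, lab_j: mean (L, a, b) colors of the two superpixels.
    same_label:   True if both ends are assigned the same class.
    """
    # Similarity is 1 for identical colors and decays with color distance.
    similarity = math.exp(-math.dist(lab_i, lab_j) / scale)
    # Similar colors make differing labels costly; dissimilar colors
    # make agreeing labels costly.
    return 1.0 - similarity if same_label else similarity

same_color_cost = pairwise_potential((50.0, 0.0, 0.0), (50.0, 0.0, 0.0), False)
```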

Inference
• We use belief propagation:
  • Exact results for MAP and marginals
  • Efficient computation, in [complexity expression not recovered], thanks to our graph structure choice!
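
Why a tree gives exact results can be seen in a small sum-product sketch: messages go once from the leaves to a root and once back, and every node's marginal is then the normalised product of its own factor and its incoming messages. The function name `tree_marginals` and its argument layout are hypothetical. Factors are non-negative numbers (for example the exponential of a negated potential), the pairwise factor is assumed symmetric, and the tree is assumed connected with node 0 present.

```python
def tree_marginals(n_nodes, edges, unary, pairwise):
    """Exact marginals on a tree via two-pass sum-product.

    unary[i][a]:          factor for node i taking label a.
    pairwise(i, j, a, b): factor for edge (i, j) with labels a and b.
    """
    n_labels = len(unary[0])
    nbrs = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # Depth-first order from node 0, so every parent precedes its children.
    order, parent, stack = [], {0: None}, [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in nbrs[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    msg = {}  # msg[(src, dst)][b]: message from src about dst's label b

    def send(src, dst):
        out = []
        for b in range(n_labels):
            total = 0.0
            for a in range(n_labels):
                term = unary[src][a] * pairwise(src, dst, a, b)
                for w in nbrs[src]:
                    if w != dst:
                        term *= msg[(w, src)][a]
                total += term
            out.append(total)
        z = sum(out)
        msg[(src, dst)] = [m / z for m in out]

    for u in reversed(order):            # pass 1: leaves to root
        if parent[u] is not None:
            send(u, parent[u])
    for u in order:                      # pass 2: root to leaves
        for v in nbrs[u]:
            if parent.get(v) == u:
                send(u, v)

    marginals = []
    for i in range(n_nodes):
        belief = list(unary[i])
        for w in nbrs[i]:
            belief = [belief[a] * msg[(w, i)][a] for a in range(n_labels)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals
```

Each edge carries exactly two messages, so the work grows linearly with the number of nodes for a fixed label set. Replacing the inner sum with a max (and keeping track of the maximising label) would give the max-product variant used for a MAP labeling.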

Results: NYU-D v2 Dataset [figure: ground truth (GT) vs. MAP labeling]

Results: NYU-D v2 Dataset
• Confusion matrix:
• Comparisons:

Results: NYU-D v2 Dataset
• Some failures: [figure: ground truth (GT) vs. MAP labeling]

Results: NYU-D v2 Dataset

Marginal probabilities
• Provide very useful information for specific tasks, e.g.:
  • Specific object detection
  • Support inference
[figure panels: P(Ground), P(Structure), P(Furniture), P(Props)]
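
One simple way such marginals could prime a specific detector is to restrict it to superpixels where the relevant class is likely. The sketch below assumes per-superpixel marginals stored as lists ordered (Ground, Structure, Furniture, Props); the function name `prime_candidates`, that ordering, and the 0.5 threshold are all illustrative assumptions.

```python
def prime_candidates(marginals, class_index, threshold=0.5):
    """Indices of superpixels whose marginal for one class exceeds a threshold."""
    return [i for i, p in enumerate(marginals) if p[class_index] > threshold]

PROPS = 3  # assumed position of P(Props) in each marginal vector
marginals = [
    [0.90, 0.05, 0.03, 0.02],   # almost certainly ground
    [0.05, 0.10, 0.15, 0.70],   # likely a prop
    [0.10, 0.30, 0.35, 0.25],   # ambiguous
]
candidates = prime_candidates(marginals, PROPS)
```

A detector for a small movable object would then only be evaluated on the returned superpixels instead of the whole frame.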

Conclusions
• We have presented a computationally efficient approach to semantic segmentation for priming object detection in indoor scenes.
• Our approach effectively uses 3D and image cues.
  • Depth discontinuities are evidence for occlusions.
• The MST over 3D keeps intra-class components coherently connected.

Discussion
• Features:
  • Silberman et al. 2012: bunch of engineered features (>1000D)
  • Couprie et al. 2013: learned features (>1000D)
  • Ours: selected meaningful features (12D)
• Local classifier:
  • Silberman et al. 2012: logistic regression
  • Couprie et al. 2013: neural networks
  • Ours: k-NN
• Graph structure:
  • Silberman et al. 2012: dense connections on the image
  • Couprie et al. 2013: none
  • Ours: MST over 3D

Thanks!!
Cesar Cadena, ccadenal@gmu.edu
Jana Kosecka, kosecka@cs.gmu.edu
Funded by the US Army Research Office Grant W911NF-1110476.

Working on:
• People detection by Shenghui Zhou

Multi-view and video:
