Toward Object Discovery and Modeling via 3-D Scene Comparison Evan Herbst, Peter Henry, Xiaofeng Ren, Dieter Fox University of Washington; Intel Research Seattle
Overview • Goal: learn about an environment by tracking changes in it over time • Detect objects that occur in different places at different times • Handle textureless objects • Avoid appearance/shape priors • Represent a map with static + dynamic parts
Algorithm Outline • Input: two RGB-D videos • Mapping & reconstruction of each video • Interscene alignment • Change detection • Spatial regularization • Outputs: reconstructed static background; segmented movable objects
Scene Reconstruction • Mapping based on RGB-D Mapping [Henry et al. ISER’10] • Visual odometry, loop-closure detection, pose-graph optimization, bundle adjustment
Scene Reconstruction • Mapping based on RGB-D Mapping [Henry et al. ISER’10] • Surface representation: surfels
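As a concrete illustration of the surface representation mentioned above, here is one plausible per-surfel record in Python; the field names and types are assumptions for illustration, not the authors' exact data structure.

```python
# Minimal surfel record (illustrative; field names/types are assumptions,
# not the exact representation used by RGB-D Mapping).
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray    # 3-D center of the disc, in the map frame
    normal: np.ndarray      # unit surface normal
    radius: float           # disc radius, typically grows with viewing distance
    color: np.ndarray       # RGB estimate accumulated over observations
    confidence: float       # support from the number of agreeing observations
```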
Scene Differencing • Given two scenes, find the parts that differ • Surfaces in the two scenes are similar iff the object did not move between visits • Comparison at each surface point
Scene Differencing • Given two scenes, find the parts that differ • Comparison at each surface point • Start by globally aligning the scenes (one alignment step is sketched below) [figures: alignment illustrated in 2-D and in 3-D]
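A minimal sketch of one rigid-alignment step, assuming corresponding 3-D points between the two reconstructions are already available (e.g. from feature matching); the full global-alignment pipeline used in the paper is not reproduced here.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding points from the two scenes.
    Only the closed-form step is shown; finding the correspondences
    (feature matching, ICP iterations) is outside this sketch.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection solution
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```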
Naïve Scene Differencing • Easy algorithm: closest point within δ → same (sketched below) • Ignores color, surface orientation • Ignores occlusions
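A minimal sketch of this baseline, assuming scipy is available; the distance threshold value is illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def naive_diff(points_a, points_b, delta=0.02):
    """Mark a point of scene A as changed when scene B has no point within
    delta meters. Ignores color, surface orientation, and occlusion, which
    is exactly what the probabilistic model on the next slides addresses."""
    tree = cKDTree(points_b)
    dist, _ = tree.query(points_a, k=1)
    return dist > delta   # True = labeled "changed"
```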
Scene Differencing • Model probability that a surface point moved • Sensor readings z • Expected measurement z* • m ∈ {0, 1} [figure: measurements z0, z1, z2, z3 of the expected surface z* taken from frames 0, 10, 25, 49]
Sensor Models • Model probability that a surface point moved • Sensor readings z; expected measurement z* • By Bayes (see the sketch below) • Two sensor measurement models • With no expected surface • With expected surface (both detailed on the following slides)
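The Bayes step referred to above, written out in notation assumed from these slides (z_1..z_n are the measurements of a surface point, z* its expected measurement, m whether it moved); the exact form on the original slide may differ.

```latex
p(m \mid z_{1:n}, z^{*}) \;\propto\; p(m)\,\prod_{i=1}^{n} p(z_i \mid m, z^{*}),
\qquad m \in \{0, 1\}
```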
Sensor Models • Two sensor measurement models • With expected surface • Depth: uniform + exponential + Gaussian [1] • Color: uniform + Gaussian • Orientation: uniform + Gaussian [figure: depth model centered at zd*] [1] Thrun et al., Probabilistic Robotics, 2005
Sensor Models • Two sensor measurement models • With expected surface • Depth: uniform + exponential + Gaussian [1] • Color: uniform + Gaussian • Orientation: uniform + Gaussian • With no expected surface • Depth: uniform + exponential • Color: uniform • Orientation: uniform (depth terms sketched below) [figure: depth model centered at zd*] [1] Thrun et al., Probabilistic Robotics, 2005
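A sketch of the depth terms of the two measurement models, assuming standard beam-model ingredients; the mixture weights, scales, and maximum range below are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

Z_MAX = 10.0  # illustrative maximum depth range (m)

def p_depth_expected(z, z_star, sigma=0.03, lam=0.5,
                     w_hit=0.7, w_short=0.2, w_rand=0.1):
    """Depth likelihood (scalar z) when a surface is expected at z_star:
    Gaussian around z_star + exponential for early returns + uniform."""
    p_hit = np.exp(-0.5 * ((z - z_star) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p_short = lam * np.exp(-lam * z) if z < z_star else 0.0
    p_rand = 1.0 / Z_MAX
    return w_hit * p_hit + w_short * p_short + w_rand * p_rand

def p_depth_unexpected(z, lam=0.5, w_short=0.3, w_rand=0.7):
    """Depth likelihood when no surface is expected: exponential + uniform."""
    return w_short * lam * np.exp(-lam * z) + w_rand / Z_MAX
```

These two likelihoods are what feed the per-surfel Bayes update sketched earlier.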
Example Result [figures: Scene 1 and Scene 2]
Spatial Regularization • Points treated independently so far • MRF to label each surfel moved or not moved • Data term given by pointwise evidence • Smoothness term: Potts, weighted by curvature (energy written out below)
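Written as an energy, the labeling problem described above takes the usual pairwise MRF form; the symbols below (λ_ij for the curvature-weighted Potts weight, N for the surfel neighborhood graph) are my shorthand, not the paper's notation.

```latex
E(m) \;=\;
\underbrace{\sum_i -\log p\!\left(m_i \mid z^{(i)}_{1:n}, z^{*(i)}\right)}_{\text{data term}}
\;+\;
\underbrace{\sum_{(i,j) \in \mathcal{N}} \lambda_{ij}\,\big[\,m_i \neq m_j\,\big]}_{\text{Potts smoothness}}
```

with λ_ij modulated by local surface curvature, per the slide's "weighted by curvature".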
Spatial Regularization • Points treated independently so far • MRF to label each surfel moved or not moved [figures: Scene 1 and Scene 2, pointwise vs. regularized labelings]
Experiments • Trained MRF on four scenes (1.4M surfels) • Tested on twelve scene pairs (8.0M surfels) • 70% error reduction wrt max-class baseline [figures: Baseline vs. Ours]
Experiments • Results: complex scene
Experiments • Results: large object
Conclusion • Segment movable objects in 3-D using scene changes over time • Represent a map as static + dynamic parts • Extensible sensor model for RGB-D sensors • Next steps • All scenes in one optimization • Model completion from many scenes • Train more-supervised object segmentation
Using More Than 2 Scenes • Given our framework, it is straightforward to combine evidence from multiple scenes (one possible form is sketched below) • w_scene could be chosen to weight all scenes (rather than frames) equally, or to upweight those taken under good lighting • Other ways to subsample frames: as in keyframe selection in mapping
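One plausible form of that combination, with per-scene weights w_s playing the role of w_scene; this is a reconstruction of what the missing equation likely expressed, not a verbatim copy.

```latex
p(m \mid \text{scenes } 1..S) \;\propto\; p(m)\,
\prod_{s=1}^{S} \Big[\, \prod_{i} p\!\left(z^{(s)}_i \mid m, z^{*(s)}\right) \Big]^{\,w_s}
```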
First Sensor Model: Surface Didn't Move • Modeling sensor measurements • Depth: uniform + exponential + Gaussian * • Color, normal: uniform + Gaussian; mixing controlled by probability that the beam hit the expected surface [figure: depth model centered at zd*] * Fox et al., "Markov Localization…", JAIR '99
Experiments • Trained MRF on four scenes (2.7M surfels) • Tested on twelve scene pairs (8.0M surfels) • 250k moved surfels; we get 4.5k FP, 51k FN • 65% error reduction wrt max-class baseline • Extract foreground segments as "objects"
Overview • Many visits to same area over time • Find objects by motion
(extra) Related Work • Probabilistic sensor models • Depth only • Depth & color, with extra independence assumptions • Static + dynamic maps • In 2-D • Usually not modeling objects
Spatial Regularization • Pointwise only so far • MRF to label each surfel moved or not moved • Data term given by pointwise evidence • Smoothness term: Potts, weighted by curvature
Depth-Dependent Color/Normal Model • Modeling sensor measurements • Combine depth/color/normal (one assumed form is sketched below)
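One assumed way to write the combination, treating the color and normal channels as conditionally independent given the depth reading and the moved/not-moved hypothesis m; this reconstructs what the missing equation likely expressed, not the paper's exact form.

```latex
p(z \mid m, z^{*}) \;=\;
p(z_d \mid m, z_d^{*})\;
p(z_c \mid m, z_c^{*}, z_d)\;
p(z_n \mid m, z_n^{*}, z_d)
```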