Recognition of 3D Objects or, 3D Recognition of Objects. Alec Rivers. Overview. 3D object recognition was dead, now it’s coming back These papers are within the last 2 years Doesn’t really work yet, but it’s just a beginning. Papers.
Recognition of 3D Objects or, 3D Recognition of Objects Alec Rivers
Overview • 3D object recognition was dead, now it’s coming back • These papers are within the last 2 years • Doesn’t really work yet, but it’s just a beginning
Papers • The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects • CVPR 2006 • 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation • CVPR 2007 • 3D Generic Object Categorization, Localization and Pose Estimation • ICCV 2007
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects John Winn Microsoft Research Cambridge Jamie Shotton University of Cambridge
Introduction • Needed to understand next paper • It’s 2D • What does it try to solve? • Recognize one class of object at one pose and one scale, but with occlusions • Does it work? • Yes, really well, especially given occlusions
Introduction • What is interesting about it? • Segments objects • Interesting methods • No sliding windows • Multiple instances for free
Overview • Instead of sparse parts at features, use a densely covering part grid [Fischler & Elschlager 73] [Winn & Shotton 06]
Recognizing a New Image – Overview • Walk through an example
Recognizing a New Image – Overview 1. Pixels guess their part
Recognizing a New Image – Overview 2. Maximize layout consistency
Layout Consistency • Defined pairwise between two pixels: P_I, P_J => Bool • Means pixels I, J could be part of one instance • Toy example: Object: 1,2,3,4,5 Image: 2,3,4,5,0,0,1,2,3,4,5,2,3,4,5,0,0
Layout Consistency • Toy example, annotated (figure labels): instance 1, instance 2, instance 3, occlusion
Layout Consistency • In 2D, consistent IFF their relative assignments could exist in a deformed regular grid • Formally:
Overview 2. Maximize layout consistency
Layout Consistency 3. Find consistent regions; create instances Possible due to layout inconsistency at occluding borders
Overview 1. Pixels guess parts 2. Maximize layout consistency 3. Create instances [Winn & Shotton 06]
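The three steps above can be tried on the 1D toy example from the Layout Consistency slide. This is a minimal sketch, not the paper's 2D definition: it assumes that in 1D two neighbouring object pixels are layout consistent when the right one carries the same part label (a stretched part) or the next part label, and that 0 means background. The function names `consistent` and `split_instances` are made up for illustration.

```python
def consistent(left_part, right_part):
    """Assumed 1D rule: same part (stretch) or the next part in the grid."""
    return right_part == left_part or right_part == left_part + 1


def split_instances(labels):
    """Group pixel indices into instances; 0 is background.

    A new instance starts wherever two neighbouring object pixels are not
    layout consistent, which is what happens at an occluding border.
    """
    instances = []
    current = []
    for idx, part in enumerate(labels):
        if part == 0:
            # Background closes any open instance.
            if current:
                instances.append(current)
                current = []
            continue
        if current and not consistent(labels[current[-1]], part):
            # Layout inconsistency: close the instance and start a new one.
            instances.append(current)
            current = []
        current.append(idx)
    if current:
        instances.append(current)
    return instances


image = [2, 3, 4, 5, 0, 0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 0, 0]
for number, pixels in enumerate(split_instances(image), start=1):
    print("instance", number, [image[i] for i in pixels])
```

On the toy image this finds three groups, matching the three instances labelled on the slide; the break between the second and third comes from the 5-to-2 transition, where one object occludes the start of the next.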
Implementation Details • Trained on manually segmented data • Crux of algorithm is conditional distribution • Like a probability for each possibility, or a score • Algorithm is just finding maximum
Part Appearance • Each pixel prefers parts that match surrounding image data • Randomized decision trees • Multiple trees, each trained on a subset of the data • Each node is the maximal-information-gain binary test on two nearby pixels’ intensities • Each leaf holds a histogram of part possibilities • Actual preference is the average over all trees
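A rough sketch of how such a forest could be evaluated at one pixel. Training is left out and the two trees are built by hand. The exact form of the node test is an assumption here (difference of two nearby intensities compared with a threshold), as are the names `Node`, `classify` and `forest_preference`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # A leaf has a histogram over part labels; an internal node has a test.
    histogram: Optional[List[float]] = None
    offset_a: Tuple[int, int] = (0, 0)   # (dy, dx) of the first probe pixel
    offset_b: Tuple[int, int] = (0, 0)   # (dy, dx) of the second probe pixel
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def intensity(image, y, x):
    """Read a pixel, clamping coordinates to the image border."""
    y = min(max(y, 0), len(image) - 1)
    x = min(max(x, 0), len(image[0]) - 1)
    return image[y][x]


def classify(tree, image, y, x):
    """Walk one tree from the root to a leaf and return its histogram."""
    node = tree
    while node.histogram is None:
        a = intensity(image, y + node.offset_a[0], x + node.offset_a[1])
        b = intensity(image, y + node.offset_b[0], x + node.offset_b[1])
        node = node.left if a - b < node.threshold else node.right
    return node.histogram


def forest_preference(forest, image, y, x):
    """Average the leaf histograms of all trees for pixel (y, x)."""
    histograms = [classify(tree, image, y, x) for tree in forest]
    count = len(histograms)
    return [sum(h[k] for h in histograms) / count
            for k in range(len(histograms[0]))]


# Hand-built forest over two part labels.
tree_1 = Node(offset_a=(0, 0), offset_b=(0, 1), threshold=0.0,
              left=Node(histogram=[1.0, 0.0]),
              right=Node(histogram=[0.0, 1.0]))
tree_2 = Node(histogram=[0.5, 0.5])
image = [[0, 10], [0, 10]]
print(forest_preference([tree_1, tree_2], image, 0, 0))
```

For the pixel at (0, 0) the first tree sees a darker pixel to the left of a brighter one and votes entirely for part 0, the second tree is undecided, and the average preference leans toward part 0.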
Deformed Training Part Labelings • Fits parts tighter 1. Label by grid 2. Learn from data 3. Apply to data 4. Set guesses as truth 5. Relearn
Part Layout • Preference for layout consistency plus additional pairwise costs: • Helps remove noise • Align edges along image edges
Part Layout • Return to toy example Just appearance: 1,2,0,4,5,0,0,1,2,3,3,4,0,0,1,0 With layout costs: 1,2,3,4,5,0,0,1,2,3,3,4,0,0,0,0 instance 1 instance 2
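The effect of layout costs on the toy example can be reproduced with a small 1D model. This sketch is not the method from the paper: on a 1D chain the best labelling can be found exactly with dynamic programming (Viterbi), which is swapped in here for simplicity. The cost values, the rule that background (0) may only follow part 5 and precede part 1, and the name `map_labels` are all assumptions made for illustration.

```python
def map_labels(guesses, num_labels=6, unary_miss=1.0, pair_penalty=2.0):
    """Exact minimum-cost labelling of a 1D chain by dynamic programming.

    Unary cost: 0 if a pixel keeps its appearance guess, unary_miss otherwise.
    Pairwise cost: 0 if neighbours are layout consistent, pair_penalty otherwise.
    """
    def unary(i, label):
        return 0.0 if guesses[i] == label else unary_miss

    def pair(a, b):
        # Consistent: same label, or the next label in the cycle 0,1,...,5,0.
        return 0.0 if (b == a or b == (a + 1) % num_labels) else pair_penalty

    cost = [unary(0, label) for label in range(num_labels)]
    back = []  # back[i-1][b] = best predecessor label for label b at pixel i
    for i in range(1, len(guesses)):
        new_cost, pointers = [], []
        for b in range(num_labels):
            best_a = min(range(num_labels), key=lambda a: cost[a] + pair(a, b))
            new_cost.append(cost[best_a] + pair(best_a, b) + unary(i, b))
            pointers.append(best_a)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last pixel.
    label = min(range(num_labels), key=lambda l: cost[l])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


noisy = [1, 2, 0, 4, 5]           # appearance alone misses part 3
print(map_labels(noisy))           # layout costs fill the gap
```

Keeping the noisy guess costs two inconsistent transitions, while relabelling the middle pixel as part 3 costs a single appearance mismatch, so the layout term wins.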
Instance Layout • Apply weak force trying to keep parts at sane positions relative to instance data (centroid, L/R flip) • Toy example: 0,1,1,1,1,1,2,3,4,5 is bad!
Implementation • Theoretically, finding the global maximum of the conditional distribution • This is “MAP” estimation • MAP = Maximum A Posteriori • In reality, using tricks to find a local maximum • α-expansion, annealed expansion move
Approximating MAP Estimation • Global maximum is intractable • α-expansion • Start with given configuration • For a given new label, ask each pixel: do you want to switch? • Can be solved efficiently with graph cuts • Repeat over all part labels • Annealed expansion move • Relabel grid, but offset to avoid local maxima
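The α-expansion loop described above can be sketched on a tiny 1D problem. In this sketch the binary "keep or switch to α" decision is solved by brute force over all subsets of pixels rather than with graph cuts, which only works for a handful of pixels. The energy (mismatch cost plus a penalty for layout-inconsistent neighbours) is a made-up toy, not the paper's conditional distribution, and all names are hypothetical.

```python
from itertools import product

NUM_LABELS = 6  # background 0 plus parts 1..5


def energy(labels, guesses, unary_miss=1.0, pair_penalty=2.0):
    """Toy energy: appearance mismatches plus layout-inconsistent neighbours."""
    total = sum(unary_miss for l, g in zip(labels, guesses) if l != g)
    for a, b in zip(labels, labels[1:]):
        if not (b == a or b == (a + 1) % NUM_LABELS):
            total += pair_penalty
    return total


def expansion_move(labels, alpha, guesses):
    """Best labelling in which every pixel keeps its label or switches to alpha.

    Brute force stands in for the graph cut used in practice.
    """
    best = list(labels)
    best_energy = energy(best, guesses)
    for switch in product([False, True], repeat=len(labels)):
        candidate = [alpha if s else l for s, l in zip(switch, labels)]
        candidate_energy = energy(candidate, guesses)
        if candidate_energy < best_energy:
            best, best_energy = candidate, candidate_energy
    return best


def alpha_expansion(guesses):
    """Repeat expansion moves over all labels until none lowers the energy."""
    labels = list(guesses)
    improved = True
    while improved:
        improved = False
        for alpha in range(NUM_LABELS):
            candidate = expansion_move(labels, alpha, guesses)
            if energy(candidate, guesses) < energy(labels, guesses):
                labels, improved = candidate, True
    return labels


start = [1, 2, 0, 4, 5]
result = alpha_expansion(start)
print(result, energy(result, start))
```

Each accepted move strictly lowers the energy, so the loop terminates, but the labelling it stops at is only guaranteed to be a local optimum with respect to expansion moves, which is why the slides mention extra tricks such as the annealed expansion move.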
Results Oh, snap!
Thoughts • Bottom-up system is great • No sliding windows • Multiple instances for free • Information about segment boundaries: occlusion vs. completion • Reason about complete segment boundaries?
Derek Hoiem Carnegie Mellon University Carsten Rother Microsoft Research Cambridge John Winn 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation
Introduction • What does it try to solve? • Extend LayoutCRF to be pose and scale invariant • Does it work? • Improvements to LayoutCRF work; 3D information does little • What is interesting about it? • One method for combining 2D methods with a 3D framework • The improvements to 2D are good
Overview • Generate rough 3D model of class • Parts created over 3D model
Overview • Probability distribution
Refinements • Part layout, instance layout take into account 3D position
Refinements • New term: Instance cost
Instance Cost • Eliminates false positives • LayoutCRF: object-background cost • Explain multiple groups with one instance
Refinements • New term: Instance appearance
Instance appearance • Learn color distribution for each instance • Separate groups of pixels: definitely object, definitely background • Use these to learn colors • Apply cost to non-standard-color pixels This would fail…
Implementation Details • Parts are learned separately for each 45° viewing range, and for different scales • Instance layout is also discretized by viewpoint
Results – Comparison to LCRF • A little better (+8% recall) • BUT they actually turn off 3D information for this comparison • Better segmentation
Results – PASCAL 2006 • 61% precision-recall • Previous best: 45% • But, reduced test set • Without 3D: -5% • Without color: -5%
Thoughts • Color, instance costs very nice • Shoehorns LCRF into 3D without much success • LCRF is already somewhat viewpoint-invariant: segments can stretch
Silvio Savarese University of Illinois at Urbana-Champaign Fei-Fei Li Princeton University 3D Generic Object Categorization, Localization and Pose Estimation
Introduction • What does it try to solve? • Multiclass pose-invariant, scale-invariant object recognition • Does it work? • Not well. But it may be due to implementation • Why is it interesting? • Attempts to learn the actual 3D structure of an object • Interesting data structure for 3D info
Overview – Data Structure • Decompose object into large parts; find “canonical view” • Relate parts by mutual appearance
Related Work – Aspect Graphs • Represent stable views rather than parts Aspect graph of a cube: Image [Khoh & Kovesi, 99]
Data Structure for Cube Top Back Left Front Right Bottom
Related Work • Constellation models • Similar, but wraps around in 3D vs.
Implementation – Links • Link from canonical P_I to P_J consists of • Matrix defines transformation to observe P_J when P_I is viewed canonically • A_IJ is skew, t_IJ is translation
Implementation – Links H_IJ Part J canonical view Part I canonical view
Implementation – Links H_JI Part I canonical view Part J canonical view
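The link matrix itself did not survive on the slides above. As a hedged illustration only, the sketch below assumes the link has the usual homogeneous affine form, a 2x2 block A for the skew and a 2-vector t for the translation stacked as [[A, t], [0, 1]], and applies it to a 2D point of a part. The names `make_link` and `apply_link` and the numbers are invented for the example.

```python
def make_link(a, t):
    """Build an assumed 3x3 homogeneous link matrix [[A, t], [0, 1]]."""
    return [
        [a[0][0], a[0][1], t[0]],
        [a[1][0], a[1][1], t[1]],
        [0.0, 0.0, 1.0],
    ]


def apply_link(h, point):
    """Map a 2D point through the link: skew by A, then translate by t."""
    x, y = point
    return (
        h[0][0] * x + h[0][1] * y + h[0][2],
        h[1][0] * x + h[1][1] * y + h[1][2],
    )


# Hypothetical link: a horizontal shear plus a shift.
skew = [[1.0, 0.5], [0.0, 1.0]]
shift = (2.0, -1.0)
link = make_link(skew, shift)
print(apply_link(link, (2.0, 4.0)))  # (6.0, 3.0)
```

Under this assumption, the skew block describes how one part appears distorted when the other is seen in its canonical view, and the translation says where it appears relative to that part.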