How should we combine high level and low level knowledge?
Jitendra Malik, UC Berkeley
Recognition using regions is joint work with Chunhui Gu, Joseph Lim & Pablo Arbelaez (CVPR 2009)
The central problems of vision
• Object and Scene Recognition
• Grouping / Segmentation
• 3D structure / Figure-Ground
Detection and Segmentation: Giraffes
[Figure: original images, each shown with its segmentation]
Detection and Segmentation: Mugs
[Figure: original images, each shown with its segmentation]
Outline
• Current paradigm: Multiscale scanning
• Our approach
  • Bottom-up region segmentation
  • Hough transform style voting (learned weights)
  • Top-down segmentation
• Results on ETHZ, Caltech 101, MSRC
Detection: Is this an X?
Ask this question repeatedly, varying position, scale, category…
Paradigm introduced by Rowley, Baluja & Kanade (1996) for face detection; later used by Viola & Jones (2001), Dalal & Triggs (2005), and Felzenszwalb, McAllester & Ramanan (2008).
Problems with the multi-scale scanning paradigm
• Computational complexity
  • 10^6 windows, 10 scales, 10^4 categories
• Not natural for irregularly shaped objects
• Segmentation is delinked
• Context is delinked
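To make the complexity bullet concrete, here is a back-of-the-envelope count in Python. It assumes the three figures on the slide simply multiply (every window, at every scale, tested against every category); the slide does not spell this out, and the variable names are illustrative only.

```python
# Rough cost of exhaustive multiscale scanning.
# Assumption: the slide's three counts multiply into one total.
windows = 10**6      # candidate window positions (from the slide)
scales = 10          # scales searched (from the slide)
categories = 10**4   # object categories tested (from the slide)

evaluations = windows * scales * categories
print(f"{evaluations:.0e} classifier evaluations per image")
```

Under that assumption the total is 10^11 "Is this an X?" questions per image, which is the cost the region-based approach tries to avoid.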
Our Approach
Perceptual organization provides the right primitives for visual recognition. After more than a decade of work, we finally have high-quality, generic detectors for contours and regions. We now only need to work with ~100 elements, each with its own local scale estimate. In this talk, we demonstrate recognition using regions. Detection and segmentation happen in the same framework. There will always be some errors in the bottom-up grouping process, so the recognition machinery needs to be robust to them.
Region detector wins on any measure!
[Figure panels: Region Benchmarks on BSDS; Region Benchmarks on MSRC/PASCAL08; Probabilistic Rand Index on BSDS; Variation of Information on BSDS]
Parallelizing Image Segmentation
Catanzaro et al., UC Berkeley, ICCV 09
• The GTX 280 is an Nvidia graphics processor, a massively parallel general-purpose computing platform
• 30 cores, 8-wide SIMD = 240-way parallelism
• 140 GB/s memory bandwidth (modern CPUs have ~10-20 GB/s)
• Special memory subsystems for graphics processing
• Sequential implementation: 5 minutes per image
• Parallel, optimized implementation: 2 seconds
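The figures on this slide can be checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; the variable names are illustrative, and the speedup assumes the "2 seconds" figure is also per image.

```python
# Arithmetic behind the GTX 280 slide (all inputs are the slide's own figures).
cores = 30
simd_width = 8
parallel_lanes = cores * simd_width                  # 240-way parallelism

sequential_seconds = 5 * 60                          # 5 minutes per image
parallel_seconds = 2                                 # assumed to be per image too
speedup = sequential_seconds / parallel_seconds      # 150x

gpu_bandwidth = 140                                  # GB/s
cpu_bandwidth_range = (10, 20)                       # GB/s, "modern CPUs"
bandwidth_ratio = tuple(gpu_bandwidth / b for b in cpu_bandwidth_range)

print(parallel_lanes, speedup, bandwidth_ratio)
```

So the reported speedup is about 150x, while the memory bandwidth advantage alone is roughly 7x to 14x.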
Why Use Regions?
• Local estimate of scale; no search necessary
• Shape, color and texture in the same framework
• Hierarchy of regions (“partonomy”) represents scenes, objects, parts. Makes use of context natural.
• Do not suffer from background clutter
• Reduce candidate windows on detection task
  • 1000 to 10000 times fewer windows on the ETHZ dataset
• Need to be robust to segmentation errors
Object Representation using Regions
[Figure panels: Region Segmentation; Bag of Regions]
Region-based Hough Voting
• Recover transformation T(x, y, sx, sy) from matched regions
• Transform exemplar bounding box to query
[Figure: exemplar and query images, each labeled T(x, y, sx, sy)]
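The two steps on this slide can be sketched as follows. The slide names only the parameters T(x, y, sx, sy); this sketch assumes T is a translation plus per-axis scale, estimated from the bounding boxes of one matched region pair. The `Box` and `Transform` types and both function names are hypothetical, not taken from the talk.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


class Transform(NamedTuple):
    x: float   # translation in x
    y: float   # translation in y
    sx: float  # scale in x
    sy: float  # scale in y


def recover_transform(exemplar_region: Box, query_region: Box) -> Transform:
    """Estimate T from one matched region pair (assumed: scale + translation)."""
    sx = (query_region.x_max - query_region.x_min) / (
        exemplar_region.x_max - exemplar_region.x_min)
    sy = (query_region.y_max - query_region.y_min) / (
        exemplar_region.y_max - exemplar_region.y_min)
    # Choose the translation so the exemplar region's corner lands on the query's.
    x = query_region.x_min - sx * exemplar_region.x_min
    y = query_region.y_min - sy * exemplar_region.y_min
    return Transform(x, y, sx, sy)


def apply_transform(t: Transform, box: Box) -> Box:
    """Map an exemplar-frame box into the query image."""
    return Box(t.sx * box.x_min + t.x, t.sy * box.y_min + t.y,
               t.sx * box.x_max + t.x, t.sy * box.y_max + t.y)


# Made-up coordinates: one matched region pair and the exemplar's object box.
t = recover_transform(Box(10, 10, 30, 50), Box(100, 40, 140, 120))
predicted = apply_transform(t, Box(0, 0, 40, 60))
print(t, predicted)
```

Each matched region pair yields one predicted object box in the query image; that box is the pair's vote.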
Region-based Voting
[Figure: query image and Exemplar 1]
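A minimal sketch of how such votes might be accumulated. The slides do not describe the accumulation scheme, so everything here is an assumption: each vote is a predicted box with a positive weight (standing in for the learned weights mentioned in the outline), votes are binned by box center on a coarse grid, and the heaviest bin is returned as a weighted mean box. The function name and `bin_size` parameter are hypothetical.

```python
from collections import defaultdict


def accumulate_votes(votes, bin_size=20.0):
    """Return (weighted mean box, total weight) of the heaviest center bin.

    votes: list of ((x_min, y_min, x_max, y_max), weight), weights > 0.
    Assumed scheme for illustration only, not the authors' method.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    bins = defaultdict(list)
    for box, weight in votes:
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        bins[(int(cx // bin_size), int(cy // bin_size))].append((box, weight))
    # The winning hypothesis is the bin with the largest summed weight.
    best = max(bins.values(), key=lambda members: sum(w for _, w in members))
    total = sum(w for _, w in best)
    mean_box = tuple(sum(w * b[i] for b, w in best) / total for i in range(4))
    return mean_box, total


# Made-up votes: two agree on roughly the same box, one is an outlier.
votes = [((80, 20, 160, 140), 0.6),
         ((84, 22, 164, 142), 0.4),
         ((300, 300, 340, 360), 0.5)]
box, score = accumulate_votes(votes)
print(box, score)
```

The two consistent votes fall in the same bin and outweigh the outlier, which is the behavior that makes voting tolerant of some bad region matches.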