Part 4: Combined segmentation and recognition

Part 4: Combined segmentation and recognition by Rob Fergus (MIT)

Aim
• Given an image and an object category, segment the object
[Figure: cow image + object category model → segmentation → segmented cow]
• Segmentation should (ideally) be
  - shaped like the object, e.g. cow-like
  - obtained efficiently in an unsupervised manner
  - able to handle self-occlusion
Slide from Kumar '05

Feature-detector view

Examples of bottom-up segmentation
• Using Normalized Cuts, Shi & Malik, 1997
Borenstein and Ullman, ECCV 2002

Jigsaw approach: Borenstein and Ullman, 2002

Implicit Shape Model - Leibe and Schiele, 2003
Interest Points → Matched Codebook Entries → Probabilistic Voting → Voting Space (continuous) → Backprojection of Maxima → Backprojected Hypotheses → Refined Hypotheses (uniform sampling) → Segmentation
Leibe and Schiele, 2003, 2005
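The voting and backprojection stages can be illustrated with a minimal sketch. It is not the authors' implementation: the slide describes a continuous voting space, whereas this sketch bins votes on a coarse grid for brevity, and the names `cast_votes`, `best_hypothesis`, `bin_size` and the data layout of `matches` are assumptions made for illustration.

```python
from collections import defaultdict

def cast_votes(matches, bin_size=10):
    """Accumulate weighted votes for the object center.

    matches: list of ((x, y), [((dx, dy), weight), ...]), one entry per
    interest point, holding the center offsets of its matched codebook
    entries (hypothetical layout). Returns {bin: [(weight, (x, y)), ...]}.
    """
    votes = defaultdict(list)
    for (x, y), offsets in matches:
        for (dx, dy), weight in offsets:
            # Discretized stand-in for the continuous voting space.
            bin_key = ((x + dx) // bin_size, (y + dy) // bin_size)
            votes[bin_key].append((weight, (x, y)))
    return votes

def best_hypothesis(votes, bin_size=10):
    """Return (center, score, supporters) for the strongest bin."""
    bin_key, entries = max(votes.items(),
                           key=lambda kv: sum(w for w, _ in kv[1]))
    center = tuple(b * bin_size + bin_size / 2 for b in bin_key)
    score = sum(w for w, _ in entries)
    # Backprojection: the interest points that voted for the maximum.
    supporters = [point for _, point in entries]
    return center, score, supporters

matches = [
    ((10, 10), [((20, 20), 0.5), ((-20, 0), 0.5)]),
    ((50, 30), [((-20, 0), 1.0)]),
    ((30, 50), [((0, -20), 1.0)]),
]
center, score, supporters = best_hypothesis(cast_votes(matches))
print(center, score, supporters)
```

The supporters returned for the maximum are the points from which a segmentation would then be built; that step is not shown here.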

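Random Fields for segmentation
• I = image pixels (observed)
• h = foreground/background labels (hidden), one label per pixel
• θ = parameters
$$\underbrace{p(h \mid I, \theta)}_{\text{Posterior}} \propto \underbrace{p(I, h \mid \theta)}_{\text{Joint}} = \underbrace{p(I \mid h, \theta)}_{\text{Likelihood}} \; \underbrace{p(h \mid \theta)}_{\text{Prior}}$$
• 1. Generative approach models the joint → Markov random field (MRF)
• 2. Discriminative approach models the posterior directly → Conditional random field (CRF)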

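Generative Markov Random Field
$$p(h \mid I, \theta) \propto \underbrace{\prod_i \phi_i(I \mid h_i, \theta_i)}_{\text{Likelihood}} \; \underbrace{\prod_{i,j} \psi_{ij}(h_i, h_j \mid \theta_{ij})}_{\text{MRF Prior}}$$
• Unary potential: φ_i(I | h_i, θ_i)
• Pairwise potential (MRF): ψ_ij(h_i, h_j | θ_ij)
• h (labels) ∈ {foreground, background}, one node h_i per pixel i of I (pixels) in the image plane
• Prior has no dependency on I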
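A toy sketch of this generative model on a 1-D row of pixels may help. It multiplies a unary likelihood term per pixel by a pairwise prior term per neighboring pair, then finds the best labeling by brute force. The Gaussian likelihood, the mean intensities (0.8 foreground, 0.2 background), the Potts-style prior and the weight `beta` are all assumptions chosen for illustration, not values from the slides.

```python
import math
from itertools import product

def unary(pixel, label):
    """Likelihood term: how well the pixel intensity fits the label.
    Assumed means: 0.8 for foreground (1), 0.2 for background (0)."""
    mean = 0.8 if label == 1 else 0.2
    return math.exp(-((pixel - mean) ** 2) / (2 * 0.2 ** 2))

def pairwise(label_a, label_b, beta=2.0):
    """Prior term: Potts-style preference for equal neighboring labels.
    It never looks at the image, matching 'prior has no dependency on I'."""
    return beta if label_a == label_b else 1.0

def unnormalized_posterior(pixels, labels):
    """Product of all unary and pairwise potentials for one labeling."""
    score = 1.0
    for pixel, label in zip(pixels, labels):
        score *= unary(pixel, label)
    for label_a, label_b in zip(labels, labels[1:]):
        score *= pairwise(label_a, label_b)
    return score

def map_labels(pixels):
    """Brute-force MAP labeling; only feasible for a handful of pixels."""
    candidates = product((0, 1), repeat=len(pixels))
    return max(candidates, key=lambda h: unnormalized_posterior(pixels, h))

print(map_labels([0.1, 0.2, 0.55, 0.9, 0.8]))
print(map_labels([0.2, 0.55, 0.2]))
```

In the second call the middle pixel on its own slightly favors foreground, but the smoothness prior outweighs it, so the whole row is labeled background.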

Conditional Random Field (Lafferty, McCallum and Pereira 2001)
Discriminative approach
$$p(h \mid I, \theta) \propto \underbrace{\prod_i \phi_i(h_i, I \mid \theta_i)}_{\text{Unary}} \; \underbrace{\prod_{i,j} \psi_{ij}(h_i, h_j, I \mid \theta_{ij})}_{\text{Pairwise}}$$
• Dependency on I allows introduction of pairwise terms that make use of the image.
• For example, neighboring labels should be similar only if pixel colors are similar → contrast term
e.g. Kumar and Hebert 2003
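One common way to write a contrast term is as a cost that is paid only when neighboring labels differ and that shrinks as the color difference grows. The sketch below uses an exponential of the squared color difference; this specific form and the parameters `lam` and `sigma` are assumptions for illustration, not taken from the slide or from the cited paper.

```python
import math

def contrast_pairwise_cost(label_i, label_j, color_i, color_j,
                           lam=1.0, sigma=0.1):
    """Cost of a label pair on neighboring pixels, dependent on the image.

    Equal labels cost nothing. Different labels cost up to `lam` when the
    colors are similar and close to zero across a strong color edge.
    """
    if label_i == label_j:
        return 0.0
    diff_sq = sum((a - b) ** 2 for a, b in zip(color_i, color_j))
    return lam * math.exp(-diff_sq / (2 * sigma ** 2))

gray = (0.5, 0.5, 0.5)
print(contrast_pairwise_cost(0, 1, gray, gray))             # similar colors
print(contrast_pairwise_cost(0, 1, (0, 0, 0), (1, 1, 1)))   # strong edge
```

A label change is therefore cheap where the image has an edge and expensive inside a uniform region.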

OBJCUT (Kumar, Torr & Zisserman 2005)
• Unary terms: color likelihood, distance from Ω
• Pairwise terms: label smoothness, contrast
• Ω (shape parameter) is a shape prior on the labels from a Layered Pictorial Structure (LPS) model
• Segmentation by:
  - Match LPS model to image (get a number of samples, each with a different pose)
  - Marginalize over the samples using a single graph cut [Boykov & Jolly, 2001]
Figure from Kumar et al., CVPR 2005
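The graph-cut step alone can be sketched on a 1-D chain of pixels. This is a generic s-t min-cut labeling with a plain Edmonds-Karp max-flow, not the OBJCUT implementation: LPS matching and the marginalization over pose samples are not shown, and the per-pixel costs `fg_cost`, `bg_cost` and the constant `smooth` are hypothetical inputs that would in practice come from the unary and pairwise terms.

```python
from collections import deque

def min_cut_labels(fg_cost, bg_cost, smooth):
    """Label a 1-D pixel chain (1 = foreground, 0 = background) minimizing
    sum of per-pixel costs + smooth * (number of neighboring label changes).
    """
    n = len(fg_cost)
    source, sink = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[source][i] = bg_cost[i]  # cut when pixel i ends up background
        cap[i][sink] = fg_cost[i]    # cut when pixel i ends up foreground
    for i in range(n - 1):
        cap[i][i + 1] = cap[i + 1][i] = smooth
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no path left: parent holds the source side of the cut
        flow, v = float("inf"), sink
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Pixels still reachable from the source are labeled foreground.
    return [1 if i in parent else 0 for i in range(n)]

print(min_cut_labels([9, 8, 4, 1, 2], [1, 2, 5, 9, 8], smooth=3))
print(min_cut_labels([1, 1, 5, 1, 1], [9, 9, 4, 9, 9], smooth=3))
```

In the second call the middle pixel is slightly cheaper as background, but paying two label changes costs more than the saving, so the cut keeps the whole row as foreground.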

OBJCUT: Shape prior - Ω - Layered Pictorial Structures (LPS)
• Generative model
• Composition of parts + spatial layout (pairwise configuration)
• Two layers shown (Layer 1, Layer 2): parts in Layer 2 can occlude parts in Layer 1
Kumar, et al. 2004, 2005

OBJCUT: Results
• Using LPS model for cow
• In the absence of a clear boundary between object and background
[Figure: image and segmentation pairs]

Levin & Weiss [ECCV 2006]
• Consistency with fragments segmentation
• Segmentation alignment with image edges
• Resulting min-cut segmentation

Layout Consistent Random Field (Winn and Shotton 2006)
• Classifier: decision forest classifier [Lepetit et al. CVPR 2005]
• Features are differences of pixel intensities
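A pixel-difference feature of the kind named above can be sketched as follows, together with the threshold test a single tree node might apply. The function names, the offset convention and the border clamping are assumptions for illustration; the slide does not specify them.

```python
def pixel_difference_feature(image, x, y, offset_a, offset_b):
    """Intensity difference between two pixels at offsets from (x, y).

    image is a list of rows of intensities; offsets are (dx, dy) pairs.
    Coordinates are clamped to the image border (an assumed convention).
    """
    height, width = len(image), len(image[0])

    def intensity_at(dx, dy):
        col = min(max(x + dx, 0), width - 1)
        row = min(max(y + dy, 0), height - 1)
        return image[row][col]

    return intensity_at(*offset_a) - intensity_at(*offset_b)

def node_test(image, x, y, offset_a, offset_b, threshold):
    """Binary split such as a decision-tree node might use (hypothetical)."""
    return pixel_difference_feature(image, x, y, offset_a, offset_b) > threshold

image = [[0, 10],
         [20, 30]]
print(pixel_difference_feature(image, 0, 0, (1, 0), (0, 1)))
print(node_test(image, 0, 0, (1, 0), (0, 1), threshold=0))
```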

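Layout consistency (Winn and Shotton 2006)
Part labels on a regular grid:
(7,2) (8,2) (9,2)
(7,3) (8,3) (9,3)
(7,4) (8,4) (9,4)
Neighboring pixels: for a pixel labeled (p,q), the neighbor marked ? is layout-consistent if it is labeled (p,q), (p-1,q+1), (p,q+1) or (p+1,q+1)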
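The rule listed on this slide can be written as a small predicate. The sketch covers only the one neighbor direction the slide shows, namely the neighbor in the direction of increasing q, which is assumed here to be the pixel directly below; the function name is hypothetical.

```python
def is_layout_consistent_below(label, label_below):
    """True if the part label of the pixel below is layout-consistent.

    Consistent labels for (p, q) are (p, q) itself and the three grid
    neighbors in the next row: (p-1, q+1), (p, q+1), (p+1, q+1).
    """
    p, q = label
    p_below, q_below = label_below
    if (p_below, q_below) == (p, q):
        return True
    return q_below == q + 1 and abs(p_below - p) <= 1

for candidate in [(8, 3), (7, 4), (8, 4), (9, 4), (8, 2), (9, 3)]:
    print(candidate, is_layout_consistent_below((8, 3), candidate))
```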

Layout Consistent Random Field Layout consistency Part detector Winn and Shotton 2006

Stability of part labelling Part color key

Object-Specific Figure-Ground Segregation Stella X. Yu and Jianbo Shi, 2002

Image parsing: Tu, Zhu and Yuille 2003

Todorovic and Ahuja, CVPR 2006
Overview:
• Training images → multiscale segmentation → segmentation trees → fused tree model for cars
• Unseen image → segment out all the cars → segmented cars
Slide from T. Wu

LOCUS model (Kannan, Jojic and Frey 2004; Winn and Jojic, 2005)
Shared between images:
• Class shape π
• Class edge sprite μo, σo
Different for each image:
• Deformation field D
• Position & size T
• Mask m
• Edge image e
• Background appearance λ0
• Object appearance λ1
• Image

In this section: brief paper reviews
• Jigsaw approach: Borenstein & Ullman, 2001, 2002
• Concurrent recognition and segmentation: Yu and Shi, 2002
• Image parsing: Tu, Zhu & Yuille 2003
• Interleaved segmentation: Leibe & Schiele, 2004, 2005
• OBJCUT: Kumar, Torr, Zisserman 2005
• LOCUS: Winn and Jojic, 2005
• LayoutCRF: Winn and Shotton, 2006
• Levin and Weiss, 2006
• Todorovic and Ahuja, 2006

Summary
• Strengths
  - Explains every pixel of the image
  - Useful for image editing, layering, etc.
• Issues
  - Invariance issues, especially scale and view-point variations
  - Inference difficulties

Conditional Random Fields for Segmentation
• Segmentation map x
• Image I
• High-level local term
• Low-level pairwise term (pixel-wise similarity)

Object-Specific Figure-Ground Segregation Some segmentation/detection results Yu and Shi, 2002

Multiscale Conditional Random Fields for Image Labeling
• Xuming He, Richard S. Zemel, Miguel Á. Carreira-Perpiñán
Conditional Random Fields for Object Recognition
• Ariadna Quattoni, Michael Collins, Trevor Darrell

OBJCUT
• Probability of labelling in addition has a unary potential which depends on distance from Θ (shape parameter)
• Unary potential: Φx(mx|Θ)
• Object category specific MRF: m (labels) with nodes mx, my over pixels x, y of D (pixels) in the image plane
Kumar, et al. 2004, 2005

Localization using features

Levin and Weiss, ECCV 2006

Results: horses

Cows: Results
• Segmentations from interest points
• Single-frame recognition - no temporal continuity used!
Leibe and Schiele, 2003, 2005

Examples of low-level image segmentation
• Normalized Cuts, Shi & Malik, 1997
Borenstein & Ullman, ECCV 2002

LayoutCRF
