Learning to Segment from Diverse Data
M. Pawan Kumar, Haithem Turki, Dan Preston, Daphne Koller
Aim
Learn accurate parameters for a segmentation model
• Segmentation without generic foreground or background classes
• Train using both strongly and weakly supervised data
Data in Vision
"Strong" supervision vs. "weak" supervision (e.g., the image-level label "Car")
Using only strongly supervised data is working with "one hand tied behind the back…"
Types of Data: PASCAL VOC Segmentation Datasets
Specific foreground classes, generic background class
Types of Data: Stanford Background Dataset
Specific background classes, generic foreground class
Types of Data: PASCAL VOC Detection Datasets
Bounding boxes for objects; thousands of freely available images
Current methods only use small, controlled datasets
Types of Data: ImageNet, Caltech, …
Image-level labels (e.g., "Car"); thousands of freely available images
Types of Data: Google Image Search, Flickr, Picasa, …
Noisy data from web search; millions of freely available images
Outline • Region-based Segmentation Model • Problem Formulation • Inference • Results
Region-based Segmentation Model
Pixels, regions, and object models
Outline • Region-based Segmentation Model • Problem Formulation • Inference • Results
Problem Formulation
Treat missing information as latent variables: image x, annotation y, complete annotation (y,h)
Joint feature vector Ψ(x,y,h): region features, detection features, pairwise contrast, pairwise context
Problem Formulation
Treat missing information as latent variables: image x, annotation y, complete annotation (y,h)
Latent Structural SVM: (y*,h*) = argmax_{y,h} wᵀΨ(x,y,h)
Trained by minimizing the overlap loss ∆
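The overlap loss ∆ measures how far a predicted segmentation is from the annotation. Below is a minimal sketch, assuming ∆ is one minus the intersection-over-union overlap averaged over classes; the exact definition used in the paper may differ in detail.

```python
import numpy as np

def overlap_loss(pred, gt, classes):
    """pred, gt: integer label maps of the same shape; classes: labels to score."""
    scores = []
    for c in classes:
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        # a class absent from both maps counts as perfectly predicted
        scores.append(inter / union if union > 0 else 1.0)
    return 1.0 - float(np.mean(scores))

# usage on two tiny label maps
pred = np.array([[1, 1, 0], [2, 2, 0]])
gt   = np.array([[1, 0, 0], [2, 2, 2]])
print(overlap_loss(pred, gt, classes=[0, 1, 2]))   # 0.5
```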
Self-Paced Learning (Kumar, Packer and Koller, 2010)
Start with an initial estimate w₀ and iterate:
• Annotation Consistent Inference: hᵢ = argmax_{h ∈ H} wₜᵀΨ(xᵢ, yᵢ, h)
• Update wₜ₊₁ by solving a biconvex problem:
  min ‖w‖² + C ∑ᵢ vᵢ ξᵢ − K ∑ᵢ vᵢ
  s.t. wᵀΨ(xᵢ, yᵢ, hᵢ) − wᵀΨ(xᵢ, y, h) ≥ ∆(yᵢ, y, h) − ξᵢ   (Loss Augmented Inference)
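Below is a minimal, runnable sketch of the self-paced schedule, with ridge regression standing in for the latent SSVM update (which in the paper requires annotation-consistent and loss-augmented inference over regions). Only the easy-example selection rule (vᵢ = 1 iff C·lossᵢ < K) and the annealing of K follow the objective above; the learner, data, and parameter values are toy stand-ins.

```python
import numpy as np

def self_paced_ridge(X, y, lam=0.1, C=1.0, K=0.2, mu=1.3, iters=20):
    # Toy stand-in for the biconvex update: fix w and select "easy"
    # examples (v_i = 1 iff C * loss_i < K), refit w on them, then
    # anneal K so that harder examples are admitted later.
    n, d = X.shape
    w = np.zeros(d)                                    # initial estimate w0
    for _ in range(iters):
        losses = (X @ w - y) ** 2
        v = (C * losses < K).astype(float)             # easy-example indicators
        if v.sum() == 0:
            v[np.argmin(losses)] = 1.0                 # always keep at least one example
        Xv, yv = X[v > 0], y[v > 0]
        w = np.linalg.solve(Xv.T @ Xv + lam * np.eye(d), Xv.T @ yv)
        K *= mu                                        # admit harder examples over time
    return w

# usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y_obs = X @ w_true + 0.1 * rng.normal(size=100)
print(self_paced_ridge(X, y_obs))
```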
Outline • Region-based Segmentation Model • Problem Formulation • Inference • Results
Generic Classes (Kumar and Koller, 2010)
Dictionary of regions D: merge and intersect current regions with over-segmentations to form putative regions
Select regions: min θᵀy s.t. y ∈ SELECT(D)
Iterate until convergence
Generic Classes
Binary variables: yᵣ(0) = 1 iff r is not selected, yᵣ(1) = 1 iff r is selected
Minimize the energy: min_y ∑ᵣ θᵣ(i) yᵣ(i) + ∑ᵣₛ θᵣₛ(i,j) yᵣₛ(i,j)
s.t. yᵣ(0) + yᵣ(1) = 1   (assign one label to r from L)
yᵣₛ(i,0) + yᵣₛ(i,1) = yᵣ(i) and yᵣₛ(0,j) + yᵣₛ(1,j) = yₛ(j)   (ensure yᵣₛ(i,j) = yᵣ(i) yₛ(j))
∑_{r "covers" u} yᵣ(1) = 1   (each super-pixel u is covered by exactly one selected region)
yᵣ(i), yᵣₛ(i,j) ∈ {0,1}   (binary variables)
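To make the selection problem concrete, here is a minimal sketch of the integer program on a toy instance using the PuLP solver. The regions, super-pixel coverage, and costs θ are invented for illustration, and the pairwise terms yᵣₛ(i,j) are omitted; only the label-assignment and coverage constraints mirror the formulation above.

```python
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

regions = ["r1", "r2", "r3"]
# which candidate regions cover each super-pixel (toy data)
superpixels = {"u1": ["r1", "r2"], "u2": ["r2", "r3"]}
# toy theta_r(1) for selecting each region; theta_r(0) taken as 0 here
unary = {"r1": -2.0, "r2": 1.0, "r3": -0.5}

prob = LpProblem("region_selection", LpMinimize)
y = {(r, i): LpVariable(f"y_{r}_{i}", cat=LpBinary) for r in regions for i in (0, 1)}

# objective: unary part of the energy (pairwise terms omitted for brevity)
prob += lpSum(unary[r] * y[(r, 1)] for r in regions)

for r in regions:
    prob += y[(r, 0)] + y[(r, 1)] == 1                    # one label per region
for u, covering in superpixels.items():
    prob += lpSum(y[(r, 1)] for r in covering) == 1       # each super-pixel covered exactly once

prob.solve()
print({r: int(y[(r, 1)].value()) for r in regions})       # selects r1 and r3 on this toy instance
```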
Generic Classes (Kumar and Koller, 2010)
Simultaneous region selection and labeling
Dictionary of regions D: merge and intersect current regions with over-segmentations to form putative regions
Select regions: min θᵀy s.t. y ∈ SELECT(D), ∆new ≤ ∆prev
Iterate until convergence
[Examples: intermediate segmentations at iterations 1, 3, and 6]
Bounding Boxes
Each row and each column a of the bounding box must be covered by a region of the box's class c:
min θᵀy s.t. y ∈ SELECT(D), ∆new ≤ ∆prev + Kₐ (1 − zₐ)
zₐ ∈ {0,1}, zₐ ≤ ∑_{r "covers" a} yᵣ(c)
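A sketch of the bounding-box constraint in the same toy PuLP style is shown below: one binary slack zₐ per row/column of the box is tied to coverage by regions carrying the box's class, and a penalty Kₐ(1 − zₐ) is paid when a row or column is left uncovered. All names, costs, and coverage data are illustrative, not taken from the paper's code; the image-level constraint of the later slide works the same way, with a single z for the whole image.

```python
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

# toy rows/columns of a bounding box and the regions (of the box's class) covering each
cells = ["row1", "row2", "col1"]
covers = {"row1": ["r1"], "row2": ["r1", "r2"], "col1": ["r2"]}
K = 10.0                                     # penalty per uncovered row/column
cost = {"r1": 0.5, "r2": 1.5}                # toy costs for selecting each region with class c

prob = LpProblem("box_constraint", LpMinimize)
y = {r: LpVariable(f"y_{r}", cat=LpBinary) for r in cost}
z = {a: LpVariable(f"z_{a}", cat=LpBinary) for a in cells}

prob += lpSum(cost[r] * y[r] for r in cost) + lpSum(K * (1 - z[a]) for a in cells)
for a in cells:
    prob += z[a] <= lpSum(y[r] for r in covers[a])        # z_a = 1 only if cell a is covered

prob.solve()
print({r: int(y[r].value()) for r in y}, {a: int(z[a].value()) for a in z})
```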
[Examples: intermediate segmentations at iterations 1, 2, and 4]
Image-Level Labels
The image must contain the specified object class c:
min θᵀy s.t. y ∈ SELECT(D), ∆new ≤ ∆prev + K (1 − z)
z ∈ {0,1}, z ≤ ∑ᵣ yᵣ(c)
Outline • Region-based Segmentation Model • Problem Formulation • Inference • Results
Datasets: PASCAL VOC 2009 + Stanford Background Dataset
PASCAL VOC 2009: 20 foreground classes, generic background class
Stanford Background Dataset: 7 background classes, generic foreground class
PASCAL VOC 2009: train 1274 images, validation 225 images, test 750 images
Stanford Background Dataset: train 572 images, validation 53 images, test 90 images
Baseline: closed-loop learning (CLL), Gould et al., 2009
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9% (improvement over CLL)
SBD: CLL 53.1%, LSVM 54.3% (improvement over CLL)
PASCAL VOC 2009 + 2010 and Stanford Background Dataset
PASCAL VOC: train 1274 images, validation 225 images, test 750 images, plus 1564 images with bounding boxes
Stanford Background Dataset: train 572 images, validation 53 images, test 90 images
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%, BOX 28.3% (improvement over CLL)
SBD: CLL 53.1%, LSVM 54.3%, BOX 54.8% (improvement over CLL)
PASCAL VOC 2009 + 2010 and Stanford Background Dataset
PASCAL VOC: train 1274 images, validation 225 images, test 750 images, plus 1564 images with bounding boxes
Stanford Background Dataset: train 572 images, validation 53 images, test 90 images
Plus 1000 images with image-level labels (ImageNet)
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%, BOX 28.3%, LABEL 28.8% (improvement over CLL)
SBD: CLL 53.1%, LSVM 54.3%, BOX 54.8%, LABEL 55.3% (improvement over CLL)
Types of Data: PASCAL VOC Segmentation Datasets
Specific foreground classes, generic background class
Types of Data: Stanford Background Dataset
Specific background classes, generic foreground class
Types of Data: PASCAL VOC Detection Datasets
Bounding boxes for objects; thousands of freely available images
Types of Data: ImageNet, Caltech, …
Image-level labels (e.g., "Car"); thousands of freely available images
Types of Data: Google Image Search, Flickr, Picasa, …
Noisy data from web search; millions of freely available images
Two Problems
• The "Noise" Problem: Self-Paced Learning
• The "Size" Problem: Self-Paced Learning