Self-Paced Learning for Semantic Segmentation M. Pawan Kumar
Self-Paced Learning for Latent Structural SVM M. Pawan Kumar, Benjamin Packer, Daphne Koller
Aim To learn accurate parameters for latent structural SVM. Input x. Output y ∈ Y. Hidden Variable h ∈ H. "Deer". Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}
Aim To learn accurate parameters for latent structural SVM. Feature Φ(x, y, h) (HOG, BoW). Parameters w. (y*, h*) = argmax_{y∈Y, h∈H} wᵀΦ(x, y, h)
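To make the prediction rule concrete, here is a minimal Python sketch (not from the original slides); `joint_feature` is a hypothetical callable returning Φ(x, y, h), and the search over Y × H is brute force for clarity.

```python
import numpy as np

def predict(x, w, labels, latent_space, joint_feature):
    """Return (y*, h*) = argmax over y in Y, h in H of w^T Phi(x, y, h).

    joint_feature(x, y, h) is a hypothetical callable returning the joint
    feature vector Phi(x, y, h) as a NumPy array (e.g. HOG / BoW features).
    """
    best_score, best_y, best_h = -np.inf, None, None
    for y in labels:                # Y, e.g. the six mammal classes
        for h in latent_space:      # H, e.g. candidate bounding boxes
            score = w @ joint_feature(x, y, h)
            if score > best_score:
                best_score, best_y, best_h = score, y, h
    return best_y, best_h
```

In practice the maximization over h (e.g. over candidate boxes) is usually handled by a specialized routine rather than an explicit double loop.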
Motivation Math is for losers!! Real Numbers. Imaginary Numbers. e^(iπ) + 1 = 0. FAILURE … BAD LOCAL MINIMUM
Motivation Euler was a Genius!! Real Numbers. Imaginary Numbers. e^(iπ) + 1 = 0. SUCCESS … GOOD LOCAL MINIMUM
Motivation Start with "easy" examples, then consider "hard" ones. Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not of single instances. Easy vs. Hard: expensive to determine manually, and easy for a human ≠ easy for the machine.
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Latent Structural SVM Felzenszwalb et al., 2008; Yu and Joachims, 2009. Training samples xᵢ. Ground-truth label yᵢ. Loss Function Δ(yᵢ, yᵢ(w), hᵢ(w))
Latent Structural SVM (yᵢ(w), hᵢ(w)) = argmax_{y∈Y, h∈H} wᵀΦ(xᵢ, y, h). min_w ||w||² + C Σᵢ Δ(yᵢ, yᵢ(w), hᵢ(w)). Non-convex objective: minimize an upper bound.
Latent Structural SVM (yᵢ(w), hᵢ(w)) = argmax_{y∈Y, h∈H} wᵀΦ(xᵢ, y, h). min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. max_{hᵢ} wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ for all y ∈ Y, h ∈ H. Still non-convex, but a difference of convex functions: CCCP Algorithm, converges to a local minimum.
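For reference, substituting the most-violated constraints back into the objective gives the standard difference-of-convex rewriting (following Yu and Joachims, 2009); a sketch in LaTeX using the slide's symbols, where both maxima are convex in w, so the objective is a difference of convex functions and CCCP linearizes the second term, which is exactly the latent-variable imputation on the next slide:

```latex
\min_{w} \; \|w\|^2 \;+\; C \sum_i \Big[
  \max_{\hat{y} \in Y,\, \hat{h} \in H}
    \big( w^\top \Phi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}, \hat{h}) \big)
  \;-\; \max_{h \in H} \, w^\top \Phi(x_i, y_i, h)
\Big]
```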
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Concave-Convex Procedure Start with an initial estimate w₀. Update hᵢ = argmax_{h∈H} wₜᵀΦ(xᵢ, yᵢ, h). Update wₜ₊₁ by solving a convex problem: min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ.
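A minimal Python sketch of this outer loop (an illustration, not the authors' implementation); `impute_latent` and `solve_convex_ssvm` are hypothetical callables standing in for the argmax over H above and for a standard structural-SVM solver, respectively.

```python
def cccp(samples, w0, impute_latent, solve_convex_ssvm,
         max_iters=50, tol=1e-4):
    """CCCP for latent structural SVM (sketch).

    samples:            list of (x_i, y_i) pairs
    impute_latent:      (x_i, y_i, w) -> h_i, the argmax over H of w^T Phi(x_i, y_i, h)
    solve_convex_ssvm:  (samples, imputed_h, w_init) -> (w, objective) for the
                        convex problem with the latent variables held fixed
    """
    w, prev_obj = w0, float("inf")
    for _ in range(max_iters):
        imputed_h = [impute_latent(x, y, w) for x, y in samples]   # linearize the concave part
        w, obj = solve_convex_ssvm(samples, imputed_h, w)          # minimize the convex upper bound
        if prev_obj - obj < tol:                                   # objective is non-increasing
            break
        prev_obj = obj
    return w
```

Each iteration the imputed hᵢ define a convex upper bound that touches the true objective at the current wₜ, so the objective value never increases.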
Concave-Convex Procedure Looks at all samples simultaneously. "Hard" samples will cause confusion. Start with "easy" samples, then consider "hard" ones.
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Self-Paced Learning REMINDER: Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not of single instances.
Self-Paced Learning Start with an initial estimate w₀. Update hᵢ = argmax_{h∈H} wₜᵀΦ(xᵢ, yᵢ, h). Update wₜ₊₁ by solving a convex problem: min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ.
Self-Paced Learning min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ
Self-Paced Learning vᵢ ∈ {0, 1}. min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Trivial Solution: set vᵢ = 0 for all i.
Self-Paced Learning vᵢ ∈ {0, 1}. min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ − Σᵢ vᵢ/K, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Large K: only the easiest samples are selected. Medium K: more samples. Small K: all samples.
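To see how K controls which samples are selected, note that for a fixed w (and hence fixed slacks ξᵢ) the objective above is linear in each vᵢ, so the optimal binary choice has a closed form; a short derivation in the slide's notation:

```latex
\min_{v_i \in \{0,1\}} \; v_i \left( C\,\xi_i - \tfrac{1}{K} \right)
\quad\Longrightarrow\quad
v_i^* =
\begin{cases}
1 & \text{if } \xi_i < \tfrac{1}{CK} \quad \text{(``easy'': small slack)}\\[2pt]
0 & \text{otherwise.}
\end{cases}
```

A large K therefore keeps only samples with very small slack, and as K shrinks every sample is eventually included.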
Self-Paced Learning Relax vᵢ ∈ [0, 1]: a Biconvex Problem, solved by Alternating Convex Search. min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ − Σᵢ vᵢ/K, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Large K. Medium K. Small K.
Self-Paced Learning Start with an initial estimate w₀. Update hᵢ = argmax_{h∈H} wₜᵀΦ(xᵢ, yᵢ, h). Update wₜ₊₁ (and vᵢ) by solving the biconvex problem: min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ − Σᵢ vᵢ/K, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Decrease K ← K/μ (μ > 1).
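Putting the pieces together, here is a minimal Python sketch of the self-paced outer loop (an illustrative reading of the algorithm, not the authors' code); `impute_latent` and `solve_weighted_ssvm` are hypothetical callables, and `mu` > 1 is the annealing factor in K ← K/μ.

```python
def self_paced_learning(samples, w0, impute_latent, solve_weighted_ssvm,
                        C=1.0, K=1.0, mu=1.3, outer_iters=20, acs_iters=10):
    """Self-paced learning for latent structural SVM (sketch).

    solve_weighted_ssvm: (samples, imputed_h, v, w_init) -> (w, slacks)
        minimizes ||w||^2 + C * sum_i v_i * xi_i subject to the margin
        constraints, returning the new w and the per-sample slacks xi_i.
    """
    w = w0
    n = len(samples)
    for _ in range(outer_iters):
        imputed_h = [impute_latent(x, y, w) for x, y in samples]
        v = [1.0] * n                                    # start by selecting every sample
        for _ in range(acs_iters):                       # alternating convex search over (w, v)
            w, slacks = solve_weighted_ssvm(samples, imputed_h, v, w)
            v_new = [1.0 if C * xi < 1.0 / K else 0.0 for xi in slacks]
            if v_new == v:                               # ACS converged for this K
                break
            v = v_new
        K = K / mu                                       # anneal: admit more samples next round
    return w
```

The inner loop is the Alternating Convex Search from the previous slide: for fixed v the problem in w is a weighted structural SVM, and for fixed w the optimal v is the threshold rule ξᵢ < 1/(CK).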
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Object Detection Input x: Image. Output y ∈ Y. Latent h: Box. Loss Δ: 0/1 loss. Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}. Feature Φ(x, y, h): HOG
Object Detection Mammals Dataset 271 images, 6 classes 90/10 train/test split 4 folds
Object Detection [Images omitted: qualitative detections, Self-Paced vs. CCCP]
Object Detection [Figures omitted: objective value and test error]
Handwritten Digit Recognition Input x: Image. Output y ∈ Y. Latent h: Rotation. Loss Δ: 0/1 loss. MNIST Dataset. Y = {0, 1, …, 9}. Feature Φ(x, y, h): PCA + Projection
Handwritten Digit Recognition [Figures omitted: SPL vs. CCCP comparison; statistically significant differences marked]
Motif Finding Input x: DNA Sequence. Output y ∈ Y, Y = {0, 1}. Latent h: Motif Location. Loss Δ: 0/1 loss. Feature Φ(x, y, h): Ng and Cardie, ACL 2002
Motif Finding UniProbe Dataset 40,000 sequences 50/50 train/test split 5 folds
Motif Finding [Figures omitted: average Hamming distance of inferred motifs, objective value, and test error for SPL vs. CCCP]
Noun Phrase Coreference Input x: Nouns. Output y: Clustering. Latent h: Spanning Forest over Nouns. Feature Φ(x, y, h): Yu and Joachims, ICML 2009
Noun Phrase Coreference MUC6 Dataset 60 documents 1 predefined fold 50/50 train/test split
Noun Phrase Coreference [Figures omitted: MITRE loss and pairwise loss for SPL vs. CCCP; significant improvements and decrements marked]
Summary • Automatic Self-Paced Learning • Concave-Biconvex Procedure • Generalization to other latent-variable models • Expectation-Maximization: E-step remains the same; M-step includes indicator variables vᵢ. Kumar, Packer and Koller, NIPS 2010
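As a hedged sketch of the EM bullet above (assuming a generic latent-variable model p(x, h | θ), not a form stated on the slides): the E-step posterior over h is computed exactly as in standard EM at the current θₜ, while the M-step minimizes the selected samples' negative expected complete-data log-likelihood with the same −(1/K) Σᵢ vᵢ selection term:

```latex
(\theta_{t+1}, v) \;=\; \arg\min_{\theta,\, v_i \in \{0,1\}} \;
\sum_i v_i \Big( - \mathbb{E}_{h \sim p(h \mid x_i, \theta_t)}
  \big[ \log p(x_i, h \mid \theta) \big] \Big)
\;-\; \frac{1}{K} \sum_i v_i
```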