Self-Paced Learning for Semantic Segmentation M. Pawan Kumar
Self-Paced Learning for Latent Structural SVM M. Pawan Kumar, Benjamin Packer, Daphne Koller
Aim To learn accurate parameters for latent structural SVM. Input x. Output y ∈ Y. Hidden Variable h ∈ H. "Deer". Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}
Aim To learn accurate parameters for latent structural SVM. Feature Φ(x, y, h) (HOG, BoW). Parameters w. (y*, h*) = argmax_{y∈Y, h∈H} wᵀΦ(x, y, h)
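To make the prediction rule concrete, here is a minimal Python sketch (not from the original slides); `joint_feature` is a hypothetical callable returning Φ(x, y, h), and the search over Y × H is brute force for clarity.

```python
import numpy as np

def predict(x, w, labels, latent_space, joint_feature):
    """Return (y*, h*) = argmax over y in Y, h in H of w^T Phi(x, y, h).

    joint_feature(x, y, h) is a hypothetical callable returning the joint
    feature vector Phi(x, y, h) as a NumPy array (e.g. HOG / BoW features).
    """
    best_score, best_y, best_h = -np.inf, None, None
    for y in labels:                # Y, e.g. the six mammal classes
        for h in latent_space:      # H, e.g. candidate bounding boxes
            score = w @ joint_feature(x, y, h)
            if score > best_score:
                best_score, best_y, best_h = score, y, h
    return best_y, best_h
```

In practice the maximization over h (e.g. over candidate boxes) is usually handled by a specialized routine rather than an explicit double loop.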
Motivation Math is for losers!! Real Numbers. Imaginary Numbers. e^(iπ) + 1 = 0. FAILURE … BAD LOCAL MINIMUM
Motivation Euler was a Genius!! Real Numbers. Imaginary Numbers. e^(iπ) + 1 = 0. SUCCESS … GOOD LOCAL MINIMUM
Motivation Start with "easy" examples, then consider "hard" ones. Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not of single instances. Easy vs. Hard: expensive to determine manually, and easy for a human ≠ easy for the machine.
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Latent Structural SVM Felzenszwalb et al., 2008; Yu and Joachims, 2009. Training samples xᵢ. Ground-truth label yᵢ. Loss Function Δ(yᵢ, yᵢ(w), hᵢ(w))
Latent Structural SVM (yᵢ(w), hᵢ(w)) = argmax_{y∈Y, h∈H} wᵀΦ(xᵢ, y, h). min_w ||w||² + C Σᵢ Δ(yᵢ, yᵢ(w), hᵢ(w)). Non-convex objective: minimize an upper bound.
Latent Structural SVM (yᵢ(w), hᵢ(w)) = argmax_{y∈Y, h∈H} wᵀΦ(xᵢ, y, h). min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. max_{hᵢ} wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ for all y ∈ Y, h ∈ H. Still non-convex, but a difference of convex functions: CCCP Algorithm, converges to a local minimum.
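For reference, substituting the most-violated constraints back into the objective gives the standard difference-of-convex rewriting (following Yu and Joachims, 2009); a sketch in LaTeX using the slide's symbols, where both maxima are convex in w, so the objective is a difference of convex functions and CCCP linearizes the second term, which is exactly the latent-variable imputation on the next slide:

```latex
\min_{w} \; \|w\|^2 \;+\; C \sum_i \Big[
  \max_{\hat{y} \in Y,\, \hat{h} \in H}
    \big( w^\top \Phi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}, \hat{h}) \big)
  \;-\; \max_{h \in H} \, w^\top \Phi(x_i, y_i, h)
\Big]
```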
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Concave-Convex Procedure Start with an initial estimate w₀. Update hᵢ = argmax_{h∈H} wₜᵀΦ(xᵢ, yᵢ, h). Update wₜ₊₁ by solving a convex problem: min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ.
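A minimal Python sketch of this outer loop (an illustration, not the authors' implementation); `impute_latent` and `solve_convex_ssvm` are hypothetical callables standing in for the argmax over H above and for a standard structural-SVM solver, respectively.

```python
def cccp(samples, w0, impute_latent, solve_convex_ssvm,
         max_iters=50, tol=1e-4):
    """CCCP for latent structural SVM (sketch).

    samples:            list of (x_i, y_i) pairs
    impute_latent:      (x_i, y_i, w) -> h_i, the argmax over H of w^T Phi(x_i, y_i, h)
    solve_convex_ssvm:  (samples, imputed_h, w_init) -> (w, objective) for the
                        convex problem with the latent variables held fixed
    """
    w, prev_obj = w0, float("inf")
    for _ in range(max_iters):
        imputed_h = [impute_latent(x, y, w) for x, y in samples]   # linearize the concave part
        w, obj = solve_convex_ssvm(samples, imputed_h, w)          # minimize the convex upper bound
        if prev_obj - obj < tol:                                   # objective is non-increasing
            break
        prev_obj = obj
    return w
```

Each iteration the imputed hᵢ define a convex upper bound that touches the true objective at the current wₜ, so the objective value never increases.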
Concave-Convex Procedure Looks at all samples simultaneously. "Hard" samples will cause confusion. Start with "easy" samples, then consider "hard" ones.
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Self-Paced Learning REMINDER: Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not of single instances.
Self-Paced Learning Start with an initial estimate w₀. Update hᵢ = argmax_{h∈H} wₜᵀΦ(xᵢ, yᵢ, h). Update wₜ₊₁ by solving a convex problem: min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ.
Self-Paced Learning min_{w,ξ} ||w||² + C Σᵢ ξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ
Self-Paced Learning vᵢ ∈ {0, 1}. min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Trivial Solution: set vᵢ = 0 for all i.
Self-Paced Learning vᵢ ∈ {0, 1}. min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ − Σᵢ vᵢ/K, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Large K: only the easiest samples are selected. Medium K: more samples. Small K: all samples.
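To see how K controls which samples are selected, note that for a fixed w (and hence fixed slacks ξᵢ) the objective above is linear in each vᵢ, so the optimal binary choice has a closed form; a short derivation in the slide's notation:

```latex
\min_{v_i \in \{0,1\}} \; v_i \left( C\,\xi_i - \tfrac{1}{K} \right)
\quad\Longrightarrow\quad
v_i^* =
\begin{cases}
1 & \text{if } \xi_i < \tfrac{1}{CK} \quad \text{(``easy'': small slack)}\\[2pt]
0 & \text{otherwise.}
\end{cases}
```

A large K therefore keeps only samples with very small slack, and as K shrinks every sample is eventually included.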
Self-Paced Learning Relax vᵢ ∈ [0, 1]: a Biconvex Problem, solved by Alternating Convex Search. min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ − Σᵢ vᵢ/K, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Large K. Medium K. Small K.
Self-Paced Learning Start with an initial estimate w₀. Update hᵢ = argmax_{h∈H} wₜᵀΦ(xᵢ, yᵢ, h). Update wₜ₊₁ (and vᵢ) by solving the biconvex problem: min_{w,ξ,v} ||w||² + C Σᵢ vᵢξᵢ − Σᵢ vᵢ/K, s.t. wᵀΦ(xᵢ, yᵢ, hᵢ) − wᵀΦ(xᵢ, y, h) ≥ Δ(yᵢ, y, h) − ξᵢ. Decrease K ← K/μ (μ > 1).
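Putting the pieces together, here is a minimal Python sketch of the self-paced outer loop (an illustrative reading of the algorithm, not the authors' code); `impute_latent` and `solve_weighted_ssvm` are hypothetical callables, and `mu` > 1 is the annealing factor in K ← K/μ.

```python
def self_paced_learning(samples, w0, impute_latent, solve_weighted_ssvm,
                        C=1.0, K=1.0, mu=1.3, outer_iters=20, acs_iters=10):
    """Self-paced learning for latent structural SVM (sketch).

    solve_weighted_ssvm: (samples, imputed_h, v, w_init) -> (w, slacks)
        minimizes ||w||^2 + C * sum_i v_i * xi_i subject to the margin
        constraints, returning the new w and the per-sample slacks xi_i.
    """
    w = w0
    n = len(samples)
    for _ in range(outer_iters):
        imputed_h = [impute_latent(x, y, w) for x, y in samples]
        v = [1.0] * n                                    # start by selecting every sample
        for _ in range(acs_iters):                       # alternating convex search over (w, v)
            w, slacks = solve_weighted_ssvm(samples, imputed_h, v, w)
            v_new = [1.0 if C * xi < 1.0 / K else 0.0 for xi in slacks]
            if v_new == v:                               # ACS converged for this K
                break
            v = v_new
        K = K / mu                                       # anneal: admit more samples next round
    return w
```

The inner loop is the Alternating Convex Search from the previous slide: for fixed v the problem in w is a weighted structural SVM, and for fixed w the optimal v is the threshold rule ξᵢ < 1/(CK).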
Outline • Latent Structural SVM • Concave-Convex Procedure • Self-Paced Learning • Experiments
Object Detection Input x: Image. Output y ∈ Y. Latent h: Box. Loss Δ: 0/1 loss. Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}. Feature Φ(x, y, h): HOG
Object Detection Mammals Dataset 271 images, 6 classes 90/10 train/test split 4 folds
Object Detection [Images omitted: qualitative detections, Self-Paced vs. CCCP]
Object Detection [Figures omitted: objective value and test error]
Handwritten Digit Recognition Input x: Image. Output y ∈ Y. Latent h: Rotation. Loss Δ: 0/1 loss. MNIST Dataset. Y = {0, 1, …, 9}. Feature Φ(x, y, h): PCA + Projection
Handwritten Digit Recognition [Figures omitted: SPL vs. CCCP comparison; statistically significant differences marked]
Motif Finding Input x: DNA Sequence. Output y ∈ Y, Y = {0, 1}. Latent h: Motif Location. Loss Δ: 0/1 loss. Feature Φ(x, y, h): Ng and Cardie, ACL 2002
Motif Finding UniProbe Dataset 40,000 sequences 50/50 train/test split 5 folds
Motif Finding [Figures omitted: average Hamming distance of inferred motifs, objective value, and test error for SPL vs. CCCP]
Noun Phrase Coreference Input x: Nouns. Output y: Clustering. Latent h: Spanning Forest over Nouns. Feature Φ(x, y, h): Yu and Joachims, ICML 2009
Noun Phrase Coreference MUC6 Dataset 60 documents 1 predefined fold 50/50 train/test split
Noun Phrase Coreference [Figures omitted: MITRE loss and pairwise loss for SPL vs. CCCP; significant improvements and decrements marked]
Summary • Automatic Self-Paced Learning • Concave-Biconvex Procedure • Generalization to other latent-variable models • Expectation-Maximization: E-step remains the same; M-step includes indicator variables vᵢ. Kumar, Packer and Koller, NIPS 2010
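As a hedged sketch of the EM bullet above (assuming a generic latent-variable model p(x, h | θ), not a form stated on the slides): the E-step posterior over h is computed exactly as in standard EM at the current θₜ, while the M-step minimizes the selected samples' negative expected complete-data log-likelihood with the same −(1/K) Σᵢ vᵢ selection term:

```latex
(\theta_{t+1}, v) \;=\; \arg\min_{\theta,\, v_i \in \{0,1\}} \;
\sum_i v_i \Big( - \mathbb{E}_{h \sim p(h \mid x_i, \theta_t)}
  \big[ \log p(x_i, h \mid \theta) \big] \Big)
\;-\; \frac{1}{K} \sum_i v_i
```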