
Curriculum Learning for Latent Structural SVM


Presentation Transcript


  1. Curriculum Learning for Latent Structural SVM (under submission). M. Pawan Kumar, Benjamin Packer, Daphne Koller

  2. Aim: To learn accurate parameters for latent structural SVM. Input x; Output y ∈ Y; Hidden Variable h ∈ H. Example: “Deer”, with Y = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}

  3. Aim: To learn accurate parameters for latent structural SVM. Feature Φ(x,y,h) (HOG, BoW); Parameters w. Prediction: (y*, h*) = argmaxy∈Y, h∈H wᵀΦ(x,y,h)
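
  As a concrete illustration of this prediction rule, here is a minimal sketch (not the authors' code; feature_fn, labels and latent_space are hypothetical stand-ins for Φ, Y and H):

    import numpy as np

    def predict(w, x, labels, latent_space, feature_fn):
        """Return (y*, h*) = argmax over y in Y, h in H of w^T Phi(x, y, h)."""
        best_pair, best_score = None, -np.inf
        for y in labels:                # Y, e.g. the six mammal classes
            for h in latent_space:      # H, e.g. candidate bounding boxes
                score = w @ feature_fn(x, y, h)
                if score > best_score:
                    best_pair, best_score = (y, h), score
        return best_pair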

  4. Motivation: “Math is for losers!!” Real Numbers, Imaginary Numbers, e^(iπ) + 1 = 0. FAILURE … BAD LOCAL MINIMUM

  5. Motivation: “Euler was a Genius!!” Real Numbers, Imaginary Numbers, e^(iπ) + 1 = 0. SUCCESS … GOOD LOCAL MINIMUM. Curriculum Learning: Bengio et al., ICML 2009

  6. Motivation: Start with “easy” examples, then consider “hard” ones. Simultaneously estimate easiness and parameters. Easiness is a property of the data set, not of single instances; hand-labelling examples as easy vs. hard is expensive, and easy for a human need not mean easy for the machine.

  7. Outline • Latent Structural SVM • Concave-Convex Procedure • Curriculum Learning • Experiments

  8. Latent Structural SVM (Felzenszwalb et al., 2008; Yu and Joachims, 2009). Training samples xi; ground-truth labels yi; loss function Δ(yi, yi(w), hi(w))

  9. Latent Structural SVM. (yi(w), hi(w)) = argmaxy∈Y, h∈H wᵀΦ(xi,y,h). Objective: minw ||w||² + C ∑i Δ(yi, yi(w), hi(w)). Non-convex objective: minimize an upper bound instead.

  10. Latent Structural SVM. (yi(w), hi(w)) = argmaxy∈Y, h∈H wᵀΦ(xi,y,h). Upper bound: min ||w||² + C ∑i ξi, s.t. maxhi∈H wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi for all y, h. Still non-convex, but a difference of convex functions: CCCP algorithm, converges to a local minimum.
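
  To see the difference-of-convex structure, the constraints can be folded into the objective (a standard derivation consistent with this slide, written here in LaTeX; not copied from it). At the optimum the slack for sample i is

    \[
    \xi_i \;=\; \max_{y \in Y,\, h \in H}\Big[\Delta(y_i, y, h) + w^{\top}\Phi(x_i, y, h)\Big]
    \;-\; \max_{h \in H} w^{\top}\Phi(x_i, y_i, h).
    \]

  Both maxima are pointwise maxima of functions linear in w, hence convex, so ||w||² + C ∑i ξi is one convex function minus another, which is exactly the form CCCP handles.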

  11. Outline • Latent Structural SVM • Concave-Convex Procedure • Curriculum Learning • Experiments

  12. Concave-Convex Procedure. Start with an initial estimate w0. Repeat: (i) update hi = argmaxh∈H wtᵀΦ(xi,yi,h); (ii) update wt+1 by solving the convex problem min ||w||² + C ∑i ξi, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi for all y, h.
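
  A rough sketch of this alternation (a sketch only; impute_latent and solve_structural_svm are hypothetical helpers standing in for the two steps, and the inner solve would in practice be a structural-SVM solver):

    def cccp(samples, w0, impute_latent, solve_structural_svm, num_iters=20):
        """samples: list of (x_i, y_i) pairs; w0: initial parameter vector."""
        w = w0
        for t in range(num_iters):
            # Step 1: impute hidden variables with the current parameters,
            #         h_i = argmax_{h in H} w^T Phi(x_i, y_i, h)
            imputed = [impute_latent(w, x, y) for (x, y) in samples]
            # Step 2: with h_i fixed, solve the convex structural-SVM problem
            #         min ||w||^2 + C sum_i xi_i subject to the margin constraints
            w = solve_structural_svm(samples, imputed)
        return w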

  13. Concave-Convex Procedure. Looks at all samples simultaneously, so “hard” samples will cause confusion. Better: start with “easy” samples, then consider “hard” ones.

  14. Outline • Latent Structural SVM • Concave-Convex Procedure • Curriculum Learning • Experiments

  15. Curriculum Learning. REMINDER: Simultaneously estimate easiness and parameters; easiness is a property of the data set, not of single instances.

  16. Curriculum Learning. Start with an initial estimate w0. Repeat: (i) update hi = argmaxh∈H wtᵀΦ(xi,yi,h); (ii) update wt+1 by solving the convex problem min ||w||² + C ∑i ξi, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi.

  17. Curriculum Learning. min ||w||² + C ∑i ξi, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi

  18. Curriculum Learning. Introduce selection variables vi ∈ {0,1}: min ||w||² + C ∑i vi ξi, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi. Trivial solution: set all vi = 0.

  19. Curriculum Learning. vi ∈ {0,1}: min ||w||² + C ∑i vi ξi − ∑i vi/K, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi. [figure: effect of large, medium, and small K]

  20. Curriculum Learning. Relax vi to [0,1]: min ||w||² + C ∑i vi ξi − ∑i vi/K, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi. The relaxed problem is biconvex in (w, v). [figure: effect of large, medium, and small K]

  21. Curriculum Learning. Start with an initial estimate w0. Repeat: (i) update hi = argmaxh∈H wtᵀΦ(xi,yi,h); (ii) update wt+1 and vi by solving min ||w||² + C ∑i vi ξi − ∑i vi/K, s.t. wᵀΦ(xi,yi,hi) − wᵀΦ(xi,y,h) ≥ Δ(yi, y, h) − ξi (convex in w for fixed vi, with a closed-form optimum over vi for fixed w); (iii) decrease K ← K/μ, so that harder (higher-slack) examples are admitted in later rounds.
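
  A sketch of the complete loop under the same assumptions (impute_latent, slack and solve_weighted_structural_svm are hypothetical helpers; the initial K, the annealing factor mu and the iteration count are illustrative values, not taken from the slides):

    def curriculum_learning(samples, w0, impute_latent, slack,
                            solve_weighted_structural_svm,
                            C=1.0, K=100.0, mu=1.3, num_outer=10):
        """Self-paced variant of CCCP: jointly pick easy examples and parameters."""
        w = w0
        for _ in range(num_outer):
            # Impute hidden variables exactly as in CCCP.
            imputed = [impute_latent(w, x, y) for (x, y) in samples]
            # For fixed w, the optimal selection has a closed form:
            # v_i = 1 iff C * xi_i < 1/K, i.e. only low-slack ("easy") examples.
            slacks = [slack(w, x, y, h) for (x, y), h in zip(samples, imputed)]
            v = [1.0 if C * s < 1.0 / K else 0.0 for s in slacks]
            # Update w on the selected examples (convex once v and h are fixed).
            w = solve_weighted_structural_svm(samples, imputed, v, C)
            # Anneal: decrease K so harder examples enter in later rounds.
            K = K / mu
        return w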

  22. Outline • Latent Structural SVM • Concave-Convex Procedure • Curriculum Learning • Experiments

  23. Object Detection. Input x: image; Output y ∈ Y; Latent h: bounding box; Δ: 0/1 loss. Y = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}. Feature Φ(x,y,h): HOG

  24. Object Detection. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds

  25.–28. Object Detection [figures: example results comparing the Curriculum method with CCCP]

  29. Object Detection [figures: objective value and test error]

  30. Handwritten Digit Recognition. Input x: image; Output y ∈ Y; Latent h: rotation; Δ: 0/1 loss. MNIST dataset, Y = {0, 1, …, 9}. Feature Φ(x,y,h): PCA + projection

  31.–34. Handwritten Digit Recognition [figures: results for several values of C, with significant differences marked]

  35. Motif Finding. Input x: DNA sequence; Output y ∈ Y, Y = {0, 1}; Latent h: motif location; Δ: 0/1 loss. Feature Φ(x,y,h): Ng and Cardie, ACL 2002

  36. Motif Finding. UniProbe dataset: 40,000 sequences; 50/50 train/test split; 5 folds

  37. Motif Finding [figure: average Hamming distance of inferred motifs]

  38. Motif Finding [figure: objective value]

  39. Motif Finding [figure: test error]

  40. Noun Phrase Coreference. Input x: nouns; Output y: clustering; Latent h: spanning forest over nouns. Feature Φ(x,y,h): Yu and Joachims, ICML 2009

  41. Noun Phrase Coreference. MUC6 dataset: 60 documents; 1 predefined fold; 50/50 train/test split

  42.–44. Noun Phrase Coreference [figures: MITRE loss and pairwise loss, with significant improvements and decrements marked]

  45. Summary • Automatic Curriculum Learning • Concave-Biconvex Procedure • Generalization to other latent variable models, e.g. Expectation-Maximization: the E-step remains the same, while the M-step includes the indicator variables vi
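
  One way to make the EM remark concrete (a hedged sketch; the exact objective in the paper may differ): keep the usual E-step posterior p(h | xi, yi; θ_old), and add the indicator variables only to the M-step,

    \[
    \min_{\theta,\; v_i \in \{0,1\}} \;\; \sum_i v_i\, \ell_i(\theta) \;-\; \frac{1}{K}\sum_i v_i,
    \qquad
    \ell_i(\theta) \;=\; -\,\mathbb{E}_{h \sim p(h \mid x_i, y_i;\, \theta_{\text{old}})}\big[\log p(x_i, y_i, h;\, \theta)\big],
    \]

  so that, as in the structural SVM case, only samples with small expected negative complete-data log-likelihood (the “easy” ones) participate, and K is annealed downwards over iterations.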
