10 likes | 113 Views
Small K. Medium K. Large K. Objective. Training Error (%). Test Error (%). CCCP. SPL. h. x. y. h. x. 1 vs. 7. 2 vs. 7. 3 vs. 8. 8 vs. 9. Iteration 1. Iteration 3. Iteration 5. Iteration 7. Self-Paced Learning for Latent Variable Models.
E N D
Small K Medium K Large K Objective Training Error (%) Test Error (%) CCCP SPL h x y h x 1 vs. 7 2 vs. 7 3 vs. 8 8 vs. 9 Iteration 1 Iteration 3 Iteration 5 Iteration 7 Self-Paced Learning for Latent Variable Models M. Pawan Kumar, Ben Packer, and Daphne Koller Self-Paced Learning Experiments Aim: To learn an accurate set of parameters for latent variable models Restrict learning to a model-dependent “easy” set of samples General form of objective: Introduce indicator of “easiness” vi: K determines threshold for a set being easy, which is annealed over successive iterations until all samples used Compare Self-Paced Learning to standard CCCP as in [2] Object Classification – Mammals Dataset Motivation Image label y is object class only, h is bounding box Ψ(xi,yi,hi) is HOG features in bounding box (offset by class) minwr(w) + i li(w) • Intuitions from Human Learning: • all information at once may be confusing => bad local minima • start with “easy” examples the learner is prepared to handle Objective Training Error (%) Test Error (%) minwr(w) + ivili(w) – 1/K ivi Standard Learning Self-Paced Learning CCCP ?? Got it! Okay… SPL Motif Finding – UniProbe Dataset Bengio et al. [1]: user-specified ordering “Self-paced” schedule of examples is automatically set by learner x is DNA sequence, h is motif position, y is binding affinity • “easy for human” “easy for computer” • “easy for Learner A” “easy for Learner B” • task-specific • onerous on user Optimization Initialize K to be large Iterate: Run inference over h Alternatively update w and v: v set by sorting li(w), comparing to threshold 1/K Perform normal update for w over subset of data Until convergence Anneal K K/μ Until all vi = 1, cannot reduce objective within tolerance Latent Variable Models x : input or observed variables y : output or observed variables h : hidden/latent variables Handwriting Recognition - MNIST x is raw image, y is digit, h is image rotation, use linear kernel y = “Deer” Easier subsets in early iterations, avoids learning from samples whose hidden variables are imputed incorrectly h = Bounding Box Learning Latent Variable Models Goal: Given D = {(x1,y1), …, (xn,yn)}, learn parameters w. Noun Phrase Coreference – MUC6 x consists of pairwise features between pairs of nouns y is a clustering of nouns h specifies a forest of nouns s.t. each tree is a cluster of nouns Expectation-Maximization for Maximum Likelihood • Maximize log likelihood: • maxwi log P(xi,yi;w) • Iterate: • Find expected value of hidden variables using current w • Update w to maximize log likelihood subject • to this expectation Object Classification Latent Struct SVM [2] • Minimize upper bound on risk minw ||w||2 + C·i maxy’,h’ [w·Ψ(xi,y’,h’) + Δ(yi,y’,h’)] • - C·i maxh [w·Ψ (xi,yi,h)] • Iterate: • Impute hidden variables • Update weights to minimize upper bound on risk given these hidden variables Discussion • Self-paced strategy outperforms state of the art • Global solvers for biconvex optimization may improve accuracy • Method is ideally suited to handle multiple levels of annotations • [1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009. • [2] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009. Motif Finding