Learning Structural SVMs with Latent Variables Xionghao Liu
Annotation Mismatch
Action classification: input x, annotation y, latent h
y = "jumping"
The desired output during test time is y.
There is a mismatch between the desired and the available annotations; the exact value of the latent variable is not "important".
Outline – Annotation Mismatch
• Latent SVM
• Optimization
• Practice
• Extensions
Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
Weakly Supervised Data
Input x, hidden h, output y ∈ {-1,+1}
y = +1
Weakly Supervised Classification
Feature Φ(x,h), joint feature vector Ψ(x,y,h)
Ψ(x,+1,h) = [Φ(x,h); 0]
Ψ(x,-1,h) = [0; Φ(x,h)]
y = +1
Score f : Ψ(x,y,h) → (-∞, +∞)
Optimize the score over all possible y and h
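To make the construction above concrete, here is a minimal sketch of the joint feature vector for the binary case; the helper name joint_feature and the dense NumPy representation are my assumptions, not part of the slides.

```python
import numpy as np

def joint_feature(phi, y, dim):
    """Stack Phi(x,h) into the slot selected by the label y:
    Psi(x,+1,h) = [Phi(x,h); 0] and Psi(x,-1,h) = [0; Phi(x,h)]."""
    psi = np.zeros(2 * dim)
    if y == +1:
        psi[:dim] = phi   # label +1 uses the first block
    else:
        psi[dim:] = phi   # label -1 uses the second block
    return psi
```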
Latent SVM
Scoring function: w^T Ψ(x,y,h), with parameters w
Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h)
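When the label set and the latent space are small, prediction is an exhaustive maximization. The sketch below assumes the joint_feature helper from the previous block and a user-supplied phi_fn(x, h) that returns Φ(x,h); neither name comes from the slides.

```python
def predict(w, x, latent_values, phi_fn, dim):
    """Return (y(w), h(w)) = argmax over y in {-1,+1} and h of w^T Psi(x,y,h)."""
    best, best_score = None, float("-inf")
    for y in (+1, -1):
        for h in latent_values:
            score = w @ joint_feature(phi_fn(x, h), y, dim)
            if score > best_score:
                best, best_score = (y, h), score
    return best
```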
Learning Latent SVM
Training data {(x_i, y_i), i = 1, 2, …, n}
Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w))
No restriction on the loss function Δ (annotation mismatch)
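Since the framework places no restriction on Δ beyond depending only on the annotations, one admissible (but by no means the only) choice is the 0-1 loss sketched below.

```python
def delta(y_true, y_pred):
    """0-1 loss: one admissible choice of Delta(y_i, y); it compares
    annotations only and never looks at the latent variables."""
    return 0.0 if y_true == y_pred else 1.0
```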
Learning Latent SVM
Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w))
Non-convex, and the parameters cannot be regularized
Find a regularization-sensitive upper bound
Learning Latent SVM
Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w))
Learning Latent SVM
Δ(y_i, y_i(w)) ≤ Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i)
since (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h)
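Written out as one chain (restating the two slides above plus the final maximization, nothing new), the upper bound is:

```latex
\Delta(y_i, y_i(w))
  \le \Delta(y_i, y_i(w)) + w^\top \Psi(x_i, y_i(w), h_i(w))
      - \max_{h_i} w^\top \Psi(x_i, y_i, h_i)
  \le \max_{y,h} \left[ \Delta(y_i, y) + w^\top \Psi(x_i, y, h) \right]
      - \max_{h_i} w^\top \Psi(x_i, y_i, h_i)
```

The first inequality holds because w^T Ψ(x_i, y_i(w), h_i(w)) is the maximum score over all (y, h), hence at least max_{h_i} w^T Ψ(x_i, y_i, h_i); the second holds because (y_i(w), h_i(w)) is one particular choice inside the maximization.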
Learning Latent SVM
min_w ||w||^2 + C Σ_i ξ_i
s.t.  max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i
Parameters can be regularized. Is this also convex?
Convex - convex: a difference of convex (DC) program
Recap
Scoring function: w^T Ψ(x,y,h)
Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h)
Learning: min_w ||w||^2 + C Σ_i ξ_i
s.t.  w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i  for all y, h
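For a single training pair the slack in the learning problem can be evaluated directly, which is handy for monitoring the upper bound. This sketch assumes the joint_feature, phi_fn, and delta helpers introduced above.

```python
def slack(w, x, y_true, latent_values, phi_fn, dim, delta):
    """xi_i = max(0, loss-augmented best score - best completion of y_true)."""
    augmented = max(
        w @ joint_feature(phi_fn(x, h), y, dim) + delta(y_true, y)
        for y in (+1, -1) for h in latent_values)
    ground_truth = max(
        w @ joint_feature(phi_fn(x, h), y_true, dim) for h in latent_values)
    return max(0.0, augmented - ground_truth)
```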
Outline – Annotation Mismatch
• Latent SVM
• Optimization
• Practice
• Extensions
Learning Latent SVM
min_w ||w||^2 + C Σ_i ξ_i
s.t.  max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i
Difference of convex (DC) program
Concave-Convex Procedure (CCCP)
The constraint splits into a convex part, max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)], plus a concave part, -max_{h_i} w^T Ψ(x_i, y_i, h_i).
Repeat until convergence:
• Linearly upper-bound the concave part
• Optimize the resulting convex upper bound
How do we obtain the linear upper bound?
Linear Upper Bound
Concave part: -max_{h_i} w^T Ψ(x_i, y_i, h_i)
Current estimate = w_t
h_i* = argmax_{h_i} w_t^T Ψ(x_i, y_i, h_i)
-w^T Ψ(x_i, y_i, h_i*)  ≥  -max_{h_i} w^T Ψ(x_i, y_i, h_i)
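Computing the linear upper bound therefore amounts to one latent-completion step per training example. A minimal sketch, using the same assumed helpers as before:

```python
def impute_latent(w_t, x, y_true, latent_values, phi_fn, dim):
    """h_i* = argmax over h of w_t^T Psi(x_i, y_i, h): the completion that
    linearizes the concave term at the current estimate w_t."""
    return max(latent_values,
               key=lambda h: w_t @ joint_feature(phi_fn(x, h), y_true, dim))
```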
CCCP for Latent SVM
• Start with an initial estimate w_0
• Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i)
• Update w_{t+1} as the ε-optimal solution of
  min ||w||^2 + C Σ_i ξ_i
  s.t.  w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
• Repeat until convergence
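Putting the pieces together, here is a sketch of the full CCCP loop. For brevity the inner convex problem is handled by plain subgradient descent rather than the ε-optimal structural SVM QP described above; all helper names, the enumeration of (y, h), and the hyperparameters are assumptions for illustration.

```python
def cccp_latent_svm(data, latent_values, phi_fn, dim, delta,
                    C=1.0, outer_iters=20, inner_iters=200, lr=1e-3):
    """Alternate latent imputation (linearizing the concave part) with
    minimization of the resulting convex upper bound."""
    w = np.zeros(2 * dim)
    for _ in range(outer_iters):
        # Step 1: impute h_i* with the current parameters (linear upper bound).
        imputed = [impute_latent(w, x, y, latent_values, phi_fn, dim)
                   for x, y in data]
        # Step 2: minimize ||w||^2 + C * sum_i xi_i with h_i* held fixed,
        # here by subgradient descent instead of an epsilon-optimal QP.
        for _ in range(inner_iters):
            grad = 2.0 * w  # gradient of ||w||^2
            for (x, y), h_star in zip(data, imputed):
                # Loss-augmented inference over (y', h').
                y_hat, h_hat = max(
                    ((yp, hp) for yp in (+1, -1) for hp in latent_values),
                    key=lambda c: w @ joint_feature(phi_fn(x, c[1]), c[0], dim)
                    + delta(y, c[0]))
                psi_hat = joint_feature(phi_fn(x, h_hat), y_hat, dim)
                psi_star = joint_feature(phi_fn(x, h_star), y, dim)
                # Subgradient of the slack term when the constraint is violated.
                if w @ psi_hat + delta(y, y_hat) > w @ psi_star:
                    grad += C * (psi_hat - psi_star)
            w -= lr * grad
    return w
```

Because both prediction and loss-augmented inference enumerate every (y, h) pair, this sketch only makes sense for a small, finite latent space; real applications replace these loops with problem-specific inference.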