Loss-based Learning with Weak Supervision M. Pawan Kumar
About the Talk • Methods that use the latent structured SVM • A little math-y • Work in its initial stages
Outline • Latent SSVM • Ranking • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI (Latent SSVM: Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
Weakly Supervised Data: input $x$, output $y \in \{-1,+1\}$, hidden variable $h$ (example image with $y = +1$)
Weakly Supervised Classification: feature $\Phi(x,h)$; joint feature vector $\Psi(x,y,h)$
Weakly Supervised Classification: feature $\Phi(x,h)$; joint feature vector $\Psi(x,+1,h) = \begin{bmatrix} \Phi(x,h) \\ 0 \end{bmatrix}$
Weakly Supervised Classification: feature $\Phi(x,h)$; joint feature vector $\Psi(x,-1,h) = \begin{bmatrix} 0 \\ \Phi(x,h) \end{bmatrix}$
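A minimal Python sketch of this two-block joint feature vector, assuming $\Phi(x,h)$ is given as a plain list of floats; the function name `joint_feature` is hypothetical.

```python
def joint_feature(phi_xh, y):
    """Psi(x, y, h) for binary y: Phi(x, h) goes in the top block when y = +1
    and in the bottom block when y = -1; the other block is all zeros."""
    if y not in (+1, -1):
        raise ValueError("y must be +1 or -1")
    zeros = [0.0] * len(phi_xh)
    return list(phi_xh) + zeros if y == +1 else zeros + list(phi_xh)


# Example with a 2-dimensional feature Phi(x, h) (made-up numbers).
print(joint_feature([0.5, 2.0], +1))  # [0.5, 2.0, 0.0, 0.0]
print(joint_feature([0.5, 2.0], -1))  # [0.0, 0.0, 0.5, 2.0]
```

Splitting $w$ into a top and a bottom half, $w^\top\Psi(x,+1,h)$ scores $\Phi(x,h)$ with the top half only, and $w^\top\Psi(x,-1,h)$ with the bottom half only.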
Weakly Supervised Classification: feature $\Phi(x,h)$; joint feature vector $\Psi(x,y,h)$; score $f: \Psi(x,y,h) \to (-\infty, +\infty)$. Optimize the score over all possible $y$ and $h$.
Latent SSVM • Scoring function: $w^\top\Psi(x,y,h)$ • Prediction: $(y(w), h(w)) = \arg\max_{y,h} w^\top\Psi(x,y,h)$
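The prediction rule can be sketched by brute-force enumeration over $y \in \{-1,+1\}$ and a small, finite set of latent values $h$. This assumes each input is represented by the list of its candidate features $\Phi(x,h)$, one per $h$; the names and numbers are illustrative only.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def joint_feature(phi_xh, y):
    # Stack Phi(x, h) into the top (y = +1) or bottom (y = -1) block.
    zeros = [0.0] * len(phi_xh)
    return list(phi_xh) + zeros if y == +1 else zeros + list(phi_xh)


def predict(w, candidates):
    """(y(w), h(w)) = argmax over y in {+1, -1} and h of w^T Psi(x, y, h).

    `candidates[h]` is Phi(x, h) for the h-th latent value (a hypothetical
    input representation, e.g. one feature vector per bounding box)."""
    pairs = [(y, h) for y in (+1, -1) for h in range(len(candidates))]
    return max(pairs, key=lambda yh: dot(w, joint_feature(candidates[yh[1]], yh[0])))


w = [1.0, 0.0, 0.0, 1.0]               # toy weights: top half for y=+1, bottom for y=-1
candidates = [[0.2, 0.5], [0.9, 0.1]]  # Phi(x, h) for h = 0, 1 (made-up numbers)
print(predict(w, candidates))          # (1, 1)
```

Enumeration is only practical when the latent space is small (for example, a handful of candidate boxes); larger structured spaces need a problem-specific argmax.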
Learning Latent SSVM • Training data $\{(x_i,y_i),\ i = 1,2,\dots,n\}$ • $w^* = \arg\min_w \sum_i \Delta(y_i, y_i(w))$: minimize the empirical risk specified by the loss function $\Delta$ • This objective is highly non-convex in $w$ • It provides no way to regularize $w$ to prevent overfitting
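Learning Latent SSVM • Training data $\{(x_i,y_i),\ i = 1,2,\dots,n\}$ • Upper-bound the loss of each sample:
$\Delta(y_i,y_i(w)) = w^\top\Psi(x_i,y_i(w),h_i(w)) + \Delta(y_i,y_i(w)) - w^\top\Psi(x_i,y_i(w),h_i(w))$
$\le w^\top\Psi(x_i,y_i(w),h_i(w)) + \Delta(y_i,y_i(w)) - \max_{h_i} w^\top\Psi(x_i,y_i,h_i)$
$\le \max_{y,h}\{w^\top\Psi(x_i,y,h) + \Delta(y_i,y)\} - \max_{h_i} w^\top\Psi(x_i,y_i,h_i)$
The first inequality holds because $(y_i(w),h_i(w))$ maximizes the score, so its score is at least $\max_{h_i} w^\top\Psi(x_i,y_i,h_i)$; the second replaces the predicted pair by the maximum over all $(y,h)$.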
Learning Latent SSVM • Training data $\{(x_i,y_i),\ i = 1,2,\dots,n\}$
$\min_w \|w\|^2 + C\sum_i \xi_i$ s.t. $w^\top\Psi(x_i,y,h) + \Delta(y_i,y) - \max_{h_i} w^\top\Psi(x_i,y_i,h_i) \le \xi_i$ for all $y, h$
• Difference-of-convex program in $w$ • CCCP gives a local minimum or saddle point solution
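For a single sample, the smallest slack that satisfies all of its constraints can be computed directly. The sketch below assumes the 0-1 loss for $\Delta$ and enumerable latent values; `slack` and `zero_one` are hypothetical helper names and the numbers are made up.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def joint_feature(phi_xh, y):
    zeros = [0.0] * len(phi_xh)
    return list(phi_xh) + zeros if y == +1 else zeros + list(phi_xh)


def zero_one(y_true, y):
    # Hypothetical choice of Delta: the 0-1 loss.
    return 0.0 if y == y_true else 1.0


def slack(w, candidates, y_true, loss=zero_one):
    """Smallest xi_i satisfying every constraint of sample i:
    max_{y,h} [w^T Psi(x_i,y,h) + Delta(y_i,y)] - max_h w^T Psi(x_i,y_i,h).
    Never negative when Delta(y_i, y_i) = 0, since (y_i, h) is among the pairs."""
    hs = range(len(candidates))
    augmented = max(dot(w, joint_feature(candidates[h], y)) + loss(y_true, y)
                    for y in (+1, -1) for h in hs)
    gold = max(dot(w, joint_feature(candidates[h], y_true)) for h in hs)
    return augmented - gold


w = [1.0, 0.0, 0.0, 1.0]
candidates = [[0.2, 0.5], [0.9, 0.1]]
print(slack(w, candidates, +1))  # about 0.6
print(slack(w, candidates, -1))  # about 1.4
```

With these weights the predicted label is $+1$, so for the second call the 0-1 loss of the prediction is 1, and the slack of about 1.4 upper-bounds it.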
CCCP
• Start with an initial estimate of $w$
• Impute hidden variables (loss independent): $h_i^* = \arg\max_h w^\top\Psi(x_i,y_i,h)$
• Update $w$ (loss dependent): $\min_w \|w\|^2 + C\sum_i \xi_i$ s.t. $w^\top\Psi(x_i,y,h) + \Delta(y_i,y) - w^\top\Psi(x_i,y_i,h_i^*) \le \xi_i$ for all $y, h$
• Repeat until convergence
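A compact CCCP sketch for the binary case. It assumes the 0-1 loss, enumerable latent values, and plain subgradient descent with a fixed step size for the convex update (the slide does not name a solver; others could be substituted). It also assumes convergence is declared once the imputed $h_i^*$ stop changing. All names, hyperparameters and toy data are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def joint_feature(phi_xh, y):
    zeros = [0.0] * len(phi_xh)
    return list(phi_xh) + zeros if y == +1 else zeros + list(phi_xh)


def zero_one(y_true, y):
    return 0.0 if y == y_true else 1.0


def all_pairs(x):
    return [(y, h) for y in (+1, -1) for h in range(len(x))]


def predict(w, x):
    return max(all_pairs(x), key=lambda yh: dot(w, joint_feature(x[yh[1]], yh[0])))


def loss_augmented(w, x, y_true, yh):
    y, h = yh
    return dot(w, joint_feature(x[h], y)) + zero_one(y_true, y)


def cccp(data, w_init, C=1.0, outer_iters=10, inner_iters=200, lr=0.01):
    """data: list of (x, y) with x[h] = Phi(x, h) and y in {+1, -1}."""
    w = list(w_init)
    previous = None
    for _ in range(outer_iters):
        # Impute hidden variables (loss independent): h_i* = argmax_h w^T Psi(x_i, y_i, h).
        h_star = [max(range(len(x)), key=lambda h: dot(w, joint_feature(x[h], y)))
                  for x, y in data]
        if h_star == previous:  # imputation stopped changing: treat as converged
            break
        previous = h_star
        # Update w (loss dependent): subgradient descent on ||w||^2 + C * sum_i xi_i,
        # xi_i = max_{y,h}[w^T Psi(x_i,y,h) + Delta(y_i,y)] - w^T Psi(x_i,y_i,h_i*).
        for _ in range(inner_iters):
            grad = [2.0 * wj for wj in w]
            for (x, y_true), h_i in zip(data, h_star):
                y_hat, h_hat = max(all_pairs(x),
                                   key=lambda yh: loss_augmented(w, x, y_true, yh))
                psi_hat = joint_feature(x[h_hat], y_hat)
                psi_gold = joint_feature(x[h_i], y_true)
                if loss_augmented(w, x, y_true, (y_hat, h_hat)) > dot(w, psi_gold):
                    grad = [g + C * (a - b) for g, a, b in zip(grad, psi_hat, psi_gold)]
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy bags (made-up numbers): each positive bag hides one "object" feature [1, 0].
data = [([[0.0, 1.0], [1.0, 0.0]], +1), ([[1.0, 0.0], [0.0, 1.0]], +1),
        ([[0.0, 1.0]], -1), ([[0.0, 0.8]], -1)]
w = cccp(data, w_init=[0.1, 0.0, 0.0, 0.0])
print([predict(w, x)[0] for x, _ in data])  # [1, 1, -1, -1]
```

Because the objective is non-convex, the result depends on `w_init`: here the initial weights slightly favour the first feature, so the first imputation already picks the object instance in each positive bag.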
Recap • Scoring function: $w^\top\Psi(x,y,h)$ • Prediction: $(y(w), h(w)) = \arg\max_{y,h} w^\top\Psi(x,y,h)$ • Learning: $\min_w \|w\|^2 + C\sum_i \xi_i$ s.t. $w^\top\Psi(x_i,y,h) + \Delta(y_i,y) - \max_{h_i} w^\top\Psi(x_i,y_i,h_i) \le \xi_i$
Outline • Latent SSVM • Ranking • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI (Ranking: joint work with Aseem Behl and C. V. Jawahar)
Ranking (images shown at ranks 1 to 6): Average Precision = 1
Ranking (images shown at ranks 1 to 6), three example rankings: Average Precision = 1, Accuracy = 1 • Average Precision = 0.92 • Average Precision = 0.81, Accuracy = 0.67
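Average precision can be computed as the mean, over positives, of the precision at each positive's rank. A minimal sketch, assuming binary labels listed from rank 1 downwards; the label sequences are hypothetical, chosen to show how AP drops as negatives move up (they happen to give AP values of about 0.92 and 0.81).

```python
def average_precision(labels):
    """AP of a ranked list of 0/1 labels (1 = positive), best rank first."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    # Assumed convention: AP = 0 when the list has no positives.
    return total / hits if hits else 0.0


print(average_precision([1, 1, 1, 0, 0, 0]))  # 1.0
print(average_precision([1, 1, 0, 1, 0, 0]))  # 11/12, about 0.92
print(average_precision([1, 0, 1, 1, 0, 0]))  # 29/36, about 0.81
```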
Ranking • During testing, AP is frequently used as the evaluation measure • During training, a surrogate loss is used instead • This contradicts loss-based learning • Optimize AP directly
Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI (Yue, Finley, Radlinski and Joachims, 2007)
Supervised Learning - Input: training images $X$, split into positives $P$ and negatives $N$; bounding boxes $H = \{H_P, H_N\}$
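Supervised Learning - Output: ranking matrix $Y$ with $Y_{ik} = \begin{cases} +1 & \text{if } i \text{ is better ranked than } k \\ -1 & \text{if } k \text{ is better ranked than } i \\ 0 & \text{if } i \text{ and } k \text{ are ranked equally} \end{cases}$ • Optimal ranking: $Y^*$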
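A small sketch of the ranking matrix restricted to positive-negative pairs, which are the only pairs that enter $\Psi$. It assumes ranks are given as a dict from sample id to rank (1 = best); the ids are hypothetical.

```python
def ranking_matrix(rank, positives, negatives):
    """Y_ik for every positive i and negative k; rank[s] is the rank of sample s."""
    Y = {}
    for i in positives:
        for k in negatives:
            # +1 if i is ranked better (smaller rank) than k, -1 if worse, 0 if tied
            Y[(i, k)] = (rank[i] < rank[k]) - (rank[i] > rank[k])
    return Y


rank = {"p1": 1, "n1": 2, "p2": 3, "n2": 4}
print(ranking_matrix(rank, ["p1", "p2"], ["n1", "n2"]))
# {('p1', 'n1'): 1, ('p1', 'n2'): 1, ('p2', 'n1'): -1, ('p2', 'n2'): 1}
```

The optimal ranking $Y^*$, which places every positive above every negative, has $Y^*_{ik} = +1$ for all of these pairs.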
SSVM Formulation • Joint feature vector: $\Psi(X,Y,\{H_P,H_N\}) = \frac{1}{|P||N|}\sum_{i \in P}\sum_{k \in N} Y_{ik}\,(\Phi(x_i,h_i) - \Phi(x_k,h_k))$ • Scoring function: $w^\top\Psi(X,Y,\{H_P,H_N\})$
Prediction using SSVM: $Y(w) = \arg\max_Y w^\top\Psi(X,Y,\{H_P,H_N\})$ • Sort by the sample score $w^\top\Phi(x_i,h_i)$ • Same as a standard binary SVM
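The reduction of prediction to sorting can be checked on a toy example. This sketch assumes the boxes $h_i$ are fixed, so each sample is given by its feature vector $\Phi(x_i,h_i)$, and evaluates $w^\top\Psi$ through the sample scores by linearity; ids, weights and features are hypothetical.

```python
def sample_score(w, phi):
    return sum(wj * fj for wj, fj in zip(w, phi))


def ranking_score(w, feats, ranking, positives, negatives):
    """w^T Psi(X, Y, {H_P, H_N}) for a strict ranking (sample ids, best first).

    By linearity, w^T Psi = 1/(|P||N|) * sum_{i in P} sum_{k in N} Y_ik (s_i - s_k),
    with s_j = w^T Phi(x_j, h_j) and feats[j] = Phi(x_j, h_j)."""
    position = {s: r for r, s in enumerate(ranking)}
    total = 0.0
    for i in positives:
        for k in negatives:
            y_ik = 1 if position[i] < position[k] else -1
            total += y_ik * (sample_score(w, feats[i]) - sample_score(w, feats[k]))
    return total / (len(positives) * len(negatives))


def predict_ranking(w, feats):
    """Y(w): sort samples by decreasing sample score."""
    return sorted(feats, key=lambda s: sample_score(w, feats[s]), reverse=True)


w = [1.0, -0.5]
feats = {"p1": [0.9, 0.1], "p2": [0.2, 0.4], "n1": [0.5, 0.0], "n2": [0.1, 0.9]}
pred = predict_ranking(w, feats)
print(pred)  # ['p1', 'n1', 'p2', 'n2']
print(ranking_score(w, feats, pred, ["p1", "p2"], ["n1", "n2"]))  # about 0.6
```

Sorting is optimal because each term $Y_{ik}(s_i - s_k)$ is at most $|s_i - s_k|$, and ranking by score attains that bound for every pair at once.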
Learning SSVM: $\min_w \Delta(Y^*, Y(w))$ • Loss $= 1 - \mathrm{AP}$ of the prediction
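Learning SSVM • Upper-bound the loss:
$\Delta(Y^*,Y(w)) = w^\top\Psi(X,Y(w),\{H_P,H_N\}) + \Delta(Y^*,Y(w)) - w^\top\Psi(X,Y(w),\{H_P,H_N\})$
$\le w^\top\Psi(X,Y(w),\{H_P,H_N\}) + \Delta(Y^*,Y(w)) - w^\top\Psi(X,Y^*,\{H_P,H_N\})$, since $Y(w)$ maximizes the score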
Learning SSVM: $\min_w \|w\|^2 + C\xi$ s.t. $\max_Y\{w^\top\Psi(X,Y,\{H_P,H_N\}) + \Delta(Y^*,Y)\} - w^\top\Psi(X,Y^*,\{H_P,H_N\}) \le \xi$
The maximization over $Y$ is the loss augmented inference problem.
Loss Augmented Inference
• Rank the positives according to their sample scores (ranks 1 to 3)
• Rank the negatives according to their sample scores (ranks 4 to 6)
• Slide the best negative to a higher rank; continue until the loss-augmented score stops increasing
• Slide the next negative to a higher rank; continue until the loss-augmented score stops increasing
• Terminate after considering the last negative
This gives optimal loss augmented inference.
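The sliding procedure can be sketched as follows, in the spirit of the construction of Yue, Finley, Radlinski and Joachims (2007). It assumes $\Delta = 1 - \mathrm{AP}$, strict rankings, at least one positive and one negative, and sample scores $s_j = w^\top\Phi(x_j,h_j)$ given as input. Each negative (best first) is slid upward one positive at a time; instead of stopping at the first decrease, this version evaluates every slot and keeps the best cumulative gain, so it does not rely on the gain being unimodal. Function names are hypothetical, and `brute_force_best` is only a check for tiny inputs.

```python
from itertools import permutations


def average_precision(labels):
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits


def objective(ranking, scores, labels):
    """Delta(Y*, Y) + w^T Psi(X, Y, H) with Delta = 1 - AP, for a strict ranking."""
    pos = [s for s in ranking if labels[s]]
    neg = [s for s in ranking if not labels[s]]
    where = {s: r for r, s in enumerate(ranking)}
    psi = sum((1 if where[i] < where[k] else -1) * (scores[i] - scores[k])
              for i in pos for k in neg) / (len(pos) * len(neg))
    return 1.0 - average_precision([labels[s] for s in ranking]) + psi


def loss_augmented_ranking(scores, labels):
    """argmax_Y Delta(Y*, Y) + w^T Psi(X, Y, H); scores[s] = w^T Phi(x_s, h_s)."""
    pos = sorted((s for s in range(len(scores)) if labels[s]), key=lambda s: -scores[s])
    neg = sorted((s for s in range(len(scores)) if not labels[s]), key=lambda s: -scores[s])
    P, N = len(pos), len(neg)
    above = []  # above[j-1] = number of positives ranked above the j-th best negative
    for j, k in enumerate(neg, start=1):
        gain, best_gain, best_a = 0.0, 0.0, P  # start below every positive
        for a in range(P - 1, -1, -1):  # slide above positive i = a + 1 (1-indexed)
            i = a + 1
            # change in 1 - AP: positive i now has j (not j - 1) negatives above it
            gain += (i / (i + j - 1) - i / (i + j)) / P
            # change in w^T Psi: the pair (i, k) flips from Y_ik = +1 to -1
            gain -= 2.0 * (scores[pos[a]] - scores[k]) / (P * N)
            if gain > best_gain:  # strict: ties keep the lower slot
                best_gain, best_a = gain, a
        above.append(best_a)
    ranking, j = [], 0
    for a in range(P + 1):  # merge: negatives with `a` positives above, then positive a+1
        while j < N and above[j] == a:
            ranking.append(neg[j])
            j += 1
        if a < P:
            ranking.append(pos[a])
    return ranking


def brute_force_best(scores, labels):
    return max(objective(list(p), scores, labels) for p in permutations(range(len(scores))))


scores = [0.9, 0.3, 0.5, 0.8, 0.1, 0.4]  # made-up sample scores
labels = [1, 1, 1, 0, 0, 0]
r = loss_augmented_ranking(scores, labels)
print(r)  # [3, 0, 5, 2, 1, 4]
print(abs(objective(r, scores, labels) - brute_force_best(scores, labels)) < 1e-9)  # True
```

The per-negative choices can be made independently because sliding a lower-scoring negative above a given positive never gains more than sliding a higher-scoring one, so the chosen slots come out already ordered.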
Recap • Scoring function: $w^\top\Psi(X,Y,\{H_P,H_N\})$ • Prediction: $Y(w) = \arg\max_Y w^\top\Psi(X,Y,\{H_P,H_N\})$ • Learning: using optimal loss augmented inference
Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI
Weakly Supervised Learning - Input: training images $X$
Weakly Supervised Learning - Latent: bounding boxes $H_P$ of the positive training images in $X$ • All bounding boxes in negative images are negative
Intuitive Prediction Procedure • Select the best bounding box in each image • Rank the images according to the sample scores of these boxes (ranks 1 to 6)
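Weakly Supervised Learning - Output: ranking matrix $Y$ with $Y_{ik} = \begin{cases} +1 & \text{if } i \text{ is better ranked than } k \\ -1 & \text{if } k \text{ is better ranked than } i \\ 0 & \text{if } i \text{ and } k \text{ are ranked equally} \end{cases}$ • Optimal ranking: $Y^*$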
Latent SSVM Formulation • Joint feature vector: $\Psi(X,Y,\{H_P,H_N\}) = \frac{1}{|P||N|}\sum_{i \in P}\sum_{k \in N} Y_{ik}\,(\Phi(x_i,h_i) - \Phi(x_k,h_k))$ • Scoring function: $w^\top\Psi(X,Y,\{H_P,H_N\})$
Prediction using Latent SSVM: $\max_{Y,H} w^\top\Psi(X,Y,\{H_P,H_N\})$
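Prediction using Latent SSVM: $\max_{Y,H} w^\top \frac{1}{|P||N|}\sum_{i \in P}\sum_{k \in N} Y_{ik}\,(\Phi(x_i,h_i) - \Phi(x_k,h_k))$ • Chooses the best bounding box for positives • Chooses the worst bounding box for negatives (when positives are ranked above negatives, each positive's score $w^\top\Phi(x_i,h_i)$ enters with a positive coefficient and each negative's score $w^\top\Phi(x_k,h_k)$ with a negative one) • Not what we wanted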
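For a fixed $Y$, the objective splits into one term per sample, which is why the joint maximization picks the worst box for negatives. A sketch with hypothetical names and a 1-dimensional toy feature; the factor $1/(|P||N|)$ is dropped because it does not change the argmax.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def boxes_maximizing_psi(w, boxes, Y, positives, negatives):
    """For a fixed ranking matrix Y, pick H = {H_P, H_N} maximizing w^T Psi(X, Y, H).

    w^T Psi is a sum of separate terms c_s * w^T Phi(x_s, h_s), with
    c_i = sum_k Y_ik for a positive i and c_k = -sum_i Y_ik for a negative k,
    so each box is chosen independently. boxes[s][h] = Phi(x_s, h)."""
    chosen = {}
    for s in positives + negatives:
        if s in positives:
            coef = sum(Y[(s, k)] for k in negatives)
        else:
            coef = -sum(Y[(i, s)] for i in positives)
        scores = [dot(w, phi) for phi in boxes[s]]
        chosen[s] = max(range(len(scores)), key=lambda h: coef * scores[h])
    return chosen


w = [1.0]
boxes = {"p1": [[0.2], [0.9]], "p2": [[0.7], [0.1]],
         "n1": [[0.8], [0.3]], "n2": [[0.4], [0.6]]}
positives, negatives = ["p1", "p2"], ["n1", "n2"]
Y_star = {(i, k): 1 for i in positives for k in negatives}  # positives above negatives
print(boxes_maximizing_psi(w, boxes, Y_star, positives, negatives))
# {'p1': 1, 'p2': 0, 'n1': 1, 'n2': 0}
```

Negative `n1` is represented by its weakest box (score 0.3) even though it has a box scoring 0.8, which is exactly the behaviour the slide calls unintended.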
Learning Latent SSVM: $\min_w \Delta(Y^*, Y(w))$ • Loss $= 1 - \mathrm{AP}$ of the prediction
Learning Latent SSVM • Upper-bound the loss:
$\Delta(Y^*,Y(w)) = w^\top\Psi(X,Y(w),\{H_P(w),H_N(w)\}) + \Delta(Y^*,Y(w)) - w^\top\Psi(X,Y(w),\{H_P(w),H_N(w)\})$
$\le w^\top\Psi(X,Y(w),\{H_P(w),H_N(w)\}) + \Delta(Y^*,Y(w)) - \max_H w^\top\Psi(X,Y^*,\{H_P,H_N\})$
Learning Latent SSVM: $\min_w \|w\|^2 + C\xi$ s.t. $\max_{Y,H}\{w^\top\Psi(X,Y,\{H_P,H_N\}) + \Delta(Y^*,Y)\} - \max_H w^\top\Psi(X,Y^*,\{H_P,H_N\}) \le \xi$
The loss augmented inference (the maximization over $Y$ and $H$) cannot be solved optimally.
Recap • Unintuitive prediction • Unintuitive objective function • Non-optimal loss augmented inference • Can we do better?
Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI
Latent AP-SVM Formulation • Joint feature vector: $\Psi(X,Y,\{H_P,H_N\}) = \frac{1}{|P||N|}\sum_{i \in P}\sum_{k \in N} Y_{ik}\,(\Phi(x_i,h_i) - \Phi(x_k,h_k))$ • Scoring function: $w^\top\Psi(X,Y,\{H_P,H_N\})$
Prediction using Latent AP-SVM • Choose the best bounding box for all samples: $h_i(w) = \arg\max_h w^\top\Phi(x_i,h)$ • Optimize over the ranking: $Y(w) = \arg\max_Y w^\top\Psi(X,Y,\{H_P(w),H_N(w)\})$ • That is, sort by sample scores
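A minimal sketch of this two-step prediction, assuming each sample comes with a finite list of candidate box features; the function name, ids and numbers are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def latent_ap_svm_predict(w, boxes):
    """h_s(w) = argmax_h w^T Phi(x_s, h) for every sample s, then sort samples by that score.

    boxes[s][h] = Phi(x_s, h). Returns (ranking, chosen box index per sample)."""
    chosen, best_score = {}, {}
    for s, candidates in boxes.items():
        scores = [dot(w, phi) for phi in candidates]
        chosen[s] = max(range(len(scores)), key=scores.__getitem__)
        best_score[s] = scores[chosen[s]]
    ranking = sorted(boxes, key=lambda s: best_score[s], reverse=True)
    return ranking, chosen


w = [1.0]
boxes = {"p1": [[0.2], [0.9]], "p2": [[0.7], [0.1]],
         "n1": [[0.8], [0.3]], "n2": [[0.4], [0.6]]}
ranking, chosen = latent_ap_svm_predict(w, boxes)
print(ranking)  # ['p1', 'n1', 'p2', 'n2']
print(chosen)   # {'p1': 1, 'p2': 0, 'n1': 0, 'n2': 1}
```

Every sample, positive or negative, is represented by its highest-scoring box, which matches the intuitive prediction procedure.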