Ranking with High-Order and Missing Information M. Pawan Kumar (Ecole Centrale Paris), Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar
PASCAL VOC “Jumping” Classification Processing Features Training Classifier
PASCAL VOC “Jumping” Classification Processing Features ✗ Training Classifier Think of a classifier !!!
PASCAL VOC “Jumping” Ranking Processing Features ✗ Training Classifier Think of a classifier !!!
Ranking vs. Classification Rank 1 Rank 2 Rank 3 Rank 4 Rank 5 Rank 6 Average Precision = 1
Ranking vs. Classification Rank 1 to Rank 6: re-orderings of the same six images give Average Precision = 1, 0.92, 0.81, or 0.67, while Accuracy = 1 in every case
Ranking vs. Classification Ranking is not the same as classification, and average precision is not the same as accuracy. Should we use 0-1 loss based classifiers? No: optimize the criterion you will be evaluated on (a basic machine learning principle)!
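The AP numbers above can be reproduced in a few lines; this is a minimal sketch (the six-image, three-positive example follows the slides, everything else is illustrative):

```python
def average_precision(ranked_labels):
    """AP of a ranked list of binary labels (1 = positive), best rank first."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at each positive
    return sum(precisions) / hits

# Six images, three positives: AP depends on where the positives land,
# even when a 0-1 classifier would report the same accuracy.
print(average_precision([1, 1, 1, 0, 0, 0]))  # 1.0  (perfect ranking)
print(average_precision([1, 1, 0, 1, 0, 0]))  # ~0.92 (one positive demoted)
```

Demoting a single positive by one place already costs AP, which a 0-1 loss never sees.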
Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information • Missing Information • Related Work Taskar, Guestrin and Koller, NIPS 2003; Tsochantaridis, Hofmann, Joachims and Altun, ICML 2004
Structured Output SVM Input x, Output y, Joint Feature Ψ(x,y), Scoring function s(x,y;w) = wTΨ(x,y), Prediction y(w) = argmax_y s(x,y;w)
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. Loss function for the i-th sample: Δ(yi, yi(w)). Minimize the regularized sum of losses over the training data. Highly non-convex in w; regularization plays no role (overfitting may occur).
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. Since yi(w) maximizes wTΨ(xi,·), Δ(yi,yi(w)) ≤ wTΨ(xi,yi(w)) + Δ(yi,yi(w)) − wTΨ(xi,yi) ≤ max_y { wTΨ(xi,y) + Δ(yi,y) } − wTΨ(xi,yi). This upper bound is convex and sensitive to regularization of w.
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. min_w ||w||2 + C Σi ξi, s.t. for all y: wTΨ(xi,y) + Δ(yi,y) − wTΨ(xi,yi) ≤ ξi. A quadratic program solvable with cutting planes; the most violated constraint for sample i is given by max_y { wTΨ(xi,y) + Δ(yi,y) }.
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. min_w ||w||2 + C Σi ξi, s.t. for all y: s(xi,y;w) + Δ(yi,y) − s(xi,yi;w) ≤ ξi. A quadratic program solvable with cutting planes; the most violated constraint for sample i is given by max_y { s(xi,y;w) + Δ(yi,y) }.
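The key subroutine of the cutting-plane solver is loss-augmented inference, finding the output that maximizes s(x,y;w) + Δ(yi,y). A toy sketch with a hypothetical separable model (one scalar feature per output bit, Hamming-fraction loss; all names and numbers are illustrative):

```python
import itertools

# Toy structured model: x is three scalars, y in {-1,+1}^3,
# Psi(x,y)_j = y_j * x_j, Delta = fraction of mislabelled bits.
def most_violated_output(w, x, y_true):
    """Loss-augmented inference: argmax_y  w.Psi(x,y) + Delta(y_true, y)."""
    n = len(x)
    def score(y):
        joint = sum(wj * yj * xj for wj, yj, xj in zip(w, y, x))
        delta = sum(yj != tj for yj, tj in zip(y, y_true)) / n
        return joint + delta
    return max(itertools.product([-1, 1], repeat=n), key=score)

w = [-1.0, 1.0, 1.0]            # hypothetical current weights
x = [1.0, -2.0, 0.5]
y_true = (1, -1, 1)
y_hat = most_violated_output(w, x, y_true)  # constraint to add to the QP
```

Each iteration adds the constraint for y_hat to the working set and re-solves the quadratic program.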
Recap • Problem Formulation • Input • Output • Joint Feature Vector or Scoring Function • Learning Formulation • Loss function (‘test’ evaluation criterion) • Optimization for Learning • Cutting plane (loss-augmented inference) • Prediction • Inference
Outline • Structured Output SVM • Optimizing Average Precision (AP-SVM) • High-Order Information • Missing Information • Related Work Yue, Finley, Radlinski and Joachims, SIGIR 2007
Problem Formulation Single Input X: Φ(xi) for all i ∈ P (positives), Φ(xk) for all k ∈ N (negatives)
Problem Formulation Single Output R: Rik = +1 if i is better ranked than k, -1 if k is better ranked than i
Problem Formulation Scoring Function si(w) = wTΦ(xi) for all i ∈ P, sk(w) = wTΦ(xk) for all k ∈ N, S(X,R;w) = Σi∈P Σk∈N Rik (si(w) − sk(w))
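The joint ranking score follows directly from this definition; a minimal sketch with hypothetical scores (two positives, one negative):

```python
def joint_score(scores_pos, scores_neg, R):
    """S(X,R;w) = sum over i in P, k in N of R_ik * (s_i(w) - s_k(w))."""
    return sum(R[i][k] * (si - sk)
               for i, si in enumerate(scores_pos)
               for k, sk in enumerate(scores_neg))

s_pos, s_neg = [2.0, 0.5], [1.0]       # hypothetical individual scores
by_score = [[+1], [-1]]                 # ranking pos0 > neg0 > pos1
flipped = [[-1], [+1]]                  # the reverse ordering
print(joint_score(s_pos, s_neg, by_score))  # 1.5
print(joint_score(s_pos, s_neg, flipped))   # -1.5
```

The ranking that sorts samples by si(w) maximizes S, which is why prediction reduces to sorting by individual score.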
Learning Formulation Loss Function Δ(R*,R) = 1 – AP of rank R
Optimization for Learning Cutting Plane Computation: an optimal greedy algorithm runs in O(|P||N|) time. Yue, Finley, Radlinski and Joachims, SIGIR 2007
Ranking Sort in decreasing order of individual score si(w) Yue, Finley, Radlinski and Joachims, SIGIR 2007
Experiments: PASCAL VOC 2011 images, Poselets features, cross-validation; 10 ranking tasks over the action classes Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
AP-SVM vs. SVM PASCAL VOC ‘test’ Dataset Difference in AP Better in 8 classes, tied in 2 classes
AP-SVM vs. SVM Folds of PASCAL VOC ‘trainval’ Dataset Difference in AP AP-SVM is statistically better in 3 classes SVM is statistically better in 0 classes
Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information (M4-AP-SVM) • Missing Information • Related Work Kumar, Behl, Jawahar and Kumar, Submitted
High-Order Information • People perform similar actions • People strike similar poses • Objects are of same/similar sizes • “Friends” have similar habits • How can we use them for ranking, not just classification?
Problem Formulation Input x = {x1,x2,x3} Output y = {-1,+1}^3 Joint features Ψ(x,y) = [Ψ1(x,y); Ψ2(x,y)], with Ψ1 the unary features and Ψ2 the pairwise features
Learning Formulation Input x = {x1,x2,x3} Output y = {-1,+1}^3 Loss: Δ(y*,y) = fraction of incorrectly classified persons
Optimization for Learning Input x = {x1,x2,x3} Output y = {-1,+1}^3 Loss-augmented inference: max_y wTΨ(x,y) + Δ(y*,y), solved by graph cuts (if supermodular), LP relaxation, or exhaustive search
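For the three-person case the exhaustive-search option is easy to write down. A sketch with hypothetical unary scores theta, an agreement reward lam standing in for the pairwise term, and the fraction-mislabelled loss (graph cuts or an LP relaxation would replace the enumeration on larger problems):

```python
import itertools

def loss_augmented_argmax(theta, lam, y_true):
    """Exhaustive search over y in {-1,+1}^n for
    max_y  sum_i theta_i*y_i + lam*sum_{i<j}[y_i == y_j] + Delta(y_true, y)."""
    n = len(theta)
    def score(y):
        unary = sum(t * yi for t, yi in zip(theta, y))
        pairwise = lam * sum(y[i] == y[j]
                             for i in range(n) for j in range(i + 1, n))
        delta = sum(yi != ti for yi, ti in zip(y, y_true)) / n
        return unary + pairwise + delta
    return max(itertools.product([-1, 1], repeat=n), key=score)

# With a strong agreement reward, the loss term can flip the whole labelling.
y_hat = loss_augmented_argmax([0.2, -0.1, 0.3], 0.5, (1, 1, 1))
```

Here the most violated labelling disagrees with y* everywhere, precisely because the pairwise term makes labels move together.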
Classification Input x = {x1,x2,x3} Output y = {-1,+1}^3 Prediction: max_y wTΨ(x,y), solved by graph cuts (if supermodular), LP relaxation, or exhaustive search
Ranking? Input x = {x1,x2,x3} Output y = {-1,+1}^3 Use the difference of max-marginals
Max-Marginal for Positive Class Input x = {x1,x2,x3} Output y = {-1,+1}^3 Best possible score when person i is positive: mm+(i;w) = max_{y: yi=+1} wTΨ(x,y). Convex in w.
Max-Marginal for Negative Class Input x = {x1,x2,x3} Output y = {-1,+1}^3 Best possible score when person i is negative: mm-(i;w) = max_{y: yi=-1} wTΨ(x,y). Convex in w.
Ranking Input x = {x1,x2,x3} Output y = {-1,+1}^3 HOB-SVM: use the difference of max-marginals, si(w) = mm+(i;w) − mm-(i;w). Difference-of-convex in w.
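Max-marginals for a three-person problem can likewise be computed by brute force. The coupled scoring function below is hypothetical; the point is that the ranking score of one person depends on the labels chosen for the others:

```python
import itertools

def max_marginals(score, n, i):
    """mm+(i;w) and mm-(i;w): best joint score with person i's label clamped."""
    labellings = list(itertools.product([-1, 1], repeat=n))
    mm_pos = max(score(y) for y in labellings if y[i] == +1)
    mm_neg = max(score(y) for y in labellings if y[i] == -1)
    return mm_pos, mm_neg

# Hypothetical coupled model: unary scores plus a reward for agreeing labels.
theta, lam = [1.0, 0.2, -0.1], 0.3
def score(y):
    unary = sum(t * yi for t, yi in zip(theta, y))
    pairwise = lam * sum(y[a] == y[b] for a in range(3) for b in range(a + 1, 3))
    return unary + pairwise

mm_pos, mm_neg = max_marginals(score, 3, 1)
s1 = mm_pos - mm_neg   # HOB-SVM ranking score for person 1
```

Clamping person 1 either way lets the other two labels re-optimize, so s1 reflects the high-order context rather than person 1's unary score alone.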
Ranking Why not optimize AP directly? Combining Max-Margin Max-Marginals with AP-SVM gives M4-AP-SVM, with si(w) = mm+(i;w) − mm-(i;w)
Problem Formulation Single Input X: Φ(xi) for all i ∈ P (positives), Φ(xk) for all k ∈ N (negatives)
Problem Formulation Single Output R: Rik = +1 if i is better ranked than k, -1 if k is better ranked than i
Problem Formulation Scoring Function si(w) = mm+(i;w) − mm-(i;w) for all i ∈ P, sk(w) = mm+(k;w) − mm-(k;w) for all k ∈ N, S(X,R;w) = Σi∈P Σk∈N Rik (si(w) − sk(w))
Learning Formulation Loss Function Δ(R*,R) = 1 – AP of rank R
Optimization for Learning Difference-of-convex program, solved by a very efficient CCCP: the linearization step uses dynamic graph cuts (Kohli and Torr, ECCV 2006) and the update step is equivalent to AP-SVM. Kumar, Behl, Jawahar and Kumar, Submitted
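CCCP itself is easy to illustrate on a scalar difference-of-convex toy. This sketch fixes g(w) = w**2 so the convex surrogate has a closed-form minimizer; it only illustrates the linearize-then-update loop, not the dynamic-graph-cuts and AP-SVM steps used in the paper:

```python
def cccp(grad_h, w0, iters=20):
    """CCCP sketch for min_w g(w) - h(w), with g(w) = w**2 and h convex.
    Each iteration linearizes h at the current point and exactly minimizes
    the convex surrogate g(w) - grad_h(w_t)*w (closed form: w = s/2)."""
    w = w0
    for _ in range(iters):
        s = grad_h(w)       # (sub)gradient of h at the current iterate
        w = s / 2.0         # argmin_w w**2 - s*w
    return w

# Toy objective: f(w) = w**2 - |w - 2|; CCCP converges to w = -0.5.
w_star = cccp(lambda w: -1.0 if w < 2 else 1.0, w0=0.0)
```

Each step is guaranteed not to increase the objective, which is why CCCP is a natural fit once the scoring function is difference-of-convex in w.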
Ranking Sort in decreasing order of individual score si(w)
Experiments: PASCAL VOC 2011 images, Poselets features, cross-validation; 10 ranking tasks over the action classes Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
HOB-SVM vs. AP-SVM PASCAL VOC ‘test’ Dataset Difference in AP Better in 4, worse in 3 and tied in 3 classes
HOB-SVM vs. AP-SVM Folds of PASCAL VOC ‘trainval’ Dataset Difference in AP HOB-SVM is statistically better in 0 classes AP-SVM is statistically better in 0 classes
M4-AP-SVM vs. AP-SVM PASCAL VOC ‘test’ Dataset Difference in AP Better in 7, worse in 2 and tied in 1 class
M4-AP-SVM vs. AP-SVM Folds of PASCAL VOC ‘trainval’ Dataset Difference in AP M4-AP-SVM is statistically better in 4 classes AP-SVM is statistically better in 0 classes
Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information • Missing Information (Latent-AP-SVM) • Related Work Behl, Jawahar and Kumar, CVPR 2014