Ranking with High-Order and Missing Information M. Pawan Kumar (Ecole Centrale Paris), Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar
PASCAL VOC “Jumping” Classification Processing Features Training Classifier
PASCAL VOC “Jumping” Classification Processing Features ✗ Training Classifier Think of a classifier !!!
PASCAL VOC “Jumping” Ranking Processing Features ✗ Training Classifier Think of a classifier !!!
Ranking vs. Classification Rank 1 Rank 2 Rank 3 Rank 4 Rank 5 Rank 6 Average Precision = 1
Ranking vs. Classification Rank 1 to Rank 6: re-orderings of the same six images give Average Precision = 1, 0.92, 0.81, or 0.67, while Accuracy = 1 in every case
Ranking vs. Classification Ranking is not the same as classification, and average precision is not the same as accuracy. Should we use 0-1 loss based classifiers? No: optimize the criterion you will be evaluated on (a basic machine learning principle)!
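The AP numbers above can be reproduced in a few lines; this is a minimal sketch (the six-image, three-positive example follows the slides, everything else is illustrative):

```python
def average_precision(ranked_labels):
    """AP of a ranked list of binary labels (1 = positive), best rank first."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at each positive
    return sum(precisions) / hits

# Six images, three positives: AP depends on where the positives land,
# even when a 0-1 classifier would report the same accuracy.
print(average_precision([1, 1, 1, 0, 0, 0]))  # 1.0  (perfect ranking)
print(average_precision([1, 1, 0, 1, 0, 0]))  # ~0.92 (one positive demoted)
```

Demoting a single positive by one place already costs AP, which a 0-1 loss never sees.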
Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information • Missing Information • Related Work Taskar, Guestrin and Koller, NIPS 2003; Tsochantaridis, Hofmann, Joachims and Altun, ICML 2004
Structured Output SVM Input x, Output y, Joint Feature Ψ(x,y), Scoring function s(x,y;w) = wTΨ(x,y), Prediction y(w) = argmax_y s(x,y;w)
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. Loss function for the i-th sample: Δ(yi, yi(w)). Minimize the regularized sum of losses over the training data. Highly non-convex in w; regularization plays no role (overfitting may occur).
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. Since yi(w) maximizes wTΨ(xi,·), Δ(yi,yi(w)) ≤ wTΨ(xi,yi(w)) + Δ(yi,yi(w)) − wTΨ(xi,yi) ≤ max_y { wTΨ(xi,y) + Δ(yi,y) } − wTΨ(xi,yi). This upper bound is convex and sensitive to regularization of w.
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. min_w ||w||2 + C Σi ξi, s.t. for all y: wTΨ(xi,y) + Δ(yi,y) − wTΨ(xi,yi) ≤ ξi. A quadratic program solvable with cutting planes; the most violated constraint for sample i is given by max_y { wTΨ(xi,y) + Δ(yi,y) }.
Parameter Estimation Training data {(xi,yi), i = 1,2,…,m}. min_w ||w||2 + C Σi ξi, s.t. for all y: s(xi,y;w) + Δ(yi,y) − s(xi,yi;w) ≤ ξi. A quadratic program solvable with cutting planes; the most violated constraint for sample i is given by max_y { s(xi,y;w) + Δ(yi,y) }.
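The key subroutine of the cutting-plane solver is loss-augmented inference, finding the output that maximizes s(x,y;w) + Δ(yi,y). A toy sketch with a hypothetical separable model (one scalar feature per output bit, Hamming-fraction loss; all names and numbers are illustrative):

```python
import itertools

# Toy structured model: x is three scalars, y in {-1,+1}^3,
# Psi(x,y)_j = y_j * x_j, Delta = fraction of mislabelled bits.
def most_violated_output(w, x, y_true):
    """Loss-augmented inference: argmax_y  w.Psi(x,y) + Delta(y_true, y)."""
    n = len(x)
    def score(y):
        joint = sum(wj * yj * xj for wj, yj, xj in zip(w, y, x))
        delta = sum(yj != tj for yj, tj in zip(y, y_true)) / n
        return joint + delta
    return max(itertools.product([-1, 1], repeat=n), key=score)

w = [-1.0, 1.0, 1.0]            # hypothetical current weights
x = [1.0, -2.0, 0.5]
y_true = (1, -1, 1)
y_hat = most_violated_output(w, x, y_true)  # constraint to add to the QP
```

Each iteration adds the constraint for y_hat to the working set and re-solves the quadratic program.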
Recap • Problem Formulation • Input • Output • Joint Feature Vector or Scoring Function • Learning Formulation • Loss function (‘test’ evaluation criterion) • Optimization for Learning • Cutting plane (loss-augmented inference) • Prediction • Inference
Outline • Structured Output SVM • Optimizing Average Precision (AP-SVM) • High-Order Information • Missing Information • Related Work Yue, Finley, Radlinski and Joachims, SIGIR 2007
Problem Formulation Single Input X: Φ(xi) for all i ∈ P (positives), Φ(xk) for all k ∈ N (negatives)
Problem Formulation Single Output R: Rik = +1 if i is better ranked than k, -1 if k is better ranked than i
Problem Formulation Scoring Function si(w) = wTΦ(xi) for all i ∈ P, sk(w) = wTΦ(xk) for all k ∈ N, S(X,R;w) = Σi∈P Σk∈N Rik (si(w) − sk(w))
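The joint ranking score follows directly from this definition; a minimal sketch with hypothetical scores (two positives, one negative):

```python
def joint_score(scores_pos, scores_neg, R):
    """S(X,R;w) = sum over i in P, k in N of R_ik * (s_i(w) - s_k(w))."""
    return sum(R[i][k] * (si - sk)
               for i, si in enumerate(scores_pos)
               for k, sk in enumerate(scores_neg))

s_pos, s_neg = [2.0, 0.5], [1.0]       # hypothetical individual scores
by_score = [[+1], [-1]]                 # ranking pos0 > neg0 > pos1
flipped = [[-1], [+1]]                  # the reverse ordering
print(joint_score(s_pos, s_neg, by_score))  # 1.5
print(joint_score(s_pos, s_neg, flipped))   # -1.5
```

The ranking that sorts samples by si(w) maximizes S, which is why prediction reduces to sorting by individual score.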
Learning Formulation Loss Function Δ(R*,R) = 1 – AP of rank R
Optimization for Learning Cutting Plane Computation: an optimal greedy algorithm runs in O(|P||N|) time. Yue, Finley, Radlinski and Joachims, SIGIR 2007
Ranking Sort in decreasing order of individual score si(w) Yue, Finley, Radlinski and Joachims, SIGIR 2007
Experiments: PASCAL VOC 2011 images, Poselets features, cross-validation; 10 ranking tasks over the action classes Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
AP-SVM vs. SVM PASCAL VOC ‘test’ Dataset Difference in AP Better in 8 classes, tied in 2 classes
AP-SVM vs. SVM Folds of PASCAL VOC ‘trainval’ Dataset Difference in AP AP-SVM is statistically better in 3 classes SVM is statistically better in 0 classes
Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information (M4-AP-SVM) • Missing Information • Related Work Kumar, Behl, Jawahar and Kumar, Submitted
High-Order Information • People perform similar actions • People strike similar poses • Objects are of same/similar sizes • “Friends” have similar habits • How can we use them for ranking, not just classification?
Problem Formulation Input x = {x1,x2,x3} Output y = {-1,+1}^3 Joint features Ψ(x,y) = [Ψ1(x,y); Ψ2(x,y)], with Ψ1 the unary features and Ψ2 the pairwise features
Learning Formulation Input x = {x1,x2,x3} Output y = {-1,+1}^3 Loss: Δ(y*,y) = fraction of incorrectly classified persons
Optimization for Learning Input x = {x1,x2,x3} Output y = {-1,+1}^3 Loss-augmented inference: max_y wTΨ(x,y) + Δ(y*,y), solved by graph cuts (if supermodular), LP relaxation, or exhaustive search
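For the three-person case the exhaustive-search option is easy to write down. A sketch with hypothetical unary scores theta, an agreement reward lam standing in for the pairwise term, and the fraction-mislabelled loss (graph cuts or an LP relaxation would replace the enumeration on larger problems):

```python
import itertools

def loss_augmented_argmax(theta, lam, y_true):
    """Exhaustive search over y in {-1,+1}^n for
    max_y  sum_i theta_i*y_i + lam*sum_{i<j}[y_i == y_j] + Delta(y_true, y)."""
    n = len(theta)
    def score(y):
        unary = sum(t * yi for t, yi in zip(theta, y))
        pairwise = lam * sum(y[i] == y[j]
                             for i in range(n) for j in range(i + 1, n))
        delta = sum(yi != ti for yi, ti in zip(y, y_true)) / n
        return unary + pairwise + delta
    return max(itertools.product([-1, 1], repeat=n), key=score)

# With a strong agreement reward, the loss term can flip the whole labelling.
y_hat = loss_augmented_argmax([0.2, -0.1, 0.3], 0.5, (1, 1, 1))
```

Here the most violated labelling disagrees with y* everywhere, precisely because the pairwise term makes labels move together.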
Classification Input x = {x1,x2,x3} Output y = {-1,+1}^3 Prediction: max_y wTΨ(x,y), solved by graph cuts (if supermodular), LP relaxation, or exhaustive search
Ranking? Input x = {x1,x2,x3} Output y = {-1,+1}^3 Use the difference of max-marginals
Max-Marginal for Positive Class Input x = {x1,x2,x3} Output y = {-1,+1}^3 Best possible score when person i is positive: mm+(i;w) = max_{y: yi=+1} wTΨ(x,y). Convex in w.
Max-Marginal for Negative Class Input x = {x1,x2,x3} Output y = {-1,+1}^3 Best possible score when person i is negative: mm-(i;w) = max_{y: yi=-1} wTΨ(x,y). Convex in w.
Ranking Input x = {x1,x2,x3} Output y = {-1,+1}^3 HOB-SVM: use the difference of max-marginals, si(w) = mm+(i;w) − mm-(i;w). Difference-of-convex in w.
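Max-marginals for a three-person problem can likewise be computed by brute force. The coupled scoring function below is hypothetical; the point is that the ranking score of one person depends on the labels chosen for the others:

```python
import itertools

def max_marginals(score, n, i):
    """mm+(i;w) and mm-(i;w): best joint score with person i's label clamped."""
    labellings = list(itertools.product([-1, 1], repeat=n))
    mm_pos = max(score(y) for y in labellings if y[i] == +1)
    mm_neg = max(score(y) for y in labellings if y[i] == -1)
    return mm_pos, mm_neg

# Hypothetical coupled model: unary scores plus a reward for agreeing labels.
theta, lam = [1.0, 0.2, -0.1], 0.3
def score(y):
    unary = sum(t * yi for t, yi in zip(theta, y))
    pairwise = lam * sum(y[a] == y[b] for a in range(3) for b in range(a + 1, 3))
    return unary + pairwise

mm_pos, mm_neg = max_marginals(score, 3, 1)
s1 = mm_pos - mm_neg   # HOB-SVM ranking score for person 1
```

Clamping person 1 either way lets the other two labels re-optimize, so s1 reflects the high-order context rather than person 1's unary score alone.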
Ranking Why not optimize AP directly? Combining Max-Margin Max-Marginals with AP-SVM gives M4-AP-SVM, with si(w) = mm+(i;w) − mm-(i;w)
Problem Formulation Single Input X: Φ(xi) for all i ∈ P (positives), Φ(xk) for all k ∈ N (negatives)
Problem Formulation Single Output R: Rik = +1 if i is better ranked than k, -1 if k is better ranked than i
Problem Formulation Scoring Function si(w) = mm+(i;w) − mm-(i;w) for all i ∈ P, sk(w) = mm+(k;w) − mm-(k;w) for all k ∈ N, S(X,R;w) = Σi∈P Σk∈N Rik (si(w) − sk(w))
Learning Formulation Loss Function Δ(R*,R) = 1 – AP of rank R
Optimization for Learning Difference-of-convex program, solved by a very efficient CCCP: the linearization step uses dynamic graph cuts (Kohli and Torr, ECCV 2006) and the update step is equivalent to AP-SVM. Kumar, Behl, Jawahar and Kumar, Submitted
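CCCP itself is easy to illustrate on a scalar difference-of-convex toy. This sketch fixes g(w) = w**2 so the convex surrogate has a closed-form minimizer; it only illustrates the linearize-then-update loop, not the dynamic-graph-cuts and AP-SVM steps used in the paper:

```python
def cccp(grad_h, w0, iters=20):
    """CCCP sketch for min_w g(w) - h(w), with g(w) = w**2 and h convex.
    Each iteration linearizes h at the current point and exactly minimizes
    the convex surrogate g(w) - grad_h(w_t)*w (closed form: w = s/2)."""
    w = w0
    for _ in range(iters):
        s = grad_h(w)       # (sub)gradient of h at the current iterate
        w = s / 2.0         # argmin_w w**2 - s*w
    return w

# Toy objective: f(w) = w**2 - |w - 2|; CCCP converges to w = -0.5.
w_star = cccp(lambda w: -1.0 if w < 2 else 1.0, w0=0.0)
```

Each step is guaranteed not to increase the objective, which is why CCCP is a natural fit once the scoring function is difference-of-convex in w.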
Ranking Sort in decreasing order of individual score si(w)
Experiments: PASCAL VOC 2011 images, Poselets features, cross-validation; 10 ranking tasks over the action classes Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
HOB-SVM vs. AP-SVM PASCAL VOC ‘test’ Dataset Difference in AP Better in 4, worse in 3 and tied in 3 classes
HOB-SVM vs. AP-SVM Folds of PASCAL VOC ‘trainval’ Dataset Difference in AP HOB-SVM is statistically better in 0 classes AP-SVM is statistically better in 0 classes
M4-AP-SVM vs. AP-SVM PASCAL VOC ‘test’ Dataset Difference in AP Better in 7, worse in 2 and tied in 1 class
M4-AP-SVM vs. AP-SVM Folds of PASCAL VOC ‘trainval’ Dataset Difference in AP M4-AP-SVM is statistically better in 4 classes AP-SVM is statistically better in 0 classes
Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information • Missing Information (Latent-AP-SVM) • Related Work Behl, Jawahar and Kumar, CVPR 2014