
A Support Vector Method for Optimizing Average Precision


Presentation Transcript


  1. A Support Vector Method for Optimizing Average Precision. SIGIR 2007. Yisong Yue, Cornell University. In collaboration with: Thomas Finley, Filip Radlinski, Thorsten Joachims (Cornell University)

  2. Motivation • Learn to Rank Documents • Optimize for IR performance measures • Mean Average Precision • Leverage Structural SVMs [Tsochantaridis et al. 2005]

  3. MAP vs Accuracy • Average precision (AP) is the average of the precision scores at the rank positions of the relevant documents. • Mean Average Precision (MAP) is the mean of the AP scores over a group of queries. • A machine learning algorithm optimizing for accuracy might learn a very different model than one optimizing for MAP. • Ex (ranking figures from the slide omitted): one ranking can have an AP of only about 0.64 yet a maximum accuracy of 0.8, versus a maximum accuracy of 0.6 for a higher-AP ranking; the sketch below reproduces these numbers.
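
A minimal sketch (hypothetical rankings, not the ones pictured on the original slide) reproducing the numbers above: ranking A alternates relevant (1) and non-relevant (0) documents, so it has the higher AP but no cutoff classifies it well; ranking B has AP ≈ 0.64, yet a cutoff after position 5 reaches accuracy 0.8.

    import numpy as np

    def average_precision(ranking):
        ranking = np.asarray(ranking)
        ranks = np.flatnonzero(ranking) + 1            # 1-based positions of relevant docs
        return np.mean((np.arange(len(ranks)) + 1) / ranks)

    def best_threshold_accuracy(ranking):
        # Best accuracy over all cutoffs k: predict the top-k documents relevant.
        ranking = np.asarray(ranking)
        n = len(ranking)
        return max(((ranking[:k] == 1).sum() + (ranking[k:] == 0).sum()) / n
                   for k in range(n + 1))

    A = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # interleaved: higher AP, poorly separable
    B = [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]   # lower AP, but one cutoff separates well

    print(average_precision(A), best_threshold_accuracy(A))   # ~0.68, 0.6
    print(average_precision(B), best_threshold_accuracy(B))   # ~0.64, 0.8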

  4. Recent Related Work Greedy & Local Search • [Metzler & Croft 2005] – optimized for MAP using gradient descent; expensive for large numbers of features. • [Caruana et al. 2004] – iteratively built an ensemble to greedily improve arbitrary performance measures. Surrogate Performance Measures • [Burges et al. 2005] – used neural nets optimizing for cross entropy. • [Cao et al. 2006] – used SVMs optimizing for a modified ROC-Area. Relaxations • [Xu & Li 2007] – used boosting with an exponential-loss relaxation

  5. Conventional SVMs • Input examples denoted by x (high-dimensional points) • Output targets denoted by y (either +1 or -1) • SVMs learn a hyperplane w; predictions are sign(wᵀx) • Training involves finding the w which minimizes ½‖w‖² + C Σᵢ ξᵢ • subject to yᵢ(wᵀxᵢ) ≥ 1 − ξᵢ and ξᵢ ≥ 0 for all i • The sum of slacks Σᵢ ξᵢ upper-bounds the accuracy loss
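
A quick numerical check of the last bullet (toy data; the variables here are hypothetical, not from the paper): the hinge slack ξᵢ = max(0, 1 − yᵢwᵀxᵢ) is at least 1 whenever sign(wᵀxᵢ) ≠ yᵢ, so the slack sum bounds the number of training errors.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                   # 100 examples, 5 features
    y = np.where(rng.random(100) < 0.5, 1.0, -1.0)  # random +/-1 labels
    w = rng.normal(size=5)                          # any hyperplane, learned or not

    slacks = np.maximum(0.0, 1.0 - y * (X @ w))     # hinge slack per example
    mistakes = np.sign(X @ w) != y                  # 0/1 accuracy loss per example
    assert np.all(slacks[mistakes] >= 1.0)          # each mistake incurs slack >= 1
    assert slacks.sum() >= mistakes.sum()           # sum of slacks bounds total errors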

  6. Adapting to Average Precision • Let x denote the set of documents for a query • Let y denote a (weak) ranking, encoded pairwise (each yᵢⱼ ∈ {-1, 0, +1}) • Same objective function: minimize ½‖w‖² + C Σ ξ • Constraints are defined for each incorrect labeling y′ over the set of documents x. • The joint discriminant score for the correct labeling must be at least as large as for any incorrect labeling, plus the performance loss.

  7. Adapting to Average Precision • Minimize ½‖w‖² + C Σ ξ subject to wᵀΨ(x, y) ≥ wᵀΨ(x, y′) + Δ(y, y′) − ξ for every incorrect ranking y′, where Ψ(x, y) is the pairwise joint feature map Ψ(x, y) = (1 / (|rel|·|nonrel|)) Σᵢ Σⱼ yᵢⱼ (xᵢ − xⱼ) over relevant i and non-relevant j, and Δ(y, y′) = 1 − AP(y′) • Sum of slacks upper-bounds the MAP loss. • After learning w, a prediction is made by sorting on wᵀxᵢ
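
A small sketch of the two ingredients, Δ and Ψ (my notation; the 1/(|rel|·|nonrel|) normalization is as reconstructed above and should be treated as an assumption):

    import numpy as np

    def average_precision(labels):
        # labels: 0/1 relevance judgments in ranked order
        ranks = np.flatnonzero(labels) + 1
        return np.mean((np.arange(len(ranks)) + 1) / ranks)

    def map_loss(labels):
        return 1.0 - average_precision(labels)      # Delta(y, y')

    def joint_feature(X_rel, X_non, above):
        # above[i, j] = +1 if relevant doc i is ranked above non-relevant doc j,
        # else -1.  Psi(x, y) = (1/(R*N)) * sum_ij above[i, j] * (x_i - x_j)
        R, N = len(X_rel), len(X_non)
        diffs = X_rel[:, None, :] - X_non[None, :, :]
        return (above[:, :, None] * diffs).sum(axis=(0, 1)) / (R * N)

With this Ψ, wᵀΨ(x, y) grows when relevant documents with high scores sit above non-relevant documents with low scores, which is why sorting on wᵀxᵢ maximizes the joint score at prediction time.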

  8. Too Many Constraints! • For Average Precision, the true labeling is a ranking with all the relevant documents ranked at the front, e.g., [R, R, R, N, N] • An incorrect labeling would be any other ranking, e.g., [R, N, R, R, N] • That ranking has an Average Precision of about 0.8, so Δ(y, y′) ≈ 0.2 • There are exponentially many rankings, and thus an exponential number of constraints!

  9. Structural SVM Training • STEP 1: Solve the SVM objective function using only the current working set of constraints. • STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints. • STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add it to the working set. • Repeat STEPs 1-3 until no additional constraints are added, then return the most recent model trained in STEP 1. STEPs 1-3 are guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al. 2005] A structural sketch of this loop follows.
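
A structural sketch of the loop (the helper names and signatures are hypothetical; STEP 1's quadratic program and STEP 2's oracle are supplied by the caller):

    def cutting_plane_train(examples, solve_qp, find_most_violated, violation, eps=1e-3):
        """Cutting-plane training loop in the style of Tsochantaridis et al. 2005.

        examples:           list of (x, y) training pairs
        solve_qp:           STEP 1 -- optimize w over the current working set,
                            returning (w, slacks)
        find_most_violated: STEP 2 -- argmax over rankings y' of loss + score
        violation:          how much (w, x, y, y') violates its margin constraint
        """
        working_set = {i: [] for i in range(len(examples))}
        while True:
            w, slacks = solve_qp(working_set, examples)            # STEP 1
            added = False
            for i, (x, y) in enumerate(examples):
                y_bar = find_most_violated(w, x, y)                # STEP 2
                if violation(w, x, y, y_bar) > slacks[i] + eps:    # STEP 3
                    working_set[i].append(y_bar)
                    added = True
            if not added:
                return w    # no new constraint: the working set approximates all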

  10. Illustrative Example • Original SVM problem: exponential constraints, most of which are dominated by a small set of "important" constraints. • Structural SVM approach: repeatedly find the next most violated constraint... ...until the set of constraints is a good approximation.

  11. Finding the Most Violated Constraint • Structural SVM is an oracle framework: it requires a subroutine to find the most violated constraint. • The subroutine depends on the formulation of the loss function and the joint feature representation. • Despite the exponential number of constraints, an efficient algorithm exists in the case of optimizing MAP.

  12. Finding the Most Violated Constraint Observation • MAP is invariant to the order of documents within a relevance class • Swapping two relevant (or two non-relevant) documents does not change MAP • The joint SVM score is optimized by sorting documents by score, wᵀx • The search thus reduces to finding an interleaving between two sorted lists of documents; the brute-force check below illustrates this
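
A brute-force sanity check of that reduction on a toy problem (hypothetical random scores standing in for wᵀx): enumerate every ranking of 5 documents, take the one maximizing Δ(y, y′) + wᵀΨ(x, y′), and confirm that each relevance class stays sorted by score.

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    labels = np.array([1, 1, 1, 0, 0])     # 3 relevant, 2 non-relevant documents
    scores = rng.normal(size=5)            # hypothetical per-document scores w.x
    R, N = labels.sum(), (1 - labels).sum()

    def objective(perm):
        # Delta(y, y') + pairwise score term for the ranking 'perm' of doc ids
        ranked = labels[list(perm)]
        ranks = np.flatnonzero(ranked) + 1
        ap = np.mean((np.arange(len(ranks)) + 1) / ranks)
        pair = sum((1 if a < b else -1) * (scores[i] - scores[j])
                   for a, i in enumerate(perm) for b, j in enumerate(perm)
                   if labels[i] == 1 and labels[j] == 0)
        return (1 - ap) + pair / (R * N)

    best = max(itertools.permutations(range(5)), key=objective)
    for cls in (1, 0):   # within each class, the maximizer is sorted by score
        docs = [i for i in best if labels[i] == cls]
        assert all(scores[a] >= scores[b] for a, b in zip(docs, docs[1:]))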

  13. Finding the Most Violated Constraint • Start with the perfect ranking • Consider swapping adjacent relevant/non-relevant documents • Find the best feasible ranking of the first non-relevant document • Repeat for the next non-relevant document • We never want to swap it past the previous non-relevant document • Repeat until all non-relevant documents have been considered; a sketch of the computation follows
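
The paper's greedy pass over the non-relevant documents runs in O(|rel|·|nonrel|); the sketch below reaches the same maximizer with an equally cheap dynamic program over interleavings of the two score-sorted lists (my own formulation under the Δ and Ψ reconstructed earlier, not the paper's pseudocode). Each lattice step places either the next relevant or the next non-relevant document and accumulates the AP term and the pairwise score term.

    import numpy as np

    def most_violated_ranking(rel_scores, non_scores):
        """Maximize Delta(y, y') + w.Psi(x, y') over interleavings y'.
        Returns (value, ranking) with the ranking as a list of 'R'/'N' marks."""
        r = np.sort(np.asarray(rel_scores, float))[::-1]   # relevant, descending
        n = np.sort(np.asarray(non_scores, float))[::-1]   # non-relevant, descending
        R, N = len(r), len(n)
        pre = np.concatenate(([0.0], np.cumsum(n)))        # prefix sums of n
        best = np.full((R + 1, N + 1), -np.inf)
        best[0, 0] = 0.0
        from_rel = np.zeros((R + 1, N + 1), dtype=bool)    # backpointers
        for i in range(R + 1):
            for j in range(N + 1):
                if best[i, j] == -np.inf:
                    continue
                if i < R:
                    # Place relevant doc i with j non-relevant docs above it:
                    # AP gains precision (i+1)/(i+1+j); the pair term pays
                    # -(s_i - s_j) for each of the j docs above, + for the rest.
                    prec = (i + 1) / (i + 1 + j)
                    pair = ((N - 2 * j) * r[i] - (pre[N] - 2 * pre[j])) / (R * N)
                    cand = best[i, j] - prec / R + pair
                    if cand > best[i + 1, j]:
                        best[i + 1, j], from_rel[i + 1, j] = cand, True
                if j < N and best[i, j] > best[i, j + 1]:
                    # Place non-relevant doc j: no AP or pair change at this step.
                    best[i, j + 1], from_rel[i, j + 1] = best[i, j], False
        i, j, ranking = R, N, []
        while i > 0 or j > 0:                              # recover the interleaving
            if i > 0 and from_rel[i, j]:
                ranking.append("R"); i -= 1
            else:
                ranking.append("N"); j -= 1
        return 1.0 + best[R, N], ranking[::-1]

For example, most_violated_ranking([1.2, 0.3], [0.9, -0.1]) returns the maximized Δ(y, y′) + wᵀΨ(x, y′) together with an 'R'/'N' pattern that can serve as STEP 2 of the training loop sketched above.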

  14. Quick Recap SVM Formulation • SVMs optimize a tradeoff between model complexity and MAP loss • Exponential number of constraints (one for each incorrect ranking) • Structural SVMs find a small subset of important constraints • Requires a sub-procedure to find the most violated constraint Finding the Most Violated Constraint • The loss function is invariant to re-ordering within the relevant documents • The SVM score imposes an ordering on the documents • Reduces to finding an interleaving of two sorted lists • The loss function has certain monotonicity properties • These admit an efficient algorithm

  15. Experiments • Used the TREC 9 & 10 Web Track corpora. • Features of document/query pairs computed from the outputs of existing retrieval functions (Indri retrieval functions & TREC submissions). • Goal is to learn a recombination of outputs which improves mean average precision.

  16. Moving Forward • The approach also works (in theory) for other measures. • Some promising results when optimizing for NDCG (with only one level of relevance). • Currently working on optimizing for NDCG with multiple levels of relevance. • Preliminary MRR results are not as promising.

  17. Conclusions • A principled approach to optimizing average precision (avoids difficult-to-control heuristics). • Performs at least as well as alternative SVM methods. • Can be generalized to a large class of rank-based performance measures. • Software available at http://svmrank.yisongyue.com
