Optimizing Estimated Loss Reduction for Active Sampling in Rank Learning

Presented by Pinar Donmez, joint work with Jaime G. Carbonell, Language Technologies Institute, Carnegie Mellon University.

Presentation Transcript


  1. Optimizing Estimated Loss Reduction for Active Sampling in Rank Learning. Presented by Pinar Donmez, joint work with Jaime G. Carbonell, Language Technologies Institute, Carnegie Mellon University. MSR Redmond, June 27, 2008

  2. Road Map • The Challenge: Active Rank Learning • Related Work • DiffLoss: A New Active Learning Method for RankSVM and RankBoost • Results: DiffLoss vs. Margin-Based and Random Sampling • Conclusion

  3. Active Rank Learning: Why do we care? • Challenge: labeling for rank learning • requires eliciting a relative ordering over a set of alternatives • is costly • is time-consuming • demands extensive human effort • Numerous applications • document retrieval • collaborative filtering • product rating...

  4. Active Rank Learning: How to address it? • An optimal active learner samples the instances that yield the lowest estimated expected error on the test set (Roy & McCallum, 2001) • impractical for large-scale ranking problems, even with efficient re-training • Our solution: • estimate how likely it is that adding a new instance will lead to the lowest expected error on the test data, without any re-training • based on how likely the instance is to change the current hypothesis • the greater this change, the greater the chance of learning the true hypothesis faster

  5. Related Work • Margin-based Sampling (Brinker, 2004; Yu, 2005) • margin := minimum difference of scores between two instances in the ranked order • selects the examples with the minimum margin • pro: simple to implement, generalizes to real-valued ranking functions • con: similar instances with the same rank label may also have the minimum margin • Divergence-based Sampling (Amini et al., 2006) • similar to query-by-committee sampling • selects instances on which two ranking functions maximally disagree • major drawback: effective only when a sufficiently large initial labeled set is provided
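Margin-based selection can be sketched in a few lines. This is a minimal illustration, not Brinker's or Yu's exact procedure: it assumes a linear ranker f(x) = w · x and defines each pool instance's margin as the smaller score gap to its neighbours in the ranked order. `margin_based_select` is a hypothetical helper name.

```python
import numpy as np


def margin_based_select(w, X_pool, k):
    """Pick the k pool instances whose score is closest to a neighbour in the ranking.

    Assumes a linear ranker f(x) = w . x (hypothetical setup for illustration).
    """
    scores = X_pool @ w
    order = np.argsort(-scores)              # ranked order, highest score first
    gaps = np.abs(np.diff(scores[order]))    # score differences between adjacent ranks
    # Margin of each ranked position: the smaller of the gaps to its two neighbours.
    margin = np.full(len(scores), np.inf)
    margin[:-1] = np.minimum(margin[:-1], gaps)
    margin[1:] = np.minimum(margin[1:], gaps)
    # Map margins back to pool indices and return the k smallest.
    per_instance = np.empty_like(margin)
    per_instance[order] = margin
    return np.argsort(per_instance, kind="stable")[:k]
```

With `w = np.array([1.0])`, `X_pool = np.array([[0.0], [1.0], [1.05], [3.0]])` and `k = 2`, the sketch returns indices 1 and 2, the two near-identical scores, which is the situation the "con" above describes.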

  6. Active Sampling for RankSVM I • Consider a candidate • Assume is added to training set with • Total loss on pairs that include is: • n is the # of training instances with a different label than • Objective function to be minimized becomes:
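The loss terms on this slide can be made concrete under the standard RankSVM formulation, which is assumed here rather than taken from the slide: a linear ranker w, a pairwise hinge loss max(0, 1 - z_i w·(x - x_i)) with z_i = sign(y - y_i), and an objective ½‖w‖² + C·(sum of hinge losses). The candidate's label `y` is hypothetical, since it is unknown before labeling, and the function names are illustrative.

```python
import numpy as np


def candidate_pair_loss(w, x, y, X_train, y_train):
    """Hinge loss on all pairs formed by candidate x (hypothetical label y) and the
    n training instances whose label differs from y."""
    differ = y_train != y
    z = np.sign(y - y_train[differ])      # +1 if x should rank above x_i, else -1
    diffs = x - X_train[differ]           # row i holds x - x_i
    return float(np.maximum(0.0, 1.0 - z * (diffs @ w)).sum())


def augmented_objective(w, x, y, X_train, y_train, C=1.0):
    """Regularized RankSVM objective after adding (x, y) to the training set."""
    # Hinge loss on the existing preference pairs (i, j) with y_i > y_j.
    hi, lo = np.nonzero(y_train[:, None] > y_train[None, :])
    old = np.maximum(0.0, 1.0 - (X_train[hi] - X_train[lo]) @ w).sum()
    new = candidate_pair_loss(w, x, y, X_train, y_train)
    return float(0.5 * w @ w + C * (old + new))
```

For example, with `w = np.array([1.0])`, training points 0.0 (label 0) and 2.0 (label 1), and a candidate x = 0.5 assumed to have label 1, the candidate contributes a hinge loss of 0.5 and the augmented objective is 0.5 + 0 + 0.5 = 1.0.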

  7. Active Sampling for RankSVM II • Assume the current ranking function is • There are two possible cases: • Assume • Derivative w.r.t at a single point or

  8. Active Sampling for RankSVM III • Substitute in the previous equation to estimate • Magnitude of the total derivative • estimates the ability of to change the current ranker if added into training • Finally,
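The selection rule on these slides can be sketched for a linear RankSVM under assumptions that are ours rather than the slides': the candidate is paired with every labeled training instance of the query, each unknown orientation z_i is replaced by the sign of the current score difference f(x) - f(x_i), and the per-pair derivative is -C z_i (x - x_i) when the pair is inside the margin and 0 otherwise. The score is the norm of the summed derivative, and the highest-scoring candidates are sampled. Function names are hypothetical.

```python
import numpy as np


def diffloss_svm_score(w, x, X_train, C=1.0):
    """Norm of the estimated hinge-loss derivative w.r.t. w if candidate x were added.

    Assumptions (illustrative): linear ranker f(x) = w . x; x is paired with every
    labeled training instance; each unknown orientation z_i is estimated as the sign
    of f(x) - f(x_i), with ties treated as +1.
    """
    diffs = x - X_train                   # row i holds x - x_i
    s = diffs @ w                         # current score differences f(x) - f(x_i)
    z = np.where(s >= 0, 1.0, -1.0)       # estimated pair orientations
    active = z * s < 1.0                  # only pairs inside the margin have a nonzero derivative
    # Derivative of C * sum_i max(0, 1 - z_i * w.(x - x_i)) with respect to w.
    grad = -C * (z[active][:, None] * diffs[active]).sum(axis=0)
    return float(np.linalg.norm(grad))


def diffloss_svm_select(w, X_pool, X_train, k, C=1.0):
    """Indices of the k pool candidates with the largest estimated change to the ranker."""
    scores = np.array([diffloss_svm_score(w, x, X_train, C) for x in X_pool])
    return np.argsort(-scores, kind="stable")[:k]
```

Returning the top k rather than a single argmax matches the batch setting used in the experiments (5 documents per query per round).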

  9. Active Sampling for RankBoost I • Again, estimate how the current ranker would change if was in the training set • Estimate this change by the difference in ranking loss before and after is added • Ranking loss w.r.t is (Freund et al., 2003):

  10. Active Sampling for RankBoost II • Difference in the ranking loss between the current and the enlarged set: • indicates how much the current ranker needs to change to compensate for the loss introduced by the new instance • Finally, the instance with the highest loss differential is sampled:
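The RankBoost variant can be sketched in the same spirit. Assumed here, not taken from the slides: the exponential pairwise loss exp(-(H(x_i) - H(x_j))) that RankBoost minimizes as a bound on the ranking loss, uniform weight over preference pairs, and candidate-pair orientations taken from the current scores H. The loss differential is the mean loss over the enlarged pair set minus the mean over the current one. These choices and the helper names are illustrative, not the authors' exact estimator.

```python
import numpy as np


def diffloss_boost_score(h_train, y_train, h_x):
    """Change in mean exponential pairwise loss if a candidate with score h_x is added.

    Assumptions (illustrative): uniform weight over preference pairs; the candidate's
    unknown orientation against each training instance follows the current scores.
    Requires at least one existing preference pair (y_i > y_j).
    """
    # Existing preference pairs (i, j) with y_i > y_j and their exponential loss.
    hi, lo = np.nonzero(y_train[:, None] > y_train[None, :])
    old_losses = np.exp(-(h_train[hi] - h_train[lo]))
    # New pairs (x, x_i): with z_i = +1 if h_x >= h_i else -1, the loss
    # exp(-z_i * (h_x - h_i)) equals exp(-|h_x - h_i|).
    new_losses = np.exp(-np.abs(h_x - h_train))
    enlarged = np.concatenate([old_losses, new_losses])
    return float(enlarged.mean() - old_losses.mean())


def diffloss_boost_select(h_train, y_train, h_pool, k):
    """Indices of the k pool candidates with the highest estimated loss differential."""
    scores = np.array([diffloss_boost_score(h_train, y_train, h) for h in h_pool])
    return np.argsort(-scores, kind="stable")[:k]
```

Under these assumptions a candidate scores highly when its current score sits close to those of many training documents, because those new pairs carry the largest loss.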

  11. Data & Settings • TREC 2003 and TREC 2004 topic distillation datasets in LETOR • Initial training set has 16 docs/query (1 relevant & 15 non-relevant) • Select 5 docs/query at each iteration
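The per-query protocol can be sketched as a simple pool-based loop. Assumptions: the 16-document seed set is drawn at random, documents are identified by hashable ids, and `score_fn(train, doc)` is a hypothetical hook standing in for DiffLoss, margin-based, or random scoring, including any retraining on the current training list.

```python
import random


def initial_split(relevant, nonrelevant, n_rel=1, n_nonrel=15, seed=0):
    """Seed set for one query: n_rel relevant + n_nonrel non-relevant docs (random draw assumed)."""
    rng = random.Random(seed)
    seed_docs = rng.sample(relevant, n_rel) + rng.sample(nonrelevant, n_nonrel)
    chosen = set(seed_docs)
    pool = [d for d in relevant + nonrelevant if d not in chosen]
    return seed_docs, pool


def active_rounds(train, pool, score_fn, rounds, batch=5):
    """Move the `batch` highest-scoring pool docs into the training set each round.

    score_fn(train, doc) is a hypothetical hook for DiffLoss, margin-based, or random
    scoring; it is responsible for any retraining on the current `train` list.
    """
    train, pool = list(train), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        ranked = sorted(pool, key=lambda d: score_fn(train, d), reverse=True)
        train.extend(ranked[:batch])   # these docs would now be sent for labeling
        pool = ranked[batch:]
    return train, pool
```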

  12. Performance Measures • MAP (Mean Average Precision) • MAP is the mean of the per-query AP values over all queries • NDCG (Normalized Discounted Cumulative Gain) • The impact of each relevant document is discounted as a function of its rank position
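Both measures are short to compute. The sketch below assumes binary relevance for AP and that every judged document of a query appears in its ranked list; for NDCG it uses the common gain 2^rel - 1 with a log2(1 + rank) discount, which may differ in detail from the LETOR evaluation scripts.

```python
import math


def average_precision(rels):
    """AP for one query; rels is the ranked list of binary labels (1 = relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at the rank of this relevant document
    return total / hits if hits else 0.0


def mean_average_precision(per_query_rels):
    """MAP: mean of the per-query AP values."""
    return sum(average_precision(r) for r in per_query_rels) / len(per_query_rels)


def ndcg_at_k(gains, k):
    """NDCG@k for one query; gains are graded relevance labels in ranked order."""
    def dcg(g):
        return sum((2 ** r - 1) / math.log2(1 + i) for i, r in enumerate(g[:k], start=1))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

For a ranked list with relevance [1, 0, 1, 0], AP = (1/1 + 2/3) / 2 ≈ 0.833.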

  13. Results on TREC03 * Horizontal line indicates the performance if all the data is used as the training set.

  14. Results on TREC04

  15. Results at a Glance • Our method (DiffLoss) is significantly superior to margin-based and random sampling over the entire operating range (p < 0.0001). • DiffLoss achieves a 30% relative improvement over margin-based sampling on TREC03. • DiffLoss using RankSVM reaches the optimal performance after ~10 rounds. • DiffLoss using RankBoost reaches 90-95% of the optimal performance after ~10 rounds.

  16. Conclusion • Two new active sampling methods for RankSVM and RankBoost • Instances with the largest expected loss differential are sampled • Our method has a significantly faster learning rate than the baselines • In the future, we plan to focus on • sampling by directly optimizing performance metrics • automatically determining when to stop sampling

  17. THE END!
