Learning to Rank with Ties. Authors: Ke Zhou, Gui-Rong Xue, Hongyuan Zha, and Yong Yu. Presenter: Davidson. Date: 2009/12/29. Published in SIGIR 2008.
Contents • Introduction • Pairwise learning to rank • Models for paired comparisons with ties • General linear models • Bradley-Terry model • Thurstone-Mosteller model • Loss functions • Learning by functional gradient boosting • Experiments • Conclusions and future work
Introduction • Learning to rank: • Ranking objects for some queries • Document retrieval, expert finding, anti web spam, and product ratings, etc. • Learning to rank methods: • Pointwise approach • Pairwise approach • Listwise approach
Existing approaches (1/2) • Vector space model methods • Represent query and documents as vectors of features • Compute the distance as similarity measure • Language modeling based methods • Use a probabilistic framework for the relevance of a document with respect to a query • Estimate the parameters in probability models
Existing approaches (2/2) • Supervised machine learning framework • Learn a ranking function from pairwise preference data • Minimize the number of contradicting pairs in training data • Direct optimization of loss function designed from performance measure • Obtain a ranking function that is optimal with respect to some performance measure • Require absolute judgment data
Pairwise learning to rank • Use pairs of preference data • $x_i \succ x_j$ = $x_i$ is preferred to $x_j$ • How to obtain preference judgments? • Clickthroughs • User click count on search results • Use heuristic rules such as clickthrough rates • Absolute relevance judgments • Human labels • E.g. (4-level judgments) perfect, excellent, good, and bad • Need to convert preference judgments to preference data
Conversion from preference judgments to preference data • E.g. 4 samples (A, B, C, D) with 2-level judgments • Preference judgments: A and B at the higher level, C and D at the lower level (as implied by the pairs below) • Preference data: (A, C), (A, D), (B, C), (B, D) • Tie cases (A, B) and (C, D) are ignored!
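The conversion on this slide can be sketched in a few lines of Python. The function name `to_pairs` and the numeric labels (1 for the higher level, 0 for the lower) are illustrative assumptions, not taken from the slides. Unlike the conventional conversion, the sketch also returns the tie pairs instead of discarding them.

```python
from itertools import combinations

def to_pairs(labels):
    """Split all document pairs of one query into preference pairs and tie pairs.

    labels: dict mapping a document id to its relevance level (higher is better).
    Returns (preferences, ties); each preference is (preferred, other).
    """
    preferences, ties = [], []
    for a, b in combinations(sorted(labels), 2):
        if labels[a] > labels[b]:
            preferences.append((a, b))
        elif labels[a] < labels[b]:
            preferences.append((b, a))
        else:
            ties.append((a, b))
    return preferences, ties

# Hypothetical 2-level judgments matching the slide: A and B outrank C and D.
labels = {"A": 1, "B": 1, "C": 0, "D": 0}
preferences, ties = to_pairs(labels)
print(preferences)  # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
print(ties)         # [('A', 'B'), ('C', 'D')]
```

The `ties` list is exactly the data that the models on the following slides are designed to use.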
Models for paired comparisons with ties • Notation: • $x_i \succ x_j$ => $x_i$ is preferred to $x_j$ • $x_i \approx x_j$ => $x_i$ and $x_j$ are preferred equally • Proposed framework • General linear models for paired comparisons (in Oxford University Press, 1988) • Statistical models for paired comparisons • Bradley-Terry model (in Biometrika, 1952) • Thurstone-Mosteller model (in Psychological Review, 1927)
General linear models (1/4) • For a non-decreasing function $F$ • The probability that document $x_i$ is preferred to document $x_j$ is: $P(x_i \succ x_j) = F(h(x_i) - h(x_j))$ • $h$ = scoring/ranking function
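General linear models (2/4) • With ties, for documents $x_i$, $x_j$ and scoring function $h$, the model becomes: $P(x_i \succ x_j) = F(h(x_i) - h(x_j) - \epsilon)$ and $P(x_i \approx x_j) = F(h(x_i) - h(x_j) + \epsilon) - F(h(x_i) - h(x_j) - \epsilon)$ • $\epsilon \ge 0$ = a threshold that controls the tie probability • When $\epsilon = 0$, this model is identical to the original general linear model.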
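As a sanity check on the tie model, the short derivation below shows that the three possible outcomes of a comparison have probabilities summing to one. The notation is an assumption, since the slide's formulas did not survive extraction: $d = h(x_i) - h(x_j)$ is the score difference, $\epsilon \ge 0$ is the tie threshold, and $F$ is taken to be symmetric, $F(-t) = 1 - F(t)$, as is standard for general linear models of paired comparisons.

```latex
% Assumed notation: d = h(x_i) - h(x_j), threshold \epsilon \ge 0,
% and F symmetric, i.e. F(-t) = 1 - F(t).
\begin{aligned}
P(x_i \succ x_j)   &= F(d - \epsilon) \\
P(x_j \succ x_i)   &= F(-d - \epsilon) = 1 - F(d + \epsilon) \\
P(x_i \approx x_j) &= F(d + \epsilon) - F(d - \epsilon) \\
P(x_i \succ x_j) + P(x_j \succ x_i) + P(x_i \approx x_j) &= 1
\end{aligned}
```

Because $F$ is non-decreasing, the tie probability is never negative, and it vanishes when $\epsilon = 0$.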
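Bradley-Terry model • The function $F$ is set to be the logistic function $F(t) = 1 / (1 + e^{-t})$ so that $P(x_i \succ x_j) = \frac{e^{h(x_i)}}{e^{h(x_i)} + e^{h(x_j)}}$ • With ties: $P(x_i \succ x_j) = \frac{e^{h(x_i)}}{e^{h(x_i)} + \theta e^{h(x_j)}}$ and $P(x_i \approx x_j) = \frac{(\theta^2 - 1)\, e^{h(x_i)} e^{h(x_j)}}{[e^{h(x_i)} + \theta e^{h(x_j)}][\theta e^{h(x_i)} + e^{h(x_j)}]}$ where $\theta = e^{\epsilon} \ge 1$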
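A minimal Python sketch of the Bradley-Terry model with ties follows. It assumes the usual tie extension in which a parameter `theta >= 1` (equal to $e^{\epsilon}$ for a tie threshold $\epsilon$) inflates the opponent's strength; the function name and the default value of `theta` are illustrative, not taken from the slides.

```python
import math

def bt_probabilities(s_i, s_j, theta=1.5):
    """Bradley-Terry probabilities with ties for two documents.

    s_i, s_j: ranking scores of the two documents.
    theta >= 1 controls ties (theta = 1 gives the original model, no ties).
    Returns (P(i preferred), P(j preferred), P(tie)).
    """
    if theta < 1:
        raise ValueError("theta must be >= 1")
    # Subtract the larger score before exponentiating to avoid overflow;
    # every probability is a ratio that is unchanged by this shift.
    m = max(s_i, s_j)
    p_i, p_j = math.exp(s_i - m), math.exp(s_j - m)
    win_i = p_i / (p_i + theta * p_j)
    win_j = p_j / (p_j + theta * p_i)
    tie = (theta**2 - 1) * p_i * p_j / ((p_i + theta * p_j) * (theta * p_i + p_j))
    return win_i, win_j, tie

# Two documents with equal scores: both win probabilities are equal,
# and the remaining probability mass goes to the tie.
print(bt_probabilities(0.0, 0.0, theta=1.5))
```

The three returned values always sum to one, and setting `theta=1` makes the tie probability zero.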
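Thurstone-Mosteller model • The function $F$ is set to be the Gaussian cumulative distribution $\Phi$ • With ties: $P(x_i \succ x_j) = \Phi(h(x_i) - h(x_j) - \epsilon)$ and $P(x_i \approx x_j) = \Phi(h(x_i) - h(x_j) + \epsilon) - \Phi(h(x_i) - h(x_j) - \epsilon)$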
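The Thurstone-Mosteller variant can be sketched the same way, with the standard Gaussian CDF computed from `math.erf`. The threshold name `eps`, its default value, and the function names are assumptions made for illustration.

```python
import math

def normal_cdf(t):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def tm_probabilities(s_i, s_j, eps=0.5):
    """Thurstone-Mosteller probabilities with ties for two documents.

    s_i, s_j: ranking scores; eps >= 0 is the tie threshold.
    Returns (P(i preferred), P(j preferred), P(tie)).
    """
    d = s_i - s_j
    win_i = normal_cdf(d - eps)
    win_j = normal_cdf(-d - eps)
    tie = normal_cdf(d + eps) - normal_cdf(d - eps)
    return win_i, win_j, tie

# A wider threshold moves probability mass from the two "win" outcomes to the tie.
for eps in (0.0, 0.5, 1.0):
    print(eps, tm_probabilities(0.3, 0.0, eps=eps))
```

With `eps=0` the tie probability is zero and the model reduces to the original Thurstone-Mosteller form.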
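Loss functions • Training data • preference data $S$, tie data $T$ • Minimize the empirical risk: $R(h) = \sum_{(x_i, x_j) \in S} L(h; x_i \succ x_j) + \sum_{(x_i, x_j) \in T} L(h; x_i \approx x_j)$ • Loss function (negative log-likelihood): $L(h; x_i \succ x_j) = -\log P(x_i \succ x_j)$ and $L(h; x_i \approx x_j) = -\log P(x_i \approx x_j)$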
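To make the empirical risk concrete, here is a sketch that sums a negative log-likelihood over preference pairs and tie pairs. It assumes the Bradley-Terry model with ties and a tie parameter `theta > 1`; the slide's own formula was lost in extraction, so the exact form, the function name, and the toy scores are assumptions.

```python
import math

def bt_tie_loss(scores, preferences, ties, theta=1.5):
    """Summed negative log-likelihood under Bradley-Terry with ties.

    scores: dict mapping document id to its ranking score.
    preferences: list of (preferred, other) pairs; ties: list of tied pairs.
    """
    if theta <= 1 and ties:
        raise ValueError("theta must be > 1 when tie pairs are present")
    loss = 0.0
    for winner, loser in preferences:
        d = scores[winner] - scores[loser]
        # -log P(winner preferred) = log(1 + theta * exp(-d))
        loss += math.log1p(theta * math.exp(-d))
    for a, b in ties:
        d = scores[a] - scores[b]
        # -log P(tie) = log(1 + theta*e^-d) + log(1 + theta*e^d) - log(theta^2 - 1)
        loss += (math.log1p(theta * math.exp(-d))
                 + math.log1p(theta * math.exp(d))
                 - math.log(theta**2 - 1))
    return loss

preferences = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
ties = [("A", "B"), ("C", "D")]
good_loss = bt_tie_loss({"A": 1.0, "B": 1.0, "C": -1.0, "D": -1.0}, preferences, ties)
bad_loss = bt_tie_loss({"A": -1.0, "B": -1.0, "C": 1.0, "D": 1.0}, preferences, ties)
print(good_loss < bad_loss)  # True
```

Scores that agree with the preference pairs give a lower loss, while the tie terms reward scores that keep tied documents close together.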
Learning by functional gradient boosting • Obtain a function $h^*$ from a function space $\mathcal{H}$ that minimizes the empirical loss: $h^* = \arg\min_{h \in \mathcal{H}} R(h)$ • Apply the gradient boosting algorithm • Approximate $h^*$ by iteratively constructing a sequence of base learners • Base learners are regression trees • The number of iterations and the shrinkage factor in the boosting algorithm are found by cross-validation
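The boosting loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the loss is taken to be the Bradley-Terry negative log-likelihood with ties, depth-1 regression stumps stand in for full regression trees, and `n_iter`, `shrinkage`, and `theta` are fixed by hand rather than chosen by cross-validation. All function names and the toy data are hypothetical.

```python
import math

def negative_gradient(scores, preferences, ties, theta):
    """Negative gradient of the assumed loss with respect to each document score."""
    grad = [0.0] * len(scores)
    for w, l in preferences:
        d = scores[w] - scores[l]
        g = theta * math.exp(-d) / (1.0 + theta * math.exp(-d))
        grad[w] += g  # push the preferred document up
        grad[l] -= g  # and the other one down
    for a, b in ties:
        d = scores[a] - scores[b]
        g = (theta * math.exp(-d) / (1.0 + theta * math.exp(-d))
             - theta * math.exp(d) / (1.0 + theta * math.exp(d)))
        grad[a] += g  # pulls tied documents toward each other
        grad[b] -= g
    return grad

def fit_stump(X, targets):
    """Least-squares regression stump (a depth-1 regression tree)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [t for row, t in zip(X, targets) if row[f] <= cut]
            right = [t for row, t in zip(X, targets) if row[f] > cut]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, cut, lm, rm)
    if best is None:  # all rows identical: fall back to a constant
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, f, cut, lm, rm = best
    return lambda row: lm if row[f] <= cut else rm

def boost(X, preferences, ties, n_iter=50, shrinkage=0.1, theta=1.5):
    """Build the ranking function as a shrunken sum of stumps."""
    stumps = []
    scores = [0.0] * len(X)
    for _ in range(n_iter):
        residuals = negative_gradient(scores, preferences, ties, theta)
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + shrinkage * stump(row) for s, row in zip(scores, X)]
    return lambda row: sum(shrinkage * s(row) for s in stumps)

# Toy data: one feature per document; documents 0 and 1 outrank 2 and 3.
X = [[1.0], [0.9], [0.2], [0.1]]
preferences = [(0, 2), (0, 3), (1, 2), (1, 3)]
ties = [(0, 1), (2, 3)]
h = boost(X, preferences, ties)
print(h([1.0]) > h([0.2]))  # True
```

Each iteration fits a base learner to the negative gradient at the current scores, which is the step that makes the procedure "functional" gradient descent.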
Experiments • Learning to rank methods: • BT (Bradley-Terry model) and TM (Thurstone-Mosteller model) • BT-noties and TM-noties (BT and TM without ties) • RankSVM, RankBoost, AdaRank, FRank • Datasets (LETOR data collection) • OHSUMED (16,140 pairs, 3-level judgment) • TREC2003 (49,171 pairs, binary judgment) • TREC2004 (74,170 pairs, binary judgment)
Performance measures • Precision • Binary judgment only • Mean Average Precision (MAP) • Binary judgment only • Sensitive to the entire ranking order • Normalized Discounted Cumulative Gain (NDCG) • Multi-level judgment
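For reference, the two ranking measures can be sketched as below. This uses one common formulation (gain $2^{grade} - 1$ and discount $\log_2(rank + 1)$ for NDCG); conventions vary between toolkits, so the exact variant used in the experiments is an assumption. The average precision here also assumes that every relevant document appears in the ranked list; MAP is the mean of this value over queries.

```python
import math

def average_precision(relevant_flags):
    """AP for one ranked list of binary judgments (1 = relevant, 0 = not)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0

def ndcg_at_k(grades, k):
    """NDCG@k for one ranked list of multi-level relevance grades."""
    def dcg(gs):
        return sum((2 ** g - 1) / math.log2(rank + 1)
                   for rank, g in enumerate(gs[:k], start=1))
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

print(average_precision([1, 0, 1, 0]))  # relevant documents at ranks 1 and 3
print(ndcg_at_k([2, 1, 0], k=3))        # already ideally ordered
print(ndcg_at_k([0, 1, 2], k=3))        # worst ordering of the same grades
```

Average precision only accepts binary flags, while NDCG takes graded labels directly, which matches the binary versus multi-level split on the slide.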
Performance comparison: ties from different relevance levels
Conclusions and future work • Conclusions • Tie data improve the performance • Ties among relevant documents help because they capture the documents' common features • Ties among irrelevant documents are less effective because their features are more diverse • BT and TM are comparable in most cases • Future work • Theoretical analysis of ties • New methods/algorithms/loss functions for incorporating ties