Learning to Rank at Query-Time using Association Rules. Adriano Veloso, Humberto Almeida, Marcos Gonçalves, Wagner Meira Jr.
Ranking • Paramount for search engines • Given one query, rank the retrieved documents according to their relevance with respect to the query • The most relevant documents should appear first to the user • Typical ranking strategies • Similarity models (similarity may not be the same as relevance) • Link analysis: PageRank, HITS, etc. • … • Could we automatically learn a good ranking? • Observe the phenomenon: collect sufficient training data (document/query/judgment) • Model the phenomenon: apply a learning algorithm • Use the model to predict relevance • This is learning to rank (L2R)
Motivations for L2R • The Web is growing exponentially in size • Manually designing good ranking functions is infeasible • A combination of multiple features is more accurate than a single feature • PageRank, for instance, is only one feature • It becomes harder for malicious users to manipulate the ranking • For other collections • Collection-adapted ranking functions can be generated, with better effectiveness
Talk Outline • Techniques for L2R • L2R using Association Rules • Association Rules • Relevance Estimation • L2R at Query-Time • On-demand Rule Generation • Caching Rules • Incorporating Query Information • Results • Effectiveness • Comparison against other Techniques • Conclusions
Some L2R Techniques • Many methods have been proposed: Ranking SVM, RankBoost, FRank, ListNet, AdaRank, ... Major questions in this paper: • Are association rules (AR) an effective tool for L2R? • AR have been shown to be very effective in text classification tasks, outperforming SVMs and other methods (Veloso et al., CIKM 2006) • Which strategy is better? • Learning a model that is good on average for all problems • Learning a model that is particularly good for a specific set of documents • Can we use additional information that is not exploited by other methods? • This can improve ranking performance
L2R using Association Rules • Rules of the form X → r are generated from the training data • X is a set of features (e.g., PageRank, BM25) • r is a relevance judgment • Association rules map features to relevance judgments • The degree of association between X and r is given by the confidence (θ) • The conditional probability of r given X
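The confidence of a rule X → r can be sketched as a simple ratio over the training data. This is a minimal illustration, not the authors' implementation: it assumes features have already been discretized into items such as "bm25=high", and the names `Example`, `confidence`, and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumption: each training example is a set of discretized feature items
# plus a relevance judgment (here 0, 1, or 2).
Example = Tuple[FrozenSet[str], int]


def confidence(train: List[Example], X: FrozenSet[str], r: int) -> float:
    """Estimate P(r | X): among examples containing X, the fraction judged r."""
    covered = [label for items, label in train if X <= items]
    if not covered:
        return 0.0  # X never occurs, so the rule has no support
    return sum(1 for label in covered if label == r) / len(covered)


# Toy training data (hypothetical values, for illustration only).
train = [
    (frozenset({"pagerank=high", "bm25=high"}), 2),
    (frozenset({"pagerank=high", "bm25=low"}), 1),
    (frozenset({"pagerank=low", "bm25=high"}), 2),
    (frozenset({"pagerank=low", "bm25=low"}), 0),
]

theta = confidence(train, frozenset({"bm25=high"}), 2)
```

In this toy data both examples containing "bm25=high" are judged 2, so the rule {bm25=high} → 2 has confidence 1.0.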
L2R using Association Rules • Likelihood of Relevance • Average confidence of the rules predicting each level of relevance • Relevance Estimation • Linear combination of the likelihoods • Challenges • The number of rules is exponential • We need to restrict the number of rules without discarding important information • Big question: Which rules are useful? Which are not?
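One plausible reading of the two steps above is sketched here. The slide does not give the exact formula, so two choices are assumptions: the average confidences are normalized to sum to 1, and each likelihood is weighted by its numeric relevance level. The names `Rule`, `likelihoods`, and `estimate_relevance` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# A rule is (antecedent X, predicted relevance level r, confidence).
Rule = Tuple[FrozenSet[str], int, float]


def likelihoods(rules: List[Rule], levels: List[int]) -> Dict[int, float]:
    """Average confidence per relevance level, normalized to sum to 1 (assumption)."""
    by_level = defaultdict(list)
    for _, r, conf in rules:
        by_level[r].append(conf)
    avg = {
        r: (sum(by_level[r]) / len(by_level[r]) if by_level[r] else 0.0)
        for r in levels
    }
    total = sum(avg.values())
    if total == 0:
        return {r: 0.0 for r in levels}
    return {r: a / total for r, a in avg.items()}


def estimate_relevance(rules: List[Rule], levels: List[int]) -> float:
    """Linear combination of the likelihoods, weighted by the level value."""
    p = likelihoods(rules, levels)
    return sum(r * p[r] for r in levels)


# Toy rule set for one document (hypothetical confidences).
rules = [
    (frozenset({"a"}), 0, 0.2),
    (frozenset({"b"}), 1, 0.6),
    (frozenset({"a", "b"}), 1, 0.4),
    (frozenset({"a"}), 2, 0.5),
]
score = estimate_relevance(rules, levels=[0, 1, 2])
```

With these toy rules the average confidences are 0.2, 0.5, and 0.5 for levels 0, 1, and 2, which gives a score of about 1.25. Documents are then sorted by this score.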
L2R at Query-Time • Useful rules are those that carry information about the retrieved documents • In other words … • A rule X → r is only useful to rank a document d if X ⊆ d • Useful rules are only a very small subset of all possible rules that can be extracted from the training data • They can be found quickly • But retrieved documents are only known at query-time • Useful rules must be generated on a demand-driven basis, depending on the retrieved documents
On-Demand Rule Generation • Given a retrieved document • Project the training data • Remove from the training data all features that are not included in the retrieved document • Extract rules from the projected training data
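The projection step can be sketched as a set intersection followed by rule enumeration over the surviving features. This is an illustrative sketch under assumptions, not the authors' algorithm: features are discretized items, and `max_size` and `min_conf` are hypothetical parameters added here to keep the enumeration small.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Example = Tuple[FrozenSet[str], int]      # (feature items, judgment)
Rule = Tuple[FrozenSet[str], int, float]  # (X, r, confidence)


def project(train: List[Example], doc: FrozenSet[str]) -> List[Example]:
    """Keep only the features that also occur in the retrieved document."""
    projected = []
    for items, label in train:
        kept = items & doc
        if kept:  # examples sharing nothing with the document support no rule
            projected.append((kept, label))
    return projected


def extract_rules(train: List[Example], doc: FrozenSet[str],
                  max_size: int = 2, min_conf: float = 0.0) -> List[Rule]:
    """Enumerate rules X -> r with X a subset of doc, from the projected data."""
    projected = project(train, doc)
    rules = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(doc), size):
            X = frozenset(combo)
            labels = [label for items, label in projected if X <= items]
            if not labels:
                continue  # X never occurs in the training data
            for r in sorted(set(labels)):
                conf = labels.count(r) / len(labels)
                if conf >= min_conf:
                    rules.append((X, r, conf))
    return rules


train = [
    (frozenset({"pagerank=high", "bm25=high"}), 2),
    (frozenset({"pagerank=high", "bm25=low"}), 1),
    (frozenset({"pagerank=low", "bm25=high"}), 2),
    (frozenset({"pagerank=low", "bm25=low"}), 0),
]
doc = frozenset({"pagerank=high", "bm25=high", "title_match=yes"})
rules = extract_rules(train, doc)
```

Every rule produced this way satisfies X ⊆ d by construction, so no effort is spent on rules that cannot apply to the document being ranked.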
On-Demand Rule Generation • Different documents may demand different rule sets • Different rule sets may share common rules • Without caching, the same rule is generated multiple times • Caching common rules • The cache stores rules in main memory • Limited size • Less frequent rules are discarded first • Using a cached rule is much faster than regenerating it • No access to the training data is needed
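A bounded in-memory rule cache along these lines might look as follows. The slide does not specify the eviction policy in detail, so this sketch assumes "less frequent" means least frequently requested; the class `RuleCache` and its `compute` callback are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# For one antecedent X: the list of (relevance level, confidence) pairs.
Consequents = List[Tuple[int, float]]


class RuleCache:
    """Fixed-capacity cache that evicts the least frequently requested entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[FrozenSet[str], Consequents] = {}
        self.hits: Dict[FrozenSet[str], int] = {}

    def get(self, X: FrozenSet[str],
            compute: Callable[[FrozenSet[str]], Consequents]) -> Consequents:
        if X in self.entries:
            self.hits[X] += 1  # cache hit: the training data is not touched
            return self.entries[X]
        value = compute(X)     # cache miss: generate the rule from training data
        if self.capacity > 0:
            if len(self.entries) >= self.capacity:
                victim = min(self.hits, key=self.hits.get)
                del self.entries[victim]
                del self.hits[victim]
            self.entries[X] = value
            self.hits[X] = 1
        return value


# Usage: count how often the expensive computation actually runs.
calls = []

def slow_compute(X: FrozenSet[str]) -> Consequents:
    calls.append(X)
    return [(1, 0.5)]  # placeholder result, for illustration only

cache = RuleCache(capacity=2)
a, b, c = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
for X in (a, a, b, c, a, b):
    cache.get(X, slow_compute)
```

In this run the antecedent requested most often stays cached, while the two requested less often evict each other.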
Incorporating Query Information • Query terms can enhance the quality of rules • Likelihood of relevance may become more reliable • By exploiting associations between terms and other features • [Figure: example rule sets; original rules: rank=0.43, with query terms: rank=0.47]
Training Data in LETOR (query-document-judgment) • OHSUMED, MEDLINE subset for IR • 106 queries • Judgments of definitely, partially, and not relevant • 16,140 query-document pairs with judgments • 25 features • .GOV Collection • TD2003 (50 queries, 1,000 retrieved documents) • TD2004 (75 queries, 1,000 retrieved documents) • Judgments of relevant and not relevant • 44 features
Evaluation • Three approaches are evaluated • AR (R) extracts association rules before documents are retrieved • AR (Rd) extracts rules on a demand-driven basis • AR (Rq) exploits query terms while generating rules • Criteria • MAP, NDCG@x and precision@x • Methodology • 5-fold cross-validation • Significance test • t-test • Baselines • State-of-the-art L2R methods • Execution Time • Varying cache sizes
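For reference, the three criteria can be sketched as below. These are common textbook formulations, assumed here for illustration; the benchmark's own evaluation tool may differ in details such as the NDCG gain and discount. The input `rels` is the list of judgments of one query's documents in ranked order, and average precision assumes that list contains all judged documents for the query.

```python
import math
from typing import List


def precision_at_k(rels: List[int], k: int) -> float:
    """Fraction of the top-k documents with a judgment above 0."""
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels: List[int]) -> float:
    """Mean of precision values at each relevant position; MAP averages this over queries."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def dcg_at_k(rels: List[int], k: int) -> float:
    """Discounted cumulative gain with gain 2^r - 1 and log2(i + 1) discount (assumption)."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: List[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


rels = [2, 0, 1, 0]  # toy ranked judgments for one query
p2 = precision_at_k(rels, 2)
ap = average_precision(rels)
ndcg4 = ndcg_at_k(rels, 4)
```

NDCG uses the graded judgments (for example the three OHSUMED levels), whereas precision and MAP as written treat any judgment above 0 as relevant.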
OHSUMED • AR (R) is competitive with the baselines • AR (Rd) shows moderate gains (1.8 to 3.8%) • AR (Rq) shows larger improvements (3.3 to 5.4%)
OHSUMED • Both AR (Rd) and AR (Rq) show impressive gains in terms of NDCG • Performance in terms of precision was not improved • AR (R) showed poor performance
TD2003 • Gains are impressive • All proposed approaches provided improvements • More features are available (more room for improvement)
TD2003 • High gains in terms of NDCG and precision are also observed
TD2004 • AR (R) showed poor performance • Several important rules were not included in the ranking model (this collection is large, and some associations are not frequent enough) • AR (Rd) and AR (Rq) showed significant gains • They were able to extract important rules
TD2004 • AR (R) showed poor performance in terms of NDCG and precision • AR (Rd) and AR (Rq) showed improvements • Especially in terms of NDCG
Caching • Caching is extremely effective • The same rule is used several times for ranking different documents • It makes on-demand rule generation feasible
Conclusions • We proposed three L2R methods based on association rules • AR (R) extracts frequent rules from the training data • AR (Rd) extracts rules on a demand-driven basis • AR (Rq) uses additional information as features to form the rules: the query terms • The proposed approaches were evaluated using the LETOR benchmark • AR (R) proved competitive with the baselines • Important rules may not be included in the ranking model • AR (Rd) and AR (Rq) showed superior performance • Important rules are more likely to be included in the model • Query terms proved to be useful features for L2R