Boosting the Ranking Function Learning Process using Clustering WIDM 2008
Outline • Introduction • Problem definition • Approach • Evaluation • Conclusion
Introduction • Abstract • As the Web continuously grows, search engines return too many results to review • User feedback has gained a lot of attention • Learning from it requires a large amount of user feedback on the results • Goal: • Produce user feedback "automatically"
Problem definition • User feedback • Explicit feedback (user relevance judgement) • Implicit feedback • Click information • Users usually inspect only the first few results returned by a search engine, and click even fewer • Collecting relevance judgements from clickthrough data is a time-consuming process • Problem • How to use explicit feedback to generate implicit feedback? (relevance relations expansion)
Approach procedure • Process • Assume that only the relevance judgements of the top-10 results are available for each query (selected by the BM25 feature) • Group all the search results into clusters of documents having similar content • Expand the initial set (top-10 results) of relevance judgements using cluster information
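The three steps above can be sketched in code. The slides do not spell out the expansion rule, so the following is only one plausible reading, under a stated assumption: an unjudged document inherits the majority label of the judged documents in its cluster. The function name `expand_judgements` and the `min_agreement` threshold are hypothetical and do not come from the original work.

```python
from collections import Counter, defaultdict


def expand_judgements(clusters, judged, min_agreement=0.5):
    """Propagate relevance labels inside clusters (illustrative assumption).

    clusters: dict mapping doc_id -> cluster_id for all search results.
    judged:   dict mapping doc_id -> relevance label for the initial set.
    Returns a new dict holding the original and the propagated labels.
    """
    # Collect the known labels of each cluster.
    labels_by_cluster = defaultdict(list)
    for doc, label in judged.items():
        cluster = clusters.get(doc)
        if cluster is not None:
            labels_by_cluster[cluster].append(label)

    expanded = dict(judged)
    for doc, cluster in clusters.items():
        if doc in expanded:
            continue  # already judged, keep the original label
        labels = labels_by_cluster.get(cluster)
        if not labels:
            continue  # no judged document in this cluster, nothing to copy
        label, count = Counter(labels).most_common(1)[0]
        # Only propagate when a strict majority of judged members agree.
        if count / len(labels) > min_agreement:
            expanded[doc] = label
    return expanded


clusters = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 1, "d6": 2}
judged = {"d1": 2, "d2": 2, "d4": 0}
expanded = expand_judgements(clusters, judged)
```

In this toy input, `d3` and `d5` receive labels from their clusters, while `d6` stays unjudged because its cluster contains no judged document.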
Clustering • Represent each document by a feature vector • Vector dimension: total number of distinct terms in all documents • Cluster method • Bisection clustering • Similarity • Cosine similarity
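As a minimal sketch of the document representation and similarity measure named above, the following builds sparse term-frequency vectors and compares them with cosine similarity. Whitespace tokenization and raw term counts are assumptions made for illustration; the slides do not state the term weighting that was used, and the helper names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Sparse bag-of-words vector: term -> raw count (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


same = cosine_similarity(term_vector("heart disease"), term_vector("heart disease"))
disjoint = cosine_similarity(term_vector("heart disease"), term_vector("ranking function"))
partial = cosine_similarity(term_vector("heart disease"), term_vector("heart attack"))
```

A bisection clustering procedure would use such a similarity to split a cluster into two repeatedly until the desired number of clusters is reached.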
Relation expansion • [Figure: train query and train query expansion]
Relation expansion • Expansion Algorithm:
Evaluation • Dataset • LETOR OHSUMED collection • 348,566 records and 16,140 relevance judgements • 84 training queries and 22 testing queries • Relevance judgement • 0 (irrelevant), 1 (partially relevant), 2 (strongly relevant) • Training method • RankSVM
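RankSVM learns from pairwise preferences between documents of the same query, so graded judgements such as 0/1/2 are typically turned into ordered pairs before training. The sketch below shows that conversion only; the function name `preference_pairs` and the input layout are assumptions for illustration, not the evaluation code behind these slides.

```python
from itertools import combinations


def preference_pairs(judgements):
    """Turn graded judgements into (query, preferred_doc, other_doc) triples.

    judgements: dict mapping query_id -> dict of doc_id -> grade (0, 1 or 2).
    Documents with equal grades produce no pair, and documents of different
    queries are never compared.
    """
    pairs = []
    for qid, docs in judgements.items():
        for (d1, g1), (d2, g2) in combinations(docs.items(), 2):
            if g1 > g2:
                pairs.append((qid, d1, d2))
            elif g2 > g1:
                pairs.append((qid, d2, d1))
    return pairs


pairs = preference_pairs({"q1": {"a": 2, "b": 0, "c": 1}, "q2": {"x": 1, "y": 1}})
```

Expanding the set of judged documents increases the number of such pairs available to the learner, which is the effect the approach relies on.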
Evaluation • Clustering precision
Evaluation • Using 160 relevance judgements
Conclusion • We presented a methodology for increasing the training input of ranking function learning systems • Future work • Deciding whether a cluster is valid • Different ways of labelling clusters