Diversifying Search Results Rakesh Agrawal (rakesha@microsoft.com), Sreenivas Gollapudi (sreenig@microsoft.com), Alan Halverson (alanhal@microsoft.com), Samuel Ieong (saieong@microsoft.com) Search Labs, Microsoft Research WSDM ’09
Outline • Introduction • Problem Formulation • A Greedy Algorithm for DIVERSIFY(k) • Performance Metrics • Evaluation • Conclusions
Introduction • Minimize the risk of dissatisfaction of the average user • Assume that there exist • a taxonomy • a model of user intents • Consider both the relevance of the documents and the diversity of the search results • Trade off relevance against diversity
Problem Formulation • Allocating the number of results to show for each category in proportion to the percentage of users interested in that category may perform poorly • Example: query "Flash" • Technology: 0.6
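The slide's "Flash" example is truncated in this extraction, so the sketch below uses hypothetical numbers of our own (the 0.6/0.4 intent split on two generic intents, the satisfaction probabilities V, and the expected-satisfaction objective are our illustration, not taken from the slides) to show why proportional slot allocation can perform poorly:

```python
# Hypothetical illustration (numbers are ours, not from the slides):
# why allocating result slots in proportion to intent probabilities
# can be suboptimal.  V gives the probability a document satisfies
# an intent; satisfaction across shown documents is independent.

def expected_satisfaction(p_intents, picked):
    """sum_c p(c) * (1 - prod_{d in picked} (1 - V(d|c)))."""
    total = 0.0
    for c, p in p_intents.items():
        miss = 1.0
        for v_by_cat in picked:
            miss *= 1.0 - v_by_cat.get(c, 0.0)
        total += p * (1.0 - miss)
    return total

p = {"c1": 0.6, "c2": 0.4}
d_c1 = {"c1": 1.0}   # one perfectly relevant result for c1
d_c2 = {"c2": 0.5}   # weakly relevant results for c2

# k = 5 slots.  Proportional: 3 slots to c1 (0.6 * 5), 2 to c2.
proportional = expected_satisfaction(p, [d_c1] * 3 + [d_c2] * 2)
# Diversified: 1 slot to c1 (it already satisfies c1 fully), 4 to c2.
diversified = expected_satisfaction(p, [d_c1] + [d_c2] * 4)

print(proportional)  # 0.6 * 1 + 0.4 * 0.75   = 0.90
print(diversified)   # 0.6 * 1 + 0.4 * 0.9375 = 0.975
```

The extra c1 slots are wasted because one perfect result already satisfies that intent; reallocating them to the weaker intent raises expected satisfaction.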
Problem Formulation • The objective is defined over an unordered set of results • Our algorithm is also designed to generate an ordering of results rather than just a set of results
Problem Formulation • DIVERSIFY(k) is NP-hard • A solution optimal for DIVERSIFY(k-1) need not be a subset of a solution optimal for DIVERSIFY(k) • Example: p(c1|q) = p(c2|q) = 0.5 DIVERSIFY(1): d1, d2, d3 DIVERSIFY(2): d2, d3, d1
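A minimal sketch of the greedy idea behind IA-Select as we understand it (our reconstruction, not the authors' code): repeatedly pick the document with the largest marginal utility, then discount the weight of the intents that document already covers, which also yields an ordering rather than just a set. The document names, V values, and the tiny instance are our own illustration.

```python
# Greedy selection for DIVERSIFY(k), sketched from the paper's idea.
# V[d][c] = estimated probability that document d satisfies intent c.

def ia_select(p_intents, docs, V, k):
    """Returns an ordered list of up to k documents."""
    U = dict(p_intents)        # residual intent weights U(c | q, S)
    selected = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        # Marginal utility of d: sum_c U(c) * V(d | c).
        best = max(candidates,
                   key=lambda d: sum(U[c] * V[d].get(c, 0.0) for c in U))
        selected.append(best)
        candidates.remove(best)
        # Intents covered by `best` become less valuable.
        for c in U:
            U[c] *= 1.0 - V[best].get(c, 0.0)
    return selected

# Tiny instance echoing the slide's setting: p(c1|q) = p(c2|q) = 0.5.
p = {"c1": 0.5, "c2": 0.5}
V = {"d1": {"c1": 0.6, "c2": 0.6},   # decent for both intents
     "d2": {"c1": 0.9},              # excellent for c1 only
     "d3": {"c2": 0.8}}              # good for c2 only

print(ia_select(p, ["d1", "d2", "d3"], V, 1))  # top-1 pick
print(ia_select(p, ["d1", "d2", "d3"], V, 3))  # full ordering
```

The discounting step makes the objective submodular, which is what gives the greedy its approximation guarantee mentioned in the conclusions.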
Performance Metrics • NDCG, MRR, and MAP do not take into account the value of diversification • Intent-aware measures • Example: p(c2|q) >> p(c1|q) d1 is Excellent for c1 (but unrelated to c2) d2 is Good for c2 (but unrelated to c1) Classical IR metrics rank: d1, d2 Intent-aware measures rank: d2, d1
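The intent-aware idea can be sketched as a p(c|q)-weighted average of a classical metric computed separately per intent. Below is a minimal NDCG-IA-style sketch of our own; the intent probabilities and relevance grades are hypothetical numbers chosen to mirror the slide's example, not data from the paper.

```python
# Sketch of an intent-aware metric: compute NDCG@k per intent, then
# average weighted by p(c|q).  grades[c][d] is a graded relevance
# judgment of document d under intent c (hypothetical values).
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranking, grade, k):
    gains = [grade.get(d, 0) for d in ranking[:k]]
    ideal = sorted(grade.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

def ndcg_ia(ranking, p_intents, grades, k):
    return sum(p * ndcg_at_k(ranking, grades[c], k)
               for c, p in p_intents.items())

# Mirrors the slide: p(c2|q) >> p(c1|q); d1 excellent for c1 only,
# d2 good for c2 only (grades 0-3 are our own illustration).
p = {"c1": 0.1, "c2": 0.9}
grades = {"c1": {"d1": 3}, "c2": {"d2": 2}}

print(ndcg_ia(["d1", "d2"], p, grades, 2))
print(ndcg_ia(["d2", "d1"], p, grades, 2))  # d2 first scores higher
```

Because the dominant intent c2 is only served by d2, the intent-aware score prefers the ordering d2, d1, whereas a single global grade favoring d1 would not.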
Evaluation • Evaluate our approach against three commercial search engines • Conduct three sets of experiments • The experiments differ in how the distributions of intents and the relevance of the documents are obtained
Experiment 1 • Obtain the distributions of intents for both queries and documents via standard classifiers • Obtain the relevance of documents from a proprietary repository of human judgments to which we have been granted access • Dataset: 10,000 random queries with their top 50 documents • Many documents in the top 10 for each query are assigned human judgments
Experiment 1 • Sample about 900 queries such that • each query spans at least two categories • a significant fraction of the associated documents have human judgments
Experiment 2 • Obtain the distributions of intents for queries and the document relevance using the Amazon Mechanical Turk platform • Sample 200 queries from the dataset, each spanning at least three categories • Submit these queries, along with the three most likely categories as estimated by the classifier and the top five results produced by IA-Select, to the Mechanical Turk workers
Experiment 3 • IA-Select: p(c|q) obtained from the Amazon Mechanical Turk platform • Metrics: p(c|q) and the document relevance are the same as used in Experiment 1
Conclusions • Provide a greedy algorithm with good approximation guarantees • To evaluate the effectiveness of our approach, we proposed generalizations of well-studied metrics that take the intentions of the users into account • Our approach outperforms the results produced by commercial search engines on all of the metrics