Search Result Diversification by M. Drosou and E. Pitoura • Presenter: Bilge Koroglu • June 14, 2011
Introduction • Result Diversification • solution to the over-specialization problem: retrieval of overly homogeneous results • personalization: complementing user preferences • Problem to be solved • set of all items: X, |X| = n • select k divergent items to form the set S • the diversity among the items of S is maximized
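The problem statement above can be made concrete with a brute-force sketch. It is only an illustration under stated assumptions: diversity is taken to be the average pairwise distance within S, the distance is Euclidean, and the names `avg_pairwise_distance` and `most_diverse_subset` are hypothetical. Exhaustive search is exponential in n, which is why the deck turns to heuristics later.

```python
from itertools import combinations
import math


def avg_pairwise_distance(items, dist):
    """Average distance over all unordered pairs (assumed diversity measure)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def most_diverse_subset(X, k, dist):
    """Try every k-subset of X and keep the most diverse one (tiny n only)."""
    return max(combinations(X, k), key=lambda S: avg_pairwise_distance(S, dist))


# Toy data: three points clustered near the origin and two far away.
X = [(0, 0), (1, 0), (0, 1), (10, 0), (0, 10)]
S = most_diverse_subset(X, 3, math.dist)
print(S)
```

On this toy input the selected set spreads across the two far points and the origin rather than staying inside the cluster.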
Introduction (cont'd) • Ways of diversification • Content: (dis)similarity between items • Novelty: items most dissimilar to those seen previously • Coverage: items from different categories • Approaches in diversification algorithms • Greedy • Interchange
Content-based Diversification • p-dispersion problem [1] • choose p out of n points such that the minimum distance between any pair of chosen points is maximized • The objective function in web search diversification: • maximizing the average pairwise distance within the result list (equivalently, minimizing average intra-list similarity) • Extension of k-nearest neighbor: Gower coefficient [2] • results that are spatially closest, yet sufficiently divergent from the rest
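A minimal sketch of one common greedy heuristic for the max-min p-dispersion objective follows. It is not claimed to be one of the specific heuristics compared in [1]; the seeding rule (start from the farthest pair), the Euclidean distance, and the name `greedy_p_dispersion` are assumptions, and the points are assumed to be distinct.

```python
from itertools import combinations
import math


def greedy_p_dispersion(points, p, dist=math.dist):
    """Pick p points, greedily keeping the minimum pairwise distance large."""
    if p < 2 or p > len(points):
        raise ValueError("p must satisfy 2 <= p <= len(points)")
    # Seed with the two points that are farthest apart.
    first, second = max(combinations(points, 2), key=lambda pair: dist(*pair))
    selected = [first, second]
    remaining = [x for x in points if x not in selected]
    while len(selected) < p:
        # Add the point whose nearest already-selected neighbor is farthest.
        best = max(remaining, key=lambda x: min(dist(x, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


points = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
chosen = greedy_p_dispersion(points, 3)
print(chosen)
```

Here the two endpoints are picked first, and the midpoint is added next because it is the candidate farthest from both.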
Novelty-based Diversification • Novelty vs. Diversity • novelty: avoiding redundancy • diversity: resolving ambiguity • Information nuggets: intents or classes of query [3] • Another diversification measure [4] (definition not preserved in this transcript)
Coverage-based Diversification • Typical example, employing classes [5] • Maximizes the probability that each relevant category is represented by a document in the diversified search result list
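The coverage idea can be sketched as follows. This is an illustration in the spirit of [5], not a reproduction of its algorithm: it assumes a known distribution over categories for the query and a per-category probability that a document satisfies the user, and all names and numbers (`coverage_probability`, `greedy_cover`, the "jaguar"-style toy data) are hypothetical.

```python
def coverage_probability(S, category_prob, relevance):
    """Probability that at least one document in S satisfies the query intent.

    category_prob: {category: P(category | query)}
    relevance: {(doc, category): probability that doc satisfies that category}
    """
    total = 0.0
    for category, p_category in category_prob.items():
        miss = 1.0  # probability that no document in S covers this category
        for doc in S:
            miss *= 1.0 - relevance.get((doc, category), 0.0)
        total += p_category * (1.0 - miss)
    return total


def greedy_cover(docs, k, category_prob, relevance):
    """Greedily add the document that raises the coverage probability most."""
    S, candidates = [], list(docs)
    while candidates and len(S) < k:
        best = max(
            candidates,
            key=lambda d: coverage_probability(S + [d], category_prob, relevance),
        )
        S.append(best)
        candidates.remove(best)
    return S


# Toy ambiguous query with two intents (made-up numbers).
category_prob = {"car": 0.6, "animal": 0.4}
relevance = {("d1", "car"): 0.9, ("d2", "car"): 0.8, ("d3", "animal"): 0.7}
picked = greedy_cover(["d1", "d2", "d3"], 2, category_prob, relevance)
print(picked)
```

The second pick is the animal document rather than the second car document, because the car intent is already well covered after the first pick.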
Greedy Heuristics in Diversification • item-set distance • Flow of the recommender algorithm • 1. Calculate the item-set distance of each new item to S • 2. Sort the new items by relevance to the query and by item-set distance • 3. Combine the ranks from these two sorted lists; the item with the minimum combined rank is added to S, removing the last one • 4. Repeat from step 1 until k new items are added
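The steps above can be sketched roughly as below. Several details are assumptions because the slide does not preserve them: item-set distance is taken to be the average distance from a candidate to the items already in S, S is seeded with the single most relevant item, the two ranks are combined by an unweighted sum, and the "removing the last one" part of step 3 is not modeled. The function name `greedy_diversify` and the toy data are hypothetical.

```python
def greedy_diversify(candidates, relevance, dist, k):
    """Grow S one item at a time using combined relevance and distance ranks."""
    remaining = list(candidates)
    seed = max(remaining, key=relevance)  # assumed seeding rule
    S = [seed]
    remaining.remove(seed)
    while remaining and len(S) < k:
        # Step 1: item-set distance of every candidate to S (assumed: average).
        set_dist = {x: sum(dist(x, s) for s in S) / len(S) for x in remaining}
        # Step 2: two sorted lists, best candidate first in each.
        by_relevance = sorted(remaining, key=relevance, reverse=True)
        by_distance = sorted(remaining, key=lambda x: set_dist[x], reverse=True)
        rank_rel = {x: i for i, x in enumerate(by_relevance)}
        rank_div = {x: i for i, x in enumerate(by_distance)}
        # Step 3: the candidate with the minimum combined rank joins S.
        best = min(remaining, key=lambda x: rank_rel[x] + rank_div[x])
        S.append(best)
        remaining.remove(best)
    return S


# Toy items on a line, with made-up relevance scores.
pos = {"a": 0, "b": 1, "c": 50, "d": 90, "e": -5}
rel = {"a": 0.9, "b": 0.6, "c": 0.8, "d": 0.2, "e": 0.7}
result = greedy_diversify(list(pos), rel.get, lambda x, y: abs(pos[x] - pos[y]), 3)
print(result)
```

A weighted sum of the two ranks would let the caller trade relevance against diversity; the unweighted sum is used here only to keep the sketch short.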
Interchange Heuristics in Diversification • Flow of the algorithm [6] • S is initialized with the k most relevant items • The item that contributes least to the diversity of S is interchanged with the most relevant item in X \ S • Structured Search Results [7] • identification of the subset of features that differentiates the instances more than the others
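A minimal sketch of the interchange flow is given below. It is not the exact procedure of [6]: it assumes that an item's contribution is the sum of its distances to the other items in S, that a swap is kept only if it increases the total pairwise distance, and that candidates below a hypothetical `min_relevance` threshold are not considered. Items are assumed to be distinct.

```python
from itertools import combinations


def diversity(S, dist):
    """Total pairwise distance within S (assumed diversity measure)."""
    return sum(dist(a, b) for a, b in combinations(S, 2))


def interchange(X, relevance, dist, k, min_relevance=0.0):
    """Start from the k most relevant items, then swap in outside items."""
    ranked = sorted(X, key=relevance, reverse=True)
    S, outside = ranked[:k], ranked[k:]
    for candidate in outside:  # most relevant outside item first
        if relevance(candidate) < min_relevance:
            break
        # Item of S that contributes least to the diversity of S.
        weakest = min(S, key=lambda x: sum(dist(x, y) for y in S if y != x))
        swapped = [candidate if x == weakest else x for x in S]
        if diversity(swapped, dist) > diversity(S, dist):  # assumed acceptance rule
            S = swapped
    return S


# Toy items on a line, with made-up relevance scores.
pos = {"a": 0, "b": 1, "c": 2, "d": 50, "e": 90}
rel = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.5, "e": 0.4}
line_dist = lambda x, y: abs(pos[x] - pos[y])
swapped_all = interchange(list(pos), rel.get, line_dist, 3)
swapped_some = interchange(list(pos), rel.get, line_dist, 3, min_relevance=0.45)
print(swapped_all, swapped_some)
```

Raising `min_relevance` stops the interchanges earlier, so the result stays closer to the original top-k list.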
Evaluation • Redundancy-aware Precision and Recall [8] • For the NDCG calculation, the gain is updated (formula not preserved in this transcript)
Conclusion • 3 factors • Content-based • Novelty-based • Coverage-based • 2 approaches • Greedy heuristics • Interchange heuristics • More than 1 factor can be employed in an approach • Updated evaluation metrics are used to measure diversity
References
[1] E. Erkut, Y. Ulkusal, and O. Yenicerioglu. A comparison of p-dispersion heuristics. Computers and OR, 21(10):1103–1113, 1994.
[2] J. R. Haritsa. The KNDN problem: A quest for unity in diversity. IEEE Data Eng. Bull., 32(4):15–22, 2009.
[3] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Buttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, pages 659–666, 2008.
[4] Y. Zhang, J. P. Callan, and T. P. Minka. Novelty and redundancy detection in adaptive filtering. In SIGIR, pages 81–88, 2002.
[5] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5–14, 2009.
[6] C. Yu, L. V. S. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: diversification in recommender systems. In EDBT, pages 368–378, 2009.
[7] Z. Liu, P. Sun, and Y. Chen. Structured search result differentiation. PVLDB, 2(1):313–324, 2009.
[8] Y. Zhang, J. P. Callan, and T. P. Minka. Novelty and redundancy detection in adaptive filtering. In SIGIR, pages 81–88, 2002.