Result Diversification Based On Query Specific Cluster Ranking By Jiyin He, Edgar Meij, and Maarten de Rijke Presenter: Bilge Koroglu June 6, 2011
Introduction • Multi-faceted queries • Jaguar: cocktail, car, animal, etc. • Ambiguity in search result lists • which web pages should be included? • those about the cocktail, the car, the animal, etc. • Possible solution: diversification • query-specific clusters • important (= high-quality) clusters
Related Work • Probabilistic Ranking Principle • traditional retrieval strategy • maximize query-document similarity • Diversification requirements: • maximize query-document similarity • minimize document-document similarity • irrelevant document retrieval • Maximum Marginal Relevance (MMR) • Precision and diversity trade off against each other (Figure 1) • Aim: increase early precision while still giving diversified results
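The two diversification requirements above are what MMR combines in a single greedy criterion: at each step it picks the document that maximizes λ · sim(d, q) − (1 − λ) · max sim(d, d') over the already selected documents d'. A minimal sketch follows, assuming precomputed similarities; the function name `mmr_rerank`, the toy similarity values, and the parameter names are illustrative assumptions, not taken from the slides.

```python
from typing import Callable, Dict, List


def mmr_rerank(candidates: List[str],
               query_sim: Dict[str, float],
               doc_sim: Callable[[str, str], float],
               lam: float = 0.5,
               k: int = 10) -> List[str]:
    """Greedy MMR: trade query similarity against redundancy with chosen docs."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(d: str) -> float:
            # Redundancy is the highest similarity to any already selected doc
            # (0.0 while nothing has been selected yet).
            redundancy = max((doc_sim(d, s) for s in selected), default=0.0)
            return lam * query_sim[d] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "a" and "b" are near-duplicates, "c" is different.
query_sim = {"a": 0.9, "b": 0.85, "c": 0.5}
near_duplicates = {frozenset(("a", "b"))}


def toy_doc_sim(x: str, y: str) -> float:
    return 0.95 if frozenset((x, y)) in near_duplicates else 0.1


diverse = mmr_rerank(["a", "b", "c"], query_sim, toy_doc_sim, lam=0.5, k=3)
relevance_only = mmr_rerank(["a", "b", "c"], query_sim, toy_doc_sim, lam=1.0, k=3)
print(diverse)         # ['a', 'c', 'b']
print(relevance_only)  # ['a', 'b', 'c']
```

With λ = 1 the ranking reduces to plain query-document similarity; lowering λ promotes the less similar document "c" ahead of the near-duplicate "b", which illustrates the precision/diversity trade-off.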
Related Work (cont’d...) Figure 1. The trade-off between precision and diversity for MMR
Diversification Methods • MMR • Facet modeling with Latent Dirichlet Allocation (FM-LDA) • Intent-Aware Select (IA-Select) • Round Robin Facet Selection • Selection of T • diversification on top T clusters • remaining documents ranked by their retrieval score • The ways of ranking clusters: • query likelihood • oracle ranker
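Of the methods listed, round-robin facet selection is the simplest to illustrate: documents are taken in turn from each facet until enough have been selected. The sketch below is one plausible reading of that idea, assuming each facet is a list of document ids already sorted by retrieval score; the function name `round_robin_select` and the toy document ids are hypothetical.

```python
from collections import deque
from typing import List, Sequence


def round_robin_select(facets: Sequence[Sequence[str]], k: int) -> List[str]:
    """Take one unseen document from each facet in turn until k are selected."""
    queues = [deque(f) for f in facets if f]
    selected: List[str] = []
    seen = set()
    while queues and len(selected) < k:
        for q in queues:
            # Skip documents already picked via another facet.
            while q and q[0] in seen:
                q.popleft()
            if not q:
                continue
            doc = q.popleft()
            selected.append(doc)
            seen.add(doc)
            if len(selected) == k:
                break
        # Drop exhausted facets before the next round.
        queues = [q for q in queues if q]
    return selected


# Hypothetical facets; "d1" appears in two facets and is only returned once.
facets = [["d1", "d2", "d3"], ["d4", "d1"], ["d5"]]
picked = round_robin_select(facets, k=5)
print(picked)  # ['d1', 'd4', 'd5', 'd2', 'd3']
```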
Result Diversification with Cluster Ranking • Similar documents are in the same group • documents with the same subtopic • A relatively small number of clusters contain the actually relevant documents • For each query • the search result list is constructed by MRF • top-ranked documents are clustered • clusters are ranked in decreasing order of relevance to the query • a new search result list is composed from the high-quality clusters
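The per-query steps above, together with the rule that only the top T clusters are diversified and the remaining documents keep their retrieval order, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: clustering is hard (each document in one cluster), cluster scores are given, and the diversifier is any function that reorders a list. All names (`diversify_with_cluster_ranking`, `toy_diversify`) and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Set


def diversify_with_cluster_ranking(
        ranked_docs: List[str],
        clusters: Dict[str, Set[str]],
        cluster_scores: Dict[str, float],
        T: int,
        diversify: Callable[[List[str]], List[str]]) -> List[str]:
    """Diversify only documents from the top-T clusters; append the rest as-is."""
    # Rank clusters by score (e.g. query likelihood) in decreasing order.
    top = sorted(clusters, key=lambda c: cluster_scores[c], reverse=True)[:T]
    in_top: Set[str] = set().union(*(clusters[c] for c in top))
    # Documents from high-quality clusters, still in retrieval order.
    head = diversify([d for d in ranked_docs if d in in_top])
    # Remaining documents stay ranked by their original retrieval score.
    tail = [d for d in ranked_docs if d not in in_top]
    return head + tail


# Hypothetical toy data: retrieval order d1..d6, three clusters.
ranked_docs = ["d1", "d2", "d3", "d4", "d5", "d6"]
clusters = {"A": {"d1", "d2"}, "B": {"d3", "d5"}, "C": {"d4", "d6"}}
cluster_scores = {"A": 0.9, "B": 0.2, "C": 0.7}


def toy_diversify(docs: List[str]) -> List[str]:
    # Stand-in diversifier: interleave documents cluster by cluster.
    groups = [[d for d in docs if d in members] for members in clusters.values()]
    out: List[str] = []
    for i in range(len(docs)):
        out.extend(g[i] for g in groups if i < len(g))
    return out


result = diversify_with_cluster_ranking(
    ranked_docs, clusters, cluster_scores, T=2, diversify=toy_diversify)
print(result)  # ['d1', 'd4', 'd2', 'd6', 'd3', 'd5']
```

Passing the diversifier as a parameter mirrors the idea that cluster ranking is a wrapper around an existing diversification method such as MMR or IA-Select.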
Experiments • Four research questions are investigated: • What is the impact of diversification with cluster ranking on the effectiveness of existing result diversification methods? • What are the impacts of the cluster ranker and the value of T? • How sensitive is the performance of the framework to the number of documents selected? • What conditions should clusters fulfill to be effective clusters for cluster ranking?
Experiments: Question 1 • How much performance is gained by employing query-specific clustering and applying result diversification to the retrieved documents? • query likelihood cluster ranker • automatically determined T (leave-one-out cross validation) • the higher T, the lower the performance of FM-LDA • cluster ranking has a positive effect on performance for every method except IA-Select
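The slide does not spell out how T is determined beyond naming leave-one-out cross validation. One common reading is sketched below: for each held-out query, pick the T that gives the best average score on all other queries. The function name `loo_select_T`, the score table layout, and the numbers are hypothetical toy values, not results from the paper.

```python
from typing import Dict


def loo_select_T(perf: Dict[str, Dict[int, float]]) -> Dict[str, int]:
    """For each query, choose T by its mean score on all the other queries.

    perf[q][T] is an evaluation score for query q at setting T
    (assumes at least two queries and the same T values for every query).
    """
    chosen: Dict[str, int] = {}
    for held_out in perf:
        train = [q for q in perf if q != held_out]
        candidates = sorted(perf[held_out])
        chosen[held_out] = max(
            candidates,
            key=lambda T: sum(perf[q][T] for q in train) / len(train))
    return chosen


# Hypothetical scores for three queries at T = 1, 5, 10.
perf = {
    "q1": {1: 0.2, 5: 0.6, 10: 0.4},
    "q2": {1: 0.3, 5: 0.5, 10: 0.6},
    "q3": {1: 0.1, 5: 0.2, 10: 0.9},
}
chosen_T = loo_select_T(perf)
print(chosen_T)  # {'q1': 10, 'q2': 10, 'q3': 5}
```

Each query is evaluated with a T it did not help choose, so the reported performance is not tuned on the test query itself.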
Experiments: Question 2 • What are the impacts of cluster ranker and the value of T ? Figure 3. Query likelihood ranker Figure 4. Oracle Cluster Ranking
Experiments: Question 3 • How sensitive is the performance of the framework to the number of documents selected? Table 1. The effect of search result lists’ length
Experiments: Question 3 (cont’d...) Table 2. The effect of search result lists’ length
Experiments: Question 4 • What conditions should clusters fulfill to be effective clusters for cluster ranking? • Diversified results should come from a small number of high-quality clusters Figure 5. Accumulated precision scores for hierarchical (above) and LDA clusters (below)
Conclusion • Takes advantage of cluster-based retrieval for diversification • Aims to increase diversity while preserving early precision • Evaluation shows that the technique is effective and applicable • Worth investigating further with rigorous learning algorithms for the parameters