Result Diversification Based On Query Specific Cluster Ranking

By Jiyin He, Edgar Meij, and Maarten de Rijke
Presenter: Bilge Koroglu
June 6, 2011

Introduction
• Multi-faceted queries
  • Jaguar: cocktail, car, animal, etc.
• Ambiguity in search result lists
  • Which web pages should be included: cocktail, car, animal, etc.?
• Possible solution: diversification
  • Query-specific clusters
  • Important (i.e., high-quality) clusters

Related Work
• Probability Ranking Principle (PRP)
  • Traditional retrieval strategy
  • Maximize query-document similarity
• Diversification requirements:
  • Maximize query-document similarity
  • Minimize document-document similarity
  • Side effect: irrelevant documents may be retrieved
• Maximum Marginal Relevance (MMR)
  • Precision and diversity trade off against each other (Figure 1)
• Aim: increase early precision while still returning diversified results

Related Work (cont’d)
Figure 1. The trade-off between precision and diversity for MMR
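The MMR criterion from the related work can be sketched as a greedy re-ranker. This is a minimal illustration of the textbook MMR formulation, not the presenters' or the paper's code; the function name `mmr_rerank`, the weight `lam`, and the toy similarity scores are hypothetical. Lowering `lam` lets a less relevant but dissimilar document overtake a near-duplicate, which is the precision/diversity trade-off that Figure 1 refers to.

```python
from typing import Callable, Dict, List, Sequence


def mmr_rerank(
    docs: Sequence[str],
    query_sim: Dict[str, float],             # sim(d, q) for every candidate document
    doc_sim: Callable[[str, str], float],    # sim(d, d') between two documents
    k: int,
    lam: float = 0.5,                        # hypothetical relevance/novelty weight
) -> List[str]:
    """Greedy MMR: repeatedly pick the d maximising
    lam * sim(d, q) - (1 - lam) * max_{s in selected} sim(d, s)."""
    candidates = list(docs)
    selected: List[str] = []
    while candidates and len(selected) < k:
        def mmr_score(d: str) -> float:
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max((doc_sim(d, s) for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: two near-duplicate "car" pages and one "animal" page.
query_sim = {"car1": 0.9, "car2": 0.85, "cat1": 0.6}
topic = {"car1": "car", "car2": "car", "cat1": "animal"}


def same_topic_sim(a: str, b: str) -> float:
    return 0.95 if topic[a] == topic[b] else 0.1


print(mmr_rerank(["car1", "car2", "cat1"], query_sim, same_topic_sim, k=3, lam=0.5))
# ['car1', 'cat1', 'car2']
print(mmr_rerank(["car1", "car2", "cat1"], query_sim, same_topic_sim, k=3, lam=1.0))
# ['car1', 'car2', 'cat1']
```

With `lam=1.0` the redundancy term vanishes and the ranking is purely by query similarity, as under the PRP; with `lam=0.5` the animal page is promoted above the second car page.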

Diversification Methods
• MMR
• Facet modeling with Latent Dirichlet Allocation (FM-LDA)
• Intent-Aware Select (IA-Select)
• Round Robin Facet Selection
• Selection of T
  • Diversification is applied to the documents in the top T clusters
  • The remaining documents are ranked by their retrieval score
• Ways of ranking clusters:
  • Query likelihood
  • Oracle ranker
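The top-T composition step can be sketched as follows, assuming the clusters are already ranked and every retrieved document has a retrieval score. A simple round-robin interleaving over the top-T clusters stands in for the diversification method; it is a simplified illustration and not necessarily the exact Round Robin Facet Selection used in the paper. The names `round_robin` and `compose_with_top_t` and the toy scores are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence


def round_robin(clusters: Sequence[Sequence[str]], score: Dict[str, float]) -> List[str]:
    """Take the best document of each cluster (in cluster-rank order), then the second best, and so on."""
    ordered = [sorted(c, key=lambda d: -score[d]) for c in clusters]
    out: List[str] = []
    seen = set()
    for tier in zip_longest(*ordered):  # shorter clusters are padded with None
        for d in tier:
            if d is not None and d not in seen:
                seen.add(d)
                out.append(d)
    return out


def compose_with_top_t(
    ranked_clusters: Sequence[Sequence[str]],  # clusters, best-ranked first
    score: Dict[str, float],                   # retrieval score of every retrieved document
    t: int,
) -> List[str]:
    """Diversify the documents of the top-t clusters; rank the rest by retrieval score."""
    head = round_robin(ranked_clusters[:t], score)
    in_head = set(head)
    tail = sorted((d for d in score if d not in in_head), key=lambda d: -score[d])
    return head + tail


# Hypothetical toy data: three ranked clusters for the query "jaguar".
score = {"car1": 0.9, "car2": 0.8, "cat1": 0.7, "cat2": 0.5, "gin1": 0.6}
clusters = [["car1", "car2"], ["cat1", "cat2"], ["gin1"]]
print(compose_with_top_t(clusters, score, t=2))  # ['car1', 'cat1', 'car2', 'cat2', 'gin1']
print(compose_with_top_t(clusters, score, t=1))  # ['car1', 'car2', 'cat1', 'gin1', 'cat2']
```

With `t=1` only the first cluster is pulled to the top and everything else follows in retrieval-score order; a larger `t` interleaves more subtopics near the top of the list.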

Result Diversification with Cluster Ranking
• Similar documents are grouped in the same cluster
  • e.g., documents about the same subtopic
• Only a relatively small number of clusters contain actually relevant documents
• For each query:
  • An initial search result list is retrieved with a Markov Random Field (MRF) model
  • The top-ranked documents are clustered
  • Clusters are ranked by their relevance to the query, in decreasing order
  • A new search result list is composed from the high-quality clusters

Figure 2. Diversification with Cluster Ranking
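One common way to build a query likelihood cluster ranker, assumed here, is to treat each cluster as a single pseudo-document formed by concatenating its members and to score it with a Dirichlet-smoothed query likelihood. The smoothing choice, the parameter `mu`, the function name `rank_clusters_by_ql`, and the toy clusters are assumptions for illustration; the paper's exact cluster language model may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def rank_clusters_by_ql(
    query: Sequence[str],
    clusters: Dict[str, List[List[str]]],  # cluster id -> tokenised member documents
    collection: Counter,                   # term counts over the whole collection
    mu: float = 2000.0,                    # Dirichlet prior (assumed value)
) -> List[str]:
    """Rank clusters by log P(query | cluster), one concatenated pseudo-document per cluster."""
    coll_len = sum(collection.values())
    scores: Dict[str, float] = {}
    for cid, docs in clusters.items():
        tf = Counter(tok for doc in docs for tok in doc)
        length = sum(tf.values())
        log_p = 0.0
        for term in query:
            if collection[term] == 0:
                continue  # unseen term: log(0) for every cluster, so it cannot change the order
            p_coll = collection[term] / coll_len
            log_p += math.log((tf[term] + mu * p_coll) / (length + mu))
        scores[cid] = log_p
    return sorted(scores, key=lambda c: scores[c], reverse=True)


# Hypothetical toy clusters over the top-ranked documents for "jaguar car".
clusters = {
    "cars": [["jaguar", "car", "engine"], ["jaguar", "car", "speed"]],
    "animals": [["jaguar", "cat", "jungle"]],
    "drinks": [["cocktail", "gin"]],
}
collection = Counter(tok for docs in clusters.values() for doc in docs for tok in doc)
print(rank_clusters_by_ql(["jaguar", "car"], clusters, collection, mu=10.0))
# ['cars', 'animals', 'drinks']
```

An oracle ranker would replace these scores with, for example, the number of relevant documents per cluster taken from the relevance judgments, which is only possible in an evaluation setting.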

Experiments
• Four research questions are investigated:
  • What is the impact of diversification with cluster ranking on the effectiveness of existing result diversification methods?
  • What is the impact of the cluster ranker and of the value of T?
  • How sensitive is the performance of the framework to the number of documents selected?
  • What conditions should clusters fulfill to be effective for cluster ranking?

Experiments: Question 1
• How much performance is gained by employing query-specific clustering and applying result diversification to the retrieved documents?
  • Query likelihood cluster ranker
  • T determined automatically (leave-one-out cross-validation)
  • For FM-LDA, the higher T, the lower the performance
  • Cluster ranking improves performance for every method except IA-Select
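Choosing T by leave-one-out cross-validation can be sketched as below: for each query, T is picked to maximize the mean effectiveness on all other queries and is then applied to the held-out query. The per-query scores are made-up toy numbers standing in for whatever diversity metric is used, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def loo_select_t(
    per_query: Dict[str, Dict[int, float]],  # query id -> {T: effectiveness score}
    candidates: Sequence[int],
) -> Dict[str, int]:
    """For each held-out query, pick the T with the best mean score on the other queries."""
    chosen: Dict[str, int] = {}
    for held_out in per_query:
        others = [q for q in per_query if q != held_out]

        def mean_on_others(t: int) -> float:
            return sum(per_query[q][t] for q in others) / len(others)

        # max() keeps the first candidate on ties.
        chosen[held_out] = max(candidates, key=mean_on_others)
    return chosen


def loo_score(per_query: Dict[str, Dict[int, float]], chosen: Dict[str, int]) -> float:
    """Mean held-out score when each query uses the T chosen without it."""
    return sum(per_query[q][chosen[q]] for q in per_query) / len(per_query)


# Hypothetical toy scores for three queries and three candidate values of T.
scores = {
    "q1": {1: 0.30, 3: 0.40, 5: 0.20},
    "q2": {1: 0.35, 3: 0.38, 5: 0.25},
    "q3": {1: 0.50, 3: 0.20, 5: 0.10},
}
chosen = loo_select_t(scores, candidates=[1, 3, 5])
print(chosen)  # {'q1': 1, 'q2': 1, 'q3': 3}
print(loo_score(scores, chosen))
```

Because each query's T is chosen without looking at that query, the held-out score is a fairer estimate than picking the single T that is best on all queries at once.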

Experiments: Question 2
• What is the impact of the cluster ranker and of the value of T?
Figure 3. Query likelihood ranker
Figure 4. Oracle cluster ranking

Experiments: Question 3
• How sensitive is the performance of the framework to the number of documents selected?
Table 1. The effect of the search result list length

Experiments: Question 3 (cont’d)
Table 2. The effect of the search result list length

Experiments: Question 4
• What conditions should clusters fulfill to be effective for cluster ranking?
  • Diversified results come from a small number of high-quality clusters
Figure 5. Accumulated precision scores for hierarchical (above) and LDA (below) clusters

Conclusion
• Takes advantage of cluster-based retrieval for diversification
• Aims to increase diversity while preserving early precision
• The experiments show that the technique is effective and applicable
• Worth investigating further with more rigorous learning algorithms for setting the parameters
