
Modeling Diversity in Information Retrieval



Presentation Transcript


  1. Modeling Diversity in Information Retrieval • ChengXiang (“Cheng”) Zhai • Department of Computer Science • Graduate School of Library & Information Science • Institute for Genomic Biology • Department of Statistics • University of Illinois, Urbana-Champaign

  2. Different Needs for Diversification • Redundancy reduction • Diverse information needs (e.g., overview, subtopic retrieval) • Active relevance feedback • …

  3. Outline • Risk minimization framework • Capturing different needs for diversification • Language models for diversification

  4. IR as Sequential Decision Making • The user (with an information need) and the system (with a model of that information need) alternate decisions; Ri denotes the system's results (i = 1, 2, 3, …) • A1: user enters a query → system decides which documents to present and how to present them; the user then decides which documents to view • A2: user views a document → system decides which part of the document to show and how (R': document content); the user decides whether to view more • A3: user clicks the "Back" button → …

  5. Retrieval Decisions • Interaction history H = {(Ai, Ri)}, i = 1, …, t−1: user U issues actions A1, A2, …, At−1, At; the system returns responses R1, R2, …, Rt−1; Rt = ? • Given U, the document collection C, At, and H, choose the best Rt ∈ r(At) from all possible responses to At • Example: At = query "Jaguar" → r(At) = all possible rankings of C; the best Rt is the best ranking for the query • Example: At = click on "Next" button → r(At) = all possible size-k subsets of unseen docs; the best Rt is the best k unseen docs

  6. A Risk Minimization Framework • Observed: user U, interaction history H, current user action At, document collection C • All possible responses: r(At) = {r1, …, rn} • Inferred: user model M = (θU, S, …), with information need θU and seen docs S • Loss function L(ri, At, M); the optimal response r* is the one with minimum loss, i.e., minimum Bayes risk over the posterior of M

  7. A Simplified Two-Step Decision-Making Procedure • Approximate the Bayes risk by the loss at the mode of the posterior distribution • Two-step procedure • Step 1: Compute an updated user model M* based on the currently available information • Step 2: Given M*, choose a response to minimize the loss function
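
The two steps approximate the full Bayes-risk objective. A reconstruction in the notation of the preceding slides (the exact form is an assumption based on the risk minimization papers cited later in the talk):

```latex
% Bayes-risk-optimal response (slide 6) and its two-step approximation (slide 7)
r^{*} \;=\; \arg\min_{r \in r(A_t)} \int L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM
\;\;\approx\;\; \arg\min_{r \in r(A_t)} L(r, A_t, M^{*}),
\quad \text{where } M^{*} = \arg\max_{M} P(M \mid U, H, A_t, C)
```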

  8. Optimal Interactive Retrieval • The user U interacts with the IR system over the collection C • Round 1: the user issues A1; the system infers M*1 from P(M1 | U, H, A1, C) and returns R1 minimizing L(r, A1, M*1) • Round 2: the user issues A2; the system infers M*2 from P(M2 | U, H, A2, C) and returns R2 minimizing L(r, A2, M*2) • Round 3: the user issues A3; …

  9. Refinement of Risk Minimization • At ∈ {query, clickthrough, feedback, …} • r(At): decision space (At dependent) • r(At) = all possible subsets of C + presentation strategies • r(At) = all possible rankings of docs in C • r(At) = all possible rankings of unseen docs • … • M: user model • Essential component: θU = user information need • S = seen documents • n = "Topic is new to the user" • L(Rt, At, M): loss function • Generally measures the utility of Rt for a user modeled as M • Often encodes retrieval criteria (e.g., using M to select a ranking of docs) • P(M | U, H, At, C): user model inference • Often involves estimating a unigram language model θU

  10. Generative Model of Document & Query [Lafferty & Zhai 01] • The user U (partially observed) generates the query q (observed) • The document source S (observed) generates the document d (observed) • The relevance variable R relating the two is inferred

  11. Risk Minimization with Language Models [Lafferty & Zhai 01, Zhai & Lafferty 06] • Observed: user U, query q, doc set C, source S • Hidden: the query model θQ and the document models θ1, …, θN • Possible actions: choices (D1, π1), (D2, π2), …, (Dn, πn) of a document set and a presentation strategy, each incurring a loss L • Risk minimization: choose the (D, π) with minimum Bayes risk
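
For reference, a hedged reconstruction of the "Bayes risk for choice (D, π)" that this slide's figure depicts, following the general form of the Lafferty & Zhai framework (the exact factorization is an assumption):

```latex
R(D, \pi \mid U, q, S, C) \;=\;
\int L\big(D, \pi, \theta_Q, \theta_1, \ldots, \theta_N\big)\,
p(\theta_Q \mid U, q)\,
\prod_{i=1}^{N} p(\theta_i \mid S, d_i)\;
d\theta_Q\, d\theta_1 \cdots d\theta_N
```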

  12. Optimal Ranking for Independent Loss • Decision space = {rankings} • Assuming sequential browsing and an independent (per-document) loss, the overall risk decomposes over documents, so independent risk = independent scoring • This yields the "risk ranking principle" [Zhai 02, Zhai & Lafferty 06]

  13. Risk Minimization for Diversification • Redundancy reduction: Loss function includes a redundancy measure • Special case: list presentation + MMR [Zhai et al. 03] • Diverse information needs: loss function defined on latent topics • Special case: PLSA/LDA + topic retrieval [Zhai 02] • Active relevance feedback: loss function considers both relevance and benefit for feedback • Special case: hard queries + feedback only [Shen & Zhai 05]

  14. Subtopic Retrieval • Query: What are the applications of robotics in the world today? Find as many DIFFERENT applications as possible. • Subtopic judgments (documents × subtopics A1, A2, A3, …, Ak): d1: 1 1 0 0 … 0 0; d2: 0 1 1 1 … 0 0; d3: 0 0 0 0 … 1 0; …; dk: 1 0 1 0 … 0 1 • Example subtopics: A1: spot-welding robotics; A2: controlling inventory; A3: pipe-laying robots; A4: talking robot; A5: robots for loading & unloading memory tapes; A6: robot [telephone] operators; A7: robot cranes; … • This is a non-traditional retrieval task

  15. Diversify = Remove Redundancy • Greedy algorithm for ranking: Maximal Marginal Relevance (MMR) • "Willingness to tolerate redundancy": C2 < C3, where C2 is the cost of presenting a redundant relevant doc and C3 the cost of presenting a non-relevant doc, since a redundant relevant doc is better than a non-relevant doc
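
To make the greedy MMR ranking concrete, here is a minimal Python sketch. The relevance and similarity functions and the trade-off parameter lam are illustrative placeholders, not the exact cost-based loss (C2, C3) of the slide.

```python
def mmr_rank(docs, relevance, similarity, lam=0.7, k=10):
    """Greedy Maximal Marginal Relevance re-ranking (sketch).

    relevance(d): relevance score of doc d to the query.
    similarity(d1, d2): redundancy between two docs.
    lam: willingness to tolerate redundancy (1.0 = pure relevance ranking).
    """
    selected = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        def mmr_score(d):
            # Penalize a candidate by its redundancy with already selected docs
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance(d) - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```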

  16. A Mixture Model for Redundancy • A candidate document d is modeled as a two-component mixture of the reference (already seen) document model P(w|Old), with weight 1−λ, and the collection background model P(w|Background), with weight λ = ? • p(New|d) = λ = probability of "new" content (estimated using EM) • p(New|d) can also be estimated using KL-divergence
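
A minimal sketch of the EM estimation the slide mentions, assuming the two component distributions P(w|Old) and P(w|Background) are held fixed and only the mixing weight is estimated; reading the estimated weight as p(New|d) follows the slide's description.

```python
def estimate_new_weight(doc_words, p_old, p_background, iters=50):
    """EM for the mixing weight lambda in
       p(w) = lambda * p_background(w) + (1 - lambda) * p_old(w),
    with both component distributions fixed (a sketch; the slide reads the
    estimated lambda as p(New|d))."""
    lam = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each word came from the background
        z = []
        for w in doc_words:
            new_part = lam * p_background.get(w, 1e-12)
            old_part = (1 - lam) * p_old.get(w, 1e-12)
            z.append(new_part / (new_part + old_part))
        # M-step: re-estimate the mixing weight
        lam = sum(z) / len(z)
    return lam
```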

  17. Evaluation metrics • Intuitive goals: • Should see documents from many different subtopics appear early in a ranking (subtopic coverage/recall) • Should not see many different documents that cover the same subtopics (redundancy) • How do we quantify these? • One problem: the "intrinsic difficulty" of queries can vary.

  18. Evaluation metrics: a proposal • Definition: Subtopic recall at rank K is the fraction of subtopics a such that one of d1, …, dK is relevant to a. • Definition: minRank(S, r) is the smallest rank K such that the ranking produced by IR system S has subtopic recall r at rank K. • Definition: Subtopic precision at recall level r for IR system S is minRank(Sopt, r) / minRank(S, r), where Sopt is the optimal system. • This generalizes ordinary recall-precision metrics. It does not explicitly penalize redundancy.

  19. Evaluation metrics: rationale • [Plot: precision (0.0 to 1.0) against rank K, comparing minRank(S, r) with minRank(Sopt, r) as recall r varies] • For subtopics, the shape of the minRank(Sopt, r) curve is neither predictable nor linear, which is why performance is measured relative to the optimal system.

  20. Evaluating redundancy • Definition: the cost of a ranking d1, …, dK is cost(d1, …, dK) = b·K + a·Σi |subtopics(di)|, where b is the cost of seeing a document and a is the cost of seeing a subtopic inside a document (the earlier metric corresponds to a = 0). • Definition: minCost(S, r) is the minimal cost at which recall r is obtained. • Definition: weighted subtopic precision at r is minCost(Sopt, r) / minCost(S, r). • We will use a = b = 1.

  21. Evaluation Metrics Summary • Measure performance (size of ranking, minRank; cost of ranking, minCost) relative to optimal. • Generalizes ordinary precision/recall. • Possible problems: • Computing minRank, minCost is NP-hard! • A greedy approximation seems to work well for our data set (a set-cover sketch follows below)
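
A sketch of the kind of greedy approximation referred to above: repeatedly pick the document covering the most not-yet-covered subtopics (a set-cover heuristic) and report the rank at which recall r is reached. Names and data structures are illustrative, not the original implementation.

```python
def greedy_min_rank(doc_subtopics, n_subtopics, r):
    """Greedy approximation of minRank(Sopt, r).

    doc_subtopics: dict mapping doc id -> iterable of subtopic ids it covers.
    n_subtopics:   total number of subtopics for the query.
    r:             target subtopic recall level in [0, 1].
    """
    covered, rank = set(), 0
    remaining = {d: set(a) for d, a in doc_subtopics.items()}
    while remaining and len(covered) / n_subtopics < r:
        rank += 1
        # Pick the doc that adds the most uncovered subtopics
        best = max(remaining, key=lambda d: len(remaining[d] - covered))
        covered |= remaining.pop(best)
    return rank if len(covered) / n_subtopics >= r else None  # None: r unreachable
```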

  22. Experiment Design • Dataset: TREC "interactive track" data. • Financial Times (London): 210k docs, 500MB • 20 queries from TREC 6-8 • Subtopics: average 20, min 7, max 56 • Judged docs: average 40, min 5, max 100 • Non-judged docs assumed not relevant to any subtopic. • Baseline: relevance-based ranking (using language models) • Two experiments • Ranking only relevant documents • Ranking all documents

  23. S-Precision: re-ranking relevant docs

  24. WS-precision: re-ranking relevant docs

  25. Results for ranking all documents "Upper bound": use subtopic names to build an explicit subtopic model.

  26. Summary: Remove Redundancy • Mixture model is effective for identifying novelty in relevant documents • Trading off novelty and relevance is hard • Relevance seems to be dominating factor in TREC interactive-track data

  27. Diversity = Satisfy Diverse Info. Need [Zhai 02] • Reducing redundancy doesn't ensure complete coverage of diverse aspects • Need to directly model latent aspects and then optimize results based on aspect/topic matching

  28. Aspect Generative Model of Document & Query • As before, the user U generates the query q and the source S generates the document d, but the underlying models are now aspect (topic) mixtures with mixing weights λ = (λ1, …, λk) • The aspect models can be estimated with PLSI or LDA
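
For reference, the standard forms of the two aspect models named on the slide, with mixing weights λ = (λ1, …, λk) over k aspects (a reconstruction, not necessarily the slide's exact notation):

```latex
\begin{aligned}
\text{PLSI:}\quad & p(w \mid d) = \sum_{j=1}^{k} \lambda_{d,j}\, p(w \mid \theta_j),
  \qquad \lambda_{d,j} \text{ fitted as free parameters} \\
\text{LDA:}\quad  & \lambda_d \sim \mathrm{Dirichlet}(\alpha), \qquad
  p(w \mid d) = \int \Big(\sum_{j=1}^{k} \lambda_{d,j}\, p(w \mid \theta_j)\Big)\,
  p(\lambda_d \mid \alpha)\, d\lambda_d
\end{aligned}
```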

  29. Aspect Loss Function • [Diagram relating user U, query q, source S, and document d: the loss is defined over their aspect models]

  30. Aspect Loss Function: Illustration • Desired coverage: the query's aspect distribution p(a|θQ) • "Already covered": the aspect distributions p(a|θ1), …, p(a|θk−1) of the docs ranked so far • New candidate: p(a|θk), which is merged with the already-covered aspects to give the combined coverage • A perfect candidate moves the combined coverage toward the desired coverage; a redundant candidate repeats already-covered aspects; a non-relevant candidate covers aspects outside the desired coverage

  31. Evaluation Measures • Aspect Coverage (AC): measures per-doc coverage • #distinct-aspects/#docs • Equivalent to the "set cover" problem • Aspect Uniqueness (AU): measures redundancy • #distinct-aspects/#aspects • Equivalent to the "volume cover" problem • Example (aspect vectors: d1 = 0 0 0 1 0 0 1, d2 = 0 1 0 1 1 0 0, d3 = 1 0 0 0 1 0 1) • At ranks 1, 2, 3: #docs = 1, 2, 3; #aspects = 2, 5, 8; #unique aspects = 2, 4, 5 • AC: 2/1 = 2.0, 4/2 = 2.0, 5/3 = 1.67 • AU: 2/2 = 1.0, 4/5 = 0.8, 5/8 = 0.625
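
A small sketch that computes AC and AU at each rank and reproduces the numbers in the example above (the vectors d1, d2, d3 are read off the slide's aspect matrix):

```python
def aspect_coverage_uniqueness(ranking):
    """Yield (rank, AC, AU) for a ranked list of binary aspect vectors."""
    seen = set()
    total_aspects = 0
    for rank, doc in enumerate(ranking, start=1):
        aspects = {i for i, bit in enumerate(doc) if bit}
        total_aspects += len(aspects)
        seen |= aspects
        ac = len(seen) / rank            # #distinct-aspects / #docs
        au = len(seen) / total_aspects   # #distinct-aspects / #aspects
        yield rank, ac, au

# The slide's example:
d1 = [0, 0, 0, 1, 0, 0, 1]
d2 = [0, 1, 0, 1, 1, 0, 0]
d3 = [1, 0, 0, 0, 1, 0, 1]
for rank, ac, au in aspect_coverage_uniqueness([d1, d2, d3]):
    print(rank, round(ac, 2), round(au, 3))   # AC: 2.0, 2.0, 1.67; AU: 1.0, 0.8, 0.625
```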

  32. Effectiveness of Aspect Loss Function (PLSI)

  33. Effectiveness of Aspect Loss Function (LDA)

  34. Comparison of 4 MMR Methods • CC - Cost-based Combination • QB - Query Background Model • MQM - Query Marginal Model • MDM - Document Marginal Model

  35. Summary: Diverse Information Need • Mixture model is effective for capturing latent topics • Direct modeling of latent aspects/topics is more effective than indirect modeling through MMR in improving aspect coverage, but MMR is better for improving aspect uniqueness • With direct topic modeling and matching, aspect coverage can be improved at the price of lower relevance-based precision

  36. Diversify = Active Feedback [Shen & Zhai 05] • Decision problem: decide which subset of documents to present to the user for relevance judgment

  37. Independent Loss

  38. Independent Loss (cont.) • Special cases: Top K (present the K documents with the highest relevance) and Uncertainty Sampling (present the K documents whose relevance is most uncertain)

  39. Dependent Loss • Heuristics: consider relevance first, then diversity • Select the top N documents, then diversify them into K feedback documents via Gapped Top-K, K-Cluster Centroid (cluster the N docs into K clusters), or MMR

  40. Illustration of Three AF Methods • Given the initial ranking 1, 2, 3, 4, …, 16, …: • Top-K (normal feedback): take the top K documents • Gapped Top-K: skip a fixed gap between selected documents • K-cluster centroid: take one representative per cluster, aiming at high diversity
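
A rough sketch of how the two diversity-seeking selection strategies might be implemented on top of an initial ranking; the k-means call (scikit-learn) and the parameter choices are illustrative assumptions, not the setup of the original experiments.

```python
import numpy as np
from sklearn.cluster import KMeans

def gapped_top_k(ranked_ids, k, gap):
    """Gapped Top-K: take every (gap+1)-th document from the initial ranking."""
    return ranked_ids[::gap + 1][:k]

def k_cluster_centroid(ranked_ids, doc_vectors, n, k):
    """K-cluster centroid: cluster the top-N docs into K clusters and return
    the document nearest to each centroid (sketch; the original experiments
    need not have used k-means)."""
    top_ids = ranked_ids[:n]
    X = np.array([doc_vectors[d] for d in top_ids])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    dists = km.transform(X)        # distance of each doc to each centroid
    picks = dists.argmin(axis=0)   # index of the nearest doc per cluster
    return [top_ids[i] for i in picks]
```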

  41. Evaluating Active Feedback • Start from the query and its initial results (the no-feedback baseline) • Select K docs for judgment (Top-K, gapped, clustering) • Look up their relevance (+/−) in the judgment file to obtain the judged docs • Run feedback with the judged docs and compare the feedback results against the no-feedback baseline

  42. Retrieval Methods (Lemur toolkit) • Document D and query Q are scored with Kullback-Leibler divergence • Active feedback selects the feedback docs F = {d1, …, dn} • Mixture-model feedback updates the query model, learning only from the relevant docs • Default parameter settings are used unless otherwise stated
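
A minimal sketch of KL-divergence scoring in language-model retrieval: rank documents by the cross entropy of the query model with a smoothed document model, which is rank-equivalent to negative KL-divergence. The Dirichlet smoothing choice and the value of mu are assumptions, not taken from the slide.

```python
import math

def kl_divergence_score(query_model, doc_counts, collection_model, mu=2000.0):
    """Score a document against a query language model.

    query_model:      dict word -> p(w | theta_Q)
    doc_counts:       dict word -> count of w in the document
    collection_model: dict word -> p(w | collection), used for smoothing
    Returns sum_w p(w|theta_Q) * log p(w|theta_D), rank-equivalent to -KL.
    """
    doc_len = sum(doc_counts.values())
    score = 0.0
    for w, p_q in query_model.items():
        # Dirichlet-prior smoothed document model (assumed setting)
        p_d = (doc_counts.get(w, 0) + mu * collection_model.get(w, 1e-12)) / (doc_len + mu)
        score += p_q * math.log(p_d)
    return score
```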

  43. Comparison of Three AF Methods • [Results table not extracted; bold font = worst, * = best] • Top-K is the worst! • Clustering uses the fewest relevant docs

  44. Appropriate Evaluation of Active Feedback • Original DB with judged docs (AP88-89, HARD): can't tell if the ranking of un-judged documents is improved • Original DB without judged docs: different methods have different test documents • New DB (AP88-89, AP90): shows the learning effect more explicitly, but the docs must be similar to the original docs

  45. Comparison of Different Test Data Top-K is consistently the worst! Clustering generates fewer, but higher quality examples

  46. Summary: Active Feedback • Presenting the top-k is not the best strategy • Clustering can generate fewer, higher quality feedback examples

  47. Conclusions • There are many reasons for diversifying search results (redundancy, diverse information needs, active feedback) • Risk minimization framework can model all these cases of diversification • Different scenarios may need different techniques and different evaluation measures

  48. Thank You!
