Modeling Diversity in Information Retrieval • ChengXiang (“Cheng”) Zhai • Department of Computer Science • Graduate School of Library & Information Science • Institute for Genomic Biology • Department of Statistics • University of Illinois, Urbana-Champaign
Different Needs for Diversification
• Redundancy reduction
• Diverse information needs (e.g., overview, subtopic retrieval)
• Active relevance feedback
• …
Outline
• Risk minimization framework
• Capturing different needs for diversification
• Language models for diversification
IR as Sequential Decision Making
[Diagram: interaction loop between the User (information need) and the System (model of the information need)]
• A1: user enters a query → system decides which documents to present and how → Ri: results (i = 1, 2, 3, …) → user decides which documents to view
• A2: user views a document → system decides which part of the document to show and how → R': document content → user decides whether to view more
• A3: user clicks on the "Back" button → …
Retrieval Decisions
History H = {(Ai, Ri)}, i = 1, …, t-1 (e.g., query = "Jaguar", click on "Next" button)
User U:  A1  A2  …  At-1  At
System:  R1  R2  …  Rt-1  Rt = ?
Given U, C (the document collection), At, and H, choose the best Rt from r(At), the set of all possible responses to At
• If At is a query: r(At) = all possible rankings of C, and the best Rt is the best ranking for the query
• If At is a click on "Next": r(At) = all possible size-k subsets of unseen docs, and the best Rt is the best k unseen docs
A Risk Minimization Framework
• Observed: user U, interaction history H, current user action At, document collection C
• All possible responses: r(At) = {r1, …, rn}
• Inferred user model: M = (S, θU, …), where S = seen docs and θU = information need
• Loss function: L(ri, At, M)
• Optimal response: r* = the response with minimum Bayes risk (expected loss)
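A sketch of the Bayes-risk criterion, using the quantities listed above (the integral form is an assumed standard formulation, not copied from the slide):

```latex
% Optimal response: minimize expected loss over the posterior of the user model M
r^{*} \;=\; \arg\min_{r \,\in\, r(A_t)} \int_{M} L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM
```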
A Simplified Two-Step Decision-Making Procedure • Approximate the Bayes risk by the loss at the mode of the posterior distribution • Two-step procedure • Step 1: Compute an updated user model M* based on the currently available information • Step 2: Given M*, choose a response to minimize the loss function
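The two-step approximation can be written as follows (a sketch; the exact notation in the original derivation may differ):

```latex
% Step 1: point estimate of the user model (mode of the posterior)
M^{*} \;=\; \arg\max_{M} P(M \mid U, H, A_t, C)
% Step 2: plug the point estimate into the loss and minimize over responses
r^{*} \;\approx\; \arg\min_{r \,\in\, r(A_t)} L(r, A_t, M^{*})
```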
Optimal Interactive Retrieval
[Diagram: user U issues actions A1, A2, A3, …; for each At, the IR system (with collection C) infers M*t via P(Mt | U, H, At, C) and returns the response Rt that minimizes L(r, At, M*t)]
Refinement of Risk Minimization
• At ∈ {query, clickthrough, feedback, …}
• r(At): decision space (At dependent)
  • r(At) = all possible subsets of C + presentation strategies
  • r(At) = all possible rankings of docs in C
  • r(At) = all possible rankings of unseen docs
  • …
• M: user model
  • Essential component: θU = user information need
  • S = seen documents
  • n = "topic is new to the user"
• L(Rt, At, M): loss function
  • Generally measures the utility of Rt for a user modeled as M
  • Often encodes retrieval criteria (e.g., using M to select a ranking of docs)
• P(M | U, H, At, C): user model inference
  • Often involves estimating a unigram language model θU
Generative Model of Document & Query [Lafferty & Zhai 01]
[Diagram: user U generates query q (q observed, the user's model partially observed); source S generates document d (d observed, the source's model partially observed); the relevance variable R is inferred]
Risk Minimization with Language Models [Lafferty & Zhai 01, Zhai & Lafferty 06]
[Diagram: observed query q (from user U) and document set C (from source S); hidden query and document language models; candidate choices (D1, π1), (D2, π2), …, (Dn, πn), each with an associated loss L; risk minimization picks the choice (D, π) with minimum Bayes risk]
Optimal Ranking for Independent Loss
• Decision space = {rankings}
• Assumptions: sequential browsing + independent loss
• Independent loss ⇒ independent risk ⇒ independent scoring
• "Risk ranking principle" [Zhai 02, Zhai & Lafferty 06]: rank documents by their individual risk (expected loss)
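One way to see why independent loss implies independent scoring (a sketch assuming an additive per-document loss l and decreasing position discounts s_i; not the exact derivation on the slide):

```latex
% The risk of a ranking \pi decomposes into a weighted sum of per-document risks:
R(\pi) \;=\; \sum_{i} s_i \int l\big(d_{\pi(i)}, \theta\big)\, p(\theta \mid q, U)\, d\theta ,
\qquad s_1 \ge s_2 \ge \dots
% With decreasing discounts, R(\pi) is minimized by presenting documents in
% ascending order of the individual risk  r(d) = \int l(d,\theta)\, p(\theta \mid q, U)\, d\theta .
```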
Risk Minimization for Diversification • Redundancy reduction: Loss function includes a redundancy measure • Special case: list presentation + MMR [Zhai et al. 03] • Diverse information needs: loss function defined on latent topics • Special case: PLSA/LDA + topic retrieval [Zhai 02] • Active relevance feedback: loss function considers both relevance and benefit for feedback • Special case: hard queries + feedback only [Shen & Zhai 05]
Subtopic Retrieval
Query: What are the applications of robotics in the world today?
Find as many DIFFERENT applications as possible.
Example subtopics:
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot [telephone] operators
A7: robot cranes
…
Subtopic judgments (documents × subtopics):
      A1 A2 A3 A4 …  Ak-1 Ak
d1     1  1  0  0 …   0    0
d2     0  1  1  1 …   0    0
d3     0  0  0  0 …   1    0
…
dk     1  0  1  0 …   0    1
This is a non-traditional retrieval task …
Diversify = Remove Redundancy
• Greedy algorithm for ranking: Maximal Marginal Relevance (MMR)
• Cost parameters encode the "willingness to tolerate redundancy": C2 < C3, since a redundant relevant doc is better than a non-relevant doc
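A minimal sketch of the greedy MMR-style re-ranking idea. This uses the standard MMR interpolation with a tunable weight, not necessarily the exact cost-based combination (C2/C3) used in [Zhai et al. 03]; the `relevance` scores and `similarity` function are assumed to come from the retrieval model.

```python
def mmr_rerank(docs, relevance, similarity, lam=0.7, k=10):
    """Greedy MMR: balance relevance against redundancy with already-selected docs.

    docs: list of doc ids
    relevance: dict doc_id -> relevance score (e.g., query likelihood)
    similarity: function (doc_id, doc_id) -> similarity in [0, 1]
    lam: trade-off; lam = 1 is pure relevance ranking, smaller lam favors novelty
    """
    selected = []
    candidates = set(docs)
    while candidates and len(selected) < k:
        def mmr_score(d):
            # redundancy = similarity to the most similar doc already shown
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```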
A Mixture Model for Redundancy
• A candidate document is modeled as a two-component mixture of a reference-document model P(w|Old) and a collection background model P(w|Background), with mixing weight λ
• p(New|d) = probability of "new" content, derived from the mixing weight λ (estimated using EM)
• p(New|d) can also be estimated using KL-divergence
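A sketch of the EM estimate of the mixing weight. It assumes (one reading of the slide) that p(New|d) is taken to be the estimated weight of the component not explained by the reference document; the component models p(w|Old) and p(w|Background) are assumed given and fixed.

```python
def estimate_p_new(doc_counts, p_old, p_bg, iters=50, lam=0.5, eps=1e-10):
    """EM for lambda in  p(w|d) = lam * p_bg(w) + (1 - lam) * p_old(w).

    doc_counts: dict word -> count in the candidate document
    p_old: dict word -> prob under the reference ("old") document model
    p_bg:  dict word -> prob under the collection background model
    Returns the estimated lam, interpreted (per the slide) as p(New|d).
    """
    total = sum(doc_counts.values())
    for _ in range(iters):
        # E-step: posterior that each word occurrence came from the non-reference side
        expected_new = 0.0
        for w, c in doc_counts.items():
            p_new_w = lam * p_bg.get(w, eps)
            p_old_w = (1 - lam) * p_old.get(w, eps)
            expected_new += c * p_new_w / (p_new_w + p_old_w)
        # M-step: re-estimate the mixing weight
        lam = expected_new / total
    return lam
```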
Evaluation metrics
• Intuitive goals:
  • Should see documents from many different subtopics appear early in a ranking (subtopic coverage/recall)
  • Should not see many different documents that cover the same subtopics (redundancy)
• How do we quantify these?
• One problem: the "intrinsic difficulty" of queries can vary.
Evaluation metrics: a proposal
• Definition: Subtopic recall at rank K is the fraction of subtopics a such that one of d1, …, dK is relevant to a.
• Definition: minRank(S, r) is the smallest rank K such that the ranking produced by IR system S has subtopic recall r at rank K.
• Definition: Subtopic precision at recall level r for IR system S is minRank(Sopt, r) / minRank(S, r), where Sopt is an optimal system.
This generalizes ordinary recall-precision metrics. It does not explicitly penalize redundancy.
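A sketch of these definitions in code (hypothetical helper names; `subtopics_of[d]` is assumed to give the set of subtopics document d is relevant to):

```python
def subtopic_recall(ranking, subtopics_of, n_subtopics, k):
    """Fraction of the query's subtopics covered by the top-k documents."""
    covered = set()
    for d in ranking[:k]:
        covered |= subtopics_of.get(d, set())
    return len(covered) / n_subtopics

def min_rank(ranking, subtopics_of, n_subtopics, r):
    """Smallest K at which the ranking reaches subtopic recall r (None if never reached)."""
    for k in range(1, len(ranking) + 1):
        if subtopic_recall(ranking, subtopics_of, n_subtopics, k) >= r:
            return k
    return None
```

Subtopic precision at r then divides the optimal system's minRank by this value; the optimal value itself has to be approximated, since computing it exactly is NP-hard (as noted later in the summary).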
Evaluation metrics: rationale
[Plot: precision vs. recall, relating rank K, minRank(S, r), and minRank(Sopt, r)]
For subtopics, the shape of the minRank(Sopt, r) curve is neither predictable nor linear.
Evaluating redundancy
• Definition: the cost of a ranking d1, …, dK is cost(d1, …, dK) = Σi (b + a · |subtopics(di)|), where b is the cost of seeing a document and a is the cost of seeing a subtopic inside a document (the earlier minRank measure corresponds to a = 0).
• Definition: minCost(S, r) is the minimal cost at which recall r is obtained.
• Definition: weighted subtopic precision at r is minCost(Sopt, r) / minCost(S, r).
• The experiments use a = b = 1.
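A corresponding sketch for the weighted measure, under the same assumed data structures as the recall sketch above:

```python
def ranking_cost(ranking, subtopics_of, k, a=1.0, b=1.0):
    """Cost of reading the top-k docs: b per document plus a per subtopic occurrence."""
    return sum(b + a * len(subtopics_of.get(d, set())) for d in ranking[:k])

def min_cost(ranking, subtopics_of, n_subtopics, r, a=1.0, b=1.0):
    """Minimal cost at which the ranking reaches subtopic recall r (None if never reached)."""
    covered = set()
    for k, d in enumerate(ranking, start=1):
        covered |= subtopics_of.get(d, set())
        if len(covered) / n_subtopics >= r:
            return ranking_cost(ranking, subtopics_of, k, a, b)
    return None
```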
Evaluation Metrics Summary
• Measure performance (size of ranking, minRank; cost of ranking, minCost) relative to optimal.
• Generalizes ordinary precision/recall.
• Possible problems:
  • Computing minRank, minCost is NP-hard!
  • A greedy approximation seems to work well for our data set
Experiment Design
• Dataset: TREC "interactive track" data
  • Financial Times (London): 210k docs, 500 MB
  • 20 queries from TREC 6-8
  • Subtopics: average 20, min 7, max 56
  • Judged docs: average 40, min 5, max 100
• Non-judged docs assumed not relevant to any subtopic
• Baseline: relevance-based ranking (using language models)
• Two experiments
  • Ranking only relevant documents
  • Ranking all documents
Results for ranking all documents
"Upper bound": use subtopic names to build an explicit subtopic model.
Summary: Remove Redundancy
• The mixture model is effective for identifying novelty in relevant documents
• Trading off novelty and relevance is hard
• Relevance seems to be the dominant factor in the TREC interactive-track data
Diversity = Satisfy Diverse Info. Need [Zhai 02]
• Reducing redundancy doesn't ensure complete coverage of diverse aspects
• Need to directly model latent aspects and then optimize results based on aspect/topic matching
Aspect Generative Model of Document & Query
[Diagram: user U generates query q; source S generates document d; both are modeled over a set of latent aspects θ = (θ1, …, θk), estimated with PLSI or LDA]
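For reference, the standard aspect formulations the slide points to (a sketch; the slide's own notation may differ):

```latex
% PLSI: each document mixes k aspect language models with document-specific weights
p(w \mid d) \;=\; \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)
% LDA: the same mixture, with a Dirichlet prior on the mixing weights
\pi_d \sim \mathrm{Dirichlet}(\alpha), \qquad
p(w \mid d) \;=\; \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)
```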
Aspect Loss Function
[Formulas omitted: the loss is defined over the aspect models inferred for the query (U, q) and the documents (S, d)]
Aspect Loss Function: Illustration
[Figure: the desired aspect coverage p(a|Q) is compared with what is "already covered" by the first k-1 docs, p(a|θ1), …, p(a|θk-1), and with the coverage of a new candidate, p(a|θk); the combined coverage determines the loss, with extreme cases "perfect", "redundant", and "non-relevant" candidates]
Evaluation Measures
• Aspect Coverage (AC): measures per-doc coverage
  • #distinct-aspects / #docs
  • Equivalent to the "set cover" problem
• Aspect Uniqueness (AU): measures redundancy
  • #distinct-aspects / #aspects
  • Equivalent to the "volume cover" problem
• Example (binary aspect judgments):
        A1 A2 A3 A4 A5 A6 A7
  d1     0  0  0  1  0  0  1
  d2     0  1  0  1  1  0  0
  d3     1  0  0  0  1  0  1
  #doc:       1     2     3
  #asp:       2     5     8
  #uniq-asp:  2     4     5
  AC:  2/1 = 2.0   4/2 = 2.0   5/3 = 1.67
  AU:  2/2 = 1.0   4/5 = 0.8   5/8 = 0.625
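A short sketch of these two measures in code, using the aspect sets from the example matrix above (the function and variable names are assumptions for illustration):

```python
def aspect_coverage_uniqueness(ranked_aspect_sets):
    """For each prefix of the ranking, compute
    AC = #distinct aspects / #docs  and  AU = #distinct aspects / #aspect occurrences."""
    results = []
    seen, occurrences = set(), 0
    for i, aspects in enumerate(ranked_aspect_sets, start=1):
        seen |= aspects
        occurrences += len(aspects)
        results.append((i, len(seen) / i, len(seen) / occurrences))
    return results

# Aspect sets read off the example matrix above (d1, d2, d3)
example = [{"A4", "A7"}, {"A2", "A4", "A5"}, {"A1", "A5", "A7"}]
for rank, ac, au in aspect_coverage_uniqueness(example):
    print(rank, round(ac, 2), round(au, 3))   # AC: 2.0, 2.0, 1.67; AU: 1.0, 0.8, 0.625
```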
Comparison of 4 MMR Methods
• CC: Cost-based Combination
• QB: Query Background Model
• MQM: Query Marginal Model
• MDM: Document Marginal Model
Summary: Diverse Information Need • Mixture model is effective for capturing latent topics • Direct modeling of latent aspects/topics is more effective than indirect modeling through MMR in improving aspect coverage, but MMR is better for improving aspect uniqueness • With direct topic modeling and matching, aspect coverage can be improved at the price of lower relevance-based precision
Diversify = Active Feedback [Shen & Zhai 05]
Decision problem: decide which subset of documents to present to the user for relevance judgment
Independent Loss
Independent Loss (cont.)
• Special cases: Top K and Uncertainty Sampling
Dependent Loss
• Heuristics: consider relevance first, then diversity
• Select the top N documents, then diversify:
  • MMR → Gapped Top K
  • Cluster the N docs into K clusters → K Cluster Centroid
Illustration of Three AF Methods
[Figure: given a ranked list 1, 2, 3, …, 16, …, Top-K (normal feedback) selects the top-ranked docs; Gapped Top-K selects docs separated by a fixed gap starting from the top; K-cluster centroid selects one representative doc per cluster. Aiming at high diversity …]
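A sketch of the two diversity-seeking selection heuristics (hypothetical helper names; the K-cluster variant uses scikit-learn's KMeans on assumed document vectors, which is only one way to realize the idea in [Shen & Zhai 05]):

```python
import numpy as np
from sklearn.cluster import KMeans

def gapped_top_k(ranked_doc_ids, k, gap=2):
    """Pick every gap-th document from the top of the ranking: ranks 1, 1+gap, 1+2*gap, ..."""
    return ranked_doc_ids[::gap][:k]

def k_cluster_centroids(ranked_doc_ids, doc_vectors, n_top, k, seed=0):
    """Cluster the top-N docs into k clusters and return the doc closest to each centroid."""
    top = ranked_doc_ids[:n_top]
    X = np.array([doc_vectors[d] for d in top])
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(km.labels_) if lab == c]
        dists = [np.linalg.norm(X[i] - km.cluster_centers_[c]) for i in members]
        picks.append(top[members[int(np.argmin(dists))]])
    return picks
```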
Evaluating Active Feedback
[Diagram: a query produces initial results; an active feedback method (Top-K, gapped, clustering) selects K docs; their +/- labels are taken from the judgment file; the judged docs are used for feedback, and the feedback results are compared against the no-feedback baseline]
Retrieval Methods (Lemur toolkit)
• Query Q and document D are scored with Kullback-Leibler divergence
• Active feedback selects the feedback docs F = {d1, …, dn}
• Mixture model feedback; only learn from relevant docs
• Default parameter settings unless otherwise stated
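The KL-divergence scoring referred to here is the standard language-modeling ranking function (shown as a sketch; θQ is the estimated query/feedback model and θD the smoothed document model):

```latex
% Rank documents by negative KL divergence between query and document models;
% dropping the query-model entropy term leaves a cross-entropy ranking formula.
\mathrm{score}(Q, D) \;=\; -\,D\!\left(\theta_Q \,\|\, \theta_D\right)
\;\;\stackrel{\mathrm{rank}}{=}\;\; \sum_{w} p(w \mid \theta_Q)\, \log p(w \mid \theta_D)
```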
Comparison of Three AF Methods
[Results table omitted; bold font = worst, * = best]
• Top-K is the worst!
• Clustering uses the fewest relevant docs
Appropriate Evaluation of Active Feedback
• Original DB with judged docs (AP88-89, HARD): can't tell if the ranking of un-judged documents is improved
• Original DB without judged docs: different methods have different test documents
• New DB (AP88-89, AP90): see the learning effect more explicitly, but the new docs must be similar to the original docs
Comparison of Different Test Data
• Top-K is consistently the worst!
• Clustering generates fewer, but higher-quality examples
Summary: Active Feedback • Presenting the top-k is not the best strategy • Clustering can generate fewer, higher quality feedback examples
Conclusions • There are many reasons for diversifying search results (redundancy, diverse information needs, active feedback) • Risk minimization framework can model all these cases of diversification • Different scenarios may need different techniques and different evaluation measures