Model-based Feedback in the Language Modeling Approach to Information Retrieval
Chengxiang Zhai and John Lafferty
School of Computer Science, Carnegie Mellon University
Outline
• The Language Modeling Approach to IR
• Feedback: Expansion-based vs. Model-based
• Two Model-based Feedback Algorithms
• Evaluation
• Conclusions & Future Work
Text Retrieval (TR)
• Given a query, find relevant documents in a document collection (i.e., rank the documents)
• Many applications (Web pages, news, email, …)
• Many models developed (vector space, probabilistic)
• The "language modeling approach" is a new model that is promising …
Retrieval as Language Model Estimation
• Document ranking based on query likelihood (Ponte & Croft 98, Miller et al. 99, Berger & Lafferty 99, Hiemstra 2000, etc.):
  $\log p(q|d) = \sum_{i=1}^{m} \log p(q_i|d)$, where $q = q_1 q_2 \ldots q_m$ and $p(\cdot|d)$ is the document language model
• Retrieval problem ≈ estimation of $p(w_i|d)$
• Many advantages: a good statistical foundation, reuse of existing LM methods, ...
• But feedback is awkward …
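The query-likelihood ranking can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the document model p(w|d) is smoothed with a Dirichlet prior (the slide does not specify a smoothing method), and the names `collection_model`, `query_likelihood`, and the parameter `mu` are hypothetical.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood collection model p(w|C) over a list of tokenized docs."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def query_likelihood(query, doc, p_coll, mu=2000.0):
    """Return log p(q|d) = sum_i log p(q_i|d) for a tokenized query and document.

    Assumption (not from the slides): p(w|d) is Dirichlet-smoothed,
        p(w|d) = (c(w;d) + mu * p(w|C)) / (|d| + mu),
    so every query word must occur somewhere in the collection (p(w|C) > 0).
    """
    counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        p_wd = (counts[w] + mu * p_coll[w]) / (doc_len + mu)
        score += math.log(p_wd)
    return score


# Usage: rank two toy documents for the query "airport security".
docs = {
    "d1": "airport security check airport".split(),
    "d2": "security council vote".split(),
}
p_coll = collection_model(list(docs.values()))
query = ["airport", "security"]
scores = {name: query_likelihood(query, d, p_coll, mu=10.0) for name, d in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

The small `mu=10.0` in the usage is only there so the toy documents' differences are visible; with realistic collections a much larger prior is typical, and the value is a tuning choice rather than something taken from the slides.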
Feedback in Text Retrieval
• Learning from examples
• In effect, new, related terms are extracted to enhance the original query
• Generally leads to a performance increase (in both average precision and recall)
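Relevance Feedback
[Figure: the query is sent to the retrieval engine, which returns ranked results from the document collection (d1 3.5, d2 2.4, …, dk 0.5). The user judges them (d1 +, d2 -, d3 +, …, dk -), and the judgments are fed back to produce an updated query.]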
Pseudo/Blind/Automatic Feedback
[Figure: as in relevance feedback, but without a user; the top 10 results are simply assumed relevant (d1 +, d2 +, d3 +, …, dk -), and these judgments are fed back to produce an updated query.]
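The only difference from relevance feedback is where the judgments come from: pseudo feedback takes the top-ranked documents as the feedback set F. A minimal sketch, assuming results are given as a dict from document id to score; the function name `pseudo_feedback_set` is hypothetical, and the default `k=10` simply mirrors the "top 10" on the slide.

```python
def pseudo_feedback_set(scores, k=10):
    """Blind feedback: treat the k highest-scoring documents as relevant.

    `scores` maps document ids to retrieval scores; no user judgments are used.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Usage with the scores shown on the slide (d1 3.5, d2 2.4, dk 0.5) and k = 2.
feedback = pseudo_feedback_set({"d1": 3.5, "d2": 2.4, "dk": 0.5}, k=2)
```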
Feedback in the Language Modeling Approach
• Mostly expansion-based: adding new terms to the query (Ponte 1998, Miller et al. 1999, Ng 1999)
• Query term reweighting, no expansion (Hiemstra 2001)
• Implicit feedback (Berger & Lafferty 99)
• Conceptual inconsistency in expansion-based approaches:
  • Original query: treated as text
  • Expanded query: treated as text + {terms}
Question: How can we exploit language modeling to perform natural and effective feedback?
Answer: Introduce a query model & treat feedback as query model updating
• Retrieval function: Query likelihood => KL-divergence
• Feedback: Expansion-based => Model-based
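A KL-Divergence Unigram Retrieval Model
• A special case of the general risk minimization retrieval framework (Lafferty & Zhai 2001)
• Retrieval formula:
  $-D(\theta_Q \| \theta_D) = \sum_{w} p(w|\theta_Q) \log p(w|\theta_D) + H(\theta_Q)$, where $H(\theta_Q) = -\sum_{w} p(w|\theta_Q) \log p(w|\theta_Q)$ is the query entropy (ignored for ranking)
• Retrieval ≈ estimation of $\theta_Q$ and $\theta_D$
• Special case: $\theta_Q$ = empirical distribution of q recovers "query likelihood", since $\sum_{w} \frac{c(w;q)}{|q|} \log p(w|\theta_D) = \frac{1}{|q|} \log p(q|\theta_D)$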
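The scoring function and its special case can be checked numerically. A minimal sketch, assuming models are plain dicts from word to probability and that the document model is already smoothed (nonzero for every query word); the function names are hypothetical. It computes the full −D(θ_Q‖θ_D), the rank-equivalent score that drops the query entropy, and confirms that with the empirical query model, |q| times the score equals log p(q|θ_D).

```python
import math
from collections import Counter


def empirical_query_model(query):
    """theta_Q as the empirical distribution of the tokenized query: c(w;q)/|q|."""
    counts = Counter(query)
    return {w: c / len(query) for w, c in counts.items()}


def neg_kl(query_model, doc_model):
    """-D(theta_Q || theta_D) = sum_w p(w|Q) log p(w|D) + H(theta_Q).

    doc_model must be smoothed: p(w|D) > 0 for every w with p(w|Q) > 0.
    """
    cross = sum(p * math.log(doc_model[w]) for w, p in query_model.items() if p > 0)
    entropy = -sum(p * math.log(p) for p in query_model.values() if p > 0)
    return cross + entropy


def kl_rank_score(query_model, doc_model):
    """Rank-equivalent score: drops H(theta_Q), which is the same for every document."""
    return sum(p * math.log(doc_model[w]) for w, p in query_model.items() if p > 0)


# Usage: a repeated query word and a toy (already smoothed) document model.
query = ["airport", "security", "airport"]
theta_q = empirical_query_model(query)
theta_d = {"airport": 0.2, "security": 0.1, "council": 0.7}
# len(query) * kl_rank_score(theta_q, theta_d) equals 2*log(0.2) + log(0.1) = log p(q|d).
```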
Expansion-based vs. Model-based
[Figure. Expansion-based feedback: the feedback docs modify the query Q itself, and each document D is scored against its doc model by query likelihood. Model-based feedback: the feedback docs modify the query model estimated from Q, and each document D is scored by the KL-divergence between the query model and its doc model.]
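Feedback as Model Interpolation
• Query model $\theta_Q$: ML estimate from query Q; document model $\theta_D$: ML estimate from document D, plus smoothing
• Feedback model $\theta_F$: estimated from the feedback docs F = {d1, d2, …, dn}, either with a generative model or by divergence minimization
• Updated query model: $\theta_{Q'} = (1-\alpha)\,\theta_Q + \alpha\,\theta_F$; results are ranked by $-D(\theta_{Q'} \| \theta_D)$
• α = 0: no feedback; α = 1: full feedback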
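The interpolation step is a word-by-word mixture of the two distributions. A minimal sketch, assuming both models are dicts from word to probability; `interpolate` is a hypothetical name and the toy feedback model below is made up for illustration, not taken from the experiments.

```python
def interpolate(theta_q, theta_f, alpha):
    """Updated query model theta_Q' = (1 - alpha) * theta_Q + alpha * theta_F.

    alpha = 0 keeps the original query model (no feedback);
    alpha = 1 uses the feedback model only (full feedback).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    vocab = set(theta_q) | set(theta_f)
    return {
        w: (1.0 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
        for w in vocab
    }


theta_q = {"airport": 0.5, "security": 0.5}
theta_f = {"airport": 0.4, "screening": 0.6}  # hypothetical feedback model
theta_q_new = interpolate(theta_q, theta_f, alpha=0.5)
# airport: 0.45, security: 0.25, screening: 0.30
```

Because both inputs sum to one, the result does too for any α in [0, 1], so the updated model can be plugged directly into the KL-divergence scoring.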
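Estimation Method I: Generative Mixture Model
• Each word w in F = {d1, …, dn} is generated by first choosing a source: the background model p(w|C) with probability λ, or the topic model p(w|θ_F) with probability 1 − λ
• Maximum likelihood: $\theta_F = \arg\max_{\theta} \sum_{i=1}^{n} \sum_{w} c(w; d_i) \log\big[(1-\lambda)\, p(w|\theta) + \lambda\, p(w|C)\big]$, where $c(w; d_i)$ is the count of w in $d_i$
• Use EM to find $\theta_F$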
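A standard EM procedure for this two-component mixture can be sketched as follows, with λ held fixed and only θ_F re-estimated. This is a minimal illustration assuming tokenized feedback documents and a collection model that is nonzero for every feedback word; the function name, the fixed iteration count, and the ML initialization are choices made for the sketch, not details from the slides.

```python
from collections import Counter


def mixture_feedback_model(feedback_docs, p_coll, lam, iters=50):
    """Estimate theta_F by EM for the two-component mixture
        p(w) = (1 - lam) * p(w|theta_F) + lam * p(w|C),
    with the background weight lam fixed (0 <= lam < 1).
    p_coll must give p(w|C) > 0 for every word in the feedback docs.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    counts = Counter(w for doc in feedback_docs for w in doc)
    total = sum(counts.values())
    # Start from the plain ML estimate of the pooled feedback documents.
    theta = {w: c / total for w, c in counts.items()}
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the topic model.
        post = {}
        for w in counts:
            topic = (1.0 - lam) * theta[w]
            post[w] = topic / (topic + lam * p_coll[w])
        # M-step: renormalize the counts attributed to the topic model.
        weighted = {w: counts[w] * post[w] for w in counts}
        norm = sum(weighted.values())
        theta = {w: v / norm for w, v in weighted.items()}
    return theta


# Usage: "the" is the most frequent word in F, but also very common in C.
docs = [["the", "the", "airport", "security"], ["the", "the", "airport", "screening"]]
p_coll = {"the": 0.5, "airport": 0.01, "security": 0.01, "screening": 0.01}
theta_f = mixture_feedback_model(docs, p_coll, lam=0.5)
```

The background component absorbs much of the probability of common words, so in the toy example "airport" ends up with more mass in θ_F than "the", even though "the" is twice as frequent in the feedback documents. With λ = 0 the procedure returns the plain ML estimate unchanged.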
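Estimation Method II: Empirical Divergence Minimization
• $\theta_F$ should be close to the feedback document models $\theta_{d_1}, \ldots, \theta_{d_n}$ and far (weighted by λ) from the collection model $\theta_C$, i.e., $p(\cdot|C)$
• Empirical divergence: $D_e(\theta; F, C) = \frac{1}{|F|} \sum_{i=1}^{n} D(\theta \| \theta_{d_i}) - \lambda\, D(\theta \| \theta_C)$
• Divergence minimization: $\theta_F = \arg\min_{\theta} D_e(\theta; F, C)$
• Given F, C, and λ (0 ≤ λ < 1), the solution is $p(w|\theta_F) \propto \exp\Big(\frac{1}{1-\lambda} \cdot \frac{1}{|F|} \sum_{i=1}^{n} \log p(w|\theta_{d_i}) - \frac{\lambda}{1-\lambda} \log p(w|C)\Big)$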
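Unlike the mixture model, this estimate needs no iteration: the closed-form solution can be evaluated directly. A minimal sketch, assuming smoothed feedback document models (nonzero over the whole vocabulary of the collection model) given as dicts; the function name and the toy models are hypothetical. Normalization is done in log space to keep the exponentials well-behaved.

```python
import math


def divmin_feedback_model(doc_models, p_coll, lam):
    """Closed-form theta_F minimizing
        (1/n) * sum_i D(theta || theta_di) - lam * D(theta || theta_C),  0 <= lam < 1:
        p(w|theta_F) ∝ exp( (1/(1-lam)) * (1/n) * sum_i log p(w|theta_di)
                            - (lam/(1-lam)) * log p(w|C) ).
    Every document model must be smoothed over the vocabulary of p_coll.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    n = len(doc_models)
    log_score = {}
    for w, pc in p_coll.items():
        avg_log = sum(math.log(m[w]) for m in doc_models) / n
        log_score[w] = (avg_log - lam * math.log(pc)) / (1.0 - lam)
    # Normalize in log space (subtract the max) to avoid overflow/underflow.
    top = max(log_score.values())
    unnorm = {w: math.exp(s - top) for w, s in log_score.items()}
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}


p_coll = {"the": 0.8, "airport": 0.1, "security": 0.1}
doc_models = [
    {"the": 0.5, "airport": 0.3, "security": 0.2},
    {"the": 0.5, "airport": 0.2, "security": 0.3},
]
theta_f = divmin_feedback_model(doc_models, p_coll, lam=0.5)
```

With λ = 0 the result is just the normalized geometric mean of the document models, which favors "the" here; raising λ pushes θ_F away from the collection model, so at λ = 0.5 the topical words overtake it.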
Example of Feedback Query Model
TREC topic 412: "airport security"; mixture model approach; Web database; top 10 docs
[Table: feedback query model estimated with λ = 0.9 and with λ = 0.7]
Sensitivity of Precision to α
[Figure: precision as α varies from 0 (original query model only) to 1 (feedback model only). The mixture model is more sensitive to α; divergence minimization is less sensitive.]
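Sensitivity of Precision to λ (Mixture Model & Divergence Min., α = 0.5)
[Figure: precision as λ varies, with the no-feedback baseline for comparison. The mixture model is less sensitive to λ; divergence minimization is more sensitive. As λ grows, more common words are "ignored"; over-discrimination can be harmful.]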
The Lemur Toolkit
• Language Modeling and Information Retrieval Toolkit
• Under development at CMU and UMass
• All experiments reported here were run using Lemur
• http://www.cs.cmu.edu/~lemur
• Contact us if you are interested in using it
Conclusions
• Model-based feedback is natural and effective
• Performance is sensitive to both α and λ
• Mixture model: more sensitive to α, but less to λ (0.5)
• Divergence min.: more sensitive to λ, but less to α (0.3)
• The sensitivity suggests that more robust models are needed, e.g., using the query to focus the model:
  • Markov chain query model (Lafferty & Zhai, 2001)
  • Relevance language model (Lavrenko & Croft, 2001)
Future Work
• Evaluating methods for relevance feedback
  • Examples in pseudo feedback can be quite noisy
  • Relevance feedback better reflects "learning ability"
• More robust feedback models, e.g.,
  • Query-focused feedback (e.g., query translation model)
  • Passage-based feedback (e.g., hidden Markov model)