Risk Minimization and Language Modeling in Text Retrieval
ChengXiang Zhai
Thesis Committee: John Lafferty (Chair), Jamie Callan, Jaime Carbonell, David A. Evans, W. Bruce Croft (Univ. of Massachusetts, Amherst)
Information Overload
[figure: Web site growth]
Text Retrieval (TR)
[diagram: a User issues a query (e.g., “Tips on thesis defense”) to a Retrieval System over a database/collection of text docs; the system returns relevant docs]
Challenges in TR
• Ad hoc parameter tuning
• Relevance (independent, topical) vs. utility
Sophisticated Parameter Tuning in the Okapi System
“k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).” (Robertson et al. 1999)
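The quoted Okapi weighting can be made concrete. Below is a minimal sketch of the BM25 scoring function using the k1, b, and k3 parameters from the quote; the function names and the bag-of-words document/query representation are illustrative, not from the slides:

```python
import math

def bm25_term_weight(tf, qtf, df, N, dl, avdl, k1=1.2, b=0.75, k3=7.0):
    """One term's contribution to the Okapi BM25 score of a document.

    tf: term frequency in the document; qtf: term frequency in the query;
    df: document frequency of the term; N: number of documents;
    dl/avdl: document length and average document length.
    k1, b, k3 are the tuning parameters from the quote (slide defaults).
    """
    idf = math.log((N - df + 0.5) / (df + 0.5))   # Robertson-Sparck Jones IDF
    K = k1 * ((1 - b) + b * dl / avdl)            # length-normalized k1
    doc_part = tf * (k1 + 1) / (K + tf)           # document-side saturation
    query_part = qtf * (k3 + 1) / (k3 + qtf)      # query-side saturation
    return idf * doc_part * query_part

def bm25_score(query, doc, df, N, avdl, **params):
    """Sum term weights over the query's distinct terms (lists of tokens)."""
    dl = len(doc)
    score = 0.0
    for t in set(query):
        tf = doc.count(t)
        if tf > 0:
            score += bm25_term_weight(tf, query.count(t), df[t], N, dl, avdl, **params)
    return score
```

This makes the tuning problem visible: three free parameters whose good values depend on query style and collection, which is exactly the ad hoc tuning the thesis aims to replace with automatic estimation.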
More Than “Relevance”
[diagram: a plain relevance ranking vs. the desired ranking, which also accounts for redundancy and readability]
Meeting the Challenges
• Bayesian decision theory → utility-based retrieval
• Statistical language models → parameter estimation
• Together, these yield the Risk Minimization Framework
Map of Thesis
New TR Framework: Risk Minimization Framework
New TR Models (features):
• Two-stage Language Model (automatic parameter setting)
• KL-divergence Retrieval Model (natural incorporation of feedback)
• Aspect Retrieval Model (non-traditional ranking)
Retrieval as Decision-Making
Given a query,
- Which documents should be selected? (D)
- How should these docs be presented to the user? (π)
Choose: (D, π)
[diagram: candidate presentation strategies — a ranked list (1, 2, 3, 4), an unordered subset, clustering, …]
Generative Model of Document & Query
[diagram: a Source S generates each document d; a User U generates the query q. d and q are observed; S is inferred and U is only partially observed]
Bayesian Decision Theory
[diagram: the hidden user U (behind the observed query q) and the hidden source S (behind the observed doc set C) are unobserved; each possible choice (D1, π1), (D2, π2), …, (Dn, πn) incurs a loss L]
RISK MINIMIZATION: choose the (D, π) that minimizes the Bayes risk for choice (D, π)
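The Bayes risk on the slide can be written out explicitly. This is a sketch in the framework's notation, with θ standing for the hidden user and source parameters and the conditioning set abbreviated:

```latex
R(D,\pi \mid q, C) \;=\; \int_{\Theta} L\bigl((D,\pi),\,\theta\bigr)\, p(\theta \mid q, C)\, d\theta,
\qquad
(D^{*},\pi^{*}) \;=\; \arg\min_{(D,\pi)} R(D,\pi \mid q, C)
```

The loss L encodes the utility of a presentation choice, and the posterior p(θ | q, C) is where the statistical language models enter.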
Special Cases
• Set-based models (choose D) → Boolean model
• Ranking models (choose π)
  - Independent loss (→ PRP)
    · Relevance-based loss → Probabilistic relevance model, Two-stage LM
    · Distance-based loss → Vector-space model, KL-divergence model
  - Dependent loss
    · MMR loss
    · MDR loss → Aspect retrieval model
Map of Existing TR Models
Relevance
• Similarity, (R(q), R(d)) — different rep & similarity: Vector space model (Salton et al., 75); Prob. distr. model (Wong & Yao, 89)
• Probability of relevance, P(r=1|q,d), r ∈ {0,1}
  - Regression model (Fox, 83)
  - Generative model
    · Doc generation: Classical prob. model (Robertson & Sparck Jones, 76)
    · Query generation: LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a)
• Probabilistic inference, P(d→q) or P(q→d) — different inference systems: Inference network model (Turtle & Croft, 91); Prob. concept space model (Wong & Yao, 95)
Where Are We?
Risk Minimization Framework → Two-stage Language Model (up next) · KL-divergence Retrieval Model · Aspect Retrieval Model
Two-stage Language Models
[diagram: Stage 1 computes the document model p(w|d) from the source S and document d, using Dirichlet prior smoothing; Stage 2 computes the query likelihood for user U and query q with a mixture model (two-stage smoothing); together with the loss function these yield the risk ranking formula]
The Need for Query Modeling (Dual Role of Smoothing)
[figure: smoothing behaves differently for keyword queries vs. verbose queries]
Two-stage Smoothing
P(w|d) = (1 − λ) · [c(w,d) + μ·p(w|C)] / (|d| + μ) + λ·p(w|U)
• Stage 1 (Dirichlet prior, Bayesian): explains unseen words
• Stage 2 (2-component mixture): explains noise in the query
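The two-stage smoothing formula can be sketched directly in code. The default values of mu and lam below are placeholders; the thesis sets both automatically:

```python
def two_stage_prob(w, doc_counts, doc_len, p_C, p_U, mu=2000.0, lam=0.5):
    """Two-stage smoothed document model P(w|d).

    Stage 1 (Dirichlet prior): (c(w,d) + mu*p(w|C)) / (|d| + mu)
    Stage 2 (mixture with the user's background model):
             (1 - lam) * stage1 + lam * p(w|U)
    p_C, p_U: callables giving collection and user background probabilities.
    """
    stage1 = (doc_counts.get(w, 0) + mu * p_C(w)) / (doc_len + mu)
    return (1.0 - lam) * stage1 + lam * p_U(w)
```

Note that every word, even one unseen in the document, receives nonzero probability from the two background models.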
Estimating μ Using Leave-One-Out
[diagram: for each word wi of each document d, compute P(wi | d − wi) with that occurrence left out; the leave-one-out log-likelihood is maximized over μ (maximum likelihood estimator, solved with Newton's method)]
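The leave-one-out estimate of the Dirichlet prior parameter can be sketched as follows. The slide maximizes with Newton's method; this illustrative version uses a coarse grid search instead, and the docs/p_C interfaces are assumptions:

```python
import math

def loo_loglik(mu, docs, p_C):
    """Leave-one-out log-likelihood of the Dirichlet smoothing parameter mu.

    Each occurrence of w in d is predicted by the model estimated from
    d minus that one occurrence:
        P(w | d - w) = (c(w,d) - 1 + mu*p(w|C)) / (|d| - 1 + mu)
    docs: list of {word: count} dicts; p_C: collection model callable.
    """
    ll = 0.0
    for counts in docs:
        dlen = sum(counts.values())
        for w, c in counts.items():
            ll += c * math.log((c - 1 + mu * p_C(w)) / (dlen - 1 + mu))
    return ll

def estimate_mu(docs, p_C, grid=None):
    """Pick the mu that maximizes the leave-one-out log-likelihood.

    The slide uses Newton's method; a grid search is used here for simplicity.
    """
    grid = grid or [10.0 * 2 ** i for i in range(12)]   # 10 .. ~20k
    return max(grid, key=lambda mu: loo_loglik(mu, docs, p_C))
```

The key point is that μ is fit per collection from the documents alone, with no relevance judgments needed.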
Estimating λ Using a Mixture Model
[diagram: the query is assumed generated, for each top-ranked document di (i = 1, …, N), by the two-component mixture (1 − λ)·p(w|di) + λ·p(w|U); λ is fit by maximum likelihood with the Expectation-Maximization (EM) algorithm]
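The EM procedure for the stage-2 mixture weight can be sketched as follows; the doc_models/p_U interfaces are assumptions, and each iteration alternates a closed-form E-step and M-step:

```python
def estimate_lambda(query_counts, doc_models, p_U, n_iter=50):
    """EM estimate of the stage-2 noise weight lambda.

    Each query word w is assumed drawn from the two-component mixture
    (1-lam)*p(w|d_i) + lam*p(w|U) for each top-ranked document d_i.
    doc_models: list of callables w -> p(w|d_i), already stage-1 smoothed;
    the hidden variable is whether w came from the user background p(w|U).
    """
    lam = 0.5
    for _ in range(n_iter):
        num = den = 0.0
        for p_d in doc_models:
            for w, c in query_counts.items():
                pu = lam * p_U(w)
                pd = (1.0 - lam) * p_d(w)
                z = pu / (pu + pd)      # E-step: P(background | w, d_i)
                num += c * z            # M-step accumulators
                den += c
        lam = num / den
    return lam
```

Together with the leave-one-out estimate of μ, this gives the fully automatic parameter setting behind the "automatic 2-stage" results on the next slide.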
Average Precision: Automatic Two-stage vs. Optimal Single-stage
[table: automatic 2-stage results vs. optimal 1-stage results, over 3 databases × 4 query types, 150 topics]
Where Are We?
Risk Minimization Framework → Two-stage Language Model · KL-divergence Retrieval Model (up next) · Aspect Retrieval Model
KL-divergence Retrieval Models
[diagram: the user U and query q yield a query model; the source S and document d yield a document model; the loss function gives a risk ranking formula based on the KL divergence between the two models]
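The KL-divergence ranking criterion can be sketched as follows. Since the query-model entropy is constant across documents, ranking by −D(θQ‖θD) reduces to ranking by cross entropy; the function interfaces are assumptions:

```python
import math

def kl_score(query_model, doc_model, vocab):
    """Rank documents by negative KL divergence -D(theta_Q || theta_D).

    Rank-equivalently, this returns the cross entropy
        sum_w p(w|theta_Q) * log p(w|theta_D).
    doc_model must be smoothed so it is nonzero on every query word;
    both models are callables w -> probability.
    """
    return sum(query_model(w) * math.log(doc_model(w))
               for w in vocab if query_model(w) > 0)
```

With a query model concentrated on the literal query terms this reduces to query likelihood; richer query models are what make feedback natural in this family, as the next slides show.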
Expansion-based vs. Model-based Feedback
• Expansion-based feedback: the feedback docs modify the query Q itself; each document D is scored by query likelihood under its doc model
• Model-based feedback: the feedback docs modify the query model; each document D is scored by the KL-divergence between the query model and its doc model
Feedback as Model Interpolation
θQ′ = (1 − α)·θQ + α·θF
• α = 0: no feedback; α = 1: full feedback
• θF is estimated from the feedback docs F = {d1, d2, …, dn} by a generative model or by divergence minimization
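The interpolation itself is a one-liner; a sketch with dict-based models (names are illustrative):

```python
def interpolate_feedback(theta_Q, theta_F, alpha):
    """Feedback as model interpolation: theta_Q' = (1-alpha)*theta_Q + alpha*theta_F.

    alpha = 0 leaves the original query model untouched (no feedback);
    alpha = 1 replaces it entirely with the feedback model theta_F.
    Both models are {word: probability} dicts over possibly different vocabularies.
    """
    vocab = set(theta_Q) | set(theta_F)
    return {w: (1 - alpha) * theta_Q.get(w, 0.0) + alpha * theta_F.get(w, 0.0)
            for w in vocab}
```

Because both inputs sum to one, the interpolated model is again a proper distribution for any α in [0, 1].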
θF Estimation Method I: Generative Mixture Model
[diagram: each word w in F = {d1, …, dn} is drawn either from the background model P(w|C), with probability λ, or from the topic model P(w|θF), with probability 1 − λ; θF is fit by maximum likelihood]
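Method I can be sketched with EM. This is a minimal version assuming pooled word counts over F and a fixed background weight λ; the interface is an assumption:

```python
def estimate_feedback_model(fb_counts, p_C, lam=0.5, n_iter=50):
    """EM for the generative mixture feedback model.

    Each word occurrence in the feedback documents F is generated either by
    the background model p(w|C) (probability lam) or by the unknown topic
    model p(w|theta_F) (probability 1-lam); theta_F is fit by maximum
    likelihood.  fb_counts: pooled {word: count} over F; returns p(w|theta_F).
    """
    words = list(fb_counts)
    total = sum(fb_counts.values())
    theta = {w: fb_counts[w] / total for w in words}   # init from raw counts
    for _ in range(n_iter):
        # E-step: probability each occurrence of w came from the topic model
        z = {w: (1 - lam) * theta[w] /
                ((1 - lam) * theta[w] + lam * p_C(w)) for w in words}
        # M-step: re-estimate the topic model from topic-attributed counts
        norm = sum(fb_counts[w] * z[w] for w in words)
        theta = {w: fb_counts[w] * z[w] / norm for w in words}
    return theta
```

The background component soaks up common words, so the learned θF concentrates on the discriminative topical vocabulary of the feedback set.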
θF Estimation Method II: Empirical Divergence Minimization
[diagram: choose θF by divergence minimization over the empirical divergence — close to the feedback documents d1, …, dn, far (weighted by λ) from the collection background model C]
Example of Feedback Query Model
TREC topic 412: “airport security”; mixture model approach, Web database, top 10 docs
[table: learned feedback query models for mixture weight 0.9 vs. 0.7]
Where Are We?
Risk Minimization Framework → Two-stage Language Model · KL-divergence Retrieval Model · Aspect Retrieval Model (up next)
Aspect Retrieval
Query: What are the applications of robotics in the world today? Find as many DIFFERENT applications as possible.

Aspect judgments:
       A1 A2 A3 …  Ak
  d1    1  1  0  0 …  0  0
  d2    0  1  1  1 …  0  0
  d3    0  0  0  0 …  1  0
  …
  dk    1  0  1  0 …  0  1

Example aspects: A1: spot-welding robotics; A2: controlling inventory; A3: pipe-laying robots; A4: talking robot; A5: robots for loading & unloading memory tapes; A6: robot [telephone] operators; A7: robot cranes; …
Evaluation Measures
• Aspect Coverage (AC): measures per-doc coverage
  - #distinct-aspects / #docs
  - equivalent to the “set cover” problem, NP-hard
• Aspect Uniqueness (AU): measures redundancy
  - #distinct-aspects / #aspects
  - equivalent to the “volume cover” problem, NP-hard
• Example (aspect vectors of a ranking d1, d2, d3, with accumulated counts):
    d1: 0 0 0 1 0 0 1    d2: 0 1 0 1 1 0 0    d3: 1 0 0 0 1 0 1
    #docs:      1    2    3
    #aspects:   2    5    8
    #uniq-asp:  2    4    5
    AC: 2/1 = 2.0   4/2 = 2.0   5/3 = 1.67
    AU: 2/2 = 1.0   4/5 = 0.8   5/8 = 0.625
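Both measures can be computed directly from the judgment rows; a sketch (the binary-vector encoding is an assumption):

```python
def aspect_measures(ranked_aspect_vectors):
    """Accumulated Aspect Coverage (AC) and Aspect Uniqueness (AU) per rank.

    ranked_aspect_vectors: per-document binary aspect-judgment rows, in rank
    order.  At rank k:
        AC = #distinct aspects covered so far / k        (per-doc coverage)
        AU = #distinct aspects / #aspect occurrences     (non-redundancy)
    """
    seen = set()
    n_occurrences = 0
    ac, au = [], []
    for k, row in enumerate(ranked_aspect_vectors, start=1):
        hits = {i for i, bit in enumerate(row) if bit}
        n_occurrences += len(hits)
        seen |= hits
        ac.append(len(seen) / k)
        au.append(len(seen) / n_occurrences if n_occurrences else 0.0)
    return ac, au
```

On the example ranking above this reproduces AC = 2.0, 2.0, 1.67 and AU = 1.0, 0.8, 0.625.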
Maximal Marginal Relevance (MMR) vs. Maximal Diverse Relevance (MDR)
Choosing the next document dk+1, given the known d1 … dk:
• MMR: the best dk+1 is novel & relevant — combines relevance Rel(θk+1) with novelty/redundancy Nov(θk+1 | θ1 … θk)
• MDR: the best dk+1 is complementary in coverage — a loss function L(θk+1 | θ1 … θk) over aspect coverage distributions p(a|θi)
Maximal Marginal Relevance (MMR) Models
• Maximizing aspect coverage indirectly, through redundancy elimination
• Elements
  - Redundancy/novelty measure
  - Combination of novelty and relevance
• Proposed & studied six novelty measures
• Proposed & studied four combination strategies
A Mixture Model for Redundancy
[diagram: a new document is generated by mixing the reference document model P(w|Old), with weight λ, and the collection background model P(w|Background), with weight 1 − λ; the redundancy weight λ is estimated by maximum likelihood with Expectation-Maximization]
Cost-based Combination of Relevance and Novelty
[formula: the relevance score and the novelty score are combined through a cost function]
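The slide does not spell out the cost formula, so as an illustration here is the classic MMR-style greedy selection with a linear relevance/novelty trade-off (Carbonell & Goldstein's form); β, the interfaces, and the greedy loop are assumptions, not the thesis's exact cost-based combination:

```python
def mmr_select(candidates, relevance, novelty, beta=0.5, k=5):
    """Greedy MMR-style ranking combining relevance and novelty.

    score(d) = beta * relevance(d) + (1 - beta) * novelty(d, selected);
    the best remaining document is picked at each step.
    relevance: d -> float; novelty: (d, selected_list) -> float.
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda d: beta * relevance(d)
                                       + (1 - beta) * novelty(d, selected))
        selected.append(best)
        pool.remove(best)
    return selected
```

The novelty term is what breaks ties against documents that merely repeat aspects already covered by earlier picks.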
Maximal Diverse Relevance (MDR) Models
• Maximizing aspect coverage directly, through aspect modeling
• Elements
  - Aspect loss function
  - Generative aspect model
• Proposed & studied the KL-divergence aspect loss function
• Explored two aspect models (PLSI, LDA)
Aspect Generative Model of Document & Query
[diagram: the source S generates document d and the user U generates query q through an aspect model with parameters θ = (θ1, …, θk); instantiated with PLSI or with LDA]
Aspect Loss Function
[diagram: the loss compares the aspect coverage the user U wants (from q) against the coverage offered by the document d from source S]
Aspect Loss Function: Illustration
[diagram: the desired coverage p(a|Q) is compared with the combined coverage of the “already covered” distributions p(a|θ1) … p(a|θk−1) plus a new candidate p(a|θk); candidates range from perfect (fills the gaps) to redundant or non-relevant]
Preliminary Evaluation: MMR vs. MDR
• On the relevant data set, both MMR and MDR are effective, but they complement each other
  - MMR improves AU more than AC
  - MDR improves AC more than AU
• On the mixed data set, however,
  - MMR is only effective when relevance ranking is accurate
  - MDR improves AC, even though relevance ranking is degraded
Further Work Is Needed
• Controlled experiments with synthetic data
  - Level of redundancy
  - Density of relevant documents
  - Per-document aspect counts
• Alternative loss functions
• Aspect language models, especially along the lines of LDA
• Aspect-based feedback
Summary of Contributions
New TR Framework — Risk Minimization Framework:
• Unifies existing models
• Incorporates LMs
• Serves as a map for exploring new models
New TR Models — specific contributions:
• Two-stage Language Model: empirical study of smoothing (dual role of smoothing); new smoothing method (two-stage smoothing); automatic parameter setting (leave-one-out, mixture)
• KL-divergence Retrieval Model: query/document distillation; feedback with LMs (mixture model & divergence minimization)
• Aspect Retrieval Model: evaluation criteria (AC, AU); redundancy/novelty measures (mixture weight); MMR with LMs (cost combination); aspect-based loss function (“collective KL-div”)
Future Research Directions
• Better approximation of the risk integral
• More effective LMs for “traditional” retrieval — can we beat TF-IDF without increasing computational complexity?
• Automatic parameter setting, especially for feedback models
• Flexible passage retrieval, especially with HMMs
• Beyond unigrams (more linguistics)
More Future Research Directions
• Aspect retrieval models
  - Document structure/sub-topic modeling
  - Aspect-based feedback
• Interactive information retrieval models
• Risk minimization for information filtering
• Personalized & context-sensitive retrieval