Two-stage Language Models for Information Retrieval
ChengXiang Zhai*, John Lafferty
School of Computer Science, Carnegie Mellon University
*New address: Department of Computer Science, University of Illinois, Urbana-Champaign
Motivation • Retrieval parameters are needed to • model different user preferences • customize a retrieval model according to different queries and documents • So far, parameters have been set through empirical experimentation • Can we set parameters automatically?
Parameters in Traditional Models • EXTERNAL to the model, hard to interpret • Most parameters are introduced heuristically to implement our “intuition” • As a result, there is no principled way to quantify them • Set through empirical experiments • Requires lots of experimentation • Optimality for new queries is not guaranteed
Example of Parameter Tuning (Okapi) “k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).” (Robertson et al. 1999)
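To make the quoted parameters concrete, here is a minimal sketch of the Okapi BM25 scoring function in which k1, b, and k3 appear. The function name and input layout are illustrative, not from the talk; the formula itself is the standard BM25 term-weighting scheme.

import math

def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, N, df,
               k1=1.2, b=0.75, k3=7.0):
    """Okapi BM25 score of one document for one query.

    query_tf: {term: frequency in the query}
    doc_tf:   {term: frequency in the document}
    N, df:    collection size and {term: document frequency}
    k1, b, k3 are the tuning parameters quoted above.
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5))
        # b controls how strongly term frequency is normalized by document length
        tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        # k3 dampens the weight of repeated query terms (matters for long queries)
        qtf_part = qtf * (k3 + 1) / (qtf + k3)
        score += idf * tf_part * qtf_part
    return score

Note how every parameter sits outside any probabilistic model: nothing in the formula itself says how k1, b, or k3 should be set, which is exactly the problem the talk addresses.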
The Way to Automatic Tuning ... • Parameters must be PART of the model! • Query modeling (explain differences among queries) • Document modeling (explain differences among documents) • Decouple the influence of the query on parameter setting from that of the documents • To achieve stable parameter settings • To pre-compute query-independent parameters
The Rest of the Talk • Risk Minimization Retrieval Framework • Two-stage Language Models • Two-stage Dirichlet-Mixture Smoothing • Parameter Estimation
The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02)
[Diagram: the query is summarized by a query language model (QUERY MODELING), each document by a document language model (DOC MODELING), and the user’s preferences by a loss function (USER MODELING); retrieval is cast as a decision problem of minimizing the expected loss.]
Parameter Setting in Risk Minimization
[Diagram: the same framework annotated with where parameters live — query model parameters are estimated from the query, document model parameters are estimated from the documents, and user model parameters are set in the loss function.]
Two-stage Language Models
[Diagram: the risk ranking formula splits into two stages — stage 1 estimates the document language model θ_d (smoothing!), and stage 2 scores the query against it through the query language model θ_q and the loss function.]
Sensitivity in Traditional (“one-stage”) Smoothing
[Plot: retrieval performance as a function of the smoothing parameter, for keyword queries and for verbose (sentence-like) queries — the sensitivity patterns and optimal settings differ across the two query types.]
The Need of Two-stage Smoothing (I): Accurate Estimation of the Doc Model
Query = “data mining algorithms”; document d = a 500-word text-mining paper
Maximum-likelihood estimates: p(“text”|d) = 10/500 = 0.02, p(“mining”|d) = 3/500 = 0.006, p(“association”|d) = 1/500 = 0.002, p(“algorithm”|d) = 2/500 = 0.004, p(“data”|d) = 0/500 = 0, …
Then p(q|d) = p(“data”|d) · p(“mining”|d) · p(“algorithm”|d) = 0 · 0.006 · 0.004 = 0!
What should p(“data”|d) be? And p(“unicorn”|d)?
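A minimal sketch of the zero-probability problem, using the example counts from this slide: under maximum likelihood, a single unseen query word drives the whole query likelihood to zero.

from math import prod

# Example counts from the 500-word "text mining paper" on this slide
doc_counts = {"text": 10, "mining": 3, "association": 1, "algorithm": 2, "data": 0}
doc_len = 500

def p_ml(word):
    # Maximum-likelihood estimate: relative frequency in the document
    return doc_counts.get(word, 0) / doc_len

query = ["data", "mining", "algorithm"]
print(prod(p_ml(w) for w in query))  # 0.0 -- one unseen word zeroes out p(q|d)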
The Need of Two-stage Smoothing (II): Explanation of Noise in the Query
Query = “the algorithms for data mining”
      “the”   “algorithms”   “for”   “data”   “mining”
d1:   0.04    0.001          0.02    0.002    0.003
d2:   0.02    0.001          0.01    0.003    0.004
p(“algorithms”|d1) = p(“algorithms”|d2), p(“data”|d1) < p(“data”|d2), p(“mining”|d1) < p(“mining”|d2), but p(q|d1) > p(q|d2)!
We should make p(“the”) and p(“for”) less different across documents.
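A quick check of the arithmetic on this slide (probabilities copied from the table above): the common function words dominate the product, so d1 scores higher even though it matches the content words less well.

from math import prod

# Per-word probabilities in query order: "the", "algorithms", "for", "data", "mining"
d1 = [0.04, 0.001, 0.02, 0.002, 0.003]
d2 = [0.02, 0.001, 0.01, 0.003, 0.004]

print(prod(d1))  # 4.8e-12
print(prod(d2))  # 2.4e-12 -> p(q|d1) > p(q|d2), driven by "the" and "for"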
Two-stage Dirichlet-Mixture Smoothing
Stage-1 smoothing (Dirichlet prior): explain unseen words by adding μ pseudo counts from the collection model p(w|C)
Stage-2 smoothing (two-component mixture): explain noise in the query by linear interpolation with a user background model p(w|U)
p(w|d) = (1 − λ) · (c(w,d) + μ · p(w|C)) / (|d| + μ) + λ · p(w|U)
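A minimal sketch of the smoothing formula as code; the function name and argument layout are assumptions, but the computation is exactly the two-stage estimate above.

def two_stage_smooth(c_wd, doc_len, p_w_C, p_w_U, mu, lam):
    """Two-stage Dirichlet-mixture estimate of p(w|d)."""
    # Stage 1 (Dirichlet prior): add mu pseudo counts from the collection model
    stage1 = (c_wd + mu * p_w_C) / (doc_len + mu)
    # Stage 2 (mixture): interpolate with the user background model
    return (1 - lam) * stage1 + lam * p_w_U

# For the earlier example (with made-up collection/background probabilities),
# p("data"|d) is no longer zero:
# two_stage_smooth(c_wd=0, doc_len=500, p_w_C=0.001, p_w_U=0.001, mu=1000, lam=0.3) > 0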
Estimating μ Using Leave-One-Out
Leave each word occurrence out in turn and predict it with the model built from the rest of the document, p(w|d − w)
Leave-one-out log-likelihood: ℓ₋₁(μ|C) = Σ_{d∈C} Σ_w c(w,d) · log[ (c(w,d) − 1 + μ · p(w|C)) / (|d| − 1 + μ) ]
Maximum likelihood estimator: μ̂ = argmax_μ ℓ₋₁(μ|C), computed with Newton’s method
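A sketch of this estimator under stated assumptions: documents arrive as word-count dictionaries, p_C is the collection model, and the Newton iteration uses the first and second derivatives of ℓ₋₁ written out by hand. Convergence safeguards are minimal; this is not the paper’s exact implementation.

def estimate_mu(docs, p_C, mu=1.0, iters=20):
    """Leave-one-out estimate of the Dirichlet prior mu via Newton's method.

    docs: list of {word: count} dicts; p_C: {word: p(w|C)}
    """
    for _ in range(iters):
        g = h = 0.0  # first and second derivatives of the LOO log-likelihood
        for doc in docs:
            dl = sum(doc.values())
            for w, c in doc.items():
                num = c - 1 + mu * p_C[w]
                den = dl - 1 + mu
                g += c * (p_C[w] / num - 1.0 / den)
                h += c * (1.0 / den ** 2 - p_C[w] ** 2 / num ** 2)
        step = mu - g / h              # Newton update
        mu = step if step > 0 else mu / 2  # keep the pseudo-count mass positive
    return mu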
Estimating λ Using a Mixture Model
Stage 1 yields a smoothed model p(w|d_i) for each document d_1, …, d_N
Stage 2 treats each query word as drawn from the mixture (1 − λ) · p(w|d_i) + λ · p(w|U)
Simultaneously adjust λ and the document weights α_1, …, α_N to maximize the query likelihood
Maximum likelihood estimator, computed with the Expectation-Maximization (EM) algorithm
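A compact EM loop in the spirit of this slide; the exact updates in the paper differ in detail, and the names (doc_models, p_U) and the uniform initialization are assumptions. Each query word is attributed either to the noise component or to one of the documents, and λ and the α’s are re-estimated from those posteriors.

def estimate_lambda(query, doc_models, p_U, lam=0.5, iters=50):
    """EM sketch for the mixture sum_i alpha_i [(1-lam) p(w|d_i) + lam p(w|U)].

    query: list of words; doc_models: list of smoothed {word: p(w|d_i)};
    p_U: background model, assumed to cover every query word.
    """
    N = len(doc_models)
    alpha = [1.0 / N] * N  # uniform document weights to start (sums to 1)
    for _ in range(iters):
        noise_sum, doc_sum = 0.0, [0.0] * N
        for w in query:
            # E-step: joint weight of each document for this word
            comp = [alpha[i] * ((1 - lam) * doc_models[i].get(w, 0.0)
                                + lam * p_U[w]) for i in range(N)]
            total = sum(comp)
            noise_sum += lam * p_U[w] / total   # posterior that w is query noise
            for i in range(N):
                doc_sum[i] += comp[i] / total   # posterior that w came via d_i
        # M-step: re-estimate lam and the document weights (alpha stays normalized)
        lam = noise_sum / len(query)
        alpha = [s / len(query) for s in doc_sum]
    return lam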
Effectiveness of Parameter Estimation • Five databases • News articles (AP, WSJ, ZIFF, FBIS, FT, LA) • Government documents (Federal Register) • Web pages • Four types of queries • Long vs. short • Verbose (sentence-like) vs. keyword • Result: automatic 2-stage is comparable to optimal 1-stage
[Table: average precision, automatic 2-stage results vs. optimal 1-stage results (3 DB’s + 4 query types, 150 topics)]
[Table: average precision, automatic 2-stage results vs. optimal 1-stage results (2 large DB’s + 2 query types, 50 topics)]
Conclusions • Two-stage language models • Direct modeling of both queries and documents • Parameters are part of a probabilistic model • Parameters can be estimated using standard estimation techniques • Two-stage Dirichlet-mixture smoothing • Involves two meaningful parameters (i.e., document sample size μ and query noise λ) • Achieves very good performance by setting the smoothing parameters automatically • It is possible to set parameters automatically!
Future Work • Optimality analysis in the two-stage parameter space • Offline vs. online estimation • Alternative estimation methods • Parameter estimation for more sophisticated language models (e.g., with feedback)