Cumulative Progress in Language Models for Information Retrieval

Presentation Transcript


  1. Cumulative Progress in Language Models for Information Retrieval Antti Puurula 6/12/2013 Australasian Language Technology Workshop University of Waikato

  2. Ad-hoc Information Retrieval • Ad-hoc Information Retrieval (IR) forms the basic task in IR: • Given a query, retrieve and rank documents in a collection • Origins: • Cranfield 1 (1958-1960), Cranfield 2 (1962-1966), SMART (1961-1999) • Major evaluations: • TREC Ad-hoc (1990-1999), TREC Robust (2003-2005), CLEF (2000-2009), INEX (2009-2010), NTCIR (1999-2013), FIRE (2008-2013)

  3. Illusionary Progress in Ad-hoc IR • TREC ad-hoc evaluations stopped in 1999, as progress plateaued • More diverse tasks became the foci of research • “There is little evidence of improvement in ad-hoc retrieval technology over the past decade” (Armstrong et al. 2009) • Weak baselines, non-cumulative improvements • ⟶ “no way of using LSI achieves a worthwhile improvement in retrieval accuracy over BM25” (Atreya & Elkan, 2010) • ⟶ “there remains very little room for improvement in ad hoc search” (Trotman & Keeler, 2011)

  4. Progress in Language Models for IR? • Language Models (LM) form one of the main approaches to IR • Many improvements to LMs not adopted generally or evaluated systematically • TF-IDF feature weighting • Pitman-Yor Process smoothing • Feedback models • Are these improvements consistent across standard datasets, cumulative, and do they improve on a strong baseline?

  5. Query Likelihood Language Models • Query Likelihood (QL) (Kalt 1996, Hiemstra 1998, Ponte & Croft 1998) is the basic application of LMs for IR • Unigram case: using count vectors to represent documents $d$ and queries $q$, rank documents given a query according to $p(d|q) \propto p(q|d)\,p(d)$ • Assuming a generative model $p(q|d)$ and uniform priors $p(d)$ over documents, ranking reduces to ranking by $p(q|d)$

  6. Query Likelihood Language Models 2 • The unigram QL-score for each document becomes: $p(q|d) = \binom{|q|}{q_1 \cdots q_W} \prod_w p(w|d)^{q_w}$ • where $\binom{|q|}{q_1 \cdots q_W}$ is the Multinomial coefficient, and document models are given by the Maximum Likelihood estimates: $p_{ML}(w|d) = d_w / |d|$
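
A minimal Python sketch of this scoring, assuming Counter-based count vectors (illustrative only, not the paper's implementation; the function name is invented). The multinomial coefficient is dropped because it is constant across documents for a fixed query, and the score is computed in log-space to avoid underflow:

    import math
    from collections import Counter

    def ql_score(query_counts, doc_counts):
        # log p(q|d) = sum_w q_w * log p_ML(w|d); the multinomial
        # coefficient depends only on the query, so it is omitted
        doc_len = sum(doc_counts.values())
        score = 0.0
        for w, q_w in query_counts.items():
            p_ml = doc_counts.get(w, 0) / doc_len  # p_ML(w|d) = d_w / |d|
            if p_ml == 0.0:
                return float("-inf")  # unsmoothed ML zeroes out unseen query words
            score += q_w * math.log(p_ml)
        return score

    q = Counter("language models for retrieval".split())
    d = Counter("cumulative progress in language models for information retrieval".split())
    print(ql_score(q, d))

The -inf result for any unseen query word is exactly the zero-probability problem that the smoothing methods on the next slides address.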

  7. Pitman-Yor Process Smoothing • Standard methods for smoothing in IR LMs are Dirichlet Prior (DP) and 2-Stage Smoothing (2SS) (Zhai & Lafferty 2004, Smucker & Allan 2007) • A recently suggested improvement is Pitman-Yor Process smoothing (PYP), an approximation to inference on a Pitman-Yor Process (Momtazi & Klakow 2010, Huang & Renals 2010) • All methods interpolate unsmoothed parameters with a background distribution; PYP additionally discounts the unsmoothed counts

  8. Pitman-Yor Process Smoothing 2 • All methods share the form: $p(w|d) = \lambda_d \, p'(w|d) + (1 - \lambda_d) \, p(w|C)$ • DP: $p'(w|d) = d_w / |d|$, $\lambda_d = \frac{|d|}{|d| + \mu}$ • 2SS: $p'(w|d) = d_w / |d|$, $\lambda_d = (1 - \lambda) \frac{|d|}{|d| + \mu}$ • PYP: discounts the unsmoothed counts before interpolation

  9. Pitman-Yor Process Smoothing 2 • PYP: $p'(w|d) = \frac{\max(d_w - \delta,\, 0)}{|d| - \delta |d^u|}$, and $\lambda_d = \frac{|d| - \delta |d^u|}{|d|}$, where $|d^u|$ is the number of unique words in $d$ and $0 \le \delta \le 1$ is the discount

  10. Pitman-Yor Process Smoothing 3 • The background model is most commonly estimated by concatenating all collection documents into a single document: $p(w|C) = \frac{\sum_d d_w}{\sum_d |d|}$ • Less commonly, a uniform background model is used: $p(w|C) = \frac{1}{W}$, where $W$ is the vocabulary size
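
The smoothing schemes on slides 8-10 are small enough to sketch directly. The following Python is illustrative, not the SGMWeka implementation; the function names and the hyperparameter defaults (mu, lam, delta) are placeholder assumptions:

    from collections import Counter

    def collection_model(docs):
        # p(w|C): concatenate all documents into a single background document
        total = Counter()
        for d in docs:
            total.update(d)
        n = sum(total.values())
        return {w: c / n for w, c in total.items()}

    def p_dp(w, d, bg, mu=2000.0):
        # Dirichlet Prior: (d_w + mu * p(w|C)) / (|d| + mu)
        dl = sum(d.values())
        return (d.get(w, 0) + mu * bg.get(w, 0.0)) / (dl + mu)

    def p_2ss(w, d, bg, mu=2000.0, lam=0.1):
        # Two-Stage Smoothing: interpolate the DP estimate once more with p(w|C)
        return (1.0 - lam) * p_dp(w, d, bg, mu) + lam * bg.get(w, 0.0)

    def p_pyp(w, d, bg, delta=0.8):
        # PYP approximation: discount each count by delta, and give the
        # freed mass (delta * number of unique words) to the background
        dl = sum(d.values())
        uniq = len(d)  # |d^u|
        return (max(d.get(w, 0) - delta, 0.0) + delta * uniq * bg.get(w, 0.0)) / dl

Each function returns a proper distribution over the vocabulary covered by the background model, so any of them can replace the ML estimate in the QL scorer above.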

  11. TF-IDF Feature Weighting • Multinomial modelling assumptions of text can be corrected with TF-IDF weighting (Rennie et al. 2003, Frank & Bouckaert 2006) • Traditional view: IDF-weighting is unnecessary with IR LMs (Zhai & Lafferty 2004) • Recent view: the combination is complementary (Smucker & Allan 2007, Momtazi et al. 2010)

  12. TF-IDF Feature Weighting 2 • Dataset documents can be weighted by TF-IDF: • $\hat{d}_w = \log(1 + d_w) \, \log\frac{N - N_w + 0.5}{N_w + 0.5}$, where $d$ is the unweighted count vector, $N$ the number of documents, and $N_w$ the number of documents where word $w$ occurs • First factor is the TF log transform using unique length normalization (Singhal et al. 1996) • Second factor is the Robertson-Walker IDF (Robertson & Zaragoza 2009)
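
A hedged Python reading of this weighting (a sketch only: the exact TF normalization and IDF variant in the paper may differ; the smoothed IDF below is one common form, clamped at zero for very frequent words):

    import math
    from collections import Counter

    def tfidf_reweight(docs):
        # docs: list of Counter count vectors; returns TF-IDF weighted vectors
        n_docs = len(docs)
        df = Counter()
        for d in docs:
            df.update(d.keys())  # N_w: number of documents containing word w
        weighted = []
        for d in docs:
            wd = {}
            for w, c in d.items():
                tf = math.log(1 + c)  # TF log transform
                idf = max(0.0, math.log((n_docs - df[w] + 0.5) / (df[w] + 0.5)))
                wd[w] = tf * idf
            weighted.append(wd)
        return weighted

The weighted vectors then replace the raw counts when estimating the document models, which is why the next slide has to reconsider the role of collection smoothing.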

  13. TF-IDF Feature Weighting 3 • IDF has an overlapping function with collection smoothing (Hiemstra & Kraaij 1998) • The interaction is taken into account by replacing the collection model with a uniform model in smoothing: $p(w|d) = \lambda_d \, p'(w|d) + (1 - \lambda_d) \frac{1}{W}$

  14. Model-based Feedback • Pseudo-feedback is a traditional method in Ad-hoc IR: • Using the documents retrieved for the original query $q$, construct and rank with a new query $\hat{q}$ • With LMs, two different formalizations enable model-based feedback: • KL-Divergence Retrieval (Zhai & Lafferty 2001) • Relevance Models (Lavrenko & Croft 2001) • Both enable replacing the original query counts $q$ with a model $p(w|\hat{q})$

  15. Model-based Feedback 2 • Many modeling choices exist for the feedback models, such as: • Using the top $k$ retrieved documents • Truncating the word vector to words present in the original query • Weighting the feedback documents by their query likelihoods $p(q|d)$ • Interpolating the feedback model with the original query • These modeling choices are combined here

  16. Model-based Feedback 3 • The interpolated query model is estimated for the query words from the top $k$ document models $p(w|d_n)$: • $p(w|\hat{q}) = (1 - \alpha) \frac{q_w}{|q|} + \frac{\alpha}{Z} \sum_{n=1}^{k} p(q|d_n) \, p(w|d_n)$, where $\alpha$ is the interpolation weight and $Z$ is the normalizer: $Z = \sum_{n=1}^{k} p(q|d_n)$
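
A Python sketch of this interpolation (illustrative names; alpha, the top-k document models, and their query likelihoods are the quantities defined above):

    def feedback_query_model(query_counts, top_doc_models, doc_likelihoods, alpha=0.5):
        # query_counts: Counter of query word counts
        # top_doc_models: list of dicts p(w|d_n) for the top-k documents
        # doc_likelihoods: list of p(q|d_n) weights for the same documents
        q_len = sum(query_counts.values())
        z = sum(doc_likelihoods)  # normalizer Z
        model = {}
        for w, q_w in query_counts.items():  # truncated to original query words
            fb = sum(lik * dm.get(w, 0.0)
                     for lik, dm in zip(doc_likelihoods, top_doc_models)) / z
            model[w] = (1.0 - alpha) * (q_w / q_len) + alpha * fb
        return model

Restricting the loop to the original query words implements the truncation choice from slide 15, keeping the second-pass retrieval as cheap as the first.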

  17. Experimental Setup • Ad-hoc IR experiments conducted on 13 standard datasets • TREC 1-5, split according to data source • OHSU-TREC • FIRE 2008-2011 English • Preprocessing: stopword and short-word removal, Porter stemming • Each dataset split into development and evaluation subsets

  18. Experimental Setup 2 • Software used for the experiments was the SGMWeka 1.44 toolkit: • http://sourceforge.net/projects/sgmweka/ • Smoothing parameters optimized on the development sets using Gaussian Random Searches (Luke 2009) • Evaluation performed on the evaluation sets, using Mean Average Precision over the top 50 documents (MAP@50) • Significance tested with paired one-tailed t-tests between the datasets
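
One plausible reading of a Gaussian random search for the smoothing parameters: repeatedly perturb the best parameters found so far with Gaussian noise and keep improvements. A hypothetical sketch, not SGMWeka's actual optimizer; dev_map_at_50 in the usage comment is an invented stand-in for a development-set evaluation:

    import random

    def gaussian_random_search(objective, init, sigma=0.5, iters=200, seed=0):
        # Maximize objective(params) by Gaussian perturbation of the incumbent
        rng = random.Random(seed)
        best, best_score = dict(init), objective(init)
        for _ in range(iters):
            cand = {k: v + rng.gauss(0.0, sigma) for k, v in best.items()}
            score = objective(cand)
            if score > best_score:
                best, best_score = cand, score
        return best, best_score

    # e.g. tuning the DP prior on a development set (hypothetical evaluator):
    # best, dev_map = gaussian_random_search(lambda p: dev_map_at_50(p["mu"]), {"mu": 1000.0})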

  19. Results • Significant differences: • PYP > DP • PYP+TI > 2SS • PYP+TI+FB > PYP+TI • PYP+TI+FB improves on 2SS by 4.07 MAP@50 absolute, a 17.1% relative improvement

  20. Discussion • The 3 evaluated improvements in language models for IR: • require little additional computation • can be implemented with small modifications to existing IR systems • are substantial, significant and cumulative across 13 standard datasets, compared to DP and 2SS baselines (4.07 MAP@50 absolute, 17.1% relative) • Improvements requiring more computation are possible: • document neighbourhood smoothing, word correlation models, passage-based LMs, bigram LMs, … • More extensive evaluations are needed to confirm progress
