
A LANGUAGE MODELING APPROACH TO INFORMATION RETRIEVAL, Jay M. Ponte & W. Bruce Croft

This presentation summarizes a language modeling approach to information retrieval. The approach is non-parametric, inspired by speech recognition, and integrates document indexing and document retrieval into a single model. Experiments on TREC data show better precision at all recall levels, significantly so at several.



Presentation Transcript


  1. A LANGUAGE MODELING APPROACH TO INFORMATION RETRIEVAL, Jay M. Ponte & W. Bruce Croft
     Murat Açar - Zeynep Çipiloğlu Yıldız

  2. Introduction
     The problem is:
       • the integration of document indexing and retrieval models
       • the lack of an adequate indexing model
       • parametric assumptions
       • prior assumptions about the similarity of documents
     The novel approach is:
       • non-parametric
       • based on probabilistic language modeling
       • designed to integrate document indexing and document retrieval models into a single model
       • inspired by speech recognition

  3. Previous Work
     • 2-Poisson model [Harter]
       • a probabilistic indexing model
       • only a subset of the terms in a document is useful for indexing
       • identifies useful words by their distribution and assigns them as indexing words
     • Robertson and Sparck Jones model
       • estimates the probability of relevance of each document to the query
     • INQUERY inference network model [Turtle and Croft]
       • integrates indexing and retrieval by making inferences of concepts from features
       • features: words, phrases, or more complex structures
       • Bayesian network (for multiple feature sets/queries)

  4. Language Model
     Method:
       • infer a language model for each document individually
       • estimate the probability of producing the query
       • rank the documents with respect to these probabilities
     • Estimate the probability of the query, given the language model (LM) of document d
     • Maximum likelihood estimate (MLE) of the probability of term t under the term distribution of document d
     • Problem: the estimate rests on only a document-sized sample
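The formulas on this slide did not survive transcription. As a sketch, the maximum likelihood estimate is usually written as below; the symbols $M_d$ (the language model of document $d$), $tf_{(t,d)}$ (raw count of term $t$ in $d$), and $dl_d$ (total number of tokens in $d$) are assumed notation, not taken from the slides.

```latex
\hat{p}_{ml}(t \mid M_d) = \frac{tf_{(t,d)}}{dl_d}
```

Because the sample is a single document, any query term absent from $d$ gets $\hat{p}_{ml}(t \mid M_d) = 0$, and even nonzero estimates rest on very few observations. This is the "document-sized sample" problem the slide points to.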

  5. Language Model (cont.)
     • Risk function (geometric distribution)
     • Probability of producing the query for a given document model
     • Compute this probability for each candidate document and rank
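The slide formulas are missing, so the following is a minimal Python sketch of how the ranking could be computed. It is a reconstruction from the published description of the method, not from the slides, and the details should be treated as assumptions: the ML estimate is blended with the term's mean probability across documents using a geometric-distribution risk weight, terms absent from a document back off to the collection frequency, and the query probability multiplies p(t) for query terms and 1 - p(t) for all other vocabulary terms. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def build_stats(docs):
    """docs maps doc_id -> list of tokens. Returns collection statistics."""
    tf = {d: Counter(tokens) for d, tokens in docs.items()}
    dl = {d: len(tokens) for d, tokens in docs.items()}
    cf = Counter()  # collection frequency of each term
    for counts in tf.values():
        cf.update(counts)
    cs = sum(dl.values())  # total number of tokens in the collection
    # Mean ML probability of each term over the documents that contain it.
    p_avg = {}
    for t in cf:
        containing = [d for d in tf if tf[d][t] > 0]
        p_avg[t] = sum(tf[d][t] / dl[d] for d in containing) / len(containing)
    return tf, dl, cf, cs, p_avg


def term_prob(t, d, stats):
    """Smoothed estimate of p(t | model of d) (assumed form, see lead-in)."""
    tf, dl, cf, cs, p_avg = stats
    count = tf[d][t]
    if count == 0:
        return cf[t] / cs  # back off to the collection frequency
    p_ml = count / dl[d]
    mean_freq = p_avg[t] * dl[d]  # expected count of t in a document this long
    # Geometric-distribution risk: high when the observed count is far from typical.
    risk = (1.0 / (1.0 + mean_freq)) * (mean_freq / (1.0 + mean_freq)) ** count
    return p_ml ** (1.0 - risk) * p_avg[t] ** risk


def safe_log(x):
    return math.log(x) if x > 0.0 else float("-inf")


def query_log_prob(query, d, stats):
    """Log probability of producing the query from the model of document d."""
    cf = stats[2]
    query_terms = set(query)
    score = 0.0
    # Assumption: query terms that never occur in the collection are ignored.
    for t in cf:
        p = term_prob(t, d, stats)
        score += safe_log(p) if t in query_terms else safe_log(1.0 - p)
    return score


def rank(query, docs):
    stats = build_stats(docs)
    scored = [(d, query_log_prob(query, d, stats)) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": ["language", "model", "retrieval", "model"],
    "d2": ["speech", "recognition", "language"],
    "d3": ["poisson", "indexing", "model"],
}
ranking = rank(["language", "model"], docs)
```

Working in log space avoids underflow when the vocabulary is large. In this toy collection, `d1` is the only document containing both query terms, so it ranks first.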

  6. Experimental Results
     • 11-point recall/precision experiments on TREC data
     • Labrador (a research prototype retrieval engine)
     • Wilcoxon test
     • LM:
       • has better precision at all levels of recall
       • is significantly better at several levels
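The slides do not spell out how the 11-point figures are computed. A minimal sketch, assuming the standard interpolated definition (precision at recall level r is the highest precision observed at any recall of at least r, for r = 0.0, 0.1, ..., 1.0), could look like this. The function name and the toy data are hypothetical.

```python
def eleven_point_precision(ranked_ids, relevant_ids):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    # Record (recall, precision) each time a relevant document is retrieved.
    points = []
    hits = 0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))
    levels = [i / 10 for i in range(11)]
    # Interpolate: best precision at any recall >= the level (0.0 if unreachable).
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in levels
    ]


curve = eleven_point_precision(["a", "x", "b", "y"], {"a", "b"})
```

Averaging these 11 values per level across queries gives one curve per system. A paired test such as the Wilcoxon test named on the slide can then compare two systems level by level.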

  7. Conclusion / Future Work
     • Text retrieval based on probabilistic language modeling
     • It is both conceptually simple and explanatory
     • The improvement in performance is not the main point
     • More significant is that a different approach to retrieval was shown to be effective
     • It can be improved:
       • Additional knowledge about the language generation process will yield better estimates
       • Textual/graphical tools to sense the distribution of terms

  8. References
     [1] Harter, S. P. "A Probabilistic Approach to Automatic Keyword Indexing," Journal of the American Society for Information Science, July-August, 1975.
     [2] Robertson, S. E. and K. Sparck Jones. "Relevance Weighting of Search Terms," Journal of the American Society for Information Science, vol. 27, 1977.
     [3] Turtle, H. and W. B. Croft. "Efficient Probabilistic Inference for Text Retrieval," Proceedings of RIAO 3, 1991.

  9. THANK YOU FOR LISTENING
