Language Modeling Frameworks for Information Retrieval

John Lafferty, School of Computer Science, Carnegie Mellon University

Retrieval As Decision Making
[Figure: a query leading to alternative presentations: ranked list (1 2 3 4 …), clustering, excerpt]
Given a query,
- Which documents should be selected? (D)
- How should these documents be presented to the user? ()
Language Modeling and Information Retrieval Workshop

Decision Theory Framework
[Figure: graphical model with nodes U (user), q (query), S (source), d (document), and R; legend: partially observed, observed, inferred]
A unified framework can be built on Bayesian decision theory:
- Models, loss function, risk minimization (Zhai, 2002)
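
The risk-minimization idea named on this slide can be written in the generic form used in Bayesian decision theory. This is a sketch of the textbook formulation, not necessarily the exact notation of (Zhai, 2002); the symbols a (an action, such as which documents to select and how to present them), θ (the unobserved model variables), L (the loss function), and "obs" (everything observed, such as the query and the documents) are assumptions introduced here.

```latex
% Expected loss (Bayes risk) of an action a given the observed data
R(a \mid \text{obs}) = \int L(a, \theta)\, p(\theta \mid \text{obs})\, d\theta

% The system chooses the action with minimum risk
a^{*} = \arg\min_{a} R(a \mid \text{obs})
```

Under this reading, the models supply the posterior over θ, the loss function encodes the retrieval task, and retrieval is the minimization step.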

Example: Aspect Retrieval
Query: What are current applications of robotics?
Find as many different applications as possible.

Aspect judgments
      A1  A2  A3  …   ...  Ak
d1    1   1   0   0   …    0   0
d2    0   1   1   1   …    0   0
d3    0   0   0   0   …    1   0
….
dk    1   0   1   0   ...  0   1

Example Aspects
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot telephone operators
A7: robot cranes
… …
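
To make the judgment table concrete, here is a minimal Python sketch that counts how many distinct aspects a ranked list has covered after each position. The values are the first four visible columns (A1 to A4) of rows d1 to d3 in the table; dropping the elided columns is an assumption made for illustration, and the function names are hypothetical.

```python
# Binary aspect judgments: rows are documents, columns are aspects A1..A4.
# Values are the first four visible columns of the slide's table; the
# elided columns are left out (an assumption made for illustration).
judgments = {
    "d1": [1, 1, 0, 0],
    "d2": [0, 1, 1, 1],
    "d3": [0, 0, 0, 0],
}


def aspects_of(doc_id):
    """Return the set of 0-based aspect indices that a document covers."""
    return {j for j, value in enumerate(judgments[doc_id]) if value}


def coverage_at_each_rank(ranking):
    """Number of distinct aspects covered after each rank position."""
    seen = set()
    counts = []
    for doc_id in ranking:
        seen |= aspects_of(doc_id)
        counts.append(len(seen))
    return counts


print(coverage_at_each_rank(["d1", "d2", "d3"]))  # [2, 4, 4]
```

In this toy ranking, d3 adds nothing at rank 3 because it covers none of the four visible aspects, which is the kind of redundancy the task ("find as many different applications as possible") penalizes.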

Generative: Aspect Models (Hofmann, 1999; Blei, Ng and Jordan, 2001)
[Figure: Aspect 1 and Aspect 2 with mixing weights λ1, λ2]
λ ~ Dirichlet (for example)
Inference: Given aspects and document, what is the posterior for λ?
Learning: Given documents, what are the (ML) aspects?
Studied recently in (Minka and Lafferty, 2002)
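
As a rough illustration of the inference question, the sketch below uses EM to find the maximum-likelihood mixing weights for one document when the aspects (unigram distributions) are fixed and known. This is only a point estimate, which coincides with the posterior mode under a uniform Dirichlet prior; it is not the full posterior, and it is not claimed to be the method of the cited papers. The function name, the toy aspects, and the toy document are all hypothetical.

```python
from collections import Counter


def estimate_mixing_weights(words, aspects, iterations=50):
    """EM for the mixing weights of fixed aspects, given one document.

    words   -- list of tokens in the document
    aspects -- dict: aspect name -> dict of word probabilities p(w | aspect)
    Returns a dict: aspect name -> estimated mixing weight.
    """
    counts = Counter(words)
    names = list(aspects)
    weights = {a: 1.0 / len(names) for a in names}  # uniform start
    for _ in range(iterations):
        expected = {a: 0.0 for a in names}
        for word, count in counts.items():
            # E-step: responsibility of each aspect for this word.
            joint = {a: weights[a] * aspects[a].get(word, 0.0) for a in names}
            norm = sum(joint.values())
            if norm == 0.0:
                continue  # word has zero probability under every aspect
            for a in names:
                expected[a] += count * joint[a] / norm
        total = sum(expected.values())
        if total == 0.0:
            break  # no word is explained by any aspect; keep current weights
        # M-step: renormalize the expected counts.
        weights = {a: expected[a] / total for a in names}
    return weights


# Hypothetical two-aspect example.
toy_aspects = {
    "aspect1": {"robot": 0.5, "welding": 0.5},
    "aspect2": {"robot": 0.5, "inventory": 0.5},
}
toy_doc = ["robot", "welding", "welding", "inventory"]
print(estimate_mixing_weights(toy_doc, toy_aspects))
```

For this toy document the weight on aspect1 approaches 2/3: "welding" (twice) can only come from aspect1, "inventory" only from aspect2, and "robot" is equally likely under both, so it does not shift the balance.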

Evaluation Measures
• What is the best measure?
• Requires concrete specification of task
• Several natural measures are computationally intractable, even assuming aspects known (e.g., aspect coverage, aspect uniqueness)
• Defining aspects is difficult
• Maximum likelihood cannot be expected to capture “true” semantic relationships in aspects
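
One way to read the intractability remark: choosing the k documents that together cover the most aspects is an instance of the maximum coverage problem, which is NP-hard, so exact optimization does not scale. A common workaround is greedy selection, sketched below. This is an illustration under that reading, not the evaluation procedure used in the talk; the function name and the toy data are hypothetical.

```python
def greedy_select(doc_aspects, k):
    """Pick up to k documents, each time adding the one with the most new aspects.

    doc_aspects -- dict: document id -> set of aspect labels it covers
    Returns (chosen document ids in order, set of covered aspects).
    """
    chosen = []
    covered = set()
    remaining = dict(doc_aspects)
    for _ in range(k):
        if not remaining:
            break
        # Ties are broken by document id, since max keeps the first maximum.
        best = max(sorted(remaining), key=lambda d: len(remaining[d] - covered))
        if not remaining[best] - covered:
            break  # no remaining document adds a new aspect
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical judgments with known aspects.
toy_judgments = {
    "d1": {"A1", "A2"},
    "d2": {"A2", "A3", "A4"},
    "d3": {"A1"},
    "d4": {"A5"},
}
chosen, covered = greedy_select(toy_judgments, 2)
print(chosen, sorted(covered))
```

Greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal coverage, which makes it a reasonable baseline when the exact optimum is out of reach.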

Aspect Retrieval Baselines
[Figure panels: Aspect Precision; Aspect Recall]

Challenges for IR Models
Probabilistic language models have proven to be an effective way to reason about IR systems. We now need:
• Better task specification and data
  - e.g., TREC interactive data inadequate
• More advanced models
  - Fewer independence assumptions, greater structure
• Improved inference and learning algorithms
  - Accuracy and efficiency
• To handle user preferences, background knowledge
  - Loss function and priors/constraints
