Language Modeling Frameworks for Information Retrieval John Lafferty School of Computer Science Carnegie Mellon University
Retrieval As Decision Making
[Diagram: Query; Ranked list (1 2 3 4); Excerpt; Clustering]
Given a query,
- Which documents should be selected? (D)
- How should these docs be presented to the user? ()
Language Modeling and Information Retrieval Workshop
Decision Theory Framework
[Diagram: U (User), q (Query), S (Source), d (Document), R; node labels: partially observed, observed, inferred]
Unified framework can be built on Bayesian decision theory:
Models, loss function, risk minimization (Zhai, 2002)
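As a rough illustration of the risk-minimization idea on this slide, the sketch below picks the action with the smallest expected loss under a posterior over states. The state names, posterior values, and loss table are invented for illustration and are not taken from the talk.

```python
def bayes_action(posterior, loss):
    """Return (action, risk) for the action with minimum expected loss.

    posterior: dict mapping state -> probability (hypothetical values)
    loss:      dict mapping action -> {state: loss} (hypothetical values)
    """
    # Expected loss (Bayes risk) of each action under the posterior.
    risks = {
        action: sum(posterior[s] * per_state[s] for s in posterior)
        for action, per_state in loss.items()
    }
    best = min(risks, key=risks.get)
    return best, risks[best]


# Hypothetical example: should a single document be shown to the user?
posterior = {"relevant": 0.3, "not_relevant": 0.7}
loss = {
    "show": {"relevant": 0.0, "not_relevant": 1.0},
    "skip": {"relevant": 2.0, "not_relevant": 0.0},
}
action, risk = bayes_action(posterior, loss)
```

In the framework on the slide the action would be a choice of documents and a presentation, and the posterior would come from the user and source models; here both are collapsed into a toy table.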
Example: Aspect Retrieval
Query: What are current applications of robotics? Find as many different applications as possible.

Aspect judgments
      A1 A2 A3 … ... Ak
d1    1 1 0 0 … 0 0
d2    0 1 1 1 … 0 0
d3    0 0 0 0 … 1 0
….
dk    1 0 1 0 ... 0 1

Example Aspects
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot telephone operators
A7: robot cranes
…
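To make the judgment matrix concrete, here is a minimal sketch that counts how many distinct aspects the top-k documents of a ranking cover. The function names are hypothetical, the four-column matrix only echoes the first visible entries of the table, and dividing by the total number of aspects is one plausible reading of aspect recall; the talk does not spell out its formulas.

```python
def aspects_covered(ranking, judgments, k):
    """Count distinct aspects judged present in the top-k documents."""
    covered = set()
    for doc in ranking[:k]:
        covered |= {i for i, bit in enumerate(judgments[doc]) if bit}
    return len(covered)


def aspect_recall(ranking, judgments, k, n_aspects):
    """Fraction of all aspects covered by the top-k documents (assumed definition)."""
    return aspects_covered(ranking, judgments, k) / n_aspects


# Hypothetical truncation of the judgment table to aspects A1..A4.
judgments = {
    "d1": [1, 1, 0, 0],
    "d2": [0, 1, 1, 1],
    "d3": [0, 0, 0, 0],
}
ranking = ["d1", "d2", "d3"]
recall_at_1 = aspect_recall(ranking, judgments, 1, n_aspects=4)
recall_at_2 = aspect_recall(ranking, judgments, 2, n_aspects=4)
```

Note that d2 repeats aspect A2 already covered by d1, so it contributes only two new aspects at rank 2.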
Generative Aspect Models (Hofmann, 1999; Blei, Ng and Jordan, 2001)
[Diagram: Aspect 1, Aspect 2; λ1, λ2]
λ ~ Dirichlet (for example)
Inference: Given aspects and document, what is posterior for λ?
Learning: Given documents, what are the (ML) aspects?
Studied recently in (Minka and Lafferty, 2002)
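The inference question on this slide can be sketched, in a much simplified form, as estimating the mixing weights of one document when the aspects are held fixed. The code below uses plain EM to find a MAP point estimate under a Dirichlet prior; it does not compute the full posterior and it is not the method of the cited papers. The function name, the toy aspects, and the toy document are all assumptions made for illustration.

```python
def em_mixing_weights(doc, aspects, alpha=None, iters=100):
    """MAP point estimate of mixing weights for fixed aspects, via EM.

    doc:     list of word tokens
    aspects: list of dicts mapping word -> probability under that aspect
    alpha:   Dirichlet parameters, each assumed >= 1 (default: uniform prior,
             in which case this is the maximum-likelihood estimate)
    """
    k = len(aspects)
    alpha = alpha or [1.0] * k
    lam = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for w in doc:
            # E-step: responsibility of each aspect for this token.
            joint = [lam[j] * aspects[j].get(w, 0.0) for j in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue  # token has zero probability under every aspect
            for j in range(k):
                counts[j] += joint[j] / z
        # M-step: expected counts plus Dirichlet pseudo-counts (alpha - 1).
        pseudo = [counts[j] + alpha[j] - 1.0 for j in range(k)]
        total = sum(pseudo)
        if total <= 0.0:
            break  # nothing to normalize; keep the previous estimate
        lam = [p / total for p in pseudo]
    return lam


# Hypothetical two-aspect example.
aspects = [
    {"robot": 0.9, "tape": 0.1},
    {"robot": 0.1, "tape": 0.9},
]
doc = ["robot", "robot", "robot", "tape"]
lam = em_mixing_weights(doc, aspects)
```

A point estimate like this discards the uncertainty that a posterior over the weights would carry, which is the part of the problem the slide singles out.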
Evaluation Measures
• What is the best measure?
• Requires concrete specification of task
• Several natural measures are computationally intractable, even assuming aspects known (e.g., aspect coverage, aspect uniqueness)
• Defining aspects is difficult
• Maximum likelihood cannot be expected to capture “true” semantic relationships in aspects
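One way to see why coverage-style measures are hard: choosing the k documents that together cover the most aspects is an instance of the maximum coverage problem, which is NP-hard in general. The sketch below uses the standard greedy heuristic as a stand-in; it is an approximation chosen for illustration, not a procedure described in the talk, and the function name and data are hypothetical.

```python
def greedy_cover(judgments, k):
    """Greedily pick up to k documents, each adding the most uncovered aspects.

    judgments: dict mapping document id -> set of aspect labels
    Returns (chosen document ids in pick order, set of covered aspects).
    """
    chosen, covered = [], set()
    remaining = dict(judgments)
    for _ in range(k):
        if not remaining:
            break
        # Document whose aspects add the most not-yet-covered labels.
        best = max(remaining, key=lambda d: len(remaining[d] - covered))
        if not remaining[best] - covered:
            break  # no document adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


# Hypothetical aspect sets, known in advance.
judgments = {
    "d1": {"A1", "A2"},
    "d2": {"A2", "A3", "A4"},
    "d3": set(),
}
chosen, covered = greedy_cover(judgments, 2)
```

The greedy choice is not guaranteed to be optimal, so a measure normalized by the true optimum would still require solving the hard problem.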
Aspect Retrieval Baselines
[Plots: Aspect Precision; Aspect Recall]
Challenges for IR Models
Probabilistic language models have proven to be an effective way to reason about IR systems. We now need:
• Better task specification and data
  - e.g., TREC interactive data inadequate
• More advanced models
  - Fewer independence assumptions, greater structure
• Improved inference and learning algorithms
  - Accuracy and efficiency
• To handle user preferences, background knowledge
  - Loss function and priors/constraints