Conceptual structures in modern information retrieval

Claudio Carpineto, Fondazione Ugo Bordoni, Roma (carpinet@fub.it)

Overview
• Keyword-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions

Vector-based IR (diagram): documents and the query are each represented as vectors of weighted keywords, and matching the two yields the retrieved documents.

Term weighting
• tf.idf and the vector space model (Salton): very popular in the 1970s and 1980s
• BM25 (Robertson): the state of the art in the 1990s
• Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)
• A new weighting framework based on divergence from randomness + information gain (FUB + UG)
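To make the term-weighting discussion concrete, here is a minimal sketch of Okapi BM25 scoring over tokenized documents. It assumes the common defaults k1 = 1.2 and b = 0.75 and a non-negative idf variant, log(1 + (N - n + 0.5) / (n + 0.5)); these choices, the function name `bm25_scores` and the toy documents are illustrative assumptions, not details from the talk.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Assumed idf variant: log(1 + (N - n + 0.5) / (n + 0.5)), which stays
    non-negative even for terms that occur in most documents.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):  # query-term frequency is ignored in this sketch
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating tf component with document-length normalization.
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * tf_part
        scores.append(score)
    return scores


docs = [
    "bank credit finance".split(),
    "river bank waters".split(),
    "finance account credit credit".split(),
]
scores = bm25_scores("credit finance".split(), docs)
print(scores)
```

The second document shares no query term and scores 0; setting b = 0 would switch off document-length normalization.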

Inherent limitations of keyword-based IR
• Vocabulary problem: the same concept can be expressed with different words
• Relations between terms are ignored
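Both limitations show up with a plain bag-of-words cosine similarity. In this toy sketch (hypothetical query and documents), a document that uses only synonyms scores zero (vocabulary problem), and two sentences with opposite meanings score as identical because word order and relations are discarded.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values()) * sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


query = "car repair".split()
print(cosine(query, "car repair shop".split()))                  # shared words: about 0.82
print(cosine(query, "automobile maintenance garage".split()))    # vocabulary problem: 0.0
print(cosine("man bites dog".split(), "dog bites man".split()))  # relations ignored: 1.0
```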

Early approaches to conceptual IR
• n-grams (Salton 1975, Maarek 1989)
• Parse trees (Dillon 1983, Metzler 1989)
• Case relations (Fillmore 1968, Somers 1987)
• Conceptual graphs (Dick 1991)

Why early conceptual IR was not successful
• No best representation scheme
• Manual coding too costly
• Automated coding too hard
• Training required for both the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate

Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions

Evolution of topical IR
• Very short queries
• Heterogeneous collections
• Unreliable sources
• Interactive sessions

Model of modern topical IR (diagram): documents and the query, together with its context, are indexed, then ranked, visualized, and refined through interaction and use.

Performance of retrieval feedback versus query difficulty
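Retrieval feedback (pseudo-relevance feedback) treats the top-ranked documents as relevant and uses them to expand the query. A minimal sketch in the standard Rocchio style, without a negative (non-relevant) component, is below; it is a swapped-in textbook formulation, not necessarily the method behind the plot, and the parameters k, n_terms, alpha and beta as well as the toy ranking are hypothetical.

```python
from collections import Counter


def expand_query(query, ranked_docs, k=2, n_terms=2, alpha=1.0, beta=0.75):
    """Rocchio-style pseudo-relevance feedback without a negative component.

    The top-k ranked documents are assumed relevant; their averaged
    term-frequency vector is added (times beta) to the original query
    (times alpha), and the n_terms strongest new terms are kept.
    """
    top = ranked_docs[:k]
    centroid = Counter()
    for doc in top:
        for term, freq in Counter(doc).items():
            centroid[term] += freq / len(top)
    weights = Counter({t: alpha for t in query})
    for term, w in centroid.items():
        weights[term] += beta * w
    new_terms = [t for t, _ in centroid.most_common() if t not in query][:n_terms]
    return {t: weights[t] for t in list(query) + new_terms}


ranked = [
    "jaguar car engine".split(),
    "jaguar car speed".split(),
    "jaguar cat jungle".split(),
]
print(expand_query(["jaguar"], ranked))  # {'jaguar': 1.75, 'car': 0.75, 'engine': 0.375}
```

If the top-ranked documents are off-topic, the added terms are off-topic too, so the quality of the initial ranking directly shapes the expanded query.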

Ranking based on interdocument similarity
• Cluster hypothesis (van Rijsbergen 1978)
• Approaches:
  - Matching the query against document clusters (Willett 1988)
  - Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
  - Computing the conceptual distance between query and documents (order-theoretical ranking, Carpineto 2000)
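A minimal sketch of the first approach, matching the query against cluster centroids, assuming the clusters are already given (no clustering algorithm is shown) and using hypothetical cluster names and documents. Because whole clusters are retrieved, D1 is returned first even though it does not contain the query term, which is the effect the cluster hypothesis predicts.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-weight Counters."""
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Average term-frequency vector of a cluster's documents."""
    total = Counter()
    for tokens in docs.values():
        total.update(tokens)
    return Counter({t: f / len(docs) for t, f in total.items()})


def cluster_search(query, clusters):
    """Rank clusters by query-centroid similarity and return their documents in that order."""
    q = Counter(query)
    order = sorted(clusters, key=lambda c: cosine(q, centroid(clusters[c])), reverse=True)
    return [doc_id for c in order for doc_id in clusters[c]]


clusters = {
    "rivers": {"D2": "river bank waters".split(), "D3": "river fishing".split()},
    "finance": {"D1": "bank credit account".split(), "D7": "finance credit loan".split()},
}
print(cluster_search(["loan"], clusters))  # ['D1', 'D7', 'D2', 'D3']
```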

Order-theoretical ranking (figure): a concept lattice over documents D1 to D7 and the query, whose nodes are labeled with terms such as BANK, FINANCE, CREDIT, ACCOUNT, RIVER, WATERS, KBS and NNS, with each document annotated with a number from 0 to 4.
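Order-theoretical ranking can be sketched under a simplifying assumption: build the concept lattice of the documents plus the query, then rank each document by the number of edges on the shortest path in the Hasse diagram between the query's concept and the document's concept. The brute-force lattice construction, the function names and the toy context are hypothetical, and this construction is only practical for tiny collections.

```python
from collections import deque


def concept_intents(context):
    """All intents of the concept lattice of a formal context.

    `context` maps each object (document or query) to its set of terms. The
    intents are the full term set plus every intersection of object intents.
    """
    all_terms = frozenset().union(*context.values())
    intents = {all_terms}
    for terms in context.values():
        intents |= {intent & terms for intent in intents}
    return intents


def hasse_diagram(intents):
    """Undirected cover graph: a and b are linked if one covers the other."""
    graph = {i: set() for i in intents}
    for a in intents:
        for b in intents:
            if a < b and not any(a < c < b for c in intents):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def order_theoretical_ranking(context, query_id):
    """Rank documents by shortest-path distance from the query's concept."""
    graph = hasse_diagram(concept_intents(context))
    start = frozenset(context[query_id])
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search over the lattice
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    ranking = [(doc, dist[frozenset(terms)]) for doc, terms in context.items() if doc != query_id]
    return sorted(ranking, key=lambda pair: (pair[1], pair[0]))


context = {
    "Query": {"bank", "finance"},
    "D1": {"bank", "finance", "credit"},
    "D2": {"bank", "river"},
    "D3": {"finance", "account"},
}
print(order_theoretical_ranking(context, "Query"))  # [('D1', 1), ('D2', 2), ('D3', 2)]
```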

Performance of order-theoretical ranking
• Better than hierarchic clustering and comparable to best matching on the whole collection
• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
• Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking

Question answering
Task: closed-class questions in unrestricted domains, with no guarantee that an answer exists and with the answer possibly scattered over multiple documents

Question answering
• Approach:
  - Recognize the question type
  - Retrieve relevant documents
  - Find the sought entities near the question words
  - Fall back to best-matching passage retrieval in case of failure
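The four steps can be sketched as a toy pipeline. The question types, regular expressions, window size and keyword heuristic below are all hypothetical stand-ins chosen for illustration; a real system would rely on a proper named-entity recognizer and a document retrieval engine.

```python
import re

# Hypothetical answer types: leading question words -> pattern for candidate answers.
ANSWER_PATTERNS = {
    "how many": r"\b\d+\b",                  # a number
    "when": r"\b(?:1\d{3}|20\d{2})\b",       # a year
    "who": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",   # two capitalized words
}


def question_type(question):
    """Step 1: recognize the question type from its leading words."""
    q = question.lower()
    for qtype in ANSWER_PATTERNS:
        if q.startswith(qtype):
            return qtype
    return None


def content_words(text):
    """Crude keyword extraction: lowercase words longer than three letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def answer(question, passages, window=60):
    qtype = question_type(question)
    qwords = content_words(question)
    # Step 2: retrieve passages, ranked by word overlap with the question.
    ranked = sorted(passages, key=lambda p: len(qwords & content_words(p)), reverse=True)
    # Step 3: look for an entity of the expected type near a question word.
    if qtype is not None:
        for passage in ranked:
            for word in sorted(qwords):
                for match in re.finditer(re.escape(word), passage, flags=re.IGNORECASE):
                    nearby = passage[max(0, match.start() - window): match.end() + window]
                    found = re.search(ANSWER_PATTERNS[qtype], nearby)
                    if found:
                        return found.group()
    # Step 4: fall back to the best-matching passage.
    return ranked[0] if ranked else None


passages = [
    "Paris has many museums.",
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
]
print(answer("When was the Eiffel Tower completed?", passages))  # 1889
print(answer("What is the capital of Italy?", ["Rome is the capital of Italy.", "Sofia is in Bulgaria."]))
```

The second call has no recognized question type, so it falls back to returning the best-matching passage.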

Web Information Retrieval

Current tasks:
• Named-entity finding
• Topic distillation
Approach:
• Use of multiple methods
• Combination of results via interpolation and normalization schemes
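Combining results via normalization and interpolation can be sketched as min-max normalization of each run followed by a weighted linear combination (a CombSUM-style fusion). The runs, their labels and the equal weights are hypothetical; the slide does not specify which methods or weights were combined.

```python
def min_max(run):
    """Rescale one run's scores to [0, 1]; a constant run maps to all ones."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def fuse(runs, weights):
    """Linear interpolation of min-max normalized runs; missing documents score 0."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


content_run = {"d1": 12.0, "d2": 8.0, "d3": 4.0}  # e.g. a content-based run
anchor_run = {"d2": 9.0, "d3": 1.0, "d4": 5.0}    # e.g. an anchor-text run
print(fuse([content_run, anchor_run], [0.5, 0.5]))
# [('d2', 0.75), ('d1', 0.5), ('d4', 0.25), ('d3', 0.0)]
```

Normalizing first matters because raw scores from different methods live on different scales; without it, the run with the largest numbers would dominate the interpolation.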

XML document retrieval
Goal: use document structure to improve the precision and recall of unstructured queries, e.g. “concerts this weekend at Sofia under 20 euros”
Approaches:
• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages
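Automatic inference of query structure can be illustrated on the example query with a few pattern rules that map recognized spans onto fields. The field names, the patterns and the rule-based approach are hypothetical; they only show the kind of structured query (here a Python dict) that could then be matched against XML elements.

```python
import re


def infer_structure(query):
    """Split a free-text query into hypothetical structured fields plus leftover keywords."""
    patterns = {
        "max_price_eur": r"\bunder (\d+(?:\.\d+)?) euros?\b",
        "location": r"\b(?:at|in) ([A-Z][a-z]+)\b",
        "date": r"\b(today|tomorrow|this weekend|next week)\b",
    }
    structure, rest = {}, query
    for field, pattern in patterns.items():
        match = re.search(pattern, rest)
        if match:
            value = match.group(1)
            structure[field] = float(value) if field == "max_price_eur" else value
            rest = rest.replace(match.group(0), " ")  # remove the recognized span
    structure["keywords"] = rest.split()
    return structure


print(infer_structure("concerts this weekend at Sofia under 20 euros"))
# {'max_price_eur': 20.0, 'location': 'Sofia', 'date': 'this weekend', 'keywords': ['concerts']}
```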

Recommender systems
The “related keyword” feature versus context-dependent query reformulation

Combining text retrieval and text mining with concept lattices
Goal: integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
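Browsing in a concept lattice amounts to moving from the concept of the current query to its neighbours. A minimal sketch of one browsing step, computing the immediate refinements (child concepts) of a query, follows; the function name and toy documents are hypothetical, and the sketch covers browsing only.

```python
def refinements(docs, query_terms):
    """Immediate specializations of the query's concept in the concept lattice.

    `docs` maps document ids to term sets. Each candidate child is obtained by
    adding one term to the query and closing it; only candidates with a maximal
    document set are kept, since smaller ones lie further down the lattice.
    """
    query_terms = frozenset(query_terms)
    extent = frozenset(d for d, terms in docs.items() if query_terms <= terms)
    candidates = {}
    for term in set().union(*(docs[d] for d in extent)) - query_terms:
        sub_extent = frozenset(d for d in extent if term in docs[d])
        if sub_extent == extent:
            continue  # term shared by every result: not a real refinement
        intent = frozenset.intersection(*(frozenset(docs[d]) for d in sub_extent))
        candidates[intent] = sub_extent
    return {
        intent: sorted(ext)
        for intent, ext in candidates.items()
        if not any(ext < other for other in candidates.values())
    }


docs = {
    "D1": {"bank", "finance", "credit"},
    "D2": {"bank", "river"},
    "D3": {"bank", "finance", "account"},
    "D4": {"finance", "credit"},
}
for intent, ext in sorted(refinements(docs, {"bank"}).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(intent), ext)
# ['bank', 'finance'] ['D1', 'D3']
# ['bank', 'river'] ['D2']
```

Selecting a refinement narrows the result set along a meaningful split of the documents, rather than by an arbitrary added keyword.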

Conclusions
Conceptual structures surface in traditional topical relevance retrieval, and they are at the heart of many non-topical retrieval tasks.
Towards conceptual search:
• Understand term meaning
• Adapt to the user
• Can translate between applications
• Explainable
• Capable of filtering and summarization
