Conceptual structures in modern information retrieval
Claudio Carpineto
Fondazione Ugo Bordoni, Roma
carpinet@fub.it
Overview
• Keyword-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Vector-based IR
[Diagram: the documents are represented as vectors of weighted keywords and the query as a vector of weighted keywords; matching the two yields the retrieved documents.]
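The matching step of the vector-based model can be illustrated with a minimal sketch. It assumes raw term frequencies as weights and cosine similarity as the matching function; the slide does not commit to either, and the function names and the toy collection are made up for illustration.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent a text as a sparse vector of keywords weighted by raw term frequency."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = to_vector(query)
    scored = [(doc_id, cosine(q, to_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy collection (assumption, not taken from the slides).
docs = {
    "d1": "bank finance credit",
    "d2": "river bank waters",
    "d3": "neural networks",
}
ranking = rank("bank credit", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

A document that shares no keyword with the query (here `d3`) gets a score of zero, which is exactly the vocabulary problem that motivates the conceptual approaches discussed later.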
Term weighting
• tf.idf and the vector space model (Salton): very popular in the 70’s and 80’s
• BM25 (Robertson): the state of the art in the 90’s
• Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)
• A new weighting framework based on deviation from randomness + information gain (FUB + UG)
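To make the first two weighting schemes concrete, here is a hedged sketch of one common textbook form of each. The exact variants are assumptions: tf.idf is taken as tf · log(N/df), and BM25 uses the non-negative idf variant log((N − df + 0.5)/(df + 0.5) + 1) with the conventional defaults `k1=1.2` and `b=0.75`. Helper names and the toy collection are hypothetical.

```python
import math
from collections import Counter

def _index(docs):
    """Tokenize documents and count in how many documents each term occurs."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    return tokenized, doc_freq

def tf_idf_scores(query, docs):
    """Score each document as the sum over query terms of tf * log(N / df)."""
    tokenized, doc_freq = _index(docs)
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[t] * math.log(n_docs / doc_freq[t])
            for t in query.lower().split() if t in tf
        )
    return scores

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document with a common BM25 variant (assumed, see lead-in)."""
    tokenized, doc_freq = _index(docs)
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs if n_docs else 0.0
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

# Toy collection (assumption, not taken from the slides).
docs = {
    "d1": "bank finance credit bank",
    "d2": "river bank waters",
    "d3": "credit account finance",
}
tfidf = tf_idf_scores("credit finance", docs)
bm25 = bm25_scores("credit finance", docs)
```

In this toy run `d1` and `d3` tie under tf.idf, while BM25 prefers the shorter `d3` because of its length normalization. The language-modeling and deviation-from-randomness schemes on the slide are not covered by this sketch.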
Inherent limitations of keyword-based IR
• Vocabulary problem
• Relations are ignored
Early approaches to conceptual IR
• n-grams (Salton 1975, Maarek 1989)
• parse trees (Dillon 1983, Metzler 1989)
• case relations (Fillmore 1968, Somers 1987)
• conceptual graphs (Dick 1991)
Why early conceptual IR was not successful
• No best representation scheme
• Manual coding too costly
• Automated coding too hard
• Training required for both the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Evolution of topical IR
• Very short queries
• Heterogeneous collections
• Unreliable sources
• Interactive sessions
Model of modern topical IR
[Diagram with the components: Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, Use.]
Ranking based on interdocument similarity
• Cluster hypothesis (van Rijsbergen 1978)
• Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
Order-theoretical ranking
[Figure: a concept lattice over the query and documents D1 to D7; nodes are labeled with index terms (KBS, NNS, BANK, FINANCE, CREDIT, ACCOUNT, RIVER, WATERS) and annotated with numbers from 0 to 4.]
Performance of order-theoretical ranking
• Better than hierarchic clustering and comparable to best matching on the whole collection
• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
• Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Question Answering
Task: closed-class questions in unrestricted domains, with no guarantee of an answer and with the result possibly scattered over multiple documents
Question Answering
• Approach:
• Recognize the type of query
• Retrieve relevant documents
• Find the sought entities near question words
• Fall back to best-matching passage retrieval in case of failure
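The four steps of this approach can be sketched as a small pipeline. Everything concrete below is an assumption made for illustration: the question types (`when`, `who`), the regular expressions standing in for entity recognition, the keyword-overlap passage ranking, the stop-word list, and the toy passages. A real system would use much richer components at each step.

```python
import re

# Hypothetical mapping from question word to a pattern for the sought entity type.
ENTITY_PATTERNS = {
    "when": re.compile(r"\b(?:1\d{3}|20\d{2})\b"),            # a four-digit year
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),   # a capitalized multi-word name
}
STOP_WORDS = {"when", "who", "was", "did", "the", "a", "of", "in", "is"}

def words(text):
    """Lowercased alphanumeric tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def question_type(question):
    """Step 1: recognize the query type from the leading question word."""
    tokens = words(question)
    return tokens[0] if tokens and tokens[0] in ENTITY_PATTERNS else None

def answer(question, passages, window=40):
    """Return a short answer, a fallback passage, or None when nothing matches."""
    qtype = question_type(question)
    keywords = {w for w in words(question) if w not in STOP_WORDS}

    def overlap(passage):
        return len(keywords & set(words(passage)))

    # Step 2: retrieve passages, ranked by keyword overlap with the question.
    ranked = [p for p in sorted(passages, key=overlap, reverse=True) if overlap(p) > 0]
    if not ranked:
        return None  # the task gives no guarantee that an answer exists
    # Step 3: look for an entity of the sought type near a question word.
    if qtype is not None:
        for passage in ranked:
            for match in ENTITY_PATTERNS[qtype].finditer(passage):
                start = max(0, match.start() - window)
                context = passage[start:match.end() + window].lower()
                if any(k in context for k in keywords):
                    return match.group(0)
    # Step 4: fall back to the best-matching passage.
    return ranked[0]

# Toy passages (made up for this sketch).
passages = [
    "The workshop took place in Rome in 2004.",
    "A concept lattice orders formal concepts.",
]
print(answer("When did the workshop take place?", passages))
print(answer("What is a concept lattice?", passages))
```

The first call returns a short entity because the question type is recognized; the second has no recognized type and therefore returns the best-matching passage as a fallback.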
Web Information Retrieval
Current tasks:
• named-entity finding task
• topic distillation task
Approach:
• Use of multiple methods
• Combination of results via interpolation and normalization schemes
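One simple way to combine the results of multiple methods is to normalize each score list and then interpolate linearly. The sketch below assumes min-max normalization and fixed interpolation weights; the slide does not say which schemes were actually used, and the function names and scores are illustrative only.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so the method carries no ranking information.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def interpolate(runs, weights):
    """Linearly interpolate several normalized score lists.

    A document missing from a run contributes a score of zero for that run.
    """
    total = sum(weights)
    fused = {}
    for run, weight in zip(runs, weights):
        for doc_id, score in min_max_normalize(run).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (weight / total) * score
    return fused

# Hypothetical scores from a content-based and a link-based method.
content_run = {"a": 10.0, "b": 5.0, "c": 0.0}
link_run = {"a": 0.2, "c": 0.8}
fused = interpolate([content_run, link_run], weights=[0.7, 0.3])
for doc_id in sorted(fused, key=fused.get, reverse=True):
    print(doc_id, round(fused[doc_id], 2))
```

Normalization matters because the raw scores of different methods live on different scales; without it the interpolation weights would not mean what they appear to mean.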
XML document retrieval
Goal: use document structure to improve precision and recall of unstructured queries
Example: “concerts this weekend at Sofia under 20 euros”
Approaches:
• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Recommender systems
“Related keyword” feature versus context-dependent query reformulation
Combining text retrieval and text mining with concept lattices
Goal: integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
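For readers unfamiliar with concept lattices, the following minimal sketch enumerates the formal concepts of a tiny document-term context. Each concept pairs a set of documents (the extent) with the set of terms they all share (the intent). The enumeration by repeated intersection is a simple textbook method chosen for brevity, not necessarily the one used in the FUB systems, and the toy context is hypothetical.

```python
def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a {doc_id: terms} context.

    The intents are exactly the intersections of the documents' term sets,
    together with the set of all terms.
    """
    context = {doc_id: frozenset(terms) for doc_id, terms in context.items()}
    all_terms = frozenset().union(*context.values())
    intents = {all_terms}
    for terms in context.values():
        # Intersect the new document with every intent found so far.
        intents |= {intent & terms for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(d for d, terms in context.items() if intent <= terms)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))

# Toy context (assumption, not the lattice shown in the slides).
context = {
    "d1": {"bank", "finance", "credit"},
    "d2": {"bank", "river"},
    "d3": {"bank", "finance"},
}
concepts = formal_concepts(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

Browsing then amounts to moving between neighboring concepts: adding a term to the intent narrows the extent, while dropping one widens it. This brute-force enumeration is only practical for small contexts.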
Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval, and it is at the heart of many non-topical retrieval tasks
Towards conceptual search
• Understand term meaning
• Adapt to the user
• Can translate between applications
• Explainable
• Capable of filtering and summarization