Expanding Query Terms in Context
Chris Staff and Robert Muscat
Department of Computer Science & AI, University of Malta
cstaff@cs.um.edu.mt
Aims of this presentation
• Background
  • The Vocabulary Problem in IR
• Scenario
  • Using retrieved documents to determine how to expand the query
• Approach
• Evaluation
The Vocabulary Problem
• Furnas et al. (1987) find that any two people describe the same concept or object using the same term with a probability of less than 0.2
• This is a huge problem for IR:
  • High probability of finding some documents about your term (but watch out for ambiguous terms!)
  • Low probability of finding all documents about your concept (so low ‘coverage’)
What’s Query Expansion?
• Adding terms to the query to improve recall while keeping precision high
  • Recall is 1 when all relevant docs are retrieved
  • Precision is 1 when all retrieved docs are relevant
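As a minimal sketch (the document IDs below are purely hypothetical), precision and recall for a single query can be computed from the sets of retrieved and relevant documents:

```python
# Hypothetical retrieved and relevant document IDs for one query.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}

hits = retrieved & relevant
precision = len(hits) / len(retrieved)  # fraction of retrieved docs that are relevant
recall = len(hits) / len(relevant)      # fraction of relevant docs that were retrieved

print(precision, recall)  # 0.5 0.666...
```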
What’s Query Expansion?
• Attempts to improve recall (by adding synonyms) usually involve a constructed thesaurus (Qiu et al., 1995; Mandala et al., 1999; Voorhees, 1994)
• Attempts to improve precision (by adding restricting terms) are now based around automatic relevance feedback (e.g., Mitra et al., 1998)
• Indiscriminate query expansion can lead to a loss of precision (Voorhees, 1994) or hurt recall
Scenario
• Two users search for information related to the same concept C
• User queries Q1 and Q2 have no terms in common
• R1 and R2 are the result sets of Q1 and Q2 respectively
• Rcommon = R1 ∩ R2
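As a minimal sketch with hypothetical document IDs, the shared result set is just the intersection of the two result sets:

```python
# Hypothetical result sets for Q1 and Q2, keyed by document ID.
R1 = {"d3", "d8", "d12", "d40"}
R2 = {"d8", "d21", "d40", "d55"}

R_common = R1 & R2  # {"d8", "d40"}
```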
Scenario
• We assume that Rcommon is small and non-empty (Furnas, 1985; Furnas et al., 1987)
• If Rcommon were large, then Q1 and Q2 would both retrieve the same set of documents
• We can determine (using WordNet) whether any term in Q1 is a synonym of a term in Q2
• Some document Dk in Rcommon probably includes both terms (because of the way Web IR works)!
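A minimal sketch of the synonym check, assuming NLTK with the WordNet corpus installed and purely hypothetical queries (the slides do not specify the implementation):

```python
from nltk.corpus import wordnet as wn

def synonym_pairs(q1_terms, q2_terms):
    """Return (t1, t2) pairs where t2 appears in some WordNet synset of t1."""
    pairs = []
    for t1 in q1_terms:
        lemmas = {l.name().lower() for s in wn.synsets(t1) for l in s.lemmas()}
        for t2 in q2_terms:
            if t2.lower() in lemmas:
                pairs.append((t1, t2))
    return pairs

# Hypothetical queries about the same concept with no terms in common:
print(synonym_pairs(["car", "price"], ["automobile", "cost"]))
# expected, depending on WordNet version: [('car', 'automobile'), ('price', 'cost')]
```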
Scenario
• If t1 in Q1 and t2 in Q2 are synonyms
  • We can expand either term in future queries containing t1 or t2
  • As long as document Dk appears in the result set (the context)
Approach
• ‘Learning’ synonyms in context
• Query Expansion
‘Learning’ Synonyms in Context
• A document is associated with a “bag of words”: every term ever used to retrieve the document
• A (term, document) pair is associated with a synset for the term in the context of the document
• The word sense from WordNet is also recorded, to reduce ambiguity
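A minimal sketch of what might be stored, with assumed data structures and names (the slides describe only the associations, not a concrete implementation):

```python
from collections import defaultdict

# doc_id -> bag of query terms ever used to retrieve that document
query_bags = defaultdict(set)

# (term, doc_id) -> WordNet synset name, which fixes the word category and sense
term_doc_synset = {}

def record(doc_id, term, synset_name):
    """Record that `term` retrieved `doc_id` with the given synset in that context."""
    query_bags[doc_id].add(term)
    term_doc_synset[(term, doc_id)] = synset_name

record("d8", "car", "car.n.01")
record("d8", "automobile", "car.n.01")
```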
Query Expansion in Context
• Submit the unexpanded original user query Q to obtain result set R
• For each document Dk in R (k is the rank), retrieve the synsets for the terms in Q
• The same query term in the context of different documents in R may yield inconsistent synsets
  • Countered using the Inverse Document Relevance
Inverse Document Relevance
• IDR is the relative frequency with which document d is retrieved when term q occurs in the query
• IDRq,d = Wq,d / Wd (where Wd is the number of times d is retrieved, and Wq,d is the number of times d is retrieved when q occurs in the query)
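A minimal sketch of IDR computed from usage counts, with assumed counter names (Wd = times d was retrieved, Wq,d = times d was retrieved while q was in the query):

```python
def idr(q, d, W_qd, W_d):
    """Inverse Document Relevance: IDR(q, d) = W(q, d) / W(d)."""
    return W_qd.get((q, d), 0) / W_d[d] if W_d.get(d) else 0.0

# Hypothetical counts: d8 retrieved 20 times overall, 5 of them for queries containing "car".
print(idr("car", "d8", {("car", "d8"): 5}, {"d8": 20}))  # 0.25
```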
Term Document Relevance
• We then re-rank the documents in R based on their TDR
• TDRq,d,k = IDRq,d × Wq,d,k / Wd,k
• The synsets of the top 10 re-ranked documents are merged according to word category and sense
• The synset of the most frequently occurring (word category, word sense) pair is used to expand q in the query
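A minimal sketch of the re-ranking step, under assumed count names (Wq,d,k = times d was retrieved at rank k for queries containing q; Wd,k = times d was retrieved at rank k); this is an illustration, not the authors' implementation:

```python
def tdr(q, d, k, idr_qd, W_qdk, W_dk):
    """Term Document Relevance: TDR(q, d, k) = IDR(q, d) * W(q, d, k) / W(d, k)."""
    denom = W_dk.get((d, k), 0)
    return idr_qd * W_qdk.get((q, d, k), 0) / denom if denom else 0.0

def rerank(q, ranked_docs, idr_table, W_qdk, W_dk):
    """Re-rank the result list for query term q by descending TDR."""
    scored = [(tdr(q, d, k, idr_table.get((q, d), 0.0), W_qdk, W_dk), d)
              for k, d in enumerate(ranked_docs, start=1)]
    return [d for score, d in sorted(scored, key=lambda x: x[0], reverse=True)]
```

The synsets recorded for q in the context of the top 10 documents of this re-ranked list would then be tallied, and the most frequent (category, sense) synset used to expand q.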
Evaluation
• Ideally, we need a huge query log with relevance judgements for the queries
• We have the TREC QA collection, but we will need to index it before running the test queries through it (using, e.g., SMART)
  • Disadvantage: there might not be enough queries
• User studies
Thank you! cstaff@cs.um.edu.mt