Expanding Query Terms in Context

Chris Staff and Robert Muscat, Department of Computer Science & AI, University of Malta. Aims of this presentation: background; the vocabulary problem in IR; scenario; using retrieved documents to determine how to expand a query; approach; evaluation.


Presentation Transcript


  1. Expanding Query Terms in Context
     Chris Staff and Robert Muscat
     Department of Computer Science & AI, University of Malta
     cstaff@cs.um.edu.mt

  2. Aims of this presentation
     • Background
     • The Vocabulary Problem in IR
     • Scenario
     • Using retrieved documents to determine how to expand the query
     • Approach
     • Evaluation

  3. The Vocabulary Problem
     • Furnas et al. (1987) find that any two people describe the same concept or object using the same term with a probability of less than 0.2
     • This is a huge problem for IR
     • High probability of finding some documents about your term (but watch out for ambiguous terms!)
     • Low probability of finding all documents about your concept (so low ‘coverage’)

  4. What’s Query Expansion?
     • Adding terms to the query to improve recall while keeping precision high
     • Recall is 1 when all relevant docs are retrieved
     • Precision is 1 when all retrieved docs are relevant
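The two definitions above can be made concrete with a small sketch. The function name, the document ids, and the convention of returning 0.0 for an empty denominator are illustrative assumptions, not part of the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 docs retrieved, 3 relevant overall, 2 in both sets.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

Here half of the retrieved documents are relevant (precision 0.5) and two of the three relevant documents are found (recall 2/3). Expansion aims to raise the second number without lowering the first.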

  5. What’s Query Expansion?
     • Attempts to improve recall (by adding synonyms) usually involve a constructed thesaurus (Qiu et al, 1995, Mandala et al, 1999, Voorhees, 1994)
     • Attempts to improve precision (by adding restricting terms) are now based around automatic relevance feedback (e.g., Mitra et al, 1998)
     • Indiscriminate query expansion can lead to loss of precision (Voorhees, 1994) or hurt recall

  6. Scenario
     • Two users search for information related to the same concept C
     • User queries Q1 and Q2 have no terms in common
     • R1 and R2 are the results sets of Q1 and Q2 respectively
     • Rcommon = R1 ∩ R2

  7. Scenario
     • We assume that Rcommon is small and non-empty (Furnas, 1985 and Furnas et al, 1987)
     • If Rcommon is large then Q1 and Q2 will both retrieve the same set of documents
     • We can determine (using WordNet) if any term in Q1 is a synonym of a term in Q2
     • Some doc Dk in Rcommon probably includes both terms (because of the way Web IR works)!

  8. Scenario
     • If t1 in Q1 and t2 in Q2 are synonyms
     • Either can be expanded with the other in future queries containing t1 or t2
     • As long as doc Dk appears in the results set (the context)
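The scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and the queries, document ids, and function names are invented for the example.

```python
# Toy synonym table standing in for a WordNet lookup (hypothetical data).
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
}


def are_synonyms(t1, t2):
    """True if either term lists the other as a synonym in the toy table."""
    return t2 in SYNONYMS.get(t1, set()) or t1 in SYNONYMS.get(t2, set())


def synonym_pairs_in_context(q1, r1, q2, r2):
    """Return (synonym pairs across Q1 and Q2, Rcommon).

    Pairs are only reported when the two results sets share at least one
    document, since the shared documents are the context for the pairing.
    """
    r_common = set(r1) & set(r2)
    if not r_common:
        return [], r_common  # no shared context, nothing to learn
    pairs = [(t1, t2) for t1 in q1 for t2 in q2 if are_synonyms(t1, t2)]
    return pairs, r_common


# Two queries with no terms in common but one shared result document.
q1, r1 = ["cheap", "car"], ["d1", "d2", "d3"]
q2, r2 = ["inexpensive", "automobile"], ["d3", "d8"]
pairs, r_common = synonym_pairs_in_context(q1, r1, q2, r2)
print(pairs, r_common)
```

In this toy run, "car" and "automobile" are paired in the context of document d3, so a later query containing either term could be expanded with the other whenever d3 appears in its results.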

  9. Approach
     • ‘Learning’ synonyms in context
     • Query Expansion

  10. ‘Learning’ Synonyms in Context
     • A document is associated with the “bag of words” of all query terms ever used to retrieve it
     • Each (term, document) pair is associated with a synset for the term in the context of the doc
     • The word sense from WordNet is also recorded to reduce ambiguity
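One possible in-memory layout for the associations described above is sketched below. The class name, method names, and the representation of a synset as a (word category, sense number, synonyms) triple are assumptions made for illustration; the presentation does not specify a data structure.

```python
from collections import Counter, defaultdict


class ContextStore:
    """Hypothetical store for what is 'learned' from past queries."""

    def __init__(self):
        # doc id -> bag (multiset) of query terms ever used to retrieve it
        self.bag_of_words = defaultdict(Counter)
        # (term, doc id) -> (word category, sense number, synonyms)
        self.synsets = {}

    def record_retrieval(self, doc_id, query_terms):
        """Add the terms of one query that retrieved doc_id to its bag."""
        self.bag_of_words[doc_id].update(query_terms)

    def record_synset(self, term, doc_id, category, sense, synonyms):
        """Store the synset chosen for term in the context of doc_id."""
        self.synsets[(term, doc_id)] = (category, sense, frozenset(synonyms))

    def synset_in_context(self, term, doc_id):
        """Return the stored synset for (term, doc_id), or None."""
        return self.synsets.get((term, doc_id))


store = ContextStore()
store.record_retrieval("d3", ["cheap", "car"])
store.record_retrieval("d3", ["inexpensive", "automobile"])
store.record_synset("car", "d3", "noun", 1, {"car", "automobile", "auto"})
print(store.synset_in_context("car", "d3"))
```

Keying synsets on the (term, document) pair rather than on the term alone is what keeps the word sense tied to a context.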

  11. Query Expansion in Context
     • Submit the unexpanded original user query Q to obtain results set R
     • For each document Dk in R (k is the rank), retrieve the synsets for the terms in Q
     • The same query term in the context of different docs in R may yield inconsistent synsets
     • This is countered using Inverse Document Relevance (IDR)

  12. Inverse Document Relevance
     • IDR is the relative frequency with which doc d is retrieved in rank k when term q occurs in the query
     • IDR_{q,d} = W_{q,d} / W_d, where W_d is the number of times d is retrieved and W_{q,d} is the number of times d is retrieved when q occurs in the query
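A minimal sketch of the IDR ratio, computed from a query log, might look as follows. The log format (a list of (query terms, retrieved documents) pairs), the sample data, and the choice to return 0.0 for a document never retrieved are assumptions for illustration.

```python
def idr(log, q, d):
    """IDR for term q and document d: W_qd / W_d.

    log: list of (query_terms, retrieved_docs) pairs (assumed format)
    W_d  = number of logged queries that retrieved d
    W_qd = number of those queries that also contained q
    """
    w_d = sum(1 for terms, docs in log if d in docs)
    w_qd = sum(1 for terms, docs in log if d in docs and q in terms)
    # Assumed convention: a document never retrieved has IDR 0.0.
    return w_qd / w_d if w_d else 0.0


# Hypothetical query log.
LOG = [
    (["car", "hire"], ["d1", "d2"]),
    (["automobile"], ["d2", "d3"]),
    (["car"], ["d2"]),
    (["bike"], ["d1"]),
]
print(idr(LOG, "car", "d2"))  # d2 retrieved 3 times, 2 of them with "car"
print(idr(LOG, "car", "d1"))  # d1 retrieved 2 times, 1 of them with "car"
```

A document that is mostly retrieved by queries containing q gets an IDR close to 1 for that term, so its synset for q carries more weight.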

  13. Term Document Relevance
     • We then re-rank the documents in R based on their TDR
     • TDR_{q,d,k} = IDR_{q,d} × W_{q,d,k} / W_{d,k}
     • The synsets of the top-10 re-ranked documents are merged according to word category and sense
     • The synset of the most frequently occurring (word category, word sense) pair is used to expand q in the query
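The re-ranking and merging steps could be sketched as below. The slides do not define W_{d,k} and W_{q,d,k}; this sketch assumes they count how often d was retrieved at rank k, overall and when q occurred in the query. The log format, function names, tie handling, and sample data are likewise hypothetical.

```python
from collections import Counter


def tdr(log, q, d, k):
    """TDR = (W_qd / W_d) * (W_qdk / W_dk), under the assumed definitions.

    log: list of (query_terms, ranked_docs); ranks are 1-based.
    """
    w_d = w_qd = w_dk = w_qdk = 0
    for terms, ranked in log:
        if d not in ranked:
            continue
        has_q = q in terms
        at_k = ranked.index(d) + 1 == k
        w_d += 1
        w_qd += has_q
        w_dk += at_k
        w_qdk += has_q and at_k
    if w_d == 0 or w_dk == 0:
        return 0.0  # assumed convention for unseen documents or ranks
    return (w_qd / w_d) * (w_qdk / w_dk)


def rerank(log, q, results):
    """Order results by descending TDR; ties keep their original order."""
    scored = [(tdr(log, q, d, k), d) for k, d in enumerate(results, start=1)]
    return [d for _, d in sorted(scored, key=lambda s: -s[0])]


def pick_expansion(synsets_by_doc, ranked, top_n=10):
    """Merge synsets of the top documents by (category, sense) and return
    the synonyms of the pair that occurs most often."""
    votes, merged = Counter(), {}
    for d in ranked[:top_n]:
        if d in synsets_by_doc:
            key, synonyms = synsets_by_doc[d]
            votes[key] += 1
            merged.setdefault(key, set()).update(synonyms)
    if not votes:
        return set()
    best, _ = votes.most_common(1)[0]
    return merged[best]


LOG = [
    (["car"], ["d1", "d2"]),
    (["car"], ["d1", "d2"]),
    (["bike"], ["d2", "d1"]),
    (["bike"], ["d3", "d2"]),
]
ranked = rerank(LOG, "car", ["d3", "d2"])
SYNSETS = {
    "d1": (("noun", 1), {"auto", "automobile"}),
    "d2": (("noun", 1), {"motorcar"}),
    "d3": (("noun", 2), {"railcar"}),
}
expansion = pick_expansion(SYNSETS, ["d1", "d2", "d3"])
print(ranked, sorted(expansion))
```

In this toy log, d3 has never been retrieved for "car", so d2 moves ahead of it after re-ranking. The (noun, 1) sense is backed by two of the three documents, so its merged synonyms are the ones used for expansion.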

  14. Evaluation
     • Ideally, we need a huge query log with relevance judgements for the queries
     • We have the TREC QA collection, but we’ll need to index it before running the test queries through it (using, e.g., SMART)
     • A disadvantage is that there might not be enough queries
     • User studies

  15. Thank you!
