Debasis Ganguly, Johannes Leveling, Gareth Jones. Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
Outline • Standard blind relevance feedback • Sentence based query expansion • Does it fit into LM? • Evaluation on FIRE Bengali and English ad-hoc topics • Comparison with term based query expansion • Conclusions
Standard Blind Relevance Feedback (BRF) • Assume the top R documents from the initial retrieval are relevant. • Extract feedback terms from these documents: • Choose terms occurring in the largest number of pseudo-relevant documents (e.g. VSM) • Choose terms with the highest RSV scores (e.g. BM25) • Choose terms with the highest LM scores (e.g. LM) • Expand the query with these terms and perform the final retrieval
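The first selection strategy above — picking the terms that occur in the largest number of pseudo-relevant documents — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, tie-breaking rule, and toy documents are assumptions.

```python
from collections import Counter

def brf_expansion_terms(pseudo_relevant_docs, query_terms, T):
    # Document frequency within the pseudo-relevant set:
    # count each term at most once per document.
    df = Counter()
    for doc in pseudo_relevant_docs:
        df.update(set(doc.split()))
    # Drop terms already in the query; rank by document frequency,
    # breaking ties alphabetically (an arbitrary choice here).
    candidates = [(t, n) for t, n in df.items() if t not in query_terms]
    candidates.sort(key=lambda x: (-x[1], x[0]))
    return [t for t, _ in candidates[:T]]

docs = ["cricket match india win",
        "cricket score india",
        "election result india"]
expanded = brf_expansion_terms(docs, {"india"}, 2)
```

With T = 2 and "india" already in the query, the selection keeps the term that appears in the most pseudo-relevant documents first.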
What standard BRF assumes (wrongly) • The whole document is relevant • All R feedback documents are equally relevant
Ideal scenario • Not the whole document, but only parts of it, are relevant • Restrict the choice of feedback terms to the relevant segments of the documents
Can we get closer to the ideal? • It is impossible to know the relevant segments exactly • Instead, extract the sentences most similar to the query, assuming these sentences constitute the relevant text chunks
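The sentence-selection step above can be sketched as follows, scoring each sentence by its overlap with the query terms. This is an illustrative simplification: the naive split on periods and the overlap measure stand in for a proper sentence demarcator and similarity function.

```python
def top_sentences(document, query_terms, m):
    # Naive sentence split on '.'; a real system would use a proper
    # sentence demarcator (the slides mention MorphAdorner).
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    # Similarity = number of query terms the sentence shares (assumption).
    scored = [(len(set(s.split()) & query_terms), i, s)
              for i, s in enumerate(sentences)]
    scored.sort(key=lambda x: (-x[0], x[1]))  # best overlap first, doc order for ties
    return [s for _, _, s in scored[:m]]

doc = ("india won the cricket match. the weather was cloudy. "
       "india celebrated the cricket win")
chosen = top_sentences(doc, {"india", "cricket"}, 2)
```

Only the two sentences sharing terms with the query are kept; the off-topic weather sentence is filtered out before any feedback terms are extracted.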
Sentence selection using rank • Not all documents are equally relevant • Make the number of sentences added from a document proportional to its retrieval rank, so higher-ranked documents contribute more sentences
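One possible rank-based schedule is sketched below. The slides only state that the sentence count depends on retrieval rank; the linear decay used here is a hypothetical choice, not the paper's formula.

```python
def sentences_at_rank(rank, m):
    # Hypothetical linear schedule: the rank-1 document contributes
    # m sentences, each later rank one fewer, never dropping below 1.
    return max(m - (rank - 1), 1)

schedule = [sentences_at_rank(r, 5) for r in range(1, 8)]
```

With m = 5, the top document contributes five sentences and contributions taper off down the ranking.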
In short • Documents are often composed of a few main topics and a series of short, sometimes densely discussed subtopics. • Feedback terms chosen from a whole document might introduce a topic shift. • Good expansion terms might exist in a particular subtopic. • Terms with close proximity to the query terms might be useful for feedback.
Does this fit into LM? • Add a part of D1 to Q • Add a part of D2 to Q • As a result, Q starts looking like D1 and D2, which increases the likelihood that these documents generate the expanded query [Diagram: noisy-channel view — documents D1 … Dn generating the expanded query Qexp]
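The generation-likelihood argument can be made concrete with a query-likelihood score. The sketch below uses Jelinek-Mercer smoothing against a collection model; the smoothing method, λ value, and toy documents are assumptions, since the slides do not specify them.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.6):
    # log P(Q|D): mix the document model with the collection model
    # (Jelinek-Mercer smoothing, weight lam on the document).
    d, c = Counter(doc.split()), Counter(collection.split())
    dlen, clen = sum(d.values()), sum(c.values())
    return sum(math.log(lam * d[t] / dlen + (1 - lam) * c[t] / clen)
               for t in query.split())

d1 = "india win cricket match"
d2 = "cricket score report today"
coll = d1 + " " + d2

# Both documents match the original query "cricket" equally well, but
# expanding it with terms drawn from d1 pulls the ranking toward d1.
orig_gap = (query_log_likelihood("cricket", d1, coll)
            - query_log_likelihood("cricket", d2, coll))
exp_gap = (query_log_likelihood("cricket india win", d1, coll)
           - query_log_likelihood("cricket india win", d2, coll))
```

Because the expanded query now shares "india" and "win" with d1, d1's likelihood of generating it rises relative to d2's — the effect the slide describes.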
Tools • The FIRE collection comprises newspaper articles from different genres (sports, business, etc.) in several Indian languages • MorphAdorner package used for sentence demarcation • Stopword lists: • Standard SMART stopword list for English • Default stopword list provided by the FIRE organizers for Bengali • Stemmers: • Rule-based stemmer for Bengali • Porter's stemmer for English • LM implemented in SMART used for indexing and retrieval
Setup • Baseline is standard BRF using terms occurring in the largest number of pseudo-relevant documents • Two variants of sentence based expansion were tried out: • BRFcns: constant number of sentences for each document • BRFvns: variable number of sentences (proportional to retrieval rank)
Parameter Settings • R: number of documents assumed to be relevant, varied in [10, 40] • T: number of terms to add, varied in [10, 40] • m: number of sentences to add from the top ranked document, varied in [2, 10]
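A sweep over these ranges might be set up as below. The step sizes are assumptions — the slides give only the intervals, not the increments actually tested.

```python
from itertools import product

# Hypothetical grid over the stated parameter ranges (step sizes assumed).
R_vals = range(10, 41, 10)   # R: pseudo-relevant documents, in [10, 40]
T_vals = range(10, 41, 10)   # T: expansion terms, in [10, 40]
m_vals = range(2, 11, 2)     # m: sentences from the top document, in [2, 10]

grid = list(product(R_vals, T_vals, m_vals))
```

Even this coarse grid yields 4 × 4 × 5 = 80 settings to evaluate per topic set.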
Best MAPs [Table: best MAP values for BRF, BRFcns, and BRFvns]
Query drift analysis • Adding too many terms can drift the expanded query completely away from the original information need • Measured via per-query changes in precision • An easy query is one with good P@20 in the initial retrieval • Queries are categorized into groups by initial-retrieval P@20 • A good feedback algorithm improves many (ideally bad) queries and hurts the performance of few (ideally good) queries
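The P@20-based grouping described above can be sketched as a simple bucketing step. The threshold values here are hypothetical — the slides say only that queries are grouped by initial-retrieval P@20.

```python
def bucket_by_p20(p20_by_query, thresholds=(0.1, 0.3)):
    # Hypothetical cut-offs: below 0.1 -> "bad", below 0.3 -> "medium",
    # otherwise "easy". The actual boundaries are not given in the slides.
    lo, hi = thresholds
    buckets = {"bad": [], "medium": [], "easy": []}
    for query, p in p20_by_query.items():
        key = "bad" if p < lo else ("medium" if p < hi else "easy")
        buckets[key].append(query)
    return buckets

groups = bucket_by_p20({"q1": 0.05, "q2": 0.20, "q3": 0.50})
```

Per-group precision changes after feedback then show whether a method mostly lifts the hard queries or mostly damages the easy ones.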
Query drift analysis [Figure: per-query precision changes for BRF, BRFcns, and BRFvns]
Comparison to True Relevance Feedback • The best possible average precision in IR is obtained by True Relevance Feedback • A BRF method should be as close as possible to this oracle.
Conclusions • The new approach improves over standard BRF by: • using sentences instead of whole documents • distinguishing documents by their degree of pseudo-relevance • Significantly improves MAP over standard BRF on four ad-hoc topic sets in two languages • Adds more truly relevant terms than standard BRF