Debasis Ganguly, Johannes Leveling, Gareth Jones. Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
Outline • Standard blind relevance feedback • Sentence based query expansion • Does it fit into LM? • Evaluation on FIRE Bengali and English ad-hoc topics • Comparison with term based query expansion • Conclusions
Standard Blind Relevance Feedback (BRF) • Assume the top R documents from the initial retrieval are relevant. • Extract feedback terms from these documents: • Choose terms occurring in the largest number of pseudo-relevant documents (e.g. VSM) • Choose terms with the highest RSV scores (e.g. BM25) • Choose terms with the highest LM scores (e.g. LM) • Expand the query with these terms and perform the final retrieval
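The first selection strategy above — picking the terms that occur in the largest number of pseudo-relevant documents — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, tie-breaking rule, and toy documents are assumptions.

```python
from collections import Counter

def brf_expansion_terms(pseudo_relevant_docs, query_terms, T):
    # Document frequency within the pseudo-relevant set:
    # count each term at most once per document.
    df = Counter()
    for doc in pseudo_relevant_docs:
        df.update(set(doc.split()))
    # Drop terms already in the query; rank by document frequency,
    # breaking ties alphabetically (an arbitrary choice here).
    candidates = [(t, n) for t, n in df.items() if t not in query_terms]
    candidates.sort(key=lambda x: (-x[1], x[0]))
    return [t for t, _ in candidates[:T]]

docs = ["cricket match india win",
        "cricket score india",
        "election result india"]
expanded = brf_expansion_terms(docs, {"india"}, 2)
```

With T = 2 and "india" already in the query, the selection keeps the term that appears in the most pseudo-relevant documents first.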
What standard BRF assumes (wrongly) • The whole document is relevant • All R feedback documents are equally relevant
Ideal scenario • Not the whole document, but only parts of it, are relevant • Restrict the choice of feedback terms to the relevant segments of the documents
Can we get closer to the ideal? • It is impossible to know the relevant segments exactly • Instead, extract the sentences most similar to the query, assuming these sentences constitute the relevant text chunks
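The sentence-selection step above can be sketched as follows, scoring each sentence by its overlap with the query terms. This is an illustrative simplification: the naive split on periods and the overlap measure stand in for a proper sentence demarcator and similarity function.

```python
def top_sentences(document, query_terms, m):
    # Naive sentence split on '.'; a real system would use a proper
    # sentence demarcator (the slides mention MorphAdorner).
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    # Similarity = number of query terms the sentence shares (assumption).
    scored = [(len(set(s.split()) & query_terms), i, s)
              for i, s in enumerate(sentences)]
    scored.sort(key=lambda x: (-x[0], x[1]))  # best overlap first, doc order for ties
    return [s for _, _, s in scored[:m]]

doc = ("india won the cricket match. the weather was cloudy. "
       "india celebrated the cricket win")
chosen = top_sentences(doc, {"india", "cricket"}, 2)
```

Only the two sentences sharing terms with the query are kept; the off-topic weather sentence is filtered out before any feedback terms are extracted.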
Sentence selection using rank • Not all documents are equally relevant • Make the number of sentences added from a document proportional to its retrieval rank, so higher-ranked documents contribute more sentences
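One possible rank-based schedule is sketched below. The slides only state that the sentence count depends on retrieval rank; the linear decay used here is a hypothetical choice, not the paper's formula.

```python
def sentences_at_rank(rank, m):
    # Hypothetical linear schedule: the rank-1 document contributes
    # m sentences, each later rank one fewer, never dropping below 1.
    return max(m - (rank - 1), 1)

schedule = [sentences_at_rank(r, 5) for r in range(1, 8)]
```

With m = 5, the top document contributes five sentences and contributions taper off down the ranking.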
In short • Documents are often composed of a few main topics and a series of short, sometimes densely discussed subtopics. • Feedback terms chosen from a whole document might introduce a topic shift. • Good expansion terms might exist in a particular subtopic. • Terms with close proximity to the query terms might be useful for feedback.
Does this fit into LM? • Add a part of D1 to Q • Add a part of D2 to Q • As a result, Q starts looking like D1 and D2, which increases the likelihood that these documents generate the expanded query [Diagram: noisy-channel view — documents D1 … Dn generating the expanded query Qexp]
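The generation-likelihood argument can be made concrete with a query-likelihood score. The sketch below uses Jelinek-Mercer smoothing against a collection model; the smoothing method, λ value, and toy documents are assumptions, since the slides do not specify them.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.6):
    # log P(Q|D): mix the document model with the collection model
    # (Jelinek-Mercer smoothing, weight lam on the document).
    d, c = Counter(doc.split()), Counter(collection.split())
    dlen, clen = sum(d.values()), sum(c.values())
    return sum(math.log(lam * d[t] / dlen + (1 - lam) * c[t] / clen)
               for t in query.split())

d1 = "india win cricket match"
d2 = "cricket score report today"
coll = d1 + " " + d2

# Both documents match the original query "cricket" equally well, but
# expanding it with terms drawn from d1 pulls the ranking toward d1.
orig_gap = (query_log_likelihood("cricket", d1, coll)
            - query_log_likelihood("cricket", d2, coll))
exp_gap = (query_log_likelihood("cricket india win", d1, coll)
           - query_log_likelihood("cricket india win", d2, coll))
```

Because the expanded query now shares "india" and "win" with d1, d1's likelihood of generating it rises relative to d2's — the effect the slide describes.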
Tools • The FIRE collection comprises newspaper articles from different genres (sports, business, etc.) in several Indian languages • MorphAdorner package used for sentence demarcation • Stopword lists: • Standard SMART stopword list for English • Default stopword list provided by the FIRE organizers for Bengali • Stemmers: • Rule-based stemmer for Bengali • Porter's stemmer for English • LM implemented in SMART used for indexing and retrieval
Setup • Baseline is standard BRF using terms occurring in the largest number of pseudo-relevant documents • Two variants of sentence based expansion were tried out: • BRFcns: constant number of sentences for each document • BRFvns: variable number of sentences (proportional to retrieval rank)
Parameter Settings • R: number of documents assumed to be relevant, varied in [10, 40] • T: number of terms to add, varied in [10, 40] • m: number of sentences to add from the top ranked document, varied in [2, 10]
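A sweep over these ranges might be set up as below. The step sizes are assumptions — the slides give only the intervals, not the increments actually tested.

```python
from itertools import product

# Hypothetical grid over the stated parameter ranges (step sizes assumed).
R_vals = range(10, 41, 10)   # R: pseudo-relevant documents, in [10, 40]
T_vals = range(10, 41, 10)   # T: expansion terms, in [10, 40]
m_vals = range(2, 11, 2)     # m: sentences from the top document, in [2, 10]

grid = list(product(R_vals, T_vals, m_vals))
```

Even this coarse grid yields 4 × 4 × 5 = 80 settings to evaluate per topic set.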
Best MAPs [Table: best MAP values for BRF, BRFcns, and BRFvns]
Query drift analysis • Adding too many terms can drift the expanded query completely away from the original information need • Measured via per-query changes in precision • An easy query is one with good P@20 in the initial retrieval • Queries are categorized into groups by initial-retrieval P@20 • A good feedback algorithm improves many (ideally bad) queries and hurts the performance of few (ideally good) queries
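The P@20-based grouping described above can be sketched as a simple bucketing step. The threshold values here are hypothetical — the slides say only that queries are grouped by initial-retrieval P@20.

```python
def bucket_by_p20(p20_by_query, thresholds=(0.1, 0.3)):
    # Hypothetical cut-offs: below 0.1 -> "bad", below 0.3 -> "medium",
    # otherwise "easy". The actual boundaries are not given in the slides.
    lo, hi = thresholds
    buckets = {"bad": [], "medium": [], "easy": []}
    for query, p in p20_by_query.items():
        key = "bad" if p < lo else ("medium" if p < hi else "easy")
        buckets[key].append(query)
    return buckets

groups = bucket_by_p20({"q1": 0.05, "q2": 0.20, "q3": 0.50})
```

Per-group precision changes after feedback then show whether a method mostly lifts the hard queries or mostly damages the easy ones.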
Query drift analysis [Figure: per-query precision changes for BRF, BRFcns, and BRFvns]
Comparison to True Relevance Feedback • The best possible average precision in IR is obtained by True Relevance Feedback • A BRF method should be as close as possible to this oracle.
Conclusions • The new approach improves over standard BRF by: • using sentences instead of whole documents • distinguishing documents by their degree of pseudo-relevance • Significantly improves MAP over standard BRF on four ad-hoc topic sets in two languages • Adds more truly relevant terms than standard BRF