Query Reformulation: User Relevance Feedback
Introduction
• Difficulty of formulating user queries:
  • users have insufficient knowledge of the collection make-up
  • users have insufficient knowledge of the retrieval environment
• Query reformulation to improve the user query; two basic methods:
  • query expansion: expanding the original query with new terms
  • term reweighting: reweighting the terms in the expanded query
Introduction
• Approaches to query reformulation:
  • user relevance feedback: based on feedback information from the user
  • local analysis: based on information derived from the set of documents initially retrieved (the local set)
  • global analysis: based on global information derived from the whole document collection
User Relevance Feedback
• The user's role in the URF cycle:
  • the user is presented with the list of retrieved documents
  • the user marks the documents that are relevant
• Main idea of URF:
  • select important terms, or expressions, attached to the documents the user has identified as relevant
  • enhance the importance of these terms in the new query formulation
  • effect: the new query is moved towards the relevant documents and away from the non-relevant ones
User Relevance Feedback
• Advantages of URF:
  • it shields the user from the details of the query reformulation process: users only have to provide a relevance judgment on documents
  • it breaks down the whole searching task into a sequence of small steps which are easier to grasp
  • it provides a controlled process designed to emphasize relevant terms and de-emphasize non-relevant terms
URF for the Vector Model
• Assumptions:
  • the term-weight vectors of the documents identified as relevant to the query have similarities among themselves
  • non-relevant documents have term-weight vectors that are dissimilar from those of the relevant documents
• Basic idea: reformulate the query so that it moves closer to the relevant documents in the term-weight vector space
The Perfect (Vector Model) Query
• Assume we know which documents are relevant and which are not.
• Given:
  • a collection of N documents
  • Cr: the set of relevant documents
• What is the optimal query?
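The classical vector-model answer is the centroid of the relevant documents minus the centroid of the non-relevant ones:

  \vec{q}_{opt} = \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j \; - \; \frac{1}{N - |C_r|} \sum_{\vec{d}_j \notin C_r} \vec{d}_j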
Back to Reality
• In practice, which documents are relevant and which are not is exactly what we are trying to figure out.
• The ideal query and definitions:
  • a collection of N documents
  • Cr: the set of relevant documents
  • Dr: the set of retrieved documents the user identified as relevant
  • Dn: the set of retrieved documents identified as non-relevant
  • α, β, γ: tuning constants
• Modified query (Rocchio), shown below
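The standard Rocchio modified query keeps a weighted copy of the original query, moves toward the centroid of the documents judged relevant, and moves away from the centroid of those judged non-relevant:

  \vec{q}_m = \alpha \vec{q} + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j \; - \; \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j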
Rocchio & Ide Variations
• Standard Rocchio (as above)
• Ide (Regular)
• Ide (Dec_Hi)
• where max_non_relevant(dj) denotes the highest-ranked non-relevant document
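The two Ide variants take the standard forms below; both drop Rocchio's normalization by |D_r| and |D_n|:

  Ide (Regular): \vec{q}_m = \alpha \vec{q} + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \sum_{\vec{d}_j \in D_n} \vec{d}_j

  Ide (Dec_Hi): \vec{q}_m = \alpha \vec{q} + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \, \vec{d}_{max\_non\_relevant}

Dec_Hi subtracts only the single highest-ranked non-relevant document rather than summing over all of Dn.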
Tuning the Feedback
• How do we set the tuning constants α, β, γ?
  • Rocchio originally set α = 1
  • Ide originally set α = β = γ = 1
• Positive relevance feedback is often more valuable than negative relevance feedback:
  • this implies β > γ
  • a purely positive feedback mechanism sets γ = 0
• A small code sketch of the update follows below.
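Below is a minimal numpy sketch of the Rocchio update. The helper name `rocchio`, the default weights (α = 1.0, β = 0.75, γ = 0.15), and the clipping of negative term weights to zero are illustrative assumptions, not values prescribed by the slides.

import numpy as np

def rocchio(q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector q_m.

    q           : original query term-weight vector, shape (V,)
    rel_docs    : term-weight vectors of documents judged relevant, shape (|Dr|, V)
    nonrel_docs : term-weight vectors of retrieved documents judged non-relevant, shape (|Dn|, V)
    """
    q_m = alpha * q
    if len(rel_docs):                   # positive feedback: move toward the Dr centroid
        q_m = q_m + beta * rel_docs.mean(axis=0)
    if len(nonrel_docs) and gamma > 0:  # negative feedback: move away from the Dn centroid
        q_m = q_m - gamma * nonrel_docs.mean(axis=0)
    return np.maximum(q_m, 0.0)         # negative term weights are commonly clipped to zero

# Toy example over a 4-term vocabulary: the query mentions only term 0,
# the relevant documents also weight term 1, the non-relevant one terms 2-3.
q  = np.array([1.0, 0.0, 0.0, 0.0])
Dr = np.array([[0.8, 0.6, 0.0, 0.0],
               [0.9, 0.4, 0.1, 0.0]])
Dn = np.array([[0.0, 0.0, 0.9, 0.7]])
print(rocchio(q, Dr, Dn))  # term 1 gains weight; terms 2-3 are pushed to zero

Setting gamma=0 gives the purely positive feedback mechanism mentioned above.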
URF for the Vector Model
• Includes both query expansion and term reweighting
• Advantages:
  • simplicity: modified term weights are computed directly from the set of retrieved documents
  • good results: the modified query vector does reflect a portion of the intended query semantics
• Issue: as with all learning techniques, this assumes the information need is relatively static.
Evaluation of Relevance Feedback Strategies
• A simplistic evaluation compares the results of the modified query to those of the original query.
• This does not work!
  • the results look great, but mostly because documents returned by the original query are now ranked higher
  • the user has already seen these documents
Evaluation of Relevance Feedback Strategies
• A more realistic evaluation (sketched below):
  • compute precision and recall on the residual collection (the documents not returned by the original query)
  • because the highly-ranked documents have been removed, these results can be worse than those for the original query
  • that is acceptable when we are comparing relevance feedback approaches against each other
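A minimal sketch of residual-collection evaluation, assuming documents are identified by ids and using precision and recall at a cutoff k; the function name and the toy data are hypothetical.

def residual_precision_recall(feedback_ranking, seen_docs, relevant_docs, k=10):
    """Score the feedback query only on documents the user has not already seen."""
    residual = [d for d in feedback_ranking if d not in seen_docs]  # residual collection
    residual_relevant = set(relevant_docs) - set(seen_docs)         # still-unseen relevant docs
    top_k = residual[:k]
    hits = sum(1 for d in top_k if d in residual_relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(residual_relevant) if residual_relevant else 0.0
    return precision, recall

# Toy example: docs 1-3 were shown in the first round; 4 and 6 are relevant.
print(residual_precision_recall([4, 2, 6, 5], seen_docs={1, 2, 3},
                                relevant_docs={2, 4, 6}, k=2))
# -> (1.0, 1.0): in the residual ranking [4, 6, 5], both top-2 docs are relevant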