Relevance Feedback • User tells system whether returned/disseminated documents are relevant to query/information need or not • Feedback: • usually positive • sometimes negative • always incomplete • Hypothesis: relevant docs should be more like each other than like non-relevant docs
Relevance Feedback: Purpose • Augment keyword retrieval: Query Reformulation • give user opportunity to refine their query • tailored to individual • exemplar based – different type of information from the query • Iterative, subjective improvement • Evaluation!
Relevance Feedback: Examples • Image Retrieval • http://www.cs.bu.edu/groups/ivc/ImageRover/ • http://nayana.ece.ucsb.edu/imsearch/imsearch.html • http://www.mmdb.ece.ucsb.edu/~demo/corelacm/
Relevance Feedback: Early Usage by Rocchio • Modify original keyword query • strengthen terms in relevant docs • weaken terms in non-relevant docs • modify original query by weighting based on amount of feedback
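The query modification described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Rocchio's original implementation; the weights `alpha`, `beta`, `gamma` are the common textbook defaults and are assumptions, not values from the slide.

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query reformulation (textbook form, hypothetical weights).

    query:        1-D term-weight vector for the original query
    relevant:     list of term-weight vectors judged relevant
    nonrelevant:  list of term-weight vectors judged non-relevant
    """
    q_new = alpha * query
    if relevant:                                   # strengthen terms in relevant docs
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if nonrelevant:                                # weaken terms in non-relevant docs
        q_new = q_new - gamma * np.mean(nonrelevant, axis=0)
    return np.maximum(q_new, 0.0)                  # negative weights are usually clipped

# toy 4-term vocabulary
q = np.array([1.0, 0.0, 0.5, 0.0])
rel = [np.array([0.9, 0.8, 0.0, 0.0]), np.array([0.7, 0.6, 0.1, 0.0])]
nonrel = [np.array([0.0, 0.0, 0.9, 0.9])]
print(rocchio_update(q, rel, nonrel))
```

Note how term 2 (present only in relevant docs) gains weight while term 4 (present only in the non-relevant doc) is pushed to zero.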
Relevance Feedback: Early Results • Evaluation: • how much feedback is needed • how did recall/precision change • Conclusion: • recall & precision improved after even one feedback iteration, even with up to 20 non-relevant docs returned • Promising technique
Query Reformulation • User does not know enough about document set to construct optimal query initially. • Querying is iterative learning process repeating two steps: • expand original query with new terms (query expansion) • assign weights to the query terms (term reweighting)
Query Reformulation Approaches • Relevance feedback based • vector model (Rocchio …) • probabilistic model (Robertson & Sparck Jones, Croft…) • Cluster based • Local analysis: derive information from retrieved document set • Global analysis: derive information from corpus
Vector Based Reformulation • Rocchio (~1965) with adjustable weights • Ide Dec-Hi (~1968): counts only the top-ranked (most similar) non-relevant document
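The equations for these two updates were images on the original slide and did not survive conversion; as a hedged reconstruction, their standard textbook forms are:

```latex
% Rocchio, with adjustable weights \alpha, \beta, \gamma
\vec{q}_{new} = \alpha\,\vec{q}_0
  + \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d}
  - \frac{\gamma}{|D_n|} \sum_{\vec{d} \in D_n} \vec{d}

% Ide Dec-Hi: all relevant docs, but only the single
% highest-ranked non-relevant doc is subtracted
\vec{q}_{new} = \vec{q}_0
  + \sum_{\vec{d} \in D_r} \vec{d}
  - \max_{\text{non-relevant}} \vec{d}
```

Here $D_r$ and $D_n$ are the judged relevant and non-relevant document sets.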
Probabilistic Reformulation • Recall from earlier: • still need to estimate probabilities: • do so using relevance feedback!
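The formula this slide recalls was lost in conversion; a hedged reconstruction of the standard probabilistic-model ranking (Robertson & Sparck Jones) is:

```latex
sim(d_j, q) \;\propto\; \sum_{k_i \in q \cap d_j}
  \log\frac{P(k_i \mid R)}{1 - P(k_i \mid R)}
  \;+\; \log\frac{1 - P(k_i \mid \bar{R})}{P(k_i \mid \bar{R})}
```

where $R$ is the set of relevant documents; the probabilities $P(k_i \mid R)$ and $P(k_i \mid \bar{R})$ are the unknowns that relevance feedback is used to estimate.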
Estimating Probabilities by Accumulating Statistics • Dr is set of relevant docs • Dr,i is set of relevant docs with term ki • ni is number of docs in corpus containing term ki
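The estimate formulas themselves are missing from the slide; using the definitions above (with $N$ the corpus size), the usual maximum-likelihood estimates are:

```latex
P(k_i \mid R) \approx \frac{|D_{r,i}|}{|D_r|}, \qquad
P(k_i \mid \bar{R}) \approx \frac{n_i - |D_{r,i}|}{N - |D_r|}
```

In practice 0.5 is often added to numerator and denominator to smooth zero counts.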
Computing Similarity (Term Reweighting) • assume: term independence and binary document indexing • Cons: no term weighting, no query expansion, ignores previous weights
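A term reweighting of this kind can be sketched directly from the feedback counts. This is the smoothed Robertson–Sparck Jones weight in its common textbook form; the 0.5 smoothing constants are a conventional choice, not taken from the slide.

```python
import math

def rsj_weight(r_i, R, n_i, N):
    """Robertson-Sparck Jones term weight from relevance-feedback counts.

    r_i: number of judged-relevant docs containing term k_i
    R:   total number of judged-relevant docs
    n_i: number of corpus docs containing term k_i
    N:   corpus size
    The 0.5 terms smooth zero counts (a common convention).
    """
    num = (r_i + 0.5) / (R - r_i + 0.5)            # odds of term given relevance
    den = (n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)  # odds given non-relevance
    return math.log(num / den)

# A term in 8 of 10 relevant docs but only 20 of 1000 corpus docs
# gets a large positive weight:
w = rsj_weight(8, 10, 20, 1000)
print(w)
```

With binary document indexing, a document's score is just the sum of these weights over the query terms it contains, which matches the listed cons: no within-document term weighting and no query expansion.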
Croft Extensions • include within-document frequency weights • initial search variant • The last term is the normalized within-document frequency; C and K are adjustable parameters.
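The slide's formulas were images and are missing; a hedged reconstruction of the initial-search variant, in the spirit of Croft & Harper (1979) and Croft (1983), is:

```latex
sim(d_j, q) = \sum_{k_i \in q \cap d_j} (C + idf_i) \cdot ntf_{ij},
\qquad
ntf_{ij} = K + (1 - K)\,\frac{f_{ij}}{\max_l f_{lj}}
```

where $ntf_{ij}$ is the normalized within-document frequency of term $k_i$ in document $d_j$, and $C$, $K$ are the adjustable parameters the slide refers to.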
Query Reformulation: Summary so far… • Relevance feedback can produce dramatic improvements. • However, we must be careful that previously judged documents are not counted toward the “improvement”, and the techniques have limitations. • Next round of improvements requires clustering…
Croft Feedback Searches • Use probability updates as in Robertson & Sparck Jones
Assumptions • Initial query was a good approximation. • polysemy? synonyms? • slang? concept drift? • Ideal query is approximated by shared terms in relevant documents.