
Relevance Feedback


Presentation Transcript


  1. Relevance Feedback • User tells system whether returned/disseminated documents are relevant to query/information need or not • Feedback: • usually positive • sometimes negative • always incomplete • Hypothesis: relevant docs should be more like each other than like non-relevant docs

  2. Relevance Feedback: Purpose • Augment keyword retrieval: Query Reformulation • give user opportunity to refine their query • tailored to individual • exemplar based – different type of information from the query • Iterative, subjective improvement • Evaluation!

  3. Relevance Feedback: Examples • Image Retrieval • http://www.cs.bu.edu/groups/ivc/ImageRover/ • http://nayana.ece.ucsb.edu/imsearch/imsearch.html • http://www.mmdb.ece.ucsb.edu/~demo/corelacm/

  4. Relevance Feedback: Early Usage by Rocchio • Modify original keyword query • strengthen terms in relevant docs • weaken terms in non-relevant docs • modify original query by weighting based on amount of feedback
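Slide 4's modification loop can be sketched in code. The following is an illustrative Python sketch, not the slides' implementation; the dict-of-weights document representation and the default values of the adjustable weights (alpha, beta, gamma) are assumptions.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query reformulation sketch.

    query and each document are dicts mapping term -> weight.
    Terms in relevant docs are strengthened, terms in non-relevant
    docs are weakened, scaled by the adjustable weights.
    """
    terms = set(query)
    for d in relevant + nonrelevant:
        terms.update(d)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:  # move toward the centroid of relevant docs
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:  # move away from the centroid of non-relevant docs
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new_query[t] = max(w, 0.0)  # negative weights are commonly clipped to zero
    return new_query
```

The weighting "based on amount of feedback" shows up in the averaging: each judged set contributes its centroid, however many documents it contains.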

  5. Relevance Feedback: Early Results • Evaluation: • how much feedback is needed • how did recall/precision change • Conclusion: • recall & precision improved after even 1 iteration, even with up to 20 non-relevant docs returned • Promising technique

  6. Query Reformulation • User does not know enough about document set to construct optimal query initially. • Querying is iterative learning process repeating two steps: • expand original query with new terms (query expansion) • assign weights to the query terms (term reweighting)

  7. Query Reformulation Approaches • Relevance feedback based • vector model (Rocchio …) • probabilistic model (Robertson & Sparck Jones, Croft…) • Cluster based • Local analysis: derive information from retrieved document set • Global analysis: derive information from corpus

  8. Vector Based Reformulation • Rocchio (~1965) with adjustable weights • Ide dec-hi (~1968) counts only the most similar non-relevant document
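The formulas on this slide did not survive the transcript; the standard textbook statements of both update rules (notation assumed, not copied from the slide) are:

```latex
% Rocchio, with adjustable weights \alpha, \beta, \gamma:
\vec{q}_m = \alpha \vec{q}_0
          + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
          - \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j

% Ide dec-hi: sum all relevant docs, subtract only the single
% highest-ranked non-relevant document:
\vec{q}_m = \vec{q}_0
          + \sum_{\vec{d}_j \in D_r} \vec{d}_j
          - \max_{\text{non-relevant}} \vec{d}_j
```

Here D_r and D_n are the sets of judged relevant and non-relevant documents.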

  9. Probabilistic Reformulation • Recall from earlier: • still need to estimate probabilities: • do so using relevance feedback!
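The equation being recalled is presumably the binary-independence ranking formula; a standard statement (notation assumed) is:

```latex
\mathrm{sim}(d_j, q) \;\sim\; \sum_{k_i \in q \,\cap\, d_j}
  \left( \log \frac{P(k_i \mid R)}{1 - P(k_i \mid R)}
       + \log \frac{1 - P(k_i \mid \overline{R})}{P(k_i \mid \overline{R})} \right)
```

Relevance feedback supplies the judged documents from which P(k_i | R) and P(k_i | R̄) can be estimated.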

  10. Estimating Probabilities by Accumulating Statistics • Dr is set of relevant docs • Dr,i is set of relevant docs with term ki • ni is number of docs in corpus containing term ki
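With the sets defined on this slide (and N assumed to be the corpus size), the usual estimates are:

```latex
P(k_i \mid R) \approx \frac{|D_{r,i}|}{|D_r|}
\qquad
P(k_i \mid \overline{R}) \approx \frac{n_i - |D_{r,i}|}{N - |D_r|}
```

In practice a small adjustment (commonly 0.5 added to each count) avoids zero or undefined values when feedback is sparse.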

  11. Computing Similarity (Term Reweighting) • assume: term independence and binary document indexing • Cons: no term weighting, no query expansion, ignores previous weights
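Substituting the accumulated-statistics estimates into the log-odds ranking formula gives a sum over the terms shared by query and document (a standard reconstruction, not copied from the slide):

```latex
\mathrm{sim}(d_j, q) \;\sim\; \sum_{k_i \in q \,\cap\, d_j}
  \log \frac{|D_{r,i}| \,\bigl(N - |D_r| - n_i + |D_{r,i}|\bigr)}
            {\bigl(|D_r| - |D_{r,i}|\bigr)\bigl(n_i - |D_{r,i}|\bigr)}
```

The listed cons follow directly: each term contributes only a presence/absence log-odds weight, so there is no within-document weighting and no new query terms.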

  12. Croft Extensions • include within-document frequency weights • initial search variant: the last term is the normalized within-document frequency; C and K are adjustable parameters.
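A reconstruction of the initial-search weight being described, as it appears in standard IR references (notation assumed):

```latex
w_{i,j} = \left( C + \mathrm{idf}_i \right) \cdot f_{i,j},
\qquad
f_{i,j} = K + (1 - K)\,\frac{\mathrm{freq}_{i,j}}{\max_l \mathrm{freq}_{l,j}}
```

The last factor f_{i,j} is the normalized within-document frequency; C and K are the adjustable parameters the slide names.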

  13. Query Reformulation: Summary so far… • Relevance feedback can produce dramatic improvements. • However, must be careful that previously judged documents are not part of “improvement” and techniques have limitations. • Next round of improvements requires clustering…

  14. Croft Feedback Searches • Use probability updates as in Robertson & Sparck Jones

  15. Assumptions • Initial query was a good approximation. • Ideal query is approximated by shared terms in relevant documents.

  16. Assumptions • Initial query was a good approximation. • polysemy? synonyms? • slang? concept drift? • Ideal query is approximated by shared terms in relevant documents.
