
Assessing The Retrieval



Presentation Transcript


  1. Assessing The Retrieval A.I. Lab 2007.01.20 박동훈

  2. Contents • 4.1 Personal Assessment of Relevance • 4.2 Extending the Dialog with RelFbk • 4.3 Aggregated Assessment : Search Engine Performance • 4.4 RAVE : A Relevance Assessment Vehicle • 4.5 Summary

  3. 4.1 Personal Assessment of Relevance • 4.1.1 Cognitive Assumptions • Users trying to do ‘object recognition’ • Comparison with respect to prototypic document • Reliability of user opinions? • Relevance Scale • RelFbk is nonmetric

  4. Relevance Scale

  5. RelFbk is nonmetric • Users naturally provide only preference information • Not a (metric) measurement of how relevant a retrieved document is!

  6. 4.2 Extending the Dialog with RelFbk • RelFbk labeling of the Retr set

  7. Query Session, Linked by RelFbk

  8. 4.2.1 Using RelFbk for Query Refinement

  9. 4.2.2 Document Modifications due to RelFbk • Fig 4.7: Change the documents!? Make them more/less like the query that successfully/unsuccessfully matches them

  10. 4.3 Aggregated Assessment : Search Engine Performance • 4.3.1 Underlying Assumptions • RelFbk(q,di) assessments are independent • Users’ opinions will all agree with a single ‘omniscient’ expert’s

  11. 4.3.2 Consensual Relevance • Consensually relevant

  12. 4.3.4 Basic Measures • Relevant versus Retrieved Sets

  13. Contingency table • NRet : the number of retrieved documents • NNRet : the number of documents not retrieved • NRel : the number of relevant documents • NNRel : the number of irrelevant documents • NDoc : the total number of documents

  14. 4.3.4 Basic Measures (cont)

  15. 4.3.4 Basic Measures (cont)
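
The formulas on slides 13–15 appear only as figures in the original deck. As a minimal Python sketch, assuming the standard definitions built from the contingency-table counts above (the set names and example numbers are illustrative, not from the slides):

```python
# Sketch of the standard set-based measures built from the contingency
# counts above; the function and variable names are illustrative.

def basic_measures(retrieved, relevant, n_doc):
    """retrieved, relevant: sets of document ids; n_doc: NDoc."""
    ret_rel = retrieved & relevant                    # retrieved AND relevant
    precision = len(ret_rel) / len(retrieved)         # |RetRel| / NRet
    recall = len(ret_rel) / len(relevant)             # |RetRel| / NRel
    fallout = len(retrieved - relevant) / (n_doc - len(relevant))  # retrieved irrelevant / NNRel
    return precision, recall, fallout

# Example: 10-document collection, 4 retrieved, 3 relevant, 2 in common.
print(basic_measures({1, 2, 3, 4}, {2, 3, 7}, n_doc=10))  # ≈ (0.5, 0.667, 0.286)
```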

  16. 4.3.5 Ordering the Retr set • Each document assigned hitlist rank Rank(di) • Descending Match(q,di) • Rank(di)<Rank(dj) ⇔ Match(q,di)>Match(q,dj) • Rank(di)<Rank(dj) ⇔ Pr(Rel(di))>Pr(Rel(dj)) • Coordination level : document’s rank in Retr determined by the number of keywords shared by doc and query • Goal: Probability Ranking Principle
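
A small sketch of what slide 16 describes: Rank(di) is produced by sorting on descending Match(q,di), which approximates the Probability Ranking Principle when Match is the system’s estimate of Pr(Rel). The document ids and scores below are invented for illustration:

```python
# Sketch of hitlist ordering by descending Match(q, di); scores invented.

match = {"d1": 0.42, "d2": 0.87, "d3": 0.10, "d4": 0.65}

hitlist = sorted(match, key=match.get, reverse=True)   # descending Match(q, di)
rank = {d: i + 1 for i, d in enumerate(hitlist)}       # Rank(di) = hitlist position

# Rank(di) < Rank(dj)  <=>  Match(q, di) > Match(q, dj)
print(hitlist)  # ['d2', 'd4', 'd1', 'd3']
print(rank)     # {'d2': 1, 'd4': 2, 'd1': 3, 'd3': 4}
```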

  17. A tale of two retrievals • Query1, Query2

  18. Recall/precision curve Query1

  19. Recall/precision curve Query1

  20. Retrieval envelope

  21. 4.3.6 Normalized recall • Best / Worst orderings • ri : hitlist rank of the i-th relevant document
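
The Best/Worst curves and the formula are figures on slide 21. Assuming the standard (Rocchio-style) definition, R_norm = 1 − (Σ r_i − Σ i) / (n(N − n)), a sketch:

```python
# Normalized recall in its standard (Rocchio-style) form, assumed here:
# R_norm = 1 - (sum(r_i) - sum(i)) / (n * (N - n)), where r_i is the
# hitlist rank of the i-th relevant document, n relevant docs, N docs total.

def normalized_recall(rel_ranks, n_doc):
    n = len(rel_ranks)
    best = sum(range(1, n + 1))           # ranks if all relevant docs came first
    return 1 - (sum(rel_ranks) - best) / (n * (n_doc - n))

print(normalized_recall([1, 2, 3], n_doc=10))    # 1.0 (best ordering)
print(normalized_recall([8, 9, 10], n_doc=10))   # 0.0 (worst ordering)
```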

  22. 4.3.8 One-Parameter Criteria • Combining recall and precision • Classification accuracy • Sliding ratio • Point alienation

  23. Combining recall and precision • F-measure • [Jardine & van Rijsbergen, 1971] • [Lewis & Gale, 1994] • Effectiveness • [van Rijsbergen, 1979] • E = 1 − F, α = 1/(β² + 1) • α = 0.5 ⇒ harmonic mean of precision & recall
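
A sketch of these one-parameter measures as cited on slide 23, taking F in the weighted form 1 / (α/P + (1 − α)/R) with α = 1/(β² + 1) and E = 1 − F; the precision/recall values are invented:

```python
# F in the alpha-weighted form and van Rijsbergen's E = 1 - F, with
# alpha = 1 / (beta**2 + 1); precision/recall values are invented.

def f_measure(precision, recall, alpha=0.5):
    return 1.0 / (alpha / precision + (1 - alpha) / recall)

def effectiveness(precision, recall, beta=1.0):
    alpha = 1.0 / (beta ** 2 + 1)
    return 1.0 - f_measure(precision, recall, alpha)

p, r = 0.5, 0.25
print(f_measure(p, r))      # 0.333... = 2PR/(P+R), the harmonic mean (alpha = 0.5)
print(effectiveness(p, r))  # 0.666...
```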

  24. Classification accuracy • accuracy • Correct identification of relevant and irrelevant
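
A sketch of accuracy as slide 24 describes it, i.e. the fraction of the collection that is either correctly retrieved (relevant and retrieved) or correctly rejected (irrelevant and not retrieved); the sets are illustrative:

```python
# Accuracy = (relevant-and-retrieved + irrelevant-and-not-retrieved) / NDoc;
# sets and counts are illustrative.

def accuracy(retrieved, relevant, n_doc):
    correct_pos = len(retrieved & relevant)          # relevant docs retrieved
    correct_neg = n_doc - len(retrieved | relevant)  # irrelevant docs not retrieved
    return (correct_pos + correct_neg) / n_doc

print(accuracy({1, 2, 3, 4}, {2, 3, 7}, n_doc=10))   # (2 + 5) / 10 = 0.7
```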

  25. Sliding ratio • Imagine a nonbinary, metric Rel(di) measure • Rank1, Rank2 computed by two separate systems
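
Slide 25 only names the ingredients, so the following is an assumption about the intended computation: with graded Rel(di) values, compare the cumulative relevance collected by the two rankings (Rank1, Rank2) at a hitlist cutoff.

```python
# Assumed reading of the sliding ratio: with graded Rel(di), compare the
# cumulative relevance collected by two rankings at a hitlist cutoff k.

def sliding_ratio(rank1, rank2, rel, k):
    """rank1, rank2: document ids in ranked order; rel: di -> Rel(di)."""
    return sum(rel[d] for d in rank1[:k]) / sum(rel[d] for d in rank2[:k])

rel = {"d1": 3, "d2": 2, "d3": 1, "d4": 0}
system1 = ["d2", "d1", "d3", "d4"]
system2 = ["d1", "d2", "d3", "d4"]   # happens to be the ideal ordering here
print(sliding_ratio(system1, system2, rel, k=1))   # 2 / 3 ≈ 0.67
```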

  26. Point alienation • Developed to measure human preference data • Capturing fundamental nonmetric nature of RelFbk

  27. 4.3.9 Test corpora • More data required for a “test corpus” • Standard test corpora • TREC: Text REtrieval Conference • TREC’s refined queries • TREC constantly expanding, refining tasks

  28. More data required for “test corpus” • Documents • Queries • Relevance assessments Rel(q,d) • Perhaps other data too • Classification data (Reuters) • Hypertext graph structure (EB5)

  29. Standard test corpora

  30. TREC constantly expanding, refining tasks • Ad hoc query task • Routing/filtering task • Interactive task

  31. Other Measures • Expected search length (ESL) • Length of “path” as user walks down the HitList • ESL = number of irrelevant documents seen before each relevant document • ESL for random retrieval • ESL reduction factor
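
A sketch of the search-length idea on slide 31: walk down the HitList and count the irrelevant documents inspected before the desired number of relevant ones is found (the expectation over tied ranks is omitted in this simplification; names are illustrative):

```python
# Walk down the HitList and count irrelevant documents seen before the
# desired number of relevant documents has been found; names illustrative.

def search_length(hitlist, relevant, n_wanted=1):
    irrelevant_seen, found = 0, 0
    for d in hitlist:
        if d in relevant:
            found += 1
            if found == n_wanted:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen   # hitlist ran out of relevant documents

print(search_length(["d5", "d2", "d9", "d1"], {"d1", "d2"}, n_wanted=2))  # 2
```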

  32. 4.5 Summary • Discussed both metric and nonmetric relevance feedback • The difficulties in getting users to provide relevance judgments for documents in the retrieved set • Quantified several measures of system performance
