Assessing The Retrieval
A.I Lab, 2007.01.20, 박동훈
Contents • 4.1 Personal Assessment of Relevance • 4.2 Extending the Dialog with RelFbk • 4.3 Aggregated Assessment : Search Engine Performance • 4.4 RAVE : A Relevance Assessment Vehicle • 4.5 Summary
4.1 Personal Assessment of Relevance • 4.1.1 Cognitive Assumptions • Users are essentially doing 'object recognition' • Comparison with respect to a prototypic document • How reliable are users' opinions? • Relevance scale • RelFbk is nonmetric
RelFbk is nonmetric • Users naturally provide only preference information • Not a (metric) measurement of how relevant a retrieved document is!
4.2 Extending the Dialog with RelFbk • RelFbk labeling of the Retr set
4.2.2 Document Modifications due to RelFbk • Change the documents!? Make them more/less like the query that successfully/unsuccessfully matches them (Fig 4.7)
4.3 Aggregated Assessment: Search Engine Performance • 4.3.1 Underlying Assumptions • RelFbk(q,di) assessments are independent • All users' opinions agree with a single 'omniscient' expert's
4.3.2 Consensual relevance • Consensually relevant
4.3.4 Basic Measures • Relevant versus Retrieved Sets
Contingency table • NRet : the number of retrieved documents • NNRet : the number of documents not retrieved • NRel : the number of relevant documents • NNRel : the number of irrelevant documents • NDoc : the total number of documents
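Precision and recall follow directly from these counts. A minimal sketch in Python, assuming binary relevance judgments; the cell names (ret_rel, ret_nrel, nret_rel, nret_nrel) are illustrative and not the book's notation, only the marginals NRet, NRel, NDoc correspond to the symbols above:

```python
# Minimal sketch: precision and recall from contingency-table counts.
# Cell names are hypothetical; NRet and NRel are the marginals defined above.

def precision_recall(ret_rel, ret_nrel, nret_rel, nret_nrel):
    n_ret = ret_rel + ret_nrel          # NRet: number of retrieved documents
    n_rel = ret_rel + nret_rel          # NRel: number of relevant documents
    precision = ret_rel / n_ret if n_ret else 0.0
    recall = ret_rel / n_rel if n_rel else 0.0
    return precision, recall

# Example: 20 retrieved, 8 of them relevant, 12 relevant documents missed
print(precision_recall(8, 12, 12, 968))   # -> (0.4, 0.4)
```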
4.3.5 Ordering the Retr set • Each document is assigned a hitlist rank Rank(di) • Documents sorted by descending Match(q,di) • Rank(di) < Rank(dj) ⇔ Match(q,di) > Match(q,dj) • Rank(di) < Rank(dj) ⇔ Pr(Rel(di)) > Pr(Rel(dj)) • Coordination level: a document's rank in Retr determined by the number of keywords shared by document and query • Goal: Probability Ranking Principle (see the sketch below)
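A minimal sketch of building such a hitlist by sorting on a match score, under the Probability Ranking Principle's assumption that higher Match(q,d) means higher probability of relevance. The coordination-level scoring function shown here (count of shared keywords) is only an illustration:

```python
# Sketch: rank the retrieved set by descending match score.
# Coordination-level matching (shared keywords) is used purely for illustration.

def coordination_level(query_terms, doc_terms):
    return len(set(query_terms) & set(doc_terms))

def rank_documents(query_terms, docs):
    """docs: dict mapping doc id -> list of terms. Returns ids, best match first."""
    scored = [(coordination_level(query_terms, terms), doc_id)
              for doc_id, terms in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for score, doc_id in scored]

docs = {"d1": ["ir", "recall"], "d2": ["ir", "recall", "precision"], "d3": ["cooking"]}
print(rank_documents(["ir", "recall", "precision"], docs))  # -> ['d2', 'd1', 'd3']
```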
A tale of two retrievals (figure comparing the ranked results of Query1 and Query2)
Recall/precision curve for Query1 (figure)
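A recall/precision curve of this kind can be computed by walking down the ranked hitlist and recording precision each time another relevant document is found. A minimal sketch, assuming binary relevance labels given in hitlist order:

```python
# Sketch: (recall, precision) points from a ranked hitlist.
# `relevant` holds binary judgments in hitlist order; total_relevant is NRel.

def recall_precision_points(relevant, total_relevant):
    points, hits = [], 0
    for i, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            points.append((hits / total_relevant, hits / i))
    return points

# A Query1-style hitlist: relevant documents concentrated near the top
print(recall_precision_points([1, 1, 0, 1, 0, 0, 1], total_relevant=4))
# -> [(0.25, 1.0), (0.5, 1.0), (0.75, 0.75), (1.0, 0.571...)]
```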
4.3.6 Normalized recall • Compares the actual ranking of relevant documents against the best case (all relevant documents at the top of the hitlist) and the worst case (all at the bottom) • ri: hitlist rank of the i-th relevant document
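A sketch of the usual textbook formulation of normalized recall, R_norm = 1 - (Σ ri - Σ i) / (NRel · (NDoc - NRel)); this is the standard definition rather than a quotation from the slide, so treat the exact form as an assumption:

```python
# Sketch of the standard normalized recall formula:
#   R_norm = 1 - (sum(r_i) - sum(i)) / (NRel * (NDoc - NRel))
# where r_i is the hitlist rank of the i-th relevant document.

def normalized_recall(relevant_ranks, n_doc):
    n_rel = len(relevant_ranks)
    ideal = sum(range(1, n_rel + 1))              # best case: ranks 1..NRel
    actual = sum(relevant_ranks)
    return 1 - (actual - ideal) / (n_rel * (n_doc - n_rel))

print(normalized_recall([1, 2, 3], n_doc=10))     # best case  -> 1.0
print(normalized_recall([8, 9, 10], n_doc=10))    # worst case -> 0.0
```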
4.3.8 One-Parameter Criteria • Combining recall and precision • Classification accuracy • Sliding ratio • Point alienation
Combining recall and precision • F-measure [Jardine & van Rijsbergen, 1971; Lewis & Gale, 1994] • Effectiveness [van Rijsbergen, 1979] • E = 1 - F, α = 1/(β² + 1) • α = 0.5 ⇒ harmonic mean of precision & recall
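A sketch of van Rijsbergen's effectiveness E and the corresponding F, using the weighting stated above (α = 1/(β² + 1)); with α = 0.5 (β = 1) F reduces to the harmonic mean of precision and recall:

```python
# Sketch: F-measure and van Rijsbergen's E = 1 - F, with alpha = 1/(beta**2 + 1).

def f_measure(precision, recall, beta=1.0):
    alpha = 1.0 / (beta ** 2 + 1.0)
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

def effectiveness(precision, recall, beta=1.0):
    return 1.0 - f_measure(precision, recall, beta)

print(f_measure(0.4, 0.4))         # beta=1: harmonic mean -> 0.4
print(f_measure(0.5, 0.25, 2.0))   # beta=2 weights recall more heavily -> 0.277...
```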
Classification accuracy • Accuracy: fraction of documents correctly identified as relevant or irrelevant
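Accuracy over the whole collection, using the same hypothetical contingency-cell names as in the earlier precision/recall sketch:

```python
# Sketch: classification accuracy from contingency counts (cell names are illustrative).
def accuracy(ret_rel, ret_nrel, nret_rel, nret_nrel):
    n_doc = ret_rel + ret_nrel + nret_rel + nret_nrel    # NDoc
    return (ret_rel + nret_nrel) / n_doc                 # correct decisions / all docs

print(accuracy(8, 12, 12, 968))   # -> 0.976
```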
Sliding ratio • Imagine a nonbinary, metric Rel(di) measure • Rank1, Rank2: rankings computed by two separate systems
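One common reading of the sliding ratio: at each cutoff, take the ratio of the cumulative (nonbinary) relevance accumulated by the two rankings. A sketch under that reading only; the chapter's exact definition may differ in detail:

```python
# Sketch: sliding ratio between two rankings of the same documents.
# rel1[k], rel2[k] are the nonbinary relevance values of the k-th document
# in Rank1 and Rank2 respectively.

from itertools import accumulate

def sliding_ratio(rel1, rel2):
    cum1 = list(accumulate(rel1))
    cum2 = list(accumulate(rel2))
    return [c1 / c2 if c2 else 0.0 for c1, c2 in zip(cum1, cum2)]

# Here Rank2 happens to be the ideal ordering of the same relevance values
print(sliding_ratio([1, 3, 0, 2], [3, 2, 1, 0]))  # -> [0.333..., 0.8, 0.666..., 1.0]
```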
Point alienation • Developed to measure human preference data • Captures the fundamentally nonmetric nature of RelFbk
4.3.9 Test corpora • More data required for a "test corpus" • Standard test corpora • TREC: Text REtrieval Conference • TREC's refined queries • TREC constantly expanding, refining tasks
More data required for “test corpus” • Documents • Queries • Relevance assessments Rel(q,d) • Perhaps other data too • Classification data (Reuters) • Hypertext graph structure (EB5)
TREC constantly expanding, refining tasks • Ad hoc query task • Routing/filtering task • Interactive task
Other Measures • Expected search length (ESL) • Length of the "path" as the user walks down the HitList • ESL = number of irrelevant documents seen before each relevant document • ESL for random retrieval • ESL reduction factor
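A sketch of search length under one common reading: the number of irrelevant documents the user passes while walking down the hitlist before collecting the desired number of relevant ones (the expectation is then taken over queries or over ties in the ranking); names and the `wanted` parameter are illustrative:

```python
# Sketch: search length as the number of irrelevant documents seen before
# `wanted` relevant documents have been found in the hitlist.

def search_length(relevant, wanted):
    """relevant: binary judgments in hitlist order."""
    irrelevant_seen, found = 0, 0
    for rel in relevant:
        if rel:
            found += 1
            if found == wanted:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen   # fewer than `wanted` relevant docs in the list

print(search_length([0, 1, 0, 0, 1, 1], wanted=2))  # -> 3
```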
4.5 Summary • Discussed both metric and nonmetric relevance feedback • The difficulties in getting users to provide relevance judgments for documents in the retrieved set • Quantified several measures of system performance