
Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation


Presentation Transcript


  1. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy. Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation. Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment

  2. Effect of the Tie-Breaking Bias. Outline • Motivation: A tale about two TREC participants • Context: IRS effectiveness evaluation • Issue: Tie-breaking bias effects • Contribution: Reordering strategies • Experiments: Impact of the tie-breaking bias • Conclusion and future work

  3. Effect of the Tie-Breaking Bias. Outline • Motivation: A tale about two TREC participants • Context: IRS effectiveness evaluation • Issue: Tie-breaking bias effects • Contribution: Reordering strategies • Experiments: Impact of the tie-breaking bias • Conclusion and future work

  4. 1. Motivation – Tie-breaking bias illustration. A tale about two TREC participants (1/2). Topic 031 “satellite launch contracts” has 5 relevant documents. Chris and Ellen submit runs with one single difference: C = (N, 0.8), (R, 0.8), (N, 0.5) and E = (N, 0.8), (R, 0.8), (N, 0.5), where R marks a relevant document, N a non-relevant one, and the numbers are similarity scores. Chris is unlucky, Ellen is lucky. Why such a huge difference?

  5. 1. Motivation – Tie-breaking bias illustration. A tale about two TREC participants (2/2). Chris: C = (N, 0.8), (R, 0.8), (N, 0.5); Ellen: E = (N, 0.8), (R, 0.8), (N, 0.5). After 15 days of hard work, the one single difference between the two runs is the name of one document, yet their evaluation scores diverge.

  6. Effect of the Tie-Breaking Bias. Outline • Motivation: A tale about two TREC participants • Context: IRS effectiveness evaluation • Issue: Tie-breaking bias effects • Contribution: Reordering strategies • Experiments: Impact of the tie-breaking bias • Conclusion and future work

  7. 2. Context & issue – Tie-breaking bias. Measuring the effectiveness of IRSs • User-centered vs. system-focused [Spärck Jones & Willett, 1997] • Evaluation campaigns: 1958 Cranfield (UK); 1992 TREC, Text REtrieval Conference (USA); 1999 NTCIR, NII Test Collection for IR Systems (Japan); 2001 CLEF, Cross-Language Evaluation Forum (Europe); … • “Cranfield” methodology: a task, a test collection (corpus, topics, qrels) and measures (MAP, P@X, …) computed using trec_eval [Voorhees, 2007]

  8. 2. Context & issue – Tie-breaking bias. Runs are reordered prior to their evaluation • Qrels = (qid, iter, docno, rel); Run = (qid, iter, docno, rank, sim, run_id); a document counts as relevant when rel ∈ [1; 127] • trec_eval reorders each run by qid asc, sim desc, docno desc, so (N, 0.8), (R, 0.8), (N, 0.5) becomes (R, 0.8), (N, 0.8), (N, 0.5) • Hence, effectiveness measure = f(intrinsic_quality, luck), whatever the measure: MAP, P@X, MRR, …
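
To make the reordering concrete, here is a minimal Python sketch of the conventional tie-breaking rule. It is only an illustration: trec_eval itself is a C program, and the field names and document identifiers below are hypothetical.

    def conventional_reorder(run):
        """Reorder a run as trec_eval does: qid asc, sim desc, docno desc."""
        # Python sorts are stable, so sorting by the least significant key
        # first reproduces a multi-key ORDER BY.
        run = sorted(run, key=lambda r: r["docno"], reverse=True)  # docno desc
        run.sort(key=lambda r: (r["qid"], -r["sim"]))              # qid asc, sim desc
        return run

    run = [  # hypothetical docnos; relevance shown as comments
        {"qid": 31, "docno": "WSJ0001", "sim": 0.8},  # N
        {"qid": 31, "docno": "AP0002",  "sim": 0.8},  # R
        {"qid": 31, "docno": "FT0003",  "sim": 0.5},  # N
    ]
    print([r["docno"] for r in conventional_reorder(run)])
    # ['WSJ0001', 'AP0002', 'FT0003']: among the 0.8 ties the Z-to-A rule
    # ranks the WSJ document first, so where the relevant document lands
    # depends only on its name, not on its score.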

  9. Effect of the Tie-Breaking Bias. Outline • Motivation: A tale about two TREC participants • Context: IRS effectiveness evaluation • Issue: Tie-breaking bias effects • Contribution: Reordering strategies • Experiments: Impact of the tie-breaking bias • Conclusion and future work

  10. 3. Contribution – Reordering strategies. Consequences of run reordering • Measures of effectiveness for an IRS s: RR(s,t) = 1/rank of the 1st relevant document for topic t; P(s,t,d) = precision at document d for topic t; AP(s,t) = average precision for topic t; MAP(s) = mean average precision; all are sensitive to document rank (hence Ellen vs. Chris) • Tie-breaking bias: is the Wall Street Journal collection more relevant than Associated Press? • Problem 1: comparing 2 systems, AP(s1, t) vs. AP(s2, t) • Problem 2: comparing 2 topics, AP(s, t1) vs. AP(s, t2)
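
A worked sketch of RR and AP on the tale from slide 4, assuming the relevant document lands at rank 2 for Chris and rank 1 for Ellen after reordering (topic 031 has 5 relevant documents; only the three retrieved documents are scored):

    def reciprocal_rank(rels):
        """rels: booleans in rank order; RR = 1/rank of the 1st relevant doc."""
        return next((1.0 / (i + 1) for i, r in enumerate(rels) if r), 0.0)

    def average_precision(rels, n_relevant):
        """AP = mean of the precision values at each relevant document's rank."""
        hits, total = 0, 0.0
        for i, r in enumerate(rels):
            if r:
                hits += 1
                total += hits / (i + 1)
        return total / n_relevant

    chris = [False, True, False]  # relevant document at rank 2
    ellen = [True, False, False]  # relevant document at rank 1
    print(reciprocal_rank(chris), average_precision(chris, 5))  # 0.5 0.1
    print(reciprocal_rank(ellen), average_precision(ellen, 5))  # 1.0 0.2

One tied pair thus doubles both RR and AP, a difference that owes nothing to intrinsic quality.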

  11. 3. Contribution – Reordering strategies. Alternative unbiased reordering strategies • Conventional reordering (TREC): ex aequo documents sorted Z → A, i.e. qid asc, sim desc, docno desc • Realistic reordering: relevant documents last among ex aequo, i.e. qid asc, sim desc, rel asc, docno desc • Optimistic reordering: relevant documents first among ex aequo, i.e. qid asc, sim desc, rel desc, docno desc
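
A minimal sketch of the three strategies, extending the function above; it assumes the qrel judgement has been joined into each run entry as a rel field (1 = relevant, 0 = not), which is an illustration rather than trec_eval's actual implementation:

    def reorder(run, strategy="conventional"):
        """Chain stable sorts from the least to the most significant key."""
        run = sorted(run, key=lambda r: r["docno"], reverse=True)  # docno desc
        if strategy == "realistic":
            run.sort(key=lambda r: r["rel"])    # relevant last among ties
        elif strategy == "optimistic":
            run.sort(key=lambda r: -r["rel"])   # relevant first among ties
        run.sort(key=lambda r: (r["qid"], -r["sim"]))  # qid asc, sim desc
        return run

    run = [
        {"qid": 31, "docno": "WSJ0001", "sim": 0.8, "rel": 0},
        {"qid": 31, "docno": "AP0002",  "sim": 0.8, "rel": 1},
        {"qid": 31, "docno": "FT0003",  "sim": 0.5, "rel": 0},
    ]
    for s in ("conventional", "realistic", "optimistic"):
        print(s, [r["docno"] for r in reorder(run, s)])
    # Only the optimistic strategy promotes AP0002 above WSJ0001 here;
    # evaluating the same run under realistic and optimistic reordering
    # brackets the measure between its worst and best tie-breaking cases.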

  12. Effect of the Tie-Breaking Bias. Outline • Motivation: A tale about two TREC participants • Context: IRS effectiveness evaluation • Issue: Tie-breaking bias effects • Contribution: Reordering strategies • Experiments: Impact of the tie-breaking bias • Conclusion and future work

  13. 4. Experiments – Impact of the tie-breaking bias. Effect of the tie-breaking bias • Study of 4 TREC tasks (adhoc, routing, filtering, web) over 22 editions spanning 1993-2009: 1,360 runs, 3 GB of data from trec.nist.gov • Assessing the effect of tie-breaking: proportion of document ties (how frequent is the bias?); effect on measure values (top 3 observed differences, observed difference in %); significance of the observed difference with a Student’s t-test (paired, one-tailed)
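
For the significance test, a sketch with SciPy: a paired, one-tailed Student's t-test on per-topic measure values under two reorderings (the AP values below are made up for illustration):

    from scipy import stats

    # hypothetical per-topic AP values for one run under two reorderings
    ap_conventional = [0.12, 0.30, 0.25, 0.08, 0.40]
    ap_realistic    = [0.10, 0.30, 0.22, 0.08, 0.37]

    t, p_two_sided = stats.ttest_rel(ap_conventional, ap_realistic)
    # halve the two-sided p-value for the one-tailed (unilateral) test
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    print(t, p_one_sided)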

  14. 4. Experiments – Impact of the tie-breaking bias. Ties demographics • 89.6% of the runs contain ties • Ties are present all along the runs

  15. 4. Experiments – Impact of the tie-breaking bias. Proportion of tied documents in submitted runs • On average, 25.2% of a result list consists of tied documents • On average, a tied group contains 10.6 documents
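
These two statistics are straightforward to compute from a run; a sketch under the same assumed record layout as above, where a tied group is a maximal set of documents sharing one sim value for one topic:

    from collections import Counter

    def tie_stats(run):
        """Return (share of tied documents, mean size of a tied group)."""
        groups = Counter((r["qid"], r["sim"]) for r in run)
        tied = [n for n in groups.values() if n > 1]
        share = sum(tied) / len(run)
        mean_group = sum(tied) / len(tied) if tied else 0.0
        return share, mean_group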

  16. 4. Experiments – Impact of the tie-breaking bias. Effect on Reciprocal Rank (RR)

  17. 4. Experiments – Impact of the tie-breaking bias. Effect on Average Precision (AP)

  18. 4. Experiments – Impact of the tie-breaking bias. Effect on Mean Average Precision (MAP) • The difference in system ranks computed on MAP is not significant (Kendall’s τ)
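
The rank-correlation check can be sketched with SciPy's Kendall's τ, comparing the system orderings induced by MAP under two reorderings (the MAP values below are made up):

    from scipy.stats import kendalltau

    map_conventional = {"sysA": 0.31, "sysB": 0.28, "sysC": 0.25}
    map_realistic    = {"sysA": 0.30, "sysB": 0.28, "sysC": 0.24}

    systems = sorted(map_conventional)
    tau, p = kendalltau([map_conventional[s] for s in systems],
                        [map_realistic[s] for s in systems])
    print(tau, p)  # tau near 1 means the system ranking barely moves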

  19. 4. Experiments – Impact of the tie-breaking bias. What we learnt: beware of tie-breaking for AP • Small effect on MAP, larger effect on AP • Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic • Failure analysis for the ranking process: an error bar is an element of chance, hence potential for improvement (e.g., run padre1, adhoc’94)

  20. 4. Experiments – Impact of the tie-breaking bias. Related works in IR evaluation • Topics reliability? [Buckley & Voorhees, 2000] ≥ 25 topics; [Voorhees & Buckley, 2002] error rate; [Voorhees, 2009] n collections • Qrels reliability? [Voorhees, 1998] quality; [Al-Maskari et al., 2008] TREC vs. TREC; [Voorhees, 2007] • Measures reliability? [Buckley & Voorhees, 2000] MAP; [Sakai, 2008] ‘system bias’; [Moffat & Zobel, 2008] new measures; [Raghavan et al., 1989] Precall; [McSherry & Najork, 2008] tied scores • Pooling reliability? [Zobel, 1998] approximation; [Sanderson & Joho, 2004] manual; [Buckley et al., 2007] size adaptation • This work: [Cabanac et al., 2010] tie-breaking bias

  21. Effect of the Tie-Breaking Bias. Outline • Motivation: A tale about two TREC participants • Context: IRS effectiveness evaluation • Issue: Tie-breaking bias effects • Contribution: Reordering strategies • Experiments: Impact of the tie-breaking bias • Conclusion and future work

  22. Effect of the Tie-Breaking Bias in IR Evaluation. Conclusions and future work • Context: IR evaluation, i.e. TREC and other campaigns based on trec_eval • Contributions: measure = f(intrinsic_quality, luck), the tie-breaking bias; measure bounds (realistic ≤ conventional ≤ optimistic); study of the tie-breaking bias effect, comparing (conventional, realistic) for RR, AP and MAP: strong correlation, yet significant differences; no difference on system rankings (based on MAP) • Future work: study of other / more recent evaluation campaigns; reordering-free measures; finer-grained analyses (finding vs. ranking)

  23. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy. Thank you
