
PRES: A Score Metric for Evaluating Recall-Oriented IR Applications


Presentation Transcript


  1. PRES: A Score Metric for Evaluating Recall-Oriented IR Applications
     Walid Magdy and Gareth Jones, Dublin City University
     SIGIR, 22 July 2010

  2. Recall-Oriented IR
     • Examples: patent search and legal search
     • Objective: find all possible relevant documents
     • Search: takes much longer
     • Users: professionals, and more patient
     • IR campaigns: NTCIR, TREC, CLEF
     • Evaluation: mainly MAP(!)

  3. Current Evaluation Metrics
     For a topic with 4 relevant documents, where the first 100 retrieved documents are to be checked:
       System 1: relevant ranks = {1}
       System 2: relevant ranks = {50, 51, 53, 54}
       System 3: relevant ranks = {1, 2, 3, 4}
       System 4: relevant ranks = {1, 98, 99, 100}

       Metric               System 1   System 2   System 3   System 4
       AP                   0.25       0.0481     1          0.2727
       Recall               0.25       1          1          1
       F1 (P@100, Recall)   0.0192     0.0769     0.0769     -
       F1 (AP, Recall)      0.25       0.0917     1          -
       F4 (AP, Recall)      0.25       0.462      1          0.864
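     A minimal sketch (not the authors' code) of the metric definitions behind this table, applied to the four example systems. Reading the first F1 row as F1 over (precision@100, recall) and the other F rows as F-measures over (AP, recall) is an inference from the quoted numbers, not something the transcript states; the function names below are mine.

         def average_precision(rel_ranks, n_rel):
             # AP over all n_rel relevant documents; un-retrieved ones contribute 0.
             return sum((i + 1) / r for i, r in enumerate(sorted(rel_ranks))) / n_rel

         def recall(rel_ranks, n_rel):
             return len(rel_ranks) / n_rel

         def precision_at(rel_ranks, cutoff):
             return sum(1 for r in rel_ranks if r <= cutoff) / cutoff

         def f_beta(p, r, beta):
             # Weighted harmonic mean; beta > 1 favours the recall-like component.
             return 0.0 if p == r == 0 else (1 + beta**2) * p * r / (beta**2 * p + r)

         systems = {
             "System 1": [1],
             "System 2": [50, 51, 53, 54],   # AP with these ranks comes out near 0.047; the slide quotes 0.0481
             "System 3": [1, 2, 3, 4],
             "System 4": [1, 98, 99, 100],
         }
         n_rel, n_max = 4, 100

         for name, ranks in systems.items():
             ap, r, p = average_precision(ranks, n_rel), recall(ranks, n_rel), precision_at(ranks, n_max)
             print(f"{name}: AP={ap:.4f}  R={r:.2f}  F1(P,R)={f_beta(p, r, 1):.4f}  "
                   f"F1(AP,R)={f_beta(ap, r, 1):.4f}  F4(AP,R)={f_beta(ap, r, 4):.4f}")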

  4. Normalized Recall (Rnorm)
     Rnorm is the area between the actual case and the worst case, taken as a proportion of the area between the best case and the worst case.
       N: collection size
       n: number of relevant documents
       ri: the rank at which the i-th relevant document is retrieved
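     The Rnorm equation itself did not survive the transcript (it appeared on the slide as an image); the standard definition of normalized recall, in terms of the variables listed above, is:

         R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n\,(N - n)}

     i.e. the shortfall of the actual ranking from the best case (relevant documents at ranks 1..n), normalized by the gap between the worst case and the best case.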

  5. Applicability of Rnorm
     • Rnorm requires the following:
       • a known collection size (N)
       • the number of relevant documents, i.e. the qrels (n)
       • retrieving documents until 100% recall is reached (to observe every ri)
     • Workaround:
       • un-retrieved relevant documents are treated as the worst case
       • for large-scale document collections this gives Rnorm ≈ Recall (see the check below)
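     A one-line check of the last point, using the Rnorm formula above: under the workaround, the n(1-R) un-retrieved relevant documents sit at worst-case ranks close to N, so they dominate the numerator, and

         R_{norm} ≈ 1 - \frac{n(1-R)\,N}{n\,(N-n)} \to 1 - (1-R) = R

     for N much larger than n and than the ranks of the retrieved documents.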

  6. Rnorm Modification: PRES (Patent Retrieval Evaluation Score)
     • The worst case is redefined in terms of the user's maximum effort: N_worst_case = Nmax + n
     • For recall = 1:  n/Nmax ≤ PRES ≤ 1
     • For recall = R:  nR²/Nmax ≤ PRES ≤ R
     [Figure: score curves comparing the modified Rnorm (Rnorm|M) and PRES]
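     The closed-form definition of PRES is not in the transcript; a reconstruction consistent with the bounds quoted above is:

         PRES = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} r_i - \frac{n+1}{2}}{N_{max}}

     where a relevant document that is not retrieved within the first Nmax results is assigned the worst-case rank Nmax + i (with i its index among the n relevant documents); this is the reading under which the two bounds above hold exactly.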

  7. PRES Performance
     For the same topic with 4 relevant documents and the first 100 documents to be checked (n = 4, Nmax = 100):
       System 1: relevant ranks = {1}
       System 2: relevant ranks = {50, 51, 53, 54}
       System 3: relevant ranks = {1, 2, 3, 4}
       System 4: relevant ranks = {1, 98, 99, 100}
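     The PRES scores for these four systems were shown graphically on the slide and are not recoverable from the transcript. The sketch below (not the authors' evaluation tool) computes them from the reconstructed formula above, with the same assumption about the ranks assigned to un-retrieved relevant documents:

         def pres(rel_ranks, n_rel, n_max):
             # PRES = 1 - (mean rank of the relevant docs - (n+1)/2) / Nmax.
             ranks = sorted(rel_ranks)
             # Assumed worst-case ranks for relevant documents missing from the top Nmax.
             ranks += [n_max + i for i in range(len(ranks) + 1, n_rel + 1)]
             return 1 - (sum(ranks) / n_rel - (n_rel + 1) / 2) / n_max

         systems = {
             "System 1": [1],
             "System 2": [50, 51, 53, 54],
             "System 3": [1, 2, 3, 4],
             "System 4": [1, 98, 99, 100],
         }
         for name, ranks in systems.items():
             print(f"{name}: PRES = {pres(ranks, n_rel=4, n_max=100):.3f}")

     Under this reading System 3 scores 1.0 and System 2 overtakes System 1, illustrating the recall-oriented behaviour the slide contrasts with MAP.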

  8. Average Performance
     • 48 runs in CLEF-IP 2009
     • PRES vs. MAP vs. Recall
     • Change in scores
     • Change in ranking
     • Nmax = 1000

       Correlation   MAP    Recall
       PRES          0.66   0.87
       MAP           -      0.56
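     The slide does not say which correlation coefficient the table reports. A minimal sketch for producing such numbers from per-run scores, covering both a score correlation (Pearson) and a system-ranking correlation (Kendall's tau); the arrays are hypothetical placeholders standing in for the 48 CLEF-IP 2009 runs:

         from scipy.stats import pearsonr, kendalltau

         # Placeholder per-run scores (one value per run); the real study used 48 runs with Nmax = 1000.
         map_scores  = [0.12, 0.08, 0.21, 0.15]
         pres_scores = [0.43, 0.35, 0.58, 0.47]

         score_corr, _ = pearsonr(map_scores, pres_scores)    # "change in scores"
         rank_corr, _  = kendalltau(map_scores, pres_scores)  # "change in ranking"
         print(f"Pearson r = {score_corr:.2f}, Kendall tau = {rank_corr:.2f}")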

  9. PRES
     • Designed for recall-oriented applications
     • Gives a higher score to systems that achieve higher recall and a better average relative ranking of the relevant documents
     • Designed for laboratory testing
     • Dependent on the user's potential/effort (Nmax)
     • To be applied in CLEF-IP 2010
     • Get PRES eval from: www.computing.dcu.ie/~wmagdy/PRES.htm

  10. What should I say next? • Let me check…
      MAP system:    Thank bla bla bla bla bla bla bla bla bla …
      Recall system: Thank you bla bla bla bla bla bla bla bla … Thank you bla bla bla bla bla bla bla bla …
      PRES system:   Thank you
      What should I say?
