SIGIR, 22 July 2010
PRES: A Score Metric for Evaluating Recall-Oriented IR Applications
Walid Magdy, Gareth Jones
Dublin City University
Recall-Oriented IR
• Examples: patent search and legal search
• Objective: find all possible relevant documents
• Search: takes much longer
• Users: professionals, and more patient
• IR Campaigns: NTCIR, TREC, CLEF
• Evaluation: mainly MAP!
Current Evaluation Metrics
For a topic with 4 relevant docs, where the first 100 retrieved docs are checked:
• System1: relevant ranks = {1}
• System2: relevant ranks = {50, 51, 53, 54}
• System3: relevant ranks = {1, 2, 3, 4}
• System4: relevant ranks = {1, 98, 99, 100}

Metric          System1   System2   System3   System4
AP              0.25      0.0481    1         0.2727
Recall          0.25      1         1         1
F1 (P@100, R)   0.0192    0.0769    0.0769    —
F1 (AP, R)      0.25      0.0917    1         —
F4 (AP, R)      0.25      0.462     1         0.864

Even F4 gives System4 a high score (0.864) although three of its four relevant docs sit at the very bottom of the checked list; a sketch reproducing these numbers follows.
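These values follow from the standard definitions; below is a minimal Python sketch that reproduces the table. The metric pairings in the row labels above (F1 over P@100 and Recall, and F1/F4 over AP and Recall) are inferred from the arithmetic, not stated on the slide.

```python
def average_precision(ranks, n_rel):
    """AP over all n_rel relevant docs; un-retrieved docs contribute 0."""
    return sum((i + 1) / r for i, r in enumerate(sorted(ranks))) / n_rel

def f_measure(p, r, beta=1.0):
    """F-beta; the slide applies it both to (P@100, R) and to (AP, R)."""
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

n_rel, cutoff = 4, 100
systems = {
    "System1": [1],
    "System2": [50, 51, 53, 54],  # recomputed AP is ~0.0475 vs 0.0481 on the slide
    "System3": [1, 2, 3, 4],
    "System4": [1, 98, 99, 100],
}
for name, ranks in systems.items():
    ap, recall = average_precision(ranks, n_rel), len(ranks) / n_rel
    p_at_cutoff = len(ranks) / cutoff
    print(name,
          round(ap, 4), recall,
          round(f_measure(p_at_cutoff, recall), 4),  # F1 (P@100, R)
          round(f_measure(ap, recall), 4),           # F1 (AP, R)
          round(f_measure(ap, recall, beta=4), 3))   # F4 (AP, R)
```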
Normalized Recall (Rnorm)
Rnorm is the area between the actual case and the worst case, taken as a proportion of the area between the best case and the worst case.
• N: collection size
• n: number of relevant docs
• ri: the rank at which the i-th relevant document is retrieved
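Written out from these definitions, the standard normalized recall is:

```latex
% Area between the actual and worst cases, over the area between the
% best and worst cases; equals 1 for ranks 1..n, 0 for ranks N-n+1..N.
R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i \;-\; \sum_{i=1}^{n} i}{n\,(N - n)}
```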
Applicability of Rnorm
Rnorm requires the following:
• Known collection size (N)
• Known number of relevant documents (qrels) (n)
• Retrieving documents until reaching 100% recall (ri)
Workaround:
• Un-retrieved relevant docs are scored as in the worst case
• For large-scale document collections: Rnorm ≈ Recall
Rnorm Modification: PRES (Patent Retrieval Evaluation Score)
• Worst case redefined relative to the user's effort: Nworst_case = Nmax + n, where Nmax is the maximum number of retrieved docs the user is expected to check
• For recall = 1: n/Nmax ≤ PRES ≤ 1
• For recall = R: nR²/Nmax ≤ PRES ≤ R
[Figure: best-case and worst-case rank curves for the modified Rnorm (Rnorm|M), i.e. PRES]
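Substituting the worst case Nworst_case = Nmax + n into Rnorm gives the PRES formula. The convention below for un-retrieved relevant docs (the i-th relevant doc, if missing, is placed at rank Nmax + i) is the one that yields the bounds quoted above:

```latex
% PRES: Rnorm with the worst case set to all n relevant docs ranked
% just after the user's cut-off N_max.
PRES = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} r_i \;-\; \frac{n+1}{2}}{N_{max}},
\qquad
r_i =
\begin{cases}
  \text{rank of the } i\text{-th relevant doc} & \text{if within } N_{max}\\
  N_{max} + i & \text{otherwise}
\end{cases}
```

As a check: with recall 1 and the worst possible placement (ranks Nmax−n+1, …, Nmax) this evaluates to n/Nmax, matching the lower bound above; with all relevant docs at ranks 1..n it evaluates to 1.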
PRES Performance
Same topic as before: 4 relevant docs, first 100 retrieved docs checked (n = 4, Nmax = 100):
• System1: relevant ranks = {1}
• System2: relevant ranks = {50, 51, 53, 54}
• System3: relevant ranks = {1, 2, 3, 4}
• System4: relevant ranks = {1, 98, 99, 100}
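The slide's chart of PRES values for these systems isn't recoverable from the text, so here is a minimal sketch that recomputes them under the missing-document convention above; the values in the trailing comment are recomputed, not taken from the slide.

```python
def pres(found_ranks, n_rel, n_max):
    """PRES (Magdy & Jones, SIGIR 2010): un-retrieved relevant docs
    take the worst-case ranks n_max + i."""
    ranks = sorted(found_ranks)
    missing = [n_max + i for i in range(len(ranks) + 1, n_rel + 1)]
    mean_rank = sum(ranks + missing) / n_rel
    return 1 - (mean_rank - (n_rel + 1) / 2) / n_max

n_rel, n_max = 4, 100
for name, ranks in {
    "System1": [1],
    "System2": [50, 51, 53, 54],
    "System3": [1, 2, 3, 4],
    "System4": [1, 98, 99, 100],
}.items():
    print(name, round(pres(ranks, n_rel, n_max), 4))
# Recomputed: System1 0.25, System2 0.505, System3 1.0, System4 0.28.
# System4's late hits now cost it, unlike with AP or F4.
```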
Average Performance
• 48 runs in CLEF-IP 2009, Nmax = 1000
• PRES vs MAP vs Recall: change in scores and change in ranking of runs
• Correlations: PRES–MAP = 0.66, PRES–Recall = 0.87, MAP–Recall = 0.56
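The slide doesn't state which correlation coefficient was used; a sketch of how score correlation and ranking correlation across runs might be computed is below. The per-run scores here are placeholders for illustration only, not CLEF-IP data.

```python
from scipy.stats import kendalltau, pearsonr

# Placeholder per-run scores; the actual study used the 48 CLEF-IP 2009
# runs with n_max = 1000.
map_scores = [0.12, 0.08, 0.15, 0.05, 0.10]
pres_scores = [0.55, 0.40, 0.60, 0.30, 0.48]

print("score correlation:  ", pearsonr(map_scores, pres_scores)[0])
print("ranking correlation:", kendalltau(map_scores, pres_scores)[0])
```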
PRES
• Designed for recall-oriented applications
• Gives higher scores to systems achieving higher recall and better average relative ranking
• Designed for laboratory testing
• Dependent on the user's expected effort (Nmax)
• To be applied in CLEF-IP 2010
• Get PRESeval from: www.computing.dcu.ie/~wmagdy/PRES.htm
What should I say next? Let me check…
[Gag closing slide: a MAP system, a Recall system, and a PRES system each retrieve the talk's ending from a stream of "bla bla" filler, with "Thank you" as the relevant result]
Thank you