SIGIR, 22 July 2010
PRES: A Score Metric for Evaluating Recall-Oriented IR Applications
Walid Magdy, Gareth Jones
Dublin City University
Recall-Oriented IR
• Examples: patent search and legal search
• Objective: find all possible relevant documents
• Search: takes much longer
• Users: professionals, and more patient
• IR Campaigns: NTCIR, TREC, CLEF
• Evaluation: mainly MAP!
Current Evaluation Metrics
For a topic with 4 relevant docs, where the first 100 retrieved docs are checked:
• System1: relevant ranks = {1}
• System2: relevant ranks = {50, 51, 53, 54}
• System3: relevant ranks = {1, 2, 3, 4}
• System4: relevant ranks = {1, 98, 99, 100}

Metric          System1   System2   System3   System4
AP              0.25      0.0481    1         0.2727
Recall          0.25      1         1         1
F1 (P@100, R)   0.0192    0.0769    0.0769    —
F1 (AP, R)      0.25      0.0917    1         —
F4 (AP, R)      0.25      0.462     1         0.864

Even F4 gives System4 a high score (0.864) although three of its four relevant docs sit at the very bottom of the checked list; a sketch reproducing these numbers follows.
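These values follow from the standard definitions; below is a minimal Python sketch that reproduces the table. The metric pairings in the row labels above (F1 over P@100 and Recall, and F1/F4 over AP and Recall) are inferred from the arithmetic, not stated on the slide.

```python
def average_precision(ranks, n_rel):
    """AP over all n_rel relevant docs; un-retrieved docs contribute 0."""
    return sum((i + 1) / r for i, r in enumerate(sorted(ranks))) / n_rel

def f_measure(p, r, beta=1.0):
    """F-beta; the slide applies it both to (P@100, R) and to (AP, R)."""
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

n_rel, cutoff = 4, 100
systems = {
    "System1": [1],
    "System2": [50, 51, 53, 54],  # recomputed AP is ~0.0475 vs 0.0481 on the slide
    "System3": [1, 2, 3, 4],
    "System4": [1, 98, 99, 100],
}
for name, ranks in systems.items():
    ap, recall = average_precision(ranks, n_rel), len(ranks) / n_rel
    p_at_cutoff = len(ranks) / cutoff
    print(name,
          round(ap, 4), recall,
          round(f_measure(p_at_cutoff, recall), 4),  # F1 (P@100, R)
          round(f_measure(ap, recall), 4),           # F1 (AP, R)
          round(f_measure(ap, recall, beta=4), 3))   # F4 (AP, R)
```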
Normalized Recall (Rnorm)
Rnorm is the area between the actual case and the worst case, taken as a proportion of the area between the best case and the worst case.
• N: collection size
• n: number of relevant docs
• ri: the rank at which the i-th relevant document is retrieved
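Written out from these definitions, the standard normalized recall is:

```latex
% Area between the actual and worst cases, over the area between the
% best and worst cases; equals 1 for ranks 1..n, 0 for ranks N-n+1..N.
R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i \;-\; \sum_{i=1}^{n} i}{n\,(N - n)}
```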
Applicability of Rnorm
Rnorm requires the following:
• Known collection size (N)
• Known number of relevant documents (qrels) (n)
• Retrieving documents until reaching 100% recall (ri)
Workaround:
• Un-retrieved relevant docs are scored as in the worst case
• For large-scale document collections: Rnorm ≈ Recall
Rnorm Modification: PRES (Patent Retrieval Evaluation Score)
• Worst case redefined relative to the user's effort: Nworst_case = Nmax + n, where Nmax is the maximum number of retrieved docs the user is expected to check
• For recall = 1: n/Nmax ≤ PRES ≤ 1
• For recall = R: nR²/Nmax ≤ PRES ≤ R
[Figure: best-case and worst-case rank curves for the modified Rnorm (Rnorm|M), i.e. PRES]
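Substituting the worst case Nworst_case = Nmax + n into Rnorm gives the PRES formula. The convention below for un-retrieved relevant docs (the i-th relevant doc, if missing, is placed at rank Nmax + i) is the one that yields the bounds quoted above:

```latex
% PRES: Rnorm with the worst case set to all n relevant docs ranked
% just after the user's cut-off N_max.
PRES = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} r_i \;-\; \frac{n+1}{2}}{N_{max}},
\qquad
r_i =
\begin{cases}
  \text{rank of the } i\text{-th relevant doc} & \text{if within } N_{max}\\
  N_{max} + i & \text{otherwise}
\end{cases}
```

As a check: with recall 1 and the worst possible placement (ranks Nmax−n+1, …, Nmax) this evaluates to n/Nmax, matching the lower bound above; with all relevant docs at ranks 1..n it evaluates to 1.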
PRES Performance
Same topic as before: 4 relevant docs, first 100 retrieved docs checked (n = 4, Nmax = 100):
• System1: relevant ranks = {1}
• System2: relevant ranks = {50, 51, 53, 54}
• System3: relevant ranks = {1, 2, 3, 4}
• System4: relevant ranks = {1, 98, 99, 100}
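The slide's chart of PRES values for these systems isn't recoverable from the text, so here is a minimal sketch that recomputes them under the missing-document convention above; the values in the trailing comment are recomputed, not taken from the slide.

```python
def pres(found_ranks, n_rel, n_max):
    """PRES (Magdy & Jones, SIGIR 2010): un-retrieved relevant docs
    take the worst-case ranks n_max + i."""
    ranks = sorted(found_ranks)
    missing = [n_max + i for i in range(len(ranks) + 1, n_rel + 1)]
    mean_rank = sum(ranks + missing) / n_rel
    return 1 - (mean_rank - (n_rel + 1) / 2) / n_max

n_rel, n_max = 4, 100
for name, ranks in {
    "System1": [1],
    "System2": [50, 51, 53, 54],
    "System3": [1, 2, 3, 4],
    "System4": [1, 98, 99, 100],
}.items():
    print(name, round(pres(ranks, n_rel, n_max), 4))
# Recomputed: System1 0.25, System2 0.505, System3 1.0, System4 0.28.
# System4's late hits now cost it, unlike with AP or F4.
```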
Average Performance
• 48 runs in CLEF-IP 2009, Nmax = 1000
• PRES vs MAP vs Recall: change in scores and change in ranking of runs
• Correlations: PRES–MAP = 0.66, PRES–Recall = 0.87, MAP–Recall = 0.56
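The slide doesn't state which correlation coefficient was used; a sketch of how score correlation and ranking correlation across runs might be computed is below. The per-run scores here are placeholders for illustration only, not CLEF-IP data.

```python
from scipy.stats import kendalltau, pearsonr

# Placeholder per-run scores; the actual study used the 48 CLEF-IP 2009
# runs with n_max = 1000.
map_scores = [0.12, 0.08, 0.15, 0.05, 0.10]
pres_scores = [0.55, 0.40, 0.60, 0.30, 0.48]

print("score correlation:  ", pearsonr(map_scores, pres_scores)[0])
print("ranking correlation:", kendalltau(map_scores, pres_scores)[0])
```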
PRES
• Designed for recall-oriented applications
• Gives higher scores to systems achieving higher recall and better average relative ranking
• Designed for laboratory testing
• Dependent on the user's expected effort (Nmax)
• To be applied in CLEF-IP 2010
• Get PRESeval from: www.computing.dcu.ie/~wmagdy/PRES.htm
What should I say next? Let me check…
[Gag closing slide: a MAP system, a Recall system, and a PRES system each retrieve the talk's ending from a stream of "bla bla" filler, with "Thank you" as the relevant result]
Thank you