This chapter covers the key performance measures in information retrieval, including precision, recall, R-precision, and user-oriented metrics; evaluation considerations such as query interfaces, real versus synthetic data, and scalability; precision-recall curves, the harmonic mean, and coverage and novelty ratios; and the TREC benchmark, which evaluates techniques such as relevance feedback, natural language processing, and cross-language retrieval.
Modern Information Retrieval, Chapter 3: Retrieval Evaluation
The most common measures of system performance are time and space • there is an inherent tradeoff between them • Data retrieval • time and space, e.g., for indexing • Information retrieval • the precision of the answer set is also important
evaluation considerations • queries with/without relevance feedback • query interface design • real vs. synthetic data • real-life vs. laboratory environment • repeatability and scalability
recall and precision • recall: the fraction of the relevant documents which have been retrieved • precision: the fraction of the retrieved documents which are relevant
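In the chapter's notation, with $R$ the set of documents relevant to a query, $A$ the answer set retrieved by the system, and $R_a = R \cap A$ the relevant documents in the answer set:

$$\text{recall} = \frac{|R_a|}{|R|} \qquad\qquad \text{precision} = \frac{|R_a|}{|A|}$$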
can we compute precision exactly? can we compute recall exactly? • computing recall exactly requires knowing the complete set of relevant documents, which is rarely available for large collections
precision versus recall curve: a standard evaluation strategy
interpolation procedure for generating the 11 standard recall levels • example: $R_q = \{d_3, d_{56}, d_{129}\}$ • the interpolated precision at the $j$-th standard recall level $r_j = j/10$, $j \in \{0,1,2,\dots,10\}$, is the maximum known precision at any recall level $r \ge r_j$: $$P(r_j) = \max_{r \ge r_j} P(r)$$ (see the sketch below)
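A minimal Python sketch of the interpolation (the function name and the ranks used in the example are my own illustration, not the chapter's figures):

```python
def interpolated_11pt(ranking, relevant):
    """11-point interpolated precision for a single query.

    ranking  -- list of document ids in the order they were retrieved
    relevant -- set Rq of all documents relevant to the query
    """
    # (recall, precision) observed each time a relevant document is seen
    points, seen = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            seen += 1
            points.append((seen / len(relevant), seen / rank))
    # P(r_j) = max known precision at any recall level >= r_j
    return [max((p for r, p in points if r >= j / 10), default=0.0)
            for j in range(11)]

# Rq = {d3, d56, d129}; hypothetical ranking with the relevant
# documents appearing at ranks 1, 3, and 15
ranking = ["d56", "x1", "d129"] + [f"x{i}" for i in range(2, 13)] + ["d3"]
print(interpolated_11pt(ranking, {"d3", "d56", "d129"}))
# -> [1.0, 1.0, 1.0, 1.0, 0.67, 0.67, 0.67, 0.2, 0.2, 0.2, 0.2] (rounded)
```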
to evaluate the retrieval strategy over all test queries, the precisions at each recall level are averaged
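Concretely, for $N_q$ test queries the average precision at recall level $r$ is

$$\bar{P}(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$$

where $P_i(r)$ is the (interpolated) precision at recall level $r$ for the $i$-th query.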
another approach: compute the average precision at given document cutoff values, e.g., after 5, 10, 20, 50, or 100 documents have been retrieved (see the sketch below) • advantages?
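A sketch of precision at fixed document cutoffs (the function name and the default cutoff values are my own; the cutoffs shown are ones commonly reported):

```python
def precision_at_cutoffs(ranking, relevant, cutoffs=(5, 10, 20, 50, 100)):
    """Precision after k documents have been retrieved, for each cutoff k."""
    return {k: sum(1 for doc in ranking[:k] if doc in relevant) / k
            for k in cutoffs}
```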
single value summary for each query • average precision at seen relevant documents: average the precision values observed each time a new relevant document enters the ranking • example in Figure 3.2 • favors systems which retrieve relevant documents quickly (early in the ranking) • a system can score well by this measure and still have poor overall recall • R-precision: precision at position R of the ranking, where R is the total number of relevant documents • examples in Figures 3.2 and 3.3 • sketches of both measures below
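Minimal sketches of both single-value summaries (the function names are mine; following the measure's name, the first averages over the relevant documents actually seen in the ranking):

```python
def avg_precision_seen_relevant(ranking, relevant):
    """Average of the precision values observed each time a new
    relevant document appears in the ranking."""
    precisions, seen = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            seen += 1
            precisions.append(seen / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def r_precision(ranking, relevant):
    """Precision at position R of the ranking, where R is the
    total number of documents relevant to the query."""
    R = len(relevant)
    return sum(1 for doc in ranking[:R] if doc in relevant) / R
```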
combining recall and precision • the harmonic mean • it assumes a high value only when both recall and precision are high
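With $r(j)$ and $P(j)$ the recall and precision at the $j$-th position in the ranking, the harmonic mean is

$$F(j) = \frac{2}{\frac{1}{r(j)} + \frac{1}{P(j)}}$$

$F(j)$ lies in $[0,1]$: it is 0 when no relevant document has been retrieved and 1 only when all retrieved documents are relevant and all relevant documents have been retrieved.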
the E measure: lets the user specify, through a parameter b, the relative importance attached to recall and precision • b=1: complement of the harmonic mean • b>1: the user is more interested in precision • b<1: the user is more interested in recall
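As given in the chapter, with the same $r(j)$ and $P(j)$ as above:

$$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{r(j)} + \frac{1}{P(j)}}$$

For $b = 1$ this reduces to $E(j) = 1 - F(j)$, the complement of the harmonic mean.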
coverage ratio: the fraction of the documents known to be relevant which have been retrieved • a high coverage ratio means the system finds most of the relevant documents the user expected to see
novelty ratio: the fraction of the relevant documents retrieved which were previously unknown to the user • a high novelty ratio means the system reveals many relevant documents the user was not aware of
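With $U$ the set of relevant documents known to the user before the search, $R_k$ the retrieved relevant documents the user already knew, and $R_u$ the retrieved relevant documents previously unknown to the user, these two ratios can be written as

$$\text{coverage} = \frac{|R_k|}{|U|} \qquad\qquad \text{novelty} = \frac{|R_u|}{|R_u| + |R_k|}$$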
relative recall: the ratio between the number of relevant documents found and the number of relevant documents the user expected to find:

$$\text{relative recall} = \frac{\text{number of relevant documents found}}{\text{number of relevant documents the user expected to find}}$$

when relative recall equals 1 (the user has found enough relevant documents), the user stops searching
recall effort: the ratio between the number of relevant documents the user expected to find and the number of documents that had to be examined to find them • research in IR • lacks a solid formal framework • lacks robust and consistent testbeds and benchmarks • the Text REtrieval Conference (TREC) was created to address these shortcomings
retrieval techniques evaluated at TREC • methods using automatic thesauri • sophisticated term weighting • natural language techniques • relevance feedback • advanced pattern matching • document collection • over 1 million documents • newspaper articles, patents, etc. • topics (statements of information need) • written in natural language • conversion into an actual system query is done by the system itself
relevant documents • the pooling method: for each topic, take the union of the top k documents returned by each participating system and have human assessors judge the relevance of the pooled documents; documents outside the pool are treated as non-relevant (a minimal sketch follows below) • the benchmark tasks • ad hoc task • filtering task • Chinese • cross-language retrieval • spoken document retrieval • high precision • very large collection
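A minimal sketch of pool construction for a single topic (the function name and the runs are hypothetical):

```python
def build_pool(runs, k=100):
    """Pooling method: union of the top-k documents from each
    participating system's ranked run for one topic.  Only pooled
    documents are judged by assessors; the rest are treated as
    non-relevant."""
    pool = set()
    for ranking in runs:              # one ranked list per system
        pool.update(ranking[:k])
    return pool

# hypothetical runs from three systems for one topic
runs = [["d1", "d7", "d3"], ["d7", "d2", "d9"], ["d3", "d1", "d5"]]
print(sorted(build_pool(runs, k=2)))  # ['d1', 'd2', 'd3', 'd7']
```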
evaluation measures used at TREC • summary table statistics: number of documents retrieved, number of relevant documents retrieved, number of relevant documents not retrieved, etc. • recall-precision averages: interpolated precision at the 11 standard recall levels • document level averages: average precision at seen relevant documents • average precision histogram