Evaluation of Relevance Feedback Algorithms for XML Retrieval Silvana Solomon 27 February 2007 Supervisor: Dr. Ralf Schenkel
Outline • Short introduction • Motivation & Goals • Evaluating retrieval effectiveness • INEX tool • Evaluation methodology • Results Evaluation of RF Algorithms for XML Retrieval
Introduction • Relevance feedback loop with an XML search engine: (1) query, (2) results, (3) feedback, (4) expanded query, (5) results of expanded query • [Figure: example article tree; nodes: article, frontmatter, body, backmatter, sec, subsec, p, author „Ian Ruthven“, citation „D. Harman“; sample text: „The IR process is composed…“, „For small collections…“, „Figure 1 outlines…“; labels: path to the result, content of result]
Motivation • Best way to compare feedback algorithms? • Cannot use standard evaluation tools on feedback results • Goals: • Analyze evaluation methods • Develop an evaluation tool
Evaluating Retrieval Effectiveness • INEX: INitiative for the Evaluation of XML Retrieval • 2006 document collection: 600,000 Wikipedia documents • Components: document collection, topic set, human assessors, assessment set, metrics
INEX Tool: EvalJ • Tool for evaluation of information retrieval experiments • Implements a set of metrics used for evaluation • Limitation: cannot measure improvement of runs produced with feedback
RF Evaluation – Ranking Effect • relevant results among the top results are marked as feedback • feedback runs push the known relevant results to the top of the element ranking • this artificially improves recall-precision (RP) figures • [Figure: example element rankings with entries such as doc[3], doc[8]/bdy[1]/article[3], doc[7]/article[3], doc[2]/bdy[1]/article[1]]
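The ranking effect can be illustrated with a minimal sketch. It uses precision at k as a stand-in metric (an assumption; the slides refer to RP figures and the INEX metrics), and the runs and relevance judgements below are hypothetical, made up for illustration only.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for r in ranking[:k] if r in relevant) / k

# Hypothetical baseline run and assessments (not taken from the slides).
baseline = [
    "doc[3]",
    "doc[7]/article[3]",
    "doc[8]/bdy[1]/article[3]",
    "doc[2]/bdy[1]/article[1]",
    "doc[5]",
]
relevant = {"doc[3]", "doc[8]/bdy[1]/article[3]"}

# The user marks the relevant results among the top 3 as feedback.
known_relevant = [r for r in baseline[:3] if r in relevant]

# A feedback run that finds nothing new: it only moves the
# already known relevant results to the top of the ranking.
feedback = known_relevant + [r for r in baseline if r not in known_relevant]

print(precision_at_k(baseline, relevant, 2))  # 0.5
print(precision_at_k(feedback, relevant, 2))  # 1.0
```

The feedback run scores higher although it retrieved no relevant element the user had not already seen, which is why the improvement is called artificial.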
RF Evaluation – Feedback Effect • measure improvement on unseen relevant elements • not directly tested • Steps: mark relevant results in the top results, modify the FB run, evaluate untrained results
Evaluation Methodology (1) • Standard text IR: freezing known results at the top; assumes independent results • New approach: remove known results+X from the collection • resColl-result: remove results only (~doc retrieval) • resColl-desc: remove results+descendants • resColl-anc: remove results+ancestors • resColl-path: remove results+desc+anc • resColl-doc: remove whole doc with known results
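The five residual-collection modes listed above can be sketched as a filter over a ranked list. This is only an illustration under stated assumptions: elements are represented as hypothetical (document id, path steps) tuples, the function names `is_ancestor` and `residual` are invented here, and the filter is applied to a run, whereas a full evaluation would apply the same filter to the assessments as well.

```python
def is_ancestor(a, b):
    """True if element a is a proper ancestor of element b in the same document."""
    (doc_a, path_a), (doc_b, path_b) = a, b
    return (doc_a == doc_b
            and len(path_a) < len(path_b)
            and path_b[:len(path_a)] == path_a)

def residual(run, known, mode):
    """Drop known results, plus related elements depending on mode.

    mode is one of "result", "desc", "anc", "path", "doc".
    """
    def blocked(e):
        for k in known:
            if e == k:
                return True                      # all modes drop the result itself
            if mode in ("desc", "path") and is_ancestor(k, e):
                return True                      # e is a descendant of a known result
            if mode in ("anc", "path") and is_ancestor(e, k):
                return True                      # e is an ancestor of a known result
            if mode == "doc" and e[0] == k[0]:
                return True                      # e is in the same document
        return False
    return [e for e in run if not blocked(e)]

# Hypothetical data: one known result and a small run.
known = [("d1", ("article", "body", "sec[1]"))]
run = [
    ("d1", ("article", "body", "sec[1]")),          # the known result
    ("d1", ("article", "body", "sec[1]", "p[2]")),  # its descendant
    ("d1", ("article", "body")),                    # its ancestor
    ("d1", ("article", "body", "sec[2]")),          # sibling in the same document
    ("d2", ("article", "body", "sec[1]")),          # element of another document
]

for mode in ("result", "desc", "anc", "path", "doc"):
    print(mode, len(residual(run, known, mode)))
```

With this toy run the modes keep 4, 3, 3, 2 and 1 elements respectively, so the choice of mode directly controls how much overlapping material stays in the evaluation.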
Evaluation Methodology (2) • Freezing: block top-3
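A minimal sketch of freezing, assuming the usual formulation: the top-k results of the baseline run (here k=3, matching "block top-3") keep their ranks, and the feedback run fills the remaining ranks with those results removed. The function name `freeze` and the example runs are hypothetical.

```python
def freeze(baseline, feedback, k=3):
    """Keep the baseline top-k in place, then append the feedback run
    without the frozen results, preserving the feedback order."""
    frozen = baseline[:k]
    rest = [r for r in feedback if r not in frozen]
    return frozen + rest

# Hypothetical runs: letters stand for result elements.
baseline = ["a", "b", "c", "d", "e"]
feedback = ["c", "f", "a", "g", "d"]

print(freeze(baseline, feedback))
# ['a', 'b', 'c', 'f', 'g', 'd']
```

Because the frozen block is identical in both runs, any difference in the metric comes from ranks below it.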
Evaluation Methodology (3) • resColl-path: remove results+desc+anc
Best Evaluation Methodology? • resColl-path • [Figure: example article tree; nodes: article, frontmatter, body, backmatter, sec, subsec, p, author „Ian Ruthven“, citation „D. Harman“; sample text: „The IR process is composed…“, „For small collections…“, „Figure 1 outlines…“]
Testing Evaluated Results • Standard method: average – problems: • t-test & Wilcoxon signed-rank test: give the probability p of observing such an improvement if the feedback run were in fact no better than the baseline run • experiment significant if p<0.05 or p<0.01
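As an illustration of the significance test, the following sketch computes a one-sided exact Wilcoxon signed-rank test on paired per-topic scores using only the standard library. The function name `wilcoxon_exact` and the scores are hypothetical, and enumerating all sign assignments is only feasible for a handful of topics; for real topic sets a statistics library (for example `scipy.stats.wilcoxon` or `scipy.stats.ttest_rel`) would be the usual choice.

```python
from itertools import product

def wilcoxon_exact(baseline, feedback):
    """One-sided exact Wilcoxon signed-rank test for paired scores.

    Returns (w_plus, p), where p = P(W+ >= observed) under the null
    hypothesis that the feedback run is no better than the baseline.
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [f - b for b, f in zip(baseline, feedback) if f != b]
    n = len(diffs)

    # Rank absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null hypothesis every sign pattern is equally likely.
    count = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, count / 2 ** n

# Hypothetical per-topic scores for five topics.
base_scores = [0.20, 0.35, 0.10, 0.50, 0.42]
fb_scores = [0.25, 0.45, 0.12, 0.65, 0.43]

w, p = wilcoxon_exact(base_scores, fb_scores)
print(w, p)  # 15.0 0.03125
```

In this toy example the feedback run is better on all five topics, so p = 1/32, which is significant at the 0.05 level but not at 0.01.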
Results (1) • Evaluation mode: resColl-path
Results (2) • Comparison of evaluation techniques based on relative improvement w.r.t. baseline run
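Relative improvement with respect to the baseline run can be computed as sketched below, assuming the usual definition (feedback score minus baseline score, divided by baseline score). The scores are toy values chosen for illustration; they are not results from the talk.

```python
def relative_improvement(baseline_score, feedback_score):
    """Improvement of the feedback run as a fraction of the baseline score."""
    return (feedback_score - baseline_score) / baseline_score

# Toy (baseline, feedback) scores per evaluation technique; purely hypothetical.
toy_scores = {
    "freezing": (0.5, 0.625),
    "resColl-path": (0.25, 0.375),
}

for mode, (base, fb) in toy_scores.items():
    print(mode, relative_improvement(base, fb))
# freezing 0.25
# resColl-path 0.5
```

Comparing relative rather than absolute gains matters here because each evaluation technique changes the baseline score itself.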
Conclusions & Future Work • Evaluation based on different techniques & metrics • Correct improvement measurement • Not solved: comparing several systems with different output • Maybe a hybrid evaluation mode