
The Overlap Problem in Content-Oriented XML Retrieval Evaluation

Gabriella Kazai (1), Mounia Lalmas (1), Arjen de Vries (2). (1) Queen Mary University of London, UK; (2) CWI, The Netherlands.


Presentation Transcript


  1. The Overlap Problem in Content-Oriented XML Retrieval Evaluation • Gabriella Kazai (1), Mounia Lalmas (1), Arjen de Vries (2) • (1) Queen Mary University of London, UK • (2) CWI, The Netherlands

  2. Outline • What is overlap and why is it a problem • The INEX test collection • Problems with current INEX metrics • Proposed metrics • Conclusions and future work

  3. Overlap in XML Retrieval • Overlapping (nested) result elements in output list • Overlapping (nested) reference elements in recall-base [Figure: an article tree (article, author, title, sec, subsec, p) shown next to the assessments and a ranked result list containing nested sec and p elements]

  4. Initiative for the Evaluation of XML retrieval (INEX) • Effectiveness of content-oriented XML retrieval • Ad-hoc retrieval task • Content-only (CO): no structural hints; the XML engine has to identify the most appropriate level of granularity

  5. INEX Evaluation Criterion • Two relevance dimensions! • Exhaustivity (E): how exhaustively a component discusses the topic of request • Specificity (S): how focused the component is on the topic of request (i.e. discusses no other, irrelevant topics) • Multiple grades! • highly (3), fairly (2), marginally (1), not (0) exhaustive/specific • Assessments as (e,s) pairs, e.g. (3,1), (3,2), (1,3), (2,3)

  6. INEX Test Collection • Documents • 12,107 articles of IEEE CS 1995-2002, 8.2 million XML elements • Topics • 31 CO topics • Relevance assessments • Propagation effect of Exhaustivity! • ~26,000 relevant elements on ~14,000 relevant paths • Propagated assessments: ~45% • Increase in size of recall-base: ~182%

  7. Current INEX Metrics • inex-2002 and inex-2003 • Based on recall/precision • Quantisation functions, e.g. generalised (formula not preserved in the transcript) • inex-2003 penalises overlap of results • Reduced score for components seen in full or in part
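The quantisation formula on this slide did not survive extraction. As a minimal sketch, the functions below map an (e,s) pair to a single relevance value, assuming the strict and generalised definitions commonly reported for INEX 2002/2003; the numeric values and the function names are assumptions, not taken from the slide.

```python
# Strict and generalised quantisation of (e, s) assessment pairs.
# ASSUMPTION: the values below follow the definitions commonly reported
# for INEX 2002/2003; the slide's own formula is not preserved.

def quant_strict(e: int, s: int) -> float:
    """Only highly exhaustive and highly specific elements count."""
    return 1.0 if (e, s) == (3, 3) else 0.0


_GENERALISED = {
    (3, 3): 1.0,
    (2, 3): 0.75, (3, 2): 0.75, (3, 1): 0.75,
    (1, 3): 0.5, (2, 2): 0.5, (2, 1): 0.5,
    (1, 2): 0.25, (1, 1): 0.25,
}


def quant_generalised(e: int, s: int) -> float:
    """Graded credit; any pair not listed (e.g. (0, 0)) scores 0."""
    return _GENERALISED.get((e, s), 0.0)
```

Under these assumed values a (3,1) element scores 0.75 with generalised quantisation and 0 with strict quantisation.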

  8. Problem with Current INEX Metrics • Both metrics ignore overlap of reference elements! • 100% recall only if all reference elements returned including overlapping elements (contradicts task!) • Extent of problem: evaluation of an ideal run [Figure: recall/precision curves of an ideal run under inex_2002 and inex_2003; panels: strict quantisation, generalised quantisation] • Precision is plotted against lower recall values than merited according to the task definition!
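The recall problem can be illustrated with a toy example. This is only a sketch under stated assumptions: elements are identified by hypothetical XPath-like strings, recall is simple set-based recall, and the three-element recall-base is made up for illustration.

```python
# Toy illustration: a run without overlap cannot reach 100% recall
# when the recall-base itself contains nested (overlapping) elements.
# ASSUMPTION: paths and data are hypothetical; recall is set-based.

def is_ancestor(a: str, b: str) -> bool:
    """True if element path a is a proper ancestor of element path b."""
    return b.startswith(a + "/")


def has_overlap(paths) -> bool:
    """True if any element in the list contains another one."""
    return any(is_ancestor(a, b) for a in paths for b in paths)


def recall(run, recall_base) -> float:
    relevant = set(recall_base)
    return len(relevant.intersection(run)) / len(relevant)


# A nested chain of relevant elements: article > sec > p.
recall_base = ["/article[1]", "/article[1]/sec[2]", "/article[1]/sec[2]/p[1]"]

ideal_run = ["/article[1]/sec[2]"]   # returns one element, no overlap
full_run = list(recall_base)         # returns every nested element

print(recall(ideal_run, recall_base), has_overlap(ideal_run))
print(recall(full_run, recall_base), has_overlap(full_run))
```

Only the run that returns all three nested elements reaches full recall, which is exactly the behaviour the task definition discourages.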

  9. Proposed Metrics • Metrics not directly dependent on size of recall-base • Separation of ideal results vs. near misses • Metrics independent of user model • Extended Cumulated Gain (CG) based metrics • Relevance-value functions • Ideal Recall-base

  10. Ideal Recall-base and Run • Ideal recall-base • Ideal results should be retrieved; near misses could be retrieved, but systems should not be penalised if they are not retrieved • Derived based on user preferences • Ideal run • Ordering elements of the ideal recall-base by relevance score [Figure: example elements with assessments (3,1), (3,2), (3,3), (1,2), (1,3)]
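The slide says only that the ideal recall-base is derived from user preferences. One hypothetical way to do this is a greedy selection: take elements in decreasing relevance value and skip any element that overlaps one already chosen. The greedy rule, the tie-break, the path strings and the values below are all assumptions for illustration, not the authors' procedure.

```python
# Greedy sketch of an overlap-free "ideal recall-base" and ideal run.
# ASSUMPTION: the selection rule and the example data are hypothetical.

def is_ancestor(a: str, b: str) -> bool:
    return b.startswith(a + "/")


def overlaps(a: str, b: str) -> bool:
    return a == b or is_ancestor(a, b) or is_ancestor(b, a)


def ideal_recall_base(values: dict) -> list:
    """values maps element path -> relevance value (already quantised)."""
    ideal = []
    # Highest value first; ties go to the deeper (more focused) element.
    ordered = sorted(values, key=lambda p: (-values[p], -p.count("/")))
    for path in ordered:
        if values[path] > 0 and not any(overlaps(path, q) for q in ideal):
            ideal.append(path)
    return ideal


def ideal_run(values: dict) -> list:
    """Ideal recall-base ordered by decreasing relevance value."""
    return sorted(ideal_recall_base(values), key=lambda p: -values[p])


values = {
    "/article[1]": 0.75,
    "/article[1]/sec[1]": 1.0,
    "/article[1]/sec[1]/p[1]": 0.5,
    "/article[1]/sec[2]": 0.25,
}
print(ideal_run(values))
```

In this example the article and the paragraph become near misses because they overlap the best-scoring section, while the second section remains an ideal result.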

  11. Relevance-Value (RV) Functions • Models user behaviour • Result-list independent • Based only on (e,s) value pairs (~quantisation functions) • Result-list dependent • Considers overlap of result elements (~inex-2003) (formula not preserved in the transcript; its parameters are the ranked result list and a value reflecting the user's tolerance to redundant component parts)
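Since the result-list dependent formula is not preserved, the following is only a sketch of the idea under simplifying assumptions: an element that was already seen in full (because it, or one of its ancestors, appeared earlier in the ranking) keeps just a `tolerance` fraction of its list-independent value. The parameter name, the multiplicative discount and the omission of partially seen elements are assumptions, not the slide's definition.

```python
# Sketch of a result-list dependent relevance-value (RV) function.
# ASSUMPTION: multiplicative discount for fully seen elements only;
# elements seen in part (a descendant ranked earlier) are not handled.

def is_ancestor(a: str, b: str) -> bool:
    return b.startswith(a + "/")


def rv_list_dependent(ranked, base_value, tolerance: float) -> list:
    """ranked: element paths in rank order.
    base_value: path -> list-independent value (e.g. quantised (e, s)).
    tolerance: 0.0 = redundancy earns nothing, 1.0 = fully tolerated.
    """
    seen = []
    values = []
    for path in ranked:
        value = base_value.get(path, 0.0)
        if any(q == path or is_ancestor(q, path) for q in seen):
            value *= tolerance  # content already shown to the user in full
        values.append(value)
        seen.append(path)
    return values


ranked = ["/article[1]/sec[1]", "/article[1]/sec[1]/p[1]"]
base = {"/article[1]/sec[1]": 1.0, "/article[1]/sec[1]/p[1]": 0.5}
print(rv_list_dependent(ranked, base, tolerance=0.0))
print(rv_list_dependent(ranked, base, tolerance=0.5))
```

Passing the tolerance as a parameter mirrors the slide's point that user behaviour is modelled inside the RV function and kept out of the metric itself.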

  12. Cumulated Gain • Gain vector (G) from ranked document list • Ideal gain vector (I) from documents in recall-base • Cumulated gain (CG) • Plot CG_G of actual run against CG_I of ideal ranking • nCG_G = CG_G / CG_I • Example: L = <d4,d5,d2,d3,d1>, G = <3,0,1,3,2>, I = <3,3,2,1,0>, CG_G = <3,3,4,7,9>, CG_I = <3,6,8,9,9>
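The example on this slide can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative, and the gain vectors are the ones given on the slide.

```python
# Cumulated gain (running sum of gains) and its normalised form,
# using the gain vectors from the slide.
from itertools import accumulate


def cumulated_gain(gains) -> list:
    """CG[i] is the sum of gains at ranks 1..i."""
    return list(accumulate(gains))


def normalised_cg(gains, ideal) -> list:
    """Rank-by-rank ratio of the run's CG to the ideal ranking's CG."""
    cg_run = cumulated_gain(gains)
    cg_ideal = cumulated_gain(ideal)
    return [g / i if i else 0.0 for g, i in zip(cg_run, cg_ideal)]


G = [3, 0, 1, 3, 2]   # gains of the ranked list L = <d4,d5,d2,d3,d1>
I = [3, 3, 2, 1, 0]   # the same gains in ideal (descending) order

print(cumulated_gain(G))
print(cumulated_gain(I))
print(normalised_cg(G, I))
```

The normalised curve starts at 1.0 because the run's first document is as good as the ideal one, drops while the run lags behind, and returns to 1.0 once all relevant documents have been retrieved.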

  13. Cumulated Gain for XML • Ideal gain vector: I[i] = r(c_i), with r(c_i) taken from the ideal recall-base • Actual gain vector: G[i] = r(c_i), with r(c_i) taken from the full recall-base! [Figure: recall-base and ranked result list]
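A minimal sketch of how the two gain vectors might be built, assuming each recall-base is a mapping from a hypothetical element path to its relevance value r(c). The data and helper names are illustrative only.

```python
# Building the actual and ideal gain vectors for XML elements.
# ASSUMPTION: recall-bases are dicts path -> relevance value; the
# paths and values are made up for illustration.

def actual_gain_vector(ranked, full_recall_base) -> list:
    """G[i] = r(c_i), looked up in the FULL recall-base, so near
    misses still earn their (partial) value; unknown elements get 0."""
    return [full_recall_base.get(c, 0.0) for c in ranked]


def ideal_gain_vector(ideal_recall_base) -> list:
    """I = values of the IDEAL recall-base in decreasing order."""
    return sorted(ideal_recall_base.values(), reverse=True)


full_recall_base = {"/article[1]/sec[1]": 1.0, "/article[1]": 0.75}
ideal_base = {"/article[1]/sec[1]": 1.0}

run = ["/article[1]", "/article[1]/sec[1]", "/article[1]/sec[9]"]

G = actual_gain_vector(run, full_recall_base)
I = ideal_gain_vector(ideal_base)
print(G, I)
```

Looking up G in the full recall-base is what lets a near miss such as the enclosing article earn credit, while I stays free of overlapping reference elements.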

  14. Cumulated Gain for XML • Multiple relevance dimensions • Result-list dependent RV function: overlap of result elements • I derived from ideal recall-base: overlap of reference elements • Retrieval of ideal results is rewarded; near misses can be rewarded a partial score, but systems are not penalised for not retrieving near misses!

  15. Cumulated Gain for XML • However, consequences of ideal recall-base in CG • |G| > |I|, e.g. G = <1,0.75,…> vs. I = <1>: extend the ideal gain vector with irrelevant elements, I = <1,0,...> • Max(CG_G) > Max(CG_I): force CG_G to level off after reaching Max(CG_I) [Figure: example elements assessed (3,1) and (3,3)]
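The two adjustments named on this slide (extending the ideal gain vector with irrelevant elements, and levelling off the run's cumulated gain at the ideal maximum) can be sketched as follows. The function name is hypothetical, and the example uses only the first two gains shown on the slide.

```python
# Sketch of the two adjustments for XML cumulated gain:
# 1) pad the ideal gain vector with zeros (irrelevant elements),
# 2) cap the run's cumulated gain at the maximum ideal cumulated gain.
from itertools import accumulate


def xml_cumulated_gain(gains, ideal):
    """Return (capped CG of the run, CG of the zero-padded ideal)."""
    padded = list(ideal) + [0.0] * max(0, len(gains) - len(ideal))
    cg_ideal = list(accumulate(padded))
    ceiling = max(cg_ideal) if cg_ideal else 0.0
    cg_run = [min(v, ceiling) for v in accumulate(gains)]
    return cg_run, cg_ideal


G = [1.0, 0.75]   # ideal element followed by a near miss
I = [1.0]         # the ideal recall-base holds a single element

cg_run, cg_ideal = xml_cumulated_gain(G, I)
print(cg_run, cg_ideal)
```

Without the cap, the run's cumulated gain would reach 1.75 and exceed the ideal curve; with it, both curves level off at 1.0.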

  16. Conclusions • Unsolved issues with recall/precision due to overlap of reference elements in recall-base • XML-CG with ideal recall-base provides a solution for overlap of result and reference elements • Still possible to reward partial success without the side-effect • "Plug-in" user models: RV function used as parameter of metrics • Limitation: Max(CG_G) = Max(CG_I)

  17. Future Work • Metric to be used in INEX 2004 • Evaluation of metric: stability testing • RV functions based on user models in INEX 2004 Interactive track • General problem of overlap of result elements when no predefined unit of retrieval exists

  18. Thank you

  19. inex-2002 metric • Precall [Raghavan, Bollmann & Jung 1989] (formula not preserved in the transcript) • Quantisation functions • Strict • Generalised • Does NOT consider overlap of result elements nor overlap of reference elements!

  20. inex-2003 metric • E, S in ideal concept space [Gövert, Kazai, Fuhr & Lalmas 2003] (formula not preserved in the transcript) • Quantisation functions • Strict • Generalised • Does NOT consider overlap of reference elements!
