TIJAH @ INEX 2003
The Cirquid Team
CWI and University of Twente
Overview
• Introduction
• Content-Only (CO)
• (Pattern-Based) Structured Querying
• Conclusions and Future Work
• Questions/Discussion
Content-Only (CO)
• Same model as for INEX 2002
• Exhaustivity (content-based relevance)
    • Statistical Language Model [Hiemstra 2000]
• Specificity
    • Log-normal distribution
    • Component size, mean at ~2500 words
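The two ingredients of the CO model can be sketched as follows. This is a minimal illustration, not the TIJAH implementation: the linear smoothing form, the weight `lam`, the spread `sigma`, and the way the two scores are added in log space are all assumptions, and `mu = ln(2500)` is just one possible reading of "mean at ~2500 words" (it places the median of the size distribution at 2500 words). All function names are hypothetical.

```python
import math
from collections import Counter


def lm_score(query_terms, component_terms, collection_tf, collection_len, lam=0.15):
    """Log-probability of the query under a linearly smoothed unigram model.

    lam is the (assumed) weight of the component model; the remainder goes
    to the collection model.
    """
    tf = Counter(component_terms)
    n = len(component_terms)
    score = 0.0
    for term in query_terms:
        p_component = tf[term] / n if n else 0.0
        p_collection = collection_tf.get(term, 0) / collection_len
        p = lam * p_component + (1 - lam) * p_collection
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score


def lognormal_log_prior(size, mu=math.log(2500), sigma=1.0):
    """Log-density of a log-normal distribution over component size (in words).

    mu and sigma are assumed values chosen for illustration only.
    """
    if size <= 0:
        return float("-inf")
    z = (math.log(size) - mu) / sigma
    return -math.log(size * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def co_score(query_terms, component_terms, collection_tf, collection_len):
    """Content score plus size prior; adding the two logs is an assumption."""
    return (lm_score(query_terms, component_terms, collection_tf, collection_len)
            + lognormal_log_prior(len(component_terms)))
```

With these assumed parameters, a component of about 2500 words gets a higher size prior than a very short or very long one, so the prior steers retrieval away from tiny fragments and whole volumes.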
Structured Querying (SCAS/VCAS)
• Pattern-Based Structured Querying
• Collection of 3 patterns
• Base pattern for determining a single subtree (pattern 1)
• More complex combinations of pattern 1 instances (patterns 2 and 3)
About Function
• Use of ALL topic's keywords to process EACH OF the about clauses
//article[about(./,'IR') AND about(.//sec,'XML')]
//article[about(./,'IR XML') AND about(.//sec,'IR XML')]
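The rewrite shown on this slide can be sketched with a small regular-expression pass. This is an illustrative reconstruction, not the team's code: the function name `expand_about_clauses` is hypothetical, the query is assumed to use straight single quotes around keyword strings, and stripping a leading `+` from keywords is an assumption based on the Pattern 2 example later in the deck.

```python
import re

# Matches one about clause: the path part up to the comma, the quoted
# keyword string, and the closing parenthesis.
ABOUT = re.compile(r"(about\([^,]+,\s*)'([^']*)'(\s*\))")


def expand_about_clauses(query):
    """Replace the keywords of every about clause with all topic keywords."""
    keywords = []
    for _prefix, text, _suffix in ABOUT.findall(query):
        for word in text.split():
            word = word.lstrip("+")  # assumption: '+' markers are dropped
            if word and word not in keywords:
                keywords.append(word)
    all_terms = " ".join(keywords)
    return ABOUT.sub(lambda m: f"{m.group(1)}'{all_terms}'{m.group(3)}", query)


print(expand_about_clauses("//article[about(./,'IR') AND about(.//sec,'XML')]"))
# prints //article[about(./,'IR XML') AND about(.//sec,'IR XML')]
```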
Pattern 1
• Simplest pattern instance
• Topic 69
• VCAS and SCAS
/article/bdy/sec[about(.//st, '…')]
Pattern 1 – VCAS and SCAS
• Nodeset selections
• Containment
• Relevance
• Containment
[Diagram: tree with article at the root, then bdy, then sec, then st]
Average ranked containing
• Previous operation is ranked containing
• Multiple subtrees within the target element are averaged
[Diagram: a sec element containing two st subtrees scored 0.2 and 0.1; the sec element receives the average, 0.15]
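The averaging step on this slide can be sketched as a group-by over the target element. The function name and the dictionary-based node representation are hypothetical; only the averaging rule and the 0.2 / 0.1 / 0.15 figures come from the slide.

```python
from collections import defaultdict


def average_ranked_containing(subtree_scores, target_of):
    """Give each target element the mean score of the subtrees it contains.

    subtree_scores: subtree id -> relevance score
    target_of:      subtree id -> id of the containing target element
    """
    grouped = defaultdict(list)
    for node, score in subtree_scores.items():
        grouped[target_of[node]].append(score)
    return {target: sum(scores) / len(scores) for target, scores in grouped.items()}


# Slide example: two st subtrees (0.2 and 0.1) inside one sec element.
result = average_ranked_containing(
    {"st1": 0.2, "st2": 0.1},
    {"st1": "sec1", "st2": "sec1"},
)
print(round(result["sec1"], 2))
```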
Pattern 2
• Topic 73
• VCAS
    • Absence of a subtree does not render the target completely irrelevant
• SCAS
    • All specified subtrees need to be present for relevance
//article[about(.//st, '…') AND about(.//bib, '…')]
Pattern 2 – VCAS
• Split up into set of pattern 1 instances
• Combine resultsets
    • OR -> max
    • AND -> min (non zero)
[Diagram: article element with st and bib subtrees]
Pattern 2 – SCAS
• Split up into set of pattern 1 instances
• Combine resultsets
    • AND -> min
    • OR -> max
[Diagram: article element with st and bib subtrees]
Pattern 2 - Example
//article[about(.//st, '+comparison') AND about(.//bib, 'machine learning')]
1.- Execution of 2 pattern 1 instances
//article[about(.//st, 'comparison machine learning')]
//article[about(.//bib, 'comparison machine learning')]
2.- Combining results (AND)
         about(.//st)   about(.//bib)   VCAS   SCAS
Art 1    0.2            0.1             0.1    0.1
Art 2    0              0.3             0.3    0
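The combination rules for Pattern 2 can be sketched as below. This is one reading of the slides, stated as an assumption: under VCAS the AND takes the minimum over the non-zero scores only, so a missing subtree does not zero out the target, while under SCAS the plain minimum is used. The function name `combine` and the `strict` flag are hypothetical, and the scores are read from the example in the order the two pattern 1 queries are listed.

```python
def combine(result_sets, operator, strict):
    """Combine per-target scores from several pattern 1 runs.

    result_sets: list of dicts mapping target id -> score
    operator:    "AND" or "OR"
    strict:      True for SCAS, False for VCAS (assumed interpretation)
    """
    targets = set().union(*result_sets)
    combined = {}
    for target in targets:
        scores = [rs.get(target, 0.0) for rs in result_sets]
        if operator == "OR":
            combined[target] = max(scores)
        elif operator == "AND":
            if not strict:
                # VCAS: ignore absent (zero-scored) subtrees when possible
                scores = [s for s in scores if s > 0] or [0.0]
            combined[target] = min(scores)
        else:
            raise ValueError(f"unknown operator: {operator}")
    return combined


st_scores = {"Art 1": 0.2, "Art 2": 0.0}
bib_scores = {"Art 1": 0.1, "Art 2": 0.3}
vcas = combine([st_scores, bib_scores], "AND", strict=False)
scas = combine([st_scores, bib_scores], "AND", strict=True)
```

Under this reading, `vcas` keeps Art 2 with score 0.3 even though its st subtree scored 0, while `scas` assigns Art 2 a score of 0.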
Pattern 3
• Topic 64 CAS
//article[about(., '…')]//sec[about(., '…')]
• VCAS
    • What does the first about mean?
    • Drop all about-calls, except those specified for target element
• SCAS
    • Split up into set of pattern 1 instances
    • Topdown structural correlation to correct nodeset
Pattern 3 – VCAS
//article//sec[about(., '…')]
[Diagram: article (about 1) containing sec (about 2)]
Pattern 3 – SCAS
1.- about 1 (article)
2.- about 2 (sec)
3.- containment
Ranked by the scores of the target element
[Diagram: article containing sec, annotated with steps 1 to 3]
Pattern 3 - Example
//article[about(./, 'hollerith')]//sec[about(., 'DEHOMAG')]
1.- Execution of 1 or 2 pattern 1 instances
//article[about(./, 'hollerith DEHOMAG')]
//article//sec[about(./, 'hollerith DEHOMAG')]
2.- Ranked containing
Article scores: Art 1 0.2, Art 2 0
Section scores: sec 1 0.1, sec 2 0.3
VCAS (only second about): sec 1 0.1, sec 2 0.3
SCAS (in case sec 1 belongs to art 1 and sec 2 does not): sec 1 0.1
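A minimal sketch of how the Pattern 3 example could be evaluated, under stated assumptions: VCAS uses only the about clause on the target element, and SCAS additionally keeps a section only when its containing article received a non-zero score. The function name `pattern3` and the `article_of` mapping are hypothetical, and placing sec 2 inside Art 2 is an assumption (the slide only says it does not belong to art 1).

```python
def pattern3(article_scores, section_scores, article_of, strict):
    """Rank target sections by their own score.

    strict=True (SCAS): drop sections whose article scored zero.
    strict=False (VCAS): ignore the article-level about clause.
    """
    if strict:
        kept = {
            sec: score
            for sec, score in section_scores.items()
            if article_scores.get(article_of[sec], 0.0) > 0
        }
    else:
        kept = dict(section_scores)
    # Ranked by the scores of the target element, highest first.
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


articles = {"Art 1": 0.2, "Art 2": 0.0}
sections = {"sec 1": 0.1, "sec 2": 0.3}
article_of = {"sec 1": "Art 1", "sec 2": "Art 2"}  # assumed containment

vcas = pattern3(articles, sections, article_of, strict=False)
scas = pattern3(articles, sections, article_of, strict=True)
```

With these inputs `vcas` ranks sec 2 (0.3) above sec 1 (0.1), and `scas` returns only sec 1 (0.1).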
Physical Query Plan - Pattern 1
/article/bdy/sec[about(.//st, '…')]
[Diagram: query plan with node selections //st and /article/bdy/sec, inputs labelled W and Q, an about operator, and an avg-groupby operator]
Conclusions
• CO model works pretty well
• Article run still
• ‘Keep it simple’ approach