Approximate Counting of Frequent Query Patterns over XQuery Stream. Liang Huai Yang, Mong Li Lee, Wynne Hsu. DASFAA 2004. Speaker: Ming Jing Tsai.
Introduction
• Efficient approach to improving an XML management system
  • Cache frequently retrieved results
  • Frequent query patterns
• Applications
  • Search engines
  • XML query systems
Preliminaries
• Query stream: S = QPT1, QPT2, …, QPTN
• Query pattern tree (QPT)
  • Node labels: {"*", "//"} ∪ tagset
• Rooted subtree (RST)
  • root(RST) = root(QPT)
  • RST.V' ⊆ QPT.V, RST.E' ⊆ QPT.E
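The rooted-subtree condition can be checked mechanically. The sketch below is only an illustration: the `Tree` class, the path-style node ids such as `"book/title"`, and the extra connectivity check are assumptions made for the example, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Hypothetical tree record: a root id, a node set, and (parent, child) edges."""
    root: str
    nodes: frozenset
    edges: frozenset


def is_connected_from_root(tree):
    """Check that every node is reachable from the root along the edges."""
    children = {}
    for parent, child in tree.edges:
        children.setdefault(parent, []).append(child)
    seen = {tree.root}
    stack = [tree.root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen == set(tree.nodes)


def is_rooted_subtree(rst, qpt):
    """root(RST) = root(QPT), V' is a subset of V, E' is a subset of E."""
    return (
        rst.root == qpt.root
        and rst.nodes <= qpt.nodes
        and rst.edges <= qpt.edges
        and is_connected_from_root(rst)
    )


# Example QPT: book with children title, author, price.
qpt = Tree(
    root="book",
    nodes=frozenset({"book", "book/title", "book/author", "book/price"}),
    edges=frozenset({("book", "book/title"), ("book", "book/author"),
                     ("book", "book/price")}),
)
# Keeps the root and one branch, so it is a rooted subtree.
rst_ok = Tree("book", frozenset({"book", "book/title"}),
              frozenset({("book", "book/title")}))
# Starts at a different root, so it is not.
rst_bad = Tree("book/title", frozenset({"book/title"}), frozenset())

ok_result = is_rooted_subtree(rst_ok, qpt)
bad_result = is_rooted_subtree(rst_bad, qpt)
```

Path-style ids are used because the same tag (for example title) can occur at more than one position in a tree.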
[Figure: example query pattern trees QPT1, QPT2, and QPT3 over the nodes book, title, author, section, price, fn, and ln, together with a rooted subtree (RST)]
Approximate Counting
• rst.count_app ≥ (σ − ε)N
• rst.count_app ≥ rst.count_true − εN
• XQuery stream divided into buckets of w =
• b_current =
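The two bounds above have the same form as the guarantees of the Lossy Counting algorithm (Manku and Motwani). The sketch below shows that scheme over plain hashable items rather than RSTs. The bucket width w = ⌈1/ε⌉ and the bucket id b_current = ⌈N/w⌉ are assumptions taken from standard Lossy Counting, because the slide's own values did not survive. The `LossyCounter` class and its method names are hypothetical.

```python
import math


class LossyCounter:
    """Minimal Lossy Counting sketch; not the authors' RST-specific structure."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.w = math.ceil(1 / epsilon)  # assumed bucket width
        self.n = 0                       # items seen so far (N)
        self.entries = {}                # item -> [count_app, delta]

    def add(self, item):
        self.n += 1
        b_current = math.ceil(self.n / self.w)  # assumed current bucket id
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # delta bounds how many occurrences may have been missed earlier.
            self.entries[item] = [1, b_current - 1]
        if self.n % self.w == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > b_current
            }

    def frequent(self, sigma):
        """Items whose approximate count is at least (sigma - epsilon) * N."""
        threshold = (sigma - self.epsilon) * self.n
        return {key for key, (count, _) in self.entries.items()
                if count >= threshold}


# Deterministic stream of 100 items: "a" 50 times, "b" 30 times, 20 one-off items.
counter = LossyCounter(epsilon=0.1)
for i in range(100):
    if i % 2 == 0:
        counter.add("a")
    elif i % 10 in (1, 3, 5):
        counter.add("b")
    else:
        counter.add(f"x{i}")

frequent_half = counter.frequent(0.5)
frequent_third = counter.frequent(0.3)
```

In this run the one-off items are dropped at each bucket boundary, so only the two recurring items remain in `counter.entries`.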
[Figure: D-GQPT with numbered nodes: book (1), title (2), author (3), fn (4), ln (5), section (6), title (7), price (8). RST3 (book with children title, author, price) is encoded as the string 1,2,-1,3,-1,8,-1]
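The encoding 1,2,-1,3,-1,8,-1 reads as a preorder walk that emits a node number on the way down and -1 on each step back up to the parent. The sketch below reproduces it under that reading. The `encode` function and the child-list dictionaries are hypothetical, and the D-GQPT shape (fn and ln under author, title under section) is an assumption inferred from the figure.

```python
def encode(node_id, children):
    """Preorder string encoding: node id going down, -1 when returning to the parent."""
    out = [node_id]
    for child in children.get(node_id, []):
        out.extend(encode(child, children))
        out.append(-1)  # backtrack from child to node_id
    return out


# RST3: book (1) with children title (2), author (3), price (8).
rst3_children = {1: [2, 3, 8]}
rst3_encoding = encode(1, rst3_children)

# Assumed full D-GQPT: author (3) has fn (4) and ln (5); section (6) has title (7).
dgqpt_children = {1: [2, 3, 6, 8], 3: [4, 5], 6: [7]}
dgqpt_encoding = encode(1, dgqpt_children)

print(",".join(map(str, rst3_encoding)))  # prints 1,2,-1,3,-1,8,-1
```

Because node numbers come from the global tree, two RSTs with the same shape and positions get the same string, which makes the encoding usable as a lookup key.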
[Figure: the D-GQPT slide repeated, with the RST3 encoding shown as 1,2,-1,4,-1,9,-1]
[Figure: ECTree of candidate RSTs over the D-GQPT node numbers, with candidates generated by the Grmlne and Gjoin operations]
Candidate Generation
• Rightmost active leaf node expansion: Grmlne( ) =
• Join: Gjoin( ) = | = X j = i+1,…,N
Prune
• RSTk+1 doesn't exist in ECTree
  • RSTk+1.Δ = b_current − β
  • |RSTk+1.tidlist| < β: prune
• RSTk+1 exists in ECTree
  • RSTk+1.count_app = RSTk+1.count_app + |RSTk+1.tidlist|
  • RSTk+1.count_app + RSTk+1.Δ < b_current: prune
• Join result with RSTk+1
  • subtree induced by RSTk+1
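The two pruning cases can be written as a single update step. This is a sketch under stated assumptions, not the authors' implementation: the ECTree is modeled as a plain dictionary keyed by an RST identifier, `Entry` and `update_or_prune` are hypothetical names, and the starting count of a newly inserted candidate is assumed to be |tidlist|, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    count_app: int  # approximate count
    delta: int      # maximum possible undercount (the slide's Δ)


def update_or_prune(ectree, key, tidlist, b_current, beta):
    """Apply the slide's pruning rules; return True if the candidate is kept."""
    entry = ectree.get(key)
    if entry is None:
        # Case 1: the candidate is not in the ECTree yet.
        if len(tidlist) < beta:
            return False  # too rare in the current batch
        # Assumption: the initial count is the batch support |tidlist|.
        ectree[key] = Entry(count_app=len(tidlist), delta=b_current - beta)
        return True
    # Case 2: the candidate already exists, so accumulate its support.
    entry.count_app += len(tidlist)
    if entry.count_app + entry.delta < b_current:
        del ectree[key]  # can no longer reach the frequency bound
        return False
    return True


ectree = {}
# New candidate seen in two queries of the batch: kept, with Δ = 4 - 2 = 2.
kept_new = update_or_prune(ectree, "1,2,-1", [1, 2], b_current=4, beta=2)
# New candidate seen only once: pruned without being inserted.
kept_rare = update_or_prune(ectree, "1,3,-1", [1], b_current=4, beta=2)
# Later batch: count_app becomes 3, and 3 + 2 < 6, so the entry is pruned.
kept_later = update_or_prune(ectree, "1,2,-1", [9], b_current=6, beta=2)
```

The keys here reuse the string-encoding style from the D-GQPT slide purely for illustration.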
Experiment
• Pentium 4 2.4 GHz, 1 GB RAM, Windows XP
• DBLP DTD: 98 nodes
• Shakespeare's Plays DTD: 23 nodes