TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data by Tian Yu, Tok Wang Ling, Jiaheng Lu, Presented by: Tian Yu

Outline
• Introduction and motivation
• Related Work
• Preliminaries
• Match Twig Query with Not-predicates
• Notation and data structure
• A holistic matching algorithm: TwigStackList¬
• Experiments
• Conclusion

Introduction
• XML is rapidly gaining popularity as a data representation.
• XML documents are modeled as ordered trees.
• XML queries specify patterns of selection predicates on multiple elements that have structural relationships (parent-child, ancestor-descendant).

Introduction: NOT-twig Query
• Twig query: a small tree whose nodes are tags, attributes, or text values and whose edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges.
• A twig query can have not-predicates.
• Example:
  • Normal-twig query (without not-predicate): Q1: suppliersDatabase/supplier[//store]//part
  • Not-twig query (with not-predicate): Q2: suppliersDatabase/supplier[NOT(//store)]//part
• Intuitive meaning of Q2: select all part elements that are descendants of supplier elements having no descendant store element.

Motivation
• Naïve method for evaluating not-twig queries:
  • Decompose the NOT-twig query into several normal-twig queries.
  • Evaluate the decomposed queries with an existing method (TwigStack or TwigStackList).
  • Compute the final result from the results of the individual decomposed queries, using set operators.
• For example, Q2 is decomposed into two queries. [Figure: the two decomposed queries]
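
The set-operator step of the naïve method can be sketched as below. This is only an illustration: the slide's figure is not preserved, so the decomposition of Q2 into `supplier//part` minus `supplier[//store]//part` is an assumption, and the element names (`s1`, `p1`, ...) and the function `naive_not_twig` are hypothetical.

```python
# Hypothetical results of the two decomposed normal-twig queries,
# each given as a set of (supplier, part) matches.
all_parts = {("s1", "p1"), ("s1", "p2"), ("s2", "p3")}   # supplier//part
parts_with_store = {("s1", "p1"), ("s1", "p2")}          # supplier[//store]//part


def naive_not_twig(positive_matches, negated_matches):
    """Combine the decomposed results with a set difference."""
    return positive_matches - negated_matches


q2_result = naive_not_twig(all_parts, parts_with_store)

# Intermediate matches that were produced but are not part of the answer.
wasted = len(all_parts) + len(parts_with_store) - len(q2_result)
```

In this toy input, five intermediate matches are produced to obtain a single answer, which is the kind of overhead the next slide points out.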

Motivation
• Disadvantages of the naïve method:
  • Not optimal: many useless intermediate results are produced.
  • Additional disk I/O: many elements are accessed more than once.
• Our objective: efficient processing of XML not-twig queries.

Previous work: TwigStack [1]
• TwigStack [1]: a holistic approach to matching twig queries without not-predicates.
• Two-phase algorithm:
  • Phase 1: output the intermediate root-to-leaf path solutions.
  • Phase 2: merge the intermediate path solutions to get the final results.
• TwigStack is optimal when the query contains only ancestor-descendant relationships.
• With parent-child relationships, TwigStack may output intermediate path solutions that cannot contribute to final solutions.
• Therefore, TwigStack is sub-optimal for queries with parent-child relationships.
1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal XML pattern matching.” In Proceedings of ACM SIGMOD, 2002.

Previous work: TwigStackList [2]
• The main problem of TwigStack is that it treats all edges as ancestor-descendant relationships in the first phase, so it is not efficient for queries with parent-child relationships.
• Improved method: TwigStackList [2]
  • Each query node has an additional list structure that caches elements likely to participate in final solutions.
  • TwigStackList is optimal when no branching node has a P-C edge; it is therefore more efficient than TwigStack.
2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of XML twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533–542, 2004.

Previous work: Query Predicates [3][4][5]
• XML query predicates:
  • OR-predicates [3]
  • Ordered predicates [4]
  • Not-predicates on XPath [5]
3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceedings of SIGMOD, pages 59–70, 2004.
4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceedings of DEXA, 2005.
5. E. Jiao, T. W. Ling, C. Y. Chan, and P. S. Yu. “PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.

Previous work: Query Predicates [3][4]
• OR-predicates [3]
  • Define an OR-block.
  • An AND/OR-twig can be viewed as an AND-twig with elements and OR-blocks.
• Ordered XML query matching [4]
  • Uses a look-ahead list to check for order.
  • Supports four axes: following-sibling, preceding-sibling, following, and preceding.
[Figure: example ordered twig query with nodes a, b, c, d]
3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceedings of SIGMOD, pages 59–70, 2004.
4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceedings of DEXA, 2005.

Previous work: Not-predicates on XPath [5]
• PathStack¬ evaluates path queries with not-predicates and is based on the PathStack algorithm.
• It distinguishes output nodes and non-output nodes in a query:
  • Non-output node: a node that appears below a negative edge.
  • Output node: a node that does not appear below any negative edge.
• XPath query with NOT-predicate:
  • Each query node is associated with a stack, which is either a regular stack or a boolean stack.
  • NOT-predicates are checked by updating the boolean value.
• Limitation: it cannot match not-twig queries.
[Figure: example path query; output nodes: A, B; non-output nodes: C, D]
5. E. Jiao, T. W. Ling, C. Y. Chan, and P. S. Yu. “PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.

Data model
• An XML document is modeled as a rooted, ordered, and tagged tree.
• Labelling scheme: region coding (startPos, endPos, levelNum).
• Given elements e1 and e2:
  • e1 is an ancestor of e2 iff e1.start < e2.start and e1.end > e2.end.
  • e1 is the parent of e2 iff e1.start < e2.start, e1.end > e2.end, and e1.level + 1 = e2.level.
[Figure: example XML tree labelled with region codes, rooted at book (1,123,1), with preface, chapter, title, and section elements below it, e.g. (2,4,2), (5,12,2), (13,24,2), (6,8,3), (9,11,3)]
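
The two region-coding tests above can be written directly as predicates. The sketch below is illustrative: the names `Region`, `is_ancestor`, and `is_parent` are hypothetical, and the tag assigned to each code is read off the flattened figure, so treat those assignments as assumptions.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Region code (startPos, endPos, levelNum) of one element."""
    start: int
    end: int
    level: int


def is_ancestor(e1: Region, e2: Region) -> bool:
    # e1 strictly encloses e2.
    return e1.start < e2.start and e1.end > e2.end


def is_parent(e1: Region, e2: Region) -> bool:
    # An ancestor exactly one level above is the parent.
    return is_ancestor(e1, e2) and e1.level + 1 == e2.level


# Codes taken from the slide's figure (tag names assumed).
book = Region(1, 123, 1)
chapter = Region(5, 12, 2)
next_chapter = Region(13, 24, 2)
title = Region(6, 8, 3)
```

Both checks take constant time per pair, which is why the algorithms in this talk can decide structural relationships without walking the tree.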

Proposed Subquery Matching Method
• Positive/negative children of a query node, for the NOT-twig query shown in the figure:
  • Positive children of A: C, D, E
  • Negative children of A: B, F
[Figure: example NOT-twig query]

Proposed Subquery Matching Method
Given a NOT-twig query (with a query node n) and an XML data tree (Data), we say that an element en (with the tag n) in the XML data tree satisfies the sub-query rooted at n iff:
(1) n is a leaf node of the NOT-twig query; OR
(2) for each child node m of n (four cases):
  (case i) If m is a positive P-C child node of n, there is an element em in Data such that em is a child element of en and satisfies the sub-query rooted at m in Data.
  (case ii) If m is a positive A-D child node of n, there is an element em in Data such that em is a descendant element of en and satisfies the sub-query rooted at m in Data.
[Figure: query and data patterns for cases i and ii]

Proposed Subquery Matching Method (continued)
  (case iii) If m is a negative P-C child node of n, there does not exist any element em in Data such that em is a child element of en and satisfies the sub-query rooted at m in Data.
  (case iv) If m is a negative A-D child node of n, there does not exist any element em in Data such that em is a descendant element of en and satisfies the sub-query rooted at m in Data.
[Figure: query and data patterns for cases i to iv]
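
The four cases translate into a short recursive check. The sketch below is a direct, non-holistic reading of the definition, not the TwigStackList¬ algorithm itself; the classes `Elem` and `QNode`, the function `satisfies`, and the small document (A1 with children B1 and C1, C1 with child D1, and a separate A2 with child B2) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    tag: str
    children: list = field(default_factory=list)


@dataclass
class QNode:
    tag: str
    axis: str = "AD"          # edge from the parent query node: "PC" or "AD"
    negative: bool = False    # True if that edge is a not-predicate
    children: list = field(default_factory=list)


def descendants(e):
    """Yield every proper descendant of e."""
    for c in e.children:
        yield c
        yield from descendants(c)


def satisfies(e, n):
    """True iff element e satisfies the sub-query rooted at query node n."""
    if e.tag != n.tag:
        return False
    for m in n.children:      # a leaf query node has no children: condition (1)
        candidates = e.children if m.axis == "PC" else descendants(e)
        found = any(satisfies(c, m) for c in candidates)
        # Positive edge (cases i, ii) needs a match; negative edge (iii, iv) forbids one.
        if found == m.negative:
            return False
    return True


# Query: A with a positive A-D child B and a negative P-C child C; C has an A-D child D.
query = QNode("A", children=[
    QNode("B", axis="AD"),
    QNode("C", axis="PC", negative=True, children=[QNode("D", axis="AD")]),
])

c1 = Elem("C", [Elem("D")])
a1 = Elem("A", [Elem("B"), c1])
a2 = Elem("A", [Elem("B")])
```

Here `satisfies(a1, query)` is false because C1 is a child of A1 that satisfies the sub-query rooted at C, while `satisfies(a2, query)` is true. This check revisits subtrees repeatedly, which is what a holistic, stream-based algorithm is meant to avoid.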

Subquery Matching: Example
• Leaf nodes: B1, B2, and D1 satisfy the subqueries rooted at B and D respectively.
• C1 satisfies the subquery rooted at C, since D1 is a descendant of C1.
• Since C1 satisfies the subquery rooted at C, A1 does not satisfy the subquery rooted at A: C1 is a child of A1, and in the NOT-twig query node C is a negative child of node A.
• A2 satisfies the subquery rooted at A, since A2 has a descendant B2 that satisfies the subquery rooted at B and A2 does not have any child with the tag C.

TwigStackList¬ Algorithm: Data structure
Each node n in the twig query has a stream, a list, and a stack.
• Data stream: Tn
  • The XML document is partitioned into streams.
  • All elements in a stream have the same tag and are ordered by their start position.
  • The elements in each stream are read only once, from head to tail.
[Figure: a four-level document and a query with nodes a, b, c, d; streams Ta: a1, a2, a3; Tb: b1, b2; Tc: c1, c2; Td: d1, d2, d3]
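
A minimal sketch of the stream structure follows, assuming elements are given as (tag, start, end, level) tuples. The names `Stream` and `build_streams` and the sample region codes are hypothetical and do not come from the slides.

```python
from collections import defaultdict


class Stream:
    """Forward-only cursor over the elements of one tag, ordered by start position."""

    def __init__(self, elements):
        self._elements = sorted(elements, key=lambda e: e[0])
        self._pos = 0

    def eof(self):
        return self._pos >= len(self._elements)

    def head(self):
        # Current element, or None once the stream is exhausted.
        return None if self.eof() else self._elements[self._pos]

    def advance(self):
        # The cursor never moves backwards, so each element is read once.
        if not self.eof():
            self._pos += 1


def build_streams(document):
    """Partition (tag, start, end, level) tuples into one Stream per tag."""
    by_tag = defaultdict(list)
    for tag, start, end, level in document:
        by_tag[tag].append((start, end, level))
    return {tag: Stream(elems) for tag, elems in by_tag.items()}


# Hypothetical document: a(1,10) contains b(3,4) and a(5,8), which contains b(6,7).
doc = [("b", 3, 4, 2), ("a", 1, 10, 1), ("a", 5, 8, 2), ("b", 6, 7, 3)]
streams = build_streams(doc)
```

Exposing only `head` and `advance` keeps the single-pass property explicit: an algorithm using this interface cannot revisit an element once it has moved past it.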

TwigStackList¬ Algorithm: Data structure
Each node with tag name n in the twig query has a stream, a list, and a stack.
• Data stream: Tn
• List: Ln
  • The elements in the lists help to check P-C relationships and negative edges.
• Stack: Sn
  • A stack is used for each output node.
  • Stacks store elements that are potential solutions of the XML query.
  • The stack size is bounded by the root-to-leaf depth of the document.
[Figure: query with nodes a, b, c, d; stacks Sa, Sb; lists La: a1, a2, …; Lb: b1, …; Lc: c1, …; Ld: d1, d3, …]

TwigStackList¬ Algorithm: An Example
[Figure: document containing elements A1, B1, C1, D1, A2, B2; query with nodes A, B, C, D]
• Streams: TA: A1, A2; TB: B1, B2; TC: C1; TD: D1
• Step 1: Create stacks for every output node.
• Step 2: Create lists for every query node.
• Step 3: C1 has a child D1. Push C1 into the list of C.
• Step 4: A1 has descendant B1. Push A1 into the list of A.
• Step 5: C1 is a child of A1, so A1 does not satisfy the negative edge of the query between A and C. Remove A1 from the list of A.
• Step 6: Advance A.
• Step 7: Advance B.
• Step 8: Advance C; the end of the stream is reached.
• Step 9: A2 has descendant B2. Push A2 into the list of A.
• Step 10: A2 has descendant B2 and no child with tag C. Final solution: (A2, B2).

TwigStackList¬ Algorithm: Optimality
• TwigStack is optimal for queries with only A-D relationships.
• TwigStackList is optimal when no branching node has a P-C relationship.
• TwigStackList¬ is optimal if no branching node has more than one positive relationship among which at least one is a P-C edge.
[Figure: nested query classes labelled "A-D only", "A-D for branching node", and "Negative P-C for branch node"]

TwigStackList¬ Algorithm: Experiments
• Algorithms for comparison:
  • Naïve-TwigStack (NTS)
  • Naïve-TwigStackList (NTSL)
  • Our proposed TwigStackList¬
• Benchmarks:
  • XMark: synthetic data
  • Treebank: real data from the Wall Street Journal
  • Random data set: random, uniformly distributed data
• Evaluation metrics:
  • Number of intermediate path solutions
  • Total running time

TwigStackList¬ Algorithm: Experiments
• Test queries for TwigStackList¬: Q(a)-Q(e) on Treebank, Q(f)-Q(h) on XMark, Q(i)-Q(k) on the random data set.

TwigStackList¬ Algorithm: Intermediate results
• TwigStackList¬ produces the fewest intermediate results.
• NTS and NTSL must match more than one twig pattern.

TwigStackList¬ Algorithm: Execution time
[Figures: Treebank testing result; random data set testing result]
• Treebank and random data: TwigStackList¬ has the smallest running time.
• NTS and NTSL are slower because:
  • They output more intermediate results.
  • They match more than one twig query.

TwigStackList¬ Algorithm: Execution time
[Figures: XMark queries; XMark testing result]
• XMark: TwigStackList¬ has the smallest running time.
• Queries Q(f), Q(g), and Q(h) have the same query nodes and structure, but an increasing number of not-predicates.
• The execution time of NTS and NTSL increases linearly, because the number of decomposed queries increases.
• The execution time of TwigStackList¬ is almost constant.

Conclusions
• We developed a new algorithm, TwigStackList¬, to match twig patterns with not-predicates.
• Our algorithm identifies a larger query class for which I/O optimality is guaranteed.
• Experimental results show the effectiveness and efficiency of our algorithm.

END
• Thank you!
• Q & A

References
1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal XML pattern matching.” In Proceedings of ACM SIGMOD, 2002.
2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of XML twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533–542, 2004.
3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceedings of SIGMOD, pages 59–70, 2004.
4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceedings of DEXA, 2005.
5. E. Jiao, T. W. Ling, C. Y. Chan, and P. S. Yu. “PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.

Appendix (1) Previous work: Binary structural join
• TreeMerge and Stack-Merge [1]:
  • Break the query into binary relationship branches.
  • Stitch the branches together.
  • A novel stack-based binary join algorithm.
• Disadvantage: large intermediate results.
1. S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. “Structural Joins: A Primitive for Efficient XML Query Pattern Matching.” In Proceedings of ICDE, 2002.

Appendix (2) Previous work: TwigStackList vs. TwigStack
• TwigStack outputs the “useless” intermediate path solution <s1, t1>, since it does not check for the parent-child relationship.
• TwigStackList has no useless intermediate output: <s1, t1> is not in the output.
[Figure: twig pattern with nodes section, title, paragraph, and figure, and an XML tree with elements s1, s2, t1, t2, t3, p1, p2, p3, f1, f2; the branching node has no parent-child relationship]