
Efficient Structural Joins on Indexed XML Documents

Shu-Yao Chien, Zografoula Vagena, Donghui Zhang, Vassilis J. Tsotras, Carlo Zaniolo. VLDB 2002.


Presentation Transcript


  1. Efficient Structural Joins on Indexed XML Documents Shu-Yao Chien, Zografoula Vagena, Donghui Zhang, Vassilis J. Tsotras, Carlo Zaniolo VLDB 2002 Efficient Structural Joins on Indexed XML

  2. Overview
     • Motivation
     • Problem Description
     • Structural Joins
     • Structural Joins using B+-trees
     • Structural Joins using R-trees
     • Problem Variations
     • Experimental Results

  3. Motivation (1)
     Query languages for XML qualify documents for retrieval both by their structure and by the values of their elements.
     Example (path-expression query): section[title="Overview"]//figure[caption="R-tree"]

  4. Motivation (2)
     • Numbering schemes
       • Each node is assigned a unique interval.
       • The interval of a parent node contains the intervals of all its children.
     When the XML document is combined with a numbering scheme, path expression queries require the computation of structural joins.
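The interval numbering described on this slide can be sketched as follows. This is a minimal illustration, assuming a plain depth-first counter; the function name `number_elements` is hypothetical, and a durable numbering scheme would leave gaps between numbers to absorb later insertions.

```python
import xml.etree.ElementTree as ET


def number_elements(root):
    """Assign a (start, end) interval to every element by a depth-first walk.

    Assumption: a simple counter is incremented on entering and on leaving
    each element, so a parent's interval encloses those of its children.
    """
    counter = 0
    intervals = []  # (tag, start, end), kept in document order

    def visit(elem):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(intervals)
        intervals.append(None)  # reserve the slot to preserve document order
        for child in elem:
            visit(child)
        counter += 1
        intervals[slot] = (elem.tag, start, counter)

    visit(root)
    return intervals


doc = ET.fromstring("<section><title/><figure><caption/></figure></section>")
for tag, start, end in number_elements(doc):
    print(tag, start, end)
```

In this toy document the `section` interval encloses every other interval, and the `figure` interval encloses that of `caption`.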

  5. Motivation (3)
     • From path expressions to structural joins: two nodes qualify for a path expression query if one is an ancestor of the other. With intervals, this is equivalent to containment.

  6. Problem Description
     • Structural join: let A and D be two lists containing the instances of two particular tags in an XML document. Join A and D using their containment associations as the join condition.
     • [Al-Khalifa et al. 2002] proposed non-indexed structural join algorithms.
     • We extend their algorithms to take advantage of existing indices on the two lists.
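To make the join condition concrete, here is a small sketch of the containment test and a naive nested-loop structural join that serves only as a reference definition. The names `Elem`, `is_ancestor` and `naive_structural_join` are illustrative assumptions, not taken from the presentation.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    start: int
    end: int


def is_ancestor(a, d):
    """a is an ancestor of d exactly when a's interval strictly contains d's."""
    return a.start < d.start and d.end < a.end


def naive_structural_join(A, D):
    """Quadratic baseline: test every (a, d) pair for containment."""
    return [(a, d) for a in A for d in D if is_ancestor(a, d)]


A = [Elem(1, 10), Elem(2, 5)]
D = [Elem(3, 4), Elem(6, 7)]
for a, d in naive_structural_join(A, D):
    print(a, "contains", d)
```

The algorithms on the following slides compute the same pairs while avoiding the comparison of every element of A with every element of D.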

  7. Structural Joins, no indices
       Let a, d be the first elements of A and D
       while (A, D are not empty or the stack is not empty) do
         if (a.start > stack.top.end and d.start > stack.top.end) then
           stack.pop()
         else if (a.start < d.start) then
           stack.push(a)
           let a be the next element in A
         else
           output d as descendant of all elements in stack
           let d be the next element in D
         endif
       endwhile
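The stack-based merge on this slide might look like the following in Python. It is a sketch under stated assumptions: elements are `(start, end)` tuples, both lists are sorted by `start`, intervals are properly nested with distinct endpoints, and the loop stops once D is exhausted because no further output is possible. The function name `stack_tree_join` is hypothetical.

```python
def stack_tree_join(A, D):
    """Merge two start-sorted lists of (start, end) tuples.

    Returns (ancestor, descendant) pairs. The stack always holds a chain
    of nested elements of A that may still contain upcoming elements of D.
    """
    out, stack = [], []
    i = j = 0
    while j < len(D):
        a = A[i] if i < len(A) else None
        d = D[j]
        top_end = stack[-1][1] if stack else None
        if stack and (a is None or a[0] > top_end) and d[0] > top_end:
            # Both cursors are past the top of the stack: it can be dropped.
            stack.pop()
        elif a is not None and a[0] < d[0]:
            # a starts before d, so it may be an ancestor of d: remember it.
            stack.append(a)
            i += 1
        else:
            # Every element on the stack contains d.
            out.extend((anc, d) for anc in stack)
            j += 1
    return out


A = [(1, 20), (2, 9), (12, 15), (30, 40)]
D = [(3, 4), (13, 14), (16, 17), (50, 51)]
for anc, desc in stack_tree_join(A, D):
    print(anc, desc)
```

Each element of A and D is visited once, but every element is still read, which is the cost the indexed variants aim to reduce.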

  8. Example
     [Figure: example with list-A elements a1, a2, a3, a4 and list-D elements d1, d2, d3]

  9. Structural Joins using B+-trees
     • Existing structural join algorithms sequentially scan the input lists.
     • Durable numbering schemes have enabled indexing of XML files with mainstream indices.
     • Such indices can result in sub-linear access time as they provide the facility to skip elements that don’t participate in the join.

  10. Motivation for using the B+-tree index (1)
     [Figure: elements a1 to a12 from list A and d1, d2 from list D]

  11. Motivation for using the B+-tree index (2)
     [Figure: elements a1, a2 from list A and d1 to d14 from list D]

  12. Structural Joins using B+-trees
       Put pointers a and d at the beginning of lists A and D
       while (not at the end of A or D) do
         if (a is an ancestor of d) then
           push into stack all elements in A that are ancestors of d
           join d with all elements in stack and let d = d->next
         else if (a.start < d.start) then   // jump ancestor A
           pop all elements in stack which are before d
           move a forward by skipping sub-trees of last element popped
         else                               // a is after d; jump descendant D
           join d with all elements in stack
           move d forward by skipping all D elements with start < a.start
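The skipping idea can be illustrated with a simplified sketch in which binary search over sorted lists (Python's `bisect`) stands in for B+-tree range lookups. This is an assumption made for illustration, not the authors' implementation, and the control flow is reorganised slightly compared with the pseudocode above. It assumes start-sorted `(start, end)` tuples with properly nested intervals; `skip_join` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def skip_join(A, D):
    """Structural join that jumps over elements which cannot match."""
    a_starts = [a[0] for a in A]
    d_starts = [d[0] for d in D]
    out, stack = [], []
    i = j = 0
    while j < len(D):
        d = D[j]
        # Drop stacked ancestors that end before d begins.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        if i < len(A) and A[i][0] < d[0]:
            a = A[i]
            if a[1] > d[1]:
                # a contains d: keep it as a current ancestor.
                stack.append(a)
                i += 1
            else:
                # a ends before d: skip a together with its whole sub-tree,
                # i.e. every A element whose start lies inside a's interval.
                i = bisect_right(a_starts, a[1], i + 1)
        elif stack:
            out.extend((anc, d) for anc in stack)
            j += 1
        elif i < len(A):
            # No open ancestor: skip all D elements that start before a.
            j = max(j + 1, bisect_left(d_starts, A[i][0], j))
        else:
            break  # A is exhausted and nothing on the stack can match
    return out


print(skip_join([(100, 200)], [(1, 2), (3, 4), (150, 160)]))
```

With a real B+-tree the two `bisect` calls would become index probes, so runs of non-matching elements are never read from disk.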

  13. Containment forest
     • Structure linking elements that belong to the same tag.
     • Each element corresponds to a node in the structure and is linked to other elements via parent, first-child and right-sibling pointers.
     • Can be embedded within the associated B+-tree.
     • Improves CPU time.

  14. Containment forest example
       A (10,500)
         A (150,250)
         A (300,400)
       A (800,900)
         A (830,860)
       A (1400,2000)
         A (1530,1560)
         A (1700,1800)

  15. Containment forest properties
     • The (start, end) interval of each node contains all intervals in its subtree.
     • The start numbers in the forest follow a preorder traversal.
     • The start (end) numbers of sibling nodes are in increasing order.
     The containment forest can be maintained dynamically, with efficient algorithms for element insertion and deletion.
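One way to build such a forest from a list of intervals is sketched below, using the intervals from the example slide. The `Node` class, the `build_forest` name, the helper field `_last`, and the choice to chain root nodes through `right_sibling` are assumptions for illustration; the sketch covers bulk construction only, not the dynamic insertion and deletion algorithms.

```python
class Node:
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.parent = None
        self.first_child = None
        self.right_sibling = None
        self._last = None  # helper: most recently attached child


def build_forest(intervals):
    """Build parent / first-child / right-sibling links from (start, end) pairs."""
    roots, stack = [], []  # stack holds the currently open ancestors
    for start, end in sorted(intervals):
        node = Node(start, end)
        # Close every open element that ends before this one starts.
        while stack and stack[-1].end < start:
            stack.pop()
        if stack:
            parent = stack[-1]
            node.parent = parent
            if parent.first_child is None:
                parent.first_child = node
            else:
                parent._last.right_sibling = node
            parent._last = node
        else:
            if roots:
                roots[-1].right_sibling = node  # assumption: roots are chained too
            roots.append(node)
        stack.append(node)
    return roots


example = [(10, 500), (800, 900), (1400, 2000), (150, 250),
           (300, 400), (830, 860), (1530, 1560), (1700, 1800)]
for root in build_forest(example):
    child, children = root.first_child, []
    while child is not None:
        children.append((child.start, child.end))
        child = child.right_sibling
    print((root.start, root.end), children)
```

Because the input is processed in increasing start order, the nodes are created in preorder, matching the second property on the slide.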

  16. Structural Join using R-trees (1)
     • The interval (start, end) of an element can be mapped to a point (e.start, e.end) in the 2-D space, which is then indexed by an R-tree.
     • An R-tree can also be used to index the element (start, end) ranges as 1-D intervals.
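Under the point mapping, finding ancestors or descendants becomes a window query in the plane. The sketch below shows the two windows and uses a linear scan as a stand-in for an R-tree search; the function names are hypothetical and no R-tree library is involved.

```python
INF = float("inf")


def ancestors_window(d):
    """Ancestors of d are points with x < d.start and y > d.end."""
    return (-INF, d[0], d[1], INF)  # (x_lo, x_hi, y_lo, y_hi), open bounds


def descendants_window(a):
    """Descendants of a are points with x > a.start and y < a.end."""
    return (a[0], INF, -INF, a[1])


def window_query(points, window):
    """Linear-scan stand-in for an R-tree window search."""
    x_lo, x_hi, y_lo, y_hi = window
    return [p for p in points if x_lo < p[0] < x_hi and y_lo < p[1] < y_hi]


points = [(10, 500), (150, 250), (300, 400), (800, 900), (830, 860),
          (1400, 2000), (1530, 1560), (1700, 1800)]
print(window_query(points, ancestors_window((1530, 1560))))
print(window_query(points, descendants_window((10, 500))))
```

An R-tree answers the same window queries while visiting only the pages whose bounding rectangles intersect the window.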

  17. Structural Join using R-trees (2)
     [Figure: two points, two pages]

  18. Problem Variations
     • Self joins
       • Non-indexed algorithm that traverses the element list exactly once.
     • Structural join in a pipelining environment
       • Feedback between modules can help to skip elements that don’t take part in the join.
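A single-pass self join can be sketched with one stack, as below. The slide does not spell out the algorithm, so this is only one plausible reading, assuming a start-sorted list of properly nested `(start, end)` tuples; `self_join` is a hypothetical name.

```python
def self_join(elements):
    """Return all (ancestor, descendant) pairs within one start-sorted list.

    The list is traversed exactly once; the stack holds the elements that
    are still open at the current position.
    """
    out, stack = [], []
    for e in elements:
        # Close elements that end before e starts.
        while stack and stack[-1][1] < e[0]:
            stack.pop()
        # Everything still open contains e.
        out.extend((anc, e) for anc in stack)
        stack.append(e)
    return out


elements = [(10, 500), (150, 250), (300, 400), (800, 900), (830, 860),
            (1400, 2000), (1530, 1560), (1700, 1800)]
for anc, desc in self_join(elements):
    print(anc, desc)
```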

  19. Performance Analysis (1)
     Effect of skipping only ancestors on join performance

  20. Performance Analysis (2)
     Effect of skipping only descendants on join performance

  21. Performance Analysis (3)
     Effect of skipping both ancestors and descendants

  22. Performance Analysis (4)
     Comparison of B+-tree and B+psp algorithms

  23. Conclusions
     • We presented efficient ways to perform structural joins over XML data utilizing existing indices.
     • Experimental results showed that, among the indexed approaches, the B+-tree with sibling pointers performs best.
     • The solution is easy to maintain and provides a drastic improvement over the no-index case.
