
Optimizing Cursor Movement in Holistic Twig Joins

This study presents TwigOptimal, a holistic twig join algorithm for efficient XQuery processing, with a focus on cursor movement optimization and extraction points. Experimental results show that it outperforms existing twig join algorithms by more than 40%, especially for larger queries.


Presentation Transcript


  1. Optimizing Cursor Movement in Holistic Twig Joins • Marcus Fontoura, Vanja Josifovski, Eugene Shekita (IBM Almaden Research Center), Beverly Yang (Stanford) • CIKM’2005

  2. Motivation • Query: for $a in //article[year = "2005" or keyword = "XML"] for $s in $a/section return $s/title • In an index-based method, 7 tags and text elements need to be verified to process this query • Running time is dominated by the I/O for manipulating these cursors • Twig join algorithms are not optimized for I/O and do not exploit the query’s extraction points (Figure: query tree with nodes article, AND, OR, section, year, keyword, title, 2005, XML)

  3. Our Contributions • TwigOptimal, a new holistic twig join algorithm that supports a large fraction of XQuery (including AND/OR branches) • Description of how extraction points improve query performance • Experimental evaluation that shows how TwigOptimal outperforms current algorithms

  4. Agenda • Background • TwigOptimal algorithm • Experimental results • Conclusions

  5. XML Indexing • Begin/End/Level encoding • Begin: preorder position of tag/text • End: preorder position of last descendant • Level: depth • Containment: X contains Y iff X.begin < Y.begin <= X.end (assuming well-formed) (Figure: example document tree labeled with (begin, end, level) triples: R (0,7,0), A1 (1,5,1), B1 (2,2,2), B2 (3,5,2), C1 (4,4,3), D1 (5,5,3), B3 (6,7,1), C2 (7,7,2))
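
The containment rule on this slide can be tried out directly. The following is a minimal Python sketch, not the authors' code: the `BEL` type, the `is_parent` helper, and the `descendants` function are illustrative names, and the pairing of node names with triples is an assumed reading of the slide's figure (the one consistent with the containment rule and document-order numbering).

```python
from typing import NamedTuple


class BEL(NamedTuple):
    begin: int   # preorder position of the tag/text
    end: int     # preorder position of the last descendant
    level: int   # depth in the document tree


def contains(x: BEL, y: BEL) -> bool:
    """Slide's rule: X contains Y iff X.begin < Y.begin <= X.end."""
    return x.begin < y.begin <= x.end


def is_parent(x: BEL, y: BEL) -> bool:
    """Assumed refinement: a parent is an ancestor exactly one level up."""
    return contains(x, y) and y.level == x.level + 1


# Assumed pairing of the figure's node names with its triples.
NODES = {
    "R": BEL(0, 7, 0), "A1": BEL(1, 5, 1), "B1": BEL(2, 2, 2),
    "B2": BEL(3, 5, 2), "C1": BEL(4, 4, 3), "D1": BEL(5, 5, 3),
    "B3": BEL(6, 7, 1), "C2": BEL(7, 7, 2),
}


def descendants(name: str) -> list:
    """Names of all nodes contained in the named node, sorted by name."""
    return sorted(n for n, pos in NODES.items() if contains(NODES[name], pos))


if __name__ == "__main__":
    print(descendants("A1"))
    print(is_parent(NODES["B2"], NODES["C1"]))
```

Because the test needs only two integer comparisons, ancestor/descendant checks never have to touch the document itself, which is what makes index-only join processing possible.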

  6. Basic Access Path • Inverted lists • Posting: <Token, Location> • Token = <term/tag> • Location = <DocumentID, Position> • Supported method on cursor: • CB.forwardTo(Position p) (Figure: the example tree with nodes R, A1, B1, B2, B3, C1, C2, D1, and the inverted lists for B (B1 B2 B3) and C (C1 C2))
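
A cursor over one inverted list might look like the sketch below. This is an assumption-laden illustration rather than the prototype's implementation: the slide names only the forward-to method, so the class layout, the `current` and `at_end` helpers, and the semantics "advance to the first posting at or after the given location, never backward" are all assumptions. Locations are modeled as `(document_id, position)` tuples, and a sorted in-memory list stands in for the on-disk index.

```python
from bisect import bisect_left


class Cursor:
    """Forward-only cursor over the sorted postings of one token."""

    def __init__(self, postings):
        # Each posting location is a (document_id, position) tuple.
        self.postings = sorted(postings)
        self.index = 0

    def at_end(self) -> bool:
        return self.index >= len(self.postings)

    def current(self):
        """Location under the cursor, or None once the list is exhausted."""
        return None if self.at_end() else self.postings[self.index]

    def forward_to(self, location):
        """Skip to the first posting >= location; the cursor never moves back."""
        self.index = bisect_left(self.postings, location, lo=self.index)
        return self.current()


if __name__ == "__main__":
    # Hypothetical begin positions for the tag B in document 0.
    cb = Cursor([(0, 2), (0, 3), (0, 6)])
    print(cb.forward_to((0, 4)))
```

In a disk-based index each such skip is a B-tree seek, which is why the rest of the talk concentrates on avoiding unnecessary cursor moves.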

  7. Joins in XML • Structural (Containment) Joins • Twig Joins (Figure: a twig pattern with A || B at the top and branches from B to C and D, shown alongside the patterns A || B, B || C, B || D, and A || B || C)

  8. LocateExtension • “Extension” (w.r.t. query node q): a solution for the subquery rooted at q • Input: q • Result: the cursors of all descendants of q point to an extension for q (Figure: twig query A || B with branches from B to C and D, and a data tree with nodes A, B1, B3, C1, C2, D1, D2, X1, X2)

  9. LocateExtension • While (not end(q) && not hasExtension(q)) { (p, c) = PickBrokenEdge(q); ZigZagJoin(p, c); } (Figure: twig query A || B with branches from B to C and D, and a data tree with nodes A, B1, B3, C1, C2, D1, D2, X1, X2)
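
The slide does not define ZigZagJoin, so the sketch below shows one common reading of a zig-zag join on a single ancestor/descendant edge: the two cursors are advanced alternately, each skipping past postings that cannot match, until the parent posting contains the child posting. The function name, the `(begin, end)` tuple layout, and the return convention are assumptions made for illustration, and sorted Python lists stand in for index cursors.

```python
from bisect import bisect_left


def zigzag_join(p_list, c_list, pi=0, ci=0):
    """Advance indices until p_list[pi] contains c_list[ci].

    Both lists hold (begin, end) tuples sorted by begin. Returns the pair
    of indices of the first match, or None if either list is exhausted.
    """
    while pi < len(p_list) and ci < len(c_list):
        p_begin, p_end = p_list[pi]
        c_begin, _ = c_list[ci]
        if c_begin <= p_begin:
            # Child starts too early: skip it to the first posting whose
            # begin is > p_begin. The 1-tuple sorts before any 2-tuple
            # with the same first element, so this finds begin >= p_begin + 1.
            ci = bisect_left(c_list, (p_begin + 1,), lo=ci)
        elif c_begin > p_end:
            # Parent ends before the child begins: it cannot contain this
            # child or any later one, so move the parent forward.
            pi += 1
        else:
            return pi, ci  # p_begin < c_begin <= p_end
    return None


if __name__ == "__main__":
    parents = [(1, 5), (10, 20)]
    children = [(6, 7), (12, 13)]
    print(zigzag_join(parents, children))
```

In the slide's loop, a join like this would be applied repeatedly to whichever query edge is currently broken until the whole subquery rooted at q is aligned.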

  10. TwigOptimal Algorithm • Tests if the cursor with the minimal location has an extension • If not, tries to virtually move cursors until they form an extension • Moves cursors physically only if no further virtual move is possible • A virtual move just sets the begin value of the cursor, so no I/O is involved: • Cq.begin = new begin value for Cq; • Cq.virtual = true; // indicates that the cursor is virtual
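
The difference between the two kinds of move can be sketched as follows. Only the two assignments in `virtual_move` come from the slide; the `Cursor` layout, the `physical_move` method, the `io_count` field used to stand in for index accesses, and the `END` sentinel are hypothetical names introduced for this illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass

END = float("inf")  # assumed sentinel for an exhausted cursor


@dataclass
class Cursor:
    postings: list          # sorted begin positions (stand-in for the index)
    begin: int = 0
    virtual: bool = False
    io_count: int = 0       # counts simulated index accesses

    def virtual_move(self, new_begin):
        """As on the slide: set begin and flag the cursor; no index access."""
        self.begin = new_begin
        self.virtual = True

    def physical_move(self):
        """Resolve the cursor to the first real posting at or after begin."""
        i = bisect_left(self.postings, self.begin)
        self.io_count += 1  # this is the step that would cost I/O
        self.begin = self.postings[i] if i < len(self.postings) else END
        self.virtual = False


if __name__ == "__main__":
    c = Cursor(postings=[2, 3, 6], begin=2)
    c.virtual_move(4)
    c.virtual_move(5)
    print(c.io_count, c.virtual)
    c.physical_move()
    print(c.begin, c.io_count, c.virtual)
```

Several virtual moves can therefore be collapsed into a single index access, which is performed only when a real posting is actually needed.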

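  11. Checking Extension • We have an extension for cursor q if: • All cursors underneath q are properly aligned • All cursors underneath q have physical locations (Figure: twig query A || B with branches from B to C and D, and a data tree with nodes A, B1, B3, C1, C2, D1, D2, X1, X2; for the cursor positions shown, the check returns false)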

  12. Checking Extension • We have an extension for cursor q if: • All cursors underneath q are properly aligned • All cursors underneath q have physical locations (Figure: same query and data tree as the previous slide; for the cursor positions shown, the check returns true)
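
The two conditions can be expressed as a short recursive check. This sketch makes several assumptions that the slides do not state: "properly aligned" is taken to mean that each child cursor is contained in its parent cursor, every query edge is an ancestor/descendant edge, all branches are AND branches, and q's own cursor must also be physical. `QueryNode` and `has_extension` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    name: str
    begin: int              # begin of the posting under this node's cursor
    end: int                # end of that posting
    virtual: bool = False   # True if the cursor was only moved virtually
    children: list = field(default_factory=list)


def has_extension(q: QueryNode) -> bool:
    """True iff every cursor at or below q is physical and aligned."""
    if q.virtual:
        return False  # a virtual cursor does not point at a real posting
    for c in q.children:
        aligned = q.begin < c.begin <= q.end  # parent contains child
        if not aligned or not has_extension(c):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical cursor positions for the query A || B with branches C, D.
    c = QueryNode("C", 3, 3)
    d = QueryNode("D", 5, 5)
    b = QueryNode("B", 2, 8, children=[c, d])
    a = QueryNode("A", 1, 10, children=[b])
    print(has_extension(a))
    d.virtual = True
    print(has_extension(a))
```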

  13. Moving Cursors • Two passes over the query tree • Bottom-up: move each parent cursor forward so it contains the children cursors • Top-down: move the children cursors forward so they are contained by their parents

  14. Move Cursors Example • Query = //x[.//y and .//z] (Figure: cursor streams x1 x2; y1 y2 y3 y4 y5; z1 z2, with the cursor moves numbered 1 to 7 and a legend distinguishing physical moves from virtual moves)

  15. Comparing with TSGeneric+ • Query = //w//x//y//z (Figure: cursor streams w1 w2; x1 … x50; y1 … y100; z1 z2; legend: current cursor position, physical move, virtual move)

  16. Comparing with TSGeneric+ • Query = //w//x//y//z (Figure: cursor streams w1 w2; x1 … x50; y1 … y100; z1 z2; legend: current cursor position, physical move)

  17. Extraction Points Optimization • If neither q nor its descendants in the query are extraction points, we can virtually move these cursors within q’s parent (Figure: twig query A with branches to B and C; cursor streams A1 A2; B1 B2 B3; C1 … C100)

  18. Prototype • Implemented over Berkeley DB B-tree • Inverted lists • Posting: <Token, Location> • Token = <term/tag> • Location = <DocumentID, Position> • Position is BEL

  19. Data Sets • XMark • 10 documents of size ~ 100MB each • Synthetic • 4 tags: W, X, Y, Z • Uncorrelated, no self-nesting • Same frequency

  20-24. Experimental Results (Figures: five slides of result charts; no text survives in the transcript)

  25. Conclusion • The TwigOptimal algorithm outperforms existing twig join algorithms by more than 40%, especially for larger queries • Optimized for I/O, which is the performance bottleneck • The extraction points optimization improves performance
