Schema-Based Query Optimization for XQuery over XML Streams

Learn about Schema-Based Query Optimization (SQO) that utilizes schema knowledge to optimize XQuery, focusing on static and streaming XML data retrieval. Explore related works and challenges, and understand the execution of optimized plans in XQuery optimization. Discover the benefits and overhead of SQO implementation in XML data filtering.


Presentation Transcript


  1. Schema-Based Query Optimization for XQuery over XML Streams Hong Su Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute, Massachusetts, USA VLDB 2005

  2. Schema-Based Query Optimization (SQO) • Schema knowledge can be utilized to optimize queries • Well studied in deductive/relational databases • Join elimination • Predicate elimination • Detection of empty answer set … • Equally applicable to XML for flat value filtering

  3. SQO for XML Pattern Retrieval • General XML SQO • Applicable to both static and streaming XML • E.g.: Query tree minimization [Amer-Yahia+02] • Static XML specific SQO • Focus on expediting random access of data • E.g.: Query rewrite using “extents” (indices built on element types) [Fernandez+98], … • Stream specific XML SQO • Focus on expediting token-by-token sequential access of data

  4. Stream Specific SQO Example • Query: /seller[shipTo] • Input: <seller><sameAddr>…<url>…<url></seller> • Without schema: buffer the seller element; retrieve /shipTo • With schema <!element seller((billTo,shipTo)|sameAddr, …)>: buffer the seller element; retrieve /shipTo; retrieve /sameAddr; when /sameAddr is retrieved, skip the remaining computation on the <seller> element
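The early exit on this slide can be sketched in a few lines. This is only an illustration, not the Raindrop implementation: the function name `process_seller` and the reduction of the stream to a list of child start-tag names are assumptions made for the example.

```python
def process_seller(children, use_schema):
    """Evaluate the predicate of /seller[shipTo] over one seller element.

    children: start-tag names of the seller's children, in document order
    (a simplification of the real token stream).
    Returns (predicate_satisfied, number_of_children_examined).
    """
    examined = 0
    for name in children:
        examined += 1
        if name == "shipTo":
            return True, examined       # predicate satisfied
        if use_schema and name == "sameAddr":
            # Schema says (billTo, shipTo) | sameAddr, so shipTo cannot follow.
            return False, examined      # fail early, skip the rest
    return False, examined              # failure known only at </seller>


children = ["sameAddr", "url", "url"]
print(process_seller(children, use_schema=False))  # (False, 3)
print(process_seller(children, use_schema=True))   # (False, 1)
```

Without the schema the failure is only known once the whole seller element has been read; with it, the first `sameAddr` start tag is enough.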

  5. Related Work • YFilter [Diao02] and XSM [Ludäscher 03] • Use schema to decide whether pattern results are recursive or types of child elements • Essentially propose general XML SQO • FluXQuery [Koch+04] • Uses schema to minimize buffer size • Is complementary to our focus (we aim to skip unnecessary computations) • SIX [Gupta+03] • Uses indices interleaved with XML data to reduce parsing • Could be combined with our techniques

  6. Challenge: Constraint Useful? • Schema: <!element seller((billTo,shipTo)|sameAddr, …)> • Query /seller[shipTo]/billTo: retrieve /billTo, retrieve /shipTo, retrieve /sameAddr; when /sameAddr is retrieved there is nothing to save: /billTo has already been retrieved • Query /seller/shipTo: retrieve /shipTo, retrieve /sameAddr; when /sameAddr is retrieved there is nothing to save: /shipTo is the only pattern retrieval

  7. Challenge: Benefits/Overhead? • Maximal benefits: no beneficial optimization should be missed • Any failed pattern should be detected as early as possible • Minimal overhead: no redundant optimization should be introduced • Whether a particular pattern fails should not be checked repeatedly

  8. Challenge: Plan Execution • Optimization at a lower level than query rewrite: no query can capture this optimization • Specific physical implementations are needed • Example: /seller[shipTo] with <!element seller((billTo,shipTo)|sameAddr, …)>: buffer the seller element; retrieve /shipTo; retrieve /sameAddr; act when /sameAddr is retrieved

  9. Outline • SQO Technique Design • SQO Application • Execution of Optimized Plan • Experimentations

  10. Physical Implementation of Pattern Retrieval • Note: • Important to understand physical stream engine implementation for designing effective SQO • Our implementation: • Widely used automata implementation [e.g., Tukwila, YFilter]

  11. Example Query and its Automata • Query: for $a in /auctions/auction, $b in $a/seller[shipTo] where $b/*/phone=“508-123-4567” return <auction> for $c in $a/item where $c//keyword=“auto” return $b/*/phone </auction> • [Figure: automaton with states 0, 1, 2, 3, 9, 10, 11, 12 and transition labels auctions, auction, λ, seller, shipTo, * (primary, secondary), phone; the input <auctions> <auction> … <phone> … </phone> is processed with a stack of state sets such as [0], [1], [2,3], …, [11], [12#], where # is the buffering flag]
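A minimal sketch of how a stack-based automaton consumes start and end tags may help in reading the stack snapshots. The transition table below is hypothetical: it reuses the slide's state numbers, but the exact edges are an assumption, and the λ transition into state 3 (for `//keyword`) is left out to keep the sketch short.

```python
# Hypothetical transition table; edges are assumed, not taken from the paper.
TRANSITIONS = {
    (0, "auctions"): {1},
    (1, "auction"): {2},
    (2, "seller"): {9},
    (9, "shipTo"): {10},
    (9, "*"): {11},        # wildcard step for $b/*
    (11, "phone"): {12},
}
BUFFERING = {12}           # states that raise the buffering flag (#)


def run(tokens):
    """Push the set of active states on a start tag, pop it on an end tag."""
    stack = [{0}]
    for kind, name in tokens:
        if kind == "start":
            nxt = set()
            for state in stack[-1]:
                nxt |= TRANSITIONS.get((state, name), set())
                nxt |= TRANSITIONS.get((state, "*"), set())
            stack.append(nxt)
        else:
            stack.pop()
    return [sorted(states) for states in stack]


tokens = [("start", t) for t in
          ["auctions", "auction", "seller", "primary", "phone"]]
snapshot = run(tokens)
print(snapshot)                               # [[0], [1], [2], [9], [11], [12]]
print(bool(set(snapshot[-1]) & BUFFERING))    # True
```

Every start tag costs a lookup per active state, which is why avoiding transitions and clearing buffering flags early are the two savings an optimizer can target.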

  12. Example Query and its Automata • Opt. opportunities: • avoid transitions as much as possible • revoke buffering flag as soon as possible • [Figure: same automaton and stack snapshots as on the previous slide]

  13. Is Constraint Useful for Opt.? • Constraints are used to find “ending marks” of a pattern within a context element • Example: with <!element seller((billTo, shipTo)|sameAddr?, …)>, <sameAddr> is an ending mark of /shipTo within the seller element context
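One way to picture how ending marks fall out of a content model is the sketch below. It is a simplification under stated assumptions: the content model is a sequence of choice groups, no element repeats, and `url` stands in for the elided "…" part of the seller declaration. The function name `ending_marks` and the nested-list encoding are invented for the example.

```python
def ending_marks(content_model, target):
    """Elements whose start tag proves `target` can no longer appear.

    content_model: list of groups in sequence order; each group is a list of
    alternative branches (a choice); each branch is a list of element names.
    """
    for gi, group in enumerate(content_model):
        for bi, branch in enumerate(group):
            if target not in branch:
                continue
            pos = branch.index(target)
            marks = list(branch[pos + 1:])               # order: later in branch
            for other in group[:bi] + group[bi + 1:]:    # exclusive: other branches
                marks += other
            for later_group in content_model[gi + 1:]:   # order: later groups
                for later_branch in later_group:
                    marks += later_branch
            return marks
    return []


# ((billTo, shipTo) | sameAddr), url   -- "url" is an assumed stand-in for "…"
seller = [[["billTo", "shipTo"], ["sameAddr"]], [["url"]]]
print(ending_marks(seller, "shipTo"))   # ['sameAddr', 'url']
```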

  14. Is Constraint Useful for Opt.? • Ending mark helpful if • Context element can be filtered out earlier:

  15. Is Constraint Useful for Opt.? • Ending mark helpful if • Context element can be filtered out earlier: • Pattern may fail to appear • Example query: for $a in /auctions/auction, $b in $a/seller … • With <!element auction(seller, …)>, the ending mark for $a/seller is not helpful • With <!element auction(seller?, …)>, the ending mark for $a/seller is helpful

  16. Is Constraint Useful for Opt.? • Ending mark helpful if • Context element can be filtered out earlier: • Pattern may fail to appear • Pattern is required • Schema: <!element item (category?, desc, …)> • For the query for $c in $a/item return <c>$a/category</c>, the ending mark for $a/category is not helpful • For the query for $c in $a/item[category] return <c>$a/category</c>, the ending mark for $a/category is helpful

  17. Is Constraint Useful for Opt.? • Ending mark helpful if • Context element can be filtered out earlier: • Pattern may fail to appear • Pattern is required and • The early filtering can be beneficial: • Transitions may happen after ending marks • Buffering flags may be raised before ending marks

  18. SQO Design • Helpful ending marks identified by our SQO • Three SQO rules designed using • Occurrence constraints • Exclusive constraints • Order constraints

  19. Example SQO Rule • Use occurrence constraint • Event-condition-action output by rule • Query: for $a in /auctions/auction, $b in $a/seller where $b/*/phone = “508-1234567” … • Schema: <!element seller(primary, secondary, …)> <!element primary (phone)> <!element secondary (phone)> • Event: second </phone> is encountered in a seller • Condition: $b/*/phone = “508-1234567” not satisfied yet • Action: skip the remaining computations within the current seller element
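The event-condition-action rule on this slide can be sketched as a small scan over one seller element. The event tuples, the function name `scan_seller` and the idea of attaching the text value to the end-tag event are assumptions for illustration only.

```python
def scan_seller(events, wanted, max_phones=2):
    """Apply the occurrence-constraint rule inside one seller element.

    events: (kind, name, text) tuples; for an end tag, `text` is the
    element's text content (a simplification of real stream tokens).
    max_phones: phones the schema allows under $b/* (primary + secondary).
    Returns (predicate_satisfied, events_processed).
    """
    phones_closed = 0
    satisfied = False
    processed = 0
    for kind, name, text in events:
        processed += 1
        if kind == "end" and name == "phone":
            phones_closed += 1
            if text == wanted:
                satisfied = True
            # Event: last possible </phone>. Condition: predicate unsatisfied.
            if phones_closed == max_phones and not satisfied:
                return False, processed   # Action: skip the rest of seller
    return satisfied, processed


events = [
    ("start", "primary", ""), ("end", "phone", "111"), ("end", "primary", ""),
    ("start", "secondary", ""), ("end", "phone", "222"), ("end", "secondary", ""),
    ("start", "item", ""),
]
print(scan_seller(events, wanted="508-1234567"))   # (False, 5)
```

In this run the rule fires at the fifth event, so the closing `secondary` tag and everything after it are never examined.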

  20. Outline • SQO Technique Design • SQO Application • Execution of Optimized Plan • Experimentations

  21. Properties of SQO Application • Maximal benefits • Minimal overhead

  22. Maximal Benefit • Definition of “rule independence” • Proof of “maximal benefits” given If rules are all independent, as long as each rule is applied on each pattern, maximal benefits are ensured

  23. Minimal Overhead: Redundancy • Same pattern redundancy: multiple ending marks adopted for the same pattern • Query: for $a in /auctions/auction, $b in $a/seller[shipTo] … • Schema constraints: <!element seller ( shipTo?, billTo, url )> • Ending mark <billTo> for $b/shipTo: guaranteed to capture failure of /shipTo • Ending mark <url> for $b/shipTo: redundant
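Same-pattern redundancy can be illustrated by keeping candidate ending marks only up to the first one that is guaranteed to occur. This is a sketch under the assumption of a flat, ordered content model without repetition; the function name `minimal_ending_marks` and the `(name, required)` encoding are hypothetical.

```python
def minimal_ending_marks(sequence, target):
    """Keep ending marks for `target` up to the first required element.

    sequence: list of (name, required) pairs in schema order.
    A required element always appears, so any mark after it is redundant.
    """
    names = [name for name, _ in sequence]
    marks = []
    for name, required in sequence[names.index(target) + 1:]:
        marks.append(name)
        if required:
            break
    return marks


# <!element seller ( shipTo?, billTo, url )>
seller = [("shipTo", False), ("billTo", True), ("url", True)]
print(minimal_ending_marks(seller, "shipTo"))   # ['billTo']
```

If `billTo` were optional instead, `url` would have to be kept as well, since `billTo` alone could be absent.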

  24. Minimal Overhead: Redundancy? • Parent-child pattern redundancy: ending marks of child patterns filter the parent pattern early • Query: for $a in /auctions/auction, $b in $a/seller[shipTo] … • Ending marks: <billTo> for $b/shipTo; <bidder> for $a/seller • Constraints (optional case): <!element auction (seller, bidder)> <!element seller (shipTo, billTo?)>: <bidder> can be used to capture failure of $a/seller[shipTo] • Constraints (required case): <!element auction (seller, bidder)> <!element seller (shipTo, billTo)>: <bidder> is redundant

  25. SQO Application Algorithm • Input: • XQuery represented as a tree • XML Schema represented as a graph • Processing: • Query tree traversed top-down • “Maximal benefits” ensured • Local/regional appliers applied to each tree node • Same pattern redundancy excluded by local applier • Parent-child pattern redundancy excluded by regional applier • Output: • Event-condition-actions attached to tree nodes

  26. Outline • SQO Technique Design Guideline • SQO Application • Execution of Optimized Plan • Experimentations

  27. Encoding ECAs in Automata • E: push-in or pop-out of state • C: pattern result buffer checked • A: actions include: • Suspend computations by removing automata transitions • Clean up result generated within current context element • Prepare for recovering computation for next context element (e.g., backup transitions)

  28. Example: ECAs in Automata • Query: for $a in /auctions/auction, $b in $a/seller[shipTo] where $b/*/phone=“508-123-4567” return <auction> for $c in $a/item … </auction> • Event: 1st <sameAddr> encountered • Condition: none • Action: cut all transitions from • q2 • States reachable via λ: q3 • States between q2 and q13: q9 • [Figure: automaton with states 0, 1, 2, 3, 5, 9, 10, 11, 12, 13 and transition labels auctions, auction, seller, sameAddr, item, shipTo, * (primary, secondary), phone, annotated with ECA entries such as (1, startTag, none, state 2) and (…, state 3); the input tokens shown include <auction>, <seller>, <sameAddr> </sameAddr>, <item> </item> and <primary> </primary>]
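The "cut transitions, then recover for the next context element" action can be sketched with a backup table. The class `Automaton`, its method names and the transition table are assumptions for illustration; only the state numbers q2, q3 and q9 come from the slide.

```python
class Automaton:
    """Transition table that can be cut temporarily and then restored."""

    def __init__(self, transitions):
        self.transitions = dict(transitions)
        self._backup = {}

    def cut_from(self, states):
        """Remove every transition leaving `states`, keeping a backup."""
        for key in list(self.transitions):
            if key[0] in states:
                self._backup[key] = self.transitions.pop(key)

    def restore(self):
        """Re-enable the removed transitions for the next context element."""
        self.transitions.update(self._backup)
        self._backup.clear()


# Hypothetical edges: (source state, label) -> target state.
table = {
    (0, "auctions"): 1, (1, "auction"): 2, (2, "seller"): 9,
    (3, "item"): 5, (9, "shipTo"): 10, (9, "sameAddr"): 13,
    (9, "*"): 11, (11, "phone"): 12,
}
automaton = Automaton(table)
automaton.cut_from({2, 3, 9})          # action fired by the first <sameAddr>
print(sorted(automaton.transitions))   # [(0, 'auctions'), (1, 'auction'), (11, 'phone')]
automaton.restore()                    # ready for the next auction element
print(len(automaton.transitions))      # 8
```

Keeping the removed edges in a backup makes recovery a plain dictionary update, at the cost of holding a second table while the cut is in effect.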

  29. Outline • SQO technique design guideline • SQO application • Execution of optimized plan • Experimentations

  30. Optimization Affected by? • How often the pattern fails (pattern selectivity) • How much gain each early filtering brings (unit gain)

  31. Necessity of Design Guideline • [Chart comparing three plans: plan without SQO; plan with SQO (1 ending mark); plan with SQO but no guideline considered (30 ending marks). X-axis: selectivity of the pattern with the only useful ending mark]

  32. Conclusion • First SQO on streaming XML • Support SQO on nested XQuery with “*” or “//” • Offer criteria of “useful” constraints • Ensure maximal benefits and minimal overhead in SQO application • Provide execution strategy in widely used automata-based model • Implement SQO optimizer in Raindrop system (VLDB’04 demo) • Experimentally demonstrate that SQO brings significant improvement with little overhead

  33. Visit our XQuery engine over XML stream project (RAINDROP) website http://davis.wpi.edu/dsrg/raindrop/ Supported by USA National Science Foundation and IBM PhD Fellowship