Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Hong Su, Elke Rundensteiner, Murali Mani, Ming Li. Worcester Polytechnic Institute, Worcester, MA. VLDB 2004.

Presentation Transcript


  1. Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams. Hong Su, Elke Rundensteiner, Murali Mani, Ming Li. Worcester Polytechnic Institute, Worcester, MA. VLDB 2004

  2. Stream Processing: data flows from data sources across networks to data requesters.

  3. What's Special for XML Stream Processing • An XQuery over a stream combines pattern retrieval, filtering, and restructuring, e.g.: FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr] WHERE $b/*/phone = "508" RETURN <auction> $b, $c </auction> • Token-by-token access manner: along the timeline the input arrives as tokens such as <auctions> <auction> <seller> <primary> <phone> …, and a token is not the counterpart of a self-contained tuple • Hence pattern retrieval must be performed on token streams
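To make the token-by-token access manner concrete, the following minimal Python sketch uses the standard-library SAX parser to turn an XML stream that arrives in arbitrary chunks into (kind, value) tokens. The token format, the `tokenize` helper, and the 7-character chunking are illustrative assumptions rather than Raindrop's actual interface. Note that no single token carries a complete `<auction>` the way a tuple would.

```python
import xml.sax
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, value) with kind in {"start", "end", "text"}


class _TokenHandler(xml.sax.ContentHandler):
    """Turns SAX callbacks into (kind, value) tokens, buffering split text."""

    def __init__(self) -> None:
        super().__init__()
        self._ready: List[Token] = []
        self._text: List[str] = []

    def _flush_text(self) -> None:
        text = "".join(self._text).strip()
        if text:
            self._ready.append(("text", text))
        self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self._ready.append(("start", name))

    def endElement(self, name):
        self._flush_text()
        self._ready.append(("end", name))

    def characters(self, content):
        # SAX may deliver one text node in several pieces; they are joined on flush.
        self._text.append(content)

    def drain(self) -> List[Token]:
        ready, self._ready = self._ready, []
        return ready


def tokenize(chunks: Iterable[str]) -> Iterator[Token]:
    """Yield tokens as soon as each chunk of the XML stream has been parsed."""
    handler = _TokenHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk.encode("utf-8"))
        yield from handler.drain()
    parser.close()
    yield from handler.drain()


doc = "<auctions><auction><seller><primary><phone>508</phone></primary></seller></auction></auctions>"
chunks = [doc[i:i + 7] for i in range(0, len(doc), 7)]  # simulate network packets
for token in tokenize(chunks):
    print(token)
```

Text is emitted only when the next tag arrives, so a value such as 508 that is split across packets still becomes a single text token.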

  4. Two Computation Paradigms • Automata-based [yfilter, xscan, xsm, xsq, xpush…] • Algebraic [niagara00, …] • Example query: FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr] WHERE $b/*/phone = "508" RETURN <auction> $b, $c </auction> [Figure: the query in both paradigms. Automata: an automaton with states 1 to 9 and transitions labeled auction, seller, *, phone, homepage, bid, bidder, sameAddr. Algebra: Navigate stream(bids), //auction->$a; Navigate $a, /seller->$b; Navigate $a, /bid/bidder->$c; and a Tagger operator for result construction.]
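As a rough illustration of the automata-based paradigm, here is a hypothetical stack-based NFA that matches linear paths built from `/`, `//`, and `*` over start/end tokens. It follows the general style of automata-based XPath matchers but is not code from any cited system. `compile_path` and `run_nfa` are made-up names, and predicates such as `[homepage]` and result construction are deliberately left out.

```python
import re
from typing import List, Set, Tuple

Step = Tuple[str, str]  # (axis, name) with axis "child" or "desc"


def compile_path(path: str) -> List[Step]:
    """Split a linear path such as '//auction/seller/*/phone' into steps."""
    return [("desc" if sep == "//" else "child", name)
            for sep, name in re.findall(r"(//|/)([^/]+)", path)]


def run_nfa(steps: List[Step], tokens: List[Tuple[str, str]]) -> List[int]:
    """Return the positions of the start tokens at which the path matches."""
    final = len(steps)
    stack: List[Set[int]] = [{0}]  # active NFA states per open element
    matches = []
    for pos, (kind, value) in enumerate(tokens):
        if kind == "start":
            active = set()
            for state in stack[-1]:
                if state == final:
                    continue
                axis, name = steps[state]
                if axis == "desc":
                    active.add(state)        # '//' keeps waiting at deeper levels
                if name in ("*", value):
                    active.add(state + 1)    # transition on a matching tag
            if final in active:
                matches.append(pos)
            stack.append(active)
        elif kind == "end":
            stack.pop()                      # backtrack to the parent's states
    return matches


tokens = [("start", "auctions"), ("start", "auction"), ("start", "seller"),
          ("start", "primary"), ("start", "phone"), ("text", "508"),
          ("end", "phone"), ("end", "primary"), ("end", "seller"),
          ("start", "bid"), ("start", "bidder"), ("end", "bidder"),
          ("end", "bid"), ("end", "auction"), ("end", "auctions")]
print(run_nfa(compile_path("//auction/seller/*/phone"), tokens))
```

Each start tag pushes a set of active states and each end tag pops it. Memory therefore grows with document depth rather than document size, which is what makes this style attractive for streams.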

  5. Comparison of Two Paradigms • Each paradigm has its own deficiencies • The two paradigms complement each other

  6. Four-Level Algebraic Framework • The Raindrop framework is intended to integrate both paradigms into one, with abstraction levels ranging from high (declarative) to low (procedural) • Level I, Semantics-Focused Plan: expresses the semantics of the query regardless of input sources • Level II, Stream Logical Plan: accommodates tokenized streams and automata computation • Level III, Stream Physical Plan: describes the implementation details of operators • Level IV, Stream Execution Plan: decides how an operator is invoked (scheduling)

  7. Level I: Semantics-Focused Plan • Expresses query semantics regardless of whether input sources are stored or streamed [Rainbow-ZPR02] • Reuses existing general optimization techniques, e.g., decorrelation and cancelling duplicate navigation operators, …

  8. Example Semantics-Focused Plan • Stream data: <auctions> <auction> <seller> <primary><phone>508</phone></primary> <secondary><phone>613</phone></secondary> </seller> <bid><bidder>…</bidder><bidder>…</bidder></bid> </auction> … • Query: FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr] WHERE $b/*/phone = "508" RETURN <auction> $b, $c </auction> • Plan and input/output data, bottom-up: NavUnnest stream(bids), //auction->$a outputs tuples pairing source (<auctions>…</auctions>) with each $a (<auction>…</auction>); NavUnnest $a, /seller->$b extends each tuple with $b (<seller>…</seller>); NavUnnest $a, /bid/bidder->$c extends each tuple with $c (<bidder>…</bidder>)
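Because the semantics-focused plan is independent of the input source, it can be sketched over a fully materialized document. The Python below is a minimal sketch. It assumes a tuple is a dict from variable names to ElementTree nodes and uses ElementTree's path syntax (`.//auction`, `seller`, `bid/bidder`) in place of the XQuery paths. `nav_unnest` is a hypothetical helper. As in the slide's plan, the predicates, the WHERE clause, and RETURN are omitted, and the bidder contents `b1`/`b2` are placeholder data.

```python
import xml.etree.ElementTree as ET

DOC = ("<auctions><auction>"
       "<seller><primary><phone>508</phone></primary>"
       "<secondary><phone>613</phone></secondary></seller>"
       "<bid><bidder>b1</bidder><bidder>b2</bidder></bid>"
       "</auction></auctions>")


def nav_unnest(tuples, src, path, dst):
    """NavUnnest src, path -> dst: one output tuple per node that path reaches."""
    return [{**t, dst: node} for t in tuples for node in t[src].findall(path)]


root = ET.fromstring(DOC)  # the <auctions> element
plan = [{"source": root}]
plan = nav_unnest(plan, "source", ".//auction", "$a")   # stream(bids)//auction
plan = nav_unnest(plan, "$a", "seller", "$b")           # $a/seller
plan = nav_unnest(plan, "$a", "bid/bidder", "$c")       # $a/bid/bidder
for t in plan:
    print(t["$a"].tag, t["$b"].tag, t["$c"].text)
```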

  9. Level II: Stream Logical Plan • Extends the semantics-focused plan to accommodate tokenized stream inputs • New input data format: tokens • New operators: StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin • New rewrite rules: push into / pull out of automata

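  10. One Uniform Algebraic View • The algebraic stream logical plan consists of a token-based plan (the automata plan), which consumes the XML data stream and outputs a tuple stream, and a tuple-based plan, which consumes that tuple stream and produces the query answer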

  11. Modeling Automata in the Algebraic Plan: Black Box [XScan01] vs. White Box • Query: FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr] WHERE $b/*/phone = "508" RETURN <auction> $b, $c </auction> • Black box: a single XScan operator encapsulates the automaton that computes $a := stream(bids)//auction, $b := $a/seller, $c := $a/bid/bidder • White box: the automaton is exposed as algebra operators: TokenNavigate stream(bids), //auction->$a; TokenNavigate $a, /seller->$b; TokenNavigate $a, /bid/bidder->$c; ExtractUnnest $a, $b; ExtractUnnest $a, $c; StructuralJoin $a

  12. Data Model in the Algebraic Plan Modeling Automata • Data flows bottom-up • StreamSource emits tokens: <auctions> <auction> <seller> <primary> <phone> … • TokenNavigate stream(bids), //auction->$a, TokenNavigate $a, /seller->$b, and TokenNavigate $a, /bid/bidder->$c operate on tokens, e.g., <seller> <primary> <phone> 508 </phone> </primary> … and <bidder> <bidderid> 0314 … • ExtractUnnest $a, $b and ExtractUnnest $a, $c output tuples of XML fragments (<seller>…</seller>, <bidder>…</bidder>) • StructuralJoin $a combines these tuples by their shared $a
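The white-box data flow can be approximated in a single pass over tokens. This is a simplified sketch that rests on several assumptions. The operators are fused into one loop, with comments marking which step plays which operator's role. The XML has no attributes and no nested `$a` elements. Paths under `$a` are plain child steps, and predicates are not evaluated. `tokens`, `to_xml`, and `white_box_plan` are hypothetical names, not Raindrop's operator API.

```python
import re
from itertools import product


def tokens(doc):
    """Hypothetical StreamSource: tag/text tokens of attribute-free XML."""
    for close, tag, text in re.findall(r"<(/?)(\w+)>|([^<]+)", doc):
        if tag:
            yield ("end" if close else "start", tag)
        elif text.strip():
            yield ("text", text.strip())


def to_xml(buf):
    """Reassemble buffered tokens into an XML fragment string."""
    return "".join(f"<{v}>" if k == "start" else f"</{v}>" if k == "end" else v
                   for k, v in buf)


def white_box_plan(doc, a_tag, paths):
    """paths maps a variable to its child path under $a, e.g. {"$b": ["seller"]}."""
    out = []     # StructuralJoin $a output
    rel = None   # tag stack relative to the current $a; None = outside any $a
    frags, bufs = {}, {}
    for kind, value in tokens(doc):
        if rel is None:
            # TokenNavigate stream, //a_tag -> $a (nested $a bindings not handled)
            if kind == "start" and value == a_tag:
                rel, frags = [], {var: [] for var in paths}
            continue
        if kind == "start":
            rel.append(value)
            for var, path in paths.items():
                if rel == path:          # TokenNavigate $a, path -> var
                    bufs[var] = []
        for buf in bufs.values():        # ExtractUnnest: collect matched tokens
            buf.append((kind, value))
        if kind == "end":
            if not rel:                  # </a_tag>: StructuralJoin on this $a
                out.extend(product(*frags.values()))
                rel = None
                continue
            for var, path in paths.items():
                if rel == path and var in bufs:
                    frags[var].append(to_xml(bufs.pop(var)))
            rel.pop()
    return out


DOC = ("<auctions><auction><seller><primary><phone>508</phone></primary></seller>"
       "<bid><bidder>b1</bidder><bidder>b2</bidder></bid></auction></auctions>")
for row in white_box_plan(DOC, "auction", {"$b": ["seller"], "$c": ["bid", "bidder"]}):
    print(row)
```

Each fragment is completed as soon as its end tag arrives. The fragments are then buffered until `</auction>`, when the join for that `$a` can run.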

  13. For Details of Levels III and IV, please refer to • “Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams”, ER 2003 • “Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, CIKM 2003 • “Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, Journal Submission 2004

  14. Optimization I: Computation Into or Out of Automata? • Starting from NavUnnest stream(bids), //auction->$a; NavUnnest $a, /seller->$b; NavUnnest $a, /bid/bidder->$c, navigation can be pushed into or pulled out of the automata plan • Into automata: TokenNavigate stream(bids), //auction->$a; TokenNavigate $a, /seller->$b; TokenNavigate $a, /bid/bidder->$c run inside the automata plan, with ExtractUnnest $a, $b; ExtractUnnest $a, $c; StructuralJoin $a above it • Out of automata: only TokenNavigate stream(bids), //auction->$a and ExtractUnnest stream(bids), $a run inside the automata plan, while NavigateUnnest $a, /seller->$b and NavigateUnnest $a, /bid/bidder->$c are evaluated on the extracted tuples
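The out-of-automata alternative can be sketched with Python's `xml.etree.ElementTree.iterparse`. Here it stands in for TokenNavigate stream(bids), //auction->$a plus ExtractUnnest stream(bids), $a: each complete `<auction>` is materialized, and the two navigations then run on the tuple side. This is an illustrative sketch, not Raindrop's implementation. Returning the primary phone and the bidder text is just a compact way to show each ($b, $c) pair.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<auctions>
<auction><seller><primary><phone>508</phone></primary></seller>
<bid><bidder>b1</bidder><bidder>b2</bidder></bid></auction>
</auctions>"""


def out_of_automata(stream):
    """Pull navigation out of the automaton: extract whole $a elements first."""
    rows = []
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "auction":
            continue
        # ExtractUnnest stream(bids), $a: the complete <auction> is now in memory.
        for b in elem.findall("seller"):             # NavUnnest $a, /seller -> $b
            for c in elem.findall("bid/bidder"):     # NavUnnest $a, /bid/bidder -> $c
                rows.append((b.findtext("primary/phone"), c.text))
        elem.clear()  # drop the buffered subtree before the next auction arrives
    return rows


print(out_of_automata(io.BytesIO(DOC)))
```

The trade-off is visible in the code. The automaton tracks fewer patterns, but every `$a` must be buffered in full before `$b` and `$c` can be produced.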

  15. Experimental Results

  16. Optimization II: Semantic Query Optimization (SQO) • General schema-based optimizations: eliminate predicates/joins, …; focus on operators manipulating flat values • XML-specific schema-based optimizations: focus on pattern retrieval and fall into two categories: general XML SQO, e.g., minimizing the query tree [YCL+-AT&T 01], and stream XML SQO (our focus)

  17. Stream-Specific XML SQO • Observations: pattern retrieval over tokens relies solely on document-order traversal, and schema constraints help expedite that traversal • State of the art: [XPush03] covers only a limited class of queries (boolean XPath matching) and one type of constraint • Our goals: support more powerful queries (XQuery) and more types of constraints (XML Schema)

  18. Step I: Construct Query Graph • (a) Example query: FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr] WHERE $b/*/phone = "508" RETURN <auction> $b, $c </auction> • (b) Query tree (figure)
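The query tree for the running query can be sketched as a tree of path steps, with variable bindings and predicates attached to nodes. This hypothetical Python sketch shows one such encoding. The `QNode` fields and the `add_path` and `show` helpers are assumptions for illustration, because the slides do not specify Raindrop's internal representation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QNode:
    label: str                      # element name, "*", or the stream source
    axis: str = ""                  # edge from the parent: "/" or "//"
    var: Optional[str] = None       # XQuery variable bound at this node, if any
    pred: Optional[str] = None      # predicate attached to this node, if any
    children: List["QNode"] = field(default_factory=list)


def add_path(node: QNode, path: str, var=None, pred=None) -> QNode:
    """Append one branch for `path` below `node` and return its last node."""
    for axis, label in re.findall(r"(//|/)([^/]+)", path):
        child = QNode(label, axis)
        node.children.append(child)
        node = child
    node.var, node.pred = var, pred
    return node


def show(node: QNode, depth: int = 0) -> None:
    extra = " ".join(x for x in (node.var, node.pred) if x)
    print("  " * depth + f"{node.axis}{node.label} {extra}".rstrip())
    for child in node.children:
        show(child, depth + 1)


root = QNode("stream(bids)")
a = add_path(root, "//auction", var="$a")
b = add_path(a, "/seller", var="$b")
add_path(b, "/homepage", pred="exists")          # [homepage]
add_path(b, "/*/phone", pred='= "508"')          # WHERE $b/*/phone = "508"
c = add_path(a, "/bid/bidder", var="$c")
add_path(c, "/sameAddr", pred="exists")          # [sameAddr]
show(root)
```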

  19. Example XML Schema

  20. Step II: Apply Optimization Rules • Offer optimization rules that utilize occurrence constraints, exclusive constraints, and order constraints • Apply the rules in an order that ensures no beneficial rule is missed and no redundant rule is introduced

  21. Step III: Translate Rewritten Query Graph Back to Plan (I): Utilize Occurrence Constraints • When </phone> has been encountered twice, check /*/phone, since by the occurrence constraint no further phone can then appear • If the predicate fails, suspend states s2 and s3
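The occurrence-constraint rule can be sketched as early predicate evaluation over one seller's tokens. This minimal Python sketch assumes, hypothetically, that the schema allows at most `max_phones` phone elements matched by /*/phone (two, matching the "encountered twice" trigger on the slide) and that phone is a leaf element. `check_phones` is an illustrative name, and returning the number of examined tokens merely stands in for suspending states.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value) with kind in {"start", "end", "text"}


def check_phones(seller_tokens: List[Token], max_phones: int, target: str):
    """Evaluate $b/*/phone = target, stopping once the predicate is decided.

    Returns (predicate_result, number_of_tokens_examined)."""
    seen, phone_text = 0, None
    for i, (kind, value) in enumerate(seller_tokens):
        if kind == "start" and value == "phone":
            phone_text = None
        elif kind == "text":
            phone_text = value
        elif kind == "end" and value == "phone":
            seen += 1
            if phone_text == target:
                return True, i + 1        # existential predicate already satisfied
            if seen == max_phones:
                # Occurrence constraint: no further phone can follow, so the
                # predicate has failed and the remaining states can be suspended.
                return False, i + 1
    return False, len(seller_tokens)


seller = [("start", "seller"), ("start", "primary"), ("start", "phone"),
          ("text", "612"), ("end", "phone"), ("end", "primary"),
          ("start", "secondary"), ("start", "phone"), ("text", "613"),
          ("end", "phone"), ("end", "secondary"), ("end", "seller")]
print(check_phones(seller, max_phones=2, target="508"))
```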

  22. Step III: Translate Rewritten Query Graph Back to Plan (II): Utilize Exclusive Constraints • When <billTo> or <shipTo> has been encountered once, suspend states s2 and s9

  23. Step III: Translate Rewritten Query Graph Back to Plan (III): Utilize Order Constraints • When <primary> has been encountered once, check /homepage, since by the order constraint a homepage cannot appear later • If no homepage is present, suspend states s10, s3, and s2
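The order-constraint rule can be sketched as follows. The sketch assumes, hypothetically, a schema in which a seller's homepage, if present, must appear before its primary element. Once `<primary>` opens without a preceding homepage, `seller[homepage]` is known to be false. `has_homepage` is an illustrative helper, not part of Raindrop, and the returned token count stands in for suspending states.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value) with kind in {"start", "end", "text"}


def has_homepage(seller_tokens: List[Token], later_sibling: str = "primary"):
    """Evaluate seller[homepage], assuming homepage must precede `later_sibling`.

    Returns (predicate_result, number_of_tokens_examined)."""
    depth = 0
    for i, (kind, value) in enumerate(seller_tokens):
        if kind == "start":
            depth += 1
            if depth == 2 and value == "homepage":   # a direct child of <seller>
                return True, i + 1
            if depth == 2 and value == later_sibling:
                # Order constraint: a homepage can no longer appear, so the
                # predicate has failed and the related states can be suspended.
                return False, i + 1
        elif kind == "end":
            depth -= 1
    return False, len(seller_tokens)


seller = [("start", "seller"), ("start", "primary"), ("start", "phone"),
          ("text", "508"), ("end", "phone"), ("end", "primary"), ("end", "seller")]
seller_with_homepage = [("start", "seller"), ("start", "homepage"), ("text", "url"),
                        ("end", "homepage"), ("start", "primary"),
                        ("end", "primary"), ("end", "seller")]
print(has_homepage(seller), has_homepage(seller_with_homepage))
```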

  24. Project page: http://davis.wpi.edu/dsrg/raindrop/ | Contact: suhong@cs.wpi.edu | Thanks to the WPI DSRG Rainbow Team for XAT algebra support.
