Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams
Hong Su, Elke Rundensteiner, Murali Mani, Ming Li
Worcester Polytechnic Institute, Worcester, MA
VLDB 2004

Stream Processing
[Figure labels: data sources, networks, data requesters]

What's Special for XML Stream Processing
• Token-by-token access manner
  Timeline: <auctions> <auction> <seller> <primary> <phone>
• Token: not a counterpart of a self-contained tuple
• Query tasks: pattern retrieval + filtering + restructuring
  FOR $a in stream(bids)//auction,
      $b in $a/seller[homepage],
      $c in $a/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN <auction> $b, $c </auction>
• Pattern Retrieval on Token Streams
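
The token-by-token access manner above can be sketched in a few lines. This is a minimal illustration, not Raindrop's tokenizer: the `tokenize` function, the `(kind, value)` token shape, and the restriction to attribute-free tags are all assumptions made for the example.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token shape: (kind, value), where kind is "start", "end" or "text".
Token = Tuple[str, str]

# Matches an attribute-free start/end tag, or a run of character data.
_TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")

def tokenize(xml: str) -> Iterator[Token]:
    """Yield tokens one at a time, in document order."""
    for m in _TOKEN_RE.finditer(xml):
        closing, name, text = m.groups()
        if name is not None:
            yield ("end" if closing else "start", name)
        elif text.strip():
            # Whitespace-only text between tags is skipped.
            yield ("text", text.strip())

tokens = list(tokenize(
    "<auction><seller><phone>508</phone></seller></auction>"
))
```

No single token here carries a complete `seller`; the element only exists once its start tag, content, and end tag have all arrived, which is why a token is not a counterpart of a self-contained tuple.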

Two Computation Paradigms
• Automata-based [yfilter, xscan, xsm, xsq, xpush…]
• Algebraic [niagara00, …]
Query:
  FOR $a in stream(bids)//auction,
      $b in $a/seller[homepage],
      $c in $a/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN <auction> $b, $c </auction>
[Figure: Automata. State machine with states 1 to 9 and transitions labeled auction, seller, homepage, *, phone, bid, bidder, sameAddr]
[Figure: Algebra. Plan operators, top to bottom: Tagger; Navigate $a, /bidder->$c; Navigate $a, /seller->$b; Navigate stream(bids), //auction->$a]
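
To make the automata-based paradigm concrete, here is a small stack-based path matcher over tokens. It is a sketch of the general technique only; the names `match_path` and `Step`, the token tuples, and the state numbering are illustrative assumptions and do not reproduce the automaton in the slide or any of the cited systems.

```python
from typing import Iterable, List, Set, Tuple

Step = Tuple[str, str]    # (axis, name test); axis is "/" or "//", name test may be "*"
Token = Tuple[str, str]   # (kind, name); kind is "start" or "end"

def match_path(steps: List[Step], tokens: Iterable[Token]) -> List[int]:
    """Return the token positions of start tags that complete the path.

    State i means "the first i steps have been matched". The stack holds
    one set of active states per open element, so an end tag simply pops.
    """
    stack: List[Set[int]] = [{0}]
    hits: List[int] = []
    for pos, (kind, name) in enumerate(tokens):
        if kind == "start":
            nxt: Set[int] = set()
            for state in stack[-1]:
                if state == len(steps):
                    continue              # final state has no outgoing steps
                axis, test = steps[state]
                if test in ("*", name):
                    nxt.add(state + 1)    # this element matches the step
                if axis == "//":
                    nxt.add(state)        # descendant axis may skip this element
            stack.append(nxt)
            if len(steps) in nxt:
                hits.append(pos)
        elif kind == "end":
            stack.pop()
    return hits

stream = [
    ("start", "auctions"), ("start", "auction"),
    ("start", "seller"), ("end", "seller"),
    ("start", "bid"), ("start", "bidder"), ("end", "bidder"), ("end", "bid"),
    ("end", "auction"), ("end", "auctions"),
]
seller_hits = match_path([("//", "auction"), ("/", "seller")], stream)
bidder_hits = match_path([("//", "auction"), ("/", "bid"), ("/", "bidder")], stream)
```

The matcher only reports where a pattern is found. Filtering and restructuring still have to happen somewhere, which is the part the algebraic paradigm expresses naturally.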

Comparison of Two Paradigms
• Each paradigm has deficiencies
• The two paradigms complement each other

Four-Level Algebraic Framework
The Raindrop framework aims to integrate both paradigms into one.
Abstraction level, from high (declarative) to low (procedural):
• Semantics-Focused Plan: express the semantics of the query regardless of input sources
• Stream Logical Plan: accommodate tokenized streams and automata computation
• Stream Physical Plan: describe implementation details of operators
• Stream Execution Plan: decide how an operator is invoked (scheduling)

Level I: Semantics-Focused Plan
• Express query semantics regardless of stored or stream input sources [Rainbow-ZPR02]
• Reuse existing general optimization techniques
  • Decorrelation
  • Cancel duplicate navigation operators
  • …

Example Semantics-Focused Plan
Stream Data:
  <auctions>
    <auction>
      <seller>
        <primary><phone>508</phone></primary>
        <secondary><phone>613</phone></secondary>
      </seller>
      <bid><bidder>…</bidder><bidder>…</bidder></bid>
    </auction>
    …
Query:
  FOR $a in stream(bids)//auction,
      $b in $a/seller[homepage],
      $c in $a/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN <auction> $b, $c </auction>
Plan and Input/Output Data (bottom to top):
  NavUnnest stream(bids), //auction->$a
    output columns: source <auctions>…</auctions> | $a <auction>…</auction>
  NavUnnest $a, /seller->$b
    output columns: source <auctions>…</auctions> | $a <auction>…</auction> | $b <seller>…</seller>
  NavUnnest $a, /bid/bidder->$c
    output columns: source <auctions>…</auctions> | $a <auction>…</auction> | $b <seller>…</seller> | $c <bidder>…</bidder>

Level II: Stream Logical Plan
• Extend semantics-focused plan to accommodate tokenized stream inputs
• New input data format:
  • Tokens
• New operators:
  • StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin
• New rewrite rules:
  • Push-into/Pull-out-of Automata

One Uniform Algebraic View
[Figure: Algebraic Stream Logical Plan, top to bottom: query answer; tuple-based plan; tuple stream; token-based plan (automata plan); XML data stream]

Modeling Automata in Algebraic Plan: Black Box [XScan01] vs. White Box
Query:
  FOR $a in stream(bids)//auction,
      $b in $a/seller[homepage],
      $c in $a/bid/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN <auction> $b, $c </auction>
Black Box: a single XScan operator with
  $a := stream(bids)//auction
  $b := $a/seller
  $c := $a/bid/bidder
White Box (top to bottom):
  StructuralJoin $a
  ExtractUnnest $a, $b | ExtractUnnest $a, $c
  TokenNavigate $a, /seller->$b | TokenNavigate $a, /bid/bidder->$c
  TokenNavigate stream(bids), //auction->$a

Data Model in Algebraic Plan Modeling Automata
[Figure: white-box plan annotated with the data on each edge, bottom to top]
• StreamSource: emits tokens (<auctions> <auction> <seller> <primary> <phone> …)
• TokenNavigate stream(bids), //auction->$a; TokenNavigate $a, /seller->$b; TokenNavigate $a, /bid/bidder->$c: tokens in, tokens out (<seller> <primary> <phone> 508 </phone> … </primary> …; <bidder> <bidderid> 0314 …)
• ExtractUnnest $a, $b and ExtractUnnest $a, $c: whole elements out (<seller>…</seller>; <bidder>...</bidder>)
• StructuralJoin $a: tuples pairing <seller>…</seller> with <bidder>...</bidder>
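
The step from extracted elements to joined tuples can be sketched as a join on the enclosing auction. This is a hedged illustration: the slides do not say how StructuralJoin identifies the shared $a, so the integer auction ids, the `structural_join` function, and the tuple layout below are all hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Extracted = Tuple[int, str]   # (hypothetical id of the enclosing $a, element text)

def structural_join(b_elems: List[Extracted],
                    c_elems: List[Extracted]) -> List[Tuple[int, str, str]]:
    """Pair every $b with every $c that sits under the same $a."""
    c_by_a: Dict[int, List[str]] = defaultdict(list)
    for a_id, c in c_elems:
        c_by_a[a_id].append(c)

    b_by_a: Dict[int, List[str]] = defaultdict(list)
    for a_id, b in b_elems:
        b_by_a[a_id].append(b)

    joined = []
    for a_id, bs in b_by_a.items():          # insertion order = arrival order
        for b, c in product(bs, c_by_a.get(a_id, [])):
            joined.append((a_id, b, c))
    return joined

sellers = [(1, "<seller>s1</seller>")]
bidders = [(1, "<bidder>x</bidder>"), (1, "<bidder>y</bidder>"),
           (2, "<bidder>z</bidder>")]
rows = structural_join(sellers, bidders)
```

Auction 2 contributes no row because it has no extracted seller, which mirrors the FOR clause binding both $b and $c under the same $a.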

For details of Levels III and IV, please refer to:
• "Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams", ER 2003
• "Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams", CIKM 2003
• "Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams", Journal Submission 2004

Optimization I: Computation Into or Out of Automata?
[Figure: one semantics-focused plan and two alternative stream logical plans]
Semantics-focused plan (bottom to top):
  NavUnnest stream(bids), //auction->$a
  NavigateUnnest $a, /seller->$b
  NavigateUnnest $a, /bid/bidder->$c
  …
Out of Automata (bottom to top):
  Automata Plan:
    TokenNavigate stream(bids), //auction->$a
    ExtractUnnest stream(bids), $a
  NavigateUnnest $a, /seller->$b
  NavigateUnnest $a, /bid/bidder->$c
  …
Into Automata (bottom to top):
  Automata Plan:
    TokenNavigate stream(bids), //auction->$a
    TokenNavigate $a, /seller->$b | TokenNavigate $a, /bid/bidder->$c
    ExtractUnnest $a, $b | ExtractUnnest $a, $c
    StructuralJoin $a
  …

Optimization II: Semantic Query Optimization
• General schema-based optimizations
  • Eliminate predicate/join, …
  • Focus on operators manipulating flat values
• XML specific schema-based optimizations
  • Focus on pattern retrieval
  • Fall into two categories
    • General XML SQO
      • Minimize query tree [YCL+-AT&T 01]
    • Stream XML SQO (our focus)

Stream-Specific XML SQO
• Observations
  • Pattern retrieval over tokens relies solely on document-order traversal
  • Schema constraints help expedite document-order traversal
• State of the art
  • [XPush03] covers limited queries (boolean XPath match) and one type of constraint
• Our goals:
  • Support more powerful queries (XQuery)
  • Support more types of constraints (XSchema)

Step I: Construct Query Graph
(a) Example Query
  FOR $a in stream(bids)//auction,
      $b in $a/seller[homepage],
      $c in $a/bid/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN <auction> $b, $c </auction>
(b) Query Tree
[Figure: query tree for the example query]

Step II: Apply Optimization Rules
• Offer optimization rules utilizing
  • occurrence constraints
  • exclusive constraints
  • order constraints
• Apply rules in an order ensuring
  • no beneficial rule missed
  • no redundant rule introduced

Step III: Translate Rewritten Query Graph Back to Plan (I)
Utilize Occurrence Constraints
• When </phone> is encountered twice, check /*/phone: if it fails the predicate, suspend states s2 and s3
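
The occurrence-constraint rule can be read as an early-exit check. The sketch below assumes a schema that allows at most two phone elements per seller, so once the second </phone> has passed without a match the predicate can no longer succeed. The function `seller_has_phone`, the `max_phones` parameter, and the token tuples are illustrative assumptions, and "suspending states" is modeled simply as returning early.

```python
from typing import List, Tuple

Token = Tuple[str, str]   # (kind, value); kind is "start", "end" or "text"

def seller_has_phone(tokens: List[Token], wanted: str = "508",
                     max_phones: int = 2) -> Tuple[bool, int]:
    """Return (predicate result, number of tokens inspected)."""
    phones_closed = 0
    in_phone = False
    consumed = 0
    for consumed, (kind, value) in enumerate(tokens, start=1):
        if kind == "start" and value == "phone":
            in_phone = True
        elif kind == "text" and in_phone and value == wanted:
            return True, consumed            # predicate satisfied
        elif kind == "end" and value == "phone":
            in_phone = False
            phones_closed += 1
            if phones_closed == max_phones:
                # Occurrence constraint: no further phone can appear,
                # so stop looking (the slide's "suspend states").
                return False, consumed
    return False, consumed

seller = [
    ("start", "seller"),
    ("start", "primary"), ("start", "phone"), ("text", "613"),
    ("end", "phone"), ("end", "primary"),
    ("start", "secondary"), ("start", "phone"), ("text", "617"),
    ("end", "phone"), ("end", "secondary"),
    ("start", "homepage"), ("end", "homepage"),
    ("end", "seller"),
]
result = seller_has_phone(seller)
```

Here the check stops after 10 of the 14 tokens; without the constraint it would have to scan to </seller> before concluding that no phone equals "508".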

Step III: Translate Rewritten Query Graph Back to Plan (II)
Utilize Exclusive Constraints
• When <billTo> or <shipTo> is encountered once: suspend states s2 and s9

Step III: Translate Rewritten Query Graph Back to Plan (III)
Utilize Order Constraints
• When <primary> is encountered once, check /homepage: if it is absent, suspend states s10, s3 and s2
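
The order-constraint rule follows the same early-exit pattern. The sketch assumes a schema in which homepage, if present, must precede primary among a seller's children; that ordering is an interpretation of the slide, and `seller_has_homepage` and its parameters are hypothetical names.

```python
from typing import List, Tuple

def seller_has_homepage(child_names: List[str],
                        later_sibling: str = "primary") -> Tuple[bool, int]:
    """child_names: names of a seller's direct children, in document order.

    Returns (predicate result, number of children inspected).
    """
    for seen, name in enumerate(child_names, start=1):
        if name == "homepage":
            return True, seen
        if name == later_sibling:
            # Assumed order constraint: homepage cannot appear after this
            # sibling, so the [homepage] predicate has already failed.
            return False, seen
    return False, len(child_names)

no_homepage = seller_has_homepage(["primary", "secondary"])
with_homepage = seller_has_homepage(["homepage", "primary", "secondary"])
```

In the first call the decision is made at the first child, so the rest of the seller subtree need not be examined for this predicate.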

http://davis.wpi.edu/dsrg/raindrop/
suhong@cs.wpi.edu
Thanks to the WPI DSRG Rainbow Team for XAT Algebra support.