
A Uniform and Layered Algebraic Framework for XQueries on XML Streams

This presentation describes Raindrop, a unified approach to XQuery processing on XML streams that integrates the automata-based and algebraic paradigms. The framework supports pattern retrieval, filtering, and restructuring on token streams. The slides cover the design choices for integrating the two models, the four plan levels (from the semantics-focused plan down to the stream execution plan), and the optimization opportunities that each level exposes.


Presentation Transcript


  1. A Uniform and Layered Algebraic Framework for XQueries on XML Streams • Hong Su, Jinhui Jian, Elke A. Rundensteiner • Worcester Polytechnic Institute • CIKM, Nov 5, 2003

  2. Need for Stream Processing
     • New computing environment
       • Data sources can be anywhere/anytime: on-line arriving data
       • Data requests can be anywhere/anytime: real-time response requirement
     • New applications
       • Relational: sensor networks
       • XML: analysis of XML web logs; selective dissemination of XML information (e.g., news)

  3. What’s Special for XML Stream Processing
     • Token-by-token access manner
     • Token: not a direct counterpart of a tuple
     • Pattern retrieval + filtering + restructuring
     • Pattern retrieval on token streams
     Query: FOR $b in stream(biditems.xml)//book LET $p := $b/price $t := $b/title WHERE $p < 20 Return <Inexpensive> $t </Inexpensive>
     Token stream (along the timeline): <biditems> <book> <title> Dream Catcher </title> …
     Source document: <Biditems> <book year="2001"> <title>Dream Catcher</title> <author><last>King</last><first>S.</first></author> <publisher>Bt Bound</publisher> <price>30</price> </book> …
     Tuple view of the same data: year 2001, title Dream, last King, first S., publisher Bt Bound, price 30
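To make the token-by-token access manner concrete, the following is a minimal Python sketch that turns a small XML document into start, text, and end tokens using the standard-library `xml.parsers.expat` module. The function name `tokenize`, the `(kind, value)` token shape, and the shortened sample document are assumptions made for illustration; they are not taken from Raindrop. For brevity the tokens are collected into a list, whereas a real stream engine would hand each token downstream as it arrives.

```python
import xml.parsers.expat


def tokenize(xml_text):
    """Return the (kind, value) tokens of xml_text in document order."""
    tokens = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one chunk

    def on_start(name, attrs):
        tokens.append(("start", name))

    def on_end(name):
        tokens.append(("end", name))

    def on_text(data):
        # Skip whitespace-only text between elements.
        if data.strip():
            tokens.append(("text", data.strip()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return tokens


# Shortened, hypothetical version of the slide's sample document.
SAMPLE = (
    "<biditems><book><title>Dream Catcher</title>"
    "<price>30</price></book></biditems>"
)
TOKENS = tokenize(SAMPLE)
```

Note that no single token corresponds to a whole book: the book only becomes a tuple-like unit once all tokens between its start and end tags have been seen.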

  4. Two Computation Paradigms • Automata-based [yfilter02, xscan01, xsm02, xsq03, xpush03…] • Algebraic [niagara00, …] • The Raindrop framework aims to integrate both paradigms into one

  5. Automata-Based Paradigm
     • Auxiliary structures for: buffering data, filtering, restructuring, …
     Query: FOR $b in stream(biditems.xml)//book LET $p := $b/price $t := $b/title WHERE $p < 20 Return <Inexpensive> $t </Inexpensive>
     [Figure: automaton with states 1 to 4, a * self-loop, and transitions labelled book, title, and price, recognizing //book, //book/title, and //book/price]
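The pattern automaton on this slide can be imitated with a small stack-based matcher. The sketch below is an illustration only: the state numbering (1 as start state with a `*` self-loop, 2 for `//book`, 3 for `//book/price`, 4 for `//book/title`) is an assumption read off the flattened figure, and the names `TRANSITIONS`, `ACCEPTING`, and `run_automaton` are hypothetical. A stack of active state sets is pushed on every start tag and popped on every end tag, which is one common way to run such automata over token streams.

```python
# Assumed automaton: state 1 loops on any tag, "book" leads to state 2,
# and from state 2 "price" leads to 3 and "title" leads to 4.
TRANSITIONS = {
    (1, "*"): 1,
    (1, "book"): 2,
    (2, "price"): 3,
    (2, "title"): 4,
}
ACCEPTING = {2: "//book", 3: "//book/price", 4: "//book/title"}


def run_automaton(tokens):
    """Return (pattern, tag) pairs in the order the patterns are recognized."""
    stack = [{1}]  # active state set per open element; start in state 1
    matches = []
    for kind, value in tokens:
        if kind == "start":
            active = set()
            for state in stack[-1]:
                for label in (value, "*"):
                    target = TRANSITIONS.get((state, label))
                    if target is not None:
                        active.add(target)
            stack.append(active)
            for state in sorted(active):
                if state in ACCEPTING:
                    matches.append((ACCEPTING[state], value))
        elif kind == "end":
            stack.pop()  # return to the state set of the parent element
    return matches


# Hypothetical token stream for one book.
TOKENS = [
    ("start", "biditems"), ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "30"), ("end", "price"),
    ("end", "book"), ("end", "biditems"),
]
MATCHES = run_automaton(TOKENS)
```

The matcher only recognizes where patterns occur. Buffering the matched data, applying the WHERE filter, and building the result element are the jobs of the auxiliary structures the slide lists.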

  6. Algebraic Computation
     Query: FOR $b in stream(biditems.xml)//book LET $p := $b/price $t := $b/title WHERE $p < 20 Return <Inexpensive> $t </Inexpensive>
     Data tree: book with children title, author (last, first), publisher, and price, each ending in Text nodes
     Example operator: Navigate $b, /title -> $t takes tuples ($b = <book>…</book>) and outputs tuples ($b = <book>…</book>, $t = <title>…</title>)
     Logic Plan (top to bottom): Tagger • Select $p < 30 • Navigate $b, /title->$t • Navigate $b, /price->$p
     Rewritten Logic Plan, by “pushing down selection” (top to bottom): Tagger • Navigate $b, /title->$t • Select $p < 30 • Navigate $b, /price->$p
     Physical Plan, by choosing low-level implementation alternatives (top to bottom): Tagger • Navigate-Scan $b, /title -> $t • Select $p < 30 • Navigate-Index $b, /price -> $p
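The “pushing down selection” rewrite on this slide can be sketched as a reordering of operators based on the variables they read and write. The following Python sketch is a simplified illustration, not Raindrop's optimizer: the `Op` record, its `reads` and `writes` fields, and the function `push_down_selection` are hypothetical, and the plan is modelled as a flat list in execution order (bottom of the plan first) rather than as a tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    reads: frozenset
    writes: frozenset


def push_down_selection(plan):
    """Move each Select as early as its input variables allow.

    plan lists operators in execution order, bottom of the plan first.
    """
    plan = list(plan)
    for i, op in enumerate(plan):
        if op.name.startswith("Select"):
            j = i
            # Swap the selection below its predecessor as long as that
            # predecessor does not produce a variable the selection reads.
            while j > 0 and not (plan[j - 1].writes & op.reads):
                plan[j - 1], plan[j] = plan[j], plan[j - 1]
                j -= 1
    return plan


nav_price = Op("Navigate $b, /price->$p", frozenset({"$b"}), frozenset({"$p"}))
nav_title = Op("Navigate $b, /title->$t", frozenset({"$b"}), frozenset({"$t"}))
select = Op("Select $p < 30", frozenset({"$p"}), frozenset())
tagger = Op("Tagger", frozenset({"$t"}), frozenset({"$r"}))

LOGIC_PLAN = [nav_price, nav_title, select, tagger]
REWRITTEN = push_down_selection(LOGIC_PLAN)
REWRITTEN_NAMES = [op.name for op in REWRITTEN]
```

After the rewrite the selection runs directly after the price navigation, so titles are only extracted for books that pass the filter.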

  7. Observations • Each paradigm has deficiencies • The two paradigms complement each other

  8. How to Integrate Two Paradigms

  9. How to Integrate Two Models? • Design choices • Extend algebraic paradigm to support automata? • Extend automata paradigm to support algebra? • Come up with completely new paradigm? • Extend algebraic paradigm to support automata • Practical • Reuse & extend existing algebraic query processing engines • Natural • Present details of automata computation at low level • Present semantics of automata computation (target patterns) at high level

  10. Raindrop: Four-Level Framework • Abstraction levels, from high (declarative) to low (procedural): Semantics-focused Plan, Stream Logic Plan, Stream Physical Plan, Stream Execution Plan

  11. Level I: Semantics-focused Plan [Rainbow-ZPR02] • Express query semantics regardless of stored or stream input sources • Reuse existing techniques for stored XML processing • Query parser • Initial plan constructor • Rewriting optimization • Decorrelation • Selection push down • …

  12. Example Semantics-focused Plan
     Query: FOR $b in stream(biditems.xml)//book LET $p := $b/price $t := $b/title WHERE $p < 20 Return <Inexpensive> $t </Inexpensive>
     Source document: <Biditems> <book year="2001"> <title>Dream Catcher</title> <author><last>King</last><first>S.</first></author> <publisher>Bt Bound</publisher> <price>30</price> </book> …
     Plan (top to bottom): Tagger “Inexpensive”, $t->$r • Select $p<30 • NavNest $b, /title ->$t • NavNest $b, /price/text() ->$p • NavUnnest $S1, //book ->$b
     Intermediate tuples (bottom to top): ($S1 = <Biditems>…</Biditems>), then ($S1, $b = <book>…</book>), then ($S1, $b, $p = 30), then ($S1, $b, $p = 30, $t = Dream Catcher)
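Because the semantics-focused plan is independent of whether the input is stored or streamed, its operators can be illustrated over a fully parsed document. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper functions `nav_unnest`, `nav_nest`, `select_lt`, and `tagger` are hypothetical stand-ins named after the slide's operators, tuples are modelled as dictionaries, and the second book in `DOC` is invented for illustration. As a simplification, the price navigation binds `price` elements and reads their text during selection instead of evaluating `/price/text()` directly.

```python
import xml.etree.ElementTree as ET

# First book follows the slide; the second book is a made-up example.
DOC = (
    "<Biditems>"
    "<book year='2001'><title>Dream Catcher</title><price>30</price></book>"
    "<book year='1999'><title>Hypothetical Title</title><price>15</price></book>"
    "</Biditems>"
)


def nav_unnest(root, path, out):
    """One output tuple per node reached by path."""
    return [{out: node} for node in root.findall(path)]


def nav_nest(tuples, var, path, out):
    """Keep one tuple per input and bind out to the list of matches."""
    return [{**t, out: t[var].findall(path)} for t in tuples]


def select_lt(tuples, var, bound):
    """Keep tuples where some node bound to var has a value below bound."""
    return [t for t in tuples if any(float(n.text) < bound for n in t[var])]


def tagger(tuples, tag, var):
    """Wrap the nodes bound to var in a new element named tag."""
    results = []
    for t in tuples:
        inner = "".join(ET.tostring(n, encoding="unicode") for n in t[var])
        results.append(f"<{tag}>{inner}</{tag}>")
    return results


root = ET.fromstring(DOC)
plan = nav_unnest(root, ".//book", "$b")
plan = nav_nest(plan, "$b", "price", "$p")
plan = nav_nest(plan, "$b", "title", "$t")
plan = select_lt(plan, "$p", 30)
RESULT = tagger(plan, "Inexpensive", "$t")
```

With the bound 30 used in the plan, the slide's own book (price 30) is filtered out and only the invented cheaper book reaches the Tagger.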

  13. Level II: Stream Logical Plan • Extend semantics-focused plan to accommodate tokenized stream inputs • New input data format: • contextualized tokens • New operators: • StreamSource, Nav, ExtractUnnest, ExtractNest, StructuralJoin • New rewrite rules: • Push-into-Automata

  14. One Uniform Algebraic View • Algebraic stream logical plan, bottom to top: the XML data stream feeds the token-based plan (automata plan), which produces a tuple stream for the tuple-based plan, which produces the query answer

  15. Modeling the Automata in Algebraic Plan: Black Box [XScan01] vs. White Box
     Query: FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return <Inexpensive> $t </Inexpensive>
     Black box: a single XScan operator computing $b := //book, $p := $b/price, $t := $b/title
     White box (top to bottom): StructuralJoin $b • ExtractNest $b, $p • ExtractNest $b, $t • Navigate $b, /title->$t • Navigate $b, /price->$p • Navigate $S1, //book ->$b

  16. Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return <Inexpensive> $t </Inexpensive> Tuple-based plan Token-based plan (automata plan)

  17. Example Uniform Algebraic Plan
     Query: FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return <Inexpensive> $t </Inexpensive>
     Tuple-based plan: (not yet shown)
     Token-based plan (top to bottom): StructuralJoin $b • ExtractNest $b, $p • ExtractNest $b, $t • Navigate $b, /title->$t • Navigate $b, /price->$p • Navigate $S1, //book ->$b

  18. Example Uniform Algebraic Plan
     Query: FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return <Inexpensive> $t </Inexpensive>
     Tuple-based plan (top to bottom): Tagger “Inexpensive”, $t->$r • Select $p<30
     Token-based plan (top to bottom): StructuralJoin $b • ExtractNest $b, $p • ExtractNest $b, $t • Navigate $b, /title->$t • Navigate $b, /price->$p • Navigate $S1, //book ->$b

  19. From Semantics-focused Plan to Stream Logical Plan
     Semantics-focused plan (top to bottom): Tagger “Inexpensive”, $t->$r • Select $p<30 • NavNest $b, /title ->$t • NavNest $b, /price/text() ->$p • NavUnnest $S1, //book ->$b
     Stream logical plan after applying “push into automata” (top to bottom): Tagger “Inexpensive”, $t->$r • Select $p<30 • StructuralJoin $b • ExtractNest $b, $p • ExtractNest $b, $t • Nav $b, /title->$t • Nav $b, /price/text()->$p • Nav $S1, //book ->$b

  20. Level III: Stream Physical Plan • For each stream logical operator, define how to generate outputs when given some inputs • Multiple physical implementations may be provided for a single logical operator • Automata details of some physical implementation are exposed at this level • Nav, ExtractNest, ExtractUnnest, Structural Join

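  21. One Implementation of Extract/Structural Join
     Operators (top to bottom): SJoin //book • ExtractNest $b, $t • ExtractNest $b, $p • Nav $b, /title->$t • Nav $b, /price->$p • Nav ., //book ->$b
     [Figure: the Nav operators share one automaton with states 1 to 4, a * self-loop, and transitions labelled book, title, and price]
     Input tokens: <biditems> <book> <title> Dream Catcher </title> … </book> …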
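One way to picture how the extract operators and the structural join cooperate is a single pass over the token stream that buffers titles and prices while a book is open and emits one joined tuple when the book closes. The sketch below is a simplification under stated assumptions, not the implementation on the slide: the function `extract_and_join` is hypothetical, pattern matching is done by comparing the tail of the open-element path instead of running the automaton, book elements are assumed not to nest, and the second book in the token stream is invented.

```python
def extract_and_join(tokens):
    """Yield one {"$t": [...], "$p": [...]} tuple per completed book."""
    path = []                # names of the currently open elements
    titles, prices = [], []  # ExtractNest buffers for the current book
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            if value == "book":
                titles, prices = [], []  # new book: start with empty buffers
        elif kind == "text":
            if path[-2:] == ["book", "title"]:
                titles.append(value)
            elif path[-2:] == ["book", "price"]:
                prices.append(value)
        elif kind == "end":
            path.pop()
            if value == "book":
                # Structural join: everything buffered since the matching
                # start tag belongs to this book.
                yield {"$t": titles, "$p": prices}


# Hypothetical token stream; the second book is not from the slides.
TOKENS = [
    ("start", "biditems"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "30"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),
    ("start", "title"), ("text", "Hypothetical Title"), ("end", "title"),
    ("start", "price"), ("text", "15"), ("end", "price"),
    ("end", "book"),
    ("end", "biditems"),
]
JOINED = list(extract_and_join(TOKENS))
```

Each emitted tuple can then flow into the tuple-based part of the plan, for example a Select on `$p` followed by a Tagger.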

  22. Level IV: Stream Execution Plan • Describe coordination between operators regarding when to fetch the inputs • When input operator generates one output tuple • When input operator generates a batch • When a time period has elapsed • … • Potentially unstable data arrival rate in stream makes fixed scheduling strategy unsuitable • Delayed data under scheduling may stall engine • Bursty data not under scheduling may cause overflow
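The three fetch triggers listed on this slide can be compared with a small simulation over logical time steps. This is a hedged sketch only: the policy names, the `batch_size` and `period` parameters, the functions `should_fetch` and `simulate`, and the arrival pattern are all assumptions introduced for illustration and do not describe Raindrop's scheduler.

```python
from collections import deque


def should_fetch(policy, queue_len, now, last_fetch, batch_size=3, period=5):
    """Decide whether the consumer fetches from its input at time now."""
    if policy == "per_tuple":
        return queue_len >= 1
    if policy == "per_batch":
        return queue_len >= batch_size
    if policy == "per_period":
        return now - last_fetch >= period
    raise ValueError(policy)


def simulate(policy, arrivals, horizon):
    """Return the number of tuples taken at each fetch.

    arrivals maps a logical time step to the tuples produced at that step.
    """
    queue = deque()
    last_fetch = 0
    fetch_sizes = []
    for now in range(1, horizon + 1):
        queue.extend([now] * arrivals.get(now, 0))
        if should_fetch(policy, len(queue), now, last_fetch):
            fetch_sizes.append(len(queue))
            queue.clear()
            last_fetch = now
    return fetch_sizes


# Hypothetical arrival pattern: a trickle, a burst at step 3, a late tuple.
ARRIVALS = {1: 1, 2: 1, 3: 4, 8: 1}
PER_TUPLE = simulate("per_tuple", ARRIVALS, 10)
PER_BATCH = simulate("per_batch", ARRIVALS, 10)
PER_PERIOD = simulate("per_period", ARRIVALS, 10)
```

In this run the batch policy never picks up the late tuple within the horizon, and the period policy lets the burst sit in the queue until step 5, which mirrors the stalling and overflow risks the slide attributes to a fixed strategy.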

  23. Raindrop: Four-Level Framework (Recap)
     • Semantics-focused Plan: express the semantics of the query regardless of input sources
     • Stream Logic Plan: accommodate tokenized input streams
     • Stream Physical Plan: describe how operators manipulate given data
     • Stream Execution Plan: decide the coordination among operators

  24. Optimization Opportunities

  25. Optimization Opportunities
     • Semantics-focused Plan: general rewriting (e.g., selection push down)
     • Stream Logic Plan: break-linear-navigation rewriting
     • Stream Physical Plan: choosing physical implementations
     • Stream Execution Plan: choosing the execution strategy

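  26. From Semantics-focused to Stream Logical Plan: In or Out?
     • Pattern retrieval in the semantics-focused plan can be moved into the token-based plan (automata plan) by applying “push into automata”, or left in the tuple-based plan
     • Data flow: the XML data stream feeds the token-based plan (automata plan), which produces a tuple stream for the tuple-based plan, which produces the query answer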

  27. Plan Alternatives
     [Figure: alternative plans for the same query that differ in which navigations are placed in the automata (“In”) and which stay in the tuple-based plan (“Out”), built from Tagger, Select, SJoin, ExtractNest, Nav, NavNest, Navigate, and NavUnnest operators]

  28. Experimentation Results

  29. Contributions • Combined automata-based and algebra-based paradigms into one uniform algebraic paradigm • Provided four layers in algebraic paradigm • Query semantics expressed at high layer • Automata computation on streams hidden at low layer • Supported optimization in an iterative manner (from high abstraction level to low abstraction level) • Illustrated enriched optimization opportunities by experiments

  30. http://davis.wpi.edu/dsrg/raindrop/ • Project Overview • Publications • Talks • Email: suhong@cs.wpi.edu
