Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams
Bernhard Stegmaier (TU München)
Joint work with Christoph Koch (TU Wien), Stefanie Scherzinger (TU Wien), Nicole Schweikardt (HU Berlin)
Outline
• Motivation
• FluX Query Language
• Translating XQuery into FluX
• Further Aspects
• Experiments
• Conclusion
Traditional Approach
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>
Evaluation of each book node:
• Print <result>
• Buffer titles and authors
• Output titles
• Output authors
• Print </result>
Query: list the title(s) and authors of each book
<results> {for $b in /bib/book return <result> {$b/title} {$b/author} </result>} </results>
Example input: … <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>40€</price> </book> …
Buffer: <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author>
Output: <result> <title>Datenbanksysteme</title> <author>Kemper</author> <author>Eickler</author> </result>
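A minimal Python sketch of the traditional strategy for a single book node makes the buffering cost concrete. It assumes the children of <book> arrive as (label, text) pairs (XML parsing is omitted), and the function name traditional_book is hypothetical. It buffers every title and author, then emits them grouped, and reports the peak buffer size.

```python
# Sketch (assumption): children of one <book> arrive as (label, text) pairs;
# real XML parsing is omitted.

def traditional_book(children):
    """Evaluate the book-level query by buffering every title and author."""
    out = ["<result>"]                       # print <result>
    buffer = []                              # buffered title/author children
    peak = 0
    for label, text in children:
        if label in ("title", "author"):
            buffer.append((label, text))
            peak = max(peak, len(buffer))
    out += [f"<title>{t}</title>" for lab, t in buffer if lab == "title"]
    out += [f"<author>{t}</author>" for lab, t in buffer if lab == "author"]
    out.append("</result>")                  # print </result>
    return out, peak


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
result, peak = traditional_book(book)
print("".join(result))
print("peak buffer size:", peak)
```

For the example book, all three title and author children are held in the buffer at the same time.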
The FluX Approach
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>
FluX query (for the book node):
… <result> {process-stream $b: on title as $t return $t; on-first past(title,author) return {for $a in $b/author return $a}} </result> …
Query: list the title(s) and authors of each book
<results> {for $b in /bib/book return <result> {$b/title} {$b/author} </result>} </results>
• Less buffering by exploiting order constraints: titles are output as they arrive, and only authors are buffered until no more titles or authors can follow
Example input: … <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>40€</price> </book> …
Buffer: <author>Kemper</author> <author>Eickler</author>
Output: <result> <title>Datenbanksysteme</title> <author>Kemper</author> <author>Eickler</author> </result>
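The same book can be processed with the two FluX handlers above. The following is a hedged simulation, not the FluX runtime: children again arrive as (label, text) pairs, flux_book is a hypothetical name, and the on-first past(title,author) handler is triggered when a child other than title or author arrives (valid only for this particular DTD) or at the end of the book.

```python
def flux_book(children):
    """Sketch of the FluX handlers for one <book>:
    on title as $t return $t;
    on-first past(title,author) return {for $a in $b/author return $a}
    """
    out = ["<result>"]
    authors = []          # the only buffer: the author children of $b
    peak = 0
    fired = False

    def on_first_past_title_author():
        nonlocal fired
        if not fired:     # "on-first": the handler body runs only once
            fired = True
            out.extend(f"<author>{a}</author>" for a in authors)

    for label, text in children:
        if label == "title":
            out.append(f"<title>{text}</title>")   # stream titles directly
        elif label == "author":
            authors.append(text)
            peak = max(peak, len(authors))
        else:
            # Under <!ELEMENT book ((title|author)*,price)>, any other child
            # (price) means no more titles or authors can follow.
            on_first_past_title_author()
    on_first_past_title_author()                    # end of book is also "past"
    out.append("</result>")
    return out, peak


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
result, peak = flux_book(book)
print("".join(result))
print("peak buffer size:", peak)
```

The output matches the traditional evaluation, but the buffer never holds the title.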
The FluX Approach II
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title*,author*),price)>
FluX query:
… <result> {process-stream $b: on title as $t return $t; on author as $a return $a;} </result> …
Query: list the title(s) and authors of each book
<results> {for $b in /bib/book return <result> {$b/title} {$b/author} </result>} </results>
• No buffering needed: the DTD guarantees that all titles precede all authors
Example input: … <book> <title>Datenbanksysteme</title> <author>Kemper</author> <author>Eickler</author> <price>40€</price> </book> …
Buffer: (empty)
Output: <result> <title>Datenbanksysteme</title> <author>Kemper</author> <author>Eickler</author> </result>
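Under the second DTD, both handlers write directly to the output. In this sketch (hypothetical function flux_book_ordered, children as (label, text) pairs) no buffer is allocated at all. The check for a title after an author is an addition that makes the dependence on the DTD explicit; it is not part of FluX.

```python
def flux_book_ordered(children):
    """Sketch for <!ELEMENT book ((title*,author*),price)>:
    on title as $t return $t; on author as $a return $a
    Nothing is buffered because the DTD puts all titles before all authors.
    """
    out = ["<result>"]
    seen_author = False
    for label, text in children:
        if label == "title":
            # Extra check (not part of FluX): this schedule relies on the DTD,
            # so a title after an author would produce the wrong output order.
            if seen_author:
                raise ValueError("stream violates DTD: title after author")
            out.append(f"<title>{text}</title>")
        elif label == "author":
            seen_author = True
            out.append(f"<author>{text}</author>")
    out.append("</result>")
    return out


book = [("title", "Datenbanksysteme"), ("author", "Kemper"),
        ("author", "Eickler"), ("price", "40€")]
print("".join(flux_book_ordered(book)))
```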
FluX Query Language
• Based on the XQuery fragment XQuery-:
  • ε (empty)
  • s (output fixed string)
  • αβ (sequence)
  • {for $x in $y/π [where χ] return α} (for loop)
  • {$x/π} (output path)
  • {$x} (output)
  • {if χ then α} (conditional)
FluX Query Language
An XQuery- expression is simple if it can be executed without buffering the stream.
Example 1: <a> {$x} </a>{if $x/b = 5 then <b>5</b>} is simple.
Example 2: {$x} {$x} is not simple, since the subtree bound to $x would have to be output twice.
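To make the grammar and the notion of a simple expression concrete, here is a small Python model of part of XQuery- (strings, {$x}, conditionals and sequences; for loops and paths are left out). The is_simple test is a deliberately simplified criterion assumed for illustration: an expression counts as simple if it outputs the subtree of the stream variable at most once, and conditions are not analyzed. It reproduces the verdicts for the two examples above but is not the formal definition; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:            # s: output a fixed string
    s: str


@dataclass
class Out:            # {$x}: output the subtree bound to a variable
    var: str


@dataclass
class If:             # {if chi then alpha}; chi is kept as opaque text
    cond: str
    body: "Expr"


@dataclass
class Seq:            # alpha beta: sequence
    parts: List["Expr"]


Expr = Union[Str, Out, If, Seq]


def output_count(e: Expr, var: str) -> int:
    """How many times e outputs the subtree bound to var."""
    if isinstance(e, Out):
        return 1 if e.var == var else 0
    if isinstance(e, If):
        return output_count(e.body, var)
    if isinstance(e, Seq):
        return sum(output_count(p, var) for p in e.parts)
    return 0


def is_simple(e: Expr, var: str) -> bool:
    # Simplified criterion (assumption): var's subtree is streamed at most once.
    return output_count(e, var) <= 1


ex1 = Seq([Str("<a>"), Out("$x"), Str("</a>"), If("$x/b = 5", Str("<b>5</b>"))])
ex2 = Seq([Out("$x"), Out("$x")])
print(is_simple(ex1, "$x"), is_simple(ex2, "$x"))  # True False
```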
FluX Query Language (ctd.)
• FluX expressions:
  • a simple XQuery- expression, or
  • s {process-stream $y: H} s′
• Event handlers H:
  • on-first past(S) return α, where α is an XQuery- expression and S is a set of symbols; α is executed on buffers
  • on a as $x return Q, where a is a symbol name, $x is a variable and Q is a FluX expression; Q is executed in event-based fashion
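A FluX expression can be represented as a process-stream node holding an ordered list of the two handler kinds. This Python sketch (class names are hypothetical) stores handler bodies as plain text and renders them back into FluX syntax, using the book-level query from the FluX approach as input.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class On:                 # on a as $x return Q
    symbol: str
    var: str
    body: str             # nested FluX expression, kept as text here


@dataclass
class OnFirstPast:        # on-first past(S) return alpha
    symbols: List[str]
    body: str             # XQuery- expression, evaluated on buffers


Handler = Union[On, OnFirstPast]


@dataclass
class ProcessStream:      # {process-stream $y: H}
    var: str
    handlers: List[Handler] = field(default_factory=list)

    def to_flux(self) -> str:
        parts = []
        for h in self.handlers:
            if isinstance(h, On):
                parts.append(f"on {h.symbol} as {h.var} return {h.body}")
            else:
                parts.append(f"on-first past({','.join(h.symbols)}) return {h.body}")
        return "{process-stream " + self.var + ": " + "; ".join(parts) + "}"


q = ProcessStream("$b", [
    On("title", "$t", "$t"),
    OnFirstPast(["title", "author"], "{for $a in $b/author return $a}"),
])
print(q.to_flux())
```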
Safe FluX Queries
A FluX query is safe if no XQuery- expression in it refers to elements that may still be encountered in the stream.
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*, price)>
FluX query:
… <result> {process-stream $b: on title as $t return $t; on-first past(title,author) return {for $p in $b/price return $p}} </result> …
Data stream: … <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>39€</price> </book> …
Executing the query on this stream: not safe! The handler fires as soon as no more titles or authors can follow, i.e. when <price> starts, but its body refers to $b/price, which has not been read yet.
Safe FluX Queries
A FluX query is safe if no XQuery- expression in it refers to elements that may still be encountered in the stream.
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*, price)>
FluX query:
… <result> {process-stream $b: on title as $t return $t; on-first past(title,author,price) return {for $p in $b/price return $p}} </result> …
Data stream: … <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>39€</price> </book> …
Executing the query on this stream: safe! Because price is in the past-set, the handler fires only after the price element has been read.
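Safety can be approximated by comparing each on-first past(S) handler's past-set with the child labels its body reads. The sketch below assumes the referenced labels have already been extracted from the XQuery- body; unsafe_handlers is a hypothetical name, and the subset test is a conservative simplification that ignores DTD order constraints.

```python
def unsafe_handlers(handlers):
    """Return the on-first past(S) handlers that are not safe.

    Each handler is (S, refs): S is its past-set, refs the child labels that
    its XQuery- body reads.  Simplified criterion (assumption): the body is
    safe only if every referenced label is in S, so that no such child can
    still arrive when the handler fires.  The real check can additionally
    exploit DTD order constraints, which this sketch ignores.
    """
    return [(S, refs) for S, refs in handlers if not refs <= S]


# on-first past(title,author) return {for $p in $b/price return $p}
not_safe = [({"title", "author"}, {"price"})]
# on-first past(title,author,price) return {for $p in $b/price return $p}
safe = [({"title", "author", "price"}, {"price"})]
print(len(unsafe_handlers(not_safe)), len(unsafe_handlers(safe)))  # 1 0
```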
XQuery to FluX
• Rewrite an XQuery- query Q into a FluX query F using a (non-recursive) DTD such that
  • F is safe w.r.t. the DTD
  • F is equivalent to Q
  • F has low memory consumption (appropriate scheduling of event processors)
• Steps
  • Normalization of Q
  • Rewriting into FluX
Normalization
• Rule-based rewriting of XQuery:
  • Split paths into single-step for loops
  • Eliminate where using if
  • Push down if expressions
  • Rewrite paths $x/a/… to for loops
Example (XMP, Q1):
<bib> {for $b in $ROOT/bib/book where χ return <book> {$b/year} {$b/title} </book>} </bib>
Normalized:
<bib> {for $bib in $ROOT/bib return {for $b in $bib/book return {if χ then <book>} {for $year in $b/year return {if χ then {$year}}} {for $title in $b/title return {if χ then {$title}}} {if χ then </book>}}} </bib>
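The first normalization rule, splitting a multi-step path into nested single-step for loops, can be sketched as a string rewrite. split_path is hypothetical; it names each intermediate variable after its step (as $bib in the example above), which is a simplifying assumption, since a real rewriter would generate fresh variable names. The where, if and push-down rules are not covered.

```python
def split_path(loop_var, source_var, path, body):
    """Rewrite {for $x in $y/a1/.../an return body} into nested single-step loops.

    Intermediate variables are named after their step ($a1, ...); this naming
    scheme is an assumption and could clash with existing variables.
    """
    steps = [s for s in path.split("/") if s]
    current = source_var
    prefix, suffix = [], []
    for step in steps[:-1]:
        v = "$" + step
        prefix.append(f"{{for {v} in {current}/{step} return ")
        suffix.append("}")
        current = v
    innermost = f"{{for {loop_var} in {current}/{steps[-1]} return {body}}}"
    return "".join(prefix) + innermost + "".join(suffix)


print(split_path("$b", "$ROOT", "bib/book", "<book>...</book>"))
# {for $bib in $ROOT/bib return {for $b in $bib/book return <book>...</book>}}
```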
Example
• Function: rewrite(Variable parentVar, Set<Σ> H, XQuery- β): FluX
  • H: the symbols that must be past before β may be executed (delays execution of β)
• Initial call: rewrite($ROOT, {}, Q)
Query Q:
<results> {for $bib in $ROOT/bib return {for $b in $bib/book return <result> {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} </result>}} </results>
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>
Example
rewrite($ROOT, {}, β1): β1 (the leading string <results>) is simple, so no delay is needed
→ generate on-first past() return …
Current query (β2 is the remainder after <results>):
<results> {for $bib in $ROOT/bib return {for $b in $bib/book return <result> {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} </result>}} </results>
Example (ps abbreviates process-stream)
rewrite($ROOT, {}, β2)
Current query:
{ps $ROOT: on-first past() return <results> {for $bib in $ROOT/bib return {for $b in $bib/book return <result> {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} </result>}} </results>
Example
rewrite($ROOT, {}, β2), where β2 = β21 β22
rewrite($ROOT, {}, β21): no delay
→ generate on bib as $bib return …
Current query:
{ps $ROOT: on-first past() return <results> {for $bib in $ROOT/bib return {for $b in $bib/book return <result> {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} </result>}} </results>
Example
rewrite($bib, {}, α1): no delay
→ generate on book as $b return …
Current query:
{ps $ROOT: on-first past() return <results>; on bib as $bib return {for $b in $bib/book return <result> {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} </result> }} </results>
Example
rewrite($b, {}, α2): as before, no delays
→ generate on-first past() return … and on title as $t return …
Current query:
{ps $ROOT: on-first past() return <results>; on bib as $bib return {ps $bib: on book as $b return <result> {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} </result> } </results>
Example
Assure that all titles are past before α32 (α32 = α41 α42):
rewrite($b, {title}, α32) → rewrite($b, {title}, α41)
This DTD does not order titles before authors, so α41 is delayed until the titles are past and executed on buffers
→ generate on-first past(title,author) return …
Current query:
{ps $ROOT: on-first past() return <results>; on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return <result>; on title as $t return {$t}; {for $a in $b/author return {$a}} </result> } </results>
Example
Assure that all titles and authors are past before α42:
rewrite($b, {title,author}, α42): α42 is simple; delay its execution until title and author are past
→ generate on-first past(title,author) return …
Current query:
{ps $ROOT: on-first past() return <results>; on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return <result>; on title as $t return {$t}; on-first past(title,author) return {for $a in $b/author return {$a}}; </result> } </results>
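The moment at which an on-first past(S) handler may fire depends on the DTD's content model. One way to approximate it, assumed here for illustration: encode child labels as single characters (t = title, a = author, p = price), write the content model as a Python regular expression, and after each prefix of children check whether any valid completion of bounded length still contains a symbol of S. The bounded enumeration is a heuristic, not the decision procedure used by FluX, and all function names are hypothetical.

```python
import re
from itertools import product


def s_may_follow(model, alphabet, prefix, S, max_len=5):
    """True if some valid completion prefix+v (len(v) <= max_len) contains a symbol of S."""
    for n in range(max_len + 1):
        for v in product(alphabet, repeat=n):
            tail = "".join(v)
            if set(tail) & S and re.fullmatch(model, prefix + tail):
                return True
    return False


def fire_point(model, alphabet, children, S, max_len=5):
    """Smallest k such that, after the first k child labels, no S-child can follow.

    k == 0 means the handler fires before any child; k == len(children)
    means it fires only once the last child has been seen.
    """
    for k in range(len(children) + 1):
        if not s_may_follow(model, alphabet, children[:k], S, max_len):
            return k
    return len(children)


print(fire_point("[ta]*p", "tap", "atap", {"t", "a"}))  # 4: fires at price
print(fire_point("t*a*p", "tap", "taap", {"t"}))        # 2: fires at first author
print(fire_point("[ta]*p", "tap", "atap", set()))       # 0: like on-first past()
```

For ((title|author)*,price) the handler on-first past(title,author) can only fire once price arrives, whereas with (title*,author*),price the titles are already past at the first author.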
Example
Current query (only </results> remains to be rewritten):
{ps $ROOT: on-first past() return <results>; on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return <result>; on title as $t return {$t}; on-first past(title,author) return {for $a in $b/author return {$a}}; on-first past(title,author) return </result>;}}; </results>
Example
Resulting FluX query:
{ps $ROOT:
  on-first past() return <results>;
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          on-first past(title,author) return {for $a in $b/author return {$a}};
          on-first past(title,author) return </result>;}};
  on-first past(bib) return </results>;}
Example: Order Constraints
Assure that all titles are past before α41: rewrite($b, {title}, α41)
The DTD ensures that titles precede authors
→ generate on author as $a return …
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book (title*, author*),…>
Current query:
{ps $ROOT: on-first past() return <results>; on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return <result>; on title as $t return {$t}; {for $a in $b/author return {$a}} </result> } </results>
Example: Order Constraints (ctd.)
Assure that all titles are past before α41: rewrite($b, {title}, α41) with H = {title}
The DTD ensures that titles precede authors
→ generate on author as $a return …
Resulting FluX query (no buffers needed):
{ps $ROOT:
  on-first past() return <results>;
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          on author as $a return {$a};
          on-first past(title,author) return </result>;}};
  on-first past(bib) return </results>;}
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book (title*, author*)>
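The decision illustrated in the last two examples (stream the author loop as on author, or delay it to on-first past(title,author)) hinges on an order constraint derived from the DTD. The sketch below, again assuming single-character label codes and bounded enumeration of the content model, checks whether every title precedes every author and schedules a flat sequence of child loops accordingly. It is a simplified reading of the walkthrough, not the rewrite function itself; all names are hypothetical.

```python
import re
from itertools import product


def occurs_after(model, alphabet, x, y, max_len=5):
    """True if some word of the content model (length <= max_len) has an x after a y."""
    for n in range(max_len + 1):
        for w in map("".join, product(alphabet, repeat=n)):
            if y in w and x in w[w.index(y) + 1:] and re.fullmatch(model, w):
                return True
    return False


def ordered(model, alphabet, a, b, max_len=5):
    """Order constraint: every a-child precedes every b-child (bounded check)."""
    return not occurs_after(model, alphabet, a, b, max_len)


def schedule(model, alphabet, loops, max_len=5):
    """For child loops {for $x in $b/label ...} in query order, choose a handler.

    A loop streams as 'on label' if the DTD orders all earlier labels (H) before
    it; otherwise it, and every later loop, is delayed to on-first past(H, label).
    """
    plan, H, delayed = [], [], False
    for label in loops:
        if not delayed and label not in H and all(
                ordered(model, alphabet, h, label, max_len) for h in H):
            plan.append(f"on {label}")
        else:
            delayed = True
            past = H if label in H else H + [label]
            plan.append(f"on-first past({','.join(past)})")
        if label not in H:
            H.append(label)
    return plan


# Label codes (assumption): t = title, a = author, p = price.
print(schedule("[ta]*p", "tap", ["t", "a"]))  # ['on t', 'on-first past(t,a)']
print(schedule("t*a*p", "tap", ["t", "a"]))   # ['on t', 'on a']
```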
Further Aspects
Visit our demonstration (Group 3: XML)
[Figure: system architecture. Query Compiler with Query Optimizer: XQuery → To Normal Form → Algebraic Optimizations → To FluX, using the DTD. Runtime Engine: XML Input Stream → XSAX → Streamed Query Evaluator (with Memory Buffers) → XML Output Stream.]
Experiments
• Based on XMark
  • Queries adapted to the XQuery- fragment
• Environment
  • AMD Athlon XP 2000, 512MB RAM
  • Linux, Sun JDK 1.4.2_03
• Measurements
  • Execution time
  • Memory consumption
Conclusion
• FluX
  • Event-based extension of XQuery
  • Rewriting of XQuery into FluX using information from the DTD
• FluX supports buffer-conscious query processing
  • Low main memory consumption
  • Efficient and scalable query execution on data streams
• Future work
  • Recursive DTDs
  • Extension of the XQuery- subset (e.g., //, aggregate operators)
  • Improved execution (joins)