University of Crete, School of Sciences, Computer Science Department. HY-561: Data Management on the World Wide Web. XQuery Streaming à la Carte & Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation.
XQuery Streaming à la Carte
Konstantinos Galanakis
Introduction
• Existing XML query evaluation techniques
  • Algebraic optimization with algorithms for persistent data
  • Streaming algorithms for transient data
• New idea
  • Physical algebra for XQuery
  • À la carte use of streaming algorithms and optimization techniques
Diverse Data Sources
• Join of a local repository and a streaming source
Preliminaries
• List
  • Immutable ordered sequence of homogeneous values
• Cursor
  • Mutable ordered sequence of homogeneous values
  • Destructive
  • C(α): cursor containing values of type α
• Operators
  • fromList
  • next
  • peek
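The cursor described above can be sketched in a few lines of Python. This is only an illustration of a destructive, forward-only sequence with the three operators named on the slide; the class name `Cursor`, the method `from_list`, and the choice of returning `None` at the end of the sequence are assumptions of this sketch, not definitions from the presentation.

```python
from typing import Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class Cursor(Generic[T]):
    """Destructive, forward-only sequence: each value can be consumed once."""

    _EMPTY = object()  # sentinel meaning "peek has not buffered a value"

    def __init__(self, source: Iterable[T]) -> None:
        self._it: Iterator[T] = iter(source)
        self._head = Cursor._EMPTY

    @classmethod
    def from_list(cls, values: List[T]) -> "Cursor[T]":
        """fromList: build a cursor over an (immutable) list of values."""
        return cls(values)

    def peek(self):
        """Return the next value without consuming it (None at the end)."""
        if self._head is Cursor._EMPTY:
            self._head = next(self._it, None)
        return self._head

    def next(self):
        """Consume and return the next value (None at the end)."""
        value = self.peek()
        self._head = Cursor._EMPTY
        return value
```

Calling `peek` repeatedly returns the same value, while `next` advances the cursor, so a value that has been consumed cannot be read again. That is the property that distinguishes a cursor from a list in this model.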
Physical Data Model 1/2
• Physical value
  • Physical XML value (Xml)
    • Cursor of XML tokens, C(Tok)
    • List of tree values, L(Tree)
  • Physical table (Table)
    • Cursor of tuples, C(τ)
    • Physical tuple, τ: record of fields containing physical XML values
    • List of tuples, L(τ)
• XML token (Tok)
  • Parsing event produced by a SAX parser
Physical Data Model 2/2
• XML token (Tok): parsing event produced by a SAX parser
  • startElem
  • endElem
  • text
  • atomic
  • hole
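To make the token representation concrete, here is a minimal sketch that turns SAX parsing events into a flat token sequence using Python's standard `xml.sax` module. The tuple layout `(kind, payload)` and the names `TokenHandler` and `tokenize` are assumptions of this sketch. It only produces `startElem`, `endElem` and `text` tokens: attributes are ignored, and the `atomic` and `hole` tokens listed above are not modelled.

```python
import xml.sax
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, payload); hypothetical layout for illustration


class TokenHandler(xml.sax.ContentHandler):
    """Collects SAX events as startElem / endElem / text tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: List[Token] = []

    def startElement(self, name, attrs):
        self.tokens.append(("startElem", name))

    def endElement(self, name):
        self.tokens.append(("endElem", name))

    def characters(self, content):
        # Whitespace-only chunks are dropped to keep the example readable.
        if content.strip():
            self.tokens.append(("text", content))


def tokenize(xml_text: str) -> List[Token]:
    """Parse a small XML string and return its token sequence."""
    handler = TokenHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens
```

For a real stream the tokens would be handed to a cursor as they arrive rather than collected in a list; the list is used here only to keep the sketch short.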
Physical Representation & Conversion
Physical Algebra: Overview & Operators
• Physical algebra for the logical algebra proposed in C. Re, J. Simeon and M. Fernandez, "A complete and efficient algebraic compiler for XQuery", in ICDE 2006
Physical Algebra: Constructors
Navigation Operators 1/3
• TreeProject
  • Projection of path expressions on a tree
  • Injected after Parse to reduce the plan input size
• TreeJoin
  • Returns a node sequence in document order with no duplicates
  • Strictly forward path expressions
    • self axis
    • child axis
    • descendant axis
    • descendant-or-self axis
    • attribute axis
Navigation Operators 2/3
• Path steps: desc-or-self::section, child::title
• [Figure: the path steps compiled into a physical plan, and the plan applied to an input document]
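As an illustration of why strictly forward steps suit a token stream, the sketch below evaluates the two steps from the slide, desc-or-self::section followed by child::title, in a single pass over `(kind, payload)` tokens. It is not the operator implementation from the presentation: the function name `section_titles` and the token layout are assumptions, and the sketch assumes matching title elements do not nest inside one another.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # (kind, payload); hypothetical layout for illustration


def section_titles(tokens: Iterable[Token]) -> Iterator[Token]:
    """Yield the tokens of every title element whose parent is a section."""
    path: List[str] = []               # names of the currently open elements
    emit_depth: Optional[int] = None   # len(path) when the current match opened
    for kind, payload in tokens:
        if kind == "startElem":
            is_match = payload == "title" and bool(path) and path[-1] == "section"
            if emit_depth is None and is_match:
                emit_depth = len(path)
            path.append(payload)
            if emit_depth is not None:
                yield (kind, payload)
        elif kind == "endElem":
            path.pop()
            if emit_depth is not None:
                yield (kind, payload)
                if len(path) == emit_depth:
                    emit_depth = None  # the matched element is now closed
        elif emit_depth is not None:
            yield (kind, payload)      # text inside a matched element
```

The only state kept is the stack of open element names, so memory use depends on document depth rather than document size. Results come out in document order without duplicates because each matching element is emitted exactly once, as it streams past.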
Tuple Operators 1/2
• All tuple operators are polymorphic except MapFromItem
• MapFromItem
  • Input → item sequence
  • Output → one tuple for each item
  • 2 implementations
    • For lists of trees and for token cursors
  • Relies on map and split
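One possible reading of "relies on map and split" for the token-cursor case is sketched below: the flat token stream is first split into one token run per top-level item, and each run is then mapped to a tuple with a single field. The slides do not define map and split, so the functions `split_items` and `map_from_item`, the use of a dict as the tuple record, and the field name passed in are all assumptions of this sketch.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, payload); hypothetical layout for illustration


def split_items(tokens: Iterable[Token]) -> Iterator[List[Token]]:
    """Split a flat token stream into one token list per top-level item."""
    depth = 0
    item: List[Token] = []
    for kind, payload in tokens:
        item.append((kind, payload))
        if kind == "startElem":
            depth += 1
        elif kind == "endElem":
            depth -= 1
        if depth == 0:          # the current item is complete
            yield item
            item = []


def map_from_item(tokens: Iterable[Token], field: str) -> Iterator[Dict[str, List[Token]]]:
    """Produce one tuple (a record with a single field) for each input item."""
    for item in split_items(tokens):
        yield {field: item}
```

Both functions are generators, so tuples are produced as soon as each item has been read and nothing beyond the current item is held in memory.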
Tuple Operators 2/2
Code Selection 1/4
• Mapping from a logical plan (Op) to a physical plan (POp)
  • CS(Op) → POp
• Physical plan correctness
  • Stream safety
  • Sufficient to ensure correctness
Code Selection 2/4
Conditions for stream safety of an operator Op
• Navigational access on the XML values returned by Op is strictly forward
• Tuples returned by Op are consumed in the order of creation
• Tuple fields returned by Op are accessed at most once
Code Selection 3/4
• The code selection heuristic is based on three assumptions
  • Conversion between physical representations is expensive
  • Streaming operators are more efficient on streamed sources
  • Copying whole sub-trees is expensive and should be avoided
• The following rules are applied bottom-up to each subplan Op of a whole plan Op0
  • If (a) the inputs of Op are streamed, (b) a streaming operator POp exists for Op, and (c) Op is stream-safe, then CS(Op) selects POp
  • If Op is a constructor operator, CS(Op) uses a streaming operator
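The two rules can be sketched as a bottom-up recursion over a plan tree. Everything about the data structures here is hypothetical: the `Op` and `POp` classes, their fields, and the set of operator names that have a streaming implementation are stand-ins chosen for illustration. Stream safety is passed in as a precomputed flag rather than derived from the three conditions, and a leaf operator is treated as having streamed inputs because it has no inputs at all.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Op:
    """Logical operator (hypothetical structure for this sketch)."""
    name: str
    inputs: List["Op"] = field(default_factory=list)
    stream_safe: bool = False       # assumed to be decided by a prior analysis
    is_constructor: bool = False


@dataclass
class POp:
    """Physical operator chosen for a logical operator."""
    name: str
    streaming: bool
    inputs: List["POp"]


def code_select(op: Op, streaming_impls: Set[str]) -> POp:
    """CS(Op): select physical operators bottom-up."""
    inputs = [code_select(child, streaming_impls) for child in op.inputs]
    inputs_streamed = all(p.streaming for p in inputs)  # vacuously true for leaves
    streaming = op.is_constructor or (
        inputs_streamed and op.name in streaming_impls and op.stream_safe
    )
    return POp(op.name, streaming, inputs)
```

With this shape, a single operator that is not stream-safe forces every non-constructor operator above it onto the materialized implementation, since condition (a) then fails for its consumers.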
Experimental Evaluation 1/2
• Experiments on synthetic data
  • Verify linear scalability of streaming operators w.r.t. query and document sizes
  • Run over MemBeR documents in the XCheck framework
• XMark benchmarks
  • Q2, Q6 and Q15 are fully streamable
  • Q1, Q4, Q5, Q7, Q14 and Q16–Q19 are partially streamable
  • Self-join queries Q8–Q12 and Q20 are not streamable
Experimental Evaluation 2/2
Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation
Introduction: General
• The buffer manager of a streaming XQuery engine should
  • Put only data relevant to query evaluation into the buffer
  • Avoid keeping data buffered longer than necessary
  • Avoid keeping multiple copies of the data in buffers
• Claim: a combination of static analysis and dynamic buffer minimization techniques is needed
Previous Work 1/2
XQuery:
<q> {
  for $b in /bib/book
  where ($b/author="A. Turing" and fn:exists($b/price))
  return $b/title
} </q>
Projection paths:
{ /bib/book,
  /bib/book/author/dos::node(),
  /bib/book/price,
  /bib/book/title/dos::node() }
[Figure: XML document tree rooted at bib, with book and article elements and author, price, title and isbn elements below them]
Previous Work 2/2
XQuery:
<q> {
  for $x1 in //book return
    for $x2 in //* return
      for $x3 in //article return
        <node/>
} </q>
Two approaches:
(1) Single DOM tree
(2) Buffers for variables
Active Garbage Collection
• Buffer management technique for XQuery engines
• Both static and dynamic analysis are exploited
• Basic idea
  • Determine which data objects won't be accessed in the future
• A.G.C. strategy
  • Reference counting
• New approach
  • Roles assigned to nodes
  • Multiple roles per node
  • Multiple nodes per role
  • signOff-statement
Main Idea
[Diagram labels: Input stream, Roles, Projection Tree, Buffer (node role annotation), Role removal (A.G.C.), XQuery normalizations, Rewritten XQuery (role updates), Variable bindings, Evaluator, Output stream, XQuery]
Query Language
• XQ is an XQuery fragment
  • Nested for-expressions
  • Conditions
  • Joins
• Covers syntactically simple fragments of XQuery
• Assume that a syntactically richer fragment could be evaluated
• Remove let-expressions → query normalization
• Rewrite where-conditions to if-then-else expressions
• Replace each multi-step for-loop with nested single-step for-loops
where-expressions → if-statement
With where-expression:
<r> {
  for $b in /bib
  where (fn:exists($b/book))
  return <books>{ $b/book }</books>
} </r>
With if-statements:
<r> {
  for $b in /bib return
    ( if (fn:exists($b/book)) then <books> else (),
      if (fn:exists($b/book)) then $b/book else (),
      if (fn:exists($b/book)) then </books> else () )
} </r>
where-expressions → if-statement: pushing if-statements
With where-expression:
<r> {
  for $b in /bib
  where (fn:exists($b/book))
  return <books>{ $b/book }</books>
} </r>
With if-statements pushed down:
<r> {
  for $b in /bib return
    ( if (fn:exists($b/book)) then <books> else (),
      if (fn:exists($b/book)) then $b/book else (),
      if (fn:exists($b/book)) then </books> else () )
} </r>
Role extraction
XQuery:
<r> {
  for $bib in /bib return
    ( for $x in $bib/* return
        if (not(fn:exists($x/price))) then $x else (),
      for $b in $bib/book return $b/title )
} </r>
Projection tree:
/
  /bib
    /*
      /price[1]
      dos::node()
    /book
      /title/dos::node()
Role assignment
Roles:
r1  /
r2  /bib
r3  /bib/*
r4  /bib/*/price[1]
r5  /bib/*/dos::node()
r6  /bib/book
r7  /bib/book/title/dos::node()
XML document with assigned roles:
bib { r2 }
  book { r3, r5, r6 }
    title { r5, r7 }
    author { r5 }
• Roles are assigned to a document node when it is projected into the buffer
• Role assignment happens on the fly
• Nodes without roles and without role-carrying ancestors need not be buffered
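The role sets shown for bib, book, title and author can be reproduced by matching each node's root-to-node path against the seven role paths. The sketch below does this for element nodes only. The encoding of a role as a list of steps plus a "descendant-or-self" flag, the function names, and the representation of a node path as `(name, position)` pairs are assumptions made for illustration; text nodes, which dos::node() also covers, are left out.

```python
from typing import Dict, List, Set, Tuple

# Role paths from the slide, encoded as (steps, ends_with_dos_node).
ROLES: Dict[str, Tuple[List[str], bool]] = {
    "r1": ([], False),                          # /
    "r2": (["bib"], False),                     # /bib
    "r3": (["bib", "*"], False),                # /bib/*
    "r4": (["bib", "*", "price[1]"], False),    # /bib/*/price[1]
    "r5": (["bib", "*"], True),                 # /bib/*/dos::node()
    "r6": (["bib", "book"], False),             # /bib/book
    "r7": (["bib", "book", "title"], True),     # /bib/book/title/dos::node()
}


def step_matches(step: str, name: str, position: int) -> bool:
    """Match one path step against an element name and sibling position."""
    if step.endswith("[1]"):
        return step[:-3] == name and position == 1
    return step == "*" or step == name


def roles_for(path: List[Tuple[str, int]]) -> Set[str]:
    """Roles of the element reached by `path`.

    Each path entry is (element name, 1-based position among same-named siblings).
    """
    result: Set[str] = set()
    for role, (steps, dos) in ROLES.items():
        deep_enough = len(path) == len(steps) or (dos and len(path) > len(steps))
        if deep_enough and all(
            step_matches(s, name, pos) for s, (name, pos) in zip(steps, path)
        ):
            result.add(role)
    return result
```

Because a node's roles depend only on its own path, they can be computed when its start tag arrives, which is what makes on-the-fly assignment possible. A node whose path matches no role and that has no role-carrying ancestor can be skipped without buffering.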
Role update inserting
Original query:
<r> {
  for $bib in /bib return
    ( for $x in $bib/* return
        if (not(fn:exists($x/price))) then $x else (),
      for $b in $bib/book return $b/title )
} </r>
Roles, paths and corresponding variable expressions:
r1  /
r2  /bib                          $bib
r3  /bib/*                        $x
r4  /bib/*/price[1]               $x/price
r5  /bib/*/dos::node()            $x
r6  /bib/book                     $b
r7  /bib/book/title/dos::node()   $b/title
Query with signOff-statements inserted:
<r> {
  for $bib in /bib return
    ( for $x in $bib/* return
        ( if (not(fn:exists($x/price))) then $x else (),
          signOff($x,r3),
          signOff($x/price[1],r4),
          signOff($x/dos::node(),r5) ),
      for $b in $bib/book return
        ( $b/title,
          signOff($b,r6),
          signOff($b/title/dos::node(),r7) ),
      signOff($bib,r2) )
} </r>
Active Garbage Collection
Rewritten query:
<r> {
  for $bib in /bib return
    ( for $x in $bib/* return
        ( if (not(fn:exists($x/price))) then $x else (),
          signOff($x,r3),
          signOff($x/price[1],r4),
          signOff($x/dos::node(),r5) ),
      for $b in $bib/book return
        ( $b/title,
          signOff($b,r6),
          signOff($b/title/dos::node(),r7) ),
      signOff($bib,r2) )
} </r>
Input stream: <bib> <book> <title/> <author/> </book> …
Buffer (role sets as they shrink during evaluation):
bib {r2}
  book {r3, r5, r6}, then {r5, r6}, then {r6}
    title {r5, r7}, then {r7}, then {}
    author {r5}
Output stream: <r> <book> <title/> <author/> </book>
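The effect of the signOff-statements on the buffer can be imitated with a small reference-counting sketch. A buffered node keeps a multiset of roles; signing off removes one role instance, and a node is dropped once it carries no roles and has no buffered children left. The names `Node`, `sign_off` and `collect` are hypothetical, and the sketch assumes every node involved has already been read completely from the input stream.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    name: str
    roles: Counter = field(default_factory=Counter)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str, roles: List[str]) -> "Node":
        child = Node(name, Counter(roles), parent=self)
        self.children.append(child)
        return child


def collect(node: Optional[Node]) -> None:
    """Purge `node` if it is garbage, then re-check its ancestors."""
    while node is not None and not node.children and sum(node.roles.values()) == 0:
        parent = node.parent
        if parent is not None:
            parent.children.remove(node)
        node.parent = None
        node = parent


def sign_off(node: Node, role: str) -> None:
    """Remove one instance of `role` from `node` and collect garbage."""
    if node.roles[role] > 0:
        node.roles[role] -= 1
    collect(node)
```

Replaying the statements of the rewritten query on the buffered fragment bib {r2} / book {r3, r5, r6} / title {r5, r7}, author {r5}: after r3 and r5 are signed off, author is purged; book survives signOff($b,r6) because title still holds r7; signing off r7 then removes title and, with it, book.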
Optimizations
Path steps → for-expressions
Original query:
<r> { for $bib in /bib return $bib/book } </r>
With signOff-statements:
<r> {
  for $bib in /bib return
    ( $bib/book,
      signOff($bib,r1),
      signOff($bib/book/dos::node(),r2) )
} </r>
Path step rewritten as a for-expression:
<r> { for $bib in /bib return for $_1 in $bib/book return $_1 } </r>
With signOff-statements:
<r> {
  for $bib in /bib return
    ( for $_1 in $bib/book return
        ( $_1,
          signOff($_1/dos::node(),r2) ),
      signOff($bib,r1) )
} </r>
Further optimizations:
• Aggregated roles
• Remove redundant roles
Benchmark Results 1/5
• Time and memory consumption
  • Queries and documents from the XMark benchmark
  • Queries and documents modified to match the supported fragment
  • 3 GHz Intel Pentium IV CPU with 2 GB RAM
  • SuSE Linux 10.0, J2RE v1.4.2 for Java-based systems
  • Time limit: 1 hour
• Benchmarks against the following systems
  • FluX: Java in-memory engine for streaming XQuery evaluation
  • MonetDB v4.12.0/XQuery v0.12.0: a secondary storage engine written in C++; loading of the document is included in time measurements
  • QizX/open v1.1: free in-memory XQuery engine written in Java
  • Saxon v8.7.1: free in-memory XQuery engine written in Java
Benchmark Results 2/5
[Chart: XMark Q1, running time (s)]
XMark Q1:
<query1> {
  for $s in /site return
    for $p in $s/people return
      for $pe in $p/person return
        if ($pe/person_id="person0")
        then <result>{ $pe/name }</result>
        else ()
} </query1>
Benchmark Results 3/5
[Chart: XMark Q1, memory consumption (MB)]
XMark Q1:
<query1> {
  for $s in /site return
    for $p in $s/people return
      for $pe in $p/person return
        if ($pe/person_id="person0")
        then <result>{ $pe/name }</result>
        else ()
} </query1>
Benchmark Results 4/5
XMark Q8:
<query8> {
  for $root in (/) return
    for $site in $root/site return
      for $people in $site/people return
        for $person in $people/person return
          <item> {
            ( <person>{ $person/name }</person>,
              <items_bought> {
                for $site2 in $root/site return
                  for $cas in $site2/closed_auctions return
                    for $ca in $cas/closed_auction return
                      for $buyer in $ca/buyer return
                        if ($buyer/buyer_person=$person/person_id)
                        then <result> { $ca } </result>
                        else ()
              } </items_bought> )
          } </item>
} </query8>
Benchmark Results 5/5
[Chart: XMark Q8]
• Failure for 100MB: MonetDB
• Failure for 200MB: GCX, FluxQuery, MonetDB