460 likes | 961 Views
Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects WPI DSRG GROUP Motivation XML is new, and here to stay … Universal flexible representation of data De facto standard for information exchange XQuery is useful, and here to stay… Powerful query language for XML
E N D
Beyond the Rainbow:—— A Pot of Gold ala XML Database Projects WPI DSRG GROUP DSRG, Worcester Polytechnic Institute
Motivation • XML is new, and here to stay … • Universal flexible representation of data • De facto standard for information exchange • XQuery is useful, and here to stay… • Powerful query language for XML • De facto standard for XML querying • Plentitude of relevant new issues … DSRG, Worcester Polytechnic Institute
searching • querying • integrating • restructuring • updating XML Queries And Updates XML Paradigm EVE-Middleware Internet WWW: global scale distributed information system for sharing data XML1 XML5 RDBL6 XML2 XML3 RDB4 XMLn DSRG, Worcester Polytechnic Institute
XML Data Management Middleware Technology What We Aim For… • efficient • flexible • scalable • lightweight • resource-sensitive • adaptive Internet EVE-Middleware RDB5 XML1 XML6 XML2 RDB3 XML4 XMLn DSRG, Worcester Polytechnic Institute
WPI Project Directions • RAINBOW: Exploiting RDB for XML management: • Algebraic-XQuery processing • XCube: Flexible XML Mapping Tool: • Flexible loading/extracting XML to RDB via XQuery • Updating Virtual XML Views: • Update decomposition and trigger-propagation • MASS: Native XML Query Engine: • Multi-axis compressed order-preserving XML storage DSRG, Worcester Polytechnic Institute
WPI Project Directions • XCache: XML Query Caching: • Cache containment and query rewriting • Materialized XML View Maintenance: • Incremental algebraic maintenance strategy • SAXE: XML Incremental Updating & Evolution: • Lightweight updating by update query rewriting • RAINDROP: XQuery-based Stream Processing: • Adaptive on-fly multi-subscription optimization DSRG, Worcester Polytechnic Institute
THE RAINBOW PROJECT DSRG, Worcester Polytechnic Institute
The look and feel of an XML query system with maturity and technology support of RDB XML Relational Database • Emerging web standard • Flexible data representation • Powerful query language 1) Widely used to store business data 2) Efficient, reliable, secure DBMS 3) Mature query processing techniques XML meets Relational DBs Rainbow + DSRG, Worcester Polytechnic Institute
<results> <title>TCP/IP Illustrated</title> <title>Data on the Web</title> </results> <dxv> <book> <row> <bid>001</bid> <title>TCP/IP Illustrated</title> </row> <row> <bid>002</bid> <title>Data on the Web</title> </row> </book> <prices> <row> <bid>001</bid> <price>65.95</price> </row> <row> <bid>002</bid> <price>34.95</price> </row> </prices> </dxv> <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> Bid Title Bid Price 001 TCP/IP Illustrated 001 65.95 002 Data on the Web 002 34.95 Running Example <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> <prices> <book> <title>TCP/IP Illustrated</title> <price>65.95</price> </book> <book> <title>Data on the Web</title> <price>34.95</price> </book> </prices> DSRG, Worcester Polytechnic Institute
XML Default View XML Default View • Fixed and straight-forward mapping scheme. <DB> <BOOKS> <ROW> <Cover>Paperback</Cover> <TITLE>Texas Holdem'</TITLE> <AUTHORS>David Sklansky, Straight Flush</AUTHORS> </ROW> <ROW> <Cover>Paperback</Cover> <TITLE>Dracula</TITLE> <AUTHORS>Bram Stoker</AUTHORS> </ROW> </BOOKS> <…> … </DB> DSRG, Worcester Polytechnic Institute
Generic Loading • Knowledge of schema of XML document to be loaded helps to reduce unnecessary parts. FUNCTION Q1($root){ LET $maintag := gettag($root) RETURN <$maintag $root/@*> FOR $actual IN $root/* LET $innertag := gettag($actual) RETURN IF ($actual/element()) THEN Q1($actual) ELSE <$innertag $actual/@*> IF ($actual/text()) THEN <PCDATA value=$actual/text()/> ELSE "" </$innertag> </$maintag> } DSRG, Worcester Polytechnic Institute
Instantiation • Generic loading XQuery expression recursive. • + It works for every XML document. • Many recursive calls return no value.- Unnecessary FOR-loops, IF-clauses, and getName()-fct. XML Schema XML Schema XQuery Expression (flat) XQuery Expression Instantiator XQuery Expression (recursive) XQuery Expression DSRG, Worcester Polytechnic Institute
Instantiation (Example) (First Step of CLOCK mapping scheme) Instantiated Loading Query • Short, non-recursive, more efficient … • But: XML schema dependent! FUNCTION Q1($root){ <BOOKLIST> FOR $book IN $root/BOOK RETURN <BOOK$book/@cover> <TITLE> <PCDATAvalue=$book/TITLE/text()/> </TITLE> <AUTHOR> FOR $name IN $book/AUTHOR/NAME RETURN <NAME> <PCDATAvalue=$book/A…/> </NAME> </AUTHOR> </BOOK> </BOOKLIST> } DSRG, Worcester Polytechnic Institute
1 XML XML’ 2 H F G g Relation Relation’ f RDB Default XQuery (Load) XML XML View Reverser RDB Default XQuery (Extract) XML XML View Flexible Mapping Management DSRG, Worcester Polytechnic Institute
XCube in a Nutshell • Easy-to-use (no new transformation language). • Flexible (interchangeable XQuery expressions). • Adaptable (to workload, data specifics, …). • General (Schema independent). • Extendable (with new mapping schemes). • Tunable (Loading manager). • Generic XQuery loading expressions • XQuery load expression instantiation DSRG, Worcester Polytechnic Institute
XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Architecture Rainbow XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User Query Results in XML User XQuery XAT Executor XAT Optimizer SQL Generator XAT XAT XAT Tuples XAT View XAT View XAT View XQuery SQL RDBMS XAT: XML Algebra Tree DSRG, Worcester Polytechnic Institute
XQuery-Level Optimization • XAT - XML Algebra Tree Model • XAT Algebraic Query Plan Optimization • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery User XML Algebra Tree (XAT) User Query col3 1: <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> 2: T<results>$t</result>col3 Agg 3: R0, book/title$t 6: S”prices.xml”R0 7: DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery View XML Algebra Tree (XAT) View Query <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> col6=col7 26: $book, bidcol6 27: $prices, bidcol7 28: 31: T<book> [col10][col12] </book>col5 22: 23: T<prices>col5</prices>col4 11: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: Agg 12: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery Merged XML Algebra Tree (XAT) User Query View Query col6=col7 col3 26: 1: 2: T<prices>col5</prices>col4 $book, bidcol6 T<results>$t</result>col3 11: 27: Agg 3: Agg $prices, bidcol7 12: 28: R0, book/title$t T<book> [col10][col12] </book>col5 6: 31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute
XQuery-Level Optimization • XML Algebra Representation: XAT • XAT Query Plan Rewriting • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Rewrite • Query Optimization at Logic Algebra Level. • Goals: • Redundancy Elimination. • Computation Pushdown. • Technique: • Equivalence Rewrite Rules. • Heuristics: • Pushdown Navigates • Remove Construction of Intermediate Result • Combine Multiple Operators. DSRG, Worcester Polytechnic Institute
Before Navigation Pushdown User Query View Query col6=col7 col3 26: 1: $book, bidcol6 2: T<results>$t</result>col3 27: Agg 3: $prices, bidcol7 28: T<book> [col10][col12] </book>col5 R0, book/title$t 6: 31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: T<prices>col5</prices>col4 14: 11: $prices, pricecol12 25: 15: Agg S“dxv.xml” R3 S“dxv.xml” R1 12: 21: DSRG, Worcester Polytechnic Institute
R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Navigation Pushdown User Query View Query T<book> [col10][col12] </book>col5 22: col3 1: col6=col7 26: 2: T<results>$t</result>col3 31: Agg 3: $prices, pricecol12 $book, titlecol10 23: 25: R0, book/title$t 6: $prices, bidcol7 $book, bidcol6 28: 27: T<prices>col5</prices>R0 11: Agg 12: DSRG, Worcester Polytechnic Institute
R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: Remove any Taggers? User Query View Query T<book> [col10][col12] </book>col5 22: col3 1: col6=col7 26: 2: T<results>$t</result>col3 31: Agg 3: $prices, pricecol12 $book, titlecol10 23: 25: R0, book/title$t 6: $prices, bidcol7 $book, bidcol6 28: 27: T<prices>col5</prices>R0 11: Agg 12: DSRG, Worcester Polytechnic Institute
$book, bidcol6 col3 1: 27: $prices, pricecol12 25: T<results>$t</result>col3 2: $prices, bidcol7 28: Agg 3: R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Tagger Cancel Out User Query View Query col6=col7 26: 31: $book, title$t 23: DSRG, Worcester Polytechnic Institute
$book, bidcol6 col3 1: 27: T<results>$t</result>col3 2: $book, title$t 23: Agg 3: 14: R1, /book/row$book S“dxv.xml” R1 15: After Making Join User Query View Query JOINcol6=col7 31: $prices, pricecol12 25: $prices, bidcol7 28: R3, /prices/row$prices 20: S“dxv.xml” R3 21: DSRG, Worcester Polytechnic Institute
XQuery-Level Optimization • XML Algebra Representation: XAT • XAT Query Plan Rewriting • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Cleanup • Why: • SQL engine cannot reduce redundancy in XQuery. • How: • Data Redundancy by Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Tree Redundancy by Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute
XAT Operator Properties • Produced • Desc: New column generated by operator. • Example: , S, T • Consumed • Desc: Columns required by operator. • Example: , • Modified • Desc: Columns modified by operator. • Example: , , DSRG, Worcester Polytechnic Institute
Schema Computation Node Parent Produced Consumed Old Schema 1 {} {col3} {col3} 1: col3 2 1 {col3} {$t} {col3, R1, $book, col6, $t, R3, $prices, col7, col12} T<results>$t</result>col3 2: 3 2 {} {} {R1, $book, col6, $t, R3, $prices, col7, col12} Agg 3: 31 3 {} {col6, col7} {R1, $book, col6, $t, R3, $prices, col7, col12} col6=col7 31: 23 31 {$t} {$book} {R1, $book, col6, $t} 25: $prices, pricecol12 $book, title$t 23: 27 23 {col6} {$book} {R1, $book, col6} 14 27 {$book} {R1} {R1, $book} 28: $prices, bidcol7 $book, bidcol6 27: 15 14 {R1} {} {R1} 20: 25 31 {col12} {$prices} {R3, $prices, col7, col12} R3, /prices/row$prices R1, /book/row$book 14: 28 25 {col7} {$prices} {R3, $prices, col7} S“dxv.xml” R3 21: 20 28 {$prices} {R3} {R3, $prices} S“dxv.xml” R1 15: 21 20 {R3} {} {R3} DSRG, Worcester Polytechnic Institute
Intuition: Don’t keep anything that’s not used later. Schema Computation # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 New Schema 1: 1 C {col3} col3 2 1 P C {col3} T<results>$t</result>col3 2: 3 2 {$t} 31* 3 C C {$t} Agg 3: 23 31 P C {col6, $t} 27 23 P C {$book, col6} col6=col7 31: 14 27 P C {$book} 25: $prices, pricecol12 $book, title$t 23: 15 14 P {R1} 25 31 P C {col7, col12} 28: $prices, bidcol7 $book, bidcol6 27: 28 25 P C {$prices, col7} 20: R3, /prices/row$prices R1, /book/row$book 20 28 P C {$prices} 14: 21 20 P {R3} S“dxv.xml” R3 21: S“dxv.xml” R1 *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 15: DSRG, Worcester Polytechnic Institute
Schema Cleanup Result DSRG, Worcester Polytechnic Institute
XAT Cleanup • Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute
Cutting Matrix • Purpose: • Get rid of unused operators. • Equations: • Propagation of modified • Propagation of required • Identify cuttable node. DSRG, Worcester Polytechnic Institute
Matrix Computation 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
Intuition: Give me only the required columns in order to get the final result. Matrix Computation (Cont.1) 1: col3 # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 Cut? T<results>$t</result>col3 2: 1 R R R R 2 1 P C 3 2 - M - - - - - - - Agg 3: 31* 3 C C 23 31 P C JOINcol6=col7 31: 27 23 P C 14 27 P C 25: $prices, pricecol12 $book, title$t 23: 15 14 P 25 31 P C 28: $prices, bidcol7 28 25 P C $book, bidcol6 27: 20 28 P C 20: 21 20 P R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
Matrix Computation (Cont. 2) 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
XAT after Cutting 1: col3 col3 1: T<results>$t</result>col3 2: Agg T<results>$t</result>col3 3: 2: Agg JOINcol6=col7 31: 3: Reduced To 25: $prices, pricecol12 $book, title$t $book, title$t 23: 23: 28: R1, /book/row$book $prices, bidcol7 14: $book, bidcol6 27: 20: S“dxv.xml” R1 R3, /prices/row$prices 15: R1, /book/row$book 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery SQL Generated SELECT “$book”.title as “$t”, “$book”.bid as “col6”, “$prices”.price as “col12”, “$prices”.bid as “col7” FROM book “$book”, prices “$prices” WHERE “col6”=“col7” 1: col3 col3 1: 2: T<results>$t</result>col3 T<results>$t</result>col3 2: Agg 3: Agg 3: 31: JOINcol6=col7 $book, title$t 23: 25: $prices, pricecol12 23: $book, title$t R1, /book/row$book 14: 28: $prices, bidcol7 27: $book, bidcol6 S“dxv.xml” R1 15: 20: R3, /prices/row$prices 14: R1, /book/row$book SELECT “$book”.title as “$t”, FROM book “$book”, S“dxv.xml” R3 21: 15: S“dxv.xml” R1 DSRG, Worcester Polytechnic Institute
XQuery-Level Optimization • XML Algebra Representation: XAT • XAT Query Plan Rewriting • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute
Performance Gain in Execution DSRG, Worcester Polytechnic Institute
Rainbow Engine Overhead XAT Optimizer XAT Generator XAT Decorrelator XAT Merger User XAT User XAT XAT Rewrite User XQuery XAT XAT XAT XAT Executor SQL Generator XAT View XAT View XAT XAT Cleanup View XQuery Total: 32,522 ms Ack.: XQuery using Kweelt Parser DSRG, Worcester Polytechnic Institute
http://davis.wpi.edu/dsrg/rainbow https://sourceforge.net/projects/rainbow-engine/ Rainbow DSRG, Worcester Polytechnic Institute
Related Work • XPERANTO[VLDBJ2000]: XQGM vs. XAT • Xquery Views over RDB, Extension by UDFs for XML features • SilkRoute[IEEE2001(24:2)]: • Xquery Views over RDB, Generate SQL Efficiently • AGORA[VLDB2000]: • Syntax level rewriting. DSRG, Worcester Polytechnic Institute
Summary • Efficient XQuery Processing • XML Algebra Tree (XAT) • XAT Optimization: • Rewrite by using equivalent rules • Cleanup • Schema cleanup • Operator cutting • Prototype system implementation. DSRG, Worcester Polytechnic Institute