370 likes | 498 Views
Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach. Xin Zhang, Bradford Pielech and Elke A. Rundensteiner. XML. Relational Database. Flexible and powerful way to: Represent data on the web Exchange data between applications. 1) Widely used to store business data
E N D
Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach Xin Zhang, Bradford Pielech and Elke A. Rundensteiner DSRG, Worcester Polytechnic Institute
XML Relational Database • Flexible and powerful way to: • Represent data on the web • Exchange data between applications 1) Widely used to store business data 2) Efficient, reliable, secure 3) Provides standard querying (SQL) XML and Relational The look and feel of an XML query system combined with the maturity and technology support of RDB Rainbow + DSRG, Worcester Polytechnic Institute
XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Architecture Rainbow XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User Query Results in XML User XQuery XAT Executor XAT Optimizer SQL Generator XAT XAT XAT Tuples XAT View XAT View XAT View XQuery SQL RDBMS XAT: XML Algebra Tree DSRG, Worcester Polytechnic Institute
GOAL: XQuery level optimization DSRG, Worcester Polytechnic Institute
<results> <title>TCP/IP Illustrated</title> <title>Data on the Web</title> </results> <dxv> <book> <row> <bid>001</bid> <title>TCP/IP Illustrated</title> </row> <row> <bid>002</bid> <title>Data on the Web</title> </row> </book> <prices> <row> <bid>001</bid> <price>65.95</price> </row> <row> <bid>002</bid> <price>34.95</price> </row> </prices> </dxv> <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> Bid Title Bid Price 001 TCP/IP Illustrated 001 65.95 002 Data on the Web 002 34.95 Running Example <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> <prices> <book> <title>TCP/IP Illustrated</title> <price>65.95</price> </book> <book> <title>Data on the Web</title> <price>34.95</price> </book> </prices> DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery User XML Algebra Tree (XAT) User Query col3 1: <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> 2: T<results>$t</result>col3 Agg 3: R0, book/title$t 6: S”prices.xml”R0 7: DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery View XML Algebra Tree (XAT) View Query <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> col6=col7 26: $book, bidcol6 27: $prices, bidcol7 28: 31: T<book> [col10][col12] </book>col5 22: 23: T<prices>col5</prices>col4 11: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: Agg 12: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery Merged XML Algebra Tree (XAT) User Query View Query col6=col7 col3 26: 1: 2: T<prices>col5</prices>col4 $book, bidcol6 T<results>$t</result>col3 11: 27: Agg 3: Agg $prices, bidcol7 12: 28: R0, book/title$t T<book> [col10][col12] </book>col5 6: 31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute
Outline • XAT Optimization: • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Rewrite • Query Optimization at Logic Level. • Goal: • Redundancy Elimination. • Computation Pushdown. • Technique: • Equivalence Rewrite Rules. • Heuristics: • Pushdown Navigates • Remove Construction of Intermediate Result • Combine Multiple Operators. DSRG, Worcester Polytechnic Institute
Before Navigation Pushdown User Query View Query col6=col7 col3 26: 1: $book, bidcol6 2: T<results>$t</result>col3 27: Agg 3: $prices, bidcol7 28: T<book> [col10][col12] </book>col5 R0, book/title$t 6: 31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: T<prices>col5</prices>col4 14: 11: $prices, pricecol12 25: 15: Agg S“dxv.xml” R3 S“dxv.xml” R1 12: 21: DSRG, Worcester Polytechnic Institute
R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Navigation Pushdown User Query View Query T<book> [col10][col12] </book>col5 22: col3 1: col6=col7 26: 2: T<results>$t</result>col3 31: Agg 3: $prices, pricecol12 $book, titlecol10 23: 25: R0, book/title$t 6: $prices, bidcol7 $book, bidcol6 28: 27: T<prices>col5</prices>R0 11: Agg 12: DSRG, Worcester Polytechnic Institute
$book, bidcol6 col3 1: 27: T<results>$t</result>col3 2: $book, title$t 23: Agg 3: 14: R1, /book/row$book S“dxv.xml” R1 15: After Tagger Cancel Out User Query View Query JOINcol6=col7 31: $prices, pricecol12 25: $prices, bidcol7 28: R3, /prices/row$prices 20: S“dxv.xml” R3 21: DSRG, Worcester Polytechnic Institute
Outline • XAT Optimization • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Cleanup • Why: • SQL engine cannot reduce redundancy in XQuery. • How: • Data Redundancy by Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Tree Redundancy by Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute
XAT Operator Properties • Produced • Desc: New column generated by operator. • Example: , S, T • Consumed • Desc: Columns required by operator. • Example: , • Modified • Desc: Columns modified by operator. • Example: , , DSRG, Worcester Polytechnic Institute
Schema Computation Node Parent Produced Consumed Old Schema 1 {} {col3} {col3} 1: col3 2 1 {col3} {$t} {col3, R1, $book, col6, $t, R3, $prices, col7, col12} T<results>$t</result>col3 2: 3 2 {} {} {R1, $book, col6, $t, R3, $prices, col7, col12} Agg 3: 31 3 {} {col6, col7} {R1, $book, col6, $t, R3, $prices, col7, col12} col6=col7 31: 23 31 {$t} {$book} {R1, $book, col6, $t} 25: $prices, pricecol12 $book, title$t 23: 27 23 {col6} {$book} {R1, $book, col6} 14 27 {$book} {R1} {R1, $book} 28: $prices, bidcol7 $book, bidcol6 27: 15 14 {R1} {} {R1} 20: 25 31 {col12} {$prices} {R3, $prices, col7, col12} R3, /prices/row$prices R1, /book/row$book 14: 28 25 {col7} {$prices} {R3, $prices, col7} S“dxv.xml” R3 21: 20 28 {$prices} {R3} {R3, $prices} S“dxv.xml” R1 15: 21 20 {R3} {} {R3} DSRG, Worcester Polytechnic Institute
Intuition: Don’t keep anything that’s not used later. Schema Computation # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 New Schema 1: 1 C {col3} col3 2 1 P C {col3} T<results>$t</result>col3 2: 3 2 {$t} 31* 3 C C {$t} Agg 3: 23 31 P C {col6, $t} 27 23 P C {$book, col6} col6=col7 31: 14 27 P C {$book} 25: $prices, pricecol12 $book, title$t 23: 15 14 P {R1} 25 31 P C {col7, col12} 28: $prices, bidcol7 $book, bidcol6 27: 28 25 P C {$prices, col7} 20: R3, /prices/row$prices R1, /book/row$book 20 28 P C {$prices} 14: 21 20 P {R3} S“dxv.xml” R3 21: S“dxv.xml” R1 *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 15: DSRG, Worcester Polytechnic Institute
Schema Cleanup Result DSRG, Worcester Polytechnic Institute
XAT Cleanup • Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute
Cutting Matrix • Purpose: • Get rid of the unused operators. • Equations: • Propagation of modified • Propagation of required • Identify cuttable node. DSRG, Worcester Polytechnic Institute
Matrix Computation 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
Intuition: Give me only the required columns in order to get the final result. Matrix Computation (Cont.1) 1: col3 # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 Cut? T<results>$t</result>col3 2: 1 R R R R 2 1 P C 3 2 - M - - - - - - - Agg 3: 31* 3 C C 23 31 P C JOINcol6=col7 31: 27 23 P C 14 27 P C 25: $prices, pricecol12 $book, title$t 23: 15 14 P 25 31 P C 28: $prices, bidcol7 28 25 P C $book, bidcol6 27: 20 28 P C 20: 21 20 P R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
Matrix Computation (Cont. 2) 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
XAT after Cutting 1: col3 col3 1: T<results>$t</result>col3 2: Agg T<results>$t</result>col3 3: 2: Agg JOINcol6=col7 31: 3: Reduced To 25: $prices, pricecol12 $book, title$t $book, title$t 23: 23: 28: R1, /book/row$book $prices, bidcol7 14: $book, bidcol6 27: 20: S“dxv.xml” R1 R3, /prices/row$prices 15: R1, /book/row$book 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery SQL Generated SELECT “$book”.title as “$t”, “$book”.bid as “col6”, “$prices”.price as “col12”, “$prices”.bid as “col7” FROM book “$book”, prices “$prices” WHERE “col6”=“col7” 1: col3 col3 1: 2: T<results>$t</result>col3 T<results>$t</result>col3 2: Agg 3: Agg 3: 31: JOINcol6=col7 $book, title$t 23: 25: $prices, pricecol12 23: $book, title$t R1, /book/row$book 14: 28: $prices, bidcol7 27: $book, bidcol6 S“dxv.xml” R1 15: 20: R3, /prices/row$prices 14: R1, /book/row$book SELECT “$book”.title as “$t”, FROM book “$book”, S“dxv.xml” R3 21: 15: S“dxv.xml” R1 DSRG, Worcester Polytechnic Institute
Outline • XAT Optimization • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute
Preliminary Evaluation • Experiment Setup • XQuery over Kweelt Parser • PIII800 256 MB, Win 2k Pro. • Data Setup • Synthetic Data • Synthetic Queries • Query Execution • Native XML Engine. DSRG, Worcester Polytechnic Institute
Performance Gain in Execution DSRG, Worcester Polytechnic Institute
Query Engine Overhead XAT Optimizer XAT Generator XAT Decorrelator XAT Merger User XAT User XAT XAT Rewrite User XQuery XAT XAT XAT XAT Executor SQL Generator XAT View XAT View XAT XAT Cleanup View XQuery Total: 32,522 ms DSRG, Worcester Polytechnic Institute
Outline • XAT Optimization • XAT Rewrite • XAT Cleanup • Preliminary Evaluation • Related Work • Summary DSRG, Worcester Polytechnic Institute
Related Work • Rainbow: • Optimize on XAT. (static analysis) • Algebra level rewriting. • SQL Optimization • Algebra based optimization. • Static analysis. • XQuery by Views: Optimize in SQL. • XPERANTO[VLDBJ2000]: XQGM vs. XAT • Extension by UDFs for XML features. • SilkRoute[IEEE2001(24:2)]: • Generate SQL Efficiently. • AGORA[VLDB2000]: • Syntax level rewriting. DSRG, Worcester Polytechnic Institute
Summary • Efficient XQuery Processing • XML Algebra Tree (XAT) • XAT Optimization: • Rewrite by using equivalent rules • Cleanup • Schema cleanup • Operator cutting • Prototype system implementation. DSRG, Worcester Polytechnic Institute
Questions?(Futures!) http://davis.wpi.edu/dsrg/rainbow https://sourceforge.net/projects/rainbow-engine/ Special Thanks: Brian Murphy, Luping Ding, DSRG group. DSRG, Worcester Polytechnic Institute
XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer SQL Generator XAT XAT XAT XAT View XAT View XAT View XQuery DSRG, Worcester Polytechnic Institute
Schema Computation col3 1: T<results>$t</result>col3 2: Agg 3: col6=col7 31: $prices, pricecol12 25: $book, title$t 23: $prices, bidcol7 28: $book, bidcol6 27: R3, /prices/row$prices 20: R1, /book/row$book 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute
$book, bidcol6 col3 1: 27: $prices, pricecol12 25: T<results>$t</result>col3 2: $prices, bidcol7 28: Agg 3: R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Tagger Cancel Out User Query View Query col6=col7 26: 31: $book, title$t 23: DSRG, Worcester Polytechnic Institute