1 / 46

Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects

Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects. WPI DSRG GROUP. Motivation. XML is new, and here to stay … Universal flexible representation of data De facto standard for information exchange XQuery is useful, and here to stay… Powerful query language for XML

Download Presentation

Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Beyond the Rainbow:—— A Pot of Gold ala XML Database Projects WPI DSRG GROUP DSRG, Worcester Polytechnic Institute

  2. Motivation • XML is new, and here to stay … • Universal flexible representation of data • De facto standard for information exchange • XQuery is useful, and here to stay… • Powerful query language for XML • De facto standard for XML querying • Plentitude of relevant new issues … DSRG, Worcester Polytechnic Institute

  3. searching • querying • integrating • restructuring • updating XML Queries And Updates XML Paradigm EVE-Middleware Internet WWW: global scale distributed information system for sharing data XML1 XML5 RDBL6 XML2 XML3 RDB4 XMLn DSRG, Worcester Polytechnic Institute

  4. XML Data Management Middleware Technology What We Aim For… • efficient • flexible • scalable • lightweight • resource-sensitive • adaptive Internet EVE-Middleware RDB5 XML1 XML6 XML2 RDB3 XML4 XMLn DSRG, Worcester Polytechnic Institute

  5. WPI Project Directions • RAINBOW: Exploiting RDB for XML management: • Algebraic-XQuery processing • XCube: Flexible XML Mapping Tool: • Flexible loading/extracting XML to RDB via XQuery • Updating Virtual XML Views: • Update decomposition and trigger-propagation • MASS: Native XML Query Engine: • Multi-axis compressed order-preserving XML storage DSRG, Worcester Polytechnic Institute

  6. WPI Project Directions • XCache: XML Query Caching: • Cache containment and query rewriting • Materialized XML View Maintenance: • Incremental algebraic maintenance strategy • SAXE: XML Incremental Updating & Evolution: • Lightweight updating by update query rewriting • RAINDROP: XQuery-based Stream Processing: • Adaptive on-fly multi-subscription optimization DSRG, Worcester Polytechnic Institute

  7. THE RAINBOW PROJECT DSRG, Worcester Polytechnic Institute

  8. The look and feel of an XML query system with maturity and technology support of RDB XML Relational Database • Emerging web standard • Flexible data representation • Powerful query language 1) Widely used to store business data 2) Efficient, reliable, secure DBMS 3) Mature query processing techniques XML meets Relational DBs Rainbow + DSRG, Worcester Polytechnic Institute

  9. <results> <title>TCP/IP Illustrated</title> <title>Data on the Web</title> </results> <dxv> <book> <row> <bid>001</bid> <title>TCP/IP Illustrated</title> </row> <row> <bid>002</bid> <title>Data on the Web</title> </row> </book> <prices> <row> <bid>001</bid> <price>65.95</price> </row> <row> <bid>002</bid> <price>34.95</price> </row> </prices> </dxv> <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> Bid Title Bid Price 001 TCP/IP Illustrated 001 65.95 002 Data on the Web 002 34.95 Running Example <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> <prices> <book> <title>TCP/IP Illustrated</title> <price>65.95</price> </book> <book> <title>Data on the Web</title> <price>34.95</price> </book> </prices> DSRG, Worcester Polytechnic Institute

  10. XML Default View XML Default View • Fixed and straight-forward mapping scheme. <DB> <BOOKS> <ROW> <Cover>Paperback</Cover> <TITLE>Texas Holdem'</TITLE> <AUTHORS>David Sklansky, Straight Flush</AUTHORS> </ROW> <ROW> <Cover>Paperback</Cover> <TITLE>Dracula</TITLE> <AUTHORS>Bram Stoker</AUTHORS> </ROW> </BOOKS> <…> … </DB> DSRG, Worcester Polytechnic Institute

  11. Generic Loading • Knowledge of schema of XML document to be loaded helps to reduce unnecessary parts. FUNCTION Q1($root){ LET $maintag := gettag($root) RETURN <$maintag $root/@*> FOR $actual IN $root/* LET $innertag := gettag($actual) RETURN IF ($actual/element()) THEN Q1($actual) ELSE <$innertag $actual/@*> IF ($actual/text()) THEN <PCDATA value=$actual/text()/> ELSE "" </$innertag> </$maintag> } DSRG, Worcester Polytechnic Institute

  12. Instantiation • Generic loading XQuery expression recursive. • + It works for every XML document. • Many recursive calls return no value.- Unnecessary FOR-loops, IF-clauses, and getName()-fct. XML Schema XML Schema XQuery Expression (flat) XQuery Expression Instantiator XQuery Expression (recursive) XQuery Expression DSRG, Worcester Polytechnic Institute

  13. Instantiation (Example) (First Step of CLOCK mapping scheme) Instantiated Loading Query • Short, non-recursive, more efficient … • But: XML schema dependent! FUNCTION Q1($root){ <BOOKLIST> FOR $book IN $root/BOOK RETURN <BOOK$book/@cover> <TITLE> <PCDATAvalue=$book/TITLE/text()/> </TITLE> <AUTHOR> FOR $name IN $book/AUTHOR/NAME RETURN <NAME> <PCDATAvalue=$book/A…/> </NAME> </AUTHOR> </BOOK> </BOOKLIST> } DSRG, Worcester Polytechnic Institute

  14. 1 XML XML’ 2 H F G g Relation Relation’ f RDB Default XQuery (Load) XML XML View Reverser RDB Default XQuery (Extract) XML XML View Flexible Mapping Management DSRG, Worcester Polytechnic Institute

  15. XCube in a Nutshell • Easy-to-use (no new transformation language). • Flexible (interchangeable XQuery expressions). • Adaptable (to workload, data specifics, …). • General (Schema independent). • Extendable (with new mapping schemes). • Tunable (Loading manager). • Generic XQuery loading expressions • XQuery load expression instantiation DSRG, Worcester Polytechnic Institute

  16. XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Virtual XML Document Architecture Rainbow XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User Query Results in XML User XQuery XAT Executor XAT Optimizer SQL Generator XAT XAT XAT Tuples XAT View XAT View XAT View XQuery SQL RDBMS XAT: XML Algebra Tree DSRG, Worcester Polytechnic Institute

  17. XQuery-Level Optimization • XAT - XML Algebra Tree Model • XAT Algebraic Query Plan Optimization • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute

  18. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery User XML Algebra Tree (XAT) User Query col3 1: <result> FOR $t IN document(“prices.xml”)/book/title RETURN $t </result> 2: T<results>$t</result>col3 Agg 3: R0, book/title$t 6: S”prices.xml”R0 7: DSRG, Worcester Polytechnic Institute

  19. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery View XML Algebra Tree (XAT) View Query <prices> FOR $book IN document(“dxv.xml”)/book/row $prices IN document(“dxv.xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> col6=col7 26: $book, bidcol6 27: $prices, bidcol7 28:  31: T<book> [col10][col12] </book>col5 22: 23: T<prices>col5</prices>col4 11: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: Agg 12: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute

  20. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery Merged XML Algebra Tree (XAT) User Query View Query col6=col7 col3 26: 1: 2: T<prices>col5</prices>col4 $book, bidcol6 T<results>$t</result>col3 11: 27: Agg 3: Agg $prices, bidcol7 12: 28: R0, book/title$t T<book> [col10][col12] </book>col5 6:  31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: 14: $prices, pricecol12 25: 15: S“dxv.xml” R3 S“dxv.xml” R1 21: DSRG, Worcester Polytechnic Institute

  21. XQuery-Level Optimization • XML Algebra Representation: XAT • XAT Query Plan Rewriting • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute

  22. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Rewrite • Query Optimization at Logic Algebra Level. • Goals: • Redundancy Elimination. • Computation Pushdown. • Technique: • Equivalence Rewrite Rules. • Heuristics: • Pushdown Navigates • Remove Construction of Intermediate Result • Combine Multiple Operators. DSRG, Worcester Polytechnic Institute

  23. Before Navigation Pushdown User Query View Query col6=col7 col3 26: 1: $book, bidcol6 2: T<results>$t</result>col3 27: Agg 3: $prices, bidcol7 28: T<book> [col10][col12] </book>col5  R0, book/title$t 6: 31: 22: col4 R0 23: 7: $book, titlecol10 R3, /prices/row$prices R1, /book/row$book 20: T<prices>col5</prices>col4 14: 11: $prices, pricecol12 25: 15: Agg S“dxv.xml” R3 S“dxv.xml” R1 12: 21: DSRG, Worcester Polytechnic Institute

  24. R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Navigation Pushdown User Query View Query T<book> [col10][col12] </book>col5 22: col3 1: col6=col7 26: 2: T<results>$t</result>col3  31: Agg 3: $prices, pricecol12 $book, titlecol10 23: 25: R0, book/title$t 6: $prices, bidcol7 $book, bidcol6 28: 27: T<prices>col5</prices>R0 11: Agg 12: DSRG, Worcester Polytechnic Institute

  25. R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: Remove any Taggers? User Query View Query T<book> [col10][col12] </book>col5 22: col3 1: col6=col7 26: 2: T<results>$t</result>col3  31: Agg 3: $prices, pricecol12 $book, titlecol10 23: 25: R0, book/title$t 6: $prices, bidcol7 $book, bidcol6 28: 27: T<prices>col5</prices>R0 11: Agg 12: DSRG, Worcester Polytechnic Institute

  26. $book, bidcol6 col3 1: 27: $prices, pricecol12 25: T<results>$t</result>col3 2: $prices, bidcol7 28: Agg 3: R3, /prices/row$prices 20: 14: R1, /book/row$book S“dxv.xml” R1 S“dxv.xml” R3 15: 21: After Tagger Cancel Out User Query View Query col6=col7 26:  31: $book, title$t 23: DSRG, Worcester Polytechnic Institute

  27. $book, bidcol6 col3 1: 27: T<results>$t</result>col3 2: $book, title$t 23: Agg 3: 14: R1, /book/row$book S“dxv.xml” R1 15: After Making Join User Query View Query JOINcol6=col7 31: $prices, pricecol12 25: $prices, bidcol7 28: R3, /prices/row$prices 20: S“dxv.xml” R3 21: DSRG, Worcester Polytechnic Institute

  28. XQuery-Level Optimization • XML Algebra Representation: XAT • XAT Query Plan Rewriting • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute

  29. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery XAT Cleanup • Why: • SQL engine cannot reduce redundancy in XQuery. • How: • Data Redundancy by Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Tree Redundancy by Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute

  30. XAT Operator Properties • Produced • Desc: New column generated by operator. • Example: , S, T • Consumed • Desc: Columns required by operator. • Example: ,  • Modified • Desc: Columns modified by operator. • Example: , ,  DSRG, Worcester Polytechnic Institute

  31. Schema Computation Node Parent Produced Consumed Old Schema 1 {} {col3} {col3} 1: col3 2 1 {col3} {$t} {col3, R1, $book, col6, $t, R3, $prices, col7, col12} T<results>$t</result>col3 2: 3 2 {} {} {R1, $book, col6, $t, R3, $prices, col7, col12} Agg 3: 31 3 {} {col6, col7} {R1, $book, col6, $t, R3, $prices, col7, col12} col6=col7 31: 23 31 {$t} {$book} {R1, $book, col6, $t} 25: $prices, pricecol12 $book, title$t 23: 27 23 {col6} {$book} {R1, $book, col6} 14 27 {$book} {R1} {R1, $book} 28: $prices, bidcol7 $book, bidcol6 27: 15 14 {R1} {} {R1} 20: 25 31 {col12} {$prices} {R3, $prices, col7, col12} R3, /prices/row$prices R1, /book/row$book 14: 28 25 {col7} {$prices} {R3, $prices, col7} S“dxv.xml” R3 21: 20 28 {$prices} {R3} {R3, $prices} S“dxv.xml” R1 15: 21 20 {R3} {} {R3} DSRG, Worcester Polytechnic Institute

  32. Intuition: Don’t keep anything that’s not used later. Schema Computation # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 New Schema 1: 1 C {col3} col3 2 1 P C {col3} T<results>$t</result>col3 2: 3 2 {$t} 31* 3 C C {$t} Agg 3: 23 31 P C {col6, $t} 27 23 P C {$book, col6} col6=col7 31: 14 27 P C {$book} 25: $prices, pricecol12 $book, title$t 23: 15 14 P {R1} 25 31 P C {col7, col12} 28: $prices, bidcol7 $book, bidcol6 27: 28 25 P C {$prices, col7} 20: R3, /prices/row$prices R1, /book/row$book 20 28 P C {$prices} 14: 21 20 P {R3} S“dxv.xml” R3 21: S“dxv.xml” R1 *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 15: DSRG, Worcester Polytechnic Institute

  33. Schema Cleanup Result DSRG, Worcester Polytechnic Institute

  34. XAT Cleanup • Schema Cleanup • Each operator produced, consumed and modified some columns. • Minimum schema is then computed. • Unused Operator Cutting • Cutting matrix generation. • Required columns analysis. • Operator cutting. DSRG, Worcester Polytechnic Institute

  35. Cutting Matrix • Purpose: • Get rid of unused operators. • Equations: • Propagation of modified • Propagation of required • Identify cuttable node. DSRG, Worcester Polytechnic Institute

  36. Matrix Computation 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  37. Intuition: Give me only the required columns in order to get the final result. Matrix Computation (Cont.1) 1: col3 # Parent() col3 $t col6 col7 $book R1 col12 $prices R3 Cut? T<results>$t</result>col3 2: 1 R R R R 2 1 P C 3 2 - M - - - - - - - Agg 3: 31* 3 C C 23 31 P C JOINcol6=col7 31: 27 23 P C 14 27 P C 25: $prices, pricecol12 $book, title$t 23: 15 14 P 25 31 P C 28: $prices, bidcol7 28 25 P C $book, bidcol6 27: 20 28 P C 20: 21 20 P R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  38. Matrix Computation (Cont. 2) 1: col3 T<results>$t</result>col3 2: Agg 3: JOINcol6=col7 31: 25: $prices, pricecol12 $book, title$t 23: 28: $prices, bidcol7 $book, bidcol6 27: 20: R3, /prices/row$prices R1, /book/row$book *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  39. XAT after Cutting 1: col3 col3 1: T<results>$t</result>col3 2: Agg T<results>$t</result>col3 3: 2: Agg JOINcol6=col7 31: 3: Reduced To 25: $prices, pricecol12 $book, title$t $book, title$t 23: 23: 28: R1, /book/row$book $prices, bidcol7 14: $book, bidcol6 27: 20: S“dxv.xml” R1 R3, /prices/row$prices 15: R1, /book/row$book 14: S“dxv.xml” R3 21: S“dxv.xml” R1 15: DSRG, Worcester Polytechnic Institute

  40. XAT Generator XAT Decorrelator XAT Merger User XAT User XAT User XQuery XAT Executor XAT Optimizer XAT XAT XAT SQL Generator XAT View XAT View XAT View XQuery SQL Generated SELECT “$book”.title as “$t”, “$book”.bid as “col6”, “$prices”.price as “col12”, “$prices”.bid as “col7” FROM book “$book”, prices “$prices” WHERE “col6”=“col7” 1: col3 col3 1: 2: T<results>$t</result>col3 T<results>$t</result>col3 2: Agg 3: Agg 3: 31: JOINcol6=col7 $book, title$t 23: 25: $prices, pricecol12 23: $book, title$t R1, /book/row$book 14: 28: $prices, bidcol7 27: $book, bidcol6 S“dxv.xml” R1 15: 20: R3, /prices/row$prices 14: R1, /book/row$book SELECT “$book”.title as “$t”, FROM book “$book”, S“dxv.xml” R3 21: 15: S“dxv.xml” R1 DSRG, Worcester Polytechnic Institute

  41. XQuery-Level Optimization • XML Algebra Representation: XAT • XAT Query Plan Rewriting • XAT Query Plan Reduction DSRG, Worcester Polytechnic Institute

  42. Performance Gain in Execution DSRG, Worcester Polytechnic Institute

  43. Rainbow Engine Overhead XAT Optimizer XAT Generator XAT Decorrelator XAT Merger User XAT User XAT XAT Rewrite User XQuery XAT XAT XAT XAT Executor SQL Generator XAT View XAT View XAT XAT Cleanup View XQuery Total: 32,522 ms Ack.: XQuery using Kweelt Parser DSRG, Worcester Polytechnic Institute

  44. http://davis.wpi.edu/dsrg/rainbow https://sourceforge.net/projects/rainbow-engine/ Rainbow DSRG, Worcester Polytechnic Institute

  45. Related Work • XPERANTO[VLDBJ2000]: XQGM vs. XAT • Xquery Views over RDB, Extension by UDFs for XML features • SilkRoute[IEEE2001(24:2)]: • Xquery Views over RDB, Generate SQL Efficiently • AGORA[VLDB2000]: • Syntax level rewriting. DSRG, Worcester Polytechnic Institute

  46. Summary • Efficient XQuery Processing • XML Algebra Tree (XAT) • XAT Optimization: • Rewrite by using equivalent rules • Cleanup • Schema cleanup • Operator cutting • Prototype system implementation. DSRG, Worcester Polytechnic Institute

More Related