
Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II)

This presentation covers XAT decorrelation for XML query processing and the optimizations it enables: computation pushdown, data model cleanup, and cutting. It gives examples of simple decorrelation and of complex decorrelation with additional sources and with aggregate functions.





Presentation Transcript


  1. Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II) Xin Zhang

  2. Outline • XAT Decorrelation. • Optimization: • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting. • Conclusion & Future Work.

  3. XAT Decorrelation • XQuery queries are correlated queries. • Decorrelation is required before the following optimizations: • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting.

  4. Three Kinds of Decorrelation • Simple Decorrelation (no additional sources, no aggregate functions) • Complex Decorrelation with Additional Sources • Complex Decorrelation with Aggregate Functions

  5. Example* from the XML Query Use Cases.
     <prices>
       <book> <title> TCP/IP Illustrated </title> <price>65.95</price> </book>
       <book> <title> TCP/IP Illustrated </title> <price>65.95</price> </book>
       <book> <title>Data on the Web</title> <price>34.95</price> </book>
       <book> <title>Data on the Web</title> <price>39.95</price> </book>
     </prices>
     <!ELEMENT prices (book*)>
     <!ELEMENT book (title, price)>
     <!ELEMENT title (#PCDATA)>
     <!ELEMENT source (#PCDATA)>
     <!ELEMENT price (#PCDATA)>

  6. Simple Query Example
     In the document "prices.xml", find the book titles.
     <results> {
       for $t in distinct(document("prices.xml")/book/title)
       return <minprice> $t </minprice>
     } </results>
     XAT tree:
     T(<results>[col1]</results>):col0
       Agg()
         FOR($t)
           inner:  T(<minprice>[$t]</minprice>):col1
           input:  distinct(col2):$t
                     (R1, /book/title):col2
                       S("prices.xml"):R1
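To make the operator pipeline concrete, here is a minimal Python sketch that evaluates this plan bottom-up over tables of tuples (lists of dicts). It is an illustration under assumptions, not Rainbow's implementation: the function names are hypothetical, `document("prices.xml")/book/title` is read relative to the `<prices>` root element, and distinct compares whitespace-trimmed string values.

```python
import xml.etree.ElementTree as ET

# Sample data from slide 5 (padding spaces inside <title> removed).
PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""


def source(xml_text, out):
    """S("prices.xml"):R1 : a one-tuple table holding the document root."""
    return [{out: ET.fromstring(xml_text)}]


def navigate(table, col, path, out):
    """(R1, /book/title):col2 : one output tuple per node reached by `path`."""
    return [{**row, out: node} for row in table for node in row[col].findall(path)]


def distinct(table, col, out):
    """distinct(col2):$t : first occurrence of each distinct string value."""
    seen, result = set(), []
    for row in table:
        value = row[col].text.strip()
        if value not in seen:
            seen.add(value)
            result.append({out: value})
    return result


def tagger(table, tag, col, out):
    """T(<minprice>[$t]</minprice>):col1 : build one element per tuple."""
    result = []
    for row in table:
        elem = ET.Element(tag)
        elem.text = row[col]
        result.append({**row, out: elem})
    return result


def agg_and_tag(table, col, tag):
    """Agg() followed by T(<results>[col1]</results>): wrap all col1 values."""
    root = ET.Element(tag)
    root.extend([row[col] for row in table])
    return root


r1 = source(PRICES_XML, "R1")
col2 = navigate(r1, "R1", "book/title", "col2")
dollar_t = distinct(col2, "col2", "$t")
col1 = tagger(dollar_t, "minprice", "$t", "col1")
result = ET.tostring(agg_and_tag(col1, "col1", "results"), encoding="unicode")
print(result)
# <results><minprice>TCP/IP Illustrated</minprice><minprice>Data on the Web</minprice></results>
```

Each function consumes one table and produces the next, mirroring the bottom-to-top reading of the XAT tree; the FOR operator is not modeled here because, as the following slides argue, it is redundant in this simple case.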

  7. Simple Decorrelation
     Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] ⇒ T[T2[T1[S1]]]
     Before:
     T(<results>[col1]</results>):col0
       Agg()
         FOR($t)
           inner:  T(<minprice>[$t]</minprice>):col1
           input:  distinct(col2):$t
                     (R1, /book/title):col2
                       S("prices.xml"):R1
     After:
     T(<results>[col1]</results>):col0
       Agg()
         T(<minprice>[$t]</minprice>):col1
           distinct(col2):$t
             (R1, /book/title):col2
               S("prices.xml"):R1

  8. Is Simple Decorrelation Right? • Every operator except Groupby has “for each” tuple semantics over its input table. • Hence, the FOR operator can be omitted in the simple decorrelation scenario.
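The argument on this slide can be checked with a small Python sketch, assuming a table is a list of dicts and an operator is a function from table to table (all names here are hypothetical). An explicit FOR that feeds the operator one tuple at a time gives the same result as applying the operator to the whole table when the operator is tuple-wise, but not for a Groupby-style operator that inspects its entire input.

```python
def explicit_for(op, table):
    """FOR: evaluate `op` on a one-tuple table per input tuple and concatenate."""
    out = []
    for row in table:
        out.extend(op([row]))
    return out


def tag_minprice(table):
    """Tuple-wise operator (like a Tagger): one output tuple per input tuple."""
    return [{**row, "col1": "<minprice>" + row["$t"] + "</minprice>"} for row in table]


def group_count(table):
    """Groupby-style operator: its output depends on the whole input table."""
    return [{"count": len(table)}]


titles = [{"$t": "TCP/IP Illustrated"}, {"$t": "Data on the Web"}]

# Tuple-wise: the FOR wrapper is redundant and can be dropped.
print(explicit_for(tag_minprice, titles) == tag_minprice(titles))  # True
# Groupby: dropping the FOR would change the result.
print(explicit_for(group_count, titles) == group_count(titles))    # False
```

With the FOR, `group_count` sees two one-tuple tables and returns two counts of 1; without it, a single count of 2. This is why Groupby is the exception on the slide.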

  9. Two Types of Navigate • Navigate Unnesting: U • Unnests the parent-child relationship, duplicating the parent values for each child. • Navigate Collection: C • Nests the parent-child relationship, creating a collection of children while keeping a single parent.

  10. Where to Use the Two Types • Navigate Unnesting (U): FOR binding. • Navigate Collection (C): LET binding.
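The difference between the two Navigate types can be sketched in Python over tables of dicts (hypothetical function names; not Rainbow's API). Unnesting produces one tuple per child and repeats the parent column, matching FOR; collection produces one tuple per parent holding a list of children, matching LET.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<prices>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>34.95</price></book>"
    "</prices>"
)


def navigate_unnest(table, col, path, out):
    """U (FOR binding): one output tuple per child; parent columns are duplicated."""
    return [{**row, out: child} for row in table for child in row[col].findall(path)]


def navigate_collection(table, col, path, out):
    """C (LET binding): one output tuple per input tuple; children kept as a list."""
    return [{**row, out: row[col].findall(path)} for row in table]


table = [{"R1": DOC}]
unnested = navigate_unnest(table, "R1", "book", "$b")
collected = navigate_collection(table, "R1", "book", "$b")

print(len(unnested), len(collected))   # 2 1
print(len(collected[0]["$b"]))         # 2
```

Only the unnesting variant changes the number of tuples, which matters later when deciding which operators can be removed safely.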

  11. Complex Query Example
      In the document "prices.xml", find the book titles and their prices.
      <results> {
        for $t in distinct(document("prices.xml")/book/title)
        let $b := document("prices.xml")/book[title = $t]
        return <minprice> $t, $b/price </minprice>
      } </results>
      XAT tree:
      T(<results>[col1]</results>):col0
        Agg()
          FOR($t)
            inner:  T(<minprice>[$t], [col4]</minprice>):col1
                      C($b, price):col4
                        (col3=$t)
                          C($b, title):col3
                            C(R2, /book):$b
                              S("prices.xml"):R2
            input:  distinct(col2):$t
                      (R1, /book/title):col2
                        S("prices.xml"):R1

  12. Complex Decorrelation with Additional Source
      T[FOR(CB, T2[S2])[T1[S1]]] ⇒ T[T2[×[T1[S1], S2]]]
      Before: the XAT tree of the complex query example (slide 11).
      After:
      T(<results>[col1]</results>):col0
        Agg()
          T(<minprice>[$t], [col4]</minprice>):col1
            C($b, price):col4
              (col3=$t)
                ×
                  distinct(col2):$t
                    (R1, /book/title):col2
                      S("prices.xml"):R1
                  C($b, title):col3
                    C(R2, /book):$b
                      S("prices.xml"):R2
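The payoff of this rewrite can be sketched in Python (hypothetical names throughout). Books are modeled as (title, price) pairs, and a counting `Source` class stands in for S("prices.xml"). The correlated plan re-reads S2 once per `$t` binding, while the decorrelated plan reads each source once and combines them. Treating the selection (col3=$t) as a filter over the members of the collection-valued `$b` is an assumption of this sketch.

```python
# Hypothetical stand-in for prices.xml: one (title, price) pair per <book>.
BOOKS = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


class Source:
    """Stand-in for S("prices.xml"); counts how many times the document is read."""

    def __init__(self, books):
        self.books = books
        self.scans = 0

    def scan(self):
        self.scans += 1
        return list(self.books)


def distinct_titles(books):
    # distinct(col2):$t, keeping first-occurrence order
    return list(dict.fromkeys(title for title, _ in books))


def correlated(s1, s2):
    """FOR($t): the inner branch, including its source S2, is re-evaluated per $t."""
    result = []
    for t in distinct_titles(s1.scan()):
        books = s2.scan()
        result.append((t, [price for title, price in books if title == t]))
    return result


def decorrelated(s1, s2):
    """T2[[T1[S1], S2]]: each source is read once, then combined with every $t."""
    titles = distinct_titles(s1.scan())
    books = s2.scan()                        # $b: the collection of all books
    combined = [(t, books) for t in titles]  # combine T1[S1] with S2
    # (col3=$t): keep only the books in the collection whose title equals $t
    return [(t, [price for title, price in bs if title == t]) for t, bs in combined]


a1, a2 = Source(BOOKS), Source(BOOKS)
b1, b2 = Source(BOOKS), Source(BOOKS)
r_corr = correlated(a1, a2)
r_deco = decorrelated(b1, b2)
print(r_corr == r_deco)    # True
print(a2.scans, b2.scans)  # 2 1
```

With two distinct titles the gap is small, but the correlated plan's cost grows with the number of bindings, while the decorrelated plan reads S2 exactly once.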

  13. Full Query Example
      In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.
      <results> {
        for $t in distinct(document("prices.xml")/book/title)
        let $b := document("prices.xml")/book[title = $t]
        return <minprice> $t, <price>min($b/price/text())</price> </minprice>
      } </results>
      XAT tree:
      T(<results>[col1]</results>):col0
        Agg()
          FOR($t)
            inner:  T(<minprice>[$t], <price>[col5]</price></minprice>):col1
                      min(col4):col5
                        C($b, price/text()):col4
                          (col3=$t)
                            C($b, title):col3
                              C(R2, /book):$b
                                S("prices.xml"):R2
            input:  distinct(col2):$t
                      (R1, /book/title):col2
                        S("prices.xml"):R1

  14. Complex Query Decorrelation with One Aggregate Function
      T[FOR(CB, T2[Agg(T3[])])[T1[S1]]] ⇒ T[(DM(T1))[T1, T2[(DM(T1), Agg(T3[[Distinct(T1[S1]), S2]]))]]]
      DM(T1) is the data model computed from T1.
      [Figure: before, T over FOR with input T1[S1] and inner branch T2[Agg()[T3[S2]]]; after, T over T1 and T2, where T2 sits on Groupby(DM(T1), Agg()) over T3, applied to Distinct(T1[S1]) and S2.]

  15. The Query after Decorrelation
      Before: the XAT tree of the full query example (slide 13).
      After:
      T(<results>[col1]</results>):col0
        Agg()
          T(<minprice>[$t], [col4]</minprice>):col1
            GB(DM, min(col4):col5)
              C($b, price/text()):col4
                (col3=$t)
                  ×
                    distinct(col2):$t
                      (R1, /book/title):col2
                        S("prices.xml"):R1
                    C($b, title):col3
                      C(R2, /book):$b
                        S("prices.xml"):R2
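A relational analogue of this rewrite can be sketched in Python (hypothetical names). The correlated version evaluates `min()` once per `$t` inside the loop. The decorrelated version pairs the distinct titles with all books, keeps the pairs where the title matches, and then groups on `$t` (standing in for DM) to compute the minimum. Per-pair tuples are used here instead of XAT's collection-valued `$b`.

```python
# Hypothetical stand-in for prices.xml: one (title, price) pair per <book>.
BOOKS = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


def distinct_titles(books):
    # Distinct(T1[S1]), keeping first-occurrence order
    return list(dict.fromkeys(title for title, _ in books))


def correlated_min(books):
    """FOR($t) with min() inside the loop: one aggregate evaluation per binding."""
    return {t: min(p for title, p in books if title == t) for t in distinct_titles(books)}


def decorrelated_min(books):
    """Distinct titles combined with all books, filtered on col3 = $t, then grouped."""
    titles = distinct_titles(books)
    matched = [(t, p) for t in titles for title, p in books if title == t]
    result = {}
    for t, price in matched:  # GB($t, min(col4):col5)
        result[t] = price if t not in result else min(result[t], price)
    return result


print(decorrelated_min(BOOKS))  # {'TCP/IP Illustrated': 65.95, 'Data on the Web': 34.95}
print(correlated_min(BOOKS) == decorrelated_min(BOOKS))  # True
```

In this data every distinct title has at least one matching book. In general, a binding with no match would disappear from the grouped result, so a general rewrite has to handle that case separately (for example with an outer join back to the bindings).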

  16. Where are we? • XAT Decorrelation. • Optimization: • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting. • Conclusion & Future Work.

  17. XAT Computation Pushdown • Goal: push execution into the relational database. • Steps: • Push Navigate operators down. • Cancel out Navigate/Tagger pairs. • Generate SQL statements.

  18. Navigation Pushdown • A Navigate can be pushed down through any operator, stopping only where it depends on its child operator (its input column is produced by that child). • Example rewriting rules: • (x1, path):x2[(y1, path):y2[T]] ⇒ (y1, path):y2[(x1, path):x2[T]] (if x1 != y2) • (x1, path):x2[(c)[T]] ⇒ (c)[(x1, path):x2[T]] • (x1, path):x2[[T1, T2]] ⇒ [T1, (x1, path):x2[T2]] (if x1 in DM(T2)) • (x1, path):x2[[T1, T2]] ⇒ [(x1, path):x2[T1], T2] (if x1 in DM(T1))
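The unary rules above can be sketched in Python on a linear plan (root first) where each operator records the columns it reads and adds. The `Op` class and `push_down` function are hypothetical. The sketch swaps a Navigate with the operator directly below it until that operator produces the Navigate's input column. The example chain is the branch of the slide 19 plan that leads to `$b`'s source. The binary operator and its other branch are omitted, on the assumption that rules 3 and 4 have already routed the Navigate to the side whose DM contains `$b`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str                 # printable form of the operator
    consumes: frozenset       # columns the operator reads
    produces: frozenset       # columns the operator adds
    is_navigate: bool = False


def op(text, consumes=(), produces=(), nav=False):
    return Op(text, frozenset(consumes), frozenset(produces), nav)


def push_down(plan, i):
    """Move the Navigate at plan[i] (plan[0] is the root) below its child,
    repeatedly, until the child produces a column the Navigate consumes."""
    plan = list(plan)
    nav = plan[i]
    if not nav.is_navigate:
        raise ValueError("only Navigate operators are pushed down")
    while i + 1 < len(plan) and not (nav.consumes & plan[i + 1].produces):
        plan[i], plan[i + 1] = plan[i + 1], nav
        i += 1
    return plan


plan = [
    op("GB(DM, min(col4):col5)", consumes={"col4"}, produces={"col5"}),
    op("C($b, price/text()):col4", consumes={"$b"}, produces={"col4"}, nav=True),
    op("(col3=$t)", consumes={"col3", "$t"}),
    op("C($b, title):col3", consumes={"$b"}, produces={"col3"}, nav=True),
    op("C(R2, /book):$b", consumes={"R2"}, produces={"$b"}, nav=True),
    op('S("prices.xml"):R2', produces={"R2"}),
]

pushed = push_down(plan, 1)
for o in pushed:
    print(o.text)
```

The sketch pushes as far as the rules allow. It therefore ends one step lower than the slide 19 figure, below `C($b, title):col3`, which rule 1 permits because `$b` != `col3`. Both placements compute the same result.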

  19. Navigation Pushdown Example
      Before:
      T(<results>[col1]</results>):col0
        Agg()
          T(<minprice>[$t], [col4]</minprice>):col1
            GB(DM, min(col4):col5)
              C($b, price/text()):col4
                (col3=$t)
                  ×
                    distinct(col2):$t
                      (R1, /book/title):col2
                        S("prices.xml"):R1
                    C($b, title):col3
                      C(R2, /book):$b
                        S("prices.xml"):R2
      After (C($b, price/text()):col4 pushed below the selection, into the branch that produces $b):
      T(<results>[col1]</results>):col0
        Agg()
          T(<minprice>[$t], [col4]</minprice>):col1
            GB(DM, min(col4):col5)
              (col3=$t)
                ×
                  distinct(col2):$t
                    (R1, /book/title):col2
                      S("prices.xml"):R1
                  C($b, price/text()):col4
                    C($b, title):col3
                      C(R2, /book):$b
                        S("prices.xml"):R2

  20. Navigation/Tagger Cancel Out • Used to simplify a composite XAT tree. • Transformation rule: • (x, /):y[T(<tag>[z]</tag>):x[s]] ⇒ s • Note: type analysis is also used for cancel-out.

  21. View Query Example
      <prices> {
        for $row in distinct(DXV/book/row)
        return <book> $row/title, $row/price </book>
      } </prices>
      DXV:
      <DB>
        <book>
          <row> <title> TCP/IP Illustrated </title> <price>65.95</price> </row>
          <row> <title> TCP/IP Illustrated </title> <price>65.95</price> </row>
          <row> <title>Data on the Web</title> <price>34.95</price> </row>
          <row> <title>Data on the Web</title> <price>39.95</price> </row>
        </book>
      </DB>
      XAT tree:
      T(<prices>[col6]</prices>):col5
        Agg()
          T(<book>[col7],[col8]</book>):col6
            ($row, title):col7
              ($row, price):col8
                (R3, /book/row):$row
                  S(DXV):R3

  22. Cancel Out Example (1)
      Rule: (x, y)[op():x[s]] ⇒ op():y[s]
      Before, the query fragment over S("prices.xml"):R2, and the view plan that defines "prices.xml":
      ...
        C($b, price/text()):col4
          C($b, title):col3
            C(R2, /book):$b
              S("prices.xml"):R2

      T(<prices>[col6]</prices>):col5
        Agg()
          T(<book>[col7],[col8]</book>):col6
            ($row, title):col7
              ($row, price):col8
                (R3, /book/row):$row
                  S(DXV):R3
      After, the source is replaced by the view plan, whose top Tagger now outputs R2:
      ...
        C($b, price/text()):col4
          C($b, title):col3
            C(R2, /book):$b
              T(<prices>[col6]</prices>):R2
                Agg()
                  T(<book>[col7],[col8]</book>):col6
                    ($row, title):col7
                      ($row, price):col8
                        (R3, /book/row):$row
                          S(DXV):R3

  23. Cancel Out Example (2)
      Before:
      ...
        C($b, price/text()):col4
          C($b, title):col3
            C(R2, /book):$b
              T(<prices>[col6]</prices>):R2
                Agg()
                  T(<book>[col7],[col8]</book>):col6
                    ($row, title):col7
                      ($row, price):col8
                        (R3, /book/row):$row
                          S(DXV):R3
      After:
      ...
        C($b, price/text()):col4
          C($b, title):col3
            T(<book>[col7],[col8]</book>):$b
              ($row, title):col7
                ($row, price):col8
                  (R3, /book/row):$row
                    S(DXV):R3

  24. Cancel Out Example (3)
      Before:
      ...
        C($b, price/text()):col4
          C($b, title):col3
            T(<book>[col7],[col8]</book>):$b
              ($row, title):col7
                ($row, price):col8
                  (R3, /book/row):$row
                    S(DXV):R3
      After:
      ...
        C($b, price/text()):col4
          T(<book>[col7],[col8]</book>):$b
            ($row, title):col3
              ($row, price):col8
                (R3, /book/row):$row
                  S(DXV):R3

  25. Cancel Out Example (4)
      Before:
      ...
        C(temp1, text()):col4
          C($b, price):temp1
            T(<book>[col7],[col8]</book>):$b
              ($row, title):col3
                ($row, price):col8
                  (R3, /book/row):$row
                    S(DXV):R3
      After:
      ...
        C(temp1, text()):col4
          ($row, title):col3
            ($row, price):temp1
              (R3, /book/row):$row
                S(DXV):R3
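The cancel-out steps in examples (3) and (4) can be sketched in Python as a column rename (the `Nav`, `Tagger`, and `cancel_out` names are hypothetical). A Navigate `(x, child):y` sitting directly on `T(<tag>[...]</tag>):x` is removed, and the column inside the Tagger that holds `<child>` elements is renamed to `y`. As a simple stand-in for the type analysis the slides mention, the sketch assumes the element name held by a content column is the `path` of the Navigate that produced it, for example `($row, title):col7` holds `<title>` elements.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Nav:
    src: str    # input column
    path: str   # child element name (or a longer path)
    out: str    # output column


@dataclass(frozen=True)
class Tagger:
    tag: str
    content: tuple  # columns placed inside the constructed element
    out: str


def cancel_out(nav, tagger, below):
    """Cancel `nav` = (x, child):y sitting directly on `tagger` = T(<tag>[...]):x.
    `below` lists the Navigates under the tagger, top to bottom.
    Returns (tagger, below) with the matching content column renamed to y."""
    if nav.src != tagger.out:
        raise ValueError("Navigate does not read the Tagger's output")
    matches = [n for n in below if n.out in tagger.content and n.path == nav.path]
    if len(matches) != 1:
        raise ValueError("cannot identify a unique content column")
    old, new = matches[0].out, nav.out
    new_below = [replace(n, out=new) if n.out == old else n for n in below]
    new_content = tuple(new if c == old else c for c in tagger.content)
    return replace(tagger, content=new_content), new_below


tagger = Tagger("book", ("col7", "col8"), "$b")
below = [Nav("$row", "title", "col7"), Nav("$row", "price", "col8"),
         Nav("R3", "/book/row", "$row")]

# Example (3): C($b, title):col3 cancels against the Tagger.
t3, b3 = cancel_out(Nav("$b", "title", "col3"), tagger, below)
# Example (4): after splitting C($b, price/text()):col4 into C($b, price):temp1
# and C(temp1, text()):col4, the first half cancels the same way.
t4, b4 = cancel_out(Nav("$b", "price", "temp1"), t3, b3)
print([n.out for n in b4])  # ['col3', 'temp1', '$row']
```

The sketch also renames the column inside the Tagger so the remaining plan stays consistent. After step (4), nothing reads `$b` any more, so the Tagger itself can be dropped, as in the slide's final tree.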

  26. SQL Generation • Find a pattern in the XAT • Translate that pattern into a SQL operator that will access the relational database.

  27. SQL Generation Example
      Before:
      ...
        C(temp1, text()):col4
          ($row, title):col3
            ($row, price):temp1
              (R3, /book/row):$row
                S(DXV):R3
      After:
      ...
        C(temp1, text()):col4
          SQL(select title as col3, price as temp1 from book):{col3, temp1}
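The pattern matched here can be sketched in Python with the standard `sqlite3` module (the `generate_sql` function is hypothetical). The sketch recognizes the Navigates `($row, column):alias` over `(R, /<table>/row):$row` on the default XML view and emits one SELECT. It assumes the DXV layout shown on slide 21, where each table appears as `/<table>/row` with one child element per column. The in-memory `book` table stands in for the relational database.

```python
import sqlite3


def generate_sql(row_path, navigations):
    """Translate ($row, column):alias Navigates over (R, /<table>/row):$row into
    one SELECT. Assumes the DXV layout /<table>/row/<column>."""
    steps = [s for s in row_path.split("/") if s]
    if len(steps) != 2 or steps[1] != "row":
        raise ValueError("pattern not recognized: expected /<table>/row")
    table = steps[0]
    select_list = ", ".join(f"{column} as {alias}" for column, alias in navigations)
    return f"select {select_list} from {table}", [alias for _, alias in navigations]


sql, columns = generate_sql("/book/row", [("title", "col3"), ("price", "temp1")])
print(sql)      # select title as col3, price as temp1 from book
print(columns)  # ['col3', 'temp1']

# Run it against an in-memory copy of the book table behind the DXV.
conn = sqlite3.connect(":memory:")
conn.execute("create table book (title text, price real)")
conn.executemany(
    "insert into book values (?, ?)",
    [("TCP/IP Illustrated", 65.95), ("TCP/IP Illustrated", 65.95),
     ("Data on the Web", 34.95), ("Data on the Web", 39.95)],
)
rows = conn.execute(sql).fetchall()
conn.close()
print(len(rows))  # 4
```

Column names are interpolated directly into the SQL string, which is acceptable only because they come from the query plan, not from user input.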

  28. Where are we? • XAT Decorrelation. • Optimization: • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting. • Conclusion & Future Work.

  29. XAT Data Model Cleanup • By default, each operator appends one additional column to the data model. • Cleanup helps: • Execution: optimizes data storage during execution. • Cutting: removes unused operators from the XQuery plan. • Equation for data model cleanup: • Keep only the columns required by ancestors. • DM := (DMp – Pp)  Cp  (P – C)

  30. Data Model Example
      DM := (DMp – Pp)  Cp  (P – C)
      for $b in document("prices.xml")/book
      let $prices := $b/price
      return $b
      1 Agg()
      2 ($b,):col1
      3 C($b, price):$prices
      4 (R1, /book):$b
      5 S("prices.xml"):R1
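Rather than guess the exact form of the DM equation, this Python sketch implements the stated rule "keep only the columns required by ancestors" with a standard top-down, liveness-style pass. The columns a child must supply are the parent's required columns minus what the parent produces, plus what the parent reads. The `Op` class and helper names are hypothetical. The read and produced sets for each operator are this sketch's reading of the slide 30 plan (for example, that `Agg()` reads `col1`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str
    consumes: frozenset
    produces: frozenset


def op(text, consumes=(), produces=()):
    return Op(text, frozenset(consumes), frozenset(produces))


# Slide 30's plan, root first (operator 1) down to the source (operator 5).
PLAN = [
    op("Agg()", consumes={"col1"}),
    op("($b,):col1", consumes={"$b"}, produces={"col1"}),
    op("C($b, price):$prices", consumes={"$b"}, produces={"$prices"}),
    op("(R1, /book):$b", consumes={"R1"}, produces={"$b"}),
    op('S("prices.xml"):R1', produces={"R1"}),
]


def default_data_models(plan):
    """Default DM: every operator appends its column to its child's data model."""
    dms, acc = [], frozenset()
    for o in reversed(plan):
        acc = acc | o.produces
        dms.append(acc)
    return list(reversed(dms))


def cleaned_data_models(plan):
    """Keep only the columns that some ancestor still needs (computed top-down)."""
    dms, needed = [], frozenset()
    for o in plan:
        dms.append(needed)
        # what o's child must supply: drop what o produces, add what o reads
        needed = (needed - o.produces) | o.consumes
    return dms


default = default_data_models(PLAN)
cleaned = cleaned_data_models(PLAN)
for number, (o, d, c) in enumerate(zip(PLAN, default, cleaned), start=1):
    print(number, o.text, sorted(d), "->", sorted(c))
```

For operator 3, the default data model carries `R1`, `$b`, and `$prices`, but only `$b` is needed above it. `$prices` is never required anywhere, which is the observation that XAT cutting builds on.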

  31. Where are we? • XAT Decorrelation. • Optimization: • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting. • Conclusion & Future Work.

  32. XAT Cutting • General idea: • Remove the operators that produce unused data. • Equations: • R := (Rp – P)  C • (P  M)  (Rp  Mp) = NULL

  33. XAT Cutting Example
      R := (Rp – P)  C
      (P  M)  (Rp  Mp) = NULL
      for $b in document("prices.xml")/book
      let $prices := $b/price
      return $b
      1 Agg()
      2 ($b,):col1
      3 C($b, price):$prices
      4 (R1, /book):$b
      5 S("prices.xml"):R1
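A minimal Python sketch of cutting on this example, under stated assumptions. It reads the elided operator in R := (Rp – P)  C as set union, so the required columns are propagated top-down as R = (R_parent − P) ∪ C. It does not reproduce the second condition on the slide. An operator is cut when none of its produced columns is required. As an extra safety assumption that the slide does not state, the sketch only cuts operators that keep one output tuple per input tuple: removing an unnesting Navigate would change how many tuples reach the operators above it. The `Op` class and `cut` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str
    consumes: frozenset
    produces: frozenset
    keeps_cardinality: bool  # True if the output has one tuple per input tuple


def op(text, consumes=(), produces=(), keeps_cardinality=True):
    return Op(text, frozenset(consumes), frozenset(produces), keeps_cardinality)


# Slide 33's plan, root first. The unnesting Navigate (R1, /book):$b creates one
# tuple per book, so it does not keep cardinality; the collection Navigate does.
PLAN = [
    op("Agg()", consumes={"col1"}),
    op("($b,):col1", consumes={"$b"}, produces={"col1"}),
    op("C($b, price):$prices", consumes={"$b"}, produces={"$prices"}),
    op("(R1, /book):$b", consumes={"R1"}, produces={"$b"}, keeps_cardinality=False),
    op('S("prices.xml"):R1', produces={"R1"}, keeps_cardinality=False),
]


def cut(plan):
    """Remove operators whose output columns no ancestor requires."""
    kept, required = [], frozenset()
    for o in plan:
        useless = bool(o.produces) and not (o.produces & required)
        if useless and o.keeps_cardinality:
            continue  # cut: `required` passes through unchanged
        kept.append(o)
        required = (required - o.produces) | o.consumes
    return kept


remaining = cut(PLAN)
for o in remaining:
    print(o.text)
```

Only `C($b, price):$prices` is removed, which matches the intuition that the `let $prices` binding is never used by `return $b`.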

  34. Conclusions • XQuery queries are heavily correlated and hence need to be decorrelated for better optimization. • After decorrelation, more optimization techniques can be applied: • Computation Pushdown. • Data Model Cleanup. • Cutting.

  35. Future Work • Write a TR to formalize the XAT. • Compare with ORDB and ODB operators, as well as XQA operators. • Wrap up: • Finalize the uncertain operators that deal with collections: • Union, Navigate. • Formalize the pushdown rewriting rules by type (regular expression type) analysis. • Finalize the XAT rewriting rules for: • Order handling. • Update propagation. • Translation from XAT back to a query. • Next step: • Generate the search space and an optimization algorithm for XAT, ready for schema generation.
