Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II)

Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II) Xin Zhang

Outline • XAT Decorrelation. • Optimization • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting. • Conclusion & Future Works.

XAT Decorrelation • XQuery queries are correlated queries • Decorrelation is required for optimization: • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting.

Three kinds of Decorrelation • Simple Decorrelation • No Additional sources • No Aggregate Functions • Complex Decorrelation with Additional Sources • Complex Decorrelation with Aggregate Functions

Example* of XML Use Cases. <prices> <book> <title> TCP/IP Illustrated </title> <price>65.95</price> </book> <book> <title> TCP/IP Illustrated </title> <price>65.95</price> </book> <book> <title>Data on the Web</title> <price>34.95</price> </book> <book> <title>Data on the Web</title> <price>39.95</price> </book> </prices> <!ELEMENT prices (book*)> <!ELEMENT book (title, price)> <!ELEMENT title (#PCDATA)> <!ELEMENT source (#PCDATA)> <!ELEMENT price (#PCDATA)>

Simple Query Example • Task: in the document "prices.xml", find the book titles. • Query: <results> { for $t in distinct (document("prices.xml") /book/title) return <minprice> $t </minprice> } </results> • XAT: T(<results>[col1]</results>):col0 Agg() FOR($t) T (<minprice>[$t] </minprice>):col1 distinct(col2):$t (R1, /book/title):col2 S(“prices.xml”):R1

Simple Decorrelation • Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] → T[T2[T1[S1]]] • Before: T(<results>[col1]</results>):col0 Agg() FOR($t) T (<minprice>[$t] </minprice>):col1 distinct(col2):$t (R1, /book/title):col2 S(“prices.xml”):R1 • After: T(<results>[col1]</results>):col0 Agg() T (<minprice>[$t] </minprice>):col1 distinct(col2):$t (R1, /book/title):col2 S(“prices.xml”):R1

Is Simple Decorrelation Right? • Every operator except Groupby has "for each" semantics: it is applied to each tuple of its input table. • Hence, the FOR operator can be omitted in the simple decorrelation scenario.
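
The "for each" argument can be illustrated with a minimal Python sketch. The `tagger` function and the dictionary-based tuple layout are assumptions made for illustration only, not the Rainbow implementation: an operator that already works tuple by tuple gives the same table whether it is invoked once per FOR binding or once over the whole input.

```python
def tagger(rows, col, out, tag):
    """Hypothetical Tagger: wraps the value of `col` in <tag>...</tag> for each tuple."""
    return [{**r, out: f"<{tag}>{r[col]}</{tag}>"} for r in rows]

# One tuple per binding of $t (the output of distinct over the titles).
titles = [{"$t": "TCP/IP Illustrated"}, {"$t": "Data on the Web"}]

# Correlated form: FOR invokes the tagger once per binding of $t.
with_for = [tagger([row], "$t", "col1", "minprice")[0] for row in titles]

# Decorrelated form: the tagger is applied to the whole table once.
without_for = tagger(titles, "$t", "col1", "minprice")

print(with_for == without_for)
```

Groupby is the exception noted on the slide because its result depends on the whole input table rather than on one tuple at a time.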

Two Types of Navigate • Navigate Unnesting: U • Unnests the parent-children relationship and duplicates the parent value for each child. • Navigate Collection: C • Nests the parent-children relationship: creates a collection of the children but keeps a single parent.
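
The difference between the two navigates can be sketched in Python over the prices.xml data from the use case. The function names `navigate_unnest` and `navigate_collection` and the list-of-dictionaries table layout are assumptions for illustration, not the XAT implementation.

```python
import xml.etree.ElementTree as ET

PRICES = """<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""

def navigate_unnest(rows, src, path, out):
    """One output tuple per matching child; the parent tuple is duplicated."""
    return [{**row, out: child} for row in rows for child in row[src].findall(path)]

def navigate_collection(rows, src, path, out):
    """One output tuple per input tuple; the children are kept as one collection."""
    return [{**row, out: row[src].findall(path)} for row in rows]

table = [{"R1": ET.fromstring(PRICES)}]
unnested = navigate_unnest(table, "R1", "book", "$b")       # FOR-style binding
collected = navigate_collection(table, "R1", "book", "$b")  # LET-style binding

print(len(unnested))                          # 4 tuples, one per book
print(len(collected), len(collected[0]["$b"]))  # 1 tuple holding 4 books
```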

Where to Use the Two Types • Navigate Unnesting: U • FOR binding. • Navigate Collection: C • LET binding.

Complex Query Example • Task: in the document "prices.xml", find each book title and its prices. • Query: <results> { for $t in distinct (document("prices.xml") /book/title), let $b := document("prices.xml") /book [title = $t] return <minprice> $t, $b/price </minprice> } </results> • XAT: T(<results>[col1]</results>):col0 Agg() T (<minprice> [$t], [col4] </minprice>):col1 FOR($t) distinct(col2):$t c($b, price):col4 (R1, /book/title):col2 (col3=$t) S(“prices.xml”):R1 c($b, title):col3 C(R2, /book):$b S(“prices.xml”):R2

Complex Decorrelation with Additional Source • T[FOR(CB, T2[S2])[T1[S1]]] → T[T2[[T1[S1],S2]]] • T(<results>[col1]</results>):col0 T(<results>[col1]</results>):col0 T (<minprice> [$t], [col4] </minprice>):col1 Agg() Agg() T (<minprice> [$t], [col4] </minprice>):col1 FOR($t) c($b, price):col4 distinct(col2):$t C($b, price):col4 (col3=$t) distinct(col2):$t (R1, /book/title):col2 (col3=$t) c($b, title):col3 (R1, /book/title):col2 S(“prices.xml”):R1 S(“prices.xml”):R2 C($b, title):col3 C(R2, /book):$b S(“prices.xml”):R1 C(R2, /book):$b S(“prices.xml”):R2

Full Query Example • Task: in the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element. • Query: <results> { for $t in distinct (document("prices.xml") /book/title), let $b := document("prices.xml") /book [title = $t] return <minprice> $t, <price>min($b/price/text())</price> </minprice> } </results> • XAT: T(<results>[col1]</results>):col0 Agg() T (<minprice> [$t], <price>[col5]</price> </minprice>):col1 FOR($t) distinct(col2):$t min(col4):col5 (R1, /book/title):col2 c($b, price/text()):col4 S(“prices.xml”):R1 c($b, title):col3 (col3=$t) C(R2, /book):$b S(“prices.xml”):R2

Complex Query Decorrelation with One Aggregate Function • T[FOR(CB, T2[Agg(T3[])])[T1[S1]]] → T[(DM(T1))[T1,T2[(DM(T1),Agg(T3[[Distinct(T1[S1]), S2]))]]] • DM(T1) is the data model computed from T1. • T T T2 FOR($rate) Groupby(DM(T1), Agg()) T2 T1 T3 Agg() S1 T1 T3 S2 Distinct S2 S1

The Query after Decorrelation T (<minprice> [$t], <price>[col5]</price> </minprice>):col1 T(<results>[col1]</results>):col0 T (<minprice> [$t], [col4] </minprice>):col1 T(<results>[col1]</results>):col0 Agg() Agg() GB(DM, min(col4):col5) min(col4):col5  FOR($t) C($b, price/text()):col4 c($b, price/text()):col4 (col3=$t) distinct(col2):$t (col3=$t) C($b, title):col3 (R1, /book/title):col2 c($b, title):col3 distinct(col2):$t C(R2, /book):$b S(“prices.xml”):R1 C(R2, /book):$b  (R1, /book/title):col2 S(“prices.xml”):R2 S(“prices.xml”):R1 S(“prices.xml”):R2
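
As a rough check that replacing FOR by a join plus Groupby preserves the result, the sketch below evaluates the full query both ways over the (title, price) pairs of prices.xml. The function names and the tuple representation are assumptions for illustration; the sketch models only the min-price computation, not the tagging.

```python
from collections import defaultdict

books = [("TCP/IP Illustrated", 65.95), ("TCP/IP Illustrated", 65.95),
         ("Data on the Web", 34.95), ("Data on the Web", 39.95)]

def correlated(books):
    """FOR-style evaluation: the inner query is re-run once per distinct title."""
    result = []
    for t in dict.fromkeys(title for title, _ in books):   # distinct, order kept
        prices = [p for title, p in books if title == t]    # inner query for this $t
        result.append((t, min(prices)))
    return result

def decorrelated(books):
    """Join the distinct titles with the source once, then group by title and take min."""
    titles = list(dict.fromkeys(title for title, _ in books))
    joined = [(t, p) for t in titles for title, p in books if title == t]
    groups = defaultdict(list)
    for t, p in joined:
        groups[t].append(p)
    return [(t, min(ps)) for t, ps in groups.items()]

print(correlated(books) == decorrelated(books))
```

Both functions return the pairs (TCP/IP Illustrated, 65.95) and (Data on the Web, 34.95); the decorrelated form exposes the join and the grouping as separate operators that later rewriting steps can move or push down.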

Where are we? • XAT Decorrelation. • Optimization • XAT Computation Pushdown. • XAT Data Model Cleanup. • XAT Cutting. • Conclusion & Future Works.

XAT Computation Pushdown • Goal: push execution into the relational database. • Steps: • Push Navigation down. • Cancel out Navigation and Tagger. • Generate the SQL statement.

Navigation Pushdown • A Navigation can be pushed down through any operator until it reaches a child operator it depends on. • Example rewriting rules: • (x1, path):x2[(y1, path):y2[T]] → (y1, path):y2[(x1, path):x2[T]] (if x1 != y2) • (x1, path):x2[(c) [T]] → (c) [(x1, path):x2[T]] • (x1, path):x2[[T1, T2]] → [T1, (x1, path):x2[T2]] (if x1 in DM(T2)) • (x1, path):x2[[T1, T2]] → [(x1, path):x2[T1], T2] (if x1 in DM(T1))
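
The first rewriting rule (swapping two navigations when the upper one does not read the lower one's output) can be sketched as follows. The `Nav` class and `push_below` function are hypothetical names introduced for illustration; only the navigation-over-navigation rule is modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Nav:
    src: str                        # input column (x1 in the rule)
    path: str
    out: str                        # output column (x2 in the rule)
    child: Optional[object] = None  # operator below this one

def push_below(top: Nav) -> Nav:
    """Swap `top` with its child navigation unless `top` reads the child's output."""
    below = top.child
    if not isinstance(below, Nav) or top.src == below.out:
        return top  # dependency (or non-navigation child): pushdown stops here
    return Nav(below.src, below.path, below.out,
               Nav(top.src, top.path, top.out, below.child))

# Navigations taken from the running example; "S" stands in for the source operator.
plan = Nav("$b", "price/text()", "col4",
           Nav("$b", "title", "col3",
               Nav("R2", "/book", "$b", "S")))

swapped = push_below(plan)              # col4 navigation moves below col3
stuck = push_below(swapped.child)       # col4 reads $b, so it cannot pass C(R2, /book):$b
print(swapped.out, swapped.child.out, stuck is swapped.child)
```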

Navigation Pushdown Example T(<results>[col1]</results>):col0 T(<results>[col1]</results>):col0 T (<minprice> [$t], [col4] </minprice>):col1 T (<minprice> [$t], [col4] </minprice>):col1 Agg() Agg() GB(DM, min(col4):col5) GB(DM, min(col4):col5)  C($b, price/text()):col4  (col3=$t) (col3=$t)  C($b, title):col3 C($b, price/text()):col4 distinct(col2):$t C(R2, /book):$b distinct(col2):$t C($b, title):col3  (R1, /book/title):col2 (R1, /book/title):col2 C(R2, /book):$b S(“prices.xml”):R1 S(“prices.xml”):R2 S(“prices.xml”):R1 S(“prices.xml”):R2

Navigation/Tagger Cancel Out • Used to simplify a composite XAT tree. • Transformation rule: • (x, /):y[T(<tag>[z]</tag>):x[s]] → s • Note: type analysis is also used for the cancel out.

View Query Example • Query: <prices> { for $row in distinct (DXV /book/row) return <book> $row/title, $row/price </book> } </prices> • DXV data: <DB> <book> <row> <title> TCP/IP Illustrated </title> <price>65.95</price> </row> <row> <title> TCP/IP Illustrated </title> <price>65.95</price> </row> <row> <title>Data on the Web</title> <price>34.95</price> </row> <row> <title>Data on the Web</title> <price>39.95</price> </row> </book> </DB> • XAT: T(<prices>[col6]</prices>):col5 Agg() T(<book>[col7],[col8]</book>):col6 ($row, title):col7 ($row, price):col8 (R3, /book/row):$row S(DXV):R3

Cancel Out Example (1) • (x, y)[op():x[s]] → op():y[s] • ... ... T(<prices>[col6]</prices>):R2 T(<prices>[col6]</prices>):col5 C($b, price/text()):col4 Agg() C($b, price/text()):col4 Agg() C($b, title):col3 T(<book>[col7],[col8]</book>):col6 C($b, title):col3 T(<book>[col7],[col8]</book>):col6 C(R2, /book):$b ($row, title):col7 C(R2, /book):$b ($row, title):col7 ($row, price):col8 S(“prices.xml”):R2 ($row, price):col8 (R3, /book/row):$row (R3, /book/row):$row S(DXV):R3 S(DXV):R3

Cancel Out Example (2) ... ... T(<prices>[col6]</prices>):R2 C($b, price/text()):col4 C($b, price/text()):col4 Agg() C($b, title):col3 T(<book>[col7],[col8]</book>):$b C($b, title):col3 T(<book>[col7],[col8]</book>):col6 ($row, title):col7 C(R2, /book):$b ($row, title):col7 ($row, price):col8 ($row, price):col8 (R3, /book/row):$row (R3, /book/row):$row S(DXV):R3 S(DXV):R3

Cancel Out Example (3) ... ... C($b, price/text()):col4 C($b, price/text()):col4 C($b, title):col3 T(<book>[col7],[col8]</book>):$b T(<book>[col7],[col8]</book>):$b ($row, title):col7 ($row, title):col3 ($row, price):col8 ($row, price):col8 (R3, /book/row):$row (R3, /book/row):$row S(DXV):R3 S(DXV):R3

Cancel Out Example (4) ... ... C(temp1, text()):col4 C(temp1, text()):col4 T(<book>[col7],[col8]</book>):$b C($b, price):temp1 ($row, title):col3 ($row, title):col3 ($row, price):col8 ($row, price):temp1 (R3, /book/row):$row (R3, /book/row):$row S(DXV):R3 S(DXV):R3

SQL Generation • Find a pattern in the XAT • Translate that pattern into a SQL operator that will access the relational database.

SQL Generation Example • Before: ... C(temp1, text()):col4 ($row, title):col3 ($row, price):temp1 (R3, /book/row):$row S(DXV):R3 • After: ... C(temp1, text()):col4 SQL( select title as col3, price as temp1 from book):{col3, temp1}
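
The pattern-to-SQL step in this example can be sketched in a few lines of Python. The `to_sql` helper is hypothetical, and it assumes the view layout used in the example, where the row path has the form /table/row and each navigated child element is a column of that table.

```python
def to_sql(row_path, column_navs):
    """Build a SELECT from a row navigation path and {column: output name} navigations.

    Assumes row_path looks like '/<table>/row', as in the view used in the example.
    """
    table = row_path.strip("/").split("/")[0]
    select_list = ", ".join(f"{col} as {out}" for col, out in column_navs.items())
    return f"select {select_list} from {table}"

# ($row, title):col3 and ($row, price):temp1 over (R3, /book/row):$row
sql = to_sql("/book/row", {"title": "col3", "price": "temp1"})
print(sql)  # select title as col3, price as temp1 from book
```

The remaining C(temp1, text()):col4 navigation stays in the XAT, since only the matched pattern is replaced by the SQL operator.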

XAT Data Model Cleanup • By default, each operator appends one additional column to the data model. • Cleanup helps with: • Execution: optimizes the data storage during execution. • Cutting: gets rid of the unused operators in the XQuery. • Equation for Data Model Cleanup: • Only keep the columns required by ancestors. • DM := (DMp – Pp)  Cp  (P – C)

Data Model Example • DM := (DMp – Pp)  Cp  (P – C) • Query: for $b in document("prices.xml") /book let $prices := $b/price return $b • XAT: 1 Agg() 2 ($b,):col1 3 C($b, price):$prices 4 (R1, /book):$b 5 S(“prices.xml”):R1

XAT Cutting • General idea: • Get rid of the operators that produce useless data. • Equations: • R := (Rp – P)  C • (P  M)  (Rp  Mp) = NULL

XAT Cutting Example • R := (Rp – P)  C • (P  M)  (Rp  Mp)= NULL • Query: for $b in document("prices.xml") /book let $prices := $b/price return $b • XAT: 1 Agg() 2 ($b,):col1 3 C($b, price):$prices 4 (R1, /book):$b 5 S(“prices.xml”):R1
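
The cutting idea can be sketched in Python for the five-operator chain of this example. This is a simplified sketch under loud assumptions: the operator symbol missing from R := (Rp – P) C is read as set union, the M sets of the second equation are ignored, only a linear chain of operators is handled, and the `cut` function and the (name, produced, consumed) encoding are hypothetical.

```python
def cut(ops, required_at_root=()):
    """ops: top-down list of (name, produced, consumed) column sets.

    Returns the names of the operators kept after cutting.
    """
    required = set(required_at_root)
    kept = []
    for name, produced, consumed in ops:
        if produced and not (produced & required):
            continue  # nothing above needs this operator's output: cut it
        kept.append(name)
        # Assumed reading of the slide equation: R := (Rp - P) union C
        required = (required - produced) | consumed
    return kept

ops = [
    ("1 Agg()",                 set(),        {"col1"}),
    ("2 ($b,):col1",            {"col1"},     {"$b"}),
    ("3 C($b, price):$prices",  {"$prices"},  {"$b"}),
    ("4 (R1, /book):$b",        {"$b"},       {"R1"}),
    ("5 S(prices.xml):R1",      {"R1"},       set()),
]

kept = cut(ops)
print(kept)
```

Under these assumptions operator 3 is cut, because $prices is never required by an operator above it, which matches the intuition that the unused let binding contributes nothing to the result.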

Conclusions • XQuery queries are heavily correlated and hence need to be decorrelated for better optimization. • After decorrelation, more optimization techniques can be applied: • Computation Pushdown. • Data Model Cleanup. • Cutting.

Future Works • Write a TR to formalize the XAT. • Compare with ORDB, ODB, and XQA operators. • Wrap up: • Finalize the uncertain operators that deal with collections • Union, Navigate • Formalize the pushdown rewriting rules by type (Reg. Exp. Type) analysis • Finalize the XAT rewriting rules for: • Order handling • Update propagation • Translation from XAT back to query • Next step: • Generate the search space and optimization algorithm for XAT, ready for schema generation.
