490 likes | 616 Views
XQuery to SQL by XAT. Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke. Outline. Merged algebra proposed based on Niagara XPERANTO One thorough example of XQuery SQL. Data Model. An Ordered Table in two dimensions Tuple order Column order. Every cell has its own domain
E N D
XQuery to SQL by XAT Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke
Outline • Merged algebra proposed based on • Niagara • XPERANTO • One thorough example of • XQuery SQL
Data Model • An Ordered Table in two dimensions • Tuple order • Column order. • Every cell has its own domain • Every column binds to one variable. • The domain can be: • SQL domains. • XML Fragment. • Can be a list of XML elements. • Comparison are done by values
Data Model Examples • Table of XML Fragments. • Explicit Naming • E.g. variable bindings • Implicit Naming • E.g. XPath notations. • Reduce complexity of many internal variables. <carrier> $carrier </carrier <carrier> $carrier </carrier <carrier> $carrier </carrier> ……….
Naming of Columns • Implicit: • SQL operators • Navigate • Explicit ( “name”): • Variable binding: • Holding a set of values. • Variable name ($name) is name of a column • Rename • Distinguish in one operator where, • same “names” from different sources • Abbreviate a very long “name”. • Create a new name for creation operators • Need to used with those operators. • E.g. Tagger
Operators • SQL like (9): • Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp. • XML like (4): • Tagger, Navigate, is(Element, Text), Aggregate. • Special: • SQL, Function, Source, Name, FOR
Operator Specification • Description • Input Specification. • Output Specification. • Logic description. • Illustrative Example
Naming Operator • Syntax: • Name(“from_name”, “to_name”) • Simplified Syntax: • to_name := from_name
Steps in Translation • XQuery XML Algebra Tree • User View XML Algebra Tree • View Composition • Computation Pushdown • Optimization
Example of Telephone Bill • <?xml version=”1.0” encoding=”US-ASCII” ?> • <!DOCTYPE invoice [ • <!ELEMENT invoice (account_number, • bill_period, • carrier+, • itemized_call*, • total)> • <!ELEMENT account_number (#PCDATA)> • <!ELEMENT bill_period (#PCDATA)> • <!ELEMENT carrier (#PCDATA)> • <!ELEMENT itemized_call EMPTY> • <!ATTLIST itemized_call • no ID #REQUIRED • date CDATA #REQUIRED • number_called CDATA #REQUIRED • time CDATA #REQUIRED • rate (NIGHT|DAY) #REQUIRED • min CDATA #REQUIRED • amount CDATA #REQUIRED> • <!ELEMENT total (#PCDATA)> • ]> • <invoice> • <account_number>555 777-3158 573 234 3</account_number> • <bill_period>Jun 9 - Jul 8, 2000</bill_period> • <carrier>Sprint</carrier> • <itemized_callno=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” /> • <itemized_callno=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” /> • <itemized_callno=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” /> • <total>$0.35</total> • </invoice>
Example XQuery User XQuery: <summary> { FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate) LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate] WHERE $itemized_call/@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls>count($itemized_call)</number_of_calls> } </summary> Count number of itemized_calls in calling area 973 grouped by the calling rate.
XQuery XML Algebra Tree • Divide into query blocks • Convert each query block into XML Algebra Tree (XAT). • Identify Correlated Operators • Combine into one XML Algebra Tree. • Query decorrelation
The block identification is arbitrary (wrong). Query Blocks B1 User XQuery: <summary> { FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate) LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate] WHERE $itemized_call/@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls>count($itemized_call)</number_of_calls> } </summary> B2 B3 B1: Construct summary from the result from B2 B2: Get all the distinct rate and iterate through it. B3: Count itemized call for a given rate.
XAT of B1 XAT: B1 Name(“Tagger(<summary>[V1]</summary>)”, “V2”) <summary> </summary> B2 Tagger(<summary> [V1]</summary>) B2 [V2] it is a name instead of a part of pattern.
XAT of B2 Aggregate XAT: • B3 { FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate) • } FOR($rate) B3 Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”) B3 Select(distinct(“invoice/itemized_call/@rate:/”)) Navigate(“/”, invoice/itemized_call/@rate) Source(“invoice.xml”)
XAT of B3 XAT: • B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate] WHERE $itemized_call /@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls> count($itemized_call) </number_of_calls> Select(“@rate:$itemized_call” = “$rate”) B2 Navigate(“$itemized_call”, @rate) Name(“invoice/itemized_call:/”, “$itemized_call”) Navigate(“/”, invoice/itemized_call) Source(“invoice.xml”)
XAT of B3 (Cont.) XAT: • B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate] WHERE $itemized_call /@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls> count($itemized_call) </number_of_calls> Select(“@number_called:$itemized_call” like ‘973%’) Navigate(“$itemized_call”, @number_called)
XAT of B3 (Cont.) XAT: • B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate] WHERE $itemized_call /@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls> count($itemized_call) </number_of_calls> Name(“Tagger(<rate>[$rate]</rate> <number_of_calls> [count($itemized_call)]</number_of_calls>)”, “V1”) Tagger(<rate>[$rate]</rate> <number_of_calls> [count($itemized_call)] </number_of_calls>) B2 Select(count(“$itemized_call”))
B1 B3 Put it Together Name(“Tagger(<summary>[V1]</summary>)”, “V2”) Tagger(<summary>[V1]</summary>) Name(“Tagger(<rate>[$rate]</rate> <number_of_calls> [count($itemized_call)]</number_of_calls>)”, “V1”) Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) Aggregate() Select(count(“$itemized_call”)) FOR($rate) Select(“@number_called:$itemized_call” like ‘973%’) Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”) Navigate(“$itemized_call”, @number_called) Select(distinct(“invoice/itemized_call/@rate:/”)) Select(“@rate:$itemized_call” = “$rate”) Navigate(“$itemized_call”, @rate) Navigate(“/”, invoice/itemized_call/@rate) Name(“invoice/itemized_call:/”, “$itemized_call”) Source(“invoice.xml”) Navigate(“/”, invoice/itemized_call) B2 Source(“invoice.xml”)
Syntax Suger B3 B1 V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) V2 := Tagger(<summary>[V1]</summary>) Select(count(“$itemized_call”)) Aggregate() Select(“@number_called:$itemized_call” like ‘973%’) FOR($rate) Navigate(“$itemized_call”, @number_called) $rate := Select(distinct(“invoice/itemized_call/@rate:/”)) Select(“@rate:$itemized_call” = “$rate”) Navigate(“/”, invoice/itemized_call/@rate) Navigate(“$itemized_call”, @rate) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) Source(“invoice.xml”) B2
Query Decorrelation for COp • Top-down approach over XAT Tree. • Approach: • Correlated Binding (CB) • Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]] Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]] • For example: • Correlated Join Outer Join with Groupby with Cartesian
Query Decorrelation for FOR • Top-down approach over XAT Tree. • Approach: • Correlated Binding (CB) • Op1[FOR(CB)[Op2[Correlated Operator[A],B]]] Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]] • Differences: • SQL Decorrelation: Return Outer Query • XQuery Decorrelation: Return Inner Query • CO: Return both Outer/Inner Query
FOR Decorrelation Example B1 B1 B2 B2 Aggregate Aggregate FOR($rate) …1 Groupby(“$ratel”, ) …2 Source(“invoice.xml”) …1 …2 B3 Select(“@rate:$itemized_call” = “$rate”) Source(“invoice.xml”) Select(“@rate:$itemized_call” = “$rate”) Cartesian …3 …3 B3 Source(“invoice.xml”) Source(“invoice.xml”)
Default XML View <invoice> <row> <id> 1 </id> <account_number>555 777-3158 573 234 3</account_number> <bill_period> Jun 9 – Jun 8, 2000 </bill_period> <total>$0.35</total> </row> </invoice> <carrier> <row> <invoice_id> 1 </invoice_id> <carrier>Sprint</carrier> </row> </carrier> ... invoice carrier itemized_call
User Defined XML View <invoice> <row> <id> 1 </id> <account_number>555 777-3158 573 234 3</account_number> <bill_period> Jun 9 – Jun 8, 2000 </bill_period> <total>$0.35</total> </row> </invoice> <carrier> <row> <invoice_id> 1 </invoice_id> <carrier>Sprint</carrier> </row> </carrier> <itemized_call> <row> <invoice_id>1</invoice_id> <no>1</no> … </row> … </itemized_call> • <invoice> • <account_number>555 777-3158 573 234 3</account_number> • <bill_period>Jun 9 - Jul 8, 2000</bill_period> • <carrier>Sprint</carrier> • <itemized_callno=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” /> • <itemized_callno=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” /> • <itemized_callno=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” /> • <total>$0.35</total> • </invoice>
User Defined XML View Cont. Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN <invoice> <account_number>$invoice/account_number/text()</account_number> <bill_period>$invoice/bill_period/text()</bill_period> FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN <carrier>$carrier/carrier/text()</carrier> FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN <itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()> SORTBY (@no) <total>$invoice/total/text()</total> </invoice> )
User Defined XML View Block Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN <invoice> <account_number>$invoice/account_number/text()</account_number> <bill_period>$invoice/bill_period/text()</bill_period> FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN <carrier>$carrier/carrier/text()</carrier> FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN <itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()> SORTBY (@no) <total>$invoice/total/text()</total> </invoice> ) B4 B5 B6
XML View XAT Aggregate() Aggregate() V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/> V4 := Tagger(<invoice> <account_number>[$invoice/account_number/text()]</account_number> <bill_period>[$invoice/bill_period/text()</bill_period> …[V3]<total>[$invoice/total/text()]</total></invoice>) Navigate($itemized_call, no/text()) … Navigate($itemized_call, amount/text()) FOR($invoice/id) Select(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) Navigate($itemized_call, invoice_id) $invoice := Navigate(“/”,invoice/row ) B5 $itemized_call := Navigate(“/”, itemized_call/row) Source(“default..xml”) Source(“default.xml”)
3-Way Correlation B4 …1 FOR($invoice/id) …2 Source(“invoice.xml”) B6 B5
3-Way Decorrelation B4 …1 JOIN($invoice/id) GB($invoice/id, …) GB($invoice/id, …) B6 with Cartesian B5 with Cartesian …2 …2 Source(“default.xml”) Source(“default.xml”)
View XAT After Decorrelation Groupby($invoice/id, Aggregate()) Aggregate() V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/> V4 := Tagger(<invoice> <account_number>[$invoice/account_number/text()]</account_number> <bill_period>[$invoice/bill_period/text()</bill_period> …[V3]<total>[$invoice/total/text()]</total></invoice>) Navigate($itemized_call, no/text()) … Join($invoice/id) Navigate($itemized_call, amount/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) Groupby($invoice/id…) Navigate(“$invoice”, id) B5 Navigate($itemized_call, invoice_id) Navigate(“$invoice”, id) $itemized_call := Navigate(“/”, itemized_call/row) $invoice := Navigate(“/”,invoice/row ) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”) Source(“default..xml”)
View Composition • Input: • User Query XAT + User View XAT • Output: • Simplified composite XAT • Approach: • XAT Cutting: Remove un-referenced columns and operators. • Pushdown Navigation • By using the commutative rules • Cancel out the navigation operators • By using the composition rules
XAT Cutting • Cut Query Blocks • User query only require itemized_call. • B5 is cut, • Invoice is cut • B4 is simplified. • B6 is simplified. • Cut Columns • User query only used itemized_call@rate.
View XAT After B5 is Cut. Groupby($invoice/id, Aggregate()) Aggregate() V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/> V4 := Tagger(<invoice> <account_number>[$invoice/account_number/text()]</account_number> <bill_period>[$invoice/bill_period/text()</bill_period[V3] <total>[$invoice/total/text()]</total></invoice>) Navigate($itemized_call, no/text()) … Navigate($itemized_call, amount/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate($itemized_call, invoice_id) Navigate(“$invoice”, id) $itemized_call := Navigate(“/”, itemized_call/row) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”)
View After Columns are Cut. Aggregate() V4 := Tagger(<invoice>[V3] </invoice>) Groupby($invoice/id, Aggregate()) V3 := Tagger(<itemized_call number_called=[] rate=[] /> Navigate($itemized_call, number_called/text()) Navigate($itemized_call, rate/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate($itemized_call, invoice_id) Navigate(“$invoice”, id) $itemized_call := Navigate(“/”, itemized_call/row) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”)
Navigation Cancel Out • Navigation Pushdown • Based on some transformation rules. • E.g. commutative of navigation and other operators. • Navigation + Tagger Cancel Out • Composition Rules. • The cancellation result is “renaming”
Query XAT Navi. Pushdown B3 V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) Select(count(“$itemized_call”)) Select(count(“$itemized_call”)) Select(“@number_called:$itemized_call” like ‘973%’) Select(“@number_called:$itemized_call” like ‘973%’) Navigate(“$itemized_call”, @number_called) Select(“@rate:$itemized_call” = “$rate”) Select(“@rate:$itemized_call” = “$rate”) Navigate(“$itemized_call”, @number_called) Navigate(“$itemized_call”, @rate) Navigate(“$itemized_call”, @rate) $itemized_call := Navigate(“/”, invoice/itemized_call) $itemized_call := Navigate(“/”, invoice/itemized_call) Source(“invoice.xml”) Source(“invoice.xml”)
Navi. Tagger Cancel Out B3 …1 Aggregate() Navigate(“$itemized_call”, @number_called) V4 := Tagger(<invoice>[V3] </invoice>) Navigate(“$itemized_call”, @rate) Groupby($invoice/id, Aggregate()) $itemized_call := Navigate(“/”, invoice/itemized_call) V3 := Tagger(<itemized_call number_called=[] rate=[] /> Source(“invoice.xml”) Navigate($itemized_call, number_called/text()) Navigate($itemized_call, rate/text()) …2
The Result of Cancel Out …1 $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) $itemized_call@rate := Navigate($itemized_call, rate/text()) …2
Computation Pushdown • Goal: XAT SQL operators + XML operators • Step 0: Navigation Pushdown. • Step 1: XML Default View SQL Operators • Renaming columns • Step 2: SQL Computation Pushdown. • By commutative and composition rules. • E.g: predicates pushdown.
Navigation Pushdown. $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) $itemized_call@rate := Navigate($itemized_call, rate/text()) $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) $itemized_call@rate := Navigate($itemized_call, rate/text()) Navigate($itemized_call, invoice_id) $itemized_call := Navigate(“/”, itemized_call/row) $itemized_call := Navigate(“/”, itemized_call/row) Navigate(“$invoice”, id) Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”)
XML Default View SQL … $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) $itemized_call@rate := Navigate($itemized_call, rate/text()) … $itemized_call := Navigate(“/”, itemized_call/row) $Itemized_call@number_called := number_called Source(“default.xml”) $itemized_call@rate := rate Project(rate, number_called) Source(“itemized_call”)
Computation Pushdown B3 V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) Select(count(“$itemized_call”)) Select(count(“$itemized_call”)) Select(“@number_called:$itemized_call” like ‘973%’) Select(“@rate:$itemized_call” = “$rate”) Select(“@rate:$itemized_call” = “$rate”) Select(“@number_called:$itemized_call” like ‘973%’) A SQL Block A SQL Block
Result of the Transformation Tagger(<summary>[V1]</summary>) V1 := Aggregate Tagger(<rate>[rate]</rate> <number_of_calls>[count(*)]</number_of_calls>) SQL: SELECT rate, count(*) FROM itemized_call, invoice WHERE number_called LIKE ‘973%’ AND invoice.id = itemized_call.invoice_id GROUPBY rate
Optimization • Efficient Publishing XML Views • Sorted Outer Union. • Special Tagger implementation • A lot More!
Summary • XQuery XAT • Query Block Identification • Query Decorrelation • View Composition • XAT Cutting • Navigation Pushdown • Navigation Cancel Out • Computation Pushdown • Navigation Pushdown • XML Default View SQL Operators • Computation Pushdown • Optimization