1 / 49

XQuery to SQL by XAT

XQuery to SQL by XAT. Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke. Outline. Merged algebra proposed based on Niagara XPERANTO One thorough example of XQuery  SQL. Data Model. An Ordered Table in two dimensions Tuple order Column order. Every cell has its own domain

quinn-vega
Download Presentation

XQuery to SQL by XAT

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XQuery to SQL by XAT Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke

  2. Outline • Merged algebra proposed based on • Niagara • XPERANTO • One thorough example of • XQuery  SQL

  3. Data Model • An Ordered Table in two dimensions • Tuple order • Column order. • Every cell has its own domain • Every column binds to one variable. • The domain can be: • SQL domains. • XML Fragment. • Can be a list of XML elements. • Comparison are done by values

  4. Data Model Examples • Table of XML Fragments. • Explicit Naming • E.g. variable bindings • Implicit Naming • E.g. XPath notations. • Reduce complexity of many internal variables. <carrier> $carrier </carrier <carrier> $carrier </carrier <carrier> $carrier </carrier> ……….

  5. Naming of Columns • Implicit: • SQL operators • Navigate • Explicit ( “name”): • Variable binding: • Holding a set of values. • Variable name ($name) is name of a column • Rename • Distinguish in one operator where, • same “names” from different sources • Abbreviate a very long “name”. • Create a new name for creation operators • Need to used with those operators. • E.g. Tagger

  6. Operators • SQL like (9): • Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp. • XML like (4): • Tagger, Navigate, is(Element, Text), Aggregate. • Special: • SQL, Function, Source, Name, FOR

  7. SQL like Operators (9)

  8. XML like Operators

  9. Special Operators

  10. Operator Specification • Description • Input Specification. • Output Specification. • Logic description. • Illustrative Example

  11. Naming Operator • Syntax: • Name(“from_name”, “to_name”) • Simplified Syntax: • to_name := from_name

  12. Steps in Translation • XQuery  XML Algebra Tree • User View  XML Algebra Tree • View Composition • Computation Pushdown • Optimization

  13. Example of Telephone Bill • <?xml version=”1.0” encoding=”US-ASCII” ?> • <!DOCTYPE invoice [ • <!ELEMENT invoice (account_number, • bill_period, • carrier+, • itemized_call*, • total)> • <!ELEMENT account_number (#PCDATA)> • <!ELEMENT bill_period (#PCDATA)> • <!ELEMENT carrier (#PCDATA)> • <!ELEMENT itemized_call EMPTY> • <!ATTLIST itemized_call • no ID #REQUIRED • date CDATA #REQUIRED • number_called CDATA #REQUIRED • time CDATA #REQUIRED • rate (NIGHT|DAY) #REQUIRED • min CDATA #REQUIRED • amount CDATA #REQUIRED> • <!ELEMENT total (#PCDATA)> • ]> • <invoice> • <account_number>555 777-3158 573 234 3</account_number> • <bill_period>Jun 9 - Jul 8, 2000</bill_period> • <carrier>Sprint</carrier> • <itemized_callno=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” /> • <itemized_callno=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” /> • <itemized_callno=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” /> • <total>$0.35</total> • </invoice>

  14. Example XQuery User XQuery: <summary> { FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate) LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate] WHERE $itemized_call/@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls>count($itemized_call)</number_of_calls> } </summary> Count number of itemized_calls in calling area 973 grouped by the calling rate.

  15. XQuery  XML Algebra Tree • Divide into query blocks • Convert each query block into XML Algebra Tree (XAT). • Identify Correlated Operators • Combine into one XML Algebra Tree. • Query decorrelation

  16. The block identification is arbitrary (wrong). Query Blocks B1 User XQuery: <summary> { FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate) LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate] WHERE $itemized_call/@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls>count($itemized_call)</number_of_calls> } </summary> B2 B3 B1: Construct summary from the result from B2 B2: Get all the distinct rate and iterate through it. B3: Count itemized call for a given rate.

  17. XAT of B1 XAT: B1 Name(“Tagger(<summary>[V1]</summary>)”, “V2”) <summary> </summary> B2 Tagger(<summary> [V1]</summary>) B2 [V2] it is a name instead of a part of pattern.

  18. XAT of B2 Aggregate XAT: • B3 { FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate) • } FOR($rate) B3 Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”) B3 Select(distinct(“invoice/itemized_call/@rate:/”)) Navigate(“/”, invoice/itemized_call/@rate) Source(“invoice.xml”)

  19. XAT of B3 XAT: • B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate] WHERE $itemized_call /@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls> count($itemized_call) </number_of_calls> Select(“@rate:$itemized_call” = “$rate”) B2 Navigate(“$itemized_call”, @rate) Name(“invoice/itemized_call:/”, “$itemized_call”) Navigate(“/”, invoice/itemized_call) Source(“invoice.xml”)

  20. XAT of B3 (Cont.) XAT: • B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate] WHERE $itemized_call /@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls> count($itemized_call) </number_of_calls> Select(“@number_called:$itemized_call” like ‘973%’) Navigate(“$itemized_call”, @number_called)

  21. XAT of B3 (Cont.) XAT: • B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate] WHERE $itemized_call /@number_called LIKE ‘973%’ RETURN <rate>$rate</rate> <number_of_calls> count($itemized_call) </number_of_calls> Name(“Tagger(<rate>[$rate]</rate> <number_of_calls> [count($itemized_call)]</number_of_calls>)”, “V1”) Tagger(<rate>[$rate]</rate> <number_of_calls> [count($itemized_call)] </number_of_calls>) B2 Select(count(“$itemized_call”))

  22. B1 B3 Put it Together Name(“Tagger(<summary>[V1]</summary>)”, “V2”) Tagger(<summary>[V1]</summary>) Name(“Tagger(<rate>[$rate]</rate> <number_of_calls> [count($itemized_call)]</number_of_calls>)”, “V1”) Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) Aggregate() Select(count(“$itemized_call”)) FOR($rate) Select(“@number_called:$itemized_call” like ‘973%’) Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”) Navigate(“$itemized_call”, @number_called) Select(distinct(“invoice/itemized_call/@rate:/”)) Select(“@rate:$itemized_call” = “$rate”) Navigate(“$itemized_call”, @rate) Navigate(“/”, invoice/itemized_call/@rate) Name(“invoice/itemized_call:/”, “$itemized_call”) Source(“invoice.xml”) Navigate(“/”, invoice/itemized_call) B2 Source(“invoice.xml”)

  23. Syntax Suger B3 B1 V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) V2 := Tagger(<summary>[V1]</summary>) Select(count(“$itemized_call”)) Aggregate() Select(“@number_called:$itemized_call” like ‘973%’) FOR($rate) Navigate(“$itemized_call”, @number_called) $rate := Select(distinct(“invoice/itemized_call/@rate:/”)) Select(“@rate:$itemized_call” = “$rate”) Navigate(“/”, invoice/itemized_call/@rate) Navigate(“$itemized_call”, @rate) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) Source(“invoice.xml”) B2

  24. Query Decorrelation for COp • Top-down approach over XAT Tree. • Approach: • Correlated Binding (CB) • Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]]  Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]] • For example: • Correlated Join  Outer Join with Groupby with Cartesian

  25. Query Decorrelation for FOR • Top-down approach over XAT Tree. • Approach: • Correlated Binding (CB) • Op1[FOR(CB)[Op2[Correlated Operator[A],B]]]  Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]] • Differences: • SQL Decorrelation: Return Outer Query • XQuery Decorrelation: Return Inner Query • CO: Return both Outer/Inner Query

  26. FOR Decorrelation Example B1 B1 B2 B2 Aggregate Aggregate FOR($rate) …1 Groupby(“$ratel”, ) …2 Source(“invoice.xml”) …1 …2 B3 Select(“@rate:$itemized_call” = “$rate”) Source(“invoice.xml”) Select(“@rate:$itemized_call” = “$rate”) Cartesian …3 …3 B3 Source(“invoice.xml”) Source(“invoice.xml”)

  27. Default XML View <invoice> <row> <id> 1 </id> <account_number>555 777-3158 573 234 3</account_number> <bill_period> Jun 9 – Jun 8, 2000 </bill_period> <total>$0.35</total> </row> </invoice> <carrier> <row> <invoice_id> 1 </invoice_id> <carrier>Sprint</carrier> </row> </carrier> ... invoice carrier itemized_call

  28. User Defined XML View <invoice> <row> <id> 1 </id> <account_number>555 777-3158 573 234 3</account_number> <bill_period> Jun 9 – Jun 8, 2000 </bill_period> <total>$0.35</total> </row> </invoice> <carrier> <row> <invoice_id> 1 </invoice_id> <carrier>Sprint</carrier> </row> </carrier> <itemized_call> <row> <invoice_id>1</invoice_id> <no>1</no> … </row> … </itemized_call> • <invoice> • <account_number>555 777-3158 573 234 3</account_number> • <bill_period>Jun 9 - Jul 8, 2000</bill_period> • <carrier>Sprint</carrier> • <itemized_callno=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” /> • <itemized_callno=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” /> • <itemized_callno=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” /> • <total>$0.35</total> • </invoice>

  29. User Defined XML View Cont. Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN <invoice> <account_number>$invoice/account_number/text()</account_number> <bill_period>$invoice/bill_period/text()</bill_period> FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN <carrier>$carrier/carrier/text()</carrier> FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN <itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()> SORTBY (@no) <total>$invoice/total/text()</total> </invoice> )

  30. User Defined XML View Block Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN <invoice> <account_number>$invoice/account_number/text()</account_number> <bill_period>$invoice/bill_period/text()</bill_period> FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN <carrier>$carrier/carrier/text()</carrier> FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN <itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()> SORTBY (@no) <total>$invoice/total/text()</total> </invoice> ) B4 B5 B6

  31. XML View XAT Aggregate() Aggregate() V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/> V4 := Tagger(<invoice> <account_number>[$invoice/account_number/text()]</account_number> <bill_period>[$invoice/bill_period/text()</bill_period> …[V3]<total>[$invoice/total/text()]</total></invoice>) Navigate($itemized_call, no/text()) … Navigate($itemized_call, amount/text()) FOR($invoice/id) Select(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) Navigate($itemized_call, invoice_id) $invoice := Navigate(“/”,invoice/row ) B5 $itemized_call := Navigate(“/”, itemized_call/row) Source(“default..xml”) Source(“default.xml”)

  32. 3-Way Correlation B4 …1 FOR($invoice/id) …2 Source(“invoice.xml”) B6 B5

  33. 3-Way Decorrelation B4 …1 JOIN($invoice/id) GB($invoice/id, …) GB($invoice/id, …) B6 with Cartesian B5 with Cartesian …2 …2 Source(“default.xml”) Source(“default.xml”)

  34. View XAT After Decorrelation Groupby($invoice/id, Aggregate()) Aggregate() V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/> V4 := Tagger(<invoice> <account_number>[$invoice/account_number/text()]</account_number> <bill_period>[$invoice/bill_period/text()</bill_period> …[V3]<total>[$invoice/total/text()]</total></invoice>) Navigate($itemized_call, no/text()) … Join($invoice/id) Navigate($itemized_call, amount/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) Groupby($invoice/id…) Navigate(“$invoice”, id) B5 Navigate($itemized_call, invoice_id) Navigate(“$invoice”, id) $itemized_call := Navigate(“/”, itemized_call/row) $invoice := Navigate(“/”,invoice/row ) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”) Source(“default..xml”)

  35. View Composition • Input: • User Query XAT + User View XAT • Output: • Simplified composite XAT • Approach: • XAT Cutting: Remove un-referenced columns and operators. • Pushdown Navigation • By using the commutative rules • Cancel out the navigation operators • By using the composition rules

  36. XAT Cutting • Cut Query Blocks • User query only require itemized_call. • B5 is cut, • Invoice is cut • B4 is simplified. • B6 is simplified. • Cut Columns • User query only used itemized_call@rate.

  37. View XAT After B5 is Cut. Groupby($invoice/id, Aggregate()) Aggregate() V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/> V4 := Tagger(<invoice> <account_number>[$invoice/account_number/text()]</account_number> <bill_period>[$invoice/bill_period/text()</bill_period[V3] <total>[$invoice/total/text()]</total></invoice>) Navigate($itemized_call, no/text()) … Navigate($itemized_call, amount/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate($itemized_call, invoice_id) Navigate(“$invoice”, id) $itemized_call := Navigate(“/”, itemized_call/row) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”)

  38. View After Columns are Cut. Aggregate() V4 := Tagger(<invoice>[V3] </invoice>) Groupby($invoice/id, Aggregate()) V3 := Tagger(<itemized_call number_called=[] rate=[] /> Navigate($itemized_call, number_called/text()) Navigate($itemized_call, rate/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate($itemized_call, invoice_id) Navigate(“$invoice”, id) $itemized_call := Navigate(“/”, itemized_call/row) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”)

  39. Navigation Cancel Out • Navigation Pushdown • Based on some transformation rules. • E.g. commutative of navigation and other operators. • Navigation + Tagger Cancel Out • Composition Rules. • The cancellation result is “renaming”

  40. Query XAT Navi. Pushdown B3 V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) Select(count(“$itemized_call”)) Select(count(“$itemized_call”)) Select(“@number_called:$itemized_call” like ‘973%’) Select(“@number_called:$itemized_call” like ‘973%’) Navigate(“$itemized_call”, @number_called) Select(“@rate:$itemized_call” = “$rate”) Select(“@rate:$itemized_call” = “$rate”) Navigate(“$itemized_call”, @number_called) Navigate(“$itemized_call”, @rate) Navigate(“$itemized_call”, @rate) $itemized_call := Navigate(“/”, invoice/itemized_call) $itemized_call := Navigate(“/”, invoice/itemized_call) Source(“invoice.xml”) Source(“invoice.xml”)

  41. Navi. Tagger Cancel Out B3 …1 Aggregate() Navigate(“$itemized_call”, @number_called) V4 := Tagger(<invoice>[V3] </invoice>) Navigate(“$itemized_call”, @rate) Groupby($invoice/id, Aggregate()) $itemized_call := Navigate(“/”, invoice/itemized_call) V3 := Tagger(<itemized_call number_called=[] rate=[] /> Source(“invoice.xml”) Navigate($itemized_call, number_called/text()) Navigate($itemized_call, rate/text()) …2

  42. The Result of Cancel Out …1 $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) $itemized_call@rate := Navigate($itemized_call, rate/text()) …2

  43. Computation Pushdown • Goal: XAT  SQL operators + XML operators • Step 0: Navigation Pushdown. • Step 1: XML Default View  SQL Operators • Renaming columns • Step 2: SQL Computation Pushdown. • By commutative and composition rules. • E.g: predicates pushdown.

  44. Navigation Pushdown. $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) $itemized_call@rate := Navigate($itemized_call, rate/text()) $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”) $itemized_call@rate := Navigate($itemized_call, rate/text()) Navigate($itemized_call, invoice_id) $itemized_call := Navigate(“/”, itemized_call/row) $itemized_call := Navigate(“/”, itemized_call/row) Navigate(“$invoice”, id) Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) Source(“default..xml”)

  45. XML Default View  SQL … $Itemized_call@number_called := Navigate($itemized_call, number_called/text()) $itemized_call@rate := Navigate($itemized_call, rate/text()) … $itemized_call := Navigate(“/”, itemized_call/row) $Itemized_call@number_called := number_called Source(“default.xml”) $itemized_call@rate := rate Project(rate, number_called) Source(“itemized_call”)

  46. Computation Pushdown B3 V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) V1:=Tagger(<rate>[$rate]</rate><number_of_calls> [count($itemized_call)]</number_of_calls>) Select(count(“$itemized_call”)) Select(count(“$itemized_call”)) Select(“@number_called:$itemized_call” like ‘973%’) Select(“@rate:$itemized_call” = “$rate”) Select(“@rate:$itemized_call” = “$rate”) Select(“@number_called:$itemized_call” like ‘973%’) A SQL Block A SQL Block

  47. Result of the Transformation Tagger(<summary>[V1]</summary>) V1 := Aggregate Tagger(<rate>[rate]</rate> <number_of_calls>[count(*)]</number_of_calls>) SQL: SELECT rate, count(*) FROM itemized_call, invoice WHERE number_called LIKE ‘973%’ AND invoice.id = itemized_call.invoice_id GROUPBY rate

  48. Optimization • Efficient Publishing XML Views • Sorted Outer Union. • Special Tagger implementation • A lot More!

  49. Summary • XQuery  XAT • Query Block Identification • Query Decorrelation • View Composition • XAT Cutting • Navigation Pushdown • Navigation Cancel Out • Computation Pushdown • Navigation Pushdown • XML Default View  SQL Operators • Computation Pushdown • Optimization

More Related