1 / 36

SilkRoute: A framework for publishing relational data in XML

This paper presents a framework for publishing and querying XML data using a relational engine, enabling generalized mappings from relations to XML with selective and efficient materialization of XML fragments.

hotte
Download Presentation

SilkRoute: A framework for publishing relational data in XML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SilkRoute: A framework for publishing relational data in XML In ACM Transactions on Databases, 2002 By Mary Fernández, AT&T Labs - Research Dan Suciu, Univ. of Washington Yada Kadiyska, Univ. of Washington Atsuyuki Morishima, Univ. of Tsukuba Wang-Chiew Tan, Univ. of Pennsylvania Presented by Yaniv Kaplan Original slides by M. Fernández, D.Suciu and W.C.Tan

  2. ? ? Problem Supplier Supplier • XML is widely used as an exchange format (regardless of how data is actually stored) • Most data resides in RDBMS • Goal: Publish and query XML data using the relational engine Common XML Illusion Broker

  3. Requirements • Generality: support general mappings from relations to XML • Selectivity: only the required fragments of the XML view should be materialized • Efficiency: results should be quickly obtained

  4. Supplier Example Shared XML Schema element supplier { element company, element product* } element product { element name, element category, element description, element retail, element sale?, element report* } elements company, name, category { string } elements retail, sale { float } element report { attribute code { string }, string } Supplier’s Database CREATE TABLE Clothing( pid CHAR(10) PRIMARY KEY, item VARCHAR(30), category VARCHAR(20), description VARCHAR(200), price REAL, cost REAL) CREATE TABLE Problems( pid CHAR(10), code CHAR(10), comments VARCHAR(200)) CREATE TABLE Discount( pid CHAR(10) PRIMARY KEY, item VARCHAR(30), discount REAL)

  5. Canonical XML View of Relational Data R(A,B,C) = {(a1,b1,c1), (a2,b2,c2), …, (an,bn,cn)} <R> <Tuple> <A>a1</A><B>b1</B><C>c1</C> </Tuple> … <Tuple> <A>an</A><B>bn</B><C>cn</C> </Tuple> </R>

  6. <Clothing> <Tuple> <pid>001</pid> <item>Suede jacket</item> <category>outerwear</category> <description>Hip length</description> <price>300.00</price> <cost>175.00</cost> </Tuple> <Tuple> <pid>002</pid> <item>Rain boots</item> <category>outerwear</category> <description>Ankle height</description> <price>45.00</price> <cost>19.99</cost> </Tuple> </Clothing> Canonical XML View of Supplier Example Clothing(pid, item, category, description, price, cost) : { <001, “Suede Jacket”, “outerwear”,”Hip length”, 300.00, 175.00>, <002, “Rain boots”, “outerwear”, “Ankle height”, 45.00, 19.99> }

  7. Public XML View/Public Query • Defined by a XQuery over canonical XML view of relational data • Client/Application sees this virtual XML document • Written by DB administrator

  8. Input (relational) schema S: Clothing(pid, item, category, description, price, cost) Discount(pid, item, discount) Problems(pid, code, comments) <supplier> <company>Acme Clothing</company> { FOR $c IN $CanonicalView/Clothing/Tuple WHERE data($c/category) = “outerwear” RETURN <product> <name>{ data($c/item) } </name> <category>{ data($c/category) }</category> <description>{ data($c/description) }</description> <retail>{ data($c/price) } </retail> { FOR $d IN $CanonicalView/Discount/Tuple WHERE $d/pid = $c/pid RETURN <sale> { data($c/price)*data($d/discount) } </sale> } { FOR $p IN $CanonicalView/Problems/Tuple WHERE $p/pid = $c/pid RETURN <reportcode=“{ data($p/code) }”>{ $p/comments }</report> } </product> } </supplier> QP: Public Query Example Output (XML)Schema: Supplier company product* name category description retail sale? report*

  9. Public View of Supplier Example Problems(pid, code, comments) : { <002, “defective”, “leaks in heel”>, <002, “defective”, “heel separates”> } Discount(pid, item, discount) : { <001, “Suede Jacket”, 0.70>, <002, “Rain boots”, 0.50> } <supplier> <company>ACME Clothing</company> <product> <name>Suede jacket</name> <category>outerwear</category> <description>Hip length</description> <retail>300.00</retail> <sale>210.00</sale> </product> <product> <name>Rain boots</name> <category>outerwear</category> <description>Ankle height</description> <retail>45.00</retail> <sale>22.50</sale> <report code=“defective”>leaks in heel</sale> <report code=“defective”>heel separates</sale> </product> </supplier>

  10. Application Query • Written by a client or application • A XQuery over the public XML view

  11. QA: Application Query Example • Find all products with sale price less than ½ of its retail price FOR $s in $PublicView/supplier RETURN <supplier> { <name>{ data($PublicView/supplier/company }></name> <discounted> { FOR $p in $s/product WHERE data($p/sale) < 0.5*data($p/retail) RETURN <product>{ data($p/name) }</product> } </discounted> } </supplier>

  12. QA+QP (Intuition only) • Goal: Compose QA+QP to get an XQuery expression that takes as input only the canonical view • 1st option: replace every instance of $PublicView in QA with QP • 2nd option: next slide…

  13. QA+QP (Cont.) <supplier> <name>Acme Clothing</name> <discounted> { for $c in $CanonicalView/Clothing/Tuple, $d in $CanonicalView/Discount/Tuple where data($c/category) = “outerware”, data($c/pid) = data($d/pid) data($c/price) * data($d/discount) < 0.5 * data($c/price) return <product>{ data($c/item) }</product> } </discounted> </supplier>

  14. View Forest/Tree • Internal abstraction of how an XML document can be materialized using SQL queries over relational tables • Also an abstraction of XQuery to SQL translation • Nodes correspond to XML hierarchy; internal nodes – elements/attributes, leafs – element/attribute type • Each node “holds” an SQL query to retrieve all elements of node type • The trick: connect elements in correct parent-sibling relation • Compositional

  15. N1 <supplier> N1.1 N1.2 <company> <product> N1.1.1 N1.2.2 N1.2.1 N1.2.3 N1.2.4 N1.2.5 N1.2.6 <sale> <report> <category> <name> <description> <retail> string N1.2.1.1 N1.2.2.1 N1.2.3.1 N1.2.4.1 N1.2.5.1 N1.2.6.1 N1.2.6.2 string string string float @code float string N1.2.6.1.1 string CN1 = SELECT * FROM ( ) CN1.2 = SELECT * FROM Clothing c WHERE c.category = "outerwear" CN1.2 = SELECT * FROM Clothing c, Problems p WHERE c.category = "outerwear" AND p.pid = c.pid

  16. View Forest of a Canonical Mapping CN1: SELECT * FROM () CN1.1: SELECT * FROM Clothing c CN1.1.2: SELECT * FROM Clothing c CN1.1.2.1: SELECT c.item FROM Clothing c

  17. Main Modules • View Composer: compose QA± Qp to get a query that refers only to the canonical XML documents, i.e. relations • Query Planner: creates one or more SQL queries • XML Generator: massages tabular results into XML format

  18. SilkRoute Architecture

  19. View Forest of QA? • If we can obtain the view forest of QA, we can compute the XML result of QA using only SQL queries • Can we always translate QA into a view forest?

  20. XQueryCore • A subset of XQuery Core Language given in W3C XQuery Formal semantics • No recursive functions, no operators that depend on “orderness” of XML. E.g., n << m (returns true if node n precedes node m in document order) • Every XQueryCore expression can be translated into a view forest • QA± Qp can be translated into a view forest

  21. Property of View Forest Composition Algorithm (VFCA) • Let Q be a XQueryCore expression defined over the view forests X1, …, Xn • The output of VFCA(Q, X1, …, Xn) is a view forest VQ such that for all relation instances I over the relation schema S VQ(I) = Q(X1(I), …, Xn(I)) • Note that each Xi is either a canonical view forest or a view forest of a XQueryCore expression defined directly or indirectly over the canonical view forest of S

  22. Computing VFCA(Q, X1, …, Xn) • Refer to paper for details…

  23. Query Planner • Takes a view forest and returns one or more SQL queries to send to the relational engine • Idea: partition the view forest • Extreme strategies: Fully partitioned strategy vs. unified strategy • Many SQL queries and connect calls to RDBMS vs. one big SQL query and only one connect call to RDBMS • Optimal strategy usually lies somewhere in between

  24. Steps in Query Planning

  25. Query Planning Example FOR $c IN $CanonicalView/Clothing/Tuple RETURN <product> { FOR $d IN $CanonicalView/Discount/Tuple WHERE $d/pid = $c/pid RETURN <sale> { data($c/price)*data($d/discount) } </sale> } { FOR $p IN $CanonicalView/Problems/Tuple WHERE $p/pid = $c/pid RETURN <report>{ $p/comments }</report> } </product> CN1.1.1: SELECT (d.discount*c.price) as sale FROM Clothing c, Discount d WHERE c.pid=d.pid CN1.2.1: SELECT p.comments FROM Clothing c, Problems p WHERE c.pid=p.pid

  26. Step I. Identify keys • Identify and add keys to every SELECT clause in view forest SELECT A FROM E1 x1, …, Ek xk  SELECT A , key(E1) , … , key(Ek) FROM E1 x1, …, Ek xk • Keys are used to merge data from multiple tuple streams efficiently • If tuple streams are sorted on keys, the XML result can be constructed by making one pass through the tuple streams

  27. Input (relational) schema S: Clothing(pid, item, category, description, price, cost) Discount(pid, item, discount) Problems(pid, code, comments) Identify Keys • Add keys c.pid, p.pid • SELECT c.pid, p.pid, p.comments • FROM Clothing c, Problems p • WHERE c.pid=p.pid • Add keys c.pid and d.pid • SELECT c.pid, d.pid, (d.discount*c.price) as sale • FROM Clothing c, Discount d • WHERE c.pid=d.pid

  28. Step II. Partition the View Forest • Exponentially many plans in general • Let m be the number of edges in a view tree. Number of possible partitions = 2m • Step III: Generate a SQL query for each partition: Connect elements in correct parent-sibling relation using joins: OJP (Outer-Join Plan) of a partition = SQL query of root node LEFT OUTER-JOIN (OJP(n1) UNION … UNION OJP(nk)) where n_1, …, n_k are children nodes root node

  29. Step III: OJP of View Forest (a) SELECT 1 AS L1, c.pid, L2, (c.price*Q.discount) as Sale, Q.comments FROM Clothing c LEFT OUTER JOIN ( ( SELECT 1 AS L2 d.pid AS pid, d.discount AS discount, NULL AS comments FROM Discount d) UNION ( SELECT 2 AS L2, p.pid AS pid, NULL AS discount, p.comments AS comments FROM Problems p) ) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, sale, Q.comments Ordering information: <sale> comes before <report>

  30. Step IV: Generate XML Output These tuples generate information for the same <product> element. Step IV: XML Generation <product> <sale>10</sale> <report>fits poorly</report> <report>button missing</report> </product>

  31. Corresponding XML Output <product> <sale>10</sale> <report>fits poorly</report> <report>button missing</report> </product> <product> <sale>120</sale> </product> <product/> <product> <sale>56</sale> <report>zipper jeans</report> <product>

  32. Step III: OJP of view forest (b) For <product>-<sale> edge: SELECT 1 AS L1, c.pid, L2, (c.price*Q.discount) as sale FROM Clothing c LEFT OUTER JOIN ( ( SELECT 1 as L2, d.pid, d.discount FROM Discount d) ) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, Q.pid, Q.sale For <report> edge: SELECT 1 AS L1, 2 AS L2, c.pid, p.comment FROM Clothing c, Problems p WHERE c.pid = p.pid ORDER BY L1, c.pid, L2, p.comment

  33. Step IV: Generate XML Output

  34. Step IV: Generate XML Output XML Generator makes one pass through the two tuple streams: <product> <sale>10</sale> <report>fits poorly</report> <report>button missing</report> </product>

  35. Partition Greedily • Cost of a query Q, cost(Q) = a*evaluationcost(Q) + b*datasize(Q) • Cost of an edge E, cost(E) = cost(Qc) – (cost(Q1) + cost(Q2)) Q1, Q2 = queries at parent and child nodes of E resp. Qc = combined query • Evaluationcost, datasize are estimates given by RDBMS • Pick an edge(E) if cost(E) is less than some threshold

  36. We can go on and on… • VFCA • View Forest/Tree details • View Tree reductions • Other alternatives to OJP • Experimental results • Other available systems • SilkRoute is available at silkroute.sourceforge.net

More Related