1 / 49

Efficiently Publishing Relational Data as XML Documents

Efficiently Publishing Relational Data as XML Documents. University of Wisconsin-Madison/ IBM Almaden Research Center. Jayavel Shanmugasundaram. Joint work with:. Rimon Barr Michael Carey Bruce Lindsay Hamid Pirahesh Berthold Reinwald Eugene Shekita. Outline. Why? How? Which? Hence.

Download Presentation

Efficiently Publishing Relational Data as XML Documents

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Efficiently Publishing Relational Data as XML Documents University of Wisconsin-Madison/IBM Almaden Research Center Jayavel Shanmugasundaram Joint work with: Rimon BarrMichael CareyBruce LindsayHamid PiraheshBerthold ReinwaldEugene Shekita

  2. Outline • Why? • How? • Which? • Hence

  3. XML Example <department name=“Purchasing”> <emplist> <employee> John </employee> <employee> Mary </employee> </emplist> <projlist> <project> Internet </project> <project> Recycling </project> </projlist> </department>

  4. What is the big deal about XML? • Elegantly models complex, hierarchical/ graph-structured data • Domain-specific tags (unlike HTML) • Simple! • Fast emerging as dominant standard for data exchange on the WWW

  5. Why Relational Data? • Most business data stored in relational databases • Unlikely to change in the near future • Scalability, Reliability, Performance, Tools • Need efficient means to publish relational data as XML documents

  6. Usage Scenario Application/User Query to produce XML Documents XML Result (processed or displayed in browser) The Internet Existing Database System (RDBMS)

  7. Project Employee Department DeptId ProjName ProjId DeptId EmpName Salary EmpId DeptId DeptName 10 Internet 888 10 John 50K 101 Purchasing 795 10 Recycling 10 91 10 Mary 70K Example Relational Schema

  8. XML Representation <department name=“Purchasing”> <emplist> <employee> John </employee> <employee> Mary </employee> </emplist> <projlist> <project> Internet </project> <project> Recycling </project> </projlist> </department>

  9. Main Issues • Relational data is flat, XML is a tagged graph • How do we specify translation from flat model to a graph model? • A query language to map from relations to XML • How do we transform flat representations to tagged nested representations? • Efficient implementation strategies

  10. Outline • Why? • How? • Language? • Mechanism? • Which? • Hence

  11. Transformation Languages • Two obvious choices: • XML Query Language • SQL

  12. Project Employee Department DeptId ProjName ProjId DeptId EmpName Salary EmpId DeptId DeptName 10 Internet 888 10 John 50K 101 Purchasing 795 10 Recycling 10 91 10 Mary 70K Example Relational Schema

  13. XMLQL: Default XML View <defaultview> <department> <row> <deptid>10</> <deptname>Purchasing</> </row> </department> <employee> <row> <empid>101</> <deptid>10</> <empname>John</> <salary>50K</> </row> <row> <empid>91</> <deptid>10</> <empname>Mary</> <salary>70K</> </row> </employee> <project> <row> <projid>888</> <deptid>10</> <projname>Internet</> </row> <row> <projid>795</> <deptid>10</> <projname>Recycling</> </row> </project> </defaultview>

  14. XMLQL: Query Over Default View WHERE <defaultview.department.row> <deptid> $did </> <deptname> $dname </> </> IN DefaultView CONSTRUCT <department name=$dname> <emplist> </emplist> <projlist> </projlist> </> { WHERE <defaultview.employee.row> <deptid> $did </> <empname> $ename </> </> IN DefaultView CONSTRUCT <employee> $ename </> } { WHERE <defaultview.project.row> <deptid> $did </> <projname> $pname </> </> IN DefaultView CONSTRUCT <project> $pname </> }

  15. XMLQL: Query Result <department name=“Purchasing”> <emplist> <employee> John </employee> <employee> Mary </employee> </emplist> <projlist> <project> Internet </project> <project> Recycling </project> </projlist> </department>

  16. XMLQL: Pros and Cons • Pros: • Natural for XML users • Infrastructure to build hierarchies of XML views • One query language for XML and relational data • Cons: • Ignores existing API (JDBC), tools, support • Need to mature new query language (aggregates etc.)

  17. SQL: Key Ideas • Sub-queries to specify nesting • Scalar functions to specify tags/attributes • XML Constructors • Aggregate functions to group child elements

  18. SQL: Query to publish XML Select DEPT(d.name, <subquery to produce emplist>, <subquery to produce projlist> )From Department d

  19. SQL: XML Constructor Define XML Constructor DEPT(dname: varchar(20), emplist: xml, projlist: xml) As ( <department name=$dname> <emplist> $emplist </emplist> <projlist> $projlist </projlist></department> )

  20. SQL: Query to publish XML Select DEPT(d.name, <subquery to produce emplist>, <subquery to produce projlist> )From Department d

  21. SQL: Query to publish XML Select DEPT(d.name, (Select XMLAGG(EMP(e.name)) From Employee e Where e.deptno = d.deptno), <subquery to produce projlist> )From Department d

  22. SQL: XML Constructor Define XML Constructor EMP(ename: varchar(20)) As ( <employee> <name> $ename </name></employee> )

  23. SQL: Query to publish XML Select DEPT(d.name, (Select XMLAGG(EMP(e.name)) From Employee e Where e.deptno = d.deptno), <subquery to produce projlist> )From Department d

  24. SQL: Query to publish XML Select DEPT(d.name, (Select XMLAGG(EMP(e.name)) From Employee e Where e.deptno = d.deptno), (Select XMLAGG(PROJ(p.name)) From Project p Where p.deptno = d.deptno) )From Department d

  25. Query Result (<XML Result>) <department name=“Purchasing”> <emplist> <employee> John </employee> <employee> Mary </employee> </emplist> <projlist> <project> Internet </project> <project> Recycling </project> </projlist> </department>

  26. SQL: Pros and Cons • Pros: • Reuses SQL infrastructure/API • Natural for SQL users • Efficient execution inside relational engine • Cons: • Limited support for XML View Composition

  27. Outline • Why? • How? • Language? • Mechanism? • Which? • Hence

  28. Relations to XML: Issues • Two main differences: • Nesting (structuring) • Tagging • Space of alternatives: Early Tagging Late Tagging Outside Engine Outside Engine Early Structuring Inside Engine Inside Engine Outside Engine Late Structuring Inside Engine

  29. Early Tagging, Early Structuring, Outside Engine Stored Procedure Approach • Issue queries for sub-structures and tag them • Could be a Stored Procedure (10, Purchasing) DBMS Engine Department (Internet) (Recycling) (John) (Mary) Employee Project • Problem: Too many SQL queries!

  30. Early Tagging, Early Structuring, Inside Engine Correlated CLOB Approach Select DEPT(d.name, (Select XMLAGG(EMP(e.name)) From Employee e Where e.deptno = d.deptno), (Select XMLAGG(PROJ(p.name)) From Project p Where p.deptno = d.deptno) )From Department d • Problem: Correlated execution of sub-queries

  31. Early Tagging, Early Structuring, Inside Engine De-Correlated CLOB Approach With EmpStruct(deptname, empinfo) AS ( Select d.deptname, XMLAGG(EMP(employee, e.empname)) From department d left join employee e on d.deptid = e.deptid Group By d.deptname) With ProjStruct (deptname, projinfo) AS ( Select d.deptname, XMLAGG(PROJ(employee, p.projname)) From department d left join project p on d.deptid = e.deptid Group By d.deptname) Select DEPT(name, d1.empinfo, d2.projinfo)) From EmpStruct d1 full join ProjStruct d2 on d1.deptname = d2.deptname • Problem: CLOBs during processing

  32. Late Tagging, Late Structuring • XML document content produced without structure (in arbitrary order) • Tagger enforces order as final step Result XML Document Tagging Unstructured content Relational QueryProcessing

  33. (10, John) (10, Mary) (Purchasing, John, Internet) (Purchasing, John, Recycling) (Purchasing, Mary, Internet) (Purchasing, Mary, Recycling) (10, Purchasing) (10, Internet) (10, Recycling) Late Tagging, Late Structuring Redundant Relation Approach • How do we represent nested content as relations? • Problem: Large relation due to data redundancy!

  34. Employee Department Department Employee Project Project Union Late Tagging, Late Structuring Outer Union Approach • How do we represent nested content as relations? (Purchasing, null, Internet , 0) (Purchasing, null, Recycling, 0) (Purchasing, John, null , 1) (Purchasing, Mary, null , 1) (Purchasing, John) (Purchasing, Mary) (Purchasing, Internet) (Purchasing, Recycling) (10, Purchasing) • Problem: Wide tuples (having many columns)

  35. Late Tagging, Late Structuring Hash-based Tagger • Results not structured early • In arbitrary order • Tagger has to enforce order during tagging • Hash-based approach • Inside/Outside engine tagger • Problem: Requires memory for entire document

  36. Late Tagging, Early Structuring • Structured XML document content produced • Tagger just adds tags (constant space) Result XML Document Tagging Structured content Relational QueryProcessing

  37. A Late Tagging, Early Structuring Sorted Outer Union Approach A B n D nn n A B n n E n n B C A n C n n F n D E F G A n C n nn G Sort By: Aid, Bid, Cid • Problem: Only partial ordering required

  38. Late Tagging, Late Structuring Constant Space Tagger • Detects changes in XML document hierarchy • Adds appropriate opening/closing tags • Inside/outside engine

  39. Classification of Alternatives Early Tagging Late Tagging Inside Engine Inside Engine De-Correlated CLOB Sorted Outer Union(Tagging inside) Correlated CLOB Outside Engine Outside Engine EarlyStructuring Sorted Outer Union(Tagging outside) Stored Procedure Inside Engine Unsorted Outer Union(Tagging inside) Outside Engine LateStructuring Unsorted Outer Union(Tagging outside)

  40. Outline • Why? • How? • Language? • Mechanism? • Which? • Hence

  41. Performance Evaluation Database Size Query Depth Query Fan Out

  42. Inside vs. Outside Engine

  43. Where Does Time Go?

  44. Effect of Query Fan Out

  45. Effect of Query Depth

  46. Memory Considerations • Sorted outer union more robust • Relational sort highly scalable!

  47. Outline • Why? • How? • Language? • Mechanism? • Which? • Hence

  48. Conclusion • Publishing XML from relational sources important in Internet • Language alternatives: • SQL based • XML query language based • Implementation Alternatives • Inside engine >> Outside engine • Unsorted Outer Union : sufficient main memory • Sorted Outer Union : otherwise

More Related