1 / 36

Storing and Querying XML Documents Using Relational Databases

Storing and Querying XML Documents Using Relational Databases. Mustafa Atay matay@wayne.edu Wayne State University Detroit, MI February 28, 2006. Outline of Talk. Schema Mapping Data Mapping Query Mapping Reconstruction Conclusions. What is XML? HTML vs. XML Problem Statement

cadee
Download Presentation

Storing and Querying XML Documents Using Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Storing and Querying XML Documents Using Relational Databases Mustafa Atay matay@wayne.edu Wayne State University Detroit, MI February 28, 2006

  2. Outline of Talk • Schema Mapping • Data Mapping • Query Mapping • Reconstruction • Conclusions • What is XML? • HTML vs. XML • Problem Statement • Schema-based Relational Approach Wayne State University

  3. What is XML? • eXtensible Markup Language • primarily created by Jon Bosak of Sun Microsystems • officially recommended by W3C (World Wide Web Consortium) since 1998 • a simplified form of SGML (Standard Generalized Markup Language) Wayne State University

  4. What is XML? (cont.) • a meta language • allows you to create and format your own document markups • separates content from format • a method for putting structured data into a text file; these files are • easy to read • unambiguous • extensible • platform-independent Wayne State University

  5. <html> <body> <table border=1> <tr> <td><b><font color=red>First Value</td> <td><b><font type="Arial" color=blue>Second Value</td> <tr> <td><b>87</td> <td><b>99</td> <tr> <td><b>45</td> <td><b>67</td> <tr> <td><b>86</td> <td><b>84</td> </table> </body> </html> HTML vs. XML Wayne State University

  6. <?xml version="1.0"?> <!DOCTYPE house SYSTEM "house.dtd"> <house year=“1980”> <entrance>front door</entrance> <entrance>back door</entrance> <rooms> <living_room> <window>double hung1</window> <window>double hung2</window> <leads_to>kitchen</leads_to> <leads_to>hallway</leads_to> </living_room> <kitchen> <window>double hung1</window> <leads_to>living_room</leads_to> </kitchen> </rooms> </house> HTML vs. XML (cont.) Wayne State University

  7. HTML vs. XML (cont.) • HTML - uses tags and attributes - content and formatting can be placed together - tags and attributes are pre-determined and rigid - describes what a document looks like - doesn’t allow user to define content rules • XML - uses tags and attributes - content and format are separate; formatting is contained in a stylesheet - allows user to create his/her own set of tags and attributes - describes the information in a document - allows user to define content rules (DTD) Wayne State University

  8. Why Storing and Querying XML? • XML has emerged as the standard for representing and exchanging data on the World Wide Web. • The increasing amount of XML documents requires the need to store and query XML data efficiently. Wayne State University

  9. A Sample XML Dataset • European Bioinformatics Institute Databases • ftp://ftp.ebi.ac.uk/pub/databases/interpro/ • match.xml ~ 700MB Wayne State University

  10. Approaches of Storing and Querying XML Documents • using Native XML repositories • Software AG’s Tamino • eXcelon’s XIS • using XML-enabled commercial database systems • Oracle XML DB • DB2 XML Extender • Microsoft SQLXML • using RDBMS/ORDBMS to store and query XML documents (Relational Approach) Wayne State University

  11. Why to store XML in RDBMS? • to get advantage of mature RDBMS technology in efficient storage, indexing and optimization techniques • to enable companies or researchers to store and query XML data using their existing RDBMS system • to enable processing of transformed XML data using both XML and relational queries from a middleware environment Wayne State University

  12. Relational Approach • XML-Publishing • XPERANTO - Carey et al., WebDB’00 • SilkRoute – M. Fernandez et al., WWW’00 • Schema-less approach • Edge – D. Florescu et al., IEEE DEB’99 • STORED – A. Deutsch et al., SIGMOD’99 • Schema-based approach • Basic, Shared and Hybrid inlining – J. Shanmugasundaram et al., VLDB’99 • ODTDMap – M. Atay et al., IS’06 Wayne State University

  13. Schema-based Relational Approach • Schema Mapping • XML data model is mapped into the relational model • Data Mapping • XML documents are shredded and composed into tuples to be inserted into the relational database • Query Mapping • XML queries are translated into SQL queries • Reverse Data Mapping (Reconstruction) • Original XML document is recovered from the RDBMS Wayne State University

  14. Schema Mapping • Schema mapping algorithm ODTDMap contains the following steps: • Simplifying DTDs • Creating and inlining DTD graphs • Generating relational schema and the schema mapping file. Wayne State University

  15. Sample DTD – univ.dtd <!DOCTYPE univ [ <!ELEMENT univ (colleges, schools?) > <!ATTLIST univ uName CDATA #REQUIRED> <!ELEMENT colleges (college+) > <!ELEMENT college (dep*) > <!ATTLIST college cName CDATA #REQUIRED> <!ELEMENT schools (school+) > <!ELEMENT school (dep*) > <!ATTLIST school sName CDATA #REQUIRED> <!ELEMENT dep (tel?, fax?, website?) > <!ATTLIST dep dName CDATA #REQUIRED> <!ELEMENT tel (#PCDATA) > <!ELEMENT fax (#PCDATA) > <!ELEMENT website (#PCDATA) > ]> Wayne State University

  16. Creating DTD Graph <!DOCTYPE univ [ <!ELEMENT univ (colleges, schools?) > <!ATTLIST univ uName CDATA #REQUIRED> <!ELEMENT colleges (college+) > <!ELEMENT college (dep*) > <!ATTLIST college cName CDATA #REQUIRED> <!ELEMENT schools (school+) > <!ELEMENT school (dep*) > <!ATTLIST school sName CDATA #REQUIRED> <!ELEMENT dep (tel?, fax?, website?) > <!ATTLIST dep dName CDATA #REQUIRED> <!ELEMENT tel (#PCDATA) > <!ELEMENT fax (#PCDATA) > <!ELEMENT website (#PCDATA) > ]> Wayne State University

  17. Inlining DTD Graph Wayne State University

  18. Generating Relational Schema Wayne State University

  19. Data Mapping • Challenging issues of data mapping • Should respect to schema mapping • Varying document structure • Scalability • We introduced two efficient linear algorithms • OXInsert • main memory data mapping algorithm • DOM-based • SDM • streaming data mapping algorithm • SAX-based Wayne State University

  20. Sample XML document - univ.xml <?xml version=”1.0” ?> <!DOCTYPE univ SYSTEM “univ.dtd”> <univ uname=”WSU”> <colleges> <college cname=”Science”> <dep dname=”CS”> <website>www.cs.wayne.edu</website> </dep> </college> <college cname=”Engineering”> <dep dname=”ECE”> <tel>313-5773920</tel> </dep> <dep dname=”IE”></dep> </college> <college cname=”Pharmacy”></college> </colleges> </univ> Wayne State University

  21. XMLTree for univ.xml <?xml version=”1.0” ?> <!DOCTYPE univ SYSTEM “univ.dtd”> 1 <univ uname=”WSU”> 3 <colleges> 4 <college cname=”Science”> 6 <dep dname=”CS”> 8 <website>www.cs.wayne.edu</website> </dep> </college> 9 <college cname=”Engineering”> 11 <dep dname=”ECE”> 13 <tel>313-5773920</tel> </dep> 14 <dep dname=”IE”></dep> </college> 16 <college cname=”Pharmacy”></college> </colleges> </univ> Wayne State University

  22. XMLTree for univ.xml <?xml version=”1.0” ?> <!DOCTYPE univ SYSTEM “univ.dtd”> 1<univ uname=”WSU”> 3<colleges> 4 <college cname=”Science”> 6 <dep dname=”CS”> 8 <website>www.cs.wayne.edu</website> </dep> </college> 9 <college cname=”Engineering”> 11 <dep dname=”ECE”> 13 <tel>313-5773920</tel> </dep> 14 <dep dname=”IE”></dep> </college> 15 <college cname=”Pharmacy”></college> </colleges> </univ> Wayne State University

  23. Database state after univ.xml is mapped Wayne State University

  24. Performance of OXInsert and SDM Wayne State University

  25. Data Mapping Across Different Schema Mappings Wayne State University

  26. Query Mapping • We translate simple XPath expressions to SQL • XPath is the core of XML query languages. • We identified 3 algorithms for query mapping • Naïve • Cluster • Containment Join Wayne State University

  27. Naïve • Takes an XPath expression creates a nested SQL query comprised of SQL queries for each XPath step • e.g. • XPath: /univ /colleges /college /dep[@dName=‘CS’] • SQL: Select dep.ID from dep where dep.dName=‘CS’ and dep.parentID in (Select college.ID from college where college.parentID in (Select colleges.ID from univ where colleges.parentID in (Select univ.ID from univ where univ.parentID=0) ) ) Wayne State University

  28. Cluster • A cluster is a sequence of consecutive elements stored in the same table • Takes an XPath expression and creates a nested SQL query comprised of SQL queries for each XPath cluster • e.g. • XPath: /univ /colleges/college/dep[@dName=‘CS’] • SQL: Select dep.ID from dep where dep.dName=‘CS’ and dep.parentID in (Select college.ID from college where college.parentID in (Select colleges.ID from univ) ) Wayne State University

  29. Containment Join • Relies on the well-formedness of XML documents • Requires the pre-computation of max. ID of descendants of each element instance (endID) • Facilitates efficient evaluation of recursive XML queries • e.g. • XPath: /univ /colleges /college /dep[@dName=‘CS’] • SQL: Select dep.ID from dep, college, univ where dep.dName=‘CS’ and dep.ID>=college.ID and dep.ID<=college.endID and college.parentID=univ.colleges.ID Wayne State University

  30. A Recursive Query Example • XPath: /univ //dep • Sub queries of the recursive query • /univ /colleges /college /dep • /univ /schools /school /dep • Naïve: 8 SQL queries + 6 joins + 1 union • Cluster: 6 SQL queries + 4 joins + 1 union • Containment Join: 1 SQL query + 1 join Select dep.ID from dep, univ where dep.ID>=univ.ID and dep.ID<=univ.endID Wayne State University

  31. Reconstruction • In query mapping stage, the elements selected by an XML query can be returned in one of the following two modes: • Select mode: returns IDs • Reconstruct mode: returns XML subtrees • Algorithm Reconstruct reconstructs the XML subtree rooted at a given element • The importance of Reconstruction lies in two aspects: • XML subtree reconstruction has great impact on the query response time in reconstruct mode. • It demonstrates that our mapping scheme is lossless Wayne State University

  32. Conclusions • Schema mapping [1,5] • lossless and order preserving • processing set-valued XML attributes • simple processing of recursion • Data mapping [1,3] • We described the first linear-time schema-based data mapping algorithms • We justified their effectiveness on different schema mapping algorithms Wayne State University

  33. Conclusions (cont.) • Query mapping • We identified 3 algorithms • Our CJ algorithm outperforms the only published recursive query mapping algorithm by Krishnamurthy et al., IEEE ICDE’04 • Reconstruction [2] • We introduced an efficient reconstruction algorithm • It can be used in relational schema-based mapping unlike its rivals used in XML-publishing Wayne State University

  34. Future Work • Extending the schema mapping to XML Schema • Extending the query mapping to XQuery • Introducing DTD/Schema constraints to the proposed mapping scheme • Incorporating access control methods to the proposed mapping scheme Wayne State University

  35. Acknowledgements • Dr. Shiyong Lu • Dr. Farshad Fotouhi • Artem Chebotko • Dapeng Liu • Yezhou Sun Wayne State University

  36. Publications • Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, Farshad Fotouhi, "Efficient Schema-based XML-to-Relational Data Mapping", International Journal of Information Systems, 2006. (to appear) • Artem Chebotko, Dapeng Liu, Mustafa Atay, Shiyong Lu and Farshad Fotouhi, “Reconstructing XML Subtrees from Relational Storage of XML Documents”, in Proc. of the 2nd International Workshop on XML Schema and Data Management (XSDM’05), in conjunction with ICDE’2005, Tokyo, Japan, April, 2005 • Mustafa Atay, Yezhou Sun, Dapeng Liu, Shiyong Lu and Farshad Fotouhi, “Mapping XML Data to Relational Data: DOM-based Approach”, in Proc. of the 8th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA’2004). Kauai, Hawaii, USA. August, 2004. • Shiyong Lu, Yezhou Sun, Mustafa Atay, and Farshad Fotouhi, “On the consistency of XML DTDs”,International Journal of Data and Knowledge Engineering, 2004. • Shiyong Lu, Yezhou. Sun, Mustafa Atay, and Farshad Fotouhi, “A new inlining algorithm for mapping XML DTDs to relational schemas”, In Proc. of the First International Workshop on XML Schema and Data Management, in conjuction with the 22nd ACM International Conference on Conceptual Modeling (ER2003), Chicago, Illinois, USA, October 2003. • Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "A Sufficient and Necessary Condition for the Consistency of XML DTDs", in Proc. of  the First International Workshop on XML Schema and Data Management, in conjunction with the 22nd ACM International Conference on Conceptual Modeling (ER'2003), Chicago, Illinois, USA, October, 2003. Wayne State University

More Related