
XML and Databases

Ronald Bourret, rpbourret@rpbourret.com, http://www.rpbourret.com


Presentation Transcript


  1. XML and Databases
     Ronald Bourret
     rpbourret@rpbourret.com
     http://www.rpbourret.com

  2. Overview
     • Is XML a Database?
     • Why Use XML with Databases?
     • Data vs. Documents
     • Storing and Retrieving Data
     • Storing and Retrieving Documents

  3. Is XML a Database?

  4. Is XML a database?
     • This is really two questions
       • Is an XML document a database?
       • Are XML and its surrounding technologies a database management system (DBMS)?

  5. Is an XML document a database?
     • Yes, it is a collection of data
     • Pros
       • Self-describing
       • Portable (Unicode)
       • Can store directed graphs
     • Cons
       • Slow access
       • Verbose

  6. Are XML and surrounding technologies a DBMS?
     • Yes, they have:
       • Data storage (XML documents)
       • Schemas (DTDs, XML Schemas, RELAX, etc.)
       • Query languages (XPath, XQuery, XQL, etc.)
       • APIs (SAX, DOM)

  7. Are XML and surrounding technologies a DBMS? (cont.)
     • No, they don’t have:
       • Separation of logical and physical data
       • Efficient storage
       • Indexes
       • Transactions
       • Multi-user access
       • Security
       • ...

  8. Using XML as a database
     • Good for small, single-user databases
       • .ini files
       • Simple address book
       • List of browser bookmarks
       • Catalog of MP3s stolen with the help of Napster
     • Almost useless for large or multi-user databases
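As a rough illustration of the "small, single-user database" case, the sketch below treats an XML document as an address book and queries it with Python's standard `xml.etree.ElementTree`. The `AddressBook`, `Entry`, `Name`, and `City` element names and the `names_in_city` helper are hypothetical and not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; the element names are illustrative only.
ADDRESS_BOOK = """
<AddressBook>
  <Entry><Name>Ann</Name><City>Chicago</City></Entry>
  <Entry><Name>Bob</Name><City>Boston</City></Entry>
  <Entry><Name>Cal</Name><City>Chicago</City></Entry>
</AddressBook>
"""

def names_in_city(xml_text: str, city: str) -> list[str]:
    """Return the names of all entries whose <City> equals `city`."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset; [City='...'] matches child text.
    return [entry.findtext("Name") for entry in root.findall(f"./Entry[City='{city}']")]

print(names_in_city(ADDRESS_BOOK, "Chicago"))
```

Every call parses and scans the whole document, with no index and no concurrency control. That is tolerable for a few hundred entries and matches the slide's point about large or multi-user databases.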

  9. Why Use XML with Databases?

  10. Why use XML with databases?
      • Expose legacy data as XML
      • Transfer data between databases
      • Integrate data from a variety of sources
      • Store semi-structured data
      • Queue e-commerce messages
      • Manage and query large document collections

  11. Data vs. Documents

  12. Data vs. documents
      • Are you storing documents or the data in them?
          <Address>
            <Street>123 Main St.</Street>
            <City>Chicago</City>
            <State>IL</State>
            <PostCode>60609</PostCode>
            <Country>USA</Country>
          </Address>
        Legend (the color highlighting of the original slide is not preserved in this transcript): Yellow = Data; White + Yellow = Document
      • Helps determine the system you need
      • Look at your XML documents to decide

  13. Data-centric documents
      • Use XML primarily as a data transport
      • Designed for machine consumption
      • Sales orders, scientific data, dynamic Web pages
      • Characteristics
        • Regular structure
        • Fine-grained data
        • Little or no mixed content
        • Sibling order not significant

  14. Example: Sales order
          <Order>
            <Number>1234</Number>
            <Customer>Gallagher Industries</Customer>
            <Date>29.10.00</Date>
            <Item Number="1">
              <Part>A-10</Part>
              <Quantity>12</Quantity>
              <Price>10.95</Price>
            </Item>
            <Item Number="2">
              <Part>B-43</Part>
              <Quantity>600</Quantity>
              <Price>3.99</Price>
            </Item>
          </Order>

  15. Example: Dynamic Web page
          <html>
            <head>
              <title>Flight Schedule: SFO to FRA</title>
            </head>
            <body>
              <p>Daily flights from SFO to FRA</p>
              <table>
                <tr><th>Airline</th><th>Num</th><th>Depart</th><th>Arrive</th></tr>
                <tr><td>Air France</td><td>527</td><td>12:00</td><td>10:33</td></tr>
                <tr><td>Lufthansa</td><td>459</td><td>13:55</td><td>10:05</td></tr>
                <tr><td>American</td><td>385</td><td>14:17</td><td>11:48</td></tr>
                <tr><td>Delta</td><td>99</td><td>15:30</td><td>14:02</td></tr>
              </table>
            </body>
          </html>

  16. Document-centric documents
      • Designed for human consumption
      • Use XML to provide structure, metadata
      • Books, presentations, email, static Web pages
      • Characteristics
        • Irregular or semi-regular structure
        • Large-grained data
        • Lots of mixed content
        • Sibling order significant

  17. Example: Product description
          <Product>
            <Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring data between XML documents and relational databases</Summary>. It is written by <Developer>Ronald Bourret</Developer>.</Para>
            <Para>XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, PCDATA, and attributes, as well as references to complex types, are viewed as properties.</Para>
            <Para>You can:
              <List>
                <Item><Link URL="Readme.htm">Read more about XML-DBMS</Link></Item>
                <Item><Link URL="jxmldbms.zip">Download Java version</Link></Item>
                <Item><Link URL="pxmldbms.zip">Download PERL version</Link></Item>
              </List>
            </Para>
          </Product>

  18. Storing data and documents
      • Store data in traditional database
        • Use a native XML database under certain conditions
      • Store documents in native XML database
        • Use a traditional database under certain conditions
      • Boundary between data and documents not always clear in practice

  19. Storing and Retrieving Data

  20. Goals and non-goals
      • Goals
        • Preserve data and hierarchical order
        • Optionally preserve sibling order
        • One- or two-way data transfer
      • Non-goals
        • Preserve physical structure (entity use, encodings, ...)
        • Preserve DTD, comments, processing instructions...
        • Preserve document identity

  21. Data transfer software
      • May be middleware or integrated into DBMS
      • If integrated, DBMS is said to be XML-enabled

  22. Mapping data in XML documents to databases
      • Most common mapping strategies
        • Template-driven
        • Model-driven
      • No mapping needed for native XML databases

  23. Template-driven mappings
      • Commands embedded in template
      • Extremely flexible
        • Retrieve data with SQL or other query language
        • Place values almost anywhere in document
        • Parameterize subsequent SQL statements
        • Programming constructs such as if-then-else and for
      • Transfer from database to XML only

  24. Example: Template
          <?xml version="1.0"?>
          <FlightInfo>
            <Intro>The following flights have available seats:</Intro>
            <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
            <Conclude>We hope one of these meets your needs.</Conclude>
          </FlightInfo>

  25. Example: Output
          <?xml version="1.0"?>
          <FlightInfo>
            <Intro>The following flights have available seats:</Intro>
            <Flights>
              <Row>
                <Airline>ACME</Airline>
                <FltNumber>123</FltNumber>
                <Depart>Dec 12, 1998 13:43</Depart>
                <Arrive>Dec 13, 1998 01:21</Arrive>
              </Row>
              ...
            </Flights>
            <Conclude>We hope one of these meets your needs.</Conclude>
          </FlightInfo>
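The template and output slides can be tied together with a minimal sketch of a template processor. It assumes an in-memory SQLite database stands in for the flight database, and that each top-level `<SelectStmt>` is replaced by a `<Flights>` element containing one `<Row>` per result row. The `expand_template` function, the hard-coded `Flights` wrapper name, and the sample row are assumptions made for illustration; real template-driven products define their own command syntax.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>"""

def expand_template(template: str, conn: sqlite3.Connection) -> ET.Element:
    """Replace each top-level <SelectStmt> with a <Flights> element of <Row> children."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)  # the embedded command is plain SQL
        columns = [description[0] for description in cursor.description]
        results = ET.Element("Flights")  # wrapper name hard-coded for this sketch
        for record in cursor:
            row = ET.SubElement(results, "Row")
            for name, value in zip(columns, record):
                ET.SubElement(row, name).text = str(value)
        results.tail = child.tail  # keep the surrounding whitespace
        root[index] = results
    return root

# Hypothetical stand-in for the flight database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")

doc = expand_template(TEMPLATE, conn)
print(ET.tostring(doc, encoding="unicode"))
```

The transfer runs in one direction only, from database to XML, which is the limitation the template-driven slide names.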

  26. Model-driven mappings
      • Two mappings are common
        • Table-based
        • Object-relational
      • Data transferred according to model
      • Two-way data transfer
      • Simpler than templates, but less flexible
      • Often used with XSLT

  27. Table-based mapping
      • Map document with “table” structure to RDBMS
          <database>
            <table1>
              <row>
                <column1>value 1</column1>
                <column2>value 2</column2>
                ...
              </row>
              ...
            </table1>
            <table2>
              ...
            </table2>
            ...
          </database>
        Tables:
          Table1: Column1, Column2, ...
          Table2: Column1, Column2, ...
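A minimal sketch of the table-based mapping in the database-to-XML direction, assuming SQLite as the relational store. The `database_to_xml` helper is hypothetical. Omitting the element for a NULL column is one possible convention (missing element = null), not something the mapping itself requires.

```python
import sqlite3
import xml.etree.ElementTree as ET

def database_to_xml(conn: sqlite3.Connection, tables: list[str]) -> ET.Element:
    """Serialize tables as <database><Table><row><Column>value</Column>...</row>...</Table>...</database>."""
    root = ET.Element("database")
    for table in tables:
        table_el = ET.SubElement(root, table)
        # Table names are supplied by the caller; identifiers cannot be bound as SQL parameters.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        for record in cursor:
            row_el = ET.SubElement(table_el, "row")
            for name, value in zip(columns, record):
                if value is not None:  # assumed convention: NULL becomes a missing element
                    ET.SubElement(row_el, name).text = str(value)
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Column1 TEXT, Column2 TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [("value 1", "value 2"), ("value 3", None)])

doc = database_to_xml(conn, ["Table1"])
print(ET.tostring(doc, encoding="unicode"))
```

The code is short because the document shape is fixed in advance. The same property is why this mapping only fits XML documents that already look like tables.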

  28. Pros and cons
      • Pros
        • Easy to understand
        • Code is simple and fast
        • Useful for serializing databases
      • Cons
        • Only works on a small subset of XML documents

  29. Object-relational mapping
      • Map XML document to objects...
        Objects: Order, Customer, Item, Part
          <Order SONumber="12345">
            <Customer CustNumber="543">
              ...
            </Customer>
            <OrderDate>150999</OrderDate>
            <Item LineNumber="1">
              <Part Name="Cherries">
                ...
              </Part>
              <Qty Unit="ton">2</Qty>
            </Item>
          </Order>

  30. Object-relational mapping (cont.)
      • ... and objects to tables
        Objects: Order, Customer, Item, Part
        Tables:
          Orders: Number, Customer, ...
          Items: OrderNumber, ItemNumber, Part, ...
          Customers: ...
          Parts: ...
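The two steps of the object-relational mapping (XML to objects, objects to tables) can be collapsed into a short sketch that shreds a sales order into `Orders` and `Items` rows. The XML is a simplified version of the slide's order with the elided content left out. The `OrderDate`, `Qty`, and `Unit` columns, the column types, and the `store_order` helper are assumptions added for illustration; the slides list only `Number`, `Customer`, `OrderNumber`, `ItemNumber`, and `Part`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified from the slide: elided child content is omitted, not invented.
ORDER_XML = """<Order SONumber="12345">
  <Customer CustNumber="543"/>
  <OrderDate>150999</OrderDate>
  <Item LineNumber="1">
    <Part Name="Cherries"/>
    <Qty Unit="ton">2</Qty>
  </Item>
</Order>"""

# Assumed schema: complex element types become tables, simple ones become columns.
SCHEMA = """
CREATE TABLE Orders (Number TEXT PRIMARY KEY, Customer TEXT, OrderDate TEXT);
CREATE TABLE Items (OrderNumber TEXT REFERENCES Orders(Number),
                    ItemNumber INTEGER, Part TEXT, Qty REAL, Unit TEXT);
"""

def store_order(conn: sqlite3.Connection, xml_text: str) -> None:
    """Insert one <Order> and its <Item> children into Orders and Items."""
    order = ET.fromstring(xml_text)
    number = order.get("SONumber")
    conn.execute("INSERT INTO Orders VALUES (?, ?, ?)",
                 (number, order.find("Customer").get("CustNumber"),
                  order.findtext("OrderDate")))
    for item in order.findall("Item"):
        qty = item.find("Qty")
        conn.execute("INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
                     (number, int(item.get("LineNumber")),
                      item.find("Part").get("Name"), float(qty.text), qty.get("Unit")))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_order(conn, ORDER_XML)
print(conn.execute("SELECT * FROM Items").fetchall())
```

The parent-child nesting of `Item` inside `Order` turns into the `OrderNumber` foreign key, which is how the element hierarchy survives in a relational schema.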

  31. Objects are data-specific...
      • Different for each DTD (schema)
      • Model the content (data) of the document
        Objects: Order, Customer, Item, Part
          <Order SONumber="12345">
            <Customer CustNumber="543">
              ...
            </Customer>
            <OrderDate>150999</OrderDate>
            <Item LineNumber="1">
              <Part Name="Cherries">
                ...
              </Part>
              <Qty Unit="ton">2</Qty>
            </Item>
          </Order>

  32. ... not the DOM
      • Same for all XML documents
      • Model the structure of the document
          Element (Order)
            Attr (SONumber)
            Element (Customer) ...
            Element (OrderDate) ...
            Element (Item) ...
          <Order SONumber="12345">
            <Customer CustNumber="543">
              ...
            </Customer>
            <OrderDate>150999</OrderDate>
            <Item LineNumber="1">
              <Part Name="Cherries">
                ...
              </Part>
              <Qty Unit="ton">2</Qty>
            </Item>
          </Order>
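The contrast between data-specific objects and the DOM can be sketched with Python's standard `xml.dom.minidom`. The `Order` dataclass and its field names are hypothetical, and the XML is a cut-down version of the slide's order.

```python
from dataclasses import dataclass
from xml.dom import minidom

ORDER_XML = '<Order SONumber="12345"><OrderDate>150999</OrderDate></Order>'

@dataclass
class Order:
    """Data-specific class: its fields mirror this particular schema."""
    so_number: str
    order_date: str

dom = minidom.parseString(ORDER_XML)
root = dom.documentElement

# Generic DOM view: the same node types describe every XML document.
generic_view = [(node.nodeName, node.nodeType == node.ELEMENT_NODE)
                for node in root.childNodes]

# Data-specific view: fields are named after the content of this document type.
order = Order(
    so_number=root.getAttribute("SONumber"),
    order_date=root.getElementsByTagName("OrderDate")[0].firstChild.data,
)

print(generic_view)
print(order)
```

A different DTD would need a different class in place of `Order`, while the DOM half of the code would stay the same.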

  33. Pros and cons
      • Pros
        • Can handle any XML document
        • Maps well to existing data structures
      • Cons
        • Very inefficient for mixed content

  34. Data transfer issues
      • Data types
        • All XML data is string
        • Conversion problems due to many formats
      • Null data
        • Equivalent to missing element or attribute
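A small sketch of both issues, using values from the sales order example. It assumes the date `29.10.00` is in day.month.two-digit-year format, which the slides do not state, and it treats a missing element as NULL (`None`). The `typed_child` and `parse_order_date` helpers and the `ShipDate` element are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime
from decimal import Decimal

ORDER_XML = """<Order>
  <Number>1234</Number>
  <Date>29.10.00</Date>
  <Price>10.95</Price>
</Order>"""

def typed_child(parent, tag, convert):
    """Return the converted text of child `tag`, or None if the element is missing."""
    text = parent.findtext(tag)
    return None if text is None else convert(text)

def parse_order_date(text: str) -> date:
    # Assumption: day.month.two-digit-year; other documents may use other formats.
    return datetime.strptime(text, "%d.%m.%y").date()

order = ET.fromstring(ORDER_XML)
number = typed_child(order, "Number", int)
order_date = typed_child(order, "Date", parse_order_date)
price = typed_child(order, "Price", Decimal)
ship_date = typed_child(order, "ShipDate", parse_order_date)  # missing element -> None (NULL)

print(number, order_date, price, ship_date)
```

The conversion functions carry knowledge that the XML text does not. Without a schema or an agreed convention, the transfer software cannot tell which date format a string uses.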

  35. Data transfer issues (cont.)
      • Binary data
        • No standard way to store in XML
        • Commonly stored as unparsed entities or Base64
      • Character sets
        • XML can use any encoding, including Unicode
        • Databases often require single encoding
        • Unicode is inefficient to store
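The Base64 option can be sketched as a round trip with Python's standard `base64` module. The `Image` element and its `encoding` attribute are hypothetical markers; XML itself defines no standard way to flag binary content.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 32, 60, 62])  # includes bytes that are not legal XML text

# Encode: Base64 output uses only ASCII letters, digits, '+', '/', and '='.
image = ET.Element("Image", {"encoding": "base64"})  # hypothetical element and attribute
image.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(image, encoding="unicode")

# Decode on the way back into the database.
restored = base64.b64decode(ET.fromstring(xml_text).text)

print(xml_text)
print(restored == payload)
```

Base64 makes the data about one third larger, which adds to the verbosity already listed as a drawback of XML.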

  36. Storing data in a native XML database
      • Data stored in XML (document) format
      • Pros
        • Handles semi-structured data efficiently
        • Fast retrieval of whole documents
        • Support for XML query languages, XLinks, etc.

  37. Storing data in a native XML database (cont.)
      • Cons
        • Slow retrieval of views outside of document hierarchy
        • No referential integrity
        • Data not accessible by non-XML applications

  38. Storing and Retrieving Documents

  39. Goals
      • Preserve entire document
        • Data: elements, attributes, PCDATA
        • Logical structure: element hierarchy, sibling order
        • Physical structure: entities, CDATA, encoding...
        • Other: DTD, comments, processing instructions...
      • Preserve document identity

  40. Storing documents as BLOBs
      • Pros
        • Exploits existing capabilities: transactions, security...
        • Many databases have text search tools
      • Cons
        • Text-based searches of XML unreliable
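One way to see why text-based searches are unreliable: the same logical content can be serialized in several ways, and a plain substring search cannot tell markup context apart. The sample documents below are hypothetical.

```python
import xml.etree.ElementTree as ET

# The first three documents have the same author; the fourth mentions "Chen" only in its title.
documents = [
    "<Brochure><Author>Chen</Author></Brochure>",
    "<Brochure><Author>&#67;hen</Author></Brochure>",          # character reference for "C"
    "<Brochure><Author><![CDATA[Chen]]></Author></Brochure>",  # CDATA section
    "<Brochure><Title>Chen and the sea</Title><Author>Lee</Author></Brochure>",
]

# Strict text search: misses the character-reference and CDATA forms.
text_hits = ["<Author>Chen</Author>" in doc for doc in documents]

# Loose text search: still misses the character reference and wrongly matches the title.
loose_hits = ["Chen" in doc for doc in documents]

# XML-aware search: parse first, then compare the author element's content.
parsed_hits = [ET.fromstring(doc).findtext("Author") == "Chen" for doc in documents]

print(text_hits, loose_hits, parsed_hits)
```

Only the parsed comparison agrees with the logical content, but it requires parsing every stored document at query time.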

  41. Indexing XML BLOBs with “side tables”
      • Consider the following DTD
          <!ELEMENT Brochure (Title, Author, Content)>
          <!ELEMENT Title (#PCDATA)>
          <!ELEMENT Author (#PCDATA)>    <!-- To be indexed -->
          <!ELEMENT Content (%Inline;)>  <!-- Inline entity from XHTML -->
      • Store complete documents in one table
          Brochures
          ---------
          BrochureID  INTEGER      <--------- Index brochure IDs
          Brochure    LONGVARCHAR  <--------- Complete XML documents

  42. Indexing XML BLOBs with “side tables” (cont.)
      • Store elements to be indexed in separate table
          Authors
          ----------------------
          Author      VARCHAR(50)  <--------- Index authors
          BrochureID  INTEGER
      • Search index table and join to document table
          SELECT Brochure
          FROM Brochures
          WHERE BrochureID IN (SELECT BrochureID
                               FROM Authors
                               WHERE Author='Chen')
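The side-table approach can be tried end to end with a small sketch on an in-memory SQLite database. SQLite's `TEXT` type stands in for `LONGVARCHAR`. The `store_brochure` helper, the index name `AuthorIndex`, and the sample brochures are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure TEXT);
CREATE TABLE Authors (Author VARCHAR(50),
                      BrochureID INTEGER REFERENCES Brochures(BrochureID));
CREATE INDEX AuthorIndex ON Authors (Author);
""")

def store_brochure(conn: sqlite3.Connection, brochure_id: int, xml_text: str) -> None:
    """Store the complete document, then copy its <Author> value into the side table."""
    conn.execute("INSERT INTO Brochures VALUES (?, ?)", (brochure_id, xml_text))
    author = ET.fromstring(xml_text).findtext("Author")
    conn.execute("INSERT INTO Authors VALUES (?, ?)", (author, brochure_id))

store_brochure(conn, 1, "<Brochure><Title>Tea</Title><Author>Chen</Author><Content/></Brochure>")
store_brochure(conn, 2, "<Brochure><Title>Rice</Title><Author>Lee</Author><Content/></Brochure>")

# Same query shape as on the slide, with the author bound as a parameter.
rows = conn.execute(
    "SELECT Brochure FROM Brochures WHERE BrochureID IN "
    "(SELECT BrochureID FROM Authors WHERE Author = ?)", ("Chen",)).fetchall()

print(rows)
```

The document is parsed once at insert time, so queries touch only the indexed side table and the stored document is returned unchanged. The side table has to be kept in sync whenever a document is updated or deleted.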

  43. Storing documents in native XML databases
      • Store whole XML documents in “native” form
      • Define a (logical) model for an XML document
        • Minimal model is elements, attributes, PCDATA, and document order
      • Store and retrieve documents according to that model
      • Have normal database features
        • Query language, indexes, transactions, security, etc.

  44. Implementation strategies for native XML databases
      • Text-based
        • Store documents as text
        • Proprietary or file-system storage
      • Model-based
        • Store pre-parsed documents according to model
        • Relational, object-oriented, hierarchical, or proprietary storage

  45. Persistent DOMs (PDOMs)
      • Implement DOM over persistent storage
      • Returned DOM tree is “live”
      • Used by DOM applications that process very large XML documents
      • Database is usually local

  46. Content management systems
      • Manage document fragments (content)
      • Hide database from user
      • Maintain versions, document metadata
      • Include editors, publishing systems, etc.
      • Extensible through scripting or programming

  47. Resources
      • Ronald Bourret’s Papers Page
        • http://www.rpbourret.com/xml/index.htm
      • XML:DB.org’s Resources Page
        • http://www.xmldb.org/resources.html
      • XML:DB Mailing List
        • http://www.xmldb.org/projects.html

  48. Questions?
      Ronald Bourret
      rpbourret@rpbourret.com
      http://www.rpbourret.com
