XML and Databases • Ronald Bourret • rpbourret@rpbourret.com • http://www.rpbourret.com
Overview • Is XML a Database? • Why Use XML with Databases? • Data vs. Documents • Storing and Retrieving Data • Storing and Retrieving Documents
Is XML a database? • This is really two questions • Is an XML document a database? • Are XML and its surrounding technologies a database management system (DBMS)?
Is an XML document a database? • Yes, it is a collection of data • Pros • Self-describing • Portable (Unicode) • Can store directed graphs • Cons • Slow access • Verbose
Are XML and surrounding technologies a DBMS? • Yes, they have: • Data storage (XML documents) • Schemas (DTDs, XML Schemas, RELAX, etc.) • Query languages (XPath, XQuery, XQL, etc.) • APIs (SAX, DOM)
Are XML and surrounding technologies a DBMS? (cont.) • No, they don’t have: • Separation of logical and physical data • Efficient storage • Indexes • Transactions • Multi-user access • Security • ...
Using XML as a database • Good for small, single-user databases • .ini files • Simple address book • List of browser bookmarks • Catalog of MP3s stolen with the help of Napster • Almost useless for large or multi-user databases
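As a rough illustration of the small, single-user case, the sketch below treats an XML file as an address book and answers a query by parsing the whole document and scanning it. The `AddressBook`, `Entry`, `Name`, and `City` element names and the `names_in_city` helper are hypothetical and not taken from the slides; the point is only that every lookup is a full scan, which is acceptable for a handful of entries and a likely reason for the "slow access" noted earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; the element names are illustrative only.
ADDRESS_BOOK = """\
<AddressBook>
  <Entry><Name>Ann</Name><City>Chicago</City></Entry>
  <Entry><Name>Bob</Name><City>Boston</City></Entry>
  <Entry><Name>Cy</Name><City>Chicago</City></Entry>
</AddressBook>
"""

def names_in_city(xml_text, city):
    """Return the names of all entries whose <City> equals `city`.

    There is no index: the document is parsed and every entry is visited.
    """
    root = ET.fromstring(xml_text)
    return [
        entry.findtext("Name")
        for entry in root.iter("Entry")
        if entry.findtext("City") == city
    ]
```

Nothing here provides transactions, locking, or access control, which is consistent with the slide's view that this approach does not scale to large or multi-user databases.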
Why use XML with databases? • Expose legacy data as XML • Transfer data between databases • Integrate data from a variety of sources • Store semi-structured data • Queue e-commerce messages • Manage and query large document collections
Data vs. documents • Are you storing documents or the data in them? <Address> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> <Country>USA</Country> </Address> • Text content (yellow on the slide) = data • Markup (white) + text content = document • Helps determine the system you need • Look at your XML documents to decide
Data-centric documents • Use XML primarily as a data transport • Designed for machine consumption • Sales orders, scientific data, dynamic Web pages • Characteristics • Regular structure • Fine-grained data • Little or no mixed content • Sibling order not significant
Example: Sales order <Order> <Number>1234</Number> <Customer>Gallagher Industries</Customer> <Date>29.10.00</Date> <Item Number="1"> <Part>A-10</Part> <Quantity>12</Quantity> <Price>10.95</Price> </Item> <Item Number="2"> <Part>B-43</Part> <Quantity>600</Quantity> <Price>3.99</Price> </Item></Order>
Example: Dynamic Web page <html> <head> <title>Flight Schedule: SFO to FRA</title> </head> <body> <p>Daily flights from SFO to FRA</p> <table> <tr><th>Airline</th><th>Num</th><th>Depart</th><th>Arrive</th></tr> <tr><td>Air France</td><td>527</td><td>12:00</td><td>10:33</td></tr> <tr><td>Lufthansa</td><td>459</td><td>13:55</td><td>10:05</td></tr> <tr><td>American</td><td>385</td><td>14:17</td><td>11:48</td></tr> <tr><td>Delta</td><td>99</td><td>15:30</td><td>14:02</td></tr> </table> </body> </html>
Document-centric documents • Designed for human consumption • Use XML to provide structure, metadata • Books, presentations, email, static Web pages • Characteristics • Irregular or semi-regular structure • Large-grained data • Lots of mixed content • Sibling order significant
Example: Product description <Product> <Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring data between XML documents and relational databases</Summary>. It is written by <Developer>Ronald Bourret</Developer>.</Para> <Para>XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, PCDATA, and attributes, as well as references to complex types, are viewed as properties.</Para> <Para>You can: <List> <Item><Link URL="Readme.htm">Read more about XML-DBMS</Link></Item> <Item><Link URL="jxmldbms.zip">Download Java version</Link></Item> <Item><Link URL="pxmldbms.zip">Download PERL version</Link></Item> </List> </Para> </Product>
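One characteristic from the two lists, the amount of mixed content, can be checked mechanically. The sketch below is a heuristic only, not a method described in the slides: it reports the fraction of parent elements that have non-whitespace text interleaved with child elements. The function names are hypothetical, and the two sample strings are abbreviated versions of the sales order and product description examples.

```python
import xml.etree.ElementTree as ET

# Abbreviated versions of the two examples above (assumed, shortened for brevity).
ORDER = '<Order><Number>1234</Number><Item Number="1"><Part>A-10</Part></Item></Order>'
PRODUCT = ('<Product><Para><Name>XML-DBMS</Name> is '
           '<Summary>middleware</Summary>.</Para></Product>')

def has_mixed_content(elem):
    """True if elem has child elements interleaved with non-whitespace text."""
    if len(elem) == 0:
        return False
    # Text before the first child lives in elem.text; text after each child
    # lives in that child's tail.
    texts = [elem.text] + [child.tail for child in elem]
    return any(t and t.strip() for t in texts)

def mixed_ratio(xml_text):
    """Fraction of elements with children that contain mixed content."""
    root = ET.fromstring(xml_text)
    parents = [e for e in root.iter() if len(e) > 0]
    if not parents:
        return 0.0
    return sum(has_mixed_content(e) for e in parents) / len(parents)
```

A ratio near zero suggests a data-centric document; a high ratio suggests a document-centric one. Real documents often fall in between, so the number is a hint rather than a classification.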
Storing data and documents • Store data in traditional database • Use a native XML database under certain conditions • Store documents in native XML database • Use a traditional database under certain conditions • Boundary between data and documents not always clear in practice
Goals and non-goals • Goals • Preserve data and hierarchical order • Optionally preserve sibling order • One- or two-way data transfer • Non-goals • Preserve physical structure (entity use, encodings, ...) • Preserve DTD, comments, processing instructions... • Preserve document identity
Data transfer software • May be middleware or integrated into DBMS • If integrated, DBMS is said to be XML-enabled
Mapping data in XML documents to databases • Most common mapping strategies • Template-driven • Model-driven • No mapping needed for native XML databases
Template-driven mappings • Commands embedded in template • Extremely flexible • Retrieve data with SQL or other query language • Place values almost anywhere in document • Parameterize subsequent SQL statements • Programming constructs such as if-then-else and for • Transfer from database to XML only
Example: Template <?xml version="1.0"?> <FlightInfo> <Intro>The following flights have available seats:</Intro> <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt> <Conclude>We hope one of these meets your needs.</Conclude> </FlightInfo>
Example: Output <?xml version="1.0"?> <FlightInfo> <Intro>The following flights have available seats:</Intro> <Flights> <Row> <Airline>ACME</Airline> <FltNumber>123</FltNumber> <Depart>Dec 12, 1998 13:43</Depart> <Arrive>Dec 13, 1998 01:21</Arrive> </Row> ... </Flights> <Conclude>We hope one of these meets your needs.</Conclude> </FlightInfo>
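The template and output above can be approximated with a few lines of code. This is a minimal sketch, assuming an SQLite database and a template processor that replaces each top-level `SelectStmt` element with one wrapper element containing a `Row` per result row. The `expand_template` function and its `result_tag` parameter are hypothetical; real template-driven products define their own command syntax and naming rules.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """\
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>
"""

def expand_template(template, conn, result_tag="Flights"):
    """Replace each top-level <SelectStmt> with the rows its query returns.

    `result_tag` (the wrapper element name) is an assumption made for this
    sketch; column names from the query become child element names.
    """
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)
        columns = [description[0] for description in cursor.description]
        result = ET.Element(result_tag)
        for row in cursor:
            row_elem = ET.SubElement(result, "Row")
            for name, value in zip(columns, row):
                ET.SubElement(row_elem, name).text = str(value)
        result.tail = child.tail      # keep the template's whitespace
        root[index] = result          # one-for-one swap, so indexes stay valid
    return root
```

Because the SQL text comes straight from the template, the template itself must be trusted. The sketch also only goes from database to XML, which matches the one-way limitation listed for template-driven mappings.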
Model-driven mappings • Two mappings are common • Table-based • Object-relational • Data transferred according to model • Two-way data transfer • Simpler than templates, but less flexible • Often used with XSLT
Table-based mapping • Map document with “table” structure to RDBMS <database> <table1> <row> <column1>value 1</column1> <column2>value 2</column2> ... </row> ... </table1> <table2> ... </table2> ... </database> • Resulting tables: Table1 (Column1, Column2, ...), Table2 (Column1, Column2, ...)
Pros and cons • Pros • Easy to understand • Code is simple and fast • Useful for serializing databases • Cons • Only works on a small subset of XML documents
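The table-based mapping is simple enough to sketch in both directions. The code below is an illustration under stated assumptions, not the implementation of any particular product: it uses SQLite, names each table element after the table, uses a fixed `row` element, and represents NULL by omitting the column element. The function names `table_to_xml` and `xml_to_table` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(conn, table):
    """Serialize one table as <table><row><column>value</column>...</row>...</table>."""
    # The table name is interpolated into SQL, so pass trusted names only.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    table_elem = ET.Element(table)
    for row in cursor:
        row_elem = ET.SubElement(table_elem, "row")
        for name, value in zip(columns, row):
            if value is not None:              # NULL: omit the element
                ET.SubElement(row_elem, name).text = str(value)
    return table_elem

def xml_to_table(conn, table_elem):
    """Insert every <row> of table_elem into the table named by its tag."""
    for row_elem in table_elem.findall("row"):
        names = [column.tag for column in row_elem]
        if not names:
            continue                           # an all-NULL row is skipped in this sketch
        values = [column.text for column in row_elem]
        column_list = ", ".join(f'"{n}"' for n in names)
        placeholders = ", ".join("?" for _ in names)
        conn.execute(
            f'INSERT INTO "{table_elem.tag}" ({column_list}) VALUES ({placeholders})',
            values,
        )
```

The code is short because the document shape is fixed in advance, which is also why the approach only works for documents that already look like tables.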
Object-relational mapping • Map XML document to objects... • Objects: Order, Customer, Item, Part <Order SONumber="12345"> <Customer CustNumber="543"> ... </Customer> <OrderDate>150999</OrderDate> <Item LineNumber="1"> <Part Name="Cherries"> ... </Part> <Qty Unit="ton">2</Qty> </Item> </Order>
Object-relational mapping (cont.) • ... and objects to tables • Objects: Order, Customer, Item, Part • Tables: Orders (Number, Customer, ...), Items (OrderNumber, ItemNumber, Part, ...), Customers (...), Parts (...)
Objects are data-specific... • Different for each DTD (schema) • Model the content (data) of the document • Objects: Order, Customer, Item, Part <Order SONumber="12345"> <Customer CustNumber="543"> ... </Customer> <OrderDate>150999</OrderDate> <Item LineNumber="1"> <Part Name="Cherries"> ... </Part> <Qty Unit="ton">2</Qty> </Item> </Order>
... not the DOM • Same for all XML documents • Model the structure of the document • DOM tree: Element (Order) with Attr (SONumber) and child Elements (Customer), (OrderDate), (Item), ... <Order SONumber="12345"> <Customer CustNumber="543"> ... </Customer> <OrderDate>150999</OrderDate> <Item LineNumber="1"> <Part Name="Cherries"> ... </Part> <Qty Unit="ton">2</Qty> </Item> </Order>
Pros and cons • Pros • Can handle any XML document • Maps well to existing data structures • Cons • Very inefficient for mixed content
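The two steps of the object-relational mapping (document to objects, objects to tables) can be sketched as follows. This is an assumption-laden illustration based on the earlier sales order example: the `Order` and `Item` classes, their field names, and the row layouts are hypothetical, and a real mapping layer would be driven by a mapping description rather than hand-written code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

ORDER_XML = """\
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Item Number="1"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>
  <Item Number="2"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>
</Order>
"""

@dataclass
class Item:                     # complex element type -> class
    number: int                 # attribute -> property
    part: str                   # simple element types -> properties
    quantity: int
    price: str                  # kept as text to avoid float rounding

@dataclass
class Order:
    number: int
    customer: str
    items: list = field(default_factory=list)

def order_from_xml(xml_text):
    """Step 1: build data-specific objects from the document."""
    root = ET.fromstring(xml_text)
    order = Order(int(root.findtext("Number")), root.findtext("Customer"))
    for elem in root.findall("Item"):
        order.items.append(Item(
            number=int(elem.get("Number")),
            part=elem.findtext("Part"),
            quantity=int(elem.findtext("Quantity")),
            price=elem.findtext("Price"),
        ))
    return order

def order_to_rows(order):
    """Step 2: flatten objects to rows; items carry the order number as a foreign key."""
    orders_row = (order.number, order.customer)
    items_rows = [
        (order.number, item.number, item.part, item.quantity, item.price)
        for item in order.items
    ]
    return orders_row, items_rows
```

Unlike a DOM tree, these classes only make sense for this one document type, which is the distinction the preceding two slides draw.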
Data transfer issues • Data types • All XML data is string • Conversion problems due to many formats • Null data • Equivalent to missing element or attribute
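Both issues show up as soon as XML text is converted to typed columns. The sketch below is illustrative only: `parse_date` and `text_or_none` are hypothetical helpers, and the date formats are guesses at what strings such as `29.10.00` and `150999` in the earlier examples might mean. The transfer software has to be told the format, because the text alone does not say.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

def parse_date(text, fmt):
    """Convert date text to a date using an explicitly supplied format."""
    return datetime.strptime(text, fmt).date()

def text_or_none(parent, tag):
    """Map a missing child element to None (SQL NULL).

    An element that is present but empty yields '' instead, so 'no value'
    and 'empty string' stay distinguishable.
    """
    child = parent.find(tag)
    if child is None:
        return None
    return child.text or ""

# The same characters give different dates under different assumed formats.
day_first = parse_date("01.02.00", "%d.%m.%y")     # 1 February 2000
month_first = parse_date("01.02.00", "%m.%d.%y")   # 2 January 2000
```

Whether a missing element, an empty element, or something like `xsi:nil` should mean NULL is a choice the mapping has to make; the slide's convention (missing element or attribute) is the one used here.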
Data transfer issues (cont.) • Binary data • No standard way to store in XML • Commonly stored as unparsed entities or Base64 • Character sets • XML can use any encoding, including Unicode • Databases often require single encoding • Unicode is inefficient to store
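For the Base64 option, a minimal round trip looks like the sketch below. The `embed_binary` and `extract_binary` helpers and the `encoding="base64"` marker attribute are assumptions made for illustration; since there is no standard convention, sender and receiver have to agree on how encoded content is flagged.

```python
import base64
import xml.etree.ElementTree as ET

def embed_binary(tag, payload):
    """Wrap raw bytes in an element as Base64 text."""
    # The 'encoding' attribute is a made-up convention for this sketch.
    elem = ET.Element(tag, {"encoding": "base64"})
    elem.text = base64.b64encode(payload).decode("ascii")
    return elem

def extract_binary(elem):
    """Recover the original bytes from an element written by embed_binary."""
    return base64.b64decode(elem.text)
```

Base64 output uses only ASCII letters, digits, `+`, `/`, and `=`, so it needs no escaping in XML, at the cost of roughly one third more space than the raw bytes.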
Storing data in a native XML database • Data stored in XML (document) format • Pros • Handles semi-structured data efficiently • Fast retrieval of whole documents • Support for XML query languages, XLinks, etc.
Storing data in a native XML database (cont.) • Cons • Slow retrieval of views outside of document hierarchy • No referential integrity • Data not accessible by non-XML applications
Goals • Preserve entire document • Data: elements, attributes, PCDATA • Logical structure: element hierarchy, sibling order • Physical structure: entities, CDATA, encoding... • Other: DTD, comments, processing instructions... • Preserve document identity
Storing documents as BLOBs • Pros • Exploits existing capabilities: transactions, security... • Many databases have text search tools • Cons • Text-based searches of XML unreliable
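Two ways a plain text search over stored XML can go wrong are sketched below: it can miss a value because of escaping, and it can match the right characters in the wrong element. The sample documents and the `text_search` and `author_search` functions are hypothetical and only contrast a substring test with a parse-then-compare test.

```python
import xml.etree.ElementTree as ET

# Hypothetical brochures; the element names follow a Title/Author/Content layout.
DOC_A = ("<Brochure><Title>Tea</Title><Author>Chen &amp; Wu</Author>"
         "<Content>Notes</Content></Brochure>")
DOC_B = ("<Brochure><Title>Tea</Title><Author>Lopez</Author>"
         "<Content>Reviewed by Chen</Content></Brochure>")

def text_search(doc, needle):
    """Naive search over the serialized document, as a text index might do."""
    return needle in doc

def author_search(doc, needle):
    """Parse the document and compare against <Author> content only."""
    root = ET.fromstring(doc)
    return any(author.text == needle for author in root.iter("Author"))

# False negative: the stored text contains '&amp;', not '&'.
missed = text_search(DOC_A, "Chen & Wu")
# False positive: 'Chen' appears in <Content>, not in <Author>.
wrong_hit = text_search(DOC_B, "Chen")
```

CDATA sections, character references, and entity usage cause similar mismatches, which is one motivation for keeping a separately maintained index of the values that matter.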
Indexing XML BLOBs with “side tables” • Consider the following DTD
<!ELEMENT Brochure (Title, Author, Content)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)> <!-- To be indexed -->
<!ELEMENT Content (%Inline;)> <!-- Inline entity from XHTML -->
• Store complete documents in one table
Brochures
---------
BrochureID INTEGER <--------- Index brochure IDs
Brochure LONGVARCHAR <--------- Complete XML documents
Indexing XML BLOBs with “side tables” (cont.) • Store elements to be indexed in separate table
Authors
----------------------
Author VARCHAR(50) <--------- Index authors
BrochureID INTEGER
• Search index table and join to document table
SELECT Brochure FROM Brochures WHERE BrochureID IN (SELECT BrochureID FROM Authors WHERE Author='Chen')
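The side-table design can be tried end to end with SQLite. This is a minimal sketch under assumptions: SQLite's `TEXT` stands in for `LONGVARCHAR`, the index name `AuthorIndex` and the helper functions are hypothetical, and each brochure is assumed to have exactly one `Author` element, as the DTD requires.

```python
import sqlite3
import xml.etree.ElementTree as ET

def create_schema(conn):
    """Create the document table, the side table, and an index on authors."""
    conn.executescript("""
        CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure TEXT);
        CREATE TABLE Authors (
            Author VARCHAR(50),
            BrochureID INTEGER REFERENCES Brochures (BrochureID)
        );
        CREATE INDEX AuthorIndex ON Authors (Author);
    """)

def store_brochure(conn, brochure_id, xml_text):
    """Store the whole document and copy its <Author> into the side table."""
    author = ET.fromstring(xml_text).findtext("Author")
    with conn:  # both inserts commit together or not at all
        conn.execute("INSERT INTO Brochures VALUES (?, ?)", (brochure_id, xml_text))
        conn.execute("INSERT INTO Authors VALUES (?, ?)", (author, brochure_id))

def brochures_by_author(conn, author):
    """Search the side table, then fetch the matching complete documents."""
    rows = conn.execute(
        "SELECT Brochure FROM Brochures WHERE BrochureID IN "
        "(SELECT BrochureID FROM Authors WHERE Author = ?)",
        (author,),
    )
    return [row[0] for row in rows]
```

The side table must be updated whenever a document is inserted, changed, or deleted; doing both writes in one transaction, as above, is one way to keep the two tables consistent.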
Storing documents in native XML databases • Store whole XML documents in “native” form • Define a (logical) model for an XML document • Minimal model is elements, attributes, PCDATA, and document order • Store and retrieve documents according to that model • Have normal database features • Query language, indexes, transactions, security, etc.
Implementation strategies for native XML databases • Text-based • Store documents as text • Proprietary or file-system storage • Model-based • Store pre-parsed documents according to model • Relational, object-oriented, hierarchical, or proprietary storage
Persistent DOMs (PDOMs) • Implement DOM over persistent storage • Returned DOM tree is “live” • Used by DOM applications that process very large XML documents • Database is usually local
Content management systems • Manage document fragments (content) • Hide database from user • Maintain versions, document metadata • Include editors, publishing systems, etc. • Extensible through scripting or programming
Resources • Ronald Bourret’s Papers Page • http://www.rpbourret.com/xml/index.htm • XML:DB.org’s Resources Page • http://www.xmldb.org/resources.html • XML:DB Mailing List • http://www.xmldb.org/projects.html
Questions? • Ronald Bourret • rpbourret@rpbourret.com • http://www.rpbourret.com