1 / 75

XML In An RDBMS World

XML In An RDBMS World. Michael D. Thomas mdthomas@ibiblio.org. Data, data, data. In general, computing is never far from data XML, RDBMSes deal with data – play in the same very large playground Extensive overlap between RDBMSes and XML

mada
Download Presentation

XML In An RDBMS World

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XML In An RDBMS World Michael D. Thomas mdthomas@ibiblio.org

  2. Data, data, data • In general, computing is never far from data • XML, RDBMSes deal with data – play in the same very large playground • Extensive overlap between RDBMSes and XML • Goal: Learn how to choose between RDBMS vs. XML, how to integrate the two technologies

  3. First, some pragmatism… • “When all you have is a hammer, everything looks like a nail.” • What if all you have is a hammer? • “When all you have is a hammer, everything must look like a nail.” • Best to have XML and RDBMS in your toolset • Architectural purity is often impossible • But we should know when & what we are compromising

  4. Tonight’s Agenda • Compare XML and SQL RDBMS Technologies • Understand Their Sweet Spots • Examine Their Use Through Studies of Anti-Patterns and Patterns Of Usage • Discuss XQuery and XML Storage In The RDBMS

  5. Topics • Defining The Playground – XML vs. RDBMS • Anti-Pattern: XML As Transactional Data Store • Data Stores & The ACID Test • Anti-Pattern: Document Storage & The RDBMS • XQuery & RDBMS Doc Storage • Anti-Pattern: XML Declarative Languages & Gratuitous Encapsulation Of SQL • Understanding Declarative Languages, Such As SQL • Anti-Pattern: XML Meta-data & The Overly Abstract Database • Managing The Split Between XML-based & RDBMS-based Data Models In The Same Application

  6. XML As Database Anti-Pattern • “I’ll use an XML file as a database” • Advantages: Simple, Cheap • An XML Schema defines the metadata of the database • Data is right on the filesystem, not “hidden” in a complicated SQL RDBMS

  7. XML As Database Problems • Updates are hard • Have to re-invent concurrency handling – multiple users changing data at the same time • No constraint checking • No optimization, such as indexes, caching, etc.

  8. Organization of Data • Been formalizing the storage information for a long time • Age of computing called for new approaches • Relational model highly successful • XML is relatively new • Good for loosely structured data – documents -- and data transmission

  9. Transmission & Interoperability • Electronic transmission is newer than electronic storage • With the growth of the Internet, transmission of data is exploding • No Interoperability problems in 1950 – lots of interoperability problems now • XML is an important standard for transmission and interoperability

  10. Vocabulary • Datastore – Anything that stores data • SQL Relational Database – A database that organizes info in tables and adheres to relational theory • XML – 1) eXtensible Markup Language 2) The XML standard 3) All or part of an XML document

  11. Vocabulary • XML Database – A database specialized for the storage of XML documents • Object Database – A database that stores objects • Object-Relational Database – A relational database with an extensible type system

  12. Data Centric vs. Document Centric • Data Centric approach – the datastore is focused on handling highly structured, fine grained data. Favors the relational model • Document centric approach – the datastore is focused on handling semi-structured data, such as web pages, books, etc. Favors XML

  13. Data Centric vs. Document Centric • Best queries are largely a result of structuring the data well (Messy desk vs. organized file cabinet) • Documents are semi-structured data with ad hoc structures • The overhead of defining rigorous structure for each type of document increases the overall cost of management • (Would still give you the best queries)

  14. Storage vs. Transmission • Data Storage and Data Transmission are two different concepts • XML is very strong for interoperable transmission • Can store by writing XML to a file • You can ‘transmit’ relational data by exporting a few tables and ftping, but isn’t a strong solution

  15. Datastore Application Stack

  16. XML/Relational Comparison

  17. Application With XML Transmissions

  18. Three Data Models • Persistent Model – how data is stored • Active Model (Object Model) – how data is arranged when it is being manipulated by a program, usually written in an imperative language • Presentation/Transmission Model – how data is transmitted, usually as XML

  19. Datastore Basics • Any Datastore must tackle the following issues: • Concurrency • Transactions – the ACID test • Locking • Joins • Normalization • Administration Issues

  20. Concurrency • Datastores support concurrency if multiple users can access the same datastore at the same time • Datastores must not allow the same data to be modified at the same time

  21. Transactions • Atomicity – No matter how complex, a transaction is atomic and indivisible. Transactions are “all or nothing.” • Consistency – Transactions must leave the database in a consistent state, i.e., consistent with the rules • Isolation – Transaction is isolated from other transactions • Durability – The effects of a transaction persist

  22. Isolation levels • TRANSACTION_READ_UNCOMMITTED – Dirty reads, non-repeatable reads, phantom reads • TRANSACTION_READ_COMMITTED – Non-repeatable reads, phantom reads • TRANSACTION_REPEATABLE_READ – phantom reads • TRANSACTION_SERIALIZABLE

  23. Locking • Data must be locked for transactions to be isolated • Locking is both a datastore and an application concern • How much extraneous data is locked? (Page level locking, document level locking) • Pessimistic locking: prevents reading of locked data • Optimistic locking: generates an error when inappropriate data updates are attempted

  24. Joins • A Join joins the data between two different data entities • E.g., SELECT * from emp, dept WHERE emp.deptno = dept.deptno • Joins are the cornerstone of SQL • XPath doesn’t do joins between XML documents! • XQuery, others can

  25. Normalization • Normalization – organizing data to minimize redundancy • Normalized data is easier to maintain and easier to understand conceptually • In relational db design, normalized DBs need to be denormalized for performance reasons • XML docs can also be normalized, but not as much support for tying elements together • Normalization is important when designing, less crucial at implementation

  26. Administrative Issues • Includes backup & recovery, installation, upgrades, optimization, maintenance, etc. • In general: • Bigger is better (economies of scale, 24x7 support) • More popular, more standard is better (law of increasing returns) • Whatever is already working in your organization is better (no need to hire, re-train administrators) • Existing DB vendors have a huge advantage

  27. Datastores • Different types: • Relational Database – highly structured data such as account balances, inventory quantities, etc.) • Document Database – used to store documents, probably in XML format • The same DB can serve as both, e.g., Oracle

  28. Overview Of Relational Databases • The Relational Model • SQL • Entity Relationship Diagrams

  29. Relational Model -- structure • Data is grouped in tables • Tables have columns and rows • Columns are fairly fixed – the set of columns shouldn’t change much (if at all) over the life of a table • A table can have any number of rows • Rows change constantly

  30. Relational model – primary key • In general, a table should have one or more columns defined as the primary key • The primary key is unique and non-null • Usually only one column • Can consist of more than one column (composite primary key)

  31. Relational model – foreign key • A foreign key describes a relationship between two tables • The foreign key column of tableA points to a column in tableB • TableB is said to be the parent of tableA • Often, the foreign key points at a primary key

  32. SQL Three types: • Query SELECT * FROM emp WHERE deptno=10 ORDER BY ename • Data Modification Language (DML): Update, Insert • Data Definition Language (DDL): Create Table

  33. Joins • A Join is a Cartesian Cross-Product: SELECT count(*) FROM emp; SELECT count(*) FROM emp; SELECT count(*) FROM emp, dept; SELECT ename, emp.deptno, dept.deptno, dname FROM emp, dept; SELECT ename, emp.deptno, dept.deptno, dname FROM emp, dept WHERE emp.deptno = dept.deptno;

  34. Joins SELECT ename, dname FROM emp, dept WHERE emp.deptno = dept.deptno

  35. Entity Relationship Diagrams • Used to create a map of your data • Describes tables, attributes of tables and relationships between tables • Come at it from two directions: lay out the entities, assign the attributes; group the attributes into entities

  36. XML vs. SQL Tables • XML – order matters! (Rows in a relational table are unordered) • XML is a tree structure • XML documents tend to be semi-structured, SQL tables are highly structured • SQL tables aren’t as flexible or inter-changable • XML joins aren’t straightforward • An XML document/element doesn’t serve multiple purposes as well as DB schema/table

  37. Anti-Pattern: XML Document Storage As A SQL BLOB • Need to store XML in a database • Better than storing in a filesystem • Make a BLOB (Binary Large Object) and store a document as an element in a row of a table • Problem: can’t query contents of document without extracting it from the db! • For simple queries, might have to extract all of the documents into the application – very inefficient

  38. Anti-Pattern: One-off SQL Schema For A Particular XML Document • BLOB storage is bad, so why not “shred” the document across multiple DB tables? • I.e., for XML elements named “Dept” make a “Dept” table, “Emp” elements make a “Emp” table, describe the hierarchy with foreign key constraints • Query performance is much, much better – probably better than XPath against XML as a file

  39. Problems • A lot of work! Have to do this for every XML schema • Hard! XML schemas are inherently more flexible than SQL schemas. Some mappings can be difficult • Negates flexibility of XML • Not as learnable – Programmers have to learn to query your SQL schema, not just XPath and XML Schema

  40. Anti-Pattern: Developing A General “Shredding” Solution For All Of XML • Your Application has several XML schemas, and it’s time-consuming to develop “shredding” for all of them • So, you try to do a more generalized shredding • Is possible, but is a very horizontal problem. You probably won’t get the time to solve it completely. • Still might present learnability problems

  41. XML Datastore Architectures • XML Views Of Relational Data • Relational Wrappers Of XML • Independent Storage of XML Documents (Native XML Database) • Text Storage Of XML In RDBMS • XML Shredding Across Relational Tables • Storing XML as Objects in Object- Relational DBs (Oracle XMLType) • Everything is XML (XQuery approach)

  42. Vocabulary: XML Collections • XML Collection is a collection of XML documents • “A row is to a table as an XML document is to an XML collection.”

  43. XML Derived From RDBMS • Data exists naturally as relational data • Needs to be represented as an XML document for some reason • The derived XML is usually used for transmission

  44. Independent XML Store • A specialized database is used to store XML documents • Typically, an application will either have two datastores – relational and XML – or relational data will be stored as XML • In the dual datastore case, the application code has to join the different data sets

  45. Text Storage Of XML In Databases • XML is stored in a column of a relational table • Allows you to mix structured and semi-structured approaches • However, hard to query against the XML directly

  46. XML Shredding Across Tables • An XML document is stored across many tables • Is possible to use SQL queries against the document’s parts • Vendors can implement an XPath-to-SQL translator • Can structure the tables based on a schema • A pain to handle yourself

  47. Exercise • Shred an XML schema across relational tables

  48. XMLType • Object-Relational databases allow you to define your own types • You could define an Address object, define functions for the object, and store instances of the object in a column in the database • Oracle defines an object-relational type, XMLType, for the storage of XML documents in the database • SELECT e.poDoc.getClobval() AS poXML FROM po_xml_tab e WHERE e.poDoc.existsNode('/PO[PNAME = "po_2"]') = 1; • Shredding is managed & encapsulated for you

  49. Oracle 9i: XMLType, Text

  50. Everything Is XML (XQuery) • With XMLType, Oracle says that everything fits in to the Object-Relational realm • XQuery says that everything can be represented and queried as XML • Relational data is derived from RDBMS using SQL/XML

More Related