300 likes | 448 Views
XML Schemas in Oracle XML DB. Ravi Murthy Sandeepan Banerjee Oracle Corporation. Talk Overview. Why XML in DB ? Background XML Schema Object-Relational DB XML Schema Support in Oracle XML DB XML Storage XML Query XML Update Future directions and Conclusions. Why XML in DB ?.
E N D
XML Schemas in Oracle XML DB Ravi Murthy Sandeepan Banerjee Oracle Corporation
Talk Overview • Why XML in DB ? • Background • XML Schema • Object-Relational DB • XML Schema Support in Oracle XML DB • XML Storage • XML Query • XML Update • Future directions and Conclusions
Why XML in DB ? • Trends in industry • More XML content being generated • More applications dealing with (structured) data and (semi-structured) documents • Need for unified management for all kinds of data • XML Schema language provides strong typing • Many benefits to storing XML in DB • Better Queriability • Optimized Updates • Stronger Validation • Fidelity of XML very important
Oracle XML DB Overview • Native XML data type • Supports W3C XML Schema data model • XML/SQL Duality • Support for XML standards • Namespaces, XPath, XSLT, SQL/XML • High performance XML repository • Hierarchical File System Abstraction • Protocols : FTP, HTTP/WebDAV • Access Control (ACL) • Versioning
XML Schema • W3C Recommendation • Large number of built-in scalar data types • Simple type definitions • Length, pattern and other constraints • Complex type definitions • sequence / choice / all • Mixed content • Extensible • Derivation by extension and restriction • Substitution Groups • Wildcards • Gradually replacing the traditional DTD • Commonly used as a validation mechanism
Object Relational DB Basics • Object types • Collection types • Object References • LOBs
XMLType • Native data type for XML • Used to define columns of tables and views, arguments to stored procedures, etc. • XML specific methods and operators for • Querying and extracting XML using XPath • Transforming XML using XSLT • Validating XML using XML Schema • Multiple Storage Options • Unstructured Storage in CLOB • Structured Storage into object-relational rows and columns • Hybrid Storage • Maintains application transparency to physical storage choice
XML DB and XML Schema • XML Schema controls all aspects of processing • Storage mappings • In-memory representations • Language Bindings • XML Schema Registration Process • Associates XML Schema with URL • Generates Object types • Creates default tables • XMLType column can be constrained to a global element of registered schema
XML Schema Example <schema targetNamespace=“http://www.oracle.com/PO.xsd” xmlns:po=“http://www.oracle.com/PO.xsd” elementFormDefault=”qualified” xmlns="http://www.w3.org/2001/XMLSchema"> <complexType name="PurchaseOrderType"> <sequence> <element name="PONum" type="decimal"/> <element name="Company"> <simpleType> <restriction base="string"> <maxLength value="100"/> </restriction> </simpleType> </element>
XML Schema Example (contd) <element name="Item" maxOccurs="1000"> <complexType> <sequence> <element name="Part"> <simpleType> <restriction base="string"> <maxLength value="1000"/> </restriction> </simpleType> </element> <element name="Price" type="float"/> </sequence> </complexType> </element> </sequence> </complexType> <element name="PurchaseOrder" type="po:PurchaseOrderType"/> </schema>
Generated Object Types TYPE "Item_T" (part varchar2(1000), price number); TYPE "Item_COLL" AS VARRAY(1000) OF "Item_T"; TYPE "PurchaseOrderType_T" (ponum number, company varchar2(100), item Item_COLL);
Generated Tables TABLE po_tab OF XMLTYPE XMLSCHEMA “http://www.oracle.com/PO.xsd" ELEMENT "PurchaseOrder“ VARRAY(item) STORE AS item_tab;
XML Document Example <PurchaseOrder xmlns="http://www.oracle.com/PO.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.oracle.com/PO.xsd http://www.oracle.com/PO.xsd"> <PONum>1001</PONum> <Company>Oracle Corp</Company> <Item> <Part>9i Doc Set</Part> <Price>2550</Price> </Item> <Item> <Part>8i Doc Set</Part> <Price>350</Price> </Item> </PurchaseOrder>
Structured Storage • Attributes and single-valued elements • Stored as columns in single row • SQL data types correspond to XML Schema types • SQL constraints correspond to XML Schema constraints • Multi-valued elements (collections) stored in separate nested tables • One row per item in collection • Nested table row stores parent key • Array Index column stores the position information • Number column – uses full range of floating points • Elements inserted in the middle of the collection get (previous_array_index + next_array_index)/2 • Supports multiple levels of nesting • Embedded object types • Embedded collection types with multiple nested tables
DOM Fidelity • Structured storage guarantees DOM fidelity • No whitespace fidelity • System binary attribute in object types • SYS_XDBPD$ • PD attribute stores non-relational information • Ordering of elements • Comments • Processing Instructions • Namespace declarations • Prefix information • Mixed content – text nodes that are intermixed with elements are stored in the system column
XDB Attributes in XML Schema • Mapping details captured as new attributes within the XML Schema • Oracle attributes use namespace : http://xmlns.oracle.com/xdb • Input XML Schema does not need to be annotated • Default assumptions for all XDB attributes • Input XML Schema can explicitly specify XDB attributes • SQLName : Name of column or type attribute • SQLType : Name of object type • SQLCollType : Name of collection type • MaintainOrder : Permits turning off order maintenance
Hybrid Storage of XML • Two ends of storage spectrum • Store entire content in a single LOB • Full shredding of XML data • Hybrid Storage Options • Combination of shredding and LOB storage • xdb:SQLType=“CLOB” applied to <complexType> • CLOB storage ideal for fragments that are not intended for query or partial updates • Example : XHTML content embedded within resource descriptors (metadata)
Querying XMLType • XPath based operators • existsNode • Boolean operator • Checks for existence of node identified by XPath • extract • Extracts a fragment identified by XPath • extractValue • Retrieves the raw value of leaf node identified by XPath • Namespace Aware • ANSI SQL/XML Standards effort
Query Rewrite • Automatic rewrite of XPaths during query compilation • Rewritten query directly accesses underlying relational columns • Introduces joins with nested tables • Enables use of indexes
Query Rewrite - Example • Original Query SELECT extractValue(value(p), '/PurchaseOrder/PONum') FROM po_tab p WHERE existsNode(value(p), '/PurchaseOrder[Company=Oracle]') = 1; • Rewritten Query SELECT p.ponum FROM po_tab p WHERE p.company = 'Oracle';
Updating XMLType • Updating entire XML document • Partial Update • Uses UpdateXML() operator • XPath identifies element or attribute to be updated • New value for the updated node is specified • Rewritten to directly update underlying column(s) • Similar mechanisms for • Inserting new nodes • Deleting node(s)
Update Rewrite - Example • Original Statement UPDATE po_tab p SET value(p) = updatexml(value(p), '/PurchaseOrder/PONum/text()', 9999) WHERE existsNode(value(p), ‘/PurchaseOrder[Company=Oracle]’) = 1; • Rewritten Statement UPDATE po_tab p SET p.ponum = 9999 WHERE p.Company = ‘Oracle’;
XMLType Views • Provide an XML view of relational data • Good evolutionary strategy • New XML apps on XML abstraction of existing data • SQL/XML Standard operators used to generate XML • Views can generate schema based XML • Insert / Update / Delete operations via ‘Instead of’ Triggers • Queries over XML views are rewritten to directly access underlying relational columns
XMLType View - Example CREATE VIEW po_view of XMLTYPE XMLSCHEMA "po.xsd" ELEMENT "PurchaseOrder" AS SELECT XMLElement("PurchaseOrder", XMLForest(p.ponum "PONum", p.company "Company"), (SELECT XMLAGG( XMLElement("Item", XMLForest(i.part "Part", i.price "Price")) FROM items_rel_tab i WHERE i.po_id = p.id)) FROM po_rel_tab p;
Complex XML Schemas • Cyclic Definitions • Object types created with references • Table row contains reference to other rows stored in same or different table • Keys to document and parent rows stored in nested rows • Complex type derivation • Mapped to object type inheritance • Wildcards • Mapped to CLOB attributes
Indexing XMLType • Multiple index types • B-Tree and bitmap indexes • Function-based indexes • Create index on specific XPath expressions • Text indexes • Inverted lists provide section-based search • Also support keyword based search within textual content
Future Directions • Optimize XML Query support • Translate XML Query to SQL • Works against distributed databases using Oracle connectivity solutions • Optimize support for highly variable documents • New storage and indexing techniques
Conclusions • Oracle XML DB is a robust platform for building XML applications • Strong support for XML Schema based storage, query, indexing and updates • Structured storage maintains XML fidelity • Support for standards such as SQL/XML, XPath and XSLT • Powerful XML/SQL Duality