450 likes | 689 Views
DB2 Version 9: Overview of pureXML. Plus an XPath Primer, XML Column Indexes, Support for Validation. Gregg Lippa Themis Inc. http://www.themisinc.com glippa@themisinc.com.
E N D
DB2 Version 9:Overview of pureXML Plus an XPath Primer,XML Column Indexes, Support for Validation Gregg Lippa Themis Inc. http://www.themisinc.com glippa@themisinc.com
Gregg Lippa is currently a Senior Technical Advisor at Themis Inc. He teaches DB2-related courses on SQL, Application Programming, and optimization, performance and tuning as well as Java, J2EE, distributed computing and Java-related topics. Gregg has worked with DB2 as a consultant and trainer for over 20 years and with Java for 10 years. This material is taken from the Themis course DB1091: DB2 9 for z/OS pureXML Features. For more information visit http://www.themisinc.com. Products and company names mentioned may be trademarks of their respective companies. Mention of third party products or software is for reference only and constitutes neither a recommendation nor an endorsement.
DB2 Version 9 • pureXML technology • Seamlessly integrating XML with relational data • Including publishing and parsing functions • Storing XML data natively in a database table • Including a new XML data type
An XML Document <patient id="13579"> <name>John Doe</name> <addr> <street>123 Main</street> <city>Anytown</city> <state>PA</state> <zip>19134</zip> </addr> <phone type="home">610-654-1234</phone> <phone type="work">610-987-4321</phone> <email>jdoe@mymail.com</email> <service> <sdate>2008-09-22</sdate> <reason>flu shot</reason> <descrip>gave flu shot</descrip> <cost>40.00</cost> <copay>10.00</copay> </service> <service> <sdate>2008-05-11</sdate> <reason>sore foot</reason> <descrip>referred to specialist</descrip> <cost>60.00</cost> <copay>10.00</copay> </service> </patient> Root Nodes Elements Attributes Values (text) Well-formed Valid
XML – Why? • Pervasive • Versatile • Self-describing • Neutral for exchanging data among diverse devices • Universal standard for data interchange • Growth of XML data • XML-based industry and data standards • SOA and Web services; Services-based frameworks • messages are encapsulated as XML • Web 2.0 technologies • XML feeds • Syndication services - rendered as XML files • XML data becoming more critical to enterprise operations
Previous Approaches to XML Document Storage • File systems • Not storing XML documents in a database • Does not scale well • Stuffing • XML data stored as large objects or VARCHAR • Inefficient for querying the XML data • Shredding • Decomposing XML data into multiple columns & tables • Often leads to complex join requirements • May be difficult to recreate original XML document • Utilizing XML-only database systems • Few options and little expertise available
XML vs Relational Model • Major differences: • XML data is hierarchical; relational data is tabular • XML data is self-describing; relational data is not • XML data is ordered; relational data is not • Which approach is right for my data? • XML maximizesflexibility of the data structure • Relational data provides performance benefits for data retrieval • Relational data supports referential integrity requirements • Referential constraints cannot be based on XML columns • Data warehouses are oriented toward relational data
pureXML Capabilities • XML data type • With specialized hierarchical storage structure • Indexing capabilities • Based on data within XML documents • New query languages (XPath and SQL/XML) • New query optimization techniques too • XML schema support • Including validation • Database utilities support • Integration with JDBC, ODBC, Embedded SQL • XML shredding and publishing facilities • For composing and decomposing XML documents
Benefits of DB2 pureXML Technology • Faster development • Code simplification • Avoiding XML-relational transformations • Increased agility • Versatile XML schema evolution • Quickly modify applications to support new or changing requirements • Improved usability • Exploit previously unmanaged XML data • Speed up query processing through XML-optimized storage and indexing
pureXML Usage Scenarios • Integration ofdiverse data sources • DB2 provides the ability to join XML documents • Formprocessing • Store whole electronic forms in DB2 rather than shredding • Document storage and querying • Store and manage less structured (document-centric) XML data in DB2 • Using XML to support transactions • Service-oriented architectures (SOA) offer message-based transactions • XML data can then be retrieved, updated, searched and analyzed • Syndication and XML feeds • Serve XML feeds through a Web services interface • Provide a repository for XML data to support these feeds
DB2 Support for XML 0 – XML Storage 1 – Bind in XML 2 – Store as XML 3 – Shred into Relational 4 – Retrieve XML data 5 – Publish as XML 6 – Bind XML output 7 – XML to XML 8 – XML to Relational 9 – Relational to XML DB2 ENGINE 7 XML 0 2 1 Applicationthat makesuse of XML 4 Textual XML 3 8 9 6 5 Relational XSR (validation)
pureXML Architecture • DB2 9 – a hybrid database system • One database with both relational and native XML data • Single hybrid database enginehandles all processing • Application may combine SQL and SQL/XML • Access relational and XML data in a hybrid database • XML data is stored separately from other table contents • Has its own table space • Supports XML document validation with XML schemas • XML schemas used for validation are registered with DB2 • XML Schema Repository (XSR)
XML Data Type • DB2 native XML support includes a new XML data type • An XML column holds one XML document for each row • XML data is stored in a parsed tree structure • XML document can also be stuffed into LOB or VARCHAR • Provides advantages in certain scenarios • XML storage requirements • Separate XML tablespace plus space for any needed indexes
The XML Data Type • Optimized storage • New XML data type • Supports insert, update, and delete • Stores parsed XML documents • Available when creating or altering tables CREATE TABLE PATIENT (PATIENTID CHAR(6), PATIENT_XML XML) CREATE TABLE P2 (PID INT, PTYP CHAR(8), PX1 XML, PX2 XML) • Supports access to nodes within XML document via XPath • Query optimization • CREATE INDEX supports specification of an xmlpattern
XML Indexes • Indexes are often used to improve query performance • Indexes on XML columns are supported in DB2 9 • Uses an XML pattern (XPath) expression • Indexes paths and values in stored XML documents • XML index entries provide access to document nodes • Not limited to providing access to the beginning of a document • Index keys are created based on XML pattern expressions CREATE INDEX PATINDEX ON PATIENT(PATIENT_XML) GENERATE KEYUSING XMLPATTERN '/patient/service/sdate' as SQL VARCHAR(10)
Application Development Support • Supports developing apps that include XML requirements • Language support: C/C++, Java, Assembler, Cobol, PL/I • API support: JDBC, DB2 / ODBC, Embedded SQL, SQLJ • SQL/XML query support • DB2 sample database enhancements • Universal DB2 driver for JDBC enhanced to support XML • Provided extension XML type: com.ibm.db2.DB2Xml import com.ibm.db2.jcc.DB2Xml; DB2Xml xml1 = (DB2Xml) rs.getObject ("patient_xml"); String s = xml1.getDB2String(); InputStream is = xml1.getDB2XMLBinaryStream("UTF-16");
DB2 and XPath Navigating Through the XML Tree
DB2 XPath • XPath is an expression language • Designed by the World Wide Web Consortium (W3C) • Used to navigate XML documents • XPath expressions are similar to file path notations • DB2 XPath can be used: • With the XMLQUERY SQL built-in function • To extract data from an XML column • With the XMLEXISTS SQL predicate • To evaluate data in an XML column • When creating an XML index • To determine the XML document nodes to be indexed '/patient/name' All name elements within patient elements
XPath Expressions • XPath expressions: the basic building block of Xpath • Types of expressions provided by DB2 XPath: • Primary expressions: basic primitives of the language • Include literals, variable references, and function calls • Path expressions: locate nodes within a document tree • Include Node Tests and Filter Expressions • Arithmetic expressions: add, subtract, multiply, divide, modulus • Comparison expressions: compare two values • Logical expressions: use boolean logic • Anywhere an expression is expected, any kind of expression can be used • Operands of an expression are typically other expressions
XPath Data Model Key: Document node Comment node Element Node Attribute node Text node OneEmp.xml Sample comment Employee Name Phone Phone Address Dept type="home" type="cell" mgr="bob" 800-555-1234 212-321-4321 Sales LastName FirstName Street City State Zipcode Roger Rabbit 123 Main Smallville Wyoming 98765
Types of XPath Nodes • Document node encapsulates an XML document • Parent of root element node • Element node encapsulates an XML element • Can have one parent and many children • Attribute node represents an XML attribute • Belongs to an element • Text node encapsulates XML character content • Elements may have these • Processing Instruction (PI) node • Encapsulates XML processing instruction • Comment node encapsulates an XML comment • Namespaces node is considered to be a node in XPath
XPath Nodes Processing Order <?xml version="1.0"?> <th:course xmlns:th="http://themis.com/test" th:format="instructor led"> <th:title>XML for DB2</th:title> <th:descrip>Lots of information <!-- To be determined --> </th:descrip> <?ourOwnPIforCourseware?> </th:course> Document node 1 Element node 2,5,7 Comment node 9 Processing Instruction 10 Text node 6,8 Attribute node 4 Namespace node 3 1 course 2 title descrip 7 5 10 3 4 th="http://themis.com/test" name=formatvalue=“instructor led” target=ourOwnPIforCourseware 6 8 9 Lots of information XML for DB2 To be determined
Path Expressions Overview • Path expressions navigate the XML tree structure to locate nodes • Navigation axes are used in XPath; always start at context node • ForwardAxis moves down through the XML tree: • attribute (@) • child (default) • descendant • self (.) • descendant-or-self(//=/descendant-or-self:node()/) • ReverseAxis moves up through the XML tree: • parent (..) • Three parts of an axis step: • Axis specifies a direction of movement • Node test specifies node selection criteria • Predicates (zero or more) filter returned sequence parent Element self attribute Element Element Text Comment children descendants Text
Path Expressions Syntax • XML document to support upcoming example code: • XPath expressions support abbreviated syntax in axis steps @ is abbreviated syntax for attribute // is abbreviated syntax for /descendant-or-self::node()/ .. is abbreviated syntax for parent::node() . is abbreviated syntax for self::node() is abbreviated syntax for child::node() • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531</phone> • <email>saralee@cakemail.com</email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones</descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>
Path Expression Examples • ‘ / ’ slash indicates that path begins at root node • Whole XML document • ‘//service’ two slashes at beginning of path expression • Requested node, service, may be located anywhere in the document • Returns the entire service element, including all of its children • ‘//phone/@*’ All attributes appearing under the phone element • Only attribute under phone is: home • ‘//@*’ All attributes in the XML document • All attributes: 11123 home • ‘/patient/name’ All name elements under patient • Only name is Sara Lee • ‘/patient/addr/city/..’ All child elements of addr • Element addr, the parent of city, plus all child elements of addr
Filter Expressions • A filter expression is a path expression followed by predicates in square brackets • Filter its result based on applying conditions; for example: • All service information of patients who have a $15.00 copay '/patient/*[copay="15.00"]' <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribe hormones</descrip> <cost>84.00</cost> <copay>15.00</copay> </service> • Set context node to patient and filter based on attribute type rather than on an element /patient/phone[@type="home"] <phone type="home">908-842-7531</phone> • Any patient that has email /patient[email] Entire XML document • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531</phone> • <email>saralee@cakemail.com</email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones</descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>
Arithmetic Expressions • Arithmetic expressions: perform operations that involve addition, subtraction, multiplication, division, and modulus • The XPath arithmetic operators: *multiplication div division idiv integer division mod modulus + addition - subtraction • An arithmetic expression results in a numeric value • Or an empty sequence or an error • Place arithmetic expressions in parentheses
Arithmetic Expression Example • An arithmetic expression to calculate the the remaining balance after the copay (and its return value): /patient/service/(cost – copay) 69.00 • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531</phone> • <email>saralee@cakemail.com</email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones</descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>
Comparison Expressions – General Comparisons • Comparison expressions allow comparing two values • The comparison operators are = != < <= > >= • All services with a cost greater than 60 '/patient/service/cost > 60' true • All services with a cost greater than 60, but get the patient’s service info instead of just true or false '/patient/service[cost > 60]' <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84.00</cost> <copay>15.00</copay> </service> • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531 </phone> • <email>saralee@cakemail.com • </email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones • </descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>
Comparison Expressions – Logical Comparisons • Logical expressions using AND return true if both of two expressions are true • Logical expressions using OR return true if one or both expressions are true • Return phone numbers of the type work or fax (two options shown)'//phone[./@type="work" or ./@type="fax"]''//phone[@type="work" or @type="fax"]' • The result is this phone info <phone type="work">908-842-7531</phone><phone type="fax">908-751-2468</phone> • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531 </phone> • <phone type="fax">908-751-2468 </phone> • <email>saralee@cakemail.com • </email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones • </descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>
The Built-in Function Library • Built-in functions offered by the DB2 XPath library: • String functions • Numeric functions • Functions working on boolean values • Functions working on sequences • Calls to these function are allowed in an XPath expression anywhere an expression is expected • Example: SELECT XMLQUERY('fn:concat($x,$y)' PASSING 'come ' AS "x", 'together' AS "y") FROM SYSIBM.SYSDUMMY1;
A Sampling of XPath Functions fn:compare compares two strings to see which one is greater fn:concat concatenates two or more strings into a single string fn:contains determines whether a string contains a given substring fn:count returns the number of values in a sequence fn:normalize-space strips leading and trailing whitespace characters fn:lower-case converts a string to lowercase fn:matches determines whether a string matches a given pattern fn:position returns the position of the context item in the sequence fn:replace replaces characters that match a pattern fn:round returns the integer that is closest to a numeric value fn:string returns the string representation of a value fn:string-length returns the length of a string fn:substring returns a substring of a string fn:upper-case converts a string to uppercase
XML Namespaces • XML namespaces prevent naming collisions • An XML namespace is a set of names identified by a namespace URI • Distinguishes element types or attribute names with the same name associated with different DTDs or Schemas • Namespaces allow qualifying names of elements and attributes • Contain an optional namespace prefix, a colon, and a local name • Example: two elements with the same name bound to different URIs: <anElement xmlns:p1="someURI" xmlns:p2="otherURI"> <ElementABC> <p1:table>excel</p1:table> <p2:table>dining</p2:table> <table>DB2</table> </ElementABC> </anElement> empty prefix; bound to default element namespace
Prolog – Namespace Definition • A DB2 XPath expression optionally contains a prolog • Establishes the processing environment • Prolog declaration may specify multiple namespace declarations • May also specify one default namespace declaration • Prolog declaration is always followed by a semicolon (;) • Syntax: • Examples: declare namespace prefix="namespace string literal"; declare default element namespace "namespace string literal"; declare namespace fn="http://www.w3.org/2005/xpath-functions"; declare default element namespace "http://www.xyz.com/movies";
Indexes and XML Indexes Built On Values Within XML Documents
XML Indexing • DB2 supports creating indexes on XML columns • Generated using XML pattern expressions • Support access to nodes in the document • Multiple parts of an XML document can satisfy an XML pattern • Multiple index keys may be generated for insert of a single document • GENERATE KEY USING XMLPATTERN clause of CREATE INDEX • Specifies what you want to index • Contains XML pattern expression Same as before XML CREATE UNIQUE INDEX PATIENT_ID_IX ON PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/@id' AS SQL VARCHAR(5) XML node tobe indexed Required keywords Type of storedindex values
Data Types Associated With Pattern Expressions • Keys from XML pattern expression specified in a CREATE INDEX statement must be associated with a data type • May use either DECFLOAT or VARCHAR(n), where n <= 1000 • Value being inserted/indexed must be convertible to this type INSERT INTO PATIENT VALUES('12345', '<patient id="123456"><name>Jim Beam</name></patient>') Value too long forVARCHAR(5) index DSNT408I SQLCODE = -20305, ERROR: AN XML VALUE CANNOT BE INSERTED OR UPDATED BECAUSE OF AN ERROR DETECTED WHEN INSERTING OR UPDATING THE INDEX IDENTIFIED BY 'DBID~132 OBID~23' ON TABLE *N. REASON CODE = 1.
UNIQUE Keyword in XML Index Definition • The UNIQUE keyword is supported in XML index definitions • However, its meaning is different than in relational index definitions • When creating a relational index, the UNIQUE keyword enforces uniqueness across all rows in the table • When creating an index over XML data, the UNIQUE keyword enforces uniqueness across all documents in an XML column
Example Queries and Supporting Indexes • Example 1 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/service[copay="10.00"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX copayIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/service/copay' AS SQL DECFLOAT • Example 2 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/phone[@type="work"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX phoneTypIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/phone/@type' AS SQL VARCHAR(20) • Example 3 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/addr[city="Uptown"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX cityIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/addr/city' AS SQL VARCHAR(20)
XML Schema • XML Schema Definition (XSD) defines structure of XML instance documents • Published as a recommendation by W3C • Defines elements and attributes permitted in a document • Defines parent / child relationship between elements • Defines data types, constraints and values for elements and attributes <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="patient"> <xsd:complexType> <xsd:sequence> <xsd:element name="id" type="xsd:integer"/> <xsd:element name="name" type="xsd:string"/> <xsd:element name="addr" type="xsd:string"/> <xsd:element name="dob" type="xsd:date"/> <xsd:element name="amount" type="xsd:double"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>
XML Schema Management With the XSR • XML schema repository (XSR) • Set of tables that store XML schemas • Created during DB2 installation or migration • XML schemas may be added to the XSR • Then used to validate XML documents being inserted or updated • Registering XML schema documents (adding to XSR): • Call DB2-supplied stored procedures from a DB2 application • Or invoke a provided JDBC method from a Java application • Removing XML schema documents from the DB2 XSR • Call DB2-supplied stored procedure or invoke JDBC method Only schemas,and not DTDs,may be used forXML validation inDB2 Version 9
XML Schema Validation • Use SQL INSERT statement to insert data into XML column • Inserted data must be a well-formed XML document • Validate the XML against a registered XML schema during insertion using the DSN_XMLVALIDATE function • User Defined Function • XML validation determines whether the structure, content, and data types of an XML document are valid according to a corresponding schema • Validation is optional INSERT into AutoDealers VALUES( '12345', CURRENT DATE, 'Sams Deals', DSN_XMLValidate(:xmlDealerInfo, SYSXSR.DealerInfoSchema));
XML Decomposition • Decomposition, or shredding, is the process of storing XML document content in columns of relational tables • Decomposed data has SQL type of column where it is inserted • An XML schema consists of one or more XML schema documents • Annotated XML schema decomposition • Control of the decomposition process is provided by XML schema annotation
Resources DB2 Version 9.1 for z/OS XML Guide (SC18-9858-03) DB2 9 for z/OS Technical Overview (SG24-7330-00) DB2 Version 9.1 for z/OS Application Programming and SQL Guide (SC18-9841-01) DB2 Version 9.1 for z/OS Utility Guide and Reference (SC18-9855-02) DB2 Version 9.1 for z/OS Application Programming Guide and Reference for Java(SC18-9842-01 )