DB2 9 for z/OS pureXML Performance and Best Practices

DB2 9 for z/OS pureXML Performance and Best Practices Information ManagementJune, 2010

Agenda F • XML performance • Performance monitoring and tuning • Best practices Note: the following performance numbers are not latestand have been improved since measurements.

Create tables with XML columns or alter table add XML columns Insert XML data, optionally validated against schemas Create indexes on XML data Efficiently search XML data Extract XML data Decompose XML data into relational data or create relational view Construct XML documents from relational and XML data Handle XML objects in all the utilities and tools What You Can Do with pureXML - Managing XML data the same way as relational data XML

Storage for UNIFI Messages 70% 52% 58% 71% 67% 96 sample documents Strip WS: Strip Whitespaces Pres WS: Preserve Whitespaces

Whitespace option and Compression 70% saving 80% saving Note : UNIFI (International Standard ISO 20022 – UNIversal Financial Industry message scheme )

Table Compression Impact

Insert Performance (Batch) 3.9 millions 10K docs per hour or 1100 docs/sec Measurement in March 2007, z9 DS8300, Single thread, Docs in EBCDIC

Insert XML – with indexes

XML Index Create or Rebuild //e /a/b/@c /a/b/f/g

Insert Performance – compare w/ CLOB (average of 1K to 10M document insert performance)

Fetch Performance (Batch) 9.3 millions 10K docs per hour or 2580 docs/sec Measurement in March 2007, z9 DS8300, Single thread, Docs in EBCDIC

XML Index Exploitation

XML Insert v.s. Validation v.s. Decomposition

Large Sample Tax Document Insert Performance z9-109, one LPAR with dedicated 3 CPs. Documents were stored remotely in AIX box and inserted using Java application. Time in seconds.

z/OS XML Specialty Engine Support 48% 32% 14%

LOAD Testing Processor IBM System z9 Enterprise Class (z9 EC) LPAR configuration: 4 General Purpose CPs, 1 zAAP, 1 zIIP, all dedicated Memory: 24GB memory Storage IBM DS8300 Operating system z/OS Version 1.9 DB2 DB2 9 Feb 2008 PTF level

TPoX Insert Test Processor IBM System z9 Enterprise Class (z9 EC) LPAR configuration: 3 dedicated General Purpose CPs (no zIIP no zAAP) Memory: 24GB memory Storage IBM DS8300 Case–A Operating system z/OS Version 1.8 DB2 DB2 9 June 2007 PTF level Case –B Operating system z/OS Version 1.9 DB2 DB2 9 Feb 2008 PTF level

TPoX Mixed Transaction Test (1/2) Processor IBM System z9 Enterprise Class (z9 EC) LPAR configuration: 3 General Purpose CPs, dedicated (no zAAP or zIIP) Memory: 24GB memory Storage IBM DS8300 Operating system z/OS Version 1.9 DB2 DB2 9 Feb 2008 PTF level Threads 35

TPoX Mixed Transaction Test (2/2)

TPoX 2500 100.00% 90.00% 2000 80.00% 70.00% 1500 60.00% ETR (tps) 50.00% cpu busy (%) 1000 40.00% 30.00% 500 20.00% 10.00% 0 0.00% PK81260 PK80732 10 PK80732 20 PK80732 30 PK80732 40 TPoX Benchmark October, 2009 z10, 5 CPs Number of users

Agenda • XML performance • Performance monitoring and tuning • Best practices F

Performance Monitoring and Tuning • Since XML native storage is built on top of regular tablespace structure, there are no special changes in DB2 Performance Expert to support XML other than minor points - such as new XML locks (type x’35’). • XML performance problem can be analyzed through accounting traces and performance traces. • There is a new LOAD MODULE for XML: DSNNXML • XML indexes have the same consideration as other indexes. • The REORG utility should be used to maintain order and free space. • Run RUNSTATS for statistics to help pick XML indexes.

XML Query Performance Issues • 85% of the performance issues relate to: • Query execution plans • Index usage (indexing presentation) • Proper coding of SQL/XML and XQuery expressions (Best Practices section)

How to obtain and analyze XML query plans • Create Explain tables • Use member DSNTESC of the SDSNSAMP library • Option E from menu of DB2 admin tool (DSN_STATEMNT_TABLE) • Use Visual Explain • Optim Development Studio • IBM DB2 Optimization Service Center for DB2 for z/OS(OSC) • Gather explain information • Use SPUFI – prefix query with EXPLAIN PLAN SET QUERYNO • SELECT from PLAN_TABLE

Use RUNSTATS • Use RUNSTATS to collect statistics for XML data and indexes so the optimizer can pick the right access methods LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)

Agenda • XML performance • Performance monitoring and tuning • Best practices F

Best Practices • Tip 1: Choose the right table and storage design • Tip 2: Choose the right XML document granularity • Tip 3: Be aware of XML schema validation overhead • Tip 4: Avoid encoding conversion during XML insert and retrieval • Tip 5: In XPath expressions, use fully specified paths as much as possible • Tip 6: Define lean XML indexes • Tip 7: Put document filtering predicates in XMLEXISTS instead of XMLQUERY • Tip 8: Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS • Tip 9: Use RUNSTATS to collects statistics for XML data and indexes • Tip 10: Use SQL/XML publishing views to expose relational data as XML • Tip 11: Use XMLTABLE views to expose XML as relational data • Tip 12: Use SQL/XML statements with parameter markers and host vars

Tip 1: Decision making: XML input => storage Regulatory Requirements Flexible Digital Signature Intact Significant Data (strip whitespace) (Relational/XML) (preserve whitespace) No Structures RegularFixed Search in XML Return XMLalways Yes Relational Complex Flexible Never Yes Yes Light Reporting XML with XMLTABLE() XML with XML indexes LOB VARCHAR VARBIN (can be materialized) Heavy Analytics Relational with XML

Some considerations • Tedious normalization and frustrated changes of schema are an indicator for using native XML. • Store hybrid or redundant data in relational/XML, when • Fully normalized storage is an overkill • Referential integrity: extract into relation columns • Store in XML, but materialize frequently used fields in relational for heavy analytic applications • Document size • Use compression for XML data always

Table Design DOCTAB ID DOCTYPE XMLDOC • Mixed document types in one table • Flexibility in exchange of overhead (such as index maintenance) • Separate tables for different document types • to avoid overhead DOCTYPE1TAB ID XMLDOC DOCTYPE2TAB ID XMLDOC

Tip 2: Choose the right XML document granularity • Small vs. large documents? (KBs vs. MBs)XML Indexes filter at the document level • Smaller documents tend to perform betterBut, rule of thumb: Document granularity should match the predominant granularity of access

Document Granularity: Example

Tip 3:Beware of Schema Validation Overhead create tabledept(deptID char(8), deptdocxml); Validation is optional, and per document (per row): insert intodeptvalues (?, ?) insert intodeptvalues (?, dsn_xmlvalidate(?, ?)) No validation with validation Validation increases CPU time for inserts, and reduces throughput. Use schema validation if needed.Avoid schema validation for highest possible insert performance.

Tip 4: Avoid encoding conversion • Internally encoded XML: encoding derived from the data, e.g. Unicode Byte-Order Mark or optional XML declaration: <?xml version="1.0" encoding="UTF-8" ?> • Externally encoded XML: application encoding determines XML encoding if character type variables are used • Internally encoded XML with UTF-8 is preferred • CLI: use SQL_C_BINARY data buffers rather than SQL_C_CHAR, SQL_C_DBCHAR, SQL_C_WCHAR • Java: use binary stream (setBinaryStream) rather than string (setString). • COBOL: SQL BLOB

Tip 5: Use fully specified paths if possible • As much as possible possible, use fully specified XPath expression rather than wildcards, e.g. • /customerinfo/phone instead of //phone • /customerinfo/addr/state instead of /customerinfo/*/state <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> <phone type="work">905-555-4789</phone> <phone type="home">416-555-3376</phone> <assistant> <name>Peter Smith</name> <phone type="home">416-555-3426</phone> </assistant> </customerinfo>

Tip 6: Lean XML Indexes create tablecustomer(infoXML); create indexidx4 on customer(info) generate key using xmlpattern '/customerinfo/phone' as sql varchar(40); • create unique index idx1 on customer(info) • generate key using • xmlpattern '/customerinfo/@Cid' • as sql decfloat; • create index idx2 on customer(info) • generate key using • xmlpattern '/customerinfo/name' • as sql varchar(40); <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> <phone type="work">905-555-4789</phone> <phone type="home">416-555-3376</phone> <assistant> <name>Peter Smith</name> <phone type="home">416-555-3426</phone> </assistant> </customerinfo> create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40); LUW: “as sql double” zOS: “as sql decfloat”

create tablecustomer(infoXML); Tip 6: Lean XML Indexes Don’t index everything! Very expensive for insert, update, delete ! • create unique index idx1 on customer(info) • generate key using • xmlpattern '/customerinfo/@Cid' • as sql decfloat; • create index idx2 on customer(info) • generate key using • xmlpattern '/customerinfo/name' • as sql varchar(40); <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> <phone type="work">905-555-4789</phone> <phone type="home">416-555-3376</phone> <assistant> <name>Peter Smith</name> <phone type="home">416-555-3426</phone> </assistant> </customerinfo> create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40); create index idx4 on customer(info) generate key using xmlpattern '//text()' as sql varchar(40);

<customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> (…) </customerinfo> Tip 6: Lean XML Indexes and Indexing non-leaf Nodes • Typically not useful ! • …xmlpattern '/customerinfo/addr' as sql varchar(128); • Single index entry. Key value = concatenation of all text nodes under “addr”: • (/customerinfo/addr, “1596 BaselineTorontoOntarioM3Z-5H9”) • Better: 4 separate indexes ! • …xmlpattern '/customerinfo/addr/street' as sql varchar(50); • …xmlpattern '/customerinfo/addr/city' as sql varchar(40); • …xmlpattern '/customerinfo/addr/state' as sql varchar(25); • …xmlpattern '/customerinfo/addr/pcode' as sql varchar(10);

Tip 7 & 8 :Put document filtering predicates in XMLEXISTS instead of XMLQUERY & Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS • XMLQUERY function in a SELECT clause does not filter documents or rows, does not use indexes • Document/Row-filtering predicates must be in XMLEXISTS in the WHERE clause • Predicates in XMLEXISTS must be in square brackets

create tablecustomer( infoXML); SQL/XML with XMLQUERY • select xmlquery(‘$i/customerinfo[phone = “905-555-4789”]/name’ passing info as “i”) • from customer customer table: 3 record(s) selected 1 record(s)selected • select xmlquery(‘$i/customerinfo/name’ passing info as “i”) • from customer • where xmlexists(‘$i/customerinfo[phone = “905-555-4789”]’ passing info as “i”)

create tablecustomer( infoXML); SQL/XML with XMLEXISTS • select xmlquery(‘$i/customerinfo/name’ passing info as “i”)from customerwhere xmlexists(‘$i/customerinfo/phone = “905-555-4789”’ passing info as “i”) True or false, not empty! customer table: 3 record(s) selected 1 record(s) selected • select xmlquery(‘$i/customerinfo/name’ passing info as “i”) • from customer • where xmlexists(‘$i/customerinfo[phone = “905-555-4789”]’ passing info as “i”)

Tip 9: Use RUNSTATS on XML data! • RUNSTATS does collect statistics for XML data and XML indexes! • The optimizer does use these stats! LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)

Tip 10:Use SQL/XML publishing views to expose relational data as XML • SQL/XML publishing functions hidden in a view • create table unit( unitID char(8), name char(20), manager varchar(20)); • create view UnitView(unitID, name, unitdoc) as select unitID, name, XMLELEMENT(NAME "Unit", XMLELEMENT(NAME "ID", u.unitID), XMLELEMENT(NAME "UnitName", u.name), XMLELEMENT(NAME "Mgr", u.manager) ) • from unit u;

Tip 10:Use SQL/XML publishing views to expose relational data as XML • Queries that perform sub-optimally select unitdoc from UnitViewwhere xmlexists('$i/Unit[ID = "WWPR"]' passing unitdoc as "i"); • Query that performs well: filter on relational select unitdoc from UnitViewwhere UnitID = "WWPR"; In a nutshell, include relational columns in a SQL/XML publishing view, and when querying the view express any predicates on those columns rather than on the constructed XML.

Tip 11:Use XMLTABLE views to expose XML data in relational format • Values returned from XML documents in tabular format • create table customer(info XML); • create view myview(CustomerID, Name, Zip, Info) as SELECT T.*, info FROM customer, XMLTABLE ('$c/customerinfo' passing info as “c” COLUMNS “CID” INTEGER PATH './@Cid', “Name” VARCHAR(30) PATH './name', “Zip” CHAR(12) PATH './addr/pcode' ) as T;

Tip 11:Use XMLTABLE views to expose XML data in relational format • Query with an XML predicate • May perform sub-optimally select CustomerID,Name from myview where Zip = “95141”; • Will perform well select CustomerID, Name from myview where xmlexists('$i/customerinfo[addr/pcode ' “95141”] passing info as “i”); In a nutshell, be careful with XMLTABLE views which expose XML data in relational form. When possible, include additional columns in the view definition so that filtering predicates can be expressed on those columns instead of the XMLTABLE columns.

Extracting values from XML for Hybrid Store using Trigger • CUST(ID, NAME, CITY, ZIP, INFO): extract NAME, CITY, ZIP from INFO (XML) CREATE TRIGGER ins_cust AFTER INSERT ON cust REFERENCING NEW AS newrow FOR EACH ROW MODE DB2SQL BEGIN ATOMIC update cust set (name, city, zip) = (select X.name, X.city, X.zip from cust, XMLTABLE('customerinfo' PASSING CUST.INFO COLUMNS name varchar(30) PATH 'name', city varchar(20) PATH 'addr/city', zip varchar(12) PATH 'addr/pcode-zip') as X where cust.id = newrow.id ) where cust.id = newrow.id; END #

Tip 12: Use Parameter markers and host vars for fast XML queries select info from customer where xmlexists('$i/customerinfo[phone ="905-555-4789"]' passing info as "i") select info from customer where xmlexists('$i/customerinfo[phone= $p]' passing info as "i", cast(?as varchar(12)) as"p") select info from customer where xmlexists('$i/customerinfo[phone = $p]' passing info as "i", :vchostvar as "p")

XML Queries – Things to do to improve Performance LUW chart • Use XPath instead of FLWOR where possible • Reason: Simpler is better, for humans and for the DB2 optimizer. • XML may not have to be reconstructed and FLWOR can use more tempspace • Avoid parent for predicates • Eg. /a/b//d[../c=fn:string(“abc”)] • Reason: parent steps in the predicate prevent index usage • See: http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/#cases • Use range predicates when appropriate • Eg. [dateOfBirth [.>= xs:date(“2000-01-01”) and .<= xs:date(“2000-12-31”)]] • What is important here is the notation with the self axes (the dots). • Reason: This allows DB2 to use a single XISCAN instead of 2 XISCANs + IXAND. • See: http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/#rangepredicates

Using SPUFI or JCL SELECT Cid, InfoFROM DSN8910.CUSTOMERWHERE XMLEXISTS ('declare default element namespace "http://posample.org";//addr[city="Toronto"]' passing INFO) • XML and XPath are case-sensitive: CAP off/case mixed • SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN UNEXPECTED TOKEN DEFAULT FOLLOWING DECLARE. • Terminal session CCSID setting has to be consistent with application encoding scheme as “[” and “]” have different code points in different code pages. • SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN UNEXPECTED TOKEN FOLLOWING "Toronto".

DB2 9 for z/OS pureXML Performance and Best Practices