350 likes | 562 Views
DB2 for Linux, Unix, and Windows. pureXML Indexing Overview DB2 9, 9.5, and 9.7 for Linux, Unix, and Windows Christina (Tina) Lee IBM Silicon Valley Laboratory. September 2009. Agenda. pureXML Basics Regions and Paths Indexes Index on XML column DB2 9.5 Reject Invalid Values option
E N D
DB2 for Linux, Unix, and Windows pureXML Indexing Overview DB2 9, 9.5, and 9.7 for Linux, Unix, and WindowsChristina (Tina) LeeIBM Silicon Valley Laboratory September 2009
Agenda • pureXML Basics • Regions and Paths Indexes • Index on XML column • DB2 9.5 Reject Invalid Values option • Common User Errors • Queries and XML Indexes • Catalog Changes • DB2 9.7 • XML indexes on Range Partitioned Tables • Online Index Create and Online Index Reorg • Index Compression
pureXML Basics • XML stored in a parsed hierarchical format • No fixed XML schema per XML column required • XML Schema validation is optional, per document • XML indexes for specific elements/attributes • XQuery and SQL/XML Integration create table dept (deptID char(8),…, deptdoc xml); DB2 Storage Relational Storage page page page pureXML Storage
XML Regions and Column Path Indexes • XML Regions Index • System generated when first XML column created or added to table • Nodes and subtrees in a data page form regions in a document • Provides logical mapping of regions to retrieve document data • XML Column Path Index • System generated for each XML column created or added to table • Maps unique paths to path ids for each XML column • Used to improve performance during queries
CREATE INDEX for Index on XML Column • Index created on single XML column • Composite keys not supported • Only indexes document nodes that satisfy XML pattern • XML index specification • GENERATE KEY USING XMLPATTERN • XML pattern expression • Data type
Key Generation • Relational index inserts one key per table row • Index on XML Column may insert multiple keys per table row • Multiple parts of document may satisfy XML pattern
XML Pattern: Path Expression Steps • Supports subset of XQuery path expressions • Path expression steps separated by forward slash (/) • Double forward slash (//) is abbreviated syntax for /descendant-or-self::node()/
Qualifying Paths and Nodes • Set of nodes may qualify if single path specified • Set of paths and nodes may qualify if wildcard, descendant axis, or descendant-or-self axis specified
Data Types • Four SQL data types are supported • VARCHAR • DOUBLE • DATE • TIMESTAMP
VARCHAR(n) • Values longer than specified length(n) are not indexed • Document insertion or index creation will fail • Index can support both range scans and equality look-ups • Trailing blanks are significant during string comparisons
VARCHAR HASHED • Has no length limit and can index arbitrary length character strings • System generates an 8-byte hash code over entire string • Only used for equality look-ups and not range scans
DOUBLE • All numeric values will be converted and stored in the index as the DOUBLE data type • Special numeric values (NaN, INF, -INF, +0, -0) indexed even though not supported by SQL DOUBLE data type
DATE and TIMESTAMP • If timezone not specified, original value stored in index • If timezone specified, DATE and TIMESTAMP data type values are normalized to UTC (Coordinated Universal Time) before storing in index
Document Rejection or CREATE INDEX Failures • Errors causing document rejection for INSERT or UPDATE statements and CREATE INDEX failure if table already populated • VARCHAR(n): Value length exceeds length constraint • Conversion Errors: Valid XML value but can't convert to DB2's representation for the data type because of DB2 limitations
Invalid XML Values • For DOUBLE, DATE, and TIMESTAMP indexes • XML values without a valid lexical form for the target index XML data type are invalid • DB2 9 XML indexes always ignore invalid XML values • Invalid XML values can be rejected or ignored on new CREATE INDEX option for DB2 9.5
DB2 9.5 Reject Invalid Values • New REJECT INVALID VALUES option for DB2 9.5 • If XML value can’t be cast to index XML data type, error returned • If index does not exist, index is not created • XML data not inserted or updated in the table if index exists
DB2 9.5 Ignore Invalid Values • Invalid values for index XML data type ignored and not indexed • No error or warning is issued • Default option
Unique Keyword • Uniqueness enforced across all documents within a single XML column • Enforced within index data type, XML path to node, and value of node after value cast to index data type
Query Operators for XML • XSCAN ( XML Document Scan) • Traverses XML document trees and may evaluate predicates and extract document values • XISCAN (XML Index Scan) • Performs probes and scans on XML indexes and can evaluate predicates. • XANDOR (XML Index ANDing and ORing) • Evaluates two or more equality predicates by driving multiple XISCANs. CREATE INDEX AgeIndex on t1(XMLDoc) GENERATE KEY USING XMLPATTERN '/Person/Age' AS SQL DOUBLE; XQUERY for $i in db2-fn:xmlcolumn(‘T1.XMLDOC’)/Person where $i/Age = 17 return $i;
Even if these requirements are satisfied, the optimizer can still decide NOT to use an eligible index! Index Eligibility Requirements for an XML index to be used for a query: • Index “contains” the query predicate, i.e. isequally or less restrictive than the predicate • Query predicate matches the index data type • /text() is used consistently in query predicate and index definition: both specify /text() or not specify /text()
Queries using an Index on an XML Column • Some sample queries using equality and range predicates
SYSCAT.INDEXXMLPATTERNS • Catalog view externalizes information on the XML pattern specified for an Index on an XML Column
SYSCAT.INDEXES • Index on an XML Column has a logical and physical index • Logical index contains XML pattern created by user • Physical index contains index values • DB2 system generated key columns
DB2 9.7 XML Indexes on Range Partitioned Tables CREATE INDEX zipcode ON sales(customer_info) GENERATE KEY USING XMLPATTERN ’/Customer/Address/Zipcode’ AS SQL varchar(10) NOT PARTITIONED; • Relational Indexes may be not partitioned or partitioned in DB2 9.7 • User-defined XML Indexes may be not partitioned (DB2 9.7 GA) or partitioned (DB2 9.7 FP1) • System generated XML Paths Indexes are always not partitioned • System generated XML Regions Indexes are always partitioned CREATE INDEX zipcode ON sales(customer_info) GENERATE KEY USING XMLPATTERN ’/Customer/Address/Zipcode’ AS SQL varchar(10) PARTITIONED;
DB2 9.7 XML Indexes on Range Partitioned Tables Non-Partitioned Relational Index or Index on XML Column Partitioned Relational Index or Index on XML Column Partitioned Relational Index or Index on XML Column Partitioned Relational Index or Index on XML Column Base Table Partition 1 Base Table Partition 3 Base Table Partition 2 Partitioned Regions Index Partitioned Regions Index Partitioned Regions Index XDA XDA XDA Non-Partitioned XML Path Index
DB2 9.7 Online XML Index Create and Reorg • Insert/Update/Delete transactions no longer need to wait until the CREATE INDEX/REORG INDEXES/REORG INDEX statement completes • Results in increased throughput and faster response time for concurrent transactions.
DB2 9.7 Index Compression • Default for relational and XML indexes enables compression if data row compression enabled • New COMPRESS keyword on CREATE/ALTER INDEX can override default behavior • Index can be compressed even if data rows not compressed • MDC Block Indexes and XML Paths Indexes not compressed
What Did You Learn Today? • What the difference is between XML and relational indexes • How to create an index on an XML column • How to avoid common user errors • What the requirements are for queries to use XML indexes • How the XML indexes are defined in the catalog • DB2 9.5 and DB2 9.7 XML index enhancements