310 likes | 331 Views
XQuery Implementation in a Relational Database System. Shankar Pal Istvan Cseri, Oliver Seeliger, Michael Rys, Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan SQL Server Microsoft Corp. Overview. Background XML Support in SQL Server 2005
E N D
XQuery Implementation in a Relational Database System Shankar Pal Istvan Cseri, Oliver Seeliger, Michael Rys,Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan SQL Server Microsoft Corp
Overview • Background • XML Support in SQL Server 2005 • OrdPath labeling of XML nodes • XML indexes – PATH, VALUE, PROPERTY • Main topic – XQuery compilation • Architecture • XML operators • Mapping XML operators to relational+ ops • Conclusions S. Pal et al.
BackgroundXML Support in SQL Server 2005 Create table DOCS ( ID int primary key, XDOC xml) • XML stored in an internal, binary form (‘blob’) • Optionally typed by a collection of XML schemas • Used for storage and query optimizations • 3 of 5 methods on XML data type: • query(): returns XML type • value(): returns scalar value • exist(): checks conditions on XML nodes • XML indexing • More information at http://msdn.microsoft.com/xml S. Pal et al.
BackgroundXQuery embedded in SQL • Retrieve section titles from <book> wrapped in new <topic> elements: SELECT ID, XDOC .query(' for $s in /BOOK/SECTION return <topic> {data($s/TITLE)} </topic> ') FROM DOCS S. Pal et al.
BackgroundXQuery – supported features • XQuery clauses “for”, “where”, “return” and “order by” • XPath axes – child, descendant, parent, attribute, self and descendant-or-self • Functions – numeric, string, Boolean, nodes, context, sequences, aggregate, constructor, data accessor • SQL Server extension functions to access SQL variable and column data within XQuery • Numeric operators (+, -, *, div, mod) • Value comparison operators (eq, ne, lt, gt, le, ge) • General comparison operators (=, !=, <, >, <=, >=) S. Pal et al.
Background [SIGMOD04] ORDPATH Label of Nodes • node1 is ancestor of node2 • ORDPATH (node1) is prefix • of ORDPATH (node2) BOOK 1 @ISBN 1.1 Section 1.3 Section 1.5 Title 1.3.1 Figure 1.3.3 Title 1.5.1 Figure 1.5.3 • node1 precedes node2 in document order • ORDPATH (node1) < ORDPATH (node2) ORDPATH(1.3) ≤ id < Descendant_Limit (1.3) = 1.4 S. Pal et al.
Background [VLDB 2004]Indexing XML column • Primary XML index on an XML column • Creates B+tree tree on data model content of the XML nodes • Adds column Path_ID for the reversed, encoded path from each XML node to root of XML tree • OrdPath labeling schema is used for XML nodes • Relative order of nodes • Document hierarchy S. Pal et al.
Background XML example INSERT INTO myTable VALUES (7, ‘<Book xmlns="myns" ISBN = "1-55860-3612"> <Section> <Title>Bad Bugs</Title> </Section> <Section> <Title> Tree frogs </Title> <Figure>…</Figure> </Section> </Book>’) S. Pal et al.
Background Primary XML Index Entries • Encoding of tags & types stored in system meta-data - Additional details not shown Clustering key S. Pal et al.
Background Secondary XML indexes • To speed up different classes of commonly occurring queries • Statistics created on key columns of the primary and secondary XML indexes • Used for cost-based selection of secondary XML indexes S. Pal et al.
Background Handling Types • If XML column is typed • Values are stored in XML blob and XML indexes with appropriate typing • Untyped XML • Values are stored as strings • Convert to appropriate types for operations • SQL typed values stored in primary XML index • Most SQL types are compatible with XQuery types (integer) • Value comparisons on XML index columns suffice • Some types (e.g. xs:datetime) are stored in internal format and processed specially S. Pal et al.
XQuery Processing Architecture • XQuery Compiler: • Parses XQuery expr • Checks static type correctness • Type annotations • Applies static optimiztns • Path collapsing • Rewrites using XML schemas • XML Operator Mapper • Recursively traverses XML algebra tree • Converts each XmlOp to reln+ operator sub-tree • Mapping depends upon existence of primary XML index XQuery expression XQuery Compiler XML algebra tree (XmlOp ops) XMLOperatorMapper Relational Operator Tree (relational+ operators) Reln Query Processor S. Pal et al.
Examples of XML Operators S. Pal et al.
XMLOperatorMapping – Overview PATH Index XQUERY PK OrdPath PK XML Primary XML Index VALUE Index PROPERTY Index Special handling for SELECT * | XDOC REL+ tree S. Pal et al.
New operators • Some produce N rows from M (≠ N) rows • XML_Reader – streaming, pull-model XML parser • XML_Serializer – to serialize query result as XML • Some are for efficiency • Contains – to evaluate XQuery contains() • TextAdd – to evaluate the XQuery function string() • Data – to evaluate XQuery data() function • Some are for specific needs • Check – validate XML during insertion or modification S. Pal et al.
XMLOperatorMapping • Following categories: • Mapping of XPath expressions • Mapping of XQuery expressions • Mapping of XQuery built-in functions S. Pal et al.
Non-indexed XML, Full Path XML operator tree: XmlOp_Path PATH = “/BOOK/SECTION” Rel+ operator tree: XML_Serialize XML_Reader (XDOC, “/BOOK/SECTION”) • XML_Reader produces subtrees of <SECTION> • Node table rows • Contains OrdPath • No PK or PATH_ID • XML_Serialize reassembles those row into XML data type • To output result S. Pal et al.
Sample query execution using Primary XML Index • /Book/Section #3#1 (by XML Op Mapper) Clustering key S. Pal et al.
Indexed XML, Full Path XML_Serialize • XmlOp_Path mapped to SELECT • GET(PXI) – rows from primary XML index • Match PATH_ID • Not shown: • JOIN with base table on PK Assemble Subtree Apply Select ($b) Select GET (PXI) $b.OrdP ≤ OrdP< DL($b) GET (PXI) Path_ID=#SECTION#BOOK S. Pal et al.
XML index – PATH • Speeds up path evaluations • Example – /Book/Section #3#1 S. Pal et al.
Indexed XML, Imprecise Paths /BOOK/SECTION// TITLE • Matched using LIKE operator on Path_ID XML_Serialize Apply Assemble subtree of <TITLE> Select ($s) GET (PXI) Path_ID LIKE #TITLE%#SECTION#BOOK S. Pal et al.
Predicate Evaluation XML_Serialize /BOOK[@ISBN = “12”] • Search value compared with VALUE column in PXI • Collapsed path /BOOK/@ISBN • Induce index seeks • Reduce intermediate result size • Parent check – Par($b) • Using OrdPath • Value conversion might be needed Apply Apply Assemble subtree of <BOOK> Select ($b) Select Path_ID=#@ISBN#BOOK & VALUE=“12” & Par($b) GET (PXI) GET (PXI) Path_ID=#BOOK S. Pal et al.
Ordinal Predicate • /BOOK[n] • Adds ranking column to the rows for <BOOK> elements • Retrieves the nth <BOOK> node • Special optimizations • [1] TOP 1 ascending • [last()] TOP 1 descending • Avoids sorting when input is sorted • Example – in XML_Serializer S. Pal et al.
Error handling • Static type errors at compilation time • Raises static type errors if an expression could fail at runtime due to type safety violation • Addition of string to integer • Querying non-existent node name in typed XML • Non-singleton in “eq” • Some can be fixed using explicit cast or ordinal specification • Dynamic error converted to empty sequence • Yields correct result in predicates without negations S. Pal et al.
“for” Iterator XML_Serialize for $s in /BOOK//SECTION where $s/@num >= 3 return $s/TITLE • XML op for “for” is XmlOp_Apply • Maps to APPLY • Binds $s and iterates over <SECTION> • Determines its <TITLE> children • Nested “for” and “for” with multiple bindings turn into nested APPLY • Each APPLY binds to a different variable Apply Assemble <SECTION> Apply ($s) Path_ID LIKE #TITLE#SECTION%#BOOK & Par($s) Select Exists Select ($s) Select GET(PXI) GET (PXI) Path_ID LIKE #@num#SEC%#BK & VALUE >= 3 & Par($s) Path_ID LIKE #SECTION%#BOOK S. Pal et al.
XQuery “order by” and “where” • Order by: • Sorts rows based on order-by expression • Adds a ranking column to these rows • Ranking column converted into OrdPath values • Yield the new order of the rows • Fits rest of query processing framework • Where • Becomes SELECT on input sequence • Filters rows satisfying specified condition S. Pal et al.
XQuery “return” • Return nodes sequence in document order • Use OrdPath values and • XML_Serialize operator • New element and sequence constructions • Merge constructed and existing nodes into a single sequence (SWITCH_UNION) S. Pal et al.
XQuery Functions & Operators • Built-in fn and op are mapped to relational fn and op if possible • fn:count() count() • Additional support for XQuery types, functions and operators that cannot be mapped directly • Intrinsics S. Pal et al.
Optimizations • Exploiting Ordered Sets • Sorting information (OrdPath) made available to further relational operators • XML_Serialize is an example • Using static type information • Eliminates CONVERT() in operations • Allows range scan on VALUE index S. Pal et al.
Conclusions • Built-up infrastructure for query processing framework • Other XQuery features (such as “let” and typeswitch) can be implemented • Data modification language • Fits into relational query processing framework • XQuery features can be implemented using rel++ operators • Optimizations pose the biggest challenges • More cost-based optimizations can be done • Enhanced costing model (e.g. choice of PXI) • Matching materialized views S. Pal et al.
Thank you! S. Pal et al.