300 likes | 492 Views
SQL Server 2005: Deep Dive On XML And XQuery. Michael Rys DAT405 Program Manager, SQL Server XML Technologies Microsoft Corporation. XML And Relational Data Today. Relational Data. XML. Relational Data. XML. Relational Data. Relational Data. Query and Combine. XML. XML.
E N D
SQL Server 2005: Deep Dive On XML And XQuery Michael Rys DAT405 Program Manager, SQL Server XML Technologies Microsoft Corporation
XML And Relational Data Today RelationalData XML RelationalData XML RelationalData RelationalData Query and Combine XML XML File System
XML Scenarios • Data Exchange • Business-to-business (B2B), business-to-consumer (B2C), application-to-application (A2A) • XML is ubiquitous, extensible, platform independent transport format • Document Management • XHTML, Office XML Documents • Messaging • Simple Object Access Protocol (SOAP), RSS • Mid-Tier Collaboration • Ad-hoc modeling of semistructured data • storing objects with sparse or multi-valued properties that do not fit well into the traditional relational schemata →Transport, Store, and Query XML data
SQL Server 2005 XML Architecture Relational XML XML XML Parser XML Schemata SchemaCollection Validation OpenXML/nodes() PATH Index Node Table Rowsets XML data type(binary XML) XML-DML PRIMARYXML INDEX PROP Index FOR XML with TYPE directive VALUE Index XQuery XQuery
Why XQuery? • SQL does not understand XML • XPath 1.0 • W3C Recommendation • Used in SQL Server 2000: SQLXML and OpenXML • Navigation, no reshaping • Limited knowledge about types • XSLT • W3C Recommendation • Data-driven reshaping (uses XPath) • MSXML, System.XML • Hard to author and optimize for large amount of data • No XML data modification language (DML)
What Is XQuery? • Queries and transforms trees • Functional, declarative query language • Combines XPath with node construction • Operates on (XML Schema-)typed and unconstrained XML • Designed to operate on large amounts of data • Optimizable • Current Status: In final Last Call • Recommendations in H2 CY2006 • Fulltext and DML extensions will follow later
Key XQuery Features LET ORDER BY FOR WHERE • FLWOR: FOR / LET / WHERE / ORDER BY / RETURN • Includes XPath 2.0 (/doc[@id = 123]) • Element constructors (<topic>{…}</topic>) • Order-preserving operators • Input order (FLWR) • Document order (XPath, union) • Statically (or dynamically) typed • Strong typing with schema, weak typing without schema RETURN SQL: WITH SELECT FROM WHERE ORDER BY & SET
XQuery Type System • 3 Classes of Item Types: • Node types: element(), attribute(), comment() etc. • Element content types: xs:anyType, user-defined (e.g., my:CustomerT) • Atomic types: built-in and user-defined (e.g., xs:int, my:hatSize) • XQuery uses XML Schema for content and atomic types • “Untyped” data have special types (e.g., xdt:untypedAtomic) • XML Schema (W3C standard) • Rich mechanism for type definitions and validation constraints • Can be used to constrain XML documents • XML Schema Collections will be used for typing (meta-data) • Benefits of typed data • Guarantees shape of data • Provide type specific semantics • Allows storage and query optimizations
Static Typing In XQuery • Type Inference: Infers type of Expression during compilation • Type Check: Inferred Type is subtype of expected type • Benefits: • Compile-time type error discovery • Guarantees correct type at runtime • More efficient execution • Costs: • Sometimes type inference is less precise than data will be (inferring list on /a[1]/b, but there will always be only 1 b) • Requires more explicit casts and “pick first” (/a[1]/b[1])
XML Data Modification • XQuery extensions: Insert, update, and delete • XML sub-tree modification: • Add or delete XML sub-trees • Update values • Generate consistent state
replace value of delete insert XML-DML: insert <notes/>into /Customer replace value of(/Customer/name)[1]with “Nils” delete /Customer/Order[id = 42] insert <notes/>as lastinto /Customer Customer insert <notes/>as firstinto /Customer notes notes insert <notes/>before /Customer/name name: xs:string Order notes insert <notes/>after/Customer/name id: xs:int “Nils” “Janine” 42 Target needs to be statically one node
XQuery And XML-DML In SQL Server 2005 • Subset of XQuery implemented • Is aligned with July 2004 XQuery working draft • Added XML Data Modification • Applies to single XML data type instance • Methods on XML data type: • query(), value(), exist(), modify(), nodes() • Use SQL to iterate over collection of instances (XML-typed column) • Can refer to relational data • Take advantage of Schema-collection information to operate on typed XML data • Will make use of XML indices for optimization
XQuery Methods • query() creates new, untyped XML data type instance • value()extracts an XQuery value into the SQL value and type space • Expression has to statically be a singleton • String value of atomized XQuery item is cast to SQL type • SQL type has to be SQL scalar type (no XML or CLR UDT) • exist()returns 1 if the XQuery expression returns at least one item, 0 otherwise
XQuery: nodes() • Provides OpenXML-like functionality on XML data type column in SQL Server 2005 • Returns a row per selected node • Each row contains a special XML data type instance that • References one of the selected nodes • Preserves the original structure and types • Can only be used with the XQuery methods (not modify()), count(*), and IS (NOT) NULL
sql:column()/sql:variable() • Map SQL value and type into XQuery values and types in context of XQuery or XML-DML • sql:variable(): accesses a SQL variable/parameterdeclare @value int set @value=42select * from T where T.x.exist(‘/a/b[@id=sql:variable(“@value”)]’)=1 • sql:column(): accesses another column valuetables: T(key int, x xml), S(key int, val int)select * from T join S on T.key=S.keywhere T.x.exist(‘/a/b[@id=sql:column(“S.val”)]’)=1 • Restrictions in SQL Server 2005: • No XML, CLR UDT, datetime, or deprecated text/ntext/image
XQuery: modify() • Used with SET: declare @xdoc xmlset @xdoc.modify(‘delete /a/b[@id=“42”]’)update T set T.xdoc.modify(‘insert <b/> into /a’)where T.id=1 • Relational row-level concurrency: whole XML instance is locked
Combined SQL And XQuery/DML Processing • SELECT x.query(‘…’), y FROM T WHERE … Metadata Static Phase XQuery Parser SQL Parser XML SchemaCollection Static Typing Static Typing Algebrization Algebrization Static Optimization of combined Logical and Physical Operation Tree Dynamic Phase Runtime Optimization and Execution of physical Op Tree XML and rel.Indices
XML Indices • Create XML index on XML column CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc) • Create secondary indexes on tags, values, paths • Speed up queries • Results can be served directly from index • SQL’s cost based optimizer will consider index • Primary and Secondary Indices will be efficiently maintained during updates • Only subtree that changes will be updated
insert into Person values (42, '<book ISBN=”1-55860-438-3”> <section> <title>Bad Bugs</title> Nobody loves bad bugs. </section> <section> <title>Tree Frogs</title> All right-thinking people <bold>love</bold> tree frogs.</section></book>') Example Index Contents
Primary XML Index CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc) Assumes typed data; Columns and Values are simplified, see VLDB 2004 paper for details
Architectural Blueprint: Indexing XML Columnin table T(id, x) Primary XML Index (1 per XML column)Clustered on Primary Key (of table T), XID 1 2 2 4 1 2 3 3 3 1 Non-clustered Secondary Indices (n per primary Index) Value Index Property Index Path Index
Take-Away: XML Indices • PRIMARY XML Index – use when lot’s of XQuery • FOR VALUE– useful for queries where values are more selective than paths such as //*[.=“Seattle”] • FOR PATH– useful for Path expressions: avoids joins by mapping paths to hierarchical index (HID) numbers. Example: /person/address/zip • FOR PROPERTY– useful when optimizer chooses other index (e.g., on relational column, or FT Index) in addition so row is already known
Session Summary • SQL Server 2005 provides XQuery and XML DML on XML datatype • XQuery subset based on July 2004 WD • Typing provided by XML Schema collections on XML datatype • Node-based Data Manipulation Language (DML) • Integrates with relational processing • Optimization: • Using extended relational algebra and query optimizer • Indexing of XML datatype
Community Resources • At PDC • DAT Track lounge: I’ll be there daily • After PDC • MSDN dev center: http://msdn.microsoft.com/SQL/2005 • XML and Databases whitepapers: http://msdn.microsoft.com/XML/BuildingXML/XMLandDatabase/ • Online WebCasts: http://msdn.microsoft.com/sql/2005/2005webcasts/ • Newsgroups & Forum: news:microsoft.public.sqlserver.xml http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=89 • My E-mail: mrys@microsoft.com • My Weblog: http://www.sqljunkies.com/weblog/mrys • Please fill out Session Evaluation
© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.