380 likes | 575 Views
Deep Dive into XML and XQuery in SQL Server 2005. Michael Rys Program Manager SQL Server Query Technologies http://www.sqljunkies.com/weblog. XML and Relational Data Today. Relational Data. XML. Relational Data. XML. Relational Data. Relational Data. Query and Combine. XML. XML.
E N D
Deep Dive into XML and XQuery in SQL Server 2005 Michael Rys Program Manager SQL Server Query Technologies http://www.sqljunkies.com/weblog
XML and Relational Data Today RelationalData XML RelationalData XML RelationalData RelationalData Query and Combine XML XML File System
XML • Self-describing: <doc id="d1">This is an <important>example</important>.</doc> • Complex data • Trees, recursive, graph • Structured Data: highly regular, homogeneous structures • Semi-structured Data: heterogeneous, sparse data • Markup Data: documents/content markup • Document ordering • Schema/Type system • Schema-less • Optional Schema: semi-structured, structured • Extensible • Annotations, multiple schemas (late binding)
XML Scenarios • Data Exchange between loosely-coupled systems • Message Envelope in XML: Simple Object Access Protocol (SOAP), RSS • Message Payload/Business Data in XML: Business-to-business (B2B), business-to-consumer (B2C), application-to-application (A2A) • Vertical Industry Exchange schemas • XML is ubiquitous, extensible, platform independent transport format • Document Management • XHTML, DocBook, Home-grown, domain-specific markup (e.g. contracts), OpenOffice, MS Office XML (both default and user-extended) • Ad-hoc modeling of semistructured data • Storing and querying heterogeneous complex objects • Semistructured data with sparse, highly-varying structure at the instance level • XML provides self-describing format and extensible schemas →Transport, Store, and Query XML data
Why Bring XML and Relational Data Together? XML in SQL Server 2005(incl SQLXML) XML in SQL Server 2000(incl SQLXML) Document Mgmt + XML dt Native XML Store in RDBMS Native XML andsemi-structured data DBMS Rel XML Rel XML XML for relational data XML, relational and semi-structured data
SQL Server as Integrated Data Platform • Provides core DB services to BOTH XML and Relational Data: • Base Services • Concurrency Control • Recovery • Declarative Query and Update language • Execution Engine and Optimizer • Tools to repurpose and combine data • Shared management and deployment • Integration with BI tools
SQL Server 2000Technologies • Server support • FOR XML: generate XML from tables • OpenXML: generate relational rowset from XML • Mid-tier support • XML views (annotated mapping schemas XSD) • Templates • UpdateGrams/BulkLoad • Access methods • HTTP SOAP (via mid-tier ISAPI) • ADO, OLE DB; ADO.NET
SQL Server 2005 XML Data Type Architecture Relational XML XML XML Parser XML Schemata SchemaCollection Validation OpenXML/nodes() PATH Index Node Table Rowsets XML data type(binary XML) XML-DML PRIMARYXML INDEX PROP Index FOR XML with TYPE directive VALUE Index XQuery XQuery
XML Datatype • Native SQL type • Use for column, variable or parameter • Can represent: • XML 1.0 documents • XML 1.0 fragments (0 to n element nodes and text nodes at top) • Can be constrained by XML Schema collection • Queryable with XQuery • Updateable with XML-DML • XML Indexing • Well-formed and validation checks
XML Datatype: Storage • Primary Storage • As LOB (2 GB per instance) • Efficient binary representation: • Primary goal: efficient parsing and serialization • ca. 20% compression over Unicode textual representation • XML encoding transformed to UTF-16 • Secondary Storage (for Query Optimization) • Via Primary and Secondary Indices • Primary XML index: 2-6x of XML data • PATH XML index: 0.5-2x of XML data • CAST from/to varbinary, (n)varchar, and CLR UDT
XML Schema Support • XML Schema (World Wide Web Consortium [W3C] standard) • Rich mechanism for type definitions and validation constraints • Can be used to constrain XML documents • Benefits of typed data • Guarantees shape of data • Allows storage and query optimizations • XML type system • Store XML schemas in system metadata • Does not preserve annotations
What is XQuery? • Queries and transforms trees • Functional, declarative query language • Combines XPath with node construction • Operates on (XML Schema-)typed and unconstrained XML • Designed to operate on large amounts of data • Optimizable • Current W3C Status: In Candidate Recommendation • Recommendations in H2 CY2006 • Fulltext and DML extensions will follow later
Key XQuery Features FOR LET WHERE • FLWOR: FOR / LET / WHERE / ORDER BY / RETURN • Includes XPath 2.0 (/doc[@id = 123]) • Element constructors (<topic>{…}</topic>) • Order-preserving operators • Input order (FLWR) • Document order (XPath, union) • Statically (or dynamically) typed • Strong typing with schema, weak typing without schema RETURN ORDER BY SQL: WITH SELECT FROM WHERE ORDER BY & SET
XQuery Type System • 3 Classes of Item Types: • Node types: element(), attribute(), comment() etc • Element content types: xs:anyType, user-defined (e.g., my:CustomerT) • Atomic types: built-in and user-defined (e.g., xs:int, my:hatSize) • XML Schema for content and atomic types • “Untyped” data have special types (xdt:untypedAtomic) • XML Schema (W3C standard) • Rich mechanism for type definitions and validation constraints • Can be used to constrain XML documents • XML Schema Collections will be used for typing (meta-data) • Benefits of typed data • Guarantees shape of data • Provide type specific semantics • Allows storage and query optimizations
Static Typing in XQuery • Type Inference: Infers type of Expression during compilation • Type Check: Inferred Type is subtype of expected type • Benefits: • Compile-time type error discovery • Guarantees correct type at runtime • More efficient execution • Costs: • Sometimes type inference is less precise than data will be (inferring list on /a[1]/b, but there will always be only 1 b) • Requires more explicit casts and “pick first” (/a[1]/b[1]).
XML Data Modification • XQuery extensions: Insert, update, and delete • XML sub-tree modification: • Add or delete XML sub-trees • Update values • Generate consistent state
XML-DML: delete replace value of insert insert <notes/>into /Customer replace value of(/Customer/name)[1]with “Nils” delete /Customer/Order[id = 42] insert <notes/>as lastinto /Customer Customer insert <notes/>as firstinto /Customer notes notes insert <notes/>before /Customer/name name: xs:string Order notes insert <notes/>after/Customer/name id: xs:int “Nils” “Janine” 42 Target needs to be statically one node
XQuery and XML-DML in SQL Server 2005 • Subset of XQuery implemented • Is aligned with July 2004 XQuery working draft • Added XML Data Modification • Applies to single XML data type instance • Methods on XML data type: • query(), value(), exist(), modify(), nodes() • Use SQL to iterate over collection of instances (XML-typed column) • Can refer to relational data • Take advantage of Schema-collection information to operate on typed XML data • Will make use of XML indices for optimization
XQuery methods • query() creates new, untyped XML data type instance • value()extracts an XQuery value into the SQL value and type space • Expression has to statically be a singleton • String value of atomized XQuery item is cast to SQL type • SQL type has to be SQL scalar type (no XML or CLR UDT) • exist()returns 1 if the XQuery expression returns at least one item, 0 otherwise
XQuery: nodes() • Provides OpenXML-like functionality on XML data type column in SQL Server 2005 • Returns a row per selected node • Each row contains a special XML data type instance that • References one of the selected nodes • Preserves the original structure and types • Can only be used with the XQuery methods (not modify()), count(*), and IS (NOT) NULL
sql:column()/sql:variable() • Map SQL value and type into XQuery values and types in context of XQuery or XML-DML • sql:variable(): accesses a SQL variable/parameterdeclare @value int set @value=42select * from T where T.x.exist(‘/a/b[@id=sql:variable(“@value”)]’)=1 • sql:column(): accesses another column valuetables: T(key int, x xml), S(key int, val int)select * from T join S on T.key=S.keywhere T.x.exist(‘/a/b[@id=sql:column(“S.val”)]’)=1 • Restrictions in SQL Server 2005: • No XML, CLR UDT, datetime, or deprecated text/ntext/image
XQuery: modify() • Used with SET: declare @xdoc xmlset @xdoc.modify(‘delete /a/b[@id=“42”]’)update T set T.xdoc.modify(‘insert <b/> into /a’)where T.id=1 • Relational row-level concurrency: whole XML instance is locked
Combined SQL and XQuery/DML Processing • SELECT x.query(‘…’), y FROM T WHERE … Metadata Static Phase XQuery Parser SQL Parser XML SchemaCollection Static Typing Static Typing Algebrization Algebrization Static Optimization of combined Logical and Physical Operation Tree Dynamic Phase Runtime Optimization and Execution of physical Op Tree XML and rel.Indices
XML Indices • Create XML index on XML column CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc) • Create secondary indexes on tags, values, paths • Speed up queries • Results can be served directly from index • SQL’s cost based optimizer will consider index • Primary and Secondary Indices will be efficiently maintained during updates • Only subtree that changes will be updated
insert into Person values (42, '<book ISBN=”1-55860-438-3”> <section> <title>Bad Bugs</title> Nobody loves bad bugs. </section> <section> <title>Tree Frogs</title> All right-thinking people <bold>love</bold> tree frogs.</section></book>') Example Index Contents
Primary XML Index CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc) Assumes typed data; Columns and Values are simplified, see VLDB 2004 paper for details
Architectural Blueprint: Indexing XML Columnin table T(id, x) Primary XML Index (1 per XML column)Clustered on Primary Key (of table T), XID 1 2 2 4 1 2 3 3 3 1 Non-clustered Secondary Indices (n per primary Index) Value Index Property Index Path Index
Take-away: XML Indices PRIMARY XML Index– use when lot’s of XQuery FOR VALUE– useful for queries where values are more selective than paths such as //*[.=“Seattle”] FOR PATH– useful for Path expressions: avoids joins by mapping paths to hierarchical index (HID) numbers. Example: /person/address/zip FOR PROPERTY– useful when optimizer chooses other index (e.g., on relational column, or FT Index) in addition so row is already known.
Summary • SQL Server as Integrated Data Platform provides core DB services to BOTH XML and Relational Data • Native XML Support includes: • Native XML type • Schema validation • XQuery support • XML-DML support • XML Indices • (Improved XML Publishing and Shredding)
Resources SQL Server webpage: http://msdn.microsoft.com/SQL/2005/ XML and Databases whitepapers: http://msdn.microsoft.com/XML/BuildingXML/XMLandDatabase/ Online WebCasts: http://msdn.microsoft.com/sql/2005/2005webcasts/ Newsgroups & Forum: microsoft.private.sqlserver2005.xmlhttp://communities.microsoft.com/newsgroups/default.asp?ICP=sqlserver2005&sLCID=ushttp://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=89 My E-mail: mrys@microsoft.com My Weblog: http://www.sqljunkies.com/weblog/mrys