XML Processing Moves Forward: XSLT 2.0 and XQuery 1.0
Michael Kay, Prague 2005
About me
• Database background
• Started using XML in 1998 for content management applications
• Author of XSLT Programmer's Reference
• Developer of the Saxon XSLT processor
• Member of the W3C XSL and XQuery Working Groups
• Founded Saxonica in March 2004
Contents
• A tour of the new specs
• What's significant about XSLT 2.0
• A quick demo
• Why XQuery?
The QT Specification Family
[Diagram: XSLT 2.0, XQuery 1.0, Functions and Operators, XPath 2.0, Data Model, XML Schema]
Standards maturity
[Chart: maturity plotted against time, with REC and CR levels marked; shows XML, XML Schema, XSLT 1.0 and XPath 1.0 alongside XQuery, XSLT 2.0 and XPath 2.0]
A family of standards
[Diagram: XQuery 1.0, XSLT 2.0, XPath 2.0, XSLT 1.0, XPath 1.0, XML Schema]
XSLT and XQuery
[Diagram relating XSLT and XQuery to Documents and Data]
What's new in XSLT 2.0
• New processing model
• Major features
  • grouping
  • regular expressions
  • functions
  • schema support
• Many "minor" features
Some "minor" features
XSLT 2.0
• Temporary trees
• Multiple output files
• Formatting of dates and times
• Tunnel parameters
• Declared variable types
• Multi-mode templates
• xsl:next-match
• Conditional compilation
• XHTML serialization
• xsl:namespace
• separator=","
• Character maps
XPath 2.0
• Sequences
• if..then..else
• for $x in X return f($x)
• some/every
• except/intersect
• $n is $m
Function library
• String functions
• Regex functions
• Date/time arithmetic
• URI handling
• min(), max(), avg()
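A minimal XSLT 2.0 sketch of several of the XPath 2.0 and function-library features listed above: a typed variable built with for..return, the separator attribute on xsl:value-of, sum()/max()/avg(), an every quantifier inside if..then..else, and date arithmetic with format-date(). The input shape (an order element with a date attribute and item children carrying price and qty) is a hypothetical assumption, not taken from the talk.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <order date="2005-06-10"><item price="2.50" qty="4"/>...</order> -->
  <xsl:template match="/order">
    <!-- Declared variable type; for..return builds a sequence of line totals -->
    <xsl:variable name="totals" as="xs:decimal*"
        select="for $i in item return xs:decimal($i/@price) * xs:decimal($i/@qty)"/>

    <!-- A sequence written out with an explicit separator -->
    <xsl:value-of select="$totals" separator=","/>
    <xsl:text>&#10;</xsl:text>

    <!-- Aggregate functions over the sequence (items joined by a space) -->
    <xsl:value-of select="'sum', sum($totals), 'max', max($totals), 'avg', avg($totals)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- An every quantifier inside if..then..else -->
    <xsl:value-of select="if (every $q in item/@qty satisfies xs:integer($q) gt 0)
                          then 'all quantities positive'
                          else 'some quantity is zero or negative'"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Date arithmetic and formatting: a due date 30 days after the order date -->
    <xsl:value-of select="format-date(xs:date(@date) + xs:dayTimeDuration('P30D'),
                                      '[D1] [MNn] [Y0001]')"/>
  </xsl:template>
</xsl:stylesheet>
```

Any XSLT 2.0 processor, such as Saxon, should run this against a document of the assumed shape; none of it is expressible this directly in XSLT 1.0.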
Handling unstructured text
• unparsed-text() function: reads a text file into a string
• tokenize() function: splits a string into substrings
• xsl:analyze-string: parses a string and generates markup
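A minimal sketch combining the three facilities: it reads a hypothetical file people.csv (one name,phone record per line; the file name and format are assumptions) with unparsed-text(), splits it into lines with tokenize(), and uses xsl:analyze-string to turn each line into elements. The template is named, so it is started as the initial template rather than by matching a source document; how that is selected depends on the processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input file: one "name,phone" record per line,
       resolved relative to the stylesheet's base URI -->
  <xsl:param name="csv-uri" select="'people.csv'"/>

  <!-- Named initial template: no XML source document is needed -->
  <xsl:template name="main">
    <people>
      <!-- unparsed-text() reads the whole file as one string;
           tokenize() splits it into lines; the predicate drops blank lines -->
      <xsl:for-each select="tokenize(unparsed-text($csv-uri), '\r?\n')[normalize-space(.)]">
        <!-- xsl:analyze-string parses each line and generates markup -->
        <xsl:analyze-string select="." regex="^([^,]*),(.*)$">
          <xsl:matching-substring>
            <person name="{regex-group(1)}">
              <phone><xsl:value-of select="regex-group(2)"/></phone>
            </person>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <!-- Lines without a comma are kept, flagged for inspection -->
            <bad-line><xsl:value-of select="."/></bad-line>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </people>
  </xsl:template>
</xsl:stylesheet>
```

The regex attribute of xsl:analyze-string is an attribute value template, so a regex containing quantifiers such as {3} would need its braces doubled there; this one has none.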
Regular expression functions
• matches(): tests whether a string matches a regex
    if (matches($in, '[A-Z]{3}[0-9]{3}')) then ... else ...
• tokenize(): splits a string into substrings; the regex matches the separator
    for $s in tokenize($in, ',\s?') return ...
• replace(): replaces every occurrence of a match
    replace($in, '\s', '%20')
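A short sketch of two points the fragments above leave implicit: matches() succeeds if the regex matches any part of the string, so a whole-string test needs ^ and $ anchors, and the replacement string of replace() can refer to captured groups as $1, $2, and so on. It also shows the optional flags argument. The parameter values are made up for illustration, and the template is named so that it can run without a source document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Illustrative values; in practice these would come from the input -->
  <xsl:param name="code" select="'xxABC1234'"/>
  <xsl:param name="date" select="'2005-06-10'"/>

  <xsl:template name="main">
    <!-- Unanchored: true, because 'ABC123' occurs inside the string -->
    <xsl:value-of select="matches($code, '[A-Z]{3}[0-9]{3}')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Anchored with ^ and $: false, because the whole string must match -->
    <xsl:value-of select="matches($code, '^[A-Z]{3}[0-9]{3}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The 'i' flag makes the match case-insensitive: true -->
    <xsl:value-of select="matches('abc123', '^[A-Z]{3}[0-9]{3}$', 'i')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- $3/$2/$1 in the replacement reorders the captured groups: 10/06/2005 -->
    <xsl:value-of select="replace($date, '^(\d{4})-(\d{2})-(\d{2})$', '$3/$2/$1')"/>
  </xsl:template>
</xsl:stylesheet>
```

Unlike xsl:analyze-string/@regex, the select attribute is not an attribute value template, so the {3} quantifiers can be written as-is.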
Grouping
• Takes any sequence as input
• Divides the items into groups
• Applies processing to each group
Grouping modes
• group-by: items with a common value for a grouping key
• group-adjacent: adjacent items with a common grouping key
• group-starting-with: pattern to match the first item in each group
• group-ending-with: pattern to match the last item in each group
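A minimal sketch of one of the positional modes, group-starting-with, which turns a flat run of headings and paragraphs into nested sections. The input vocabulary (body, h2, p) and the output elements (doc, section) are assumptions chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input:
       <body><h2>A</h2><p>..</p><p>..</p><h2>B</h2><p>..</p></body> -->
  <xsl:template match="body">
    <doc>
      <!-- group-starting-with: every h2 starts a new group; any elements
           before the first h2 form a group of their own -->
      <xsl:for-each-group select="*" group-starting-with="h2">
        <!-- The context item is the first item of the current group,
             so title is empty for a leading group with no heading -->
        <section title="{self::h2}">
          <!-- Copy the group's members except the heading itself -->
          <xsl:copy-of select="current-group()[not(self::h2)]"/>
        </section>
      </xsl:for-each-group>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Filtering with [not(self::h2)] rather than removing the first item keeps any paragraphs that precede the first heading.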
Grouping by Value
  <xsl:for-each-group select="book" group-by="publisher">
    <xsl:sort select="current-grouping-key()"/>
    <h2>Publisher: <xsl:value-of select="current-grouping-key()"/></h2>
    <xsl:for-each select="current-group()">
      <xsl:sort select="title"/>
      <p>author: <xsl:value-of select="author"/></p>
      <p>title: <xsl:value-of select="title"/></p>
    </xsl:for-each>
  </xsl:for-each-group>
User-defined Functions
• Written like named templates
• Called from XPath
• Return a result
  <xsl:function name="ged:date-to-ISO" as="xs:date">
    <xsl:param name="in" as="ged:date"/>
    <xsl:sequence select="xs:date(concat(
        substring($in, 8, 4), '-',
        format-number(index-of(('JAN', 'FEB', ...), substring($in, 4, 3)), '00'), '-',
        substring($in, 1, 2)))"/>
  </xsl:function>
Called from XPath, for example as a sort key:
  <xsl:sort select="ged:date-to-ISO(@birth-date)"/>
XQuery 1.0
• Designed to query XML databases
• Also handles in-memory transformations
• Well supported by database vendors
XQuery Example: Join two tables
  xquery version "1.0";
  <results>
  {
    for $p in doc("auction.xml")/site/people/person
    let $a := for $t in doc("auction.xml")/site/closed_auctions/closed_auction
              where $t/buyer/@person = $p/@id
              return $t
    return <item person="{$p/name}">{ count($a) }</item>
  }
  </results>
(XMark Q8)
XSLT Equivalent
  <result xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:for-each select="/site/people/person">
      <xsl:variable name="a"
          select="/site/closed_auctions/closed_auction[buyer/@person = current()/@id]"/>
      <item person="{name}">
        <xsl:value-of select="count($a)"/>
      </item>
    </xsl:for-each>
  </result>
(XMark Q8)
Optimization
• With multi-GB databases, using indexes is essential
• XQuery does not have template rules
• This makes static analysis and join optimization possible
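An XSLT processor that does not optimize the join itself can still be given an index by hand. The sketch below is an xsl:key variant of the XMark Q8 stylesheet above; the key name auctions-by-buyer is arbitrary, and the match pattern assumes, as in XMark, that closed_auction elements occur only under /site/closed_auctions. The key is built once, so each person's lookup avoids rescanning every closed_auction, which is what makes a naive evaluation of the straightforward version O(n²). It is valid XSLT 1.0 and 2.0, and it is not claimed to be how any processor in the results achieved its timings.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index closed auctions by the buyer's person id -->
  <xsl:key name="auctions-by-buyer"
           match="closed_auction" use="buyer/@person"/>

  <xsl:template match="/">
    <result>
      <xsl:for-each select="/site/people/person">
        <item person="{name}">
          <!-- key() fetches this person's auctions from the index
               instead of scanning every closed_auction again -->
          <xsl:value-of select="count(key('auctions-by-buyer', @id))"/>
        </item>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

When and how the index is built is up to the processor; the stylesheet only declares that lookups by buyer/@person will be needed.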
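XMark Q8 results (msecs)
Language  Processor   1Mb    4Mb     10Mb
XSLT      Xalan       1503   11006   65855
XSLT      xt          160    2253    16414
XSLT      MSXML       33     519     4248
XSLT      Saxon 8.4   90     1340    11126
XQuery    Saxon 8.4   136    1575    11947
XQuery    Qizx        351    711     1813
XQuery    Galax       1870   6672    16625
Slide annotations: O(n²) next to the Xalan row, O(n) next to the Qizx row.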
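Two can play at that game!
XMark Q8 results (msecs), with Saxon 8.5
Language  Processor   1Mb    4Mb     10Mb
XSLT      Xalan       1503   11006   65855
XSLT      xt          160    2253    16414
XSLT      MSXML       33     519     4248
XSLT      Saxon 8.5   27     26      45
XQuery    Saxon 8.5   16     16      31
XQuery    Qizx        351    711     1813
XQuery    Galax       1870   6672    16625
Slide annotations: O(n²) next to the Xalan row, O(n) next to the Saxon 8.5 XSLT row.
Caveat: this is one query only!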
Conclusions
• XSLT 2.0 and XQuery 1.0 are nearly ready
• XSLT 2.0 has many powerful new features, making new applications possible
• XQuery 1.0 is designed for optimization against very large databases