Introduction to XSLT

Evan Lenz
XML/XSLT Consultant
http://xmlportfolio.com
evan@evanlenz.net
August 2, 2005
O’Reilly Open Source Convention, August 1 - 5, 2005

Who is this guy? • Evan Lenz • Majored in music • Over 5 years ago, read Michael Kay's XSLT Programmer's Reference cover-to-cover while sitting by his newborn son's hospital bed • Participated in the XSL Working Group for a couple of years • Wrote XSLT 1.0 Pocket Reference (due out this month) • Preparing for entrance to a Ph.D. program in Digital Arts and Experimental Media

Why does he like XSLT? • XSLT is: • Powerful • Small • Beautiful • In high demand • Fun to learn • Fun to teach

What should I expect this afternoon? • Fasten your seatbelts • A variety of interactive exercises and traditional presentation • Feel free to feel overwhelmed • You're learning more than you think! • Try your best while you're here and it will be time well spent • Have fun!

What's with the handouts? • The big handout is a late-stage draft of XSLT 1.0 Pocket Reference, due out this month • If you would like a complimentary copy of the final book, put your name and mailing address on the sign-up sheet • The smaller handout contains exercises that we will be using today

High-level overview XSLT from 30,000 feet

What is XSLT? • “XSL Transformations” • “A language for transforming XML documents into other XML documents” • W3C Recommendation • http://www.w3.org/TR/xslt • Version 1.0: 1999-11-16

OK, then what is XSL? • “Extensible Stylesheet Language” • “A language for expressing stylesheets” • W3C Recommendation • http://www.w3.org/TR/xsl • Version 1.0: 2001-10-15 • Has 2 parts: • XSLT • Refactored out of XSL so that it could proceed independently • XSL-FO • “Formatting Objects”

What is XPath? • “XML Path Language” • “A language for addressing parts of an XML document” • W3C Recommendation • http://www.w3.org/TR/xpath • Version 1.0: 1999-11-16 • Released on the same day as XSLT 1.0 • The expression language used in XSLT

A relationship of subsets • XPath is part of XSLT • XSLT is part of XSL • Today we are concerned only with the inner two circles: • XSLT and XPath • XSL, a.k.a. XSL-FO, is out of scope for today

What is XSLT used for?
• Common applications
  • Stylesheets for converting XML to HTML
    • Generating Web pages or whole websites
    • DocBook -> HTML
  • Transformations from one document type to another
    • *ML to *ML – as many potential applications as there are XML document types
    • RSS, SVG, UBL, LegalXML, HR-XML, XBRL
  • Office applications
    • SpreadsheetML, WordML, Keynote XML, OOo XML, PowerPoint (in next version), Access XML, etc.
  • Extracting data from documents
  • Modifying or fixing up documents

Where is XSLT used?
• Every platform
  • Windows, Linux, Mac, UNIX, Java
• Many browsers support XSLT natively
  • Firefox/Mozilla, Internet Explorer, Safari
• Many frameworks use or support XSLT
  • .NET, Java, LAMP
  • PHP5 now uses libxslt
  • Cocoon, 4Suite, Amazon web services, Google appliance, Cisco routers, etc., etc.
• XSLT IS EVERYWHERE!!

Interoperable implementations?
• In terms of interoperability, XSLT is unmatched among languages having multiple implementations
• Java
  • Saxon – http://saxon.sf.net (open-source)
  • Xalan-J – http://xml.apache.org/xalan-j/ (open-source)
• Windows
  • MSXML – fast, fully conformant
• Python
  • 4xslt – http://www.4suite.org (open-source)
• C
  • libxslt – http://xmlsoft.org (open-source; used in Firefox, Safari, PHP5, etc.)
  • Xalan-C++ – http://xml.apache.org/xalan-c/ (open-source)

Enough already, let's see some code!

Example XML file
INPUT: names.xml

<people>
  <person>
    <givenName>Joe</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jane</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jim</givenName>
    <familyName>Johannson</familyName>
  </person>
  <person>
    <givenName>Jody</givenName>
    <familyName>Johannson</familyName>
  </person>
</people>

A very simple stylesheet, names.xsl

OUTPUT: the result of the transformation $ saxon names.xml names.xsl >names.html

Or we could open the XML directly in the browser • Oops, we must first add a processing instruction (PI) to the top, like this: <?xml-stylesheet type="text/xsl" href="names.xsl"?> <people>  </people>

That's better. Displays as HTML but viewing source shows it's just XML.

One more example for now
INPUT: article.xml

<?xml-stylesheet type="text/xsl" href="article.xsl"?>
<article>
  <heading>This is a short article</heading>
  <para>This is the <emphasis>first</emphasis> paragraph.</para>
  <para>This is the <strong>second</strong> paragraph.</para>
</article>

A rule-oriented stylesheet article.xsl:

A rule-oriented stylesheet, cont. article.xsl, cont.:

OUTPUT: article.xml transformed to HTML

See a pattern here?

XPath in a nutshell

How XPath fits in XSLT
• XPath expressions appear in attribute values, e.g.:
  • <xsl:for-each select="/people/person"/>
  • <xsl:value-of select="givenName"/>
  • <xsl:apply-templates select="/article/para"/>
• What these mean:
  • /people/person
    • Select all person child elements of all people child elements of the root node
  • givenName
    • Select all givenName child elements of the context node
  • /article/para
    • Select all para child elements of all article child elements of the root node
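To get a feel for how a relative path depends on the context node, here is a minimal sketch in Python. Python is not part of these slides; the standard-library ElementTree module is used only as a stand-in, and it implements just a subset of XPath 1.0. The variable names are made up for illustration, and the data is the names.xml example from earlier.

```python
import xml.etree.ElementTree as ET

# The names.xml document from the earlier slide.
NAMES_XML = """
<people>
  <person><givenName>Joe</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jane</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jim</givenName><familyName>Johannson</familyName></person>
  <person><givenName>Jody</givenName><familyName>Johannson</familyName></person>
</people>
"""

# fromstring() returns the document element (<people>), not the root node.
people = ET.fromstring(NAMES_XML)

# Roughly what /people/person selects: every person child of people.
persons = people.findall("person")

# Roughly what <xsl:value-of select="givenName"/> does inside a for-each:
# the relative path is evaluated once per person, with that person
# acting as the context node.
given_names = [person.findtext("givenName") for person in persons]

print(given_names)
```

The same relative path, givenName, yields a different result for each person because the context node changes on every iteration.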

The skinny on XPath
• XPath is an expression language
  • The only thing you can do with XPath is write expressions
  • When we say “expression”, we mean “XPath expression”
• Every expression returns a value
• XPath 1.0 has just four data types:
  • Node-set (the most important)
  • String
  • Number
  • Boolean
• All expressions are evaluated in a context
  • Understanding context is crucial to understanding XPath

Path expressions
• Expressions that return node-sets are sometimes called path expressions
• A node-set is:
  • An unordered collection of zero or more nodes
• Every expression is evaluated relative to exactly one context node
  • The context node is analogous to the current directory in a filesystem
  • On a CLI, dir/* expands to all the files in the dir directory inside the current directory
  • As an XPath expression, dir/* would select all the element children of all the dir element children of the context node

A filesystem analogy • Addressing files: • Relative • dir/* • ../file • Absolute • /home/elenz/file.txt • Addressing XML nodes: • Relative • body/p • ../table • Absolute • /html/body/p

QUIZ 1: You have 5 minutes Ready? Set...

Go! Use this cheat sheet

para                         selects the para element children of the context node
*                            selects all element children of the context node
node()                       selects all children of the context node
@name                        selects the name attribute of the context node
@*                           selects all the attributes of the context node
para[1]                      selects the first para child of the context node
para[last()]                 selects the last para child of the context node
*/para                       selects all para grandchildren of the context node
/doc/chapter[5]/section[2]   selects the second section of the fifth chapter of the doc
chapter//para                selects the para element descendants of the chapter element children of the context node
//para                       selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
.                            selects the context node
.//para                      selects the para element descendants of the context node
..                           selects the parent of the context node
../@lang                     selects the lang attribute of the parent of the context node
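A few of the cheat-sheet expressions can be tried out with Python's standard-library ElementTree, sketched below. Treat this as an approximation: ElementTree supports only a subset of XPath 1.0 (no node() test and no @name selection, for instance), and the sample document and variable names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; doc plays the role of the context node.
doc = ET.fromstring(
    '<doc name="demo">'
    "<para>one</para>"
    "<section><para>two</para></section>"
    "<para>three</para>"
    "</doc>"
)

# *  : all element children of the context node
children = [el.tag for el in doc.findall("*")]

# para[1] and para[last()] : first and last para child
first = doc.find("para[1]").text
last = doc.find("para[last()]").text

# */para : para grandchildren of the context node
grandchildren = [p.text for p in doc.findall("*/para")]

# .//para : para descendants of the context node, in document order
descendants = [p.text for p in doc.findall(".//para")]

# ..  : parent of the node reached so far (here, the nested para)
parent = doc.find("section/para/..").tag

# ElementTree has no @name step; attributes are read with get() instead.
name = doc.get("name")

print(children, first, last, grandchildren, descendants, parent, name)
```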

XPath is all about trees A venture into the abstract world of the XPath data model Start filling out the NOTES page

The XPath data model
• An abstraction of an XML document, after parsing
• In XSLT, models the source tree, stylesheet tree, & result tree
• An XML document is a tree of nodes
• There are 7 kinds of nodes (memorize these!)
  • Root node
  • Element node
  • Attribute node
  • Text node
  • Comment node
  • Processing Instruction (PI) node
  • Namespace node

Root nodes
• Every XML document has exactly one root node
  • An “invisible” container for the whole document
• The XPath expression / selects the root node of the same document as the context node
• The root node is not an element
  • Instead, the “document element” or “root element” is a child of the root node
• It can also contain:
  • Processing instruction (PI) nodes
  • Comment nodes
• XSLT extension to XPath data model:
  • Root node may contain text nodes
  • Root node may contain more than one element node
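The difference between the root node and the document element is easy to see in a DOM tree. The sketch below uses Python's standard-library minidom, whose Document object is only an analogue of the XPath root node (the DOM and the XPath data model are not the same thing). The sample document is made up.

```python
from xml.dom import minidom

# Hypothetical document with a comment before the document element.
dom = minidom.parseString("<!-- header comment --><doc><title>Hi</title></doc>")

# The Document object is the "invisible" container, not an element.
is_document_node = dom.nodeType == dom.DOCUMENT_NODE

# Its children: the comment and the document element.
top_level = [child.nodeName for child in dom.childNodes]

# The document element is a child of the container.
document_element = dom.documentElement.tagName

print(is_document_node, top_level, document_element)
```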

Element nodes
• There is one element node for each element that appears in a document. (Duh.)
• Example: <foo><bar/></foo>
  • There are two element nodes above: foo and bar.
  • The foo element contains the bar element node.
• Element nodes can contain:
  • Text nodes
  • Other element nodes
  • Comment nodes
  • Processing instruction (PI) nodes

Node property: children
• Applies only to:
  • Element nodes
  • Root nodes
• Consists of:
  • Ordered list of zero or more other nodes
• 4 kinds of nodes can be children (memorize this subset!)
  • Element nodes
  • Text nodes
  • Comment nodes
  • Processing instruction (PI) nodes
• Instead of “Lions, Tigers, and Bears, Oh My”, chant:
  • “Elements, comments, text, PIs! Elements, comments, text, PIs!”
• Example: <foo><bar/> <!-- comment --> </foo>
  • The foo element's children consist of four nodes in order: 1) element, 2) text, 3) comment, 4) text
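The element, text, comment, text sequence can be checked with a DOM parser. This is a sketch using Python's standard-library minidom as a stand-in for the XPath data model; the sample document, including the comment text, is hypothetical.

```python
from xml.dom import minidom

# Hypothetical document: an element, a space, a comment, and another space.
dom = minidom.parseString("<foo><bar/> <!-- note --> </foo>")
foo = dom.documentElement

# Map DOM node-type constants to the names used in the chant.
KIND = {
    foo.ELEMENT_NODE: "element",
    foo.TEXT_NODE: "text",
    foo.COMMENT_NODE: "comment",
    foo.PROCESSING_INSTRUCTION_NODE: "PI",
}

# childNodes is an ordered list, just like the children property.
kinds = [KIND[child.nodeType] for child in foo.childNodes]

print(kinds)
```

Note that the two single spaces each count as a text node of their own, because the comment sits between them.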

Why should I memorize that subset of four? Knowing what types of nodes can be children is crucial to understanding what this little, unassuming instruction does (as we shall see): <xsl:apply-templates/> So remember: “Elements, comments, text, PIs!” “Elements, comments, text, PIs!”

How to access the children
• Use the child axis, e.g. (in non-abbreviated form):
  • child::node()
    • Selects all children of the context node
  • child::*
    • Selects all child elements of the context node
  • child::paragraph
    • Selects all child elements named paragraph
  • child::xyz:foo
    • Selects all child elements named foo in the namespace designated by the xyz prefix
  • child::xyz:*
    • Selects all child elements that are in the namespace designated by the xyz prefix

Attribute nodes There is one attribute node for each attribute that appears in a document. (Duh again.) Example: <foo bar="bat" bang="baz"/> There are two attribute nodes in the above example: bar and bang

Node property: attributes Applies only to: Element nodes Consists of: Unordered list of zero or more attribute nodes For example: <doc lang="en"/> The doc element's attributes property consists of one lang attribute

How to select attributes
• Use the attribute axis, e.g. (in abbreviated form):
  • @lang
    • Selects the attribute named lang
  • @* or @node()
    • Selects all attributes of the context node
  • @abc:foo
    • Selects the attribute named foo in the namespace designated by the abc prefix
  • @abc:*
    • Selects all attributes that are in the namespace designated by the abc prefix
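Attributes live apart from children, and a prefixed attribute is really identified by its namespace URI. Both points can be sketched in Python's standard-library ElementTree. ElementTree has no attribute axis, so attributes are read through get() and attrib instead; the namespace URI http://example.com/abc is an assumption made up for this example.

```python
import xml.etree.ElementTree as ET

# The two-attribute example from the "Attribute nodes" slide.
foo = ET.fromstring('<foo bar="bat" bang="baz"/>')

# Roughly @bar: one attribute by name.
bar_value = foo.get("bar")

# Roughly @*: all attributes. They are unordered, so sort for display.
attribute_names = sorted(foo.attrib)

# Attributes are not children: the element has no child nodes here.
child_count = len(list(foo))

# Roughly @abc:foo: what matters is the namespace URI, not the prefix.
# ElementTree spells the expanded name as {uri}local-name.
doc = ET.fromstring('<doc xmlns:abc="http://example.com/abc" abc:foo="1" lang="en"/>')
namespaced_value = doc.get("{http://example.com/abc}foo")

print(bar_value, attribute_names, child_count, namespaced_value)
```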

Text nodes
• There is one text node for each contiguous sequence of character data in a document
• Text nodes are never adjacent siblings of each other
  • Adjacent text nodes are always automatically merged into one text node (e.g., when creating the result tree in XSLT)
• Lexical details are thrown away
  • The XPath data model knows nothing about CDATA sections, entity references, or character references
• Example: <foo>&lt;</foo>
  • There is one text node in the above document (a < character)
• Example: <foo><![CDATA[<]]></foo>
  • Identical to the first example, as far as XPath is concerned
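To see that lexical details really are thrown away, the sketch below parses three spellings of the same content with Python's standard-library ElementTree (a stand-in parser, not an XPath engine). The character-reference variant is an extra case added here for illustration.

```python
import xml.etree.ElementTree as ET

# Three lexically different ways of writing a single < character.
escaped = ET.fromstring("<foo>&lt;</foo>").text           # entity reference
cdata = ET.fromstring("<foo><![CDATA[<]]></foo>").text    # CDATA section
char_ref = ET.fromstring("<foo>&#60;</foo>").text         # character reference

# After parsing, all three are the same one-character string.
print(escaped == cdata == char_ref == "<")
```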

Text node quiz
Example:

<foo>
  <bar>Hello world.</bar>
</foo>

• How many text nodes are in the above document?

Text node quiz: ANSWER
Example:

<foo>
  <bar>Hello world.</bar>
</foo>

• How many text nodes are in the above document?
• ANSWER: 3
  • 1: Linefeed, space, space
  • 2: Hello world.
  • 3: Linefeed
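The answer can be double-checked by walking a parsed tree and collecting every text node. This is a sketch with Python's standard-library minidom, used as an approximation of the XPath data model; the helper name text_nodes is made up for this example.

```python
from xml.dom import minidom


def text_nodes(node):
    """Return the data of every text node below node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            found.append(child.data)
        else:
            # Recurse into elements; other node kinds have no children.
            found.extend(text_nodes(child))
    return found


# The quiz document: a linefeed and two spaces before <bar>, a linefeed after.
dom = minidom.parseString("<foo>\n  <bar>Hello world.</bar>\n</foo>")
texts = text_nodes(dom)

print(len(texts), texts)
```

Whitespace-only text counts: two of the three text nodes contain nothing but a linefeed and indentation.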

How to select text nodes
• Use the text() node test:
  • text()
    • Short for child::text()
  • descendant::text()
    • Selects all text nodes that are descendants of the context node

Comment nodes There is one comment node for each comment Example:

How to select comments Use the comment() node test on the child axis: comment() Short for child::comment()

Processing instruction (PI) nodes
• There is one PI node for each PI
• The XML declaration is not a PI
  • <?xml version="1.0"?> is not a PI
  • (It's not a node at all but just a lexical detail that XPath knows nothing about.)
• Example: (This is a PI.)
  • <?xml-stylesheet type="text/xsl" href="a.xsl"?>

How to select processing instructions
• Use the processing-instruction() node test
• Any PI:
  • processing-instruction()
    • Selects all PI children of the context node
    • Short for child::processing-instruction()
• PI with a specific target:
  • processing-instruction('xml-stylesheet')
    • Selects all xml-stylesheet processing instruction children of the context node
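The claim that the XML declaration is not a PI can be checked by listing the top-level nodes of a parsed document. The sketch below uses Python's standard-library minidom as a stand-in for the XPath data model; the one-element document is made up, and the stylesheet PI is the one from the earlier slide.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="a.xsl"?>'
    "<doc/>"
)

# Top-level nodes: the PI and the document element.
# The XML declaration does not show up as a node at all.
top_level_count = len(dom.childNodes)

# Roughly processing-instruction(): all PI children of the root.
pis = [n for n in dom.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
targets = [pi.target for pi in pis]

# Roughly processing-instruction('xml-stylesheet'): filter by target.
stylesheet_pis = [pi for pi in pis if pi.target == "xml-stylesheet"]

print(top_level_count, targets, stylesheet_pis[0].data)
```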

Namespace nodes
• There is one namespace node for each in-scope namespace URI/prefix binding for each element in a document. (No duh... er... what?)
• Always includes this (implicit) binding (used by reserved attributes xml:lang and xml:space, etc.):
  • Prefix: “xml”
  • URI: “http://www.w3.org/XML/1998/namespace”
• Example: <foo/>
  • There is one namespace node in the above document
• Example: <foo xmlns="http://example.com"/>
  • There are two namespace nodes in the above document
    • The implicit xml one (see above)
    • And this one:
      • Prefix: “”
      • URI: “http://example.com”
