XML and XSL Overview by Alex Chaffee alex@jguru.com, http://www.purpletech.com/ Purple Technology: Open source development jGuru: Java online resource FAQs and News and other cool stuff
XML • eXtensible Markup Language • Replacement for HTML • Metalanguage - used to create other languages • Has become a universal data-exchange format
Advantages of XML • Human-readable • Machine-readable (easy to parse) • Standard format for data interchange • Possible to validate • Extensible • can represent any data • can add new tags for new data formats • Hierarchical structure (nesting)
Why not HTML? • Browsers are too lenient • Led to sloppy HTML code all over the Web • <imG src="foo.gif> is "legal" HTML • Told HTML, "go to your room and don't come out until it's clean" • Out came XML
XML Searching and Agents • An early motivation for XML • Allows detailed queries of disparate data sources • Find best price for certain product • Search for properties with different real estate brokers • HTML insufficient • Good for humans, bad for computers • Doesn't scale
XML Example <?xml version="1.0"?> <!DOCTYPE menu SYSTEM "menu.dtd"> <menu> <meal name="breakfast"> <food>Scrambled Eggs</food> <food>Hash Browns</food> <drink>Orange Juice</drink> </meal> </menu>
XML Languages • MML - musical scores • CML - chemicals • HRMML - Human Resource Management (???) • MathML - equations • RSS - web syndication
Tag vs. Element • A tag is a name, enclosed by angle brackets, with optional attributes • <foo id="123"> • An element is a tree, containing an open tag, contents, and a close tag • <foo id="123">This is <bar>an element</bar></foo>
XML Syntax • Tags properly nested • Tag names case-sensitive • All tags must be closed • or self-closing • <foo/> is the same as <foo></foo> • Attributes enclosed in quotes • Document consists of a single (root) element • A few other details
Well-Formed vs. Valid • Well-Formed: • Structure follows XML syntax rules • Valid: • Structure conforms to a DTD
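The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are made up for illustration. Note that this parser checks well-formedness only: it does not validate against a DTD, so validity would need a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One well-formed document and one violation per syntax rule (illustrative samples).
GOOD = '<menu><meal name="breakfast"><food>Eggs</food></meal></menu>'
BAD_NESTING = '<menu><meal><food>Eggs</meal></food></menu>'   # tags not properly nested
BAD_CASE = '<menu><Meal>Eggs</meal></menu>'                   # tag names are case-sensitive
BAD_QUOTES = '<menu><meal name=breakfast/></menu>'            # attribute value not quoted
UNCLOSED = '<menu><meal></menu>'                              # tag never closed
TWO_ROOTS = '<menu/><menu/>'                                  # more than one root element

for doc in (GOOD, BAD_NESTING, BAD_CASE, BAD_QUOTES, UNCLOSED, TWO_ROOTS):
    print(is_well_formed(doc), doc)
```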
DTD • Document Type Definition • A grammar for XML documents • Defines • which elements can contain which other elements • which attributes are allowed on which elements, and which of those are required
DTD and Data Exchange • Both sides must agree on DTD ahead of time • DTD can be part of document or stored separately
DTD Example <?xml encoding="US-ASCII"?> <!ELEMENT menu (meal)*> <!ATTLIST menu name CDATA #IMPLIED> <!ELEMENT meal (food|drink)*> <!ATTLIST meal name CDATA #REQUIRED> <!ELEMENT food (#PCDATA)*> <!ELEMENT drink (#PCDATA)*>
Why isn't a DTD in XML? • It will be someday: XSchema
XML Namespaces • A single document can use multiple DTDs • But! Two DTDs can use the same element name with different rules • Solution: Namespaces • Must prefix tag name with namespace name • e.g. <xsl:apply-templates select="."/>
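To see how a namespace keeps two identically named elements apart, here is a small sketch using Python's standard `xml.etree.ElementTree`. The namespace URIs, prefixes, and document content are invented for illustration. The parser replaces each prefix with the namespace URI it is bound to, so the two `title` elements end up with different qualified names.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define <title> (URIs are hypothetical).
DOC = """<catalog xmlns:bk="http://example.com/ns/book"
         xmlns:cd="http://example.com/ns/music">
  <bk:title>A Book</bk:title>
  <cd:title>An Album</cd:title>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree stores each tag as "{namespace-uri}local-name".
tags = [child.tag for child in root]
print(tags)

# Look up only the book titles by mapping a prefix to its URI.
NS = {"bk": "http://example.com/ns/book"}
book_titles = [e.text for e in root.findall("bk:title", NS)]
print(book_titles)
```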
Entities • Macros / constants • Values defined once, used in document <!DOCTYPE foo SYSTEM "foo.dtd" [ <!ENTITY background "#99FFFF"> ]> <BODY BGCOLOR="&background;">
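A quick way to watch an entity being expanded is to parse a document that declares one. This sketch assumes Python's standard `xml.etree.ElementTree` and declares the entity in the internal subset only (no external `foo.dtd`, since this parser does not fetch external DTDs); the `page` and `body` element names are made up.

```python
import xml.etree.ElementTree as ET

# The entity is defined once in the DOCTYPE and referenced twice below.
DOC = """<!DOCTYPE page [
  <!ENTITY background "#99FFFF">
]>
<page><body bgcolor="&background;">Color is &background;</body></page>"""

root = ET.fromstring(DOC)
body = root.find("body")

# By the time the tree is built, the references have been replaced by the value.
print(body.get("bgcolor"))
print(body.text)
```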
SML / Minimal XML • Simplified Markup Language • Subset of XML, but stripped down • Easier to understand, parse • No • DTDs • Attributes • Processing instructions • etc.
XSL • The eXtensible Stylesheet Language • Transforms XML into HTML • Actually, transforms XML into a tree, then turns that tree into another tree, then outputs that tree as XML
XSL Architecture • [Diagram: the XML Source and the XSL Stylesheet are fed to the XSL Processor, which produces the HTML Output]
XML is a Tree <?xml version="1.0"?> <!DOCTYPE menu SYSTEM "menu.dtd"> <menu> <meal name="breakfast"> <food>Scrambled Eggs</food> <food>Hash Browns</food> <drink>Orange Juice</drink> </meal> <meal name="snack"> <food>Chips</food> </meal> </menu> • [Diagram: the same document drawn as a tree: menu at the root, two meal nodes below it; the first meal has a name attribute ("breakfast") and the children food ("Scrambled Eggs"), food ("Hash Browns"), and drink ("Orange Juice")]
XML Is A Tree • Nodes • Branch nodes contain children • Leaf nodes contain content • Attributes, Values, Entities, etc. • DOM provides API-based access to tree models • XSL turns one tree into a different tree
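The tree view can be made concrete by walking a parsed document through a DOM API. This is a minimal sketch assuming Python's standard `xml.dom.minidom`; the `outline` helper is a made-up name, and the DOCTYPE line is left out because the DTD file is not available here.

```python
from xml.dom import minidom

MENU = """<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>"""

def outline(node, depth=0, lines=None):
    """Collect an indented outline: branch nodes first, then their children."""
    if lines is None:
        lines = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = "".join(f' {k}="{v}"' for k, v in node.attributes.items())
        lines.append("  " * depth + node.tagName + attrs)
        for child in node.childNodes:
            outline(child, depth + 1, lines)
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        # Leaf content; whitespace-only text between tags is skipped.
        lines.append("  " * depth + repr(node.data.strip()))
    return lines

tree_lines = outline(minidom.parseString(MENU).documentElement)
print("\n".join(tree_lines))
```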
Command Line Invocation • Apache Xalan java org.apache.xalan.xslt.Process -IN faq.xml -XSL faq.xsl -OUT faq.html • IBM LotusXSL java com.lotus.xsl.xml4j.ProcessXSL -in servletfaq.xml -xsl faq.xsl -out faq.html • And so on…
Formatting Objects • Forget about it for now
XSLT • The meat of XSL • Syntax for making XSL template files • Pattern matching • Output formatting • Rules-based (like Prolog)
XPath • The stuff inside the quotes in XSL patterns • "/person/name/firstname" • A sensible way to locate content in an XML document • More straightforward than walking a DOM tree or waiting for a SAX callback
XPath Syntax • book/title • title child of book child of current node • /book/title • title child of book child of document root • @language • language attribute of current node • chapter/@language • language attribute of chapter child of current node
XPath Syntax (cont.) • chapter[3]/para • all the para children of the third chapter • book/*/title • all title children of all children of book (but not of their children) • chapter//para • all para descendants of any chapter child, at any depth • ../../title • title child of parent of parent • same as parent::node()/parent::node()/child::title
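Several of these path expressions can be tried out directly. The sketch below assumes Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: paths are always relative to the element you call them on, and they return elements only, so attributes are read with `.get()` rather than with an `@language` step. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book language="en">
    <title>Book One</title>
    <chapter language="en"><title>Intro</title><para>a</para></chapter>
    <chapter language="fr"><title>Deux</title><section><para>b</para></section></chapter>
    <chapter language="en"><title>Three</title><para>c</para><para>d</para></chapter>
  </book>
</library>"""

library = ET.fromstring(DOC)      # plays the role of the current node
book = library.find("book")

# book/title: title child of book child of the current node
book_title = [t.text for t in library.findall("book/title")]

# chapter/@language: read the attribute from each chapter child of book
chapter_langs = [c.get("language") for c in book.findall("chapter")]

# chapter[3]/para: para children of the third chapter
third_paras = [p.text for p in book.findall("chapter[3]/para")]

# book/*/title: title children of all children of book
grandchild_titles = [t.text for t in library.findall("book/*/title")]

# chapter//para: para descendants of chapter, at any depth
all_paras = [p.text for p in book.findall("chapter//para")]

# ../../title, starting from the para inside a section
up_two = [t.text for t in library.findall(".//section/para/../../title")]

print(book_title, chapter_langs, third_paras, grandchild_titles, all_paras, up_two)
```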
XPath Functions • para[1] or para[position()=1] • the first para child of the current node • para[last()] • the last para child of the current node • para[count(child::note)>0] • all paragraphs with one or more notes • id("abstract") • selects the element whose ID attribute is "abstract", like <para id="abstract"> (the DTD must declare the attribute as type ID) • para[@type='secret'] or para[attribute::type='secret'] • selects all child nodes like <para type="secret">
XPath Functions (cont.) • para[not(title)] • selects all child paragraphs with no title elements • para[position() >= 2 and position() < last()] • selects all but the first and last paragraphs • para[lang("en")] • matches <para xml:lang="en-uk">…</para> • note[contains(., "alex")] • . means "test children's content too, recursively" in this context • note[starts-with(., "hello")]
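Here is a rough sketch of what these predicates select, assuming Python's standard `xml.etree.ElementTree`. That module understands only a few predicates (`[1]`, `[last()]`, `[@attr='value']`, `[tag]`), so the others (`not()`, `position()`, `contains()`, `starts-with()`) are emulated below in plain Python rather than evaluated as XPath. The sample document and the `n` attribute used for labelling are made up.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <para n="1" type="secret"><title>One</title>first</para>
  <para n="2">second<note>hello alex</note></para>
  <para n="3">third</para>
  <para n="4" type="secret">fourth<note>bye</note></para>
</doc>"""

doc = ET.fromstring(DOC)
paras = doc.findall("para")

def ns(elems):
    """Label each selected para by its n attribute."""
    return [e.get("n") for e in elems]

first = ns(doc.findall("para[1]"))                  # para[1]
last = ns(doc.findall("para[last()]"))              # para[last()]
secret = ns(doc.findall("para[@type='secret']"))    # para[@type='secret']
with_note = ns(doc.findall("para[note]"))           # para[count(child::note)>0]

# Emulated in Python: para[not(title)]
no_title = ns(p for p in paras if p.find("title") is None)

# Emulated in Python: para[position() >= 2 and position() < last()]
middle = ns(paras[1:-1])

# Emulated in Python: note[contains(., "alex")] and note[starts-with(., "hello")];
# "." is the note's full text content, including text of nested children.
def text_of(elem):
    return "".join(elem.itertext())

alex_notes = [text_of(n) for n in doc.iter("note") if "alex" in text_of(n)]
hello_notes = [text_of(n) for n in doc.iter("note") if text_of(n).startswith("hello")]

print(first, last, secret, with_note, no_title, middle, alex_notes, hello_notes)
```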
XPath Disadvantages • Not XML • Not hierarchical • New syntax rules • Weird mix of /,[],(),*,:,::,.,..,@ • New function set • Not Java • Concepts like "axis" not always clear
XSL Rules • An XSL stylesheet is a series of rules or templates • Each template matches an element • Templates can contain XSL commands
XSL Commands: apply-templates • Main rule: apply-templates • looks for a template match • applies it • Usually the template calls apply-templates recursively on its children • If not, then processing stops at that node (but continues for its other siblings that matched this template)
Default Rule • For a leaf node, output its contents • For a branch node, apply templates (recursively) (including default rule)
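The interplay between apply-templates and the default rule is easier to see in a toy model. The following is not an XSLT processor, just a sketch in Python of the same control flow: templates are functions registered per element name (real XSLT matches on patterns), and every name here (`TEMPLATES`, `template`, `apply_templates`, `default_rule`) is invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATES = {}   # element name -> function(element) returning output text

def template(tag):
    """Register a function as the template for elements named tag."""
    def register(func):
        TEMPLATES[tag] = func
        return func
    return register

def apply_templates(elem):
    """Process elem's text and child elements in document order."""
    out = [escape(elem.text or "")]
    for child in elem:
        rule = TEMPLATES.get(child.tag, default_rule)   # look for a template match
        out.append(rule(child))                         # apply it
        out.append(escape(child.tail or ""))
    return "".join(out)

def default_rule(elem):
    # Leaf node: its text is output. Branch node: recurse into the children.
    return apply_templates(elem)

@template("meal")
def meal(elem):
    return "<H2>" + escape(elem.get("name")) + "</H2><UL>" + apply_templates(elem) + "</UL>"

@template("food")
@template("drink")
def item(elem):
    return "<LI>" + apply_templates(elem) + "</LI>"

MENU = ('<menu><meal name="breakfast"><food>Scrambled Eggs</food>'
        '<drink>Orange Juice</drink></meal></menu>')
root = ET.fromstring(MENU)

# No template is registered for <menu>, so the default rule handles it.
result = TEMPLATES.get(root.tag, default_rule)(root)
print(result)
```

A template that returns without calling `apply_templates` ends processing for that subtree, which mirrors the "processing stops at that node" behavior described above.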
Some XSL Commands • value-of • grabs raw value, good for text elements and attributes • if • executes conditionally • number • counts position of element in group • good for ordered list numbering, table of contents, etc.
XSL Example <?xml version="1.0"?> <!DOCTYPE xsl:stylesheet [ <!ENTITY background "#99FFFF"> ]> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Example (cont.) <xsl:template match="menu"> <HTML> <HEAD> <TITLE>Menu: <xsl:value-of select="@name"/> </TITLE> </HEAD> <BODY BGCOLOR="&background;"> <H1> Menu <xsl:value-of select="@name"/> </H1> [Note: Can reuse contents, unlike CSS]
Example (cont.) <xsl:apply-templates /> </BODY> </HTML> </xsl:template>
Example (cont.) <xsl:template match="meal"> <H2><xsl:value-of select="@name"/></H2><br /> <UL> <xsl:apply-templates/> </UL> </xsl:template>
Example (cont.) <xsl:template match="food"> <LI><xsl:apply-templates/></LI> </xsl:template> <xsl:template match="drink"> <LI><xsl:apply-templates/></LI> </xsl:template> </xsl:stylesheet>
Outputting Attributes • From This: • <link> <name>Stinky</name> <url>http://www.stinky.com/</url></link> • We Want This: • <A href="http://www.stinky.com/">Stinky</A>
Outputting Attributes • The Hard Way: • <xsl:element name="A"> <xsl:attribute name="href"> <xsl:value-of select="url" /> </xsl:attribute> <xsl:value-of select="name" /> </xsl:element> • The Easy Way: • <A href="{url}"> <xsl:value-of select="name"/></A>
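For comparison, the same two styles exist when generating markup from ordinary code. This sketch assumes Python's standard library and is only an analogy to the two XSL forms above: building the element and attribute step by step versus filling values into a literal template.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

link = ET.fromstring(
    "<link><name>Stinky</name><url>http://www.stinky.com/</url></link>")

# "Hard way" analogue: create the element, then set the attribute and content.
a = ET.Element("A")
a.set("href", link.findtext("url"))
a.text = link.findtext("name")
hard = ET.tostring(a, encoding="unicode")

# "Easy way" analogue: a literal template with the values dropped in.
# quoteattr() adds the surrounding quotes and escapes the attribute value.
easy = "<A href={}>{}</A>".format(
    quoteattr(link.findtext("url")), escape(link.findtext("name")))

print(hard)
print(easy)
```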
Copying Subtrees • <xsl:template match="*|@*|text()"> <xsl:copy> <xsl:apply-templates select="*|@*|text()"/> </xsl:copy></xsl:template> • No, I don't understand it either • Default copy rule strips all tags/attributes • Also copy-of
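The copy template is a recursion: copy the current node, then apply the same rule to its attributes, child elements, and text. A rough Python equivalent, assuming the standard `xml.etree.ElementTree` (the function name `copy_tree` is made up), looks like this:

```python
import xml.etree.ElementTree as ET

def copy_tree(elem):
    """Copy an element with its attributes and text, then recurse into children."""
    clone = ET.Element(elem.tag, dict(elem.attrib))   # the node and its attributes
    clone.text = elem.text                            # text before the first child
    clone.tail = elem.tail                            # text after the closing tag
    for child in elem:                                # same rule, applied recursively
        clone.append(copy_tree(child))
    return clone

src = ET.fromstring('<meal name="snack"><food>Chips</food> and more</meal>')
dup = copy_tree(src)
print(ET.tostring(dup, encoding="unicode"))
```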
XSL conditionals: if • <xsl:if test="author">by <xsl:apply-templates select="author" /></xsl:if> • Note: no else (?!?)
XSL Conditionals: choose • Case 1 • Input: <link> <name>Stinky</name> <url>http://www.stinky.com/</url></link> • Output: <a href="http://www.stinky.com/">Stinky</a> • Case 2 • Input: <link> <url>http://www.stinky.com/</url></link> • Output: <a href="http://www.stinky.com/">http://www.stinky.com/</a> • Case 3 • Input: <link> <name>Stinky</name></link> • Output: Stinky
XSL Conditionals: choose • <xsl:choose> <xsl:when test="url"> <A href="{url}"> <xsl:choose> <xsl:when test="name"><xsl:value-of select="name" /></xsl:when> <xsl:otherwise><xsl:value-of select="url" /></xsl:otherwise> </xsl:choose> </A> </xsl:when> <xsl:otherwise><xsl:value-of select="name" /></xsl:otherwise> </xsl:choose>
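The nested choose reads more easily as ordinary if/else logic. This is a sketch of the same decision in Python, assuming the standard library; `render_link` is a hypothetical helper name, and each branch is annotated with the XSL construct it corresponds to.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def render_link(link):
    """Render a <link> element following the three cases."""
    name = link.findtext("name")   # None when there is no <name> child
    url = link.findtext("url")     # None when there is no <url> child
    if url is not None:                                   # outer <xsl:when test="url">
        label = name if name is not None else url         # inner when / otherwise
        return "<A href={}>{}</A>".format(quoteattr(url), escape(label))
    return escape(name or "")                             # outer <xsl:otherwise>

case1 = ET.fromstring("<link><name>Stinky</name><url>http://www.stinky.com/</url></link>")
case2 = ET.fromstring("<link><url>http://www.stinky.com/</url></link>")
case3 = ET.fromstring("<link><name>Stinky</name></link>")

for case in (case1, case2, case3):
    print(render_link(case))
```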
XSL Looping: for-each • <xsl:for-each select="chapter"> <h2><xsl:value-of select="@title"/> </h2></xsl:for-each> • Functional overlap with apply-templates • Difference in programming style • Use it inside a given template rule
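In programming-style terms, for-each is the explicit loop, while apply-templates hands each node to whichever rule matches. A minimal sketch of the loop form in Python, assuming the standard library and an invented two-chapter document:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

book = ET.fromstring('<book><chapter title="Intro"/><chapter title="Syntax"/></book>')

# Equivalent of <xsl:for-each select="chapter">: visit each selected node in order
# and emit the body of the loop once per node.
headings = ["<h2>" + escape(ch.get("title")) + "</h2>"
            for ch in book.findall("chapter")]
print("".join(headings))
```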
Template Modes • Same element name, different context -> different template, different output • Can invoke apply-templates with a mode, matches corresponding moded template • <h1>Table of Contents</h1><ol><xsl:apply-templates select="chapter" mode="toc"/></ol> • <xsl:template match="chapter" mode="toc"> <li><xsl:value-of select="@title"/></li></xsl:template> • <xsl:template match="chapter"> <h1><xsl:value-of select="@title"/></h1> <xsl:apply-templates/></xsl:template>
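One way to picture modes is as a second key in the template lookup: the same element name maps to different templates depending on the mode in effect. The sketch below is a toy model in Python, not an XSLT processor; the registry, the decorator, and the sample book are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATES = {}   # (element name, mode) -> function(element) returning output text

def template(tag, mode=None):
    """Register a function as the template for tag in the given mode."""
    def register(func):
        TEMPLATES[(tag, mode)] = func
        return func
    return register

def apply_templates(elems, mode=None):
    """Run each element through the template registered for its name and mode."""
    return "".join(TEMPLATES[(e.tag, mode)](e) for e in elems)

@template("chapter", mode="toc")
def chapter_toc(ch):
    return "<li>" + escape(ch.get("title")) + "</li>"

@template("chapter")
def chapter_body(ch):
    return "<h1>" + escape(ch.get("title")) + "</h1>" + escape(ch.text or "")

book = ET.fromstring(
    '<book><chapter title="Intro">Hello.</chapter>'
    '<chapter title="Syntax">Tags.</chapter></book>')
chapters = book.findall("chapter")

# First pass in "toc" mode, second pass in the default mode.
html = ("<h1>Table of Contents</h1><ol>"
        + apply_templates(chapters, mode="toc")
        + "</ol>"
        + apply_templates(chapters))
print(html)
```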