The Semantic Blessings of XSLT
Diederik Gerth van Wijk (dg@doxatrix.nl), DOXATRIX
XML Holland 2008, Planetarium Gaasperplas, Amsterdam, 20 November
Intended audience
• Understands English
• Knows what XML is about
• Cares about meaning, processing and validation
• Does not need to know about XSLT
• Does not need to be a programmer
• But might be aware that computers need to be programmed
Semantic? Blessings? XSLT?
• XML is about the structure of a document
• Semantics are about “meaning”
  • A schema can say that a document should have a title (structure)
  • The documentation might add that a title is used for identification (unique within a set of documents) and gives a clue about what the document is about (semantics)
  • The words used in the title are really semantics
• Blessings are good, helpful, you want them
• What is XSLT?
• How can XSLT help you in adding, verifying and using semantic markup?
Why bother marking up explicitly?
NLP is good, Explicit Markup is better
• “Plein 26 Den Haag” = <street>Plein</street><nr>26</nr><city>Den Haag</city>
• “Plein 1813 Den Haag” = <street>Plein 1813</street><city>Den Haag</city>
• XML is about tagging structure
• A schema adds semantics
• <name>Quattro Stagioni</name>: pizza by Mario or piece by Vivaldi?
• I don’t care (in this presentation)
eXtensible Stylesheet Language - Transformations
• XSL: the eXtensible Stylesheet Language
• Family of three W3C recommendations for transformation and presentation
  • XML Path Language (XPath)
  • XSL Transformations (XSLT)
  • XSL Formatting Objects (XSL-FO)
[Diagram: XML source document(s) go into an XSLT processor; with XSLT stylesheet 1 it produces an XSL-FO document, which an XSL-FO processor turns into PDF; with XSLT stylesheet 2 it produces HTML pages]
XSLT characteristics
• An XSLT style sheet is an XML document
• Input is one or more XML documents
• Output is one or more XML (XSLT!), HTML, XSL-FO or plain text (CSS!) documents
• Style sheet can look like a template of the result document (data pull)
• Or be event driven (data push)
  • Elements and attributes are “events”
• Functional programming language
  • Rule based
  • Declarative
  • No side effects
  • Statements can be executed in any order
• Embeds XPath
• XSLT 2.0 and XPath 2.0 know XML Schema types
• XSLT 2.0 can compute from implicit structure
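The difference between the pull and push styles can be sketched with two small stylesheets that should produce the same HTML list. Both assume a hypothetical input of the form <books><book><title>...</title></book></books>; the element names are illustrative and not taken from the talk. The first uses the simplified stylesheet syntax, so it reads like a template of the result document:

```xml
<!-- Pull style: the stylesheet is the result document with holes in it.
     Assumed input: <books><book><title>...</title></book>...</books> -->
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <body>
    <h1>Books</h1>
    <ul>
      <!-- The stylesheet fetches ("pulls") the data it wants -->
      <xsl:for-each select="/books/book">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </body>
</html>
```

The second is rule based: each template reacts to the element it matches, and the input document drives the processing.

```xml
<!-- Push style: one rule per kind of element ("event") -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/books">
    <html>
      <body>
        <h1>Books</h1>
        <ul>
          <!-- Hand the book elements to whichever rules match them -->
          <xsl:apply-templates select="book"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="book">
    <li><xsl:apply-templates select="title"/></li>
  </xsl:template>

  <xsl:template match="title">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The pull version is shorter for fixed layouts; the push version tends to cope better with input whose structure varies, because adding a new element type only means adding a rule.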
XSLT engines
• Stand-alone:
  • Saxon (open source, Michael Kay)
  • Altova (free, XML Spy)
  • MSXML
• On server:
  • Saxon + .NET
  • Altova + .NET
  • MSXML + ASP
• Built into the browser:
  • IE6 and higher
  • FF1 and higher
  • Opera9 and higher
What’s the competition?
• CSS (Cascading Style Sheets)
  • Easier, simpler
  • Don’t transform
• Perl, Python, Java, JavaScript, C(++), (V)Basic
  • Generic programming or scripting languages
  • No built-in knowledge of XML, but lots of libraries for DOM or SAX
• JSP, ASP, PHP
  • Server-side processing
  • Not really XML aware
  • Little or no transformation
• ISO/IEC 10179 DSSSL: Document Style Semantics and Specification Language
  • SGML based
  • Rarely used
XSLT and semantics...
• XML elements describe what the content is (semantics)
• XSLT stylesheets describe what to do with them (processing)
• How can a processing stylesheet be a semantic blessing?
Blessing 3: XSLT 2.0 may be schema aware
• A schema defines the semantics of a document type
• XSLT 2.0 is based on XPath 2.0
• XSLT 2.0 may use schemas
• Then, XPath 2.0 can use the types of elements or attributes
• So it can know whether to treat an attribute as a string or as an integer ("12" < "3" if the type is string, "12" > "3" if the type is integer)
• But will it sort correctly:
    <song title="50 ways to leave your lover" performer="Paul Simon" />
    <song title="1919 rag" performer="Kid Ory" />
  or
    <king name="Henry VIII" born="1491-06-28" died="1547-01-28" />
    <king name="Henry IX" born="1725-03-11" died="1807-07-13" />
  (yes, if the Roman numerals were coded as Ⅷ and Ⅸ)
• With the “instance of” operator you can use information that is not in the document, but is in the schema
• Therefore, XSLT 2.0 discourages stand-alone processing
• From a semantic point of view, that’s a blessing
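The string-versus-integer point can be tried out even without a schema, by casting explicitly. The sketch below is an illustration under assumptions, not the speaker's code: it assumes an XSLT 2.0 processor and a hypothetical input such as <tracks><track nr="12"/><track nr="3"/></tracks>. With a schema-aware processor and a schema that declares nr as xs:integer, the explicit xs:integer() cast should become unnecessary, because the type then comes from the schema.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed (hypothetical) input:
       <tracks><track nr="12"/><track nr="3"/></tracks> -->
  <xsl:template match="/tracks">
    <result>
      <!-- Sort key is a string: "12" sorts before "3" -->
      <by-string>
        <xsl:for-each select="track">
          <xsl:sort select="string(@nr)"/>
          <nr><xsl:value-of select="@nr"/></nr>
        </xsl:for-each>
      </by-string>
      <!-- Sort key is an integer: 3 sorts before 12 -->
      <by-integer>
        <xsl:for-each select="track">
          <xsl:sort select="xs:integer(@nr)"/>
          <nr><xsl:value-of select="@nr"/></nr>
        </xsl:for-each>
      </by-integer>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The cast puts the type knowledge in the stylesheet; the schema-aware route keeps it in the schema, where it is documented once for every stylesheet that processes the document type.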
Blessing 4: Schema independent processing (1)
• In a sequence group, the order contains no information:
    (title, abbreviated-title?)    (1)
  is equivalent to
    (abbreviated-title?, title)    (2)
• Suppose you want to print the abbreviated title if one is coded, and otherwise the full title
• In stream processing, the quick-and-dirty solution might be as simple as:
    temp = getNextElement; if existsNextElement then write(getNextElement) else write(temp);    (1)
  or
    write(getNextElement);    (2)
• But what if you decide to change from order (1) to (2)?
• Or add an optional element toc-title?
    (title, abbreviated-title?, toc-title?)    (1)
    (toc-title?, abbreviated-title?, title)    (2)
• The simple program breaks
Blessing 4: Schema independent processing (2)
• In XSLT, you have access to the elements by name, in arbitrary order
• The style sheet fragment looks like
    <xsl:choose>
      <xsl:when test="./abbreviated-title">
        <xsl:value-of select="abbreviated-title"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="title"/>
      </xsl:otherwise>
    </xsl:choose>
• If the schema (and documents) change order, the stylesheet remains the same
• If an optional toc-title is added, the stylesheet remains the same
• Verbosity turns out to be simpler, in the long run
• By the way, if sequence matters in the document, it shouldn’t in the schema
• Reasons to prescribe sequence:
  • to ease input
  • to enforce cardinality
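In XSLT 2.0 the same name-based choice can be written more compactly with an XPath 2.0 sequence expression. This is a sketch rather than the fragment from the talk; the parent element name document is a hypothetical placeholder. The comma operator builds a sequence in the order written (not in document order), so the first item is the abbreviated title when one exists and the full title otherwise, whatever order the schema prescribes.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "document" is an assumed name for the element that holds the titles -->
  <xsl:template match="document">
    <heading>
      <!-- Preference order is stated here, independent of element order -->
      <xsl:value-of select="(abbreviated-title, title)[1]"/>
    </heading>
  </xsl:template>

</xsl:stylesheet>
```

If a toc-title should later take precedence, the expression would only need to grow to (toc-title, abbreviated-title, title)[1].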
Blessing 5: functional programming
• No variables
• Suppose you want to sort items alphabetically and act on each new letter
• First idea:
    <xsl:variable name="PrevLetter" select="' '" />
    <xsl:for-each select="book">
      <xsl:sort select="title" data-type="text" order="ascending"/>
      <xsl:variable name="ThisLetter" select="substring(title,1,1)" />
      <xsl:if test="$PrevLetter!=$ThisLetter">
        <H2><xsl:value-of select="$ThisLetter"/></H2>
      </xsl:if>
      <xsl:variable name="PrevLetter" select="$ThisLetter" />
      <H3><xsl:value-of select="title"/></H3>
    </xsl:for-each>
• No good: the inner xsl:variable does not assign to the outer PrevLetter. It declares a new variable that only exists until the end of the current iteration, so every iteration of the for-each loop starts again with the initial value
Would this work?
    <xsl:for-each select="book">
      <xsl:sort select="title" data-type="text" order="ascending"/>
      <xsl:variable name="PrevLetter" select="substring(preceding-sibling::book[1]/title,1,1)" />
      <xsl:variable name="ThisLetter" select="substring(title,1,1)" />
      <xsl:if test="$PrevLetter!=$ThisLetter">
        <H2><xsl:value-of select="$ThisLetter"/></H2>
      </xsl:if>
      <H3><xsl:value-of select="title"/></H3>
    </xsl:for-each>
• Better, but the preceding-sibling axis operates on the original document order, not on the sorted order...
• Is that a bug or a feature?
• It’s a blessing!
The solution
    <xsl:for-each-group select="book" group-by="substring(title,1,1)">
      <!-- sort the groups themselves, so the letter headings come out alphabetically -->
      <xsl:sort select="current-grouping-key()"/>
      <H2><xsl:value-of select="current-grouping-key()"/></H2>
      <xsl:for-each select="current-group()">
        <xsl:sort select="title" data-type="text" order="ascending"/>
        <H3><xsl:value-of select="title"/></H3>
      </xsl:for-each>
    </xsl:for-each-group>
• Think XML
• Think in creating hierarchies: groups of titles starting with the same letter
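The xsl:for-each-group instruction is only available from XSLT 2.0 onwards. For an XSLT 1.0 processor, a commonly used substitute is Muenchian grouping, which is a different technique from the one on the slide and is shown here only as a sketch. It assumes a hypothetical root element books that contains all book elements; the key name books-by-letter is likewise made up for the example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Index every book by the first letter of its title -->
  <xsl:key name="books-by-letter" match="book"
           use="substring(title, 1, 1)"/>

  <!-- Assumed input: <books><book><title>...</title></book>...</books> -->
  <xsl:template match="/books">
    <html>
      <body>
        <!-- Select only the first book of each letter group -->
        <xsl:for-each select="book[generate-id() =
            generate-id(key('books-by-letter', substring(title, 1, 1))[1])]">
          <xsl:sort select="substring(title, 1, 1)"/>
          <H2><xsl:value-of select="substring(title, 1, 1)"/></H2>
          <!-- Then list every book that shares that letter -->
          <xsl:for-each select="key('books-by-letter', substring(title, 1, 1))">
            <xsl:sort select="title"/>
            <H3><xsl:value-of select="title"/></H3>
          </xsl:for-each>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The idea is the same as in the 2.0 solution: first build the hierarchy of letter groups, then process each group. Note that a key indexes the whole document, so this sketch would need adjusting if book elements occurred under more than one parent.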
The ultimate semantic normalisation
• “PCDATA considered harmful” (Han Nonnekes, Shell Oil)
• Text is the outer structure in a specific language of a deeper meaning
• You should encode a text as that deeper tree
• With references to abstract words (concepts)
• For each language (“English, upper class, around 1850”) give dictionary and transformation rules
• Then generate the text
Questions?
• Ask me now
• Ask me during lunch or tea break
• Ask me during buffet
• Mail dg@doxatrix.nl
• Presentation can be downloaded from
  • www.xmlholland2008.nl
  • www.doxatrix.nl/dg