Experiences of Document Transformations with XSLT and DOM Anne Honkaranta, Virpi Lyytikäinen, Pasi Tiitinen, University of Jyväskylä, Finland inSGML project University of Jyväskylä/AHo & VLy
Content
• Poem Publishers, Inc.
• Poems
• Publishing environment
• Transformations
• Transformation techniques
• Transformations in client-server environment
• Transformations in Poem Publishers, Inc.
• Challenges encountered
• Lessons learned
Poem Publishers, Inc.
• Fictional company
• Publishes Finnish poems on the WWW
• Poems are authored in XML format according to a DTD
• The company offers the poets an authoring environment if so desired
• The poems can form collections
Poem.dtd
[Figure: DTD listing shown on the slide, not recovered in this transcript]
Publishing environment
• Microsoft IIS server v. 5.0
• JScript, VBScript
• ASP 3.0
• DOM II
• Internet Explorer 5.5 or newer
• CSS Level 2
• MSXML 3.0
Transformation
• Changing/converting a document's
  - format
  - structure / information schema
  - content organization
• Filtering the content
• All of the above
• Conversion, filtering, and transformation are sometimes used as synonyms
Why do you need transformations?
• Authors need a content-oriented DTD
• End-user devices differ
• When managing documents we need to have them in an optimal format for processing
• --> three-step publication process: authoring -- processing -- output
Transformation techniques
Event-based mapping technique
• Examples of languages:
  - SAX (Simple API for XML)
  - Omnimark language/program
• Pros/cons:
  - fast, uses computing resources efficiently
  - does not give very good control over the schema (DTD, grammar) of an output document
Tree-based mapping technique
• Examples of languages:
  - DOM (Document Object Model) API
  - Balise language/program
  - XSLT language
• Pros/cons:
  - constructing a parse tree in memory takes resources
  - good control over the schema of an output document
  - best suited for complex (context) transformations
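The difference between the two techniques can be sketched in a few lines. This is a minimal illustration in Python's standard library, not the MSXML setup used in the project. The TITLE and LINE element names and the sample poem are assumptions made for the example; only the POEM root name appears in the slides. The event-based handler sees one element at a time and keeps only a counter, while the tree-based function holds the whole document in memory and can therefore reorder nodes.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical poem markup: TITLE and LINE are assumed element names.
POEM_XML = (
    "<POEM><TITLE>Pinkku</TITLE>"
    "<LINE>first line</LINE><LINE>second line</LINE></POEM>"
)


class LineCounter(xml.sax.ContentHandler):
    """Event-based: reacts to each start tag, stores nothing but a count."""

    def __init__(self):
        super().__init__()
        self.lines = 0

    def startElement(self, name, attrs):
        if name == "LINE":
            self.lines += 1


def count_lines_sax(xml_text):
    """Count LINE elements without building a tree."""
    handler = LineCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines


def reverse_lines_dom(xml_text):
    """Tree-based: parse everything, then reorder the LINE nodes."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    lines = list(root.getElementsByTagName("LINE"))
    for line in lines:
        root.removeChild(line)
    for line in reversed(lines):
        root.appendChild(line)
    return root.toxml()
```

Reordering is awkward with the event-based handler because earlier events are gone by the time later ones arrive; that is the kind of context-dependent transformation the tree-based technique suits better.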
Transformations in client-server environment (XSLT/DOM)
• Alternatives:
  - using a PI (processing instruction) in the XML source document (c) (can be written to the source document on a web server)
  - DOM interface and DOM objects for loading the source XML and XSLT (c/s)
  - using the DOM interface + a scripting language (VBScript, JScript) or Java
Transformation chain (an example)
[Figure: the source XML doc and the XSLT doc produce an output doc with a link to CSS (server/client); the output HTML/XHTML doc is then rendered with the CSS doc (client)]
Example: using PI in source XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="poem_html.xsl" ?>
<!DOCTYPE POEM SYSTEM "Poem1.dtd">
...
<xsl:stylesheet.....
<html>
<head><meta>
<LINK rel="stylesheet" type="text/css" href="runo_htm.css" > </LINK>
Example: using DOM-objects+XSLT
<HTML><HEAD></HEAD><BODY>
<SCRIPT LANGUAGE="VBScript">
' Create one DOM object for the source document and one for the stylesheet
Dim objDoc, objXSL, strXML
Set objDoc = CreateObject("MSXML2.DOMDocument")
Set objXSL = CreateObject("MSXML2.DOMDocument")
' Load synchronously so both documents are complete before transforming
objDoc.async = False
objXSL.async = False
objDoc.Load "../Runot/Pinkku1.xml"
objXSL.Load "runo1_htmlksi2.xsl"
' Apply the stylesheet and write the result into the page
strXML = objDoc.transformNode(objXSL)
Document.Write strXML
</SCRIPT>
</BODY></HTML>
Example: using VBScript+DOM
<HTML><HEAD><TITLE>Inspect nodes of poem</TITLE></HEAD>
<BODY>
<SCRIPT LANGUAGE="VBSCRIPT" CODEPAGE="iso-8859-1" LCID="1033">
Dim xmlDoc, child
Set xmlDoc = CreateObject("Msxml2.DOMDocument")
' Load synchronously so the tree is complete before it is walked
xmlDoc.async = False
xmlDoc.load("Runot/Pinkku1.xml")
' Walk from the document to each of its child nodes:
For Each child In xmlDoc.childNodes
  document.write "type of node:" & child.nodeType & " | "
  document.write "name of node:" & child.nodeName & " | "
  document.write "content of node:" & child.text & "<BR>"
Next
</SCRIPT></BODY></HTML>
Transformation "types" tested in Poem Publishers, Inc.
• XML-to-XML
• XML-to-HTML
• XML-to-XHTML
Transformation needs tested in Poem Publishers, Inc.
• Tasks tested:
  - combining multiple source documents into an output view (poem + header/footer, poem list, poem metadata)
  - combining multiple source documents into one file (making a poem collection)
  - combining XSLT transformation documents for transformation needs (poem + footer)
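The second task, making a poem collection out of several source documents, can be sketched with a tree-based API. The following is a minimal sketch in Python's xml.dom.minidom, not the project's MSXML/VBScript code. The COLLECTION root name, the build_collection function, and the sample poems are hypothetical and only serve the illustration.

```python
from xml.dom import minidom


def build_collection(poem_sources, collection_name="COLLECTION"):
    """Merge several poem documents (XML strings) into one document.

    collection_name is an assumed root element name, not taken from Poem.dtd.
    """
    impl = minidom.getDOMImplementation()
    collection = impl.createDocument(None, collection_name, None)
    root = collection.documentElement
    for source in poem_sources:
        poem_doc = minidom.parseString(source)
        # importNode makes a deep copy that belongs to the new document
        copied = collection.importNode(poem_doc.documentElement, True)
        root.appendChild(copied)
    return collection.toxml()


# Hypothetical sample poems
POEMS = [
    "<POEM><TITLE>A</TITLE></POEM>",
    "<POEM><TITLE>B</TITLE></POEM>",
]
COLLECTION_XML = build_collection(POEMS)
```

Each poem is copied with importNode rather than appended directly, because a node belongs to the document that created it. A merged file like this would also need its own DTD, or a DTD that allows poems as children of a collection element.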
Example: combining XSLT-stylesheets
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/1999/REC-html401">
<xsl:import href="header.xsl"/>
<xsl:output method="html" encoding="ISO-8859-1" />

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Filename: header.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/1999/REC-html401">
<xsl:output method="html" encoding="ISO-8859-1" />
<xsl:template match="/" name="header">
Challenges Encountered
• Problems with
  - parsers and versions
  - character encodings
  - figures and links
  - "too many" tools, scripting languages, and programs
Example: Character encodings and parser (MSXML)
INPUT DOC
- has an input doc encoding
- may contain character entities
MSXML 3.0
- entities are changed to actual character representations when transformed
- uses UTF-16
- detects the output encoding from the PI when appropriate load/save methods are used
- otherwise outputs UTF-16
OUTPUT DOC
- has some encoding
- has an encoding declaration
- problem: either of them is "wrong"
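The kind of mismatch described above can be reproduced outside MSXML. The following is a minimal sketch with Python's xml.dom.minidom, so it does not show the behavior of MSXML 3.0 itself. The sample poem, the TITLE element, and the reencode function are assumptions made for the illustration. The parser reads the declared input encoding, and the serializer writes both the bytes and the encoding declaration for the requested target encoding, so the two agree.

```python
from xml.dom import minidom

# Hypothetical source document, stored as ISO-8859-1 bytes
SOURCE = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<POEM><TITLE>Jyväskylä</TITLE></POEM>"
).encode("iso-8859-1")


def reencode(xml_bytes, target_encoding):
    """Parse using the declared encoding, then serialize in target_encoding.

    toxml(encoding=...) returns bytes and writes a matching declaration.
    """
    doc = minidom.parseString(xml_bytes)
    return doc.toxml(encoding=target_encoding)


utf8_output = reencode(SOURCE, "utf-8")

# Reading the UTF-8 bytes as if they were ISO-8859-1 shows what a "wrong"
# declaration looks like to the reader: each ä turns into two characters.
misread = utf8_output.decode("iso-8859-1")
```

The garbling appears only when the bytes and the declaration disagree, which is why keeping the source documents and the stylesheets in one encoding avoids the problem.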
Possibilities
• you can use XSLT-stylesheets as components and combine them
• a stylesheet can be seen as a re-usable component on the server
• you can also chain transformations
• you can keep your data in content-oriented form and provide multiple output versions by using transformations
• problem: management of DTDs, transformation components and versions
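Chaining can be pictured as a list of re-usable steps, each taking a document and returning a document. The sketch below uses plain DOM functions in Python in place of XSLT stylesheets, because the Python standard library has no XSLT processor. The META and LINE element names, the step functions, and run_chain are hypothetical.

```python
from functools import reduce
from xml.dom import minidom


def drop_metadata(doc):
    """Filtering step: remove every (assumed) META element."""
    for node in list(doc.getElementsByTagName("META")):
        node.parentNode.removeChild(node)
    return doc


def number_lines(doc):
    """Structural step: add a running number to every (assumed) LINE element."""
    for index, line in enumerate(doc.getElementsByTagName("LINE"), start=1):
        line.setAttribute("n", str(index))
    return doc


def run_chain(xml_text, steps):
    """Apply the steps in order; the output of one is the input of the next."""
    doc = reduce(lambda current, step: step(current), steps,
                 minidom.parseString(xml_text))
    return doc.documentElement.toxml()


# Hypothetical sample poem
SAMPLE = "<POEM><META>draft</META><LINE>a</LINE><LINE>b</LINE></POEM>"
RESULT = run_chain(SAMPLE, [drop_metadata, number_lines])
```

Because every step has the same shape, steps can be reordered or shared between output versions. The management problem noted above applies here too: each step has to be versioned together with the DTD it expects.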
Lessons learned
• Use the same character encoding in source documents and transformation scripts
• Offer a content-oriented DTD for your authors; there is probably a need for transformations anyway
• The level of CSS, XSLT and XML support varies between browsers
• Tools are available for building XML publishing environments: allow extra time for dealing with possible problems
• Multiple skills and tools are needed in a publishing environment; XML is not enough!
More information: inSGML project http://haades.it.jyu.fi/inSGML/