XML a first course Part 2. Yaakov J. Stein, Chief Scientist, RAD Data Communications. Course Objectives: XML what and why? Well-formed XML. Displaying XML in IE. Valid XML and DTDs. Parsing XML using JavaScript. Processing XML using XSL.
XML a first course, Part 2
Yaakov J. Stein, Chief Scientist, RAD Data Communications

Course Objectives
• XML what and why?
• Well-formed XML
• Displaying XML in IE
• Valid XML and DTDs
• Parsing XML using JavaScript
• Processing XML using XSL

XML Parsing XML using JavaScript
XML Parsers
All XML parsers MUST check for well-formed input
Some XML parsers are validating, others nonvalidating
There are two XML parser “philosophies”
• Event-driven parsers (SAX)
  • Fast, with a small memory footprint
  • Output parsing results on-the-fly
  • Application must store the information it needs
  • Can use a stack to track hierarchy
• Tree parsers (DOM)
  • Slow, with a large memory footprint
  • Build the full tree first, then the user can traverse the tree
  • Exploit “Object Oriented” languages

SAX
Simple API for XML (present version SAX 2.0)
Not developed by W3C BUT a de-facto standard
Versions for Java (Apache Xerces parser), C++, VB, Python, Perl
Some ContentHandler methods (callbacks)
• void setDocumentLocator(Locator locator): supplies application with event location
• void startDocument() throws SAXException: receive notification of XML beginning
• void endDocument() throws SAXException: receive notification of XML end
• void startElement(…) throws SAXException: receive notification of element start tag
• void endElement(…) throws SAXException: receive notification of element end tag
• void characters(…) throws SAXException: receive notification of text
• void ignorableWhitespace(…) throws SAXException: receive notification of space
Example
  <quote> to be <bold> or </bold> not to be </quote>
produces the events
  startElement quote
  characters “to be”
  startElement bold
  characters “or”
  endElement bold
  characters “not to be”
  endElement quote

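The event stream in the example can be reproduced with a few lines of JavaScript. This is a minimal sketch of the event-driven idea, not a real SAX parser: the function name `parseEvents` and the handler object are hypothetical, and the tokenizer assumes plain tags without attributes, comments, CDATA, entities or self-closing elements. The `openElements` stack illustrates how an application can track hierarchy itself.

```javascript
// Hypothetical mini tokenizer: fires callbacks as tags and text are seen.
// Assumption: only <name>, </name> and text appear in the input.
function parseEvents(xml, handler) {
  const token = /<\/([^>\s]+)\s*>|<([^>\s\/]+)\s*>|([^<]+)/g;
  let match;
  while ((match = token.exec(xml)) !== null) {
    if (match[1] !== undefined) {
      handler.endElement(match[1]);          // closing tag
    } else if (match[2] !== undefined) {
      handler.startElement(match[2]);        // opening tag
    } else {
      const text = match[3].trim();          // text between tags
      if (text !== "") handler.characters(text);
    }
  }
}

const events = [];
const openElements = [];   // the application's own stack of open elements
let maxDepth = 0;

parseEvents("<quote> to be <bold> or </bold> not to be </quote>", {
  startElement(name) {
    openElements.push(name);
    maxDepth = Math.max(maxDepth, openElements.length);
    events.push("startElement " + name);
  },
  endElement(name) {
    openElements.pop();
    events.push("endElement " + name);
  },
  characters(text) {
    events.push('characters "' + text + '"');
  }
});

console.log(events.join("\n"));
```

Nothing is kept in memory except what the handler chooses to store, which is why this style suits large inputs.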
Document Object Model
DOM
- API that provides access to XML/HTML document structure
- Enables reading, deleting, changing, adding elements/attributes
There is a good match between XML, tree hierarchy and object oriented programming
The XML
  <vehicles>
    <airplanes/>
    <motor_vehicles>
      <trucks/>
      <cars/>
    </motor_vehicles>
    <bicycles/>
  </vehicles>
corresponds to the objects
  vehicles
  vehicles.airplanes
  vehicles.motor_vehicles
  vehicles.motor_vehicles.trucks
  vehicles.motor_vehicles.cars
  vehicles.bicycles

Nodes
The basic unit in the DOM tree is the Node object
Nodes that are not null also implement more specialized interfaces
Node properties
• nodeName (readonly String)
• nodeType (readonly unsigned short)
• nodeValue (String)
• attributes (readonly NamedNodeMap)
• parentNode (readonly Node)
• childNodes (readonly NodeList)
• firstChild (readonly Node)
• lastChild (readonly Node)
• previousSibling (readonly Node)
• nextSibling (readonly Node)
• ownerDocument (readonly Document)
• prefix (String)
• localName (readonly String)
• namespaceURI (readonly String)
Node methods
• boolean hasChildNodes()
• Node cloneNode(…)
• Node appendChild(…)
• Node removeChild(…)
• Node replaceChild(…)
• Node insertBefore(…)
• void normalize()
• boolean hasAttributes()
• boolean isSupported(…)

Node Types
The W3C DOM defines the following types (as constants in the Node object, but IE doesn’t implement them)
constant’s name | nodeName | nodeValue | data type
• ELEMENT_NODE | tag’s name | null | Element
• ATTRIBUTE_NODE | attribute’s name | value | Attr
• TEXT_NODE | #text | text | Text
• CDATA_SECTION_NODE | #cdata-section | text | CDATASection
• ENTITY_REFERENCE_NODE | referenced name | null | EntityReference
• ENTITY_NODE | entity’s name | null | Entity
• PROCESSING_INSTRUCTION_NODE | PI’s target | rest of PI | ProcessingInstruction
• COMMENT_NODE | #comment | text | Comment
• DOCUMENT_NODE | #document | null | Document
• DOCUMENT_TYPE_NODE | dtd name | null | DocumentType
• DOCUMENT_FRAGMENT_NODE | #document-fragment | null | DocumentFragment
• NOTATION_NODE | notation’s name | null | Notation

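Because IE does not expose these constants, a script that prints `nodeType` shows only numbers. The sketch below is one way to define the W3C numeric codes yourself; the names `NODE_TYPE_NAMES` and `nodeTypeName` are hypothetical helpers, while the numeric values are those of the DOM specification.

```javascript
// W3C DOM nodeType codes, keyed by number so a printed nodeType can be decoded.
const NODE_TYPE_NAMES = {
  1: "ELEMENT_NODE",
  2: "ATTRIBUTE_NODE",
  3: "TEXT_NODE",
  4: "CDATA_SECTION_NODE",
  5: "ENTITY_REFERENCE_NODE",
  6: "ENTITY_NODE",
  7: "PROCESSING_INSTRUCTION_NODE",
  8: "COMMENT_NODE",
  9: "DOCUMENT_NODE",
  10: "DOCUMENT_TYPE_NODE",
  11: "DOCUMENT_FRAGMENT_NODE",
  12: "NOTATION_NODE"
};

// Hypothetical helper: translate a numeric nodeType into its constant's name.
function nodeTypeName(nodeType) {
  return NODE_TYPE_NAMES[nodeType] || "UNKNOWN(" + nodeType + ")";
}

// Example: decode a few codes as a script might print them.
console.log([7, 8, 10, 1].map(nodeTypeName).join(", "));
```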
Elements
Element nodes have the following properties and methods (for the full list see the W3C site)
Property
• tagName (readonly String)
Methods
• boolean hasAttribute(String name)
• String getAttribute(String name)
• void setAttribute(String name, String value)
• Attr getAttributeNode(String name)
• Attr setAttributeNode(Attr newAttr)
• void removeAttribute(String name)
• Attr removeAttributeNode(Attr oldAttr)
• NodeList getElementsByTagName(String name)

Attributes
Attr nodes have the following properties (no methods)
Properties
• name (readonly String)
• ownerElement (readonly Element)
• specified (readonly boolean)
• value (String)

NodeList and NamedNodeMap
NodeList is an array of nodes (e.g. Node.childNodes)
Property
• length (readonly unsigned long)
Method
• Node item(unsigned long index)
nl.item(k) is the same as nl[k]
NamedNodeMap is a collection of Nodes indexed by names (e.g. Node.attributes)
Property
• length (readonly unsigned long)
Methods
• Node item(unsigned long index)
• Node getNamedItem(name)
• Node setNamedItem(…)
• Node removeNamedItem(name)

Character Data
CharacterData is the parent interface of Text and Comment nodes
Text is the parent interface of CDATASection nodes
Properties
• data (String)
• length (readonly unsigned long)
Methods
• appendData()
• deleteData()
• insertData()
• replaceData()
• substringData()
Inheritance
  Node
    CharacterData
      Text
        CDATASection
      Comment

Document
Document nodes are needed to start everything
Properties
• documentElement (readonly Element): root element of the XML
• doctype (readonly DocumentType): the DTD
Methods
• Element createElement(name)
• Attr createAttribute(name)
• Text createTextNode(…)
• Comment createComment(…)
• createEntityReference(…)
• createCDATASection(…)
• createProcessingInstruction(…)
• createDocumentFragment(…)
• Element getElementById(id)
• NodeList getElementsByTagName(name)
• createNodeIterator(…)
• createTreeWalker(…)

Parsing with JavaScript
There are DOM interfaces for many (object oriented) languages
• Java
• JavaScript, ECMAScript, JScript
• C++
• VBScript
It is easier to use a scripting language
• Many required features are pre-programmed
• Interpreted, not compiled
• Platform independent
JavaScript runs only inside a browser
JavaScript is easier than Java, which is easier than C++
JavaScript is FUN (kids use it!)

How to use JavaScript
Use JavaScript by placing script tags in the HTML document
  <SCRIPT LANGUAGE="javascript">
    internal javascript code
  </SCRIPT>
or by giving a URL
  <SCRIPT LANGUAGE="javascript" SRC="filename.js"></SCRIPT>
You can place the SCRIPT tag anywhere, in HEAD or in BODY
It is recommended to hide scripts from older non-scripting browsers by wrapping the code in an HTML comment whose closing line is also a JavaScript comment
  <SCRIPT LANGUAGE="javascript">
  <!-- HTML COMMENT
    internal javascript code
  // JAVASCRIPT COMMENT -->
  </SCRIPT>
  <NOSCRIPT>
    <H1> This page requires a modern browser! </H1>
  </NOSCRIPT>

Quick overview of JavaScript
ECMAScript, see ECMA-262
• Object oriented (an object has properties, methods and events)
• Loosely typed (string (default), number, boolean)
• functions with arguments (not checked, even for number) and optional return value
• var declares local scope
• new allocates an object
• don’t need ;
Operators
  ++ -- + (numbers, strings) - * / % (mod) << etc. += etc.
  < <= > >= == != ~ (bit negation) ! && || ?: (conditional) ,
  NaN Infinity
Flow
  if, if/else, while, for (C-like), for/in, continue, break, return, with
Math
  PI E SQRT2 abs ceil floor round max min sqrt pow eval sin cos tan acos asin atan exp log random
Date
  WeekDay DayFromTime DaysInYear etc. etc. etc.

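A few of the points in this overview are easiest to see by running them. The snippet below is a small illustration only; the variable names and the `modem` object are made up for the example.

```javascript
// + adds numbers but concatenates as soon as one operand is a string.
const sum = 1 + 2;        // number addition
const joined = "1" + 2;   // string concatenation

// The number of arguments is not checked.
function add(a, b) {
  return a + b;
}
const missing = add(1);        // b is undefined, so the result is NaN
const extra = add(1, 2, 3);    // the third argument is silently ignored

// for/in walks over the property names of an object.
const modem = { name: "ASM-20", medium: "4-wire" };
const keys = [];
for (const key in modem) {
  keys.push(key);
}

// % is the mod operator, ?: is the conditional operator.
const parity = (7 % 2 === 1) ? "odd" : "even";

console.log(sum, joined, missing, extra, keys.join(","), parity);
```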
Javascript Events
• onclick: Mouse click
• ondblclick: Mouse double click
• onmouseover: Mouse enters an element
• onmouseout: Mouse leaves an element
• onmousemove: Mouse moves
• onmousedown: Mouse button is pressed
• onmouseup: Mouse button is released
• onkeypress: Visible character is pressed
• onkeydown: Key is pressed
• onkeyup: Key is released
• onload: Document has finished loading
• onblur: Element loses the focus
• onfocus: Element gains the focus

Javascript Example (DHTML)
  <HTML>
  <HEAD>
  <SCRIPT language="javascript">
  function hi() {
    with (hello.style) {
      posLeft=event.clientX;
      posTop=event.clientY;
    }
  }
  function flying() {
    with (fly.style) {
      if (posLeft<300) {
        posLeft+=5;
        posTop+=5;
      } else {
        posLeft=10;
        posTop=10;
      }
    }
    setTimeout('flying()',10);
  }
  </SCRIPT>
  </HEAD>
  <BODY onload="flying()" onclick="hi()">
  <P ID="hello" style="position:absolute;top:100;left:100"> Hello World! </P>
  <SPAN ID="fly" style="position:absolute;top:10;left:10"> I'm Flying!!! </SPAN>
  </BODY>
  </HTML>

XML Islands
What happens when we define an XML island inside an HTML file?
  <html>
  <head>
  <title>XML Island Demo</title>
  </head>
  <body>
  <!-- xml island -->
  <xml id="hellodata" src="hello.xml"></xml>
  </body>
  </html>
Nothing happens: the XML is in the DOM, but the browser doesn’t know what to do with it!
(When we directly display an XML file, IE uses a default XSL)
We have to manually extract the content from the XML DOM and insert it into the browser window as HTML!

An IE-specific feature
XML islands are Microsoft-specific, and Microsoft supplies some non-standard ways of retrieving info
  <html>
  <head>
  <title>XML island Demo</title>
  </head>
  <!-- xml island -->
  <xml id="hellodata" src="hello.xml"></xml>
  <body>
  <B>printout</B>
  <span dataSrc="#hellodata" dataFld="message"></span>
  </body>
  </html>
hello.xml
  <?xml version="1.0"?>
  <printout>
    <message> Hello world! </message>
  </printout>
Result
  printout Hello world!

Javascript to the rescue
Using javascript we can access the XML DOM in a standard way!
  <html>
  <head>
  <title>XML DOM Demo</title>
  </head>
  <!-- xml island -->
  <xml id="hellodata" src="hello.xml"></xml>
  <body>
  <script language=javascript>
    alert(hellodata.xml)
    document.write(hellodata.xml)
  </script>
  </body>
  </html>
hello.xml
  <?xml version="1.0"?>
  <printout>
    <message> Hello world! </message>
  </printout>
alert displays the DOM object
write displays the text (suppresses tags)

Let’s try a more interesting file!
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="modems.xsl"?>
  <!-- modems.xml -->
  <!DOCTYPE modems SYSTEM "modems.dtd">
  <modems>
    <copper>
      <name>ASM-20</name>
      <webpage>products/family/asm-20/asm-20.htm</webpage>
      <medium>4-wire</medium>
      <linecode>D1</linecode>
      <sync>synchronous</sync>
      <management>unmanaged</management>
      <minrate>19.2</minrate>
      <maxrate>256</maxrate>
      <maxrange>7.5</maxrange>
      <interfaces>
        <interface>V.24</interface>
        . . .
      </interfaces>
    </copper>
    . . .
  </modems>
Try alert and document.write !!!

Javascript Access to DOM
What happens when we walk through the DOM tree?
  <script language="JavaScript">
  // main section of DOM (DTD after xsl please!)
  document.writeln("The document has " + modemdata.childNodes.length + " sections.<br>")
  for (n=0;n<modemdata.childNodes.length;n++) {
    document.writeln(
      "<font color='red'>" + n + "</font>"
      + " nodeType=" + modemdata.childNodes(n).nodeType
      + " nodeName=" + modemdata.childNodes(n).nodeName
      + " nodeValue=" + modemdata.childNodes(n).nodeValue
      + "<br>" )
  }
  </script>
Output
  The document has 5 sections.
  0 nodeType=7 nodeName=xml nodeValue=version="1.0"
  1 nodeType=7 nodeName=xml-stylesheet nodeValue=type="text/xsl" href="modems.xsl"
  2 nodeType=8 nodeName=#comment nodeValue= modems.xml
  3 nodeType=10 nodeName=modems nodeValue=null
  4 nodeType=1 nodeName=modems nodeValue=null   (the XML tree)

Let’s walk through the real tree!
  // first get the XML root node
  var rootnode = modemdata.documentElement
  // var rootnode = modemdata.childNodes(modemdata.childNodes.length-1)
  var nmodems = rootnode.childNodes.length
  document.writeln("<h1> The root is <font color='blue'>" + rootnode.nodeName + "</font>"
    + " and it has " + nmodems + " child nodes.</h1>")
  // now traverse XML tree
  for (n=0;n<nmodems;n++) {
    // find the modem
    var thismodem = rootnode.childNodes(n)
    document.writeln("<h2>"+ n + ". " + thismodem.nodeName+"</h2>")
    numfields = thismodem.childNodes.length
    // print all the child nodes for this modem: index, nodeType, nodeName, text
    for (i=0;i<numfields;i++) {
      document.writeln(
        "<font color='red'>" + i + "</font> "
        + "<font color='green'>" + thismodem.childNodes(i).nodeType + "</font> "
        + "<font color='blue'>" + thismodem.childNodes(i).nodeName + "</font> "
        + thismodem.childNodes(i).text + "<br>")
    }
  }

And the answer is …
  The root is modems and it has 7 child nodes.
  0. copper
  0 1 name ASM-20
  1 1 webpage products/family/asm-20/asm-20.htm
  2 1 medium 4-wire
  3 1 linecode D1
  4 1 sync synchronous
  5 1 management unmanaged
  6 1 minrate 19.2
  7 1 maxrate 256
  8 1 maxrange 7.5
  9 1 interfaces V.24 RS-232 V.35 V.36 X.21
  …

More generally
There are more levels and we have to recursively walk through the tree
  function parseChildren(node) {
    var x = node.childNodes
    var n = x.length
    if (n>0) {
      for (var i=0; i<n; i++) {
        . . .
        parseChildren( x.item(i) )
      }
    }
  }
There will usually be attributes (etc.) as well
We often want to jump to specific nodes, etc.
We may want to append, delete, change nodes and persist the changes
EXERCISE TIME!!
See NodeIterator and TreeWalker

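The recursive walk can be tried outside a browser by standing in plain objects for DOM nodes. In this sketch the helper `el` and the `visit` callback are hypothetical, and each stand-in node carries only `nodeName` and `childNodes`; with a real DOM the same loop would run over `node.childNodes`. The tree is the vehicles example from the Document Object Model slide.

```javascript
// Hypothetical stand-in for an element node: a name plus an array of children.
function el(nodeName, ...childNodes) {
  return { nodeName, childNodes };
}

const vehicles = el("vehicles",
  el("airplanes"),
  el("motor_vehicles", el("trucks"), el("cars")),
  el("bicycles"));

// Depth-first walk: report the dotted path of this node, then recurse into children.
function parseChildren(node, parentPath, visit) {
  const path = parentPath === "" ? node.nodeName : parentPath + "." + node.nodeName;
  visit(path);
  const children = node.childNodes;
  for (let i = 0; i < children.length; i++) {
    parseChildren(children[i], path, visit);
  }
}

const paths = [];
parseChildren(vehicles, "", path => paths.push(path));
console.log(paths.join("\n"));
```

Passing the path down as an argument is what lets each level know where it sits in the hierarchy without a global variable.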
XML Processing XML using XSL
Stylesheets
Stylesheets are commonplace in presentation tools
They enable customization and standardization of documents
A stylesheet is usually a set of rules describing how different elements are to be displayed
For example
• look of headers
• font face and size
• effects (underline, bullets)
• use of color
Cascading Style Sheets (CSS) are used to change HTML defaults
SGML had the Document Style Semantics and Specification Language (DSSSL)
• Based on Scheme (LISP variant)
• Influenced XSLT’s philosophy, but not its syntax

CSS
We can add style to XML using CSS, just like HTML
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/css" href="biblio3.css"?>
  <!DOCTYPE bibliography SYSTEM "biblio3.dtd">
  <bibliography>
    <book>
      <title>. . . </title>
  </bibliography>
biblio3.css
  book {display:block}
  article {display:block}
  talk {display:block}
  title {display:block; background:red; color:yellow; font-size:20pt;}
  author {color:blue; font-size:20pt;}
But such style is very limited
• Treatment of tags is not environment dependent
• Can hide tags (display:none) but can’t sort or filter them
• CSS is not a full programming language
• CSS is not XML-based and not extensible

XSL
One can process with procedural languages (e.g. javascript)
But instead one can use an XML-based pattern matching language
• First step of compilation is XML
• Declarative languages are more suitable for transformation applications
XSL: eXtensible Stylesheet Language
• XSL has 2 components, XSLT and XSLFO
• Both are XML applications (can be verified using a DTD)
XSLT has 2 versions
  NEW VERSION (MSXML3, IE6?)
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  OLD VERSION (IE5+)
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl">
XSLT is supported by
• IE5+
• XMLSPY
• Apache’s Xalan
• Saxon
• XP
• Sablotron
• Unicorn
• Xesalt

XSL Transformations
If we are already processing the XML file (XML in, XML out) we can do a lot more: XML format conversion!
Examples:
• Change tag names (e.g. <para> … </para> to <P> … </P>)
• Change attributes to child elements or vice-versa
• Manipulate fields (including numeric computation)
• Reorder elements
• Change entire hierarchical structure
• Filter elements or SELECT records
Hence there are two equivalent opening tags
  <xsl:stylesheet version="1.0" …> for “embedded” XSL
  <xsl:transform version="1.0" …> for “standalone” XSLT (not in IE)

XSLT Processing
XSLT
• Inputs 2 XML files: XML and XSL
• Outputs 1 XML file (can be HTML for display)
XSLT supports recursion and iteration (it relies on an XML DOM parser)
XSLT supports XPath (although IE support is minimal)
XSLT supports internationalization (languages)
Unfortunately, present-day XSLT processors are limited
• they require the tree in memory and are hence limited in database size (write SAX programs for large applications)
• they are relatively slow
Processing features:
• template matching commands
• value commands extract fields
• standard programming constructs (e.g. basic math, loops, conditionals)
• special features (e.g. filtering, sorting)
• noncommands are passed to output

Simple XSL Example
The XML
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/css" href="biblio3.css"?>
  <!DOCTYPE bibliography SYSTEM "biblio3.dtd">
  <bibliography>
    <book>
      <title>. . . </title>
  </bibliography>
The XSL
  <?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl" >
    <xsl:template match="/">
      <html>
      <body>
        <H1> Bibliography </H1>
        <xsl:apply-templates/>
      </body>
      </html>
    </xsl:template>
    <xsl:template match="bibliography">
      <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="book|article|thesis|talk">
      <p><b><xsl:value-of select="title"/></b></p>
      <p><xsl:value-of select="author"/></p>
    </xsl:template>
  </xsl:stylesheet>
The result
  Bibliography
  Digital Signal Processing . . .
  Y. Stein
  Critical Temperature . . .
  Y. Stein
  Storage Capacity for Neural Network Models
  Y. Stein

template match
The heart of XSLT is template matching (triggering)
The xsl:template element with the match attribute is used
  <xsl:template match="nodename">
    . . . Put here whatever you want to do!
  </xsl:template>
Actually the match attribute’s value is not merely a nodename; it is a complex expression matching any of the children of the current node
We must always start processing by matching to the document node, which is nicknamed / (WARNING - this is NOT the XML root!)
  <xsl:template match="/">
    . . .
  </xsl:template>

Recursion and Iteration
At every moment there is a current node
We will need to match the current node’s children
We can do this by recursion
  <xsl:apply-templates/>
  <xsl:apply-templates select="subtree"/>
Or by iteration (looping)
  <xsl:for-each select="subtree" . . . >
    . . .
  </xsl:for-each>
When recursing, XSL should perform default actions on all the child nodes, but IE doesn’t

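The way apply-templates recurses, and what a default action on child nodes means, can be mimicked in JavaScript. This is only a sketch of the idea, not how any XSLT processor is implemented: the node shapes (objects with `name` and `children`, text as plain strings), the helper names and the sample titles and authors are all hypothetical. The templates mirror the bibliography stylesheet of the Simple XSL Example.

```javascript
// Hypothetical document: elements are {name, children}, text nodes are strings.
const bibliography = {
  name: "bibliography",
  children: [
    { name: "book", children: [
      { name: "title", children: ["First Title"] },
      { name: "author", children: ["A. Author"] } ] },
    { name: "article", children: [
      { name: "title", children: ["Second Title"] },
      { name: "author", children: ["B. Author"] } ] }
  ]
};

// Concatenated text of a node, roughly what value-of would return.
function textOf(node) {
  return typeof node === "string" ? node : node.children.map(textOf).join("");
}

// Text of the first child element with the given name, or "" if absent.
function valueOfChild(node, name) {
  const hit = node.children.find(c => typeof c !== "string" && c.name === name);
  return hit ? textOf(hit) : "";
}

// One template shared by book|article|thesis|talk.
function entry(node) {
  return "<p><b>" + valueOfChild(node, "title") + "</b></p>" +
         "<p>" + valueOfChild(node, "author") + "</p>";
}
const templates = new Map([
  ["book", entry], ["article", entry], ["thesis", entry], ["talk", entry]
]);

// apply-templates: use a matching template, else fall back to the default action.
function applyTemplates(node) {
  if (typeof node === "string") return node;            // default for text: copy it
  const template = templates.get(node.name);
  if (template) return template(node);
  return node.children.map(applyTemplates).join("");    // default for elements: recurse
}

// The match="/" template wraps the result in HTML.
const html = "<html><body><H1> Bibliography </H1>" +
             applyTemplates(bibliography) + "</body></html>";
console.log(html);
```

Here `bibliography` has no template of its own, so the default branch carries the recursion down to the entries.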
value-of select
The explicit value of a node is obtained using
  <xsl:value-of select="nodename"/>
where as usual nodename is actually an expression
For the current node’s value use “.”
  <xsl:value-of select="."/>
Example
  <xsl:template match="article">
    <b><xsl:value-of select="title"/>:</b>
    <xsl:value-of select="author"/>
  </xsl:template>

XPath expressions
The expressions in match and select attributes are in XPath
XPath expressions are NOT XML syntax
Here are some XPath goodies
• /  like in directories, is both the “top” and the hierarchy divider
• *  wildcard
• @  attribute
• //  any number of intervening levels
• type()  e.g. text(), comment(): nodes of a particular type
• Test brackets [xxx]  only nodes with a child or attribute which matches
Examples
  <xsl:template match="@color">
  <xsl:template match="zoo/animals/*/food">
  <xsl:template match="zoo//food">
  <xsl:template match="book[text()]">

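To get a feel for how /, * and // behave in a match pattern, the sketch below turns a pattern into a regular expression and tests it against a node's ancestor path written as a string. It is a deliberately tiny illustration, not an XPath implementation: the functions `patternToRegExp` and `matches` are hypothetical, and attributes (@), node-type tests, brackets and patterns starting with / are not handled.

```javascript
// Escape characters that are special in a regular expression.
function escapeName(name) {
  return name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical converter: name steps, * (any one element) and // (any number of levels).
function patternToRegExp(pattern) {
  const body = pattern
    .split("//")
    .map(part => part
      .split("/")
      .map(step => (step === "*" ? "[^/]+" : escapeName(step)))
      .join("/"))
    .join("/(?:[^/]+/)*");
  // A pattern matches when it fits the END of the node's path.
  return new RegExp("(?:^|/)" + body + "$");
}

// path is the chain of element names from the top down to the node, e.g. "zoo/animals/lion/food".
function matches(pattern, path) {
  return patternToRegExp(pattern).test(path);
}

console.log(matches("zoo/animals/*/food", "zoo/animals/lion/food"));
console.log(matches("zoo//food", "zoo/animals/lion/food"));
console.log(matches("zoo//food", "park/food"));
```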
Sorting
The for-each element has several ordering options
Sorting is specified using the order-by attribute
By default ordering is lexicographical (unless explicitly number) and ascending (use - for descending)
Multiple keys can be specified (separate by ;)
  <xsl:for-each select="copper|fiber" order-by="number(minrate); -interfaces">
    . . .
  </xsl:for-each>
There is also a <xsl:sort/> command, not implemented by IE
Also, you can count with <xsl:number/> (position in current node)

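The difference between lexicographical and numeric ordering, and the effect of a second descending key, can be seen with an ordinary JavaScript sort. This is an analogy to order-by="number(minrate); -interfaces", not what the XSL processor does internally; the three records and their field values are made up for illustration.

```javascript
// Hypothetical records; values are stored as text, as they would be in XML.
const modems = [
  { name: "A", minrate: "19.2", interfaces: "V.24" },
  { name: "B", minrate: "2.4",  interfaces: "V.35" },
  { name: "C", minrate: "19.2", interfaces: "X.21" }
];

// Plain text comparison, returning -1, 0 or 1.
function compareText(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Default behaviour: lexicographical, so "19.2" sorts before "2.4".
const asText = modems.slice().sort((a, b) => compareText(a.minrate, b.minrate));

// number(minrate) ascending, then interfaces descending (the leading - in order-by).
const byKeys = modems.slice().sort((a, b) => {
  const byRate = Number(a.minrate) - Number(b.minrate);
  if (byRate !== 0) return byRate;
  return compareText(b.interfaces, a.interfaces);   // operands swapped: descending
});

console.log(asText.map(m => m.name).join(" "));
console.log(byKeys.map(m => m.name).join(" "));
```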
Default (IE) XSL
  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0" . . .>
    <xsl:template match="/">
      <html>
      <head>
        <style> . . . </style>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
      </html>
    </xsl:template>
    <xsl:template match="node()[nodeType()=10]">
      <SPAN><!DOCTYPE <x:node-name/><I> (View source for full doctype . . . )></SPAN>
    </xsl:template>
    <xsl:template match="pi()"> . . . </xsl:template>
    <xsl:template match="comment()"> . . . </xsl:template>
    . . .
    <xsl:template match="*[ textnode() $and$ $not$ (comment() $or$ pi() $or$ cdata()) ]"> . . . </xsl:template>
    . . .
  </xsl:stylesheet>

XSLing on-the-fly
By defining two XML islands, one for the XML and one for the XSL, we can process the XML before displaying it
  <html>
  <head>
  <script language="javascript">
  function load() {
    var result = xmli.transformNode(xsli.documentElement);
    fakeDiv.innerHTML = result;
  }
  </script>
  </head>
  <!-- xml islands -->
  <xml id="xmli" src="modems.xml"></xml>
  <xml id="xsli" src="modems.xsl"></xml>
  <body onload="load()">
  <div id="fakeDiv"> </div>
  </body>
  </html>

XML and XSL and Javascript!
XSL is great, but it has NO GUI!
Javascript is great, but it is tedious to use
Idea:
• process XML with XSL
• use HTML buttons, forms, etc.
• events trigger Javascript functions
• Javascript changes XSL in DOM
• XSL retransforms XML to HTML

XML+XSL+JS Example
  <html>
  <head>
  <script language="javascript">
  function load() {
    var result = xmli.transformNode(xsli.documentElement);
    fakeDiv.innerHTML = result;
  }
  function change(value) {
    // parse XSL and make changes
    load()
  }
  </script>
  </head>
  <!-- xml islands -->
  <xml id="xmli" src="modems.xml"></xml>
  <xml id="xsli" src="modems.xsl"></xml>
  <body onload="load()">
  <select name="number" onClick="change(this.value)">
    <option selected="selected" value="0">0</option>
    <option value="1">1</option>
  </select>
  <div id="fakeDiv"> </div>
  </body>
  </html>

An Example
modems.xml
  <?xml version="1.0"?>
  <!-- <?xml-stylesheet type="text/xsl" href="modems.xsl"?> -->
  <!-- modems.xml -->
  <!DOCTYPE modems SYSTEM "modems.dtd">
  <modems>
  . . .
modems.xsl
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl">
    <xsl:template match="/">
      <html>
      <head>
        <Style> . . . </Style>
      </head>
      <body><xsl:apply-templates select="modems"/></body>
      </html>
    </xsl:template>
  . . .
find.html
  <html>
  <head>
  <Style> . . . </Style>
  <script language="javascript" for="window" >
  function load() . . .
  function selectmedium(key) . . .
  function selectman(key) . . .
  function inputrate(key) . . .
  function inputrange(key) . . .
  </script>
  . . .