Processing XML

Processing XML • Processing XML using XSLT • Processing XML documents with Java (DOM) • Next week -- Processing XML documents with Java (SAX)

Processing XML using XSLT To use James Clark’s xt program visit his site at http://www.jclark.com/ and click on XML. The following programs were tested with the command line C:>xt somefile.xml somefile.xsl resultfile.html The xt classes (and xslt processing) may also be accessed via a servlet.

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="demo1.xsl"?> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> Input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template> <xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template> <xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template> <xsl:template match = "publisher"> <P><I><xsl:apply-templates/></I></P> </xsl:template> </xsl:stylesheet> Processing

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <P> <I>Little, Brown and Company</I> </P> </BODY> </HTML> Output

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="demo1.xsl"?> <library> <block> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> </block> </library> Input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template> <xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template> <xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template> <xsl:template match = "publisher"> <P><I><xsl:apply-templates/></I></P> </xsl:template> </xsl:stylesheet> The default rules matches the root, library and block elements.

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <P> <I>Little, Brown and Company</I> </P> </BODY> </HTML> The output is the same.

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="demo1.xsl"?> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> <book>Cliff Notes on The Catcher in the Rye</book> </book> Two books in the input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template> <xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template> <xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template> <xsl:template match = "publisher"> <P><I><xsl:apply-templates/></I></P> </xsl:template> </xsl:stylesheet> What’s the output?

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <P> <I>Little, Brown and Company</I> </P> <HTML> <BODY>Cliff Notes on The Catcher in the Rye</BODY> </HTML> </BODY> </HTML> Illegal HTML

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template> <xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template> <xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template>  </xsl:stylesheet> We are not matching on publisher.

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> Little, Brown and Company </BODY> </HTML> We get the default rule matching the publisher and then printing its child.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template> <xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template> <xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template> <xsl:template match = "publisher">  </xsl:template> </xsl:stylesheet> We can skip the publisher by matching and stopping the recursion.

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> </BODY> </HTML>

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="demo1.xsl"?> <shelf> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> </shelf> A shelf has many books.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template> <xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template> <xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template> <xsl:template match = "publisher"> <i><xsl:apply-templates/></i> </xsl:template> </xsl:stylesheet> Will this do the job?

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <i>Little, Brown and Company</i> </BODY> </HTML> <HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <i>Little, Brown and Company</i> </BODY> </HTML> <HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <i>Little, Brown and Company</i> </BODY> </HTML> This is not what we want.

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="demo1.xsl"?> <shelf> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> </shelf> Same input.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "shelf"> <HTML><BODY>Found a shelf</BODY></HTML> </xsl:template> </xsl:stylesheet> Checks for a shelf and quits.

<HTML> <BODY>Found a shelf</BODY> </HTML> Output

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="demo1.xsl"?> <shelf> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> </shelf> Same input.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match = "shelf"> <HTML> <BODY> <b>These are a few of my favorite books</b> <table width = "640“ border = “5”> <xsl:apply-templates/> </table> </BODY> </HTML> </xsl:template> <xsl:template match = "book"> <tr> <td> <xsl:number/> </td> <xsl:apply-templates/> </tr> </xsl:template> <xsl:template match = "title | author | publisher"> <td><xsl:apply-templates/></td> </xsl:template> </xsl:stylesheet> Produce a table of books.

<HTML> <BODY> <b>These are a few of my favorite books</b> <table width="640“ border = “5”> <tr> <td>1</td> <td>The Catcher in the Rye</td> <td>J. D. Salinger</td> <td>Little, Brown and Company</td> </tr> <tr> <td>2</td> <td>The XSLT Programmer's Reference</td> <td>Michael Kay</td> <td>Wrox Press</td> </tr> <tr> <td>3</td> <td>Computer Organization and Design</td> <td>Patterson and Henessey</td> <td>Morgan Kaufmann</td> </tr> </table> </BODY> </HTML>

Processing XML Documents with Java (DOM) • The following examples were tested using Sun’s JAXP • (Java API for XMP Parsing. This is available at • http://www.javasoft.com/ and click on XML

XML DOM • The World Wide Web Consortium’s Document Object Model • Provides a common vocabulary to use in manipulating • XML documents. • May be used from C, Java, Perl, Python, or VB • Things may be quite different “under the hood”. • The interface to the document will be the same.

The XML File “cats.xml” <?xml version = "1.0" ?> <!DOCTYPE TopCat SYSTEM "cats.dtd"> <TopCat> I am The Cat in The Hat <LittleCatA> I am Little Cat A </LittleCatA> <LittleCatB> I am Little Cat B <LittleCatC> I am Little Cat C </LittleCatC> </LittleCatB> <LittleCatD/> </TopCat>

document XML doc Called the Document Element doctype element topcat element text element element Little cat B I am the cat in the hat Little cat A Little cat D element text text Little Cat C I am little cat B I am little cat A text I am little cat C DOM

Agreement.xml <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd"> <FixedFloatSwap> <Notional>100</Notional> <Fixed_Rate>5</Fixed_Rate> <NumYears>3</NumYears> <NumPayments>6</NumPayments> </FixedFloatSwap>

document XML doc doctype FixedFloatSwap Notional NumYears NumPayments FixedRate 100 5 3 6 All of these nodes implement the Node interface

Operation of a Tree-based Parser XML DTD Document Tree Tree-Based Parser Application Logic Valid XML Document

Some DOM Documentation from JavaSoft

The Node Interface • The Node interface is the primary datatype for the entire Document • Object Model. • It represents a single node in the document tree. • While all objects implementing the Node interface expose • methods for dealing with children, not all objects implementing • the Node interface may have children. • For example, Text nodes may not have children.

Properties • All Nodes have properties. • Not all properties are needed by all types of nodes. • The attribute property is an important part of the Element • node but is null for the Text nodes. • We access the properties through methods…

Some Methods of Node Example Methods are: String getNodeName() – depends on the Node type if Element node return tag name if Text node return #text

Some Methods of Node Example Methods are: short getNodeType() Might return a constant like ELEMENT_NODE or TEXT_NODE or …

Some Methods of Node Example Methods are: String getNodeValue() if the Node is an Element Node then return ‘null’ if the Node is a Text Node then return a String representing that text.

Some Methods of Node Example Methods are: Node getParentNode() returns a reference to the parent

Some Methods of Node Example Methods are: public Node getFirstChild() Returns the value of the firstChild property.

Some Methods of Node Example Methods are: public NodeList getChildNodes() returns a NodeList object NodeList is an interface and not a Node.

The NodeList Interface • The NodeList interface provides the abstraction of an ordered • collection of nodes, without defining or constraining how this • collection is implemented. • The items in the NodeList are accessible via an integral index, • starting from 0.

There are only two methods of the NodeList Interface public Node item(int index) Returns the item at index in the collection. If index is greater than or equal to the number of nodes in the list, this returns null.

There are only two methods of the NodeList Interface public int getLength() Returns the value of the length property.

The Element Interface • public interface Element • extends Node • By far the vast majority of objects (apart from text) that authors • encounter when traversing a document are Element nodes. Inheritance Nothing prevents us from extending one interface in order to create another. • Those who implement Element just have more promises to keep.

The Element Interface • public interface Element • extends Node • Some methods in the Element interface • String getAttribute(String name) • Retrieves an attribute value by name.

The Element Interface • public interface Element • extends Node • Some methods in the Element interface • public String getTagName() • Returns the value of the tagName property.

The Element Interface • public interface Element • extends Node • Some methods in the Element interface • public NodeList getElementsByTagName(String name) • Returns a NodeList of all descendant elements with a given • tag name, in the order in which they would be encountered in • a preorder traversal of the Element tree..

Processing XML

Processing XML

Presentation Transcript

RAINDROP: XML Stream Processing Engine

XML Processing in

Using JDOM for XML processing

Simplifying XML Processing using LINQ to XML

XPipe - An XML Processing Methodology

Processing XML with Java

Processing XML in .NET

Processing XML Documents

XML Security Processing With VTD-XML

XML Storage and Query Processing

Processing XML with Java

Query Processing with XML

Processing XML with Java

Query Processing of XML Data

XML Query Processing

Java API for XML Processing

XML Processing in Java

Processing XML with Java

Processing XML

Processing XML with Java

XML Stream Processing

XML Native Query Processing