Java and XML Platform independence meets language independence!. CC432 / Short Course 507 Lecturer: Simon Lucas University of Essex Spring 2002. Main Topics. Introduction Reading and Writing XML SAX DOM and JDOM Serializing Objects to XML XMLC Concluding remarks. Introduction.
Java and XML: Platform independence meets language independence! CC432 / Short Course 507. Lecturer: Simon Lucas, University of Essex, Spring 2002
Main Topics • Introduction • Reading and Writing XML • SAX • DOM and JDOM • Serializing Objects to XML • XMLC • Concluding remarks
Introduction • Java is a platform-independent language – runs anywhere we have a JVM • And is well-connected – powerful java.net library • Yet – many people persist in using other languages – C/C++, VB etc!
Why Java and XML? • The common format that allows applications written in any language to communicate is XML • Therefore, very important to make Java read and write XML • Can also design object models in Java – and translate them into XML • Leverage powerful design tools such as Together for this purpose
Reading and Writing XML • To gain an insight into what this involves – we’ll work through a simplified model of XML • Our simplified model is as follows: • A tree of elements • Each element either has: • Text, • OR • A set of Child Elements
Element.java • The Element class defines the object model for this kind of document • It also includes some String constants that dictate what characters will be used to delimit the elements • These are chosen to be standard XML characters • Currently, no checking that node text does not contain these special characters!!!
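The missing check could be filled in along these lines. This is only a sketch: XmlEscape and escape() are hypothetical names, not part of the course's Element class, and only the three characters that matter for element text are handled.

```java
// Hypothetical helper (an assumption, not part of the original Element class).
final class XmlEscape {
    private XmlEscape() {}

    /** Replaces characters that would otherwise be read as markup inside element text. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;  // must be escaped, it starts an entity
                case '<': sb.append("&lt;");  break;  // would open a tag
                case '>': sb.append("&gt;");  break;  // escaped for symmetry
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A setText() method could call escape() before storing the text, and a reader would apply the reverse mapping.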
Element.java - I
package xml.serial;

import java.util.*;
import java.io.*;

public class Element {
    static String TAG_OPEN = "<";
    static String TAG_CLOSE = ">";
    static String END_TAG_OPEN = "</";
    static int TAB = 2;
    static int INIT_INDENT = 0;
    static char SPACE = ' ';

    protected Vector children;
    protected StringBuffer text;
    protected String name;

    public Element( String name ) {
        this.name = name;
        children = null;
        text = null;
    }
Element.java II
    final public Vector getChildren() {
        return children;
    }

    final public String getText() {
        // text is null for a node that holds child elements
        return (text == null) ? null : text.toString();
    }

    final public String getName() {
        return name;
    }
Element.java III
    final public void setText(String text) throws Exception {
        // should substitute for any nasty characters
        // e.g. at least < and >
        if ( children == null) {
            this.text = new StringBuffer( text );
        } else {
            throw new Exception(
                "Cannot add text to a node that already has child elements");
        }
    }
Element.java IV
    final public void addChild(Element child) throws Exception {
        if ( text == null ) {
            if (children == null) {
                children = new Vector();
            }
            children.addElement( child );
        } else {
            throw new Exception(
                "Cannot add elements to a node that already has text");
        }
    }
Reading and Writing Elements • Given this simple Element class • We can now write code to serialize a tree of these elements to an XML doc • And to de-serialize such a document back to the tree of Elements in memory • Hence, we get to write a simple parser for this subset of XML! • ElementTest creates an element-only document and writes it to a file
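Serializing such a tree amounts to a recursive walk that prints an opening tag, then either the text or the children, then the closing tag. The sketch below illustrates the idea under stated assumptions: SimpleNode and SimpleWriter are hypothetical stand-ins, not the course's Element and ElementWriter classes, and the two-space indent is chosen arbitrarily.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the simplified model: a node has text OR children.
final class SimpleNode {
    final String name;
    String text;                                   // null when the node has children
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) { this.name = name; }
}

// Hypothetical writer (an assumption, not the course's ElementWriter).
final class SimpleWriter {
    private SimpleWriter() {}

    /** Writes the node and its subtree, indenting each level by two spaces. */
    static void write(SimpleNode node, PrintWriter out, int indent) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < indent; i++) pad.append(' ');
        out.println(pad + "<" + node.name + ">");
        if (node.text != null) {
            out.println(pad + "  " + node.text);
        } else {
            for (SimpleNode child : node.children) {
                write(child, out, indent + 2);     // recurse into each child
            }
        }
        out.println(pad + "</" + node.name + ">");
    }

    /** Convenience wrapper that returns the serialized tree as a String. */
    static String toXml(SimpleNode root) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        write(root, pw, 0);
        pw.flush();
        return sw.toString();
    }
}
```

De-serializing is the mirror image: read an opening tag, then recurse for each nested tag or collect text until the matching closing tag.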
ElementTest.java
package xml.serial;

import java.io.*;

public class ElementTest {
    public static void main(String[] args) throws Exception {
        Element el = new Element("object");
        PrintWriter pw = new PrintWriter( System.out );
        // el.write( pw );
        Element value = new Element( "value" );
        value.setText( "Hello" );
        el.addChild( value );
        el.write( pw );
        pw.println( "And now the static version..." );
        ElementWriter.write( el , pw );
        pw.flush();
    }
}
Running ElementTest
>java xml.serial.ElementTest
<object> <value> Hello </value> </object>
SAX Event-based XML processing
SAX – Main Features • Serial processing of an XML document • Register an event handler • The SAX parser then reads the XML document from start to end • Calls the methods of the event handler in response to various parts of the document
Example Events • startDocument() • startElement() • characters() • endElement() • endDocument() • + many others!
SAX-based program pattern • Define a class that implements the ContentHandler interface • Easiest way is to extend DefaultHandler • DefaultHandler provides NO-OP implementations of all the methods in the ContentHandler interface • Override whichever methods you need to for your application
Using your Custom ContentHandler • Import the necessary packages • Create a new SAXParser • Get an XMLReader from the Parser • Set the ContentHandler for the XMLReader to be your own Customized ContentHandler • Set up an ErrorHandler for the XMLReader – this is a class to handle any parsing errors • Call the XMLReader to parse an XML Document
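The steps above can be sketched as follows. This is a minimal sketch, assuming the JAXP SAXParserFactory bundled with the JDK rather than any particular parser class; ElementCounter and SaxSteps are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts start tags. DefaultHandler also implements
// ErrorHandler, so the same object can serve both roles in this sketch.
class ElementCounter extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e;  // treat recoverable errors as fatal here
    }
}

final class SaxSteps {
    private SaxSteps() {}

    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();   // create a new SAXParser
        XMLReader reader = parser.getXMLReader();    // get an XMLReader from the parser
        ElementCounter handler = new ElementCounter();
        reader.setContentHandler(handler);           // plug in the custom ContentHandler
        reader.setErrorHandler(handler);             // and an ErrorHandler
        reader.parse(new InputSource(new StringReader(xml)));  // parse the document
        return handler.elements;
    }
}
```

Parsing from a String keeps the sketch self-contained; reader.parse() also accepts a file name or URL as a system identifier.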
Counting Node Types • This program is the Hello World of SAX • At the start of the document we create a Hashtable to count the occurrences of each type of element • We override startElement() to update the count in the Hashtable with each element name we see • Override endDocument() to print a summary
SAXTest Program Structure • SAXTest uses CountNodes • CountNodes extends DefaultHandler
SAXTest
package courses.xml;

import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SAXTest extends DefaultHandler {
    static String parserClass = "org.apache.xerces.parsers.SAXParser";

    public static void main(String[] args) throws Exception {
        XMLReader reader = XMLReaderFactory.createXMLReader( parserClass );
        reader.setContentHandler( new CountNodes() );
        reader.setErrorHandler( new SimpleErrorHandler(System.err));
        reader.parse( args[0] );
    }
}
CountNodes • We shall override the following: • startDocument() • startElement() • endDocument()
CountNodes - declaration
package courses.xml;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;

public class CountNodes extends DefaultHandler {
    private Hashtable tags;
    // …
CountNodes: startDocument() • Create a new hashtable for each new document
    public void startDocument() throws SAXException {
        tags = new Hashtable();
    }
CountNodes: startElement()
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts)
            throws SAXException {
        String key = localName;
        Object value = tags.get(key);
        if (value == null) {
            // Add a new entry
            tags.put(key, new Integer(1));
        } else {
            // Get the current count and increment it
            int count = ((Integer)value).intValue();
            count++;
            tags.put(key, new Integer(count));
        }
    }
CountNodes: endDocument() • Summarise the Hashtable contents
    public void endDocument() throws SAXException {
        Enumeration e = tags.keys();
        while (e.hasMoreElements()) {
            String tag = (String)e.nextElement();
            int count = ((Integer) tags.get(tag)).intValue();
            System.out.println( "Tag <" + tag + "> occurs " + count + " times");
        }
    }
Running SAXTest: Hello.xml
<?xml version="1.0" ?>
<greetings>
  <greeting lang="english"> hello </greeting>
  <greeing> bonjour </greeing>
  <greeting> hola! </greeting>
</greetings>
Output
>java courses.xml.SAXTest courses\xml\hello.xml
Tag <greeing> occurs 1 times
Tag <greetings> occurs 1 times
Tag <greeting> occurs 2 times
Notes on CountNodes • Note the parameters to startElement() • We get direct access to that element only – that is its: • Namespace URI • Attributes • Element Name (local name) • Raw Name (the qualified name: namespace prefix + local name) • We must work for any access beyond this!
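To see what startElement() is actually handed, a handler can simply record its arguments. This is a sketch only: StartElementLogger is a hypothetical class, and it assumes the JDK's JAXP parser with namespace awareness switched on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from the course code): records each startElement() call.
class StartElementLogger extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts) {
        StringBuilder sb = new StringBuilder();
        sb.append("uri=").append(namespaceURI)
          .append(" local=").append(localName)
          .append(" raw=").append(rawName);
        // Attributes are indexed; each has its own name and value.
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        seen.add(sb.toString());
    }

    /** Parses the given XML text and returns one line per element start. */
    static List<String> log(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // needed for URI and local name to be filled in
        XMLReader reader = factory.newSAXParser().getXMLReader();
        StartElementLogger handler = new StartElementLogger();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.seen;
    }
}
```

Nothing about the parent or siblings is passed in, so a handler that needs that context has to keep it itself, for example in a stack of open element names.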
SAX Exercise • By overriding: • startElement() • endElement() • startDocument() • endDocument() • provide a ContentHandler that prints out how many times a greeting element was the child of another greeting element
SAX Filter Pipelines • In the Count Nodes example, the XMLReader read from an XML document source • Also possible to read from the output of a ContentHandler • In this way can plug together modular filters to achieve complex effects
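One way to build such a pipeline is with org.xml.sax.helpers.XMLFilterImpl, which sits between an XMLReader and a ContentHandler and forwards every event by default. The sketch below is an illustration under assumptions: FixTypoFilter, NameCollector and FilterDemo are hypothetical names, and the filter merely renames the misspelt greeing element before the next stage sees it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: forwards all events, renaming "greeing" to "greeting".
class FixTypoFilter extends XMLFilterImpl {
    FixTypoFilter(XMLReader parent) { super(parent); }

    private static String fix(String name) {
        return "greeing".equals(name) ? "greeting" : name;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, fix(localName), fix(qName), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, fix(localName), fix(qName));
    }
}

// Hypothetical final stage: collects the element names it is shown.
class NameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(localName);
    }
}

final class FilterDemo {
    private FilterDemo() {}

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        FixTypoFilter filter = new FixTypoFilter(reader);   // reader -> filter
        NameCollector collector = new NameCollector();
        filter.setContentHandler(collector);                // filter -> collector
        filter.parse(new InputSource(new StringReader(xml)));
        return collector.names;
    }
}
```

Because a filter is itself an XMLReader, further filters can be stacked by passing one filter as the parent of the next.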
DOM and JDOM Document Object Model and Java Document Object Model
DOM • A language-independent object model of XML documents • Memory-based • The entire document is parsed – read in to memory • This allows direct access to any part of the document • But limits the size of document that can be handled
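For comparison with the SAX version, element counting over a DOM tree might look like this. It is a sketch assuming the W3C DOM API and the JAXP DocumentBuilder that ship with the JDK; DomCount is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: counts element names in a fully parsed document.
final class DomCount {
    private DomCount() {}

    static Map<String, Integer> count(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is read into memory before we touch it.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Map<String, Integer> counts = new TreeMap<>();   // sorted by element name
        walk(doc.getDocumentElement(), counts);
        return counts;
    }

    /** Visits the node and all its descendants, counting element nodes only. */
    private static void walk(Node node, Map<String, Integer> counts) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            counts.merge(node.getNodeName(), 1, Integer::sum);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, counts);
        }
    }
}
```

The loop has to skip text and other non-element nodes by hand, which is one of the things that feels awkward from a Java perspective.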
JDOM • Because DOM is a language-independent spec., there are features that seem awkward from a Java perspective • JDOM is a Java-based system, developed by Brett McLaughlin and Jason Hunter • It aims to offer most of the features of DOM, but make them easier for Java programmers to exploit
Hello JDOM World • We’ll look at a program that • creates a document • adds a few elements to it • writes it to an output stream
package xml.jdom;

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;

public class HelloWorld {
    public static void main(String[] args) throws Exception {
        Element root = new Element("Greeting");
        root.setText("Hello world!");
        Element child = new Element("Gday");
        child.setText("The kid <bold> is \"cool </bold>");
        child.addAttribute( "color" , "red" );
        root.addContent( child );
        Document doc = new Document(root);
        XMLOutputter output = new XMLOutputter( " " , true );
        output.output( doc, new java.io.PrintWriter( System.out ) );
        String text = root.getText();
    }
}
Reading XML into JDOM
package xml.jdom;

import org.jdom.Document;
import org.jdom.DocType;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public class InputTest {
    public static void main(String[] args) throws Exception {
        String filename1 = "xml/slides/slides.xml";
        SAXBuilder builder = new SAXBuilder();
        System.out.println("Building...");
        Document doc = builder.build( filename1 );
        System.out.println( doc );
    }
}
Processing XML with JDOM • Now we have the document tree in memory • Processing is typically much simpler than with SAX • Though for simple programs, this is not always so • Let’s begin by considering how to write the Count Nodes program with JDOM
Some API • Commonly used functions: • getChildren() – gets all the child elements • getContent() – gets all the content of a node – PIs, Entities, Child elements etc • addContent() – adds any kind of content to a node • addChild() • get/setText() – deals with the text of a node • getParent() – does what you expect!
Count Nodes in JDOM • Strategy: • Create a hashtable • Read in the document • Walk the tree, keeping count in the hashtable • We walk the tree by recursively visiting all the children of a node
CountNodes - Structure • CountTest.java reads in the XML doc as a JDOM Document • Creates an instance of CountNodes • Calls the walkTree method of CountNodes on the document root element • CountNodes defines a constructor and three methods • Constructor – initialises the Hashtable • walkTree – recursively walks the document • count – updates entries in the Hashtable • printSummary – prints the counts • Compare this with the SAX implementation
CountTest.java
package xml.jdom;

import org.jdom.*;
import org.jdom.input.SAXBuilder;

public class CountTest {
    public static void main(String[] args) throws Exception {
        String filename1 = "courses/xml/hello.xml";
        SAXBuilder builder = new SAXBuilder();
        Document doc = builder.build( filename1 );
        CountNodes counter = new CountNodes();
        counter.walkTree( doc.getRootElement() );
        counter.printSummary( System.out );
    }
}
CountNodes.java
package xml.jdom;

import java.util.*;
import java.io.*;
import org.jdom.*;

public class CountNodes {
    Hashtable h;

    public CountNodes() {
        h = new Hashtable();
    }
    // … continued
CountNodes – walkTree()
    public void walkTree(Element el) {
        count( el.getName() );
        List children = el.getChildren();
        for (Iterator i = children.iterator(); i.hasNext() ; ) {
            walkTree( (Element) i.next() );
        }
    }
CountNodes – count()
    public void count(String key) {
        Object value = h.get(key);
        if (value == null) {
            // Add a new entry
            h.put(key, new Integer(1));
        } else {
            // Get the current count and increment it
            int count = ((Integer) value).intValue();
            count++;
            h.put(key, new Integer(count));
        }
    }
CountNodes – printSummary()
    // Assumed implementation, modelled on endDocument() in the SAX version;
    // it takes a PrintStream because CountTest passes System.out.
    public void printSummary(PrintStream ps) {
        Enumeration e = h.keys();
        while (e.hasMoreElements()) {
            String tag = (String) e.nextElement();
            int count = ((Integer) h.get(tag)).intValue();
            ps.println( "Tag <" + tag + "> occurs " + count + " times");
        }
    }
}
JDOM Exercise • Write a JDOM program to print out how many times a greeting element was the child of another greeting element • (e.g. given a doc like Hello.xml – see above) • (same task that we previously attempted with SAX)
JDOM Exercise Hints • Consider the following methods: • getParent() • getName() • getChildren()
Serializing Objects to XML Homebrew version JSX