CSIT600b: XML Programming
DOM Programming
Dickson K.W. Chiu, PhD, SMIEEE
Thanks to Prof. SC Cheung (HKUST) and Prof. Francis Lau (HKU)
Reference: XML How to Program, Deitel, Prentice Hall, 2001
Overview of the Java APIs for XML
• Java Web Services Developer Pack (Java WSDP): http://java.sun.com/webservices/tutorial.html
  • Now part of the J2EE 1.4 core
• Document-oriented
  • Java API for XML Processing (JAXP): processes XML documents using various parsers
  • Java Architecture for XML Binding (JAXB): processes XML documents using schema-derived JavaBeans component classes
• Procedure-oriented
  • Java API for XML-based RPC (JAX-RPC): sends SOAP method calls to remote parties over the Internet and receives the results
  • Java API for XML Messaging (JAXM): sends SOAP messages over the Internet in a standard way
  • Java API for XML Registries (JAXR): provides a standard way to access business registries and share information
Dickson Chiu 2004
The DOM Core (JAXP)
• Fundamental interfaces
  • Required in all implementations
  • DOMException, ExceptionCode, DOMImplementation, DocumentFragment, Document, Node, NodeList, NamedNodeMap, CharacterData, Attr, Element, Text, Comment
• Extended interfaces
  • For DOM implementations working with XML
  • CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
• http://java.sun.com/xml/jaxp/
  • Supports XML Schema and DTD validation, plus XSLT
  • More powerful than JDOM and dom4j
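A minimal sketch of several fundamental interfaces in use, assuming only the JAXP implementation bundled with the JDK: `DocumentBuilder.getDOMImplementation()` supplies a `DOMImplementation`, which creates a `Document`; `Element`, `Text` and `NodeList` then appear naturally. The class and method names (`CoreInterfacesDemo`, `newDocument`, `describe`) are hypothetical.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CoreInterfacesDemo {

    // Create an empty Document whose document element is named rootName.
    static Document newDocument(String rootName) throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        DOMImplementation impl = builder.getDOMImplementation();
        // no namespace URI and no DocumentType
        return impl.createDocument(null, rootName, null);
    }

    // Build <contacts><contact>Sue Green</contact></contacts> and describe it.
    static String describe() throws ParserConfigurationException {
        Document doc = newDocument("contacts");
        Element root = doc.getDocumentElement();
        Element contact = doc.createElement("contact");
        contact.appendChild(doc.createTextNode("Sue Green"));
        root.appendChild(contact);

        NodeList contacts = doc.getElementsByTagName("contact");
        return root.getTagName() + ":" + contacts.getLength() + ":" + contacts.item(0).getTextContent();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        System.out.println(describe()); // contacts:1:Sue Green
    }
}
```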
DOM classes and interfaces (reference: Deitel, XML How to Program, Fig. 8.4)
Some Document methods (Deitel, XML How to Program, Fig. 8.5)
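A short sketch of the two roles of `Document`: entry point to the tree (`getDocumentElement`, `getElementsByTagName`) and factory for new nodes (`createComment` here). The class name `DocumentMethodsDemo` and the helper `parse`, which reads XML from a String through `InputSource`, are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DocumentMethodsDemo {

    // Parse an XML string with the JAXP default (non-validating) parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String demo() throws Exception {
        Document doc = parse("<contacts><contact/><contact/></contacts>");
        Element root = doc.getDocumentElement();                  // <contacts>
        int count = doc.getElementsByTagName("contact").getLength();

        // The Document creates new nodes, but a new node is not part of
        // the tree until it is appended somewhere.
        root.appendChild(doc.createComment("two contacts"));
        return root.getNodeName() + " " + count + " " + root.getLastChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // contacts 2 two contacts
    }
}
```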
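XmlDocument methods (Deitel, XML How to Program, Fig. 8.6)
• com.sun.xml.tree.XmlDocument is a Sun-specific class, not part of the standard JAXP/DOM API, and is not available in current JDKs; to write a document out, use a Transformer instead.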
Node methods (Deitel, XML How to Program, Fig. 8.7)
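A sketch exercising common `Node` methods on a tiny tree: `insertBefore`, `cloneNode`, `appendChild`, `removeChild`, `getParentNode` and `hasChildNodes`. Note that a clone has no parent until it is appended. `NodeMethodsDemo`, `parse` and `childNames` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeMethodsDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Space-separated names of the children of parent, in document order.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    static String demo() throws Exception {
        Document doc = parse("<list><a/><c/></list>");
        Element list = doc.getDocumentElement();
        Node c = list.getLastChild();

        list.insertBefore(doc.createElement("b"), c);      // a b c
        Node copy = list.getFirstChild().cloneNode(true);  // deep copy of <a>, not yet attached
        list.appendChild(copy);                            // a b c a
        list.removeChild(c);                               // a b a

        return childNames(list) + " | parent=" + copy.getParentNode().getNodeName()
                + " | hasChildNodes=" + list.hasChildNodes();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // a b a | parent=list | hasChildNodes=true
    }
}
```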
Some node types (Deitel, XML How to Program, Fig. 8.8)
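`getNodeType()` returns one of the `short` constants defined on `Node`, so a `switch` is the usual way to dispatch on node type. The sketch below (hypothetical class `NodeTypeNames`) labels the children of a root element; the default JAXP parser keeps comments, PIs and CDATA sections as separate nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeNames {

    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "ELEMENT";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE";
            case Node.TEXT_NODE:                   return "TEXT";
            case Node.CDATA_SECTION_NODE:          return "CDATA";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REF";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PI";
            case Node.COMMENT_NODE:                return "COMMENT";
            case Node.DOCUMENT_NODE:               return "DOCUMENT";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCTYPE";
            default:                               return "OTHER(" + type + ")";
        }
    }

    // Type names of the children of the document element, space-separated.
    static String childTypes(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        StringBuilder sb = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(typeName(n.getNodeType()));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childTypes("<r><!--c--><?pi x?><![CDATA[d]]>t<e/></r>"));
        // COMMENT PI CDATA TEXT ELEMENT
    }
}
```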
Element methods (Deitel, XML How to Program, Fig. 8.9)
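A sketch of the main `Element` methods: `getTagName`, `getAttribute`, `setAttribute`, `removeAttribute` and `getElementsByTagName`. One detail worth knowing: `getAttribute` returns an empty string, not `null`, for a missing attribute. The class name `ElementMethodsDemo` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementMethodsDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String demo() throws Exception {
        Document doc = parse("<contact gender=\"F\"><FirstName>Sue</FirstName>"
                + "<LastName>Green</LastName></contact>");
        Element contact = doc.getDocumentElement();

        String before = contact.getAttribute("gender");   // "F"
        contact.setAttribute("id", "c1");                  // adds a new attribute
        contact.removeAttribute("gender");
        String after = contact.getAttribute("gender");    // "" once removed
        String first = contact.getElementsByTagName("FirstName").item(0).getTextContent();

        return contact.getTagName() + " " + before + " [" + after + "] "
                + contact.getAttribute("id") + " " + first;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // contact F [] c1 Sue
    }
}
```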
DOM API of JAXP
Merging 2 DOM trees (for assignment)

import java.net.*;
import java.io.*;
import org.w3c.tidy.*;
import org.w3c.dom.*;
import org.w3c.dom.Node;   // explicit import: JTidy also defines a class named Node
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
import javax.xml.parsers.*;

public class tester {
    public tester() { }

    public static void main(String[] arg) {
        try {
            String urlStr1 = "http://finance.yahoo.com/q/cp?s=^HSI";
            String urlStr2 = "http://finance.yahoo.com/q/cp?s=^DJI";

            // open a connection to each URL
            URL url1 = new URL(urlStr1);
            URL url2 = new URL(urlStr2);
            URLConnection cn1 = url1.openConnection();
            URLConnection cn2 = url2.openConnection();

            // parse each HTML page into a DOM with JTidy
            Tidy tidy = new Tidy();
            tidy.setCharEncoding(Configuration.UTF8);
            tidy.setIndentContent(true);
            tidy.setXHTML(true);
            tidy.setWraplen(Integer.MAX_VALUE);
            Document doc1 = tidy.parseDOM(cn1.getInputStream(), null);
            Document doc2 = tidy.parseDOM(cn2.getInputStream(), null);
Merging 2 DOM trees (cont.)

            // XSLT stylesheets (both pages use the same stylesheet, xslt1.xsl)
            File xslFile1 = new File("xslt1.xsl");
            File xslFile2 = new File("xslt1.xsl");

            // create the Transformer objects
            TransformerFactory t = TransformerFactory.newInstance();
            Transformer transformer1 = t.newTransformer(new StreamSource(xslFile1));
            Transformer transformer2 = t.newTransformer(new StreamSource(xslFile2));

            // transform each DOM into a new DOM
            DOMSource source1 = new DOMSource(doc1);
            DOMResult result1 = new DOMResult();
            DOMSource source2 = new DOMSource(doc2);
            DOMResult result2 = new DOMResult();
            transformer1.transform(source1, result1);
            transformer2.transform(source2, result2);
            Document resultDoc1 = (Document) result1.getNode();
            Document resultDoc2 = (Document) result2.getNode();
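Merging 2 DOM trees (cont.)

            // merge the two DOM trees
            // 1. create the new root element
            Node newRoot = resultDoc1.createElement("indexes");
            // 2. get the root element of each DOM tree
            //    (getDocumentElement() skips a leading comment or PI,
            //    which getFirstChild() would return instead)
            Node hsiIndex = resultDoc1.getDocumentElement();
            Node djiIndex = resultDoc2.getDocumentElement();
            // 3. all nodes *must* belong to the same document:
            //    importNode copies djiIndex into resultDoc1 (true => deep copy)
            djiIndex = resultDoc1.importNode(djiIndex, true);
            // 4. re-link the nodes; appending hsiIndex to newRoot also detaches
            //    it from resultDoc1, which leaves room for the new root element
            newRoot.appendChild(hsiIndex);
            newRoot.appendChild(djiIndex);
            resultDoc1.appendChild(newRoot);

            // print the merged document
            t.newTransformer().transform(new DOMSource(resultDoc1), new StreamResult(System.out));
        } catch (MalformedURLException ex) {
            ex.printStackTrace();
        } catch (IOException ex1) {
            ex1.printStackTrace();
        } catch (TransformerConfigurationException ex2) {
            ex2.printStackTrace();
        } catch (TransformerException ex3) {
            ex3.printStackTrace();
        }
    }
}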
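The merge step can be tried without the network, JTidy or XSLT. In this sketch two small hypothetical documents stand in for `resultDoc1` and `resultDoc2`; it also shows what happens when step 3 is skipped: appending a node owned by another document fails with `DOMException.WRONG_DOCUMENT_ERR`. `MergeDom` and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MergeDom {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Put the document elements of a and b under a new <indexes> root in a.
    static Document merge(Document a, Document b) {
        Element first = a.getDocumentElement();
        Node second = a.importNode(b.getDocumentElement(), true); // deep copy owned by a
        Element newRoot = a.createElement("indexes");
        a.removeChild(first);              // a Document may have only one element child
        newRoot.appendChild(first);
        newRoot.appendChild(second);
        a.appendChild(newRoot);
        return a;
    }

    // Appending a node from another document without importNode is rejected.
    static boolean appendWithoutImportFails() throws Exception {
        Document a = parse("<index name=\"HSI\"/>");
        Document b = parse("<index name=\"DJI\"/>");
        try {
            a.getDocumentElement().appendChild(b.getDocumentElement());
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    // Root name followed by the "name" attribute of each child of the merged root.
    static String mergedNames() throws Exception {
        Document merged = merge(parse("<index name=\"HSI\"/>"), parse("<index name=\"DJI\"/>"));
        Element root = merged.getDocumentElement();
        StringBuilder sb = new StringBuilder(root.getNodeName()).append(':');
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            sb.append(' ').append(((Element) n).getAttribute("name"));
        }
        return sb.toString();
    }

    static String toXml(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mergedNames());   // indexes: HSI DJI
        System.out.println(toXml(merge(parse("<index name=\"HSI\"/>"), parse("<index name=\"DJI\"/>"))));
    }
}
```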
Operation on the Command Line
• Compiling (tidy.jar is the JTidy library):
  H:\Sun\AppServer\jdk\bin\javac -classpath H:\Sun\Appserver\lib\tidy.jar tester.java
• Running (";." adds the current directory to the Windows classpath so that tester.class is found):
  H:\Sun\AppServer\jdk\bin\java -classpath H:\Sun\Appserver\lib\tidy.jar;. tester > out.wml
Building an XML Document with DOM
• Desired output:

<root>
  <!--This is a simple contact list-->
  <contact gender="F">
    <FirstName>Sue</FirstName>
    <LastName>Green</LastName>
  </contact>
  <?myInstruction action silent?>
  <![CDATA[I can add <, >, and ?]]>
</root>
Building an XML Document with DOM

// Deitel, XML How to Program, Fig. 8.14: BuildXml.java
// import declarations specify the location of classes the application needs
import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import com.sun.xml.tree.XmlDocument;   // Sun-specific, non-standard class; unavailable in current JDKs

public class BuildXml {
    private Document document;

    public BuildXml() {
        // use the JAXP default parser
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        try {
            // get a DocumentBuilder, which loads and parses XML documents
            DocumentBuilder builder = factory.newDocumentBuilder();

            // obtain a new, empty XML Document (the root element is created next)
            document = builder.newDocument();
        }
        catch ( ParserConfigurationException pce ) {
            pce.printStackTrace();
        }
Building an XML Document with DOM (cont.)

        // create the root Element and append it to the Document
        Element root = document.createElement( "root" );
        document.appendChild( root );

        // add a comment
        Comment simpleComment = document.createComment( "This is a simple contact list" );
        root.appendChild( simpleComment );

        // add a child element created by method createContactNode (next slide)
        Node contactNode = createContactNode( document );
        root.appendChild( contactNode );

        // add a processing instruction with target "myInstruction" and data "action silent"
        ProcessingInstruction pi = document.createProcessingInstruction( "myInstruction", "action silent" );
        root.appendChild( pi );

        // add a CDATA section
        CDATASection cdata = document.createCDATASection( "I can add <, >, and ?" );
        root.appendChild( cdata );

        try {
            // write the XML document to myDocument.xml
            // OUT!!! XmlDocument.write() is obsolete; use a Transformer instead
            ( (XmlDocument) document ).write( new FileOutputStream( "myDocument.xml" ) );
        }
        catch ( IOException ioe ) {
            ioe.printStackTrace();
        }
    }
Building an XML Document with DOM (cont.)

    // create and return the contact Element node
    public Node createContactNode( Document document ) {
        // create the FirstName and LastName elements with text "Sue" and "Green"
        Element firstName = document.createElement( "FirstName" );
        firstName.appendChild( document.createTextNode( "Sue" ) );
        Element lastName = document.createElement( "LastName" );
        lastName.appendChild( document.createTextNode( "Green" ) );

        // create the contact element
        Element contact = document.createElement( "contact" );

        // create the gender attribute and append it to the contact element
        Attr genderAttribute = document.createAttribute( "gender" );
        genderAttribute.setValue( "F" );
        contact.setAttributeNode( genderAttribute );

        // append FirstName and LastName to the contact element
        contact.appendChild( firstName );
        contact.appendChild( lastName );
        return contact;
    }

    public static void main( String args[] ) {
        BuildXml buildXml = new BuildXml();
    }
}
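A sketch of the same document built without the obsolete `XmlDocument` class, assuming only standard JAXP. It uses `setAttribute`, which has the same effect as `createAttribute` plus `setAttributeNode`, and prints with an identity `Transformer`. `BuildXmlModern`, `createContact`, `summary` and `contactInfo` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class BuildXmlModern {

    // Build the desired tree: comment, contact element, PI and CDATA under <root>.
    static Document build() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        root.appendChild(doc.createComment("This is a simple contact list"));
        root.appendChild(createContact(doc, "F", "Sue", "Green"));
        root.appendChild(doc.createProcessingInstruction("myInstruction", "action silent"));
        root.appendChild(doc.createCDATASection("I can add <, >, and ?"));
        return doc;
    }

    static Element createContact(Document doc, String gender, String first, String last) {
        Element contact = doc.createElement("contact");
        contact.setAttribute("gender", gender);
        Element firstName = doc.createElement("FirstName");
        firstName.appendChild(doc.createTextNode(first));
        Element lastName = doc.createElement("LastName");
        lastName.appendChild(doc.createTextNode(last));
        contact.appendChild(firstName);
        contact.appendChild(lastName);
        return contact;
    }

    // "nodeType:nodeName" for each child of <root>, e.g. "8:#comment".
    static String summary() throws ParserConfigurationException {
        StringBuilder sb = new StringBuilder();
        for (Node n = build().getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.getNodeType()).append(':').append(n.getNodeName());
        }
        return sb.toString();
    }

    // gender attribute plus the concatenated text of the contact element.
    static String contactInfo() throws ParserConfigurationException {
        Element contact = (Element) build().getElementsByTagName("contact").item(0);
        return contact.getAttribute("gender") + " " + contact.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // identity Transformer in place of XmlDocument.write()
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(build()), new StreamResult(System.out));
    }
}
```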
Write an XML file with Transformer

// write the XML document to disk
// (the imports go at the top of the file; the try block goes inside a method)
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;

try {
    // create a DOMSource for the source XML document
    Source xmlSource = new DOMSource( document );

    // create a StreamResult for the transformation result
    // (to write to the console instead: Result result = new StreamResult( System.out );)
    Result result = new StreamResult( new FileOutputStream( new File( "test.xml" ) ) );

    // create a TransformerFactory
    TransformerFactory transformerFactory = TransformerFactory.newInstance();

    // create an identity Transformer (no stylesheet: the DOM is copied as-is)
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty( "indent", "yes" );

    // transform and write the content to the file
    transformer.transform( xmlSource, result );
}
catch ( TransformerException te ) {
    te.printStackTrace();
}
catch ( IOException ioe ) {
    ioe.printStackTrace();
}
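A sketch of the same identity-Transformer technique packaged as small helpers, assuming standard JAXP only. `OutputKeys.INDENT` is the constant for the `"indent"` property used above; `StreamResult` also accepts a `File` or a `StringWriter` directly. `XmlWriter`, `identity`, `toXmlString`, `writeFile` and `sample` are hypothetical names.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlWriter {

    // Identity Transformer: copies the DOM unchanged to the Result.
    static Transformer identity(boolean indent) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        return t;
    }

    // Serialize to a String, without the <?xml ...?> declaration.
    static String toXmlString(Document doc, boolean indent) throws TransformerException {
        Transformer t = identity(indent);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Serialize to a file, passing the File straight to StreamResult.
    static void writeFile(Document doc, File file) throws TransformerException {
        identity(true).transform(new DOMSource(doc), new StreamResult(file));
    }

    // A tiny sample document: <greeting>hi</greeting>
    static Document sample() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element greeting = doc.createElement("greeting");
        greeting.appendChild(doc.createTextNode("hi"));
        doc.appendChild(greeting);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        writeFile(sample(), new File("test.xml"));
        System.out.println(toXmlString(sample(), false));
    }
}
```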
Modifying XML Document with DOM

// Fig. 8.10: ReplaceText.java
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import com.sun.xml.tree.XmlDocument;   // Sun-specific, non-standard class; unavailable in current JDKs
import org.xml.sax.*;

public class ReplaceText {
    private Document document;

    public ReplaceText() {
        try {
            // obtain the default parser
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

            // set the parser to validating (intro.xml must therefore declare a DTD)
            factory.setValidating( true );

            DocumentBuilder builder = factory.newDocumentBuilder();

            // set the error handler for validation errors
            builder.setErrorHandler( new MyErrorHandler() );

            // obtain the Document object from the XML document
            document = builder.parse( new File( "intro.xml" ) );
Modifying XML Document with DOM (cont.)

            // fetch the root node
            Node root = document.getDocumentElement();

            if ( root.getNodeType() == Node.ELEMENT_NODE ) {
                // cast the root node to Element (a subinterface of Node),
                // then get the list of all message elements
                Element myMessageNode = ( Element ) root;
                NodeList messageNodes = myMessageNode.getElementsByTagName( "message" );

                // if a message element exists, replace its old text node with a new one
                if ( messageNodes.getLength() != 0 ) {
                    Node message = messageNodes.item( 0 );

                    // create a text node
                    Text newText = document.createTextNode( "New Message!!" );

                    // get the old text node; item() returns a Node, so it must be cast to Text
                    Text oldText = ( Text ) message.getChildNodes().item( 0 );

                    // replace the text
                    message.replaceChild( newText, oldText );
                }
            }

            // write the new XML document to intro1.xml
            // OUT!!! XmlDocument.write() is obsolete; use a Transformer instead
            ( (XmlDocument) document ).write( new FileOutputStream( "intro1.xml" ) );
        }

Input (intro.xml):
<myMessage>
  <message>Welcome to XML!</message>
</myMessage>

Output (intro1.xml):
<myMessage>
  <message>New Message!!</message>
</myMessage>
Modifying XML Document with DOM (cont.)

        catch ( SAXParseException spe ) {
            System.err.println( "Parse error: " + spe.getMessage() );
            System.exit( 1 );
        }
        catch ( SAXException se ) {
            se.printStackTrace();
        }
        catch ( FileNotFoundException fne ) {
            System.err.println( "File 'intro.xml' not found." );
            System.exit( 1 );
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
    }

    public static void main( String args[] ) {
        ReplaceText d = new ReplaceText();
    }
}
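A sketch of the same replacement without `XmlDocument`, parsing from a String with a non-validating parser so no DTD or file is needed. It also covers an empty `<message/>`, where there is no old text node for `replaceChild`. `ReplaceTextModern`, `replaceFirstMessage` and `run` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class ReplaceTextModern {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Replace the text of the first <message> element; false if there is none.
    static boolean replaceFirstMessage(Document doc, String newText) {
        NodeList messages = doc.getElementsByTagName("message");
        if (messages.getLength() == 0) {
            return false;
        }
        Node message = messages.item(0);
        Text replacement = doc.createTextNode(newText);
        Node oldText = message.getFirstChild();
        if (oldText == null) {
            message.appendChild(replacement);           // <message/> had no text yet
        } else {
            message.replaceChild(replacement, oldText); // replaceChild(newChild, oldChild)
        }
        return true;
    }

    // Parse, replace, and return the first message's new text ("" if there is no message).
    static String run(String xml, String newText) throws Exception {
        Document doc = parse(xml);
        if (!replaceFirstMessage(doc, newText)) {
            return "";
        }
        return doc.getElementsByTagName("message").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<myMessage><message>Welcome to XML!</message></myMessage>", "New Message!!"));
    }
}
```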
Handling Complexities

+ ELEMENT: sentence
   + TEXT: The
   + ENTITY REF: projectName
      + COMMENT: The latest name we're using
      + TEXT: Eagle
   + CDATA: <i>project</i>
   + TEXT: is
   + PI: editor: red
   + ELEMENT: bold
      + TEXT: important
   + PI: editor: normal

• To be more robust, a DOM application must handle more cases, especially when data comes from the outside world.
• When searching for an element:
  • Ignore comments, attributes, and processing instructions.
  • Allow for the possibility that subelements do not occur in the expected order.
  • Skip over TEXT nodes that contain ignorable whitespace, if not validating.
• When extracting text for a node:
  • Extract text from CDATA nodes as well as text nodes.
  • Ignore comments, attributes, and processing instructions when gathering the text.
  • If an entity reference node or another element node is encountered, recurse (that is, apply the text-extraction procedure to all subnodes).
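The two procedures above can be sketched as small helpers. `findChild` looks at every child by name, so order does not matter and comments, PIs and whitespace text are skipped; `extractText` gathers TEXT and CDATA content and recurses into elements and entity references. Attributes are not children in the DOM, so they are ignored automatically. The default JAXP factory expands entity references, so ENTITY REF nodes appear only if `setExpandEntityReferences(false)` is used. `RobustDom` and the sample sentence are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RobustDom {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // First child element with the given name, wherever it occurs among the children.
    static Element findChild(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    static String extractText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString();
    }

    private static void appendText(Node node, StringBuilder sb) {
        for (Node n = node.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(n.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                case Node.ENTITY_REFERENCE_NODE:
                    appendText(n, sb);   // recurse into sub-nodes
                    break;
                default:
                    break;               // comments and PIs are ignored
            }
        }
    }

    static final String SENTENCE =
        "<sentence>The <!--a comment--><![CDATA[<i>project</i>]]> is "
        + "<?editor red?><bold>important</bold><?editor normal?></sentence>";

    static String sentenceText() throws Exception {
        return extractText(parse(SENTENCE).getDocumentElement());
    }

    static String boldText() throws Exception {
        Element bold = findChild(parse(SENTENCE).getDocumentElement(), "bold");
        return bold == null ? null : extractText(bold);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sentenceText()); // The <i>project</i> is important
        System.out.println(boldText());     // important
    }
}
```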
Error Handler for Validation Errors

// Fig. 8.11: MyErrorHandler.java
// Error handler for validation errors.
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler {

    // throw SAXException for fatal errors
    public void fatalError( SAXParseException exception ) throws SAXException {
        throw exception;
    }

    // treat recoverable errors (such as validity errors) as fatal too
    public void error( SAXParseException e ) throws SAXParseException {
        throw e;
    }

    // print any warnings
    public void warning( SAXParseException err ) throws SAXParseException {
        System.err.println( "Warning: " + err.getMessage() );
    }
}
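A sketch of an error handler with the same policy as `MyErrorHandler`, except that warnings are collected in a list rather than printed. Because `fatalError` rethrows, `DocumentBuilder.parse` ends with the original `SAXParseException`, whose line number locates the problem. Without a custom handler, the JDK parser also prints a "[Fatal Error]" line to standard error. `ErrorHandlerDemo`, `StrictHandler` and `check` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ErrorHandlerDemo {

    static class StrictHandler implements ErrorHandler {
        final List<String> warnings = new ArrayList<>();

        @Override public void warning(SAXParseException e) { warnings.add(e.getMessage()); }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    // "ok" if the XML is well-formed, otherwise "line N" for the first error.
    static String check(String xml) throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new StrictHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {       // subclass first
            return "line " + e.getLineNumber();
        } catch (SAXException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a><b/></a>"));    // ok
        System.out.println(check("<a>\n<b></a>"));   // mismatched end tag on line 2
    }
}
```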
Traversing the DOM

// Deitel, XML How to Program, Fig. 8.15: TraverseDOM.java
import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class TraverseDOM {
    private Document document;

    public TraverseDOM( String file ) {
        try {
            // obtain the JAXP default parser and a DocumentBuilder
            // to load and parse XML documents
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating( true );   // require the parser to validate documents
            DocumentBuilder builder = factory.newDocumentBuilder();

            // set the error handler for validation errors
            builder.setErrorHandler( new MyErrorHandler() );

            // load and parse the XML document into a Document object
            document = builder.parse( new File( file ) );

            // pass the Document to method processNode
            processNode( document );
        }
Traversing the DOM (cont.)

        catch ( SAXParseException spe ) {
            System.err.println( "Parse error: " + spe.getMessage() );
            System.exit( 1 );
        }
        catch ( SAXException se ) {
            se.printStackTrace();
        }
        catch ( FileNotFoundException fne ) {
            System.err.println( "File '" + file + "' not found." );
            System.exit( 1 );
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
    }
Traversing the DOM (cont.)

    // a switch statement determines the Node type
    public void processNode( Node currentNode ) {
        switch ( currentNode.getNodeType() ) {

            // Document node: output the document node and root element, then process child nodes
            case Node.DOCUMENT_NODE:
                Document doc = ( Document ) currentNode;
                System.out.println( "Document node: " + doc.getNodeName() +
                    "\nRoot element: " + doc.getDocumentElement().getNodeName() );
                processChildNodes( doc.getChildNodes() );
                break;

            // Element node: output the element's attributes, then process child nodes
            case Node.ELEMENT_NODE:
                System.out.println( "\nElement node: " + currentNode.getNodeName() );
                NamedNodeMap attributeNodes = currentNode.getAttributes();

                for ( int i = 0; i < attributeNodes.getLength(); i++ ) {
                    Attr attribute = ( Attr ) attributeNodes.item( i );
                    System.out.println( "\tAttribute: " + attribute.getNodeName() +
                        " ; Value = " + attribute.getNodeValue() );
                }

                processChildNodes( currentNode.getChildNodes() );
                break;
Traversing the DOM (cont.)

            // CDATA section or text node: output the node's text content
            // (CDATASection extends Text, so the cast is safe for both)
            case Node.CDATA_SECTION_NODE:
            case Node.TEXT_NODE:
                Text text = ( Text ) currentNode;

                if ( !text.getNodeValue().trim().equals( "" ) )
                    System.out.println( "\tText: " + text.getNodeValue() );
                break;
        }
    }

    // method processChildNodes calls processNode for each Node in the NodeList
    public void processChildNodes( NodeList children ) {
        for ( int i = 0; i < children.getLength(); i++ )
            processNode( children.item( i ) );
    }

    public static void main( String args[] ) {
        if ( args.length < 1 ) {
            System.err.println( "Usage: java TraverseDOM <filename>" );
            System.exit( 1 );
        }

        TraverseDOM traverseDOM = new TraverseDOM( args[ 0 ] );
    }
}
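A sketch of the same recursive traversal that returns an indented outline instead of printing, using a non-validating parser so that no DTD is required. Like `TraverseDOM`, it reports elements with their attributes and non-blank text or CDATA. `DomOutline`, `outline`, `walk` and `indent` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomOutline {

    static String outline(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        StringBuilder sb = new StringBuilder();
        walk(root, 0, sb);
        return sb.toString();
    }

    private static StringBuilder indent(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        return sb;
    }

    private static void walk(Node node, int depth, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                indent(sb, depth).append("Element: ").append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    sb.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
                }
                sb.append('\n');
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    walk(c, depth + 1, sb);   // recurse into every child
                }
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) {        // skip whitespace-only text
                    indent(sb, depth).append("Text: ").append(text).append('\n');
                }
                break;
            }
            default:
                break;                        // comments, PIs, etc. are ignored
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline("<a x=\"1\">\n  <b>hi</b>\n</a>"));
    }
}
```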
DOM with JavaScript (reference only; Microsoft XML parser; Deitel Sec. 8.3)

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<!-- Fig. 8.3 : DOMExample.html -->
<head>
   <title>A DOM Example</title>
</head>
<body>
<script type = "text/javascript" language = "JavaScript">
   // ActiveXObject is available only in Internet Explorer
   var xmlDocument = new ActiveXObject( "Microsoft.XMLDOM" );
   xmlDocument.async = false;   // load synchronously so the document is ready on the next line
   xmlDocument.load( "article.xml" );

   // get the root element
   var element = xmlDocument.documentElement;

   document.writeln( "<p>Here is the root node of the document:" );
   document.writeln( "<strong>" + element.nodeName + "</strong>" );
   document.writeln( "<br>The following are its child elements:" );
   document.writeln( "</p><ul>" );
DOM with JavaScript (cont.)

   // traverse all child nodes of the root element
   for ( var i = 0; i < element.childNodes.length; i++ ) {
      var curNode = element.childNodes.item( i );

      // print the node name of each child element
      document.writeln( "<li><strong>" + curNode.nodeName + "</strong></li>" );
   }

   document.writeln( "</ul>" );

   // get the first child node of the root element
   var currentNode = element.firstChild;   // firstChild == childNodes.item(0)

   document.writeln( "<p>The first child of root node is:" );
   document.writeln( "<strong>" + currentNode.nodeName + "</strong>" );
   document.writeln( "<br>whose next sibling is:" );

   // get the next sibling of the first child
   var nextSib = currentNode.nextSibling;

   document.writeln( "<strong>" + nextSib.nodeName + "</strong>." );
   document.writeln( "<br>Value of <strong>" + nextSib.nodeName + "</strong> element is:" );
DOM with JavaScript (cont.)

   var value = nextSib.firstChild;

   // print the text value of the sibling
   document.writeln( "<em>" + value.nodeValue + "</em>" );
   document.writeln( "<br>Parent node of " );
   document.writeln( "<strong>" + nextSib.nodeName + "</strong> is:" );
   document.writeln( "<strong>" + nextSib.parentNode.nodeName + "</strong>.</p>" );
</script>
</body>
</html>

article.xml:

<?xml version = "1.0"?>
<!-- Fig. 8.2: article.xml -->
<article>
   <title>Simple XML</title>
   <date>December 6, 2000</date>
   <author>
      <fname>Tem</fname>
      <lname>Nieto</lname>
   </author>
   <summary>XML is pretty easy.</summary>
   <content>Once you have mastered HTML, XML is easily learned. You must remember that XML is not for displaying information but for managing information.</content>
</article>
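For comparison, a sketch of the same navigation in Java with JAXP (`documentElement`, child list, `firstChild`, `nextSibling`, `parentNode`). One difference matters: by default the MSXML `DOMDocument` does not preserve whitespace-only text nodes, whereas a JAXP parser keeps them, so here the raw `getFirstChild()` of `<article>` is a `#text` node. The hypothetical helper `skipToElement` skips such nodes; `ArticleNavigator` and its methods are also hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ArticleNavigator {

    static final String ARTICLE =
        "<?xml version=\"1.0\"?>\n"
        + "<article>\n"
        + "  <title>Simple XML</title>\n"
        + "  <date>December 6, 2000</date>\n"
        + "  <author>\n    <fname>Tem</fname>\n    <lname>Nieto</lname>\n  </author>\n"
        + "  <summary>XML is pretty easy.</summary>\n"
        + "  <content>Once you have mastered HTML, XML is easily learned. You must remember "
        + "that XML is not for displaying information but for managing information.</content>\n"
        + "</article>\n";

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // n itself, or its first following sibling that is an element (null if none).
    static Node skipToElement(Node n) {
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) {
            n = n.getNextSibling();
        }
        return n;
    }

    static String navigate(String xml) throws Exception {
        Element root = parseRoot(xml);
        StringBuilder sb = new StringBuilder(root.getNodeName()).append(':');
        for (Node c = skipToElement(root.getFirstChild()); c != null; c = skipToElement(c.getNextSibling())) {
            sb.append(' ').append(c.getNodeName());
        }
        Node first = skipToElement(root.getFirstChild());    // <title>
        Node next = skipToElement(first.getNextSibling());   // <date>
        sb.append(" | ").append(first.getNodeName())
          .append(" -> ").append(next.getNodeName())
          .append(" = ").append(next.getFirstChild().getNodeValue())
          .append(" | parent=").append(next.getParentNode().getNodeName());
        return sb.toString();
    }

    // Name of the raw first child of the root: a whitespace text node here.
    static String rawFirstChildName(String xml) throws Exception {
        return parseRoot(xml).getFirstChild().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(navigate(ARTICLE));
        System.out.println(rawFirstChildName(ARTICLE));   // #text
    }
}
```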