XML DOM Tutorial CSC 309 By: Meng Lou
DOM • Introduction • Overview • Steps for DOM parsing • Examples • DOM or SAX? • Summary
Introduction • DOM supports navigating and modifying XML documents. • Hierarchical tree representation of documents • Language-neutral (e.g. C++, Java, CORBA) • www.w3c.org/DOM
Pros and Cons • Advantages: Robust API for the DOM tree; Relatively simple to modify the data structure and extract data • Disadvantages: Stores the entire document in memory; Because DOM was designed for any language, its method names don't follow standard Java conventions (e.g. NodeList exposes getLength() and item(i) rather than size() and get(i))
Steps for parsing • Specify parser • Create a document builder • Invoke the parser to create a Document representing the XML document • Normalize • Obtain the root node • Modify and examine the properties of nodes
Specifying a Parser • Use the command-line java -D option to set the javax.xml.parsers.DocumentBuilderFactory property • In the program, use System.setProperty, e.g. System.setProperty( "javax.xml.parsers.DocumentBuilderFactory", "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl" );
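To confirm which parser implementation was actually selected, one option is to print the property and the class of the factory that JAXP returns. This is a minimal sketch; the class name ParserSelection and the helper activeFactoryClass are made up for illustration. If the property names a class that is not on the classpath (for example the Xerces class above without the Xerces jar), newInstance() fails with a FactoryConfigurationError.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, not part of the original slides.
public class ParserSelection {
    static final String FACTORY_KEY = "javax.xml.parsers.DocumentBuilderFactory";

    // Returns the class name of the factory implementation JAXP selected.
    static String activeFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null means no -D option and no System.setProperty call was made.
        String requested = System.getProperty(FACTORY_KEY);
        System.out.println("Requested: " + (requested == null ? "(platform default)" : requested));
        System.out.println("In use:    " + activeFactoryClass());
    }
}
```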
Create a Document Builder • Create an instance of the builder factory, then use it to create a DocumentBuilder object (newDocumentBuilder() can throw ParserConfigurationException) DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); DocumentBuilder builder = builderFactory.newDocumentBuilder();
Create a Document • Call the parse method (it can throw SAXException and IOException) Document doc = builder.parse(someInputStream); • The Document interface represents the parsed result as a tree structure
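The two steps above, put together as a small runnable sketch. The class name ParseDemo and the sample XML string are assumptions for illustration only. The input stream is built from an in-memory string so the example needs no file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original slides.
public class ParseDemo {
    // Parses an XML string into a DOM Document using the default JAXP parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        Document doc = parse("<course code=\"CSC309\"><topic>DOM</topic></course>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```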
Normalize the Tree • Normalization has two effects: - Merges adjacent text nodes into one (e.g. text that the parser delivered in several pieces, such as text spanning multiple lines) - Removes empty text nodes doc.getDocumentElement().normalize();
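A small sketch of what normalize() does, building the tree by hand so the adjacent and empty text nodes are guaranteed to exist. The class name NormalizeDemo and the element name "note" are assumptions for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original slides.
public class NormalizeDemo {
    // Returns { child count before normalize(), child count after }.
    static int[] childCountsAroundNormalize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("note");
        doc.appendChild(root);

        // Three separate text nodes: two with content, one empty.
        root.appendChild(doc.createTextNode("first line\n"));
        root.appendChild(doc.createTextNode(""));
        root.appendChild(doc.createTextNode("second line"));

        int before = root.getChildNodes().getLength();
        root.normalize(); // merges adjacent text nodes, drops the empty one
        int after = root.getChildNodes().getLength();
        return new int[] { before, after };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = childCountsAroundNormalize();
        System.out.println(counts[0] + " -> " + counts[1]);
    }
}
```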
Obtain the root node • Traversing begins at the root node Element rootElement = doc.getDocumentElement(); - Element is a subinterface of the more general Node interface and represents an XML element - Node represents all the various components of an XML document, e.g. Document, Element, Attribute, Entity…
Examine and Modify Nodes • Node methods for examining: - getNodeName - getNodeType - getAttributes - getChildNodes • Node methods for modifying: - setNodeValue - appendChild - removeChild - replaceChild
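The methods listed above, exercised in one short sketch. The class name ModifyDemo and the element names are made up for illustration. Note the argument order of replaceChild: the new child comes first, the old child second.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original slides.
public class ModifyDemo {
    // Builds a tiny tree, edits it, and returns the root element.
    static Element buildAndEdit() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("course");
        root.setAttribute("code", "CSC309");
        doc.appendChild(root);

        Element draft = doc.createElement("draft");
        Element topic = doc.createElement("topic");
        topic.appendChild(doc.createTextNode("DOM"));
        root.appendChild(draft);
        root.appendChild(topic);

        // removeChild detaches a node from its parent.
        root.removeChild(draft);

        // replaceChild(newChild, oldChild) swaps one child for another.
        Element replacement = doc.createElement("topic");
        replacement.appendChild(doc.createTextNode("SAX"));
        root.replaceChild(replacement, topic);

        // setNodeValue on a text node changes its character data.
        replacement.getFirstChild().setNodeValue("DOM and SAX");
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildAndEdit();
        System.out.println(root.getNodeName() + " has "
                + root.getChildNodes().getLength() + " child, "
                + root.getAttributes().getLength() + " attribute");
    }
}
```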
Sample Code Bits

    // Needs: import org.w3c.dom.Attr; import org.w3c.dom.NamedNodeMap; import org.w3c.dom.Node;

    // Walk the DOM tree and print as you go
    public void walk(Node node) {
        int type = node.getNodeType();
        switch (type) {
            case Node.DOCUMENT_NODE: {
                System.out.println("<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>");
                break;
            } // end of document
            case Node.ELEMENT_NODE: {
                // Print the opening tag with its attributes
                System.out.print('<' + node.getNodeName());
                NamedNodeMap nnm = node.getAttributes();
                if (nnm != null) {
                    int len = nnm.getLength();
                    Attr attr;
                    for (int i = 0; i < len; i++) {
                        attr = (Attr) nnm.item(i);
                        System.out.print(' ' + attr.getNodeName() + "=\"" + attr.getNodeValue() + '"');
                    }
                }
                System.out.print('>');
                break;
            } // end of element
            case Node.ENTITY_REFERENCE_NODE: {
                System.out.print('&' + node.getNodeName() + ';');
                break;
            } // end of entity
            case Node.CDATA_SECTION_NODE: {
                System.out.print("<![CDATA[" + node.getNodeValue() + "]]>");
                break;
            }
            case Node.TEXT_NODE: {
                System.out.print(node.getNodeValue());
                break;
            }
        } // end of switch

        // Recurse into the children
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child);
        }

        // Without this, the closing tags would be missing
        if (type == Node.ELEMENT_NODE) {
            System.out.print("</" + node.getNodeName() + ">");
        }
    } // end of walk
DOM or SAX? • DOM - Suitable for small documents - Easy to modify the document - Memory intensive • SAX (Simple API for XML) - Suitable for large documents - Traverses the document only once - Event-driven, saves memory
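For contrast with the DOM code, a minimal SAX sketch: the parser calls back into a handler as it reads, and no tree is kept in memory. The class name SaxCountDemo and the sample XML are assumptions for illustration. The handler here only counts start tags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original slides.
public class SaxCountDemo {
    // Counts the elements in an XML string in a single pass.
    static int countElements(String xml) throws Exception {
        final int[] count = { 0 };
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++; // called once per opening tag
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        System.out.println(countElements("<course><topic/><topic/></course>"));
    }
}
```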
Summary • DOM is a tree representation of an XML document in memory • JAXP provides a vendor-neutral interface to the underlying parser • Every component of the XML document is a Node • Use normalization to merge adjacent text nodes and remove empty ones