Java XML Programming Svetlin Nakov Bulgarian Association of Software Developers www.devbg.org
Contents
• Introduction to XML Parsers
    • The DOM Parser
    • The SAX Parser
    • The StAX Parser
• Introduction to JAXP
• Using DOM
• Using StAX
• Java API for XPath
• Java API for XSLT
XML Parsers
• XML parsers are programming libraries that make working with XML easier
• They are used for:
    • Extracting data from XML documents
    • Building XML documents
    • Validating XML documents against a given schema
XML Parsers – Models
• DOM (Document Object Model)
    • Represents XML documents as a tree in memory
    • Allows flexible and easy processing
    • Supports changing the document
• SAX (Simple API for XML)
    • Reads XML documents sequentially (like a stream)
    • Allows read-only / write-only access
• StAX (Streaming API for XML)
    • Similar to SAX but simplified
Using an XML Parser
• Three basic steps to using an XML parser:
    • Create a parser object
    • Pass your XML document to the parser
    • Process the results
• Generally, writing out XML is outside the scope of parsers
    • Some parsers may implement such mechanisms
Types of Parser
• There are several different ways to categorize parsers:
    • Validating versus non-validating parsers
    • Parsers that support the Document Object Model (DOM)
    • Parsers that support the Simple API for XML (SAX)
    • Streaming parsers (StAX)
    • Parsers written in a particular language (Java, C#, C++, Perl, etc.)
DOM Key Features
• The DOM API is generally an easier API to use
    • It provides a familiar tree structure of objects
    • You can use it to manipulate the hierarchy of an XML document
• The DOM API is ideal for interactive applications
    • The entire object model is present in memory
The DOM Parser – Example
• The following XML document is given:

    <?xml version="1.0"?>
    <library name=".NET Developer's Library">
      <book>
        <title>Programming Microsoft .NET</title>
        <author>Jeff Prosise</author>
        <isbn>0-7356-1376-1</isbn>
      </book>
      <book>
        <title>Microsoft .NET for Programmers</title>
        <author>Fergal Grimes</author>
        <isbn>1-930110-19-7</isbn>
      </book>
    </library>

The DOM Parser – Example
• This document is represented in memory as a DOM tree in the following way:
[Figure: DOM tree of the library document, with the header part and the root node labeled]
SAX Key Features
• The Simple API for XML (SAX)
    • Event-driven
    • Serial-access mechanism
    • Element-by-element processing
• Does not allow going backwards or jumping ahead
• Requires far fewer resources than DOM
    • Memory
    • CPU time
• Works over streams
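The event-driven, forward-only model can be sketched as follows. This is a minimal illustration rather than code from the slides: the class name SaxTitleReader and the method readTitles are hypothetical, and the input is assumed to be a small document like the library example. The parser calls the handler once per event, and the handler keeps only the state it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of every <title> element.
final class SaxTitleReader {

    static List<String> readTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a <title> element
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one run of text
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };

        // The parser pushes events to the handler as it reads the input
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book><title>Programming Microsoft .NET</title></book>"
                + "<book><title>Microsoft .NET for Programmers</title></book>"
                + "</library>";
        readTitles(xml).forEach(System.out::println);
    }
}
```

Nothing is kept in memory except the titles themselves, which is why this style suits large inputs; the price is that the handler has to track its own position in the document.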
The SAX Parser
• Working with SAX is more complex
• Old technology
• Use its newer equivalent, the StAX parser
The StAX Parser
• Like SAX but
    • Not event driven (not callback based)
    • "Pull"-based
    • The developer explicitly says "go to the next element" and analyzes it
• It is a new feature in Java 6
When to Use DOM and When to Use SAX/StAX?
• The DOM processing model is suitable when:
    • Processing small documents
    • There is a need for flexibility
    • There is a need for direct access to different nodes of the document
    • We need to change the document
When to Use DOM and When to Use SAX/StAX?
• The SAX/StAX processing model is suitable when:
    • Processing big documents
        • Big XML documents (e.g. > 20-30 MB) can exhaust memory when loaded as a DOM tree
    • Performance is important
    • There is no need to change the document nodes
        • SAX/StAX is read-only / write-only (like streams)
JAXP
• Java API for XML Processing
• Designed to be flexible
• Facilitates the use of XML on the Java platform
• Provides a common interface for these standard APIs
    • DOM
    • SAX, StAX
    • XPath and XSL Transformations (XSLT)
JAXP – Pluggability
• JAXP allows you to use any XML-compliant parser
    • Regardless of which vendor's implementation is actually being used
• Pluggability layer
    • Lets you plug in an implementation of the SAX or DOM API
    • Lets you plug in an XSL processor to control how your XML data is displayed
JAXP – Independence
• To achieve the goal of XML processor independence
    • The application should limit itself to the JAXP API
    • Avoid using implementation-dependent APIs and behavior
JAXP Packages
• javax.xml.parsers
    • The JAXP APIs
    • Provides a common interface for different vendors' SAX and DOM parsers
• org.w3c.dom
    • Defines the DOM types
    • The Document interface and all the components of a DOM
JAXP Packages (2)
• org.xml.sax
    • Defines the basic SAX APIs
• javax.xml.stream
    • Defines the basic StAX classes
• javax.xml.xpath
    • Defines the API for the evaluation of XPath expressions
• javax.xml.transform
    • Defines the XSLT APIs that let you transform XML into other forms
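A small sketch of what the pluggability layer means in practice: application code asks the JAXP factory for an implementation and never names a vendor class. The class JaxpImplementationInfo below is hypothetical and only prints which implementation the factory lookup selected on the current JVM; the result depends on the runtime and classpath, so no particular output is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which parser implementations JAXP resolved.
final class JaxpImplementationInfo {

    static String domFactoryClass() {
        // newInstance() performs the JAXP lookup; it can be overridden with
        // the system property "javax.xml.parsers.DocumentBuilderFactory"
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String saxFactoryClass() {
        // Same lookup for SAX, via "javax.xml.parsers.SAXParserFactory"
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM factory: " + domFactoryClass());
        System.out.println("SAX factory: " + saxFactoryClass());
    }
}
```

Because the rest of the application only touches javax.xml.parsers and org.w3c.dom types, swapping the implementation does not require code changes.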
DOM Document Structure
XML input:

    <?xml version="1.0" encoding="UTF-8"?>
    <dots>
      this is before the first dot
      and it continues on multiple lines
      <dot x="9" y="81" />
      <dot x="11" y="121" />
      <flip>
        flip is on
        <dot x="196" y="14" />
        <dot x="169" y="13" />
      </flip>
      flip is off
      <dot x="12" y="144" />
      <extra>stuff</extra>
      <!-- a final comment -->
    </dots>

Document structure:

    Document
    +---Element <dots>
        +---Text "this is before the first dot
        |         and it continues on multiple lines"
        +---Element <dot>
        +---Text ""
        +---Element <dot>
        +---Text ""
        +---Element <flip>
        |   +---Text "flip is on"
        |   +---Element <dot>
        |   +---Text ""
        |   +---Element <dot>
        |   +---Text ""
        +---Text "flip is off"
        +---Element <dot>
        +---Text ""
        +---Element <extra>
        |   +---Text "stuff"
        +---Text ""
        +---Comment "a final comment"
        +---Text ""
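DOM Document Structure
• Whitespace between elements (line breaks, indentation) is kept as a text node, shown as Text "" in the tree
    • Elements with nothing at all between them get no text node
• XML comments appear in special comment nodes
• Element attributes do not appear as child nodes in the tree
    • They are available through the Element object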
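To see which nodes a parser really creates, a short recursive walk over the tree helps. The sketch below is illustrative only: DomTreeDump and its methods are hypothetical names, and text is trimmed before printing so that whitespace-only nodes show up as empty strings, as in the tree diagram.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: prints one line per DOM node, indented by depth.
final class DomTreeDump {

    static String dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        append(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    private static void append(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                // Attributes are reached through the element, not as children
                out.append("Element <").append(node.getNodeName()).append(">")
                   .append(" attributes=").append(node.getAttributes().getLength());
                break;
            case Node.TEXT_NODE:
                out.append("Text \"").append(node.getNodeValue().trim()).append("\"");
                break;
            case Node.COMMENT_NODE:
                out.append("Comment \"").append(node.getNodeValue().trim()).append("\"");
                break;
            default:
                out.append(node.getNodeName());
        }
        out.append('\n');
        // Visit children in document order
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            append(child, depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dots><dot x=\"9\" y=\"81\"/> <dot x=\"11\" y=\"121\"/>"
                + "<!-- a final comment --></dots>";
        System.out.print(dump(xml));
    }
}
```

In the sample input there is a space between the two dot elements but nothing between the second dot and the comment, so only the first gap produces a text node.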
Using DOM
Here’s the basic recipe for getting started:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    // Get a DocumentBuilder object
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = null;
    try {
        db = dbf.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    }

    // Invoke the parser to get a Document. parse() is overloaded, so use
    // the one call that matches your input (it can throw SAXException
    // and IOException)
    Document doc = db.parse(inputStream);    // from a java.io.InputStream
    // Document doc = db.parse(file);        // from a java.io.File
    // Document doc = db.parse(url);         // from a URI given as a String
DOM Document Access Idioms
• OK, say we have a Document. How do we get at the pieces of it?
• Here are some common idioms:

    // get the root of the Document tree
    Element root = doc.getDocumentElement();

    // get nodes in subtree by tag name
    NodeList dots = root.getElementsByTagName("dot");

    // get first dot element
    Element firstDot = (Element) dots.item(0);

    // get x attribute of first dot
    String x = firstDot.getAttribute("x");
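Put together, the recipe and the access idioms can be run as one small program. This is a sketch under a few assumptions: the XML comes from a String instead of a file, exceptions are simply propagated, and the class name DomAccessExample and its helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class combining parsing and the access idioms.
final class DomAccessExample {

    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Number of <dot> elements anywhere below the root
    static int countDots(String xml) throws Exception {
        Element root = parse(xml).getDocumentElement();
        return root.getElementsByTagName("dot").getLength();
    }

    // Value of the x attribute of the first <dot> in document order
    static String firstDotX(String xml) throws Exception {
        Element root = parse(xml).getDocumentElement();
        NodeList dots = root.getElementsByTagName("dot");
        Element firstDot = (Element) dots.item(0);
        return firstDot.getAttribute("x");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dots><dot x=\"9\" y=\"81\"/>"
                + "<flip><dot x=\"196\" y=\"14\"/></flip></dots>";
        System.out.println("dots: " + countDots(xml));
        System.out.println("first x: " + firstDotX(xml));
    }
}
```

Note that getElementsByTagName searches the whole subtree, so the dot nested inside flip is counted as well.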
More Document Accessors
Node access methods:

    String getNodeName()
    short getNodeType()    // e.g. DOCUMENT_NODE, ELEMENT_NODE, TEXT_NODE, COMMENT_NODE, etc.
    Document getOwnerDocument()
    boolean hasChildNodes()
    NodeList getChildNodes()
    Node getFirstChild()
    Node getLastChild()
    Node getParentNode()
    Node getNextSibling()
    Node getPreviousSibling()
    boolean hasAttributes()
    ... and more ...
More Document Accessors
Element extends Node and adds these access methods:

    String getTagName()
    boolean hasAttribute(String name)
    String getAttribute(String name)
    NodeList getElementsByTagName(String name)
    ... and more ...

Document extends Node and adds these access methods:

    Element getDocumentElement()
    DocumentType getDoctype()
    NodeList getElementsByTagName(String name)
    ... and more ...
Writing a Document as XML
• The DOM API does not specify how to write an XML document to a file
• Most JAXP implementations have their own classes for writing XML files
    • E.g. the class XMLSerializer in Apache Xerces (the standard parser in J2SE 5.0)
    • The com.sun.org.apache.xml.internal packages are JDK-internal and not a supported API

    import com.sun.org.apache.xml.internal.serialize.XMLSerializer;

    XMLSerializer xmlser = new XMLSerializer();
    xmlser.setOutputByteStream(System.out);
    xmlser.serialize(doc);
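A vendor-neutral alternative that stays inside the JAXP packages is an identity transform with javax.xml.transform.Transformer: a Transformer created without a stylesheet copies its source to its result unchanged. The sketch below is illustrative; the class DomWriter and its methods are hypothetical names, and the document is written to a String for brevity (a StreamResult can wrap a File or an OutputStream instead).

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes a DOM Document using only JAXP APIs.
final class DomWriter {

    static String toXml(Document doc) throws Exception {
        // No stylesheet argument: the transformer performs an identity copy
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds a tiny document to serialize
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("dots");
        doc.appendChild(root);
        Element dot = doc.createElement("dot");
        dot.setAttribute("x", "9");
        dot.setAttribute("y", "81");
        root.appendChild(dot);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sampleDocument()));
    }
}
```

The exact formatting of the output (indentation, empty-element style) is up to the transformer implementation, so it should not be relied on byte for byte.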
Reading and Parsing XML Documents with the DOM Parser Live Demo
Creating & Manipulating DOM Documents
• The DOM API also includes lots of methods for creating and manipulating Document objects:

    // Get new empty Document from DocumentBuilder
    Document doc = docBuilder.newDocument();

    // Create a new <dots> element
    // and add it to the document as root
    Element root = doc.createElement("dots");
    doc.appendChild(root);

    // Create a new <dot> element
    // and add as child of the root
    Element dot = doc.createElement("dot");
    dot.setAttribute("x", "9");
    dot.setAttribute("y", "81");
    root.appendChild(dot);
More Document Manipulators
Node manipulation methods:

    void setNodeValue(String nodeValue)
    Node appendChild(Node newChild)
    Node insertBefore(Node newChild, Node refChild)
    Node removeChild(Node oldChild)
    ... and more ...

Element manipulation methods:

    void setAttribute(String name, String value)
    void removeAttribute(String name)
    ... and more ...

Document manipulation methods:

    Text createTextNode(String data)
    Comment createComment(String data)
    ... and more ...
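The manipulation methods listed above can be exercised in a few lines. This is a minimal sketch with hypothetical names (DomEditExample, buildAndEdit); it builds a small dots document in memory and then reorders, trims and removes nodes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class exercising the DOM manipulation methods.
final class DomEditExample {

    static Document buildAndEdit() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("dots");
        doc.appendChild(root);

        Element first = doc.createElement("dot");
        first.setAttribute("x", "9");
        root.appendChild(first);

        Element second = doc.createElement("dot");
        second.setAttribute("x", "11");
        second.setAttribute("y", "121");
        // insertBefore places the new node ahead of the reference node
        root.insertBefore(second, first);

        // Text and comment nodes are created by the Document, then attached
        root.appendChild(doc.createTextNode("flip is off"));
        root.appendChild(doc.createComment("a final comment"));

        second.removeAttribute("y");   // drop one attribute
        root.removeChild(first);       // detach a whole element
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildAndEdit().getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```

A node removed with removeChild still belongs to the same Document and could be appended again elsewhere in the tree.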
Building Documents with the DOM Parser Live Demo
The StAX Parser in Java
• As of Java 6, the StAX parser is available as part of Java
• Two basic StAX interfaces
    • XMLStreamReader
        • Pull-based XML streaming API for parsing XML documents – read-only
    • XMLStreamWriter
        • Streaming-based builder for XML documents – write-only
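Parsing Documents with the StAX Parser – Example

    import java.io.FileReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;

    FileReader fileReader = new FileReader("Student.xml");
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader reader = factory.createXMLStreamReader(fileReader);
    String element = "";  // name of the most recent start element
    while (reader.hasNext()) {
        if (reader.isStartElement()) {
            element = reader.getLocalName();
        } else if (reader.isCharacters() && !reader.isWhiteSpace()) {
            System.out.printf("%s - %s%n", element, reader.getText());
        }
        reader.next();  // pull the next parsing event
    }
    reader.close();
    fileReader.close();  // reader.close() does not close the underlying input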
Parsing Documents with the StAX Parser Live Demo
Creating Documents with the StAX Parser – Example

    import java.io.FileWriter;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamWriter;

    String fileName = "Customers.xml";
    FileWriter fileWriter = new FileWriter(fileName);
    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    XMLStreamWriter writer = factory.createXMLStreamWriter(fileWriter);
    writer.writeStartDocument();
    writer.writeStartElement("Customers");
    writer.writeStartElement("Customer");
    writer.writeStartElement("Name");
    writer.writeCharacters("ABC Pizza");
    writer.writeEndElement();   // </Name>
    writer.writeStartElement("Address");
    writer.writeCharacters("1 Main Street");
    writer.writeEndElement();   // </Address>
    writer.writeEndElement();   // </Customer>
    writer.writeEndElement();   // </Customers>
    writer.writeEndDocument();
    writer.flush();
    writer.close();
    fileWriter.close();  // writer.close() does not close the underlying output
Parsing Documents with the StAX Parser Live Demo
Using XPath in Java Searching nodes in XML documents
Parsing XML Documents with XPath
• To evaluate an XPath expression in Java, create an XPath object
• Then call the evaluate method
    • expression is an XPath expression
    • doc is the Document object that represents the XML document

    XPathFactory xpfactory = XPathFactory.newInstance();
    XPath xpath = xpfactory.newXPath();
    String result = xpath.evaluate(expression, doc);

Sample XML Document

    <?xml version="1.0" encoding="windows-1251"?>
    <items culture="en-US">
      <item type="beer">
        <name>Zagorka</name>
        <price>0.54</price>
      </item>
      <item type="food">
        <name>kepab</name>
        <price>0.48</price>
      </item>
      <item type="beer">
        <name>Amstel</name>
        <price>0.56</price>
      </item>
    </items>

Parsing with XPath – Example
• For example, the following call returns the string "0.48":

    String result = xpath.evaluate("/items/item[2]/price", doc);

• XPath can also match multiple nodes and return a NodeList:

    NodeList nodes = (NodeList) xpath.evaluate(
        "/items/item[@type='beer']/price", doc, XPathConstants.NODESET);
    for (int i = 0; i < nodes.getLength(); i++) {
        Node priceNode = nodes.item(i);
        System.out.println(priceNode.getTextContent());
    }
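The two XPath queries can be run end to end against the sample items document. The sketch below is illustrative: XPathItemsExample and its methods are hypothetical names, and the XML is embedded as a String without the encoding declaration, since that declaration only matters when reading bytes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class running the XPath queries from the slides.
final class XPathItemsExample {

    static final String ITEMS_XML =
            "<items culture=\"en-US\">"
          + "<item type=\"beer\"><name>Zagorka</name><price>0.54</price></item>"
          + "<item type=\"food\"><name>kepab</name><price>0.48</price></item>"
          + "<item type=\"beer\"><name>Amstel</name><price>0.56</price></item>"
          + "</items>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Single-value query: the string value of the first matching node
    static String secondPrice(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/items/item[2]/price", doc);
    }

    // Multi-node query: ask for a NODESET and iterate over the NodeList
    static List<String> beerPrices(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "/items/item[@type='beer']/price", doc, XPathConstants.NODESET);
        List<String> prices = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            prices.add(nodes.item(i).getTextContent());
        }
        return prices;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(ITEMS_XML);
        System.out.println(secondPrice(doc));  // prints 0.48
        System.out.println(beerPrices(doc));   // prints [0.54, 0.56]
    }
}
```

XPath positions are 1-based, so item[2] selects the second item element, the one of type food.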
Using XPath Live Demo
Modifying XML with DOM and XPath Live Demo
XSL Transformations in JAXP javax.xml.transform.Transformer
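As a rough illustration of how javax.xml.transform.Transformer is typically used, the sketch below compiles a small stylesheet and applies it to an items document. Everything here is an assumption for the sake of the example: the class name XsltExample, the stylesheet itself, and the choice of plain-text output are not taken from the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: applies an XSLT 1.0 stylesheet with JAXP.
final class XsltExample {

    // Illustrative stylesheet: emits "name=price;" for every item
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/items\">"
          + "<xsl:for-each select=\"item\">"
          + "<xsl:value-of select=\"name\"/>"
          + "<xsl:text>=</xsl:text>"
          + "<xsl:value-of select=\"price\"/>"
          + "<xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String transform(String xml, String xslt) throws Exception {
        // The factory compiles the stylesheet into a Transformer
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items>"
                + "<item><name>Zagorka</name><price>0.54</price></item>"
                + "<item><name>Amstel</name><price>0.56</price></item>"
                + "</items>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

The source and result can be any combination of stream, DOM or SAX types, so the same Transformer call also works on a Document that is already in memory.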