Programming with XML. Written by: Adam Carmi, Zvika Gutterman
Agenda • About XML • Review of XML syntax • Document Object Model (DOM) • JAXP • W3C XML Schema • Validating Parsers
About XML • XML – EXtensible Markup Language • Designed to describe data • Provides semantic and structural information • Extensible • Human-readable and computer-manipulable • Software- and hardware-independent • Open, and standardized by the W3C (World Wide Web Consortium, founded in 1994 by Tim Berners-Lee) • Ideal for data exchange
offenders.xml
Information is marked up with structural and semantic tags. The characters &, <, >, ' and " have special meaning in markup. In character data and attribute values, write them as the predefined entity references &amp;, &lt;, &gt;, &apos; and &quot; (& and < must always be escaped).
<offenders>
  <!-- Lists all traffic offenders -->
  <offender id="024378449 ">
    <firstName> David </firstName>
    <middleName>Reuven</middleName>
    <lastName>Harel</lastName>
    <violation id='12'>
      <code num="232" category="traffic"/>
      <issueDate>2001-11-02</issueDate>
      <issueTime>10:32:00</issueTime>
      Ran a red light at Arik &amp; Bentz st.
    </violation>
  </offender>
</offenders>
<!-- Lists all traffic offenders --> is a comment; <offenders>, <firstName>, </violation> and so on are tags; text such as " David " and "Ran a red light at Arik &amp; Bentz st." is character data.
offenders.xml: Tags • XML tags are not predefined, and they are case sensitive. • An XML document has exactly one root tag. • In offenders.xml, <offenders> is the root tag, <offender id="024378449 "> is a start tag, and </offender> is the matching end tag. • The empty-element tag <code num="232" category="traffic"/> is shorthand for <code num="232" category="traffic"></code>.
offenders.xml: Elements • Elements mark up information. • An element x begins with a start tag <x> and ends with an end tag </x>. • XML elements must be properly nested: <x>...<y>...</y>...</x> • An XML document must contain exactly one root element; in offenders.xml the root element is offenders.
offenders.xml: Content • The content of an element is all the text that lies between its start and end tags. • An XML parser is required to pass all characters in a document to the application, including whitespace characters. • For example, the content of <firstName> David </firstName> is " David ", including its leading and trailing spaces, and the line breaks and indentation between the tags of offenders.xml are passed on as well.
offenders.xml: Attributes • Attributes provide additional information about elements. • Attribute values must always be enclosed in matching quotes, either double (") or single ('). • In offenders.xml, id, num and category are attributes: <violation id='12'> uses single quotes, while <code num="232" category="traffic"/> uses double quotes.
DOM™ • DOM – Document Object Model • A standard hierarchy of objects, recommended by the W3C, that corresponds to XML documents. • Each element, attribute, comment, etc., in an XML document is represented by a Node in the DOM tree. • The DOM API (Application Programming Interface) allows data in an XML document to be accessed and modified by manipulating the nodes of a DOM tree.
offenders.xml: DOM tree
Unlabeled :Text nodes holding the whitespace between adjacent tags are omitted below, as is the middleName element.
:Document
  :Element offenders
    :Comment Lists all traffic offenders
    :Element offender
      :Attribute id
        :Text 024378449
      :Element firstName
        :Text David
      :Element lastName
        :Text Harel
      :Element violation
        :Attribute id
          :Text 12
        :Element code
          :Attribute num
            :Text 232
          :Attribute category
            :Text traffic
        :Element issueDate
          :Text 2001-11-02
        :Element issueTime
          :Text 10:32:00
        :Text Ran a red light at Arik & Bentz st.
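DOM Class Hierarchy (partial)
<<interface>> Node
  <<interface>> Document
  <<interface>> Element
  <<interface>> CharacterData
    <<interface>> Text
    <<interface>> Comment
<<interface>> NodeList
<<interface>> NamedNodeMap
• Only part of the hierarchy is shown: Document, Element and CharacterData extend Node, and Text and Comment extend CharacterData. NodeList and NamedNodeMap are collection interfaces that hold nodes.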
JAXP • JAXP – Java™ API for XML Processing • JAXP enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. • JAXP provides two parser types: • SAX (Simple API for XML) parser: event driven • DOM document builder: constructs DOM trees by parsing XML documents.
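To illustrate the event-driven style, here is a minimal SAX sketch. The class ElementCounter and its count method are hypothetical names; the code uses only the standard javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler, and parses a string rather than a file to stay self-contained. No tree is built: the parser calls startElement once per start tag, and the handler simply tallies element names.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter {
    // Parses the XML text and returns how many times each element name occurs.
    public static Map<String, Integer> count(String xml) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // The parser pushes events to the handler as it reads the input.
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                counts.merge(qName, 1, Integer::sum); // one event per start tag
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<offenders><offender id='1'><violation id='12'/>"
                   + "<violation id='13'/></offender></offenders>";
        System.out.println(count(xml)); // prints {offender=1, offenders=1, violation=2}
    }
}
```

Because SAX keeps no tree in memory, it suits documents that can be processed in a single pass; the DOM builder is the better fit when the application must navigate or modify the document.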
Creating a DOM Builder
• Create a DocumentBuilderFactory object:
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
• Configure the factory object:
  dbf.setIgnoringComments(true);
• Create a builder instance using the factory:
  DocumentBuilder docBuilder = dbf.newDocumentBuilder();
A ParserConfigurationException is thrown if no DocumentBuilder that satisfies the requested configuration can be created.
Building a DOM Document
• A DOM document can be built manually from within the application:
  Document doc = docBuilder.newDocument();
  Element offenders = doc.createElement("offenders");
  doc.appendChild(offenders);
  Element offender = doc.createElement("offender");
  offender.setAttribute("id", "024378449 ");
  offenders.appendChild(offender);
  Element firstName = doc.createElement("firstName");
  Text text = doc.createTextNode(" David ");
  firstName.appendChild(text);
  ...
A DOMException is raised if, for example, a name contains an illegal character or a node is appended where it is not allowed as a child.
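A self-contained version of these steps might look like the following sketch; the class BuildOffenders and its methods are hypothetical. It builds a small offenders document, serializes it with an identity javax.xml.transform.Transformer so the result can be inspected, and provokes a DOMException by passing an element name that contains a space.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildOffenders {
    // Builds a small offenders document in memory, node by node.
    public static Document build() throws Exception {
        DocumentBuilder docBuilder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = docBuilder.newDocument();
        Element offenders = doc.createElement("offenders");
        doc.appendChild(offenders);
        Element offender = doc.createElement("offender");
        offender.setAttribute("id", "024378449");
        offenders.appendChild(offender);
        Element firstName = doc.createElement("firstName");
        firstName.appendChild(doc.createTextNode("David"));
        offender.appendChild(firstName);
        return doc;
    }

    // Serializes a DOM tree to a string using an identity Transformer.
    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Returns true if createElement rejects a name containing a space.
    public static boolean rejectsBadName() throws Exception {
        try {
            build().createElement("first name");
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INVALID_CHARACTER_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(build()));
        System.out.println(rejectsBadName()); // true
    }
}
```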
Building a DOM Document
• A DOM representation of an XML document can be built automatically by parsing the XML document:
  Document doc = docBuilder.parse(new File(xmlFile));
A SAXParseException (a subclass of SAXException) or a SAXException is raised to report parse errors; an IOException is raised if the file cannot be read.
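Parse errors carry their position. The following sketch (hypothetical class ParseCheck) parses XML text from a string through an InputSource and returns the line of the first error. It installs a plain DefaultHandler as the error handler; in the JDK's built-in parser this stops the default handler from printing its own "[Fatal Error]" message, while fatal errors are still thrown as SAXParseException. The unescaped & from the offenders example is enough to trigger one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ParseCheck {
    // Returns the line of the first well-formedness error, or -1 if the text parses.
    public static int firstErrorLine(String xml) throws Exception {
        DocumentBuilder docBuilder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings/errors and rethrows fatal errors.
        docBuilder.setErrorHandler(new DefaultHandler());
        try {
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return -1;
        } catch (SAXParseException spe) {
            return spe.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstErrorLine("<v>\nArik & Bentz\n</v>"));     // 2
        System.out.println(firstErrorLine("<v>\nArik &amp; Bentz\n</v>")); // -1
    }
}
```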
DumpDom.java (1 of 5): Creating and traversing a DOM document
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
DumpDom.java (2 of 5)
public class DumpDom {
    private int indent = 0; // text indentation level

    public DumpDom(String xmlFile) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            Document doc = docBuilder.parse(new File(xmlFile));
            recursiveDump(doc);
        } catch (ParserConfigurationException pce) {
            System.err.println("Failed to create document builder");
        } catch (SAXParseException spe) {
            System.err.println("Error: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
        } catch (SAXException se) {
            System.err.println("Parse error found: " + se);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
DumpDom.java (3 of 5)
    private void recursiveDump(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                dumpNode("document", node);
                break;
            case Node.COMMENT_NODE:
                dumpNode("comment", node);
                break;
            case Node.ATTRIBUTE_NODE:
                dumpNode("attribute", node);
                break;
            case Node.TEXT_NODE:
                dumpNode("text", node);
                break;
            case Node.ELEMENT_NODE:
                dumpNode("element", node);
                indent += 2;
DumpDom.java (4 of 5)
                // print the element's attributes, one level deeper
                NamedNodeMap atts = node.getAttributes();
                for (int i = 0; i < atts.getLength(); ++i)
                    recursiveDump(atts.item(i));
                indent -= 2;
                break;
            default:
                System.err.println("Unknown node: " + node);
                System.exit(1);
        }
        // print children of the input node (if there are any);
        // for an attribute node, this prints its Text child
        indent += 2;
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            recursiveDump(child);
        }
        indent -= 2;
    }
DumpDom.java (5 of 5)
    private void dumpNode(String type, Node node) {
        for (int i = 0; i < indent; ++i)
            System.out.print(" ");
        System.out.print("[" + type + "]: ");
        System.out.print(node.getNodeName());
        if (node.getNodeValue() != null)
            System.out.print("=\"" + node.getNodeValue() + "\"");
        System.out.print("\n");
    }

    // usage: java DumpDom <xml-file>
    public final static void main(String[] args) {
        DumpDom dumper = new DumpDom(args[0]);
    }
}
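DumpDom visits every node. When an application needs only particular elements, Document.getElementsByTagName (which searches all descendants in document order) is often simpler. The sketch below (hypothetical class ListOffenders) collects each offender's id attribute and last name. It assumes every offender has a lastName child, and it trims the surrounding whitespace that the parser passes through.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListOffenders {
    // Returns "id: lastName" for every offender element in the document.
    public static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList offenders = doc.getElementsByTagName("offender");
        for (int i = 0; i < offenders.getLength(); ++i) {
            Element offender = (Element) offenders.item(i);
            // Assumes a lastName child exists; item(0) would be null otherwise.
            String last = offender.getElementsByTagName("lastName")
                                  .item(0).getTextContent();
            result.add(offender.getAttribute("id").trim() + ": " + last.trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<offenders><offender id='024378449 '>"
                   + "<firstName> David </firstName><lastName>Harel</lastName>"
                   + "</offender></offenders>";
        System.out.println(summarize(xml)); // [024378449: Harel]
    }
}
```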
XML Schema • The purpose of an XML Schema is to define a class of XML documents. • An XML document that is syntactically correct is considered well formed; if it also conforms to an XML schema, it is considered valid. • An XML document is not required to have a corresponding schema. • XML Schemas are expected to replace the DTD (Document Type Definition, which uses an EBNF-like notation) as the primary means of describing document structure.
XML Schema (cont.) • XML Schema documents are themselves XML documents, • so they can be manipulated as such. • XML Schema is a language with an XML syntax. • An XML document may explicitly reference the schema document against which it should be validated. • A schema document can itself be validated against a DTD. • Several schema models exist. In this course we will use W3C XML Schema, a W3C Recommendation since 2001.
W3C XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  ...
</xsd:schema>
• A W3C XML Schema consists of a schema element and a variety of sub-elements that determine which elements may appear in instance documents and what their content may be. • By convention, each element (and each predefined simple type) in the schema has the prefix xsd:, which is associated with the W3C XML Schema namespace.
Element & Attribute Declarations
• Elements are declared using the element element:
  <xsd:element name="firstName" type="xsd:NMTOKEN"/>
  <xsd:element name="offenders" type="Offenders"/>
• Attributes are declared using the attribute element:
  <xsd:attribute name="id" type="xsd:positiveInteger"/>
• xsd:NMTOKEN and xsd:positiveInteger are predefined (simple) types; Offenders is a type defined in the schema itself.
Element & Attribute Types • Elements that contain sub-elements or carry attributes are said to have complex types. • Elements that contain only text (e.g. numbers, strings, dates) and have neither sub-elements nor attributes are said to have simple types. • Attributes always have simple types. • Many simple types (e.g. string, date, integer) are predefined.
A Few Built-in Simple Types • [Table of built-in simple types not recoverable] • Some of the types in the table should only be used as attribute types.
Derived Simple Types • New simple types may be defined by deriving them from existing simple types (built-in or derived). • A new simple type is derived by restricting the range of permitted values of an existing simple type. • A new simple type is defined using the simpleType element.
Derived Simple Types (cont.)
• Example: Numeric Restriction
  <xsd:simpleType name="ViolationID">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
• Example: Enumeration
  <xsd:simpleType name="ViolationCategory">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="traffic"/>
      <xsd:enumeration value="criminal"/>
      <xsd:enumeration value="civil"/>
    </xsd:restriction>
  </xsd:simpleType>
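These restrictions can be checked programmatically. The sketch below uses the javax.xml.validation API (SchemaFactory, Validator), which was added in JAXP 1.3 (Java 5) and is newer than the JAXP 1.2 schemaLanguage/schemaSource properties. The tiny schema, with its elements num and category, is a hypothetical test harness around the ViolationID and ViolationCategory types; the class name SimpleTypeCheck is also hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SimpleTypeCheck {
    // Hypothetical schema: two elements typed by the derived simple types.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + " <xsd:element name='num' type='ViolationID'/>"
      + " <xsd:element name='category' type='ViolationCategory'/>"
      + " <xsd:simpleType name='ViolationID'>"
      + "  <xsd:restriction base='xsd:integer'><xsd:minInclusive value='100'/></xsd:restriction>"
      + " </xsd:simpleType>"
      + " <xsd:simpleType name='ViolationCategory'>"
      + "  <xsd:restriction base='xsd:string'>"
      + "   <xsd:enumeration value='traffic'/><xsd:enumeration value='criminal'/>"
      + "   <xsd:enumeration value='civil'/>"
      + "  </xsd:restriction>"
      + " </xsd:simpleType>"
      + "</xsd:schema>";

    // Returns true if the instance document is valid against XSD.
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) { // validity errors arrive as SAXParseException
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<num>232</num>"));               // true
        System.out.println(isValid("<num>50</num>"));                // false: below 100
        System.out.println(isValid("<category>parking</category>")); // false: not enumerated
    }
}
```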
Complex Types • Complex types are defined using the complexType element. • Elements with complex types may carry attributes. • The content of elements with complex types is categorized as follows: • Empty: no content is allowed. • Simple: content must be of a simple type. • Element: content may include only child elements. • Mixed: both element and character content are allowed.
Complex Types: Attributes • Using the use attribute, an attribute may be declared as required, optional (the default) or prohibited. • Default values for attributes are declared using the default attribute, • which is allowed only for optional attributes. • The fixed attribute ensures that an attribute has a particular value: if the attribute appears, it must have that value. • Appearance of the attribute is optional. • fixed and default are mutually exclusive.
Complex Types: Attributes (cont.)
• Example: use, fixed
  <xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
  </xsd:complexType>
• Example: use, default
  <xsd:complexType name="IssueTime">
    ...
    <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
    ...
  </xsd:complexType>
Complex Types: Empty Content
• Example: schema
  <xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
  </xsd:complexType>
• Example: instance document
  <code num="232" category="traffic"/>
  <code num="232" category="traffic"></code>
  <code num="232"/>
• All three instances are valid: an empty element may be written in either form, and category may be omitted because it is optional.
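The empty-content and fixed-value rules can be exercised with the same javax.xml.validation API. In this sketch (hypothetical class EmptyContentCheck) the Code type is simplified by assumption: num is typed as xsd:positiveInteger and category as xsd:string, instead of ViolationID and ViolationCategory. The three instances above validate; a different category value, character content, or a missing num do not.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class EmptyContentCheck {
    // Simplified Code type: attributes only, so the content is empty.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + " <xsd:element name='code' type='Code'/>"
      + " <xsd:complexType name='Code'>"
      + "  <xsd:attribute name='num' type='xsd:positiveInteger' use='required'/>"
      + "  <xsd:attribute name='category' type='xsd:string' fixed='traffic'/>"
      + " </xsd:complexType>"
      + "</xsd:schema>";

    // Returns true if the instance document is valid against XSD.
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<code num='232'/>"));                   // true
        System.out.println(isValid("<code num='232' category='civil'/>"));  // false: fixed value
        System.out.println(isValid("<code num='232'>text</code>"));         // false: must be empty
    }
}
```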
Complex Types: Simple Content
• Example: element with no attributes
  <xsd:element name="firstName" type="xsd:NMTOKEN"/>
• Example: element with attributes (the content is of the simple type xsd:time, extended with an attribute)
  <xsd:complexType name="IssueTime">
    <xsd:simpleContent>
      <xsd:extension base="xsd:time">
        <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
Complex Types: Element Content • Element Occurrence Constraints • The minimum number of times an element may appear is specified by the optional attribute minOccurs. • The maximum number of times an element may appear is specified by the optional attribute maxOccurs. • The value unbounded indicates that there is no limit on the maximum number of occurrences. • The default value of both minOccurs and maxOccurs is 1.
Complex Types: Element Content (cont.)
• The element sequence is used to specify a sequence of sub-elements.
• Elements must appear in the same order in which they are declared.
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="firstName" type="xsd:NMTOKEN"/>
      <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
      <xsd:element name="lastName" type="xsd:NMTOKEN"/>
      <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
      ...
    </xsd:sequence>
    ...
  </xsd:complexType>
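Sequence order and occurrence constraints can be checked the same way. In this sketch (hypothetical class SequenceCheck), a cut-down offender element keeps only the three name elements from the declaration above; the rest of the content model is omitted by assumption.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SequenceCheck {
    // Cut-down schema: firstName, optional middleName, lastName, in that order.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + " <xsd:element name='offender'>"
      + "  <xsd:complexType>"
      + "   <xsd:sequence>"
      + "    <xsd:element name='firstName' type='xsd:NMTOKEN'/>"
      + "    <xsd:element name='middleName' type='xsd:NMTOKEN' minOccurs='0'/>"
      + "    <xsd:element name='lastName' type='xsd:NMTOKEN'/>"
      + "   </xsd:sequence>"
      + "  </xsd:complexType>"
      + " </xsd:element>"
      + "</xsd:schema>";

    // Returns true if the instance document is valid against XSD.
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(
            "<offender><firstName>David</firstName><lastName>Harel</lastName></offender>")); // true
        System.out.println(isValid(
            "<offender><lastName>Harel</lastName><firstName>David</firstName></offender>")); // false: order
    }
}
```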
Complex Types: Mixed Content
• The optional Boolean attribute mixed is used to specify mixed content:
  <xsd:complexType name="Violation" mixed="true">
    <xsd:sequence>
      <xsd:element name="code" type="Code"/>
      <xsd:element name="issueDate" type="xsd:date"/>
      <xsd:element name="issueTime" type="IssueTime"/>
    </xsd:sequence>
    ...
  </xsd:complexType>
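The effect of mixed can be seen by validating one instance against two versions of a small, hypothetical violation schema that differ only in that attribute (the class MixedContentCheck is also hypothetical). Text between child elements is accepted with mixed="true" and rejected with mixed="false"; whitespace-only text is allowed in either case.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MixedContentCheck {
    // Builds a cut-down violation schema with the given mixed setting.
    static String xsd(boolean mixed) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
             + " <xsd:element name='violation'>"
             + "  <xsd:complexType mixed='" + mixed + "'>"
             + "   <xsd:sequence>"
             + "    <xsd:element name='issueDate' type='xsd:date'/>"
             + "   </xsd:sequence>"
             + "  </xsd:complexType>"
             + " </xsd:element>"
             + "</xsd:schema>";
    }

    // Returns true if xml is valid against xsd(mixed).
    public static boolean isValid(boolean mixed, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd(mixed))));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<violation><issueDate>2001-11-02</issueDate>"
                   + " Ran a red light</violation>";
        System.out.println(isValid(true, xml));  // true
        System.out.println(isValid(false, xml)); // false: character content not allowed
    }
}
```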
Global Elements/Attributes • Global elements and global attributes are created by declarations that appear as children of the schema element. • A global element may appear as the root element of an instance document. • The ref attribute of an element or attribute declaration may be used (instead of the name attribute) to reference a global element or attribute. • Cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations.
Global Elements/Attributes (cont.)
• Example: global declarations
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="offenders" type="Offenders"/>
    <xsd:element name="comment" type="xsd:string"/>
    <xsd:attribute name="id" type="xsd:positiveInteger"/>
    ...
• Example: ref attribute
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:attribute ref="id" use="required"/>
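A sketch of global versus local declarations (hypothetical class GlobalDeclCheck). Unlike offenders.xsd, this cut-down schema declares offender globally so that it can serve as a root; comment and id are global and referenced with ref, while lastName is local. A document whose root is comment therefore validates, but one whose root is lastName does not.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class GlobalDeclCheck {
    // Hypothetical schema: offender, comment and id are global; lastName is local.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + " <xsd:element name='offender'>"
      + "  <xsd:complexType>"
      + "   <xsd:sequence>"
      + "    <xsd:element name='lastName' type='xsd:NMTOKEN'/>"
      + "    <xsd:element ref='comment' minOccurs='0'/>"
      + "   </xsd:sequence>"
      + "   <xsd:attribute ref='id' use='required'/>"
      + "  </xsd:complexType>"
      + " </xsd:element>"
      + " <xsd:element name='comment' type='xsd:string'/>"
      + " <xsd:attribute name='id' type='xsd:positiveInteger'/>"
      + "</xsd:schema>";

    // Returns true if the instance document is valid against XSD.
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<comment>first offence</comment>")); // true: global root
        System.out.println(isValid("<lastName>Harel</lastName>"));       // false: local only
    }
}
```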
Anonymous Type Definitions • When a type is referenced only once, or contains very few constraints, it can be more succinctly defined as an anonymous type. • Saves the overhead of naming the type and explicitly referencing it.
Anonymous Type Definitions (cont.)
  <xsd:element name="offender" maxOccurs="unbounded">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="firstName" type="xsd:NMTOKEN"/>
        <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
        <xsd:element name="lastName" type="xsd:NMTOKEN"/>
        <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="comment" minOccurs="0"/>
      </xsd:sequence>
      <xsd:attribute ref="id" use="required"/>
    </xsd:complexType>
  </xsd:element>
• Is this a global declaration? No: in offenders.xsd the offender element is declared inside the Offenders type, and it carries maxOccurs, which a global declaration cannot. Its complexType has no name: it is anonymous.
offenders.xsd (1 of 4): Schema for offenders XML documents
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="offenders" type="Offenders"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:attribute name="id" type="xsd:positiveInteger"/>

  <xsd:complexType name="IssueTime">
    <xsd:simpleContent>
      <xsd:extension base="xsd:time">
        <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
  </xsd:complexType>
offenders.xsd (2 of 4)
  <xsd:complexType name="Offenders">
    <xsd:sequence>
      <xsd:element name="offender" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="firstName" type="xsd:NMTOKEN"/>
            <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
            <xsd:element name="lastName" type="xsd:NMTOKEN"/>
            <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="comment" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute ref="id" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
offenders.xsd (3 of 4)
  <xsd:complexType name="Violation" mixed="true">
    <xsd:sequence>
      <xsd:element name="code" type="Code"/>
      <xsd:element name="issueDate" type="xsd:date"/>
      <xsd:element name="issueTime" type="IssueTime"/>
    </xsd:sequence>
    <xsd:attribute ref="id" use="required"/>
  </xsd:complexType>

  <xsd:simpleType name="ViolationID">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
offenders.xsd (4 of 4)
  <xsd:simpleType name="ViolationCategory">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="traffic"/>
      <xsd:enumeration value="criminal"/>
      <xsd:enumeration value="civil"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="Accuracy">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="accurate"/>
      <xsd:enumeration value="approx"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
Validating Parsers • A validating parser can read a schema or DTD and determine whether XML documents conform to it. • A non-validating parser may read a schema or DTD but cannot check XML documents for conformity; • it is limited to syntax (well-formedness) checking.
Creating a Validating DOM Parser
• Create a DocumentBuilderFactory object:
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
• Configure the factory object to produce a namespace-aware parser that validates against a W3C XML Schema (using the JAXP 1.2 properties):
  dbf.setNamespaceAware(true);
  dbf.setAttribute("http://java.sun.com/xml/jaxp/properties" + "/schemaLanguage",
                   "http://www.w3.org/2001/XMLSchema");
  dbf.setAttribute("http://java.sun.com/xml/jaxp/properties" + "/schemaSource",
                   new File(xmlSchema));
  dbf.setValidating(true);
• Create a builder instance and set its error handler:
  DocumentBuilder docBuilder = dbf.newDocumentBuilder();
  docBuilder.setErrorHandler(new MyErrorHandler());
Handling Parsing Errors • By default, JAXP parsers do not throw exceptions when documents are found to be invalid. • JAXP provides the ErrorHandler interface so that applications can implement their own error-handling semantics.
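With JAXP 1.3 (Java 5) and later, a compiled Schema can instead be attached with DocumentBuilderFactory.setSchema; setValidating then stays false, because it requests DTD validation. The sketch below (hypothetical class ValidatingBuild) combines this with a custom ErrorHandler that records validity errors and keeps parsing, but still aborts on well-formedness errors.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidatingBuild {
    // Parses xml into a DOM while validating it against xsd; returns the error messages.
    public static List<String> validationErrors(String xsd, String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Attach a compiled W3C XML Schema (setValidating stays false).
        dbf.setSchema(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd))));
        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        docBuilder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) { }
            @Override
            public void error(SAXParseException e) {      // validity error: record it
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;                                    // not well formed: abort
            }
        });
        docBuilder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                   + "<xsd:element name='num' type='xsd:positiveInteger'/></xsd:schema>";
        System.out.println(validationErrors(xsd, "<num>232</num>").isEmpty()); // true
        System.out.println(validationErrors(xsd, "<num>-5</num>"));            // error messages
    }
}
```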
BoundedErrorPrinter.java (1 of 3)
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * An error handler that prints to the standard error stream a specified
 * number of errors. Once the specified number of errors is detected,
 * parsing is aborted.
 */
public class BoundedErrorPrinter implements ErrorHandler {
    private int errorCount = 0;
    private int errorsToPrint;

    public BoundedErrorPrinter(int errorsToPrint) {
        this.errorsToPrint = errorsToPrint;
    }