Programming with XML. Written by: Adam Carmi, Zvika Gutterman
Agenda • About XML • Review of XML syntax • Document Object Model (DOM) • JAXP • W3C XML Schema • Validating Parsers XML
About XML • XML – EXtensible Markup Language • Designed to describe data • Provides semantic and structural information • Extensible • Human-readable and computer-manipulable • Software- and hardware-independent • Open and standardized by the W3C¹ • Ideal for data exchange ¹ World Wide Web Consortium (founded in 1994 by Tim Berners-Lee)
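offenders.xml: Information is marked up with structural and semantic information. The characters &, <, >, ' and " are reserved: & and < may never appear literally in character data, and a quote character may not appear inside an attribute value delimited by the same quote (> is escaped by convention). Use the predefined entity references &amp;, &lt;, &gt;, &apos; and &quot; instead.
    <offenders>
      <!-- Lists all traffic offenders -->
      <offender id="024378449 ">
        <firstName> David </firstName>
        <middleName>Reuven</middleName>
        <lastName>Harel</lastName>
        <violation id='12'>
          <code num="232" category="traffic"/>
          <issueDate>2001-11-02</issueDate>
          <issueTime>10:32:00</issueTime>
          Ran a red light at Arik &amp; Bentz st.
        </violation>
      </offender>
    </offenders>
Callouts: <!-- ... --> is a comment; <offender>, <firstName>, etc. are tags; text such as David and "Ran a red light at Arik &amp; Bentz st." is character data.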
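A minimal sketch of the escaping rule from the application's side, assuming the JDK's built-in JAXP DOM parser: the predefined entity references reach the program as the characters they stand for, and a bare & makes the document not well-formed. The class EntityDemo and its helper methods are illustrative names, not part of any API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityDemo {
    // Parses an XML string; throws SAXException if it is not well-formed.
    static Document parse(String xml) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing them
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Character data of the root element, with entity references already replaced.
    static String textOf(String xml) {
        try {
            return parse(xml).getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<note>Arik &amp; Bentz st.</note>"));         // Arik & Bentz st.
        System.out.println(textOf("<expr>1 &lt; 2 &amp;&amp; 3 &gt; 2</expr>")); // 1 < 2 && 3 > 2
        System.out.println(isWellFormed("<note>Arik & Bentz st.</note>"));       // false
    }
}
```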
offenders.xml: Tags • XML tags are not predefined and are case-sensitive: <offender> and <Offender> are different tags. • An XML document may have only one root tag; in the offenders.xml listing above it is <offenders>. • <offender id="024378449 "> is a start tag and </offender> is the matching end tag. • The empty-element tag <code num="232" category="traffic"/> is shorthand for <code num="232" category="traffic"></code>.
offenders.xml: Elements • Elements mark up information. • Element x begins with a start tag <x> and ends with an end tag </x>. • XML elements must be properly nested: <x>...<y>...</y>...</x> is legal, <x>...<y>...</x>...</y> is not. • An XML document must contain exactly one root element; in offenders.xml the root element is offenders, which encloses all other elements.
offenders.xml: Content • The content of an element is all the text that lies between its start and end tags. • An XML parser is required to pass all characters in a document to the application, including whitespace characters. For example, the content of <firstName> David </firstName> is " David ", with both spaces, and the line breaks and indentation between the tags of offenders.xml are passed on as well.
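The whitespace rule can be checked directly. This sketch (hypothetical class WhitespaceDemo, JDK JAXP parser assumed) reads the content of firstName from the offenders example and shows that the spaces around David are delivered unchanged; trimming them, if wanted, is the application's job.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WhitespaceDemo {
    // Content of the first element named tagName, exactly as the parser delivered it.
    static String contentOf(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String content = contentOf("<offender><firstName> David </firstName></offender>", "firstName");
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + content + "]");        // [ David ]
        System.out.println("[" + content.trim() + "]"); // [David]
    }
}
```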
offenders.xml: Attributes • Attributes are used to provide additional information about elements. • Attribute values must always be enclosed in matching quotes, either double (") or single ('): in offenders.xml, <offender id="024378449 "> uses double quotes and <violation id='12'> uses single quotes.
DOM™ • DOM™ – Document Object Model • A standard hierarchy of objects, recommended by the W3C, that corresponds to XML documents. • Each element, attribute, comment, etc., in an XML document is represented by a Node in the DOM tree. • The DOM API¹ allows data in an XML document to be accessed and modified by manipulating the nodes in a DOM tree. ¹ Application Programming Interface
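A short sketch of accessing and then modifying nodes through the DOM API, using only standard org.w3c.dom calls (getElementsByTagName, getAttribute, setTextContent, setAttribute). The class name, the new last name "Levi" and the reviewed attribute are made-up values for illustration; they are not part of offenders.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomEditDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the offender's id, replaces its last name and adds a (hypothetical) attribute.
    static String editOffender(String xml, String newLastName) {
        Document doc = parse(xml);
        Element offender = (Element) doc.getElementsByTagName("offender").item(0);
        Element lastName = (Element) offender.getElementsByTagName("lastName").item(0);
        lastName.setTextContent(newLastName);     // replaces the element's Text child
        offender.setAttribute("reviewed", "yes"); // adds a new Attr node
        return offender.getAttribute("id") + ":" + lastName.getTextContent()
                + ":" + offender.getAttribute("reviewed");
    }

    public static void main(String[] args) {
        String xml = "<offenders><offender id='024378449'>"
                + "<firstName>David</firstName><lastName>Harel</lastName>"
                + "</offender></offenders>";
        System.out.println(editOffender(xml, "Levi")); // 024378449:Levi:yes
    }
}
```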
DOM Class Hierarchy¹
    <<interface>> Node
        <<interface>> Document
        <<interface>> Element
        <<interface>> Attr
        <<interface>> CharacterData
            <<interface>> Text
            <<interface>> Comment
    <<interface>> NodeList
    <<interface>> NamedNodeMap
¹ A partial class hierarchy is presented in this slide.
offenders.xml: DOM tree (unlabeled :Text nodes hold the whitespace between tags; the element middleName was skipped)
    :Document
      :Element offenders
        :Text
        :Comment "Lists all traffic offenders"
        :Text
        :Element offender
          :Attribute id
            :Text "024378449"
          :Text
          :Element firstName
            :Text "David"
          :Text
          (middleName skipped)
          :Element lastName
            :Text "Harel"
          :Text
          :Element violation
            :Attribute id
              :Text "12"
            :Text
            :Element code
              :Attribute num
                :Text "232"
              :Attribute category
                :Text "traffic"
            :Text
            :Element issueDate
              :Text "2001-11-02"
            :Text
            :Element issueTime
              :Text "10:32:00"
            :Text "Ran a red light at Arik & Benz st."
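The unlabeled :Text nodes in the tree are easy to overlook. This sketch (hypothetical class ChildNodesDemo, JDK JAXP parser assumed) lists the child nodes of an offender element parsed from indented XML, and also shows that attributes are not among an element's children: they are reached through getAttributes().

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ChildNodesDemo {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // One entry per child node of the root: "text", "comment" or "element:<name>".
    static String describeChildren(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        List<String> kinds = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:    kinds.add("text"); break;
                case Node.COMMENT_NODE: kinds.add("comment"); break;
                case Node.ELEMENT_NODE: kinds.add("element:" + child.getNodeName()); break;
                default:                kinds.add("other"); break;
            }
        }
        return String.join(",", kinds);
    }

    // Attributes live in getAttributes(), not in getChildNodes().
    static int attributeCount(String xml) {
        return parseRoot(xml).getAttributes().getLength();
    }

    public static void main(String[] args) {
        String xml = "<offender id=\"024378449\">\n"
                + "  <firstName>David</firstName>\n"
                + "  <lastName>Harel</lastName>\n"
                + "</offender>";
        System.out.println(describeChildren(xml)); // text,element:firstName,text,element:lastName,text
        System.out.println(attributeCount(xml));   // 1
    }
}
```

If whitespace-only Text nodes are unwanted, note that DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) only takes effect when the parser validates against a DTD that gives the element an element-only content model; otherwise the application has to skip such nodes itself.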
JAXP • JAXP – Java™ API for XML Processing • JAXP enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. • JAXP provides two parser types: • SAX¹ parser: event-driven • DOM document builder: constructs DOM trees by parsing XML documents. ¹ Simple API for XML
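To contrast the two parser types: a SAX parser never builds a tree, it calls back into a handler as it reads. A minimal sketch using SAXParserFactory and a DefaultHandler subclass; the class name SaxEventsDemo is illustrative, and only element start/end events are recorded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsDemo {
    // Records the start/end element events reported by a SAX parser, in order.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName); // qName is the tag name as written
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<offenders><offender><firstName>David</firstName></offender></offenders>";
        events(xml).forEach(System.out::println);
    }
}
```

Running main prints six events, from "start offenders" to "end offenders". The handler sees each element once, in document order, and keeps only what it chooses to record, which is why SAX suits documents too large to hold as a DOM tree.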
Creating a DOM Builder
• Create a DocumentBuilderFactory object:
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
• Configure the factory object:
    dbf.setIgnoringComments(true);
• Create a builder instance using the factory:
    DocumentBuilder docBuilder = dbf.newDocumentBuilder();
A ParserConfigurationException is thrown if a DocumentBuilder that satisfies the requested configuration cannot be created.
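The effect of the configuration step can be observed by parsing the same document with and without setIgnoringComments(true). A sketch assuming the JDK's default JAXP implementation; FactoryConfigDemo and commentCount are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FactoryConfigDemo {
    // Counts the comment nodes directly under the root element.
    static int commentCount(String xml, boolean ignoringComments) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringComments(ignoringComments); // configure before creating the builder
        try {
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int comments = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.COMMENT_NODE) {
                    comments++;
                }
            }
            return comments;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<offenders><!-- Lists all traffic offenders --><offender/></offenders>";
        System.out.println(commentCount(xml, false)); // 1
        System.out.println(commentCount(xml, true));  // 0
    }
}
```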
Building a DOM Document
• A DOM document can be built manually from within the application:
    Document doc = docBuilder.newDocument();
    Element offenders = doc.createElement("offenders");
    doc.appendChild(offenders);
    Element offender = doc.createElement("offender");
    offender.setAttribute("id", "024378449 ");
    offenders.appendChild(offender);
    Element firstName = doc.createElement("firstName");
    Text text = doc.createTextNode(" David ");
    firstName.appendChild(text);
    ...
A DOMException is raised if, for example, an illegal character appears in a name or an illegal child is appended to a node.
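A sketch that completes a manual construction like the one above and serializes the result with an identity Transformer from javax.xml.transform, the transformation half of JAXP. It also shows createElement throwing a DOMException for an illegal name. The class and method names are illustrative, and the identity transform is one common way to serialize a DOM, not the only one.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildDomDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a small offenders document by hand and serializes it to a string.
    static String buildOffenders() {
        Document doc = newDocument();
        Element offenders = doc.createElement("offenders");
        doc.appendChild(offenders);
        Element offender = doc.createElement("offender");
        offender.setAttribute("id", "024378449");
        offenders.appendChild(offender);
        Element firstName = doc.createElement("firstName");
        firstName.appendChild(doc.createTextNode("David"));
        offender.appendChild(firstName);
        Element violation = doc.createElement("violation");
        // The serializer escapes '&' in text nodes; no manual escaping is needed.
        violation.appendChild(doc.createTextNode("Ran a red light at Arik & Bentz st."));
        offender.appendChild(violation);
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // createElement rejects names that are not legal XML names.
    static boolean acceptsElementName(String name) {
        try {
            newDocument().createElement(name);
            return true;
        } catch (DOMException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildOffenders());
        System.out.println(acceptsElementName("firstName"));  // true
        System.out.println(acceptsElementName("first name")); // false
    }
}
```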
Building a DOM Document • A DOM tree representation of an XML document can be built automatically by parsing the XML document:
    Document doc = docBuilder.parse(new File(xmlFile));
A SAXParseException or SAXException is raised to report parse errors (and an IOException if the file cannot be read).
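A sketch of the error reporting described above: a DefaultHandler installed as the ErrorHandler makes the builder rethrow fatal errors instead of printing them, and the SAXParseException supplies the line number. The method name firstErrorLine and the sample documents are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ParseErrorDemo {
    // Returns -1 for a well-formed document, else the line of the first fatal error.
    static int firstErrorLine(String xml) {
        try {
            DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            docBuilder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors, print nothing
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return -1;
        } catch (SAXParseException spe) {
            System.err.println("Error: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
            return spe.getLineNumber();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstErrorLine("<offenders>\n  <offender/>\n</offenders>")); // -1
        // </offenders> on line 3 does not close the still-open <offender> element.
        System.out.println(firstErrorLine("<offenders>\n  <offender>\n</offenders>"));
    }
}
```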
DumpDom.java (1 of 5): Creating and traversing a DOM document
    import org.w3c.dom.Document;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.ParserConfigurationException;
    import java.io.File;
    import java.io.IOException;
DumpDom.java (2 of 5)
    public class DumpDom {
        private int indent = 0; // text indentation level

        public DumpDom(String xmlFile) {
            try {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder docBuilder = dbf.newDocumentBuilder();
                Document doc = docBuilder.parse(new File(xmlFile));
                recursiveDump(doc);
            } catch (ParserConfigurationException pce) {
                System.err.println("Failed to create document builder");
            } catch (SAXParseException spe) {
                System.err.println("Error: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
            } catch (SAXException se) {
                System.err.println("Parse error found: " + se);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
DumpDom.java (3 of 5)
        private void recursiveDump(Node node) {
            switch (node.getNodeType()) {
                case Node.DOCUMENT_NODE:
                    dumpNode("document", node);
                    break;
                case Node.COMMENT_NODE:
                    dumpNode("comment", node);
                    break;
                case Node.ATTRIBUTE_NODE:
                    dumpNode("attribute", node);
                    break;
                case Node.TEXT_NODE:
                    dumpNode("text", node);
                    break;
DumpDom.java (4 of 5)
                case Node.ELEMENT_NODE:
                    dumpNode("element", node);
                    // attributes are not child nodes, so they are dumped explicitly
                    indent += 2;
                    NamedNodeMap atts = node.getAttributes();
                    for (int i = 0; i < atts.getLength(); ++i)
                        recursiveDump(atts.item(i));
                    indent -= 2;
                    break;
                default:
                    System.err.println("Unknown node: " + node);
                    System.exit(1);
            } // end of switch

            // print children of the input node (if there are any)
            indent += 2;
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
                recursiveDump(child);
            }
            indent -= 2;
        } // end of recursiveDump
DumpDom.java (5 of 5)
        private void dumpNode(String type, Node node) {
            for (int i = 0; i < indent; ++i)
                System.out.print(" ");
            System.out.print("[" + type + "]: ");
            System.out.print(node.getNodeName());
            if (node.getNodeValue() != null)
                System.out.print("=\"" + node.getNodeValue() + "\"");
            System.out.println();
        }

        public static void main(String[] args) {
            if (args.length != 1) {
                System.err.println("Usage: java DumpDom <xml-file>");
                System.exit(1);
            }
            new DumpDom(args[0]); // the constructor parses and dumps the file
        }
    }
DTD – Document Type Definition • A specification for ensuring the validity of XML documents • The original mechanism, defined as part of the XML specification • Various schema proposals: newer mechanisms for describing validation criteria
XML Schema • The purpose of an XML Schema is to define a class of XML documents. • An XML document that is syntactically correct is considered well-formed. If it also conforms to an XML schema, it is considered valid. • An XML document is not required to have a corresponding schema.
XML Schema (cont.) • XML Schema documents are themselves XML documents, • so they can be manipulated as such. • XML Schema is a language with an XML syntax. • An XML document may explicitly reference the schema document that validates it. • Several schema models exist. In this course we will use W3C XML Schema¹. ¹ A W3C Recommendation since 2001
W3C XML Schema
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      ...
    </xsd:schema>
• A W3C XML Schema consists of a schema element and a variety of sub-elements that determine the appearance of elements and their content in instance documents. • By convention, each of the elements (and predefined simple types) in the schema has the prefix xsd:, which is associated with the W3C XML Schema namespace.
Element & Attribute Declarations • Elements are declared using the element element:
    <xsd:element name="firstName" type="xsd:string"/>
    <xsd:element name="offenders" type="Offenders"/>
• Attributes are declared using the attribute element:
    <xsd:attribute name="id" type="xsd:positiveInteger"/>
xsd:string and xsd:positiveInteger are predefined (simple) types; Offenders is a type defined in the schema itself.
Element & Attribute Types • Elements that contain sub-elements or carry attributes are said to have complex types. • Elements that contain only text (e.g., numbers, strings, dates) and carry no attributes are said to have simple types. • Attributes always have simple types. • Many simple types (e.g., string, date, integer) are predefined.
A Few Built-in Simple Types [table not preserved; a footnote marked some of the listed types: "Should only be used as attribute types"]
Derived Simple Types • New simple types may be defined by deriving them from existing simple types (built-in or derived). • New simple types are derived by restricting the range of permitted values of an existing simple type. • A new simple type is defined using the simpleType element.
Derived Simple Types (cont.)
• Example: Numeric Restriction
    <xsd:simpleType name="ViolationID">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="100"/>
      </xsd:restriction>
    </xsd:simpleType>
• Example: Enumeration
    <xsd:simpleType name="ViolationCategory">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="traffic"/>
        <xsd:enumeration value="criminal"/>
        <xsd:enumeration value="civil"/>
      </xsd:restriction>
    </xsd:simpleType>
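A minimal sketch, using the javax.xml.validation API that ships with the JDK, that checks values against the two derived types. To have something to validate, it assumes two hypothetical global elements, num and category, which are not part of the course schema; the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class DerivedTypesDemo {
    // The two derived types from the slide; the elements num and category are hypothetical.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='num' type='ViolationID'/>"
        + "<xsd:element name='category' type='ViolationCategory'/>"
        + "<xsd:simpleType name='ViolationID'>"
        + "  <xsd:restriction base='xsd:integer'><xsd:minInclusive value='100'/></xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:simpleType name='ViolationCategory'>"
        + "  <xsd:restriction base='xsd:string'>"
        + "    <xsd:enumeration value='traffic'/>"
        + "    <xsd:enumeration value='criminal'/>"
        + "    <xsd:enumeration value='civil'/>"
        + "  </xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // The default Validator error handler throws on the first validation error.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<num>232</num>"));                // true
        System.out.println(isValid("<num>99</num>"));                 // false: below minInclusive
        System.out.println(isValid("<category>traffic</category>")); // true
        System.out.println(isValid("<category>parking</category>")); // false: not enumerated
    }
}
```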
Complex Types • Complex types are defined using the complexType element. • Elements with complex types may carry attributes. • The content of elements with complex types is categorized as follows: • Empty: no content is allowed. • Simple: only character data, which must be of a simple type. • Element: only child elements. • Mixed: both child elements and character data are allowed.
Complex Types: Attributes • The use attribute declares an attribute as required or optional (the default). • Default values for attributes are declared using the default attribute. • Allowed only for optional attributes. • The fixed attribute ensures that, whenever the attribute appears, it has a particular value. • Appearance of the attribute is still optional unless it is also declared with use="required". • default and fixed are mutually exclusive.
Complex Types: Attributes (cont.)
• Example: use, fixed
    <xsd:complexType name="Code">
      <xsd:attribute name="num" type="ViolationID" use="required"/>
      <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
    </xsd:complexType>
• Example: use, default
    <xsd:complexType name="IssueTime">
      ...
      <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
      ...
    </xsd:complexType>
Complex Types: Empty Content
• Example: schema
    <xsd:complexType name="Code">
      <xsd:attribute name="num" type="ViolationID" use="required"/>
      <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
    </xsd:complexType>
• Example: instance document (all three elements are valid; the first two are equivalent, and the third omits the optional category attribute)
    <code num="232" category="traffic"/>
    <code num="232" category="traffic"></code>
    <code num="232"/>
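The attribute rules and the empty content model can be tried out with the JDK's javax.xml.validation API. To keep the sketch short, the Code type below uses xsd:integer and xsd:string in place of ViolationID and ViolationCategory, and a global code element is assumed; the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class CodeTypeDemo {
    // Simplified Code type: empty content, a required num and a fixed category.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='code' type='Code'/>"
        + "<xsd:complexType name='Code'>"
        + "  <xsd:attribute name='num' type='xsd:integer' use='required'/>"
        + "  <xsd:attribute name='category' type='xsd:string' fixed='traffic'/>"
        + "</xsd:complexType>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // first validation error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<code num='232' category='traffic'/>"));       // true
        System.out.println(isValid("<code num='232' category='traffic'></code>")); // true
        System.out.println(isValid("<code num='232'/>"));                          // true
        System.out.println(isValid("<code category='traffic'/>"));                 // false: num is required
        System.out.println(isValid("<code num='232' category='criminal'/>"));      // false: fixed value
        System.out.println(isValid("<code num='232'>text</code>"));                // false: empty content
    }
}
```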
Complex Types: Simple Content
• Example: element with no attributes
    <xsd:element name="firstName" type="xsd:string"/>
• Example: element with attributes (the base type xsd:time is a simple type)
    <xsd:complexType name="IssueTime">
      <xsd:simpleContent>
        <xsd:extension base="xsd:time">
          <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
Complex Types: Element Content • Element Occurrence Constraints • The minimum number of times an element may appear is specified by the optional attribute minOccurs. • The maximum number of times an element may appear is specified by the optional attribute maxOccurs. • The value unbounded indicates that the maximum number of occurrences is unbounded. • The default value of both minOccurs and maxOccurs is 1.
Complex Types: Element Content (cont.) • The element sequence is used to specify a sequence of sub-elements. • Elements must appear in the same order in which they are declared.
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="firstName" type="xsd:string"/>
        <xsd:element name="middleName" type="xsd:string" minOccurs="0"/>
        <xsd:element name="lastName" type="xsd:string"/>
        <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
        ...
      </xsd:sequence>
      ...
    </xsd:complexType>
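A sketch that checks the occurrence constraints and the ordering of the sequence above with the JDK's schema validator. It assumes a simplified schema in which offender is a global element and violation is a plain xsd:string (the course schema uses the Violation type); the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SequenceDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='offender'><xsd:complexType><xsd:sequence>"
        + "  <xsd:element name='firstName' type='xsd:string'/>"
        + "  <xsd:element name='middleName' type='xsd:string' minOccurs='0'/>"
        + "  <xsd:element name='lastName' type='xsd:string'/>"
        + "  <xsd:element name='violation' type='xsd:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // middleName and violation may be omitted (minOccurs="0")
        System.out.println(isValid("<offender><firstName>David</firstName><lastName>Harel</lastName></offender>")); // true
        // violation may repeat (maxOccurs="unbounded")
        System.out.println(isValid("<offender><firstName>David</firstName><middleName>Reuven</middleName>"
                + "<lastName>Harel</lastName><violation>a</violation><violation>b</violation></offender>")); // true
        // wrong order
        System.out.println(isValid("<offender><lastName>Harel</lastName><firstName>David</firstName></offender>")); // false
        // lastName is required (minOccurs defaults to 1)
        System.out.println(isValid("<offender><firstName>David</firstName></offender>")); // false
    }
}
```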
Complex Types: Mixed Content • The optional Boolean attribute mixed is used to specify mixed content:
    <xsd:complexType name="Violation" mixed="true">
      <xsd:sequence>
        <xsd:element name="code" type="Code"/>
        <xsd:element name="issueDate" type="xsd:date"/>
        <xsd:element name="issueTime" type="IssueTime"/>
      </xsd:sequence>
      ...
    </xsd:complexType>
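The effect of mixed can be seen by validating the same instance against two variants of a reduced Violation type that differ only in that attribute. The schema below is an illustrative reduction (only issueDate is kept, and violation is assumed to be a global element), not the course schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MixedContentDemo {
    // Builds the reduced schema with mixed='true' or mixed='false'.
    static Schema schema(boolean mixed) {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:element name='violation'>"
            + "  <xsd:complexType mixed='" + mixed + "'>"
            + "    <xsd:sequence><xsd:element name='issueDate' type='xsd:date'/></xsd:sequence>"
            + "  </xsd:complexType>"
            + "</xsd:element>"
            + "</xsd:schema>";
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(boolean mixed, String xml) {
        try {
            schema(mixed).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<violation><issueDate>2001-11-02</issueDate> Ran a red light</violation>";
        System.out.println(isValid(true, doc));  // true
        System.out.println(isValid(false, doc)); // false: text not allowed in element-only content
        String ws = "<violation>\n  <issueDate>2001-11-02</issueDate>\n</violation>";
        System.out.println(isValid(false, ws));  // true: whitespace between elements is still allowed
    }
}
```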
Global Elements/Attributes • Global elements and global attributes are created by declarations that appear as children of the schema element. • A global element is allowed to appear as the root element of an instance document. • The ref attribute of element/attribute elements may be used (instead of the name attribute) to reference a global element/attribute. • Cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations.
Global Elements/Attributes (cont.)
• Example: global declarations
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="offenders" type="Offenders"/>
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:attribute name="id" type="xsd:positiveInteger"/>
      ...
• Example: ref attribute
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:attribute ref="id" use="required"/>
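A sketch of the root-element rule using a reduced schema: offenders and comment are declared globally, while offender is declared locally inside offenders and referenced nowhere else. The schema is a simplified, hypothetical variant of offenders.xsd (offender is a plain xsd:string here), validated with the JDK's javax.xml.validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class GlobalDeclDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='offenders'><xsd:complexType><xsd:sequence>"
        + "  <xsd:element name='offender' type='xsd:string' maxOccurs='unbounded'/>" // local
        + "  <xsd:element ref='comment' minOccurs='0'/>"                              // refers to a global
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='comment' type='xsd:string'/>"                           // global
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<offenders><offender>a</offender><comment>c</comment></offenders>")); // true
        System.out.println(isValid("<comment>c</comment>"));    // true: global, so it may be the root
        System.out.println(isValid("<offender>a</offender>"));  // false: local, so it may not be the root
    }
}
```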
Anonymous Type Definitions • When a type is referenced only once, or contains very few constraints, it can be more succinctly defined as an anonymous type. • Saves the overhead of naming the type and explicitly referencing it.
Anonymous Type Definitions (cont.)
    <xsd:element name="offender" maxOccurs="unbounded">
      <xsd:complexType>  <!-- anonymous: no name attribute -->
        <xsd:sequence>
          <xsd:element name="firstName" type="xsd:string"/>
          <xsd:element name="middleName" type="xsd:string" minOccurs="0"/>
          <xsd:element name="lastName" type="xsd:string"/>
          <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element ref="comment" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute ref="id" use="required"/>
      </xsd:complexType>
    </xsd:element>
Is this a global declaration?
offenders.xsd (1 of 4): Schema for offenders XML documents
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="offenders" type="Offenders"/>
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:attribute name="id" type="xsd:positiveInteger"/>
      <xsd:complexType name="IssueTime">
        <xsd:simpleContent>
          <xsd:extension base="xsd:time">
            <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
      <xsd:complexType name="Code">
        <xsd:attribute name="num" type="ViolationID" use="required"/>
        <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
      </xsd:complexType>
offenders.xsd (2 of 4)
      <xsd:complexType name="Offenders">
        <xsd:sequence>
          <xsd:element name="offender" maxOccurs="unbounded">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="firstName" type="xsd:string"/>
                <xsd:element name="middleName" type="xsd:string" minOccurs="0"/>
                <xsd:element name="lastName" type="xsd:string"/>
                <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
                <xsd:element ref="comment" minOccurs="0"/>
              </xsd:sequence>
              <xsd:attribute ref="id" use="required"/>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
offenders.xsd (3 of 4)
      <xsd:complexType name="Violation" mixed="true">
        <xsd:sequence>
          <xsd:element name="code" type="Code"/>
          <xsd:element name="issueDate" type="xsd:date"/>
          <xsd:element name="issueTime" type="IssueTime"/>
        </xsd:sequence>
        <xsd:attribute ref="id" use="required"/>
      </xsd:complexType>
      <xsd:simpleType name="ViolationID">
        <xsd:restriction base="xsd:integer">
          <xsd:minInclusive value="100"/>
        </xsd:restriction>
      </xsd:simpleType>
offenders.xsd (4 of 4)
      <xsd:simpleType name="ViolationCategory">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="traffic"/>
          <xsd:enumeration value="criminal"/>
          <xsd:enumeration value="civil"/>
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:simpleType name="Accuracy">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="accurate"/>
          <xsd:enumeration value="approx"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:schema>
Validating Parsers • A validating parser is capable of reading a Schema specification or DTD and determining whether or not XML documents conform to it. • A non-validating parser does not check XML documents for conformity to a Schema or DTD (it may still read a DTD, e.g., for entity declarations). • Limited to well-formedness (syntax) checking
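In JAXP, a DOM builder becomes a validating parser for W3C XML Schema when a compiled Schema is attached with DocumentBuilderFactory.setSchema. This sketch parses the same documents with and without validation; the tiny schema (a ViolationID-typed num element) is a hypothetical stand-in for offenders.xsd, and the class name is illustrative. Validation failures are reported to the ErrorHandler as non-fatal errors, so the handler rethrows them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidatingParserDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='num' type='ViolationID'/>"
        + "<xsd:simpleType name='ViolationID'>"
        + "  <xsd:restriction base='xsd:integer'><xsd:minInclusive value='100'/></xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // True if the document parses; when validating, it must also conform to XSD.
    static boolean accepts(String xml, boolean validating) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        if (validating) {
            dbf.setSchema(SCHEMA); // schema validation; setValidating(true) would mean DTD validation
        }
        try {
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            docBuilder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // turn validation errors into exceptions
                }
            });
            docBuilder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<num>99</num>", false)); // true: well-formed is enough
        System.out.println(accepts("<num>99</num>", true));  // false: violates minInclusive
        System.out.println(accepts("<num>232</num>", true)); // true
    }
}
```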