Learn about XML, its syntax, standards, and document structure. Understand elements, tags, attributes, and entities used in XML. Explore how DTDs and XML Schema define constraints in XML documents.
G52IWS: Extensible Markup Language (XML) Chris Greenhalgh
Contents • What is XML • XML standards • XML Syntax • DTDs • XML Schema See “Developing Java Web Services” chapter 8, first part and G51WPS notes on XML; see W3C standards
What is XML • Text-based language for structured data encoding • Tree-structured • Common abstract syntax • any XML document can be read by a common parser • DTDs or XML-Schema define particular application-specific constraints • E.g. new tags, allowed structures & datatypes
XML standards • Created in 1996 • Derived from SGML markup language • Managed by the W3C XML (www.w3c.org) group(s) since 1998 • http://www.w3.org/XML/Core/#Publications inc: • Extensible Markup Language (XML) 1.0 (Fourth Edition) • http://www.w3.org/XML/Schema#dev inc: • XML Schema Part 0: Primer • http://www.w3.org/XML/Query/#specs inc: • XML Path Language (XPath) 2.0 • …
XML Example (no DTD) <?xml version="1.0" ?> <Friends> <Person> <Name>Jane Doe</Name> <Age>21</Age> <Body> <Weight Unit="lbs">126</Weight> <Height Unit="inches">62</Height> </Body> <Trust trusted="yes"/> </Person> <Person> <Name>John Doe</Name> <Age>26</Age> <Trust trusted="no"/> </Person> </Friends>
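The example above can be read by any generic XML parser. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` as the parser; the helper name `summarise_friends` is hypothetical and not part of the original notes:

```python
import xml.etree.ElementTree as ET

FRIENDS_XML = """<?xml version="1.0" ?>
<Friends>
  <Person>
    <Name>Jane Doe</Name>
    <Age>21</Age>
    <Body>
      <Weight Unit="lbs">126</Weight>
      <Height Unit="inches">62</Height>
    </Body>
    <Trust trusted="yes"/>
  </Person>
  <Person>
    <Name>John Doe</Name>
    <Age>26</Age>
    <Trust trusted="no"/>
  </Person>
</Friends>
"""


def summarise_friends(xml_text):
    """Return (name, age, trusted) for each Person under the root element."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("Person"):
        name = person.findtext("Name")
        age = int(person.findtext("Age"))
        trusted = person.find("Trust").get("trusted") == "yes"
        people.append((name, age, trusted))
    return people


print(summarise_friends(FRIENDS_XML))
```

The parser knows nothing about "Friends" or "Person"; it only relies on the common XML syntax, which is the point of the slide.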
XML document structure • Prolog • Document type declaration • Optional • Includes element declarations • Root element • With nested elements • With optional attributes • With optional text content (incl. CDATA sections) • Interleaved with optional comments and processing instructions
XML Syntax Contents • Prolog • Root • Processing instructions • Comments • Names • Tags • Elements • Content and CDATA sections • Attributes • Entities • Namespaces
Prolog • Every XML document starts with a prolog, e.g. <?xml version="1.0" ?> or <?xml version="1.0" encoding="ISO-8859-1" ?> • Known start allows multi-byte and byte-order encodings to be identified • Allows specific encoding to be specified • Defaults to Unicode (UTF-8, or UTF-16 if a byte-order mark is present)
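A small sketch of the encoding rule, assuming Python's standard-library `xml.etree.ElementTree`; the name "José" is made-up sample data. The same text is stored as different bytes in each encoding, and the parser recovers it from the declaration (or from the UTF-8 default when there is none):

```python
import xml.etree.ElementTree as ET

# The same character data, serialised with two different byte encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1" ?><Name>Jos\u00e9</Name>'
).encode("iso-8859-1")
utf8_doc = "<Name>Jos\u00e9</Name>".encode("utf-8")  # no declaration: UTF-8 assumed

latin1_name = ET.fromstring(latin1_doc).text
utf8_name = ET.fromstring(utf8_doc).text

print(latin1_doc != utf8_doc, latin1_name == utf8_name)
```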
Root • Every XML document has exactly one “top”-level or root element, e.g. <?xml version="1.0" ?> <Friends> … </Friends> • But not e.g. <?xml version="1.0" ?> <Friends> … </Friends> <Friends> … </Friends>
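The single-root rule is enforced by the parser itself. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as one XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


one_root = '<?xml version="1.0" ?><Friends></Friends>'
two_roots = '<?xml version="1.0" ?><Friends></Friends><Friends></Friends>'

print(is_well_formed(one_root), is_well_formed(two_roots))
```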
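Processing instructions • Provide information for XML processing application(s) • Are of the form: <?target instructions?> • The XML declaration in the prolog has the same form: <?xml version="1.0" ?> (although the XML specification does not formally class it as a processing instruction)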
Comments • Used for documentation • Are of the form: <!-- some comment --> • E.g.: <?xml version="1.0" ?> <!-- my friends --> <Friends> <!-- my first friend --> <Person> … </Person> </Friends>
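Comments and processing instructions are not part of the element tree. A small sketch, assuming Python's standard-library `xml.etree.ElementTree` with its default settings, which skip both; the `render` processing-instruction target is made up for illustration:

```python
import xml.etree.ElementTree as ET

doc = (
    '<?xml version="1.0" ?>'
    "<!-- my friends -->"
    "<Friends>"
    "<!-- my first friend -->"
    "<?render compact?>"  # hypothetical processing instruction
    "<Person/>"
    "</Friends>"
)

root = ET.fromstring(doc)
# Only elements survive in the default tree; comments and PIs are dropped.
child_tags = [child.tag for child in root]
print(child_tags)
```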
Names • No blank spaces • Must start with alphabetical letter (e.g. A-Z or a-z) or underscore (_) • Can be followed by letters, digits (0-9), underscores (_), hyphens (-), periods (.) and colons (:) • Colons are normally reserved for use with namespaces • Case-sensitive • E.g. “product” is different from “Product”
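The naming rules as stated on this slide can be sketched as a regular expression. This is an ASCII-only simplification (the XML specification also admits many non-ASCII letters), and `is_simple_xml_name` is a hypothetical helper name:

```python
import re

# First character: letter or underscore; then letters, digits, "_", "-", ".", ":".
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.:-]*")


def is_simple_xml_name(name):
    """Check a name against the simplified rules listed on the slide."""
    return SIMPLE_NAME.fullmatch(name) is not None


for candidate in ["product", "Product", "_id", "n2:Name", "2fast", "my name", ""]:
    print(repr(candidate), is_simple_xml_name(candidate))
```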
Tags • Main building block of XML • Start tag: <tagname optional-attributes> • End tag: </tagname> • Empty-element tag: <tagname optional-attributes/>
XML Example <?xml version="1.0" ?> <Friends> <Person> <Name>Jane Doe</Name> <Age>21</Age> <Body> <Weight Unit="lbs">126</Weight> <Height Unit="inches">62</Height> </Body> <Trust trusted="yes"/> </Person> <Person> <Name>John Doe</Name> <Age>26</Age> <Trust trusted="no"/> </Person> </Friends> Tag kinds in this example: • Start tag without attributes, e.g. <Name> • Start tag with attributes, e.g. <Weight Unit="lbs"> • Empty-element tag, e.g. <Trust trusted="yes"/> • End tag, e.g. </Name>
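The same kinds of tag appear when a tree is built in code and written back out. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the exact spacing of the serialised output is up to the library:

```python
import xml.etree.ElementTree as ET

person = ET.Element("Person")                         # start tag without attributes
ET.SubElement(person, "Name").text = "Jane Doe"
weight = ET.SubElement(person, "Weight", Unit="lbs")  # start tag with an attribute
weight.text = "126"
ET.SubElement(person, "Trust", trusted="yes")         # no content: empty-element tag

serialised = ET.tostring(person, encoding="unicode")
reparsed = ET.fromstring(serialised)

print(serialised)
```

Elements with content are written with a start and end tag, while the content-free `Trust` element is written as a single empty-element tag.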
Elements • Basic building block of XML • Have form: • Start tag … matching end tag or • Empty-element tag • Never overlap • Unlike SGML • E.g. can’t have “<a>…<b>…</a>…</b>” • But can be nested • I.e. a tree, starting from the root element • E.g. can have “<a>…<b>…</b>…</a>” • Can contain textual content
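Because elements nest but never overlap, a document is always a tree that can be walked recursively. A small sketch, assuming Python's standard-library `xml.etree.ElementTree`; `depth` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def depth(element):
    """Number of element levels from this element down to its deepest descendant."""
    return 1 + max((depth(child) for child in element), default=0)


nested = ET.fromstring("<a>x<b>y</b>z</a>")
nested_depth = depth(nested)

try:
    ET.fromstring("<a>x<b>y</a>z</b>")  # overlapping tags
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False

print(nested_depth, overlap_accepted)
```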
Content and CDATA sections • Within elements • between start and end tags • Plain text • Whitespace optionally significant • No ‘<’ or ‘&’ • Use entity references instead (“&lt;” and “&amp;”) • CDATA “escape” section can include any text unescaped except “]]>” e.g. <![CDATA[<hello>&asoa,osd>as<]]>
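The two ways of carrying awkward text, entity references and a CDATA section, give the parser the same content. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; the `msg` element name is made up:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<hello>&asoa,osd>as<"

escaped_doc = "<msg>" + escape(raw) + "</msg>"        # uses &lt; &gt; &amp;
cdata_doc = "<msg><![CDATA[" + raw + "]]></msg>"      # text left unescaped
unescaped_doc = "<msg>" + raw + "</msg>"              # not well-formed

escaped_text = ET.fromstring(escaped_doc).text
cdata_text = ET.fromstring(cdata_doc).text

try:
    ET.fromstring(unescaped_doc)
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

print(escaped_text == raw, cdata_text == raw, unescaped_accepted)
```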
Attributes • Set of key-value pairs associated with each element • Defined in the start tag or empty-element tag • never in the end tag • Optional • Each key must be unique within that element • E.g. attribute key is “Unit” and value is “lbs”: <Weight Unit="lbs">126</Weight>
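Attributes map naturally onto a dictionary, and the uniqueness rule is checked at parse time. A small sketch, assuming Python's standard-library `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

weight = ET.fromstring('<Weight Unit="lbs">126</Weight>')
attrs = dict(weight.attrib)  # key-value pairs from the start tag

try:
    ET.fromstring('<Weight Unit="lbs" Unit="kg">126</Weight>')  # repeated key
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(attrs, duplicate_accepted)
```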
Entities • Short-cuts/references to text • Of the form: &entityname; • E.g. &lt; for <, &gt; for >, &amp; for &, &quot; for ", &apos; for ' • More can be defined in the (optional) DTD
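The five predefined entities are expanded by the parser, while any other entity must be declared first. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; `&nbsp;` is used here only as an example of an entity that XML does not predefine:

```python
import xml.etree.ElementTree as ET

decoded = ET.fromstring("<t>&lt;&gt;&amp;&quot;&apos;</t>").text

try:
    ET.fromstring("<t>&nbsp;</t>")  # not predefined and not declared in a DTD
    undefined_accepted = True
except ET.ParseError:
    undefined_accepted = False

print(decoded, undefined_accepted)
```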
Namespaces • Are contexts within which names are defined • Prevent confusion between coincidental uses of the same names (for elements or attributes) • Namespace is a URI • Used only as an identifier; need not resolve to a document • Default namespace introduced by attribute xmlns="namespaceuri" • Applies to that element and all unqualified element names nested within it (NOT attribute names), unless overridden • Namespace prefix introduced by attribute xmlns:prefix="namespaceuri" • Used explicitly as “prefix:name” • No namespace is the same as the empty URI “” • This is the top-level default namespace and default namespace for all attributes at any level
Namespace example <?xml version="1.0" ?> <Friends xmlns="http://woo.foo/"> <Person xmlns:n2="http://wee.fee/"> <n2:Name>Jane Doe</n2:Name> <Age xmlns="http://wee.fee/">21</Age> <Weight Unit="lbs">126</Weight> <Height n2:Unit="inches">62</Height> </Person> </Friends> Expanded names: • Friends: “http://woo.foo/”, “Friends” (default NS “http://woo.foo/”) • Person: “http://woo.foo/”, “Person” • n2:Name: “http://wee.fee/”, “Name” • Age: “http://wee.fee/”, “Age” (default NS “http://wee.fee/”) • Weight: “http://woo.foo/”, “Weight” • Unit (att.): “”, “Unit” • n2:Unit (att.): “http://wee.fee/”, “Unit”
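The expanded names can be checked with a namespace-aware parser. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`, which reports each expanded name as `{namespace-uri}local-name` and leaves no-namespace names bare:

```python
import xml.etree.ElementTree as ET

NS_XML = """<?xml version="1.0" ?>
<Friends xmlns="http://woo.foo/">
  <Person xmlns:n2="http://wee.fee/">
    <n2:Name>Jane Doe</n2:Name>
    <Age xmlns="http://wee.fee/">21</Age>
    <Weight Unit="lbs">126</Weight>
    <Height n2:Unit="inches">62</Height>
  </Person>
</Friends>
"""

root = ET.fromstring(NS_XML)

# Element and attribute names in document order, with namespaces expanded.
element_names = [el.tag for el in root.iter()]
attribute_names = [key for el in root.iter() for key in el.attrib]

for name in element_names + attribute_names:
    print(name)
```

Note that the unprefixed `Unit` attribute stays in no namespace even though its element is in the default namespace, while `n2:Unit` is in “http://wee.fee/”.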
Document Type Definitions • Use regular expressions to specify valid document structure • Element nesting, required and optional attributes, default values • May be included after prolog in document • Or may be referenced from an external name or URL • Relatively limited expressiveness, especially for attribute and text values See G51WPS notes
XML Schema • More modern alternative to DTDs for specifying valid XML document structure and content • See http://www.w3.org/XML/Schema#dev • XML Schema Part 0: Primer • XML Schema Part 1: Structures • XML Schema Part 2: Datatypes
XML Schema • An XML Schema definition is an XML document conforming to the XML Schema schema • Allows definition of • simple types • Without nested elements • Including built-in types such as xsd:decimal, xsd:string • complex types • with nested elements and optional attributes • Elements (which may be simple or complex) • Attributes (which all have simple types)
XML Schema example 1 <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://woo.foo/"> <xsd:element name="comment" type="xsd:string"/> </xsd:schema> Defines one element “http://woo.foo/”, “comment” of simple type xsd:string, e.g. <?xml version="1.0"?> <comment xmlns="http://woo.foo/">this is a comment</comment>
XML Schema example 2 <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://woo.foo/" targetNamespace="http://woo.foo/"> <xsd:simpleType name="Chocolate"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="dark"/> <xsd:enumeration value="milk"/> <xsd:enumeration value="white"/> </xsd:restriction> </xsd:simpleType> <xsd:element name="chocolate" type="Chocolate"/> </xsd:schema> Defines one element “http://woo.foo/”, “chocolate” of new simple type “http://woo.foo/”, “Chocolate”, which must be “dark”, “milk” or “white”, e.g. <?xml version="1.0"?> <chocolate xmlns="http://woo.foo/">dark</chocolate>
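Because a schema is itself an XML document, an ordinary parser can read it. The sketch below, assuming Python's standard-library `xml.etree.ElementTree`, pulls the enumeration values out of the schema and checks an instance by hand. It is not a validating parser (the standard library does not include one), and `allowed_values` and `chocolate_is_allowed` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://woo.foo/"
            targetNamespace="http://woo.foo/">
  <xsd:simpleType name="Chocolate">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="dark"/>
      <xsd:enumeration value="milk"/>
      <xsd:enumeration value="white"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="chocolate" type="Chocolate"/>
</xsd:schema>
"""


def allowed_values(schema_text, type_name):
    """List the xsd:enumeration values of the named top-level simple type."""
    schema = ET.fromstring(schema_text)
    for simple_type in schema.findall(XSD + "simpleType"):
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter(XSD + "enumeration")]
    return []


def chocolate_is_allowed(instance_text):
    """Hand-rolled check of one element against the enumeration only."""
    instance = ET.fromstring(instance_text)
    return (
        instance.tag == "{http://woo.foo/}chocolate"
        and instance.text in allowed_values(SCHEMA, "Chocolate")
    )


print(chocolate_is_allowed('<chocolate xmlns="http://woo.foo/">dark</chocolate>'))
print(chocolate_is_allowed('<chocolate xmlns="http://woo.foo/">mint</chocolate>'))
```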
XML Schema example 3 <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://woo.foo/" targetNamespace="http://woo.foo/" elementFormDefault="qualified"> <xsd:complexType name="ThreePiece"> <xsd:sequence> <xsd:element name="lead" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="bass" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="drums" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> <xsd:element name="band" type="ThreePiece"/> </xsd:schema> Defines one element “http://woo.foo/”, “band” of new complex type “http://woo.foo/”, “ThreePiece”, with three mandatory child elements (namespace-qualified because of elementFormDefault="qualified"), e.g. <?xml version="1.0"?> <band xmlns="http://woo.foo/"> <lead>Bill</lead> <bass>Bob</bass> <drums>Ben</drums> </band>
XML Schema example 4 <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://woo.foo/" targetNamespace="http://woo.foo/"> <xsd:complexType name="WeightType"> <xsd:simpleContent> <xsd:extension base="xsd:double"> <xsd:attribute name="Units" type="xsd:string"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> <xsd:element name="weight" type="WeightType"/> </xsd:schema> Defines one element “http://woo.foo/”, “weight” of new complex type “http://woo.foo/”, “WeightType”, whose content is an xsd:double and which has an optional attribute “Units”, e.g. <?xml version="1.0"?> <weight xmlns="http://woo.foo/" Units="kg">57.2</weight>
XML Schema built-in data types • string • base64Binary – Base64 encoded binary • boolean – true or false • decimal – arbitrary-precision decimal numbers (integer is derived from it) • double – 64 bit floating point • float – 32 bit floating point • anyURI – URI • duration – duration • dateTime – date & time • … And various restrictions, e.g. minimum & maximum values, lengths
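As a rough illustration of what these datatypes mean to a program, the sketch below maps a few of them onto Python types. The mapping is an assumption made for illustration, not part of XML Schema: the Python constructors are looser than the schema lexical rules, Python's `float` is always 64-bit, and `parse_value` and `PARSERS` are hypothetical names.

```python
from datetime import datetime
from decimal import Decimal


def parse_boolean(text):
    """xsd:boolean accepts the lexical forms true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not an xsd:boolean: " + repr(text))


# Approximate Python counterparts for some built-in schema types.
PARSERS = {
    "string": str,
    "boolean": parse_boolean,
    "decimal": Decimal,
    "double": float,
    "float": float,  # Python has no separate 32-bit float type
    "dateTime": datetime.fromisoformat,
}


def parse_value(xsd_type, text):
    """Convert element or attribute text using the table above."""
    return PARSERS[xsd_type](text)


print(parse_value("decimal", "126.50"))
print(parse_value("boolean", "true"))
print(parse_value("dateTime", "2007-03-01T13:00:00"))
```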
Complex type building blocks • Element combinations: • Sequence – in order given, specifiable count • All – in any order, 0 or 1 of each • Choice – one of • Additional constructions • Reusable groups of elements • Reusable groups of attributes • Substitution groups • Alternative elements which may appear in a particular place
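The three element combinations can be sketched as checks on a list of child element names. This is a deliberately simplified model (every sequence member occurs exactly once, and nested groups are not handled); the function names are hypothetical and this is not how a real validating parser is written:

```python
def matches_sequence(children, names):
    """Sequence: every name exactly once, in the order given."""
    return list(children) == list(names)


def matches_all(children, names):
    """All: any order, 0 or 1 of each name."""
    return len(set(children)) == len(children) and set(children) <= set(names)


def matches_choice(children, names):
    """Choice: exactly one child, drawn from the alternatives."""
    return len(children) == 1 and children[0] in names


band = ["lead", "bass", "drums"]

print(matches_sequence(["lead", "bass", "drums"], band))  # in order
print(matches_sequence(["bass", "lead", "drums"], band))  # wrong order
print(matches_all(["drums", "lead"], band))               # any order, bass omitted
print(matches_choice(["bass"], band))                     # one alternative
```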
Summary • XML • Common abstract syntax • Hierarchical element tree, plus content and attributes • XML Schema • Specifies XML elements and allowed structure and content for XML document(s) • Checked by “validating” parsers • Used to formally specify WSDL, SOAP, etc. • Can be used to generate schema-specific APIs • E.g. Java Architecture for XML Binding (JAXB) • Typically more readable code than DOM or SAX