XML Documents and Schema in greater depth
In one sense XML is … • A language neutral way of representing structured data • Analogy to serialized object is easiest to understand in this context • Great intermediate data format for applications to talk cross-platform, cross-language, etc.
Equivalently, XML is … • like HTML • But you can define whatever tags you want for your application. • Actually more like SGML • HTML is really a Document Type in SGML (Standard Generalized Markup Language) • A flexible format for describing any kind of data (document). • A self-describing format: • an XML document gives complete information about which fields its values are associated with • an application doesn't have to infer the field names from the order. • It just describes a document • Doesn't say what it means. • Doesn't tell how to display it.
Article example <Article > <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline> <authors> <author> Joe Garden</author> <author> Tim Harrod</author> </authors> <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the use of the term "junk mail." </abstract> <body type="url" > http://www.theonion.com/archive/3-11-01.html </body> </Article>
Order / Whitespace • Note that element order is important, but the whitespace used to lay out the tags (indentation and line breaks between elements) is not. The following has the same element structure as the Article example: • <Article > • <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline> • <authors> • <author> Joe Garden</author> • <author> Tim Harrod</author> • </authors> • <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of • Savings, took umbrage Monday at the use of the term "junk mail." • </abstract> • <body type="url" > http://www.theonion.com/archive/3-11-01.html </body> • </Article>
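The order and whitespace point can be checked with a short script. This is only a sketch using Python's standard-library ElementTree parser; the helper name author_names and the three sample strings are made up for illustration. Note that the parser still hands layout whitespace to the application, so the helper strips it explicitly.

```python
import xml.etree.ElementTree as ET

# Three hypothetical variants of the <authors> fragment from the Article example.
compact = "<authors><author>Joe Garden</author><author>Tim Harrod</author></authors>"
indented = """<authors>
    <author> Joe Garden</author>
    <author> Tim Harrod</author>
</authors>"""
reordered = "<authors><author>Tim Harrod</author><author>Joe Garden</author></authors>"


def author_names(xml_text):
    """Return author names in document order, with layout whitespace stripped."""
    root = ET.fromstring(xml_text)
    return [(author.text or "").strip() for author in root.findall("author")]


# Indentation and line breaks do not change the sequence of elements...
print(author_names(compact) == author_names(indented))
# ...but swapping two elements does.
print(author_names(compact) == author_names(reordered))
```

Whether leading or trailing spaces inside an element count as data is up to the application; here they are treated as layout.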
Molecule Example <?xml version="1.0" ?> <CML> <MOL TITLE="Water" > <ATOMS> <ARRAY BUILTIN="ELSYM" > H O H</ARRAY> </ATOMS> <BONDS> <ARRAY BUILTIN="ATID1" >1 2</ARRAY> <ARRAY BUILTIN="ATID2" >2 3</ARRAY> <ARRAY BUILTIN="ORDER" >1 1</ARRAY> </BONDS> </MOL> </CML>
Rooms example <?xml version="1.0" ?> <rooms> <room name="Red"> <capacity>10</capacity> <equipmentList> <equipment>Projector</equipment> </equipmentList> </room> <room name="Green"> <capacity>5</capacity> <equipmentList /> <features> <feature>No Roof</feature> </features> </room> </rooms>
Suggestion • Try building each of those documents in XMLSpy. • Note: it is not required to create a schema to do this. Just create a new XML document and start building.
Things that can appear in an XML document • ELEMENTS: simple, complex, empty, or mixed content; attributes. • The XML declaration • Processing instructions (PIs) <? …?> • Most common is <?xml-stylesheet …?> • <?xml-stylesheet type="text/css" href="mys.css"?> • Comments <!-- comment text -->
Parts of an XML document <?xml version="1.0"?> <CML><MOL TITLE="Water" > <ATOMS> <ARRAY BUILTIN="ELSYM" > H O H</ARRAY> </ATOMS> <BONDS> <ARRAY BUILTIN="ATID1" >1 2</ARRAY> <ARRAY BUILTIN="ATID2" >2 3</ARRAY> <ARRAY BUILTIN="ORDER" >1 1</ARRAY> </BONDS> </MOL> </CML> [Diagram labels: declaration; tags (begin tags and end tags); attributes; attribute values] An XML element is everything from (including) the element's start tag to (including) the element's end tag.
XML and Trees • Tags give the structure of a document. They divide the document up into elements, starting at the topmost element, the root element. The stuff inside an element is its content: content can include other elements along with 'character data'. [Tree diagram: root element CML contains MOL; MOL contains ATOMS and BONDS; ATOMS contains one ARRAY with character data "H O H"; BONDS contains three ARRAY elements with character data "1 2", "2 3" and "1 1"]
XML and Trees <?xml version="1.0"?> <CML> <MOL TITLE="Water" > <ATOMS> <ARRAY BUILTIN="ELSYM" > H O H</ARRAY> </ATOMS> <BONDS> <ARRAY BUILTIN="ATID1" >1 2</ARRAY> <ARRAY BUILTIN="ATID2" >2 3</ARRAY> <ARRAY BUILTIN="ORDER" >1 1</ARRAY> </BONDS> </MOL> </CML> [Tree diagram: root element CML contains MOL; MOL contains ATOMS and BONDS; ATOMS contains one ARRAY with data "H O H"; BONDS contains three ARRAY elements with data "1 2", "2 3" and "1 1"]
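The tree view of the molecule document can be reproduced in code. The following is a minimal sketch using Python's standard-library ElementTree; the function name outline and the constant CML_DOC are illustrative names, not part of CML or of any library.

```python
import xml.etree.ElementTree as ET

# The water molecule document from the slides (XML declaration omitted).
CML_DOC = """<CML>
  <MOL TITLE="Water">
    <ATOMS>
      <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
    </ATOMS>
    <BONDS>
      <ARRAY BUILTIN="ATID1">1 2</ARRAY>
      <ARRAY BUILTIN="ATID2">2 3</ARRAY>
      <ARRAY BUILTIN="ORDER">1 1</ARRAY>
    </BONDS>
  </MOL>
</CML>"""


def outline(element, depth=0):
    """Return one indented line per element, followed by its character data if any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(CML_DOC))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of the tree, with CML at the root.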
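XML and Trees [Tree diagram: root element rooms contains two room elements; the first room contains capacity (10) and equipmentList, which contains equipment (Projector); the second room contains capacity (5), an empty equipmentList, and features, which contains feature (No Roof)]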
Element relationships Book is the root element. Title, prod, and chapter are child elements of book. Book is the parent element of title, prod, and chapter. Title, prod, and chapter are siblings (or sister elements) because they have the same parent. <book> <title>My First XML</title> <prod id="33-657" media="paper"></prod> <chapter>Introduction to XML <para>What is HTML</para> <para>What is XML</para> </chapter> <chapter>XML Syntax <para>Elements must have a closing tag</para> <para>Elements must be properly nested</para> </chapter> </book>
Element content • Elements can have different content types. • An XML element is everything from (including) the element's start tag • to (including) the element's end tag. • An element can have • element content • mixed content, • simple content or • empty content. • An element can also have attributes.
Element content, cont. • In the previous example: • book has element content, because it contains other elements. • chapter has mixed content because it contains both text and other elements. • para has simple content (or text content) because it contains only text. • prod has empty content, because there is nothing between its start and end tags (its information is carried by its attributes)
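The four content kinds can be told apart mechanically for a given document instance. The sketch below uses Python's standard-library ElementTree; the helper names has_text and content_kind are invented for illustration, and the classification looks only at what the instance contains, not at what a schema declares.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the book example (one chapter only).
BOOK = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
</book>"""


def has_text(element):
    """True if the element holds non-whitespace character data of its own."""
    if (element.text or "").strip():
        return True
    # Text that follows a child element is stored in that child's .tail.
    return any((child.tail or "").strip() for child in element)


def content_kind(element):
    """Classify an element as element, mixed, simple or empty content."""
    has_children = len(element) > 0
    if has_children and has_text(element):
        return "mixed"
    if has_children:
        return "element"
    if has_text(element):
        return "simple"
    return "empty"


root = ET.fromstring(BOOK)
kinds = {element.tag: content_kind(element) for element in root.iter()}
print(kinds)
```

Whitespace-only text is treated as layout here, which is why book counts as element content rather than mixed content.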
Element naming • XML elements must follow these naming rules: • Names can contain letters, numbers, and other characters • Names must not start with a number or punctuation character (an underscore is allowed) • Names must not start with the letters xml (or XML or Xml ..) • Names cannot contain spaces • Take care when you "invent" element names and follow these simple rules: • Apart from the xml prefix, any name can be used and no words are reserved, but the idea is to make • names descriptive. Names with an underscore separator are nice. • Examples: <first_name>, <last_name>.
Element naming, cont. Avoid "-" and "." in names. For example, if you name something "first-name," it could be a mess if your software tries to subtract name from first. Or if you name something "first.name," your software may think that "name" is a property of the object "first." Element names can be as long as you like, but don't exaggerate. Names should be short and simple, like this: <book_title> not like this: <the_title_of_the_book>. XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents. Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them. The ":" should not be used in element names because it is reserved to be used for something called namespaces (more later).
Well-formed vs. Valid • Recall that an XML document is said to be well-formed if it obeys the basic syntactic constraints of XML. • This is different from a valid XML document, which (as we will see in more depth) is well-formed and also properly matches a schema.
Rules for Well-Formed XML • An XML document is considered well-formed if it obeys the following rules: • There must be one element that contains all others (root element) • All tags must be balanced • <BOOK>...</BOOK> • <BOOK /> • Tags must be nested properly: • <BOOK> <LINE> This is OK </LINE> </BOOK> • <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK • Tag names are case-sensitive, so • <P>This is not ok, even though we do it all the time in HTML!</p>
More Rules for Well-Formed XML • The attributes in a tag must be in quotes • <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500"> • Comments are allowed • <!-- They are done just as in HTML… --> • The document should begin with an XML declaration • <?xml version='1.0' ?> • Special characters must be escaped: the most common are • &lt; &quot; &apos; &gt; &amp; (for < " ' > &) • <formula> x &lt; y+2x </formula> • <cd title="&quot; mmusic"> • An XML document that obeys these rules is Well-Formed
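These rules can be tried out with any XML parser. The following is a small sketch using Python's standard library (xml.etree.ElementTree and xml.sax.saxutils.escape); the helper name is_well_formed is an illustrative choice. It tests well-formedness only and says nothing about validity against a schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<BOOK><LINE>This is OK</LINE></BOOK>"))       # properly nested
print(is_well_formed("<LINE><BOOK>This is</LINE> not OK</BOOK>"))   # badly nested
print(is_well_formed("<P>start and end tag differ in case</p>"))    # case mismatch
print(is_well_formed("<formula> x < y+2x </formula>"))              # unescaped <

# escape() replaces &, < and > by default; the extra mapping also handles
# the double quote so the value can sit inside a double-quoted attribute.
formula = "<formula>" + escape(" x < y+2x ") + "</formula>"
cd = '<cd title="' + escape('" mmusic', {'"': "&quot;"}) + '"/>'
print(formula)
print(cd)
print(is_well_formed(formula), is_well_formed(cd))
```

After parsing, the application sees the original characters again: the title attribute of cd reads back as a double quote followed by " mmusic".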
Creating XML • There are many XML editors. • Xeena (on cspp machines) • XMLSpy (simple version freely available) • As with HTML, a text editor is frequently the only thing available or the only thing that produces what you want • Test in IE6 or Netscape 7.0
Next Step XML Schema
XML Schema • XML allows any sort of tag you want. • In a given application, you want to fix a vocabulary -- what tags make sense. • Use a Schema to define an XML dialect • MusicXML, VoiceXML, ADXML, CML, etc. • Restrict documents to those tags. • Anyone who has your Schema can validate their document to see if it obeys the rules of the dialect.
Schema determine … • What sort of elements can appear in the document. • What elements MUST appear • Which elements can appear as part of another element • What attributes can appear or must appear • What kind of values can/must be in an attribute.
Rooms XML Schema <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:element name="rooms"><xs:complexType><xs:sequence> <xs:element name="room" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="capacity" type="xs:decimal"/> <xs:element name="equipmentList"/> <xs:element name="features" minOccurs="0"><xs:complexType> <xs:sequence> <xs:element name="feature" type="xs:string" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType></xs:element> </xs:sequence> <xs:attribute name="name" type="xs:string" use="required"/> </xs:complexType> </xs:element> </xs:sequence></xs:complexType></xs:element> </xs:schema>
Bookings XML Schema <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:element name="bookings"> <xs:complexType> <xs:sequence> <xs:element ref="lastUpdated" maxOccurs="1" minOccurs="0"/> <xs:element ref="meetingDate" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="year" type="xs:integer"/> <xs:element name="month" type="xs:string"/> <xs:element name="day" type="xs:integer"/> Note that four global elements are declared on this slide alone (bookings, year, month, day); the schema continues on the next slides.
Bookings, cont. <xs:element name="meetingDate"> <xs:complexType> <xs:sequence> <xs:element ref="year"/> <xs:element ref="month"/> <xs:element ref="day"/> <xs:element ref="meeting" maxOccurs="unbounded" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="lastUpdated"> <xs:complexType> <xs:attribute name="date" type="xs:string"/> <xs:attribute name="time" type="xs:string"/> </xs:complexType> </xs:element>
Bookings, cont. <xs:element name="meeting"> <xs:complexType> <xs:sequence> <xs:element name="meetingName" maxOccurs="1" minOccurs="1" type="xs:string"/> <xs:element name="roomName" maxOccurs="1" minOccurs="1" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
An Example Bookings Document <?xml version="1.0" encoding="UTF-8"?> <bookings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../schemas/Bookings.xsd"> <meetingDate> <year>2003</year> <month>April</month> <day>1</day> <meeting> <meetingName>Democratic Party</meetingName> <roomName>Green Room</roomName> </meeting> <meeting> <meetingName>Republican Party</meetingName> <roomName>Red Room</roomName> </meeting> </meetingDate> </bookings>
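Once a bookings document exists, an application can read it with any XML parser. Below is a minimal sketch using Python's standard-library ElementTree; the function name list_meetings is invented for illustration, and the xsi schema-location attributes are left out because this parser does not validate against the schema anyway.

```python
import xml.etree.ElementTree as ET

# The example bookings document, without the xsi attributes.
BOOKINGS = """<bookings>
  <meetingDate>
    <year>2003</year>
    <month>April</month>
    <day>1</day>
    <meeting>
      <meetingName>Democratic Party</meetingName>
      <roomName>Green Room</roomName>
    </meeting>
    <meeting>
      <meetingName>Republican Party</meetingName>
      <roomName>Red Room</roomName>
    </meeting>
  </meetingDate>
</bookings>"""


def list_meetings(xml_text):
    """Return ((year, month, day), meeting name, room name) for every meeting."""
    root = ET.fromstring(xml_text)
    rows = []
    for date in root.findall("meetingDate"):
        # year and day are xs:integer in the schema, month is xs:string.
        when = (int(date.findtext("year")), date.findtext("month"), int(date.findtext("day")))
        for meeting in date.findall("meeting"):
            rows.append((when, meeting.findtext("meetingName"), meeting.findtext("roomName")))
    return rows


meetings = list_meetings(BOOKINGS)
for when, name, room in meetings:
    print(when, name, room)
```

The code relies on the structure the schema promises (every meetingDate has year, month and day), which is one practical reason to validate documents before processing them.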
XML Schema (Document Type Definition) • A Schema (or the older DTD) is a specification: it specifies the language that you speak. • Check the DTDs for musicxml, adxml, etc. that are available off the course webpage • These give you the basic structure of each of these applications. • Not many schemas are available yet, but schemas are much better than DTDs • As we said before, like a user-defined type in a programming language. Also somewhat analogous to a database schema • says what are the components that can appear • gives default values and restrictions.
What’s in a Schema? • A Schema is an XML document (a DTD is not) • Because it is an XML document, it must have a root element • The root element is <schema> • Within the root element, there can be • Any number and combination of • Inclusions • Imports • Re-definitions • Annotations • Followed by any number and combinations of • Simple and complex data type definitions • Element and attribute definitions • Model group definitions • Annotations
Structure of a Schema <schema> <!-- any number of the following --> <include .../> <import> ... </import> <redefine> ... </redefine> <annotation> ... </annotation> <!-- any number of the following definitions --> <simpleType> ... </simpleType> <complexType> ... </complexType> <element> ... </element> <attribute/> <attributeGroup> ... </attributeGroup> <group> ... </group> <annotation> ... </annotation> </schema>
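Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses Python's standard-library ElementTree on an excerpt of the Bookings schema and lists its global (top-level) element declarations; the variable names are illustrative, and the code only inspects the schema, it does not validate anything with it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace

# Excerpt of the Bookings schema (the meetingDate declaration itself is omitted).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bookings">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="meetingDate" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="year" type="xs:integer"/>
  <xs:element name="month" type="xs:string"/>
  <xs:element name="day" type="xs:integer"/>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# ElementTree writes namespaced tags as "{namespace-uri}localname".
print(root.tag)

# Global declarations are the direct children of the <schema> root element.
global_elements = [
    child.get("name") for child in root if child.tag == "{%s}element" % XS
]
print(global_elements)
```

The nested xs:element with ref="meetingDate" is not listed because it is a local reference inside bookings rather than a direct child of the root.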
Elements • What is a Simple Element? • A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes. • You can also add restrictions (facets) to a data type in order to limit its content, and you can require the data to match a defined pattern.
Example Simple Element • The syntax for defining a simple element is: • <xs:element name="xxx" type="yyy"/> where xxx is the name of the element and yyy is the data type of the element. Here are some XML elements: • <lastname>Refsnes</lastname> • <age>34</age> • <dateborn>1968-03-27</dateborn> And here are the corresponding simple element definitions: • <xs:element name="lastname" type="xs:string"/> • <xs:element name="age" type="xs:integer"/> • <xs:element name="dateborn" type="xs:date"/>
Common XML Schema Data Types • XML Schema has a lot of built-in data types. Here is a list of the most common types: • xs:string • xs:decimal • xs:integer • xs:boolean • xs:date • xs:time
Declare Default and Fixed Values for Simple Elements • Simple elements can have a default value OR a fixed value set. • A default value is automatically assigned to the element when no other value is specified (for an element, this means the element appears in the document but is empty). In the following example the default value is "red": • <xs:element name="color" type="xs:string" default="red"/> A fixed value is also automatically assigned to the element. You cannot specify another value. In the following example the fixed value is "red": • <xs:element name="color" type="xs:string" fixed="red"/>
Attributes (Another simple type) • All attributes are declared as simple types. • Only complex elements can have attributes!
What is an Attribute? • Simple elements cannot have attributes. • If an element has attributes, it is considered to be of complex type. • But the attribute itself is always declared as a simple type. • This means that an element with attributes always has a complex type definition.
How to Define an Attribute • The syntax for defining an attribute is: • <xs:attribute name="xxx" type="yyy"/> where xxx is the name of the attribute and yyy is the data type of the attribute. Here is an XML element with an attribute: • <lastname lang="EN">Smith</lastname> And here is the corresponding simple attribute definition: • <xs:attribute name="lang" type="xs:string"/>
Declare Default and Fixed Values for Attributes • Attributes can have a default value OR a fixed value specified. • A default value is automatically assigned to the attribute when no other value is specified. In the following example the default value is "EN": • <xs:attribute name="lang" type="xs:string" default="EN"/> A fixed value is also automatically assigned to the attribute. You cannot specify another value. In the following example the fixed value is "EN": • <xs:attribute name="lang" type="xs:string" fixed="EN"/>
Creating Optional and Required Attributes • All attributes are optional by default. To explicitly specify that the attribute is optional, use the "use" attribute: • <xs:attribute name="lang" type="xs:string" use="optional"/> To make an attribute required: • <xs:attribute name="lang" type="xs:string" use="required"/>
Restrictions • As we will see later, simple types can have ranges put on their values • These are known as restrictions