COMPS311F Li Tak Sing
XML Schemas • XML Schema is a more powerful alternative to DTD for describing XML document structures. The XML Schema language is also referred to as XML Schema Definition (XSD). You have seen the DTD for the employee list.
XML schemas for employee-list
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee-list">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" name="employee">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="1" maxOccurs="1" name="name" type="xs:string" />
              <xs:element minOccurs="1" maxOccurs="1" name="hours">
                <xs:simpleType>
                  <xs:restriction base="xs:decimal">
                    <xs:minInclusive value="0" />
                    <xs:maxInclusive value="60" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element minOccurs="1" maxOccurs="1" name="rate" type="xs:decimal" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The root element of an XML Schema is just schema. It must be specified with the namespace of "http://www.w3.org/2001/XMLSchema". By convention, we use the prefix of xs or xsd though technically we could choose other prefixes, for example a meaningless prefix abc. The XML Schema will still be correct if we change all occurrences of xs to abc.
Referring to an XSD file • You have learned that including the following DOCTYPE declaration in an XML document validates it against employee-list.dtd.
<!DOCTYPE employee-list SYSTEM "employee-list.dtd">
• To validate an XML document against an XSD instead, remove the DOCTYPE declaration and add two attributes to the start tag of the root element. The xmlns:xsi attribute declares the XML Schema instance namespace, which signals that the document refers to an XML Schema. The xsi:noNamespaceSchemaLocation attribute gives the file name and location of the schema. Since no path is specified, the schema file is assumed to be in the same directory as the XML file being validated.
<employee-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="employee-list.xsd">
• The namespace http://www.w3.org/2001/XMLSchema-instance is now bound to xsi, which is the standard prefix for XML Schema Instance.
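To see the modified start tag in context, here is a minimal sketch of a complete document that should validate against the employee-list schema shown earlier. The employee names and figures are made up for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative instance document; all names and numbers are invented. -->
<employee-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="employee-list.xsd">
  <employee>
    <name>Peter Wong</name>
    <hours>40</hours>
    <rate>85.50</rate>
  </employee>
  <employee>
    <name>Mary Chan</name>
    <hours>37.5</hours>
    <rate>92</rate>
  </employee>
</employee-list>
```

Each employee carries exactly one name, hours and rate element, in that order, as the schema's sequence requires.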
Simple elements in XSD • A simple XML element is one that does not contain any other elements. Its content can be one of a few dozen built-in data types. The more popular ones are: string, decimal, integer, boolean, date and time. The name element is a simple element.
• For example:
<xs:element minOccurs="1" maxOccurs="1" name="name" type="xs:string"/>
This makes use of the optional attributes minOccurs and maxOccurs to specify the minimum and maximum number of occurrences in its parent (enclosing) element. Since both attributes are set to one, the name element must occur exactly once in the enclosing employee element.
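The other built-in types named above are used the same way. The sketch below declares one simple element per type. All element names are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical element names; only the xs: types come from the text above. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="nickname"  type="xs:string"/>   <!-- e.g. Pete -->
  <xs:element name="price"     type="xs:decimal"/>  <!-- e.g. 19.95 -->
  <xs:element name="quantity"  type="xs:integer"/>  <!-- e.g. 400 -->
  <xs:element name="fullTime"  type="xs:boolean"/>  <!-- true, false, 1 or 0 -->
  <xs:element name="birthday"  type="xs:date"/>     <!-- e.g. 2009-05-04 -->
  <xs:element name="startTime" type="xs:time"/>     <!-- e.g. 09:30:00 -->
</xs:schema>
```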
Value restrictions by range • The hours element is defined with the restriction element available from XML Schema. The base type decimal is restricted to a minimum value of 0 and a maximum value of 60, both inclusive. The lower bound of zero is obvious because it is impossible to work a negative number of hours. The upper bound of sixty could be due to company policy or labour regulations.
<xs:element minOccurs="1" maxOccurs="1" name="hours">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0" />
      <xs:maxInclusive value="60" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The XSD specification is vastly better than the earlier DTD specification, which does not treat non-numeric values in hours as errors.
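To make the range restriction concrete, the following made-up document mixes acceptable and unacceptable hours values. The comments state what a schema validator would be expected to report under the employee-list schema. The names and rates are invented.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative only: this document is deliberately NOT fully valid. -->
<employee-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="employee-list.xsd">
  <employee>
    <name>Boundary Case</name>
    <hours>60</hours>      <!-- accepted: the upper bound is inclusive -->
    <rate>50</rate>
  </employee>
  <employee>
    <name>Too Many Hours</name>
    <hours>60.5</hours>    <!-- rejected: above maxInclusive -->
    <rate>50</rate>
  </employee>
  <employee>
    <name>Negative Hours</name>
    <hours>-1</hours>      <!-- rejected: below minInclusive -->
    <rate>50</rate>
  </employee>
  <employee>
    <name>Not A Number</name>
    <hours>forty</hours>   <!-- rejected: not a decimal; a DTD would accept it -->
    <rate>50</rate>
  </employee>
</employee-list>
```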
Value restrictions by enumeration • There are other forms of value restrictions. Enumeration allows you to restrict an element to a list of possible values. The fruit element is defined below to have one of three possible values: Apple, Banana and Orange.
<xs:element name="fruit">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Apple"/>
      <xs:enumeration value="Banana"/>
      <xs:enumeration value="Orange"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Value restrictions by pattern • More advanced restrictions are specified with the pattern element. The direction element is restricted to one of the directions north, south, east and west, denoted respectively by their first characters n, s, e and w. Even though the base type is specified as a string, the direction element can only have one character because there is only one pair of square brackets in the pattern. The square brackets specify the four allowed values: n, s, e and w.
<xs:element name="direction">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[nsew]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Value restrictions by pattern • The following definition allows 2-character strings. The first character is a lower case or upper case letter while the second character is a decimal digit. Values allowed include A1, a1 and B3 but not BB.
<xs:element name="mixedLetterDigit">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z][0-9]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Suppose we want to allow strings like r2d2, H5N1, H1N1 and c9L3k4. These strings hold one or more repetitions of the 2-character pattern that we defined above. We can put the original pattern in parentheses and add a trailing + for one or more repetitions as follows. To also allow empty strings, replace + with *.
<xs:pattern value="([a-zA-Z][0-9])+"/>
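For context, here is a sketch of the repeated pattern placed in a full element definition. The element name code is hypothetical. An XSD pattern must match the whole value, so no anchors are needed.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The element name "code" is an assumption made for this example. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="code">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- One or more letter-digit pairs; change + to * to allow empty strings. -->
        <xs:pattern value="([a-zA-Z][0-9])+"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Expected to be accepted: r2d2, H5N1, c9L3k4
       Expected to be rejected: r2d (odd length), 2r (digit first), empty string -->
</xs:schema>
```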
Whitespace processing restrictions • A whitespace character is a line feed, tab, space or carriage return. Inside a restriction element, you can have a whiteSpace element. The value preserve leaves all whitespace characters unchanged. The value replace replaces each line feed, tab and carriage return with a space character. The value collapse does the same as replace, then reduces consecutive spaces to a single space and removes leading and trailing spaces. The following is an example.
<xs:element name="address">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
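As a small illustration of the three settings, consider the address element below, written across several indented lines. The address itself is made up, and the comment describes the value a schema processor would be expected to see under each setting.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative instance; the address text is invented. -->
<address>
   30 Good Shepherd Street,
   Homantin
</address>
<!-- preserve: the line feeds and the indentation are kept as written.
     replace:  each line feed becomes a space, so runs of spaces remain.
     collapse: the value becomes "30 Good Shepherd Street, Homantin". -->
```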
Restrictions on string length • You can restrict the length of a string to a fixed number. In this example, a password must have exactly 8 characters.
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:length value="8"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Restrictions on string length • You can also restrict the length of a string to a range. In the following example, a password must have at least 6 characters and at most 10 characters.
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="6"/>
      <xs:maxLength value="10"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Complex elements in XSD • An XML element is complex if it contains attributes or other elements.
Sequence • We may have a student element that contains the elements firstname and lastname as follows:
<student>
  <firstname>Peter</firstname>
  <lastname>Wong</lastname>
</student>
It could be defined as a sequence in a complex type.
<xs:element name="student">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Note that if the two child elements appear in the XML document in a different order, for example lastname before firstname as follows, the validation will fail. In other words, ordering is crucial in sequences.
<student>
  <lastname>Wong</lastname>
  <firstname>Peter</firstname>
</student>
Attributes • If an element can have attributes, there must be attribute elements defined within its complexType element. Suppose we have a student element.
<student studentId="1374" />
Its XML Schema definition would look like this.
<xs:element name="student">
  <xs:complexType>
    <xs:attribute name="studentId" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>
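Attributes declared this way are optional by default, so a student element without a studentId would still validate. If the identifier should be mandatory, one option is the use attribute of xs:attribute, sketched below. Making studentId required is an assumption for this example, not something the original definition demands.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="student">
    <xs:complexType>
      <!-- use="required" rejects <student/>; the default is use="optional". -->
      <xs:attribute name="studentId" type="xs:positiveInteger" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```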
Text with attributes • In XML Schema, we can use the extension element to define attributes for any simpleType or complexType element. Consider the following element to express shoe sizes.
<shoesize>9</shoesize>
Its XSD definition would look like this.
<xs:element name="shoesize" type="xs:integer">
</xs:element>
However, shoe sizes are not standardized across countries. For example, the UK shoe size 8 is slightly larger than the US shoe size 8. In a shoe catalogue, it may be necessary to indicate the country for which the size number applies.
<shoesize country="US">9</shoesize>
• We can define the XML Schema with the extension element to hold the integer representing the shoe size. Within the extension element, we have the attribute element for the country. To keep our schema simple, we chose not to represent half sizes.
<xs:element name="shoesize">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="country" type="xs:string" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
Texts mixed with other elements • In its simplest form, an element may contain just text as follows.
<sms> Your buy order of 400 shares of HSBC has been executed on 2009-05-04. </sms>
• Perhaps certain parts of the text have special meaning. You can turn these parts into elements to facilitate processing. The following sms element has been enhanced by making the quantity of shares, stock name and date into elements.
<sms>Your buy order of <qty>400</qty> shares of <stockName>HSBC</stockName> has been executed on <execDate>2009-05-04</execDate>. </sms>
You can define the corresponding complexType element by setting its mixed attribute to true.
<xs:element name="sms">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="qty" type="xs:positiveInteger" />
      <xs:element name="stockName" type="xs:string" />
      <xs:element name="execDate" type="xs:date" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
Unspecified order in a complex type • Suppose we don’t care if the first name or the last name appears first in a student element.
<student>
  <firstname>Peter</firstname>
  <lastname>Wong</lastname>
</student>
<student>
  <lastname>Wong</lastname>
  <firstname>Peter</firstname>
</student>
In that case, in place of the sequence element, we can use the all element to accept any order for firstname and lastname.
<xs:element name="student">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:all>
  </xs:complexType>
</xs:element>
Choice in a complex type • Suppose you only want to contact customers by their email addresses or phone numbers but not both. The following is the contact information for two different customers.
<customer>
  <email>johndoe@coolmail.com</email>
</customer>
<customer>
  <tel>709394</tel>
</customer>
The two alternative customer elements are expressed with the choice element as follows.
<xsd:element name="customer">
  <xsd:complexType>
    <xsd:choice>
      <xsd:element name="email" type="xsd:string"/>
      <xsd:element name="tel" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
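To show what the choice rules out, the sketch below lists customer elements that a validator would be expected to reject under the definition above. The wrapper element customers is hypothetical and is there only to keep the example a single well-formed document.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- "customers" is an assumed wrapper, not part of the schema above. -->
<customers>
  <!-- rejected: both alternatives are present -->
  <customer>
    <email>johndoe@coolmail.com</email>
    <tel>709394</tel>
  </customer>
  <!-- rejected: neither alternative is present -->
  <customer/>
</customers>
```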
Named types for reuse • Suppose you want an element that captures someone’s favourite fruit, as follows:
<myfruit>Apple</myfruit>
You can restrict the element to specific fruits with an enumeration, here Apple, Banana and Orange.
<xs:element name="myfruit">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Apple"/>
      <xs:enumeration value="Banana"/>
      <xs:enumeration value="Orange"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
If we have elements other than myfruit that make use of this enumeration of Apple, Banana and Orange, we can define a type called fruitType. This results in a more readable XML Schema and removes the risk of different usages of the enumeration getting out of step when, for example, Coconut is added to the existing fruits.
• Changing the original element definition to use a named type is trivial. First, you add the type attribute to the element with the new type name.
<xs:element name="myfruit" type="fruitType"/>
Second, you move the simpleType or complexType out of the element and add the name attribute to it.
<xs:simpleType name="fruitType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Apple"/>
    <xs:enumeration value="Banana"/>
    <xs:enumeration value="Orange"/>
  </xs:restriction>
</xs:simpleType>
However, where can you place this named type definition in the XSD file? A named type must be a direct child of xs:schema, so it cannot be nested inside an element definition. You can put it right below the top-level element that first uses it. Alternatively, you can group all the named type definitions together and place them right after xs:schema’s start tag or just before its end tag. Good use of named types can make an XSD file more readable and maintainable.
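Putting the two steps together, here is a sketch of a complete schema in which two elements share fruitType. The second element, snack, is hypothetical and added only to show the reuse.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Both elements refer to the same named type. -->
  <xs:element name="myfruit" type="fruitType"/>
  <xs:element name="snack"   type="fruitType"/>  <!-- assumed second user -->

  <!-- Named types are grouped here, just before the end tag of xs:schema. -->
  <xs:simpleType name="fruitType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Apple"/>
      <xs:enumeration value="Banana"/>
      <xs:enumeration value="Orange"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Adding Coconut now takes a single new enumeration line, and both elements pick up the change.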
elementFormDefault and attributeFormDefault • In the samples we have presented so far, some names have prefixes and some don’t, and you may wonder when a prefix is needed. In the schema element, two attributes control this for XML documents validated against a schema with a target namespace. Attribute elementFormDefault controls whether locally declared elements, that is, those declared inside a complex type, must be namespace-qualified in the XML document, which usually means carrying a prefix. Likewise attributeFormDefault controls whether locally declared attributes must be qualified. The value qualified means qualification is required and the value unqualified means it is not. Globally declared elements, such as the root element, are always qualified. The following start tag requires qualified elements but not qualified attributes.
<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  attributeFormDefault="unqualified"
  elementFormDefault="qualified"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" >
The default value of both attributes is unqualified. Therefore removing attributeFormDefault, as follows, has the same meaning.
<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  elementFormDefault="qualified"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" >
When elements must be qualified, an XML document can still avoid prefixes by declaring a default namespace. First consider the following XML schema, which leaves both attributes at their default of unqualified and is saved in the file employee.xsd.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.ouhk.edu.hk/employeeNS"
  xmlns="http://www.ouhk.edu.hk/employeeNS">
  <xsd:element name="employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="email" type="xsd:string"/>
        <xsd:element name="hireDate" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
The following XML document can be successfully validated against it.
<?xml version="1.0"?>
<em:employee xmlns:em="http://www.ouhk.edu.hk/employeeNS"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.ouhk.edu.hk/employeeNS employee.xsd">
  <name>Oliver Au</name>
  <email>oau@ouhk.edu.hk</email>
  <hireDate>2009-09-01</hireDate>
</em:employee>
The xsi:schemaLocation attribute holds two URI references separated by white space. The first value, http://www.ouhk.edu.hk/employeeNS, is a namespace. The second value, employee.xsd, gives a hint to the location of the schema document.
• Suppose we had instead set the elementFormDefault attribute to qualified in the schema's start tag.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.ouhk.edu.hk/employeeNS"
  elementFormDefault="qualified"
  xmlns="http://www.ouhk.edu.hk/employeeNS">
The validation would then fail, and the XML document must be modified as follows. Note the required prefixes added to the name, email and hireDate elements.
<?xml version="1.0"?>
<em:employee xmlns:em="http://www.ouhk.edu.hk/employeeNS"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.ouhk.edu.hk/employeeNS employee.xsd">
  <em:name>Oliver Au</em:name>
  <em:email>oau@ouhk.edu.hk</em:email>
  <em:hireDate>2009-09-01</em:hireDate>
</em:employee>
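The prefixes can be dropped again by declaring the target namespace as the default namespace of the document. The sketch below assumes the schema with elementFormDefault set to qualified and reuses the same sample data. Every unprefixed element then belongs to http://www.ouhk.edu.hk/employeeNS.

```xml
<?xml version="1.0"?>
<!-- xmlns="..." makes the employee namespace the default, so no em: prefix is needed. -->
<employee xmlns="http://www.ouhk.edu.hk/employeeNS"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.ouhk.edu.hk/employeeNS employee.xsd">
  <name>Oliver Au</name>
  <email>oau@ouhk.edu.hk</email>
  <hireDate>2009-09-01</hireDate>
</employee>
```

This form would not validate against the earlier schema that leaves elementFormDefault unqualified, because there the child elements must be in no namespace.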