XML: DTD & XML Schema
Monica Farrow, G30, email: monica@macs.hw.ac.uk
A Complete XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addresses SYSTEM "http://www.addbook.com/addresses.dtd">
<addresses>
  <person ssno="123 4589">
    <name>Lisa Simpson</name>
    <tel>0131-828 1234</tel>
    <tel>078-4701 7775</tel>
    <email>lisa@macs.hw.ac.uk</email>
  </person>
</addresses>
Slide callouts: "Required", "Optional", and "Link to document defining the XML elements" (the DTD URL in the DOCTYPE line).
Defining the structure of an XML file
• We can check if an XML file is well-formed
  • by looking at it, maybe
  • by loading it into a browser
    • If well-formed, it will be displayed
• However, how can we check that the well-formed file contains the correct elements in the correct quantities?
  • We need to write a specification for the XML file
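As an illustration of what "well-formed" means, here is a small sketch of a document that a parser should accept, with comments listing typical mistakes that would break it. The element names are borrowed from the address example. Nothing here says which elements are allowed or how many; it only shows that the markup is syntactically sound.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well-formed: a single root element, every start tag has a matching
     end tag, elements nest properly, and attribute values are quoted. -->
<addresses>
  <person ssno="123 4589">
    <name>Lisa Simpson</name>
    <tel>0131-828 1234</tel>
  </person>
</addresses>
<!-- Each of these changes would make the file NOT well-formed:
     * leaving out the closing tag of "name"
     * writing ssno=123 without quotes
     * overlapping tags: opening "name", opening "tel", then closing "name" first
     * adding a second root element after "addresses" -->
```

A file can pass all of these checks and still contain the wrong elements, which is why a separate specification is needed.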
Defining the structure of an XML file
• There are 2 main alternatives
  • Document Type Definitions
    • Original and simple
  • XML Schema
    • More versatile and complex
• We will look at both
  • Concentrating on XML Schema
Example: An Address Book
<person ssn="4444">
  <name>Homer Simpson</name>
  <tel>2543</tel>
  <tel>2544</tel>
  <email>homer@math.springfield.edu</email>
</person>
• One or more persons
• Each person has an attribute (ssn)
• Exactly one name
• Up to 4 tel nos
• Optionally one email
DTD - Specifying the Structure
• In a DTD, we can specify the permitted content for each element, using regular expressions
  • Describes the pattern
• For a person element, the regular expression is
  • name, title?, tel*, email+
What’s in a person Element?
• This means
  • name = there must be a name element
  • title? = there is an optional title element (i.e., 0 or 1 title elements)
  • name, title? = the name element is followed by an optional title element
  • tel* = there are 0 or more tel elements
  • email+ = there are 1 or more email elements
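To make the pattern name, title?, tel*, email+ concrete, the sketch below shows three person elements: two that match and one that does not. The names, numbers and example.org addresses are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<addressbook>
  <!-- Matches: no title, no tel, one email -->
  <person ssn="1111">
    <name>Bart Simpson</name>
    <email>bart@example.org</email>
  </person>

  <!-- Matches: title present, two tel elements, two email elements -->
  <person ssn="2222">
    <name>Marge Simpson</name>
    <title>Dr</title>
    <tel>2543</tel>
    <tel>2544</tel>
    <email>marge@example.org</email>
    <email>marge.simpson@example.org</email>
  </person>

  <!-- Does NOT match: title appears after tel instead of before it,
       and there is no email although email+ needs at least one -->
  <person ssn="3333">
    <name>Homer Simpson</name>
    <tel>2543</tel>
    <title>Mr</title>
  </person>
</addressbook>
```

The whole file is still well-formed; only the third person breaks the pattern, so a validating parser would reject the document while a non-validating one would accept it.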
DTD For the Address Book
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
• The content models in parentheses are the regular expressions
• PCDATA means parsed character data
Attributes in a DTD
• XML elements can have attributes.
• General syntax for DTD:
  <!ATTLIST element-name
      attribute-name1 type1 default-value1
      …
      attribute-nameN typeN default-valueN>
• Example: <!ATTLIST person ssn CDATA #REQUIRED>
• CDATA means character data
• Default value could be #REQUIRED or #IMPLIED (meaning optional)
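A minimal sketch of the three common kinds of attribute default, assuming a simplified person element that holds only text. The nickname and status attributes are hypothetical additions for illustration and are not part of the address book DTD. Note that the keywords are written with a leading #.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!-- simplified for this sketch: person contains text only -->
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person
      ssn      CDATA               #REQUIRED
      nickname CDATA               #IMPLIED
      status   (active | inactive) "active">
]>
<addressbook>
  <!-- ssn is required; nickname is optional and supplied here -->
  <person ssn="4444" nickname="Homer">first person</person>
  <!-- nickname omitted; status is not written, so it defaults to "active" -->
  <person ssn="5555">second person</person>
</addressbook>
```

Leaving out ssn on either person would make the document invalid, while leaving out nickname or status is fine.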
Connecting a Document with its DTD
• A DTD can be internal (part of the document file)
  <?xml version="1.0"?>
  <!DOCTYPE db [<!ELEMENT ...> … ]>
  <db> ... </db>
• Or external (the DTD and the document are in different files)
  • A DTD from the local file system:
    <!DOCTYPE db SYSTEM "schema.dtd">
  • A DTD from a remote file system:
    <!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">
Valid Documents
• A document with a DTD is valid if it conforms to the DTD, i.e.,
  • the document conforms to the regular-expression grammar,
  • types of attributes are correct, and
  • constraints on references are satisfied (e.g. every IDREF attribute refers to an existing ID)
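As a sketch of a valid document, the file below combines the address book DTD (as an internal DTD, with the attribute default written #REQUIRED) and a body that conforms to it. A validating parser should accept it; one option, assuming libxml2 is installed, is to run xmllint --valid --noout on the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
<!-- The root element name must match the name given in the DOCTYPE -->
<addressbook>
  <!-- ssn is present; children follow name, title?, tel*, email+ -->
  <person ssn="4444">
    <name>Homer Simpson</name>
    <tel>2543</tel>
    <tel>2544</tel>
    <email>homer@math.springfield.edu</email>
  </person>
</addressbook>
```

Removing the email element or the ssn attribute would leave the file well-formed but no longer valid.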
DTD Problems
• DTDs are rather weak specifications by DB & programming-language standards
• Some limitations:
  • Only one base type: PCDATA
  • Also no constraints, e.g. range of values, exact frequency of occurrence (only ?, * and + are available)
  • Not easily parsed (since they are not XML)
  • Not easy to express that element a has exactly the children c, d, e in any order
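To see why the "any order" case is awkward, here is a sketch of how a DTD would have to spell it out for three children. Declaring c, d and e as text-only elements is an assumption made for the example. The six orderings are grouped by first child so that the content model stays deterministic, as DTD content models are expected to be.

```xml
<!-- Element a must contain c, d and e exactly once each, in any order.
     Every ordering has to be listed explicitly. -->
<!ELEMENT a ( (c, ((d, e) | (e, d)))
            | (d, ((c, e) | (e, c)))
            | (e, ((c, d) | (d, c))) )>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
<!ELEMENT e (#PCDATA)>
```

With four children there would be 24 orderings to write out, so this approach does not scale. XML Schema has the xs:all construct for this situation.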
XML Schema
• DTDs are now being superseded by XML schemas.
• They provide the following features
  • XML syntax
    • So can be parsed, validated with standard XML tools
  • Data types other than #PCDATA
    • There are built-in types such as integer, float, boolean, string and many others
  • Greater control over permitted constructs
    • Can specify maximum and minimum occurrences
    • Can use regular expressions to set patterns to be matched
  • Support for modularity and inheritance
XML Schema continued
• XML Schemas are more precise and therefore more complicated than DTDs
• They were designed to replace DTDs, but DTDs are very well established, and simpler
• http://www.w3schools.com/schema
Schema types
• There are some basic built-in types such as xs:string, xs:decimal, xs:integer, xs:ID
• Each element is composed of either simple types or complex types. A complex type is often a sequence of elements
• The content of the type can be declared as shown in the following example. A type can also be declared, named and referred to.
• Notice the use of minOccurs and maxOccurs. Their default is 1.
Simple Schema Example
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <!-- details of the person element: see "Schema Example Continued" -->
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Slide callouts: "standard stuff", "Top-level element" (xs:schema), "Namespace" (the xmlns:xs attribute).
Namespaces
• You’ll see namespaces when using XML schemas and stylesheets.
• There is a namespace associated with the tags used in each that lets them be used unambiguously.
  • e.g. a schema element, a chemical element
• A namespace is identified by
  • a short prefix, e.g. xs
  • a unique URI (usually written as a URL)
Namespace declaration
• So at the start of a document we must specify what namespaces we are using.
• In the schema example, we are using the XML Schema namespace with the xs prefix
• We declare this namespace in an attribute in the top-level element
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
• We then use the xs prefix in all the XML Schema elements, e.g. complexType, sequence, element etc.
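The "schema element versus chemical element" clash can be sketched in one small document. The lab root element, the chem prefix and the http://example.org/chemistry URI are all hypothetical, chosen only to show two vocabularies that both use the tag name element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Two namespaces are declared on the root, each bound to its own prefix -->
<lab xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:chem="http://example.org/chemistry">
  <!-- "element" from the XML Schema vocabulary -->
  <xs:element name="sample" type="xs:string"/>
  <!-- "element" from the (made-up) chemistry vocabulary -->
  <chem:element symbol="Fe" atomicNumber="26"/>
</lab>
```

The prefix itself is only a local shorthand. It is the URI it is bound to that tells a tool which vocabulary a tag belongs to, so the same schema could use a different prefix as long as it maps to the XML Schema URI.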
Schema Example Continued
Details of the person element
<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:element>
• The declarations ending in /> are empty elements
• A person is a complex type which is a sequence of elements and an attribute
Exercise 1
• Create a schema for the holiday house example. Each home has an id, a name and a location
• Additionally, each home has between one and three sets of contact details. Contact details consist of a name and a phone number, and optionally an email address and website.
Restrictions on elements
• You can also restrict the values of the data to
  • a range
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="120"/>
  • an enumerated list
    <xs:enumeration value="Audi"/>
    <xs:enumeration value="Golf"/>
    <xs:enumeration value="BMW"/>
  • a pattern
    <xs:pattern value="([a-z])*"/>
    • Means 0 or more lowercase alphabetic chars
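The restriction facets above only work inside an xs:simpleType with an xs:restriction that names a base type. The sketch below places each of the three in a complete schema; the element names age, car and letters are assumptions made for the example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- a range: integers from 0 to 120 inclusive -->
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- an enumerated list: only these three strings are allowed -->
  <xs:element name="car">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Audi"/>
        <xs:enumeration value="Golf"/>
        <xs:enumeration value="BMW"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- a pattern: the whole value must match, so "abc" and the empty
       string are accepted while "Abc" and "ab1" are rejected -->
  <xs:element name="letters">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="([a-z])*"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```

Unlike many regular-expression libraries, an XML Schema pattern always applies to the entire value, so no start or end anchors are needed.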
Declaring your own types
• Named types can be used for elements or attributes. Here’s an example which specifies restrictions on the attribute
• A named type is declared
  <xs:simpleType name="ssstype">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>
• And used as the attribute type
  <xs:attribute name="sssNo" type="ssstype" use="required"/>
More complex Schemas
• The previous example shows a simple schema.
• It is also possible to make the schema easier to maintain
  • by declaring all the simple elements first and then referring to them in the body of the document
  • by naming the declaration of simple and complex types, which could then be used later in the document, and more than once if necessary
• See http://www.w3schools.com/Schema/schema_example.asp if you are interested
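One way the two suggestions might look when applied to the people schema is sketched below. The type name personType is an assumption; the element and attribute names come from the earlier example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- simple elements declared once, at the top level -->
  <xs:element name="name"  type="xs:string"/>
  <xs:element name="tel"   type="xs:string"/>
  <xs:element name="email" type="xs:string"/>

  <!-- a named complex type that refers to the elements above -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="tel"/>
      <xs:element ref="email" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>

  <!-- the body of the schema is now short and uses the named type -->
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="personType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

This should accept the same people documents as the nested version. One side effect of the design is that name, tel and email are now top-level declarations, so a document with one of them as its root would also validate.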
Referring to a schema
• Save your schema in a file with the extension xsd.
• Linking a schema definition with a document is done using a special attribute of the root node of the document:
  <people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="people.xsd">
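A sketch of a complete document linked to the schema, assuming the simple schema and its person details have been combined and saved as people.xsd in the same folder. The data values are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="people.xsd">
  <!-- sssNo is declared as xs:integer, so it must not contain spaces -->
  <person sssNo="1234589">
    <name>Lisa Simpson</name>
    <tel>0131-828 1234</tel>
    <email>lisa@macs.hw.ac.uk</email>
  </person>
  <!-- email has minOccurs="0", so it can be left out;
       name and tel default to exactly one occurrence each -->
  <person sssNo="4444">
    <name>Homer Simpson</name>
    <tel>2543</tel>
  </person>
</people>
```

The value of xsi:noNamespaceSchemaLocation is only a hint about where the schema lives. A relative name such as people.xsd is normally resolved against the location of the document itself.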
Validating
• Validators
  • http://www.w3.org/2001/03/webdata/xsv
    • I don’t seem to be able to revalidate with the same filenames
  • http://tools.decisionsoft.com/schemaValidate/
    • No problems, nicer layout
  • Others also on the web
XML: Summary
• XML lets you choose application-specific element names and define special-purpose document types.
• We need a document type definition or schema to define the allowed markup.
• What can we do with our valid document? (next 2 lectures)
Exercise 2
• Alter the schema given in the lecture notes so that there must be between 1 and 4 tel numbers, which must be in the range 1000 to 9999
  • Create a simple type for tel