470 likes | 486 Views
This lecture provides an overview of XML Schemas, including elements, types, and regular expressions. It covers the expressive power of XML Schema and provides resources for further exploration.
E N D
Web Data Management XML Schema
In this lecture • XML Schemas • Elements v. Types • Regular expressions • Expressive power Resources W3C Draft: www.w3.org/TR/2001/REC-xmlschema-1-20010502
XML Schemas • http://www.w3.org/TR/xmlschema-1/10/2000 • generalizes DTDs • uses XML syntax • two documents: structure and datatypes • http://www.w3.org/TR/xmlschema-1 • http://www.w3.org/TR/xmlschema-2 • XML-Schema is very complex • often criticized • some alternative proposals
BookCatalogue.dtd <!ELEMENT BookCatalogue (Book)*> <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)> <!ELEMENT Title (#PCDATA)> <!ELEMENT Author (#PCDATA)> <!ELEMENT Date (#PCDATA)> <!ELEMENT ISBN (#PCDATA)> <!ELEMENT Publisher (#PCDATA)>
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Book" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Book"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Title" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Author" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Date" minOccurs="1" maxOccurs="1"/> <xsd:elementref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:schema> (explanations on succeeding pages) BookCatalogue.xsd xsd = Xml-Schema Definition
<?xml version="1.0"?> <xsd:schemaxmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Book" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Book"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Title" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Author" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Date" minOccurs="1" maxOccurs="1"/> <xsd:elementref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:schema> All XML Schemas have "schema" as the root element. <!ELEMENT BookCatalogue (Book)*> <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)> <!ELEMENT Title (#PCDATA)> <!ELEMENT Author (#PCDATA)> <!ELEMENT Date (#PCDATA)> <!ELEMENT ISBN (#PCDATA)> <!ELEMENT Publisher (#PCDATA)>
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Book" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Book"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Title" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Author" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Date" minOccurs="1" maxOccurs="1"/> <xsd:elementref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:schema> The elements that are used to create an XML Schema come from the XMLSchema namespace
XMLSchema Namespace http://www.w3.org/2000/10/XMLSchema complexType element sequence schema boolean string integer
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Book" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Book"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Title" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Author" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Date" minOccurs="1" maxOccurs="1"/> <xsd:elementref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:schema> Says that the elements declared in this schema (BookCatalogue, Book, Title, Author, Date, ISBN, Publisher) are to go in this namespace
Publishing Namespace (targetNamespace) http://www.publishing.org (targetNamespace) BookCatalogue Author Book Title ISBN Publisher Date
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Book" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Book"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Title" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Author" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Date" minOccurs="1" maxOccurs="1"/> <xsd:elementref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:schema> The default namespace is http://www.publishing.org which is the targetNamespace! This is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier it is referencing the Book element in the default namespace, which is the targetNamespace!
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Book" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Book"> <xsd:complexType> <xsd:sequence> <xsd:elementref="Title" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Author" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Date" minOccurs="1" maxOccurs="1"/> <xsd:elementref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:elementref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:schema> This is a directive to instance documents which use this schema: Any elements used by the instance document which were declared by this schema must be namespace qualified by the namespace specified by targetNamespace.
Referencing a schema in an XML instance document <?xmlversion="1.0"?> <BookCataloguexmlns ="http://www.publishing.org" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation="http://www.publishing.org BookCatalogue.xsd"> <Book> <Title>My Life and Times</Title> <Author>Paul McCartney</Author> <Date>July, 1998</Date> <ISBN>94303-12021-43892</ISBN> <Publisher>McMillin Publishing</Publisher> </Book> ... </BookCatalogue> 1. First, using a default namespace declaration, tell the schema-validator that all of the elements used in this instance document come from the publishing namespace. 2. Second, with schemaLocation tell the schema-validator that the http://www.publishing.org namespace is defined in BookCatalogue.xsd. 3. Third, tell the schema-validator that schemaLocation attribute we are using is the one in the schema instance namespace.
Referencing a schema in an XML instance document targetNamespace="A" schemaLocation="A BookCatalogue.xsd" BookCatalogue.xsd BookCatalogue.xml - uses elements from namespace A - defines elements in namespace A
Note multiple levels of checking BookCatalogue.xml BookCatalogue.xsd XMLSchema.xsd (schema-for-schemas) Validate that the xml document conforms to the rules described in BookCatalogue.xsd Validate that BookCatalogue.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas
Default Value for minOccurs and maxOccurs • The default value for minOccurs is "1" • The default value for maxOccurs is "1" <element ref="Title" minOccurs="1" maxOccurs="1"/> Equivalent! <element ref="Title"/>
Qualify XMLSchema, Default targetNamespace • In the last example, we explicitly qualified all elements from the XML Schema namespace. The targetNamespace was the default namespace. http://www.publishing.org http://www.w3.org/2000/10/XMLSchema complexType BookCatalogue element Author annotation Book documentation Title ISBN Publisher sequence Date schema
Default XMLSchema, Qualify targetNamespace • Alternatively (equivalently), we can design our schema so that XMLSchema is the default namespace. http://www.publishing.org http://www.w3.org/2000/10/XMLSchema complexType BookCatalogue element Author annotation Book documentation Title ISBN Publisher sequence Date schema
<?xmlversion="1.0"?> <schemaxmlns="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns:pub="http://www.publishing.org" elementFormDefault="qualified"> <elementname="BookCatalogue"> <complexType> <sequence> <elementref="pub:Book" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> <elementname="Book"> <complexType> <sequence> <elementref="pub:Title"/> <elementref="pub:Author"/> <elementref="pub:Date"/> <elementref="pub:ISBN"/> <elementref="pub:Publisher"/> </sequence> </complexType> </element> <elementname="Title" type="string"/> <elementname="Author" type="string"/> <elementname="Date" type="string"/> <elementname="ISBN" type="string"/> <elementname="Publisher" type="string"/> </schema> Here we are referencing a Book element. Where is that Book element defined? In what namespace? The pub: prefix indicates what namespace this element is in. pub: has been set to be the same as the targetNamespace.
"pub" References the targetNamespace http://www.publishing.org (targetNamespace) http://www.w3.org/2000/10/XMLSchema complexType BookCatalogue element Author annotation Book documentation Title ISBN Publisher sequence Date schema pub
Alternate Schema • In the previous examples we declared an element and then we ref’ed that element declaration. Instead, we can inline the element declarations. • On the following slide is an alternate (equivalent) way of representing the schema shown previously, using inlined element declarations.
Elements Declared Inline <?xmlversion="1.0"?> <xsd:schemaxmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementname="Book" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>
<?xmlversion="1.0"?> <xsd:schemaxmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementname="Book" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> Anonymous types (no name)
Named Types • The following slide shows an alternate (equivalent) schema which uses a named type.
Named Types <?xmlversion="1.0"?> <xsd:schemaxmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org" elementFormDefault="qualified"> <xsd:elementname="BookCatalogue"> <xsd:complexType> <xsd:sequence> <xsd:elementname="Book" type="CardCatalogueEntry" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:complexTypename="CardCatalogueEntry"> <xsd:sequence> <xsd:elementname="Title" type="xsd:string"/> <xsd:elementname="Author" type="xsd:string"/> <xsd:elementname="Date" type="xsd:string"/> <xsd:elementname="ISBN" type="xsd:string"/> <xsd:elementname="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:schema> Named type
Please note that <xsd:elementname="A" type="foo"/> <xsd:complexTypename="foo"> <xsd:sequence> <xsd:elementname="B" …/> <xsd:elementname="C" …/> </xsd:sequence> </xsd:complexType> is equivalent to <xsd:elementname="A"> <xsd:complexType> <xsd:sequence> <xsd:elementname="B" …/> <xsd:elementname="C" …/> </xsd:sequence> </xsd:complexType> </xsd:element> Element A references the complexType foo. Element A has the complexType definition inlined in the element declaration.
type Attribute or complexType Child Element, but not Both! • An element declaration can have a type attribute, or a complexType child element, but it cannot have both a type attribute and a complexType child element. <xsd:element name="A" type="foo"> <xsd:complexType> … </xsd:complexType> </xsd:element>
Summary of Declaring Elements (two ways to do it) 1 <xsd:element name="name" type="type" minOccurs="int" maxOccurs="int"/> A simple type (e.g., xsd:string) or the name of a complexType A nonnegative integer A nonnegative integer or "unbounded" Note: minOccurs and maxOccurs can only be used in nested (local) element declarations. <xsd:element name="name" minOccurs="int" maxOccurs="int"> <xsd:complexType> … </xsd:complexType> </xsd:element> 2
XML Schemas <xsd:elementname=“paper” type=“papertype”/> <xsd:complexTypename=“papertype”> <xsd:sequence> <xsd:elementname=“title” type=“xsd:string”/> <xsd:elementname=“author” minOccurs=“0”/> <xsd:elementname=“year”/> <xsd: choice> < xsd:elementname=“journal”/> <xsd:elementname=“conference”/> </xsd:choice> </xsd:sequence> </xsd:element> DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>
Elements v.s. Types in XML Schema <xsd:elementname=“person”> <xsd:complexType> <xsd:sequence> <xsd:elementname=“name” type=“xsd:string”/> <xsd:elementname=“address”type=“xsd:string”/> </xsd:sequence> </xsd:complexType></xsd:element> <xsd:elementname=“person”type=“ttt”/><xsd:complexTypename=“ttt”> <xsd:sequence> <xsd:elementname=“name” type=“xsd:string”/> <xsd:elementname=“address”type=“xsd:string”/> </xsd:sequence></xsd:complexType> DTD: <!ELEMENT person (name,address)>
Elements v.s. Types in XML Schema • Types: • Simple types (integers, strings, ...) • Complex types (regular expressions, like in DTDs) • Element-type-element alternation: • Root element has a complex type • That type is a regular expression of elements • Those elements have their complex types... • ... • On the leaves we have simple types
Local and Global Types in XML Schema • Local type: <xsd:elementname=“person”> [define locally the person’s type] </xsd:element> • Global type: <xsd:elementname=“person” type=“ttt”/> <xsd:complexTypename=“ttt”> [define here the type ttt] </xsd:complexType> Global types: can be reused in other elements
Local v.s. Global Elements inXML Schema • Local element: <xsd:complexTypename=“ttt”> <xsd:sequence> <xsd:elementname=“address” type=“...”/>... </xsd:sequence> </xsd:complexType> • Global element: <xsd:elementname=“address” type=“...”/> <xsd:complexTypename=“ttt”> <xsd:sequence><xsd:elementref=“address”/> ... </xsd:sequence> </xsd:complexType> Global elements: like in DTDs
Regular Expressions in XML Schema Recall the element-type-element alternation: <xsd:complexTypename=“....”> [regular expression on elements] </xsd:complexType> Regular expressions: • <xsd:sequence> A B C </...> = A B C • <xsd:choice> A B C </...> = A | B | C • <xsd:group> A B C </...> = (A B C) • <xsd:...minOccurs=“0”maxOccurs=“unbounded”> ..</...> = (...)* • <xsd:...minOccurs=“0”maxOccurs=“1”> ..</...> = (...)?
Local Names in XML-Schema <xsd:elementname=“person”> <xsd:complexType> . . . . . <xsd:elementname=“name”> <xsd:complexType> <xsd:sequence> <xsd:elementname=“firstname” type=“xsd:string”/> <xsd:elementname=“lastname” type=“xsd:string”/> </xsd:sequence> </xsd:element> . . . . </xsd:complexType></xsd:element> <xsd:elementname=“product”> <xsd:complexType> . . . . . <xsd:elementname=“name” type=“xsd:string”/> </xsd:complexType></xsd:element> name has different meanings in person and in product
Subtle Use of Local Names <xsd:complexTypename=“oneB”> <xsd:choice> <xsd:elementname=“B” type=“xsd:string”/> <xsd:sequence> <xsd:elementname=“A” type=“onlyAs”/> <xsd:elementname=“A” type=“oneB”/> </xsd:sequence> <xsd:sequence> <xsd:elementname=“A” type=“oneB”/> <xsd:element name=“A” type=“onlyAs”/> </xsd:sequence> </xsd:choice></xsd:complexType> <xsd:elementname=“A” type=“oneB”/> <xsd:complexTypename=“onlyAs”> <xsd:choice> <xsd:sequence> <xsd:elementname=“A” type=“onlyAs”/> <xsd:elementname=“A” type=“onlyAs”/> </xsd:sequence> <xsd:elementname=“A” type=“xsd:string”/> </xsd:choice></xsd:complexType> Arbitrary deep binary tree with A elements, and a single B element
Attributes in XML Schema <xsd:elementname=“paper” type=“papertype”/> <xsd:complexTypename=“papertype”> <xsd:sequence> <xsd:elementname=“title” type=“xsd:string”/> . . . . . . </xsd:sequence> <xsd:attributename=“language" type="xsd:NMTOKEN" fixed=“English"/> </xsd:complexType> • Attributes are associated to the type, not to the element • Only to complex types; more trouble if we want to add attributes to simple types.
“Mixed” Content, “Any” Type <xsd:complexTypemixed="true"> . . . . • Better than in DTDs: can still enforce the type, but now may have text between any elements • Means anything is permitted there <xsd:elementname="anything" type="xsd:anyType"/> . . . .
“All” Group <xsd:complexTypename="PurchaseOrderType"> <xsd:all> <xsd:elementname="shipTo" type="USAddress"/> <xsd:elementname="billTo" type="USAddress"/> <xsd:elementref="comment" minOccurs="0"/> <xsd:elementname="items" type="Items"/> </xsd:all> <xsd:attributename="orderDate" type="xsd:date"/> </xsd:complexType> • A restricted form of & in SGML • Restrictions: • Only at top level • Has only elements • Each element occurs at most once • E.g. “comment” occurs 0 or 1 times
Derived Types by Extensions <complexTypename="Address"> <sequence> <elementname="street" type="string"/> <elementname="city" type="string"/> </sequence> </complexType> <complexTypename="USAddress"> <complexContent> <extensionbase="ipo:Address"> <sequence> <elementname="state" type="ipo:USState"/> <elementname="zip" type="positiveInteger"/> </sequence> </extension> </complexContent> </complexType> Corresponds to inheritance
Derived Types by Restrictions • (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions… <complexContent> <restrictionbase="ipo:Items“> … [rewrite the entire content, with restrictions]... </restriction> </complexContent> Corresponds to set inclusion
String Token Byte unsignedByte Integer positiveInteger Int (larger than integer) unsignedInt Long Short ... Time dateTime Duration Date ID IDREF IDREFS Simple Types
Examples length minLength maxLength pattern enumeration whiteSpace maxInclusive maxExclusive minInclusive minExclusive totalDigits fractionDigits Facets of Simple Types • Facets = additional properties restricting a simple type • 15 facets defined by XML Schema
Facets of Simple Types • Can further restrict a simple type by changing some facets • Restriction = subset
Not so Simple Types • List types: • Union types • Restriction types <xsd:simpleTypename="listOfMyIntType"> <xsd:listitemType="myInteger"/> </xsd:simpleType> <listOfMyInt>20003 15037 95977 95945</listOfMyInt>
Summary of XML Schema • Formal Expressive Power: • Can express precisely the regular tree languages (over unranked trees) • Lots of other stuff • Some form of inheritance • A “null” value • Large collection of data types
Summary of Schemas • in SS data: • graph theoretic • data and schema are decoupled • used in data processing • in XML • from grammar to object-oriented • schema wired with the data • emphasis on semantics for exchange