Schemas
Deitel, XML, Chapter 7
Peltzer, XML Language Mechanics and Applications (Addison Wesley), Chapter 4, has much more on W3C schemas.
Schemas vs DTD
• DTDs are inherited from SGML (Standard Generalized Markup Language).
• Because DTDs are not XML, they can't be manipulated the way XML documents can (searched, or transformed into another format such as HTML).
• Schemas are XML. Like DTDs, schemas need a validating parser to check that a document conforms to them.
• Schemas themselves conform to DTDs, which are bundled with the parser.
• Repositories of existing DTDs and schemas are available for free download.
Schemas
• DTDs define document structure, not content: although <value>5</value> contains legal PCDATA, a DTD can't ensure that the content is numeric.
• Markup like <value>Hello Bob</value> is equally valid PCDATA. The application using the document would have to test whether value is numeric itself and take appropriate action if the test failed.
• Schemas are XML documents conforming to DTDs and must be validated to be processed. Schemas do not use EBNF; they use XML syntax.
• Schemas can be manipulated (e.g., searched, or elements added or removed) like any other XML document.
• W3C XML Schema is barely covered in Deitel's book, which covers only MS Schema. Many W3C schema examples are in Peltzer, XML Language Mechanics and Applications (Addison Wesley).
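To make the contrast concrete, here are two separate fragments, shown together only for comparison, declaring the same value element. The DTD can only say the element holds character data; the schema version, assuming we want a decimal number, lets a validating parser accept <value>5</value> and reject <value>Hello Bob</value>.

```xml
<!-- DTD fragment: any character data is accepted -->
<!ELEMENT value (#PCDATA)>

<!-- W3C Schema fragment: content must be a lexically valid decimal -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="value" type="xs:decimal"/>
</xs:schema>
```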
Schemas
• Schemas view an XML document as a collection of datatypes.
• DTDs view an XML document as a single entity.
• The W3C 2001 schema specification lists 44 built-in datatypes: 19 primitive types and 25 built-in derived types.
• User-derived types and built-in derived types are both defined with simpleType definitions, which restrict the kind of data that can appear as an attribute value or as the content of a text-only element.
• A schema datatype has three components: a value space, a lexical space, and a set of facets.
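As a sketch of the simpleType point above, one named type can constrain both an attribute value and the content of a text-only element. The names gradeType, grade, course and finalGrade are hypothetical, chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical user-derived type: a single letter A to F -->
  <xs:simpleType name="gradeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-F]"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- hypothetical text-only element whose content uses the type -->
  <xs:element name="grade" type="gradeType"/>

  <!-- hypothetical element whose attribute value uses the same type -->
  <xs:element name="course">
    <xs:complexType>
      <xs:attribute name="finalGrade" type="gradeType"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```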
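Examples
• In <book>learning XML</book>, with book typed as xs:string, the value space and the lexical space coincide: both are sets of character strings, and the value here is simply the string "learning XML".
• In <number>123</number>, with number typed as xs:integer, the value space is the set of integers, while the lexical space is the set of digit strings (optionally signed) that denote them. A facet could further restrict the lexical representation, for example to a specified number of digit characters.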
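A sketch of why the distinction matters: assuming number is declared as xs:integer, the two separate instance documents below use different lexical forms for the same value, 123. A schema-aware processor treats them as equal values, although as plain strings they differ.

```xml
<!-- schema: number holds an xs:integer -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="number" type="xs:integer"/>
</xs:schema>

<!-- two separate instances: distinct lexical forms, one value -->
<number>123</number>
<number>+0123</number>
```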
A simple schema for Author (saved as .xsd)

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Author">
        <xs:annotation>
          <xs:documentation>
          </xs:documentation>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
            <xs:element name="Address" type="xs:string"/>
            <xs:element name="City" type="xs:string"/>
            <xs:element name="State" type="xs:string"/>
            <xs:element name="Zip" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
And an instance of the Author schema

    <?xml version="1.0" encoding="UTF-8"?>
    <Author>
      <Name>Dwight Peltzer</Name>
      <Address>PO Box 555</Address>
      <City>Oyster Bay</City>
      <State>NY</State>
      <Zip>11771</Zip>
    </Author>
Elements make up XML documents
• In MS Schema, ElementType defines an element. Its attributes describe the element's content, data type, name and so on.
• MSXML (Microsoft's validating parser) is part of IE5 and is needed to process MS Schema.
• Element Schema is the root element of every MS Schema document.
Facets
• Facets define the value space and properties of a given data type. There are two kinds: fundamental and non-fundamental (constraining) facets.
• Fundamental facets define the abstract properties of a type.
• Non-fundamental facets impose restrictions on a type, for example by limiting its range of values.
Fundamental facets
• Equal: values of the type can be compared for equality.
• Ordered: values can be placed in a predefined ordering.
• Bounded: the value space can have a lower and an upper limit.
• Cardinality: the value space is either finite or countably infinite. This is not the same thing as the occurrence constraints on elements, such as minOccurs="0" maxOccurs="unbounded", which play the role of "*" in DTDs (minOccurs="1" maxOccurs="unbounded" corresponds to "+").
• Numeric: a type is classified as numeric or non-numeric, as in numeric(value="true") or numeric(value="false").
• Example of a constraining facet: we might define isbn to consist of exactly 10 digit characters:

    <xs:simpleType name="isbnType">
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
      </xs:restriction>
    </xs:simpleType>

• Note: this is not precisely how ISBNs are defined, since the last character of a 10-character ISBN is a check digit and may be the letter X.
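A closer sketch of the ISBN-10 shape, assuming hyphens are stripped before validation: nine digits followed by a digit or X. The type name isbn10Type is hypothetical. The pattern checks only the shape; the pattern facet cannot compute the check digit, so an application would still have to verify it.

```xml
<!-- hypothetical type name -->
<xs:simpleType name="isbn10Type">
  <xs:restriction base="xs:string">
    <!-- 9 digits, then a check character that is a digit or X -->
    <xs:pattern value="[0-9]{9}[0-9X]"/>
  </xs:restriction>
</xs:simpleType>
```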
Euros: at most 10 digits in total, at most 2 of them after the decimal point

    <xs:element name="Euros">
      <!-- anonymous type: a local simpleType cannot carry a name attribute -->
      <xs:simpleType>
        <xs:restriction base="xs:decimal">
          <xs:totalDigits value="10"/>
          <xs:fractionDigits value="2"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

A document instance: <Euros>55.63</Euros>
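Because fractionDigits sets only a maximum, 55.6 and 55 are also valid Euros values. If exactly two decimal places are required, one option, sketched here as an assumption rather than the book's approach, is to add a pattern facet on the lexical form:

```xml
<xs:element name="Euros">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="10"/>
      <xs:fractionDigits value="2"/>
      <!-- assumed extra rule: exactly two digits after the point -->
      <xs:pattern value="-?[0-9]{1,8}\.[0-9]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

With this, <Euros>55.60</Euros> is valid and <Euros>55.6</Euros> is not.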
Derived user types: use simpleType definitions and one of three methods: restriction, list and union
• Restriction uses one or more constraining facets to restrict the value or lexical space of the base type. A nine-digit postal code (a ZIP+4 written without its hyphen) might use:

    <xsd:simpleType name="zipType">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="[0-9]{9}"/>
      </xsd:restriction>
    </xsd:simpleType>
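If five-digit ZIP codes and the hyphenated ZIP+4 form should be accepted as well, the pattern can be widened. A sketch, with the hypothetical type name usZipType:

```xml
<!-- hypothetical type name -->
<xsd:simpleType name="usZipType">
  <xsd:restriction base="xsd:string">
    <!-- accepts 12345, 123456789 or 12345-6789 -->
    <xsd:pattern value="[0-9]{5}(-?[0-9]{4})?"/>
  </xsd:restriction>
</xsd:simpleType>
```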
Derived user types: list
• List derives a new type whose values are whitespace-separated sequences of values of an item type. A whitespace-delimited list of decimal values for a lottery might be:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:simpleType name="MyWinningNumbers">
        <xs:list itemType="xs:decimal"/>
      </xs:simpleType>
    </xs:schema>

With a document instance of:

    <numbers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:type="MyWinningNumbers">94 33 12 76</numbers>
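A list type can itself be restricted. Applied to a list, the length facet counts items rather than characters, so the sketch below (FourNumbers is a hypothetical name) requires exactly four numbers: 94 33 12 76 passes, 94 33 12 does not.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="MyWinningNumbers">
    <xs:list itemType="xs:decimal"/>
  </xs:simpleType>

  <!-- hypothetical restriction: exactly four items in the list -->
  <xs:simpleType name="FourNumbers">
    <xs:restriction base="MyWinningNumbers">
      <xs:length value="4"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="numbers" type="FourNumbers"/>
</xs:schema>
```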
Derived user types: union
• Union creates a datatype derived from more than one base type; each member type participates in the union.

    <xsd:simpleType name="UnionDemo">
      <xsd:union memberTypes="AType BType"/>
    </xsd:simpleType>

Here AType and BType could be any simple types. This makes it possible, e.g., to write a month either as a name or as a number, so that both <Month>Jan</Month> and <Month>1</Month> are valid. (A value like Jan Feb Mar would need a list type, unless one member type is xs:string, which accepts any text.)
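A concrete sketch of the month example, with hypothetical type names: one member type enumerates month abbreviations, the other restricts integers to 1 through 12. A validator tries the member types in order and accepts the value if any of them matches.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical member type: month abbreviations -->
  <xsd:simpleType name="monthName">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Jan"/> <xsd:enumeration value="Feb"/>
      <xsd:enumeration value="Mar"/> <xsd:enumeration value="Apr"/>
      <xsd:enumeration value="May"/> <xsd:enumeration value="Jun"/>
      <xsd:enumeration value="Jul"/> <xsd:enumeration value="Aug"/>
      <xsd:enumeration value="Sep"/> <xsd:enumeration value="Oct"/>
      <xsd:enumeration value="Nov"/> <xsd:enumeration value="Dec"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- hypothetical member type: month numbers 1 to 12 -->
  <xsd:simpleType name="monthNumber">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="12"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- a Month is either a name or a number -->
  <xsd:simpleType name="monthType">
    <xsd:union memberTypes="monthName monthNumber"/>
  </xsd:simpleType>

  <xsd:element name="Month" type="monthType"/>
</xsd:schema>
```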
A complex type (see the next slide for a discussion of sequence)

    <xs:complexType>
      <xs:sequence>
        <xs:element name="Author" type="xs:string"/>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Address" type="xs:string"/>
        <xs:element name="City" type="xs:string"/>
        <xs:element name="State" type="xs:string"/>
        <xs:element name="Zip" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
Compositors
• Compositors let us specify:
• The sequential order of elements (sequence)
• A choice among elements (choice)
• No fixed order (all): the child elements may appear in any order, each at most once
• The previous slide used sequence.
Choice
• We might use, as part of a schema, a choice in which exactly one of the alternatives appears:

    <xs:choice>
      <xs:element name="creditcard" type="xs:string"/>
      <xs:element name="cash" type="xs:decimal"/>
      <xs:element name="trade" type="xs:string"/>
    </xs:choice>
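A choice has to sit inside a complex type. A minimal sketch, with the hypothetical element name payment, plus an instance that selects the cash branch:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical element name -->
  <xs:element name="payment">
    <xs:complexType>
      <!-- exactly one of the three children must appear -->
      <xs:choice>
        <xs:element name="creditcard" type="xs:string"/>
        <xs:element name="cash" type="xs:decimal"/>
        <xs:element name="trade" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- instance: valid; adding a second child such as <trade> would not be -->
<payment><cash>19.95</cash></payment>
```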
All
• The all compositor lets the listed child elements appear in any order. Each may appear at most once, and an element with minOccurs="0" may be left out.

    <xs:element name="FamilyName">
      <xs:complexType>
        <xs:all>
          <xs:element name="firstName" type="xs:string"/>
          <xs:element name="middleName" type="xs:string"/>
          <xs:element name="lastName" type="xs:string" minOccurs="0"/>
        </xs:all>
      </xs:complexType>
    </xs:element>
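Given that declaration, both instances below (with sample values chosen only for illustration) should validate: the order differs, and lastName is optional. Repeating an element, for example two firstName children, would not be allowed.

```xml
<!-- sample values, for illustration only -->
<FamilyName>
  <firstName>Jane</firstName>
  <middleName>Q</middleName>
  <lastName>Doe</lastName>
</FamilyName>

<!-- different order, optional lastName omitted -->
<FamilyName>
  <middleName>Q</middleName>
  <firstName>Jane</firstName>
</FamilyName>
```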
Namespaces
• The prefixes xsd and xs are used interchangeably.
• xs is the conventional prefix for XSD schemas; any prefix works as long as it is bound to the schema namespace URI.
• There are three distinct namespaces:
• The XML Schema namespace
• The XML Schema datatypes namespace
• The XML Schema instance namespace
• An example of the first appeared above as: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
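For reference, the three W3C 2001 namespace URIs are listed in the comment below, followed by a sketch showing that the prefix is arbitrary: this schema, written with xsd instead of xs, declares the same element as one written with xs.

```xml
<!-- schema namespace:    http://www.w3.org/2001/XMLSchema
     datatypes namespace: http://www.w3.org/2001/XMLSchema-datatypes
     instance namespace:  http://www.w3.org/2001/XMLSchema-instance -->

<!-- same meaning as <xs:element name="Name" type="xs:string"/> -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Name" type="xsd:string"/>
</xsd:schema>
```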
Global elements: elements declared as direct children of xs:schema are global (below, Name, Address and Author); elements declared inside a complexType are local (City and State).

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Address" type="xs:string"/>
      <xs:element name="Author">
        <!-- Author is global too; declarations inside its complexType are local -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="City" type="xs:string"/>
            <xs:element name="State" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
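Global declarations are what make reuse possible: another declaration can pull in a global element with ref. A sketch of a drop-in replacement for the Author declaration in the schema above, assuming we want Author to contain the global Name and Address as well as its local City and State:

```xml
<xs:element name="Author">
  <xs:complexType>
    <xs:sequence>
      <!-- references to the global declarations -->
      <xs:element ref="Name"/>
      <xs:element ref="Address"/>
      <!-- local declarations, usable only here -->
      <xs:element name="City" type="xs:string"/>
      <xs:element name="State" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```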
Global elements: document instance

    <Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="global.xsd">
      <City>String</City>
      <State>String</State>
    </Author>
Using a target namespace for your document: declaring the namespace as the document's default namespace places Author and its children in it, which binds the document to the schema whose targetNamespace is that URI.

    <?xml version="1.0" encoding="UTF-8"?>
    <Author xmlns="http://employees.oneonta.edu/higgindm/Authors">
      <Name>Dwight Peltzer</Name>
      <Address>PO Box 555</Address>
      <City>Oyster Bay</City>
      <State>NY</State>
      <Zip>11771</Zip>
      <Publisher>
        <Name>Addison Wesley</Name>
        <City>Boston</City>
        <State>Massachusetts</State>
      </Publisher>
      <BookTitle>XML Language Mechanics</BookTitle>
      <ISBN>0-1-23458-0</ISBN>
    </Author>
The target namespace for your document may be omitted. The declared elements and attributes then belong to no namespace, and the document uses built-in types and unqualified names. You are then prevented from reusing locally declared elements.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Author">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
            <xs:element name="Address" type="xs:string"/>
            <xs:element name="City" type="xs:string"/>
            <xs:element name="State" type="xs:string"/>
            <!-- string, not xs:short: ZIP codes can exceed 32767 and keep leading zeros -->
            <xs:element name="Zip" type="xs:string"/>
            <!-- Publisher has child elements, as in the instance document -->
            <xs:element name="Publisher">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Name" type="xs:string"/>
                  <xs:element name="City" type="xs:string"/>
                  <xs:element name="State" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element name="BookTitle" type="xs:string"/>
            <xs:element name="ISBN" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
The document instance would then have no namespace:

    <?xml version="1.0" encoding="UTF-8"?>
    <Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="Author.xsd">
      <Name>Dwight Peltzer</Name>
      <Address>PO Box 555</Address>
      <City>Oyster Bay</City>
      <State>NY</State>
      <Zip>11771</Zip>
      <Publisher>
        <Name>Addison Wesley</Name>
        <City>Boston</City>
        <State>Massachusetts</State>
      </Publisher>
      <BookTitle>XML Language Mechanics</BookTitle>
      <ISBN>0-1-23458-0</ISBN>
    </Author>
Adding the target namespace to the schema root defines a namespace for your user-defined declarations

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.dpsoftware.com/namespaces/Author"
               xmlns="http://www.dpsoftware.com/namespaces/Author">
      <xs:element name="Author">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
            <xs:element name="Address" type="xs:string"/>
            <xs:element name="City" type="xs:string"/>
            <xs:element name="State" type="xs:string"/>
            <xs:element name="Zip" type="xs:string"/>
            <xs:element name="Publisher" type="xs:string"/>
            <xs:element name="BookTitle" type="xs:string"/>
            <xs:element name="ISBN" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
An example document: only the root is prefixed, because the schema leaves elementFormDefault at its default (unqualified), so the locally declared children belong to no namespace.

    <?xml version="1.0" encoding="UTF-8"?>
    <dp:Author xmlns:dp="http://www.dpsoftware.com/namespaces/Author"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.dpsoftware.com/namespaces/Author AuthorV1.xsd">
      <Name>Dwight Peltzer</Name>
      <Address>PO Box 555</Address>
      <City>Oyster Bay</City>
      <State>NY</State>
      <Zip>11771</Zip>
      <Publisher>Addison Wesley</Publisher>
      <BookTitle>XML Language Mechanics</BookTitle>
      <ISBN>0-1-23458-0</ISBN>
    </dp:Author>
If the schema sets elementFormDefault="qualified", the namespace prefix must be used to qualify every element in the document:

    <?xml version="1.0" encoding="UTF-8"?>
    <dp:Author xmlns:dp="http://www.dpsoftware.com/namespaces/Author"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.dpsoftware.com/namespaces/Author Author.xsd">
      <dp:Name>Dwight Peltzer</dp:Name>
      <dp:Address>PO Box 555</dp:Address>
      <dp:City>Oyster Bay</dp:City>
      <dp:State>NY</dp:State>
      <dp:Zip>11771</dp:Zip>
      <dp:Publisher>Addison Wesley</dp:Publisher>
      <dp:BookTitle>XML Language Mechanics</dp:BookTitle>
      <dp:ISBN>0-1-23458-0</dp:ISBN>
    </dp:Author>
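For a fully prefixed instance to validate, the target-namespace schema shown earlier needs one extra attribute on its root. A sketch of that root element; the declarations inside stay as they were:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.dpsoftware.com/namespaces/Author"
           xmlns="http://www.dpsoftware.com/namespaces/Author"
           elementFormDefault="qualified">
  <!-- Author and its local declarations unchanged -->
</xs:schema>
```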
Russian doll model: the schema mirrors the document's nesting, with every other declaration nested inside the single global element.

    <Book>
      <Title>XML</Title>
      <Author>Dwight Peltzer</Author>
    </Book>

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Book">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Title" type="xs:string"/>
            <xs:element name="Author" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
Salami slice model: every element is declared globally, and the content model is reassembled with ref.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="xs:string"/>
      <xs:element name="Book">
        <xs:complexType>
          <xs:sequence>
            <!-- reassemble Title and Author -->
            <xs:element ref="Title"/>
            <xs:element ref="Author"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
Venetian blind model
• The Venetian blind model declares named global types and uses them for local elements. elementFormDefault and attributeFormDefault then act as switches, hiding or exposing namespaces in the document instance.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xs:element name="Employer">
        <xs:annotation>
          <xs:documentation>Comment describing your root element</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:complexType name="employeeType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="contact" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="employeeTypeExt">
        <xs:complexContent>
          <xs:extension base="employeeType">
            <xs:sequence>
              <xs:element name="empName" type="employeeType"/>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>
      <xs:element name="employee" type="employeeTypeExt"/>
    </xs:schema>
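The switch only has a visible effect when the schema has a target namespace. As a sketch, suppose the schema above also declared targetNamespace="http://example.com/emp" (a hypothetical URI) and bound it as the default namespace. The two instances below show what employee would look like under each setting of elementFormDefault; the sample values are illustrative.

```xml
<!-- assumes targetNamespace="http://example.com/emp" (hypothetical) -->

<!-- elementFormDefault="qualified": local elements are in the namespace -->
<emp:employee xmlns:emp="http://example.com/emp">
  <emp:name>Jane Doe</emp:name>
  <emp:contact>617-555-1234</emp:contact>
  <emp:empName>
    <emp:name>Jane Doe</emp:name>
    <emp:contact>617-555-1234</emp:contact>
  </emp:empName>
</emp:employee>

<!-- elementFormDefault="unqualified": only the global root is qualified -->
<emp:employee xmlns:emp="http://example.com/emp">
  <name>Jane Doe</name>
  <contact>617-555-1234</contact>
  <empName>
    <name>Jane Doe</name>
    <contact>617-555-1234</contact>
  </empName>
</emp:employee>
```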
All named types in this schema fragment (Title, Name and Editor) are reusable; Book, for instance, takes its content model from Editor.

    <xs:simpleType name="Title">
      <xs:restriction base="xs:string">
        <xs:enumeration value="Sci_Fi"/>
        <xs:enumeration value="Information Systems"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="Name">
      <xs:restriction base="xs:string">
        <xs:minLength value="1"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:complexType name="Editor">
      <xs:sequence>
        <xs:element name="Title" type="Title"/>
        <xs:element name="Author" type="Name"/>
      </xs:sequence>
    </xs:complexType>
    <xs:element name="Book" type="Editor"/>
ContentModel template: define a named complex type once, then use the type attribute to reference it from element declarations.

    <xs:complexType name="nameType">
      <xs:sequence>
        <xs:element ref="firstName"/>
        <xs:element ref="middleName"/>
        <xs:element ref="lastName"/>
      </xs:sequence>
    </xs:complexType>
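The slide shows only the template. A sketch of a complete schema that uses it, assuming firstName, middleName and lastName are declared globally (ref requires that) and using two hypothetical elements, author and editor, that share the content model through type="nameType":

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- global declarations that the refs point to -->
  <xs:element name="firstName" type="xs:string"/>
  <xs:element name="middleName" type="xs:string"/>
  <xs:element name="lastName" type="xs:string"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element ref="firstName"/>
      <xs:element ref="middleName"/>
      <xs:element ref="lastName"/>
    </xs:sequence>
  </xs:complexType>

  <!-- hypothetical elements reusing the same content model -->
  <xs:element name="author" type="nameType"/>
  <xs:element name="editor" type="nameType"/>
</xs:schema>
```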
An MS Schema

    <?xml version = "1.0"?>
    <!-- intro-schema.xml -->
    <!-- Microsoft XML Schema showing the ElementType -->
    <Schema xmlns = "urn:schemas-microsoft-com:xml-data">
      <ElementType name = "message" content = "textOnly" model = "closed">
        <description>Text messages</description>
      </ElementType>
      <ElementType name = "greeting" model = "closed" content = "mixed" order = "many">
        <element type = "message"/>
      </ElementType>
      <ElementType name = "myMessage" model = "closed" content = "eltOnly" order = "seq">
        <element type = "greeting" minOccurs = "0" maxOccurs = "1"/>
        <element type = "message" minOccurs = "1" maxOccurs = "*"/>
      </ElementType>
    </Schema>
Schema elements
• xmlns specifies the default namespace for the Schema element and the elements it contains.
• The attribute value urn:... is the URI for this namespace.
• Microsoft's XML parser recognizes element Schema and this namespace and validates the schema.
• Element Schema can contain only ElementType elements, for defining elements; AttributeType elements, for their attributes; and description elements, for describing the element.
• This example specifies that element message may contain only text (content = "textOnly").
• The closed model attribute specifies that only elements declared in this schema may appear in conforming XML documents; anything else invalidates the document.
• Element greeting has mixed content, meaning that both elements and character data may appear in it. order = "many" indicates that greeting may contain any number of message elements and text, in any order.
A conforming XML document

    <?xml version = "1.0"?>
    <!-- Fig. 7.2 : intro.xml -->
    <!-- Introduction to Microsoft XML Schema -->
    <myMessage xmlns = "x-schema:intro-schema.xml">
      <greeting>Welcome to XML Schema!
        <message>This is the first message.</message>
      </greeting>
      <message>This is the second message.</message>
    </myMessage>
A well-formed but non-conforming XML document: message is declared textOnly, so the nested message element makes the document invalid.

    <?xml version = "1.0"?>
    <!-- Fig. 7.3 : intro2.xml -->
    <!-- An invalid document -->
    <myMessage xmlns = "x-schema:intro-schema.xml">
      <greeting>Welcome to XML Schema!</greeting>
      <message>This is a message that contains another message.
        <message>This is the inner message.</message>
      </message>
    </myMessage>
Namespaces and declaring schema

    <myMessage xmlns = "x-schema:intro-schema.xml">

• The namespace declaration xmlns="…" references the schema being used.
• For MS Schema, the URI must begin with x-schema, followed by a colon and the name of the schema document.
• Element greeting may have mixed content; in this example, greeting marks up text and has a child message element.
Element attributes
• ElementType has the attributes content, dt:type, name, model and order.
• ElementType's child elements are description, datatype, element, group, AttributeType and attribute.
• Element element has the attributes type, minOccurs and maxOccurs.
• Element group has the attributes order, minOccurs and maxOccurs.
• Element AttributeType has the attributes default, dt:type, dt:values, name and required.
• Element attribute has the attributes default, type and required.
An example of AttributeType and attribute

    <?xml version = "1.0"?>
    <!-- contact-schema.xml -->
    <!-- Defining attributes -->
    <Schema xmlns = "urn:schemas-microsoft-com:xml-data">
      <ElementType name = "contact" content = "eltOnly" order = "seq" model = "closed">
        <AttributeType name = "owner" required = "yes"/>
        <attribute type = "owner"/>
        <element type = "name"/>
        <element type = "address1"/>
        <element type = "address2" minOccurs = "0" maxOccurs = "1"/>
        <element type = "city"/>
        <element type = "state"/>
        <element type = "zip"/>
        <element type = "phone" minOccurs = "0" maxOccurs = "*"/>
      </ElementType>
An example of AttributeType and attribute (part 2)

      <ElementType name = "name" content = "textOnly" model = "closed"/>
      <ElementType name = "address1" content = "textOnly" model = "closed"/>
      <ElementType name = "address2" content = "textOnly" model = "closed"/>
      <ElementType name = "city" content = "textOnly" model = "closed"/>
      <ElementType name = "state" content = "textOnly" model = "closed"/>
      <ElementType name = "zip" content = "textOnly" model = "closed"/>
      <ElementType name = "phone" content = "textOnly" model = "closed">
        <AttributeType name = "location" default = "home"/>
        <attribute type = "location"/>
      </ElementType>
    </Schema>
A conforming XML document

    <?xml version = "1.0"?>
    <!-- Fig. 7.11 : contact.xml -->
    <!-- A contact list marked up as XML -->
    <contact owner = "Bob Smith" xmlns = "x-schema:contact-schema.xml">
      <name>Jane Doe</name>
      <address1>123 Main St.</address1>
      <city>Sometown</city>
      <state>Somestate</state>
      <zip>12345</zip>
      <phone>617-555-1234</phone>
      <phone location = "work">978-555-4321</phone>
    </contact>
MS Schema datatypes
• DTDs do not let the author specify which datatypes (content) an element or attribute may contain.
• The namespace prefix dt is chosen by the document author and bound to urn:schemas-microsoft-com:datatypes.
• Msdn.microsoft.com/xml/reference/schema/datatypes.asp has a complete list of supported types.
MS Schema datatypes
• boolean: 0 or 1
• char: a single character, e.g. "X"
• string: a sequence of characters, e.g. "XYZ"
• float and int: as in C or Java
• date: YYYY-MM-DD
• time: HH:MM:SS
• id: text that uniquely identifies an element or its attribute
• idref: a reference to an id
• enumeration: a series of values from which one is chosen
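A sketch of how these types might be attached in an MS Schema, assuming the dt prefix is bound as described above; it has not been checked against MSXML. The names book, published, price, bookId and format are hypothetical. dt:type assigns a type, and for an enumeration dt:values lists the allowed choices separated by spaces.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- hypothetical text-only elements with typed content -->
  <ElementType name="published" content="textOnly" dt:type="date"/>
  <ElementType name="price" content="textOnly" dt:type="float"/>

  <ElementType name="book" content="eltOnly" order="seq" model="closed">
    <!-- id: must be unique within the document -->
    <AttributeType name="bookId" dt:type="id" required="yes"/>
    <!-- enumeration: one of the listed values -->
    <AttributeType name="format" dt:type="enumeration"
                   dt:values="hardcover paperback" default="paperback"/>
    <attribute type="bookId"/>
    <attribute type="format"/>
    <element type="published"/>
    <element type="price"/>
  </ElementType>
</Schema>
```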