
CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226)

Lecture 5 XML Schema (Based on Møller and Schwartzbach, 2006, pp.113-159). CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226). David Meredith d.meredith@gold.ac.uk www.titanmusic.com/teaching/cis336-2006-7.html. Problems with DTDs.


Presentation Transcript


  1. Lecture 5 XML Schema (Based on Møller and Schwartzbach, 2006, pp.113-159) CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) David Meredith d.meredith@gold.ac.uk www.titanmusic.com/teaching/cis336-2006-7.html

  2. Problems with DTDs • DTDs cannot constrain character data • e.g., cannot specify that (#PCDATA) must only be a valid integer representation • need more powerful datatype mechanism • Attribute types are too limited • e.g., cannot specify that an attribute value must be an integer, a URI etc. • Element and attribute definitions cannot depend on context • e.g., cannot specify that unit attribute only allowed if amount attribute is present • Character data cannot be combined with regular expression content model • i.e., mixed content always has form (#PCDATA | e1 | e2)* • cannot specify order in which character data may be interspersed with elements • Element content model lacks "interleaving" operator that allows us to specify that an element may occur anywhere inside an element • e.g., cannot (easily) specify that comment element may occur anywhere in contents of recipe element

  3. More problems with DTDs • DTD provides very limited support for modularity, reuse and evolution of schemas • hard to write, maintain and read large DTD schemas • ID/IDREF mechanism is too limited • sometimes want to specify a more restricted scope for an ID attribute than the whole instance document • also might want to use multiple attribute values or character data as keys rather than just single attribute value • DTDs do not support namespaces

  4. XML Schema • DTDs defined as part of the XML 1.0 specification (February 1998) • inherited from SGML • Shortly afterwards, W3C initiated XML Schema project to deal with problems in DTDs • XML Schema Requirements (1999) specifies that XML Schema should be: • more expressive than XML DTD • a well-formed XML language • self-describing • i.e., it should be possible to describe the syntax of XML Schema using an XML Schema (since XML Schema is an XML language) • simple enough to implement with modest design and runtime resources (which limits expressiveness) • XML Schema specification should be: • defined quickly to prevent competing schema languages gaining a foothold • precise, concise, human-readable and illustrated with examples

  5. XML Schema technical requirements • XML Schema should • contain mechanism for constraining use of namespaces • allow creation of user-defined datatypes for describing character data and attribute values • enable inheritance for element, attribute and datatype definitions • support evolution of schemas • permit embedded structured documentation within schemas

  6. XML Schema recommendation • Official XML Schema specification published as W3C recommendation in 2001 • in 2 parts: • XML Schema Part 1: Structures • Describes core XML Schema including, for example, element and attribute declarations • Most recent version: Second Edition, 28 October 2004 • Available online at http://www.w3.org/TR/xmlschema-1/ • XML Schema Part 2: Datatypes • Defines facilities for defining datatypes in XML Schema • Most recent version: Second Edition, 28 October 2004 • Available online at http://www.w3.org/TR/xmlschema-2/ • Does not satisfy all original requirements: • not simple • Partly remedied by XML Schema Part 0: Primer • Provides easily readable description of the XML Schema facilities • Most recent version: 28 October 2004 • Available online at • http://www.w3.org/TR/xmlschema-0/ • not fully self-describing • not sufficiently expressive • e.g., cannot express full syntax of RecipeML

  7. XML Schema overview • Contains a sophisticated type system like those in common programming languages • Facilitates re-use and improves schema structure • Four central constructs in XML Schema all based on types and are as follows: • Simple type definition • Defines a family of Unicode text strings • Describes text without markup • Complex type definition • Defines validity requirements for attributes, sub-elements and character data in an element of that type • Describes text which may contain markup • Element declaration • Associates element name with either a simple or complex type • Attribute declaration • Associates attribute name with simple type • Attribute values are always unstructured text

  8. An example schema written in XML Schema • Schema at left shows • one element declaration: student • two attribute declarations: id, score • one complex type definition: StudentType • one simple type definition: Score • XML Schema elements are identified by the namespace http://www.w3.org/2001/XMLSchema • Namespace prefix ("xsd") is arbitrary but conventional • Root element in an XML Schema document is named schema • usually contains a targetNamespace attribute • names the namespace being defined by the schema • also declare this namespace with a prefix so that the schema can refer to its own definitions • Definitions create new types; declarations describe constituents of the instance document • Definitions and declarations populate the target namespace
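
     The figure from this slide is not part of the transcript. The following is a minimal sketch of a schema with the shape the slide describes (one element declaration, two attribute declarations, one complex type, one simple type). The target namespace http://example.org/students, the prefix s and the 0 to 100 range chosen for Score are assumptions made for illustration, not taken from the lecture.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:s="http://example.org/students"
                 targetNamespace="http://example.org/students">

       <!-- Element declaration: associates the name student with a complex type -->
       <xsd:element name="student" type="s:StudentType"/>

       <!-- Attribute declarations: each associates a name with a simple type -->
       <xsd:attribute name="id" type="xsd:string"/>
       <xsd:attribute name="score" type="s:Score"/>

       <!-- Complex type definition: a student has both attributes and no content -->
       <xsd:complexType name="StudentType">
         <xsd:attribute ref="s:id" use="required"/>
         <xsd:attribute ref="s:score" use="required"/>
       </xsd:complexType>

       <!-- Simple type definition: integers restricted to an assumed range -->
       <xsd:simpleType name="Score">
         <xsd:restriction base="xsd:integer">
           <xsd:minInclusive value="0"/>
           <xsd:maxInclusive value="100"/>
         </xsd:restriction>
       </xsd:simpleType>
     </xsd:schema>
     ```

     The prefix s is bound to the same URI as targetNamespace, which is what lets type="s:StudentType" and ref="s:id" refer to components defined in this same schema.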

  9. Syntax for element and attribute declarations • Element declaration has form <element name="name" type="type"/> • associates simple or complex type, type, with the element named name • Attribute declaration has form <attribute name="name" type="type"/> • associates simple type, type, with an attribute named name

  10. Simple student instance document • Can avoid use of prefixes in attribute names
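
     The instance document from this slide is missing from the transcript. As a hedged sketch, assume a schema with the hypothetical target namespace http://example.org/students that declares a student element and globally declared id and score attributes. An instance might then look like this (the attribute values are invented):

     ```xml
     <!-- Globally declared attributes belong to the target namespace,
          so they carry a prefix in the instance document. -->
     <s:student xmlns:s="http://example.org/students"
                s:id="19700101-1234"
                s:score="97"/>
     ```

     One way to avoid the prefixes is to declare the attributes locally, inside the complex type definition, instead of at the top level of the schema. Local attribute declarations are unqualified by default, so the instance could then be written as <s:student id="..." score="..."/>.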

  11. Business card example • Instance doc at top left in language defined at bottom left • Assume we own the domain businesscard.org • so no-one else uses this namespace • Can fix it so that no need for prefix in uri attribute • Compare DTD

  12. Connecting instance documents and schemas • Instance document can refer to a schema using schemaLocation attribute from the namespace, http://www.w3.org/2001/XMLSchema-instance • Value of schemaLocation attribute has two parts, separated by whitespace: • target namespace of schema • URI of schema document • schemaLocation indicates that document is supposed to be valid with respect to the schema • schemaLocation attributes may appear in any element • usually appear in root element • can also appear in another element to indicate that the schema applies to the subtree under that element • means XML languages can be combined at will • schemaLocation attribute value is actually sequence of "namespace URI" pairs • if more than one pair, all schemas apply independently
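
     A minimal sketch of an instance document that points at its schema. The file name business_card.xsd and the child elements shown are assumptions for illustration. The two namespace URIs are the ones named in the slides.

     ```xml
     <b:card xmlns:b="http://businesscard.org"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://businesscard.org business_card.xsd">
       <!-- schemaLocation holds "namespace URI" pairs separated by whitespace:
            first the target namespace, then the location of the schema document -->
       <b:name>John Doe</b:name>
       <b:email>john.doe@example.org</b:email>
     </b:card>
     ```

     To refer to several schemas, further pairs are appended to the same attribute value, each consisting of a namespace followed by a schema location.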

  13. More on schemaLocation • All attributes defined in http://www.w3.org/2001/XMLSchema-instance implicitly declared for all elements in instance document • schemaLocation attributes are optional • make instance documents self-describing • Applications require documents to be valid relative to schemas decided by application developers, not schemas decided by document authors • XML Schema does not directly enforce a particular root element • e.g., an XML Schema definition of XHTML cannot express that the root element must be html • means that application must check root element as well as carrying out XML validation

  14. Simple types • Simple type or datatype is set of Unicode strings with a particular semantic interpretation • e.g., decimal datatype is built-in XML Schema datatype which consists of all strings that represent decimal numbers (e.g., 3.1415) • 3.1415 is equal to 3.141500 • 42 is less than 117 • XML Schema contains some primitive simple types with pre-defined meanings • XML Schema also provides various mechanisms for deriving new types from existing ones

  15. Simple Types (Datatypes) – Primitive • string: any Unicode string • boolean: true, false, 1, 0 • decimal: 3.1415 • float: 6.02214199E23 • double: 42E970 • dateTime: 2004-09-26T16:29:00-05:00 • time: 16:29:00-05:00 • date: 2004-09-26 • hexBinary: 48656c6c6f0a • base64Binary: SGVsbG8K • anyURI: http://www.brics.dk/ixwt/ • QName: rcp:recipe, recipe • ...

  16. Some built-in derived simple types • normalizedString • as string but whitespace facet is replace • token • as string but whitespace facet is collapse • language • "en", "da", "en-US", etc. • NMTOKEN • e.g., "42", "my.form", "r103" • NMTOKENS • e.g., "42 my.form r103" • nonPositiveInteger • e.g., "-87", "0"

  17. A simple type element declaration • <element name="serialnumber" type="nonNegativeInteger"/> • assigns the built-in derived simple type, nonNegativeInteger, to elements named serialnumber • contents of a serialnumber element must match nonNegativeInteger (possibly with surrounding whitespace) • serialnumber element cannot contain child elements or attributes

  18. Deriving new simple types by restriction • Restriction of a simple type defines a new type by restricting possible values of a base type • restriction performed on facets of base type (see table above left) • restriction may contain multiple constraining facets • Facet restrictions operate at semantic not syntactic level • e.g., <totalDigits value="3"/> allows 123, 0123 and 0123.0 but not 1234 and 123.05

  19. Deriving new simple types by restriction • enumeration facet restricts values to a finite set of possibilities (see above left) • pattern facet allows values to be constrained to satisfy regular expressions (see above right) • symbols that have a special meaning within regular expressions can be escaped by prefixing with a backslash (e.g., \*) • For most facets, restrictions may be changed in further derivations unless fixed="true" attribute is added to constraining facet
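
     The figures showing the restriction examples are not in the transcript. The sketch below illustrates the three kinds of facet restriction discussed on the last two slides (range facets, enumeration and pattern). All three type names and their facet values are hypothetical.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

       <!-- Several constraining facets in one restriction -->
       <xsd:simpleType name="percentage">
         <xsd:restriction base="xsd:integer">
           <xsd:minInclusive value="0"/>
           <xsd:maxInclusive value="100"/>
         </xsd:restriction>
       </xsd:simpleType>

       <!-- enumeration: only the listed values are allowed -->
       <xsd:simpleType name="yes_or_no">
         <xsd:restriction base="xsd:string">
           <xsd:enumeration value="yes"/>
           <xsd:enumeration value="no"/>
         </xsd:restriction>
       </xsd:simpleType>

       <!-- pattern: the value must match a regular expression,
            here three digits, a hyphen, then four digits -->
       <xsd:simpleType name="short_phone_number">
         <xsd:restriction base="xsd:string">
           <xsd:pattern value="\d{3}-\d{4}"/>
         </xsd:restriction>
       </xsd:simpleType>
     </xsd:schema>
     ```

     Under these definitions, "42" is a valid percentage and "555-1234" is a valid short_phone_number, while "101" and "5551234" are not.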

  20. Deriving simple types using list and union • Use the list element inside a simpleType definition to define a whitespace separated string of values of a particular type (see above left) • e.g., "23 4 56 -7" is of type integerlist • Use union element inside a simpleType definition to specify that a value must be one of two or more types • e.g., "true" and "1.3" are both of type boolean_or_decimal
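
     A sketch of how the two types named on this slide, integerlist and boolean_or_decimal, could be defined. The slide's own figure is not in the transcript, so the exact form used in the lecture is an assumption.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

       <!-- list: a whitespace-separated sequence of integers, e.g. "23 4 56 -7" -->
       <xsd:simpleType name="integerlist">
         <xsd:list itemType="xsd:integer"/>
       </xsd:simpleType>

       <!-- union: a value is valid if it is a boolean or a decimal,
            so both "true" and "1.3" are accepted -->
       <xsd:simpleType name="boolean_or_decimal">
         <xsd:union memberTypes="xsd:boolean xsd:decimal"/>
       </xsd:simpleType>
     </xsd:schema>
     ```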

  21. Complex types • An element declaration may assign a complex type to an element name: <element name="card" type="b:card_type"/> • means that elements with the name card must satisfy all the requirements specified in the definition of the type card_type • complex type definition may specify attributes, child element types and ordering and character data • Complex type defined using XML Schema element, complexType • content of complexType element can be either complex or simple

  22. Element reference • Element reference takes the form <element ref="name" /> • name is the name of an element that has already been declared • Note the difference between an element element with a name attribute (a declaration) and one with a ref attribute (a reference)!

  23. sequence element • Concatenation within the content of an element with a complex content model is expressed using the sequence element

  24. choice element • Union (i.e., the '|' operator in a regular expression) corresponds to the choice element • At left, each card element contains either an email element or zero or 1 phone elements but not both
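
     The schema figure for this slide is not in the transcript. The following sketch combines sequence, choice and element references along the lines the last three slides describe. The namespace http://businesscard.org and the names card and card_type come from the slides. The exact set of child elements is an assumption.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:b="http://businesscard.org"
                 targetNamespace="http://businesscard.org">

       <!-- Element declarations (name attribute) -->
       <xsd:element name="card" type="b:card_type"/>
       <xsd:element name="name" type="xsd:string"/>
       <xsd:element name="email" type="xsd:string"/>
       <xsd:element name="phone" type="xsd:string"/>

       <xsd:complexType name="card_type">
         <!-- sequence: children must appear in this order -->
         <xsd:sequence>
           <!-- Element references (ref attribute) point at the declarations above -->
           <xsd:element ref="b:name"/>
           <!-- choice: either an email element, or zero or one phone elements -->
           <xsd:choice>
             <xsd:element ref="b:email"/>
             <xsd:element ref="b:phone" minOccurs="0"/>
           </xsd:choice>
         </xsd:sequence>
       </xsd:complexType>
     </xsd:schema>
     ```

     A card with a name followed by an email is valid under this sketch, as is a card with only a name. A card containing both an email and a phone is not.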

  25. all element • A content sequence matches an all expression if each constituent of the expression is matched somewhere in the content and every element in the content is matched by a constituent of the expression • Essentially a variant of sequence in which order does not matter
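
     A short sketch of an all group, assuming a hypothetical variant of the business card type in which name and email may appear in either order:

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:b="http://businesscard.org"
                 targetNamespace="http://businesscard.org">

       <xsd:element name="card" type="b:card_type"/>
       <xsd:element name="name" type="xsd:string"/>
       <xsd:element name="email" type="xsd:string"/>

       <xsd:complexType name="card_type">
         <!-- all: both children are required, in any order -->
         <xsd:all>
           <xsd:element ref="b:name"/>
           <xsd:element ref="b:email"/>
         </xsd:all>
       </xsd:complexType>
     </xsd:schema>
     ```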

  26. any element • The empty any element is a wildcard that matches any element • Its namespace attribute limits the matching elements in various ways • whitespace-separated list whose items are URIs, ##targetNamespace or ##local (##local means the empty namespace) • ##any (any namespace) • ##other (any namespace except targetNamespace)

  27. any element • Can be used to specify that a different language is used inside an element • e.g., XHTML used inside the info element in WidgetML (see above) • content must consist of one or more elements from the XHTML namespace
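
     The WidgetML figure is not in the transcript. As a sketch of the idea, the info element below accepts one or more elements from the XHTML namespace. The widget namespace URI, the type name info_type and the choice of processContents="lax" are assumptions made for illustration.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:w="http://example.org/widget"
                 targetNamespace="http://example.org/widget">

       <xsd:element name="info" type="w:info_type"/>

       <xsd:complexType name="info_type">
         <xsd:sequence>
           <!-- Wildcard: one or more elements, all from the XHTML namespace.
                "lax" asks the validator to check them only if it can find
                declarations for them. -->
           <xsd:any namespace="http://www.w3.org/1999/xhtml"
                    processContents="lax"
                    minOccurs="1" maxOccurs="unbounded"/>
         </xsd:sequence>
       </xsd:complexType>
     </xsd:schema>
     ```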

  28. Some restrictions • all element may only contain element references • sequence and choice elements cannot contain all elements • complexType contents cannot consist of single element or any declaration • need to wrap it in a sequence or choice element

  29. Attribute references • A complex type may optionally contain a number of attribute references of the form <attribute ref="name" /> • name is the name of the attribute that has been declared elsewhere • attribute reference must appear after the content model description of a complex type • attribute reference can contain an attribute named use which can take the values optional (default) or required
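
     A minimal sketch of an attribute reference with use="required". The uri attribute is mentioned in the business card slide. The logo element and the type name logo_type are assumptions.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:b="http://businesscard.org"
                 targetNamespace="http://businesscard.org">

       <xsd:element name="logo" type="b:logo_type"/>

       <!-- Attribute declaration at the top level of the schema -->
       <xsd:attribute name="uri" type="xsd:anyURI"/>

       <xsd:complexType name="logo_type">
         <!-- Attribute reference: every logo element must carry the uri attribute -->
         <xsd:attribute ref="b:uri" use="required"/>
       </xsd:complexType>
     </xsd:schema>
     ```

     Because uri is declared globally here, a matching instance would write it with a prefix, for example <b:logo b:uri="logo.gif"/>.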

  30. minOccurs and maxOccurs • minOccurs and maxOccurs attributes can be used with • element, sequence, choice, all and any elements • define possible cardinalities of the element • values must be non-negative integers or, for maxOccurs, unbounded • by default, minOccurs and maxOccurs are 1

  31. mixed attribute • complexType may optionally have an attribute, mixed="true" • means arbitrary character data is permitted anywhere in the content in addition to the elements declared in the content model • Without mixed="true" attribute, only whitespace allowed between elements in content model • Character data cannot be constrained if we also want to allow elements in the content
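
     A sketch of a mixed content type, using a hypothetical note language (the namespace, element names and type name are all assumptions). The minOccurs and maxOccurs attributes from the previous slide set the cardinality of the em element.

     ```xml
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:n="http://example.org/notes"
                 targetNamespace="http://example.org/notes">

       <xsd:element name="note" type="n:note_type"/>
       <xsd:element name="em" type="xsd:string"/>

       <!-- mixed="true": character data may appear anywhere between the children -->
       <xsd:complexType name="note_type" mixed="true">
         <xsd:sequence>
           <!-- zero or more em elements -->
           <xsd:element ref="n:em" minOccurs="0" maxOccurs="unbounded"/>
         </xsd:sequence>
       </xsd:complexType>
     </xsd:schema>
     ```

     Under this sketch, <n:note xmlns:n="http://example.org/notes">Stir <n:em>gently</n:em> until smooth.</n:note> is valid. Without mixed="true", the text around the em element would not be allowed.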
