SE 5145 – eXtensible Markup Language (XML). XML Schema. 2011-12/Spring, Bahçeşehir University, Istanbul. 3rd Assignment: Validating XML with DTD & XML Schema (page 1/2).
SE 5145 – eXtensible Markup Language (XML) XML Schema 2011-12/Spring, Bahçeşehir University, Istanbul
3rd Assignment: Validating XML with DTD & XML Schema (page 1/2)
The goal of this exercise is to understand the basic concepts of XML Schema and how it extends the capabilities of DTDs. You will use the XML Resume (CV) that you provided in Assignment 2.
Task 1. XML Schema: Write an XML Schema definition for your XML Resume satisfying the following requirements:
• For any date in your Resume XML, make sure that your XML Schema checks for a valid date value. Try to avoid xs:string as much as possible, or if you think that something really is a string, use your own string type, which could for example check for a maximum length and a character set (an xs:pattern could be used to achieve the latter).
• Make sure that at least one of your types is used by more than one element (because reuse is good). In real-life applications, you would start by designing a type library, and then use it when constructing your schema from the ground up.
• Use minOccurs and maxOccurs to restrict the cardinality of some elements.
See next slide.
3rd Assignment: Validating XML with DTD & XML Schema (page 2/2)
The following are recommended (but optional) for this assignment:
• Depending on how similar or different your employer and education entries are, try to find some structural similarity between these entries and then represent this similarity using complex type derivation.
• Try to add a targetNamespace to your schema, so that your Resume schema becomes a full-grown schema with its own namespace. Don't forget that you then have to change the instance (by using the namespace there) to match the schema.
• Identity constraints could be used to check various aspects of the Resume, depending on what you think should be unique, a key, or a reference to an existing key. A typical example would be to have a key for institutions (educational or companies), and then have each of your skills reference this key so that you can represent where you acquired each skill.
Task 3 – Validate XML: Use a tool to validate your XML Resume (*.xml) using your XML schema (*.xsd).
• A suitable online tool is http://www.xmlvalidation.com/. On the first page, provide the XML document and select ‘Validate against external XML schema’. Click ‘Validate’ and provide the XML schema on the second page.
• Another tool is Altova XML Spy (you can download and use a trial version).
• Alternatively, you can use the recommended tools described by Melike (validator.w3.org) and Erokan (iexmltls.exe, msval.vbs) in the last presentations.
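The optional targetNamespace and identity-constraint items could look roughly like the sketch below. The namespace URI http://example.org/resume, the element names institution and skill, and the attribute names id and acquiredAt are all hypothetical; adapt them to your own Resume vocabulary.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:r="http://example.org/resume"
           targetNamespace="http://example.org/resume"
           elementFormDefault="qualified">

  <xs:element name="resume">
    <xs:complexType>
      <xs:sequence>
        <!-- Each institution carries an id that serves as its key -->
        <xs:element name="institution" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="id" type="xs:NCName" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <!-- Each skill points at the institution where it was acquired -->
        <xs:element name="skill" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="acquiredAt" type="xs:NCName" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

    <!-- Identity constraints follow the type definition inside xs:element -->
    <xs:key name="institutionKey">
      <xs:selector xpath="r:institution"/>
      <xs:field xpath="@id"/>
    </xs:key>
    <xs:keyref name="skillInstitutionRef" refer="r:institutionKey">
      <xs:selector xpath="r:skill"/>
      <xs:field xpath="@acquiredAt"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Because elementFormDefault is "qualified", the selector paths need the r: prefix, and a matching instance would declare the same namespace on its root, for example with xmlns="http://example.org/resume" on the resume element.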
XML Schemas
• “Schemas” is a general term; DTDs are a form of XML schema
• According to the dictionary, a schema is “a structured framework or plan”
• When we say “XML Schemas,” we usually mean the W3C XML Schema Language
• This is also known as the “XML Schema Definition” language, or XSD
• It was introduced to overcome some of the commonly observed limitations of DTDs, most notably the lack of typing
• DTDs, XML Schemas, RELAX NG and Schematron are all XML schema languages
What’s Wrong with DTDs?
• DTDs do not support application-level datatypes
  • XML for B2B is very data-centric and needs typing
  • SGML was created for documents where typing was less important
• DTDs do not support any relationships between markup constructs
  • content models cannot be reused
  • attribute lists cannot be reused
  • structural relationships cannot be exploited in the DTD
• DTDs provide a very weak specification language
  • No restrictions on text content
  • Very little control over mixed content (text plus elements)
  • Little control over ordering of elements
• DTDs are written in a strange (non-XML) format
  • You need separate parsers for DTDs and XML
Why XML Schemas?
• XML Schema Definition language (XSD) solves these problems
• XSD lets you constrain the content of XML documents as DTDs do, but it is much more powerful and sophisticated
• XSD allows a much finer level of control over structure and content
• XSD is written in XML syntax instead of a custom syntax like the one DTDs use
• XML Schema's simple datatypes provide some semantics
  • a formerly undescribed attribute can now be described as being an xs:date
  • it can be understood as being a date and inserted into a calendar
  • but what kind of date is it? a birthday? an order date? a shipping date?
  • that is a question of the context in which the xs:date appears
• XML Schema better supports model-level information
  • however, XML Schema also captures only part of the application semantics
  • an XML Schema is usually better than a DTD, because it contains types
  • types provide information about the basic datatypes being used
  • additional semantics (e.g., different kinds of dates) must be documented elsewhere
Why not XML schemas?
• DTDs have been around longer than XSD
  • Therefore they are more widely used
  • Also, more tools support them
• The power of XSD comes with a price:
  • XSD is a little harder and more verbose to write than DTDs, even by XML standards
  • More advanced XML Schema instructions can be non-intuitive and confusing
• Nevertheless, XSD is not likely to go away quickly
Validation and Typing
• XML Schema does two things at the same time:
1. Validation checks for structural integrity (is the document schema-valid?)
  • checking elements and attributes for proper usage (as with DTDs)
  • checking element contents and attribute values for proper values
2. Type annotations make the types available to applications
  • instead of having to look at the schema, applications get the Post-Schema Validation Infoset (PSVI)
  • type-based applications (such as XSLT 2.0) can work on the typed instance
Anatomy of a Schema
• A schema uses the namespace defined by http://www.w3.org/2001/XMLSchema and usually uses the xsd or xs prefix in the XML code
• The file extension is .xsd
• The root element is <schema>
• An XSD starts like this:
    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
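For orientation, a complete minimal schema file might look like the sketch below. The element name note is a made-up placeholder; only the namespace URI and the schema root element come from the slide.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Top-level declarations go between the schema start and end tags -->
  <xs:element name="note" type="xs:string"/>

</xs:schema>
```

An instance document consisting of a single note element with text content would be valid against this schema.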
Referring to a schema
• To refer to a DTD in an XML document, the reference goes before the root element:
    <?xml version="1.0"?>
    <!DOCTYPE rootElement SYSTEM "url">
    <rootElement> ... </rootElement>
• To refer to an XML Schema in an XML document, the reference goes in the root element:
    <?xml version="1.0"?>
    <rootElement
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="url.xsd">
      ...
    </rootElement>
• The xmlns:xsi declaration (the XML Schema Instance reference) is required
• xsi:noNamespaceSchemaLocation is where your XML Schema definition can be found
TYPES
• A type is a set of values
  • the values can be enumerated (home, mobile, office)
  • the values can be described by extension (intervals, regular expressions)
• DTDs have (almost) no types
  • element content is always #PCDATA (any number of any characters)
  • attributes most often are CDATA (any number of any characters)
  • attributes may have enumerated types (but no extensional types)
  • attributes may use ID/IDREF
“Simple” and “complex” elements
• A “simple” element is one that contains text and nothing else
  • A simple element cannot have attributes
  • A simple element cannot contain other elements
  • A simple element cannot be empty
  • However, the text can be of many different types, and may have various restrictions applied to it
• If an element isn’t simple, it’s “complex”
  • A complex element may have attributes
  • A complex element may be empty, or it may contain text, other elements, or both text and other elements
“Simple” types
• Simple types describe values not structured by XML markup
  • they describe attribute values (date="2006-10-03")
  • they describe element content (<phone>+1-510-6432253</phone>)
• Simple types can be used for elements or attributes
  • XML Schema treats contents in elements and attributes equally
  • simple type libraries can be designed independent of their eventual use
• Simple types are available in three flavors
  • atomic types: one value of one type (one number in some range)
  • union types: one value of a union of types (a number or the string undefined)
  • list types: a whitespace-separated list of values (phone type="home office")
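The three flavors can be sketched as follows. All type names (percentType, percentOrUndefinedType, phoneKindType, phoneKindListType) are hypothetical and chosen only to mirror the examples in the bullets above.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type: one number in some range -->
  <xs:simpleType name="percentType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Union type: either a percentType value or the string "undefined" -->
  <xs:simpleType name="percentOrUndefinedType">
    <xs:union memberTypes="percentType">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="undefined"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <!-- Item type for the list below -->
  <xs:simpleType name="phoneKindType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="home"/>
      <xs:enumeration value="mobile"/>
      <xs:enumeration value="office"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- List type: whitespace-separated items, e.g. type="home office" -->
  <xs:simpleType name="phoneKindListType">
    <xs:list itemType="phoneKindType"/>
  </xs:simpleType>

</xs:schema>
```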
Named vs. Anonymous
• Types can be named or anonymous
  • named types have a name and can be referenced (and thus be reused)
  • anonymous types have no name and can only be used where they are defined
Type Definitions
• Simple types are sets of values
  • named simple types are sets of values with a name (and thus reusable)
  • anonymous simple types are sets of values defined where they are needed
• Simple types are defined to represent model-level information
  • in most cases, they will have restrictions associated with them
  • they may also simply be tags for semantics (fax and phone numbers share the same value space)
• XML Schema has a library of built-in datatypes
  • ur-types are the conceptual grounding of all types
  • primitive types are the types that are there by definition
  • derived types are based on primitive types
  • users can derive their own types using simple type restriction
Declaring Elements with Schema
• Elements can be declared as having a simple or complex type
• Types can be either built-in or defined by your Schema
• Elements can also have mixed, empty, or element content, just like in DTDs
• Elements can be given a minimum and maximum number of times that they are allowed to occur
Defining a simple element
• A simple element is defined as
    <xs:element name="name" type="type" minOccurs="number" maxOccurs="number|unbounded" />
• where:
  • name is the name of the element
  • the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal and xs:time
  • minOccurs and maxOccurs are optional; the default value of each is 1
• Other attributes a simple element may have:
  • default="default value" if no other value is specified
  • fixed="value" no other value may be specified
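A few simple element declarations, sketched inside a hypothetical contact element (all element names and values here are invented for illustration). Note that minOccurs and maxOccurs can only be written on local declarations, not on elements declared at the top level of the schema.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <!-- Exactly once (minOccurs and maxOccurs both default to 1) -->
        <xs:element name="name" type="xs:string"/>
        <!-- Optional -->
        <xs:element name="birthDate" type="xs:date" minOccurs="0"/>
        <!-- Any number of times, including none -->
        <xs:element name="phone" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <!-- The default applies when the element is present but empty -->
        <xs:element name="country" type="xs:string" default="Turkey"/>
        <!-- If a value is given, it must be equal to the fixed one -->
        <xs:element name="formatVersion" type="xs:decimal" fixed="1.0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```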
Custom Simple Types with Restrict
• You can define your own custom simple types by deriving them from existing simple types with restriction
  • the base type must be a simple type
  • the derived type will be a simple type
  • all simple types form a tree, rooted at anySimpleType
• Restrictions are based on facets
  • each restriction can use 0-n facets
  • facets can be refined in further simple type restrictions
• XML Schema designers should try to restrict types as much as possible. Why?
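How facets are refined along a derivation chain can be sketched like this. The type names ageType and workingAgeType and the chosen bounds are assumptions made for the example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- First restriction: a built-in type narrowed with one facet -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:nonNegativeInteger">
      <xs:maxInclusive value="140"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Second restriction: refines ageType; the new bounds must stay
       inside the bounds of the base type (0 to 140) -->
  <xs:simpleType name="workingAgeType">
    <xs:restriction base="ageType">
      <xs:minInclusive value="18"/>
      <xs:maxInclusive value="67"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

A derived type can only narrow its base, never widen it, so every workingAgeType value is also a valid ageType value.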
Restrictions
• The general form for putting a restriction on a text value is:
    <xs:element name="name"> (or xs:attribute)
      <xs:simpleType>
        <xs:restriction base="type">
          ... the restrictions ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
• For example (the base is xs:integer, because a minInclusive of 0 would not be a valid restriction of xs:positiveInteger, whose smallest value is 1):
    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="140"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
Facets
• Facets define a certain way of restricting a simple type
• Facets may be repeated in different levels of the type hierarchy
• Not all facets are applicable to all types
  • the applicability depends on the primitive type being used
Restrictions on numbers
• minInclusive -- number must be ≥ the given value
• minExclusive -- number must be > the given value
• maxInclusive -- number must be ≤ the given value
• maxExclusive -- number must be < the given value
• totalDigits -- number must have no more than value digits in total
• fractionDigits -- number must have no more than value digits after the decimal point
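A sketch combining several numeric facets; priceType and its limits are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Greater than 0, less than 10000, at most 6 digits overall,
       at most 2 of them after the decimal point -->
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="0"/>
      <xs:maxExclusive value="10000"/>
      <xs:totalDigits value="6"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="price" type="priceType"/>

</xs:schema>
```

Under these facets 19.99 is accepted, while 0 (excluded by minExclusive) and 12.345 (three fraction digits) are rejected.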
Restrictions on strings
• length -- the string must contain exactly value characters
• minLength -- the string must contain at least value characters
• maxLength -- the string must contain no more than value characters
• pattern -- the value is a regular expression that the string must match
• whiteSpace -- not really a “restriction”; it tells what to do with whitespace
  • value="preserve" Keep all whitespace
  • value="replace" Change all whitespace characters to spaces
  • value="collapse" Remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
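A custom string type using the length and whiteSpace facets might be sketched as below; titleType and the limit of 80 characters are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A non-empty string of at most 80 characters; whitespace is
       collapsed before the length facets are checked -->
  <xs:simpleType name="titleType">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
      <xs:minLength value="1"/>
      <xs:maxLength value="80"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="jobTitle" type="titleType"/>

</xs:schema>
```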
Patterns
• Patterns restrict the lexical space of simple types
  • most other facets restrict the value space (e.g., intervals of numbers)
  • in many cases, patterns are useful additions to value-oriented facets
• Patterns are regular expressions
  • they support many common regex constructs and Unicode
  • the language pattern allows de, de-CH, and other tags
  • the pattern checks for lexical correctness, not against a code list
    ([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*
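As a small illustration of the pattern facet, here is a hypothetical postalCodeType that accepts exactly five digits. XML Schema patterns always have to match the whole value, so no ^ or $ anchors are written.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Exactly five ASCII digits, e.g. 34353 -->
  <xs:simpleType name="postalCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="postalCode" type="postalCodeType"/>

</xs:schema>
```

As with the language pattern, this only checks the lexical form; it does not check whether the code actually exists.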
Facet Limitations
• Facets limit one dimension of a type's value space
  • using pattern, the lexical space can also be restricted
  • restrictions should be made as specific as possible
  • no limitations are possible beyond the predefined facets
• There is no connection to the context within the document
  • facets cannot make references to other values (e.g., neighboring attributes)
• Additional constraints should be documented
  • documentation enables applications to implement constraint checking
  • other schema languages (like Schematron) may be used to express these constraints
Enumeration
• An enumeration restricts content to allowable choices
• Example:
    <xsd:element name="season">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="Spring"/>
          <xsd:enumeration value="Summer"/>
          <xsd:enumeration value="Autumn"/>
          <xsd:enumeration value="Fall"/>
          <xsd:enumeration value="Winter"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
What is a Complex Type?
• Complex types describe the allowed element content
  • they describe what the element may contain (the element's content model)
  • they describe the attributes that an element may have (the element's attribute list)
• Complex types do not define the element name
  • they define which content is allowed for the element
  • the element definition uses the complex type to define the allowed element content
• Complex types have similar properties to simple types
  • they can be named or anonymous
• Complex Type Derivation can be used to construct a type hierarchy
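A small type hierarchy built with complex type derivation by extension could be sketched as follows. The names entryType, employmentType and educationType, and their child elements, are hypothetical; they are picked to resemble the employer and education entries of a resume.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: what every resume entry has in common -->
  <xs:complexType name="entryType">
    <xs:sequence>
      <xs:element name="organization" type="xs:string"/>
      <xs:element name="from" type="xs:gYearMonth"/>
      <xs:element name="to" type="xs:gYearMonth" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Extension appends new elements after those of the base type -->
  <xs:complexType name="employmentType">
    <xs:complexContent>
      <xs:extension base="entryType">
        <xs:sequence>
          <xs:element name="position" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="educationType">
    <xs:complexContent>
      <xs:extension base="entryType">
        <xs:sequence>
          <xs:element name="degree" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="employment" type="employmentType"/>
  <xs:element name="education" type="educationType"/>

</xs:schema>
```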
Declaring Complex Elements
• To declare an element with a complex type:
  • Use the xsd:anyType value for the type attribute
  • Use the <xsd:complexType> tag in the definition
• Structure:
    <xs:element name="name">
      <xs:complexType>
        ... information about the complex type ...
      </xs:complexType>
    </xs:element>
• Remember that attributes are always simple types
Complex elements
• Example:
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
• <xs:sequence> says that elements must occur in this order
Complex Types & Content Types
• Complex types can have different kinds of content
  • simple content refers to simple type content using additional attributes
  • complex content is anything else (anything beyond simple type content)
• Complex Type Derivation heavily depends on this classification
DTD Content Models
• Defining elements in DTDs uses a compact syntax
  • XML Schema supports the same facilities with a more verbose syntax
  • XML Schema adds features which DTDs do not support
• DTDs allow elements to be mandatory, optional, repeatable, or optional and repeatable
  • XML Schema allows the cardinality to be specified
• DTDs allow sequences (,) and alternatives (|)
  • XML Schema introduces a (very limited) operator for all groups
• Apart from the syntax, XML Schema content models are not very different
Empty Content
• DTDs have a special keyword for empty elements
  • instead of the content model, the keyword EMPTY is used
  • empty elements may still have attribute lists associated with them
• XML Schema empty types are defined implicitly
  • there is no explicit keyword for defining an empty type
  • if a type has no model group inside it, it is empty (it still may have attributes)
• Declaring empty elements:
    <xs:element name="myEmptyElement">
      <xs:complexType>
      </xs:complexType>
    </xs:element>
Mixed Content
• DTDs define mixed content by mixing #PCDATA into the content model
  • DTDs always require mixed content to use the form ( #PCDATA | a | b )*
  • the occurrence of elements in mixed content cannot be controlled
• XML Schema defines mixed content outside of the content model
  • the content model is defined like an element-only content model
  • the mixed attribute on the type marks the type as being mixed
• Example: (only one subtitle is allowed, why?)
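A mixed content declaration in this spirit might look like the sketch below. The element names section, title, subtitle and em are assumptions and are not taken from the slide.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="section">
    <!-- mixed="true" allows text between the child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- maxOccurs defaults to 1, so at most one subtitle -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
        <xs:element name="em" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the content model is an ordinary sequence, the usual occurrence rules still apply to the child elements even though text may appear between them.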
Mixed Content
• XML Schema mixed content can use all model groups
  • it is possible to constrain element occurrences in the same way as in element-only content
  • in practice, this feature is rarely used (mixed content often is very loosely defined)
Defining an attribute
• Attributes are always declared as simple types
• Any of the simple types that can be used for elements can also be used for attributes
• An attribute is defined as
    <xs:attribute name="name" type="type" />
• where:
  • name and type are the same as for xs:element
Defining an attribute
• Other attributes an attribute declaration may have:
  • default="default value" if no other value is specified
  • fixed="value" no other value may be specified
  • use="required" attribute must be present
  • use="optional" attribute is not required (default)
  • use="prohibited" attribute cannot be used
• Example:
    <xsd:attribute name="city" type="xsd:string" use="optional" default="istanbul"/>
Adding attributes to the elements
• Adding attributes to an element that has an empty content model
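For the empty case, the attributes are declared directly inside a complex type that has no model group. The element name image and its attributes are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="image">
    <!-- No model group: the element is empty but still has attributes -->
    <xs:complexType>
      <xs:attribute name="src" type="xs:anyURI" use="required"/>
      <xs:attribute name="width" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```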
Adding attributes to the elements
• Adding attributes to an element that only has character data content
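For character data content, the usual approach is simple content that extends a simple type with attributes. The element price and the attribute currency are invented for this sketch.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <!-- The text content is an xs:decimal; the extension adds
             an attribute without changing the content type -->
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```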
Adding attributes to the elements
• Adding attributes to an element that has an element or mixed content model
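For element (or mixed) content, the attribute declarations follow the model group inside the complex type. The id attribute is a hypothetical addition to the person example used elsewhere in these slides.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:sequence>
      <!-- Attribute declarations come after the model group -->
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Adding mixed="true" to the xs:complexType start tag would turn this into the mixed content case without changing where the attribute is declared.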
Global and local definitions
• Elements declared at the “top level” of a <schema> are available for use throughout the schema
• Elements declared within an xs:complexType are local to that type
• Thus, in
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  the elements firstName and lastName are only locally declared
• The order of declarations at the “top level” of a <schema> does not specify the order in the XML data document
Declaration and use
• So far we’ve been talking about how to declare types, not how to use them
• To use a type we have declared, use it as the value of type="..."
• Examples:
    <xs:element name="student" type="person"/>
    <xs:element name="professor" type="person"/>
• Scope is important: you cannot use a type if it is local to some other type
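For the two declarations above to work, person has to be a named type declared at the top level of the schema. A minimal sketch, assuming the type has the same content as the earlier person element:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named, top-level type: can be referenced from anywhere -->
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Two elements reuse the same type -->
  <xs:element name="student" type="person"/>
  <xs:element name="professor" type="person"/>

</xs:schema>
```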
Declaring elements with element content
• Sequence: child elements must appear in order
• All: child elements can occur in any order
• Choice: any one of the child elements from a list
sequence
• Child elements must appear in a specific order:
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
xs:all
• Child elements can appear in any order:
    <xs:element name="person">
      <xs:complexType>
        <xs:all>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:all>
      </xs:complexType>
    </xs:element>
• Despite the name, the members of an xs:all group can occur once or not at all
• You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (default value is 1)
• In this context, n may only be 0 or 1
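The third model group, choice, follows the same structure as sequence and all. A sketch with hypothetical element names:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <!-- Exactly one of the listed children may appear -->
      <xs:choice>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A contact element must then contain either one email or one phone child, but not both.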