Introduction to XML Schema John Arnett, MSc Standards Modeller Information and Statistics Division NHSScotland Tel: 0131 551 8073 (x2073) mailto:John.Arnett@isd.csa.scot.nhs.uk http://isdscotland.org/xml
Contents • Introduction • Document Type Definitions - reminder • W3C Schema • Schema Structures • Built-In Types • Summary • Find Out More
Introduction • Schema • a diagram, plan or framework • In XML: a document that describes the permitted structure and content of XML documents
Introduction • Purpose • Data validation • Contract • System documentation • Processing information
Introduction • Schema Data Validation • Element and attribute structure • Element ordering • Value constraints • Built-in data types • Size and pattern constraints • Enumerations • Uniqueness constraints
Introduction • Schema Languages • Document Type Definitions (DTDs) • W3C XML Schema • OASIS RELAX NG • Schematron
Document Type Definitions • DTD Benefits <!ELEMENT Record (FamilyName, GivenName, Sex, DateOfBirth)> <!ELEMENT FamilyName (#PCDATA)> <!ELEMENT GivenName (#PCDATA)> <!ELEMENT Sex (#PCDATA)> <!ELEMENT DateOfBirth (#PCDATA)> <!ATTLIST Record recordId CDATA #REQUIRED> • Easy to understand and implement • Lightweight alternative to schemas
Document Type Definitions • DTD Limitations • Use non-XML syntax • Only limited support for data typing and namespaces • Difficult to extend
W3C Schema • W3C Recommendation • XML Schema Part 0: Primer • Introduction (guidance) • XML Schema Part 1: Structures • defines schema components • XML Schema Part 2: Datatypes • defines built-in datatypes and the facets used to restrict them
W3C Schema Structures • Most commonly used structures: • elements and attributes • simpleTypes • complexTypes • model groups • minOccurs and maxOccurs • annotation and documentation • schema and namespaces
W3C Schema Structures • element and attribute • Basic building blocks of documents <element name="Record"> <complexType> <sequence> <element name="FamilyName" type="string"/> <element name="GivenName" type="string"/> <element name="Sex" type="token"/> <element name="DateOfBirth" type="date"/> </sequence> <attribute name="recordId" type="integer"/> </complexType> </element>
W3C Schema Structures • element and attribute • valid instances of Record element (string and token accept any text, so an empty GivenName and the value FEMALE are both allowed) <Record recordId="1"> <FamilyName>Arnett</FamilyName> <GivenName>John</GivenName> <Sex>M</Sex> <DateOfBirth>1963-06-01</DateOfBirth> </Record> <Record recordId="2"> <FamilyName>Smith</FamilyName> <GivenName/> <Sex>FEMALE</Sex> <DateOfBirth>1971-04-11</DateOfBirth> </Record>
W3C Schema Structures • element and attribute • invalid Record element instance (stray text "Mr", Surname instead of FamilyName, and a DateOfBirth that is not in YYYY-MM-DD form) <Record recordId="1">Mr <Surname>Arnett</Surname> <GivenName>John</GivenName> <Sex>M</Sex> <DateOfBirth>06-Jan-63</DateOfBirth> </Record>
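The structural rules behind the valid and invalid Record instances can be sketched with Python's standard xml.etree.ElementTree module. This is a hand-written check for this one Record declaration, not a schema processor; the function name record_problems is hypothetical, and DateOfBirth is deliberately not checked here because its format is a simple-type (date) constraint rather than a structural one.

```python
import re
import xml.etree.ElementTree as ET

# Child element names required, in this order, by the Record declaration.
RECORD_CHILDREN = ["FamilyName", "GivenName", "Sex", "DateOfBirth"]
# Lexical form of xsd:integer: optional sign followed by ASCII digits.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def record_problems(xml_text):
    """Return a list of structural problems found in one Record element."""
    record = ET.fromstring(xml_text)
    problems = []
    if record.tag != "Record":
        problems.append(f"root is {record.tag}, expected Record")
    # Without mixed="true", only whitespace may appear between child elements.
    texts = [record.text] + [child.tail for child in record]
    if any(text and text.strip() for text in texts):
        problems.append("character data is not allowed in Record")
    names = [child.tag for child in record]
    if names != RECORD_CHILDREN:
        problems.append(f"children {names} do not match {RECORD_CHILDREN}")
    # recordId is optional in the declaration, but must be an integer if present.
    record_id = record.get("recordId")
    if record_id is not None and not INTEGER_RE.fullmatch(record_id.strip()):
        problems.append(f"recordId {record_id!r} is not an integer")
    return problems
```

Run against the invalid instance, this reports the stray "Mr" text and the unexpected Surname element; the date would be caught by a separate check on the date type.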
W3C Schema Structures • simpleType Definitions • Define element content and attribute values • Character data only - no nested (child) elements permitted • No attributes permitted • Always derived from a built-in or other simple type (usually by restriction; list and union are also possible)
W3C Schema Structures • simpleType definition examples <simpleType name="TextType"> <restriction base="string"> <minLength value="1"/> <maxLength value="35"/> </restriction> </simpleType> <simpleType name="GenderType"> <restriction base="token"> <enumeration value="M"/> <enumeration value="F"/> <enumeration value="NK"/> </restriction> </simpleType>
W3C Schema Structures • complexType Definitions • Define element content • Child elements permitted (character data between them requires mixed="true") • Attributes permitted
W3C Schema Structures • complexType definition examples <complexType name="DemographicsStructure"> <sequence> <element name="FamilyName" type="TextType"/> <element name="GivenName" type="TextType"/> <element name="Sex" type="GenderType"/> <element name="DateOfBirth" type="date"/> </sequence> <attribute name="recordId" type="integer"/> </complexType> <element name="Record" type="DemographicsStructure"/> <element name="Person" type="DemographicsStructure"/> <element name="Client" type="DemographicsStructure"/>
W3C Schema Structures • Model groups • sequence • elements must occur in the order specified • choice • exactly one of several child elements must be selected • all • each child element may occur at most once, in any order
W3C Schema Structures • Model group examples <complexType name="DemographicsStructure"> <sequence> <element name="FamilyName" type="TextType"/> <element name="GivenName" type="TextType"/> <element name="Sex" type="GenderType"/> <choice> <element name="DateOfBirth" type="date"/> <element name="Age" type="integer"/> </choice> </sequence> <attribute name="recordId" type="integer"/> </complexType>
W3C Schema Structures • Model groups • Valid instances of Record element <Record recordId="1"> <FamilyName>Arnett</FamilyName> <GivenName>John</GivenName> <Sex>M</Sex> <DateOfBirth>1963-06-01</DateOfBirth> </Record> <Record recordId="2"> <FamilyName>Smith</FamilyName> <GivenName>Jane</GivenName> <Sex>F</Sex> <Age>28</Age> </Record>
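Conceptually, a content model behaves like a regular expression over child element names: sequence is concatenation and choice is alternation. The sketch below encodes the DemographicsStructure model from the model group example this way with Python's re module; the space-terminated encoding and the name matches_model are illustrative assumptions, and a real validator derives its matcher from the schema itself.

```python
import re

# DemographicsStructure (FamilyName, GivenName, Sex, then a choice of
# DateOfBirth or Age) as a regex over space-terminated element names:
# sequence = concatenation, choice = alternation.
DEMOGRAPHICS_MODEL = re.compile(r"FamilyName GivenName Sex (?:DateOfBirth|Age) ")


def matches_model(child_names):
    """True if the ordered child element names fit the content model."""
    # Terminating every name with a space keeps names from running together.
    joined = "".join(name + " " for name in child_names)
    return DEMOGRAPHICS_MODEL.fullmatch(joined) is not None
```

Both valid instances pass, while supplying both DateOfBirth and Age, or swapping FamilyName and GivenName, fails.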
W3C Schema Structures • minOccurs and maxOccurs • control the occurrence of element instances (both default to 1) • minOccurs="0" • occurrence is optional • maxOccurs="unbounded" • multiple occurrences allowed • may be applied to any child element, sequence or choice
W3C Schema Structures • minOccurs and maxOccurs examples <complexType name="DemographicsStructure"> <sequence> <element name="FamilyName" type="TextType"/> <element name="GivenName" type="TextType" maxOccurs="unbounded"/> <element name="Sex" type="GenderType" minOccurs="0"/> <choice> <element name="DateOfBirth" type="date"/> <element name="Age" type="integer"/> </choice> </sequence> <attribute name="recordId" type="integer"/> </complexType>
W3C Schema Structures • minOccurs and maxOccurs • Valid instances of Record element <Record recordId="1"> <FamilyName>Arnett</FamilyName> <GivenName>John</GivenName> <GivenName>Gordon</GivenName> <Sex>M</Sex> <DateOfBirth>1963-06-01</DateOfBirth> </Record> <Record recordId="2"> <FamilyName>Smith</FamilyName> <GivenName>Jane</GivenName> <Age>28</Age> <!-- Optional "Sex" element missing --> </Record>
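In the same regular-expression view, minOccurs and maxOccurs correspond to a bounded repetition {min,max}, with "unbounded" leaving the upper bound open. The helper names particle and matches are hypothetical; the model built here mirrors the minOccurs and maxOccurs example, so both valid instances above are accepted.

```python
import re


def particle(name, min_occurs=1, max_occurs=1):
    """Translate one element particle into a regex fragment.

    Both bounds default to 1, as minOccurs and maxOccurs do in XML Schema;
    max_occurs may also be the string "unbounded".
    """
    unit = f"(?:{name} )"
    if max_occurs == "unbounded":
        return f"{unit}{{{min_occurs},}}"
    return f"{unit}{{{min_occurs},{max_occurs}}}"


# GivenName may repeat, Sex is optional, then DateOfBirth or Age.
MODEL = re.compile(
    particle("FamilyName")
    + particle("GivenName", max_occurs="unbounded")
    + particle("Sex", min_occurs=0)
    + "(?:" + particle("DateOfBirth") + "|" + particle("Age") + ")"
)


def matches(child_names):
    """True if the ordered child element names fit MODEL."""
    return MODEL.fullmatch("".join(name + " " for name in child_names)) is not None
```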
W3C Schema Structures • Namespaces • W3C namespace http://www.w3.org/2001/XMLSchema • element, complexType, sequence, etc. • targetNamespace • Optional • User defined • One per schema document
W3C Schema Structures • schema with namespaces <xsd:schema id="PersonalRecord" targetNamespace="http://www.person.rec" xmlns="http://www.person.rec" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <!-- Type definitions, etc. with namespace prefixes --> <xsd:complexType name="RecordStructure"> ... </xsd:complexType> <xsd:simpleType name="TextType"> ... </xsd:simpleType> <xsd:simpleType name="GenderType"> ... </xsd:simpleType> </xsd:schema>
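Namespace-aware software identifies an element by its namespace URI plus local name, not by whatever prefix a document happens to use. A small Python sketch with xml.etree.ElementTree, which reports such names as {uri}localName, shows two hypothetical instance fragments with different prefixes resolving to the same qualified name. It assumes the schema also declares a global Record element (elided above), and it only resolves names rather than validating. Note that locally declared child elements are unqualified in instances unless the schema sets elementFormDefault="qualified".

```python
import xml.etree.ElementTree as ET

# The targetNamespace from the schema example.
PERSON_NS = "http://www.person.rec"

# Hypothetical fragments: one uses a "rec" prefix, the other a default
# namespace declaration. Both place Record in the same namespace.
PREFIXED = '<rec:Record xmlns:rec="http://www.person.rec" recordId="1"/>'
DEFAULTED = '<Record xmlns="http://www.person.rec" recordId="1"/>'


def qualified_name(xml_text):
    """Return the root element's name as ElementTree reports it: {uri}local."""
    return ET.fromstring(xml_text).tag


def is_person_record(xml_text):
    """True if the root is Record in PERSON_NS, whatever prefix was used."""
    return qualified_name(xml_text) == f"{{{PERSON_NS}}}Record"
```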
W3C Schema Structures • annotation and documentation <xsd:simpleType name="GenderType"> <xsd:annotation> <xsd:documentation>The sex of an individual for administrative purposes.</xsd:documentation> </xsd:annotation> <xsd:restriction base="xsd:token"> <xsd:enumeration value="M"/> <xsd:enumeration value="F"/> <xsd:enumeration value="NK"/> </xsd:restriction> </xsd:simpleType>
W3C Schema Structures • annotation and documentation <xsd:simpleType name="GenderType"> <xsd:restriction base="xsd:token"> <xsd:enumeration value="M"/> <xsd:enumeration value="F"/> <xsd:enumeration value="NK"> <xsd:annotation> <xsd:documentation>This is used when the sex cannot be determined for physical reasons, e.g. a newborn baby</xsd:documentation> </xsd:annotation> </xsd:enumeration> </xsd:restriction> </xsd:simpleType>
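Because annotations are ordinary XML, tools can read them back out, which is one way a schema doubles as system documentation. Below is a hypothetical Python helper, code_list, that uses xml.etree.ElementTree to map each enumeration value of a named top-level simpleType to its documentation text. The embedded schema is a well-formed copy of the GenderType example, and whitespace in the documentation is collapsed for display.

```python
import xml.etree.ElementTree as ET

XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="GenderType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="M"/>
      <xsd:enumeration value="F"/>
      <xsd:enumeration value="NK">
        <xsd:annotation>
          <xsd:documentation>This is used when the sex cannot be determined
            for physical reasons, e.g. a newborn baby</xsd:documentation>
        </xsd:annotation>
      </xsd:enumeration>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""


def code_list(schema_text, type_name):
    """Map each enumeration value of a named simpleType to its documentation."""
    root = ET.fromstring(schema_text)
    codes = {}
    for simple in root.iterfind("xsd:simpleType", XSD_NS):
        if simple.get("name") != type_name:
            continue
        for enum in simple.iterfind("xsd:restriction/xsd:enumeration", XSD_NS):
            doc = enum.find("xsd:annotation/xsd:documentation", XSD_NS)
            # Undocumented codes get an empty string; whitespace is collapsed.
            text = "" if doc is None else " ".join("".join(doc.itertext()).split())
            codes[enum.get("value")] = text
    return codes
```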
Built-in Simple Types • 44 built-in simple types - most are atomic • Used directly in schemas or used to create user-defined simple types
Built-in Simple Types • String-based types • string • normalizedString • token
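The three string-based types differ in how whitespace is normalized before a value is checked: string preserves it, normalizedString replaces each tab, line feed and carriage return with a space, and token additionally collapses runs of spaces and trims both ends. A minimal Python sketch of those two transformations (the function names are hypothetical):

```python
import re


def as_normalized_string(value):
    """normalizedString (whiteSpace="replace"): tab, LF and CR become spaces."""
    return re.sub(r"[\t\n\r]", " ", value)


def as_token(value):
    """token (whiteSpace="collapse"): replace, collapse space runs, trim ends."""
    return re.sub(r" +", " ", as_normalized_string(value)).strip(" ")

# string (whiteSpace="preserve") keeps the value exactly as written.
```

For example, "  Arnett\t\nJohn  " stays unchanged as a string, becomes "  Arnett  John  " as a normalizedString, and "Arnett John" as a token.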
Built-in Simple Types • Numeric Types • float and double • decimal • integer
Built-in Simple Types • Date and Time Types • date • time • dateTime • gYear, gMonth, gDay • duration
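The date type expects the form YYYY-MM-DD, optionally followed by a timezone (Z or a +hh:mm / -hh:mm offset), which is why 06-Jan-63 fails in the invalid Record instance. A simplified Python check, assuming four-digit years only and checking the timezone offset for shape rather than range; datetime.date rejects calendar-impossible values such as 30 February.

```python
import re
from datetime import date

# Simplified xsd:date lexical form: four-digit year, optional timezone.
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")


def is_xsd_date(text):
    """True if text looks like a valid xsd:date under the assumptions above."""
    match = DATE_RE.fullmatch(text.strip())  # date collapses surrounding whitespace
    if match is None:
        return False
    try:
        date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
    except ValueError:  # e.g. month 13 or 30 February
        return False
    return True
```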
Built-in Simple Types • Others • boolean • base64Binary and hexBinary • anyURI
Built-in Simple Types • Facets • length • minLength • maxLength • minExclusive • minInclusive • maxExclusive • maxInclusive • totalDigits • fractionDigits • whiteSpace • pattern • enumeration
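Of these facets, totalDigits and fractionDigits are the least self-explanatory. They constrain the decimal value rather than its lexical form, so leading zeros and trailing fractional zeros do not count. A minimal Python sketch (the function names are hypothetical) that works on the lexical string to avoid any floating-point rounding:

```python
import re

# Lexical form of xsd:decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?([0-9]*)(?:\.([0-9]*))?")


def digit_counts(lexical):
    """Return the (totalDigits, fractionDigits) an xsd:decimal value needs."""
    match = DECIMAL_RE.fullmatch(lexical.strip())
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not an xsd:decimal: {lexical!r}")
    integer = match.group(1).lstrip("0")             # leading zeros don't count
    fraction = (match.group(2) or "").rstrip("0")    # nor do trailing zeros
    total = max(len(integer) + len(fraction), 1)     # zero still needs one digit
    return total, len(fraction)


def satisfies(lexical, total_digits, fraction_digits):
    """True if the value passes totalDigits and fractionDigits facets."""
    total, fraction = digit_counts(lexical)
    return total <= total_digits and fraction <= fraction_digits
```

So "12.500" needs 3 total digits and 1 fraction digit, and "123.456" fails a restriction with totalDigits="5" and fractionDigits="2".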
Built-in Simple Types • Length facets <xsd:simpleType name="TextType"> <xsd:restriction base="xsd:string"> <xsd:minLength value="1"/> <xsd:maxLength value="35"/> </xsd:restriction> </xsd:simpleType> <xsd:element name="Comment" type="TextType"/> • Valid: <Comment>This is a valid value</Comment> • Invalid (empty, so shorter than minLength): <Comment/> • Invalid: <Comment>This is an invalid value because it contains more than 35 characters</Comment>
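For string-based types the length facets count characters (Unicode code points), not bytes, which matches what len() returns for a Python str. A one-function sketch of the TextType check (the name text_type_ok is hypothetical):

```python
def text_type_ok(value, min_length=1, max_length=35):
    """Apply the minLength/maxLength facets of the TextType example.

    Length is counted in characters, as len() does for a Python str;
    the size of the encoded bytes does not matter.
    """
    return min_length <= len(value) <= max_length
```

An empty Comment fails minLength, and 35 accented characters pass even though they occupy 70 bytes in UTF-8.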
Built-in Simple Types • enumeration facet <xsd:simpleType name="GenderType"> <xsd:restriction base="xsd:token"> <xsd:enumeration value="M"/> <xsd:enumeration value="F"/> <xsd:enumeration value="NK"/> </xsd:restriction> </xsd:simpleType> <xsd:element name="Sex" type="GenderType"/> • Valid: <Sex>NK</Sex> • Invalid (not one of the enumerated values): <Sex>Male</Sex>
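Because GenderType restricts token, whitespace is collapsed before the value is compared with the enumeration, while the comparison itself is exact and case-sensitive. A minimal Python sketch of that behaviour (GENDER_CODES and gender_ok are hypothetical names):

```python
import re

GENDER_CODES = ("M", "F", "NK")


def collapse(value):
    """whiteSpace="collapse", inherited from token."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def gender_ok(value):
    """True if the collapsed value is exactly one of the enumerated codes."""
    return collapse(value) in GENDER_CODES
```

So a Sex element containing " M" followed by a newline is accepted, while "Male" and lower-case "m" are rejected.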
Built-in Simple Types • pattern facet <xsd:simpleType name="PostCodeType"> <xsd:restriction base="xsd:string"> <xsd:pattern value="[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}"/> </xsd:restriction> </xsd:simpleType>
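XML Schema patterns are implicitly anchored: the whole value must match, not just part of it. Python's re.fullmatch mirrors that, whereas re.search would accept a postcode embedded in other text. This particular pattern uses only syntax that means the same in both regex dialects; other patterns may not translate directly. The names POSTCODE and postcode_ok are hypothetical.

```python
import re

# Pattern copied from the PostCodeType facet.
POSTCODE = re.compile(r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}")


def postcode_ok(value):
    """Facet-style check: the entire value must match the pattern."""
    return POSTCODE.fullmatch(value) is not None
```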
Advanced Features • Multi-document schemas • Complex type derivation • Reusable groups • Element substitution • Schema redefinition • Identity constraints • Schema design
Summary • Used to validate the structure and values of XML instance documents • Uses XML syntax • W3C Recommendation specifies data structures and built-in types • Supports namespaces • Has many advanced features, incl. several extensibility mechanisms
Find Out More • XML Schema Part 0: Primer • www.w3.org/TR/xmlschema-0/ • XML Schema Part 1: Structures • www.w3.org/TR/xmlschema-1/ • XML Schema Part 2: Datatypes • www.w3.org/TR/xmlschema-2/