Introduction to XML Schema
John Arnett, MSc
Standards Modeller
Information and Statistics Division, NHSScotland
Tel: 0131 551 8073 (x2073)
Email: John.Arnett@isd.csa.scot.nhs.uk
http://isdscotland.org/xml

Contents
• Introduction
• Document Type Definitions: a reminder
• W3C Schema
• Schema Structures
• Built-In Types
• Summary
• Find Out More

Introduction
• Schema
  • a diagram, plan or framework
  • in XML: a document that describes the structure and content of an XML document

Introduction
• Purpose
  • Data validation
  • Contract between data providers and data users
  • System documentation
  • Processing information

Introduction
• Schema data validation
  • Element and attribute structure
  • Element ordering
  • Value constraints
  • Built-in data types
  • Size and pattern constraints
  • Enumerations
  • Uniqueness constraints

Introduction
• Schema languages
  • Document Type Definitions (DTDs)
  • W3C XML Schema
  • OASIS RELAX NG
  • Schematron

Document Type Definitions
• DTD benefits
    <!ELEMENT Record (FamilyName, GivenName, Sex, DateOfBirth)>
    <!ELEMENT FamilyName (#PCDATA)>
    <!ELEMENT GivenName (#PCDATA)>
    <!ELEMENT Sex (#PCDATA)>
    <!ELEMENT DateOfBirth (#PCDATA)>
    <!ATTLIST Record recordId CDATA #REQUIRED>
  • Easy to understand and implement
  • Lightweight alternative to schemas

Document Type Definitions
• DTD limitations
  • Use non-XML syntax
  • Only limited support for data typing and namespaces
  • Difficult to extend

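To illustrate the data-typing limitation, here is a minimal sketch of a complete document that carries the Record DTD from the benefits slide in an internal subset. The values are deliberately nonsensical (they are made up for illustration), yet the document is valid against the DTD, because #PCDATA and CDATA accept any text.

```xml
<?xml version="1.0"?>
<!DOCTYPE Record [
  <!ELEMENT Record (FamilyName, GivenName, Sex, DateOfBirth)>
  <!ELEMENT FamilyName (#PCDATA)>
  <!ELEMENT GivenName (#PCDATA)>
  <!ELEMENT Sex (#PCDATA)>
  <!ELEMENT DateOfBirth (#PCDATA)>
  <!ATTLIST Record recordId CDATA #REQUIRED>
]>
<!-- DTD-valid: the DTD cannot require a numeric recordId, a coded Sex
     value or a real date, because #PCDATA and CDATA accept any text -->
<Record recordId="abc">
  <FamilyName>Arnett</FamilyName>
  <GivenName>John</GivenName>
  <Sex>unknown</Sex>
  <DateOfBirth>not a date</DateOfBirth>
</Record>
```

A W3C schema can reject each of these values through built-in types and facets.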
W3C Schema
• W3C Recommendation
  • XML Schema Part 0: Primer
    • introduction (guidance)
  • XML Schema Part 1: Structures
    • defines schema components
  • XML Schema Part 2: Datatypes
    • defines built-in datatypes and their restrictions

W3C Schema Structures
• Most commonly used structures:
  • elements and attributes
  • simpleTypes
  • complexTypes
  • model groups
  • minOccurs and maxOccurs
  • annotation and documentation
  • schema and namespaces

W3C Schema Structures
• element and attribute
  • Basic building blocks of documents
    <element name="Record">
      <complexType>
        <sequence>
          <element name="FamilyName" type="string"/>
          <element name="GivenName" type="string"/>
          <element name="Sex" type="token"/>
          <element name="DateOfBirth" type="date"/>
        </sequence>
        <attribute name="recordId" type="integer"/>
      </complexType>
    </element>

W3C Schema Structures
• element and attribute
  • valid instances of Record element
    <Record recordId="1">
      <FamilyName>Arnett</FamilyName>
      <GivenName>John</GivenName>
      <Sex>M</Sex>
      <DateOfBirth>1963-06-01</DateOfBirth>
    </Record>
    <Record recordId="2">
      <!-- also valid: string and token accept an empty GivenName and any Sex value -->
      <FamilyName>Smith</FamilyName>
      <GivenName/>
      <Sex>FEMALE</Sex>
      <DateOfBirth>1971-04-11</DateOfBirth>
    </Record>

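W3C Schema Structures
• element and attribute
  • invalid Record element instance
    <Record recordId="1">Mr   <!-- text not allowed: Record has element-only content -->
      <Surname>Arnett</Surname>   <!-- not declared: FamilyName expected -->
      <GivenName>John</GivenName>
      <Sex>M</Sex>
      <DateOfBirth>06-Jan-63</DateOfBirth>   <!-- not a valid date: YYYY-MM-DD expected -->
    </Record>
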
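The Record declaration above is a fragment. As a sketch, this is one way to wrap it in a complete schema document; the file name Record.xsd is a hypothetical choice. The XML Schema namespace is made the default namespace here, which is what lets unprefixed names such as element and string refer to XML Schema constructs and built-in types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Record.xsd (hypothetical file name). No targetNamespace, so Record
     is declared in no namespace. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Record">
    <complexType>
      <sequence>
        <element name="FamilyName" type="string"/>
        <element name="GivenName" type="string"/>
        <element name="Sex" type="token"/>
        <element name="DateOfBirth" type="date"/>
      </sequence>
      <attribute name="recordId" type="integer"/>
    </complexType>
  </element>
</schema>
<!-- An instance document can give a validator a location hint for this schema:
     <Record recordId="1"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="Record.xsd"> ... </Record> -->
```

xsi:noNamespaceSchemaLocation is only a hint. Many validators let the application supply the schema directly instead.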
W3C Schema Structures
• simpleType definitions
  • Define element content and attribute values
  • Character data only: no nested (child) elements permitted
  • No attributes permitted
  • Derived from a built-in type or another simple type, usually by restriction (derivation by list or union is also possible)

W3C Schema Structures
• simpleType definition examples
    <simpleType name="TextType">
      <restriction base="string">
        <minLength value="1"/>
        <maxLength value="35"/>
      </restriction>
    </simpleType>
    <simpleType name="GenderType">
      <restriction base="token">
        <enumeration value="M"/>
        <enumeration value="F"/>
        <enumeration value="NK"/>
      </restriction>
    </simpleType>

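A simple type does not have to restrict a built-in type directly; it can also restrict another user-defined simple type. The sketch below narrows TextType further. ShortTextType and the Initials element are hypothetical names. The xsd: prefix is used and no default namespace is declared, so the unprefixed reference base="TextType" resolves to the user-defined type rather than to the XML Schema namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- TextType as on the slide: 1 to 35 characters -->
  <xsd:simpleType name="TextType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1"/>
      <xsd:maxLength value="35"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical ShortTextType: inherits minLength 1 from TextType
       and narrows maxLength from 35 to 10 -->
  <xsd:simpleType name="ShortTextType">
    <xsd:restriction base="TextType">
      <xsd:maxLength value="10"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="Initials" type="ShortTextType"/>
</xsd:schema>
```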
W3C Schema Structures
• complexType definitions
  • Define element content
  • Child elements and character data permitted (mixing text with child elements requires mixed="true")
  • Attributes permitted

W3C Schema Structures
• complexType definition examples
    <complexType name="DemographicsStructure">
      <sequence>
        <element name="FamilyName" type="TextType"/>
        <element name="GivenName" type="TextType"/>
        <element name="Sex" type="GenderType"/>
        <element name="DateOfBirth" type="date"/>
      </sequence>
      <attribute name="recordId" type="integer"/>
    </complexType>
    <element name="Record" type="DemographicsStructure"/>
    <element name="Person" type="DemographicsStructure"/>
    <element name="Client" type="DemographicsStructure"/>

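Putting the simpleType and complexType examples together gives a complete schema. This is a sketch that uses the xsd: prefix for XML Schema names, so that built-in types (xsd:date) and user-defined types (TextType, GenderType) can be referenced side by side without clashing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="TextType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1"/>
      <xsd:maxLength value="35"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="GenderType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="M"/>
      <xsd:enumeration value="F"/>
      <xsd:enumeration value="NK"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- One named structure, reused by three global elements -->
  <xsd:complexType name="DemographicsStructure">
    <xsd:sequence>
      <xsd:element name="FamilyName" type="TextType"/>
      <xsd:element name="GivenName" type="TextType"/>
      <xsd:element name="Sex" type="GenderType"/>
      <xsd:element name="DateOfBirth" type="xsd:date"/>
    </xsd:sequence>
    <xsd:attribute name="recordId" type="xsd:integer"/>
  </xsd:complexType>
  <xsd:element name="Record" type="DemographicsStructure"/>
  <xsd:element name="Person" type="DemographicsStructure"/>
  <xsd:element name="Client" type="DemographicsStructure"/>
</xsd:schema>
```

Under this schema, the second Record on the "valid instances" slide would now be rejected. Its empty GivenName is shorter than minLength 1, and FEMALE is not one of the GenderType values.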
W3C Schema Structures
• Model groups
  • sequence
    • elements must occur in the order specified
  • choice
    • exactly one of several child elements must be selected
  • all
    • each child element occurs 0 or 1 times, in any order

W3C Schema Structures
• Model group examples
    <complexType name="DemographicsStructure">
      <sequence>
        <element name="FamilyName" type="TextType"/>
        <element name="GivenName" type="TextType"/>
        <element name="Sex" type="GenderType"/>
        <choice>
          <element name="DateOfBirth" type="date"/>
          <element name="Age" type="integer"/>
        </choice>
      </sequence>
      <attribute name="recordId" type="integer"/>
    </complexType>

W3C Schema Structures
• Model groups
  • Valid instances of Record element
    <Record recordId="1">
      <FamilyName>Arnett</FamilyName>
      <GivenName>John</GivenName>
      <Sex>M</Sex>
      <DateOfBirth>1963-06-01</DateOfBirth>
    </Record>
    <Record recordId="2">
      <FamilyName>Smith</FamilyName>
      <GivenName>Jane</GivenName>
      <Sex>F</Sex>
      <Age>28</Age>
    </Record>

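The examples above cover sequence and choice. This is a minimal sketch of the third model group, all. NameStructure and the Name element are hypothetical. In XML Schema 1.0 an all group must be the entire content model (it cannot be nested inside a sequence or choice), and its child elements may not have maxOccurs greater than 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Children may appear in any order. FamilyName must appear exactly
       once; GivenName may appear once or not at all. -->
  <xsd:complexType name="NameStructure">
    <xsd:all>
      <xsd:element name="FamilyName" type="xsd:string"/>
      <xsd:element name="GivenName" type="xsd:string" minOccurs="0"/>
    </xsd:all>
  </xsd:complexType>
  <xsd:element name="Name" type="NameStructure"/>
</xsd:schema>
```

Both `<Name><FamilyName>Smith</FamilyName><GivenName>Jane</GivenName></Name>` and the same two children in reverse order are valid. `<Name><GivenName>Jane</GivenName></Name>` is not, because FamilyName is missing.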
W3C Schema Structures
• minOccurs and maxOccurs
  • control the number of occurrences of element instances
  • both default to 1
  • minOccurs="0"
    • occurrence is optional
  • maxOccurs="unbounded"
    • multiple occurrences allowed
  • may be applied to any child element, sequence or choice

W3C Schema Structures
• minOccurs and maxOccurs examples
    <complexType name="DemographicsStructure">
      <sequence>
        <element name="FamilyName" type="TextType"/>
        <element name="GivenName" type="TextType" maxOccurs="unbounded"/>
        <element name="Sex" type="GenderType" minOccurs="0"/>
        <choice>
          <element name="DateOfBirth" type="date"/>
          <element name="Age" type="integer"/>
        </choice>
      </sequence>
      <attribute name="recordId" type="integer"/>
    </complexType>

W3C Schema Structures
• minOccurs and maxOccurs
  • Valid instances of Record element
    <Record recordId="1">
      <FamilyName>Arnett</FamilyName>
      <GivenName>John</GivenName>
      <GivenName>Gordon</GivenName>
      <Sex>M</Sex>
      <DateOfBirth>1963-06-01</DateOfBirth>
    </Record>
    <Record recordId="2">
      <FamilyName>Smith</FamilyName>
      <GivenName>Jane</GivenName>
      <Age>28</Age>
      <!-- Optional "Sex" element missing -->
    </Record>

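The examples above put minOccurs and maxOccurs on individual elements. This sketch applies them to model groups instead. minOccurs="0" on the choice lets a record carry DateOfBirth, Age or neither. maxOccurs="unbounded" on a sequence in a hypothetical RecordSet element allows one or more Record elements (at least one, because minOccurs defaults to 1). Built-in types stand in for TextType and GenderType to keep the schema self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="DemographicsStructure">
    <xsd:sequence>
      <xsd:element name="FamilyName" type="xsd:string"/>
      <xsd:element name="GivenName" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Sex" type="xsd:token" minOccurs="0"/>
      <!-- Optional choice: DateOfBirth, Age, or neither -->
      <xsd:choice minOccurs="0">
        <xsd:element name="DateOfBirth" type="xsd:date"/>
        <xsd:element name="Age" type="xsd:integer"/>
      </xsd:choice>
    </xsd:sequence>
    <xsd:attribute name="recordId" type="xsd:integer"/>
  </xsd:complexType>
  <!-- Hypothetical container: the repeated sequence allows 1..n Records -->
  <xsd:element name="RecordSet">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded">
        <xsd:element name="Record" type="DemographicsStructure"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```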
W3C Schema Structures
• Namespaces
  • W3C namespace http://www.w3.org/2001/XMLSchema
    • element, complexType, sequence, etc.
  • targetNamespace
    • Optional
    • User defined
    • At most one per schema document

W3C Schema Structures
• schema with namespaces
    <xsd:schema id="PersonalRecord"
                targetNamespace="http://www.person.rec"
                xmlns="http://www.person.rec"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <!-- Type definitions, etc. with namespace prefixes.
           The default namespace equals the targetNamespace, so unprefixed
           references to user-defined types resolve correctly. -->
      <xsd:complexType name="RecordStructure">
        ...
      </xsd:complexType>
      <xsd:simpleType name="TextType">
        ...
      </xsd:simpleType>
      <xsd:simpleType name="GenderType">
        ...
      </xsd:simpleType>
    </xsd:schema>

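From the instance side, a document that uses a target namespace might look like the sketch below. It rests on assumptions that are not on the slide. The schema is saved as PersonalRecord.xsd (a hypothetical file name), it declares a global Record element of type RecordStructure, and it sets elementFormDefault="qualified" so that the child elements are also in the target namespace. Without that setting, locally declared children stay in no namespace, and the root element would need a prefix (for example `<p:Record xmlns:p="http://www.person.rec">`) so that the children remain unqualified.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The default namespace puts Record and its children in the target
     namespace. Attributes are not affected by a default namespace, so
     recordId stays unqualified, matching the default attributeFormDefault. -->
<Record xmlns="http://www.person.rec"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.person.rec PersonalRecord.xsd"
        recordId="1">
  <FamilyName>Arnett</FamilyName>
  <GivenName>John</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>1963-06-01</DateOfBirth>
</Record>
```

xsi:schemaLocation holds pairs of namespace name and location hint, separated by whitespace.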
W3C Schema Structures
• annotation and documentation
    <xsd:simpleType name="GenderType">
      <xsd:annotation>
        <xsd:documentation>The sex of an individual for administrative purposes.</xsd:documentation>
      </xsd:annotation>
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value="M"/>
        <xsd:enumeration value="F"/>
        <xsd:enumeration value="NK"/>
      </xsd:restriction>
    </xsd:simpleType>

W3C Schema Structures
• annotation and documentation
    <xsd:simpleType name="GenderType">
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value="M"/>
        <xsd:enumeration value="F"/>
        <xsd:enumeration value="NK">
          <xsd:annotation>
            <xsd:documentation>This is used when the sex cannot be determined for physical reasons, e.g. a newborn baby.</xsd:documentation>
          </xsd:annotation>
        </xsd:enumeration>
      </xsd:restriction>
    </xsd:simpleType>

Built-in Simple Types
• 44 built-in simple types, most of them atomic
• Used directly in schemas or used to create user-defined simple types

Built-in Simple Types
• String-based types
  • string
  • normalizedString
  • token

Built-in Simple Types
• Numeric types
  • float and double
  • decimal
  • integer

Built-in Simple Types
• Date and time types
  • date
  • time
  • dateTime
  • gYear, gMonth, gDay
  • duration

Built-in Simple Types
• Others
  • boolean
  • base64Binary and hexBinary
  • anyURI

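Built-in types can be used directly as element types, with no user-defined restriction. This is a sketch using a hypothetical Appointment element. The comments show one lexically valid value for each type, and those values are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: every child uses a built-in type directly -->
  <xsd:element name="Appointment">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Start" type="xsd:dateTime"/>    <!-- 2004-03-15T09:30:00 -->
        <xsd:element name="Length" type="xsd:duration"/>   <!-- PT45M (45 minutes) -->
        <xsd:element name="Fee" type="xsd:decimal"/>       <!-- 35.50 -->
        <xsd:element name="Attended" type="xsd:boolean"/>  <!-- true, false, 1 or 0 -->
        <xsd:element name="Referral" type="xsd:anyURI"/>   <!-- http://example.org/ref/123 -->
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```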
Built-in Simple Types
• Facets
  • length
  • minLength
  • maxLength
  • minInclusive
  • maxInclusive
  • minExclusive
  • maxExclusive
  • totalDigits
  • fractionDigits
  • whiteSpace
  • pattern
  • enumeration

Built-in Simple Types
• Length facets
    <xsd:simpleType name="TextType">
      <xsd:restriction base="xsd:string">
        <xsd:minLength value="1"/>
        <xsd:maxLength value="35"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name="Comment" type="TextType"/>

    <Comment>This is a valid value</Comment>   <!-- valid -->
    <Comment/>   <!-- invalid: shorter than minLength 1 -->
    <Comment>This is an invalid value because it contains more than 35 characters</Comment>   <!-- invalid: longer than maxLength 35 -->

Built-in Simple Types
• enumeration facet
    <xsd:simpleType name="GenderType">
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value="M"/>
        <xsd:enumeration value="F"/>
        <xsd:enumeration value="NK"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name="Sex" type="GenderType"/>

    <Sex>NK</Sex>     <!-- valid -->
    <Sex>Male</Sex>   <!-- invalid: not one of the enumerated values -->

Built-in Simple Types
• pattern facet
    <xsd:simpleType name="PostCodeType">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}"/>
      </xsd:restriction>
    </xsd:simpleType>

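The facet examples above cover length, enumeration and pattern. The remaining value facets work the same way. This is a sketch with hypothetical type names: AgeType uses an inclusive range, WeightType combines digit limits with an exclusive lower bound, and CollapsedTextType shows whiteSpace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical AgeType: whole years from 0 to 150 inclusive -->
  <xsd:simpleType name="AgeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="150"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical WeightType: at most 5 digits in total, at most 2 of
       them after the decimal point, and strictly greater than zero -->
  <xsd:simpleType name="WeightType">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="5"/>
      <xsd:fractionDigits value="2"/>
      <xsd:minExclusive value="0"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- whiteSpace="collapse": leading and trailing whitespace is removed and
       internal runs of whitespace become single spaces before checking -->
  <xsd:simpleType name="CollapsedTextType">
    <xsd:restriction base="xsd:string">
      <xsd:whiteSpace value="collapse"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="Age" type="AgeType"/>
  <xsd:element name="Weight" type="WeightType"/>
</xsd:schema>
```

With these declarations, `<Age>28</Age>` and `<Weight>72.50</Weight>` are valid. `<Age>200</Age>` exceeds maxInclusive, and `<Weight>72.505</Weight>` has too many fraction digits.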
Advanced Features
• Multi-document schemas
• Complex type derivation
• Reusable groups
• Element substitution
• Schema redefinition
• Identity constraints
• Schema design

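As a taste of two of these features, this is a sketch of complex type derivation by extension and an identity constraint. The names PatientStructure, PatientNumber and RecordSet are hypothetical. The extended type's content is the base sequence followed by the added sequence, and the recordId attribute is inherited. xsd:unique ignores Records that omit recordId; xsd:key would also require every Record to have one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="DemographicsStructure">
    <xsd:sequence>
      <xsd:element name="FamilyName" type="xsd:string"/>
      <xsd:element name="GivenName" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="recordId" type="xsd:integer"/>
  </xsd:complexType>
  <!-- Derivation by extension: base content plus a hypothetical PatientNumber -->
  <xsd:complexType name="PatientStructure">
    <xsd:complexContent>
      <xsd:extension base="DemographicsStructure">
        <xsd:sequence>
          <xsd:element name="PatientNumber" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="RecordSet">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Record" type="PatientStructure" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Identity constraint: no two Record children of a RecordSet
         may share the same recordId value -->
    <xsd:unique name="UniqueRecordId">
      <xsd:selector xpath="Record"/>
      <xsd:field xpath="@recordId"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
```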
Summary
• Used to validate the structure and values of XML instance documents
• Uses XML syntax
• W3C Recommendation specifies data structures and built-in types
• Supports namespaces
• Has many advanced features, including several extensibility mechanisms

Find Out More
• XML Schema Part 0: Primer
  • www.w3.org/TR/xmlschema-0/
• XML Schema Part 1: Structures
  • www.w3.org/TR/xmlschema-1/
• XML Schema Part 2: Datatypes
  • www.w3.org/TR/xmlschema-2/