QUALITY CONTROL WITH SCHEMAS CSC1310 Fall 2009
BASIC CONCEPTS
• A schema is a pass-or-fail test for a document.
• A schema is a minimum set of requirements that a document must meet, used to prevent anomalous processing or to formalize an application.
• Validation is the testing of a document against a schema.
• Structure: the use and placement of markup elements and attributes.
• Data typing: patterns of character data.
• Integrity: the status of links between nodes and resources.
• Business rules: spelling checks, checksum results, and so on.
DOCUMENT TYPE DEFINITIONS (DTDs)
• The DTD is the oldest and most widely supported schema language.
• A DTD declares a set of allowed elements (the vocabulary).
• A DTD defines a content model for each element (the grammar).
• A DTD declares a set of allowed attributes for each element: name, data type, default value, and behavior (for example, required or optional).
DOCUMENT PROLOG FOR DTD
• All external parsed entities (including DTDs) should begin with a text declaration.
• A text declaration looks like an XML declaration, except that it explicitly excludes the standalone property:
    <?xml version="1.0" encoding="character set"?>
• The encoding declared in a DTD does not automatically carry over to the XML documents that use the DTD.
• External parsed entities (including DTDs) must not contain a document type declaration.
DECLARATIONS
• A DTD is a set of rules (declarations).
• Each declaration adds a new element, set of attributes, entity, or notation.
• If an entity is declared more than once, the first declaration that appears takes precedence and the others are ignored.
• An element's content can be declared as one of the following:
• EMPTY: no content (special tags like <br>).
• ANY: any content.
• #PCDATA: parsed character data, written as (#PCDATA). CDATA is the corresponding type for attribute values.
• Children: a parent-child relationship that also fixes the order of the child elements.
USE OF CHILDREN
• Child elements can be declared in a DTD with the following occurrence rules:
• One occurrence only (no indicator)
• Minimum of one occurrence (+)
• Zero or more occurrences (*)
• Zero or one occurrence (?)
• Either/or ( | )
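As a sketch of how the occurrence indicators combine, the declarations below use each form once and are followed by a document instance that should satisfy them. The element names (recipe, title, ingredient, note, photo, metric, imperial) are invented for illustration and do not come from the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!-- exactly one title, one or more ingredients, any number of notes,
       an optional photo, then either metric or imperial -->
  <!ELEMENT recipe (title, ingredient+, note*, photo?, (metric | imperial))>
  <!ELEMENT title      (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT note       (#PCDATA)>
  <!ELEMENT photo      EMPTY>
  <!ELEMENT metric     EMPTY>
  <!ELEMENT imperial   EMPTY>
]>
<recipe>
  <title>Pancakes</title>
  <ingredient>flour</ingredient>
  <ingredient>milk</ingredient>
  <metric/>
</recipe>
```

Removing both ingredient elements, or placing title after them, should make the instance invalid, because a DTD content model fixes both the count and the order of children.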
ATTRIBUTES
• There are four options for an attribute's default:
• "value": the default value of the attribute, surrounded by quotes (" ").
• #IMPLIED: the attribute is optional.
• #FIXED "value": the attribute always has the given fixed value.
• #REQUIRED: the attribute is required whenever the element is used.
TYPES OF ATTRIBUTE
• CDATA: the value is character data.
• (en1|en2|...): the value must be one from an enumerated list.
• ID: the value is a unique ID.
• IDREF: the value is the ID of another element.
• IDREFS: the value is a list of other IDs.
• NMTOKEN: the value is a valid XML name token.
• NMTOKENS: the value is a list of valid XML name tokens.
• ENTITY: the value is the name of an entity.
• ENTITIES: the value is a list of entity names.
• NOTATION: the value is the name of a notation.
• xml: the attribute is a predefined XML attribute with the reserved xml: prefix (for example xml:lang or xml:space).
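The following sketch combines several attribute types and default options in one attribute list. The library and book elements and all attribute names are hypothetical, chosen only to illustrate the syntax.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book
    isbn    ID              #REQUIRED
    follows IDREF           #IMPLIED
    format  (print | ebook) "print"
    lang    NMTOKEN         #IMPLIED
    edition CDATA           #FIXED "1">
]>
<library>
  <!-- format is given explicitly; edition falls back to its fixed value -->
  <book isbn="b100" format="ebook">First Volume</book>
  <!-- follows must match the ID of some other element in the document -->
  <book isbn="b101" follows="b100" lang="en">Second Volume</book>
</library>
```

Note that ID values must be XML names, so they cannot start with a digit; that is why the sample values are b100 and b101 rather than bare numbers.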
EXAMPLE
    <!ELEMENT date (year, month, day)>
    <!ELEMENT year (#PCDATA)>
    <!ELEMENT month (#PCDATA)>
    <!ELEMENT day (#PCDATA)>
EXAMPLE
    <!ELEMENT address (street, city, country, zip)>
    <!ELEMENT street (#PCDATA | unit)*>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT country (#PCDATA)>
    <!ELEMENT zip (#PCDATA)>
    <!ELEMENT unit (#PCDATA)>
EXAMPLE
    <!ELEMENT person (name, age, gender)>
    <!ELEMENT name (first, last, (junior | senior)?)>
    <!ELEMENT age (#PCDATA)>
    <!ELEMENT gender (#PCDATA)>
    <!ELEMENT first (#PCDATA)>
    <!ELEMENT last (#PCDATA)>
    <!ELEMENT junior EMPTY>
    <!ELEMENT senior EMPTY>
    <!ATTLIST person
        pid ID #REQUIRED
        employed (fulltime|parttime) #IMPLIED>
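For comparison, here is a sketch of a person element that should validate against the declarations in this example. The names and values are made up for illustration.

```xml
<person pid="p1001" employed="fulltime">
  <name>
    <first>Maria</first>
    <last>Lopez</last>
    <junior/>
  </name>
  <age>34</age>
  <gender>female</gender>
</person>
```

The children appear in exactly the declared order (name, age, gender), and the pid value starts with a letter because an ID must be a valid XML name.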
TIPS FOR DESIGNING DTD
• Organize declarations into groups by their purpose: blocks, hierarchical elements, parts of tables, lists, etc.
• Use whitespace: it makes the DTD more understandable and easier to navigate.
• Use comments: at the top of each DTD file, state the purpose, version number, and contact information; for a customization, note the original, its authors, and your changes.
• Label each section and subsection of the DTD.
• Track versions.
• Use parameter entities: they hold recurring parts of declarations and let you edit them in one place.
PARAMETER ENTITIES
• In the external DTD, parameter entities can be used in:
• Element-type declarations, to hold element groups.
• Attribute list declarations, to hold attribute definitions.
• In the internal DTD, they can hold only complete declarations.
    <!ENTITY % common.atts "id ID #IMPLIED class CDATA #IMPLIED">
    <!ATTLIST foo %common.atts;>
    <!ATTLIST bar %common.atts; extra CDATA #FIXED "blah">
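To make the effect of the parameter entity concrete, this sketch shows what the two attribute list declarations for foo and bar should amount to once %common.atts; is expanded. It is written out by hand here, not produced by a tool.

```xml
<!ATTLIST foo
  id    ID    #IMPLIED
  class CDATA #IMPLIED>
<!ATTLIST bar
  id    ID    #IMPLIED
  class CDATA #IMPLIED
  extra CDATA #FIXED "blah">
```

Changing the entity's replacement text in one place would change both attribute lists, which is the maintenance benefit the slide describes.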
IMPORTING MODULES
• The .mod extension means that a file contains declarations but should not be used as a DTD on its own.
• An external parameter entity imports all the text in a file.
    <!ELEMENT catalog (title, metadata, front, entries+)>
    <!ENTITY % basic.stuff SYSTEM "basics.mod">
    %basic.stuff;
    <!ENTITY % front.matter SYSTEM "front.mod">
    %front.matter;
    <!ENTITY % metadata SYSTEM "metadata.dtd">
    %metadata;
CONDITIONAL SECTIONS
• A conditional section is a special form of markup in a DTD that marks a region for inclusion or exclusion.
• Conditional sections can be used only in the external subset.
    <![INCLUDE[ DTD text ]]>
    <![IGNORE[ DTD text ]]>
    <![INCLUDE[ <!ELEMENT blah (#PCDATA)> ]]>
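OVERRIDING ELEMENT
• In the DTD:
    <!ENTITY % default.polyhedron "INCLUDE">
    <![%default.polyhedron;[
    <!ELEMENT polyhedron (side+, angle+)>
    ]]>
• In the XML document:
    <!DOCTYPE picture SYSTEM "shapes.dtd" [
      <!ENTITY % default.polyhedron "IGNORE">
      <!ELEMENT polyhedron (side, side, side+, angle, angle, angle+)>
    ]>
• The internal subset is read first, so its declaration of default.polyhedron takes precedence: the conditional section in the DTD is ignored and the document's own polyhedron declaration applies.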
LIMITATIONS OF DTD
• A DTD describes how elements are arranged in a document but says little about the content of the document.
• A DTD is not flexible about the order of children.
• A DTD locks down the namespace: every element in a document must have a corresponding declaration in the DTD.
• A schema is a newer validation system that:
• contains rules that must all be satisfied for a document to be considered valid
• is not built into the XML specification.
• Examples: W3C XML Schema, RELAX NG, Schematron.
NAMESPACES
• Namespaces are used to group elements and attributes.
    xmlns:namespace_prefix="namespace_identifier"
    <part-catalog xmlns:nw="http://www.nutware.com"
                  xmlns="http://www.bobco.com"> <!-- default (implicit) namespace -->
      <nw:entry nw:number="1327">
        <nw:description>hexnut</nw:description>
      </nw:entry>
      <part id="555">
        <name>type 4</name>
      </part>
    </part-catalog>
W3C SCHEMA (2001)
• Schemas are XML documents themselves.
• In a DTD:
    <!ELEMENT country (#PCDATA)>
• In W3C Schema:
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="country" type="xs:string"/>
    </xs:schema>
WIDELY USED TYPES
• xs:string: any text
• xs:token: text with whitespace normalized (no leading or trailing whitespace, no runs of spaces)
• xs:decimal: any decimal number
• xs:integer: any integer
• xs:float: floating-point number
• xs:ID, xs:IDREF: the same as ID and IDREF in a DTD
• xs:boolean: "true"/"false" ("1"/"0")
• xs:time: time as hh:mm:ss with an optional time zone
• xs:date: date in the format CCYY-MM-DD
• xs:dateTime: date/time combination in the format CCYY-MM-DDThh:mm:ss with an optional time zone
• xs:QName: namespace-qualified name
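As an illustration of how these types are applied, the sketch below declares a few elements with built-in types and shows, in comments, a sample instance that each type should accept. The element names and values are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- e.g. <title>Annual report</title> -->
  <xs:element name="title"    type="xs:string"/>
  <!-- e.g. <price>19.95</price> -->
  <xs:element name="price"    type="xs:decimal"/>
  <!-- e.g. <quantity>12</quantity> -->
  <xs:element name="quantity" type="xs:integer"/>
  <!-- e.g. <inStock>true</inStock> or <inStock>1</inStock> -->
  <xs:element name="inStock"  type="xs:boolean"/>
  <!-- e.g. <shipDate>2009-10-31</shipDate> -->
  <xs:element name="shipDate" type="xs:date"/>
  <!-- e.g. <placed>2009-10-29T14:30:00-05:00</placed> -->
  <xs:element name="placed"   type="xs:dateTime"/>
</xs:schema>
```

A value such as <quantity>12.5</quantity> should be rejected by a validator, which is the kind of data typing a DTD cannot express.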
COMPLEX ELEMENT IN SCHEMA
    <xs:element name="date">
      <xs:complexType>
        <xs:all>
          <xs:element ref="year"/>
          <xs:element ref="month"/>
          <xs:element ref="day"/>
        </xs:all>
      </xs:complexType>
    </xs:element>
    <xs:element name="year" type="xs:integer"/>
    <xs:element name="month" type="xs:integer"/>
    <xs:element name="day" type="xs:integer"/>
FACETS
• A facet is a way to control the range of a data type.
    <xs:simpleType name="monthNum">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="12"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:element name="month" type="monthNum"/>
• Facets can create fixed values, constrain the length of strings, match patterns, and set allowed values.
FACETS EXAMPLE
• List of allowed values:
    <xs:simpleType name="genderType">
      <xs:restriction base="xs:token">
        <xs:enumeration value="female"/>
        <xs:enumeration value="male"/>
      </xs:restriction>
    </xs:simpleType>
• Pattern:
    <xs:simpleType name="pcode">
      <xs:restriction base="xs:token">
        <xs:pattern value="[0-9]{3}[A-Z]{3}"/>
      </xs:restriction>
    </xs:simpleType>
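The facet slides mention constraining string length but do not show it. A minimal sketch, assuming two hypothetical type names (stateCode and shortTitle), could look like this:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- exactly two characters -->
  <xs:simpleType name="stateCode">
    <xs:restriction base="xs:string">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- between 1 and 40 characters -->
  <xs:simpleType name="shortTitle">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

These named types are then used like monthNum or pcode, through the type attribute of an element or attribute declaration.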
SCHEMA EXAMPLE
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="census-record">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="date"/>
            <xs:element ref="address"/>
            <xs:element ref="person" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute ref="taker"/>
        </xs:complexType>
      </xs:element>
SCHEMA EXAMPLE
      <xs:attribute name="taker">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="9999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
SCHEMA EXAMPLE
      <xs:element name="date" type="xs:date"/>
      <xs:element name="address">
        <xs:complexType>
          <xs:all>
            <xs:element ref="street"/>
            <xs:element ref="city"/>
            <xs:element ref="country"/>
            <xs:element ref="zip"/>
          </xs:all>
        </xs:complexType>
      </xs:element>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
SCHEMA EXAMPLE
      <xs:element name="zip">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:pattern value="[0-9]{3}[A-Z]{3}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
SCHEMA EXAMPLE
      <xs:element name="person">
        <xs:complexType>
          <xs:all>
            <xs:element ref="name"/>
            <xs:element ref="age"/>
            <xs:element ref="gender"/>
          </xs:all>
          <xs:attribute ref="employed"/>
          <xs:attribute ref="pid"/>
        </xs:complexType>
      </xs:element>
SCHEMA EXAMPLE
      <xs:attribute name="employed">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:enumeration value="fulltime"/>
            <xs:enumeration value="parttime"/>
            <xs:enumeration value="none"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="pid">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="999999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
SCHEMA EXAMPLE
      <xs:element name="age">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="150"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="gender">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:enumeration value="female"/>
            <xs:enumeration value="male"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
SCHEMA EXAMPLE
      <xs:element name="name">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="first"/>
            <xs:element ref="last"/>
            <xs:choice minOccurs="0">
              <xs:element ref="junior"/>
              <xs:element ref="senior"/>
            </xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
SCHEMA EXAMPLE
      <xs:element name="junior" type="emptyElem"/>
      <xs:element name="senior" type="emptyElem"/>
      <xs:complexType name="emptyElem"/>
    </xs:schema>
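To tie the schema slides together, here is a sketch of an instance document that is intended to satisfy the census-record schema. All values are invented for illustration, and it assumes the schema also declares the first and last elements as strings, which the slides do not show.

```xml
<?xml version="1.0"?>
<census-record taker="3412">
  <date>2009-09-15</date>
  <address>
    <street>12 Elm Street</street>
    <city>Springfield</city>
    <country>USA</country>
    <zip>123ABC</zip>
  </address>
  <person pid="1001" employed="fulltime">
    <name>
      <first>Maria</first>
      <last>Lopez</last>
      <junior/>
    </name>
    <age>34</age>
    <gender>female</gender>
  </person>
  <person pid="1002" employed="none">
    <name>
      <first>Sam</first>
      <last>Lopez</last>
    </name>
    <age>8</age>
    <gender>male</gender>
  </person>
</census-record>
```

Unlike the DTD version, where pid is an ID and must start with a letter, the schema types pid as an integer between 1 and 999999, so plain numbers are expected here. A zip such as 12345 or an age of 200 should fail validation because of the pattern and range facets.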