ISO 19757 – Document Schema Definition Languages (DSDL)
Martin Bryan
Convenor, JTC1/SC34 WG1

Parts of DSDL
• Overview
• Regular-grammar-based validation (RELAX NG)
• Rule-based validation (Schematron)
• Namespace-based validation dispatching language (NVDL)
• Datatypes
• Path-based integrity constraints
• Character repertoire validation
• Declarative document architectures
• Datatype- and namespace-aware DTDs
• Validation management

Regular-grammar-based validation (RELAX NG)
• XML description of a data model
    • Compact syntax is even simpler than DTDs
• Provides a way of defining shortcuts
    • More functional than parameter entities
• Provides context-dependent models
    • Models can be amended when imported
• Supports namespaces and datatypes
    • Any datatype, including W3C Schema datatypes
• Can import modules from multiple namespaces
    • Can build multi-source schemas

Main components of RELAX NG

pattern ::= <element name="QName"> pattern+ </element>
  | <element> nameClass pattern+ </element>
  | <attribute name="QName"> [pattern] </attribute>
  | <attribute> nameClass [pattern] </attribute>
  | <group> pattern+ </group>
  | <interleave> pattern+ </interleave>
  | <choice> pattern+ </choice>
  | <optional> pattern+ </optional>
  | <zeroOrMore> pattern+ </zeroOrMore>
  | <oneOrMore> pattern+ </oneOrMore>
  | <list> pattern+ </list>
  | <mixed> pattern+ </mixed>
  | <ref name="NCName"/>
  | <parentRef name="NCName"/>
  | <empty/>
  | <text/>
  | <value [type="NCName"]> string </value>
  | <data type="NCName"> param* [exceptPattern] </data>
  | <notAllowed/>
  | <externalRef href="anyURI"/>
  | <grammar> grammarContent* </grammar>

Using the full syntax

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="document"/>
  </start>
  <define name="document">
    <element name="document">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </define>
  <define name="head">
    <element name="head">
      <interleave>
        <element name="organization">
          <choice>
            <value>ISO</value>
            <value>ISO/IEC</value>
          </choice>
        </element>
        <element name="document-type">
          <choice>
            <value>International Standard</value>
            <value>Technical Report</value>
            <value>Guide</value>
            <value>Publicly Available Specification</value>
            <value>Technical Specification</value>
            <value>International Standardized Profile</value>
          </choice>
        </element>
        …

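The two constraints visible in the fragment above (children of head may appear in any order, and each has an enumerated set of values) can be sketched in Python. This is only an illustration of what the pattern accepts, not a RELAX NG validator: `check_head` and the sample documents are hypothetical, and because the slide's schema is cut off, any other children of head are simply not judged.

```python
import xml.etree.ElementTree as ET

# Allowed values copied from the <choice> patterns on the slide.
ALLOWED = {
    "organization": {"ISO", "ISO/IEC"},
    "document-type": {
        "International Standard", "Technical Report", "Guide",
        "Publicly Available Specification", "Technical Specification",
        "International Standardized Profile",
    },
}

def check_head(head):
    """Return a list of problems found in a <head> element (hypothetical helper)."""
    problems = []
    seen = set()
    for child in head:
        if child.tag not in ALLOWED:
            continue  # the slide's schema is truncated, so other children are not judged
        if child.tag in seen:
            problems.append(f"duplicate <{child.tag}>")
        seen.add(child.tag)
        # <value> without a type compares as a token: whitespace is normalized first.
        value = " ".join((child.text or "").split())
        if value not in ALLOWED[child.tag]:
            problems.append(f"<{child.tag}> has unexpected value {value!r}")
    for name in ALLOWED:
        if name not in seen:
            problems.append(f"missing <{name}>")
    return problems

# Order does not matter, as with <interleave>.
good = ET.fromstring(
    "<head><document-type>Guide</document-type>"
    "<organization>ISO</organization></head>"
)
bad = ET.fromstring("<head><organization>ANSI</organization></head>")

print(check_head(good))
print(check_head(bad))
```
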
Alternative compact syntax
• A schema for a whole ISO standard can be produced using just:

    namespace p = "http://relaxng.org/ns/proofsystem"
    datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
    formal = element p:* { attribute * { text }*, (formal|text)* }
    inline &= formal*
    block |= formal
    block |= element grammarref|rngref {attribute src { xsd:anyURI }}
    include "is.rnc"

• Can replace existing definitions with new ones
• Can extend definitions
    • |= means “add this option to an existing OR group” (combine by choice)
    • &= means “add this option to an existing AND group” (combine by interleave)
• Can merge grammars

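How |= and &= extend an existing definition can be modelled with a small Python sketch. It treats patterns as plain strings and nested tuples, so it only illustrates the merging behaviour described above; the `combine` helper and the base definitions standing in for the contents of "is.rnc" are assumptions, not part of RELAX NG or of the slide.

```python
def combine(definitions, name, operator, pattern):
    """Merge `pattern` into definitions[name] in the spirit of |= and &=."""
    kind = {"|=": "choice", "&=": "interleave"}[operator]
    existing = definitions.get(name)
    if existing is None:
        # Nothing to combine with yet: the new pattern becomes the definition.
        definitions[name] = pattern
    elif isinstance(existing, tuple) and existing[0] == kind:
        # Already a group of the same kind: add one more option to it.
        definitions[name] = existing + (pattern,)
    else:
        # Wrap the old definition and the new option in a group.
        definitions[name] = (kind, existing, pattern)
    return definitions

# Hypothetical base definitions standing in for the included "is.rnc".
defs = {"inline": "text", "block": "p"}

combine(defs, "inline", "&=", "formal*")
combine(defs, "block", "|=", "formal")
combine(defs, "block", "|=", "element grammarref|rngref")

print(defs["inline"])
print(defs["block"])
```
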
Rule-based validation (Schematron)

“A Schematron schema contains natural-language assertions concerning a set of documents, marked up with various elements and attributes for testing these natural-language assertions, and for simplifying and grouping the assertions.”

“A Schematron schema reduces to a non-chaining rule system whose terms are boolean functions invoking an external query language on the instance and other visible XML documents, with syntactic features to reduce specification size and to allow efficient implementation.”

Schematron example

<sch:rule context="failed-assert | successful-report">
  <sch:extends rule="second-level" />
  <sch:assert test="count(diagnostic-reference) + count(text) = count(*)">
    The <sch:name/> element should only contain a text element
    and diagnostic reference elements.
  </sch:assert>
  <sch:assert test="count(text) = 1">
    The <sch:name/> element should only contain a text element.
  </sch:assert>
  <sch:assert test="preceding-sibling::fired-rule |
                    preceding-sibling::failed-assert |
                    preceding-sibling::successful-report">
    A <sch:name/> comes after a fired-rule, a failed-assert
    or a successful-report.
  </sch:assert>
</sch:rule>

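The rule-and-assertion model can be approximated in a few lines of Python. This is a sketch of the idea only, not a Schematron processor: the XPath tests from the first two assertions are replaced by hand-written Python functions, the third assertion and the extends element are left out, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

def only_text_and_diagnostics(el):
    # Stands in for: count(diagnostic-reference) + count(text) = count(*)
    return len(el.findall("diagnostic-reference")) + len(el.findall("text")) == len(list(el))

def exactly_one_text(el):
    # Stands in for: count(text) = 1
    return len(el.findall("text")) == 1

# Each rule pairs a set of context element names with (test, message) assertions.
RULES = [
    {
        "context": {"failed-assert", "successful-report"},
        "asserts": [
            (only_text_and_diagnostics,
             "The {name} element should only contain a text element "
             "and diagnostic reference elements."),
            (exactly_one_text,
             "The {name} element should only contain a text element."),
        ],
    },
]

def validate(root):
    """Return the messages of all failed assertions, in document order."""
    failures = []
    for el in root.iter():
        for rule in RULES:
            if el.tag in rule["context"]:
                for test, message in rule["asserts"]:
                    if not test(el):
                        failures.append(message.format(name=el.tag))
                break  # only the first matching rule fires for a node
    return failures

doc = ET.fromstring(
    "<schematron-output><fired-rule/>"
    "<failed-assert><text>oops</text></failed-assert>"
    "<successful-report><text>a</text><text>b</text><extra/></successful-report>"
    "</schematron-output>"
)
failures = validate(doc)
for message in failures:
    print(message)
```
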
Schematron core elements
• active
• assert
• extends
• include
• let
• name
• ns
• param
• pattern
• phase
• report
• rule
• schema
• value-of

Ancillary elements and attributes
• diagnostics element
• diagnostic element
• dir element
• emph element
• p element
• span element
• title element
• flag attribute
• fpi attribute
• icon attribute
• role attribute
• see attribute
• subject attribute

Namespace-based Validation Dispatching Language (NVDL)
• Allows data from different namespaces to be validated by different processes
    • Can validate one namespace using RELAX NG, another using a DTD and a third using a W3C Schema
• Simple and full syntaxes
    • Full syntax is simplified to the simple syntax before use
• All validation is done in context
    • Slots are created to identify where data from alternative namespaces has been removed
• Allows attributes from different namespaces to be validated
    • Elements and attributes in different namespaces are separated into separate “sections”

NVDL example – HTML + XForms (1)

<rules xmlns="purl://dsdl.org/nvdl/ns/structure/1.0"
       xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <namespace ns="http://www.w3.org/2002/06/xhtml2">
    <validate schema="xhtml2.rng">
      <mode>
        <namespace ns="http://www.w3.org/2002/xforms">
          <validate schema="xforms.rng">
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <attach message="Skipped descendant XForms sections."/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <unwrap message="Skipped descendant XHTML2 sections."/>
              </namespace>
            </mode>
          </validate>
          …

NVDL example (2)

          <unwrap>
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <unwrap message="Skipped descendant XForms"/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <attach message="Any descendant XHTML2 sections"/>
              </namespace>
            </mode>
          </unwrap>
        </namespace>
      </mode>
    </validate>
  </namespace>
</rules>

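The core idea of splitting a document into single-namespace sections and dispatching each one to its own validator can be sketched in Python as follows. This is a loose illustration rather than an NVDL implementation: it does not model modes, attach or unwrap, and the validators are placeholder functions standing in for real validation against xhtml2.rng and xforms.rng. The sample document is invented.

```python
import xml.etree.ElementTree as ET

XHTML2 = "http://www.w3.org/2002/06/xhtml2"
XFORMS = "http://www.w3.org/2002/xforms"

def split_tag(element):
    """Split ElementTree's '{uri}local' tag into (uri, local)."""
    if element.tag.startswith("{"):
        uri, local = element.tag[1:].split("}", 1)
        return uri, local
    return "", element.tag

def sections(element, parent_ns=None):
    """Yield (namespace, element) wherever the namespace changes, in document order."""
    ns, _ = split_tag(element)
    if ns != parent_ns:
        yield ns, element
    for child in element:
        yield from sections(child, ns)

def dispatch(root, validators):
    """Send the root of each section to the validator registered for its namespace."""
    report = []
    for ns, section in sections(root):
        validator = validators.get(ns)
        verdict = validator(section) if validator else "no validator registered"
        report.append((split_tag(section)[1], verdict))
    return report

# Placeholder validators; a real setup would run schema validation here.
VALIDATORS = {
    XHTML2: lambda section: "checked as XHTML2",
    XFORMS: lambda section: "checked as XForms",
}

doc = ET.fromstring(
    f'<html xmlns="{XHTML2}"><body><p>Hi</p>'
    f'<f:input xmlns:f="{XFORMS}"><f:label>'
    f'<span xmlns="{XHTML2}">Name</span>'
    f'</f:label></f:input></body></html>'
)
report = dispatch(doc, VALIDATORS)
for local_name, verdict in report:
    print(local_name, "->", verdict)
```
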
Datatypes
• Allows multiple datatype sets to be defined
    • W3C datatypes can be used as the base
• Will allow user-defined datatype primitives to be added
    • Needed for extended date/period formats, etc.
• Will provide a mechanism for defining complex patterns
    • Patterns based on supertypes will be allowed
• Normalization of values, comparing results after normalization
    • Convert local date formats to ISO 8601, then compare

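The last point, normalizing local date formats to ISO 8601 before comparing, might look like this in Python. The list of accepted local formats is an assumption made for the example (Part 5 had not been drafted), and month names are parsed with Python's default English names.

```python
from datetime import datetime

# Hypothetical local formats; a real datatype library would declare its own.
LOCAL_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%Y-%m-%d"]

def to_iso8601(value):
    """Normalize a date written in one of the local formats to YYYY-MM-DD."""
    for fmt in LOCAL_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {value!r}")

def same_date(a, b):
    """Compare two date strings after normalization, not as raw text."""
    return to_iso8601(a) == to_iso8601(b)

print(to_iso8601("04/03/2003"))
print(same_date("4 March 2003", "2003-03-04"))
```
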
Possible form for Part 5

<datatype name="price">
  <supertype name="decimal">
    <cast>
      <if test="not(sign='-')">
        <copy-of select="whole-part"/>
        <text>.</text>
        <my:fraction-part>
          <value-of select="substring(concat(fraction-part, '00'), 1, 2)"/>
        </my:fraction-part>
      </if>
    </cast>
  </supertype>
</datatype>

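The cast in this draft pads or truncates the fraction part to two digits, and only produces a result for non-negative values. A Python equivalent of that logic, with `normalize_price` as a hypothetical name and the sign, whole part and fraction part passed in as separate strings, might be:

```python
def normalize_price(whole_part, fraction_part, sign=""):
    """Mirror the draft cast: two fraction digits, no result for negative values."""
    if sign == "-":
        return None  # the <if> test rejects negative prices
    # substring(concat(fraction-part, '00'), 1, 2): pad with zeros, keep two digits
    fraction = (fraction_part + "00")[:2]
    return f"{whole_part}.{fraction}"

print(normalize_price("12", "5"))
print(normalize_price("7", "999"))
```
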
Path-based integrity constraints
• Non-hierarchical links between information items in a structured resource can be identified by addressing items within the document tree and then expressing the relationship between them
• Provides a method for identifying information items dependent on ancestry or the use of keys
• And a method for describing the role of relationships that are not hierarchical
• Allows selection of fragments to be validated
• Will include an extensible basis for supporting mechanisms not currently available

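One common integrity constraint of this kind is that every reference must point at an existing key. A minimal Python sketch, assuming keys and references are both addressed by a path plus an attribute name, is shown below. Part 6 had not been drafted, so the function, its parameters and the sample document are purely illustrative.

```python
import xml.etree.ElementTree as ET

def dangling_references(root, key_path, key_attr, ref_path, ref_attr):
    """Return the values of references that match no key, in document order."""
    keys = {el.get(key_attr) for el in root.findall(key_path)}
    return [
        el.get(ref_attr)
        for el in root.findall(ref_path)
        if el.get(ref_attr) not in keys
    ]

doc = ET.fromstring(
    '<report>'
    '<section id="s1"/><section id="s2"/>'
    '<para><xref target="s1"/> and <xref target="s9"/></para>'
    '</report>'
)
missing = dangling_references(doc, ".//section", "id", ".//xref", "target")
print(missing)
```
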
Character repertoire validation
• User-defined character sets that can be used to validate the contents of elements or attributes
• Will be able to check that only characters relevant for a particular language are used, not all those in a particular Unicode character block
• Schematron-like rules for associating character repertoires with a particular element or attribute

<sch:rule context="*[/*[@xml:lang='nl']]">
  <sch:assert test="\p{IsBasicLatin}\p{IsLatin-1Supplement} IJij\p{IsGeneralPunctuation}\p{IsCurrencySymbols}">
    If this document is a Dutch document, it should have only
    characters used in typical Dutch publishing.
  </sch:assert>
</sch:rule>

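The check that text stays inside a repertoire can be sketched in Python using the code point ranges of the four Unicode blocks named in the pattern above. The repertoire table and the `outside_repertoire` helper are assumptions for illustration; the individual letters listed in the pattern are not added separately because, as written, they already fall within Basic Latin.

```python
# Code point ranges of the Unicode blocks named in the slide's pattern.
DUTCH_REPERTOIRE = [
    (0x0000, 0x007F),  # Basic Latin
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x2000, 0x206F),  # General Punctuation
    (0x20A0, 0x20CF),  # Currency Symbols
]

def outside_repertoire(text, repertoire=DUTCH_REPERTOIRE):
    """Return the characters of `text` that fall outside every allowed range."""
    return [
        ch for ch in text
        if not any(low <= ord(ch) <= high for low, high in repertoire)
    ]

print(outside_repertoire("Prijs: €5 – coöperatie"))
print(outside_repertoire("Łódź"))
```
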
Declarative document architectures
• Allows locally meaningful names to be assigned to schema components
• 80/20 rule allows many functions of abstract classes
• Allows predefined fragments to be defined within schema
• Reintroduces entity definitions in a more controllable form
• May contain optional components
• Can even re-define entity names
• No longer restricted to English-based prompts to reference standard entity references such as
• Removing elements/attributes in defined contexts

Datatype/Namespace-aware DTDs
• Shows how the ISO 8879/XML Document Type Definition (DTD) syntax can be extended to validate documents that make full use of XML Namespaces and Part 5 Datatypes
• May be extended to add character repertoire validation
• Will allow DTDs to be used to validate any XML document, including those defined using Part 2
• Will allow SGML documents to be treated as input to ISO 19757 validation processes

Validation management
• Includes a mechanism to invoke parsers which read non-XML sources (and XML sources that can't be identified by a single URI) to create XML Infosets that can be used for subsequent processing
• Allows pre-validation transformations to be used to normalize and/or subset documents before validation
• Multiple validations and transformations may be applied
• Transformations will be able to split a document into multiple resulting documents
• Includes facilities to generate customized validation reports which can be output as XML document instances that can be processed by other applications

Possible format for Part 10

<framework>
  <rule>
    <instance>
      <transform transformation="normalize.xslt"/>
    </instance>
    <assert>
      <isValid schema="my-schema.rng"/>
      <isValid schema="my-schema.sch"/>
    </assert>
  </rule>
</framework>

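The flow that this draft format describes (transform the instance, run several validations, emit an XML report) can be sketched in Python. Everything here is a stand-in: `normalize` plays the role of normalize.xslt, the two check functions play the role of the RELAX NG and Schematron schemas, and the report vocabulary is invented for the example.

```python
import xml.etree.ElementTree as ET

def normalize(root):
    """Stand-in for normalize.xslt: collapse runs of whitespace in element text."""
    for el in root.iter():
        if el.text:
            el.text = " ".join(el.text.split())
    return root

def has_title(root):
    """Stand-in for a grammar check: a title child must exist."""
    return root.find("title") is not None

def title_is_normalized(root):
    """Stand-in for a rule check: the title text has no stray whitespace."""
    title = root.findtext("title")
    return title is not None and title == " ".join(title.split())

def run_rule(source, transforms, validators):
    """Apply each transform in turn, run every validator, return an XML report."""
    root = ET.fromstring(source)
    for transform in transforms:
        root = transform(root)
    results = {v.__name__: v(root) for v in validators}
    report = ET.Element("report", valid=str(all(results.values())).lower())
    for name, ok in results.items():
        ET.SubElement(report, "check", name=name, result="pass" if ok else "fail")
    return report

report = run_rule(
    "<doc><title>  ISO   19757 </title></doc>",
    [normalize],
    [has_title, title_is_normalized],
)
print(ET.tostring(report, encoding="unicode"))
```

Because the report is itself an XML element, it can be serialized and handed to another application, which is the behaviour the preceding slide asks for.
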
Current status
• Published
    • Part 2, RELAX NG
• At Committee Draft stage
    • Part 3, Schematron
    • Part 4, NVDL
• Working Draft under consideration
    • Part 1, Overview
    • Part 7, Character repertoire validation
    • Part 8, Declarative document architectures
    • Part 10, Validation management
• Parts 5, 6 & 9 not yet drafted

Tracking progress
• Via your national standards body
    • IST/41 at BSI
• Via XML UK or any ISUG chapter
    • Martin Bryan is XML UK representative on IST/41 and ISUG representative for SC34/WG1
• Via the DSDL public website
    • http://www.dsdl.org