SchemaPath: a minimal extension to XML Schema for conditional constraints
Paolo Marinelli, Claudio Sacerdoti Coen, Fabio Vitali
University of Bologna (Italy)
Validation
• Validation means writing correctness rules for an XML document and verifying that they hold for every document received.
• It is possible with a number of schema languages, roughly divided into two kinds:
  • Grammar-based languages: DTD, XML Schema (XSD), Relax NG, etc. A whole generative grammar is created, and every document that can be built with this grammar is valid.
  • Rule-based languages: Schematron, xlinkit, etc. Rules are defined to check for special conditions (required or rejected). Every document that does not violate any of these rules is valid.
Why validate XML documents?
[Figure: XML doc → DOM parser (rejects: not well-formed) → DOM tree → schema validator with rules (rejects: invalid) → DOM tree + PSVI → downstream application]
• Usually, when receiving data from an unreliable source, programmers intersperse their application code with checks on data values, error handling, remedial procedures, etc.
• Validation does all checks before submitting the data to the downstream application, removing the need for most of the checks on data values.
The PSVI
• XML Schema adds another concept to the validation of XML structures: the decoration of each structure of the XML document with additional information. This is called the Post-Schema Validation Infoset, or PSVI.
• This can be useful for the downstream application, which can activate specific code depending on the PSVI data available for each element.
• The most important contribution of the PSVI is without doubt the data type: validation code can assess that an element contains a valid date, a valid number, or a valid complex markup structure, so that the downstream application can skip any check on it and call the appropriate handling code.
Unfortunately…
• … most schema languages cannot express all the structure and data constraints that document designers may need.
• For example:
  • Mutual exclusion (“element x may have either the a attribute or the b attribute, but not both”)
  • Deep exclusions (“element x cannot contain, at any level of its subtree, element y”)
  • Structure-dependent structures (“if the item is gratis (the attribute gratis is present), then no price should be specified (the element price should be absent)”)
  • Data-dependent structures (“if the address is a PO box, then the address must include a PO box number, otherwise it must include a street name and a street number”)
• These kinds of constraints are known as co-constraints, or co-occurrence constraints. Most real-life XML document types have one or more of these constraints.
Plenty of examples
• XHTML
  • “a elements cannot contain other a elements” (appendix B)
  • Neither the normative DTD nor the non-normative XML Schema can fully express this requirement (they only express a weaker form: “a elements cannot directly contain other a elements”)
• XSLT
  • “In a template element at least one of the match and name attributes must be present”
  • Again, the DTD and the XML Schema cannot express this requirement, and specify both attributes as optional.
• XML Schema itself
  • “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.”
  • The normative XML Schema can only specify all these elements and attributes as optional.
• … and plenty more…
Who cares?
[Figure: the validation pipeline again (XML doc, DOM parser, schema validator with rules, DOM tree + PSVI, downstream application), with outcomes "not well-formed", "invalid" and "incorrect", and "? ? ?" marking the handling of incorrect documents]
• Documents could contain violations of these rules, and still be considered valid according to the DTD or XML Schema.
• Three solutions:
  • Cross your fingers and hope for the best
  • Provide a default behavior (pick one option and ignore other structures)
  • Provide validation code within the downstream application
Schematron
• Schematron could in fact express most of these requirements (but data-dependent and structure-dependent structures only through hacks).
• Schematron lacks generative rules: they can be specified only with great pain, or by mixing Schematron rules with grammar-based rules of another schema language.
• Suggestions to use XML Schema and Schematron together in one schema document exist in the literature.
• This is quite complex in practice, requires competence in both languages, and has problems with the PSVI.
Extending XML Schema
• Our view is that the only practical solution is to extend XML Schema (or another grammar-based language).
• If the extension is minimal, then implementation costs, learning efforts, and impact on existing schemas are also minimal.
Our proposal: SchemaPath
• SchemaPath is our proposal to minimally extend XML Schema to handle co-constraints of all kinds.
• The idea is to find a way to conditionally assign types to elements and attributes.
• Furthermore, a non-satisfiable type is added for specifying error conditions to avoid.
• SchemaPath maintains the XML Schema syntax, adds only ONE construct and ONE pre-defined simple type, maintains important XML Schema properties (the validation theorem and the round-tripping and reverse round-tripping properties), and does not impact the PSVI for valid documents.
• Its simplest implementation is straightforward and trivial (~15 lines of code) in any language and architecture where an XSLT engine and an XML Schema engine already exist.
• It is qualified under the namespace http://www.cs.unibo.it/SchemaPath/1.0, but the parser also accepts the plain XSD schema namespace.
SchemaPath syntax (in one slide!)
• <xsd:alt>: expresses a condition in the type assignment of an element or an attribute. Its attributes are:
  • cond: an optional XPath expression stating the condition that must hold for the type assignment to be performed. Several conditions may hold at once, in which case a priority mechanism is employed. An alt element without an explicit cond attribute implicitly has a low-priority, default, always-true condition.
  • priority: an optional decimal number specifying the priority level of a condition, in case the default priority is unsatisfactory.
  • type: a required XML Schema type name, which is assigned to the element or attribute if the condition holds and has the top priority.
• xsd:error: a predefined unsatisfiable simple type. Assigning this type to an element or an attribute always causes a validation error.
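The selection rule described on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: conditions are modelled as Python predicates rather than XPath expressions, and the names Alt, assign_type and DEFAULT_PRIORITY, as well as the concrete default priority values, are assumptions made for the example (the slide only says that the default alternative is "low-priority").

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
import xml.etree.ElementTree as ET

# Assumption: the slide only says a cond-less alternative is "low-priority";
# the concrete values used here are hypothetical.
DEFAULT_PRIORITY = float("-inf")


@dataclass
class Alt:
    """One <xsd:alt>: a type name, an optional condition, an optional priority."""
    type_name: str
    cond: Optional[Callable[[ET.Element], bool]] = None  # None means always true
    priority: Optional[float] = None

    def effective_priority(self) -> float:
        if self.priority is not None:
            return self.priority
        # Assumption: a conditional alternative outranks the cond-less default.
        return DEFAULT_PRIORITY if self.cond is None else 0.0


def assign_type(element: ET.Element, alts: List[Alt]) -> Optional[str]:
    """Return the type of the highest-priority alternative whose condition holds."""
    applicable = [a for a in alts if a.cond is None or a.cond(element)]
    if not applicable:
        return None  # behaviour with no applicable alternative is not covered here
    return max(applicable, key=Alt.effective_priority).type_name


# Mutual exclusion: x may carry attribute a or attribute b, but not both.
x_alts = [
    Alt("xsd:error", cond=lambda e: "a" in e.attrib and "b" in e.attrib),
    Alt("myType"),
]
print(assign_type(ET.fromstring('<x a="1" b="2"/>'), x_alts))
print(assign_type(ET.fromstring('<x a="1"/>'), x_alts))
```

An element that receives xsd:error would be reported as invalid; any other result is the type the element is then validated against.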
A few examples
• Mutual exclusion
  • “Element x may have either the a attribute or the b attribute but not both”. Suppose we have defined a type myType with both a and b attributes as optional.
      <xsd:element name="x">
        <xsd:alt cond="(@a and @b)" type="xsd:error"/>
        <xsd:alt type="myType"/>
      </xsd:element>
• Data-dependent structures
  • “The element quantity must be an integer if the unit element is ‘items’, and it must be a decimal value if the unit element is ‘meters’”. Suppose we have already defined the data type for the unit element to only contain the values “meters” or “items”.
      <xsd:element name="quantity">
        <xsd:alt cond="../unit='items'" type="xsd:integer"/>
        <xsd:alt cond="../unit='meters'" type="xsd:decimal"/>
      </xsd:element>
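For comparison, this is roughly the check that a downstream application would have to hand-code for the quantity example if the schema could not express it. It is a sketch under stated assumptions: the invoiceLine parent element and the function name quantity_is_valid are chosen for illustration, the regular expressions only approximate the lexical forms of xsd:integer and xsd:decimal, and any unit other than "items" or "meters" is simply treated as invalid.

```python
import re
import xml.etree.ElementTree as ET

# Approximate lexical forms of xsd:integer and xsd:decimal (assumption:
# whitespace handling and other schema facets are ignored).
INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def quantity_is_valid(invoice_line: ET.Element) -> bool:
    """Hand-coded version of the data-dependent constraint on quantity."""
    unit = invoice_line.findtext("unit")
    text = (invoice_line.findtext("quantity") or "").strip()
    if unit == "items":
        return INTEGER.fullmatch(text) is not None
    if unit == "meters":
        return DECIMAL.fullmatch(text) is not None
    return False  # assumption: unit is restricted to the two values above


line = ET.fromstring(
    "<invoiceLine><unit>meters</unit><quantity>2.5</quantity></invoiceLine>"
)
print(quantity_is_valid(line))
```

With SchemaPath, the two alt elements replace this code, and the downstream application can rely on the assigned type.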
Addressing co-constraints: XHTML
• Deep exclusion of a elements within other a elements
  • “a elements cannot contain other a elements”
  • Suppose we have defined an inlineType to contain all inline elements that can go inside an a element, as well as inside other elements such as b, i, etc.
      <xsd:element name="a">
        <xsd:alt cond=".//a" type="xsd:error"/>
        <xsd:alt type="inlineType"/>
      </xsd:element>
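The condition .//a selects every a descendant of the current element, at any depth. A minimal sketch of the same test in Python, using the limited XPath support of the standard xml.etree.ElementTree module, is shown below; the function name nested_anchor_violations is hypothetical, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def nested_anchor_violations(root: ET.Element) -> list:
    """Return every a element that has another a element anywhere in its subtree."""
    # find(".//a") searches the descendants of the element, not the element itself.
    return [a for a in root.iter("a") if a.find(".//a") is not None]


bad = ET.fromstring('<p><a href="x">outer <b><a href="y">inner</a></b></a></p>')
good = ET.fromstring('<p><a href="x">one</a> <a href="y">two</a></p>')
print(len(nested_anchor_violations(bad)), len(nested_anchor_violations(good)))
```

Note that the nested a sits inside a b element, which is exactly the case that the weaker "cannot directly contain" form of the constraint does not catch.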
Addressing co-constraints: XSLT
• Minimal presence
  • “In a template element at least one of the match and name attributes must be present”
  • Suppose we have already defined a templateType type with the match and name attributes both set as optional.
      <xsd:element name="template">
        <xsd:alt cond="@match or @name" type="templateType"/>
        <xsd:alt type="xsd:error"/>
      </xsd:element>
Addressing co-constraints: XML Schema
• Complex mutual exclusions
  • “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then either the type attribute or one of the simpleType or complexType elements must be present.”
  • Suppose we have already defined an elementType with a choice of simpleType and complexType, and the type, ref and name attributes as optional.
      <xsd:element name="element">
        <xsd:alt cond="@name and @ref" priority="2.0" type="xsd:error"/>
        <xsd:alt cond="(@type or @ref) and (simpleType or complexType)" priority="1.5" type="xsd:error"/>
        <xsd:alt cond="../schema and @ref" priority="1.0" type="xsd:error"/>
        <xsd:alt cond="not(@name) and not(@ref)" priority="0.5" type="xsd:error"/>
        <xsd:alt priority="0.0" type="elementType"/>
      </xsd:element>
  • The conditions could be simpler by using different complex types.
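The four error conditions and their priorities can be mirrored in Python to see which one fires for a given element declaration. This is a sketch only: the names RULES, has_inline_type and check_declaration and the rule descriptions are invented for the example, unprefixed element names are used as in the slide's XPath expressions, and the third condition is read here as "ref on a top-level declaration", which is an interpretation of the slide's ../schema test and is passed in as a flag.

```python
import xml.etree.ElementTree as ET


def has_inline_type(decl: ET.Element) -> bool:
    """True if the declaration has a simpleType or complexType child."""
    return decl.find("simpleType") is not None or decl.find("complexType") is not None


# (priority, description, condition). Descriptions are illustrative wording.
RULES = [
    (2.0, "name and ref both present",
     lambda d, top: "name" in d.attrib and "ref" in d.attrib),
    (1.5, "type or ref combined with an inline type",
     lambda d, top: ("type" in d.attrib or "ref" in d.attrib) and has_inline_type(d)),
    (1.0, "ref on a top-level declaration",  # interpretation of "../schema and @ref"
     lambda d, top: top and "ref" in d.attrib),
    (0.5, "neither name nor ref",
     lambda d, top: "name" not in d.attrib and "ref" not in d.attrib),
]


def check_declaration(decl: ET.Element, parent_is_schema: bool = False):
    """Return the description of the highest-priority violated rule, or None."""
    for _priority, description, cond in sorted(RULES, key=lambda r: r[0], reverse=True):
        if cond(decl, parent_is_schema):
            return description
    return None  # falls through to the default alternative (elementType)


print(check_declaration(ET.fromstring('<element name="n" ref="r"/>')))
print(check_declaration(ET.fromstring('<element name="n" type="t"/>')))
```

Because all four conditions map to xsd:error, the priorities here mainly decide which violation is reported first when several hold at once.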
Implementation: an XSD preprocessor
[Figure: preprocessor pipeline. Labels: X, SP rules, XSD preprocessor, XSD rules, X’, DOM parser (non-well-formed), schema validator (ok / invalid), downstream application]
• SchemaPath validators can be implemented:
  • From scratch (but with a complexity on the order of an XML Schema validator)
  • By modifying an existing XML Schema validator (breaking the evolution path of the selected validator)
  • As an XSD preprocessor (i.e. an independent application feeding a plain XML Schema validator)
• It can be proved that SP validates X iff XSD validates X’
Our XSLT-based process
[Figure: SP rules → XSLT T’ → XSD rules; SP rules → XSLT MT → T’’; X → XSLT T’’ → X’]
• Our test preprocessor is implemented simply with two (rather convoluted) XSLT stylesheets and about 20 lines of real code.
• The whole process uses a stylesheet T’ to create an XSD schema out of the SchemaPath schema, and a meta-stylesheet MT to generate a stylesheet T’’ that transforms the XML document X. The whole schema looks as follows:
An example of the final schema and XML doc
• The generated XSD schema. The suffixes of the wrapper element names used to be the XPath conditions "../unit='items'" and "../unit='meters'":
      <xsd:choice>
        <xsd:element name="wrquantity0.2E.2E.2Funit.3D.27items.27">
          <xsd:complexType><xsd:sequence>
            <xsd:element name="quantity" type="xsd:integer"/>
          </xsd:sequence></xsd:complexType>
        </xsd:element>
        <xsd:element name="wrquantity0.2E.2E.2Funit.3D.27meters.27">
          <xsd:complexType><xsd:sequence>
            <xsd:element name="quantity" type="xsd:decimal"/>
          </xsd:sequence></xsd:complexType>
        </xsd:element>
      </xsd:choice>
• The transformed XML document:
      <invoiceLine>
        <unit>meters</unit>
        <wrquantity0.2E.2E.2Funit.3D.27meters.27>
          <quantity>2.5</quantity>
        </wrquantity0.2E.2E.2Funit.3D.27meters.27>
      </invoiceLine>
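The wrapper names in this example appear to be built by replacing every non-alphanumeric character of the XPath condition with a dot followed by its two-digit hexadecimal code. The Python sketch below reproduces that naming and the wrapping of the quantity element. The encoding rule is inferred from this single example and is not taken from the actual stylesheets; the meaning of the "0" after the element name is not stated on the slide, so it is treated as an opaque suffix, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_condition(xpath: str) -> str:
    """Replace each non-alphanumeric character with '.' plus its hex code."""
    return "".join(
        ch if (ch.isascii() and ch.isalnum()) else f".{ord(ch):02X}"
        for ch in xpath
    )


def wrapper_name(element_name: str, cond: str, suffix: str = "0") -> str:
    # Assumption: "wr" + element name + opaque suffix + encoded condition.
    return f"wr{element_name}{suffix}{encode_condition(cond)}"


def wrap_child(parent: ET.Element, child_name: str, cond: str) -> ET.Element:
    """Wrap the first child called child_name in a wrapper element, in place."""
    child = parent.find(child_name)
    position = list(parent).index(child)
    wrapper = ET.Element(wrapper_name(child_name, cond))
    parent.remove(child)
    wrapper.append(child)
    parent.insert(position, wrapper)
    return parent


print(wrapper_name("quantity", "../unit='meters'"))
# prints wrquantity0.2E.2E.2Funit.3D.27meters.27

line = ET.fromstring(
    "<invoiceLine><unit>meters</unit><quantity>2.5</quantity></invoiceLine>"
)
print(ET.tostring(wrap_child(line, "quantity", "../unit='meters'"), encoding="unicode"))
```

In the real process the choice of wrapper depends on which condition holds for the document; here the condition is simply passed in by the caller.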
Conclusions
• Support for co-constraints is heavily needed in many situations.
  • Many schemas and DTDs contain plain-language specifications of co-constraints
  • Some document specifications even lament the lack of support for co-constraints in the schema language
• The solution is to extend a schema language
  • One grammar, one validation, one schema document
  • The implementation as a pre-processor is a great aid.
• Conditional type assignments are much cleaner than conditional types
  • The PSVI does not change
  • Good validity properties are preserved
  • Much simpler to implement
Thanks! Visit us at http://genesispc.cs.unibo.it:3333/schemapath.asp or http://tesi.fabio.web.cs.unibo.it/schemapath/