This set of guidelines discusses best practices for designing extensible XML schemas, focusing on creating extensible content models. It explores techniques such as type substitution and extending schemas without modifying them.
Creating Extensible Content Models
XML Schemas: Best Practices
A set of guidelines for designing XML Schemas
Created by discussions on xml-dev
Definition

• An element has an extensible content model if, in instance documents, that element can contain elements and data above and beyond what was specified by the schema.
Static, Fixed Content Model

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Book is rigidly defined to contain five child elements: Title, Author, Date, ISBN, and Publisher.

    <Book>
      <Title>The First and Last Freedom</Title>
      <Author>J. Krishnamurti</Author>
      <Date>1954</Date>
      <ISBN>0-06-064831-7</ISBN>
      <Publisher>Harper &amp; Row</Publisher>
    </Book>

Instance document authors are restricted to supplying just title, author, date, ISBN, and publisher data for a book. Book's content model is static/fixed!
Static, Fixed Content Model

• Sometimes it is desirable to explicitly specify an element's content model.
• Sometimes, however, we want to give instance document authors more flexibility in what data they can provide for an element.
• How do we design a schema such that Book's content model is extensible?
• We will look at two methods for implementing extensible content models.
Extensibility via Type Substitution

    <xsd:complexType name="BookType">
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="BookType"/>

    <xsd:complexType name="BookTypePlusReviewer">
      <xsd:complexContent>
        <xsd:extension base="BookType">
          <xsd:sequence>
            <xsd:element name="Reviewer" type="xsd:string"/>
          </xsd:sequence>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
Extensibility via Type Substitution

    <Book>
      -- content --
    </Book>

Principle of Type Substitutability: the content model of Book can be either BookType or any type which derives from BookType, e.g., BookTypePlusReviewer.
Extensibility via Type Substitution

    <Book xsi:type="BookTypePlusReviewer">
      <Title>The First and Last Freedom</Title>
      <Author>J. Krishnamurti</Author>
      <Date>1954</Date>
      <ISBN>0-06-064831-7</ISBN>
      <Publisher>Harper &amp; Row</Publisher>
      <Reviewer>Roger L. Costello</Reviewer>
    </Book>

The type substitutability mechanism enables instance document authors to extend Book's content model by substituting its type with a derived type. Here BookType has been substituted by the type BookTypePlusReviewer. Thus, <Book> now contains a new element, <Reviewer>.

Do Lab 1
Extend a Schema, without Touching it!

• In the last example BookTypePlusReviewer derived from BookType, and both types were in the same schema.
• What if we need to extend BookType, but BookCatalogue.xsd is read-only?
• In a separate schema, we can create a type which extends BookType. The instance document can do type substitution using the new type. Thus, we are able to extend a schema without touching it!
Extend a Schema, without Touching it!

BookCatalogue.xsd (xmlns="http://www.publishing.org")

    <xsd:complexType name="BookType">
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="BookType"/>

MyTypeDefinitions.xsd (xmlns="http://www.publishing.org")

    <xsd:include schemaLocation="BookCatalogue.xsd"/>

    <xsd:complexType name="BookTypePlusReviewer">
      <xsd:complexContent>
        <xsd:extension base="BookType">
          <xsd:sequence>
            <xsd:element name="Reviewer" type="xsd:string"/>
          </xsd:sequence>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
Extend a Schema, without Touching it!

    xmlns="http://www.publishing.org"
    xsi:schemaLocation="http://www.publishing.org MyTypeDefinitions.xsd"

    <Book xsi:type="BookTypePlusReviewer">
      <Title>The First and Last Freedom</Title>
      <Author>J. Krishnamurti</Author>
      <Date>1954</Date>
      <ISBN>0-06-064831-7</ISBN>
      <Publisher>Harper &amp; Row</Publisher>
      <Reviewer>Roger L. Costello</Reviewer>
    </Book>

We have type-substituted Book's content with the type specified in the new schema. Thus, we have extended BookCatalogue.xsd without touching it!
Disadvantages of using Type Substitution for Extending an Element's Content Model

• Location Restricted Extensibility: the extensibility is restricted to appending elements onto the end of the content model (after the <Publisher> element). What if we wanted to extend <Book> by adding elements to the beginning (before <Title>), or in the middle, etc.? We can't do it with this mechanism.
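To see why extension can only append, it may help to write out the content model that BookTypePlusReviewer effectively has. The sketch below is illustrative only: the type name BookTypePlusReviewerExpanded is hypothetical, and xsd:gYear is used for Date because that is the name of the year type in the final XML Schema Recommendation.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical stand-alone equivalent of BookTypePlusReviewer:
       derivation by extension behaves like a sequence of
       (base content, added content). -->
  <xsd:complexType name="BookTypePlusReviewerExpanded">
    <xsd:sequence>
      <!-- Inherited from BookType; order and position fixed by the base -->
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
      <!-- Added by the extension; always lands after the inherited part -->
      <xsd:sequence>
        <xsd:element name="Reviewer" type="xsd:string"/>
      </xsd:sequence>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Because the base type's content always comes first, there is no way for the derived type to place Reviewer before Title or between Author and Date.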
Disadvantages of using Type Substitution for Extending an Element's Content Model

• Unexpected Extensibility:

    <xsd:complexType name="BookType">
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="BookType"/>

Simply looking at these components you would think that <Book> will always contain just Title, Author, Date, ISBN, and Publisher. It is easy to forget that someone could extend the content model using the type substitution mechanism. Extensibility is unexpected! Consequently, if you create a program to process Book's content, you may forget to take into account that Book may contain different content.
Here's what we Desire for Extending Content Models

• Location Independent Extensibility: we would like to be able to extend Book's content at any location, not just at the end. For example, we might wish to add elements at the top (before Title), or in the middle (after Date), etc.
• Explicit Indication of where Extensibility may Occur: it would be nice if there were a way to explicitly flag places where extensibility may occur: "hey, instance documents may extend <Book> at this point, so be sure to write your code taking this possibility into account."
Extensibility via the <any> Element

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
          <xsd:any namespace="##any" minOccurs="0"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

"The content of Book is Title, Author, Date, ISBN, Publisher and then (optionally) any well-formed element. The new element may come from any namespace."

Note: the <any> element may be inserted at any point, e.g., it could be inserted at the top, in the middle, etc.
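How strictly the extension element is checked depends on the wildcard's processContents attribute, which the declaration above leaves at its default of "strict". With "strict" a validator needs a declaration for the extension element and validates against it; with "lax" it validates only if it can find a declaration; with "skip" the element only has to be well-formed. A minimal sketch using "lax" follows; the targetNamespace and elementFormDefault settings are assumptions, not taken from the slide.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org"
            elementFormDefault="qualified">

  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <!-- xsd:gYear is the year type in the final Recommendation -->
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
        <!-- lax: validate the extension element only when a
             declaration for it is available -->
        <xsd:any namespace="##any" processContents="lax" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

"lax" or "skip" is closest to the "any well-formed element" reading; "strict" suits cases where extensions should come only from schemas the validator can locate.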
Extensibility via the <any> Element

BookCatalogue.xsd (xmlns="http://www.publishing.org")

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
          <xsd:any namespace="##any" minOccurs="0"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

MyRepository.xsd (xmlns="http://www.MyRepository.org")

    <xsd:element name="Reviewer">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Name">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="First" type="xsd:string"/>
                <xsd:element name="Last" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

In an instance document I can insert this Reviewer element after Publisher.
    xmlns="http://www.publishing.org"
    xmlns:rev="http://www.MyRepository.org"
    xsi:schemaLocation="http://www.publishing.org BookCatalogue.xsd
                        http://www.MyRepository.org MyRepository.xsd"

    <Book>
      <Title>The First and Last Freedom</Title>
      <Author>J. Krishnamurti</Author>
      <Date>1954</Date>
      <ISBN>0-06-064831-7</ISBN>
      <Publisher>Harper &amp; Row</Publisher>
      <rev:Reviewer>
        <rev:Name>
          <rev:First>Roger</rev:First>
          <rev:Last>Costello</rev:Last>
        </rev:Name>
      </rev:Reviewer>
    </Book>

This instance document author has extended Book with an element that the schema designer may never have envisioned. We have empowered the author to create instance documents which contain all the data he/she requires.
Alternate Schema for Book

    <xsd:complexType name="BookType">
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
        <xsd:any namespace="##any" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="BookType"/>

This is a better design than the previous version since now we have a nice reusable BookType component. However, we are back to the "unexpected extensibility" problem: after the Publisher element there may be any well-formed XML element, and after that anything could be present (due to type substitutability).
Controlling Extensibility using the block Attribute

• We can add a block attribute to the element declaration to prohibit type substitution:

    <xsd:complexType name="BookType">
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
        <xsd:any namespace="##any" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="BookType" block="#all"/>
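The block attribute is also allowed on a complexType definition, where it takes "#all", "extension", or "restriction". As a hedged variation on the declaration above, the sketch below moves the block onto BookType, so that any element declared with that type refuses xsi:type substitution. The targetNamespace and elementFormDefault settings are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org"
            elementFormDefault="qualified">

  <!-- block on the type: derived types may still be defined, but they
       may not be substituted for BookType in instance documents -->
  <xsd:complexType name="BookType" block="#all">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
      <!-- xsd:gYear is the year type in the final Recommendation -->
      <xsd:element name="Date" type="xsd:gYear"/>
      <xsd:element name="ISBN" type="xsd:string"/>
      <xsd:element name="Publisher" type="xsd:string"/>
      <xsd:any namespace="##any" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- No block needed here; the type already prohibits substitution -->
  <xsd:element name="Book" type="BookType"/>

</xsd:schema>
```

Blocking on the element keeps the decision local to Book, while blocking on the type protects every element that reuses BookType. Which is preferable depends on how widely the type is shared.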
Control over where and how much Extensibility

• We can put the <any> element specifically where we desire extensibility.
• If we desire extensibility at multiple locations, we can insert multiple <any> elements.
• With maxOccurs we can specify "how much" extensibility we will allow.

    <xsd:complexType name="BookType">
      <xsd:sequence>
        <xsd:any namespace="##any" minOccurs="0" maxOccurs="2"/>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>

- We are restricting extensions to occur at the top of the content model.
- We are restricting the amount of extensibility to two elements.
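The second bullet, extensibility at multiple locations, can be sketched as follows. This is an illustration under assumptions rather than part of the original material: the targetNamespace, the elementFormDefault setting, and the choice of where to put the two wildcards are all hypothetical. namespace="##other" is used instead of "##any" so that the wildcard placed before Date cannot also match Date itself.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org"
            elementFormDefault="qualified">

  <xsd:complexType name="BookType">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
      <!-- Extension point 1: at most one foreign element after Author -->
      <xsd:any namespace="##other" minOccurs="0" maxOccurs="1"/>
      <!-- xsd:gYear is the year type in the final Recommendation -->
      <xsd:element name="Date" type="xsd:gYear"/>
      <xsd:element name="ISBN" type="xsd:string"/>
      <xsd:element name="Publisher" type="xsd:string"/>
      <!-- Extension point 2: any number of foreign elements at the end -->
      <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="Book" type="BookType"/>

</xsd:schema>
```

Each wildcard carries its own minOccurs and maxOccurs, so the amount of extensibility can differ from one location to the next.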
Recognizing our Limitations

• The <any> element allows a schema designer to recognize that he/she is not able to anticipate all the varieties of data that an instance document author might need to use in creating an instance document: "I'm smart enough to know that I'm not smart enough to anticipate all possible needs!"

Do Lab 2
Non-determinism and the <any> element

Schema:

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:any namespace="##any" minOccurs="0" maxOccurs="2"/>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Instance:

    <Book>
      <Title>My Life and Times</Title>
      ...
    </Book>

Does this Title element correspond to the <any> element, or to the Title element declaration? It is impossible to determine without "looking ahead" to the next element. The Book element has a non-deterministic content model. Non-deterministic content models are illegal.
Definition of Non-determinism

• Definition: a non-deterministic content model is one where, upon encountering an element in an instance document, it is ambiguous which path was taken in the schema document.
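The same ambiguity can arise without any wildcard at all. The sketch below uses hypothetical type and element names (AmbiguousType, DeterministicType, A, B, C). The first type is deliberately illegal, so a schema processor is expected to reject it; the second shows one way to express the same content deterministically.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Non-deterministic (illegal): on reading <A>, a validator cannot
       tell which branch of the choice it has entered without looking
       ahead to see whether <B> or <C> follows. -->
  <xsd:complexType name="AmbiguousType">
    <xsd:choice>
      <xsd:sequence>
        <xsd:element name="A" type="xsd:string"/>
        <xsd:element name="B" type="xsd:string"/>
      </xsd:sequence>
      <xsd:sequence>
        <xsd:element name="A" type="xsd:string"/>
        <xsd:element name="C" type="xsd:string"/>
      </xsd:sequence>
    </xsd:choice>
  </xsd:complexType>

  <!-- Deterministic rewrite: <A> is factored out, so each element in
       the instance matches exactly one particle. -->
  <xsd:complexType name="DeterministicType">
    <xsd:sequence>
      <xsd:element name="A" type="xsd:string"/>
      <xsd:choice>
        <xsd:element name="B" type="xsd:string"/>
        <xsd:element name="C" type="xsd:string"/>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

The XML Schema Recommendation states this requirement as the Unique Particle Attribution constraint.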
Non-determinism and the <any> element

Schema:

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:any namespace="##other" minOccurs="0" maxOccurs="2"/>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Instance:

    <Book>
      <Title>My Life and Times</Title>
      ...
    </Book>

Clearly this Title element must have come from the Title element declaration, because the <any> element requires new elements to come from a namespace other than the targetNamespace. Thus, this schema has a deterministic content model and is legal.
Non-determinism and the <any> element

Suppose that the Book element is composed of components from a variety of namespaces.

Schema:

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:any namespace="##other" minOccurs="0" maxOccurs="2"/>
          <xsd:element ref="bk:Title"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Instance:

    <Book>
      <bk:Title>My Life and Times</bk:Title>
      ...
    </Book>

Does this Title element come from the <any> element, or from the Title element being ref'ed? There is no way of knowing without looking ahead. Thus, this is non-deterministic, and illegal.
<any> --> Quite Restricted

• As we have seen, the requirement that all elements have deterministic content models imposes serious restrictions on the use of the <any> element.
• So, what do you do when you want to enable extensibility at arbitrary locations?
• Answer: embed the <any> element within an <other> element.
Embed Additional Elements within an <other> Element

Schema:

    <xsd:element name="Book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="other" minOccurs="0">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:any namespace="##any"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Author" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Now we are forcing any additional elements to be embedded within an optional <other> element.

Instance:

    <Book>
      <Title>My Life and Times</Title>
      ...
    </Book>

No more ambiguity about where this Title element comes from, i.e., no more non-determinism!
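For completeness, here is a sketch of an instance that actually uses the <other> wrapper. It reuses the book data and the Reviewer element from the earlier slides; the namespace URIs come from those slides, and it assumes the schema uses elementFormDefault="qualified" so that <other> and the other children sit in the publishing namespace.

```xml
<Book xmlns="http://www.publishing.org"
      xmlns:rev="http://www.MyRepository.org">
  <!-- The extension travels inside <other>; the wildcard in the schema
       above allows exactly one child element here -->
  <other>
    <rev:Reviewer>
      <rev:Name>
        <rev:First>Roger</rev:First>
        <rev:Last>Costello</rev:Last>
      </rev:Name>
    </rev:Reviewer>
  </other>
  <Title>The First and Last Freedom</Title>
  <Author>J. Krishnamurti</Author>
  <Date>1954</Date>
  <ISBN>0-06-064831-7</ISBN>
  <Publisher>Harper &amp; Row</Publisher>
</Book>
```

A validator that sees <other> knows it matches the other declaration, and one that sees <Title> knows it matches Title, so no look-ahead is needed.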
Commentary

• Requiring instance document authors to nest additional elements within an <other> element is poor, at best.
• The whole reason that this design pattern was forced upon us is XML Schema's rule outlawing non-deterministic content models.
• The Working Group adopted this rule to simplify implementations of schema validators.
• Write to the XML Schema Working Group requesting that they remove the rule outlawing non-deterministic content models.