Week5 – Schema • Why Schema? Schemas vs. DTDs • Introduction – W3C vs. Microsoft XDR • Schema, How To? • Element Types – Simple vs. Complex • Attributes • Restrictions/Facets • Data Types • An Example – How to design a schema from scratch?
Why Schema? Schema vs. DTD • DTDs • Not flexible enough to meet certain programming needs • Cannot be manipulated (searched, transformed, etc.) with XML tools • Not XML documents (written in EBNF grammar) • Schemas • Alternative to DTDs • XML documents (using XML syntax) • Two major models • W3C XML Schema • Early development stage (at the time of this writing) • Microsoft XML-Data Reduced (XDR)
Why Schema? Schema vs. DTD (cont.) • DTDs • Cannot ensure proper element content: given <!DOCTYPE quantity [ <!ELEMENT quantity (#PCDATA)> ]>, the element <quantity>hello</quantity> is valid • Use EBNF grammar • Schema (in XDR format) • Can ensure proper element content, because different data types are supported: given <ElementType name="quantity" content="textOnly" model="closed" dt:type="int"/>, the element <quantity>hello</quantity> is invalid • Use XML syntax
Why Schema? • There are a number of reasons why XML Schemas are better than DTDs. • One of the greatest strengths of XML Schemas is the support for data types. • With the support for data types: • It is easier to describe permissible document content • It is easier to validate the correctness of data • It is easier to work with data from a database • It is easier to define data facets (restrictions on data) • It is easier to define data patterns (data formats) • It is easier to convert data between different data types
Why Schema? (cont.) • XML Schemas use XML Syntax • Another great strength of XML Schemas is that they are written in XML. • Because XML Schemas are written in XML: • You don't have to learn another language • You can use your XML editor to edit your Schema files • You can use your XML parser to parse your Schema files • You can manipulate your Schema with the XML DOM (API) • You can transform your Schema with XSLT • XML Schemas Secure Data Communication • Senders and receivers agree on the data format, so both have the same "expectations" about the content. • With XML Schemas, the sender can describe the data in a way that the receiver will understand. • Data Confusion: A date like "03-11-2004" will, in some countries, be interpreted as 3 November and in other countries as 11 March, but an XML element declared with the data type date, such as: • <date>2004-03-11</date> • ensures a mutual understanding of the content, because the XML Schema data type date requires the format YYYY-MM-DD.
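A minimal sketch of the date example, assuming the xs prefix is bound to http://www.w3.org/2001/XMLSchema. The schema declaration and the two instance elements are shown together only for comparison; in practice they live in separate files.

```xml
<!-- Schema: declare "date" with the built-in type xs:date -->
<xs:element name="date" type="xs:date"/>

<!-- Instance: valid, because the value uses the YYYY-MM-DD format -->
<date>2004-03-11</date>

<!-- Instance: invalid against the declaration above (wrong format) -->
<date>03-11-2004</date>
```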
Why Schema? (cont.) • XML Schemas are Extensible • XML Schemas are extensible, just like XML, because they are written in XML. • With an extensible Schema definition you can: • Reuse your Schema in other Schemas • Create your own data types derived from standard types • Reference multiple schemas from the same document • Well-Formed is not Enough • A well-formed XML document is a document that conforms to the XML syntax rules: • it should begin with the XML declaration (optional, but recommended) • it must have one unique root element • every start tag must have a matching end tag • XML tags are case sensitive • all elements must be closed • all elements must be properly nested • all attribute values must be quoted • special characters such as < and & must be written as entity references (&lt;, &amp;) • Even if documents are well-formed they can still contain errors, and those errors can have serious consequences. With XML Schemas, validating software can catch many of these errors.
Introduction • XML Schema is an XML-based alternative to DTD. • An XML schema describes and defines the structure of an XML document. • A schema is used to validate XML documents. • W3C: The XML Schema language is also referred to as XML Schema Definition (XSD). • What You Should Already Know • Before you study the XML Schema language, you should have a basic understanding of XML and XML Namespaces. It will also help to have some basic understanding of DTD.
Introduction (cont.) • What is an XML Schema? • The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD. • An XML Schema: • defines elements that can appear in a document • defines attributes that can appear in a document • defines which elements are child elements • defines the order of child elements • defines the number of child elements • defines whether an element is empty or can include text • defines data types for elements and attributes • defines default and fixed values for elements and attributes
Introduction (cont.) • XML Schemas are the Successors of DTDs • XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons: • XML Schemas are extensible to future additions • XML Schemas are richer and more useful than DTDs • XML Schemas are written in XML • XML Schemas support data types • XML Schemas support namespaces • XML Schema is a W3C Recommendation • XML Schema was originally proposed by Microsoft and became an official W3C Recommendation in May 2001; Microsoft also keeps its own XML-Data Reduced (XDR) schema format (as we saw in the textbook) • The specification is now stable and has been reviewed by the W3C Membership. • For a full overview of W3C Activities and Status: http://www.w3schools.com/w3c/default.asp
Schema – How to? • XML documents can have a reference to a DTD or an XML Schema • An Example presented in XML, DTD, and Schema
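A minimal sketch of the "XML + DTD" side of the comparison. The note element and its children (to, from, heading, body) are hypothetical names chosen for illustration; the DTD is placed in the internal subset so the example is self-contained.

```xml
<?xml version="1.0"?>
<!-- XML document with an internal DTD: "note" must contain
     to, from, heading, body in this order, each holding text -->
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```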
Schema – How to? (cont.) An XML Schema – W3C XML Schema
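A sketch of a W3C XML Schema for the same kind of document, assuming the hypothetical note/to/from/heading/body structure and the target namespace http://www.w3schools.com (the namespace used on the referencing slide). elementFormDefault="qualified" puts the nested elements in that namespace too.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <!-- "note" is a complex element: a fixed sequence of four text children -->
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```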
Schema – How to? (cont.) • XML Schema – Microsoft XDR Schema • Figures compared: XML + DTD; XML + Schema (W3C); XML + Schema (Microsoft XDR)
Schema – How to? (cont.) • Referencing a Schema in an XML Document – W3C • xmlns: specifies the default namespace declaration. • This declaration tells the schema validator that all the elements used in this XML document are declared in the "http://www.w3schools.com" namespace. • xmlns:xsi: specifies the XML Schema Instance namespace • xsi:schemaLocation: specifies where the XML schema file is; its value is a pair: the namespace, followed by the location of the schema for that namespace
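A minimal sketch of an instance document that uses the three attributes listed above. The file name note.xsd and the note content are hypothetical; the xsi namespace URI is the one defined by the W3C specification.

```xml
<?xml version="1.0"?>
<!-- xmlns:            default namespace for every element below
     xmlns:xsi:        the XML Schema Instance namespace
     xsi:schemaLocation: namespace URI, then schema file location -->
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```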
Schema – How to? (cont.) • Referencing a Schema in an XML Document – Microsoft XDR • xmlns: specifies where the XML schema is
Elements • root element: <schema> • W (as in W3c XML Schema): • M (as in Microsoft XDR Schema): • Attribute: xmlns (XML namespace)
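A sketch of the two root elements as empty skeletons (not the original slide's content). The W3C namespace URI is fixed by the specification, but the prefix is a free choice (xs: and xsd: are both common). The XDR root is spelled Schema with a capital S, and the dt: prefix supplies data types such as dt:type="int".

```xml
<!-- W3C XML Schema root -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- element, attribute, and type declarations go here -->
</xs:schema>

<!-- Microsoft XDR root -->
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- ElementType / AttributeType declarations go here -->
</Schema>
```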
W3C or Microsoft XDR? • W3C XML Schema is the standard XML schema recommendation; we'll use it throughout the class. http://www.w3.org • The Microsoft XDR specification can be found on the web site: http://msdn.microsoft.com/library/en-us/xmlsdk/html/xmconxdr_whatis.asp
Elements (cont.) • XML Schemas define the elements of your XML files • What are Global Elements? • Global elements are elements that are immediate children of the "schema" element! • Local elements are elements nested within other elements • Element Types: • Simple Types • Contain only text with data types • Complex Types • An XML element that contains other elements and/or attributes. • There are four kinds of complex elements: • Empty elements • Elements that contain only other elements • Elements that contain only text • Elements that contain both other elements and text
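A small sketch contrasting global and local declarations, assuming a schema with no targetNamespace so that ref="author" can use the unprefixed name. The book/title/author names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global: an immediate child of xs:schema; it can be used as a
       document root or referenced from other declarations -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Local: nested inside "book", usable only here -->
        <xs:element name="title" type="xs:string"/>
        <!-- Reuses the global "author" declaration below -->
        <xs:element ref="author"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Another global element -->
  <xs:element name="author" type="xs:string"/>
</xs:schema>
```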
Elements – Simple Elements • A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes. • The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), • or it can be a custom type that you define yourself. • Common XML Schema Data Types: XML Schema has a lot of built-in data types. Here is a list of the most common types: • xs:string • xs:decimal • xs:integer • xs:boolean • xs:date • xs:time
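A sketch of simple element declarations using the general form <xs:element name="xxx" type="yyy"/>. The element names are hypothetical; the types come from the list above.

```xml
<!-- Matching instance elements, for example:
     <lastname>Smith</lastname>
     <age>36</age>
     <dateborn>1970-03-27</dateborn> -->
<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>
```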
Elements – Simple Elements (cont.) • Declare Default and Fixed Values for Simple Elements • Simple elements can have a default value OR a fixed value set. • A default value is automatically assigned to the element when no other value is specified. In the following example the default value is "red": • A fixed value is also automatically assigned to the element. You cannot specify another value. In the following example the fixed value is "red":
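A sketch of the two alternatives with the value "red" (the element name color is a hypothetical choice). Only one of the two declarations would appear in a given schema. For elements, the default or fixed value is supplied when the element is present but empty.

```xml
<!-- Alternative 1: default value; <color/> is treated as "red",
     but any other string is also allowed -->
<xs:element name="color" type="xs:string" default="red"/>

<!-- Alternative 2: fixed value; the content must be "red"
     (an empty <color/> is also treated as "red") -->
<xs:element name="color" type="xs:string" fixed="red"/>
```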
Attributes • All attributes are declared as simple types. • Only complex elements can have attributes • What is an Attribute? • Simple elements cannot have attributes. • If an element has attributes, it is considered to be of complex type. • The attribute itself is always declared as a simple type. • This means that an element with attributes always has a complex type definition. • Define an Attribute
Attributes (cont.) • XML Schema has a lot of built-in data types (the same as the common XML Schema data types listed under Simple Elements). • A simple attribute definition
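A sketch of an attribute definition in the general form <xs:attribute name="xxx" type="yyy"/>. The attribute name lang is a hypothetical choice; an element that carries it would itself need a complex type.

```xml
<!-- Could be used on an instance element such as
     <lastname lang="EN">Smith</lastname> -->
<xs:attribute name="lang" type="xs:string"/>
```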
Attributes (cont.) • Declare Default and Fixed Values for Attributes • Attributes can have a default value OR a fixed value specified. • A default value is automatically assigned to the attribute when no other value is specified. In the following example the default value is "EN": • A fixed value is also automatically assigned to the attribute. You cannot specify another value. In the following example the fixed value is "EN":
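A sketch of both alternatives with the value "EN", assuming a hypothetical attribute named lang. Only one of the two declarations would appear in a given schema. For attributes, the default or fixed value is supplied when the attribute is absent.

```xml
<!-- Alternative 1: default; if lang is omitted, "EN" is supplied,
     but any other string is also allowed -->
<xs:attribute name="lang" type="xs:string" default="EN"/>

<!-- Alternative 2: fixed; lang may be omitted (then "EN" is supplied),
     but if present its value must be exactly "EN" -->
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
```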
Attributes (cont.) • Creating Optional and Required Attributes • All attributes are optional by default. To explicitly specify that the attribute is optional, set the "use" attribute to "optional" • To make an attribute required, set the "use" attribute to "required"
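A sketch with one required and one optional attribute. The book, isbn, and lang names are hypothetical. Note that "use" is only allowed on attributes declared locally (inside a complex type or an attribute group), not on a global xs:attribute.

```xml
<!-- Valid:   <book isbn="1234"/>  or  <book isbn="1234" lang="EN"/>
     Invalid: <book lang="EN"/>    (isbn is missing) -->
<xs:element name="book">
  <xs:complexType>
    <!-- required: the document is invalid without it -->
    <xs:attribute name="isbn" type="xs:string" use="required"/>
    <!-- optional: the default, stated explicitly here -->
    <xs:attribute name="lang" type="xs:string" use="optional"/>
  </xs:complexType>
</xs:element>
```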
Restrictions on Content • When an XML element or attribute has a type defined, it puts a restriction on the element's or attribute's content. • If an XML element is of type "xs:date" and contains a string like "Hello Mother", the element will not validate. • With XML Schemas, you can also add your own restrictions to your XML elements and attributes. These restrictions are called facets. • Restrictions/Facets • Restrictions are used to control acceptable values for XML elements or attributes. • Restrictions on XML elements are called facets
Restrictions on Content (cont.) • Restrictions on Values • This example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 100
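A sketch of the "age" restriction, using the minInclusive and maxInclusive facets on an integer base type:

```xml
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <!-- 0 and 100 are both allowed; -1 and 101 are not -->
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```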
Restrictions on Content (cont.) • Restrictions on a Set of Values • To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint. • This example defines an element called "car" • The "car" element is a simple type with a restriction. The acceptable values are: Audi, Golf, BMW.
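A sketch of the "car" element with one enumeration facet per acceptable value:

```xml
<xs:element name="car">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- <car>BMW</car> is valid; <car>Ford</car> is not -->
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```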
Restrictions on Content (cont.) • Restrictions on a Series of Values • To limit the content of an XML element to a series of numbers or letters that can be used, we would use the pattern constraint. • These examples define two elements called "letter" and "word": • The "letter" element is a simple type with a restriction. The only acceptable value is ONE of the LOWERCASE letters from a to z • The "word" element is a simple type with a restriction. The only acceptable value is TWO of the UPPERCASE letters from A to Z
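A sketch of the two pattern restrictions. XSD patterns always have to match the whole value, so no ^ or $ anchors are needed.

```xml
<!-- "letter": exactly one lowercase letter, e.g. <letter>q</letter> -->
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- "word": exactly two uppercase letters, e.g. <word>OK</word> -->
<xs:element name="word">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z][A-Z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```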
Restrictions on Content (cont.) • Several other examples: • [a-zA-Z]: The only acceptable value is ONE of the LOWERCASE OR UPPERCASE letters • [xyz]: The only acceptable value is ONE of the following letters: x, y, OR z • [0-9][0-9]: The only acceptable value is TWO digits in a sequence, and each digit must be in a range from 0 to 9 • ([a-z])*: The acceptable value is zero or more occurrences of lowercase letters from a to z (likewise, + means one or more occurrences, and the pipe character | separates alternatives) • [a-zA-Z0-9]{8}: There must be exactly eight characters in a row and those characters must be lowercase or uppercase letters, or a number
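A sketch applying the last pattern above to a hypothetical "password" element:

```xml
<!-- Valid:   <password>abc12XYZ</password>  (8 letters/digits)
     Invalid: <password>abc12</password>     (too short)
     Invalid: <password>abc_12XY</password>  ("_" not allowed) -->
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z0-9]{8}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```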
Restrictions on Content (cont.) • Restrictions on White Space Characters • To specify how white space characters should be handled, use the whiteSpace constraint. • This example defines an element called "address" • The whiteSpace constraint is set to "preserve", so the XML processor WILL NOT remove any white space characters • Alternatively, the whiteSpace constraint can be set to "replace", so the XML processor WILL REPLACE all white space characters (line feeds, tabs, spaces, and carriage returns) with spaces • The whiteSpace constraint can also be set to "collapse", which means the XML processor replaces all white space characters with spaces, REMOVES leading and trailing spaces, and reduces runs of multiple spaces to a single space
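A sketch of the "address" element with the whiteSpace facet. Replacing "preserve" with "replace" or "collapse" gives the other two behaviors described above.

```xml
<xs:element name="address">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- keep line feeds, tabs, and repeated spaces exactly as written -->
      <xs:whiteSpace value="preserve"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```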
Restrictions on Content (cont.) • Restrictions on Length • To limit the length of a value in an element, we would use the length, maxLength, and minLength constraints.
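A sketch using minLength and maxLength on a hypothetical "password" element. A single xs:length facet (for example value="8") would instead require an exact length.

```xml
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- between 5 and 8 characters, inclusive -->
      <xs:minLength value="5"/>
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```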
Elements – Complex (cont.) • What is a Complex Element? • A complex element is an XML element that contains other elements and/or attributes. • There are four kinds of complex elements: • Empty elements • Elements that contain only other elements • Elements that contain only text • Elements that contain both other elements and text • Note: Each of these elements may contain attributes as well!
Elements – Complex (cont.) • 4 types of complex XML elements
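Sample instance elements for the four kinds of complex element (all element and attribute names are hypothetical). Each kind gets its own schema definition on the slides that follow.

```xml
<!-- 1. Empty element (attributes only) -->
<product prodid="1345"/>

<!-- 2. Elements only -->
<employee>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
</employee>

<!-- 3. Text only (plus an attribute) -->
<food type="dessert">Ice cream</food>

<!-- 4. Mixed: text and child elements together -->
<description>It happened on <date lang="EN">2004-03-11</date>.</description>
```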
Elements – Complex (cont.) • Define a complex element
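One way to define a complex element is to give it a named complex type, which other elements can then reuse. This sketch assumes a schema with no targetNamespace, so type="personinfo" can use the unprefixed name. The employee, student, and personinfo names are hypothetical.

```xml
<!-- The element refers to a named type defined separately -->
<xs:element name="employee" type="personinfo"/>

<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- A second element reusing the same type -->
<xs:element name="student" type="personinfo"/>
```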
Elements – Complex (cont.) • Type I: Define Complex Types for Empty Elements • An empty complex element can contain attributes, but it cannot have any content between the opening and closing tags.
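A sketch of an empty complex element: the complex type declares only an attribute and no content model. The product/prodid names are hypothetical.

```xml
<!-- Matches e.g. <product prodid="1345"/> -->
<xs:element name="product">
  <xs:complexType>
    <xs:attribute name="prodid" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>
```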
Elements – Complex (cont.) • Type II: Define Complex Types with Elements Only • An "elements-only" complex type describes an element that contains only other elements
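A sketch of an elements-only complex type defined inline (anonymously) inside the element declaration. The person/firstname/lastname names are hypothetical.

```xml
<!-- Matches e.g.
     <person><firstname>John</firstname><lastname>Smith</lastname></person> -->
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```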
Elements – Complex (cont.) • Type III: Define Complex Text-Only Elements • A complex text-only element can contain both attributes and text • This type contains only simple content (text and attributes), so add a simpleContent element around the content. • Define an extension OR a restriction within the simpleContent element.
Elements – Complex (cont.) • Examples
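A sketch of a text-only complex element using simpleContent with an extension: the base type constrains the text, and the extension adds an attribute. The shoesize/country names are hypothetical.

```xml
<!-- Matches e.g. <shoesize country="france">35</shoesize> -->
<xs:element name="shoesize">
  <xs:complexType>
    <xs:simpleContent>
      <!-- the text content must be an integer -->
      <xs:extension base="xs:integer">
        <xs:attribute name="country" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
```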
Elements – Complex (cont.) • Type IV: Define Complex Types with Mixed Content • A mixed complex type element can contain attributes, elements, and text
Elements – Complex (cont.) • Example: Mixed content + reusable element definition
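A sketch combining mixed content with a reusable named type. mixed="true" allows text between the child elements. It assumes a schema with no targetNamespace, and the lettertype/name/orderid/shipdate names are hypothetical.

```xml
<!-- Matches e.g.
     <letter>Dear Mr. <name>John Smith</name>. Your order
     <orderid>1032</orderid> will be shipped on
     <shipdate>2001-07-13</shipdate>.</letter> -->
<xs:complexType name="lettertype" mixed="true">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="orderid" type="xs:positiveInteger"/>
    <xs:element name="shipdate" type="xs:date"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="letter" type="lettertype"/>
```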
Elements – Complex (cont.) • Complex Types Indicators • Control HOW elements are to be used in documents with indicators • 7 types: • Order indicators: Define how elements should occur • All • Choice • Sequence • Occurrence indicators: Define how often an element can occur • maxOccurs • minOccurs • Group indicators: Define related sets of elements, attributes • Element Group name • Attribute Group name
Elements – Complex (cont.) Order Indicators Order indicators are used to define how elements should occur.
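A sketch of the three order indicators side by side (all element names are hypothetical). With the default minOccurs and maxOccurs of 1, each rule below is exact.

```xml
<!-- all: each child appears exactly once, in any order -->
<xs:element name="person">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:all>
  </xs:complexType>
</xs:element>

<!-- choice: exactly one of the children appears -->
<xs:element name="contact">
  <xs:complexType>
    <xs:choice>
      <xs:element name="email" type="xs:string"/>
      <xs:element name="phone" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:element>

<!-- sequence: the children appear in exactly this order -->
<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```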
Elements – Complex (cont.) Occurrence Indicators Occurrence indicators are used to define how often an element can occur. Note: For all "Order" and "Group" indicators (any, all, choice, sequence, group name, and group reference) the default value for maxOccurs and minOccurs is 1
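A sketch of minOccurs and maxOccurs (the person/full_name/child_name names are hypothetical). Setting maxOccurs="unbounded" would remove the upper limit entirely.

```xml
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <!-- defaults (1, 1): exactly one full_name -->
      <xs:element name="full_name" type="xs:string"/>
      <!-- zero to ten child_name elements -->
      <xs:element name="child_name" type="xs:string"
                  minOccurs="0" maxOccurs="10"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```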
Elements – Complex (cont.) • A working example (which type of complex element?) XML
Elements – Complex (cont.) Group Indicators Group indicators are used to define related sets of elements or attributes
Elements – Complex (cont.) • An example of defining an element group and using it in an element definition
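A sketch of a named element group that is pulled into an element definition with xs:group ref. The group must wrap its content in an order indicator (here xs:sequence). It assumes a schema with no targetNamespace, and all names are hypothetical.

```xml
<!-- Reusable group of three elements, in this order -->
<xs:group name="persongroup">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="birthday" type="xs:date"/>
  </xs:sequence>
</xs:group>

<!-- "person" = the three group elements, then "country" -->
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="persongroup"/>
      <xs:element name="country" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```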
Elements – Complex (cont.) • An example of defining an attribute group and using it in an element definition
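A sketch of a named attribute group referenced from an element's complex type. It assumes a schema with no targetNamespace, and all names are hypothetical.

```xml
<xs:attributeGroup name="personattrgroup">
  <xs:attribute name="firstname" type="xs:string"/>
  <xs:attribute name="lastname" type="xs:string"/>
  <xs:attribute name="birthday" type="xs:date"/>
</xs:attributeGroup>

<!-- Matches e.g.
     <person firstname="John" lastname="Smith" birthday="1970-03-27"/> -->
<xs:element name="person">
  <xs:complexType>
    <xs:attributeGroup ref="personattrgroup"/>
  </xs:complexType>
</xs:element>
```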
Elements – The <any> Element (cont.) • The <any> element enables us to extend the XML document with elements not specified by the schema • Consider the declaration of a "person" element. • By adding an <any> element after its declared child elements, we can extend the content of "person" with any element:
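A sketch of "person" extended with xs:any after its declared children (firstname and lastname are hypothetical names). minOccurs="0" makes the extra element optional. With the default processContents="strict", the extra element must still be declared in some schema available to the validator; "lax" or "skip" relaxes that.

```xml
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
      <!-- one optional element of any name may follow lastname -->
      <xs:any minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```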