ITR3 lecture 3: Namespaces, XML Schema & XSL Thomas Krichel 2002-09-10
Gee… • Bird's-eye view only: a look at what these things do. • If there is interest, I can teach more in a separate course. • Structure • Some XML-related standards • Namespaces • XML Schema • XSL
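XHTML • This is HTML redefined so that it becomes well-formed XML • Examples • Element names are case-sensitive (and must be written in lowercase) • Empty elements must be closed, e.g. <br> becomes <br/>, and every <p> needs its matching </p> • Verdict: pain without gain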
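To see what "well-formed" demands in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. It is a plain XML parser, not an XHTML validator, and the three sample strings are made up for illustration: the HTML habits are rejected, the XHTML style is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

samples = {
    # HTML habit: <br> is never closed, so </p> does not match it
    "unclosed empty element": "<body><p>One<br></p></body>",
    # XML names are case-sensitive: <p> is not closed by </P>
    "case mismatch": "<body><p>One</P></body>",
    # XHTML style: empty element self-closed, lowercase throughout
    "xhtml style": "<body><p>One<br/></p></body>",
}

for label, markup in samples.items():
    print(label, is_well_formed(markup))
# unclosed empty element False
# case mismatch False
# xhtml style True
```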
Resource Description Framework (RDF) • A standard issued by the W3C: a framework for encoding meaning so that it becomes machine-processable. • Uses the approach of a directed graph. • Generalizes an object / property / value approach • The value may be another object. • Objects are identified by a URI. • Properties may be identified by a URI. • A paper on RDF is available at http://openlib.org/home/krichel/papers/anhalter.letter.pdf • An XML syntax for RDF is defined but is currently being reworked. • Verdict: very costly to implement.
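The object / property / value idea can be sketched as a list of triples forming a directed graph. This is plain Python, not RDF/XML and not any RDF library; every URI under `http://example.org/` and every literal is hypothetical. It shows that a value can itself be an object whose properties can be followed further.

```python
from typing import NamedTuple, Union

class Literal(NamedTuple):
    text: str                     # a plain value, not an object

class Triple(NamedTuple):
    subject: str                  # an object, identified by a URI
    prop: str                     # a property, identified by a URI
    value: Union[str, Literal]    # another object's URI, or a literal

EX = "http://example.org/"        # hypothetical namespace for this sketch

graph = [
    Triple(EX + "paper1", EX + "author", EX + "person1"),
    Triple(EX + "paper1", EX + "title", Literal("A sample paper")),
    Triple(EX + "person1", EX + "name", Literal("Jane Doe")),
]

def values(graph, subject, prop):
    """Follow the outgoing edges labelled prop from subject."""
    return [t.value for t in graph if t.subject == subject and t.prop == prop]

# The author's value is itself an object, so we can follow its edges too.
author = values(graph, EX + "paper1", EX + "author")[0]
print(values(graph, author, EX + "name"))  # [Literal(text='Jane Doe')]
```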
Cascading style sheets (CSS) • A non-XML way of writing stylesheets that can be applied to both XML and HTML. Widely supported by browsers. • Written as a sequence of rules. Example: compositionyear, recordingyear { color: red; font-family: sans-serif } displays compositionyear and recordingyear elements in red, in a sans-serif font. • Verdict: not flexible
XPath and XPointer • Non-XML syntaxes for referring to parts of an XML document: specific points, ranges, and sets of nodes. • They are used in other XML-related standards, in particular in XSL, and will be covered as part of XSL. • Verdict: useful
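For a first taste of XPath, Python's `xml.etree.ElementTree` implements a small subset of it (no XPointer points or ranges). In this sketch the first book comes from the instance document used later in this lecture; the second book is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<BookStore>"
    "<Book><Title>My Life and Times</Title><Author>Paul McCartney</Author></Book>"
    "<Book><Title>Another Book</Title><Author>Another Author</Author></Book>"
    "</BookStore>"
)

# ".//Title": every Title element anywhere below the context node
titles = [t.text for t in doc.findall(".//Title")]
print(titles)  # ['My Life and Times', 'Another Book']

# Predicate on a child's text: the Book whose Author is "Another Author"
book = doc.find("./Book[Author='Another Author']")
print(book.find("Title").text)  # Another Book

# Positional predicate: the first Book (XPath positions start at 1)
print(doc.find("./Book[1]/Title").text)  # My Life and Times
```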
XLink • An XML syntax for linking XML documents. • It goes way beyond the conventional linking capabilities of HTML, but there is no obvious way for a browser to represent such links. • Verdict: nonsense
Document Object Model (DOM) • “a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model provides a standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them.” • Now at “Level 3”. • Works by building a tree out of a document. • Verdict: extremely complicated
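Python's standard `xml.dom.minidom` is a lightweight DOM implementation. This sketch parses the whole document into a tree, reads from it, then updates its structure through the same interface. The added `note` element is hypothetical.

```python
from xml.dom import minidom

# Parse the whole document into an in-memory tree of node objects
dom = minidom.parseString("<person sex='female'>Margarete Krichel</person>")
person = dom.documentElement

print(person.tagName)              # person
print(person.getAttribute("sex"))  # female
print(person.firstChild.data)      # Margarete Krichel

# Update the structure: create a new element node and attach it
note = dom.createElement("note")   # hypothetical element name
note.appendChild(dom.createTextNode("example"))
person.appendChild(note)

print(person.toxml())
# <person sex="female">Margarete Krichel<note>example</note></person>
```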
Simple API for XML (SAX) • SAX is an event-based parsing model. It reports parsing events (such as the start and end of elements) directly to the application through callbacks. • It does not usually build an internal tree. • Much less resource-intensive than DOM, especially • when the document is large • when the task is simple. • Verdict: thumbs up!
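A minimal SAX sketch with Python's standard `xml.sax`: the parser calls `startElement` once per start tag as it reads, and the handler keeps only a dictionary of counts, never a tree. The sample document is hypothetical.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Counts start-element events; no document tree is built."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Callback invoked by the parser for every start tag it reads
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parseString(
    b"<BookStore><Book><Title>A</Title></Book>"
    b"<Book><Title>B</Title></Book></BookStore>",
    handler,
)
print(handler.counts)  # {'BookStore': 1, 'Book': 2, 'Title': 2}
```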
XML Information Sets • best understood through an example. Consider two XML snippets. • Snippet 1 <person sex="female"> Margarete Krichel</person> • Snippet 2 <person sex='female'>Margarete Krichel </person> • Are they the same?
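One way to probe the question is to ask a parser what it reports for each snippet. This sketch uses Python's `xml.etree.ElementTree`, which exposes only part of the information set, so treat it as an illustration rather than a full infoset comparison.

```python
import xml.etree.ElementTree as ET

s1 = ET.fromstring('<person sex="female"> Margarete Krichel</person>')
s2 = ET.fromstring("<person sex='female'>Margarete Krichel </person>")

# The quote style around an attribute value is not reported at all
print(s1.attrib == s2.attrib)        # True

# Whitespace inside the element is character data, so it is reported
print(repr(s1.text), repr(s2.text))  # ' Margarete Krichel' 'Margarete Krichel '
print(s1.text == s2.text)            # False
```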
XML Namespaces • Allow XML element and attribute names to be made globally unique by associating them with a particular URI, usually a URL. • The globally unique name is called the qualified name, or qname for short. • The name without the namespace URI is called the local name. • This is done through a namespace declaration and a prefix. The namespace declaration associates a short string, called a prefix, with the namespace URI. • The qualified name can then be written as prefix:localname
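The prefix is only a local shorthand. Python's `xml.etree.ElementTree` makes this visible: it discards the prefix and reports each name as `{namespace URI}localname`. In this sketch, two documents that bind different prefixes to the same URI yield identical names. The prefixes `bk` and `x` are arbitrary.

```python
import xml.etree.ElementTree as ET

# Two documents bind different prefixes to the same namespace URI
a = ET.fromstring('<bk:Book xmlns:bk="http://www.books.org"/>')
b = ET.fromstring('<x:Book xmlns:x="http://www.books.org"/>')

# ElementTree reports {namespace URI}local name; the prefix is gone
print(a.tag)           # {http://www.books.org}Book
print(a.tag == b.tag)  # True: same URI and same local name
```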
Namespace syntax • <element xmlns[:prefix]="URI"> … </element> • element is the element name • prefix is the prefix • URI is a URI, in practice often a URL. • [ ] indicates that the prefix is optional. If the prefix is missing, all elements that have no namespace prefix belong by default to the declared namespace. • The namespace declaration is in scope only for element itself and its descendants.
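A default namespace declaration (no prefix) covers the element that carries it and its descendants, but not its siblings. Here is a sketch with Python's `xml.etree.ElementTree`; the wrapper element `catalog` is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    '<Book xmlns="http://www.books.org"><Title>My Life and Times</Title></Book>'
    "<Title>Outside the declaration</Title>"
    "</catalog>"
)

# Tags in document order: only Book and its own Title are in the namespace
print([el.tag for el in doc.iter()])
# ['catalog', '{http://www.books.org}Book', '{http://www.books.org}Title', 'Title']
```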
Avoiding cerebral indigestion related to namespaces • Expect nothing if you retrieve the namespace URI, even when it is a URL: it only serves as an identifier. • Prefixes can be any short string. Some prefixes are customary, like xsi for http://www.w3.org/2001/XMLSchema-instance • Default namespace declarations apply only to elements, not to attributes. An unprefixed attribute is in no namespace and is interpreted in the context of its element; an attribute is in a namespace only if it has an explicit prefix.
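Python's `xml.etree.ElementTree` shows the element/attribute asymmetry directly. The default namespace is applied to the element name but not to the unprefixed attribute, while the prefixed `xsi:` attribute does get its namespace. The `lang` attribute is hypothetical. The namespace declarations themselves do not show up as attributes.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<Book xmlns="http://www.books.org"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' lang="en"'
    ' xsi:schemaLocation="http://www.books.org BookStore.xsd"/>'
)

print(doc.tag)  # {http://www.books.org}Book  (default namespace applied)
for name, value in doc.attrib.items():
    print(name, "=", value)
# lang = en  (unprefixed attribute: no namespace)
# {http://www.w3.org/2001/XMLSchema-instance}schemaLocation = http://www.books.org BookStore.xsd
```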
XML Schemas • http://www.w3.org/TR/xmlschema-0/ (Primer) • http://www.w3.org/TR/xmlschema-1/ (Structures) • http://www.w3.org/TR/xmlschema-2/ (Datatypes)
What is XML Schema? • XML Schema is a vocabulary for expressing constraints on the validity of an XML document. • A piece of XML is valid if it satisfies the constraints expressed in another XML file, the schema file. • The idea is to check whether the XML file is fit for a certain purpose.
Example
<location>
  <latitude>32.904237</latitude>
  <longitude>73.620290</longitude>
  <uncertainty units="meters">2</uncertainty>
</location>
To be valid, this XML snippet must meet all of the following constraints:
1. The location must consist of a latitude, followed by a longitude, followed by an indication of the uncertainty of the lat/lon measurements.
2. The latitude must be a decimal with a value between -90 and +90.
3. The longitude must be a decimal with a value between -180 and +180.
4. For both latitude and longitude, the number of digits to the right of the decimal point must be exactly six.
5. The value of uncertainty must be a non-negative integer.
6. The uncertainty units must be either meters or feet.
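To make the six constraints concrete, here they are hand-coded in Python. This is a sketch only: an XML Schema states the same rules declaratively and a validator enforces them, so no checking code has to be written. The function name `check_location` and the error messages are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Constraint 4: a decimal with exactly six digits after the point
SIX_DIGITS = re.compile(r"[+-]?[0-9]+\.[0-9]{6}")

def check_location(xml_text):
    """Return the list of violated constraints; [] means valid."""
    loc = ET.fromstring(xml_text)
    # 1. exactly latitude, longitude, uncertainty, in that order
    if [child.tag for child in loc] != ["latitude", "longitude", "uncertainty"]:
        return ["children must be latitude, longitude, uncertainty"]
    lat, lon, unc = list(loc)
    errors = []
    # 2.-4. six fraction digits, within +/-90 (lat) and +/-180 (lon)
    for el, limit in ((lat, 90), (lon, 180)):
        text = (el.text or "").strip()
        if not SIX_DIGITS.fullmatch(text):
            errors.append(el.tag + ": need a decimal with six fraction digits")
        elif not -limit <= Decimal(text) <= limit:
            errors.append(el.tag + ": out of range")
    # 5. non-negative integer
    if not re.fullmatch(r"[0-9]+", (unc.text or "").strip()):
        errors.append("uncertainty: need a non-negative integer")
    # 6. units attribute
    if unc.get("units") not in ("meters", "feet"):
        errors.append("uncertainty: units must be meters or feet")
    return errors

good = ('<location><latitude>32.904237</latitude><longitude>73.620290</longitude>'
        '<uncertainty units="meters">2</uncertainty></location>')
print(check_location(good))                             # []
print(check_location(good.replace("meters", "miles")))  # ['uncertainty: units must be meters or feet']
```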
Validating your data
• Input: the XML instance
<location>
  <latitude>32.904237</latitude>
  <longitude>73.620290</longitude>
  <uncertainty units="meters">2</uncertainty>
</location>
• The XML Schema validator (software) reads the XML Schema file and checks, for example:
  - that the latitude is between -90 and +90
  - that the longitude is between -180 and +180
  - that the number of fraction digits is 6
  - … etc.
• Output: "Data is ok!"
History of Schema • Once upon a time, there was SGML. • SGML has a “schema” language called a DTD. • It is crap • Different syntax from SGML itself • Main focus on the presence and absence of elements • Very limited capabilities to check the contents of elements (datatypes)
XML Schemas can constrain • the structure of instance documents • "this element contains these elements, which contain these other elements", etc. • the datatype of each element/attribute • "this element shall hold an integer in the range 0 to 12,000"
Highlights of XML Schemas • 44 built-in datatypes • Can create your own datatypes by extending or restricting existing datatypes • Written in the same syntax as instance documents • Can express sets, i.e., can define the child elements to occur in any order • Can specify element content as being unique (keys on content) and uniqueness within a region • Can define multiple elements with the same name but different content • Can define elements with nil content • Can define substitutable elements
important schema concepts • simple types: types that cannot have child elements or attributes. They are used for • elements that only have text content and no attributes • attributes • complex types: the types of anything that can have child elements or attributes
important schema concepts • global declarations are direct children of the root schema element. They are visible everywhere. • local declarations are all other declarations. Their scope is limited to the element within which they appear.
important schema concepts • Value space. The range of values that the type can take • Lexical space. The range of literals that represent the values • Set of facets. The defining properties of a type. • Fundamental facets include equality, order, bounds, cardinality, numeric/non-numeric • Constraining facets include ranges for numbers, string lengths, or regular expressions
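The value space versus lexical space distinction can be sketched in a few lines of Python. The mapping for `xsd:boolean` follows the four literals XML Schema allows ("true", "false", "1", "0"). `Decimal` stands in for `xsd:decimal` values. The range check mimics the `minInclusive`/`maxInclusive` constraining facets using the 0 to 12,000 range from the earlier slide. The function names are hypothetical.

```python
from decimal import Decimal

# xsd:boolean: four literals (lexical space) map onto two values (value space)
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}

def boolean_value(literal):
    """Map an xsd:boolean literal to its value (KeyError if invalid)."""
    return BOOLEAN_LEXICAL[literal.strip()]

print(boolean_value("1") == boolean_value("true"))  # True: two literals, one value

# xsd:decimal: "1.50" and "01.5" are different literals for the same value
print(Decimal("1.50") == Decimal("01.5"))           # True

def within_range(literal, min_inclusive=0, max_inclusive=12000):
    """Sketch of the minInclusive / maxInclusive facets, checked on the value."""
    return min_inclusive <= Decimal(literal) <= max_inclusive

print(within_range("11999.5"), within_range("-1"))  # True False
```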
Namespaces • An XML Schema file mixes vocabulary from the XML Schema language with the new vocabulary being created. • The two have to be kept separate, using namespaces. • Namespaces associate a URI with names.
Figure: two vocabularies • http://www.w3.org/2001/XMLSchema: schema, element, complexType, sequence, string, integer, boolean. This is the vocabulary that XML Schemas provide to define your new vocabulary. • http://www.books.org (targetNamespace): BookStore, Book, Title, Author, Date, ISBN, Publisher. This is the vocabulary for our book store XML description.
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
(explanations on the following slides)
BookStore.xsd (see example01); xsd = XML Schema Definition
BookStore.xsd, <xsd:schema>: all XML Schemas have "schema" as the root element.
BookStore.xsd, xmlns:xsd="http://www.w3.org/2001/XMLSchema": the elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the http://…/XMLSchema namespace.
XMLSchema Namespace • http://www.w3.org/2001/XMLSchema contains, among others: complexType, element, sequence, schema, boolean, string, integer
BookStore.xsd, targetNamespace="http://www.books.org": says that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in this namespace.
Book Namespace (targetNamespace) • http://www.books.org contains: BookStore, Book, Title, Author, Date, ISBN, Publisher
BookStore.xsd, xmlns="http://www.books.org": the default namespace is http://www.books.org, which is the targetNamespace! • ref="Book" references a Book element declaration. The Book in what namespace? It has no prefix, so the default namespace applies: it is the Book declared in this schema's targetNamespace.
BookStore.xsd, elementFormDefault="qualified": this is a directive to any instance documents that conform to this schema: any elements that are defined in this schema must be namespace-qualified when used in instance documents. (Globally declared elements, like all of those in BookStore.xsd, are always qualified; the attribute governs locally declared ones.)
Referencing a schema in an XML instance document
<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  ...
</BookStore>
1. First, using a default namespace declaration (xmlns="http://www.books.org"), tell the schema validator that all of the elements used in this instance document come from the http://www.books.org namespace.
2. Second, with schemaLocation, tell the schema validator that the http://www.books.org namespace is defined by BookStore.xsd (i.e., schemaLocation contains a pair of values).
3. Third, with xmlns:xsi, tell the schema validator that the schemaLocation attribute we are using is the one in the XML Schema-instance namespace.
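Python's standard library has no XML Schema validator. The following is a hand-written sketch of what a validator would check for BookStore.xsd, not a schema processor: it reads the `xsi:schemaLocation` pairs and checks the content model (a BookStore root in http://www.books.org, one or more Book children, each with Title, Author, Date, ISBN, Publisher in sequence). The function `check_bookstore` is hypothetical. Since every leaf is `xsd:string`, no content checks are needed.

```python
import xml.etree.ElementTree as ET

BK = "{http://www.books.org}"
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"
BOOK_FIELDS = [BK + n for n in ("Title", "Author", "Date", "ISBN", "Publisher")]

def check_bookstore(xml_text):
    """Hand-written stand-in for validating against BookStore.xsd."""
    root = ET.fromstring(xml_text)
    # schemaLocation holds pairs: namespace URI, then schema location
    pairs = root.get(XSI + "schemaLocation", "").split()
    locations = dict(zip(pairs[0::2], pairs[1::2]))
    errors = []
    if root.tag != BK + "BookStore":
        errors.append("root must be BookStore in http://www.books.org")
    books = list(root)
    if not books:
        errors.append("at least one Book is required (minOccurs=1)")
    for i, book in enumerate(books, start=1):
        if book.tag != BK + "Book":
            errors.append(f"child {i} is not a Book")
        elif [child.tag for child in book] != BOOK_FIELDS:
            errors.append(f"Book {i}: need Title, Author, Date, ISBN, Publisher")
    return locations, errors

doc = """<BookStore xmlns="http://www.books.org"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book><Title>My Life and Times</Title><Author>Paul McCartney</Author>
  <Date>July, 1998</Date><ISBN>94303-12021-43892</ISBN>
  <Publisher>McMillin Publishing</Publisher></Book>
</BookStore>"""
print(check_bookstore(doc))  # ({'http://www.books.org': 'BookStore.xsd'}, [])
```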
XMLSchema-instance Namespace • http://www.w3.org/2001/XMLSchema-instance contains: schemaLocation, type, noNamespaceSchemaLocation, nil
Referencing a schema in an XML instance document • BookStore.xsd (targetNamespace="http://www.books.org") defines elements in namespace http://www.books.org. • BookStore.xml (schemaLocation="http://www.books.org BookStore.xsd") uses elements from namespace http://www.books.org. • A schema defines a new vocabulary. Instance documents use that new vocabulary.
Note the multiple levels of checking • BookStore.xml is checked against BookStore.xsd: validate that the XML document conforms to the rules described in BookStore.xsd. • BookStore.xsd is checked against XMLSchema.xsd (the schema-for-schemas): validate that BookStore.xsd is a valid schema document, i.e., that it conforms to the rules described in the schema-for-schemas.
XSL transforms XML • XSL may be used to generate HTML, XML, or text. • XML + XSL → XSL processor → HTML (or XML or text)
Doing it using Internet Explorer • First, download the latest version of Internet Explorer (at this time it is 6.0). • Write an XSL stylesheet, stylish.xsl. • Write an XML file, and refer to the XSL stylesheet with a processing instruction: <?xml-stylesheet type="text/xsl" href="stylish.xsl"?> Note: this does not work with other browsers!
XML tree • XSL models an XML document as a tree. • The XSL tree model is similar to the DOM model. • As the processor does its job, it looks at the nodes of the input tree and transforms them into the output tree. • The processor writes the output tree to a file only at the end. • The points in the tree are called “nodes”.
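The tree-in, tree-out model can be imitated in a few lines of Python with `xml.etree.ElementTree`. This is not XSLT and not how an XSL processor is implemented. It only mirrors the steps: visit nodes of the input tree, build a result tree in memory, serialize once at the end. The HTML layout chosen here is hypothetical.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<BookStore><Book><Title>My Life and Times</Title>"
    "<Author>Paul McCartney</Author></Book></BookStore>"
)

# Build the result tree in memory
html = ET.Element("html")
ul = ET.SubElement(ET.SubElement(html, "body"), "ul")
for book in source.iter("Book"):  # visit nodes of the input tree
    li = ET.SubElement(ul, "li")
    li.text = f"{book.findtext('Title')} by {book.findtext('Author')}"

# Only at the end is the result tree serialized
print(ET.tostring(html, encoding="unicode"))
# <html><body><ul><li>My Life and Times by Paul McCartney</li></ul></body></html>
```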
in the general section • we examine how XSL looks at an XML document: in fact, it builds a tree, • and then we look at a very simple way to see what the stylesheet does. After that, Roger shows us the details.
Seven types of nodes • root node: contains all the elements in the document. Not to be confused with the document element of XML. • element node: contains an element • text node: contains an as-large-as-possible stretch of text. • attribute node: contains an attribute name and value • comment node: contains a comment • processing instruction (p-i) node • namespace node: each element node has one namespace node for every namespace in scope on it, whether declared on the element itself or inherited from an ancestor
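Python's `xml.dom.minidom` uses a closely related node model, so you can walk a parsed document and label each node's kind. Keep two differences in mind. The DOM's Document node plays the role of the root node. minidom also exposes no XPath-style namespace nodes, so this sample has no namespace declarations. As in XSL, attributes are not children, so the sketch lists them separately. The processing instruction `render` and the `lang` attribute are hypothetical.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    "<BookStore><!-- a comment --><?render mode='list'?>"
    "<Book lang='en'>Some text</Book></BookStore>"
)

KIND = {
    Node.DOCUMENT_NODE: "root (document)",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def walk(node, depth=0, out=None):
    """List node kinds in document order, indented by depth."""
    out = [] if out is None else out
    out.append("  " * depth + KIND.get(node.nodeType, "other"))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes hang off the element but are not among its children
        for name, value in node.attributes.items():
            out.append("  " * (depth + 1) + f"attribute {name}={value}")
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

print("\n".join(walk(doc)))
```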
properties of nodes: name • This is empty for the root, text and comment nodes. • for element and attribute nodes, it is the name as it appears in the XML file, expanded using the namespace declarations in scope. • for p-i nodes, it is the target • for a namespace node, it is the prefix
properties of nodes: string value • for text nodes: the text • for comment nodes: the text of the comment • for p-i nodes: the data part of the p-i • for an attribute node: the value of the attribute • for root and element nodes: the concatenation of the string values of all text node descendants, in document order • for a namespace node: the URI of the namespace
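For an element, the string value is all descendant text glued together in document order. Python's `xml.etree.ElementTree` can approximate this with `itertext()`, which yields the element's text plus the text and tails of its descendants, but not the element's own tail. The helper name `string_value` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<Book><Title>My Life and Times</Title> by <Author>Paul McCartney</Author></Book>"
)

def string_value(element):
    """Concatenate all descendant text in document order."""
    return "".join(element.itertext())

print(string_value(book))                 # My Life and Times by Paul McCartney
print(string_value(book.find("Author")))  # Paul McCartney
```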
properties of nodes: base URI • for all nodes: the URI of the XML source document in which the node was found • this is only of real interest for element and p-i nodes • for the root node: the URI of the document • for attribute, text and comment nodes: the base URI of the parent node
properties of nodes: children • for element nodes: all the element nodes, text nodes, p-i nodes and comment nodes between its start and end tags. • for root nodes: all the element nodes, text nodes, p-i nodes and comment nodes that are not children of some other node.
parent node • for all nodes except root nodes: the parent of the node. • attribute nodes and namespace nodes have an element node as parent node, but are not considered to be its child.
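Python's `xml.etree.ElementTree` has a coarser model than XSL: elements keep no parent pointers, and text and attributes are stored on the element rather than as separate nodes. This sketch builds a child-to-parent map and shows that an element's children are only its sub-elements, while its attributes sit alongside them. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<Book lang="en"><Title>T</Title><Author>A</Author></Book>')

# No parent pointers in ElementTree, so build a child -> parent map
parent_of = {child: parent for parent in book.iter() for child in parent}

title = book.find("Title")
print(parent_of[title].tag)   # Book
print([c.tag for c in book])  # ['Title', 'Author']: the children
print(book.attrib)            # {'lang': 'en'}: attributes, not children
```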