XML, DTD, XML Schema, and XSLT Jianguo Lu University of Windsor
Where we are • XML • DTD • XML Schema • XML Namespace • XPath • DOM Tree • XSLT
Name Conflict <table> <tr> <td>Apples</td> <td>Bananas</td> </tr></table> <table> <name>African Coffee Table </name> <width>80</width> <length>120</length></table> • Solution: add prefix to the tag names <h:table> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr></h:table> <f:table> <f:name>African Coffee Table </f:name> <f:width>80</f:width> <f:length>120</f:length></f:table>
Namespaces • HTML namespace: table, tr, td, th, html, body • Furniture namespace: table, name, width, length, height, price
XML namespace • An XML document may use more than one schema. • Because each schema was developed independently, name clashes may appear. • The solution is to use a different prefix for each schema • prefix:name <prod:product xmlns:prod="http://example.org/prod"> <prod:number> 557 </prod:number> <prod:size system="US-DRESS"> 10 </prod:size> </prod:product>
Namespace names • Namespace names are URIs. • Many namespace names take the form of an HTTP URI. • The purpose of a namespace name is not to point to a location where a resource resides. • It is intended to provide a unique name that can be associated with a particular organization. • The URI MAY point to a schema. <prod:product xmlns:prod="http://example.org/prod"> <prod:number> 557 </prod:number> <prod:size system="US-DRESS"> 10 </prod:size> </prod:product>
Namespace declaration • A namespace is declared using an attribute whose name starts with “xmlns”. • You can declare multiple namespaces in one instance. <ord:order xmlns:ord="http://example.org/ord" xmlns:prod="http://example.org/prod"> <ord:number> 123ABC123</ord:number> <prod:product> <prod:number> 557 </prod:number> <prod:size system="US-DRESS"> 10 </prod:size> </prod:product> </ord:order>
Default namespace declaration • A default namespace declaration maps unprefixed element names to a namespace. It does not apply to unprefixed attribute names. <order xmlns="http://example.org/ord" xmlns:prod="http://example.org/prod"> <number> 123ABC123 </number> <prod:product> <prod:number> 557 </prod:number> <prod:size system="US-DRESS"> 10 </prod:size> </prod:product> </order>
Scope of namespace declaration • A namespace declaration can appear in any start tag. • Its scope is the element where it is declared, including that element's descendants. <order xmlns="http://example.org/ord"> <number> 123ABC123 </number> <prod:product xmlns:prod="http://example.org/prod"> <prod:number> 557 </prod:number> <prod:size system="US-DRESS"> 10 </prod:size> </prod:product> </order>
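The namespace rules above can be checked with a short script. This is a minimal sketch using Python's standard `xml.etree.ElementTree` module (a choice made here for illustration, not something the slides prescribe). It parses the order document and shows that a parser identifies each element by namespace URI plus local name, so the prefix itself is arbitrary. The constant names `ORD` and `PROD` are made up for this example.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order xmlns="http://example.org/ord">
  <number>123ABC123</number>
  <prod:product xmlns:prod="http://example.org/prod">
    <prod:number>557</prod:number>
    <prod:size system="US-DRESS">10</prod:size>
  </prod:product>
</order>"""

ORD = "http://example.org/ord"    # illustrative constant names
PROD = "http://example.org/prod"

root = ET.fromstring(ORDER_XML)

# ElementTree expands every element name to "{namespace-URI}local-name".
print(root.tag)

# Two elements share the local name "number" but sit in different namespaces.
order_number = root.find(f"{{{ORD}}}number")
product = root.find(f"{{{PROD}}}product")
product_number = product.find(f"{{{PROD}}}number")
print(order_number.text, product_number.text)

# The unprefixed attribute "system" is in no namespace,
# even though a default namespace is in scope.
size = product.find(f"{{{PROD}}}size")
print(size.get("system"))

# Same URI with a different prefix yields the same expanded name.
other = ET.fromstring('<p:product xmlns:p="http://example.org/prod"/>')
print(other.tag == product.tag)
```

The doubled braces in the f-strings are only Python escaping; the name looked up is `{http://example.org/prod}product`.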
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.books.org" xmlns="http://www.books.org" elementFormDefault="qualified"> <xsd:element name="BookStore"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Book"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/> <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:string"/> <xsd:element name="ISBN" type="xsd:string"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:schema> The elements and datatypes that are used to construct schemas - schema - element - complexType - sequence - string come from the http://…/XMLSchema namespace Indicates that the elements defined by this schema - BookStore - Book - Title - Author - Date - ISBN - Publisher are to go in the http://www.books.org namespace From Costello
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.books.org" xmlns="http://www.books.org" elementFormDefault="qualified"> <xsd:element name="BookStore"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Book"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/> <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:string"/> <xsd:element name="ISBN" type="xsd:string"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:schema> The default namespace is http://www.books.org which is the targetNamespace! This is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier it is referencing the Book element in the default namespace, which is the targetNamespace! Thus, this is a reference to the Book element declaration in this schema. From Costello
Import in XML Schema • Now with the understanding of namespace, we can introduce some more advanced features in XML Schema. • The import element allows you to access elements and types in a different namespace. Namespace B Namespace A B.xsd A.xsd <xsd:schema …> <xsd:import namespace="A" schemaLocation="A.xsd"/> <xsd:import namespace="B" schemaLocation="B.xsd"/> … </xsd:schema> C.xsd
Example Pentax.xsd Nikon.xsd Olympus.xsd Camera.xsd From Costello
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.nikon.com" xmlns="http://www.nikon.com" elementFormDefault="qualified"> <xsd:complexType name="body_type"> <xsd:sequence> <xsd:element name="description" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:schema> Nikon.xsd <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.olympus.com" xmlns="http://www.olympus.com" elementFormDefault="qualified"> <xsd:complexType name="lens_type"> <xsd:sequence> <xsd:element name="zoom" type="xsd:string"/> <xsd:element name="f-stop" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:schema> Olympus.xsd <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.pentax.com" xmlns="http://www.pentax.com" elementFormDefault="qualified"> <xsd:complexType name="manual_adapter_type"> <xsd:sequence> <xsd:element name="speed" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:schema> Pentax.xsd From Costello
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.camera.org" xmlns:nikon="http://www.nikon.com" xmlns:olympus="http://www.olympus.com" xmlns:pentax="http://www.pentax.com" elementFormDefault="qualified"> <xsd:import namespace="http://www.nikon.com" schemaLocation="Nikon.xsd"/> <xsd:import namespace="http://www.olympus.com" schemaLocation="Olympus.xsd"/> <xsd:import namespace="http://www.pentax.com" schemaLocation="Pentax.xsd"/> <xsd:element name="camera"> <xsd:complexType> <xsd:sequence> <xsd:element name="body" type="nikon:body_type"/> <xsd:element name="lens" type="olympus:lens_type"/> <xsd:element name="manual_adapter" type="pentax:manual_adapter_type"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> Here I am using the body_type that is defined in the Nikon namespace Camera.xsd From Costello
<?xml version="1.0"?> <c:camera xmlns:c="http://www.camera.org" xmlns:nikon="http://www.nikon.com" xmlns:olympus="http://www.olympus.com" xmlns:pentax="http://www.pentax.com" … …> <c:body> <nikon:description>Ergonomically designed casing for easy handling </nikon:description> </c:body> <c:lens> <olympus:zoom>300mm</olympus:zoom> <olympus:f-stop>1.2</olympus:f-stop> </c:lens> <c:manual_adapter> <pentax:speed>1/10,000 sec to 100 sec</pentax:speed> </c:manual_adapter> </c:camera> <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.olympus.com" xmlns="http://www.olympus.com" elementFormDefault="qualified"> <xsd:complexType name="lens_type"> <xsd:sequence> <xsd:element name="zoom" type="xsd:string"/> <xsd:element name="f-stop" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:schema> <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.nikon.com" xmlns="http://www.nikon.com" elementFormDefault="qualified"> <xsd:complexType name="body_type"> <xsd:sequence> <xsd:element name="description" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:schema> Camera.xml From Costello
Include • The include element allows you to access components in other schemas • All the schemas you include must have the same namespace as your schema (i.e., the schema that is doing the include) • The net effect of include is as though you had typed all the definitions directly into the containing schema LibraryEmployee.xsd LibraryBook.xsd <xsd:schema …> <xsd:include schemaLocation="LibraryBook.xsd"/> <xsd:include schemaLocation="LibraryEmployee.xsd"/> … </xsd:schema> Library.xsd From Costello
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.library.org" xmlns="http://www.library.org" elementFormDefault="qualified"> <xsd:include schemaLocation="LibraryBook.xsd"/> <xsd:include schemaLocation="LibraryEmployee.xsd"/> <xsd:element name="Library"> <xsd:complexType> <xsd:sequence> <xsd:element name="Books"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Book" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Employees"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Employee" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> These are referencing element declarations in other schemas. From Costello Library.xsd
XML Path • XML • DTD • XML Schema • XML Namespace • XPath • DOM Tree • XSLT
XPath • A language for addressing parts of an XML document. • It operates on the tree data model of XML. • XPath uses path expressions to select nodes (elements, attributes, text) in an XML document. • It has a non-XML syntax. • XPath defines a library of standard functions and also supports arithmetic and comparison expressions. • XPath is a major component of XSLT and XML query languages. • XPath is a W3C standard.
What is XPath • Like traditional file paths • XPath uses path expressions to identify nodes in an XML document. These path expressions look very much like the expressions you see when you work with a computer file system: • public_html/569/xml.ppt • Books/book/author/name/FirstName • Absolute path • /library/author/book • Relative path • author/book
XML path example • /library • /library/author • //author • /library/@location • //book[@title="Artificial Intelligence"] <library location="Bremen"> <author name="Henry Wise"> <book title="Artificial Intelligence"/> <book title="Modern Web Services"/> <book title="Theory of Computation"/> </author> <author name="William Smart"> <book title="Artificial Intelligence"/> </author> <author name="Cynthia Singleton"> <book title="The Semantic Web"/> <book title="Browser Technology Revised"/> </author> </library>
XML Path Example • Address all author elements • /library/author • Addresses all author elements that are children of the library element node, which resides immediately below the root • /t1/.../tn, where each ti+1 is a child node of ti, is a path through the tree representation • Address all author elements • //author • Here // says that we should consider all elements in the document and check whether they are of type author • This path expression addresses all author elements anywhere in the document
XPath example • Select the location attribute nodes within library element nodes • /library/@location • The symbol @ is used to denote attribute nodes • Select all title attribute nodes within book elements anywhere in the document, which have the value “Artificial Intelligence” • //book/@title[.="Artificial Intelligence"] • Note: written as //book/@title="Artificial Intelligence", the expression is a comparison and evaluates to a boolean, not to a set of attribute nodes. • Select all books with title “Artificial Intelligence” • /library/author/book[@title="Artificial Intelligence"] • Test within square brackets: a filter expression • It restricts the set of addressed nodes. • Difference from the previous query: • This query addresses book elements whose title satisfies a certain condition. • The previous query collects title attribute nodes of book elements.
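The path expressions above can be tried out in code. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: paths are evaluated relative to an element, and attribute values are read with `get()` rather than selected as nodes. Each XPath expression from the slides is shown in a comment next to its closest ElementTree equivalent.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

root = ET.fromstring(LIBRARY_XML)  # root is the <library> element

# /library/author : author elements that are children of library
authors_children = root.findall("author")

# //author : author elements anywhere below the root
authors_anywhere = root.findall(".//author")

# /library/@location : ElementTree returns the attribute value, not a node
location = root.get("location")

# //book[@title="Artificial Intelligence"] : book elements, filtered by attribute
ai_books = root.findall(".//book[@title='Artificial Intelligence']")

# //book/@title[.="Artificial Intelligence"] : the title values themselves
ai_titles = [b.get("title") for b in root.iter("book")
             if b.get("title") == "Artificial Intelligence"]

print(len(authors_children), len(authors_anywhere), location)
print([b.tag for b in ai_books], ai_titles)
```

The last two queries make the slide's distinction concrete: one returns book elements, the other returns title values.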
XPath syntax • A path expression consists of a series of steps, separated by slashes • A step consists of • An axis specifier, • A node test, and • An optional predicate • An axis specifier determines the tree relationship between the nodes to be addressed and the context node • E.g. parent, ancestor, child (the default), following-sibling, preceding-sibling, attribute • // is an abbreviation built on such an axis: it stands for /descendant-or-self::node()/ • child::book selects all book elements that are children of the current node • A node test specifies which nodes to address • The most common node tests are element names • /library/author • E.g., * addresses all element nodes • /library/* • comment() selects all comment nodes • /library/comment()
XPath syntax • Predicates (or filter expressions) are optional and are used to refine the set of addressed nodes • E.g., the expression [1] selects the first node • [position()=last()] selects the last node • [position() mod 2 =0] selects the even nodes • XPath has a more complicated full syntax. • We have only presented the abbreviated syntax
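More examples • Address the first author element node in the XML document • //author[1] • Strictly, //author[1] selects every author that is the first author child of its parent, while (//author)[1] selects the first author in document order; in the library example both address the same node. • Address the last book element within the first author element node in the document • //author[1]/book[last()] • Address all book element nodes without a title attribute • //book[not(@title)]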
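Positional predicates can be exercised the same way. This sketch again uses Python's `xml.etree.ElementTree`, whose XPath subset supports `[1]` and `[last()]` but has no `not()` function, so the "no title attribute" query is written as a plain Python filter instead. The untitled `<book/>` at the end is not in the slides; it is added here only so that the last query returns something.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
    <book/>  <!-- hypothetical untitled book, added for illustration -->
  </author>
</library>"""

root = ET.fromstring(LIBRARY_XML)

# First author element (all authors share one parent in this document).
first_author = root.find("author[1]")

# Last book within the first author.
last_book = root.find("author[1]/book[last()]")

# Books without a title attribute: ElementTree has no not(),
# so filter in Python.
untitled = [b for b in root.iter("book") if "title" not in b.attrib]

print(first_author.get("name"))
print(last_book.get("title"))
print(len(untitled))
```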
Where we are • XML • DTD • XML Schema • XML Namespace • XPath • DOM Tree • XSLT
How to process XML • XML does not DO anything • Process XML using general purpose languages • Java, Perl, C++ … • DOM is the basis • Process XML using special purpose languages • “translate the stock XML file to an HTML table.” • Transform the XML: XSLT • “tell me the stocks that are higher than 100.” • Query XML: XQuery
DOM (Document Object Model) • What: DOM is an application programming interface (API) for processing XML documents • http://www.w3c.org/DOM/ • Why: • A uniform interface. • Platform and language independence. • How: It defines the logical structure of documents and the way to access and manipulate it • With the Document Object Model, one can • Create an object tree • Navigate its structure • Access, add, modify, or delete elements etc
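The three capabilities listed above (create, navigate, modify or delete) map onto a handful of DOM calls. Here is a minimal sketch using Python's standard `xml.dom.minidom`, one implementation of the W3C DOM interfaces, chosen here only because it needs no extra install. The helper `add_stock` and the stock data are made up for this example.

```python
from xml.dom.minidom import getDOMImplementation

# Create an object tree: a document whose root element is <stocks>.
impl = getDOMImplementation()
doc = impl.createDocument(None, "stocks", None)
stocks = doc.documentElement


def add_stock(exchange, name, price):
    """Hypothetical helper: append one <stock> element with name and price."""
    stock = doc.createElement("stock")
    stock.setAttribute("exchange", exchange)
    for tag, value in (("name", name), ("price", price)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(value))
        stock.appendChild(child)
    stocks.appendChild(stock)
    return stock


amazon = add_stock("nasdaq", "amazon corp", "16")
ibm = add_stock("nyse", "IBM inc", "102")

# Navigate the structure through parent and sibling links.
print(amazon.nextSibling is ibm, ibm.parentNode is stocks)

# Modify: change the text inside IBM's <price>.
price = ibm.getElementsByTagName("price")[0]
price.firstChild.data = "105"

# Delete: remove the first stock from the tree.
stocks.removeChild(amazon)

print(stocks.toxml())
```

Because this tree is built in code, it contains no whitespace text nodes; a tree parsed from an indented file usually does.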
XML tree hierarchy • XML can be described by a tree hierarchy Document Unit Document Sub-unit Parent Unit Child Sub-unit Sibling
DOM tree model • Generic tree model • Node • Type, name, value • Attributes • Parent node • Previous, next sibling nodes • First, last child nodes • Many other interfaces extend Node • Document • Element • Attr • ... ... (Figure: a Node linked to its Parent, Prev. Sibling, Next Sibling, First Child, and Last Child.)
DOM class hierarchy • Node is the base interface of Document, DocumentFragment, DocumentType, Element, Attr, CharacterData, Entity, EntityReference, Notation, and ProcessingInstruction • CharacterData is extended by Text and Comment • Text is extended by CDATASection • NodeList and NamedNodeMap are collection interfaces and do not extend Node
JavaDoc of DOM API http://xml.apache.org/xerces-j/apiDocs/index.html
Remarks on javadoc • javadoc is a command included in the JDK; • It is a useful tool for generating HTML descriptions of your programs, so that you can use a browser to look at the description of the classes; • JavaDoc describes classes, their relationships, methods, attributes, and comments. • When you write Java programs, the JavaDoc is the first place that you should look: • For core Java, there is JavaDoc describing every class in the language; • To learn how to use DOM, look at the JavaDoc of the org.w3c.dom package. • If you are a serious Java programmer: • You should have the core JDK JavaDoc ready on your hard disk; • You should generate JavaDoc for other people to look at. • To run javadoc, type D>javadoc *.java This generates JavaDoc for all the classes in the current directory.
Methods in Node interface • Three categories of methods • Node characteristics • name, type, value • Contextual location and access to relatives • parents, siblings, children, ancestors, descendants • Node modification • Edit, delete, re-arrange child nodes
XML parser and DOM • When you parse an XML document with a DOM parser, you get back a tree structure that contains all of the elements of your document; • DOM also provides a variety of functions you can use to examine the contents and structure of the document. DOM XML parser Your XML application DOM API DOM Tree
DOM tree and DOM classes (Figure: the stocks document drawn as a DOM tree. The elements stocks, stock exchange="nasdaq", stock exchange="nyse", name, symbol, and price are Element nodes; their character data, such as amzn, 15.45, IBM, 105, and Amazon inc, is held in child Text nodes.)
Use Java to process XML • Tasks: • How to construct the DOM tree from an XML text file? • How to get the list of stock elements? • How to get the attribute value of the second stock element? • Construct the Document object: • Need to use an XML parser (XML4J); • Remember to import the necessary packages; • The benefit of DOM: the lines that create the parser are the only ones that differ if you use another DOM XML parser.
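The three tasks above look much the same in any DOM binding. The slides use Java with XML4J; as a stand-in, the sketch below uses Python's standard `xml.dom.minidom`, which exposes the same DOM method names (`getElementsByTagName`, `getAttribute`). Only the parsing line is specific to this parser, which is the portability point the slide makes.

```python
from xml.dom.minidom import parseString

STOCKS_XML = """<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>"""

# Task 1: construct the DOM tree (the only parser-specific line).
doc = parseString(STOCKS_XML)

# Task 2: get the list of stock elements.
stock_list = doc.getElementsByTagName("stock")

# Task 3: get the attribute value of the second stock element.
second_exchange = stock_list[1].getAttribute("exchange")

print(stock_list.length, second_exchange)
```

To read from a file instead of a string, `xml.dom.minidom.parse(path)` returns the same kind of Document object.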
Get the first stock element <?xml version="1.0" ?> <stocks> <stock exchange="nasdaq"> <name>amazon corp</name> <symbol>amzn</symbol> <price>16</price> </stock> <stock exchange="nyse"> <name>IBM inc</name> <price>102</price> </stock> </stocks>
Navigate to the next sibling of the first stock element <?xml version="1.0" ?> <stocks> <stock exchange="nasdaq"> <name>amazon corp</name> <symbol>amzn</symbol> <price>16</price> </stock> <stock exchange="nyse"> <name>IBM inc</name> <price>102</price> </stock> </stocks>
Be aware of the Text objects between two elements <?xml version="1.0" ?> <stocks> <stock exchange="nasdaq"> <name>amazon corp</name> <symbol>amzn</symbol> <price>16</price> </stock> <stock exchange="nyse"> <name>IBM inc</name> <price>102</price> </stock> </stocks> Question: How many children does the stocks node have? (Figure: the DOM tree of this document, with a Text node drawn for each run of whitespace between tags as well as for the character data amzn, 16, IBM inc, 102, and Amazon inc.)
Remarks on XML parsers • There are several different ways to categorise parsers: • Validating versus non-validating parsers; • It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD; • If you only want to find tags and extract information, use a non-validating parser; • Many parsers let you turn validation on or off. • Parsers that support the Document Object Model (DOM); • Parsers that support the Simple API for XML (SAX); • Parsers written in a particular language (Java, C++, Perl, etc.).
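The question about the children of the stocks node can be answered by experiment. This is a minimal sketch with Python's `xml.dom.minidom`, assuming the document is laid out with one tag per line as below; the count depends on that whitespace. The helper `next_element_sibling` is a name invented for this example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

STOCKS_XML = """<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>"""

doc = parseString(STOCKS_XML)
stocks = doc.documentElement

# The first *child* of <stocks> is whitespace text, not a stock element,
# so fetch the first stock element by name instead.
first_stock = stocks.getElementsByTagName("stock")[0]


def next_element_sibling(node):
    """Hypothetical helper: skip Text nodes until the next Element sibling."""
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling


second_stock = next_element_sibling(first_stock)

# Classify every direct child of <stocks>.
child_kinds = ["element" if c.nodeType == Node.ELEMENT_NODE else "text"
               for c in stocks.childNodes]

print(stocks.firstChild.nodeType == Node.TEXT_NODE)
print(second_stock.getAttribute("exchange"))
print(len(stocks.childNodes), child_kinds)
```

With this layout the stocks node has five children: two stock elements and three whitespace-only Text nodes around them. That is why `first_stock.nextSibling` is a Text node and not the second stock.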
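To contrast the DOM and SAX styles mentioned above, here is a minimal sketch of a SAX handler using Python's standard `xml.sax` module. Where a DOM parser builds a tree, a SAX parser calls back into the handler as each tag and text run is read. The class name `StockHandler` and its fields are made up for this example, and the parser's default non-validating mode is used.

```python
import xml.sax

STOCKS_XML = """<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>"""


class StockHandler(xml.sax.ContentHandler):
    """Collects exchange attributes and company names from SAX events."""

    def __init__(self):
        super().__init__()
        self.exchanges = []
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "stock":
            self.exchanges.append(attrs.getValue("exchange"))
        elif name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buffer).strip())
            self._in_name = False


handler = StockHandler()
xml.sax.parseString(STOCKS_XML.encode("utf-8"), handler)
print(handler.exchanges, handler.names)
```

No tree is kept in memory, which suits large documents. In exchange, the handler must track its own context, here through the `_in_name` flag.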
Where we are • XML • DTD • XML Schema • XML Namespace • XPath • DOM Tree • XSLT
History XSL (low-precision graphics, e.g.,HTML, text, XML) XQuery (high-precision graphics, e.g., PDF) XLink/ XPointer XSL XSLT XMLSchemas XPath
XSLT (Extensible Stylesheet Language Transformations) • XSLT Version 1.0 is a W3C Recommendation, 1999 • http://www.w3.org/Style/XSL/ • XSLT is used to transform XML to other formats (Figure: one XML source is transformed by XSLT 1 into XML, by XSLT 2 into HTML, and by XSLT 3 into text.)
XSLT basics • An XSLT stylesheet is itself an XML document • It is a tree transformation language • It is a rule-based declarative language • An XSLT program consists of a sequence of rules. • It is a functional programming language. (Figure: an XSLT processor takes an XML document and an XSLT stylesheet as input.)
XSLT Example: transform to another XML • Rename the elements • Remove the attribute and the symbol element • Change the order of name and price. • Convert the price from US dollars to CAD. stock.xml: <?xml version="1.0" ?> <stocks> <stock exchange="nasdaq"> <name>amazon corp </name> <symbol>amzn</symbol> <price>16</price> </stock> <stock exchange="nyse"> <name>IBM inc</name> <price>102</price> </stock> </stocks> output: <?xml version="1.0"?> <companies> <company> <value>24 CAD </value> <name>amazon corp</name> </company> <company> <value>153 CAD </value> <name>IBM inc</name> </company> </companies> The transformation between the two is marked “?” on the slide.
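To make the four requirements concrete before any stylesheet is written, the sketch below does the same transformation in plain Python with `xml.etree.ElementTree`. It is not XSLT and is only meant to pin down what the stylesheet has to produce. The conversion rate of 1.5 is an assumption inferred from the sample output (16 becomes 24, 102 becomes 153), and the function name `transform` is made up here.

```python
import xml.etree.ElementTree as ET

STOCK_XML = """<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp </name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>"""

USD_TO_CAD = 1.5  # assumed rate, inferred from the sample output


def transform(stocks_xml):
    """Build the <companies> tree described on the slide."""
    stocks = ET.fromstring(stocks_xml)
    companies = ET.Element("companies")            # stocks -> companies
    for stock in stocks.findall("stock"):
        company = ET.SubElement(companies, "company")  # stock -> company
        # price comes first and is renamed to value, converted to CAD
        value = ET.SubElement(company, "value")
        cad = float(stock.findtext("price")) * USD_TO_CAD
        value.text = f"{cad:g} CAD"
        # name follows; exchange attribute and symbol are not copied
        name = ET.SubElement(company, "name")
        name.text = stock.findtext("name").strip()
    return companies


result = transform(STOCK_XML)
print(ET.tostring(result, encoding="unicode"))
```

An XSLT stylesheet would express the same steps as template rules matched against stocks and stock, instead of an explicit loop.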