eXtensible Markup Language Jesús Ibáñez, Toni Navarrete, Josep Blat Universitat Pompeu Fabra
eXtensible Markup Language
• A new markup metalanguage for the Internet
• Predecessors: SGML, HTML, DHTML
• Provides extensibility, structure and validation
• An adaptation of SGML for the Web
eXtensible Markup Language
• Standardized by the W3C (Generic SGML Editorial Review Board, XML Working Group)
• XML != HTML++ ; XML == SGML--
• Related standards: XML, DTD (Document Type Definition) and XSL (Extensible Stylesheet Language)
Main Characteristics
• Semantic description of document content
• Decoupling of the semantic description from its presentation
• Each user community can define its own tags, for instance: <PRICE>, <AUTHOR>, <SECTION>, <DATE>, <IMPORTANCE LEVEL="Expert">
XML Example (without DTD)
    <?xml version="1.0" standalone="yes"?>
    <conversation>
      <greeting>Hello world!</greeting>
      <answer>Stop it, I'm getting off!</answer>
    </conversation>
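The simplest check a program can make is well-formedness: any XML parser rejects a document that breaks the syntax rules, with or without a DTD. The sketch below assumes Java with the JDK's built-in JAXP DOM parser; the names `WellFormedCheck` and `isWellFormed` are hypothetical. It installs an error handler so that syntax errors come back through the return value instead of being printed to the console.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    /** Returns true if the text parses as a well-formed XML document (no DTD validation). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax (well-formedness) error was found
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<?xml version='1.0' standalone='yes'?>"
                + "<conversation><greeting>Hello world!</greeting>"
                + "<answer>Stop it, I'm getting off!</answer></conversation>";
        String unclosed = "<conversation><greeting>Hello world!</conversation>";
        String upperCaseDecl = "<?XML version='1.0'?><conversation/>";
        System.out.println(isWellFormed(ok));            // true
        System.out.println(isWellFormed(unclosed));      // false: <greeting> is never closed
        System.out.println(isWellFormed(upperCaseDecl)); // false: the declaration must be lowercase "xml"
    }
}
```

The last case fails because names starting with "xml" in any letter case are reserved, so `<?XML ...?>` is read as an illegal processing instruction rather than as an XML declaration.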
Example with DTD (1)
    <!DOCTYPE Book [
      <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
      <!ELEMENT Title (#PCDATA)>
      <!ELEMENT Author (#PCDATA)>
      <!ELEMENT Date (#PCDATA)>
      <!ELEMENT ISBN (#PCDATA)>
      <!ELEMENT Publisher (#PCDATA)>
    ]>
Example with DTD (2)
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE Book SYSTEM "file://localhost/xml-course/xsl/Book.dtd">
    <Book>
      <Title>My Life and Times</Title>
      <Author>Paul McCartney</Author>
      <Date>July, 1998</Date>
      <ISBN>94303-12021-43892</ISBN>
      <Publisher>McMillan Publishing</Publisher>
    </Book>
DTDs
• Allow the creation of new sets of tags
• Element declarations, for example:
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Discography (Disk)+>      (1 or more)
    <!ELEMENT Bookshop (Book)*>         (0 or more)
• Other operators: ? (0 or 1), "," (sequence), | (choice)
• Attribute declarations:
    <!ATTLIST ARTICLE DATE CDATA #IMPLIED>               (CDATA means character data)
    <!ATTLIST PERSON GENDER (male | female) #IMPLIED>    (optional)
    <!ATTLIST PERSON GENDER (male | female) #REQUIRED>   (required)
    <!ATTLIST PERSON GENDER (male | female) "male">      (default value "male")
DTDs
    <!DOCTYPE Discography [
      <!ELEMENT Discography (Disk)*>
      <!ELEMENT Disk (Title, Group, Song*)>
      <!ELEMENT Title (#PCDATA)>
      <!ELEMENT Group (#PCDATA)>
      <!ELEMENT Song (titleS, Duration)>
      <!ELEMENT titleS (#PCDATA)>
      <!ELEMENT Duration (#PCDATA)>
    ]>
DTDs
    <Discography>
      <Disk>
        <Title>Brothers in Arms</Title>
        <Group>Dire Straits</Group>
        <Song>
          <titleS>Money for nothing</titleS>
          <Duration>5:20</Duration>
        </Song>
        <Song>
          <titleS>So far away</titleS>
          <Duration>4:10</Duration>
        </Song>
        ...
      </Disk>
      <Disk>
        <Title>On every street</Title>
        <Group>Dire Straits</Group>
        <Song> ... </Song>
      </Disk>
    </Discography>
DTDs
    <!DOCTYPE publications [
      <!ELEMENT publications (disk | book)*>
      <!ELEMENT book ... >
      <!ELEMENT disk ... >
    ]>
DTDs
    <publications>
      <disk>
        <titledisk>Brothers in Arms</titledisk>
        <group>Dire Straits</group>
        <song>
          <titleS>Money for nothing</titleS>
          <duration>5:20</duration>
        </song>
        ...
      </disk>
      <book>
        <titlebook>Cien años de soledad</titlebook>
        <writer>Gabriel García Márquez</writer>
        ...
      </book>
      <book>
        <titlebook>La ciudad de los prodigios</titlebook>
        <writer>Eduardo Mendoza</writer>
        ...
      </book>
    </publications>
DTDs
    <?xml version="1.0"?>
    <!DOCTYPE file [
      <!ELEMENT file (name+, surname+, address+, picture?)>
      <!ELEMENT name (#PCDATA)>
      <!ATTLIST name sex (male|female) #IMPLIED>
      <!ELEMENT surname (#PCDATA)>
      <!ELEMENT address (#PCDATA)>
      <!ELEMENT picture EMPTY>
    ]>
    <file>
      <name sex="male">Toni</name>
      <surname>Navarrete</surname>
      <surname>Terrasa</surname>
      <address>Rambla 32</address>
    </file>
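Well formed vs valid
• Well formed: the document follows the XML syntax rules (a single root element, properly nested and closed tags, quoted attribute values, ...)
• Valid: the document is well formed and its content conforms to the rules of the associated DTD
• Validation therefore ensures the completeness and structure of the XML data and the legality of its attribute values
• An XML document without a DTD can be well formed but, of course, cannot be valid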
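The distinction can be observed with a validating parser. This is a minimal sketch, assuming Java and the JDK's JAXP DOM parser with `setValidating(true)`, which checks a document against the DTD named in its DOCTYPE; `DtdCheck` and its three-valued `Result` are hypothetical names, and the DTD is the `file` example above. The parser reports validity problems as recoverable errors and syntax problems as fatal errors, which is what lets the sketch tell the two cases apart.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    enum Result { NOT_WELL_FORMED, WELL_FORMED_ONLY, VALID }

    /** The internal DTD of the "file" example. */
    static final String FILE_DTD = "<!DOCTYPE file ["
            + "<!ELEMENT file (name+, surname+, address+, picture?)>"
            + "<!ELEMENT name (#PCDATA)>"
            + "<!ATTLIST name sex (male|female) #IMPLIED>"
            + "<!ELEMENT surname (#PCDATA)>"
            + "<!ELEMENT address (#PCDATA)>"
            + "<!ELEMENT picture EMPTY>"
            + "]>";

    static Result check(String xml) {
        final boolean[] invalid = {false};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors are recoverable: record them and keep parsing.
                public void error(SAXParseException e) { invalid[0] = true; }
                // Well-formedness errors are fatal: stop parsing.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return Result.NOT_WELL_FORMED;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return invalid[0] ? Result.WELL_FORMED_ONLY : Result.VALID;
    }

    public static void main(String[] args) {
        String body = "<file><name sex='male'>Toni</name><surname>Navarrete</surname>"
                + "<surname>Terrasa</surname><address>Rambla 32</address></file>";
        System.out.println(check(FILE_DTD + body)); // VALID
        System.out.println(check(body));            // WELL_FORMED_ONLY: there is no DTD
        // Well formed, but the DTD requires at least one <address>.
        System.out.println(check(FILE_DTD
                + "<file><name>Toni</name><surname>Navarrete</surname></file>")); // WELL_FORMED_ONLY
        System.out.println(check(FILE_DTD + "<file><name>Toni</file>"));         // NOT_WELL_FORMED
    }
}
```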
XML Schemata
• XML Schemata define the structure of XML documents (like DTDs)
• BUT they are written in XML syntax. Advantages: the same parser can validate them, and tools can create them dynamically
• Use of namespaces
• Richer data types (41 built-in types instead of 10, plus user-defined types)
• Object orientation: new types can be derived by extending or restricting existing ones
• Validation: a document against its schema, and a schema against the schema for schemas
Schema definition
• A schema is an XML document whose root element is "schema"; elements and attributes are defined inside it:
    <?xml version="1.0"?>
    <schema>
      ... element and attribute definitions ...
    </schema>
• Element definition:
    <element name="name of the element" type="type of the element" [options...] />
Simple types of elements
• string: character string
• boolean (false, 0, true, 1)
• float (32-bit floating point)
• double (64-bit floating point)
• decimal (arbitrary-precision decimal numbers; integer is derived from it)
• timeDuration
• recurringDuration (several subtypes)
• binary
• uriReference (Uniform Resource Identifier reference)
• And types derived from these basic ones
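A quick way to explore these types is to validate single values against them. The sketch assumes Java and the JDK's `javax.xml.validation` API; `SimpleTypeCheck` and `accepts` are hypothetical names. The JDK implements the final XML Schema 1.0 Recommendation (namespace `http://www.w3.org/2001/XMLSchema`) rather than the 2000 draft used in these slides, and in the final version some of the draft type names listed above were renamed (for example, `timeDuration` became `duration` and `uriReference` became `anyURI`).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SimpleTypeCheck {
    /** True if value is a legal value of the built-in type, e.g. accepts("xs:boolean", "1"). */
    static boolean accepts(String type, String value) {
        // A one-element schema: <v> must contain a value of the given type.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='v' type='" + type + "'/>"
                + "</xs:schema>";
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("not a usable type: " + type, e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader("<v>" + value + "</v>")));
            return true;
        } catch (SAXException e) {
            return false; // the value is not a legal value of the type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("xs:boolean", "1"));      // true
        System.out.println(accepts("xs:boolean", "yes"));    // false
        System.out.println(accepts("xs:decimal", "3.25"));   // true
        System.out.println(accepts("xs:integer", "3.25"));   // false: integer allows no fraction digits
        System.out.println(accepts("xs:duration", "P1Y2M")); // true (final name of timeDuration)
    }
}
```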
Example XML document before the schema definition
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookshop>
      <book isbn="84-111-1111-1">
        <title>El Quijote</title>
        <author>Miguel de Cervantes</author>
        <publisher>Plaza y Janés</publisher>
        <character>Don Quijote</character>
        <character>Sancho Panza</character>
        <character>Dulcinea</character>
        <character>Rocinante</character>
      </book>
      <book isbn="84-222-2222-2">
        <title>La ciudad de los prodigios</title>
        <author>Eduardo Mendoza</author>
        <publisher>Seix-Barral</publisher>
        <character>Onofre Bouvila</character>
        <character>Efren Castells</character>
      </book>
      <book isbn="84-333-3333-3">
        <title>Cien años de soledad</title>
        <author>Gabriel García Márquez</author>
        <publisher>Planeta</publisher>
        <character>Aureliano Buendía</character>
      </book>
    </bookshop>
Building blocks: simple elements and cardinality
• Simple elements:
    <element name="title" type="string" />
    <element name="author" type="string" />
    <element name="publisher" type="string" />
    <element name="character" type="string" minOccurs="0" maxOccurs="unbounded" />
• In a DTD, the equivalent would be: <!ELEMENT title (#PCDATA)>
• The minOccurs and maxOccurs attributes replace the DTD cardinality symbols ?, *, +
Building blocks: complex types
• The element book is composite, so we define it as a complex type:
    <element name="book">
      <complexType>
        <sequence>
          <element name="title" type="string" />
          <element name="author" type="string" />
          <element name="publisher" type="string" />
          <element name="character" type="string" minOccurs="0" maxOccurs="unbounded" />
        </sequence>
      </complexType>
    </element>
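To check that documents follow this content model, the complex type can be compiled into a validator. The sketch assumes Java's built-in `javax.xml.validation` API, which requires the final XML Schema namespace (`http://www.w3.org/2001/XMLSchema`, bound here to the conventional prefix `xs`) instead of the draft syntax on the slide; `BookTypeCheck` is a hypothetical name. It shows that `<sequence>` fixes the order of the children, while `minOccurs`/`maxOccurs` control how often each may appear.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class BookTypeCheck {
    /** The book complex type, in final XML Schema 1.0 syntax. */
    static final String BOOK_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='book'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='author' type='xs:string'/>"
        + "<xs:element name='publisher' type='xs:string'/>"
        + "<xs:element name='character' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(BOOK_XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document does not match the content model
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String head = "<title>La ciudad de los prodigios</title><author>Eduardo Mendoza</author>"
                + "<publisher>Seix-Barral</publisher>";
        System.out.println(isValid("<book>" + head + "</book>")); // true: character is optional
        System.out.println(isValid("<book>" + head
                + "<character>Onofre Bouvila</character><character>Efren Castells</character></book>")); // true
        // The sequence fixes the order: author may not come before title.
        System.out.println(isValid("<book><author>Eduardo Mendoza</author>"
                + "<title>La ciudad de los prodigios</title><publisher>Seix-Barral</publisher></book>")); // false
    }
}
```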
Alternative: naming complex types
• We could also define a named complex type and refer to it:
    <element name="book" type="Booktype" />
    <complexType name="Booktype">
      <sequence>
        <element name="title" type="string" />
        <element name="author" type="string" />
        <element name="publisher" type="string" />
        <element name="character" type="string" minOccurs="0" maxOccurs="unbounded" />
      </sequence>
    </complexType>
Remark: combining both forms is not allowed (an element cannot have both a type attribute and an inline complexType, and an inline complexType cannot have a name):
    <element name="book" type="Booktype">
      <complexType name="Booktype">
        <sequence>
          <element name="title" type="string" />
          <element name="author" type="string" />
          <element name="publisher" type="string" />
          <element name="character" type="string" minOccurs="0" maxOccurs="unbounded" />
        </sequence>
      </complexType>
    </element>
Building blocks: empty elements
• Elements such as the HTML tags <hr> or <img ...> are empty:
    <hr />
    <img src="image.gif" />
• An empty element has to be declared with an anonymous (inline) complex type whose content is empty:
    <element name="hr">
      <complexType content="empty" />
    </element>
    <element name="img">
      <complexType content="empty">
        <attribute name="src" type="string" />
      </complexType>
    </element>
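In the final XML Schema 1.0 Recommendation the draft's `content="empty"` attribute no longer exists: an empty element is declared with a complex type that has no content model, optionally with attribute declarations. The sketch below assumes that final syntax and the JDK's `javax.xml.validation` API; `EmptyElementCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class EmptyElementCheck {
    /** hr and img as empty elements, in final XML Schema 1.0 syntax. */
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='hr'><xs:complexType/></xs:element>"   // no content, no attributes
        + "<xs:element name='img'><xs:complexType>"
        + "<xs:attribute name='src' type='xs:string'/>"            // no content, one attribute
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<hr/>"));                  // true
        System.out.println(isValid("<hr>text</hr>"));          // false: hr must be empty
        System.out.println(isValid("<img src='image.gif'/>")); // true
        System.out.println(isValid("<img alt='x'/>"));         // false: alt is not declared
    }
}
```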
A level upwards ...
• Let us define "bookshop":
    <element name="bookshop">
      <complexType>
        <sequence>
          <element name="book" minOccurs="0" maxOccurs="unbounded">
            <complexType> ... </complexType>
          </element>
        </sequence>
      </complexType>
    </element>
• A schema definition is a BOTTOM-UP process
Attribute definition
• Elements can have attributes associated with them
• In a DTD we would write:
    <!ATTLIST book isbn CDATA #REQUIRED>
• In XML Schema:
    <attribute name="name of the attribute" type="type of the attribute" [options of the attribute ...] />
Attribute definition
• Attributes are declared at the end of the element's complex type, after its content model:
    <element name="book" minOccurs="0" maxOccurs="unbounded">
      <complexType>
        <sequence>
          <element name="title" type="string" />
          <element name="author" type="string" />
          <element name="publisher" type="string" />
          <element name="character" type="string" minOccurs="0" maxOccurs="unbounded" />
        </sequence>
        <attribute name="isbn" type="string" />
      </complexType>
    </element>
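One difference from the DTD version is worth checking: an XML Schema attribute is optional unless it is declared with `use="required"`, so the declaration above does not by itself match the DTD's `#REQUIRED`. The sketch assumes the final XML Schema 1.0 syntax and Java's `javax.xml.validation` API; `RequiredAttributeCheck` is a hypothetical name. As on the slide, the attribute declaration follows the content model inside the complex type.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RequiredAttributeCheck {
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='bookshop'><xs:complexType><xs:sequence>"
        + "<xs:element name='book' minOccurs='0' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='author' type='xs:string'/>"
        + "<xs:element name='publisher' type='xs:string'/>"
        + "<xs:element name='character' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='isbn' type='xs:string' use='required'/>" // after the content model
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String book = "<title>La ciudad de los prodigios</title><author>Eduardo Mendoza</author>"
                + "<publisher>Seix-Barral</publisher>";
        System.out.println(isValid("<bookshop/>"));                                           // true
        System.out.println(isValid("<bookshop><book isbn='84-222-2222-2'>" + book + "</book></bookshop>")); // true
        System.out.println(isValid("<bookshop><book>" + book + "</book></bookshop>"));        // false: no isbn
    }
}
```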
General ordering
• For readability, definitions are usually ordered as follows:
  1) Simple type definitions
  2) Attribute definitions
  3) Complex type definitions
Referencing the schema
• We then add the schema reference to the XML document. Assuming the document is book.xml and the schema for bookshop is book.xsd, we would write:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookshop xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="book.xsd">
      ...
    </bookshop>
Namespaces
• An XML namespace is a collection of names (of elements and attributes) identified by a URI
• Namespaces are a very flexible tool: they promote reusing schemata and names, and mixing them in one document
• For instance, we can use elements and attributes from two namespaces:
    <BOOKS>
      <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
               xmlns:money="urn:Finance:Money">
        <bk:TITLE>A Suitable Boy</bk:TITLE>
        <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
      </bk:BOOK>
    </BOOKS>
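What identifies a namespaced name is its URI, not its prefix. A namespace-aware parser exposes this directly; the sketch assumes Java's JAXP DOM parser (namespace awareness is off by default and must be switched on) and the standard DOM methods `getElementsByTagNameNS` and `getAttributeNS`. `NamespaceDemo` is a hypothetical name; the documents are the example above and a copy of it that uses different prefixes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceDemo {
    static final String BOOK_NS = "urn:BookLovers.org:BookInfo";
    static final String MONEY_NS = "urn:Finance:Money";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Text of the first TITLE element in the book namespace, whatever prefix it uses. */
    static String title(String xml) {
        return parse(xml).getElementsByTagNameNS(BOOK_NS, "TITLE").item(0).getTextContent();
    }

    /** The currency attribute (money namespace) of the first PRICE element (book namespace). */
    static String currency(String xml) {
        Element price = (Element) parse(xml).getElementsByTagNameNS(BOOK_NS, "PRICE").item(0);
        return price.getAttributeNS(MONEY_NS, "currency");
    }

    public static void main(String[] args) {
        String doc = "<BOOKS><bk:BOOK xmlns:bk='urn:BookLovers.org:BookInfo' xmlns:money='urn:Finance:Money'>"
                + "<bk:TITLE>A Suitable Boy</bk:TITLE>"
                + "<bk:PRICE money:currency='US Dollar'>22.95</bk:PRICE></bk:BOOK></BOOKS>";
        // Same namespaces, different prefixes.
        String renamed = "<BOOKS><b:BOOK xmlns:b='urn:BookLovers.org:BookInfo' xmlns:m='urn:Finance:Money'>"
                + "<b:TITLE>A Suitable Boy</b:TITLE>"
                + "<b:PRICE m:currency='US Dollar'>22.95</b:PRICE></b:BOOK></BOOKS>";
        System.out.println(title(doc) + " / " + currency(doc));       // A Suitable Boy / US Dollar
        System.out.println(currency(renamed).equals(currency(doc))); // true
    }
}
```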
Namespaces
• http://www.w3.org/2000/10/XMLSchema
  The namespace for schemata. The prefix xsd is commonly used; when no prefix is used, it is declared as the default namespace
• http://www.w3.org/2000/10/XMLSchema-instance
  The namespace for documents instantiated from a schema. The prefix xsi is usually used
Example
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema"                  (1)
            targetNamespace="http://www.upf.es/namespaces/Book"         (2)
            elementFormDefault="qualified"                              (3)
            xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/2000/10/XMLSchema
                                http://www.w3.org/2000/10/XMLSchema.xsd"
            xmlns:bk="http://www.upf.es/namespaces/Book">
(1) Declares the default namespace, which is XMLSchema
(2) The elements and attributes defined in this schema belong to the namespace http://www.upf.es/namespaces/Book
(3) All elements declared in this schema, including local ones, must be namespace-qualified in instance documents (with a prefix or through a default namespace); with "unqualified", only the globally declared elements would be qualified
Example (2)
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
            targetNamespace="http://www.upf.es/namespaces/Book"
            elementFormDefault="qualified"
            xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"    (4)
            xsi:schemaLocation=                                         (5)
                "http://www.w3.org/2000/10/XMLSchema                    (6)
                 http://www.w3.org/2000/10/XMLSchema.xsd"               (7)
            xmlns:bk="http://www.upf.es/namespaces/Book">
(4) Declares the schema-instance namespace (xsi), so that this schema document can itself be treated as an instance of the general Schema for Schemas
(5) schemaLocation is an attribute from the xsi namespace; its value is a list of namespace/location pairs
(6) The namespace of the general Schema for Schemas
(7) The URI where this Schema for Schemas can be found
Example (3)
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
            targetNamespace="http://www.upf.es/namespaces/Book"
            elementFormDefault="qualified"
            xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/2000/10/XMLSchema
                                http://www.w3.org/2000/10/XMLSchema.xsd"
            xmlns:bk="http://www.upf.es/namespaces/Book">               (8)
(8) We give a prefix to the target namespace so that the components defined in this schema can be referenced within it, for instance:
    <element ref="bk:Title" minOccurs="1" maxOccurs="1"/>
Example (and 4)
• In the instantiated document:
    <bookshop xmlns="http://www.upf.es/namespaces/Book"                          (1)
              xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"           (2)
              xsi:schemaLocation="http://www.upf.es/namespaces/Book book.xsd">   (3)
(1) We declare the default namespace of the document
(2) We include the namespace where schema instantiation is defined (xsi)
(3) schemaLocation takes pairs of values: a namespace (http://www.upf.es/namespaces/Book) and the location of the schema for it (book.xsd)
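The effect of `targetNamespace` and `elementFormDefault="qualified"` can be seen by validating a few instances. This is a sketch, assuming Java's `javax.xml.validation` API and the final XML Schema 1.0 namespace; the schema is a hypothetical reduction of the book schema to a global `Title` (referenced through the `bk` prefix, as in note 8) and a local `author`, and `TargetNamespaceCheck` is a hypothetical name. Here the schema is handed to the validator directly, so the instances need no `xsi:schemaLocation` hint.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TargetNamespaceCheck {
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.upf.es/namespaces/Book'"
        + " xmlns:bk='http://www.upf.es/namespaces/Book'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='Title' type='xs:string'/>"
        + "<xs:element name='bookshop'><xs:complexType><xs:sequence>"
        + "<xs:element ref='bk:Title'/>"                 // global element, referenced via bk
        + "<xs:element name='author' type='xs:string'/>" // local element, qualified by elementFormDefault
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://www.upf.es/namespaces/Book";
        // Default namespace: every element, including the local author, is in the target namespace.
        System.out.println(isValid("<bookshop xmlns='" + ns + "'>"
                + "<Title>El Quijote</Title><author>Miguel de Cervantes</author></bookshop>")); // true
        // Unprefixed author is in no namespace, but "qualified" requires the target namespace.
        System.out.println(isValid("<bk:bookshop xmlns:bk='" + ns + "'>"
                + "<bk:Title>El Quijote</bk:Title><author>Miguel de Cervantes</author></bk:bookshop>")); // false
        // No namespace at all: bookshop is not declared in any schema the validator knows.
        System.out.println(isValid("<bookshop><Title>El Quijote</Title>"
                + "<author>Miguel de Cervantes</author></bookshop>")); // false
    }
}
```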
Other important concepts
• ID and IDREF(S)
• DOM (Document Object Model)
• XPath
• XPointer
• XLink
ID and IDREFS
• An attribute declared with type ID in the DTD identifies an element uniquely within the document (a role similar to that of a URI). Example assigning the identity "attack":
    <paragraph id="attack">Suddenly the skies were filled with aircraft</paragraph>
• An IDREF attribute is the easiest way of referring to an ID (an IDREFS attribute holds a whitespace-separated list of them). Example: a DTD declares the employee attribute "empnumber" as an ID and "boss" as an IDREF; here Hank's ID is emp126 and his boss is emp124 (defined earlier):
    <employee empnumber="emp126" boss="emp124">Hank</employee>
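A validating parser enforces the ID/IDREF rules, and DOM code can then follow the references. The sketch assumes Java's JAXP DOM parser with DTD validation turned on, a hypothetical `staff` DTD built around the example above, and a hypothetical boss named Anna; `IdRefDemo` is also a hypothetical name. `getElementById` only finds attributes declared with type ID. ID values must be XML names, which cannot start with a digit; that is why the identifiers are written emp126 rather than 126.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class IdRefDemo {
    /** Hypothetical DTD: empnumber is an ID, boss is an IDREF to another employee. */
    static final String DTD = "<!DOCTYPE staff ["
            + "<!ELEMENT staff (employee*)>"
            + "<!ELEMENT employee (#PCDATA)>"
            + "<!ATTLIST employee empnumber ID #REQUIRED boss IDREF #IMPLIED>"
            + "]>";

    /** Parses and validates against the DTD; returns null if the document is not valid. */
    static Document parseValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Treat every validity error (bad ID, dangling IDREF, ...) as fatal.
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Follows the boss IDREF of the employee with the given ID (assumes the employee has a boss). */
    static String bossOf(Document doc, String empnumber) {
        Element employee = doc.getElementById(empnumber);
        Element boss = doc.getElementById(employee.getAttribute("boss"));
        return boss.getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parseValid(DTD + "<staff>"
                + "<employee empnumber='emp124'>Anna</employee>"
                + "<employee empnumber='emp126' boss='emp124'>Hank</employee>"
                + "</staff>");
        System.out.println(bossOf(doc, "emp126")); // Anna
        // A reference to an ID that does not exist makes the document invalid.
        System.out.println(parseValid(DTD + "<staff>"
                + "<employee empnumber='emp126' boss='emp999'>Hank</employee></staff>") == null); // true
    }
}
```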
DOM (Document Object Model)
• DOM is a technology for accessing and manipulating the parts of an XML document
• DOM models a document as a tree of nodes (elements, attributes, text, ...)
• Each node is an object with properties and methods for navigating and modifying the tree
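The tree view can be explored with the DOM API: every node exposes its type, its name or value, and links to its first child and next sibling. A minimal sketch, assuming Java's built-in JAXP DOM parser; `DomTreeDump` is a hypothetical name, and only element and text nodes are printed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTreeDump {
    /** Parses xml and renders its element and text nodes as an indented outline. */
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            dump(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static void dump(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append("Element ").append(node.getNodeName()).append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) { // skip whitespace-only text between tags
                out.append(indent).append("Text ").append(text).append('\n');
            }
            return; // text nodes have no children
        } else {
            return; // other node types (comments, PIs, ...) are ignored in this sketch
        }
        // Visit the children through the firstChild / nextSibling links of the tree.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            dump(child, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render("<BookCatalogue><Book><Title>My Life and Times</Title>"
                + "<Author>Paul McCartney</Author></Book></BookCatalogue>"));
        // Element BookCatalogue
        //   Element Book
        //     Element Title
        //       Text My Life and Times
        //     Element Author
        //       Text Paul McCartney
    }
}
```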
XPath
• XPath is a language for referencing parts of an XML document
• It is used, for instance, to select the parts of a document that XSL transforms
• XPath views the document as a tree of nodes (much like DOM) and uses paths (similar to URLs) to reference parts of it
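XPath expressions can be evaluated directly from Java. The sketch assumes the JDK's `javax.xml.xpath` API (XPath 1.0) and a shortened version of the book catalogue used in the XSL example; `XPathDemo` is a hypothetical name. It shows a location path with a predicate, a function call, and a numeric comparison.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathDemo {
    static final String CATALOGUE = "<BookCatalogue>"
            + "<Book><Title>My Life and Times</Title><Author>Paul McCartney</Author>"
            + "<Date>July, 1998</Date></Book>"
            + "<Book><Title>Illusions The Adventures of a Reluctant Messiah</Title>"
            + "<Author>Richard Bach</Author><Date>1977</Date></Book>"
            + "<Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author>"
            + "<Date>1954</Date></Book>"
            + "</BookCatalogue>";

    /** Evaluates an XPath 1.0 expression against xml and returns the result as a string. */
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Path from the root, with a predicate that filters Book elements by a child's value.
        System.out.println(evaluate(CATALOGUE, "/BookCatalogue/Book[Author='Richard Bach']/Title"));
        // prints: Illusions The Adventures of a Reluctant Messiah
        System.out.println(evaluate(CATALOGUE, "count(//Book)")); // 3
        // "July, 1998" is not a number, so only the 1954 book passes the comparison.
        System.out.println(evaluate(CATALOGUE, "//Book[Date < 1960]/Author")); // J. Krishnamurti
    }
}
```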
XPointer
• XPointer is a language for pointing at parts of an XML document
• XPointer uses XPath expressions for pointing
• XPointer enables linking to specific parts of a document (as fragment identifiers in URIs)
Linking using XML: XLink
• XLink is a language for describing links between resources in XML
• Links are described with attributes from the XLink namespace, http://www.w3.org/1999/xlink
• The attributes describe end points, traversal, behavior and resources
Tools
• XML browsers (viewers)
• XML editors
• XML parsers
• XML servers
• Relational DB to XML converters
• XSL editors
• XSL processors
XSL
• XSL adds presentation to an XML document, generating HTML, PDF, e-mail, SMS messages, ...
• It builds on CSS and DSSSL (the SGML style language)
XSL
    <?xml version="1.0"?>
    <!DOCTYPE BookCatalogue SYSTEM "file://localhost/xml-course/xsl/BookCatalogue.dtd">
    <BookCatalogue>
      <Book>
        <Title>My Life and Times</Title>
        <Author>Paul McCartney</Author>
        <Date>July, 1998</Date>
        <ISBN>94303-12021-43892</ISBN>
        <Publisher>McMillin Publishing</Publisher>
      </Book>
      <Book>
        <Title>Illusions The Adventures of a Reluctant Messiah</Title>
        <Author>Richard Bach</Author>
        <Date>1977</Date>
        <ISBN>0-440-34319-4</ISBN>
        <Publisher>Dell Publishing Co.</Publisher>
      </Book>
      <Book>
        <Title>The First and Last Freedom</Title>
        <Author>J. Krishnamurti</Author>
        <Date>1954</Date>
        <ISBN>0-06-064831-7</ISBN>
        <Publisher>Harper &amp; Row</Publisher>
      </Book>
    </BookCatalogue>
XSL
Document tree of the catalogue:
    Document
      PI <?xml version="1.0"?>
      DocumentType <!DOCTYPE BookCatalogue ...>
      Element BookCatalogue
        Element Book
          Element Title
            Text My Life ...
          Element Author
            Text Paul McCartney
          Element Date
            Text July, 1998
          Element ISBN
            Text 94303-12021-43892
          Element Publisher
            Text McMillin Publishing
        Element Book ...
        Element Book ...
XSL
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/"><xsl:apply-templates/></xsl:template>
      <xsl:template match="BookCatalogue"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Book"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Title"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Author"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Date"><xsl:apply-templates/></xsl:template>
      <xsl:template match="ISBN"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Publisher"><xsl:apply-templates/></xsl:template>
      <xsl:template match="text()"><xsl:value-of select="."/></xsl:template>
    </xsl:stylesheet>
XSL: BookCatalogue.xsl (the HTML elements in the "/" template were added to the previous stylesheet)
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/">
        <HTML>
          <HEAD>
            <TITLE>Book Catalogue</TITLE>
          </HEAD>
          <BODY>
            <xsl:apply-templates/>
          </BODY>
        </HTML>
      </xsl:template>
      <xsl:template match="BookCatalogue"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Book"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Title"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Author"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Date"><xsl:apply-templates/></xsl:template>
      <xsl:template match="ISBN"><xsl:apply-templates/></xsl:template>
      <xsl:template match="Publisher"><xsl:apply-templates/></xsl:template>
      <xsl:template match="text()"><xsl:value-of select="."/></xsl:template>
    </xsl:stylesheet>
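A stylesheet like BookCatalogue.xsl can be run with the JDK's built-in XSLT 1.0 processor through `javax.xml.transform`. In this sketch (`XsltDemo` is a hypothetical name) the element templates that only call `<xsl:apply-templates/>` are left out, because XSLT's built-in template rules already do the same, so the output should not change. Since the result's root element is `HTML`, the processor defaults to the HTML output method, whose exact layout (line breaks, an added META tag) may vary, so no expected output is shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    /** BookCatalogue.xsl without the templates that duplicate XSLT's built-in rules. */
    static final String STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:template match='/'>"
        + "<HTML><HEAD><TITLE>Book Catalogue</TITLE></HEAD>"
        + "<BODY><xsl:apply-templates/></BODY></HTML>"
        + "</xsl:template>"
        + "<xsl:template match='text()'><xsl:value-of select='.'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML document and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String catalogue = "<BookCatalogue><Book><Title>My Life and Times</Title>"
                + "<Author>Paul McCartney</Author></Book></BookCatalogue>";
        System.out.println(transform(catalogue));
    }
}
```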
XML-based formats
• XML is an architecture, not an application; many formats are built on it:
  • SMIL (Synchronized Multimedia Integration Language)
  • RDF (Resource Description Framework) for metadata
  • CDF (Channel Definition Format), for Microsoft's channels
  • MathML (Mathematical Markup Language)
  • CML (Chemical Markup Language)
  • BSML (Bioinformatic Sequence Markup Language)
  • JML
  • WIDL (B2B integration)
Processing
• Two approaches to processing XML documents with Java:
  • DOM (Document Object Model)
    • Tree structure (nodes, elements and text); the most widely used
  • SAX (Serial Access with the Simple API for XML)
    • Event based
    • Faster and needs less memory, but more difficult to program
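The SAX style can be seen in a handler that collects book titles in a single pass, keeping only the state it needs instead of building a tree. The sketch assumes the JDK's `javax.xml.parsers.SAXParserFactory` and `org.xml.sax.helpers.DefaultHandler`; `SaxTitles` is a hypothetical name. The parser may deliver the text of one element through several `characters` calls, so the handler accumulates it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitles {
    /** Collects the text of every <Title> element in one streaming pass. */
    static List<String> titles(String xml) {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <Title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("Title")) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length); // may arrive in chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("Title")) {
                    titles.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(titles("<BookCatalogue>"
                + "<Book><Title>My Life and Times</Title><Author>Paul McCartney</Author></Book>"
                + "<Book><Title>The First and Last Freedom</Title></Book>"
                + "</BookCatalogue>")); // [My Life and Times, The First and Last Freedom]
    }
}
```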
Some references
• http://www.w3.org/
  • Official W3C website with all the standards
• http://www.xml.com/
  • O'Reilly's XML site, with a lot of good documentation and resources
• http://www.xfront.com/
  • Very good tutorials on XSL and XML Schema
• http://xml.apache.org
  • Apache parsers and documentation (Xerces, Xalan, ...)
• B. McLaughlin, Java and XML. O'Reilly, 2000
  • Useful on combining the two with the Apache parsers