An Introduction to XML
eXtensible Markup Language version 1.0 Recommendation, February 1998
Patrice Bonhomme & Laurent Romary
Lucid-IT, LORIA
bonhomme@lucid-it.com, romary@loria.fr

Objectives
• Understanding the basic concepts of XML
  • Elements, attributes and content
  • DTDs (and Schemas)
  • Namespaces
• An overview of the main associated recommendations:
  • XML path language (XPath)
  • XML pointers and links (XPointer and XLink)
  • The transformation language of XSL (eXtensible Stylesheet Language)

XML in the document chain
[Figure: the document chain. Stages: Design, Editing, Transformation, Consultation. Labels: DTD/Schema, Structures, XML, Data, XSL/XSLT, Data processing, HTML, XHTML, User perspective.]

A quick historical overview
• 1986
  • SGML (Standard Generalized Markup Language)
  • ISO standard: ISO 8879:1986
• 1987
  • TEI (Text Encoding Initiative)
• 1990
  • HTML 1.0 (HyperText Markup Language)
• 1997/1998
  • XML 1.0 (eXtensible Markup Language)

What XML is:
• XML: eXtensible Markup Language
• A W3C (World Wide Web Consortium) Recommendation
• A meta-language: it lets you define your own markup languages
• A simplification of the SGML standard
  • SGML was intended to represent the “logical” structure of a document
  • HTML was conceived as an application of SGML

A simplified SGML
• An XML document is an SGML document
  • With some slight (but essential) differences...
• XML has the expressive power of SGML without its complexity
• Opens the door to the transmission of structured documents on the web
• Databases also entered the game...

What can we do with it?
• Data modeling (as a complement to UML, for instance)
• Publication of structured data on the web
• Separation of the logical structure of a document from its actual presentation
• Distributed applications (cf. well-formed vs. valid documents)
• Integrating data from heterogeneous sources

Why can’t we avoid it?
• Simplicity, which makes it easy to integrate into any kind of application
  • XML specification = 36 pages
  • SGML standard, ISO 8879 = 250 pages
• Wide variety of applications already implemented
  • Industry: publishing, databases, cataloguing, e-business, etc.
  • Science and research: genomics, astronomy, maths, etc.
• Consequence:
  • A lot of software is available: editors, parsers, bridges from and to existing editing environments or DBMSs

From HTML to XML - 1
• A simple HTML document:
    <B> Patrice Bonhomme </B>
    <P> Patrice.Bonhomme@loria.fr <BR>
    tél : 03 83 59 30 52 <BR>
    fax : 03 83 41 30 79 <BR>
    équipe : Langue et Dialogue (<I>LORIA</I>)<BR>

From HTML to XML - 2
• The XML way:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
    <!-- A member of LORIA -->
    <MEMBRE TYPE="IE" ID="M28">
      <NOM> BONHOMME </NOM>
      <PRENOM> Patrice </PRENOM>
      <MEL> Patrice.Bonhomme@loria.fr </MEL>
      <TEL> 03 83 59 30 52 </TEL>
      <FAX> 03 83 41 30 79 </FAX>
      <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
    </MEMBRE>

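The same record can be built programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `build_membre` is ours, and the DOCTYPE line is left out because the DTD location is elided on the slide.

```python
import xml.etree.ElementTree as ET

def build_membre():
    """Build the MEMBRE record from the slide as an element tree."""
    membre = ET.Element("MEMBRE", {"TYPE": "IE", "ID": "M28"})
    for tag, text in [
        ("NOM", "BONHOMME"),
        ("PRENOM", "Patrice"),
        ("MEL", "Patrice.Bonhomme@loria.fr"),
        ("TEL", "03 83 59 30 52"),
        ("FAX", "03 83 41 30 79"),
    ]:
        # Each piece of contact data gets its own named element.
        ET.SubElement(membre, tag).text = text
    equipe = ET.SubElement(membre, "EQUIPE", {"LAB": "LORIA"})
    equipe.text = "Langue et Dialogue"
    return membre

membre = build_membre()
xml_text = ET.tostring(membre, encoding="unicode")
print(xml_text)
```

Unlike the HTML version, every value is reachable by name (`membre.findtext("MEL")`) rather than by its position between `<BR>` tags.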
Some properties of XML
• Emphasis should be put on the “semantics” of a document
• Underlying model: tree structure
• A path-like scripting language can address any part of an XML document, e.g.:
    DB/MEMBRE[28]/MEL/text()
• XML supports Unicode character encodings

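To make the tree model concrete, here is a small sketch in Python (standard library only). The two-level sample document and the helper name `outline` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <NOM>BONHOMME</NOM>
    <MEL>Patrice.Bonhomme@loria.fr</MEL>
  </MEMBRE>
</DB>"""

def outline(elem, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each element has exactly one parent, which is what makes path expressions such as the one on the slide possible.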
Elements and their content
    <MEMBRE TYPE="IE" ID="M28">
      <LOGIN ID="bonhomme"/>
      <NOM>BONHOMME</NOM>
      <PRENOM>Patrice</PRENOM>
      <MEL> Patrice.Bonhomme@loria.fr </MEL>
      <TEL>03 83 59 30 52</TEL>
      <FAX>03 83 41 30 79</FAX>
      <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
    </MEMBRE>
• Opening tag: <MEMBRE TYPE="IE" ID="M28">
• Empty element: <LOGIN ID="bonhomme"/>
• Element: e.g. <NOM>BONHOMME</NOM>
• Textual content: e.g. BONHOMME
• Closing tag: </MEMBRE>

Elements and their attributes
    <MEMBRE TYPE="IE" ID="M28">
      <LOGIN ID="bonhomme"/>
      <NOM> BONHOMME </NOM>
      <PRENOM> Patrice </PRENOM>
      <MEL> Patrice.Bonhomme@loria.fr </MEL>
      <TEL> 03 83 59 30 52 </TEL>
      <FAX> 03 83 41 30 79 </FAX>
      <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
    </MEMBRE>
• Attribute name: e.g. TYPE, ID
• Attribute value: e.g. "IE", "M28"

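The vocabulary of the last two slides (element, attribute, textual content, empty element) maps directly onto what a parser exposes. A minimal sketch with Python's standard library follows; the helper name `describe` and the shortened record are ours.

```python
import xml.etree.ElementTree as ET

membre = ET.fromstring(
    '<MEMBRE TYPE="IE" ID="M28">'
    '<LOGIN ID="bonhomme"/>'
    '<NOM>BONHOMME</NOM>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>'
    '</MEMBRE>'
)

def describe(elem):
    """Summarise an element: name, attributes, text, and whether it is empty."""
    text = (elem.text or "").strip()
    return {
        "name": elem.tag,
        "attributes": dict(elem.attrib),
        "text": text,
        # Empty here means: no child elements and no textual content.
        "empty": len(elem) == 0 and text == "",
    }

summaries = [describe(child) for child in membre]
for summary in summaries:
    print(summary)
```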
Other features
• XML declaration
    <?xml version="1.0"?>
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
• Comments
    <!-- this is a comment -->
• CDATA section
    <![CDATA[Langue & Dialogue]]>
• Processing instruction (application specific)
    <?edit line="wrap"?>

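A CDATA section is only a way of writing text without escaping markup characters; the parser hands the application the same string either way. A short sketch with Python's standard library (the wrapping `EQUIPE` element and the helper `text_of` are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

def text_of(xml_source):
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_source).text

from_cdata = text_of("<EQUIPE><![CDATA[Langue & Dialogue]]></EQUIPE>")
from_entity = text_of("<EQUIPE>Langue &amp; Dialogue</EQUIPE>")
print(from_cdata == from_entity)

# A bare ampersand outside a CDATA section is not well-formed.
try:
    text_of("<EQUIPE>Langue & Dialogue</EQUIPE>")
    raw_ampersand_ok = True
except ET.ParseError:
    raw_ampersand_ok = False
print(raw_ampersand_ok)
```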
From one document to a class…
• How do I know the structure of my document?
• How may I share this structure with others?

Document Type Definition
• Expresses constraints on:
  • Allowed element and attribute names
  • Possible content of a given element (“content model”)
  • To which elements a given attribute can be attached
• Similar to the traditional SGML approach, but:
  • Simplified syntax
  • The DTD is optional for a document

Example
    <!ELEMENT MEMBRE (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)>
    <!ELEMENT LOGIN EMPTY>
    <!ATTLIST LOGIN ID ID #REQUIRED>
    <!ELEMENT NOM (#PCDATA)>
    ...
    <!ENTITY W3C "World Wide Web Consortium">
    <!ENTITY chap1 SYSTEM "http://…/chapitre-1.xml">
    <!ENTITY img2 SYSTEM "image2.gif" NDATA gif>
    ...

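A content model reads much like a regular expression over child element names: `,` is sequence, `?` optional, `+` one or more, `*` zero or more. The sketch below hand-translates the MEMBRE model into a Python regular expression to illustrate that reading. It is not a DTD validator (Python's standard parser does not validate), it ignores attributes and text, and the names `MEMBRE_MODEL` and `matches_content_model` are ours.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from: (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)
# Each child name is followed by one space so names cannot run together.
MEMBRE_MODEL = re.compile(
    r"(LOGIN )(NOM )?(PRENOM )?(MEL )(TEL )+(FAX )*(EQUIPE )"
)

def matches_content_model(membre):
    """Check the sequence of child element names against MEMBRE_MODEL."""
    child_names = "".join(child.tag + " " for child in membre)
    return MEMBRE_MODEL.fullmatch(child_names) is not None

ok = ET.fromstring(
    "<MEMBRE><LOGIN/><MEL/><TEL/><TEL/><EQUIPE/></MEMBRE>"
)
missing_tel = ET.fromstring(
    "<MEMBRE><LOGIN/><NOM/><MEL/><EQUIPE/></MEMBRE>"
)
print(matches_content_model(ok), matches_content_model(missing_tel))
```

The first record is accepted because NOM, PRENOM and FAX are optional; the second is rejected because `TEL+` requires at least one TEL.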
Using a DTD
• External DTD, referenced from the document:
    <!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
    <MEMBRE TYPE="IE" ID="M28"> … </MEMBRE>
• Internal DTD subset, declared inside the document:
    <!DOCTYPE MEMBRE [
      <!ELEMENT MEMBRE … >
      …
    ]>
    <MEMBRE TYPE="IE" ID="M28"> … </MEMBRE>

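The internal-subset form can be tried with Python's standard parser, which reads the declarations (and expands internal entities such as `W3C` from the DTD example) but does not validate. In this sketch the one-element content model `(NOM)` is a deliberately simplified assumption, chosen so that the document is invalid yet still parses.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE MEMBRE [
  <!ELEMENT MEMBRE (NOM)>
  <!ELEMENT NOM (#PCDATA)>
  <!ENTITY W3C "World Wide Web Consortium">
]>
<MEMBRE TYPE="IE" ID="M28"><NOM>&W3C;</NOM><TEL>03 83 59 30 52</TEL></MEMBRE>"""

membre = ET.fromstring(DOC)

# The entity declared in the internal subset is expanded by the parser.
expanded = membre.findtext("NOM")
print(expanded)

# TEL is not allowed by the declared content model, but a non-validating
# parser accepts it: the document is well-formed, not valid.
has_tel = membre.find("TEL") is not None
print(has_tel)
```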
Valid vs. Well-formed
• Well-formed documents
  • Syntactic bracketing is preserved, without a DTD
  • Empty element: <toto></toto> = <toto/>
• Valid documents
  • With a DTD (à la SGML)
• Essential difference with SGML
  • Extracting and re-using document fragments
  • One usually produces valid documents and distributes well-formed ones

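Well-formedness is exactly what a non-validating parser checks. A minimal sketch with Python's standard library; the helper name `is_well_formed` and the badly nested sample are ours.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, i.e. tags nest and close properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<toto></toto>"))           # properly closed
print(is_well_formed("<toto/>"))                 # empty-element shorthand
print(is_well_formed("<B>text<P>more</B></P>"))  # overlapping tags

# Both spellings of the empty element yield the same tree.
same = ET.tostring(ET.fromstring("<toto></toto>")) == ET.tostring(ET.fromstring("<toto/>"))
print(same)
```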
XML namespaces
• Objectives: avoid conflicts between element and attribute names coming from various sources
  • Composite documents
  • XSLT instructions, Schema declarations
• Declaration:
    <DOC xmlns:mml="http://www.w3.org/Math/MathML/"
         xmlns="http://www.ua99.net/DOC/1.0">
      <P>blah blah : <mml:fn mml:definitionURL="mydef.xml"> … </mml:fn>re blah blah</P>
    </DOC>

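For a namespace-aware parser the prefix is only shorthand: what identifies an element is the pair (namespace URI, local name). The sketch below parses the slide's example with Python's standard library, which writes that pair as `{uri}local`. The prefix `doc` in the lookup table is our own choice and need not match the document.

```python
import xml.etree.ElementTree as ET

DOC = """<DOC xmlns:mml="http://www.w3.org/Math/MathML/"
     xmlns="http://www.ua99.net/DOC/1.0">
  <P>blah blah : <mml:fn mml:definitionURL="mydef.xml"> … </mml:fn>re blah blah</P>
</DOC>"""

root = ET.fromstring(DOC)

# Prefixes used in the search path; only the URIs have to match the document.
ns = {
    "doc": "http://www.ua99.net/DOC/1.0",
    "mml": "http://www.w3.org/Math/MathML/",
}
fn = root.find("doc:P/mml:fn", ns)

print(root.tag)  # element in the default namespace
print(fn.tag)    # element in the mml namespace
print(fn.get("{http://www.w3.org/Math/MathML/}definitionURL"))
```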
Reserved namespaces
• The xml: prefix is reserved by the W3C for specific attributes:
    <title xml:space="default">...</title>
    <p xml:lang="FR">…</p>

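Because the `xml:` prefix is predeclared, it needs no `xmlns:xml` declaration; it is bound to the URI `http://www.w3.org/XML/1998/namespace`. A short sketch with Python's standard library:

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# No xmlns:xml declaration is needed: the prefix is built in.
p = ET.fromstring('<p xml:lang="FR">…</p>')
lang = p.get("{%s}lang" % XML_NS)
print(lang)
```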
XPath
• XML Path Language 1.0 (REC 29012000)
• General-purpose syntax for addressing sub-parts of an XML document
• Joint specification used by XML Pointers (XPointer recommendation) and the XSLT transformation language
• Allows one to access, select and filter XML fragments (cf. tree representation of an XML document)

Addressing nodes in XPath
• Absolute addressing
  • Given: a URL
  • id(M28), root()
• Relative addressing along axes
  • Given: a node
  • ancestor, child
  • descendant
  • preceding-sibling, following-sibling (written psibling, fsibling in early XPointer drafts)

An XML document represents a hierarchical structure
• The only view you should ever, ever have of an XML document

XPath - Examples
    <DB>
      <MEMBRE TYPE="IE" ID="M28">
        <LOGIN ID="bonhomme"/>
        ...
        <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
      </MEMBRE>
      <MEMBRE TYPE="CR" ID="M14">
        <LOGIN ID="romary"/>
        ...
      </MEMBRE>
    </DB>
• / or /DB
• /DB/MEMBRE
• /DB/MEMBRE[2]
• /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
• /DB/MEMBRE/LOGIN[@ID='romary']/../@ID

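These paths can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: paths are relative to the element they are called on, and `text()` and `@ID` steps are replaced by the `.text` attribute and the `.get()` method. The sketch below is therefore an approximation of the slide's expressions, not a full XPath engine.

```python
import xml.etree.ElementTree as ET

DB = ET.fromstring("""<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
  <MEMBRE TYPE="CR" ID="M14">
    <LOGIN ID="romary"/>
  </MEMBRE>
</DB>""")

# /DB/MEMBRE : all MEMBRE children of the root element
all_members = DB.findall("MEMBRE")

# /DB/MEMBRE[2] : positions are counted from 1
second = DB.find("MEMBRE[2]")

# /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
team = DB.find("MEMBRE[@ID='M28']/EQUIPE[1]").text

# /DB/MEMBRE/LOGIN[@ID='romary']/../@ID : step back up to the parent
owner_id = DB.find("MEMBRE/LOGIN[@ID='romary']/..").get("ID")

print(len(all_members), second.get("ID"), team, owner_id)
```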
XPointer
• In HTML, anchors are needed in the target document:
    <A NAME="toto">
    http://www.titi.fr/index.html#toto
• In XML, pointers can directly address a document component:
    http://…/doc.xml#xptr(id(M28))
    http://…/doc.xml#xptr(/DB/MEMBRE[28]/MEL)
• Advantage: no need to modify the target document (notion of primary source)

XLink
• In HTML: the elements which may carry links are known: <A>, <IMG>, ...
• In XML: any element may carry a simple or complex link
• This is done by using pre-defined attributes:
    <a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>

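Since the link is carried by attributes rather than by a fixed element name, an application finds links by looking for those attributes on any element. A minimal sketch with Python's standard library; the wrapping `doc` element, the `note` element and the `xmlns:xlink` declaration (bound to the XLink namespace URI `http://www.w3.org/1999/xlink`) are additions needed to make the fragment parse.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

DOC = ET.fromstring(
    '<doc xmlns:xlink="%s">'
    '<a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>'
    '<note>no link here</note>'
    '</doc>' % XLINK
)

# Collect (element name, target) for every element carrying a simple link,
# whatever the element is called.
links = [
    (elem.tag, elem.get("{%s}href" % XLINK))
    for elem in DOC.iter()
    if elem.get("{%s}type" % XLINK) == "simple"
]
print(links)
```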
Visualizing XML documents
• Basically, an XML document does not provide any information about its presentation
• Visualizing a document may depend on the target audience, device, etc.
• Stylesheets:
  • Cascading Style Sheets (CSS 1 and 2)
  • Extensible Stylesheet Language (XSL) >> XSLT

XSL: eXtensible Stylesheet Language
• Describes the way a document will be displayed, printed or spoken…
[Figure: XML + XSL]

XSL: a two-fold proposal
• XSL = Transformations + Visualizing properties
• XSLT: transformation of XML documents
  • Allows one to transform an XML document into another XML document
  • Use this to produce well-formed (!) HTML documents
• XSL FO: formatting XML data
  • FO = Formatting Objects
  • Is supposed to be application independent (Word/RTF, PS, PDF, MIF, …)
  • Not a recommendation yet :-(

General structure of an XSL document
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      …
      <xsl:template match="/"> … </xsl:template>
      <xsl:template match="NOM"> … </xsl:template>
    </xsl:stylesheet>

Declarative approach
• Sequence of rules (templates) specifying:
  • The pattern (XPath) of nodes to which the rule can be applied
  • Actions to be undertaken:
    • Elements to be generated in the target document
    • Selection of the elements to be further explored in the source document
• Additional functionalities: testing, sorting, etc.

A simple rule
    <xsl:template match='/DB/MEMBRE/NOM'>
      <B>
        <xsl:apply-templates/>
      </B>
    </xsl:template>
• Pattern (XPath): /DB/MEMBRE/NOM
• HTML element to be produced: <B>
• The content of <B> will be the one produced by the <xsl:apply-templates/> instruction

Creating an HTML core document
    <xsl:template match="/">
      <HTML>
        <HEAD>
          <TITLE>My directory</TITLE>
        </HEAD>
        <BODY><xsl:apply-templates/></BODY>
      </HTML>
    </xsl:template>

Selecting the nodes to be explored
    <xsl:template match="MEMBRE">
      <P>
        <xsl:apply-templates select="NOM"/>
        <xsl:text> - </xsl:text>
        <xsl:apply-templates select="EQUIPE"/>
      </P>
    </xsl:template>

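To show how such rules drive a transformation, the three templates above can be mimicked in plain Python. This is a toy model of the rule-matching idea, not an XSLT processor: rules match on element name only, output is built as an unescaped string, text following a child element is ignored, and all names (`RULES`, `template`, `apply_templates`, `apply_rule`) are ours.

```python
import xml.etree.ElementTree as ET

RULES = {}

def template(match):
    """Register a rule for elements whose tag equals `match`."""
    def register(func):
        RULES[match] = func
        return func
    return register

def apply_templates(elem, select=None):
    """Apply the matching rule to each selected child, in document order."""
    children = elem.findall(select) if select else list(elem)
    return "".join(apply_rule(child) for child in children)

def apply_rule(elem):
    rule = RULES.get(elem.tag)
    if rule is None:
        # Fallback loosely modelled on XSLT's built-in rules:
        # copy the text and keep exploring the children.
        return (elem.text or "").strip() + apply_templates(elem)
    return rule(elem)

@template("DB")
def db(elem):
    return "<HTML><BODY>" + apply_templates(elem) + "</BODY></HTML>"

@template("MEMBRE")
def membre(elem):
    return ("<P>" + apply_templates(elem, "NOM") + " - "
            + apply_templates(elem, "EQUIPE") + "</P>")

@template("NOM")
def nom(elem):
    return "<B>" + (elem.text or "").strip() + "</B>"

source = ET.fromstring(
    '<DB><MEMBRE TYPE="IE" ID="M28"><NOM>BONHOMME</NOM>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE></MEMBRE></DB>'
)
html = apply_rule(source)
print(html)
```

The rules never call each other by name; each one only says which nodes to explore next, and the dispatcher picks the rule that matches.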
Conclusion
• XML - a practical format (protocol)
• Next steps:
  • Sharing DTDs, resources and tools
  • Generic mechanisms for handling families of documents (cf. Nancy’s presentation)

References
• www.oasis-open.org/cover/
• www.w3.org/XML/
• www.w3.org/TR
• www.w3.org/TR/REC-xml
• babel.alis.com/web_ml/xml/REC-xml.fr.html
• www.xml.com
• www.xmlinfo.com
• xml.apache.org