
An Introduction to XML

eXtensible Markup Language version 1.0 Recommendation, February 1998. An Introduction to XML. Patrice Bonhomme & Laurent Romary, Lucid-IT / LORIA (bonhomme@lucid-it.com, romary@loria.fr). Objectives: understanding the basic concepts of XML: elements, attributes and content; DTDs (and Schemas).


Presentation Transcript


  1. An Introduction to XML • eXtensible Markup Language version 1.0 Recommendation, February 1998 • Patrice Bonhomme & Laurent Romary • Lucid-IT / LORIA • bonhomme@lucid-it.com, romary@loria.fr

  2. Objectives • Understanding the basic concepts of XML • Elements, attributes and content • DTDs (and Schemas) • Namespaces • An overview of the main associated recommendations: • XML Path Language (XPath) • XML pointers and links (XPointer and XLink) • The transformation language of XSL (eXtensible Stylesheet Language)

  3. XML in the document chain • [Diagram; labels: Conception, Edition, Transformation, Consultation; DTD/Schema, Structures; XML, Data; XSL/XSLT, Data processing; HTML, XHTML, User perspective]

  4. A quick historical overview • 1986 • SGML (Standard Generalized Markup Language) • ISO standard: ISO 8879:1986 • 1987 • TEI (Text Encoding Initiative) • 1990 • HTML 1.0 (HyperText Markup Language) • 1997/1998 • XML 1.0 (eXtensible Markup Language)

  5. What XML is: • XML: eXtensible Markup Language • A W3C (World Wide Web Consortium) Recommendation • A meta-language: it allows one to define one's own markup language • A simplification of the SGML standard • SGML was intended to represent the “logical” structure of a document • HTML was conceived as an application of SGML

  6. A simplified SGML • An XML document is an SGML document • With some slight (but essential) differences... • XML has the expressive power of SGML without its complexity • Opens the door to the transmission of structured documents on the web • Databases also entered the game...

  7. What can we do with it? • Data modeling (as a complement to UML, for instance) • Publication of structured data on the web • Separation of the logical structure of a document from its actual presentation • Distributed applications (cf. well-formed vs. valid documents) • Integrating data from heterogeneous sources

  8. Why can’t we avoid it? • Simplicity, which makes it easy to integrate into any kind of application • XML specification = 36 pages • SGML standard, ISO 8879 = 250 pages • Wide variety of applications already implemented • Industry: publishing, databases, cataloguing, e-business, etc. • Science, research: genomics, astronomy, maths, etc. • Consequence: • a lot of software available: editors, parsers, bridges from and to existing editing environments or DBMSs

  9. From HTML to XML - 1 • A simple HTML document: <B> Patrice Bonhomme </B> <P> Patrice.Bonhomme@loria.fr <BR> tél : 03 83 59 30 52 <BR> fax : 03 83 41 30 79 <BR> équipe : Langue et Dialogue (<I>LORIA</I>)<BR>

  10. From HTML to XML - 2 • The XML way: <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd"> <!-- Un membre du LORIA --> <MEMBRE TYPE="IE" ID="M28"> <NOM> BONHOMME </NOM> <PRENOM> Patrice </PRENOM> <MEL> Patrice.Bonhomme@loria.fr </MEL> <TEL> 03 83 59 30 52 </TEL> <FAX> 03 83 41 30 79 </FAX> <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE> </MEMBRE>

  11. Some properties of XML • Emphasis should be put on the “semantics” of a document • Underlying model: tree structure • Possibility to imagine a script language to access any part of an XML document e.g.: DB/MEMBRE[28]/MEL/text() • XML supports Unicode character encodings

  12. Elements and their content <MEMBRE TYPE="IE" ID="M28"> <LOGIN ID="bonhomme"/> <NOM>BONHOMME</NOM> <PRENOM>Patrice</PRENOM> <MEL> Patrice.Bonhomme@loria.fr </MEL> <TEL>03 83 59 30 52</TEL> <FAX>03 83 41 30 79</FAX> <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE> </MEMBRE> • Opening tag: <MEMBRE TYPE="IE" ID="M28"> • Empty element: <LOGIN ID="bonhomme"/> • Element: <NOM>BONHOMME</NOM> • Textual content: BONHOMME • Closing tag: </MEMBRE>
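The vocabulary of slide 12 can be checked programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (not part of the original slides); the helper name `describe` is hypothetical. It parses the MEMBRE element and lists each child with its attributes and textual content.

```python
import xml.etree.ElementTree as ET

# The MEMBRE element from slide 12, with whitespace inside MEL trimmed.
DOC = """<MEMBRE TYPE="IE" ID="M28">
  <LOGIN ID="bonhomme"/>
  <NOM>BONHOMME</NOM>
  <PRENOM>Patrice</PRENOM>
  <MEL>Patrice.Bonhomme@loria.fr</MEL>
  <TEL>03 83 59 30 52</TEL>
  <FAX>03 83 41 30 79</FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>"""

root = ET.fromstring(DOC)


def describe(element):
    """Return (tag, attributes, text); text is None for an empty element."""
    text = element.text.strip() if element.text and element.text.strip() else None
    return element.tag, dict(element.attrib), text


# Iterating over an element yields its child elements in document order.
children = [describe(child) for child in root]
for tag, attributes, text in children:
    print(tag, attributes, text)
```

The empty element LOGIN is reported with no text, while EQUIPE carries both an attribute and textual content.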

  13. Elements and their attributes <MEMBRE TYPE="IE" ID="M28"> <LOGIN ID="bonhomme"/> <NOM> BONHOMME </NOM> <PRENOM> Patrice </PRENOM> <MEL> Patrice.Bonhomme@loria.fr </MEL> <TEL> 03 83 59 30 52 </TEL> <FAX> 03 83 41 30 79 </FAX> <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE> </MEMBRE> • Attribute name: TYPE • Attribute value: "IE"

  14. Other features • XML declaration <?xml version="1.0"?> <?xml version="1.0" encoding="UTF-8" standalone="yes"?> • Comments <!-- this is a comment --> • CDATA section <![CDATA[Langue & Dialogue]]> • Processing instruction (application specific) <?edit line="wrap"?>
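To illustrate the CDATA section of slide 14, here is a small sketch, again assuming Python's standard `xml.etree.ElementTree` (an assumption, not something the slides use). It shows that a CDATA section and an escaped `&amp;` yield the same textual content once parsed; the wrapping EQUIPE element is borrowed from the earlier slides.

```python
import xml.etree.ElementTree as ET

# Declaration, comment and processing instruction sit in the prolog;
# the CDATA section protects the bare "&" inside the element.
WITH_CDATA = """<?xml version="1.0"?>
<!-- this is a comment -->
<?edit line="wrap"?>
<EQUIPE LAB="LORIA"><![CDATA[Langue & Dialogue]]></EQUIPE>"""

# Same content written with a character entity instead of CDATA.
WITH_ESCAPE = '<EQUIPE LAB="LORIA">Langue &amp; Dialogue</EQUIPE>'

cdata_text = ET.fromstring(WITH_CDATA).text
escaped_text = ET.fromstring(WITH_ESCAPE).text

# Serializing again escapes the ampersand; the CDATA wrapper is not kept.
serialized = ET.tostring(ET.fromstring(WITH_CDATA), encoding="unicode")

print(cdata_text)
print(serialized)
```

By default this parser discards the comment and the processing instruction, which is consistent with their role: they are addressed to humans and to specific applications, not part of the data.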

  15. From one document to a class… How do I know the structure of my document? How may I share this structure with others?

  16. Document Type Definition • Expresses constraints on: • Allowed element and attribute names • Possible content of a given element (“content model”) • To which elements a given attribute can be attached • Similar to the traditional SGML approach, but: • Simplified syntax • The DTD is optional for a document

  17. Example <!ELEMENT MEMBRE (LOGIN, NOM?, PRENOM?,MEL, TEL+, FAX*, EQUIPE)> <!ELEMENT LOGIN EMPTY> <!ATTLIST LOGIN ID ID #REQUIRED> <!ELEMENT NOM (#PCDATA)> ... <!ENTITY W3C "World Wide Web Consortium"> <!ENTITY chap1 SYSTEM "http://…/chapitre-1.xml"> <!ENTITY img2 SYSTEM "image2.gif" NDATA gif> ...
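A content model such as `(LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)` behaves like a regular expression over the sequence of child element names. The sketch below makes that analogy concrete in Python; it is not a DTD validator (the standard library does not provide one), the regex is a hand translation of the model above, and the function name `matches_membre_model` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE):
# each child name is followed by one space, "," becomes concatenation,
# and the ?, + and * occurrence indicators keep their regex meaning.
MEMBRE_MODEL = re.compile(r"LOGIN (NOM )?(PRENOM )?MEL (TEL )+(FAX )*EQUIPE ")


def matches_membre_model(xml_text):
    """Check only the order and number of the children of the root element."""
    element = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in element)
    return MEMBRE_MODEL.fullmatch(child_names) is not None


# NOM, PRENOM and FAX are optional; TEL may be repeated.
ok = ("<MEMBRE><LOGIN ID='bonhomme'/><MEL>m</MEL>"
      "<TEL>1</TEL><TEL>2</TEL><EQUIPE>e</EQUIPE></MEMBRE>")
# TEL is required at least once, so this one violates the model.
missing_tel = ("<MEMBRE><LOGIN ID='bonhomme'/><NOM>n</NOM>"
               "<MEL>m</MEL><EQUIPE>e</EQUIPE></MEMBRE>")

print(matches_membre_model(ok))
print(matches_membre_model(missing_tel))
```

A real validating parser also checks attribute declarations, entity references and the content of every descendant element, which this sketch ignores.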

  18. Using a DTD <!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd"> <MEMBRE TYPE="IE" ID="M28"> … </MEMBRE> <!DOCTYPE MEMBRE [ <!ELEMENT MEMBRE … > … ]> <MEMBRE TYPE="IE" ID="M28"> … </MEMBRE>

  19. Valid vs. Well-formed • Well-formed documents • Syntactic bracketing is preserved, without a DTD • Empty element: <toto></toto> = <toto/> • Valid documents • With a DTD (à la SGML) • Essential difference with SGML • Extracting and re-using document fragments • One usually produces valid documents and distributes well-formed ones
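Well-formedness can be tested without any DTD: a non-validating parser either accepts the bracketing or rejects it. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The two spellings of an empty element are equivalent once parsed.
long_form = ET.tostring(ET.fromstring("<toto></toto>"))
short_form = ET.tostring(ET.fromstring("<toto/>"))

print(is_well_formed("<toto></toto>"), is_well_formed("<toto/>"))
print(is_well_formed("<a><b></a>"))  # tags closed in the wrong order
print(long_form == short_form)
```

Checking validity would additionally require a validating parser and the document's DTD, which is outside what this sketch covers.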

  20. XML namespaces • Objectives: avoid conflicts between element and attribute names coming from various sources • Composite documents • XSLT instructions, Schema declarations • Declaration: <DOC xmlns:mml="http://www.w3.org/Math/MathML/" xmlns="http://www.ua99.net/DOC/1.0"> <P>blah blah : <mml:fn mml:definitionURL="mydef.xml"> … </mml:fn>re blah blah</P> </DOC>
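What a namespace-aware parser does with such declarations can be sketched as follows, assuming Python's standard `xml.etree.ElementTree`. Each prefixed or defaulted name is expanded to a `{namespace-URI}local-name` pair, so the prefix itself no longer matters. The namespace URIs are the ones shown on the slide; the content `x` of `mml:fn` is a placeholder for the part elided on the slide, and the prefix `doc` in the lookup table is an arbitrary local choice.

```python
import xml.etree.ElementTree as ET

DOC = """<DOC xmlns:mml="http://www.w3.org/Math/MathML/"
     xmlns="http://www.ua99.net/DOC/1.0">
  <P>blah blah : <mml:fn mml:definitionURL="mydef.xml">x</mml:fn>re blah blah</P>
</DOC>"""

# Prefixes used for searching are local to this script; only the URIs count.
NS = {
    "doc": "http://www.ua99.net/DOC/1.0",
    "mml": "http://www.w3.org/Math/MathML/",
}

root = ET.fromstring(DOC)
fn = root.find("doc:P/mml:fn", NS)

print(root.tag)    # element in the default namespace
print(fn.tag)      # element in the mml namespace
print(fn.attrib)   # the prefixed attribute is expanded the same way
```

Because names are compared after expansion, two vocabularies that both define an element called `fn` cannot be confused as long as their namespace URIs differ.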

  21. Reserved namespaces • The xml: prefix is reserved by the W3C for specific attributes: <title xml:space="default">...</title> <p xml:lang="FR">…</p>

  22. XPath • XML Path Language 1.0 REC 29012000 • General-purpose syntax for addressing sub-parts of an XML document • Joint specification used by XML Pointers (XPointer recommendation) and the XSLT transformation language • Allows one to access, select and filter XML fragments (cf. tree representation of an XML document)

  23. Addressing nodes in XPath • Absolute addressing • Given: a URL • id('M28'), / (the root) • Relative addressing along axes • Given: a node • ancestor, child • descendant • preceding-sibling, following-sibling

  24. An XML document represents a hierarchical structure • [Figure: tree representation of an XML document] • The only view you should ever, ever have of an XML document

  25. XPath - Examples <DB> <MEMBRE TYPE="IE" ID="M28"> <LOGIN ID="bonhomme"/> ... <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE> </MEMBRE> <MEMBRE TYPE="CR" ID="M14"> <LOGIN ID="romary"/> ... </MEMBRE> </DB> / or /DB /DB/MEMBRE /DB/MEMBRE[2] /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text() /DB/MEMBRE/LOGIN[@ID='romary']/../@ID
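The paths of slide 25 can be tried out with Python's standard `xml.etree.ElementTree`, with one caveat: that module implements only a subset of XPath. It has no `text()` step and no trailing `/@ID` step, so in this sketch the final text or attribute is read from the selected element instead. Paths are written relative to the root element DB, and the parts elided with `...` on the slide are simply left out.

```python
import xml.etree.ElementTree as ET

DOC = """<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
  <MEMBRE TYPE="CR" ID="M14">
    <LOGIN ID="romary"/>
  </MEMBRE>
</DB>"""

db = ET.fromstring(DOC)  # corresponds to /DB

# /DB/MEMBRE : every MEMBRE child of DB
members = db.findall("MEMBRE")

# /DB/MEMBRE[2] : positions are counted from 1
second = db.find("MEMBRE[2]")

# /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text() : text read via the .text attribute
team = db.find("MEMBRE[@ID='M28']/EQUIPE[1]").text

# /DB/MEMBRE/LOGIN[@ID='romary']/../@ID : ".." climbs to the parent MEMBRE,
# then the attribute is read with .get()
owner = db.find("MEMBRE/LOGIN[@ID='romary']/..").get("ID")

print(len(members), second.get("ID"), team, owner)
```

A full XPath 1.0 engine would evaluate the slide's expressions verbatim; the subset used here is enough to show child steps, positional and attribute predicates, and the parent step.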

  26. XPointer • Cf. HTML, anchors are needed: <A NAME="TOTO"> http://www.titi.fr/index.html#toto • In XML, pointers can directly address a document component: http://…/doc.xml#xptr(id(M28)) http://…/doc.xml#xptr(/DB/MEMBRE[28]/MEL) • Advantage: no need to modify the target document (notion of primary source)

  27. XLink • In HTML: the elements which may carry links are known: <A>, <IMG>, ... • In XML: any element may carry a simple or complex link • This is done by using pre-defined attributes: <a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>
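Since any element may carry a link, an application recognises links by their attributes rather than by element names. The sketch below, assuming Python's standard `xml.etree.ElementTree`, collects every element marked `xlink:type="simple"`. The wrapper element `doc` and the elements `note` and `ref` are hypothetical additions for illustration; only the `<a>` element comes from the slide.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # the XLink namespace URI

DOC = f"""<doc xmlns:xlink="{XLINK}">
  <a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>
  <note>no link here</note>
  <ref xlink:type="simple" xlink:href="http://www.w3.org/XML/">XML</ref>
</doc>"""

# Attribute names as the parser exposes them: {namespace-URI}local-name
TYPE_ATTR = f"{{{XLINK}}}type"
HREF_ATTR = f"{{{XLINK}}}href"


def simple_links(root):
    """Return (element name, target) for each element carrying a simple link."""
    return [
        (element.tag, element.get(HREF_ATTR))
        for element in root.iter()
        if element.get(TYPE_ATTR) == "simple"
    ]


links = simple_links(ET.fromstring(DOC))
print(links)
```

The element names `a` and `ref` play no part in the detection, which is the contrast with HTML drawn on the slide.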

  28. Visualizing XML documents • Basically, an XML document does not provide any information about its presentation • Visualizing a document may depend on the target audience, device etc. • Stylesheets: • Cascading Style Sheets (CSS 1 and 2) • Extensible Stylesheet Language (XSL) >> XSLT

  29. XSL: eXtensible Stylesheet Language • Describes the way a document will be shown, printed or verbalized… • [Figure: XML + XSL]

  30. XSL: a two-fold proposal • XSL = Transformations + Visualizing properties • XSLT : Transformation of XML documents • Allows one to transform an XML document into another XML document • Use this to produce well-formed (!) HTML documents • XSL FO: formatting XML data • FO = Formatting Objects • Is supposed to be application independent (Word/RTF, PS, PDF, MIF, …) • Not a recommendation yet :-(

  31. General structure of an XSL document <?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> … <xsl:template match="/"> … </xsl:template> <xsl:template match="NOM"> … </xsl:template> </xsl:stylesheet>

  32. Declarative approach • Sequence of rules (templates) specifying: • The pattern (XPath) of nodes to which the rule can be applied • Actions to be undertaken: • Elements to be generated in the target document • Selection of the elements to be further explored in the source document • Additional functionalities: testing, sorting, etc.

  33. A simple rule <xsl:template match='/DB/MEMBRE/NOM'> <B> <xsl:apply-templates/> </B> </xsl:template> • Pattern (XPath): /DB/MEMBRE/NOM • HTML element to be produced: <B> • The content of <B> will be the one produced by the <xsl:apply-templates/> instruction

  34. Creating an HTML core document <xsl:template match="/"> <HTML> <HEAD> <TITLE>My directory</TITLE> </HEAD> <BODY><xsl:apply-templates/></BODY></HTML> </xsl:template>

  35. Selecting the nodes to be explored <xsl:template match="MEMBRE"> <P> <xsl:apply-templates select="NOM"/> <xsl:text> - </xsl:text> <xsl:apply-templates select="EQUIPE"/> </P> </xsl:template>
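The processing model behind slides 33 to 35 (match a node against a rule, emit output, then select which nodes to explore next) can be imitated in a few lines of Python. This is only a sketch of the idea, not an XSLT processor: rules are matched on the element name alone rather than on a full XPath pattern, whitespace-only text is dropped, and all function names are hypothetical. The input document is a cut-down version of the DB example.

```python
import xml.etree.ElementTree as ET
from html import escape

DOC = """<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <NOM>BONHOMME</NOM>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
</DB>"""


def apply_templates(nodes):
    """Send each node to the rule registered for its name, else the default."""
    return "".join(RULES.get(node.tag, default_rule)(node) for node in nodes)


def default_rule(node):
    # Rough analogue of XSLT's built-in rules: copy text, explore children.
    text = escape((node.text or "").strip())
    return text + apply_templates(list(node))


def nom_rule(node):
    # Slide 33: wrap whatever the further processing produces in <B>.
    return "<B>" + default_rule(node) + "</B>"


def membre_rule(node):
    # Slide 35: explore only NOM and EQUIPE, separated by " - ".
    return ("<P>" + apply_templates(node.findall("NOM")) + " - "
            + apply_templates(node.findall("EQUIPE")) + "</P>")


def root_rule(document_element):
    # Slide 34: the HTML skeleton, with the rest produced inside <BODY>.
    return ("<HTML><HEAD><TITLE>My directory</TITLE></HEAD><BODY>"
            + apply_templates([document_element]) + "</BODY></HTML>")


RULES = {"NOM": nom_rule, "MEMBRE": membre_rule}

page = root_rule(ET.fromstring(DOC))
print(page)
```

The order of the entries in `RULES` is irrelevant, which reflects the declarative nature of the approach: the structure of the source document, not the order of the rules, drives the processing.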

  36. Conclusion • XML - a practical format (protocol) • Next steps: • Sharing DTDs, resources and tools • Generic mechanisms for handling families of documents (cf. Nancy’s presentation)

  37. References www.oasis-open.org/cover/ www.w3.org/XML/ www.w3.org/TR www.w3.org/TR/REC-xml babel.alis.com/web_ml/xml/REC-xml.fr.html www.xml.com www.xmlinfo.com xml.apache.org
