Introduction to XML Valérie Bellynck EFPG-INPG France mailto:Valerie.Bellynck@efpg.inpg.fr
What is XML ?
• XML means eXtensible Markup Language (in French « langage à balises extensible », or « langage à balises extensibles »; in Spanish ?)
• 1996: XML is developed by the XML Working Group, under the supervision of the World Wide Web Consortium (W3C)
• XML ≈ a generalisation of HTML: instead of predefined tags with fixed semantics, authors « invent » their own tags
• 1998: official standard, the XML 1.0 specification becomes a W3C Recommendation
From "XML in Micro-Application", e-Poche collection
http://www.w3c.org/XML/
HTML ? XML ? SGML
XML comes from SGML, not from HTML.
(Diagram from "XML in Micro-Application", e-Poche collection)
SGML: Standard Generalized Markup Language
• defined in 1986 by the ISO 8879 standard
• completely separates three aspects of a document: content / presentation / structure description
• used in:
  - industry, for technical documents
  - electronic document management (GED in French)
• problems:
  - not aimed at Internet use
  - complex and heavy descriptions to follow
http://www.sgmlsource.com/Goldfarb/history/index/htm
HTML: HyperText Markup Language
• is an application of SGML (its tags are defined by an SGML DTD)
• is a document description language: section titles, bookmarks, anchors, elements to format text, to describe tables...
• is interpreted by a browser (a client application for Internet requests)
• the document is browser-independent: any browser can display it (though the rendering may vary)
• problem: content and presentation are mixed
http://www.w3c.org/HTML/
Targets for XML: it must be...
• straightforwardly usable over the Internet
• defined quickly
• described in a formal and concise way
• self-describing
• extensible
• able to describe tree-structured (arborescent) data
• processable by any application equipped with a text parser
• able to support Unicode and other character encodings, for linguistic universality
• able to support a wide variety of applications
• compatible with SGML
• designed so that writing software for document processing is easy
• a way of representing data as human-readable documents
• easy to use for creating documents
Markup Languages ?
• Markups are pairs of expressions (tags) that surround a block of text to indicate some of its characteristics
  ex: in HTML, the tag <B> starts bold display and </B> ends it:
  <B> Text in Bold </B> is displayed as: Text in Bold (in bold type)
• Tags can be parametrised by attributes
  ex: in HTML,
  - the tag <a> defines a hypertext link
  - the URL of the link is given by the attribute href
  - the clickable text is surrounded by the tags <a> and </a>
  <a href="http://www.3ie.org/xml"> click here </a> is displayed as the clickable text: click here
<HTML>
<HEAD>
  <TITLE>Lime Jello Marshmallow Cottage Cheese Surprise</TITLE>
</HEAD>
<BODY>
  <H3>Lime Jello Marshmallow Cottage Cheese Surprise</H3>
  My grandma's favorite (may she rest in peace).
  <H4>Ingredients</H4>
  <TABLE BORDER="1">
    <TR BGCOLOR="#308030">
      <TH>Qty</TH><TH>Units</TH><TH>Item</TH>
    </TR><TR>
      <TD>1</TD><TD>box</TD><TD>lime gelatin</TD>
    </TR><TR>
      <TD>500</TD><TD>g</TD><TD>multicolored tiny marshmallows</TD>
    </TR><TR>
      <TD>500</TD><TD>ml</TD><TD>cottage cheese</TD>
    </TR><TR>
      <TD></TD><TD>dash</TD><TD>Tabasco sauce (optional)</TD>
    </TR>
  </TABLE>
  <P>
  <H4>Instructions</H4>
  <OL>
    <LI>Prepare lime gelatin according to package instructions...</LI>
    <!-- and so on -->
  </OL>
</BODY>
</HTML>
HTML code
<?xml version="1.0"?>
<Recipe>
  <Name>Lime Jello Marshmallow Cottage Cheese Surprise</Name>
  <Description>My grandma's favorite (may she rest in peace).</Description>
  <Ingredients>
    <Ingredient><Qty unit="box">1</Qty><Item>lime gelatin</Item></Ingredient>
    <Ingredient><Qty unit="g">500</Qty><Item>multicolored tiny marshmallows</Item></Ingredient>
    <Ingredient><Qty unit="ml">500</Qty><Item>Cottage cheese</Item></Ingredient>
    <Ingredient><Qty unit="dash"/><Item optional="1">Tabasco sauce</Item></Ingredient>
  </Ingredients>
  <Instructions>
    <Step>Prepare lime gelatin according to package instructions</Step>
    <!-- And so on... -->
  </Instructions>
</Recipe>
XML example code
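Because every piece of data in the XML version is named by its tag, a program can extract it without caring about how it is displayed. A minimal sketch with Python's standard xml.etree.ElementTree module; the helper name ingredients is ours, for illustration only:

```python
import xml.etree.ElementTree as ET

# The recipe document from the XML example (without its "And so on" comment)
RECIPE = """<?xml version="1.0"?>
<Recipe>
  <Name>Lime Jello Marshmallow Cottage Cheese Surprise</Name>
  <Description>My grandma's favorite (may she rest in peace).</Description>
  <Ingredients>
    <Ingredient><Qty unit="box">1</Qty><Item>lime gelatin</Item></Ingredient>
    <Ingredient><Qty unit="g">500</Qty><Item>multicolored tiny marshmallows</Item></Ingredient>
    <Ingredient><Qty unit="ml">500</Qty><Item>Cottage cheese</Item></Ingredient>
    <Ingredient><Qty unit="dash"/><Item optional="1">Tabasco sauce</Item></Ingredient>
  </Ingredients>
  <Instructions>
    <Step>Prepare lime gelatin according to package instructions</Step>
  </Instructions>
</Recipe>"""


def ingredients(xml_text):
    """Return one (quantity, unit, item, optional) tuple per <Ingredient>."""
    root = ET.fromstring(xml_text)
    rows = []
    for ingredient in root.iter("Ingredient"):
        qty = ingredient.find("Qty")
        item = ingredient.find("Item")
        rows.append((
            qty.text,                          # None for the empty <Qty unit="dash"/>
            qty.get("unit"),
            item.text,
            item.get("optional", "0") == "1",  # treat a missing attribute as "0"
        ))
    return rows


print(ET.fromstring(RECIPE).findtext("Name"))
for row in ingredients(RECIPE):
    print(row)
```

The empty <Qty unit="dash"/> simply yields no text, and what optional="1" means is decided by the program, not by XML itself.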
XML header information
Every XML file should begin with a header (the XML declaration) stating which version of XML the document uses:
<?xml version="1.0"?>
This is done through the version attribute. Other attributes can define global properties, such as:
- the encoding attribute, which defines the character encoding:
<?xml version="1.0" encoding="ISO-8859-1"?>
The ISO-8859-1 (Latin-1) encoding covers most French and other Western European characters.
The universal encoding, covering all Unicode characters, is UTF-8; parsers assume UTF-8 (or UTF-16, if a byte-order mark is present) when no encoding is declared.
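To see what the encoding attribute changes, here is a small sketch using Python's xml.etree.ElementTree parser on raw bytes (the helper element_text and the <nom> document are illustrative assumptions). The same name is stored once as ISO-8859-1 with a declaration, once as ISO-8859-1 without one, and once as UTF-8:

```python
import xml.etree.ElementTree as ET


def element_text(data):
    """Parse raw bytes; return the root element's text, or None if rejected."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# "é" is the single byte 0xE9 in ISO-8859-1, but two bytes in UTF-8
latin1_declared = '<?xml version="1.0" encoding="ISO-8859-1"?><nom>Valérie</nom>'.encode("iso-8859-1")
latin1_undeclared = "<nom>Valérie</nom>".encode("iso-8859-1")
utf8_undeclared = "<nom>Valérie</nom>".encode("utf-8")

print(element_text(latin1_declared))    # Valérie: the declaration says how to decode
print(element_text(latin1_undeclared))  # None: UTF-8 is assumed and 0xE9 alone is invalid
print(element_text(utf8_undeclared))    # Valérie: UTF-8 needs no declaration
```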
Well-formed XML means « parsable »
• A well-formed XML document is a document that follows all the notational and structural rules of XML; otherwise it is not XML at all, and a parser must reject it.
  By analogy, the expression 2 ( + + 5 (=) 9 > 7 is meaningless even if it looks (sort of) like math.
• The most important rules are:
  • No unclosed tags: a block can't be "opened" with a tag <TAG> without being "closed" afterwards with </TAG>
  • Empty elements must be closed too: either with a closing tag <EMPTY type="example"></EMPTY> or with a single tag ending in " />" instead of " >": <EMPTY type="example" />
  • No overlapping tags: a tag that opens inside another tag must close before the enclosing tag closes:
    <INCLUDING-TAG> <CONTAINING-TAG> </CONTAINING-TAG> </INCLUDING-TAG>
  • Attribute values must be enclosed in quotes: <TAG type="example">
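These rules are exactly what a parser checks before doing anything else. A sketch using Python's xml.etree.ElementTree, which raises ParseError on any well-formedness error (the is_well_formed helper and the samples are ours, for illustration):

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "unclosed tag":        "<TAG>text",
    "closed empty (pair)": '<EMPTY type="example"></EMPTY>',
    "closed empty (/>)":   '<EMPTY type="example" />',
    "unclosed empty":      '<EMPTY type="example">',
    "overlapping tags":    "<B><I>text</B></I>",
    "properly nested":     "<B><I>text</I></B>",
    "unquoted attribute":  "<TAG type=example/>",
    "quoted attribute":    '<TAG type="example"/>',
}
for name, text in samples.items():
    print(f"{name:20} {is_well_formed(text)}")
```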
Valid XML
A document is valid when it matches its Document Type Definition (DTD).
• A DTD is a grammar for a class of documents using a markup language, that is, a set of rules describing the authorized sequences and nestings of tags
• DTDs are written in a special syntax that is not XML; XML Schemas offer a richer (and more complex) way to define document types in XML itself
• A DTD specifies
  • what elements may exist,
  • which attributes the elements may have,
  • how elements are expected to be organised: which elements may or must be found inside other elements, and in what order.
Thanks to DTDs, XML is eXtensible.
Power of DTD
• Writing a DTD is how you actually define a new markup language, often called a dialect of XML.
• At present, DTDs are being written for an enormous number of different problem domains, and each DTD defines a new markup language.
• New markup languages now exist, or are being designed,
  • to mark up specific domains, such as the plays of Shakespeare or business data in the footwear industry (FDX)...
  • to describe general data resources (RDF);
  • to model information in the health care industry (HL7 SGML/XML);
  • to typeset, display, and actively use mathematical equations (MathML);
  • and to perform electronic data interchange (XML/EDI).
DTD for the example
<!-- This is the example DTD for the example XML -->
<!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Ingredients (Ingredient)*>
<!ELEMENT Ingredient (Qty, Item)>
<!ELEMENT Qty (#PCDATA)>
<!ATTLIST Qty unit CDATA #REQUIRED>
<!ELEMENT Item (#PCDATA)>
<!ATTLIST Item optional CDATA "0"
               isVegetarian CDATA "true">
<!ELEMENT Instructions (Step)+>
<!ELEMENT Step (#PCDATA)>
DTD : defining tags
• <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
  The <!ELEMENT ...> statement defines a tag in the document. This one defines a <Recipe> tag, stating that it contains, in this order,
  - a <Name>,
  - an optional <Description> (the question mark [?] denotes optionality),
  - an optional <Ingredients> tag,
  - and an optional <Instructions> tag.
  Similarly, [*] means "zero or more" and [+] "one or more", as in (Ingredient)* and (Step)+.
• <!ELEMENT Name (#PCDATA)>
  This simply states that a <Name> tag can contain character data and nothing else.
• <!ATTLIST Item optional CDATA "0" isVegetarian CDATA "true">
  This states that the <Item> tag has two possible attributes:
  - optional, whose default value is 0;
  - isVegetarian, whose default value is true.
• <!-- This is a comment -->
  The text « This is a comment » won't be interpreted.
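A DTD content model is essentially a regular expression over the names of an element's children: "," means sequence, "?" optional, "*" zero or more, "+" one or more. Python's standard parsers do not validate against DTDs, so the following is only a toy sketch of that idea: the content models of the example DTD are translated by hand into Python regular expressions (CONTENT_MODELS and invalid_elements are hypothetical names). It ignores attributes and #PCDATA checks, which a real validating parser would also enforce.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models; each child name is matched as "Name "
# (with a trailing space) so that ?, * and + apply to whole names.
CONTENT_MODELS = {
    "Recipe":       r"(Name )(Description )?(Ingredients )?(Instructions )?",
    "Ingredients":  r"(Ingredient )*",
    "Ingredient":   r"(Qty )(Item )",
    "Instructions": r"(Step )+",
}


def invalid_elements(root):
    """Return the tags of elements whose children break their content model."""
    errors = []
    for elem in root.iter():
        model = CONTENT_MODELS.get(elem.tag)
        if model is None:
            continue  # #PCDATA-only elements are not checked in this sketch
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(elem.tag)
    return errors


ok = ET.fromstring(
    "<Recipe><Name>X</Name><Ingredients><Ingredient><Qty unit='g'>5</Qty>"
    "<Item>salt</Item></Ingredient></Ingredients>"
    "<Instructions><Step>s</Step></Instructions></Recipe>")
bad = ET.fromstring(
    "<Recipe><Description>d</Description><Name>X</Name><Instructions/></Recipe>")
print(invalid_elements(ok))   # []
print(invalid_elements(bad))  # ['Recipe', 'Instructions']
```

In the second document, <Name> comes after <Description> and <Instructions> has no <Step>, so both break their models even though the document is well-formed.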
DTD : other definitions
<!ENTITY Utterance "example of sentence or value">
This defines an internal entity. It associates a value with a name that can then be used in the document. The parser will replace the entity reference &Utterance; by the text: example of sentence or value
External entities also exist; their content may or may not be XML, and they are declared in the DTD as well.
<!ENTITY TextPresentation SYSTEM "http://foo.com/presentation/text.xml">
This lets the document reference the content of the file stored at that URL. The parser will replace the entity reference &TextPresentation; by the content of the file at http://foo.com/presentation/text.xml
<!NOTATION gif SYSTEM "usr/local/bin/display">
<!ENTITY ImagePresentation SYSTEM "http://foo.com/img/lion.gif" NDATA gif>
For non-XML content, such as GIF files, the notation declaration specifies which application is authorized to handle it. An attribute declared with type ENTITY can then refer to the entity:
<!ATTLIST imagePres src ENTITY #REQUIRED>
<imagePres src="ImagePresentation"/>
which lets the browser include the image in the document.
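Entity replacement happens in the parser, before the application sees the text. A sketch with Python's xml.etree.ElementTree, which expands internal entities declared in the document's internal DTD subset (the <sentence> element is a made-up example). The standard-library parser used here does not fetch external entities from URLs, so only the internal case is shown:

```python
import xml.etree.ElementTree as ET

# A document whose internal DTD subset declares the entity from the slide
DOC = """<!DOCTYPE sentence [
  <!ENTITY Utterance "example of sentence or value">
]>
<sentence>Here is an &Utterance;.</sentence>"""

root = ET.fromstring(DOC)
print(root.text)  # Here is an example of sentence or value.
```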
DTD file call in XML file
• In the XML file, a document type declaration such as
  <!DOCTYPE Recipe SYSTEM "example.dtd">
  tells the parser
  • to expect a <Recipe> tag as the top-level (root) tag of the document,
  • that the DTD is in the system file example.dtd.
• Another example, with the DTD in the file personne.dtd:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE personne SYSTEM "personne.dtd">
<personne>
  <prenom>Alain</prenom>
  <nom>Connu</nom>
</personne>
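A parser reports the document type declaration to the application. In Python's xml.dom.minidom, which is non-validating and, in this sketch, neither loads nor enforces personne.dtd, the declaration becomes a DocumentType node:

```python
from xml.dom import minidom

# The personne document with an external DTD reference
EXTERNAL = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE personne SYSTEM "personne.dtd">
<personne><prenom>Alain</prenom><nom>Connu</nom></personne>"""

dom = minidom.parseString(EXTERNAL)
print(dom.doctype.name)             # personne      (expected root element)
print(dom.doctype.systemId)         # personne.dtd  (where the DTD lives)
print(dom.documentElement.tagName)  # personne      (actual root element)
```

A validating parser would additionally read personne.dtd and check that the actual root matches the declared one.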
DTD directly included in the file
<!DOCTYPE personne [ direct DTD content ]>
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- DTD declaration and definition -->
<!DOCTYPE personne [
  <!ELEMENT personne (prenom, nom)>
  <!ELEMENT prenom (#PCDATA)>
  <!ELEMENT nom (#PCDATA)>
]>
<!-- end of DTD declaration and definition -->
<personne>
  <prenom>Alain</prenom>
  <nom>Connu</nom>
</personne>
What is a « NameSpace » ?
• It lets authors of XML documents share tag names
• It lets an author choose between their own tags and tags defined by someone else, without name clashes
• It applies to the names of elements and of attributes (the vocabulary a DTD defines)
• Some namespaces are defined by W3C recommendations:
  - XML Schema (eXtensible Markup Language Schema)
  - XLink (XML Linking Language)
  - XSL (eXtensible Stylesheet Language)
  - XHTML
  - versions of HTML (3.0, 4.0...)
Example of HTML Namespace
<?xml version="1.0"?>
<!-- All elements are in the HTML namespace -->
<html:html xmlns:html="http://www.w3.org/TR/REC-html40">
  <html:head>
    <html:title>Namespace Example use</html:title>
  </html:head>
  <html:body>
    <html:p>
      Text and Links <html:a href="http://foo.com">here</html:a>
    </html:p>
  </html:body>
</html:html>
This example uses the XML namespace of HTML defined in the W3C recommendation REC-html40 for HTML version 4.0.
Example of using 2 Namespaces
<?xml version="1.0"?>
<lv:livre xmlns:lv="urn:loc.gov:livres" xmlns:isbn="urn:ISBN:0-395-36341-6">
  <lv:titre>Harry Potter et la coupe de feu</lv:titre>
  <isbn:number>0747554420</isbn:number>
</lv:livre>
This example declares 2 namespaces, using lv and isbn respectively as prefixes.
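A namespace-aware parser replaces each prefix by the URI it is bound to, so the prefix itself carries no meaning. A sketch with Python's xml.etree.ElementTree on a version of the two-namespace book example with consistent prefixes (the lookup prefixes book and i in the ns dictionary are our own choice):

```python
import xml.etree.ElementTree as ET

DOC = """<lv:livre xmlns:lv="urn:loc.gov:livres" xmlns:isbn="urn:ISBN:0-395-36341-6">
  <lv:titre>Harry Potter et la coupe de feu</lv:titre>
  <isbn:number>0747554420</isbn:number>
</lv:livre>"""

root = ET.fromstring(DOC)
# ElementTree stores names as {namespace-URI}local-name
print(root.tag)                       # {urn:loc.gov:livres}livre
print([child.tag for child in root])  # ['{urn:loc.gov:livres}titre', '{urn:ISBN:0-395-36341-6}number']

# Lookups use a prefix map of our own choosing: only the URIs must match
ns = {"book": "urn:loc.gov:livres", "i": "urn:ISBN:0-395-36341-6"}
print(root.findtext("book:titre", namespaces=ns))  # Harry Potter et la coupe de feu
print(root.findtext("i:number", namespaces=ns))    # 0747554420
```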
Case of schema structure representation in XML: XML Schema
• is an XML-based alternative to DTDs
• supports data types (more than just PCDATA)
• uses XML syntax (=> editable with an XML editor, parsable by any XML parser, can be manipulated with the XML DOM, transformable with XSLT)
• is extensible, just like XML (=> reusability, derivation of your own data types from standard types, several schemas referenced from the same document)
• makes data communication safer (sender and receiver can have the same « expectations » about the content by sharing its structural description: a link to interoperability)
http://www.w3schools.com/default.asp/
Schema example
<?xml version="1.0" encoding="iso-8859-1" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xsd:element name="film" type="typeFilm"/>
  <xsd:complexType name="typeFilm">
    <xsd:sequence>
      <xsd:element name="titre" type="xsd:string"/>
      <xsd:element name="acteurs" type="typeActeur"/>
      <xsd:element name="realisateur" type="xsd:string"/>
      <xsd:element name="annee" type="xsd:decimal"/>
      <xsd:element name="texte" type="xsd:string"/>
      <xsd:element name="note" type="xsd:string" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="typeActeur">
    <xsd:sequence>
      <xsd:element name="personne" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Presentation : CSS and XSL
For general control over formatting, use
• Cascading Style Sheets (CSS)
• eXtensible Stylesheet Language (XSL)
Both are declarative languages.
XSL is more recent than CSS.
XSL is itself written in XML, using the power of namespaces.
CSS for HTML and XML
• exists as a current W3C recommendation, usable with HTML or XML
• is simpler to use and less powerful than XSL
• is supported by most current-generation browsers (to varying degrees)
http://www.W3.org/TR/html401/present/styles
Cascading Style Sheets
In the small example below, <HTML> contains <BODY>, which contains <H1>, which contains text:
<HTML>
<HEAD>
</HEAD>
<BODY>
  <H1>A Theory About the Brontosaurus</H1>
  My theory about the brontosaurus is...
</BODY>
</HTML>
The whole idea of a style sheet is to use these structural relationships to indicate where changes in text style, spacing, and so on should occur. For example, this rule, placed in the <HEAD>, makes every <H1> red, 16 points high and underlined:
<STYLE TYPE="text/css">
<!--
H1 { color: red; font-size: 16pt; text-decoration: underline }
-->
</STYLE>
Example of CSS file
html\:body { background-color: rgb(255, 230, 230) }  /* the backslash escapes the colon: matches elements written html:body */
article { display: block; font-family: helvetica, sans-serif; background-color: rgb(230, 230, 255) }
titre   { display: block; font-size: 200%; text-align: center; border-width: medium; border-style: groove }
auteur  { display: block; font-size: 80%; font-weight: bold }
date    { display: inline; font-size: 80%; font-style: italic }
lieu    { display: inline; font-size: 80%; font-weight: bold }
texte   { display: block }
grand   { display: inline; font-variant: small-caps; font-size: 120%; font-weight: bold }
image   { display: block; border-width: thin; text-align: center; border-style: solid;
          content: url(attr(site)); /* not valid CSS: url() cannot take attr(), so browsers ignore this declaration */ }
legende { display: block; text-align: center; padding-right: 2mm; padding-top: 2mm; padding-bottom: 2mm; padding-left: 2mm }
External CSS
The style sheet to use can be specified
• with a <LINK> element (in the <HEAD>):
<HTML>
<HEAD>
  <LINK href="special.css" rel="stylesheet" type="text/css">
</HEAD>
<BODY>
  <H1>A Theory About the Brontosaurus</H1>
  My theory about the brontosaurus is...
</BODY>
</HTML>
• The <META> declaration does not load a style sheet: it only sets the default style sheet language, used for example by style attributes:
...
<HEAD>
  <META http-equiv="Content-Style-Type" content="text/css">
</HEAD>
...
How do browsers choose the default style sheet language ?
The browser determines the default style sheet language as follows:
• if <META> declarations specify a Content-Style-Type, the last one is used
• otherwise, if HTTP headers specify a Content-Style-Type, the last one is used
• otherwise, the default style sheet language is "text/css"
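Why is CSS named CSS ?
• These style sheets are called cascading style sheets because styles (like fonts, colors, and so on) set on one markup element "cascade" down and apply to all of the element's contents.
• For example, if a paragraph tag (<P>) is set to show its text in red, all text and any other element inside that paragraph will be displayed in red, unless a sub-element of the paragraph specifies a color for its contents.
• Strictly speaking, this downward propagation is called inheritance (and only some properties, such as color and fonts, are inherited); the "cascade" itself is the set of rules that decides which declaration wins when several style sheets (browser, user, author) apply to the same element.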
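The downward propagation of a property such as color can be sketched in a few lines of Python: each element takes the color declared for it, if any, and otherwise its parent's color. RULES, DEFAULT and computed_colors are hypothetical names, and only element-name rules are modelled (no classes, specificity, or competing style sheets):

```python
import xml.etree.ElementTree as ET

RULES = {"P": "red", "EM": "blue"}  # hypothetical author rules: tag -> color
DEFAULT = "black"                   # assumed color of the root's context


def computed_colors(elem, inherited=DEFAULT, out=None):
    """Depth-first walk: use the element's own rule, else inherit the parent's color."""
    if out is None:
        out = []
    color = RULES.get(elem.tag, inherited)
    out.append((elem.tag, color))
    for child in elem:
        computed_colors(child, color, out)
    return out


doc = ET.fromstring("<BODY><P>red text <B>still red</B> <EM>blue</EM></P><DIV>black</DIV></BODY>")
print(computed_colors(doc))
# [('BODY', 'black'), ('P', 'red'), ('B', 'red'), ('EM', 'blue'), ('DIV', 'black')]
```

<B> has no rule of its own and inherits red from its <P>, while <EM> overrides it, which is the "unless a sub-element specifies a color" case.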
XSL for XML and SGML
• used exclusively to format XML or SGML
• more complex and more powerful than CSS
http://nwalsh.com/docs/tutorials/webtek2000/xsl/ie/frames.html
XSL : Why Stylesheets for XML ?
From Norman Walsh, http://nwalsh.com/docs/tutorials/webtek2000/xsl/ie/frames.html, because:
• XML is not a fixed tag set (like HTML) and has no (application) semantics
• XML markup does not (usually) include formatting information
• Reuse: the same content can look different in different contexts
• Multiple output formats: different media (paper, online), different sizes (manuals, reports), different classes of output devices (workstations, hand-held devices)
• Styles tailored to the reader's preferences (e.g., accessibility): print size, color, simplified layout for audio readers
What does a StyleSheet do ?
It specifies the presentation of XML information using two basic categories of techniques:
• An optional transformation of the input document into another structure:
  • generation of constant text
  • suppression of content
  • moving text (e.g., exchanging the order of the first and last name)
  • duplicating text (e.g., copying titles to make a table of contents)
  • executing more complex transformations that "compute" new information from the existing information
• A description of how to present the transformed information,
  • i.e., a specification of which properties to associate with each of the various parts of the transformed information
Needs to present information
The description of how to present the (possibly transformed) data includes three levels of formatting information:
• Specification of the general screen or page (or even audio) layout
• Assignment of the transformed content into basic "content container types" (e.g., lists, paragraphs, inline text)
• Specification of formatting properties (spacing, margins, alignment, fonts, etc.) for each resulting "container"
Components of XSL
The full XSL language logically consists of three component languages, described in three W3C (World Wide Web Consortium) recommendations:
• XPath: XML Path Language, a language for referencing specific parts of an XML document
• XSLT: XSL Transformations, a language for describing how to transform one XML document (represented as a tree) into another
• XSL: Extensible Stylesheet Language, XSLT plus a description of a set of Formatting Objects and Formatting Properties
XML to Result Tree
An XSLT "stylesheet" transforms the input (source) document tree into a structure called a result tree, consisting of result objects.
(Diagram: Transform to Another Vocabulary)
What is an XSL Stylesheet ?
• XSLT stylesheets are XML documents; namespaces are used to identify the semantically significant elements.
• Most stylesheets are stand-alone documents rooted at <xsl:stylesheet> (or <xsl:transform>). It is also possible to have "single template" stylesheets, where a literal result element is itself the whole stylesheet document.
Note that what matters is the mapping from the namespace prefix to its URI, not the literal prefix "xsl:" that is most commonly used.
Understanding a template
Most templates have the following form:
<xsl:template match="para">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>
• The whole <xsl:template> element is a template
• The match pattern determines where this template applies
• Literal result elements (here <p>) come from non-XSL namespace(s)
• XSLT elements come from the XSL namespace
Style sheet example
A small, complete style sheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="doc">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
Transformation is application of templates
Templates transform portions of the source tree into portions of the result tree. The ordered accumulation of all the transformed portions forms the complete result tree. Individual templates are free to process elements from anywhere in the source tree.
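XSLT is not part of Python's standard library, so the following is only a toy model of the processing idea, not an XSLT processor. A dictionary maps element names (standing in for match patterns) to Python functions (standing in for templates) that mirror the small style sheet for doc, title and para; apply_templates, apply_template, wrap and TEMPLATES are our own names. As with XSLT's built-in rules, an element without a template is not copied, but its text is:

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates):
    """Process the children of `node`; return result items (Elements or strings)."""
    out = [node.text] if node.text else []  # text is copied, like the built-in text rule
    for child in node:
        out.extend(apply_template(child, templates))
        if child.tail:
            out.append(child.tail)
    return out


def apply_template(node, templates):
    """Use the matching template, or fall back to processing the children."""
    rule = templates.get(node.tag)
    if rule is None:
        return apply_templates(node, templates)
    return [rule(node, templates)]


def wrap(tag, items):
    """Build a result element from a list of strings and Elements."""
    elem, last = ET.Element(tag), None
    for item in items:
        if isinstance(item, str):
            if last is None:
                elem.text = (elem.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            elem.append(item)
            last = item
    return elem


TEMPLATES = {
    "doc": lambda n, t: wrap("html", [
        wrap("head", [wrap("title", [n.findtext("title", "")])]),  # like xsl:value-of
        wrap("body", apply_templates(n, t)),
    ]),
    "title": lambda n, t: wrap("h1", apply_templates(n, t)),
    "para": lambda n, t: wrap("p", apply_templates(n, t)),
}

source = ET.fromstring("<doc><title>Hello</title><para>First <b>bold</b> para.</para></doc>")
result = apply_template(source, TEMPLATES)[0]
print(ET.tostring(result, encoding="unicode"))
# <html><head><title>Hello</title></head><body><h1>Hello</h1><p>First bold para.</p></body></html>
```

The recursion is driven by the source document: each template decides what to emit and calls apply_templates to let the children be handled by their own templates.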
Match Patterns (locating elements)
A critical capability of a stylesheet language is to locate the source elements to be styled. For example,
- CSS does this with "selectors";
- FOSIs do it with "e-i-c's", elements in context;
- XSLT does it with "match patterns" defined in XPath.
XPath
XPath has an extensible string-based syntax inspired, in part, by the common "path/file" file system syntax:
• para matches all <para> children in the current context
• para/emphasis matches all <emphasis> elements that have a <para> parent
• ancestor-or-self::*/@sepchar selects the sepchar attribute on the current element or on any of its ancestors
• numberedlist/listitem[position() mod 2 = 0] matches the even-numbered list items (2nd, 4th...) in a numbered list
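Python's xml.etree.ElementTree implements only a limited subset of XPath, enough to try the simpler paths. In this sketch (the small document is made up), the predicate position() mod 2 = 0, which keeps positions 2, 4, ..., is emulated with list slicing because ElementTree has no position() function; the ancestor-or-self axis is not supported either:

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <para>A <emphasis>first</emphasis> para</para>
  <note><emphasis>not in a para</emphasis></note>
  <numberedlist>
    <listitem>one</listitem><listitem>two</listitem>
    <listitem>three</listitem><listitem>four</listitem>
  </numberedlist>
</doc>"""
root = ET.fromstring(DOC)

# para/emphasis: <emphasis> elements whose parent is a <para>, searched at any depth
print([e.text for e in root.findall(".//para/emphasis")])  # ['first']

# No position() in ElementTree: slicing keeps items at positions 2, 4, ...
items = root.findall("numberedlist/listitem")
print([li.text for li in items[1::2]])                     # ['two', 'four']

# A fixed position predicate such as [2] is supported (1-based, as in XPath)
print(root.find("numberedlist/listitem[2]").text)          # two
```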
Applying style recursively
The process is allowed to run recursively, driven primarily by the document. A series of templates is created so that there is a template matching each context; these templates are then applied recursively, starting at the root of the document.
<xsl:template match="section/title">
  <h2><xsl:apply-templates/></h2>
</xsl:template>
• <xsl:template match="...">
• <xsl:apply-templates>, e.g. <xsl:apply-templates select="th|td"/>
Two obstacles appear when using the recursive model:
• how to arbitrate between multiple patterns that match the same node, and
• how to process the same nodes in different contexts.
These are solved by conflict resolution and modes, respectively.
Applying style procedurally
In this approach, each action is selected procedurally: a series of templates is created such that each template explicitly selects and processes the necessary elements.
• <xsl:for-each>
<xsl:for-each select="row">
  <tr>
    <xsl:for-each select="entry">
      <td><xsl:value-of select="."/></td>
    </xsl:for-each>
  </tr>
</xsl:for-each>
• <xsl:template name="...">
<xsl:template name="admonition">
  <xsl:param name="type">warning</xsl:param>
  ...
</xsl:template>
• <xsl:call-template>
<xsl:call-template name="admonition">
  <xsl:with-param name="type">caution</xsl:with-param>
</xsl:call-template>
Conditional processing
• Simple conditional (no "else"): <xsl:if>
<xsl:if test="$somecondition">
  <xsl:text>this text only gets used if $somecondition is true()</xsl:text>
</xsl:if>
• Select among alternatives with <xsl:when> and <xsl:otherwise>: <xsl:choose>
<xsl:choose>
  <xsl:when test="$count > 2">
    <xsl:text>, and </xsl:text>
  </xsl:when>
  <xsl:when test="$count > 1">
    <xsl:text> and </xsl:text>
  </xsl:when>
  <xsl:otherwise>
    <xsl:text> </xsl:text>
  </xsl:otherwise>
</xsl:choose>
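The order of the <xsl:when> elements matters: the first test that is true wins, and <xsl:otherwise> applies only if none is. The same logic in Python, as a sketch (separator is a hypothetical name):

```python
def separator(count):
    """Python analogue of the xsl:choose: tests run in order, the first true one wins."""
    if count > 2:
        return ", and "
    elif count > 1:
        return " and "
    else:  # the xsl:otherwise branch
        return " "


for n in (1, 2, 3, 5):
    print(n, repr(separator(n)))
```

With count = 5 both tests are true, but only the first branch is taken.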
Variables
Variables can be used to save computed values.
• Variables are created with <xsl:variable>, e.g. <xsl:variable name="count" select="count(item)"/>.
• Variables are "single assignment" (no side effects).
• Variables are lexically scoped.
Once created, variables can be used to generate content:
<a href="{$file}">...</a>
and to control conditional processing:
<xsl:if test="$count = 3">...</xsl:if>
Creating the resulting tree
• Literal Result Elements: any element in a template rule that is not in the XSL (or another extension) namespace is copied literally to the result tree: <p>...</p>
• XSL Elements: elements in the XSL namespace, such as <xsl:text>, <xsl:value-of>, <xsl:element>, <xsl:attribute>...