
The ABCs of XML: Creating Languages to Describe Data

This presentation by Howard Rosenbaum discusses the basics of XML, including its relationship to HTML and other markup languages, how it works, and its potential impact on the web. It also covers examples of XML applications.


Presentation Transcript


  1. <title>The ABCs of XML</title> <author>Howard Rosenbaum</author> <email>hrosenba@indiana.edu</email> <address>School of Library and Information Science, Center for Social Informatics, Indiana University</address> <conference>Indiana Library Federation Annual Meeting</conference> <date>April 10, 2001</date> <location>http://www.slis.indiana.edu/hrosenba/www/Pres/ilf-01_xml</location>

  2. I. What is XML?
     • XML and HTML
     • Where does it fit in with other markup languages?
     II. How does it work?
     • Your own private language
     • DTDs and schemas
     • XSLT: Extensible style sheet transformation language
     • XPath, XLink, XPointer, XForms
     III. How will it change the web?
     • Examples of XML applications

  3. I. What is XML? XML is Extensible Markup Language. It is a meta-language: a language used to create languages that can describe data. It is extensible: authors can define their own tags and attributes that can be easily processed and displayed across platforms. XML became a World Wide Web Consortium (W3C) Recommendation on 2/10/98 and was corrected on 10/6/00: http://www.w3.org/TR/REC-xml

  4. So what’s wrong with HTML? It’s simple enough for children to use, but this is because it is rigid and inflexible. It does a good job representing the structure and format of documents, but it can’t tell us anything about their meaning. It can be used across platforms, but it is rife with proprietary markup. It can be searched, but the inability of search engines to capture the meaning of content leads to poor performance.

  5. XML is used to create specialized markup languages by defining sets of tags and attributes. It is a subset of SGML and allows “generalized markup.” It is useful for storing structured data that will be published in a variety of media. By itself, XML does not define any tags; you create your own tags (your own markup language). Examples: CML (Chemical Markup Language), MathML (Mathematical Markup Language), ebXML (Electronic Business Markup Language). Properly done, XML documents can be viewed across platforms.

  6. XML describes data in a human-readable and machine-understandable format. This format is intended to capture the meaning of the data. There is no indication of how the data are to be displayed. It is a database-neutral and device-neutral language. Data marked up in XML can be targeted to different formats. XML can also be used to publish data on different platforms.

  7. Some relationships among markup languages (and one tool). [Diagram relating SGML, XML, HTML, HTML 3.2, HTML 4.01, XHTML, CML, MML, ebXML, XSLT, and CSS]

  8. I. What is XML?
     • XML and HTML
     • Where does it fit in with other markup languages?
     II. How does it work?
     • Your own private language
     • DTDs and schemas
     • XSLT: Extensible style sheet transformation language
     • XPath, XLink, XPointer, XForms
     III. How will it change the web?
     • Examples of XML applications

  9. II. How does it work? An XML document is actually composed of three different files. 1. The raw XML file (.xml). This file has the basic data marked up with XML tags. It will contain markup that links the file to both the DTD (or “schema”) and the XSL stylesheet. It must follow certain rules to be considered “well formed” and “valid.” This is necessary if the document is to be displayed by a browser or parser.

  10. Here's a simple HTML document:
      <html>
        <head> <title>Memo form</title> </head>
        <body>
          <b>4.10.01</b><br />
          <b>TO:</b> John Doe<br />
          <b>CC:</b> Jane Doe<br />
          <b>FROM:</b> Bozo T. Clown<br />
          <p>Please take note our phone number has changed.</p>
          <p>Yours in clownitude,<br /> Bozo</p>
        </body>
      </html>

  11. XML reflects the structure of the data by creating tags identifying: the type of document as a <memo>; its content divisions, a <header> and a <memotext>; when it was sent, <date>; an addressing scheme with two types of actions, <to> and <cc>; the sender of the message as <sender>; the name of the recipient as <name>; the text of the memo, <memotext>; and the signature as an entity called &sig;

  12. Here’s the same document as an XML file:
      <?xml version="1.0" standalone="no"?>
      <!DOCTYPE memo SYSTEM "http://www.site.com/dtds/memo.dtd">
      <memo>
        <header type="informative">
          <date>04.10.01</date>
          <to> To: <name>John Doe</name> </to>
          <cc> CC: <name> Jane Doe </name> </cc>
          <from> From: <sender>Bozo T. Clown</sender> </from>
        </header>
        <memotext> Please take note our phone number has changed. &sig; </memotext>
      </memo>

  13. Rules for writing XML. There must be a “root element.” Documents must be “well formed”: elements must be properly nested. If a DTD is used, documents must be “valid”: markup on the document must conform to the DTD. Every tag must be closed. Empty tags are closed with a slash: <picture /> XML is case sensitive. All attribute values must be in quotation marks. All entity references (other than the five predefined ones, such as &amp; and &lt;) must be declared in a DTD before being used in a document.
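
As a rough illustration of the well-formedness rules above, the sketch below feeds a few small documents to the XML parser in Python's standard library and records which ones it accepts. The sample strings and the helper name is_well_formed are made up for this example, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule from the slide.
GOOD = '<memo><header type="informative"><date>04.10.01</date></header><picture /></memo>'
BAD_NESTING = "<memo><header><date>04.10.01</header></date></memo>"  # improperly nested
BAD_CASE = "<memo><Date>04.10.01</date></memo>"                      # case mismatch
BAD_QUOTES = "<memo><header type=informative></header></memo>"       # unquoted attribute value
BAD_UNCLOSED = "<memo><date>04.10.01</memo>"                         # tag never closed
TWO_ROOTS = "<memo></memo><memo></memo>"                             # more than one root element

for label, doc in [("good", GOOD), ("nesting", BAD_NESTING), ("case", BAD_CASE),
                   ("quotes", BAD_QUOTES), ("unclosed", BAD_UNCLOSED), ("two roots", TWO_ROOTS)]:
    print(label, is_well_formed(doc))
```

Only the first document should be accepted; each of the others breaks exactly one rule from the list.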

  14. 2. A Document Type Definition (DTD). It is a set of rules that defines the elements, attributes, entities, and other constructs that can be used in XML files. It determines how they can be used and specifies how they are logically related. Elements in a DTD are hierarchical and nested. DTDs can be internal (within the document) or external (.dtd extension). For the XML document to be “valid,” it must conform to the rules laid out in the DTD to which it is linked.

  15. DTDs have elements. These are the basic tags used in the markup. One must be a “root element,” the most inclusive container; all other elements are nested within it. An element can be defined by using other elements. It can also be defined as containing text (#PCDATA). The sequence determines the nesting. Elements that the DTD declares as required must appear in the document. There is special markup that allows choice.

  16. The generic form of an element is: <!ELEMENT element_name rule> The “rule” is the “content model” of the element. It specifies the nested elements used to define the main element and the order in which they must appear. In our example the root element is <memo>. It is defined in terms of <header> and <memotext>, and is written as: <!ELEMENT memo (header, memotext)>

  17. DTDs have attributes. These contain additional information associated with the element. The information is a form of metadata: it is “about” the element rather than part of the element. They are useful for enumerated data (ex: product id #). There is a small predefined set of attributes that can be used. Attributes and their values appear in the opening tag of a paired tag (or in the unpaired tag).

  18. The generic form of an attribute is: <!ATTLIST element_name attribute_name attribute_type default_value attribute_name attribute_type default_value attribute_name attribute_type default_value> The element name is required because attributes must be attached to elements. There is a set of attribute types that can be used to specify categories of content, for example: CDATA, character data (anything except markup); ID, a unique value (it can appear only once in a document); NOTATION, which names a declared notation (for example, how to open a binary file).

  19. In our example there is an attribute called “type” that is placed in the opening <header> tag. The value is “informative”. Assume this is one of several types of memos that could be sent. In a DTD, it might look like this: <!ATTLIST header type (informative | directive | scheduling) #IMPLIED> The “|” (pipe) is a separator. It sets a condition where only one value from the list may appear in the document markup.
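
To make the enumerated attribute concrete, here is a minimal sketch that reads the type attribute from the memo's <header> element and checks it against the three values listed on the slide. A validating parser would enforce this from the DTD itself; the set ALLOWED_TYPES and the function name memo_type_is_allowed are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The three values from the slide's enumeration (informative | directive | scheduling).
ALLOWED_TYPES = {"informative", "directive", "scheduling"}

def memo_type_is_allowed(xml_text):
    """Return True if <header type="..."> carries one of the enumerated values."""
    header = ET.fromstring(xml_text).find("header")
    return header is not None and header.get("type") in ALLOWED_TYPES

print(memo_type_is_allowed('<memo><header type="informative" /></memo>'))
print(memo_type_is_allowed('<memo><header type="urgent" /></memo>'))
```

In this sketch a missing attribute is treated as not allowed, which is stricter than a DTD declaration that makes the attribute optional.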

  20. Entities provide a type of shorthand in XML markup. They stand for text or other markup, which is substituted wherever the entity is used in the DTD or document. General entities place data into the document. Internal means that they are used only within the document. External means that they are in an external DTD and can be reused. Parameter entities are used in the DTD. They can refer to another element or group of elements and can be reused in the same or different DTDs.

  21. The entity has the generic form: <!ENTITY entity_name "text string"> In the example, it appears in the DTD as: <!ENTITY sig "Yours in clownitude, Bozo"> In our example, we represented a text string with an entity: “Yours in clownitude, Bozo” was represented in the document with &sig; The entity is expanded when the document is parsed. This is a convenient way to include large blocks of text that only have to be entered once.
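
The expansion step can be observed with Python's standard library parser, which replaces internal general entities declared in a document's internal DTD subset. This is only a sketch: the memo here is trimmed, and the entity is declared inline rather than in the external memo.dtd used in the slides.

```python
import xml.etree.ElementTree as ET

# Trimmed memo with the entity declared in an internal DTD subset (an assumption
# made so the example is self-contained; the slides use an external DTD).
MEMO = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY sig "Yours in clownitude, Bozo">
]>
<memo><memotext>Please take note our phone number has changed. &sig;</memotext></memo>"""

expanded = ET.fromstring(MEMO).find("memotext").text
print(expanded)

# Without a declaration, the same reference is rejected by the parser.
try:
    ET.fromstring("<memo><memotext>&sig;</memotext></memo>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```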

  22. Here’s what a DTD (memo.dtd) would look like for this memo:
      <!ENTITY sig "Yours in clownitude, Bozo">
      <!ELEMENT memo (header, memotext)>
      <!ELEMENT header (date, to, cc?, from)>
      <!ATTLIST header type (informative | directive | scheduling) #IMPLIED>
      <!ELEMENT date (#PCDATA)>
      <!ELEMENT to (name+)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT cc (name*)>
      <!ELEMENT from (sender+)>
      <!ELEMENT sender (#PCDATA)>
      <!ELEMENT memotext (#PCDATA)>
      + = must appear once or many times
      ? = may be omitted or can appear once
      * = may be omitted or can appear many times
      | = one or the other but only one may appear
      #PCDATA = text
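
One way to get a feel for the occurrence indicators is to notice that a simple content model behaves like a regular expression over the names of an element's children. The sketch below converts a flat, comma-separated model such as (date, to, cc?, from) into a regex and tests a list of child elements against it. It is not a DTD validator: it ignores choice groups, nested groups, #PCDATA and attributes, and the function names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Turn a flat model like "(date, to, cc?, from)" into a compiled regex.

    Each child name becomes the group "(?:name )", followed by its ?, * or +
    indicator, so the regex runs over space-terminated child names.
    """
    items = [item.strip() for item in model.strip().strip("()").split(",")]
    pieces = []
    for item in items:
        indicator = item[-1] if item[-1] in "?*+" else ""
        name = item.rstrip("?*+")
        pieces.append("(?:" + re.escape(name) + " )" + indicator)
    return re.compile("".join(pieces))

def children_match(element, model):
    """Check the sequence of an element's child tags against a flat content model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None

HEADER_MODEL = "(date, to, cc?, from)"
with_cc = ET.fromstring("<header><date/><to/><cc/><from/></header>")
without_cc = ET.fromstring("<header><date/><to/><from/></header>")
two_cc = ET.fromstring("<header><date/><to/><cc/><cc/><from/></header>")

print(children_match(with_cc, HEADER_MODEL))
print(children_match(without_cc, HEADER_MODEL))
print(children_match(two_cc, HEADER_MODEL))
```

The first two headers satisfy the model because cc? allows zero or one occurrence; the third does not because cc appears twice.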

  23. Schemas. XML Schemas are an alternative to DTDs. DTDs are “global,” so an element can only be defined once; this is a problem if the element is used differently in two different contexts. Schemas allow global (the same everywhere) and local (differ in different contexts) elements. DTDs cannot specify the data type of an element; schemas can. DTDs are not written in XML; schemas are.

  24. Schemas divide content into two types. Simple types: these contain only text (a name, integer, date…); in DTDs they are represented by the content model #PCDATA. Complex types: these elements define the structure of the document. Some will contain other elements, some will contain elements and text, some will contain only text, and some will be empty.

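  25. Here is the memo DTD as a schema:
      <?xml version="1.0"?>
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="memo">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="header">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="date" type="xsd:date"/>
                    <xsd:element name="to">
                      <xsd:complexType>
                        <xsd:sequence> <xsd:element ref="name" maxOccurs="unbounded"/> </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                    <xsd:element name="cc" minOccurs="0">
                      <xsd:complexType>
                        <xsd:sequence> <xsd:element ref="name" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                    <xsd:element name="from">
                      <xsd:complexType>
                        <xsd:sequence> <xsd:element name="sender" type="xsd:string" maxOccurs="unbounded"/> </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                  <xsd:attribute name="type">
                    <xsd:simpleType>
                      <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="informative"/>
                        <xsd:enumeration value="directive"/>
                        <xsd:enumeration value="scheduling"/>
                      </xsd:restriction>
                    </xsd:simpleType>
                  </xsd:attribute>
                </xsd:complexType>
              </xsd:element>
              <xsd:element name="memotext" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:schema>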

  26. Here is how the memo calls the schema:
      <?xml version="1.0"?>
      <memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="/xml/ns/memo.xsd">
        <header type="informative">
          <date>04.10.01</date>
          <to> To: <name>John Doe</name> </to>
          <cc> CC: <name> Jane Doe </name> </cc>
          <from> From: <sender>Bozo T. Clown</sender> </from>
        </header>
        <memotext> Please take note our phone number has changed. Yours in clownitude, Bozo </memotext>
      </memo>

  27. 3. An XSL stylesheet. This file contains transformation rules that determine how the components of an XML file will be rendered and displayed in a range of formats (.xsl extension). With XSL-FO, specific formatting or style rules can be applied to specific components of a DTD. This language is not supported by any browsers yet. With XSLT, a transformation process can be specified to convert XML documents into other formats (HTML, RTF, LaTeX, text). This can be used … An XSL stylesheet is also an XML document and must be "well formed."

  28. The process begins with an XML document and an XSLT style sheet. The XSLT parser translates both into trees: the XML document is the source tree and the XSLT style sheet is the style tree. Trees consist of nodes: the root node, element nodes, text nodes, attribute nodes, processing instruction nodes, and namespace nodes. The XSLT processor uses these trees to create a result tree, which becomes the final or result document.

  29. XML memo as a source tree:
      memo
        header
          date: #PCDATA
          to
            name: #PCDATA
          cc
            name: #PCDATA
          from
            sender: #PCDATA
        memotext: #PCDATA
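
The source tree can also be produced programmatically. The sketch below parses a compact copy of the memo with Python's standard library and walks the element and text nodes recursively, returning one indented line per node. The function name outline and the "#text" label are choices made for this illustration; attribute, namespace and processing-instruction nodes are left out.

```python
import xml.etree.ElementTree as ET

# Compact copy of the memo example (no whitespace between tags, no entity).
MEMO = ("<memo><header type='informative'><date>04.10.01</date>"
        "<to><name>John Doe</name></to><cc><name>Jane Doe</name></cc>"
        "<from><sender>Bozo T. Clown</sender></from></header>"
        "<memotext>Please take note our phone number has changed.</memotext></memo>")

def outline(element, depth=0):
    """Return indented lines: one per element node, plus one per non-empty text node."""
    lines = ["  " * depth + element.tag]
    if element.text and element.text.strip():
        lines.append("  " * (depth + 1) + "#text: " + element.text.strip())
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(MEMO))
print("\n".join(tree_lines))
```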

  30. And here’s what the XSL stylesheet might look like:
      <?xml version="1.0"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <xsl:template match="/">
          <html>
            <head> <title>Memo form</title> </head>
            <body>
              <xsl:apply-templates select="memo/header" />
              <p><xsl:value-of select="memo/memotext" /></p>
            </body>
          </html>
        </xsl:template>
        <xsl:template match="header">
          <b><xsl:value-of select="date" /></b><br />
          <b><xsl:value-of select="to/name" /></b><br />
          <b><xsl:value-of select="cc/name" /></b><br />
          <b><xsl:value-of select="from/sender" /></b>
        </xsl:template>
      </xsl:stylesheet>
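
Python's standard library has no XSLT processor, so the following is only a hand-written stand-in for what such a stylesheet does: it reads the memo's source tree and emits HTML close to the memo form shown on slide 10. The function names text_at and memo_to_html are hypothetical, and a real deployment would hand the stylesheet to an XSLT processor instead.

```python
import xml.etree.ElementTree as ET
from html import escape

MEMO = """<memo>
  <header type="informative">
    <date>04.10.01</date>
    <to><name>John Doe</name></to>
    <cc><name>Jane Doe</name></cc>
    <from><sender>Bozo T. Clown</sender></from>
  </header>
  <memotext>Please take note our phone number has changed.</memotext>
</memo>"""

def text_at(node, path):
    """Text of the first match for path, HTML-escaped; empty string if absent."""
    return escape((node.findtext(path) or "").strip())

def memo_to_html(xml_text):
    """Build the HTML memo form from the memo's source tree (a manual 'transformation')."""
    memo = ET.fromstring(xml_text)
    header = memo.find("header")
    parts = [
        "<html><head><title>Memo form</title></head><body>",
        "<b>%s</b><br />" % text_at(header, "date"),
        "<b>TO:</b> %s<br />" % text_at(header, "to/name"),
        "<b>CC:</b> %s<br />" % text_at(header, "cc/name"),
        "<b>FROM:</b> %s<br />" % text_at(header, "from/sender"),
        "<p>%s</p>" % text_at(memo, "memotext"),
        "</body></html>",
    ]
    return "\n".join(parts)

html_page = memo_to_html(MEMO)
print(html_page)
```

The same source tree could be fed to a different function to produce plain text or another format, which is the point of keeping the data separate from its presentation.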

  31. There are other components of XML that greatly extend its power and flexibility. XPath: this is a syntax that locates nodes in the hierarchical structure of an XML document. It is used in XSLT: <xsl:template match="node_name"> This specifies the current node. It uses patterns, which can be repeated throughout the document. It also uses expressions, which are context specific. This syntax is a sophisticated shorthand to use when writing processing instructions.
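
Path expressions can be tried out with Python's standard library, whose ElementTree module accepts a limited subset of XPath (child paths, the descendant shortcut ".//", the "*" wildcard, and simple attribute predicates). The memo string below is a trimmed copy of the example, and the variable names are chosen just for this sketch.

```python
import xml.etree.ElementTree as ET

MEMO = """<memo>
  <header type="informative">
    <date>04.10.01</date>
    <to><name>John Doe</name></to>
    <cc><name>Jane Doe</name></cc>
    <from><sender>Bozo T. Clown</sender></from>
  </header>
  <memotext>Please take note our phone number has changed.</memotext>
</memo>"""

root = ET.fromstring(MEMO)  # the <memo> element is the context node for the paths below

recipient = root.findtext("header/to/name")                  # child path, step by step
all_names = [node.text for node in root.findall(".//name")]  # every <name> descendant
informative = root.find("header[@type='informative']")       # attribute predicate
header_children = [child.tag for child in root.findall("header/*")]  # wildcard step

print(recipient, all_names, header_children)
```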

  32. XLink: this is the XML Linking Language. It allows more complex types of linking. Here’s a simple link: <logo xlink:type="simple" xlink:href="../images/logo.gif" xlink:role="image" xlink:title="logo" xlink:show="embed" xlink:actuate="onLoad" /> Other values for xlink:show include “replace” and “new”. XLink defines “linksets” or extended links: a set of files can be connected through a chain of links moving from the first to the last file in the linkset.

  33. XPointer: this is a syntax for linking to specific locations within XML documents. It uses XPath expressions to define the locations: #xpointer(element_name[position()=1]) This is appended to the end of a URL in an XLink expression. XForms: this is an XML application that is going to be used someday to allow more complex forms to be created in XHTML.

  34. I. What is XML?
      • XML and HTML
      • Where does it fit in with other markup languages?
      II. How does it work?
      • Your own private language
      • DTDs and schemas
      • XSLT: Extensible style sheet transformation language
      • XPath, XLink, XPointer, XForms
      III. How will it change the web?
      • Examples of XML applications

  35. III. How will it change the web? XML has interesting potential to change a portion of the web. It is expected to move us closer to “write once, display anywhere” (XSLT). It will be an important component of the “semantic web.” Search engines that can process XML should be much more precise and return more relevant results. It can improve business processes, particularly if professions develop their own markup languages.

  36. Examples of XML applications. Resource Description Framework (RDF): this is a framework that allows the description and interchange of metadata. Because it is designed to be platform independent, it becomes a hub for metadata activity. RDF provides a model for metadata, and a syntax so that independent parties can exchange it and use it. RDF makes it possible to use multiple pieces of software to process the same metadata. It also allows a single piece of software to process (at least in part) many different metadata vocabularies.

  37. Extensible Hypertext Markup Language (XHTML); Synchronized Multimedia Integration Language (SMIL); Math Markup Language (MathML); Chemical Markup Language (CheML); Commerce Markup Language (CML); Electronic Business XML (ebXML); National Library of Medicine XML data formats; Electronic Component Information Exchange (ECIX); Geography Markup Language (GML); Research Information Exchange Markup Language (RIXML); MARC to XML conversion.
