
XML

XML. D Nathan. Intro and formalism. Roots: a computer is not a typewriter. Electronic texts are more than sequences of characters: they have structure and context, and they also have multiple readings. Markup provides a means of making structure, context and readings explicit.


Presentation Transcript


  1. XML D Nathan Intro and formalism

  2. Roots • A computer is not a typewriter • electronic texts are more than sequences of characters • they have structure, and context • they also have multiple readings • Markup provides a means of making structure, context and readings explicit • only that which is symbolically explicit can be digitally processed • digital processing is about more than reproducing paper

  3. Textual ontologies • As annotations, markup adds value to data • Facilitate multiple readings and multiple usages • different contexts • different formats • different audiences • different purposes • There’s more: texts can not only be read but also analysed and manipulated

  4. What is markup, again? • A way of naming and identifying the parts of a document in a sharable and consistent way • A way of making explicit the distinctions we want a computer to make when it processes a sequence of characters • Making the document “machine readable” (computers can read and process it as if they understand it)

  5. ... and again? • “A set of codes that tell an agent how to interpret, process or display content” • Thus, it’s usually more useful to mark up what things really are than what they look like

  6. Example
     James Bond
     1007 Fast Drive
     Aston Martin
     420HP DB5
     01865 007 080
     25/10/06
     Dear Mr Khazakstanspy
     It is with some regret that ....
     • What is “25/10/06”? • How do we know? • What does the software know?
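The question on this slide can be made concrete with a small sketch. It assumes three candidate date conventions (the labels and the `possible_readings` helper are illustrative, not from the slides) and asks Python which of them yield a valid calendar date for the same eight characters.

```python
from datetime import datetime

raw = "25/10/06"

# Hypothetical candidate readings of the same character string.
candidate_formats = {
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
    "month/day/year": "%m/%d/%y",
}


def possible_readings(text):
    """Return every reading of `text` that is a valid calendar date."""
    readings = {}
    for label, fmt in candidate_formats.items():
        try:
            readings[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass  # not a valid date under this convention
    return readings


readings = possible_readings(raw)
print(readings)
```

More than one reading survives, so the software cannot know which date is meant unless the text says so explicitly. That is the gap markup is meant to fill.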

  7. Design principles • XML came out of SGML, a system for incremental and collaborative “enrichment” of texts • XML design principles:
     1. XML shall be straightforwardly usable over the Internet.
     2. XML shall support a wide variety of applications.
     3. XML shall be compatible with SGML.
     4. It shall be easy to write programs which process XML documents.
     5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
     6. XML documents should be human-legible and reasonably clear.
     7. The XML design should be prepared quickly.
     8. The design of XML shall be formal and concise.
     9. XML documents shall be easy to create.
     10. Terseness is of minimal importance.

  8. XML • eXtensible Markup Language: a generic markup language • Simplifies the representation of structured data as linear character strings, i.e. a document can be thought of as a stream of text and/or as a (tree) structure • XML looks like HTML, except that it: • is extensible • must be well-formed • can be validated • is application-, platform-, and vendor-independent
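The "stream and/or tree" view can be illustrated with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch; the sample sentence and the `<s>` wrapper element are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

text = "<s>the <noun>dog</noun> sat</s>"

# Stream view: the parser reports tags one by one as it reads the characters.
events = [
    (event, elem.tag)
    for event, elem in ET.iterparse(io.BytesIO(text.encode("utf-8")),
                                    events=("start", "end"))
]

# Tree view: the same characters read into a hierarchy of elements.
root = ET.fromstring(text)
child_tags = [child.tag for child in root]

print(events)
print(root.tag, child_tags)
```

Both views come from the same plain-text string; which one is more convenient depends on the processing task.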

  9. XML landscape (diagram) • Grammars: SGML, DTD, XML Schema • Markup languages: HTML, and the XML languages XHTML, MathML, SMIL, IXF, SVG, CBML • Related technologies: CSS, XSL:FO (layout); XPath, XQuery (navigate, query); XSLT (transform); XLink (link)

  10. XML Formalism • Create explicit formal structures using only plain text • structures are defined by tags in angle brackets, e.g. <noun> • tags are usually in pairs: • a start/open tag, and an end/close tag: the <noun> dog </noun> chased ... • but a tag can also be single and self-closing: the dog <pause /> sat down
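As a rough illustration of paired and self-closing tags, the slide's two fragments can be combined and read with Python's standard `xml.etree.ElementTree`. The wrapper element `<s>` is an assumption added here, because a parser needs a single enclosing element.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <s> around the slide's fragments.
doc = "<s>the <noun> dog </noun> <pause /> sat down</s>"
root = ET.fromstring(doc)

for child in root:
    # .text is the content between the start and end tag;
    # .tail is the text that follows the element's end tag.
    print(child.tag, repr(child.text), repr(child.tail))

tags = [child.tag for child in root]
```

The paired `<noun>` tag carries content, while the self-closing `<pause />` marks a point in the text and has no content of its own.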

  11. Elements, tags and content • Elements • Tags (opening, closing, empty) • Content • <a></a> is not an empty tag; it is an opening and closing tag pair with no content (an XML parser treats it the same as <a />)
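A quick check, sketched with Python's standard `xml.etree.ElementTree`, suggests that a parser does not distinguish the two ways of writing an element with no content.

```python
import xml.etree.ElementTree as ET

pair = ET.fromstring("<a></a>")   # opening tag followed by closing tag
single = ET.fromstring("<a />")   # one self-closing tag

# Both elements have no text and no children.
both_without_content = (
    pair.text is None and single.text is None
    and len(pair) == 0 and len(single) == 0
)

# Serialising them again gives identical output.
same_serialisation = ET.tostring(pair) == ET.tostring(single)
print(both_without_content, same_serialisation)
```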

  12. Attributes and values • Tags can have attributes with values: the <noun num="1"> dog </noun> sat down • Attribute names within an element are unique • Order of attribute/value pairs is insignificant: the <noun num="1" cl="anim"> dog </noun> sat = the <noun cl="anim" num="1"> dog </noun> sat • Often attribute values have to be drawn from a closed set, e.g. consider: <dog breed="corgi" color="noun"> Fifi </dog> ?
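The three points about attributes can be sketched in Python with the standard `xml.etree.ElementTree`. The set `ALLOWED_COLORS` is a hypothetical closed set invented for this example; in practice such constraints are normally declared in a DTD or schema rather than in program code.

```python
import xml.etree.ElementTree as ET

# Order of attribute/value pairs is insignificant.
a = ET.fromstring('<noun num="1" cl="anim"> dog </noun>')
b = ET.fromstring('<noun cl="anim" num="1"> dog </noun>')
same_attributes = a.attrib == b.attrib

# Attribute names within an element must be unique.
try:
    ET.fromstring('<noun num="1" num="2"> dog </noun>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

# Hypothetical closed set of values for the color attribute.
ALLOWED_COLORS = {"black", "brown", "tan"}
dog = ET.fromstring('<dog breed="corgi" color="noun"> Fifi </dog>')
color_is_valid = dog.get("color") in ALLOWED_COLORS

print(same_attributes, duplicate_accepted, color_is_valid)
```

Note that the `<dog>` element is well-formed even though its color value makes no sense. Catching that kind of error is a job for validation, not for the basic syntax rules.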

  13. Names • You can name your elements, attributes or values (almost) anything, but ... • Names should begin with a letter (“a-z”) or “_”, not a digit, and cannot contain spaces

  14. Characters • XML must be ASCII or Unicode • XML is case sensitive; in general use lower case • Reserved characters: <, >, &, "

  15. Character entity references • “Stand in” for reserved characters • e.g. &lt; • Provide standardised references • e.g. &t-pal; • Provide “short cuts” for strings • e.g. &n; • Have to be declared (apart from the five predefined ones: &lt; &gt; &amp; &quot; &apos;), but can be created to purpose
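A short sketch of both uses of entity references, using only Python's standard library. The entity name `n` comes from the slide; its replacement text "noun" and the wrapper element `<s>` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Standing in for reserved characters: escape() rewrites &, < and >.
escaped = escape("cat < dog & fish")

# A "short cut" entity declared in the document's own DTD subset.
# The replacement text "noun" is a made-up value for this example.
doc = '<!DOCTYPE s [<!ENTITY n "noun">]><s>a &n; and &lt;tag&gt;</s>'
expanded = ET.fromstring(doc).text

# Using the same entity without declaring it is an error.
try:
    ET.fromstring("<s>a &n;</s>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(escaped)
print(expanded)
print(undeclared_accepted)
```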

  16. Syntax • Nesting (hierarchy), but no overlap • well-formed: <a>the<b><c>cat</c> sat</b> on the mat</a> • not well-formed (<b> and <c> overlap): <a>the<b><c>cat</b> sat</c> on the mat</a>

  17. More syntax • All elements must be closed • All attributes must have values; values must be enclosed in plain (straight) quotes, normally double quotes • There are no size or number limits
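The syntax rules on the last two slides can be tried out with a small well-formedness check. This is a sketch using Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the sample labels are invented for the example, and the first two samples are the slide's own nesting examples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "nested": "<a>the<b><c>cat</c> sat</b> on the mat</a>",
    "overlapping": "<a>the<b><c>cat</b> sat</c> on the mat</a>",
    "unclosed element": "<a>the<b>cat</a>",
    "unquoted value": "<a n=1>cat</a>",
    "attribute without value": "<a n>cat</a>",
    "single quotes": "<a n='1'>cat</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

The last sample shows that the parser also accepts plain single quotes around attribute values, although double quotes are the more common convention.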

  18. The XML document • A plain text file • Main parts: prolog, body • Body has a single root node (= element) • Comments: <!-- this comment may be ignored --> • Processing instructions (PI) • This (optional) special PI-like line is called the XML declaration: <?xml version="1.0" ?> • Document type declaration: <!DOCTYPE IXF SYSTEM "IXF.DTD" [<!ENTITY LEXFILE "..\DXF\PaakaDraft.xml">]>
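To see the parts of a document as a parser reports them, here is a minimal sketch using Python's standard `xml.dom.minidom`. The sample document is made up for the example, and the processing instruction target `hypothetical-app` is not a real application.

```python
from xml.dom import minidom

text = """<?xml version="1.0" ?>
<!-- this comment may be ignored -->
<?hypothetical-app do-something?>
<story><p>text</p></story>"""

dom = minidom.parseString(text)

# Top-level nodes of the document: prolog items, then the single root element.
top_level = [node.nodeName for node in dom.childNodes]
root_name = dom.documentElement.tagName

print(top_level)
print(root_name)
```

The XML declaration itself does not show up as a node: it is information for the parser, not part of the document's content.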

  19. XML document layout • Is unimportant! • ... in most circumstances, but some applications might treat the white space differently

  20. This is the same as ...
      <panel n="3">
        <panelDescription characters="cap" />
        <caption>
          <paragraph>
            Before the hammer descends on cap, his shield
            <emphasis style="bold"> demolishes </emphasis>
            the evil mechanism!
          </paragraph>
        </caption>
        <soundEffect> KRAK! </soundEffect>
      </panel>
      <panel n="4">
        <panelDescription characters="cap anon_man" />
        <caption>
          <paragraph>
            The screaming suddenly
            <emphasis style="bold"> stops-- </emphasis>
            and, in the ensuing silence,
            <emphasis style="bold"> both </emphasis>
            men sink
            <emphasis style="bold"> slowly </emphasis>
            to the ground...
          </paragraph>
        </caption>
      </panel>

  21. ... this! <panel n="3"> <panelDescription characters="cap" /> <caption> <paragraph> Before the hammer descends on cap, his shield <emphasis style="bold"> demolishes </emphasis> the evil mechanism! </paragraph> </caption> <soundEffect> KRAK! </soundEffect> </panel> <panel n="4"> <panelDescription characters="cap anon_man" /> <caption> <paragraph> The screaming suddenly <emphasis style="bold"> stops-- </emphasis> and, in the ensuing silence, <emphasis style="bold"> both </emphasis> men sink <emphasis style="bold"> slowly </emphasis> to the ground... </paragraph> </caption> </panel>
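The claim that layout is unimportant, and the caveat that some applications treat white space differently, can both be seen in a short sketch. It uses Python's standard `xml.etree.ElementTree` on a cut-down version of the panel example; the `normalised` helper is an assumption of this sketch, one possible way of comparing documents while ignoring layout.

```python
import xml.etree.ElementTree as ET

compact = '<panel n="3"><soundEffect> KRAK! </soundEffect></panel>'
indented = """<panel n="3">
    <soundEffect>
        KRAK!
    </soundEffect>
</panel>"""


def normalised(elem):
    """Describe an element with all runs of white space collapsed."""
    def squash(s):
        return " ".join((s or "").split())
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        squash(elem.text),
        squash(elem.tail),
        [normalised(child) for child in elem],
    )


tree_a = ET.fromstring(compact)
tree_b = ET.fromstring(indented)

# Same structure once white space is normalised ...
same_structure = normalised(tree_a) == normalised(tree_b)
# ... but the raw text content still differs character for character.
raw_text_differs = (
    tree_a.find("soundEffect").text != tree_b.find("soundEffect").text
)
print(same_structure, raw_text_differs)
```

An application that compares raw text would see two different documents, which is why the slide adds "in most circumstances".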

  22. Putting it together

  23. ... in XML <story> <metaDataField>(The Guardian, </metaDataField> <metaDataField>July 1, 1997, </metaDataField> <metaDataField>Andrew Higgins in Hong Kong) </metaDataField> <headLine>A last hurrah and an empire closes down </headLine> <p>With a clenched-jaw nod from the Prince of Wales, a last rendition of <title>God Save the Queen</title>, and a wind machine to keep the Union flag flying for a final 16 minutes of indoor pomp...</p> </story>
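Once the story is marked up, its parts can be pulled out by name. The following sketch uses Python's standard `xml.etree.ElementTree` on the slide's example; the clean-up of punctuation in the metadata fields is an assumption added for readability.

```python
import xml.etree.ElementTree as ET

story = ET.fromstring(
    "<story>"
    "<metaDataField>(The Guardian, </metaDataField>"
    "<metaDataField>July 1, 1997, </metaDataField>"
    "<metaDataField>Andrew Higgins in Hong Kong) </metaDataField>"
    "<headLine>A last hurrah and an empire closes down </headLine>"
    "<p>With a clenched-jaw nod from the Prince of Wales, a last rendition of "
    "<title>God Save the Queen</title>, and a wind machine to keep the Union "
    "flag flying for a final 16 minutes of indoor pomp...</p>"
    "</story>"
)

headline = story.findtext("headLine").strip()
# Strip the brackets, commas and spaces that surround each metadata value.
metadata = [field.text.strip(" (),") for field in story.findall("metaDataField")]
song_title = story.find("p/title").text

print(headline)
print(metadata)
print(song_title)
```

Because all three metadata fields share one generic element name, the software still cannot tell the newspaper from the date or the author. More specific element names would make that distinction explicit.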

  24. XML-capable software (other than displaying “raw” XML) • most browsers • including XML, CSS, XSLT • software using XML-based data formats • e.g. Transcriber • may keep XML hidden but you can often manipulate it • software that exports data in some XML format • e.g. MS Excel, Toolbox, Filemaker Pro • dedicated XML editing software • e.g. oXygen
