Putting XML to Work Henry S. Thompson HCRC Language Technology Group University of Edinburgh
When you see this marker (here: handbook p. 2), it means there’s accompanying information in the Additional Materials handbook
Overview of the tutorial
• What is an XML application?
  • Content, Form, Function
• Namespaces
  • Ownership of names
• XSL(T)
  • Style for XML
• DOM
  • Standard abstract API for XML
• XML Schema
  • Specifying the structure of document families
• RDF
  • Defining and using Data Models
What is an XML application?
• Putting XML to work means designing an XML application
• SGML defines an application as having
  • A syntax: what do all the documents involved in this application share in terms of structure == markup?
  • A semantics: what do the components of that markup mean?
• You already know the basic story about defining a syntax
  • You can use English (or French or . . .)
  • You should use a DTD
  • Or even better a Schema
An aside about structure
• Formal definitions of document structure are a kind of contract
  • The user undertakes to structure his/her documents per the DTD
  • The application undertakes to process documents which conform to the DTD
• As is the case with real contracts, both sides benefit by using them
  • Users know what they have to do to get the results they want
  • Developers can depend on parsers for a lot
The W3C
• A word from our sponsors
• The W3C is responsible for all the XML family
• The W3C is the World Wide Web Consortium, a voluntary association of companies and non-profit organisations
• Membership costs serious money and confers voting rights
• Procedures are complex, with the Director (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power
. . . and its WGs
• The XML recommendation was written by the W3C’s XML Working Group
• Which split itself into pieces, each of which handles a part of the ongoing work
  • Core WG (XML itself, Namespaces, Infoset)
  • Schema WG (XML Schema)
  • Linking WG (XLink, XPointer)
  • Query WG (XML Query)
  • Protocols WG (XML Protocols)
  • XSL WG (XSLT, XSL-FO)
XML Namespaces
Namespaces for XML
• Where did those colons come from?
  • xsl:this, fo:that, xml:the_other
• Two communities pushed for namespaces
  • Vendors, to manage the composition of document fragments
    • E.g. the inclusion of mathematical formulae in a document
  • Working groups, to reserve names without compromising users' freedom to name things
    • E.g. it wouldn't do for XML-link to reserve LINK for simple links, or XSL to reserve TEXT
Namespaces, cont'd (handbook p. 4)
• A W3C Recommendation was endorsed in January 1999
• There was a lot of vendor pressure to get something in place, which caused political tension and at least one resignation from the WG
• The example illustrates how namespaces are declared, scoped and used
Namespaces defined
• You can use prefixed names, consisting of two simple names separated by a colon (:)
• The namespace prefix is an abbreviation for a URI which uniquely identifies the owner/meaning/identity of the source of the name
• Using a namespace essentially cedes responsibility for the meaning of the qualified names to the owner of the URI
Declaring a namespace
• The association between namespace prefixes and URIs is declared using reserved attributes
    <doc xmlns:mml='http://www.w3.org/TR/REC-MathML/'>...</doc>
• Anywhere inside the above doc element mml is a legal namespace prefix, standing for the URI given
• There is also a mechanism for defining the default (unprefixed) namespace
• Declarations are scoped
• Qualified names can be used for
  • Element type names
  • Attribute names
Namespace limitations
• An add-on for, not a rewrite of, the XML spec
• Validation is unchanged
  • DTD declarations must match instances character by character, prefix included
  • Indeed there's no place for associating prefixes with URIs in DTDs
  • There is no provision for merging DTDs
• The rules are confusing
  • Unprefixed attributes are never qualified
  • Unprefixed elements are qualified if and only if there is a default namespace declaration in scope
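The declaration, scoping and defaulting rules above can be checked with a short Python sketch using the standard library's xml.etree.ElementTree, which expands every qualified name to the form {URI}local. The default namespace URI urn:example:doc, the element names and the mml:fontstyle attribute are made up for illustration; only the MathML URI comes from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the default namespace URI, the element names and
# the mml:fontstyle attribute are illustrative assumptions. The mml URI is
# the one declared on the "Declaring a namespace" slide.
DOC = """\
<doc xmlns='urn:example:doc'
     xmlns:mml='http://www.w3.org/TR/REC-MathML/'>
  <para id='p1'>Unprefixed element, unprefixed attribute</para>
  <mml:mi mml:fontstyle='italic'>x</mml:mi>
</doc>"""

root = ET.fromstring(DOC)
para, mi = root[0], root[1]

print(root.tag)     # unprefixed element: qualified by the default namespace
print(para.tag)     # the declaration is in scope for descendants as well
print(para.attrib)  # unprefixed attribute: stays unqualified
print(mi.tag)       # the prefix is replaced by the URI it abbreviates
print(mi.attrib)    # a prefixed attribute is qualified in the same way
```

Note that the prefix itself disappears after parsing: only the URI carries identity, which is why two documents can use different prefixes for the same namespace.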
From Structure to Appearance: Style for XML
Overview of the material
• Why a style language?
• Two approaches to style for XML:
  • CSS for simple cases
  • XSL for complex cases
• Hands-on Exercises
Why a style language?
• Separating form from content
• Separating structure from appearance
• Single source, multiple delivery media
Three stages on the way
• Document Compilers: ASCII text with formatting instructions and body text intermixed
  • nroff, Scribe, TeX
• WYSIWYG Word Processors: Out-of-band formatting instructions change appearance on-screen; proprietary file formats
  • Word, WordPerfect
• (Semi-)Structured Markup: Markup has either intrinsic or extrinsic rendering consequences
  • SGML, HTML, XML
Is this progress?
• The old document compilers
  • had complex procedural semantics, which made debugging and maintenance very tricky for documents of any sophistication
  • made authoring and reading tedious, with obtrusive annotations everywhere
• The use of scoped annotations in Scribe and TeX was a big improvement over nroff, but the annotations were still resolutely about appearance, not structure
• LaTeX tried to fix this, but paid an unacceptable price in terms of complexity and fragility
Is this progress?, cont'd
• The WYSIWYG systems
  • are lovely to look at, and there's no problem with obtrusive annotations
  • but even with the addition of paragraph and character styles, generalisation and consistency are hard to come by
  • and there's the built-in obsolescence of proprietary formats to worry about
SGML . . .
• SGML solved the proprietary format problem
  • It's an ISO standard (8879)
  • It's human-readable (and understandable!)
• But for a long time there was no standard way of formatting SGML documents for printing or viewing
. . . and HTML
• So HTML (nearly/post-hoc an SGML application), by mandating a rendering semantics for all its semi-structural markup, filled a real need
• But it was
  • not extensible (fixed tag set)
  • not customisable (fixed appearance per tag)
Three Problems; Three Solutions: Electronic Style! (handbook p. 9)
• Style standard for SGML?
  • DSSSL
• Customise HTML page appearance?
  • CSS
• Extend HTML tag-set and control style?
  • XML and
    • CSS
    • XSL
Style for XML, London 1998-11-25, Technology Appraisals, Henry S. Thompson
Cascading Style Sheets
• Level 1 Accepted Recommendation per W3C, December 1996
• Level 2 Accepted Recommendation, May 1998
• Addresses the problems of:
  • customising the appearance of HTML documents
  • minimal styling for XML
• Initially driven by the need for site designers to differentiate the appearance of their pages from one another
• Focus accordingly is on controlling the colour, size and shape of regions and fonts
CSS rules (handbook p. 6)
• CSS style rules associate properties with elements in your documents which match selectors
• The basic structure of a rule looks like this:
    selector[, selector ...] {pname: pvalue[; pname: pvalue ...]}
• Simple examples:
    verbatim {white-space: pre}
    H1 {text-align: center; font-variant: small-caps}
• The first would provide style for an XML document
• The second would change HTML's H1
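To make the rule structure concrete, here is a minimal Python sketch that splits one rule of the shape shown above into its selectors and its property/value pairs. The function name parse_rule is an illustrative assumption, and the sketch deliberately ignores comments, quoted strings containing braces or semicolons, and at-rules, so it is not a real CSS parser.

```python
def parse_rule(rule: str):
    """Split one simple CSS rule into (selectors, properties).

    Handles only the shape: selector[, selector ...] {pname: pvalue[; ...]}
    """
    head, _, body = rule.partition("{")
    body = body.rstrip().rstrip("}")
    # Selectors are comma-separated before the opening brace.
    selectors = [s.strip() for s in head.split(",") if s.strip()]
    properties = {}
    # Declarations are semicolon-separated; split each on its first colon
    # so that values such as url(http://...) survive intact.
    for decl in body.split(";"):
        if ":" in decl:
            name, _, value = decl.partition(":")
            properties[name.strip()] = value.strip()
    return selectors, properties


print(parse_rule("verbatim {white-space: pre}"))
print(parse_rule("H1 {text-align: center; font-variant: small-caps}"))
```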
CSS: Cascading Style Sheets
• Customising HTML
  • formatting <P> elements by means of a simple instruction:
    P {font-weight: bold; font-size: 14pt; font-family: sans-serif}
• Formatting XML
  • formatting <foobar> elements by means of a similar instruction:
    foobar {display: block; border-style: solid; background-color: green}
Associating rules with documents (handbook pp. 6, 7)
• Contents of STYLE element in the HTML header
• Destination of an appropriate LINK element
• In STYLE attributes on any HTML element
CSS selectors
• Rules can have one or more selectors, separated with commas
• Simple names select elements by name
• In addition to element type names, other selector syntax includes
  • Space-separated lists, indicating (non-immediate) ancestry
  • Qualification with period or hash, indicating class or id attribute matching
  • Qualification with colon (pseudo-classes), for link state and typographic sensitivity
CSS: Cascading Style Sheets
• Store CSS instructions in a separate style file
    <?xml version="1.0" standalone="yes"?>
    <?xml-stylesheet type="text/css" href="mystyle.css"?>
    <article>
      <title>An example</title>
      <text>
        <quote>It was the best of times, it was the worst of times</quote>,
        wrote <author>Charles Dickens</author> in <book>Tale of Two Cities</book>.
      </text>
    </article>
Using classes to get ready for XML (handbook p. 8)
• You can cheat with your HTML to make it look more like XML
    <DIV CLASS='MESSAGE'>Some text which is <SPAN CLASS='EMPH'>really</SPAN> important.</DIV>
• And use class selectors in your stylesheet
    .MESSAGE {display:block; margin-top: 6pt}
    .EMPH {display:inline; font-style:italic}
CSS selectors: Vertical context
• Sometimes you need context-sensitive selectors
• For depth-sensitive rendering
    OL {list-style-type: lower-alpha}
    OL OL {list-style-type: lower-roman}
• For context-appropriate rendering
    H1 {font-weight: bold; font-size: large}
    H2 {font-weight: bold; font-style: italic}
    H3 {font-style: italic}
    H2 EM, H3 EM {font-style: normal}
• Note that in the last rule we have two selectors, separated by commas, sharing the same result
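How a space-separated selector is tested against an element's context can be sketched in a few lines of Python. The function name matches and the representation of an element as the list of element type names from the root down to it are assumptions made for this illustration; the sketch covers type names only (no classes, ids or pseudo-classes) and compares names case-sensitively.

```python
def matches(selector: str, path: list[str]) -> bool:
    """Test a selector like "OL OL" against an element's ancestry.

    path lists element type names from the root down to the element itself.
    """
    *ancestors, target = selector.split()
    # The last name in the selector must be the element itself.
    if not path or path[-1] != target:
        return False
    # The earlier names must occur, in order, somewhere among the ancestors;
    # they need not be immediate parents. Testing membership on an iterator
    # consumes it, which gives an in-order subsequence check.
    remaining = iter(path[:-1])
    return all(name in remaining for name in ancestors)


nested_list = ["HTML", "BODY", "OL", "LI", "OL"]
top_list = ["HTML", "BODY", "OL"]
print(matches("OL", top_list))        # plain type selector
print(matches("OL OL", nested_list))  # an OL with an OL ancestor
print(matches("OL OL", top_list))     # no OL ancestor here
```

Both OL rules match the nested list; choosing between them is the job of the cascade, where the selector with more element types has the higher specificity.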
CSS boxes
• CSS and HTML 4.0 et seq. use a nested-boxes rendering model, and every block element is rendered into a box
• Boxes all have margins, borders and padding (outside in)
• All four margins and paddings (-left, -right, -top, -bottom) have width properties, and a shorthand property for setting them all together
CSS borders
• Borders, in addition to widths, have colours and styles, plus shorthand properties for various combinations
• There are also float and clear properties to allow a modest amount of displacement and flow-around
  • CSS2 goes a lot further with this
CSS box example
    P {
      margin: 3ex;
      border-width: thin;
      border-style: solid;
      border-left: double;
      text-align: justify;
      border-color: blue;
      padding: 2ex 4ex
    }
• gives the following for a sample paragraph [rendered example not included]
CSS property values
• Some are symbolic, e.g. font-style: italic
• URLs appear in a few places, e.g. background-image: url(http://www...)
• Most are
  • lengths, e.g. 3em, 2px
  • percentages, e.g. 110%
  • numbers
  • colours, e.g. red, #fd0
CSS
• In HTML you can “invent” your own tags using classes
• And you can define how they should be rendered
    <div class="newsarticle">Here is some text. It’s a news article. It mentions <span class="company">Reuters</span>.</div>

    <STYLE>
    div.newsarticle {text-align: left; font-style: italic}
    span.company {color: red}
    </STYLE>
Exercise: >> cat exa12.xml
    <?xml version="1.0" standalone="yes"?>
    <newsarticle>
      <headline>A newsarticle.</headline>
      <author>Marc Moens</author>
      <newsbody>
        Here is some text. It's a news article. It mentions
        <company>Reuters</company>. And since <company>Reuters</company>
        is a company, we would like it to come out slightly differently.
      </newsbody>
    </newsarticle>
Exercise: >> cat exa12.xml (handbook p. 7)
• Change this into an HTML file, keeping our own “invented” tags (like “headline”, “newsbody”, “company”) as classes
  • For example, <headline> becomes <div class="headline">
  • call it exa12.html
  • See slide 29 for ideas
• Put the style instructions in a separate file
  • call it exa12.css
  • See page 7 for how to connect the two files
Solution: >> exa12.html
    <HTML>
    <HEAD>
      <TITLE>A simple example</TITLE>
      <LINK rel=stylesheet type="text/css" href="exa12.css">
    </HEAD>
    <BODY>
      <DIV class="headline">A newsarticle</DIV>
      <DIV class="author">Marc Moens</DIV>
      <DIV class="newsbody">
        Here is some text. It’s a news article. It mentions
        <SPAN class="company">Technology Appraisals</SPAN>. And since
        <SPAN class="company">Technology Appraisals</SPAN> is a company,
        we would like it to come out slightly differently.
      </DIV>
    </BODY>
    </HTML>
Solution: >> exa12.css
    div.headline {text-align: center; color: blue; border-style: dashed; font-size: xx-large}
    div.author {text-align: center; color: blue; font-size: large}
    div.newsbody {text-align: left; font-style: italic}
    span.company {color: red}
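The hand conversion in this exercise can also be mechanised. The following Python sketch, using the standard library's xml.etree.ElementTree, renames each invented element to DIV or SPAN and records the old name in a class attribute. The function name to_html_classes, the BLOCK/INLINE split and the shortened sample text are assumptions for illustration; the exercise itself expects the conversion to be done by hand.

```python
import xml.etree.ElementTree as ET

# Assumed split of the invented element types into block and inline ones.
BLOCK = {"headline", "author", "newsbody"}   # become <DIV class="...">
INLINE = {"company"}                         # become <SPAN class="...">

# Shortened version of exa12.xml, for illustration only.
SOURCE = """\
<newsarticle>
<headline>A newsarticle.</headline>
<author>Marc Moens</author>
<newsbody>Here is some text. It mentions <company>Reuters</company>.</newsbody>
</newsarticle>"""


def to_html_classes(root):
    """Rename invented elements in place, keeping their names as classes."""
    for node in root.iter():
        if node.tag in BLOCK:
            node.set("class", node.tag)
            node.tag = "DIV"
        elif node.tag in INLINE:
            node.set("class", node.tag)
            node.tag = "SPAN"
    root.tag = "BODY"  # the outer wrapper becomes the HTML body
    return root


body = to_html_classes(ET.fromstring(SOURCE))
print(ET.tostring(body, encoding="unicode", method="html"))
```

The result is only the BODY part; the HEAD with its LINK to exa12.css still has to be added around it, as in the solution above.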
Style
• This was CSS as applied to HTML
• Later:
  • CSS to render XML
  • other ways of rendering XML
The 'Cascade' in CSS
• What happens when there is more than one rule which provides a value for a property on a given element?
  • The highest priority value assignment wins
• When no assignment is found, the value is either inherited or defaulted
• This explains why our original H1 example was bold
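The lookup order just described (winning assignment, else inherited value, else default) can be sketched in Python. Everything named here is an assumption for illustration: the function resolve, the numeric priorities (0 for the browser's built-in sheet, 1 for the author's), the set of inheriting properties and the default values. It also assumes, as the H1 remark suggests, that the browser's own sheet is what makes H1 bold.

```python
def resolve(prop, assignments, parent, inherited, defaults):
    """Pick the value of one property for one element.

    assignments: list of (priority, property, value) from all matching rules.
    parent: the parent element's already-resolved property values.
    """
    candidates = [
        (priority, order, value)
        for order, (priority, name, value) in enumerate(assignments)
        if name == prop
    ]
    if candidates:
        # Highest priority wins; among equals, the later assignment wins.
        return max(candidates)[2]
    if prop in inherited and prop in parent:
        return parent[prop]
    return defaults[prop]


# Illustrative data: priority 0 = browser sheet, 1 = author sheet.
h1_assignments = [
    (0, "font-weight", "bold"),         # assumed browser rule for H1
    (1, "text-align", "center"),        # author rule from the H1 example
    (1, "font-variant", "small-caps"),
]
parent = {"color": "navy", "font-weight": "normal"}
inherited = {"color", "font-weight", "font-variant"}
defaults = {"color": "black", "font-weight": "normal", "margin-top": "0"}

for prop in ("font-weight", "text-align", "color", "margin-top"):
    print(prop, "=", resolve(prop, h1_assignments, parent, inherited, defaults))
```

Because the author's H1 rule never mentions font-weight, the lower-priority browser assignment is the only candidate and wins, which is the point of the H1 remark.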
CSS priority
• A number of things contribute to determining priority
  • Origin, in increasing order of importance
    • browser
    • user
    • author
  • Specificity, in increasing order of importance
    • Number of element types
    • Number of CLASS selectors
    • Number of ID selectors
  • Importance, marked with !important
CSS cascade example
• The following are in increasing order of priority
    LI
    UL LI
    UL OL LI
    LI.special
    OL LI.special
    #hotone
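The ordering in this example can be reproduced by counting, for each selector, its ID selectors, CLASS selectors and element types, and comparing the counts in that order. The Python sketch below does this with regular expressions; the function name specificity is an illustrative assumption, and the sketch handles only type, class and id selectors (no pseudo-classes), and does not model origin or !important.

```python
import re


def specificity(selector: str) -> tuple[int, int, int]:
    """Return (ids, classes, element types) for a simple selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # An element type name sits at the start of the selector or after a space.
    types = len(re.findall(r"(?:^|\s)[A-Za-z][\w-]*", selector))
    return ids, classes, types


# The six selectors from the cascade example, in the order given there.
SELECTORS = ["LI", "UL LI", "UL OL LI", "LI.special", "OL LI.special", "#hotone"]

for sel in SELECTORS:
    print(sel, specificity(sel))

# Tuples compare left to right, so ids outweigh classes, which outweigh types.
ranked = sorted(SELECTORS, key=specificity)
print(ranked == SELECTORS)
```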
CSS for XML for real (handbook p. 9)
• In principle, it's easy
  • Just use your own element type names instead of HTML's
• In practice
  • IE and Mozilla support it
  • Functionality often insufficient for complex document types
• Style sheet linkage is via a PI
    <?xml-stylesheet type="text/css" href="…"?>
Exercise: >> exa13.xml (= exa12.xml)
    <?xml version="1.0" standalone="yes"?>
    <newsarticle>
      <headline>A newsarticle.</headline>
      <author>Marc Moens</author>
      <newsbody>
        Here is some text. It's a news article. It mentions
        <company>Technology Appraisals</company>. And since
        <company>Technology Appraisals</company> is a company,
        we would like it to come out slightly differently.
      </newsbody>
    </newsarticle>
Exercise: >> exa13.xml
• Create a CSS style sheet for exa13.xml
  • call it exa13.css
  • you can probably reuse material from exa12.css (if you didn’t do that exercise, use exa12style.css)
• Remember to use display:block or display:inline in every style rule
• Link the document to the stylesheet with
    <?xml-stylesheet type='text/css' href='exa13.css'?>
• View it
  • in Mozilla
CSS: Summary
• Easy to learn
• Also useful for HTML
• Works in most browsers
What is DSSSL?
• An ISO standard (ISO 10179:1996)
• A style language
  • How do I format my SGML documents?
• A transformation language
  • How do I transform my SGML documents?
• A hopeless acronym
  • Document Style Semantics and Specification Language
• A lost opportunity!
  • Sunk by webhead round-paren allergies
XSL: Extensible Stylesheet Language
• A style language specifically for XML
• W3C Recommendation, Nov 1999 (XSLT 1.0)
• Synthesis of the best of CSS and DSSSL
  • DSSSL processing and formatting models
  • CSS properties
• XSL is XML
  • A declarative specification of both the "pattern" and the "action" of template rules
• More generic than CSS
  • style and rendering are just a special case of more general tree transformation processes
  • can be used for other transformations (XSLT)