Life after HTML an introduction to the future of electronic publication Lou Burnard Humanities Computing Unit Oxford University http://users.ox.ac.uk/~lou
What went wrong? The web today!!!
who cares? • application developers and maintainers (the desperate Perl hacker) • tools builders (the mythical CS grad student) • document creators and conservators • document managers • you and me, anxious to communicate
Information Interchange (1) [Diagram: systems A, B, C, D and E, each exchanging data directly with every other] 20 translations required (n² − n, for n = 5)
Information Interchange (2) [Diagram: systems A, B, C, D and E, each exchanging data through a Common Interchange Standard] 10 translations required (2n, for n = 5)
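The two counts on the interchange slides can be checked with a short sketch. The function names are illustrative, not part of the talk; the formulas are the ones on the slides (n² − n for direct pairwise translation, 2n when everything goes through a common standard).

```python
def pairwise_translators(n: int) -> int:
    """One-way translators needed when each of n formats converts directly to every other."""
    return n * n - n


def hub_translators(n: int) -> int:
    """One-way translators needed when each format converts only to and from a common standard."""
    return 2 * n


# Compare the two schemes as the number of formats grows.
for n in (5, 10, 50):
    print(n, pairwise_translators(n), hub_translators(n))
```

For the five systems on the slides this gives 20 against 10; the gap widens quickly because one count grows quadratically and the other linearly.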
What is XML? • eXtensible Markup Language • An activity of the World Wide Web Consortium (W3C) • original goal: delivering SGML on the web • new goals: refocus web development • Rewriting the rules of the game? • Adding intelligence to data • Database exchange • Client-side processing • Access to richer data • Better data management http://www.w3.org/pub/WWW/Markup/Activity
The XML WG Hall of Fame • Jon Bosak, Sun (Chair) • Paula Angerstein, Texcel • Tim Bray, Textuality & Netscape • James Clark • Dan Connolly, W3C • Steve DeRose, INSO • Dave Hollander, HP • Eliot Kimber, Isogen • Tom Magliery, NCSA • Eve Maler, ArborText • Murray Maloney, Muzmo & Veo Systems • Makoto Murata, Fuji Xerox • Joel Nava, Adobe • Conleth O'Connell, Vignette • Jean Paoli, Microsoft • Peter Sharpe, SoftQuad • C. M. Sperberg-McQueen, UIC • John Tigue, DataChannel (plus a cast of hundreds on the SIG)
What is a document? • content: the components (words, images, etc.) which make up a document • structure: the organization and inter-relationship of the components • presentation: how a document looks and what processes are applied to it
Separating these things means... • the content can be re-used • the structure can be formally validated • the presentation can be customized for • different media • different audiences • … in short, the information can be uncoupled from its processing • This is not a new idea! But it’s a good one...
The XML family • XML (Extensible Markup Language): • A subset of SGML (ISO 8879) designed for easy implementation • XLink (Extensible Linking Language): • A set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI) • XSL (Extensible Stylesheet Language): • A standard stylesheet language for structured information derived from DSSSL (ISO/IEC 10179) and key CSS concepts
like HTML, XML must... • be usable on the net (but not restricted to it!) • support a wide variety of applications • be compatible with SGML • be easy to process • have few optional features (ideally none) • be human-legible and reasonably clear • be specified in a way that is both formal and concise
unlike HTML... • XML is an extensible markup language • XML markup can be verified • XML markup reflects the meaning of your data, not its appearance
Some intelligent questions... Perec, Georges Life - a users manual. Collins, 1988. Translated from the French [La vie mode d’emploi] by David Bellos. xviii+581 pp. 841.941 Literature - French - 20th century • what’s the author’s name? • what titles have the classification …? • what authors have the name… ? • what translators are there ? • which books have more than 400 pages?
… which non-extensible markup doesn’t help us answer <p><b>Perec, Georges</b> <I>Life - a users manual. Collins, 1988. Translated from the French </I>[La vie mode d’emploi] <I> by David Bellos. xviii+581 pp. 841.941</I> Literature - French - 20th century Perec, Georges Life - a users manual. Collins, 1988. Translated from the French [La vie mode d’emploi] by David Bellos. xviii+581 pp. 841.941 Literature - French - 20th century
Extensible (user-defined) markup <author>Perec, Georges</author> <title>Life - a users manual</title><publisher>Collins</publisher><publDate>1988</publDate><note>Translated from the French [<title>La vie mode d’emploi</title>] by <translator>David Bellos</translator></note> <pages>xviii+581</pages> <ddc>841.941</ddc><keywords><term>Literature</term> <term>French</term> <term>20th century</term></keywords>
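A minimal sketch of how descriptive markup makes the "intelligent questions" answerable by a program. It uses Python's standard xml.etree.ElementTree; the enclosing <book> element is an assumption added here so that the fragment has a single root, and the way the page count is extracted from "xviii+581" is likewise only one plausible reading.

```python
import re
import xml.etree.ElementTree as ET

# The record from the slide, wrapped in a hypothetical <book> root element.
RECORD = """<book>
<author>Perec, Georges</author>
<title>Life - a users manual</title>
<publisher>Collins</publisher><publDate>1988</publDate>
<note>Translated from the French [<title>La vie mode d'emploi</title>]
by <translator>David Bellos</translator></note>
<pages>xviii+581</pages>
<ddc>841.941</ddc>
<keywords><term>Literature</term> <term>French</term> <term>20th century</term></keywords>
</book>"""

book = ET.fromstring(RECORD)

# What's the author's name? (findtext looks at direct children only)
author = book.findtext("author")

# What translators are there? (iter searches the whole subtree)
translators = [t.text for t in book.iter("translator")]

# Does the book have more than 400 pages? Take the arabic count after the front matter.
main_pages = int(re.search(r"(\d+)\s*$", book.findtext("pages")).group(1))

print(author, translators, main_pages > 400)
```

The same questions asked of the <b>/<I> version would need guesswork about punctuation and typeface; here each answer is a lookup by element name.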
Verifiable markup • well-formed XML markup • tags (etc.) are syntactically correct • every tag has an end-tag • tags are properly nested • valid XML markup • only declared tags are used • all tag occurrences conform to specified positional constraints
Well-formedness <?xml version="1.0" standalone="yes"?> • <greeting>hello world!</greeting> • <greeting>hello world!</Greeting> • <grunting> <greeting>hello</greeting> world!</grunting> • <greeting><grunting>hello</greeting> world!</grunting> • <greeting type="loud">ho!</greeting> • <greeting type=loud>ho!</greeting> • <greeting file="ho.wav"/> • <greeting file="ho.wav">
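The examples on this slide can be tried against any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which checks well-formedness only (it does not validate); the helper name is_well_formed is illustrative, and the samples are written with the straight quotes that XML requires.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = [
    '<greeting>hello world!</greeting>',
    '<greeting>hello world!</Greeting>',            # names are case-sensitive
    '<grunting> <greeting>hello</greeting> world!</grunting>',
    '<greeting><grunting>hello</greeting> world!</grunting>',  # improperly nested
    '<greeting type="loud">ho!</greeting>',
    '<greeting type=loud>ho!</greeting>',           # attribute value not quoted
    '<greeting file="ho.wav"/>',
    '<greeting file="ho.wav">',                     # no end-tag
]

results = [is_well_formed(s) for s in SAMPLES]
for sample, ok in zip(SAMPLES, results):
    print("ok " if ok else "BAD", sample)
```

Read in pairs, each acceptable example is followed by a variant that breaks exactly one of the rules: matching end-tags, proper nesting, quoted attribute values, and explicitly closed empty elements.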
A Valid XML Document • invokes a Document Type Definition (dtd) • a dtd specifies • names for all your tags • names and default values for their attributes • rules about how tags can nest • names for re-usable pieces of data (entities) • and a few other things • XML dtds are much simpler than SGML dtds
A simple dtd <!ELEMENT greeting (#PCDATA)> a greeting consists of character data... <!ELEMENT name (#PCDATA)> <!ATTLIST name reg CDATA #IMPLIED> as does a name, which can also have an attribute called reg <!ELEMENT grunting (#PCDATA|greeting|name)* > a grunting contains zero or more of the other things, possibly mixed up with some character data
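To make the difference between well-formed and valid concrete, here is a hand-written check of the three declarations above. It is a sketch only, not a DTD validator: Python's standard library parser does not validate, so the content models and the attribute list are transcribed by hand into two dictionaries (CHILDREN and ATTRIBUTES are names invented for this example). Because every element in this dtd allows character data, only child elements and attributes need checking.

```python
import xml.etree.ElementTree as ET

# Hand transcription of the dtd: which child elements each element may contain...
CHILDREN = {
    "greeting": set(),                  # (#PCDATA)
    "name": set(),                      # (#PCDATA)
    "grunting": {"greeting", "name"},   # (#PCDATA|greeting|name)*
}
# ...and which attributes each element may carry.
ATTRIBUTES = {"greeting": set(), "name": {"reg"}, "grunting": set()}


def conforms(elem: ET.Element) -> bool:
    """Check one element and its subtree against the transcribed declarations."""
    if elem.tag not in CHILDREN:                     # only declared tags are used
        return False
    if not set(elem.attrib) <= ATTRIBUTES[elem.tag]:  # only declared attributes
        return False
    return all(
        child.tag in CHILDREN[elem.tag] and conforms(child) for child in elem
    )


def check(text: str) -> bool:
    return conforms(ET.fromstring(text))


print(check('<grunting>oi <greeting>hello</greeting> <name reg="x">Lou</name></grunting>'))
print(check('<greeting type="loud">ho!</greeting>'))
```

The second document is well-formed but would not be valid against this dtd, since no type attribute is declared for greeting.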
When do you need a dtd? • at document preparation time (definitely) • validation, checking, consistency • at document processing time (probably) • simplifies generic/specific processing • may clarify intended semantics • at document delivery time (possibly) • strictly unnecessary for wf docs • but reduces processing effort
Where do I get a dtd? • flood of industry announcements • some recent examples • Resource Description Framework (for metadata) • Channel Definition Format (for push technologies) • Electronic Data Interchange (banking etc.) • Handheld Device Markup Language (sic) • Chemical Markup Language (chemical modelling) • Math Markup Language (maths!) • Text Encoding Initiative (scholarly texts)
The meaning of markup • ontologically speaking… • markup may be performative or descriptive • markup asserts an intention or interpretation which cannot be formally defined • tags have no predefined meaning • presentation or behaviour of an XML document is specified elsewhere
Where is the behaviour of an XML document defined? • in a stylesheet • using XSL or CSS • possibly embedded in a program, applet, script, or Java bean • defined for that particular dtd, tagset, or tag • by reference to pre-existing mutual agreement amongst user communities • aka “namespaces” • by reference to a Document Object Model
XLink: the future of hypertext “We believe in the interconnectedness of all things” (F. Braudel)
Some linking terminology • a link asserts a relationship between linkends • links may be typed • link behaviour is what happens when a link is activated • transclusion: new content appears without displacing current content • linkends may be single or multiple resources • linkends may be target or source with respect to each other
Linking in HTML • link behaviour is tied to particular tags • only two types • <A> replace in same (or new) window • <IMG> transclude inline (usually) • link targets are always whole documents • cannot reassemble fragments • cannot add links to read-only documents • linkends are inherently fragile
XLink aims to do better • formerly XLL, formerly XML-Link • two components • XLink • XPointer • working drafts at http://www.w3.org/TR/WD-xlink http://www.w3.org/TR/WD-xptr • WARNING: This is all subject to change!
XLink goals (1) • Provide advanced linking constructs within XML documents (XLink) • To anything
XLink goals (2) • Provide advanced addressing into XML document structure (XPointer) • From anything
XPointer is… • for pointing to subparts of XML resources (even if they don’t have IDs) • based on the Text Encoding Initiative (TEI) “extended pointer” notation • usable in association with URLs/URIs <a href="http://some.url.com/Thing/foo.xml#id(foo)"> <!ENTITY bar SYSTEM "http://some.url.com/Thing/foo.xml#id(foo)">
An XPointer consists of • a series of location terms in the form termname(parms) • terms are separated by a dot id(foo).child(3,SEC).child(4,LIST) • each term is the location source for the next • you can also use terms which point at strings, attributes, etc.
XPointer advantages • a compact syntax which scales well • as robust as possible • any changes “off the path” won’t (necessarily) break the link • IDs are as safe as it gets... • if there’s an ID nearby, point to it and walk down/up • if not, walk down from the root
XPointers: a flavour • An XPointer addresses the tree that the markup represents, not the markup itself • Location terms address particular nodes in the tree, e.g. • absolutely, e.g. id(), html() • relatively, e.g. child(), descendant(), ancestor(), psibling(), fsibling() • string and attribute matches • can also specify spans
id() and html() id(concepts) html(baz)
child() and descendant() child(1,chapter).child(2,section) descendant(1,abstract)
XPointer examples id(intro).child(3,div1) the third <div1> within the element with identifier INTRO html(foo).child(2,div1).child(4,p).child(1,quote,lang,"LAT") the first <quote> whose LANG attribute is set to "LAT" within the fourth <p> of the second <div1> of whatever element contains an HTML <A NAME="foo"> descendant(#all,para) every <para> within the current location source span(child(1,pb,n,"14"),child(1,pb,n,"23")) everything between the first <pb> whose N attribute is "14" and the first one whose N attribute is "23"
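The idea that each location term is the location source for the next can be sketched in a few lines. This toy resolver handles only id() and the two-argument form of child(), and it rests on assumptions that are not in the working draft: it treats an attribute literally named id as the identifier (a real processor would learn which attributes are IDs from the dtd), it compares names case-sensitively, and it splits terms on dots, so arguments containing a dot are not supported.

```python
import re
import xml.etree.ElementTree as ET

TERM = re.compile(r"(\w+)\(([^()]*)\)")


def resolve(root, pointer):
    """Walk a dot-separated chain of location terms; return the element found or None."""
    node = root
    for term in pointer.split("."):
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"cannot parse location term: {term!r}")
        name = match.group(1)
        args = [a.strip() for a in match.group(2).split(",")]
        if name == "id":
            # Absolute term: search the whole document, ignoring the current node.
            node = next((e for e in root.iter() if e.get("id") == args[0]), None)
        elif name == "child":
            # Relative term: the n-th direct child of the given element type.
            index, tag = int(args[0]), args[1]
            matches = [c for c in node if c.tag == tag]
            node = matches[index - 1] if 0 < index <= len(matches) else None
        else:
            raise ValueError(f"unsupported location term: {name}")
        if node is None:
            return None
    return node


DOC = ET.fromstring(
    '<doc><div0 id="intro"><div1>one</div1><p>aside</p>'
    '<div1>two</div1><div1>three</div1></div0></doc>'
)
target = resolve(DOC, "id(intro).child(3,div1)")
print(target.text)
```

Note that the intervening <p> does not disturb the count: child(3,div1) counts only <div1> children, which is part of why changes "off the path" need not break a pointer.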
XLink proper • allows you to invent your own linking elements and define their behaviour • the xml:link attribute is used to specify the linking properties of your element • allows you to create link databases • “standoff” markup allows you to link to non-modifiable documents • inline vs out-of-line links
Link behaviours • show attribute • new/replace/embed • actuate attribute • user/auto • behavior attribute • “for other instructions”
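A rough illustration of what "invent your own linking elements" means for software: an application recognises a link by its xml:link attribute, not by its element name, and then reads the behaviour attributes. The element names (review, cite, figure), the href values and the helper linking_elements are all invented for this sketch, and since the working draft is subject to change, the attribute names simply follow the slides.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LINK = f"{{{XML_NS}}}link"

DOC = """<review>
  <cite xml:link="simple" href="http://example.org/perec.xml"
        show="new" actuate="user">Perec</cite>
  <figure xml:link="simple" href="cover.png" show="embed" actuate="auto"/>
  <p>No link here.</p>
</review>"""


def linking_elements(root):
    """Collect every element that declares itself a link via xml:link."""
    return [
        {
            "element": e.tag,
            "href": e.get("href"),
            "show": e.get("show"),        # new / replace / embed
            "actuate": e.get("actuate"),  # user / auto
        }
        for e in root.iter()
        if e.get(XML_LINK) is not None
    ]


links = linking_elements(ET.fromstring(DOC))
for link in links:
    print(link)
```

In this sketch <figure> behaves roughly like an HTML <IMG> (embedded, automatic) and <cite> like an <A> that opens a new window, but neither behaviour is tied to the tag name.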
The importance of XLink • Not just about fancy capabilities and new ways of associating information • Promotes the creation of advanced information structures and site management • Makes possible an industry devoted to knowledge management (that's us!) • For example: OED + LION
Transforming XML documents [Diagram: valid XML documents + script → Transformation Tool → XML or non-XML documents]
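A minimal sketch of the "script plus transformation tool" idea, using Python's standard xml.etree.ElementTree in place of a dedicated tool such as Jade. The <book> wrapper and the two output layouts are assumptions made for the example; the point is only that one marked-up source can yield both an XML-style (HTML) rendering and a non-XML one.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<book><author>Perec, Georges</author>"
    "<title>Life - a users manual</title>"
    "<publDate>1988</publDate></book>"
)


def to_html(book: ET.Element) -> str:
    """Presentation 1: an HTML paragraph with bold author and italic title."""
    p = ET.Element("p")
    b = ET.SubElement(p, "b")
    b.text = book.findtext("author")
    b.tail = " "
    i = ET.SubElement(p, "i")
    i.text = book.findtext("title")
    i.tail = ", " + book.findtext("publDate") + "."
    return ET.tostring(p, encoding="unicode")


def to_text(book: ET.Element) -> str:
    """Presentation 2: a plain-text line, e.g. for a catalogue listing."""
    return "{}: {} ({})".format(
        book.findtext("author"), book.findtext("title"), book.findtext("publDate")
    )


book = ET.fromstring(SOURCE)
html_output = to_html(book)
text_output = to_text(book)
print(html_output)
print(text_output)
```

Neither output function alters the source, so further presentations can be added without touching the content.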
XSL: the final piece • Standard Style Sheet Language • Combines DSSSL “flow objects” and CSS objects • Uses XML syntax (rather than Scheme) • Also uses ECMAScript for extensions • Automatic conversion from CSS
XSL is the next step for publishing • XSL is not just about translation • user-configurability • enhanced clients • Single source for print and online delivery • XSL is intended to complete the internationalization of publishing
Tools you can use now • Editing/creating documents • emacs + psgml; XED; any SGML editor • Parsers • free standing: SP • java applets: (many) • embedded in applications http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html
Tools you can use now • Browsers and viewers • HyBrick; IE5; Netscape 4; Amaya, XMetaL… • Toolkits • DOM support now in Perl, Tcl… • Transformers • Jade
The wider picture • XML is not just about exchanging data between machines • It's also about communication between humans • XML is not just about the web • It's about information in general • XML is not just about technology • It's also about the relationship between content creators and software vendors
How we will use XML (1) Heterogeneous clients interfacing with a single database [Diagram label: xml]