Explore essential XML-related standards such as XML syntax, Namespaces, Schemas, XML linking, XPath, XPointer, and XLink, including XML Specifications, Namespaces, XML Schemas, and XML Information Set. Understand the significance and application of these standards in various domains.
Overview of XML-related standards • Steven J. DeRose, Ph.D. • Brown University Scholarly Technology Group • Steven_DeRose@brown.edu • http://www.stg.brown.edu/~sjd
XML and related specs • XML: The basic syntax • Plus Namespaces, Schemas, InfoSet • DOM: API to the Information Set • XML Linking • XPath: Expressions to find XML nodes • XPointer: XPath++ for addressing • XLink: hypermedia connections • Stylesheet Attachment • XSL: stylesheets and transforms
XML specification • A “Recommendation” since 2/1998 • The highest level for a W3C specification • Defines the syntax/grammar • Not any particular processing/semantics • Schemas or DTDs define applications (poem, manual, eCommerce,...) • All these can be parsed by generic XML, just as new words can be readily fitted into existing sentence structures • Schemas are political as well as technical
XML Namespaces • Disambiguate element type names <head><html:title>On cataloging</html:title>…<biblio><entry id='DeRo98'> <loc:title>Navigation, Access, Control… • Declaring prefixes <sec xmlns:loc="http://foo.com/mynamesp" xmlns:html='http://www.w3.org/1999/xhtml' xmlns="http://…"> <loc:title>… • Declaration without prefix sets default • Attributes can have namespaces • No renaming (x:foo to y:bar)
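As a rough illustration of the disambiguation described on this slide, the sketch below parses a small document with Python's standard ElementTree module. The sample document is an assumption made for the example; only the two namespace URIs come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the namespace URIs are the ones on the slide.
DOC = """<sec xmlns:loc="http://foo.com/mynamesp"
     xmlns:html="http://www.w3.org/1999/xhtml">
  <html:title>On cataloging</html:title>
  <loc:title>Navigation, Access, Control</loc:title>
</sec>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-URI}local-name",
# so the two <title> elements end up with different tags.
tags = [child.tag for child in root]

# The prefix is only a local shorthand: the lookup below uses "l" instead
# of "loc" and still finds the element, because the URI is what matters.
ns = {"l": "http://foo.com/mynamesp"}
loc_title = root.find("l:title", ns)

print(tags)
print(loc_title.text)
```

The lookup through a different prefix is deliberate: it shows that applications compare namespace URIs, not the prefixes an author happened to choose.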
XML Schemas • Let you define a document type • What elements/attributes are defined? • Where can they occur? • What content is allowed? • What datatypes are represented? • Required for validation • Similar to DTDs, but • More powerful (esp. for datatyping) • Use XML syntax
XML Information Set • What data in XML document “counts”? • Elements, attributes, content • Order and hierarchy of nodes • Required for interoperability • Applications must count nodes consistently • Not whitespace inside tags • Not which kind of quotes around attributes • Candidate recommendation 2001-05-14 • http://www.w3.org/TR/xml-infoset
7 types of Infoset Nodes • Root: Above the document • <?foo ?> <doc>…</doc> <!-- hi --> • Element: Main structure • <div n='1'>…</div> • Text: Spans of unbroken text • Attribute: Properties of elements • Namespace: Prefixes/URIs • Processing Instr: <?…?> • Comment: <!-- … --> • Callout on the slide: "These are always leaves in the Infoset"
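Example [Figure: tree diagram of a sample document. ROOT sits above a doc element whose children are title, abstract, and three chapter elements with ID='intro', ID='summary', and ID='concepts'. Lower levels show title ("Introduction"), section, p, and list elements, an ID='p37' attribute, an a element with name='baz', and an xref element with href='#id(intro)'. Labels point out an Element, an Attribute, and a Text node; other nodes are omitted.]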
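More Infoset details [Figure: tree fragment under an item element, labeled to show a namespace node (http://www.w3.org/1999/xhtml), mixed content (a p containing the text "Everything is deeply intertwingled" with "deeply" in an em element), a processing instruction (TEX:pgbrk), and an XML comment ("Added 7/00").]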
DOM • "Document Object Model" • An API for accessing the Infoset • Many tools use this • Level 1 complete • http://www.w3.org/TR/REC-DOM-Level-1 • Level 2 core complete • http://www.w3.org/TR/DOM-Level-2-Core
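To give a feel for what "an API for accessing the Infoset" means in practice, here is a minimal sketch using `xml.dom.minidom`, the small DOM implementation in Python's standard library. The sample document is invented for the example.

```python
from xml.dom import minidom

# Hypothetical sample document.
doc = minidom.parseString("<doc><title>Intro</title><p id='p37'>Hello</p></doc>")

root = doc.documentElement                      # the <doc> element
names = [n.nodeName for n in root.childNodes]   # child elements, in order

p = root.getElementsByTagName("p")[0]
p_id = p.getAttribute("id")                     # attribute value
p_text = p.firstChild.data                      # the text node inside <p>

print(names, p_id, p_text)
```

Elements, attributes, and text all surface as nodes that are reached through the same small set of properties and methods.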
XML Base • Similar to the HTML <base> element • Useful for keeping URIs simpler and uniform. • Applies to relative URLs <html><head><base href="http://www.example.com/">…</head> <body>… <a href="fig/mosquito.png"> • The hrefs combine to make whole URI: http://www.example.com/fig/mosquito.png
XML Base • XML Base provides a similar feature for XML • By a reserved attribute <?xml version="1.0"?> <doc xml:base="http://eg.org/today/"> See <link xlink:type="simple" xlink:href="new.xml">the news</link> • Applies to attributes & descendants • Can be overridden on descendants • Final REC as of 2001-06-27 • http://www.w3.org/TR/xmlbase/
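The inheritance and override behavior can be sketched in a few lines of Python. This is an illustration only, not a complete XML Base processor; the `resolved_hrefs` helper is hypothetical, and the `xmlns:xlink` declaration is added to the slide's example so that it parses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"
XLINK_NS = "http://www.w3.org/1999/xlink"

# The slide's example, with the xlink prefix declared (an addition).
DOC = """<doc xml:base="http://eg.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  See <link xlink:type="simple" xlink:href="new.xml">the news</link>
</doc>"""

def resolved_hrefs(elem, base=""):
    """Yield each xlink:href resolved against the nearest xml:base in scope."""
    # An xml:base on this element overrides (or refines) the inherited base.
    base = urljoin(base, elem.get(f"{{{XML_NS}}}base", ""))
    href = elem.get(f"{{{XLINK_NS}}}href")
    if href is not None:
        yield urljoin(base, href)
    # Descendants inherit whatever base is in effect here.
    for child in elem:
        yield from resolved_hrefs(child, base)

hrefs = list(resolved_hrefs(ET.fromstring(DOC)))
print(hrefs)
```

Passing the base down the recursion is what makes "applies to descendants" and "can be overridden on descendants" fall out of the same few lines.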
Stylesheet attachment • Lets documents point to stylesheets • Based on HTML <link rel='stylesheet'> • Multiple, anywhere in XML prolog • May point to CSS, XSL, etc. • Example: • <?xml-stylesheet alternate="yes" href="mystyle.css" title="Medium" type="text/css"?> • Equivalent of HTML: <LINK href="mystyle.css" title="Medium" rel="alternate stylesheet" type="text/css"> • REC: http://www.w3.org/TR/xml-stylesheet
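A processor has to find these processing instructions in the prolog and read their pseudo-attributes. The sketch below shows one way to do that with the Python standard library; the `stylesheet_links` helper and the regular expression are assumptions made for illustration, not part of any specification.

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet alternate="yes" href="mystyle.css" title="Medium" type="text/css"?>
<doc/>"""

def stylesheet_links(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet PI in the prolog."""
    dom = minidom.parseString(xml_text)
    links = []
    for node in dom.childNodes:  # prolog PIs are children of the document node
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # Pseudo-attributes look like attributes but are plain PI data,
            # so they are pulled out with a regular expression here.
            pairs = re.findall(r'([\w-]+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\')', node.data)
            links.append({name: dq or sq for name, dq, sq in pairs})
    return links

links = stylesheet_links(DOC)
print(links)
```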
XSL specification • Stylesheet language • Based on ISO DSSSL and W3C CSS • 2 major pieces: • XSLT: document transformation • Builds on XPath (more later) • Match elements, then construct output • XSL-FO: Formatting objects • To actually render blocks, fonts, tables, etc. • Hypermedia support unfinished (=CSS) • http://www.w3.org/TR/xsl/
Current XML organization [Figure: organization chart. W3C domains shown: Architecture, UI. Under the XML Plenary: Canonical, Infoset, Core, Fragments, Schema, Query, XSL, Linking, DOM, Namespaces, Assoc. Style, XML Base, XPointer, XPath, XLink, XInclude.] • XML Plenary coordinates several WGs • Some related WGs have liaisons
XML-Linking specifications • XPath: expressions on infoset nodes • REC: http://www.w3.org/TR/xpath • XPointer: XPath + ranges, in URIs • CR: http://www.w3.org/TR/WD-xptr • XLink: gather locations to make links • REC: http://www.w3.org/TR/xlink/ • (XML Base)
XML-Linking goals: end user • Links from un-writable documents • Which is most of the Web, for any person • Perhaps the most important single feature • ->Bidirectional and multi-ended links • ->Annotations and annotation sharing • Dynamic updates, patches, highlighting • Precise link attachment in any media • Large sets/databases of managed links • An entirely new market for links per se • Anyone can publish/sell their commentary
Pointing vs. linking • In HTML, many things are combined: <a href="eg.org/foo">wow</a> • Technically: • "eg.org/foo" is a pointer (namely a URI) • The abstract connection itself is the link • The <a> element is a link representation • "wow" is the local anchor • Anchors are also called link-ends • Data at eg.org is the remote anchor • HTML specifies the link behavior
XPointer: locators [Figure: the sample document tree from the earlier Example slide (ROOT, doc, title, abstract, chapters intro / concepts / summary, p37, a name='baz', xref href='#id(intro)'), with one part highlighted in yellow.] <xml>… <xref target="http://z.com/foo.xml#id('p37')">See Section 1.</xref> • A way of locating data in XML structure, used to attach link end(s) to data • A pointer identifies or locates some part of a document (only the part highlighted in yellow on the slide)
XLink: connections • Describes a relationship of referenced location(s), • To each other • To descriptions • XLink provides some key ones • A link connects data and meta-data portions, including their relationship (really just the lines in the diagram) • A link may be expressed at a unique source end, or out in a link database [Figure: several boxes labeled "Someplace" joined by lines, each line end labeled "role".]
XPointer… • Locates parts of XML resources • Even things without IDs • Even things that aren't whole nodes • XPointer adds (beyond XPath): • Way to refer to point and range selections • Way to use inside URI fragment identifiers • TEI “extended pointer” notation plus XPath logical expressions • Typically, a browser might load a document and scroll to/highlight the part
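Anatomy of a URI reference • URI reference: http://example.com/foo.htm#bing • URI: http://example.com/foo.htm, made up of scheme (http), domain (example.com), and path (/foo.htm) • Fragment identifier: bing • XPointer defines this part (the fragment identifier)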
Fragment identifiers • Part of URIs after "#" • Says where in document is actual target • Separate form for each media type • Identifiers for graphics differ from those for text • IETF MIME definition specifies form • HTML • To scroll to <a name="coyote"> http://example.com/hello.html#coyote
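The split between the URI proper and its fragment identifier is easy to see with Python's standard `urllib.parse.urldefrag`, applied here to the URI reference from the slide:

```python
from urllib.parse import urldefrag

# Separate the URI from the part after "#".
uri, fragment = urldefrag("http://example.com/hello.html#coyote")

print(uri)       # the part used to fetch the resource
print(fragment)  # the part interpreted according to the media type
```

How the fragment is then interpreted depends on the media type of the resource that comes back, which is the point of the "separate form for each media type" bullet.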
The 3 XPointer/XPath forms • Bare names • An XML "name"* finds element with that ID • For (X)HTML compatibility • HTML uses "NAME", not ID • Child sequences • Stepwise down through elements: /1/4/27/2 • May start with an ID: intro/4/3/2 • Full XPointers • scheme1(args) scheme2(args)… • For now, the only "scheme" is "xpointer" *Name: Letters, digits, hyphen, underscore, period.
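Bare names and child sequences are simple enough to sketch directly. The function below is an illustrative assumption, not a conforming implementation: it treats any attribute named `id` as an ID (a real processor would take ID-typed attributes from the DTD or schema), and the sample document is invented.

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, pointer, id_attr="id"):
    """Resolve a bare name or child sequence ('intro', '/1/2', 'intro/2')."""
    start, _, rest = pointer.partition("/")
    steps = [int(s) for s in rest.split("/")] if rest else []
    if start:
        # Leading bare name: find the element carrying that ID.
        node = next((e for e in root.iter() if e.get(id_attr) == start), None)
    else:
        # A leading "/" starts above the document element, so the first
        # step must be 1 (there is exactly one document element).
        if not steps or steps[0] != 1:
            return None
        node, steps = root, steps[1:]
    for n in steps:
        if node is None:
            return None
        children = list(node)  # element children only, in document order
        node = children[n - 1] if 1 <= n <= len(children) else None
    return node

# Hypothetical sample document.
DOC = ET.fromstring(
    "<doc><title>T</title>"
    "<chapter id='intro'><title>Introduction</title><p>first</p></chapter>"
    "</doc>"
)

by_steps = resolve_child_sequence(DOC, "/1/2")    # second child of <doc>
by_id = resolve_child_sequence(DOC, "intro/2")    # second child of #intro
missing = resolve_child_sequence(DOC, "/1/9")     # no such child
print(by_steps.tag, by_id.text, missing)
```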
XPointer's 2 parts • Provide 'scheme' mechanism • Identify media-specific pointer types • Allow multiple ones to co-exist • Pointing methods for XML • Point to ranges, sets, id's, coords… • Point descriptively
XPointer schemes • Each media type needs its own pointer type • pngRect(0,10 100,200) • vrml(camera=1,2,3 light=4,50,500) • map(W010’/ N5130’) • xml(…) • Schemes label fragment identifier types • #scheme1(args) scheme2(args)… • Escape any extra ( ) -- tlg('^(apax') • xpointer() is the first scheme
Multiple schemes in a URL? • When a server responds to a URI, it • Checks what media the client can handle • Picks one of those to send • “content negotiation” • If a visually-impaired user clicks • <a href="http://www.example.com/foo.gif#gif(0,0 1,1) xpointer(id(chap1))"> • The server may fall back to an XML file • The client tries fragment identifiers left-to-right, and uses the first one that works
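The left-to-right fallback can be sketched as follows. Both functions are hypothetical helpers written for this illustration. "Works" is simplified to "the scheme name is in a set the client supports", and the circumflex escaping of unbalanced parentheses is not handled.

```python
def split_schemes(fragment):
    """Split 'a(x) b(y(z))' into [('a', 'x'), ('b', 'y(z)')], honoring nesting."""
    parts, i, n = [], 0, len(fragment)
    while i < n:
        if fragment[i].isspace():
            i += 1
            continue
        open_at = fragment.index("(", i)   # raises ValueError for a bare name
        name = fragment[i:open_at]
        depth, j = 1, open_at + 1
        while j < n and depth:             # scan to the matching ")"
            if fragment[j] == "(":
                depth += 1
            elif fragment[j] == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced parentheses in fragment")
        parts.append((name, fragment[open_at + 1 : j - 1]))
        i = j
    return parts

def first_usable(fragment, supported):
    """Return the first (scheme, args) whose scheme the client supports."""
    for scheme, args in split_schemes(fragment):
        if scheme in supported:
            return scheme, args
    return None

frag = "gif(0,0 1,1) xpointer(id(chap1))"
xml_client = first_usable(frag, {"xpointer"})
gif_client = first_usable(frag, {"gif", "xpointer"})
print(xml_client, gif_client)
```

A client that received the XML fallback skips the `gif(...)` part and lands on `xpointer(id(chap1))`, while a client that received the image stops at the first part.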
Anatomy of a location step • child::para[@type="weak"][3] • Axis name: child • Node test: para • Predicate: [@type="weak"], made up of an attribute reference (@type) and a literal string ("weak") • Position test: [3] • Case matters • Finds the third child of the current node that (a) is an element of type 'para' and (b) has a 'type' attribute whose value is 'weak'
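The order of evaluation matters here: the position test counts only the nodes that survived the earlier tests. The sketch below spells the step out by hand rather than relying on a library's XPath subset; `child_step` and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def child_step(context, name, attr, value, position):
    """Evaluate child::name[@attr="value"][position] relative to `context`."""
    # Axis + node test: element children of the context node with this name.
    candidates = [c for c in context if c.tag == name]
    # First predicate: keep only those whose attribute equals the literal.
    candidates = [c for c in candidates if c.get(attr) == value]
    # Second predicate: 1-based position within what is left.
    return candidates[position - 1] if 1 <= position <= len(candidates) else None

# Hypothetical sample document.
DOC = ET.fromstring(
    "<sec>"
    "<para type='weak'>a</para><para type='strong'>b</para>"
    "<para type='weak'>c</para><note type='weak'>d</note>"
    "<para type='weak'>e</para><para type='WEAK'>f</para>"
    "</sec>"
)

third_weak = child_step(DOC, "para", "type", "weak", 3)
fourth_weak = child_step(DOC, "para", "type", "weak", 4)
print(third_weak.text, fourth_weak)
```

The result is the fifth child of `<sec>`, not the third, and the `type='WEAK'` paragraph never matches because case matters.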
Summary: axes and functions • Absolute: root( ), id( ) • Relative: parent, self, child • ancestor, ancestor-or-self • descendant, descendant-or-self • preceding-sibling, following-sibling • preceding, following • attribute, namespace • Absolute: here( ), origin( ) • string-range( ), range-to( )
Points and Ranges [Figure: the text "Hello, world." shown twice, once with a point selection and once with a range selection.] • Point • What you get by click-selection • Gap before/after node or char • Range • What you get by drag-selection • From a start point to an end point • Not generally a well-formed (WF) XML subtree • May partially contain some elements: • <p>Hello, world.</p><p>Hi, back</p> • Crucial for creating hypertext links • How often do you click/drag exactly one entire element?
XLink is a language that... • Lets you invent your own linking elements and their meanings • In keeping with XML approach overall • Lets you create link databases • Links become first-class objects in the model • Provides some basic traversal behavior • E.g., “Open the target in a new window” • The rest is left to a style mechanism such as XSL
XLink terminology • Linking element • Identifies, connects, and describes anchors • Locator • Locates the data of some link end (anchor) • Link end or anchor • A data portion reachable as part of a link • Arc • Explicit connection between two link ends • Resource • Anything you can point at on the Web • Using an arc is called Traversal
What links do with link-ends • A link identifies where its ends are • Using some kind of locators • URI#XPointer will be the locator for XML • URI#scheme()scheme() in general • A link attaches metadata to each end • Its formal role in relation to the other ends • A title by which to refer to it (say, in menus) • Some traversal behaviors • Arcs to say which traversals happen • Link itself can also have type, other info
Inline links • Linking element itself (better, the origin() end) is one of the link’s ends
Out-of-line links • Linking element itself isn't automatically made into one of its own resources • Requires that there be a way to find link databases in the first place
Anatomy of an XML link • Link need not be “at” a link-end <link type="annotated-reference"> <loc role="ref" href="xptr.xml#child(2,div)"/> <loc role="src" href="knut73.tex#s4.2.2"/> <loc role="com" href="http://x.com/note.html"/> </link> • Each link-end can be described • Link may have any number of ends • Target of the "ref" end (xptr.xml): <!DOCTYPE spec...<spec><div>…</div><div> <head>...</head>... • Link-ends need not be XML; target of the "src" end (knut73.tex): \{… 4.2.2: A tree is a set of nodes where each node has one parent, except for a root node, which has none….} • Target of the "com" end (note.html): <html>Knuth’s right.</html> • Link-ends need not be marked up
Arcs • Arcs specify traversal rules • Multi-ended links may restrict travel among their endpoints • Restrictions generic or app-specific • Arcs enable the description of both • An arc is a pair of roles, plus metadata • Enables traversal between ends with the given roles • May be multiple locators per role (useful for document assembly, multiple-choice travel)
Arc example: fuel-type annotations [Figure: a link body connecting three ends with role "vehicle", one end with role "fuel-type" (gasoline), and two ends with role "warning" ("Warning: explosive" and "Warning: toxic").] • ARCS: vehicle to fuel-type; fuel-type to warning
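Treating an arc as a pair of roles, the traversals permitted by the fuel-type example can be enumerated mechanically. The file names below are hypothetical stand-ins for the link ends in the figure; only the role names and the two arcs come from the slide.

```python
from itertools import product

# Hypothetical link ends, grouped by role as in the fuel-type figure.
ends = {
    "vehicle": ["car.xml", "truck.xml", "bus.xml"],
    "fuel-type": ["gasoline.xml"],
    "warning": ["explosive.xml", "toxic.xml"],
}

# Each arc is a (from-role, to-role) pair.
arcs = [("vehicle", "fuel-type"), ("fuel-type", "warning")]

def traversals(ends, arcs):
    """Expand role-to-role arcs into the concrete (from, to) traversals they allow."""
    return [pair for src, dst in arcs for pair in product(ends[src], ends[dst])]

allowed = traversals(ends, arcs)
print(len(allowed))
print(("car.xml", "explosive.xml") in allowed)
```

No arc joins "vehicle" directly to "warning", so a reader can get from a vehicle to a warning only by way of the fuel type, which is the kind of restriction arcs exist to express.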
How to detect links • Could have any name and content at all • <footnote>, <criticism>, … • xlink:type attribute marks linking elements for applications to find: <!ELEMENT footnote EMPTY> <!ATTLIST footnote xlink:type CDATA #FIXED "simple" xlink:href CDATA #REQUIRED> • The #FIXED declaration supplies a default value for the xlink:type attribute, so instances need not repeat it • For example: ...has studied the issue.<footnote xlink:href="http://www.doctools.com" />
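Because the marker is an attribute rather than an element name, an application can find linking elements without knowing the vocabulary. A minimal sketch, assuming the `xlink:type` attribute is written out explicitly in the instance (so that the example does not depend on DTD default processing); `linking_elements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical fragment based on the footnote example.
DOC = ET.fromstring(
    f'<p xmlns:xlink="{XLINK}">...has studied the issue.'
    '<footnote xlink:type="simple" xlink:href="http://www.doctools.com"/>'
    "<em>not a link</em></p>"
)

def linking_elements(root):
    """Return (tag, xlink:type, xlink:href) for every element marked as a link."""
    return [
        (e.tag, e.get(f"{{{XLINK}}}type"), e.get(f"{{{XLINK}}}href"))
        for e in root.iter()
        if f"{{{XLINK}}}type" in e.attrib
    ]

found = linking_elements(DOC)
print(found)
```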
Arcs and Traversals • Traversal is split into: • Behavior • Author's intention for behavior of a link. • Input to style mechanism • Not a presentation command • Actuation • Defines the event that triggers a link • Events are very generic, intentionally
Two kinds of behavior policies • show attribute • new to traverse and provide new “context” • replace to display in existing “context” • embed to display in the body of the initiating resource • Some semantic details are left unspecified: combining multiple ends, style inheritance, etc. • actuate attribute • onRequest to require external request • onLoad to traverse when link processed
Link databases let you… • Attach descriptive information from afar • Annotate other people's stuff • Maintain links more easily • When a destination changes, you don’t have to touch documents with links to it • Engage in online commerce in links • Express, package, and sell point-of-view • Collect out of line links as databases
External Linksets • Users will have persistent linkdbs • Subscriptions, interest groups, private,... • Document can specify relevant link dbs • Linked by special type of extended link • Included within regular documents too • LinkDBs enable link management • Needed to author using external links • Example: Public annotations on….
An external Linkset Instance <xls><linkbase xlink:href="linkset1.xml" /><linkbase xlink:href="linkset2.xml" /><linkbase xlink:href="linkset3.xml" /> </xls>