280 likes | 284 Views
Internet-based Application Architectures for the 21 Century: The Role of XML. Henry S. Thompson HCRC Language Technology Group Division of Informatics University of Edinburgh. Introduction. It's the Web, stupid! E-commerce, E-business, E-finance Web servers in your microwave
E N D
Internet-based Application Architectures for the 21 Century:The Role of XML Henry S. Thompson HCRC Language Technology Group Division of Informatics University of Edinburgh
Introduction • It's the Web, stupid! • E-commerce, E-business, E-finance • Web servers in your microwave • ADSL by Easter • Seriously, in the Academy • Virtual communities of effort multiply our effectiveness • Shared resources • Shared tools • Enfranchisement is the key • Everybody can play • Whether Bill lets them or not
XML is ASCII for the 21st century • ASCII (ISO 646) solved a fundamental interchange problem for flat text documents • What bits encode what characters • (For a pretty parochial definition of 'character') • UNICODE/ISO 10646 extends that solution to the whole world • XML thought it was doing the same for simple tree-structured documents • The emphasis in the XML design was on simplifying SGML to move it to the Web • XML didn't touch SGML's architectural vision • flexible linearisation/transfer syntax • for tree-structured documents with internal links
Digression: Just what is XML? • It's a markup language used for annotating text • It is concerned with logical structure • to identify sections, titles, section headers, chapters, paragraphs,… • It is not concerned with appearance • you say 'this is a subtitle'not 'this is in bold, 14pt, centered' • you say 'this is an example'not 'this is in verbatim, indented by 5pts, ragged right'
Why is XML a big deal? • It is an official W3C Recommendation • It is vendor-independent, platform independent, application independent,… • unlike Word documents, RTF documents, PDF documents, Postscript documents,… • It is human readable • ditto (for most values of 'human')
Unformatted text Internet-based Application Architectures for the 21st Century:The Role of XMLLet's skip straight to an example of XML syntax for a simple bit of structure: <tip><emph>Never</emph> stand up in a canoe!</tip>
Formatted text Internet-based Application Architectures for the 21st Century:The Role of XML Let's skip straight to an example of XML syntax for a simple bit of structure: <tip><emph>Never</emph> stand up in a canoe!</tip>
XML marked up text <article> <title> Internet-based Application Architectures for the 21st Century: </title> <subtitle>The Role of XML</subtitle> <section> <para> Let's skip <emph>straight</emph> to an example of XML syntax for a simple bit of structure:</para><example><tip><emph>Never</emph> stand up in a canoe!</tip></example></para> </section></article>
Connecting structure and form • There is a stylesheet langauge called XSLT which will allow us to write simple style rules which will produce the formatted presentation from the structured version • For example <template match='emph'> <I><apply-templates/></I></template> will do part of the Transformation job
XML vs HTML • Isn’t XML just like HTML? • Like XML, you can sometimes use HTML to mark up things for content rather than appearance, • e.g. <H2>This is a subtitle</H2>appearance of <H2> is defined elsewhere, usually by someone else • but • a lot of HTML markup is for appearancee.g. <I>this is italics</I>, <B>this is in bold</B> • you couldn’t markup <RECIPIENT>Some names</RECIPIENT> • If you know HTML, easy to understand basic XML syntax
XML vs SGML • SGML is more complicated than it need be • because it was designed in the old days (12 years ago!) • XML is a simplified subset of SGML • much less minimisation • makes processing easier • qua complexity: sits somewhere between HTML and SGML
Who is in charge of XML? • XML is a W3C Recommendation • The W3C is The World Wide Web Consortium, a voluntary association of companies and non-profit organisations. Membership costs serious money, confers voting rights. Complex procedures, with the Chairman (Tim Berners-Lee) having ultimate authority, guided by a committee of the whole called the Advisory Council. • The XML recommendation was written by the W3C’s XML Working Group.
Digression, v.2: Just what is XML? • It's a markup language used for transferring data • It is concerned with data models • to convert between application-appropriate and transfer-appropriate forms • It is not concerned with human beings • It's produced and consumed by programs
XML as UI • A slogan of Adam Bosworth • I interpret it in two ways: • At the client end • Use XML plus XSL as the basis for what the user sees on his/her screen • Use XLinks from a master document to pull together disparate sources of information • At the server end • Use XML as a uniform interface for any data source onto the web • Not just documents, but E.g. Databases, process control information, stock quotes
Structured markup <POORDERHDR><DATETIME qualifier="DOCUMENT"> <YEAR>1996</YEAR> <MONTH>06</MONTH> <DAY>30</DAY> <HOUR>23</HOUR> <MINUTE>59</MINUTE> <SECOND>59</SECOND> <SUBSECOND>0000</SUBSECOND> <TIMEZONE>+0100</TIMEZONE> </DATETIME> <OPERAMT qualifier="EXTENDED" type="T"> <VALUE>670000</VALUE> <NUMOFDEC>2</NUMOFDEC> <SIGN>+</SIGN> <CURRENCY>USD</CURRENCY>. . .
What just happened!? • The whole transfer syntax story just went meta, that's what happened! • XML has been a runaway success, on a much greater scale than its designers anticipated • Not for the reason they had hoped • Because separation of form from content is right • But for a reason they barely thought about • Data must travel the web • Tree structured documents are a useable transfer syntax for just about anything • So data-oriented web users think of XML as a transfer mechanism for their data
The Cambridge Communiqué • A W3C Note resulting from a meeting this August (http://www.w3.org/TR/schema-arch) • Signalled a widespread acceptance of layering: "XML has defined a transfer syntax for tree-structured documents; "Many data-oriented applications are being defined which build their own data structures on top of an XML document layer, effectively using XML documents as a transfer mechanism for structured data; "
The Communiqué, cont'd • Called for support in XML Schema for specifying mapping between the XML document data model (or XML Infoset) and application-specific data models • XML Schema is a W3C recommendation-in-progress for definiing the structure of document families • A grammar for markup structure • E.g. artice -> title, subtitle?, section+ or POORDERHDR -> DATETIME, ORDERAMT
XML Schema: some details • Fortunately, XML Schema is actually notated in XML itself • So there are elements defined for use in schemas to define. . . • Elements :-) • Attributes • Types • A type is a collection of constraints on element content and attribute values • A type may be either • simple, for constraining string values • complex, for constraining elements which contain other elements
A simple example <!ELEMENT text (#PCDATA|emph|name)*> <!ATTLIST text timestamp NMTOKEN #REQUIRED> <xs:element name="text"> <xs:complexType content="mixed"> <xs:element ref="emph"/> <element ref="name"/> <xs:attribute name="timestamp" type="date" minOccurs="1"/> </xs:complexType></xs:element>
Richer type definition example <xs:complexType name='personName'> <xs:element name='title' minOccurs='0'/> <xs:element name='forename' minOccurs='0' maxOccurs='unbounded'/> <xs:element name='surname'/> <xs:attribute name='id' type='integer'/></xs:complexType> <xs:element name='owner' type='personName'/>
Mapping between layers • We can think of this in two ways • In terms of an abstract data modelling language • Entity-Relation • UML • RDF • In concrete implementation terms • Tables and rows • Class instances and instance variables • The first is more portable • The second more immediately useful
Mapping between layers 2 • Regardless of what approach we take, we need • A vocabulary of data model components • An attachment of that vocabulary to schema components • Sample vocabularies • entity, relationship, collection • table, row, column • instance, variable, list, dictionary • Where should attachment be specified? • In the schema • convenient • Outside it • modular
Specifying mapping in the schema • Probably reasonable if done in high-level (e.g. RDF, UML, ER) terms • See example infoset-xmpl.xml, infoset-uml.xsd
Specifying mapping outside • Requires some duplication of structural information • Encourages cross-language working • XSLT is the obvious candidate • See example infoset-xmpl.xsl
Compile the Mapping • Perhaps we can get the benefits of both approaches • Annotate the schema • Compile the annotations into XSLT • With bindings for separate implementations • Semi-structured data has a role to play here • Particularly when the data model antecedes the document model
Take-home message • The point at which idiosyncratic scripting takes over can be moved one layer up • Using public consensual declarative standards is a Good Thing • Interoperability makes things better for everyone