1 / 28

Internet-based Application Architectures for the 21 Century: The Role of XML

Internet-based Application Architectures for the 21 Century: The Role of XML. Henry S. Thompson HCRC Language Technology Group Division of Informatics University of Edinburgh. Introduction. It's the Web, stupid! E-commerce, E-business, E-finance Web servers in your microwave

Download Presentation

Internet-based Application Architectures for the 21 Century: The Role of XML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Internet-based Application Architectures for the 21 Century:The Role of XML Henry S. Thompson HCRC Language Technology Group Division of Informatics University of Edinburgh

  2. Introduction • It's the Web, stupid! • E-commerce, E-business, E-finance • Web servers in your microwave • ADSL by Easter • Seriously, in the Academy • Virtual communities of effort multiply our effectiveness • Shared resources • Shared tools • Enfranchisement is the key • Everybody can play • Whether Bill lets them or not

  3. XML is ASCII for the 21st century • ASCII (ISO 646) solved a fundamental interchange problem for flat text documents • What bits encode what characters • (For a pretty parochial definition of 'character') • UNICODE/ISO 10646 extends that solution to the whole world • XML thought it was doing the same for simple tree-structured documents • The emphasis in the XML design was on simplifying SGML to move it to the Web • XML didn't touch SGML's architectural vision • flexible linearisation/transfer syntax • for tree-structured documents with internal links

  4. Digression: Just what is XML? • It's a markup language used for annotating text • It is concerned with logical structure • to identify sections, titles, section headers, chapters, paragraphs,… • It is not concerned with appearance • you say 'this is a subtitle'not 'this is in bold, 14pt, centered' • you say 'this is an example'not 'this is in verbatim, indented by 5pts, ragged right'

  5. Why is XML a big deal? • It is an official W3C Recommendation • It is vendor-independent, platform independent, application independent,… • unlike Word documents, RTF documents, PDF documents, Postscript documents,… • It is human readable • ditto (for most values of 'human')

  6. Unformatted text Internet-based Application Architectures for the 21st Century:The Role of XMLLet's skip straight to an example of XML syntax for a simple bit of structure: <tip><emph>Never</emph> stand up in a canoe!</tip>

  7. Formatted text Internet-based Application Architectures for the 21st Century:The Role of XML Let's skip straight to an example of XML syntax for a simple bit of structure: <tip><emph>Never</emph> stand up in a canoe!</tip>

  8. XML marked up text <article> <title> Internet-based Application Architectures for the 21st Century: </title> <subtitle>The Role of XML</subtitle> <section> <para> Let's skip <emph>straight</emph> to an example of XML syntax for a simple bit of structure:</para><example>&lt;tip>&lt;emph>Never&lt;/emph> stand up in a canoe!&lt;/tip></example></para> </section></article>

  9. Connecting structure and form • There is a stylesheet langauge called XSLT which will allow us to write simple style rules which will produce the formatted presentation from the structured version • For example <template match='emph'> <I><apply-templates/></I></template> will do part of the Transformation job

  10. XML vs HTML • Isn’t XML just like HTML? • Like XML, you can sometimes use HTML to mark up things for content rather than appearance, • e.g. <H2>This is a subtitle</H2>appearance of <H2> is defined elsewhere, usually by someone else • but • a lot of HTML markup is for appearancee.g. <I>this is italics</I>, <B>this is in bold</B> • you couldn’t markup <RECIPIENT>Some names</RECIPIENT> • If you know HTML, easy to understand basic XML syntax

  11. XML vs SGML • SGML is more complicated than it need be • because it was designed in the old days (12 years ago!) • XML is a simplified subset of SGML • much less minimisation • makes processing easier • qua complexity: sits somewhere between HTML and SGML

  12. Who is in charge of XML? • XML is a W3C Recommendation • The W3C is The World Wide Web Consortium, a voluntary association of companies and non-profit organisations. Membership costs serious money, confers voting rights. Complex procedures, with the Chairman (Tim Berners-Lee) having ultimate authority, guided by a committee of the whole called the Advisory Council. • The XML recommendation was written by the W3C’s XML Working Group.

  13. Digression, v.2: Just what is XML? • It's a markup language used for transferring data • It is concerned with data models • to convert between application-appropriate and transfer-appropriate forms • It is not concerned with human beings • It's produced and consumed by programs

  14. XML as UI • A slogan of Adam Bosworth • I interpret it in two ways: • At the client end • Use XML plus XSL as the basis for what the user sees on his/her screen • Use XLinks from a master document to pull together disparate sources of information • At the server end • Use XML as a uniform interface for any data source onto the web • Not just documents, but E.g. Databases, process control information, stock quotes

  15. Application data

  16. Structured markup <POORDERHDR><DATETIME qualifier="DOCUMENT"> <YEAR>1996</YEAR> <MONTH>06</MONTH> <DAY>30</DAY> <HOUR>23</HOUR> <MINUTE>59</MINUTE> <SECOND>59</SECOND> <SUBSECOND>0000</SUBSECOND> <TIMEZONE>+0100</TIMEZONE> </DATETIME> <OPERAMT qualifier="EXTENDED" type="T"> <VALUE>670000</VALUE> <NUMOFDEC>2</NUMOFDEC> <SIGN>+</SIGN> <CURRENCY>USD</CURRENCY>. . .

  17. What just happened!? • The whole transfer syntax story just went meta, that's what happened! • XML has been a runaway success, on a much greater scale than its designers anticipated • Not for the reason they had hoped • Because separation of form from content is right • But for a reason they barely thought about • Data must travel the web • Tree structured documents are a useable transfer syntax for just about anything • So data-oriented web users think of XML as a transfer mechanism for their data

  18. The Cambridge Communiqué • A W3C Note resulting from a meeting this August (http://www.w3.org/TR/schema-arch) • Signalled a widespread acceptance of layering: "XML has defined a transfer syntax for tree-structured documents; "Many data-oriented applications are being defined which build their own data structures on top of an XML document layer, effectively using XML documents as a transfer mechanism for structured data; "

  19. The Communiqué, cont'd • Called for support in XML Schema for specifying mapping between the XML document data model (or XML Infoset) and application-specific data models • XML Schema is a W3C recommendation-in-progress for definiing the structure of document families • A grammar for markup structure • E.g. artice -> title, subtitle?, section+ or POORDERHDR -> DATETIME, ORDERAMT

  20. XML Schema: some details • Fortunately, XML Schema is actually notated in XML itself • So there are elements defined for use in schemas to define. . . • Elements :-) • Attributes • Types • A type is a collection of constraints on element content and attribute values • A type may be either • simple, for constraining string values • complex, for constraining elements which contain other elements

  21. A simple example <!ELEMENT text (#PCDATA|emph|name)*> <!ATTLIST text timestamp NMTOKEN #REQUIRED> <xs:element name="text"> <xs:complexType content="mixed"> <xs:element ref="emph"/> <element ref="name"/> <xs:attribute name="timestamp" type="date" minOccurs="1"/> </xs:complexType></xs:element>

  22. Richer type definition example <xs:complexType name='personName'> <xs:element name='title' minOccurs='0'/> <xs:element name='forename' minOccurs='0' maxOccurs='unbounded'/> <xs:element name='surname'/> <xs:attribute name='id' type='integer'/></xs:complexType> <xs:element name='owner' type='personName'/>

  23. Mapping between layers • We can think of this in two ways • In terms of an abstract data modelling language • Entity-Relation • UML • RDF • In concrete implementation terms • Tables and rows • Class instances and instance variables • The first is more portable • The second more immediately useful

  24. Mapping between layers 2 • Regardless of what approach we take, we need • A vocabulary of data model components • An attachment of that vocabulary to schema components • Sample vocabularies • entity, relationship, collection • table, row, column • instance, variable, list, dictionary • Where should attachment be specified? • In the schema • convenient • Outside it • modular

  25. Specifying mapping in the schema • Probably reasonable if done in high-level (e.g. RDF, UML, ER) terms • See example infoset-xmpl.xml, infoset-uml.xsd

  26. Specifying mapping outside • Requires some duplication of structural information • Encourages cross-language working • XSLT is the obvious candidate • See example infoset-xmpl.xsl

  27. Compile the Mapping • Perhaps we can get the benefits of both approaches • Annotate the schema • Compile the annotations into XSLT • With bindings for separate implementations • Semi-structured data has a role to play here • Particularly when the data model antecedes the document model

  28. Take-home message • The point at which idiosyncratic scripting takes over can be moved one layer up • Using public consensual declarative standards is a Good Thing • Interoperability makes things better for everyone

More Related