XML: a brief introduction
Managing networks: understanding new technologies, Birmingham, 13 September 2001
Pete Johnston
UKOLN, University of Bath, Bath, BA2 7AY
Email: p.johnston@ukoln.ac.uk
URL: http://www.ukoln.ac.uk/
UKOLN is supported by:

XML: a brief introduction
• Markup & markup languages
• SGML & XML
• Two perspectives on XML
• Some features of XML
• XML & HTML
• Uses of XML
Managing networks: understanding new technologies, Birmingham, 13 Sep 2001

Markup & markup languages
• Markup
  • text added to the data content of a document in order to convey information about data
  • markup pre-dates computers!
• Marked-up document contains
  • data and
  • information about that data (markup)

Markup & markup languages
• Markup language
  • formalised system for providing markup
• Definition of markup language specifies
  • what markup is allowed
  • how markup is distinguished from data
  • what markup means

Exercise 1
• From your own experience, can you suggest
  • some instances of where markup is used?
  • some examples of markup languages?

SGML and XML
• Standard Generalized Markup Language (SGML)
  • ISO 8879:1986
  • general, flexible, powerful
  • used (mainly but not exclusively) in large publishing environments
• Extensible Markup Language (XML)
  • Recommendation of W3C (first edition 1998, second edition 2000)
  • subset of SGML
  • less flexible; easier to implement, use
  • used (increasingly) everywhere… often invisibly…
• Both define a means of describing tree-structured data in text format, using markup embedded in the data

SGML and XML
• SGML and XML
  • not strictly markup languages!
  • “meta-languages”: languages for describing markup languages
  • can define unlimited number of markup languages
• All conforming languages can be processed by single program (“parser”)
• Rules made public so any programmer can write parser
• Many parsers available for application developer
• Data independent of platform, vendor

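To make the point that one generic parser serves every conforming language, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The two vocabularies (`recipe`, `invoice`) are invented for illustration; the parser knows nothing about either, yet handles both.

```python
import xml.etree.ElementTree as ET

# Two documents in two unrelated, hypothetical vocabularies.
recipe = "<recipe><title>Scones</title><step>Mix</step><step>Bake</step></recipe>"
invoice = '<invoice number="42"><line qty="2">Widget</line></invoice>'

def outline(xml_text):
    """Parse any well-formed XML text with the same generic parser and
    return (tag, attributes, text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]

for doc in (recipe, invoice):
    print(outline(doc))
```

No vocabulary-specific code is needed to read either document; what the names *mean* is still up to the application.
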
A document perspective (1)
• Individual documents have structure
  • component parts
  • relationships between parts
• Physical structure
  • depends on medium
• Logical structure
  • hierarchical, tree structure
  • independent of physical rendition
• Document types
  • set of documents sharing common logical structural model

A document perspective (2)
• Logical structure communicated to human reader through presentational conventions
• Presentation defined by “procedural” markup
  • instructs “agent” what to do with text
  • e.g. how to format it
• Problems
  • markup specific to processing system
  • specific to delivery medium
  • human interprets logical structure but software can’t

A document perspective (3)
• Descriptive markup
  • identifies the logical components of a document
  • does not specify what procedures are to be applied to text
  • so e.g. how to format it must be specified separately
• Benefits
  • markup (potentially) independent of processing system
  • permits reuse and delivery to multiple media
  • makes logical structure available to software
• N.B. exchange requires consensus on what markup means!

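A small sketch of descriptive markup in practice, assuming a made-up memo vocabulary (`memo`, `heading`, `para`, `emph`): the document only says what each part *is*, and two separate Python functions supply the formatting for two delivery media (HTML and plain text).

```python
import xml.etree.ElementTree as ET

# Descriptive markup: identifies logical components, says nothing about formatting.
# The element names are hypothetical.
memo = ET.fromstring(
    "<memo><heading>Network upgrade</heading>"
    "<para>Work starts on <emph>Monday</emph>.</para></memo>"
)

# Presentation is specified separately, once per delivery medium.
HTML_STYLE = {"heading": "h1", "para": "p", "emph": "em"}

def to_html(el):
    """Map each logical component to an HTML element."""
    tag = HTML_STYLE.get(el.tag)
    inner = (el.text or "") + "".join(to_html(c) + (c.tail or "") for c in el)
    return f"<{tag}>{inner}</{tag}>" if tag else inner

def to_text(el):
    """Render the same components for a plain-text medium."""
    if el.tag == "heading":
        return el.text.upper() + "\n"
    if el.tag == "emph":
        return "*" + (el.text or "") + "*"
    return (el.text or "") + "".join(to_text(c) + (c.tail or "") for c in el)

print(to_html(memo))
print(to_text(memo))
```

The marked-up source is reused unchanged; only the processing differs, which is the benefit the slide describes.
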
Exercise 2
• HTML
  • conceived as describing the logical structure of a hypertext document
  • acquired features which described presentation
  • extended by browser vendors
• In the HTML examples, can you see
  • where markup describes presentation?
  • where markup describes logical structure?

A data perspective (1)
• The structured document is just one type of structured data
• Other types of structured data can be represented as tree structures
• A “serialization” syntax is useful for various sorts of structured data (relational, object etc.)
  • for exchange between application programs on different platforms, across networks etc.
• SGML is too complex, “heavyweight”, but XML is ideal

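As a rough illustration of XML as a serialization syntax, this Python sketch turns a flat record (think of one database row) into XML text and rebuilds it on the receiving side. The `order` record and its field names are hypothetical, and the sketch assumes field names are valid XML names.

```python
import xml.etree.ElementTree as ET

def record_to_xml(name, record):
    """Serialize a flat record (e.g. a database row) as an XML element tree."""
    root = ET.Element(name)
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_record(xml_text):
    """Rebuild the record on the receiving side; all values arrive as text."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}

order = {"id": 1017, "item": "router", "quantity": 3}  # hypothetical record
wire = record_to_xml("order", order)
print(wire)
print(xml_to_record(wire))
```

Note that the numbers come back as strings: plain XML carries no data types, which is one of the gaps XML Schema data-typing addresses.
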
A data perspective (2)
• A “document” might be any collection of information processed as a unit
  • a report
  • a patient record
  • a purchase order transaction
  • a configuration file for an operating system
  • some “structured information about a resource” (a metadata record)
  • …
  • etc!
• Applications less concerned with publishing, formatting, presentation

XML : elements
• XML uses embedded tags to delimit and label parts of a document
  • tags <…..>
• Elements
  • containers delimited by tags which include the element type name
  • start tag <element>
  • end tag </element>
• Elements may contain
  • character data
  • other elements
  • both of the above
  • nothing (empty elements) <element/>
• The document element is the root of the element tree

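The four kinds of element content can be seen by parsing a small document. This Python sketch uses `xml.etree.ElementTree`; the `report` vocabulary is invented for illustration, and `content_kind` is a hypothetical helper, not part of the library.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<report>"                                       # document (root) element
    "<title>Network survey</title>"                  # character data only
    "<section><para>One</para></section>"            # other elements only
    "<para>Mixed <emph>content</emph> here</para>"   # both
    "<pagebreak/>"                                   # empty element
    "</report>"
)

def content_kind(el):
    """Classify an element's content; a child's tail text counts as the parent's data."""
    has_text = bool((el.text or "").strip()) or any((c.tail or "").strip() for c in el)
    has_children = len(el) > 0
    if has_text and has_children:
        return "mixed"
    if has_children:
        return "elements"
    if has_text:
        return "character data"
    return "empty"

print("root:", doc.tag)
for child in doc:
    print(child.tag, "->", content_kind(child))
```

The parser treats `<pagebreak/>` and `<pagebreak></pagebreak>` identically: both produce an element with no content.
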
XML : attributes
• Attributes
  • pairs of names and values
  • occur inside element start tag, after element type name
  • <element attribute="value">
• An attribute can occur only once in a given element’s start tag
• Attribute values may contain
  • character data only
• Attribute values must be surrounded by quotes (single or double)

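The attribute rules are enforced by any conforming parser. In this Python sketch (standard-library `xml.etree.ElementTree`; the `meeting` element and `room` attribute are made up), the two correctly quoted forms parse, while an unquoted value, a repeated attribute and markup inside a value are all rejected.

```python
import xml.etree.ElementTree as ET

samples = {
    "ok (double quotes)": '<meeting room="B12">Agenda</meeting>',
    "ok (single quotes)": "<meeting room='B12'>Agenda</meeting>",
    "unquoted value": "<meeting room=B12>Agenda</meeting>",
    "repeated attribute": '<meeting room="B12" room="C3">Agenda</meeting>',
    "markup in value": '<meeting room="<b>B12</b>">Agenda</meeting>',
}

def check(xml_text):
    """Return the attribute dictionary, or the parser's error message."""
    try:
        return ET.fromstring(xml_text).attrib
    except ET.ParseError as err:
        return f"rejected: {err}"

for label, text in samples.items():
    print(f"{label}: {check(text)}")
```
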
XML : elements & attributes
• Nouns and adjectives?
  • use character data for “content”
  • use attributes for “information about content”
• Document-centric view?
• No hard and fast rules
• Design decisions tend to be based (wrongly!) on behaviour of tools
• XML documents are human-readable…
• … but ease of human-readability may not be the most important consideration in their design

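To show that both designs can carry the same information, here is a Python sketch with a hypothetical `price` element modelled once with child elements and once with an attribute for the currency (treating the currency as “information about content”). The reader functions are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) price information, modelled two ways.
as_elements = "<price><amount>12.50</amount><currency>GBP</currency></price>"
as_attribute = '<price currency="GBP">12.50</price>'

def read_elements(xml_text):
    root = ET.fromstring(xml_text)
    return {"amount": root.findtext("amount"), "currency": root.findtext("currency")}

def read_attribute(xml_text):
    root = ET.fromstring(xml_text)
    # Character data carries the content; the attribute carries
    # information about that content (the unit it is expressed in).
    return {"amount": root.text, "currency": root.get("currency")}

print(read_elements(as_elements) == read_attribute(as_attribute))
```

Each reader is tied to one design, so the choice matters most to the software that consumes the documents.
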
XML : document types & vocabularies
• “XML lets me make up names for element types! Great!”
• But….
  • XML says nothing about what your names mean
  • will a human recipient of your document recognise your <operation> element?
  • will a software agent process your <operation> element correctly?
• Communication requires consensus on
  • structural model of class of document/data
  • labelling of components
  • semantics of components
• Shared use of common XML “vocabularies”

XML : DTDs, XML Schemas
• Two methods to codify the syntax rules of the vocabulary used to describe a document type
  • what markup is allowed
  • structural constraints on use of markup
  • say nothing about what markup means
• Document Type Definition (DTD)
  • inherited from SGML
  • part of XML Recommendation
• XML Schema
  • recent recommendation of W3C
  • support for data-typing, i.e. tighter control on element content
  • support for combining vocabularies
  • uses XML syntax

XML : Validation & well-formedness
• Validation
  • parser can check markup of an individual document against rules expressed in DTD or Schema
  • authoring tool can enforce rules of DTD/Schema while document is edited
• Well-formed documents
  • not checked against DTD/Schema, but do follow basic syntax rules, e.g.
    • all tags use proper delimiters
    • all elements have start and end tags (or use the empty-element form <element/>)
    • all elements properly nested (no overlapping)
    • attribute values in quotes
    • appropriate use of special characters (e.g. & and < escaped as &amp; and &lt;)

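A well-formedness check needs nothing more than a parser. This Python sketch uses `xml.etree.ElementTree`, whose underlying parser is non-validating: it enforces the basic syntax rules listed above but does not check documents against a DTD or XML Schema (that needs a validating tool). The `note` examples are invented.

```python
import xml.etree.ElementTree as ET

def well_formed(xml_text):
    """Return (True, None) if the text is well-formed XML, else (False, reason).
    Only syntax is checked; no DTD or Schema validation takes place."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

examples = [
    "<note><to>Pete</to></note>",           # well-formed
    "<note><to>Pete</note>",                # missing end tag
    "<note><b><i>text</b></i></note>",      # elements overlap instead of nesting
    "<note priority=high>text</note>",      # attribute value not quoted
    "<note>Fish & chips</note>",            # bare ampersand
    "<note>Fish &amp; chips</note>",        # special character escaped
]
for text in examples:
    print(well_formed(text), text)
```
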
Exercise 3
• Well-formedness
  • Identify the errors which mean that the three examples are not well-formed XML
  • How would you correct the errors?

XML : namespaces (1)
• Applications wish to use elements from multiple vocabularies (DTDs/Schemas)
  • particularly true of metadata applications
• Problems of “name collisions”
  • <surgery> in GPs’ Directory Schema
  • <surgery> in MPs’ Appointments Schema
• XML Namespaces
  • recommendation of W3C
  • provides a universal naming mechanism
• A Namespace is a collection of names
• A Namespace is itself given a name, which has the form of a URI

XML : namespaces (2)
• Element type names and attribute names can be qualified by a namespace name (a URI)
• Association with a namespace through use of a namespace prefix
• Declaration of namespace
  • xmlns:health="http://nhs.gov.uk/xml/"
  • xmlns:parl="http://gov.gov.uk/xml/"
• Use of qualified name
  • <health:surgery>
  • <parl:surgery>

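A namespace-aware parser resolves each prefix to its namespace name, so the two `<surgery>` element types stay distinct. This Python sketch reuses the slide's example URIs; the enclosing `diary` element and the text content are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<diary xmlns:health="http://nhs.gov.uk/xml/" xmlns:parl="http://gov.gov.uk/xml/">'
    "<health:surgery>Dr Smith, 9am</health:surgery>"
    "<parl:surgery>Constituency surgery, 6pm</parl:surgery>"
    "</diary>"
)

# ElementTree reports each qualified name as {namespace URI}local-name,
# so the two "surgery" element types no longer collide.
for el in doc:
    print(el.tag)

# Lookups use the namespace URI; the prefix chosen here ("h") need not
# match the prefix used in the document.
ns = {"h": "http://nhs.gov.uk/xml/"}
print(doc.find("h:surgery", ns).text)
```
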
XML and HTML
• HyperText Markup Language (HTML)
  • recommendation of W3C (version 4.01)
  • designed as an application of SGML (not XML)
  • simple, easy to create
  • (partial?) support in browsers, editors
  • mixes description of structure and presentation
• Browsers
  • permissive – will display invalid HTML
  • support proprietary extensions
• Context
  • explosion of Web
  • new devices

XML and HTML (2)
• XHTML 1.0
  • expression of HTML 4.01 as XML (not SGML)
  • same features but restrictions on syntax
  • case sensitivity, XML well-formedness rules
  • current W3C recommendation for creation of docs for Web
• XHTML 1.1
  • modularisation of XHTML
  • separation of structural markup from presentational markup
  • support for managing extensions

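Because XHTML is XML, an ordinary XML parser can read it, but it rejects the looser habits browsers tolerate in HTML. A minimal Python sketch with `xml.etree.ElementTree`; the fragments are invented, and the XHTML one carries the XHTML namespace `http://www.w3.org/1999/xhtml`.

```python
import xml.etree.ElementTree as ET

xhtml = '<p xmlns="http://www.w3.org/1999/xhtml">First line<br/>second line</p>'
html_style = [
    "<P>Upper-case start tag</p>",   # XML names are case-sensitive
    "<p>Paragraph never closed",     # end tags are required
    "<p>Line one<br>line two</p>",   # empty elements must be written <br/>
]

def parses_as_xml(text):
    """True if an XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(xhtml))
print([parses_as_xml(t) for t in html_style])
```
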
Uses of XML (1)
• Data (and metadata) exchange
  • e-commerce
  • e-government (http://www.govtalk.gov.uk)
  • rights management
  • bibliographic data
  • news syndication
  • scientific data
  • health - patient records
  • (… plus hundreds more…)
• Web services
• Within systems and between systems
• Many standards/protocols built on XML

Uses of XML (2)
• Storage
  • publishing
  • scholarly texts
  • archival finding aids
  • document management
  • …
  • preservation

XML : summary (1)
• Means of describing structured data in text format
• Independent of platform, vendor
  • reuse of data
  • exchange of data
• Used
  • for many types of structured data
  • in many different applications
  • both for storage and exchange
  • data may be stored in database, exposed as XML

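The last point, data kept in a database but exposed as XML, might look like this Python sketch. It uses an in-memory `sqlite3` database as a stand-in for the real store; the `resource` table and its rows are invented sample data, and the table name is assumed to be trusted because it is inserted directly into the SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database standing in for wherever the data is really stored.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE resource (id INTEGER, name TEXT, year INTEGER)")
db.executemany("INSERT INTO resource VALUES (?, ?, ?)",
               [(1, "Annual report", 2001), (2, "Network map", 2000)])

def export_as_xml(conn, table):
    """Expose every row of a table as XML: one element per row, one child per column.
    The table name is interpolated into the SQL, so it must come from trusted code."""
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY id")
    columns = [d[0] for d in cursor.description]
    root = ET.Element(table + "s")
    for row in cursor:
        record = ET.SubElement(root, table)
        for col, value in zip(columns, row):
            ET.SubElement(record, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

print(export_as_xml(db, "resource"))
```

The storage format stays whatever suits the database; XML is only the exchange view produced on request.
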
XML : summary (2)
• Use of XML
  • usually invisible to end-user
  • increasingly invisible to information manager?
  • generated and consumed by software
  • requires consensus amongst communication partners

Acknowledgements
• UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
• http://www.ukoln.ac.uk/