Learn about XML being a meta-markup language derived from SGML, its anatomy, conformance, and more. Explore how XML reshapes data representation online.
Introduction to XML John Arnett, MSc Standards Modeller Information and Statistics Division NHSScotland Tel: 0131 551 8073 (x2073) mailto:John.Arnett@isd.csa.scot.nhs.uk http://isdscotland.org/xml
Contents • What is XML? • Anatomy of an XML Document • Conformance and Validation • Summary • Find Out More
What is XML? • XML is not… • a programming language • a software panacea • an object-oriented technology • HTML with funny tags • a replacement for HTML • …but it is re-shaping publishing on the web
What is XML? • Stands for Extensible Markup Language • Meta-markup language derived from SGML (Standard Generalized Markup Language) • Open Standard, currently XML 1.0 2nd edition (W3C Recommendation 6 October 2000)
What is XML? • W3C says • XML is the universal format for structured documents and data on the Web • A data object is an XML document if it is well-formed, as defined in [the W3C] specification (more on this later)
What is XML? • Data Content and Presentation • Sample dataset (flat file, database, spreadsheet, etc.)
ID      SURNAME   FORENAME  SEX  DOB
134376  Jones     Ian       0    06011971
198457  McKenzie  Alison    1    23081983
111672  Martin    Lesley    0    12111979
147678  Jackson   Sarah     1    15061976
What is XML? • Record – data oriented structure: 111672 Martin Lesley 0 12111979 • Structured • Searchable • Easy to understand • Portable
What is XML? • HTML – document oriented structure • Easy to understand • Portable • Structured • Searchable
Rendered as: Record Id: 111672 • Surname: Martin • Given Name: Lesley • Sex: Male • Date of Birth: 12 November 1979
<h1>Record Id: <font color="red">111672</font></h1>
<table>
  <colgroup><col align="left"></colgroup>
  <tr><th>Surname:</th><td>Martin</td></tr>
  <tr><th>Given Name:</th><td>Lesley</td></tr>
  <tr><th>Sex:</th><td>Male</td></tr>
  <tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>
What is XML? • XML to the rescue! • Easy to understand • Portable • Structured • Searchable
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
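As a rough illustration of why the XML form is both structured and searchable, the record can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: the variable names are made up for the example, and the record id is the one given for this person in the sample dataset.

```python
import xml.etree.ElementTree as ET

# The sample record, using the id from the sample dataset.
RECORD_XML = """
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
""".strip()

record = ET.fromstring(RECORD_XML)

# Each field is addressed by name rather than by column position.
record_id = record.get("recordId")
surname = record.findtext("Surname")

# Nested elements are reached by walking the tree.
dob = record.find("DateOfBirth")
date_of_birth = "-".join(
    [dob.findtext("Year"), dob.findtext("Month"), dob.findtext("Day")]
)

print(record_id, surname, date_of_birth)
```

Because the tags name the data, the same code keeps working if the elements are reordered or new ones are added, which is not true of the fixed-position flat record.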
What is XML? • HTML and XML are… • Text based • Open standards • Widely used
What is XML? • But XML is also… • Structured • Self-describing • Searchable • Extensible (i.e. any number of tags allowed) • and it separates data from presentation
Anatomy of an XML Document • XML documents consist of text • character data (tab, carriage return and line feed; Unicode characters) • markup
Anatomy of an XML Document • Markup • start-, end- and empty element tags • tag names are case sensitive! • entity and character references • comments
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Anatomy of an XML Document • Character data • Reserved characters: &, <, >, ' and "
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Anatomy of an XML Document • Declaration • Optional first line of markup (but W3C recommended) • States the XML version and character encoding, so a parser knows how to read the document
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Anatomy of an XML Document • Root Element • Uniquely named element • Contains all the data and links to other documents
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
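To see how a parser treats the declaration, the root element and the comment in the Message example, here is a small sketch using Python's standard xml.etree.ElementTree. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Passed as bytes so the parser applies the encoding named in the declaration.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>"""

root = ET.fromstring(DOCUMENT)

# The root element wraps everything else in the document.
root_tag = root.tag

# ElementTree's default parser drops comments, so only real elements remain.
child_tags = [child.tag for child in root]

# Character data is the text held between a start tag and its end tag.
body_text = root.findtext("MessageBody")

print(root_tag, child_tags, body_text)
```

Tag names are case sensitive here too: looking up "messagebody" instead of "MessageBody" would find nothing.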
Anatomy of an XML Document • Elements • Define the content of the XML document • May contain other elements, character data or can be empty
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
Anatomy of an XML Document • Attributes • Add data about the elements
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
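Attributes reach a program as name/value pairs on each element. The sketch below reads the book catalogue with Python's standard xml.etree.ElementTree; the variable names and the "cheapest book" lookup are illustrative additions, not part of the slides.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
""".strip()

catalog = ET.fromstring(CATALOG_XML)

# Attribute values are always strings; convert them where a number is needed.
subject = catalog.get("Subject")
prices = {
    book.get("Title"): float(book.get("Price"))
    for book in catalog.findall("Book")
}

# Pick the title with the lowest price.
cheapest = min(prices, key=prices.get)

print(subject, cheapest, prices[cheapest])
```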
Anatomy of an XML Document • Handling reserved characters • Built-in entities: &amp; = & • &quot; = " • &lt; = < • &gt; = > • &apos; = ' • CDATA Sections
<CodeSnippet>
<![CDATA[if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
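The two ways of handling reserved characters can be compared directly. This sketch assumes Python's standard library: xml.sax.saxutils.escape swaps &, < and > for their built-in entities, and xml.etree.ElementTree parses both the escaped form and the CDATA form. The sample string and variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A code fragment full of reserved characters.
code = 'if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";'

# Option 1: replace &, < and > with built-in entities.
escaped = escape(code)
with_entities = ET.fromstring("<CodeSnippet>" + escaped + "</CodeSnippet>")

# Option 2: wrap the raw text in a CDATA section
# (the text itself must not contain the closing sequence "]]>").
with_cdata = ET.fromstring(
    "<CodeSnippet><![CDATA[" + code + "]]></CodeSnippet>"
)

# Either way the parser hands back the original characters.
print(escaped)
print(with_entities.text == code, with_cdata.text == code)
```

CDATA tends to be easier to read for long stretches of code, while entities suit the occasional reserved character in ordinary text.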
Anatomy of an XML Document • Namespaces • Preventing naming collisions
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
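The order example has two different title elements, and the namespace is what tells them apart. A minimal sketch with Python's standard xml.etree.ElementTree follows; the NS prefix map is defined by the example code itself (the "order" prefix in particular is an arbitrary choice for the default namespace).

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
""".strip()

# Prefixes used for lookups; only the namespace URIs have to match the document.
NS = {
    "cust": "http://www.example.com/custDetails",
    "book": "http://www.example.com/bookDetails",
    "order": "http://www.example.com/order",
}

order = ET.fromstring(ORDER_XML)

customer_title = order.findtext("cust:title", namespaces=NS)
book_title = order.findtext("book:title", namespaces=NS)

# Unprefixed elements belong to the default namespace declared with xmlns="...".
order_number = order.findtext("order:orderNumber", namespaces=NS)

print(customer_title, "|", book_title, "|", order_number)
```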
Conformance and Validation • Well-formedness constraints • One root element • Start and end tags match: <Tag>content</Tag> • Empty elements are terminated as <Tag/> • Tags are correctly nested: <Parent><Child></Child></Parent> • All attribute values are enclosed in quotes • All XML processors must check well-formedness constraints
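The well-formedness rules above can be tried out by handing small fragments to a parser. This sketch uses Python's standard xml.etree.ElementTree, which raises ParseError on input that is not well-formed; is_well_formed is a helper written for this example, not a library function, and it is not a full conformance test.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Each sample is paired with the result we expect from the parser.
samples = [
    ("<Tag>content</Tag>", True),                 # start and end tags match
    ("<Tag/>", True),                             # empty element
    ("<Parent><Child></Child></Parent>", True),   # correct nesting
    ("<Tag>content</tag>", False),                # tag names are case sensitive
    ("<Parent><Child></Parent></Child>", False),  # incorrect nesting
    ("<Tag attr=value/>", False),                 # attribute value not quoted
    ("<A/><B/>", False),                          # more than one root element
]

results = [(text, is_well_formed(text)) for text, _ in samples]
for text, ok in results:
    print("well-formed" if ok else "NOT well-formed", text)
```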
Conformance and Validation • Validating XML processors check documents against validity constraints • specified in Document Type Definitions (DTDs) or Schemas • a valid XML document must be well-formed • a well-formed document need not necessarily be valid
Document Type Definitions • DTD syntax able to specify
• Structure and order of child elements
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
• Element attributes
<!ATTLIST Product EffDate CDATA #IMPLIED>
• limited number of data types • default and fixed attribute values
Document Type Definitions • DTDs are… • Easy to understand and implement • Lightweight alternative to schemas • But they… • use non-XML syntax • have only limited support for data typing and namespaces • are difficult to extend
Schemas • W3C Schema • Uses XML syntax • Provides built-in data types and supports user-defined data types • Supports namespaces • Provides several extensibility mechanisms
Schemas • Schemas are therefore more flexible…
<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>
• but harder to understand than DTDs
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>
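To make the difference between well-formed and valid concrete, the sketch below hand-codes two of the Product rules: the child order (Name, Size?) that both the DTD and the schema express, and the positive-integer Size that only the schema can express. Python's standard library does not include a validating parser, so check_product is a hypothetical stand-in written for this example, not a real DTD or schema validator, and the "Widget" documents are invented sample data.

```python
import re
import xml.etree.ElementTree as ET


def check_product(product):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if product.tag != "Product":
        problems.append("root element must be Product")

    # Content model (Name, Size?): Name first, then at most one Size.
    child_tags = [child.tag for child in product]
    if child_tags not in (["Name"], ["Name", "Size"]):
        problems.append("children must be Name followed by an optional Size")

    # The schema types Size as a positive integer; the DTD only sees text.
    size = product.findtext("Size")
    if size is not None:
        if not re.fullmatch(r"\+?[0-9]+", size.strip()) or int(size) <= 0:
            problems.append("Size must be a positive integer")
    return problems


# Both documents are well-formed, but only the first passes the checks.
valid = ET.fromstring(
    "<Product EffDate='2004-01-31'><Name>Widget</Name><Size>3</Size></Product>"
)
invalid = ET.fromstring(
    "<Product><Size>large</Size><Name>Widget</Name></Product>"
)

print(check_product(valid))
print(check_product(invalid))
```

In practice a validating processor does this work from the DTD or schema itself, so the rules live in one shared document rather than in each program's code.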
In Summary… • A language for describing markup languages • Extensible, i.e. you define your own tags • Readable, structured and self-describing • Documents must be well-formed • Documents may be validated using DTDs and/or Schemas
Find Out More • World Wide Web Consortium • www.w3.org • W3C XML v1.0 Specification • http://www.w3.org/TR/REC-xml
Find Out More • The XML Industry Portal • www.xml.org • O’Reilly XML site • www.xml.com • XML Cover Pages • www.oasis-open.org/cover/ • Café Con Leche • www.ibiblio.org/xml/
Find Out More • Scottish Health and Community Care XML Steering Group • www.isdscotland.org/xml
XML Tools • XSV - Open Source XML Schema Validator • www.ltg.ed.ac.uk/~ht/xsv-status.html • MSXML 4.0 • www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42
XML Tools • XML Spy 2004 IDE • www.altova.com/products_ide.html • Free XML Tools and Software • www.garshol.priv.no/download/xmltools/
Printed Sources • Numerous printed sources – for more information visit • Charles F. Goldfarb's www.xmlbooks.com • www.amazon.com