Introduction to XML. Presented to the seminar “Introduction to MARCXML”, National Library of Scotland, Edinburgh, 5 May 2006, organised by the Cataloguing and Indexing Group in Scotland. Eric Jutrzenka, e.jutrzenka@nls.uk
Brief History of XML • Markup was first used by copy editors to annotate manuscripts • Led to the ISO Standard Generalized Markup Language (SGML), 1986 • Tim Berners-Lee proposed the Web in 1989 and created HyperText Markup Language (HTML) for it • eXtensible Markup Language (XML) became a W3C Recommendation in 1998
What is XML? • A set of specifications that is widely accepted as an industry standard for information transfer on the internet • A means of applying type and structure to information • It’s easy to understand • It’s flexible
Basic Constructs • A document contains a single root element • The root element contains zero or more elements • Each element has zero or more attributes, zero or one value, and zero or more child elements
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title>A Tale of Two Cities</title>
    <author>Charles Dickens</author>
  </book>
  <book/>
</library>
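To make the basic constructs concrete, the short Python sketch below parses the example document with the standard library's `xml.etree.ElementTree` and lists each element with its attributes and value. The `describe` helper is a hypothetical name introduced here for illustration; it is not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The example document from the slide, as bytes so the declared
# ISO-8859-1 encoding is honoured by the parser.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title>A Tale of Two Cities</title>
    <author>Charles Dickens</author>
  </book>
  <book/>
</library>"""


def describe(element, depth=0):
    """Return one indented line per element: tag, attributes, value."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} attrs={element.attrib} value={text!r}"]
    for child in element:  # iterating an element yields its child elements
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)  # the single root element, <library>
outline = describe(root)
print("\n".join(outline))
```

The second `<book/>` shows the "zero or more" rules at work: it has no attributes, no value, and no children, yet it is still a legal element.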
All XML is Well Formed • Must have a single root element • Every start tag must have a matching end tag (an empty element may use the short form, e.g. <book/>) • Elements must nest properly, forming a tree structure • Element names are case sensitive and must begin with a letter or underscore (_) • After the first character, element names can contain letters, digits, periods (.), hyphens (-), underscores (_), or colons (:)
Example – Not XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document, 4 mistakes -->
<library>
  <1book type="novel">
    <title> A Tale of Two Cities <join> </Title>
    <author's name> </join> Charles Dickens </author's name>
  </1book>
</library>
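A parser is the quickest way to find out whether text is well formed. The sketch below isolates each of the four mistakes from the slide, plus a two-root case, in its own small fragment and feeds them to Python's standard `xml.etree.ElementTree`. The `is_well_formed` helper and the fragment labels are illustrative names, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


GOOD = "<library><book type='novel'><title>A Tale of Two Cities</title></book><book/></library>"

# One fragment per rule broken in the "Not XML" slide, plus a two-root case.
BAD_CASES = {
    "name starts with a digit": "<library><1book/></library>",
    "start and end tag differ in case": "<library><title>A Tale of Two Cities</Title></library>",
    "overlapping elements (not a tree)": "<library><title>A<join></title></join></library>",
    "apostrophe and space in a name": "<library><author's name>Charles Dickens</author's name></library>",
    "more than one root element": "<library/><library/>",
}

print("good document:", is_well_formed(GOOD))
for label, text in BAD_CASES.items():
    ok, message = is_well_formed(text)
    print(f"{label}: well formed={ok} ({message})")
```

A parser stops at the first error it meets, so a document with several mistakes has to be fixed and re-checked one error at a time.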
Problem Characters • < > & ' " • Angle brackets delimit tags, so a literal < or > in content can be mistaken for markup • An ampersand starts an entity reference • Apostrophes and quotes delimit attribute values
Example (problem characters left unescaped)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title> Pride & Prejudice </title>
    <author> Jane Austen </author>
    <note desc="quotes look like: ""> 3 < 4, 4 > 3 </note>
  </book>
</library>
Example (problem characters escaped)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title> Pride &amp; Prejudice </title>
    <author> Jane Austen </author>
    <note type="quotes look like: &quot;"> 3 &lt; 4, 4 &gt; 3 </note>
  </book>
</library>
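When XML is generated by a program, the escaping is best left to a library. As a minimal sketch, Python's standard `xml.sax.saxutils` module provides `escape` for element content and `quoteattr` for attribute values. The fragment built below is a cut-down version of the slide's example, and parsing it back shows the original characters survive the round trip.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
title = escape("Pride & Prejudice")
note = escape("3 < 4, 4 > 3")

# quoteattr() escapes the value and also adds the surrounding quotes.
# For a value containing only a double quote it chooses single-quote
# delimiters instead of writing &quot;; both forms are legal XML.
desc = quoteattr('quotes look like: "')

fragment = f"<book><title>{title}</title><note desc={desc}>{note}</note></book>"
print(fragment)

# Round trip: the parser turns the entity references back into characters.
book = ET.fromstring(fragment)
print(book.find("title").text)
print(book.find("note").get("desc"))
print(book.find("note").text)
```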
Namespaces • Solve name collisions • e.g. a conversion <table /> vs. a coffee <table /> • A prefix tells them apart: <furniture:table /> • <conversions:table />
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<stuff xmlns:conv='http://nls.uk/conversion'>
  <table xmlns='http://nls.uk/furniture'>
    <material>Wood</material>
    <height>0.5m</height>
  </table>
  <conv:table>
    <conv:unit>Miles</conv:unit>
    <conv:unit>Kilometres</conv:unit>
  </conv:table>
</stuff>
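The sketch below shows how a namespace-aware parser tells the two tables apart. It uses Python's standard `xml.etree.ElementTree`, which stores each element name as `{namespace-URI}local-name`. The `furn` prefix in the lookup map is a hypothetical choice made here: prefixes are only local shorthand, and it is the URI that identifies the namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<stuff xmlns:conv='http://nls.uk/conversion'>
  <table xmlns='http://nls.uk/furniture'>
    <material>Wood</material>
    <height>0.5m</height>
  </table>
  <conv:table>
    <conv:unit>Miles</conv:unit>
    <conv:unit>Kilometres</conv:unit>
  </conv:table>
</stuff>"""

# Prefix-to-URI map used only for the lookups below. "furn" does not
# appear in the document, which uses a default namespace instead.
NS = {"furn": "http://nls.uk/furniture", "conv": "http://nls.uk/conversion"}

root = ET.fromstring(DOC)
furniture_table = root.find("furn:table", NS)
conversion_table = root.find("conv:table", NS)

# Children of the furniture table inherit its default namespace.
material = furniture_table.find("furn:material", NS).text
units = [unit.text for unit in conversion_table.findall("conv:unit", NS)]

print(furniture_table.tag)   # expanded name of the furniture table
print(conversion_table.tag)  # expanded name of the conversion table
print(material, units)
```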
DTDs and XML Schema • Document Type Definition (DTD) • Inherited from the SGML standard • XML Schema • Newer W3C standard for specifying type, itself written in XML and with richer data types • Both are formal ways of specifying document type • A valid XML document is well formed and conforms to a DTD or Schema
Example DTD
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
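To show what the content models in this DTD require, the sketch below checks them by hand in Python. This is an illustration only: the standard `xml.etree.ElementTree` parser does not validate, real validation calls for a validating parser, and the hypothetical `conforms_to_library_dtd` function ignores details such as stray text between elements and undeclared attributes.

```python
import xml.etree.ElementTree as ET


def conforms_to_library_dtd(text):
    """Hand-written check of the element rules in the example DTD."""
    root = ET.fromstring(text)
    if root.tag != "library":
        return False
    for book in root:
        # <!ELEMENT library (book*)>: only book children, any number of them
        if book.tag != "book":
            return False
        tags = [child.tag for child in book]
        # <!ELEMENT book (title, author+)>: one title, then one or more authors
        if len(tags) < 2 or tags[0] != "title":
            return False
        if any(tag != "author" for tag in tags[1:]):
            return False
        # title and author are (#PCDATA): text only, no child elements
        if any(len(child) > 0 for child in book):
            return False
    return True


VALID = "<library><book><title>T</title><author>A</author><author>B</author></book></library>"
print(conforms_to_library_dtd(VALID))
print(conforms_to_library_dtd("<library/>"))
print(conforms_to_library_dtd("<library><book/></library>"))
```

An empty `<book/>` is well formed but does not satisfy this DTD, because the content model demands a title and at least one author.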
Example Schema
<?xml version="1.0"?>
…
<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string" />
            <xs:element name="author" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
          </xs:sequence>
…
Extract from MARC 21 XML Schema
<xsd:simpleType name="leaderDataType" id="leader.st">
  <xsd:restriction base="xsd:string">
    <xsd:whiteSpace value="preserve"/>
    <xsd:pattern value="[\d ]{5}[\dA-Za-z ]{1}[\dA-Za-z]{1}[\dA-Za-z ]{3}(2| )(2| )[\d ]{5}[\dA-Za-z ]{3}(4500|    )"/>
  </xsd:restriction>
</xsd:simpleType>
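The pattern can be tried out with an ordinary regular-expression engine. The Python sketch below assumes the final alternative is four spaces, so that every matching leader is exactly 24 characters long; a transcript that collapses whitespace may show only one. XML Schema patterns must match the whole value, hence `fullmatch`. The sample leader is made up for illustration.

```python
import re

# The leader pattern, split across two lines for readability.
# Assumption: the last alternative is four spaces, not one.
LEADER_PATTERN = re.compile(
    r"[\d ]{5}[\dA-Za-z ]{1}[\dA-Za-z]{1}[\dA-Za-z ]{3}(2| )(2| )"
    r"[\d ]{5}[\dA-Za-z ]{3}(4500|    )"
)


def is_valid_leader(leader):
    """True if the whole string matches the leader pattern."""
    return LEADER_PATTERN.fullmatch(leader) is not None


# A made-up leader, assembled piece by piece to mirror the pattern.
sample = "01142" + "c" + "a" + "m  " + "2" + "2" + "00301" + " a " + "4500"

print(len(sample), is_valid_leader(sample))
print(is_valid_leader(sample[:-1]))                   # one character short
print(is_valid_leader(sample[:6] + " " + sample[7:])) # blank where the pattern allows none
```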
XSLT • Used to transform XML documents • Output can be in formats other than XML, e.g. HTML or plain text • Often used to display the information in an XML document, e.g. XML to HTML, or XML to PDF via XSL-FO
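Running a stylesheet requires an XSLT processor, which Python's standard library does not include. As a rough stand-in, the sketch below writes the same kind of transformation by hand: it reads a library document and emits an HTML list. It is not XSLT, only an illustration of the XML-in, other-format-out idea, and `library_to_html` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def library_to_html(xml_text):
    """Turn <library><book>...</book></library> into an HTML bullet list."""
    root = ET.fromstring(xml_text)
    items = []
    for book in root.findall("book"):
        title = book.findtext("title", default="").strip()
        authors = ", ".join((a.text or "").strip() for a in book.findall("author"))
        # Re-escape the text: the parser has already decoded &amp; and friends.
        items.append(f"  <li><em>{escape(title)}</em>, {escape(authors)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


SOURCE = (
    "<library><book type='novel'>"
    "<title>Pride &amp; Prejudice</title><author>Jane Austen</author>"
    "</book></library>"
)

html = library_to_html(SOURCE)
print(html)
```

An XSLT stylesheet expresses the same mapping declaratively, as templates matched against elements, rather than as a loop.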
Additional Resources • http://www.w3c.org • For detailed specifications • http://www.w3schools.com/ • For friendly ‘click-through’ web tutorials with many examples • http://www.xml.com • News, Articles, Tutorials