360 likes | 521 Views
Introduction to XML 1. The XML Language. Tim Brailsford. Markup Languages. The word “Markup” is derived from the printing industry Detailed stylistic instructions for typesetting Usually hand-written on the copy (eg underlining some text that is to be set in italics).
E N D
Introduction to XML1. The XML Language Tim Brailsford
Markup Languages • The word “Markup” is derived from the printing industry • Detailed stylistic instructions for typesetting • Usually hand-written on the copy (eg underlining some text that is to be set in italics). • Markup languages do the same job for computerised documentation systems. • Markup adds logical structure to a document, or indicates how it is to be laid out (on paper or screen). • Markup languages are a set of instructions that are amenable to automatic processing.
Markup Languages (cont.) • Usually a sequence of characters in a text file that indicate structure or behaviour of the content. • For example (in HTML) • This is <B>bold</B> and this is <I>italic</I> • <TITLE>This is the title.</TITLE> • Markup may be created by directly editing the symbols, but is more usually hidden from end-users. • Examples • HTML • RTF • Hytime
Generalised Markup Languages • Proprietary markup languages are problematic. • Generalised markup languages are langauges for defining markup languages. • Metalanguages • SGML
SGML - History • Standard Generalised Markup Language • 1969 - GML from IBM • text editing • formatting • information retrieval • 1980 SGML first published • 1980’s SGML adopted by US IRS & DOD • 1986 - ISO standardISO 8879: Information processing--Text and office systems--Standard Generalized Markup Language (SGML), ([Geneva]: ISO, 1986).
SGML • SGML defines a system of tag markup <TAG>This is a pair of SGML tags</TAG> • SGML is a standard for how to specify a tag set. • Document Type Definition (DTD) • SGML documents contain structural elements that can be described without consideration of how they are displayed. • SGML application. • HTML is an SGML application.
Benefits of SGML • Documents are created by thinking in terms of structure rather than appearance (which may change over time). • Documents are portable because any SGML compliant software can interpret them by reference to the DTD. • Documents originally intended for one medium can easily be re-purposed for other media, such as the computer display screen.
What is XML? • XML is based upon SGML, but is substantially simplified for use on the WWW. • Like SGML, XML is a metalanguage • arbitrary definition of elements • <TITLE> <PARAGRAPH> <ChapterHeading> <PRICE><PARTNUMBER> <MANUFACTUER> <ExamGrade> • Syntax may optionally be described by a DTD • Valid documents - have a DTD • Well formed documents do not have a DTD • Style and content are completely separate • XML documents contain content • Style is specified by stylesheets
Example XML Applications • MathML - maths • CML - chemistry • SVG - vector graphics • XHTML - WWW • SMIL - synchronised multimedia • MusicML - sheet music • FpML - financial products • RETML - real estate transactions • and many, many others
XML Elements • XML documents consist of one or more elements. • Elements consist of a pair of tags and (optionally) enclosed text.<TITLE>The XML Companion</TITLE> • Elements may have attributes.<TITLE type=“book”>The XML Companion</TITLE> • Elements may contain other elements.<REFERENCE> <TITLE type=“book”>The XML Companion</TITLE></REFERENCE> • Empty elements may be self closing.<PICTURE src=“mypic.jpg”> </PICTURE><PICTURE src=“mypic.jpg” />
Contents vs Style • XML tags contain meaning not appearance. • This allows extra information to be extracted • Consider the example of the scientific names of animals. • scientific names are in latin • by convention they are always printed in italics The scientific name of the domestic dog is Canis familiaris, and of the domestic cat is Felis catus.
Contents vs Style In HTML <P>The <I>scientific</I> name of the domestic dog is <I>Canis familiaris</I>, and of the domestic cat is <I>Felis catus.</I></P> NB there is no distinction between scientific names and emphasis. • XML tags contain meaning not appearance. • This allows extra information to be extracted • Consider the example of the scientific names of animals. • scientific names are in latin • by convention they are always printed in italics The scientific name of the domestic dog is Canis familiaris, and of the domestic cat is Felis catus.
Contents vs Style In XML <P>The <emph>scientific</emph> name of the domestic dog is <sci>Canis familiaris</sci>, and of the domestic cat is <sci>Felis catus.</sci></P> NB emphasis and scientific names are different tags. They may both be displayed as italic, but they can be treated separately. • XML tags contain meaning not appearance. • This allows extra information to be extracted • Consider the example of the scientific names of animals. • scientific names are in latin • by convention they are always printed in italics The scientific name of the domestic dog is Canis familiaris, and of the domestic cat is Felis catus.
Rendering of XML • XML files contain content not appearance • Stylesheets contain appearance and behaviour • XML data is rendered by being transformed into some form suitable for display • RTF (for simple printing) • PDF or PostScript (for printing or display) • HTML (for display over the web) • HTML 4.0 / DHTML (for complex interfaces) • The transformation is defined by a stylesheet • Rendering may be done by standalone software, or by a web browser, or on a web server.
Standalone Rendering XML HTML XSL
Client Side Rendering XML XML HTML XSL XSL Server (any) Browser with XSL engine (eg MS IE > 5.0)
Server Side Rendering XML HTML HTML XSL Server with XSL engine eg Apache/Tomcat/Cocoon Browser (any)
Client vs Server Stylesheets • Client side stylesheets are processed in client • XML is delivered to the client • XSL/CSS must be supported by client • MS IE supports CSS & XSLT (non-standard in 5.x mostly standard in 6.x) • Netscape 7 & Mozilla supports CSS and possibly XSLT via plugins. • Server side stylesheets are processed in server • XML is not delivered to the client, it is transformed usually to HTML or PDF • XSL/CSS must be supported by server • Cocoon is an Open Source project, implementing XSL as a Java servlet • Any browser can then be used
Cocoon on Nottingham Servers • Any file placed in a directory called public_html is accessible within the Nottingham networkwith the url:http://www.cs.nott.ac.uk/~username/filename • Files with the .xml extension are automatically processed by cocoon. • Providing that they have an XSL stylesheet and the correct Cocoon processing instructions they will be “transformed” into (usually) HTML.
A Simple XML Document <?xml version="1.0" ?> <booklist title="Some XML Books"> </booklist>
A Simple XML Document <?xml version="1.0" ?> <booklist title="Some XML Books"> </booklist> XML declaration Root element (one per document)
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <booklist title="Some XML Books"> </booklist>
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <booklist title="Some XML Books"> </booklist> Define root element and specify DTD.
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <booklist title="Some XML Books"> </booklist>
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <booklist title="Some XML Books"> </booklist> This is a comment (as SGML / HTML)
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href=”iti-xml2.xsl"?> <booklist title="Some XML Books"> </booklist>
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href=”iti-xml2.xsl"?> <booklist title="Some XML Books"> </booklist> This defines the XSL stylesheet
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href="books3.xsl"?> <?cocoon-process type="xslt"?> <booklist title="Some XML Books"> </booklist>
A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href="books3.xsl"?> <?cocoon-process type="xslt"?> <booklist title="Some XML Books"> </booklist> This is a Cocoon processing directive (NB not standard XML, but required by Cocoon 1.7.4).
Adding Content <booklist title="Some XML Books"> <book> <author> <name>St. Laurent</name> <initial>S</initial> </author> <date>1998</date> <title edition="Second">XML: A Primer</title> <publisher>MIS Press</publisher> <website href="http://www.simonstl.com/xmlprim/" /> <rating stars="4"/> </book> </booklist>
Benefits of a DTD • DTDs are optional in XML • DTD allows validation of documents • DTD defines the application • Vital for collaborative development • IPR implications • DTD allows entity definitions (ie symbols, shortcuts, “foreign” characters etc.).
XML Namespaces • Namespaces are mechanisms to ensure that elements are unique • Namespaces in XML are optional • Consider the following: <title>The Title</title> <title text=“The Title” /> <title> <text>The Title</text> </title>
Ensuring uniqueness • Unique element names • Unique attribute content <title-one>The Title</titleone> <title-two text=“The Title” /> <title-three> <text>The Title</text> </title-three> <title ns=“one”>The Title</title> <title ns=“two” text=“The Title” /> <title ns=“3”> <text>The Title</text> </title>
xmlns attribute • The xmlns attribute is used to declare namespaces • This must be a URI <title xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-one”> The Title </title> <title xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-two” text=“The Title” /> <title xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-three”> <text>The Title</text> </title>
Namespace Abbreviations • If an element doesn’t have a namespace defined it inherits that of its parent. • Where multiple namespaces are used together aliases may be declared <demo xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-two” > <title text=“The Title” /> </demo> <demo xmlns:first =“http://www.cs.nott.ac.uk/~tjb/NSdemo-one” xmlns:second =“http://www.cs.nott.ac.uk/~tjb/NSdemo-two” > <first:title>The Title</first:title> <second:title text=“The Title” /> </demo>
XML Namespaces • Namespaces in XML are optional • Namespaces ensure that elements are unique • In different contexts a given tag might mean different things - eg consider <BOOK> • To me it might mean a book in a bibliography • To a bookshop it might contain stock details • To a travel agent it might contain information about flight bookings! • Namespaces attach unique labels to a given tag set. • URLs are usually used as namespace labels.