Learn about XML, a textual language where significant elements are indicated by markers. Understand its advantages over HTML and SGML and its applications in web development.
CP3024 Lecture 6 XML: Extensible Markup Language
What is a markup language? • Textual (i.e. person readable) language where significant elements are indicated by markers • <TITLE>XML</TITLE> • Examples are RTF, HTML, VRML, TEX etc. • Easy to process and can be manipulated by a variety of application programs
What does the Web use? • HTML • Hypertext Markup Language • Defined as the original Web language • Based on SGML (see later) • Suited for hypertext, multimedia, small simple documents • Currently at version 4.01 (the last?)
Why change? - 1 • Change in Web usage • no longer a mechanism for exchanging scientific papers • presentational aspects are now seen as of greater importance • extracting the meaning of a document using a program will be a new growth area • HTML can't grow much more!
Why change? - 2 • Extensibility • HTML does not allow users to specify their own tags • Structure • HTML cannot represent database schemas or object-oriented hierarchies • Validation • HTML does not allow applications to check that the structure of data is valid
What is SGML? • Standard Generalised Markup Language • ISO 8879 • Can define any document format of any complexity • Enables extensibility, structure and validation • Too many optional features for the Web
What is XML? • Simplified subset of SGML designed for Web applications • Differs from HTML • Can define new tags • Structures may be nested to any level of complexity • XML documents may define a grammar which enables structural validation of that document
Where has XML come from? • Emanates from the World Wide Web Consortium (W3C) • Developed by the XML working group chaired by Jon Bosak (Sun Microsystems) • Group includes representatives from Microsoft, Netscape, HP, Adobe, etc. • Last bastion against proprietary markup and Web fragmentation
Design Goals for XML - 1 • XML shall be straightforwardly usable over the Internet • XML shall support a wide variety of applications • XML shall be compatible with SGML • It shall be easy to write programs which process XML documents • The number of optional features is to be kept to the absolute minimum
Design Goals for XML - 2 • XML documents should be human-legible and reasonably clear • The XML design should be prepared quickly • The design of XML shall be formal and concise • XML documents shall be easy to create • Terseness in XML markup is of minimal importance
The XML View of a Document Taken from an example given by Jon Bosak
Structured Publishing Taken from an example given by Jon Bosak
XML Example <?xml version="1.0"?> <sweepjoke> <harry>Say <quote>Bye Bye </quote>, Sweep </harry> <sweep> <quote>Bye Bye, Sweep</quote></sweep> <laughter/> </sweepjoke>
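A minimal sketch of how a program might read the sweepjoke document above, assuming Python's standard `xml.etree.ElementTree` parser. The variable names are illustrative only; the lecture does not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# The sweepjoke document from the slide, laid out one element per line.
SWEEPJOKE = """<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye </quote>, Sweep </harry>
  <sweep> <quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>"""

root = ET.fromstring(SWEEPJOKE)

# Element names of the direct children, in document order.
child_tags = [child.tag for child in root]

# Text of every quote element, at any nesting depth.
quotes = [q.text for q in root.iter("quote")]

# All character data inside harry, including the nested quote.
harry_text = "".join(root.find("harry").itertext())

print(child_tags)
print(quotes)
print(repr(harry_text))
```

The nesting in the markup becomes a tree of elements, so the program addresses content by element name instead of by position in the text.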
XML Markup • Elements • Entity references • Comments • Processing Instructions • Marked sections • Document type declarations (DTD)
Elements • Commonest form of markup • Delimited by angle brackets (<, >) • May be empty but normally consist of start tag and end tag • Start tag may contain attributes • <a href="www.scit.wlv.ac.uk">
Entity References • In XML (and HTML) certain characters are reserved e.g. < • Entity references are used to insert these into documents • Entity references begin with an ampersand (&) and end with a semicolon (;) • You can define your own entities • Can be used to insert Unicode characters
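The bullets above can be sketched with Python's standard `xml.etree.ElementTree` parser, which replaces entity and character references before handing text to the application. The `note` element and the `course` entity are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Predefined entities (&lt; &gt; &amp;) and numeric character references.
doc = "<note>if p &lt; q &amp;&amp; q &gt; 0 then print &#169; &#x20AC;</note>"
text = ET.fromstring(doc).text

# A user-defined internal entity, declared in the document's internal DTD subset.
doc_with_entity = """<!DOCTYPE note [
  <!ENTITY course "CP3024">
]>
<note>&course;</note>"""
custom = ET.fromstring(doc_with_entity).text

print(text)
print(custom)
```

The application sees only the replacement characters; the references themselves exist only in the serialised document.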
Comments • Begin with <!-- • End with --> • Can contain any data except -- • XML processors are not required to pass comments to an application
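As a rough illustration of the last bullet, two parsers in Python's standard library treat comments differently: `xml.etree.ElementTree` drops them by default, while `xml.dom.minidom` keeps them as nodes. The sample document is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

doc = "<sweepjoke><!-- a comment --><laughter/></sweepjoke>"

# ElementTree's default tree builder does not pass comments to the application.
et_children = [child.tag for child in ET.fromstring(doc)]

# minidom keeps them as Comment nodes alongside the elements.
dom_nodes = minidom.parseString(doc).documentElement.childNodes
comments = [n.data for n in dom_nodes if n.nodeType == n.COMMENT_NODE]

print(et_children)
print(comments)
```

Because delivery of comments is optional, applications should not store data they depend on inside them.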
Processing Instructions (PIs) • Provide information to an application • XML processors required to pass them on • Have the form <?name pidata?> • The name (PI target) identifies the PI • Data is optional and meaningful to an application that recognises the target
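A small sketch of a processing instruction reaching an application, assuming Python's standard `xml.dom.minidom`. The `xml-stylesheet` target is a common real-world PI; the file name `sweep.xsl` is hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="sweep.xsl"?>'
    "<sweepjoke><laughter/></sweepjoke>"
)

# Collect (target, data) for every PI that appears before the root element.
# The XML declaration on the first line is not reported as a PI node.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(pis)
```

The parser does not interpret the data; an application that recognises the target decides what it means.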
Marked Sections • Parsers do not interpret markup inside CDATA sections <![CDATA[ <head>if p < <</head> ]]> • The only character string not allowed is ]]> • Data is passed on to the application
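The CDATA example above can be checked with Python's standard `xml.etree.ElementTree`; this is a sketch, and the wrapping `example` element is an assumption added so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section the angle brackets are plain characters.
with_cdata = "<example><![CDATA[ <head>if p < <</head> ]]></example>"
elem = ET.fromstring(with_cdata)
cdata_text = elem.text

# The same content without the CDATA wrapper is not well-formed.
without_cdata = "<example> <head>if p < <</head> </example>"
try:
    ET.fromstring(without_cdata)
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False

print(repr(cdata_text))
print(parsed_without_cdata)
```

The application receives the section's characters verbatim, and no `head` child element is created.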
Document Type Declarations • Optional in XML (not in SGML) • Specify constraints on the sequence and nesting of tags • Communicates meta-information to the parser about content • Sequence and nesting of tags, attribute values, external files, entities
Kinds of Declaration • Element type declarations • Attribute list declarations • Entity declarations • Notation declarations
Element Type Declaration <!ELEMENT sweepjoke (harry+, sweep, laughter?)> • A sweepjoke consists of a harry element followed by a sweep element and a laughter element • The harry element may be repeated (+) • + indicates one or more • The laughter element is optional (?)
Sweepjoke Declaration <!ELEMENT sweepjoke (harry+, sweep, laughter?)> <!ELEMENT harry (#PCDATA | quote)*> <!ELEMENT sweep (#PCDATA | quote)*> <!ELEMENT quote (#PCDATA)*> <!ELEMENT laughter EMPTY> • #PCDATA indicates parsed character data • | indicates 'or' • * indicates 'zero or more'
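Python's standard library does not validate against a DTD, so the following is only a hand-written sketch of what the content model `(harry+, sweep, laughter?)` means. The model is translated by hand into a regular expression over the child element names; the function name is hypothetical, and the declarations for harry, sweep, quote and laughter are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (harry+, sweep, laughter?):
#   harry+     -> one or more "harry,"
#   sweep      -> exactly one "sweep,"
#   laughter?  -> zero or one "laughter,"
SWEEPJOKE_MODEL = re.compile(r"(harry,)+(sweep,)(laughter,)?")


def matches_sweepjoke_model(xml_text: str) -> bool:
    """Check only the order and count of sweepjoke's direct children."""
    root = ET.fromstring(xml_text)
    if root.tag != "sweepjoke":
        return False
    sequence = "".join(child.tag + "," for child in root)
    return SWEEPJOKE_MODEL.fullmatch(sequence) is not None


print(matches_sweepjoke_model("<sweepjoke><harry/><harry/><sweep/></sweepjoke>"))
print(matches_sweepjoke_model("<sweepjoke><sweep/><harry/></sweepjoke>"))
```

A validating parser performs this kind of check for every declared element, which is what lets an application trust the structure of the data it receives.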
Attribute List Declaration • Identifies • which elements may have attributes • what attributes they may have • what values are permitted for an attribute • what value is the default <!ATTLIST sweepjoke name ID #REQUIRED label CDATA #IMPLIED status ( funny | notfunny ) 'funny'>
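The effect of the ATTLIST above can be sketched by hand in Python. This is an illustration of the three default kinds (#REQUIRED, #IMPLIED, a literal default), not a DTD validator: the function name is hypothetical, and the extra rules for the ID type (unique, name-like values) are not checked.

```python
import xml.etree.ElementTree as ET

STATUS_VALUES = {"funny", "notfunny"}  # the enumerated type from the ATTLIST


def check_sweepjoke_attributes(xml_text):
    """Return (attributes with defaults applied, list of problems)."""
    root = ET.fromstring(xml_text)
    attrs = dict(root.attrib)
    problems = []

    # name ID #REQUIRED: must be present.
    if "name" not in attrs:
        problems.append("name is #REQUIRED")

    # label CDATA #IMPLIED: optional, no default, any string allowed.

    # status (funny | notfunny) 'funny': default supplied when absent.
    attrs.setdefault("status", "funny")
    if attrs["status"] not in STATUS_VALUES:
        problems.append("status must be funny or notfunny")

    return attrs, problems


print(check_sweepjoke_attributes('<sweepjoke name="j1"/>'))
print(check_sweepjoke_attributes('<sweepjoke status="sad"/>'))
```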
Entity Declarations • Allow a name to be associated with some other content • Internal entities associate a name with a string of literal text (e.g. &lt; for <) • External entities associate a name with the content of another file • Parameter entities enable text replacement within the DTD
Adding a DTD to an XML File • Inline • External • <?xml version="1.0"?> • <!DOCTYPE sweepjoke SYSTEM "sweep.dtd">
Links in XML • HTML anchors are a very limited form of hypertext • XML introduces • XPointers • XLinks • These standards are outside the scope of the XML standard
Presentation Issues • Use of a stylesheet is implicit • Possible standards: • DSSSL Document Style Semantics and Specification Language (ISO/IEC 10179) • CSS Cascading Style Sheets • XSL Extensible Stylesheet Language (uses XML syntax)
XSL • XSL is an XML stylesheet language • XSLT is a language for transforming XML documents • XSL formatting objects specify formatting semantics • A set of rules to transform a document • XML can be transformed into HTML
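Python's standard library has no XSLT processor, so the sketch below is not XSLT. It only illustrates the idea of a set of rules that turns one tree into another, written by hand with `xml.etree.ElementTree`. The mapping from sweepjoke element names to HTML element names is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: source element name -> HTML element name.
RULES = {
    "sweepjoke": "div",
    "harry": "p",
    "sweep": "p",
    "quote": "q",
    "laughter": "hr",
}


def transform(source):
    """Apply the matching rule to an element, then recurse into its children."""
    target = ET.Element(RULES[source.tag])
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target


root = ET.fromstring(
    "<sweepjoke><harry>Say <quote>Bye Bye </quote>, Sweep </harry>"
    "<sweep> <quote>Bye Bye, Sweep</quote></sweep><laughter/></sweepjoke>"
)
html = ET.tostring(transform(root), encoding="unicode", method="html")
print(html)
```

An XSLT stylesheet expresses the same kind of rules declaratively, as templates matched against the source tree, and is itself an XML document.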
XML Application Areas • Mediation between heterogeneous databases on the Web • Client centric web applications • Applications requiring different views of the same data • Information discovery tailored to the needs of differing individuals
Languages based on XML • MathML • SMIL • RDF • XHTML • CML
RDF • Resource Description Framework • Integrates a variety of web-based metadata activities • Provides interoperability between applications that exchange metadata • Allows machine readable description of Web resources
RDF Example <?xml version="1.0"?> <?xml:namespace ns = "http://www.w3.org/RDF/RDF/" prefix ="RDF" ?> <?xml:namespace ns = "http://purl.oclc.org/DC/" prefix = "DC" ?> <RDF:RDF> <RDF:Description RDF:HREF = "http://uri-of-Document-1"> <DC:Creator>John Smith</DC:Creator> </RDF:Description> </RDF:RDF>
XHTML • New Web languages are defined using XML • HTML 4.0 cannot be defined using XML • XHTML is XML compliant HTML
Major Changes • Documents must be well-formed • Elements and attributes must have lower case names • End tags required in non-empty elements • Attribute values must be in quotes • Empty tags must be terminated • Script and style contents are parsed as XML, so < and & must be escaped or placed in a CDATA section
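Several of these rules are simply XML well-formedness, so they can be tried out with any XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper name and sample fragments are illustrative. It does not check the lower-case rule or validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML habits that an XML parser rejects.
print(is_well_formed("<p>First line<br>second line"))          # missing end tags
print(is_well_formed("<p class=intro>text</p>"))               # unquoted attribute value

# The XHTML equivalents.
print(is_well_formed("<p>First line<br/>second line</p>"))
print(is_well_formed('<p class="intro">text</p>'))
```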
XHTML Compatibility • Current browsers unlikely to understand all XHTML • E.g. <br/> may cause an error • Compatibility guidelines defined in XHTML standard • See http://www.w3.org/TR/xhtml1/ Appendix C
Summary • XML significantly expands what is possible on the Web • XML preserves the basic Web ideas • Using XML is an order of magnitude more difficult than writing HTML • Software is out there and more will soon follow • The opportunities are endless!