XML
Contents:
1.0 XML concepts
  1.1 Introduction
  1.2 Structure of XML and "well-formed" data
  1.3 Displaying XML data
2.0 The inevitable list of acronyms
  2.1 DTDs: document type definitions
  2.2 CSS: cascading style sheets
  2.3 XSL: extensible stylesheet language
3.0 Further acronyms
  3.1 SAX: simple API for XML
  3.2 DOM: document object model
4.0 The demonstrations - where it all goes horribly wrong!
Extensible Markup Language: XML
Introduction
• A brief history of XML:
• XML was developed by members of the W3C and released as a W3C Recommendation in February 1998.
• HTML and XML are cousins: both are drawn from the same source, SGML, but there are important differences.
• HTML evolved into a markup language that describes the look, feel and action of a web page.
• XML describes what the words in a document are.
• While HTML combines the structure and display of a document, XML separates them, which makes the data portable and usable in many different types of applications.
• Web pages are therefore just one of many uses of XML.
• With XML you can store information about all or parts of your document. You can then use that information as criteria for displaying it; other example uses include validating digital signatures, sharing data across distributed systems, processing data for other applications and configuring applications at runtime.
• Its beauty lies in its simplicity and its customizability (e.g. you can agree upon a common format that relates to the type of work you do and use a more "descriptive" vocabulary).
Structure of XML and well-formed data
eXtensible: XML extends your ability to describe a document. HTML restricts you to a fixed set of tags and vocabulary, whereas XML lets you define meaningful tags for your applications. For example, you might wish to define a Book in XML, which comprises a title, author, ISBN, price and number of pages. Each of these can be defined as a tag, and together they make up a catalogue of books. Whilst this can be slightly disconcerting at first (there is no reference book handy with all your tags neatly defined, as there is for HTML), it gives the developer more power to structure data thoroughly and lets you enforce structural rules (as we will see with the use of a document type definition later). Remember, however, that you are extending your tags to identify elements by what they are, not by how they look.
Markup: This is essential for documents to make sense. Without markup, your computer views your document as one long string of text, with every character as important as every other. By 'marking up' your documents, you give meaning to the pieces within; you identify them in a way that gives them value and context: this is a book, this is a title, this is a section head, etc.
Language: A language describes something by providing a set of rules. The developer has the flexibility to create a user-defined set of markup tags, but the structure and syntax of the XML language itself remain firm and clearly defined.
Structure of XML documents
• XML applies to the structure of documents.
• A structure is the skeleton we put behind information so that the pieces work together as a whole.
• You can define the elements of a document with any text editor; the document can then be moved to another application and its structure will remain intact. This structure can be thought of as a tree or a pyramid, and is called a document tree.
• Structure vs format
• The most important thing to remember is that a structured document is defined by the elements it contains, not by how it looks. For example:
• Structure says that an element is a paragraph; format says to display the paragraph in 12 point Times.
• Structure says the element is a book title; format says to display the book title in green bold body text.
• Structure says the element is a social security number; format says to hide the social security number.
• Learning to separate structure from format is critical to making good use of XML.
Tags:
<Book>
  <Title>Java Design Patterns</Title>
  <Author dob="1952">J.W.Cooper</Author>
  <ISBN>780201485394</ISBN>
  <Price>$44.95</Price>
  <Pages>329</Pages>
</Book>
(Here <Author> is an element, dob is an attribute and "1952" is the attribute value.)
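To see the element/attribute distinction from a program's point of view, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (our choice of tool; the slides do not prescribe one). It parses the <Book> example above, with dob written as an attribute, and reads the attribute separately from the element's text. Note that XML itself only stores text: turning Pages into a number is the program's decision.

```python
import xml.etree.ElementTree as ET

# The <Book> example above, with dob written as a proper attribute.
BOOK_XML = """<Book>
  <Title>Java Design Patterns</Title>
  <Author dob="1952">J.W.Cooper</Author>
  <ISBN>780201485394</ISBN>
  <Price>$44.95</Price>
  <Pages>329</Pages>
</Book>"""

book = ET.fromstring(BOOK_XML)       # the root element, <Book>
author = book.find("Author")         # first <Author> child of <Book>

# Element content and attribute values are reached in different ways.
author_name = author.text            # text content of the element
author_dob = author.get("dob")       # attribute values always arrive as strings
pages = int(book.findtext("Pages"))  # XML stores text; the program decides it is a number

print(author_name, author_dob, pages)
```

Running it prints the author's name, the dob attribute value and the page count.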
A document tree (figure):
Catalogue
  Book: Title, Author, ISBN, Price, Pages
  Book: Title, Author, ISBN, Price, Pages
Well-formed XML
• An XML document that conforms to the XML syntax rules is considered well-formed. Some of the most important guidelines are as follows:
• You should include the XML declaration.
• If you are embedding XML in an HTML document, it must go after the <HTML> and <HEAD> tags, and before any JavaScript.
• The declaration must include the version attribute (currently "1.0"). If the document is declared standalone="yes", the processing application does not need to look for an external DTD; the encoding attribute declares how the document is encoded (the default is UTF-8). The attributes must appear in the order version, encoding, standalone.
• E.g. <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
• If a DTD is needed, the standalone attribute in the XML declaration should be set to "no" and a document type declaration added to name the DTD file:
• <?xml version="1.0" encoding="UTF-8" standalone="no"?>
• <!DOCTYPE Catalogue SYSTEM "book.dtd">
• 'Catalogue' is the name of the type of document contained in this file (it must match the name of the root element), 'SYSTEM' tells the processor to look for the private DTD at the location that follows, and "book.dtd" is the name of the actual file.
• The root element must then be entered, under which all other elements are grouped.
• All the basic rules then apply:
• Do what the DTD instructs (these are developer-defined rules).
• Watch your capitalization: XML is case sensitive.
• Quote attribute values, e.g. <AUTHOR dob="1802" dod="1885">B. Johnson</AUTHOR>
• Close all tags, e.g. <AUTHOR>xxxxxxxx</AUTHOR>, including empty tags (where there is no content), which can also be written in the shorthand form <AUTHOR/>.
• No overlapping markup:
• <book> … <chapter> ... </chapter> … </book> correct
• <book> … <chapter> ... </book> … </chapter> incorrect
• Only one root element is allowed.
• No isolated markup characters: a literal < or & in text must be written as &lt; or &amp;.
• There are of course further rules, but these define the basic building blocks of an XML document.
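The well-formedness rules above can be checked mechanically: a non-validating parser rejects any document that breaks them. Below is a minimal sketch using Python's standard xml.etree.ElementTree, which wraps the non-validating expat parser. The helper name is_well_formed and the small test documents are our own illustrations, one per rule; this checks syntax only, not the rules in a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the (non-validating) expat parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Small test documents, one per rule from the slides.
CASES = {
    "correct nesting": "<book><chapter>...</chapter></book>",
    "overlapping markup": "<book><chapter>...</book></chapter>",
    "case mismatch": "<Book>...</book>",
    "unquoted attribute": "<AUTHOR dob=1802>B. Johnson</AUTHOR>",
    "quoted attribute": '<AUTHOR dob="1802" dod="1885">B. Johnson</AUTHOR>',
    "empty element shorthand": "<book><cover/></book>",
    "two root elements": "<book/><book/>",
    "isolated markup character": "<price>5 < 6</price>",
    "escaped markup character": "<price>5 &lt; 6</price>",
    "declaration in order": '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><book/>',
    "declaration out of order": '<?xml version="1.0" standalone="yes" encoding="UTF-8"?><book/>',
}

for name, text in CASES.items():
    print(f"{name:28} {is_well_formed(text)}")
```

Each line of output pairs a rule with whether the parser accepted the document.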
<?xml version="1.0"?>
<!DOCTYPE BookCatalogue SYSTEM "book2.dtd">
<BookCatalogue>
  <Book>
    <Title>Java in a Nutshell</Title>
    <Author>David Flanagan</Author>
    <Date>Oct, 1999</Date>
    <ISBN>0-672-31208-5</ISBN>
    <Publisher>O'Reilly</Publisher>
  </Book>
  <Book>
    <Title>CORBA in 14 days</Title>
    <Author>Jeremy Rosenberger</Author>
    <Date>1998</Date>
    <ISBN>0-672-31208-5</ISBN>
    <Publisher>SAMS Publishing</Publisher>
  </Book>
  <Book>
    <Title>C++ Primer</Title>
    <Author>Stanley B. Lippman</Author>
    <Date>1991</Date>
    <ISBN>0-201-54848-8</ISBN>
    <Publisher>Addison Wesley</Publisher>
  </Book>
</BookCatalogue>
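A parser turns a listing like this into the document tree described earlier. The minimal sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the catalogue (Date, ISBN and Publisher left out for brevity) and walks the tree depth-first, printing one indented line per element. The function name tree_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the catalogue above (two books; Date, ISBN and Publisher omitted).
CATALOGUE_XML = """<BookCatalogue>
  <Book><Title>Java in a Nutshell</Title><Author>David Flanagan</Author></Book>
  <Book><Title>CORBA in 14 days</Title><Author>Jeremy Rosenberger</Author></Book>
</BookCatalogue>"""

def tree_lines(elem, depth=0):
    """Return one indented line per element, visiting the document tree depth-first."""
    label = elem.tag
    if elem.text and elem.text.strip():   # show leaf text, skip indentation whitespace
        label += ": " + elem.text.strip()
    lines = ["  " * depth + label]
    for child in elem:                    # iterating an Element yields its child elements
        lines.extend(tree_lines(child, depth + 1))
    return lines

root = ET.fromstring(CATALOGUE_XML)
print("\n".join(tree_lines(root)))
```

The output has the same shape as the document tree figure: the root, then each Book, then each Book's children one level deeper.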
2.0 The inevitable list of acronyms
• 2.1 DTDs
• Valid XML documents follow a set of rules defined in an associated DTD, which defines elements, attributes and the relationships between elements.
• DTDs are saved in a plain-text file with the '.dtd' extension.
• When the XML document is processed by a validating parser, it is compared to its associated DTD to make sure that it is structured correctly and that all tags are used properly.
• 2.1.1 DTD schema
• A schema is a set of rules for data, which defines 2 things:
• 1. The elements in a data set and their relationships.
• 2. The content that can be contained in each element.
• 2.1.2 XML parsers
• A parser is a software tool that checks that a document follows a particular syntax. Parsers come in 2 varieties:
• A non-validating parser checks that the XML rules are followed and builds a document tree from the element tags.
• A validating parser checks the syntax, builds the tree and compares the element tags with the rules specified in the associated DTD.
• Parsers can be external programs or part of the editing/browsing tool.
A simple DTD example. XML is made up of a hierarchy of 'elements':
<!ELEMENT BookCatalogue (Book)*>
<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
• BookCatalogue comprises 0..* Books ('*'); you could use '+' for 1..*, or '?' for once or not at all.
• Book comprises 5 children, which must appear in the order listed.
• PCDATA contains parsed character data, i.e. text.
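Python's standard-library parsers (built on expat) are non-validating, so they will happily parse a catalogue that breaks the DTD above. To make the content models concrete, here is a minimal sketch that hand-translates two of them into regular expressions over the sequence of child element names, which mirrors how the DTD operators behave ('*' = 0..*, '+' = 1..*, '?' = optional, ',' = in this order). The names ELEMENT_CONTENT, PCDATA_ONLY and check are hypothetical; this illustrates what a validating parser checks and is not a DTD validator (third-party libraries such as lxml offer real DTD validation).

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated (hypothetical) content models from the DTD above.
# Child names are joined into a string like "Title Author " and matched as a whole.
ELEMENT_CONTENT = {
    "BookCatalogue": r"(Book )*",                        # (Book)*
    "Book": r"Title Author Date ISBN Publisher ",        # (Title, Author, Date, ISBN, Publisher)
}
PCDATA_ONLY = {"Title", "Author", "Date", "ISBN", "Publisher"}  # (#PCDATA)

def check(elem):
    """Return a list of violations found in elem and its descendants."""
    errors = []
    children = list(elem)
    if elem.tag in ELEMENT_CONTENT:
        sequence = "".join(child.tag + " " for child in children)
        if not re.fullmatch(ELEMENT_CONTENT[elem.tag], sequence):
            errors.append(f"<{elem.tag}> has children [{sequence.strip()}]")
    elif elem.tag in PCDATA_ONLY:
        if children:
            errors.append(f"<{elem.tag}> may contain text only")
    else:
        errors.append(f"<{elem.tag}> is not declared")
    for child in children:
        errors.extend(check(child))
    return errors

VALID = ("<BookCatalogue><Book><Title>C++ Primer</Title><Author>Stanley B. Lippman</Author>"
         "<Date>1991</Date><ISBN>0-201-54848-8</ISBN><Publisher>Addison Wesley</Publisher>"
         "</Book></BookCatalogue>")
INVALID = ("<BookCatalogue><Book><Author>Stanley B. Lippman</Author>"
           "<Title>C++ Primer</Title></Book></BookCatalogue>")

# Both documents are well-formed, so the non-validating parser accepts both;
# only the content-model check tells them apart.
print(check(ET.fromstring(VALID)))
print(check(ET.fromstring(INVALID)))
```

The valid catalogue produces an empty list; the invalid one is reported because its Book children are out of order and incomplete.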
2.2 CSS: Cascading style sheets
• CSS is a simple mechanism that allows authors and readers to attach style to XML and HTML documents.
• A CSS style sheet contains 5 basic types of presentation information, called properties:
• 1. Foreground and background colours and background images.
• 2. Font properties.
• 3. Text properties (word / letter spacing).
• 4. Boxes (margins and borders around block elements, floating elements etc.).
• 5. Classifications (control over list styles, formatting of elements, etc.).
• Styles are suggestions to the browser, and the effect of a style will depend largely upon the program that is interpreting it.
• 2.2.1 Style sheet inclusion
• In an HTML document:
• 1. Use the <LINK> tag in the <HEAD> of the document.
• <HEAD>
• <LINK REL=STYLESHEET TYPE="text/css" HREF="mystyle.css">
• </HEAD>
• 2. Use the <STYLE> tag in the <HEAD> of the document.
• <HEAD>
• <STYLE TYPE="text/css"> H1 { color: red } </STYLE>
• </HEAD>
• 3. Directly in the STYLE attribute of an element (this is discouraged because it mixes style with content).
• <H1 STYLE="color: red">A Title</H1>
• In an XML document:
• Use the xml-stylesheet processing instruction:
• <?xml-stylesheet type="text/css" href="mystyle.css"?>
• 2.2.2 Example of a style sheet (element names on the left, properties inside the braces):
CATALOG { background-color: #ffffff; width: 100%; }
CD { display: block; margin-bottom: 30pt; margin-left: 0; }
TITLE { color: #FF0000; font-size: 20pt; }
ARTIST { color: #0000FF; font-size: 20pt; }
COUNTRY, PRICE, YEAR, COMPANY { display: block; color: #000000; margin-left: 20pt; }
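When XML is generated by a program rather than typed by hand, the xml-stylesheet processing instruction can be added programmatically. Here is a minimal sketch with Python's standard xml.dom.minidom; the CATALOG document is a made-up fragment using element names from the style sheet above, and mystyle.css is the file name from the slide. The instruction belongs in the prolog, before the root element.

```python
from xml.dom.minidom import parseString

# A made-up fragment using element names from the style sheet above.
doc = parseString("<CATALOG><CD><TITLE>Example title</TITLE></CD></CATALOG>")

# Create <?xml-stylesheet ...?> and place it in the prolog, before the root element.
pi = doc.createProcessingInstruction("xml-stylesheet", 'type="text/css" href="mystyle.css"')
doc.insertBefore(pi, doc.documentElement)

output = doc.toxml()   # serialize the document, declaration and instruction included
print(output)
```

A browser that supports CSS for XML would then apply mystyle.css when it opens the saved file.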
2.3 XSL: Extensible Stylesheet Language
• This is a style language for XML, which can transform the raw XML data into a richer dataset.
• XSL specifies the presentation of XML information using 2 basic categories of techniques:
• 1. Optional transformation of the input document into another structure:
• generation of constant text
• suppression of content
• moving text
• duplicating text
• sorting
• An XSL stylesheet transforms the source document (or tree) into a result tree.
(Figure: input documents (XML, DTD, CSS) → transformation engine (XSL parser) → output documents (HTML, XML, text).)
• 2. A description of how to present the transformed information:
• This can take one of three levels of formatting:
• specification of the general screen or page layout
• assignment of the transformed text into basic "content container types" (e.g. lists, paragraphs etc.)
• specification of formatting properties (spacing, margins, alignment, fonts, etc.)
• Bear in mind that XSL is a complete programming language in its own right, with its own syntax, different from those already presented. Suffice to say that an XSL template generally follows the format:
• <xsl:template match="pattern">
• [ action ]
• </xsl:template>
• A typical XSL file might look like the following:
• <?xml version="1.0"?>
• <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
• <xsl:template match="/"> [action] </xsl:template> (match="/" applies to the whole document)
• <xsl:template match="BookCatalogue"> [action] </xsl:template> (matches the root element)
• <xsl:template match="Book"> [action] </xsl:template> (matches each child of the root element)
• </xsl:stylesheet>
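Python's standard library has no XSLT processor, so the sketch below imitates the match/action idea in plain Python to show how template rules dispatch on element names and build a result tree. It also performs three of the transformations listed earlier: constant text (a heading), sorting (books by title) and suppression (the ISBN is dropped). The templates dictionary, apply_templates and the HTML output are our own illustration, not XSLT syntax; output is built by string concatenation without escaping, which is acceptable only for this sketch. A real XSLT processor (for example the third-party lxml library) would do this from a stylesheet like the one above.

```python
import xml.etree.ElementTree as ET

# Source data shortened from the catalogue earlier in the slides.
SOURCE = """<BookCatalogue>
  <Book><Title>Java in a Nutshell</Title><ISBN>0-672-31208-5</ISBN></Book>
  <Book><Title>C++ Primer</Title><ISBN>0-201-54848-8</ISBN></Book>
</BookCatalogue>"""

def apply_templates(node, templates):
    """Run the template registered for node's tag, like <xsl:apply-templates>."""
    template = templates.get(node.tag, default_template)
    return template(node, templates)

def default_template(node, templates):
    # Loosely mirrors XSLT's built-in rule: copy text and process the children.
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)

def root_template(node, templates):       # plays the role of match="/"
    return "<html><body>" + apply_templates(node, templates) + "</body></html>"

def catalogue_template(node, templates):  # match="BookCatalogue"
    books = sorted(node.findall("Book"), key=lambda b: b.findtext("Title"))  # sorting
    items = "".join(apply_templates(book, templates) for book in books)
    return "<h1>Book list</h1><ul>" + items + "</ul>"                        # constant text

def book_template(node, templates):       # match="Book"
    # Only the title is output, so the ISBN is suppressed.
    return "<li>" + node.findtext("Title") + "</li>"

TEMPLATES = {"/": root_template, "BookCatalogue": catalogue_template, "Book": book_template}

def transform(root):
    # ElementTree has no separate document node, so the "/" rule receives the root element.
    return TEMPLATES["/"](root, TEMPLATES)

result = transform(ET.fromstring(SOURCE))
print(result)
```

Elements with no matching template fall back to the default rule, which simply passes their text through, much as XSLT's built-in rules do.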
3.0 Further acronyms: SAX (Simple API for XML) and DOM (Document Object Model)
• 3.1 SAX
• Both SAX and DOM are application programming interfaces, which basically means a set of classes that define a standard set of mechanisms to read (and, in DOM's case, modify and write) XML data.
• The SAX API is event-driven: you first register handlers with the SAX parser, after which the parser invokes their callback methods whenever it encounters an XML tag (or other content, such as text).
• The methods in this handler class perform the application-specific functionality during the parse.
• The SAX API is one of the simplest interfaces for handling XML.
• There are a number of implementations in different programming languages (C, Java, C++, etc.).
Example (figure): a database holding XML objects, using a SAX parser written in C; a Java front-end server that retrieves XML data from the database using a SAX parser in Java, "stylizes" the output and returns it to the requesting clients; and a client (browser, application, email, etc.) that retrieves and interprets the returned data.
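The register-a-handler, receive-callbacks pattern looks like this in Python's standard xml.sax module, which follows the SAX model. A minimal sketch: the handler class TitleHandler and the sample data are our own; startElement, characters and endElement are the standard ContentHandler callbacks the parser invokes as it streams through the document. Note that characters may be called several times for one piece of text, so the handler accumulates it.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <Title> element as the parser streams through."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):     # called at each start tag
        if name == "Title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):           # may be called several times per text node
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):              # called at each end tag
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

XML_DATA = b"""<BookCatalogue>
  <Book><Title>Java in a Nutshell</Title></Book>
  <Book><Title>CORBA in 14 days</Title></Book>
</BookCatalogue>"""

handler = TitleHandler()
xml.sax.parseString(XML_DATA, handler)   # register the handler; the parser calls back into it
print(handler.titles)
```

No tree is ever built: the handler keeps only the titles, which is why SAX suits large documents.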
3.2 DOM: Document Object Model
• The SAX interface allows you to parse an XML file and execute particular actions whenever certain structures (like tags) appear in the input. This is great for many applications; however, you might want to alter whole sections of the tree structure, restructure it, or build an object structure from scratch and then save the whole thing in XML format.
• For this purpose we use the DOM API: instead of reading XML and making callbacks to a user-defined class, the parser returns the file as a tree of objects. At this point the file has been checked for well-formedness (and, with a validating parser, validity), and a complete hierarchical object structure has been built that exactly represents the XML file. We may then add further nodes and leaves, and update, move or delete information; when we are content with our alterations, we can tell the top node to write itself out to another XML file, creating the new document.
• Methods such as these are available to navigate and alter the XML structure:
• Node methods: getNodeType(), getNodeValue(), getParentNode(), getChildNodes(), setNodeValue()
• Document factory methods: createElement(), createTextNode(), createCDATASection() (for blocks of text that should not be parsed as markup), createAttribute(), createEntityReference()
• SAX and DOM are really quite large APIs and should therefore be studied as a separate exercise.
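Here is the parse, alter, save cycle as a minimal sketch with Python's standard xml.dom.minidom, a lightweight DOM implementation. Python exposes the Java-style accessors listed above as attributes (nodeType, nodeValue, parentNode, childNodes), while the factory methods keep their names (createElement, createTextNode). The small catalogue is sample data of our own, built from titles on the earlier slide.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<BookCatalogue>"
    "<Book><Title>C++ Primer</Title></Book>"
    "<Book><Title>CORBA in 14 days</Title></Book>"
    "</BookCatalogue>"
)
catalogue = doc.documentElement          # the root <BookCatalogue> element

# Navigate: the text of an element lives in a child Text node.
books = catalogue.getElementsByTagName("Book")
first_title = books[0].getElementsByTagName("Title")[0].firstChild.nodeValue

# Delete: detach the second <Book> from its parent.
catalogue.removeChild(books[1])

# Add: build a new <Book> with the Document's factory methods, then attach it.
new_book = doc.createElement("Book")
new_title = doc.createElement("Title")
new_title.appendChild(doc.createTextNode("Java in a Nutshell"))
new_book.appendChild(new_title)
catalogue.appendChild(new_book)

# Save: serialize the altered tree back to XML text.
output = doc.toxml()
print(first_title)
print(output)
```

Because the whole tree is held in memory, the edits can be made in any order before the document is written out.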
Useful Resources
• Complete the 2-day course "Introduction to XML", Jean Philippe Forrestier
• Specification:
• http://www.w3.org/TR/REC-xml : the XML specification
• Web:
• http://www.w3.org/XML/Activity.html : new XML developments
• http://www.oasis-open.org/ : this group covers structured documents, SGML & XML issues
• http://www.xml.com : another site
• http://www.microsoft.com/xml/c-frame.htm : Microsoft tutorials on XML
• http://www.zvon.org : geeky guy with an unhealthy interest in XSL!
• Books:
• Professional XML - Mark Birbeck et al. - 2000 - Wrox Press - ISBN: 1861003110
• XML by Example - Benoit Marchal - 1999 - Que - ISBN: 0789722429
??? Any Questions ??? (with the exception of Michel, who should be reminded that I am not an XML expert; I have just completed a 2-day course!)