XML

XML
Vadim Parizher
CS 496-EBT, Fall 2003

Learning Objectives
• Learn what XML is
• Learn the various ways in which XML is used
• Learn the key companion technologies

Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup

Overview: What is XML?
• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
• Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A simplified subset of SGML (HTML, by contrast, is an application of SGML)

Overview: What is XML?
• XML is a “use everywhere” data specification
[Diagram: XML exchanged between Application X, Database, Repository, Configuration, and Documents]

Overview: Documents vs. Data
• XML is used to represent two main types of things:
  • Documents: lots of text, with tags to identify and annotate portions of the document
  • Data: hierarchical data structures

Overview: XML and Structured Data
• Pre-XML representation of data:

    "PO-1234","CUST001","X9876","5","14.98"

• XML representation of the same data:

    <PURCHASE_ORDER>
      <PO_NUM> PO-1234 </PO_NUM>
      <CUST_ID> CUST001 </CUST_ID>
      <ITEM_NUM> X9876 </ITEM_NUM>
      <QUANTITY> 5 </QUANTITY>
      <PRICE> 14.98 </PRICE>
    </PURCHASE_ORDER>
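The conversion from the flat record to the tagged form can be sketched in a few lines of Python. This is only an illustration using the standard library: the field names come from the slide's XML example, while the function name `csv_row_to_xml` is made up for this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Field names are taken from the slide's XML example; the CSV record itself carries none.
FIELDS = ["PO_NUM", "CUST_ID", "ITEM_NUM", "QUANTITY", "PRICE"]


def csv_row_to_xml(line: str) -> str:
    """Turn one comma-separated purchase-order record into a PURCHASE_ORDER element."""
    values = next(csv.reader(io.StringIO(line)))
    order = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, values):
        # Each value gets a tag that says what it means.
        ET.SubElement(order, name).text = value
    return ET.tostring(order, encoding="unicode")


record = '"PO-1234","CUST001","X9876","5","14.98"'
xml_text = csv_row_to_xml(record)
print(xml_text)
```

The point of the sketch is that the XML form carries the meaning of each value with it, whereas a reader of the CSV form has to know the column order in advance.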

Overview: Benefits of XML
• Open W3C standard
• Representation of data across heterogeneous environments
  • Cross-platform
  • Allows for a high degree of interoperability
• Strict rules
  • Syntax
  • Structure
  • Case sensitive

Overview: Who Uses XML?
• Submissions by
  • Microsoft
  • IBM
  • Hewlett-Packard
  • Fujitsu Laboratories
  • Sun Microsystems
  • Netscape (AOL), and others…
• Technologies using XML
  • SOAP, ebXML, BizTalk, WebSphere, many others…

Syntax and Structure: Components of an XML Document
• Elements
  • Each element has a beginning and ending tag: <TAG_NAME>...</TAG_NAME>
  • Elements can be empty (<TAG_NAME />)
• Attributes
  • Describe an element, e.g. data type, data range, etc.
  • Can appear only on the beginning tag (or on an empty-element tag)
• Processing instructions
• Encoding specification (Unicode by default)
• Namespace declaration
• Schema declaration

Syntax and Structure: Components of an XML Document

    <?xml version="1.0" ?>
    <?xml-stylesheet type="text/xsl" href="template.xsl"?>
    <ROOT>
      <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
      <ELEMENT2> </ELEMENT2>
      <ELEMENT3 type='string'> </ELEMENT3>
      <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
    </ROOT>

• Prologue (processing instructions): the first two lines
• Elements: ELEMENT1, ELEMENT2
• Elements with attributes: ELEMENT3, ELEMENT4

Syntax and Structure: Rules for Well-Formed XML
• There must be one, and only one, root element
• Sub-elements must be properly nested
  • A tag must end within the tag in which it was started
• Attributes are optional
  • Defined by an optional schema
• Attribute values must be enclosed in "" or ''
• Processing instructions are optional
• XML is case-sensitive
  • <tag> and <TAG> are not the same type of element

Syntax and Structure: Well-Formed XML?

    <?xml version="1.0" ?>
    <PARENT>
      <CHILD1>This is element 1</CHILD1>
      <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
    </PARENT>

• No, CHILD2 and CHILD3 do not nest properly

Syntax and Structure: Well-Formed XML?

    <?xml version="1.0" ?>
    <PARENT>
      <CHILD1>This is element 1</CHILD1>
    </PARENT>
    <PARENT>
      <CHILD1>This is another element 1</CHILD1>
    </PARENT>

• No, there are two root elements

Syntax and Structure: Well-Formed XML?

    <?xml version="1.0" ?>
    <PARENT>
      <CHILD1>This is element 1</CHILD1>
      <CHILD2/>
      <CHILD3></CHILD3>
    </PARENT>

• Yes
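The three quiz slides above can be checked mechanically: any conforming XML parser must reject a document that is not well-formed. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is an assumption of this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a single well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# CHILD2 is closed while CHILD3 is still open: improper nesting.
bad_nesting = "<PARENT><CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>"

# Two top-level PARENT elements: more than one root.
two_roots = "<PARENT><CHILD1>a</CHILD1></PARENT><PARENT><CHILD1>b</CHILD1></PARENT>"

# Tag names differ only in case, so the end tag does not match the start tag.
wrong_case = "<tag>text</TAG>"

# One root, proper nesting, empty elements in both notations.
good = "<PARENT><CHILD1>This is element 1</CHILD1><CHILD2/><CHILD3></CHILD3></PARENT>"

for name, doc in [("bad_nesting", bad_nesting), ("two_roots", two_roots),
                  ("wrong_case", wrong_case), ("good", good)]:
    print(name, is_well_formed(doc))
```

Note that this only tests well-formedness. Whether a document is valid against a schema is a separate, stricter question.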

Syntax and Structure: An XML Document

    <?xml version='1.0'?>
    <bookstore>
      <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
        <title>The Autobiography of Benjamin Franklin</title>
        <author>
          <first-name>Benjamin</first-name>
          <last-name>Franklin</last-name>
        </author>
        <price>8.99</price>
      </book>
      <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
        <title>The Confidence Man</title>
        <author>
          <first-name>Herman</first-name>
          <last-name>Melville</last-name>
        </author>
        <price>11.99</price>
      </book>
    </bookstore>
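To show how a program sees the hierarchy in the bookstore document, here is a minimal Python sketch that walks the tree with the standard `xml.etree.ElementTree` module. The variable names (`BOOKSTORE`, `summaries`, `total`) are chosen for this illustration only.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)   # root is the <bookstore> element
summaries = []
total = Decimal("0")

for book in root:                 # iterating an element yields its child elements
    title = book.findtext("title")
    author = f"{book.findtext('author/first-name')} {book.findtext('author/last-name')}"
    price = Decimal(book.findtext("price"))   # element text is always a string
    summaries.append((book.get("ISBN"), title, author, price))  # get() reads an attribute
    total += price

for isbn, title, author, price in summaries:
    print(f"{isbn}: {title} by {author}, {price}")
print("Total:", total)
```

Element content and attribute values arrive as plain strings. Converting `price` to a number is the program's job unless a schema-aware tool does it.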

Syntax and Structure: Namespaces (Overview)
• Part of XML’s extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
  • Frees the author to focus on the data and decide how best to describe it
  • Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
  • When a URL is used, it does NOT have to represent a live server

Syntax and Structure: Namespaces (Declaration)
Namespace declaration examples:

    xmlns:bk="http://www.example.com/bookinfo/"
    xmlns:bk="urn:mybookstuff.org:bookinfo"

Parts of the declaration xmlns:bk="http://www.example.com/bookinfo/":
• Namespace declaration: xmlns
• Prefix: bk
• URI (URL): http://www.example.com/bookinfo/

Syntax and Structure: Namespaces (Examples)

    <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
      <bk:TITLE>All About XML</bk:TITLE>
      <bk:AUTHOR>Joe Developer</bk:AUTHOR>
      <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
    </BOOK>

    <bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
             xmlns:money="urn:finance:money">
      <bk:TITLE>All About XML</bk:TITLE>
      <bk:AUTHOR>Joe Developer</bk:AUTHOR>
      <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
    </bk:BOOK>

Syntax and Structure: Namespaces (Default Namespace)
• An XML namespace declared without a prefix becomes the default namespace for the element and all its sub-elements
• All elements without a prefix belong to the default namespace:

    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>

Syntax and Structure: Namespaces (Scope)
• Unqualified elements belong to the innermost default namespace
  • BOOK, TITLE, and AUTHOR belong to the default book namespace
  • PUBLISHER and NAME belong to the default publisher namespace

    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
      <PUBLISHER xmlns="urn:publishers:publinfo">
        <NAME>Microsoft Press</NAME>
      </PUBLISHER>
    </BOOK>

Syntax and Structure: Namespaces (Attributes)
• Unqualified attributes do NOT belong to any namespace
  • Even if there is a default namespace
• This differs from elements, which belong to the default namespace
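The scoping rules for default namespaces, and the exception for attributes, are easy to observe with a namespace-aware parser. In the sketch below, Python's `xml.etree.ElementTree` reports each element name as `{namespace-URI}local-name`. The `edition` attribute is a hypothetical addition to the slide's example, included only so that there is an unqualified attribute to inspect.

```python
import xml.etree.ElementTree as ET

DOC = """
<BOOK xmlns="http://www.bookstuff.org/bookinfo" edition="2">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
"""

root = ET.fromstring(DOC)

# Every element is reported with the namespace that is in scope where it appears.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# The unqualified attribute keeps a bare name: it is in no namespace,
# even though its element is in the default namespace.
print(root.attrib)
```

BOOK, TITLE, and AUTHOR come back in the book namespace, PUBLISHER and NAME in the publisher namespace, and `edition` with no namespace at all.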

Syntax and Structure: Entities
• Entities provide a mechanism for textual substitution, e.g. &lt; for < and &amp; for &
• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
  • JPEG photos, GIF files, movies, etc.
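As a small illustration of textual substitution, the Python sketch below parses text that uses the predefined entities and a user-defined internal entity, then escapes raw text for output. The entity name `pub` and the sample strings are invented for this example; the parser behavior shown is that of the standard library's expat-based `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entities: the parser substitutes the literal characters.
note = ET.fromstring("<NOTE>5 &lt; 7 &amp;&amp; 7 &gt; 5</NOTE>")
print(note.text)

# A user-defined (internal, parsed) entity declared in the document's DTD subset.
doc = '<!DOCTYPE BOOK [<!ENTITY pub "Microsoft Press">]><BOOK>Published by &pub;</BOOK>'
book = ET.fromstring(doc)
print(book.text)

# Going the other way: escape markup characters before writing text into XML.
escaped = escape("AT&T <rocks>")
print(escaped)
```

Entity expansion is also a known attack surface, so parsers that handle untrusted input are often configured to limit or disable user-defined entities.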

The XML ‘Alphabet Soup’
• XML itself is fairly simple
• Most of the learning curve is knowing about all of the related technologies


The XML ‘Alphabet Soup’: Schemas (Overview)
• DTD (Document Type Definition)
  • Not written in XML
  • No support for data types or namespaces
• XSD (XML Schema Definition)
  • Written in XML
  • Supports data types
  • Current standard recommended by W3C

The XML ‘Alphabet Soup’: Schemas (Purpose)
• Define the “rules” (grammar) of the document
  • Data types
  • Value bounds
• An XML document that conforms to a schema is said to be valid
  • More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements

The XML ‘Alphabet Soup’: Schemas (DTD Example)
• XML document:

    <BOOK>
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>

• DTD schema:

    <!DOCTYPE BOOK [
      <!ELEMENT BOOK (TITLE+, AUTHOR) >
      <!ELEMENT TITLE (#PCDATA) >
      <!ELEMENT AUTHOR (#PCDATA) >
    ]>

The XML ‘Alphabet Soup’: Schemas (XSD Example)
• XML document:

    <CATALOG>
      <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
      </BOOK>
      …
    </CATALOG>

The XML ‘Alphabet Soup’: Schemas (XSD Example)
• XSD schema:

    <xsd:schema id="NewDataSet"
        targetNamespace="http://tempuri.org/schema1.xsd"
        xmlns="http://tempuri.org/schema1.xsd"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
      <xsd:element name="BOOK">
        <xsd:complexType>
          <xsd:all>
            <xsd:element name="TITLE" minOccurs="0" type="xsd:string"/>
            <xsd:element name="AUTHOR" minOccurs="0" type="xsd:string"/>
          </xsd:all>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="CATALOG" msdata:IsDataSet="True">
        <xsd:complexType>
          <xsd:choice maxOccurs="unbounded">
            <xsd:element ref="BOOK"/>
          </xsd:choice>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>

The XML ‘Alphabet Soup’: Schemas (Why You Should Use XSD)
• Newest W3C standard
• Broad support for data types
• Reusable “components”
  • Simple data types
  • Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio .NET

The XML ‘Alphabet Soup’: Transformations (XSL)
• Language for expressing document styles
• Specifies the presentation of XML
• More powerful than CSS
• Consists of:
  • XSLT
  • XPath
  • XSL Formatting Objects (XSL-FO)

The XML ‘Alphabet Soup’: Transformations (Overview)
• XSLT: a language used to transform XML data into a different form (commonly XML or HTML)
[Diagram: XML → XSLT → XML, HTML, …]

The XML ‘Alphabet Soup’: Transformations (XSLT)
• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
  • Patterns match nodes in the source document
  • Templates are instantiated to form part of the result document
• Uses XPath for querying, sorting, etc.

The XML ‘Alphabet Soup’: Transformations (Example)
• Source XML document:

    <sales>
      <summary>
        <heading>Scootney Publishing</heading>
        <subhead>Regional Sales Report</subhead>
        <description>Sales Report</description>
      </summary>
      <data>
        <region>
          <name>West Coast</name>
          <quarter number="1" books_sold="24000" />
          <quarter number="2" books_sold="38600" />
          <quarter number="3" books_sold="44030" />
          <quarter number="4" books_sold="21000" />
        </region>
        ...
      </data>
    </sales>

The XML ‘Alphabet Soup’: Transformations (Example)
• XSLT stylesheet (excerpt):

    <xsl:param name="low_sales" select="21000"/>
    <BODY>
      <h1><xsl:value-of select="//summary/heading"/></h1>
      ...
      <table><tr><th>Region\Quarter</th>
        <xsl:for-each select="//data/region[1]/quarter">
          <th>Q<xsl:value-of select="@number"/></th>
        </xsl:for-each>
        ...
        <xsl:for-each select="//data/region">
          <tr><th><xsl:value-of select="name"/></th>
            <xsl:for-each select="quarter">
              <td><xsl:choose>
                <xsl:when test="number(@books_sold) &lt;= $low_sales">color:red;</xsl:when>
                <xsl:otherwise>color:green;</xsl:otherwise></xsl:choose>
                <xsl:value-of select="format-number(@books_sold,'###,###')"/></td>
              ...
              <td><xsl:value-of select="format-number(sum(quarter/@books_sold),'###,###')"/>
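To make the logic of the stylesheet excerpt easier to follow, here is the same computation written in plain Python. This is not XSLT and not how an XSLT processor works internally; it is only a sketch of what the template rules compute (per-quarter colour against the `low_sales` threshold, thousands formatting, and the regional total). Only the West Coast region from the slide is included, and names such as `LOW_SALES` and `rows` are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <summary><heading>Scootney Publishing</heading></summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000" />
      <quarter number="2" books_sold="38600" />
      <quarter number="3" books_sold="44030" />
      <quarter number="4" books_sold="21000" />
    </region>
  </data>
</sales>"""

LOW_SALES = 21000  # mirrors <xsl:param name="low_sales" select="21000"/>

root = ET.fromstring(SALES)
heading = root.findtext("./summary/heading")

rows = []
for region in root.findall("./data/region"):          # like xsl:for-each over regions
    cells = []
    for quarter in region.findall("quarter"):          # nested xsl:for-each over quarters
        sold = int(quarter.get("books_sold"))
        colour = "red" if sold <= LOW_SALES else "green"   # xsl:choose / when / otherwise
        cells.append((f"Q{quarter.get('number')}", f"{sold:,}", colour))
    total = sum(int(q.get("books_sold")) for q in region.findall("quarter"))
    rows.append((region.findtext("name"), cells, f"{total:,}"))

print(heading)
for name, cells, total in rows:
    print(name, cells, "Total:", total)
```

The declarative XSLT version expresses the same loops and conditions as templates and XPath expressions, which lets the presentation change without touching the data.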


The XML ‘Alphabet Soup’: XSL Formatting Objects (XSL-FO)
• A set of formatting semantics
• Denotes typographic elements (for example: page, paragraph, rule, etc.)
• Allows finer control via formatting elements
  • Word and letter spacing
  • Indentation
  • Widow, orphan, and hyphenation control
  • Font style, etc.

The XML ‘Alphabet Soup’: XPath (XML Path Language)
• General-purpose query language for identifying nodes in an XML document
• Declarative (vs. procedural)
• Contextual: the results depend on the current node
• Supports standard comparison, Boolean, and mathematical operators (=, <, and, or, *, +, etc.)


The XML ‘Alphabet Soup’: XPath Query Examples
• ./author (find all author elements within the current context)
• /bookstore (find the bookstore element at the root)
• /* (find the root element)
• //author (find all author elements anywhere in the document)
• /bookstore[@specialty = "textbooks"] (find all bookstores where the specialty attribute = "textbooks")
• //book[@style = /bookstore/@specialty] (find all books where the style attribute = the specialty attribute of the bookstore element at the root)
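A few of these query shapes can be tried out with Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath: paths are relative to the element they are called on, and predicates are restricted to simple tests such as `[@attr='value']`. The sketch below therefore shows descendant and attribute-filter queries only; the last query in the list above, which compares two attributes, needs a full XPath engine. The sample document is a shortened form of the bookstore example from earlier in the deck.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book genre='autobiography' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><last-name>Franklin</last-name></author>
  </book>
  <book genre='novel' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author><last-name>Melville</last-name></author>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)   # the context node is the <bookstore> root element

# .//author : all author elements anywhere below the context node
authors = root.findall(".//author")

# ./book/title : title children of book children of the context node
titles = [title.text for title in root.findall("./book/title")]

# ./book[@genre='novel'] : books whose genre attribute equals 'novel'
novels = root.findall("./book[@genre='novel']")

print(len(authors))
print(titles)
print([book.get("ISBN") for book in novels])
```

Because results depend on the context node, the same expression can return different nodes when evaluated from a `book` element instead of from the root.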

The XML ‘Alphabet Soup’: XPointer
• Builds upon XPath to:
  • Identify sub-node data
  • Identify a range of data
  • Identify data in the local document or remote documents
• New standard

The XML ‘Alphabet Soup’: XLink
• XML Linking Language
• Elements of XML documents
• Describes links between resources
• Simple links (for example, HTML HREFs)
• Extended links
• Remote resources
• Local resources
• Rules for how a link is followed, etc.

The XML ‘Alphabet Soup’: The XML DOM
• XML Document Object Model (DOM)
• Provides a programming interface for manipulating XML documents in memory
• Includes a set of objects and interfaces that represent the content and structure of an XML document
• Enables a program to traverse an XML tree
• Allows elements, attributes, etc., to be added to or deleted from an XML tree
• Allows new XML documents to be created programmatically
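The DOM operations listed above (traverse, add, delete) look roughly like this in Python's `xml.dom.minidom`, a small DOM implementation in the standard library. The PRICE element and its `currency` attribute echo the namespace example earlier in the deck; the variable names are chosen for this sketch.

```python
from xml.dom.minidom import parseString

dom = parseString(
    "<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"
)
book = dom.documentElement   # the whole tree is held in memory

# Traverse: list the element children of the root.
names = [node.tagName for node in book.childNodes
         if node.nodeType == node.ELEMENT_NODE]
print(names)

# Add: create a new element with an attribute and a text node, then attach it.
price = dom.createElement("PRICE")
price.setAttribute("currency", "US Dollar")
price.appendChild(dom.createTextNode("19.99"))
book.appendChild(price)

# Delete: remove an existing element from the tree.
author = book.getElementsByTagName("AUTHOR")[0]
book.removeChild(author)

result = book.toxml()
print(result)
```

Every node is an object that stays in memory until the document is released, which is what makes random access and in-place edits possible.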

The XML ‘Alphabet Soup’: SAX (Simple API for XML)
• API to allow developers to read/write XML data
• Event based
• Uses a “push” model
• Sequential access only (data not cached)
• Requires less memory to process XML data than the DOM
  • SAX has less overhead than the DOM (uses small input, work, and output buffers)
  • DOM constructs the data structure in memory (work and output buffers equal to the size of the data)
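The event-based “push” model can be seen in a short Python sketch built on the standard `xml.sax` package: the parser calls the handler's methods as it meets each start tag, run of text, and end tag, and nothing is kept unless the handler stores it. The class name `TitleCollector` and the second book in the sample data are invented for this example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every TITLE element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "TITLE":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TITLE":
            self.titles.append("".join(self._buffer))
            self._in_title = False


CATALOG = (
    b"<CATALOG>"
    b"<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"
    b"<BOOK><TITLE>More About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"
    b"</CATALOG>"
)

handler = TitleCollector()
xml.sax.parseString(CATALOG, handler)   # the parser pushes events into the handler
print(handler.titles)
```

The handler keeps only the titles, so memory use stays small regardless of document size. The price is that the data can be visited only once, in document order.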

The XML ‘Alphabet Soup’: Data Islands
• XML embedded in an HTML document
• Manipulated via client-side script or data binding

    <XML id="XMLID">
      <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
      </BOOK>
    </XML>

    <XML id="XMLID" src="mydocument.xml"></XML>

The XML ‘Alphabet Soup’: Data Islands
• Can be embedded in an HTML SCRIPT element
• XML is accessible via the DOM:

    <SCRIPT language="xml" id="XMLID">
    <SCRIPT type="text/xml" id="XMLID">
    <SCRIPT language="xml" id="XMLID" src="mydocument.xml">
