XML. Mark Sapossnek, CS 594, Computer Science Department, Metropolitan College, Boston University.
Jump Table (added for CSE681) • Components of an XML Document • Syntax and Structure: An XML Document • Syntax and Structure: Namespaces: Overview • The XML ‘Alphabet Soup’ • Transformations: XSL • XPath (XML Path Language) • XML in .NET: Core Classes in System.Xml
Learning Objectives • Learn what XML is • Learn the various ways in which XML is used • Learn the key companion technologies • Learn how to use the .NET Framework to read, write, and navigate XML documents
Agenda • Overview • Syntax and Structure • The XML Alphabet Soup • XML in .NET • Relational Data and XML
Overview What is XML? • A tag-based metalanguage • Designed for structured data representation • Represents data hierarchically (in a tree) • Provides context to data (makes it meaningful) • Self-describing data • Separates presentation (HTML) from data (XML) • An open W3C standard • A subset of SGML • vs. HTML, which is an application of SGML
Overview What is XML? • XML is a “use everywhere” data specification • [Figure: XML exchanged among Application X, a database, a repository, configuration files, and documents]
Overview Documents vs. Data • XML is used to represent two main types of things: • Documents • Lots of text with tags to identify and annotate portions of the document • Data • Hierarchical data structures
Overview XML and Structured Data • Pre-XML representation of data: "PO-1234","CUST001","X9876","5","14.98" • XML representation of the same data: <PURCHASE_ORDER> <PO_NUM>PO-1234</PO_NUM> <CUST_ID>CUST001</CUST_ID> <ITEM_NUM>X9876</ITEM_NUM> <QUANTITY>5</QUANTITY> <PRICE>14.98</PRICE> </PURCHASE_ORDER>
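To make the contrast concrete, here is a minimal Python sketch (standard library only) that wraps the positional CSV record in the named elements from the slide. The field order is an assumption read off the example; a real converter would take the names from a header row or a schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Field names taken from the slide's PURCHASE_ORDER example (order assumed).
FIELDS = ["PO_NUM", "CUST_ID", "ITEM_NUM", "QUANTITY", "PRICE"]

def csv_record_to_xml(line: str) -> str:
    """Turn one positional CSV record into a self-describing XML element."""
    values = next(csv.reader(io.StringIO(line)))   # csv strips the quotes
    order = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, values):
        ET.SubElement(order, name).text = value    # the tag supplies the context
    return ET.tostring(order, encoding="unicode")

xml_text = csv_record_to_xml('"PO-1234","CUST001","X9876","5","14.98"')
print(xml_text)
```

The CSV line only makes sense to a reader who already knows the column order; the XML output carries that knowledge with each value.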
Overview Benefits of XML • Open W3C standard • Representation of data across heterogeneous environments • Cross-platform • Allows for a high degree of interoperability • Strict rules • Syntax • Structure • Case sensitive
Overview Who Uses XML? • Submissions by • Microsoft • IBM • Hewlett-Packard • Fujitsu Laboratories • Sun Microsystems • Netscape (AOL), and others… • Technologies using XML • SOAP, ebXML, BizTalk, WebSphere, many others…
Agenda • Overview • Syntax and Structure • The XML Alphabet Soup • XML in .NET • Relational Data and XML
Syntax and Structure Components of an XML Document • Elements • Each element has a beginning and an ending tag • <TAG_NAME>...</TAG_NAME> • Elements can be empty (<TAG_NAME />) • Attributes • Describe an element; e.g., data type, data range, etc. • Can only appear on the beginning tag • Processing instructions • Encoding specification (Unicode by default) • Namespace declaration • Schema declaration
Syntax and Structure Components of an XML Document <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="template.xsl"?> <ROOT> <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1> <ELEMENT2> </ELEMENT2> <ELEMENT3 type='string'> </ELEMENT3> <ELEMENT4 type='integer' value='9.3'> </ELEMENT4> </ROOT> • [Figure callouts: Prologue (processing instructions); Elements; Elements with Attributes]
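A quick way to see these components is to let a parser report them. This Python sketch runs `xml.etree.ElementTree` over the example above (straight quotes restored so it parses). ElementTree's default parser skips the prologue's processing instructions, so only elements and their attributes are listed. Note that it accepts `value='9.3'` on an element whose `type` attribute says `integer`: a non-validating parser checks syntax, and only a schema could reject that value.

```python
import xml.etree.ElementTree as ET

# The slide's example; the XML declaration must be the very first characters.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>"""

root = ET.fromstring(DOC)

# iter() walks the tree depth-first, starting with the root element itself.
components = [(elem.tag, dict(elem.attrib)) for elem in root.iter()]
for tag, attrs in components:
    print(tag, attrs)
```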
Syntax and Structure Rules For Well-Formed XML • There must be one, and only one, root element • Sub-elements must be properly nested • A tag must end within the tag in which it was started • Attributes are optional • Defined by an optional schema • Attribute values must be enclosed in “” or ‘’ • Processing instructions are optional • XML is case-sensitive • <tag> and <TAG> are not the same type of element
Syntax and Structure Well-Formed XML? • No, CHILD2 and CHILD3 do not nest properly <?xml version="1.0"?> <PARENT> <CHILD1>This is element 1</CHILD1> <CHILD2><CHILD3>Number 3</CHILD2></CHILD3> </PARENT>
Syntax and Structure Well-Formed XML? • No, there are two root elements <?xml version="1.0"?> <PARENT> <CHILD1>This is element 1</CHILD1> </PARENT> <PARENT> <CHILD1>This is another element 1</CHILD1> </PARENT>
Syntax and Structure Well-Formed XML? • Yes <?xml version="1.0"?> <PARENT> <CHILD1>This is element 1</CHILD1> <CHILD2/> <CHILD3></CHILD3> </PARENT>
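These three checks can be done mechanically, because a conforming XML parser must reject anything that is not well-formed. A minimal Python sketch using the standard library; the fourth case, `case`, is a hypothetical extra showing that `<tag>` and `</TAG>` do not match.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

examples = {
    # CHILD2 closes before CHILD3: improper nesting
    "bad_nesting": """<?xml version="1.0"?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>""",
    # Two top-level PARENT elements: more than one root
    "two_roots": """<?xml version="1.0"?>
<PARENT><CHILD1>This is element 1</CHILD1></PARENT>
<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>""",
    "good": """<?xml version="1.0"?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2/>
  <CHILD3></CHILD3>
</PARENT>""",
    # Hypothetical extra case: names are case-sensitive
    "case": "<tag></TAG>",
}

results = {name: is_well_formed(text) for name, text in examples.items()}
print(results)
```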
Syntax and Structure An XML Document <?xml version='1.0'?> <bookstore> <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'> <title>The Autobiography of Benjamin Franklin</title> <author> <first-name>Benjamin</first-name> <last-name>Franklin</last-name> </author> <price>8.99</price> </book> <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'> <title>The Confidence Man</title> <author> <first-name>Herman</first-name> <last-name>Melville</last-name> </author> <price>11.99</price> </book> </bookstore>
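Reading this document programmatically is a matter of walking the tree. The Python sketch below uses `xml.etree.ElementTree` to collect each book's `genre` attribute together with its child-element data; the tuple layout of `summaries` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)
summaries = []
for book in root.findall("book"):                 # direct children named book
    author = book.find("author")
    name = f"{author.findtext('first-name')} {author.findtext('last-name')}"
    # Attributes via get(); element text via findtext(); prices are text until converted
    summaries.append((book.get("genre"), book.findtext("title"), name,
                      float(book.findtext("price"))))

for row in summaries:
    print(row)
```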
Syntax and Structure Namespaces: Overview • Part of XML’s extensibility • Allow authors to differentiate between tags of the same name (using a prefix) • Frees authors to focus on the data and decide how best to describe it • Allows multiple XML documents from multiple authors to be merged • Identified by a URI (Uniform Resource Identifier) • When a URL is used, it does NOT have to point to a live server
Syntax and Structure Namespaces: Declaration • Namespace declaration examples: xmlns:bk="http://www.example.com/bookinfo/" • xmlns:bk="urn:mybookstuff.org:bookinfo" • [Figure: xmlns:bk="http://www.example.com/bookinfo/" annotated as namespace declaration (xmlns), prefix (bk), and URI (here a URL)]
Syntax and Structure Namespaces: Examples <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"> <bk:TITLE>All About XML</bk:TITLE> <bk:AUTHOR>Joe Developer</bk:AUTHOR> <bk:PRICE currency='US Dollar'>19.99</bk:PRICE> </BOOK> • <bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo" xmlns:money="urn:finance:money"> <bk:TITLE>All About XML</bk:TITLE> <bk:AUTHOR>Joe Developer</bk:AUTHOR> <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE> </bk:BOOK>
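Parsers resolve prefixes to URIs, so a prefix is only a local shorthand. In this Python sketch of the second example, ElementTree reports names in `{uri}local` form, and a lookup works with any prefix as long as it maps to the same URI (the prefix `b` below is an arbitrary choice).

```python
import xml.etree.ElementTree as ET

DOC = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(DOC)
print(root.tag)   # the prefix is gone; the URI identifies the namespace

ns = {"bk": "http://www.bookstuff.org/bookinfo"}
price = root.find("bk:PRICE", ns)
# A prefixed attribute is likewise stored under its {uri}local name.
currency = price.get("{urn:finance:money}currency")
print(price.text, currency)

# Any prefix works in a query, provided it maps to the same URI.
title = root.find("b:TITLE", {"b": "http://www.bookstuff.org/bookinfo"}).text
print(title)
```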
Syntax and Structure Namespaces: Default Namespace • An XML namespace declared without a prefix becomes the default namespace for all sub-elements • All elements without a prefix will belong to the default namespace: <BOOK xmlns="http://www.bookstuff.org/bookinfo"> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR> </BOOK>
Syntax and Structure Namespaces: Scope • Unqualified elements belong to the innermost default namespace. • BOOK, TITLE, and AUTHOR belong to the default book namespace • PUBLISHER and NAME belong to the default publisher namespace <BOOK xmlns="http://www.bookstuff.org/bookinfo"> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR> <PUBLISHER xmlns="urn:publishers:publinfo"> <NAME>Microsoft Press</NAME> </PUBLISHER> </BOOK>
Syntax and Structure Namespaces: Attributes • Unqualified attributes do NOT belong to any namespace • Even if there is a default namespace • This differs from elements, which belong to the default namespace
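Both scoping rules can be observed directly. This Python sketch parses the scope example (with the book namespace written as an absolute URI); the `lang` attribute on `TITLE` is a hypothetical addition used to show that an unprefixed attribute gets no namespace even inside a default namespace.

```python
import xml.etree.ElementTree as ET

# lang="en" is a hypothetical attribute added for this illustration.
DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE lang="en">All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""

root = ET.fromstring(DOC)

# Each element name comes back as {namespace-uri}local-name;
# PUBLISHER and NAME pick up the inner default namespace.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)

# The unprefixed attribute has no namespace, so its key has no {uri} part.
title = root.find("{http://www.bookstuff.org/bookinfo}TITLE")
print(title.attrib)
```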
Syntax and Structure Entities • Entities provide a mechanism for textual substitution, e.g. the predefined entities &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (") • You can define your own entities • Parsed entities can contain text and markup • Unparsed entities can contain any data • JPEG photos, GIF files, movies, etc.
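Entity substitution happens inside the parser, before the application sees the text. A minimal Python sketch: the first document uses predefined entities, and the second declares a hypothetical internal parsed entity named `pub` in an internal DTD subset. Unparsed entities (images, movies) are never expanded into text and are not shown here.

```python
import xml.etree.ElementTree as ET

# Predefined entities are replaced by the characters they stand for.
predefined = ET.fromstring("<NOTE>5 &lt; 7 &amp;&amp; 7 &gt; 5</NOTE>")
print(predefined.text)

# A user-defined internal entity; "pub" is a hypothetical name for this example.
custom = ET.fromstring(
    '<!DOCTYPE NOTE [<!ENTITY pub "Microsoft Press">]>'
    "<NOTE>Published by &pub;</NOTE>"
)
print(custom.text)   # the parser expanded &pub; into its replacement text
```

Because parsers expand internal entities automatically, untrusted documents with nested entity definitions can blow up in memory; this is why the Python documentation lists XML security caveats for its parsers.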
Agenda • Overview • Syntax and Structure • The XML Alphabet Soup • XML in .NET • Relational Data and XML
The XML ‘Alphabet Soup’ • XML itself is fairly simple • Most of the learning curve is knowing about all of the related technologies
The XML ‘Alphabet Soup’ Schemas: Overview • DTD (Document Type Definitions) • Not written in XML • No support for data types or namespaces • XSD (XML Schema Definition) • Written in XML • Supports data types • Current standard recommended by W3C • XDR (XML Data Reduced schema) • Interim schema proposed by Microsoft • Obsoleted by XSD
The XML ‘Alphabet Soup’ Schemas: Purpose • Define the “rules” (grammar) of the document • Data types • Value bounds • An XML document that conforms to a schema is said to be valid • More restrictive than well-formed XML • Define which elements are present and in what order • Define the structural relationships of elements
The XML ‘Alphabet Soup’ Schemas: DTD Example • XML document: <BOOK> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR> </BOOK> • DTD schema: <!DOCTYPE BOOK [ <!ELEMENT BOOK (TITLE+, AUTHOR) > <!ELEMENT TITLE (#PCDATA) > <!ELEMENT AUTHOR (#PCDATA) > ]>
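Python's standard-library parsers do not validate against DTDs, so the sketch below only illustrates what the content model `(TITLE+, AUTHOR)` means: a DTD element content model is essentially a regular expression over the sequence of child element names. It is a hand-rolled check for this one DTD (it ignores stray text between children), not a DTD validator; in practice a validating library such as lxml would be used.

```python
import re
import xml.etree.ElementTree as ET

# (TITLE+, AUTHOR): one or more TITLE children followed by exactly one AUTHOR.
# Child names are joined with a trailing space so the regex can match the sequence.
BOOK_CONTENT = re.compile(r"(TITLE )+AUTHOR ")

def is_valid_book(text: str) -> bool:
    """Hand-rolled check of the slide's DTD; not a general DTD validator."""
    book = ET.fromstring(text)
    if book.tag != "BOOK":                      # <!DOCTYPE BOOK ...>
        return False
    sequence = "".join(child.tag + " " for child in book)
    if BOOK_CONTENT.fullmatch(sequence) is None:
        return False
    # TITLE and AUTHOR are (#PCDATA): text only, no child elements.
    return all(len(child) == 0 for child in book)

print(is_valid_book("<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"))
print(is_valid_book("<BOOK><AUTHOR>Joe Developer</AUTHOR><TITLE>All About XML</TITLE></BOOK>"))
```

The second document is well-formed but not valid: it breaks the required order, which is exactly the extra restriction a schema adds.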
The XML ‘Alphabet Soup’ Schemas: XSD Example • XML document: <CATALOG> <BOOK> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR> </BOOK> … </CATALOG>
The XML ‘Alphabet Soup’ Schemas: XSD Example <xsd:schema id="NewDataSet" targetNamespace="http://tempuri.org/schema1.xsd" xmlns="http://tempuri.org/schema1.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"> <xsd:element name="BOOK"> <xsd:complexType> <xsd:all> <xsd:element name="TITLE" minOccurs="0" type="xsd:string"/> <xsd:element name="AUTHOR" minOccurs="0" type="xsd:string"/> </xsd:all> </xsd:complexType> </xsd:element> <xsd:element name="CATALOG" msdata:IsDataSet="True"> <xsd:complexType> <xsd:choice maxOccurs="unbounded"> <xsd:element ref="BOOK"/> </xsd:choice> </xsd:complexType> </xsd:element> </xsd:schema>
The XML ‘Alphabet Soup’ Schemas: Why You Should Use XSD • Newest W3C Standard • Broad support for data types • Reusable “components” • Simple data types • Complex data types • Extensible • Inheritance support • Namespace support • Ability to map to relational database tables • XSD support in Visual Studio .NET
The XML ‘Alphabet Soup’ Transformations: XSL • Language for expressing document styles • Specifies the presentation of XML • More powerful than CSS • Consists of: • XSLT • XPath • XSL Formatting Objects (XSL-FO)
The XML ‘Alphabet Soup’ Transformations: Overview • XSLT: a language used to transform XML data into a different form (commonly XML or HTML) • [Figure: XML → XSLT → XML, HTML, …]
The XML ‘Alphabet Soup’ Transformations: XSLT • The language used for converting XML documents into other forms • Describes how the document is transformed • Expressed as an XML document (.xsl) • Template rules • Patterns match nodes in source document • Templates instantiated to form part of result document • Uses XPath for querying, sorting, etc.
The XML ‘Alphabet Soup’ Transformations: Example <sales> <summary> <heading>Scootney Publishing</heading> <subhead>Regional Sales Report</subhead> <description>Sales Report</description> </summary> <data> <region> <name>West Coast</name> <quarter number="1" books_sold="24000" /> <quarter number="2" books_sold="38600" /> <quarter number="3" books_sold="44030" /> <quarter number="4" books_sold="21000" /> </region> ... </data> </sales>
The XML ‘Alphabet Soup’ Transformations: Example <xsl:param name="low_sales" select="21000"/> <BODY> <h1><xsl:value-of select="//summary/heading"/></h1> ... <table><tr><th>Region\Quarter</th> <xsl:for-each select="//data/region[1]/quarter"> <th>Q<xsl:value-of select="@number"/></th> </xsl:for-each> ... <xsl:for-each select="//data/region"> <tr><th><xsl:value-of select="name"/></th> <xsl:for-each select="quarter"> <td><xsl:choose> <xsl:when test="number(@books_sold) &lt;= $low_sales"> color:red;</xsl:when> <xsl:otherwise>color:green;</xsl:otherwise></xsl:choose> <xsl:value-of select="format-number(@books_sold,'###,###')"/></td> ... <td><xsl:value-of select="format-number(sum(quarter/@books_sold),'###,###')"/>
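Python's standard library has no XSLT processor (third-party libraries such as lxml provide one), so this sketch does not run the stylesheet; it re-expresses its logic with ElementTree: select each region, flag quarters at or below `low_sales`, format the numbers, and total `books_sold`. It uses only the region shown in the source document; the tuple layout returned by the hypothetical `region_rows` helper is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <summary>
    <heading>Scootney Publishing</heading>
    <subhead>Regional Sales Report</subhead>
    <description>Sales Report</description>
  </summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000" />
      <quarter number="2" books_sold="38600" />
      <quarter number="3" books_sold="44030" />
      <quarter number="4" books_sold="21000" />
    </region>
  </data>
</sales>"""

LOW_SALES = 21000  # plays the role of <xsl:param name="low_sales" select="21000"/>

def region_rows(xml_text, low_sales=LOW_SALES):
    """Return (region name, [(quarter, formatted sales, color)], formatted total)."""
    root = ET.fromstring(xml_text)
    rows = []
    for region in root.iterfind("data/region"):         # like //data/region
        cells = []
        for quarter in region.iterfind("quarter"):      # like select="quarter"
            sold = int(quarter.get("books_sold"))
            # xsl:choose: red at or below the threshold, green otherwise
            color = "red" if sold <= low_sales else "green"
            # f"{n:,}" groups thousands like format-number(..., '###,###')
            cells.append((quarter.get("number"), f"{sold:,}", color))
        total = sum(int(q.get("books_sold")) for q in region.iterfind("quarter"))
        rows.append((region.findtext("name"), cells, f"{total:,}"))
    return rows

heading = ET.fromstring(SALES).findtext("summary/heading")   # like //summary/heading
rows = region_rows(SALES)
print(heading)
for name, cells, total in rows:
    print(name, cells, total)
```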
The XML ‘Alphabet Soup’ XSL Formatting Objects (XSL-FO) • A set of formatting semantics • Denotes typographic elements (for example: page, paragraph, rule, etc.) • Allows finer control through formatting elements • Word and letter spacing • Indentation • Widow, orphan, and hyphenation control • Font style, etc.
The XML ‘Alphabet Soup’ XPath (XML Path Language) • General purpose query language for identifying nodes in an XML document • Declarative (vs. procedural) • Contextual – the results depend on current node • Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)
The XML ‘Alphabet Soup’ XPath Query Examples • ./author (find all author elements within the current context) • /bookstore (find the bookstore element at the root) • /* (find the root element) • //author (find all author elements anywhere in the document) • /bookstore[@specialty = "textbooks"] (find the root bookstore element if its specialty attribute = "textbooks") • //book[@style = /bookstore/@specialty] (find all books whose style attribute = the specialty attribute of the bookstore element at the root)
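ElementTree implements a limited subset of XPath, which is enough to try several of these queries. In this Python sketch the document is hypothetical (a bookstore with `specialty` and `style` attributes), absolute paths are written relative to the root element because ElementTree rejects absolute paths on an element, and since its predicates can only compare against a literal, the last query looks up `/bookstore/@specialty` first.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the bookstore carries a specialty, each book a style.
DOC = """<bookstore specialty="novel">
  <book style="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><last-name>Franklin</last-name></author>
  </book>
  <book style="novel">
    <title>The Confidence Man</title>
    <author><last-name>Melville</last-name></author>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)   # fromstring() returns the root element itself

# /*  : the root element
root_name = root.tag

# //author : every author element at any depth (written relative to the root)
authors = [a.findtext("last-name") for a in root.findall(".//author")]

# ./author with a book as the context node: only that book's author children
first_book = root.find("book")
first_authors = [a.findtext("last-name") for a in first_book.findall("./author")]

# //book[@style = /bookstore/@specialty] : resolve the right-hand side first
specialty = root.get("specialty")
matching = [b.findtext("title") for b in root.findall(f".//book[@style='{specialty}']")]

print(root_name, authors, first_authors, matching)
```

A full XPath 1.0 engine (for example, the one in lxml or .NET's System.Xml.XPath) evaluates the original expressions directly.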
The XML ‘Alphabet Soup’ XPointer • Builds upon XPath to: • Identify sub-node data • Identify a range of data • Identify data in local document or remote documents • New standard
The XML ‘Alphabet Soup’ XLink • XML Linking Language • Links are ordinary elements of XML documents, marked with attributes from the XLink namespace • Describes links between resources • Simple links (for example, HTML HREFs) • Extended links • Remote resources • Local resources • Rules for how a link is followed, etc.
The XML ‘Alphabet Soup’ The XML DOM • XML Document Object Model (DOM) • Provides a programming interface for manipulating XML documents in memory • Includes a set of objects and interfaces that represent the content and structure of an XML document • Enables a program to traverse an XML tree • Allows elements, attributes, etc., to be added/deleted in an XML tree • Allows new XML documents to be created programmatically
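Python's `xml.dom.minidom` is a lightweight DOM implementation, handy for a sketch of the operations listed above: create a new document, add elements and an attribute, traverse the children, and delete a node. The `genre` value is a hypothetical example.

```python
from xml.dom import minidom

doc = minidom.Document()                        # a new, empty document
book = doc.createElement("BOOK")
doc.appendChild(book)                           # BOOK becomes the root element

# Add child elements, each holding a text node
for tag, text in [("TITLE", "All About XML"), ("AUTHOR", "Joe Developer")]:
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(text))
    book.appendChild(child)

book.setAttribute("genre", "reference")         # hypothetical attribute value

# Traverse: list the element children of BOOK
names_before = [n.nodeName for n in book.childNodes if n.nodeType == n.ELEMENT_NODE]

# Delete: remove the AUTHOR element from the in-memory tree
author = book.getElementsByTagName("AUTHOR")[0]
book.removeChild(author)

xml_text = doc.documentElement.toxml()
print(names_before)
print(xml_text)
```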
The XML ‘Alphabet Soup’ SAX (Simple API for XML) • API to allow developers to read/write XML data • Event based • Uses a “push” model • Sequential access only (data not cached) • Requires less memory to process XML data than the DOM • SAX has less overhead (uses small input, work and output buffers) than the DOM • DOM constructs the entire data structure in memory (work and output buffers proportional to the size of the data)
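For comparison with the DOM, here is a SAX sketch using Python's `xml.sax`: the parser pushes start-element, character, and end-element events into a handler, and the handler keeps only the titles it needs instead of building a tree. The document is a trimmed, hypothetical version of the bookstore example.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collects title text from parser events; no tree is built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so buffer it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

DOC = b"""<bookstore>
  <book><title>The Autobiography of Benjamin Franklin</title></book>
  <book><title>The Confidence Man</title></book>
</bookstore>"""

handler = TitleHandler()
xml.sax.parseString(DOC, handler)   # the parser drives the handler (push model)
print(handler.titles)
```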
The XML ‘Alphabet Soup’ Data Islands • XML embedded in an HTML document (an Internet Explorer feature) • Manipulated via client-side script or data binding • Inline: <XML id="XMLID"> <BOOK> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR> </BOOK> </XML> • External: <XML id="XMLID" src="mydocument.xml"></XML>