XML_1 Ch. 7 Fall 2010 Comp Sci 346
Bibliography
• W3C Recommendations: http://www.w3.org/TR/REC-xml/
• XML online tutorials: http://www.w3schools.com/xml/default.asp
• Java API for XML Processing (JAXP): https://jaxp.dev.java.net/
• http://www.xml-training-guide.com/
• Examples from textbook
• Examples from “Internet & World Wide Web How to Program”, Third Edition, by Deitel, Deitel, and Goldberg (Prentice Hall)
Extensible Markup Language (XML)
What is XML?
• A meta-markup language
• A technology for creating markup languages
Why should you learn XML?
• It allows you to invent your own tags
• XML documents can be easily parsed
• XML is portable
• The Web is becoming XML-based rather than HTML-based
XHTML vs XML
XHTML
• Based on tag pairs
• Purpose: markup language that displays the data
• Focus: how the data looks
• Does not care about the meaning of the contents
• Predefined tags
XML
• Based on tag pairs
• Purpose: meta-markup language for defining markup languages
• A markup language defined with XML describes the data
• Focus: the meaning (the "what") of the data
• Method of data exchange
• Content must be well structured
• No predefined tags
How to display XML data?
• XHTML: use CSS (Cascading Style Sheets)
• XSL (Extensible Stylesheet Language)
Historical Development
• SGML (Standard Generalized Markup Language) is the parent of both HTML and XML
• Markup languages defined with XML: XHTML, MathML, MusicXML, CML, RSS, XBRL, ...
XML Syntax
• Same as XHTML (an application of XML)
• First line: <?xml version = "1.0"?>
• Tree of elements: one root element
• Element: opening tag, content, closing tag
• Opening tag format: <tag_name>
• Closing tag format: </tag_name>
• Opening tag may contain attributes
• Attribute values must be quoted
XML Documents
• Contain marked up data
• Do not contain any formatting information
• XML parser passes data on to an application (e.g. browser)
• Stylesheet may be applied to render the document
XML Documents
• Two major parts
  • Prolog
    • XML declaration
    • Processing instructions
    • Comments
    • For example:
      <?xml version = "1.0" encoding = "utf-8"?>
      <!-- Sample XML document -->
  • Body
XML Document
• XML is hierarchical
• E.g.
  • Book
    • Title
    • Chapter
      • Paragraph
<Book>
  <Title> </Title>
  <Chapter>
    <Paragraph> </Paragraph>
  </Chapter>
</Book>
XML Document
• SystemMessage as an example
<SystemMessage>
  <MessageTitle>System Down for Maintenance</MessageTitle>
  <MessageBody>Going down for maintenance soon!</MessageBody>
  <MessageAuthor>
    <MessageAuthorName>Joe SystemGod</MessageAuthorName>
    <MessageAuthorEmail>systemgod@someserver.com</MessageAuthorEmail>
  </MessageAuthor>
  <MessageDate>Oct. 19, 2010</MessageDate>
</SystemMessage>
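The hierarchy can be made visible by parsing the document and walking the element tree. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name outline is hypothetical and not part of the course material.

```python
import xml.etree.ElementTree as ET

# The SystemMessage document from the slide, written so that it is well-formed.
DOC = """<SystemMessage>
  <MessageTitle>System Down for Maintenance</MessageTitle>
  <MessageBody>Going down for maintenance soon!</MessageBody>
  <MessageAuthor>
    <MessageAuthorName>Joe SystemGod</MessageAuthorName>
    <MessageAuthorEmail>systemgod@someserver.com</MessageAuthorEmail>
  </MessageAuthor>
  <MessageDate>Oct. 19, 2010</MessageDate>
</SystemMessage>"""


def outline(element, depth=0):
    """Return one indented line per element, depth-first (hypothetical helper)."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document.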
Rules
• XML is case-sensitive
• All XML tags must be properly closed
• XML tags must be properly nested
• No overlapping tags are allowed
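These rules can be checked by handing small fragments to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text`, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One fragment that follows the rules and three that each break one rule.
samples = {
    "properly nested": "<a><b>text</b></a>",
    "case mismatch": "<Title>text</title>",
    "unclosed tag": "<a><b>text</a>",
    "overlapping tags": "<a><b>text</a></b>",
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first fragment is accepted; a conforming parser stops at the first violation instead of trying to repair the document.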
View 1article.xml with Browsers
• Microsoft Internet Explorer
• Netscape
• Mozilla
• Firefox
• Opera
• What did you discover?
A more complicated XML document
• XML for a Business Letter
• See 2letter.xml
<?xml version = "1.0"?>
<!-- Fig. 20.3: 2letter.xml -->
<!-- Business letter formatted with XML -->
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
  <contact type = "from"> </contact>
  <contact type = "to"> </contact>
  <salutation>Dear Sir:</salutation>
  <paragraph>It is our privilege to inform you about our new database managed with XML. This new system allows you to reduce the load of your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>
  <closing>Sincerely</closing>
  <signature>Mr. Doe</signature>
</letter>
DTD
• DTD (Document Type Definition) added
• An XML document does not require a DTD
• Validating XML parsers use the DTD to check that XML documents have the proper structure
• Use a validator
  • See www.w3.org/XML/Schema.html for suggestions
  • Microsoft downloadable XML validator: http://support.microsoft.com/kb/307379
The contact element
<contact type = "from">
  <name>John Doe</name>
  <address1>123 Main St.</address1>
  <address2></address2>
  <city>Anytown</city>
  <state>Anystate</state>
  <zip>12345</zip>
  <phone>555-1234</phone>
  <flag gender = "M"/>
</contact>
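Once parsed, attributes and character data are read in different ways. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and the contact element shown on this slide; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

contact = ET.fromstring("""<contact type="from">
  <name>John Doe</name>
  <address1>123 Main St.</address1>
  <address2></address2>
  <city>Anytown</city>
  <state>Anystate</state>
  <zip>12345</zip>
  <phone>555-1234</phone>
  <flag gender="M"/>
</contact>""")

contact_type = contact.get("type")            # attribute on the start tag
name = contact.findtext("name")               # character data of a child
address2 = contact.findtext("address2")       # empty element yields ""
gender = contact.find("flag").get("gender")   # attribute on an empty element

print(contact_type, name, repr(address2), gender)
```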
Characters
XML documents may contain
• Carriage returns
• Line feeds
• Unicode characters
Angle brackets < > delimit markup text
Character data is the text between start and end tags
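Reserved Characters
• These characters have special meaning in XML: & < > ' "
• & and < may never appear literally in character data; > ' " should be escaped wherever they could be mistaken for markup
• Entity references are used if reserved characters are needed in character data: &amp; &lt; &gt; &apos; &quot;
<blah>&quot;Hello &amp; goodbye.&quot;</blah>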
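Escaping is usually left to a library rather than done by hand. The sketch below uses escape from Python's standard xml.sax.saxutils module and then parses the result to show that the original text comes back unchanged; the sample string is made up.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = '"Hello & goodbye." 1 < 2'

# escape() always replaces &, < and >; the extra mapping also covers quotes.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
document = f"<blah>{escaped}</blah>"

# The parser turns the entity references back into the original characters.
parsed = ET.fromstring(document).text
print(document)
print(parsed == raw)
```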
Unicode Characters
Character references are used for Unicode characters not found on the keyboard
Example: &#1583; denotes the Arabic letter د
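A parser resolves such references to the actual characters. A small sketch, assuming Python's standard xml.etree.ElementTree module; the element name sample is hypothetical. The decimal form &#1583; and the hexadecimal form &#x62F; name the same code point, U+062F.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal character references for the same code point,
# U+062F (ARABIC LETTER DAL).
element = ET.fromstring("<sample>&#1583;&#x62F;</sample>")
text = element.text

print(len(text), [hex(ord(ch)) for ch in text])
```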
DOCTYPE
• XML documents may contain a <!DOCTYPE> declaration
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
• Specifies the root element (html)
• Gives the location of the document type definition (DTD)
CDATA Sections
• Sections of an XML document that the parser does not interpret as markup
• May contain special characters
• Example: JavaScript code
<![CDATA[
XML parser <<< ignores >>> all of this stuff.
Note: no space between the first [ and CDATA, or between CDATA and the second [
]]>
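The effect of a CDATA section can be seen by parsing the same script text with and without it. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the script element and its JavaScript line are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<script>
<![CDATA[
if (a < b && b > c) { alert("ok"); }
]]>
</script>"""

# Inside CDATA, < and & reach the application as plain character data.
code = ET.fromstring(doc).text
print(code.strip())

# Without CDATA the same text is rejected, because < and & start markup.
try:
    ET.fromstring("<script>if (a < b && b > c) {}</script>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```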
Namespaces
• Document authors may invent their own elements
• Tag names may be reused
• Naming collisions must be avoided
• How?
XML Namespace
• XML
  • Allows document authors to create custom elements
  • Naming collisions
• XML namespace
  • Collection of element and attribute names
  • Names from different vocabularies may conflict
    • <subject>Math</subject>
    • <subject>xhtml</subject>
• Uniform resource identifier (URI)
  • Uniquely identifies the namespace
  • A string of text for differentiating names
• Namespace prefix
  • <School:subject>Math</School:subject>
  • <web_programming:subject>xhtml</web_programming:subject>
  • Any name except the reserved prefix xml
• directory
  • Root element of the example that follows; contains the other elements
Specify Namespace with URI
Example: namespace.xml
<?xml version = "1.0"?>
<!-- Fig. 20.4 : namespace.xml -->
<!-- Demonstrating Namespaces -->
<directory xmlns:text = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">
  <text:file filename = "book.xml">
    <text:description>A book list</text:description>
  </text:file>
  <image:file filename = "funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width = "200" height = "100"/>
  </image:file>
</directory>
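A namespace-aware parser identifies elements by URI, not by prefix. The sketch below parses namespace.xml with Python's standard xml.etree.ElementTree module; the short prefixes t and i used for searching are arbitrary choices made for this example.

```python
import xml.etree.ElementTree as ET

doc = """<directory xmlns:text="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <text:file filename="book.xml">
    <text:description>A book list</text:description>
  </text:file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

root = ET.fromstring(doc)

# ElementTree replaces each prefix with its URI in {braces}.
tags = [child.tag for child in root]
print(tags)

# The prefixes used for searching are local to this mapping; only the URIs
# have to match those declared in the document.
ns = {"t": "urn:deitel:textInfo", "i": "urn:deitel:imageInfo"}
text_files = [f.get("filename") for f in root.findall("t:file", ns)]
image_files = [f.get("filename") for f in root.findall("i:file", ns)]
print(text_files, image_files)
```

The two file elements share a local name but end up with different qualified names, which is how the collision is avoided.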
Or use a URL
<text:directory xmlns:text = "http://www.deitel.com/xml-text"
                xmlns:image = "http://www.deitel.com/xmlns-image">
Default namespace
Example: defaultnamespace.xml
<directory xmlns = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">
  <file filename = "book.xml">
    <description>A book list</description>
  </file>
  <image:file filename = "funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width = "200" height = "100"/>
  </image:file>
</directory>
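With a default namespace, unprefixed elements are still namespace-qualified. A short sketch, again assuming Python's standard xml.etree.ElementTree module and a trimmed copy of defaultnamespace.xml.

```python
import xml.etree.ElementTree as ET

doc = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <file filename="book.xml">
    <description>A book list</description>
  </file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
  </image:file>
</directory>"""

root = ET.fromstring(doc)
first_file = root[0]

# Unprefixed elements inherit the default namespace ...
print(root.tag)
print(first_file.tag)
# ... but unprefixed attributes do not belong to any namespace.
print(list(first_file.attrib))
```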
How to specify structure of document
• Two methods for defining an XML document's structure
  • DTD
  • Schema
• "Valid" XML doc: conforms to DTD or schema
• Note: a doc may be well-formed but invalid
DTD
• Document Type Definition
• Uses Extended Backus-Naur Form (EBNF) grammar to define structure
• DTDs exist for XHTML strict and transitional
• Used by validation services
Document Type Definitions
• Enable an XML parser to verify whether an XML document is valid
• Allow independent user groups to check structure and exchange data in a standardized format
• Express a set of rules for structure using EBNF grammar
• ELEMENT type declaration
  • Defines rules
• ATTLIST attribute-list declaration
  • Defines an attribute
DTD for 2letter.xml
<!-- Fig. 20.4: letter.dtd -->
<!-- DTD document for letter.xml -->
<!ELEMENT letter ( contact+, salutation, paragraph+, closing, signature )>
<!ELEMENT contact ( name, address1, address2, city, state, zip, phone, flag )>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name ( #PCDATA )>
<!ELEMENT address1 ( #PCDATA )>
<!ELEMENT address2 ( #PCDATA )>
<!ELEMENT city ( #PCDATA )>
<!ELEMENT state ( #PCDATA )>
<!ELEMENT zip ( #PCDATA )>
<!ELEMENT phone ( #PCDATA )>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "M">
<!ELEMENT salutation ( #PCDATA )>
<!ELEMENT closing ( #PCDATA )>
<!ELEMENT paragraph ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>
Indicators
• +: one or more elements
• *: optional element that can occur any number of times
• ?: optional element that can occur at most once
• No indicator: exactly once
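One way to read the indicators is to treat a content model as a regular expression over child element names. The sketch below does that for the letter content model from letter.dtd, using Python's standard re module. It illustrates the idea only and is not how a validating parser is implemented; the helper matches_letter_model is hypothetical.

```python
import re

# <!ELEMENT letter ( contact+, salutation, paragraph+, closing, signature )>
# rewritten as a regular expression over space-terminated child names.
LETTER_MODEL = re.compile(
    r"(contact )+(salutation )(paragraph )+(closing )(signature )"
)


def matches_letter_model(child_names):
    """Check a list of child element names against the letter content model."""
    sequence = "".join(name + " " for name in child_names)
    return LETTER_MODEL.fullmatch(sequence) is not None


valid = ["contact", "contact", "salutation", "paragraph", "closing", "signature"]
no_paragraph = ["contact", "salutation", "closing", "signature"]
two_closings = ["contact", "salutation", "paragraph", "closing", "closing", "signature"]

print(matches_letter_model(valid))
print(matches_letter_model(no_paragraph))
print(matches_letter_model(two_closings))
```

The second list fails because paragraph+ needs at least one paragraph; the third fails because closing has no indicator and so must occur exactly once.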
ATTLIST: Attribute-List Declaration
• Declares an attribute, its type, and its default behavior
• #IMPLIED: the attribute is optional; if it is missing, the application may use an arbitrary value or ignore it
  <!ATTLIST contact type CDATA #IMPLIED>
• #REQUIRED: the attribute must be present
  <!ATTLIST contact type CDATA #REQUIRED>
• #FIXED: the attribute, if present, must have the given fixed value
  <!ATTLIST address zip CDATA #FIXED "54901">
XHTML11.DTD
Why Schema?
• Developers find DTDs inflexible
• DTDs cannot be manipulated as XML documents
• A DTD defines structure, not content
  • <quantity>5</quantity>
  • 5 is treated as PCDATA (Parsed Character Data)
  • The parser verifies that 5 is PCDATA, not that it is numeric
  • Even <quantity>string</quantity> is acceptable
• XML Schema allows specifying that quantity must be numeric
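Without a schema, a type check like this has to be written by hand in the application. The sketch below shows such a check in Python; it is not a schema validator (Python's standard library does not validate against XML Schema), and the helper quantity_is_integer is hypothetical.

```python
import xml.etree.ElementTree as ET


def quantity_is_integer(xml_text):
    """Return True if the <quantity> content parses as an integer."""
    value = ET.fromstring(xml_text).text
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        # TypeError: empty element (text is None); ValueError: not a number.
        return False


print(quantity_is_integer("<quantity>5</quantity>"))
print(quantity_is_integer("<quantity>string</quantity>"))
```

Both documents are well-formed and both satisfy a DTD that declares quantity as ( #PCDATA ); only the extra check tells them apart. A schema moves that check into the document definition.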
Schema
• New and improved method for describing XML doc structure
• Uses XML syntax
• Schema is an XML document
• Schemas may be modified by software
• Allows more detailed specification of element content
• Tutorial: www.w3schools.com/schema/default.asp
W3C XML Schema Documents
• Properties
  • Specify XML document structure
  • Do not use EBNF grammar
  • Use XML syntax
  • Can be manipulated like other XML documents
  • Require validating parsers
• W3C XML schemas: www.w3.org/XML/Schema
• An XML document is "schema valid" when it conforms to a schema document
• Schemas use the .xsd extension
W3C XML Schema Documents
• Root element schema, e.g. book.xsd
  • Contains elements that define the XML document structure
• targetNamespace
  • Namespace of the XML vocabulary the schema defines
  • Same as the xmlns defined in book.xml
  • The XML document is connected to the schema via this targetNamespace
• element tag
  • Defines an element to be included in the XML document structure
• name and type attributes
  • Specify the element's name and data type, respectively
• Built-in simple types
  • string, date, int, double, time, etc.
W3C XML Schema Documents
• Two categories of data types
  • Simple types
    • Cannot contain attributes or child elements
  • Complex types
    • May contain attributes and child elements
• complexType
  • Defines a complex type
  • Simple content
    • Cannot have child elements
  • Complex content
    • May have child elements
Is the schema correctly specified?
• To validate book.xsd, use XSV (XML Schema Validator), open source: www.w3.org/2001/03/webdata/xsv
• Or free trials: http://www.stylusstudio.com/xml_parsers.html
Online XSD Schema Validator
• Schema\book.xml is based on the schema specification
• Schema\book.xsd, an XML Schema document, defines the structure for book.xml
• Schemas use the .xsd extension
• Does book.xml conform to the schema book.xsd?
• Cut and paste book.xml and book.xsd into www.xmlforasp.net/Schemavalidator.aspx and validate
• Or SAX (exercise): http://msdn2.microsoft.com/en-us/library/ms756991.aspx
Book.xml
<?xml version = "1.0"?>
<!-- Fig. 20.7 : book.xml -->
<!-- Book list marked up as XML -->
<deitel:books xmlns:deitel = "http://www.deitel.com/booklist">
  <book> <title>XML How to Program</title> </book>
  <book> <title>C How to Program</title> </book>
  <book> <title>Java How to Program</title> </book>
  <book> <title>C++ How to Program</title> </book>
  <book> <title>Perl How to Program</title> </book>
</deitel:books>
Book.xsd
<?xml version = "1.0"?>
<!-- Fig. 20.8 : book.xsd -->
<!-- Simple W3C XML Schema document -->
<schema xmlns = "http://www.w3.org/2001/XMLSchema"
        xmlns:deitel = "http://www.deitel.com/booklist"
        targetNamespace = "http://www.deitel.com/booklist">
  <element name = "books" type = "deitel:BooksType"/>
  <complexType name = "BooksType">
    <sequence>
      <element name = "book" type = "deitel:SingleBookType"
               minOccurs = "1" maxOccurs = "unbounded"/>
    </sequence>
  </complexType>
  <complexType name = "SingleBookType">
    <sequence><element name = "title" type = "string"/></sequence>
  </complexType>
</schema>
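Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below loads book.xsd with Python's standard xml.etree.ElementTree module and lists what it declares. It inspects the schema only and does not validate book.xml against it; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

schema_text = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:deitel="http://www.deitel.com/booklist"
        targetNamespace="http://www.deitel.com/booklist">
  <element name="books" type="deitel:BooksType"/>
  <complexType name="BooksType">
    <sequence>
      <element name="book" type="deitel:SingleBookType"
               minOccurs="1" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <complexType name="SingleBookType">
    <sequence><element name="title" type="string"/></sequence>
  </complexType>
</schema>"""

schema = ET.fromstring(schema_text)
target = schema.get("targetNamespace")

# Every <element> declaration, at any depth, as (name, type) pairs.
declared = [
    (el.get("name"), el.get("type"))
    for el in schema.iter(f"{{{XSD_NS}}}element")
]
# Top-level named complex types.
complex_types = [
    ct.get("name") for ct in schema.findall(f"{{{XSD_NS}}}complexType")
]

print(target)
print(declared)
print(complex_types)
```

The targetNamespace printed here is the same URI that book.xml binds to the deitel prefix, which is what links the document to this schema.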