Creating Markup with XML
Outline
5.1 Introduction
5.2 Introduction to XML Markup
5.3 Parsers and Well-formed XML Documents
5.4 Parsing an XML Document with msxml
5.5 Characters
5.5.1 Character Set
5.5.2 Characters vs. Markup
5.5.3 White Space, Entity References and Built-in Entities
5.5.4 Using Unicode in an XML Document
5.6 Markup
5.7 CDATA Sections
5.8 XML Namespaces
5.9 Case Study: A Day Planner Application
5.1 Introduction • XML • Technology for creating markup languages • Enables document authors to describe data of any type • Allows creating new tags • HTML limits document authors to fixed tag set
5.2 Introduction to XML Markup • XML document (intro.xml) • Marks up message as XML • Commonly stored in text files • Extension .xml
1 <?xml version = "1.0"?>
2
3 <!-- Fig. 5.1 : intro.xml -->
4 <!-- Simple introduction to XML markup -->
5
6 <myMessage>
7    <message>Welcome to XML!</message>
8 </myMessage>
Fig. 5.1 Simple XML document containing a message. • Line numbers are not part of XML document. We include them for clarity. • Document begins with declaration that specifies XML version 1.0 • Lines 3 and 4 are comments • Element message is child element of root element myMessage
5.2 Introduction to XML Markup (cont.) • XML documents • Must contain exactly one root element • Attempting to create more than one root element is erroneous • Elements must be nested properly • Incorrect: <x><y>hello</x></y> • Correct: <x><y>hello</y></x>
5.3 Parsers and Well-formed XML Documents • XML parser • Processes XML document • Reads XML document • Checks syntax • Reports errors (if any) • Allows programmatic access to document’s contents
5.3 Parsers and Well-formed XML Documents (cont.) • XML document syntax • Considered well formed if syntactically correct • Single root element • Each element has start tag and end tag • Tags properly nested • Attribute (discussed later) values in quotes • Proper capitalization • Case sensitive
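The well-formedness rules above can be checked mechanically by handing a document to a parser and seeing whether it reports an error. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this illustration and is not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser reports any syntax error by raising ParseError.
        return False


# One case per rule listed on the slide.
checks = {
    "properly nested": "<x><y>hello</y></x>",
    "improperly nested": "<x><y>hello</x></y>",
    "two root elements": "<a/><b/>",
    "tag names differ in case": "<Message>hi</message>",
    "attribute value not quoted": "<car doors=4/>",
}

for label, document in checks.items():
    print(label, "->", is_well_formed(document))
```

Only the first document is accepted; each of the others breaks exactly one of the listed rules.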
5.3 Parsers and Well-formed XML Documents (cont.) • XML parsers support • Document Object Model (DOM) • Builds tree structure containing document data in memory • Simple API for XML (SAX) • Generates events when tags, comments, etc. are encountered • (Events are notifications to the application)
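To make the difference between the two parser models concrete, the sketch below reads the message document of Fig. 5.1 twice with Python's standard library: once with xml.dom.minidom (a DOM implementation) and once with xml.sax. The function and class names (read_with_dom, read_with_sax, EventRecorder) are assumptions chosen for this illustration.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = "<myMessage><message>Welcome to XML!</message></myMessage>"


def read_with_dom(text):
    """DOM: build the whole tree in memory, then navigate it."""
    tree = minidom.parseString(text)
    message = tree.getElementsByTagName("message")[0]
    return message.firstChild.data


class EventRecorder(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each construct."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # A parser may deliver one run of text in several pieces.
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def read_with_sax(text):
    """Parse `text` and return the list of events the parser generated."""
    recorder = EventRecorder()
    xml.sax.parseString(text.encode("utf-8"), recorder)
    return recorder.events


print(read_with_dom(DOCUMENT))
for event in read_with_sax(DOCUMENT):
    print(event)
```

The DOM version can revisit any node after parsing, at the cost of holding the tree in memory. The SAX version sees each event once, which tends to suit large documents.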
5.4 Parsing an XML Document with msxml • XML document • Contains data • Does not contain formatting information • Load XML document into Internet Explorer 5.0 • Document is parsed by msxml • Places plus (+) or minus (-) signs next to container elements • Plus sign indicates that all child elements are hidden • Clicking plus sign expands container element • Displays children • Minus sign indicates that all child elements are visible • Clicking minus sign collapses container element • Hides children • Error generated if document is not well formed
5.5 Characters • Character set • Characters that may be represented in XML document • e.g., ASCII character set • Letters of English alphabet • Digits (0-9) • Punctuation characters, such as !, - and ?
5.5.1 Character Set • XML documents may contain • Carriage returns • Line feeds • Unicode characters (Section 5.5.4) • Enables computers to process characters for several languages
5.5.2 Characters vs. Markup • XML must differentiate between • Markup text • Enclosed in angle brackets (< and >) • e.g., child elements • Character data • Text between start tag and end tag • e.g., Fig. 5.1, line 7: Welcome to XML!
5.5.3 White Space, Entity References and Built-in Entities • Whitespace characters • Spaces, tabs, line feeds and carriage returns • Significant (preserved by application) • Insignificant (not preserved by application) • Normalization • Whitespace collapsed into single whitespace character • Sometimes whitespace removed entirely • Example: <markup>This   is    character   data</markup> after normalization becomes <markup>This is character data</markup>
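The normalization step described above can be sketched in a few lines of Python. This shows one way an application might collapse whitespace after parsing; the function name normalize_whitespace and the choice to trim leading and trailing spaces are assumptions for this illustration, not behavior the slides prescribe.

```python
import re

# The four XML whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text):
    """Collapse each run of XML whitespace into one space and trim the ends."""
    return XML_WHITESPACE.sub(" ", text).strip(" ")


print(normalize_whitespace("This   is\n\tcharacter   data"))
```

Whether whitespace is significant is the application's decision, so a program would apply a step like this only to content where layout carries no meaning.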
5.5.3 White Space, Entity References and Built-in Entities (cont.) • XML-reserved characters • Ampersand (&) • Left-angle bracket (<) • Right-angle bracket (>) • Apostrophe (') • Double quote (") • Entity references • Allow XML-reserved characters to be used as character data • Begin with ampersand (&) and end with semicolon (;) • Prevent parser from misinterpreting character data as markup
5.5.3 White Space, Entity References and Built-in Entities (cont.) • Built-in entities • Ampersand (&amp;) • Left-angle bracket (&lt;) • Right-angle bracket (&gt;) • Apostrophe (&apos;) • Quotation mark (&quot;) • Mark up characters "<>&" in element message: <message>&lt;&gt;&amp;</message>
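As a rough illustration of the built-in entities, the sketch below escapes the characters <>& with xml.sax.saxutils.escape from Python's standard library, wraps them in a message element, and parses the result back. The helper name mark_up_message is an assumption for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quote characters are
# supplied as extra replacements.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def mark_up_message(text):
    """Return a message element whose character data is `text`."""
    return "<message>" + escape(text, QUOTE_ENTITIES) + "</message>"


document = mark_up_message("<>&")
print(document)

# Parsing turns the entity references back into the original characters.
print(ET.fromstring(document).text)
```

Escaping matters most for the ampersand and the left-angle bracket, which a parser would otherwise read as the start of markup.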
5.5.4 Using Unicode in an XML Document • XML Unicode support • e.g., Fig. 5.4 displays Arabic words • Arabic characters • represented by entity references for Unicode characters
<?xml version = "1.0"?>

<!-- Fig. 5.4 : lang.xml -->
<!-- Demonstrating Unicode -->

<!DOCTYPE welcome SYSTEM "lang.dtd">

<welcome>
   <from>

      <!-- Deitel and Associates -->
      &#1583;&#1575;&#1610;&#1578;&#1614;&#1604;
      &#1571;&#1606;&#1583;

      <!-- entity -->
      &assoc;
   </from>

   <subject>

      <!-- Welcome to the world of Unicode -->
      &#1571;&#1607;&#1604;&#1575;&#1611;
      &#1576;&#1603;&#1605;
      &#1601;&#1610;&#1616;
      &#1593;&#1575;&#1604;&#1605;

      <!-- entity -->
      &text;
   </subject>
</welcome>
Fig. 5.4 XML document that contains Arabic words. • Document type definition (DTD) defines document structure and entities • Root element welcome contains child elements from and subject • Sequence of entity references for Unicode characters in Arabic alphabet • lang.dtd defines entities assoc and text
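The Unicode references in Fig. 5.4 take the form &#N;, where N is the decimal code point of the character (the XML specification calls these character references). The sketch below converts text to that form and parses it back, using the last Arabic word of the subject element as sample data. The helper name to_character_references is an assumption for this illustration.

```python
import xml.etree.ElementTree as ET


def to_character_references(text):
    """Write every character of `text` as a decimal reference such as &#1593;."""
    return "".join("&#{};".format(ord(character)) for character in text)


# Four Arabic letters, given by code point: U+0639 U+0627 U+0644 U+0645.
word = "\u0639\u0627\u0644\u0645"

references = to_character_references(word)
print(references)

# A parser resolves the references back into the same characters.
parsed = ET.fromstring("<subject>" + references + "</subject>").text
print(parsed == word)
```

Writing characters this way keeps the file itself in plain ASCII, which can help when an editor or transport cannot be trusted to preserve the document's encoding.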
5.6 Markup • XML element markup • Consists of • Start tag • Content • End tag • All elements must have corresponding end tag • <img src = "img.gif"> is correct in HTML, but not XML • XML requires end tag or forward slash (/) for termination • <img src = "img.gif"></img> or <img src = "img.gif"/> is correct XML syntax
5.6 Markup (cont.) • Elements • Define structure • May (or may not) contain content • Child elements, character data, etc. • Attributes • Describe elements • Elements may have associated attributes • Placed within element's start tag • Values are enclosed in quotes • Element car contains attribute doors, which has value "4": <car doors = "4"/>
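A short sketch of how a program might see the car element from this slide once it has been parsed, again using Python's xml.etree.ElementTree. The helper name describe_element is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def describe_element(text):
    """Return the tag name, attributes and character data of a lone element."""
    element = ET.fromstring(text)
    return element.tag, dict(element.attrib), element.text


# The empty element from the slide: one attribute, no content.
print(describe_element('<car doors = "4"/>'))

# The same element with content between a start tag and an end tag.
print(describe_element('<car doors = "4">sedan</car>'))
```

Attribute values always arrive as strings, so the value of doors is the text "4" until the application converts it to a number.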
5.6 Markup (cont.) • Processing instruction (PI) • Passed to application using XML document • Provides application-specific document information • Delimited by <? and ?>
<?xml version = "1.0"?>

<!-- Fig. 5.5 : usage.xml -->
<!-- Usage of elements and attributes -->

<?xml:stylesheet type = "text/xsl" href = "usage.xsl"?>

<book isbn = "999-99999-9-X">
   <title>Deitel&apos;s XML Primer</title>

   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>

   <chapters>
      <preface num = "1" pages = "2">Welcome</preface>
      <chapter num = "1" pages = "4">Easy XML</chapter>
      <chapter num = "2" pages = "2">XML Elements?</chapter>
      <appendix num = "1" pages = "9">Entities</appendix>
   </chapters>

   <media type = "CD"/>
</book>
Fig. 5.5 XML document that marks up information about a fictitious book. • Processing instruction specifies stylesheet (discussed in Chapter 12) • Root element book contains child elements title, author, chapters and media • Element book contains attribute isbn, which has value of 999-99999-9-X • Element chapters contains four child elements, each of which contains two attributes
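Processing instructions are handed to the application rather than interpreted by the parser. The sketch below lists the processing instructions of a small document with Python's xml.dom.minidom. It uses the target xml-stylesheet instead of the figure's xml:stylesheet, an assumption made so the example does not depend on how a namespace-aware parser treats a colon in the target. The function name processing_instructions is also hypothetical.

```python
from xml.dom import Node, minidom

DOCUMENT = """<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "usage.xsl"?>
<book isbn = "999-99999-9-X"/>"""


def processing_instructions(text):
    """Return (target, data) for each top-level processing instruction."""
    document = minidom.parseString(text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


for target, data in processing_instructions(DOCUMENT):
    print(target, "|", data)
```

The XML declaration on the first line is not reported as a processing instruction, and the data arrives as one uninterpreted string that the application must pick apart itself.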
<?xml version = "1.0"?>

<!-- Fig. 5.6: letter.xml -->
<!-- Business letter formatted with XML -->

<letter>

   <contact type = "from">
      <name>Jane Doe</name>
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender = "F"/>
   </contact>

   <contact type = "to">
      <name>Jane Doe</name>
      <address1>123 Main St.</address1>
      <address2></address2>
      <city>Anytown</city>
      <state>Anystate</state>
      <zip>12345</zip>
      <phone>555-1234</phone>
      <flag gender = "M"/>
   </contact>

   <salutation>Dear Sir:</salutation>

   <paragraph>It is our privilege to inform you about our new
      database managed with <bold>XML</bold>. This new system
      allows you to reduce the load on your inventory list
      server by having the client machine perform the work of
      sorting and filtering the data.</paragraph>

   <paragraph>The data in an XML element is normalized, so
      plain-text diagrams such as
         /---\
         |   |
         \---/
      will become gibberish.</paragraph>

   <closing>Sincerely</closing>
   <signature>Ms. Doe</signature>

</letter>
Fig. 5.6 XML document that marks up a letter.
5.7 CDATA Sections • CDATA sections • May contain text, reserved characters and whitespace • Reserved characters need not be replaced by entity references • Not processed by XML parser • Commonly used for scripting code (e.g., JavaScript) • Begin with <![CDATA[ • Terminate with ]]>
<?xml version = "1.0"?>

<!-- Fig. 5.7 : cdata.xml -->
<!-- CDATA section containing C++ code -->

<book title = "C++ How to Program" edition = "3">

   <sample>
      // C++ comment
      if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
         cerr &lt;&lt; this-&gt;displayError();
   </sample>

   <sample>
      <![CDATA[

         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>

   C++ How to Program by Deitel &amp; Deitel
</book>
Fig. 5.7 Using a CDATA section. • Entity references required if not in CDATA section • XML does not process CDATA section • Note the simplicity offered by CDATA section
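The two sample elements of Fig. 5.7 are meant to carry the same text. The sketch below checks this for the if statement with Python's xml.etree.ElementTree, and also shows that the raw C++ text is rejected when it is neither escaped nor wrapped in a CDATA section. The variable and function names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

CODE = "if ( this->getX() < 5 && value[ 0 ] != 3 )"

# Same text, once with entity references and once in a CDATA section.
with_entities = (
    "<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>"
)
with_cdata = "<sample><![CDATA[" + CODE + "]]></sample>"
unprotected = "<sample>" + CODE + "</sample>"


def parses(text):
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(ET.fromstring(with_entities).text == CODE)
print(ET.fromstring(with_cdata).text == CODE)
print(parses(unprotected))
```

One limit worth keeping in mind: a CDATA section ends at the first ]]>, so text that itself contains that sequence cannot be placed in a single section.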
5.8 XML Namespaces • Naming collisions • Two different elements have same name: <subject>Math</subject> <subject>Thrombosis</subject> • Namespaces • Differentiate elements that have same name: <school:subject>Math</school:subject> <medical:subject>Thrombosis</medical:subject> • school and medical are namespace prefixes • Prepended to element and attribute names • Tied to uniform resource identifier (URI) • Series of characters for differentiating names
5.8 XML Namespaces (cont.) • Creating namespaces • Use xmlns keyword: xmlns:text = "urn:deitel:textInfo" xmlns:image = "urn:deitel:imageInfo" • Creates two namespace prefixes text and image • urn:deitel:textInfo is URI for prefix text • urn:deitel:imageInfo is URI for prefix image • Default namespaces • Child elements of this namespace do not need prefix: xmlns = "urn:deitel:textInfo"
<?xml version = "1.0"?>

<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->

<directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
Fig. 5.8 Listing for namespace.xml. • Element directory contains two namespace prefixes • Use prefix text to describe elements file and description • Apply prefix image to describe elements file, description and size
<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
Fig. 5.9 Using default namespaces. • urn:deitel:textInfo is default namespace • Element file is in default namespace • Specify namespace
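A namespace-aware parser identifies each element by its URI plus local name, not by the prefix written in the file. The sketch below loads the directory document of Fig. 5.9 with Python's xml.etree.ElementTree and looks up both file elements. The prefix map NAMESPACES and the function files_by_namespace are assumptions for this illustration; the prefixes in the map are local to the lookup and need not match those in the document.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>"""

# Prefixes used only for searching; the URIs are what must match.
NAMESPACES = {
    "text": "urn:deitel:textInfo",
    "image": "urn:deitel:imageInfo",
}


def files_by_namespace(text):
    """Map each search prefix to the filenames of its file elements."""
    root = ET.fromstring(text)
    return {
        prefix: [f.get("filename") for f in root.findall(prefix + ":file", NAMESPACES)]
        for prefix in NAMESPACES
    }


root = ET.fromstring(DOCUMENT)
print(root.tag)  # the default namespace URI appears in braces before the name
print(files_by_namespace(DOCUMENT))
```

Because the unprefixed file element inherits the default namespace, it is found under urn:deitel:textInfo even though it carries no prefix in the document.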
5.9 Case Study: A Day Planner Application • Markup for Day-Planner application • Scheduling appointments and tasks • Date • Time • Appointment type
<?xml version = "1.0"?>

<!-- Fig. 5.10 : planner.xml -->
<!-- Day Planner XML document -->

<planner>

   <year value = "2000">

      <date month = "7" day = "15">
         <note time = "1430">Doctor's appointment</note>
         <note time = "1620">Physics class at BH291C</note>
      </date>

      <date month = "7" day = "4">
         <note>Independence Day</note>
      </date>

      <date month = "7" day = "20">
         <note time = "0900">General Meeting in room 32-A</note>
      </date>

      <date month = "7" day = "20">
         <note time = "1900">Party at Joe's</note>
      </date>

      <date month = "7" day = "20">
         <note time = "1300">Financial Meeting in room 14-C</note>
      </date>

   </year>

</planner>
Fig. 5.10 Day planner XML document planner.xml. • Root element planner holds all appointments • date elements store specific dates with attributes month and day • note elements mark up appointments
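As a sketch of how an application might query the planner markup, the code below collects the notes for a given date with Python's xml.etree.ElementTree. It embeds a shortened copy of planner.xml so the example is self-contained; the function name notes_for and its signature are assumptions and are not taken from the case study.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Fig. 5.10.
PLANNER = """<planner>
   <year value = "2000">
      <date month = "7" day = "15">
         <note time = "1430">Doctor's appointment</note>
         <note time = "1620">Physics class at BH291C</note>
      </date>
      <date month = "7" day = "4">
         <note>Independence Day</note>
      </date>
      <date month = "7" day = "20">
         <note time = "0900">General Meeting in room 32-A</note>
      </date>
   </year>
</planner>"""


def notes_for(planner_xml, year, month, day):
    """Return (time, text) pairs for every note on the given date."""
    root = ET.fromstring(planner_xml)
    found = []
    for year_element in root.findall("year"):
        if year_element.get("value") != str(year):
            continue
        for date_element in year_element.findall("date"):
            same_month = date_element.get("month") == str(month)
            same_day = date_element.get("day") == str(day)
            if same_month and same_day:
                for note in date_element.findall("note"):
                    # time is None for all-day notes with no time attribute
                    found.append((note.get("time"), note.text))
    return found


print(notes_for(PLANNER, 2000, 7, 15))
print(notes_for(PLANNER, 2000, 7, 4))
```

Looping over every matching date element, instead of stopping at the first, also handles the full figure, where July 20 appears in three separate date elements.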