ISQA 407 XML/WML Spring 2002 Dr. Sergio Davalos. Chapter 6 – Document Type Definition (DTD).
Chapter 6 – Document Type Definition (DTD)
Outline
6.1 Introduction
6.2 Parsers, Well-formed and Valid XML Documents
6.3 Document Type Declaration
6.4 Element Type Declarations
  6.4.1 Sequences, Pipe Characters and Occurrence Indicators
  6.4.2 EMPTY, Mixed Content and ANY
6.5 Attribute Declarations
  6.5.1 Attribute Defaults (#REQUIRED, #IMPLIED, #FIXED)
6.6 Attribute Types
  6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN)
  6.6.2 Enumerated Attribute Types
6.7 Conditional Sections
6.8 Whitespace Characters
6.9 Case Study: Writing a DTD for the Day Planner Application
6.1 Introduction
• Document Type Definitions (DTDs)
  • Define structure of XML document
    • i.e., what elements, attributes, etc. are permitted in document
  • XML document not required to have DTD
    • Usually recommended for document conformity
  • Use Extended Backus-Naur Form (EBNF) grammar
6.2 Parsers, Well-formed and Valid XML Documents
• Parsers
  • Validating
    • Able to read DTD
    • Determine whether XML document conforms to DTD
      • Valid document conforms to DTD
        • Document is then well formed, by definition
      • Documents can be well formed, but not valid
  • Nonvalidating
    • Able to read DTD
    • Cannot check document against DTD for conformity
6.3 Document Type Declaration
• Document Type Declaration
  • Introduce DTDs into XML documents
  • Placed in XML document’s prolog
  • Begins with <!DOCTYPE
  • Ends with >
  • Can point to
    • External subsets
      • Declarations outside document
      • Exist in different file
        • Typically ending with .dtd extension
    • Internal subsets
      • Declarations inside document
      • Visible only within document in which it resides
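A document type declaration may combine both kinds of subset. The following is a minimal sketch, assuming a hypothetical external file letter.dtd that declares the greeting element as ( #PCDATA ); the names letter, greeting and letter.dtd are illustrative and do not come from the chapter.

```xml
<?xml version = "1.0"?>

<!-- Hypothetical example: letter.dtd is assumed to contain
     the declaration of the greeting element -->
<!DOCTYPE letter SYSTEM "letter.dtd" [
   <!-- Internal subset: visible only within this document -->
   <!ELEMENT letter ( greeting )>
]>

<letter>
   <greeting>Hello</greeting>
</letter>
```

The SYSTEM identifier names the external subset, and the square brackets enclose the internal subset.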
6.4 Element Type Declarations
• Element type declarations
  • Declare elements in XML documents
  • Begin with <!ELEMENT
  • End with >
    <!ELEMENT myElement ( #PCDATA )>
  • myElement is generic identifier
  • Parentheses specify element’s content (content specification)
  • Keyword PCDATA
    • Element may contain only parsable character data (no child elements)
    • Text is parsed, so any markup characters in it (e.g., < and &) are treated as markup
<?xml version = "1.0"?>

<!-- Fig. 6.1: intro.xml -->
<!-- Using an external subset -->

<!DOCTYPE myMessage SYSTEM "intro.dtd">

<myMessage>
   <message>Welcome to XML!</message>
</myMessage>

Fig. 6.1 XML document declaring its associated DTD.
• DOCTYPE starts document type declaration
• Document type declaration is named myMessage
• Keyword SYSTEM specifies external subset
• intro.dtd is DTD
<!-- Fig. 6.2: intro.dtd -->
<!-- External declarations -->

<!ELEMENT myMessage ( message )>
<!ELEMENT message ( #PCDATA )>

Fig. 6.2 Validation using an external DTD.
• Declare element myMessage
• Element myMessage contains child element message
• Declare element message
• Element message contains parsable character data
<?xml version = "1.0"?>

<!-- Fig. 6.3 : intro-invalid.xml -->
<!-- Simple introduction to XML markup -->

<!DOCTYPE myMessage SYSTEM "intro.dtd">

<!-- Root element missing child element message -->
<myMessage>
</myMessage>

Fig. 6.3 Non-valid XML document.
• Element myMessage’s structure does not adhere to that specified in intro.dtd
6.4.1 Sequences, Pipe Characters and Occurrence Indicators
• Sequences
  • Specify order in which elements occur
  • Comma (,) used as delimiter
    <!ELEMENT classroom ( teacher, student )>
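To make the ordering rule concrete, here is a small sketch of a document that conforms to the classroom declaration. The ( #PCDATA ) declarations for teacher and student and the sample names are assumptions added for illustration.

```xml
<?xml version = "1.0"?>

<!DOCTYPE classroom [
   <!ELEMENT classroom ( teacher, student )>
   <!-- Assumed content for the two children -->
   <!ELEMENT teacher ( #PCDATA )>
   <!ELEMENT student ( #PCDATA )>
]>

<classroom>
   <teacher>Ms. Rivera</teacher>
   <student>Sam</student>
</classroom>
```

Placing student before teacher, or omitting either child, would leave the document well formed but not valid.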
6.4.1 Sequences, Pipe Characters and Occurrence Indicators (cont.)
• Pipe characters (|)
  • Specify choices
    <!ELEMENT dessert ( iceCream | pastry )>
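A choice group admits exactly one of its alternatives. The sketch below assumes that iceCream and pastry hold character data; those two declarations and the sample text are illustrative only.

```xml
<?xml version = "1.0"?>

<!DOCTYPE dessert [
   <!-- dessert holds either iceCream or pastry, not both -->
   <!ELEMENT dessert ( iceCream | pastry )>
   <!ELEMENT iceCream ( #PCDATA )>
   <!ELEMENT pastry ( #PCDATA )>
]>

<dessert>
   <pastry>eclair</pastry>
</dessert>
```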
6.4.1 Sequences, Pipe Characters and Occurrence Indicators (cont.)
• Occurrence indicators
  • Specify element’s frequency
  • Plus sign (+) indicates one or more occurrences
    <!ELEMENT album ( song+ )>
  • Asterisk (*) indicates zero or more occurrences
    <!ELEMENT library ( book* )>
  • Question mark (?) indicates optional element (zero or one occurrence)
    <!ELEMENT seat ( person? )>
  • No indicator means element must occur exactly once
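The indicators can be mixed within one sequence. In this sketch only album and song come from the slide; title, artist and note are hypothetical elements added to show each indicator side by side.

```xml
<?xml version = "1.0"?>

<!DOCTYPE album [
   <!-- title: exactly once; artist: zero or one;
        song: one or more; note: zero or more -->
   <!ELEMENT album ( title, artist?, song+, note* )>
   <!ELEMENT title ( #PCDATA )>
   <!ELEMENT artist ( #PCDATA )>
   <!ELEMENT song ( #PCDATA )>
   <!ELEMENT note ( #PCDATA )>
]>

<album>
   <title>Demo</title>
   <song>First</song>
   <song>Second</song>
</album>
```

The document is valid even though artist and note are absent, because both are allowed to occur zero times.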
6.4.2 EMPTY, Mixed Content and ANY
• Content specification types
  • EMPTY
    • Elements do not contain character data
    • Elements do not contain child elements
      <!ELEMENT oven EMPTY>
    • Markup for oven element
      <oven/>
6.4.2 EMPTY, Mixed Content and ANY (cont.)
• Content specification types
  • Mixed content
    • Combination of elements and PCDATA
      <!ELEMENT myMessage ( #PCDATA | message )*>
    • Markup for myMessage
      <myMessage>Here is some text, some<message>other text</message>and<message>even more text</message></myMessage>
<?xml version = "1.0" standalone = "yes"?>

<!-- Fig. 6.5 : mixed.xml -->
<!-- Mixed content type elements -->

<!DOCTYPE format [
   <!ELEMENT format ( #PCDATA | bold | italic )*>
   <!ELEMENT bold ( #PCDATA )>
   <!ELEMENT italic ( #PCDATA )>
]>

<format>
   This is a simple formatted sentence.
   <bold>I have tried bold.</bold>
   <italic>I have tried italic.</italic>
   Now what?
</format>

Fig. 6.5 Example of a mixed-content element.
• Specify DTD as internal subset
• Declare format as mixed content element
• Elements bold and italic have PCDATA only for content specification
• Element format adheres to structure in DTD
6.4.2 EMPTY, Mixed Content and ANY (cont.)
• Content specification types
  • ANY
    • Can contain any content
      • PCDATA, elements or combination
      • Can also be empty elements
    • Commonly used in early DTD-development stages
      • Replace with specific content as DTD evolves
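A short sketch of ANY during early DTD development follows. The element names draft and item are hypothetical. Note that any child element used inside an ANY element must still be declared somewhere in the DTD.

```xml
<?xml version = "1.0"?>

<!DOCTYPE draft [
   <!-- Placeholder content model, to be tightened later -->
   <!ELEMENT draft ANY>
   <!ELEMENT item ( #PCDATA )>
]>

<draft>
   Loose text
   <item>first idea</item>
</draft>
```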
6.5 Attribute Declarations
• Attribute declaration
  • Specifies element’s attribute list
  • Uses ATTLIST attribute list declaration
<?xml version = "1.0"?>

<!-- Fig. 6.7: intro2.xml -->
<!-- Declaring attributes -->

<!DOCTYPE myMessage [
   <!ELEMENT myMessage ( message )>
   <!ELEMENT message ( #PCDATA )>
   <!ATTLIST message id CDATA #REQUIRED>
]>

<myMessage>

   <message id = "445">
      Welcome to XML!
   </message>

</myMessage>

Fig. 6.7 Declaring attributes.
• Specify DTD as internal subset
• Declare element myMessage with child element message
• Declare that attribute id contains required CDATA
6.5.1 Attribute Defaults (#REQUIRED, #IMPLIED, #FIXED)
• Attribute defaults
  • Specify attribute’s default value
  • #IMPLIED
    • Use (application’s) default value if attribute value not specified
  • #REQUIRED
    • Attribute must appear in element
    • Document is not valid if attribute is missing
  • #FIXED
    • Attribute value is constant
    • Attribute value cannot differ in XML document
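The three keywords, plus a plain default value, can be compared in one attribute list. This is a sketch only: the id attribute follows Fig. 6.7, while priority, version and language are hypothetical attributes introduced for illustration.

```xml
<?xml version = "1.0"?>

<!DOCTYPE message [
   <!ELEMENT message ( #PCDATA )>
   <!ATTLIST message
      id       CDATA #REQUIRED
      priority CDATA #IMPLIED
      version  CDATA #FIXED "1.0"
      language CDATA "en">
]>

<!-- id must be present; priority may be omitted;
     version and language are supplied from the DTD -->
<message id = "445">Welcome to XML!</message>
```

Writing version = "2.0" on the element would make the document invalid, because a #FIXED attribute may only carry its declared value.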
6.6 Attribute Types
• Attribute types
  • Strings (CDATA)
    • No constraints on attribute values
      • Except that < and & (and the quote character used to delimit the value) may not appear literally; use entity references such as &lt;, &amp;, &quot; and &apos;
  • Tokenized attributes
    • Constraints on permissible characters for attribute values
  • Enumerated attributes
    • Most restrictive
    • Take only one value listed in attribute declaration
6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN)
• Tokenized attribute types
  • Restrict attribute values
  • ID
    • Uniquely identifies an element
  • IDREF
    • Points to elements with ID attribute
 1  <?xml version = "1.0"?>
 2
 3  <!-- Fig. 6.8: IDExample.xml -->
 4  <!-- Example for ID and IDREF values of attributes -->
 5
 6  <!DOCTYPE bookstore [
 7     <!ELEMENT bookstore ( shipping+, book+ )>
 8     <!ELEMENT shipping ( duration )>
 9     <!ATTLIST shipping shipID ID #REQUIRED>
10     <!ELEMENT book ( #PCDATA )>
11     <!ATTLIST book shippedBy IDREF #IMPLIED>
12     <!ELEMENT duration ( #PCDATA )>
13  ]>
14
15  <bookstore>
16     <shipping shipID = "s1">
17        <duration>2 to 4 days</duration>
18     </shipping>
19
20     <shipping shipID = "s2">
21        <duration>1 day</duration>
22     </shipping>
23
24     <book shippedBy = "s2">
25        Java How to Program 3rd edition.
26     </book>
27
28     <book shippedBy = "s2">
29        C How to Program 3rd edition.
30     </book>
31
32     <book shippedBy = "s1">
33        C++ How to Program 3rd edition.
34     </book>
35  </bookstore>

Fig. 6.8 XML document with ID and IDREF attribute types.
• Each shipping element has a unique identifier (shipID)
• Attribute shippedBy points to shipping element by matching shipID attribute
• Declare book elements with attribute shippedBy
Fig. 6.9 Error displayed by XML Validator when an invalid ID is referenced.
• Assign shippedBy (line 28) value "s3", which matches no shipID in the document
6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN) (cont.)
• ENTITY tokenized attribute type
  • Indicates that attribute has entity for its value
  • Entity declaration
    <!ENTITY digits "0123456789">
  • Entity may be used as follows:
    <useAnEntity>&digits;</useAnEntity>
  • Entity reference &digits; replaced by its value
    <useAnEntity>0123456789</useAnEntity>
 1  <?xml version = "1.0"?>
 2
 3  <!-- Fig. 6.10: entityExample.xml -->
 4  <!-- ENTITY and ENTITY attribute types -->
 5
 6  <!DOCTYPE database [
 7     <!NOTATION html SYSTEM "iexplorer">
 8     <!ENTITY city SYSTEM "tour.html" NDATA html>
 9     <!ELEMENT database ( company+ )>
10     <!ELEMENT company ( name )>
11     <!ATTLIST company tour ENTITY #REQUIRED>
12     <!ELEMENT name ( #PCDATA )>
13  ]>
14
15  <database>
16     <company tour = "city">
17        <name>Deitel &amp; Associates, Inc.</name>
18     </company>
19  </database>

Fig. 6.10 XML document that contains an ENTITY attribute type.
• Declare entity city that refers to external document tour.html
• NDATA indicates that external-entity content is not XML
• Attribute tour for element company requires ENTITY attribute type
Fig. 6.11 Error generated by XML Validator when a DTD contains a reference to an undefined entity.
• Replace line 16
  <company tour = "city">
  with
  <company tour = "country">
6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN) (cont.)
• NMTOKEN tokenized attribute type
  • “Name token”
  • Value consists of letters, digits, periods, underscores, hyphens and colon characters
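A brief sketch of an NMTOKEN attribute follows. The element part and the attribute code are hypothetical names chosen for illustration.

```xml
<?xml version = "1.0"?>

<!DOCTYPE part [
   <!ELEMENT part EMPTY>
   <!ATTLIST part code NMTOKEN #REQUIRED>
]>

<!-- Valid: the value contains only digits, letters,
     a hyphen and a period -->
<part code = "2002-A.1"/>
```

A value such as "2002 A" would not be a single name token, because it contains a space between the characters.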
6.6.2 Enumerated Attribute Types
• Enumerated attribute types
  • Declare list of possible values for attribute
    <!ATTLIST person gender ( M | F ) "F">
  • Attribute gender can have either value M or F
  • F is default value
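The declaration can be exercised with a small document. This is a sketch: the root element people, the ( #PCDATA ) content of person and the sample text are assumptions made for illustration.

```xml
<?xml version = "1.0"?>

<!DOCTYPE people [
   <!ELEMENT people ( person+ )>
   <!ELEMENT person ( #PCDATA )>
   <!ATTLIST person gender ( M | F ) "F">
]>

<people>
   <!-- Explicit value taken from the enumeration -->
   <person gender = "M">First person</person>
   <!-- Attribute omitted: the default F applies -->
   <person>Second person</person>
</people>
```

Any value outside the list, such as gender = "X", would make the document invalid.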
6.7 Conditional Sections
• Conditional sections
  • Include declarations
    • Keyword INCLUDE
  • Exclude declarations
    • Keyword IGNORE
  • Often used with entities
    • Parameter entities
      • Preceded by percent character (%)
      • Creates entities specific to DTD
      • Can be used only inside DTD in which they are declared
<!-- Fig. 6.12: conditional.dtd -->
<!-- DTD for conditional section example -->

<!ENTITY % reject "IGNORE">
<!ENTITY % accept "INCLUDE">

<![ %accept; [
   <!ELEMENT message ( approved, signature )>
]]>

<![ %reject; [
   <!ELEMENT message ( approved, reason, signature )>
]]>

<!ELEMENT approved EMPTY>
<!ATTLIST approved flag ( true | false ) "false">

<!ELEMENT reason ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>

Fig. 6.12 Conditional sections in a DTD.
• Entities accept and reject represent strings INCLUDE and IGNORE, respectively
• Include this element message declaration (section marked %accept;)
• Exclude this element message declaration (section marked %reject;)
<?xml version = "1.0" standalone = "no"?>

<!-- Fig. 6.13: conditional.xml -->
<!-- Using conditional sections -->

<!DOCTYPE message SYSTEM "conditional.dtd">

<message>
   <approved flag = "true"/>
   <signature>Chairman</signature>
</message>

Fig. 6.13 XML document that conforms to conditional.dtd.
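Parameter entities are not limited to switching conditional sections; they can also hold a reusable piece of a content specification. The fragment below is a sketch of an external DTD file. The element names are borrowed from Fig. 6.5 and the entity name inline is invented here. XML 1.0 allows a parameter-entity reference inside a declaration only in the external subset, so this fragment is assumed to live in its own .dtd file.

```xml
<!-- Hypothetical external DTD fragment -->

<!-- inline collects the element names allowed inside text -->
<!ENTITY % inline "bold | italic">

<!-- %inline; expands to: bold | italic -->
<!ELEMENT format ( #PCDATA | %inline; )*>
<!ELEMENT bold ( #PCDATA )>
<!ELEMENT italic ( #PCDATA )>
```

Adding another inline element then means changing only the entity declaration and declaring the new element.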
6.8 Whitespace Characters
• Whitespace
  • Either preserved or normalized
    • Depending on context in which it is used
<?xml version = "1.0"?>

<!-- Fig. 6.14 : whitespace.xml -->
<!-- Demonstrating whitespace parsing -->

<!DOCTYPE whitespace [
   <!ELEMENT whitespace ( hasCDATA,
      hasID, hasNMTOKEN, hasEnumeration, hasMixed )>

   <!ELEMENT hasCDATA EMPTY>
   <!ATTLIST hasCDATA cdata CDATA #REQUIRED>

   <!ELEMENT hasID EMPTY>
   <!ATTLIST hasID id ID #REQUIRED>

   <!ELEMENT hasNMTOKEN EMPTY>
   <!ATTLIST hasNMTOKEN nmtoken NMTOKEN #REQUIRED>

   <!ELEMENT hasEnumeration EMPTY>
   <!ATTLIST hasEnumeration enumeration ( true | false )
      #REQUIRED>

   <!ELEMENT hasMixed ( #PCDATA | hasCDATA )*>
]>

<whitespace>

   <hasCDATA cdata = " simple cdata "/>

   <hasID id = " i20"/>

   <hasNMTOKEN nmtoken = " hello"/>

   <hasEnumeration enumeration = " true"/>

   <hasMixed>
      This is text.
      <hasCDATA cdata = " simple cdata"/>
      This is some additional text.
   </hasMixed>

</whitespace>

Fig. 6.14 Processing whitespace in an XML document.
• Attribute hasCDATA requires CDATA, which preserves whitespace
• Other attributes normalize (do not preserve) whitespace
• Whitespace preserved: cdata attribute of hasCDATA
• Whitespace normalized: id, nmtoken and enumeration attributes
Output from Fig. 6.14

> java Tree yes whitespace.xml
URL: file:C:/Examplesps/Files/deleted/ch09/Tree/whitespace.xml
[ document root ]
+-[ element : whitespace ]
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ element : hasCDATA ]
      +-[ attribute : cdata ] " simple cdata "
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ element : hasID ]
      +-[ attribute : id ] "i20"
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ element : hasNMTOKEN ]
      +-[ attribute : nmtoken ] "hello"
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ element : hasEnumeration ]
      +-[ attribute : enumeration ] "true"
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ ignorable ]
   +-[ element : hasMixed ]
      +-[ text ] "
"
      +-[ text ] " This is text."
      +-[ text ] "
"
      +-[ text ] " "
      +-[ element : hasCDATA ]
         +-[ attribute : cdata ] " simple cdata"
      +-[ text ] "
"
      +-[ text ] " This is some additional text."
      +-[ text ] "
"
      +-[ text ] " "
   +-[ ignorable ]
   +-[ ignorable ]
[ document end ]

• Whitespace preserved: cdata attribute value
• Whitespace normalized: id, nmtoken and enumeration attribute values
6.9 Case Study: Writing a DTD for the Day Planner Application
• Continue case study from Chapter 5
• External subset of DTD for day planner
<!-- Fig. 6.15: planner.dtd -->
<!-- DTD for day planner -->

<!ELEMENT planner ( year* )>

<!ELEMENT year ( date+ )>
<!ATTLIST year value CDATA #REQUIRED>

<!ELEMENT date ( note+ )>
<!ATTLIST date month CDATA #REQUIRED>
<!ATTLIST date day CDATA #REQUIRED>

<!ELEMENT note ( #PCDATA )>
<!ATTLIST note time CDATA #IMPLIED>

Fig. 6.15 DTD for planner.xml.
• Root element planner contains any number of (optional) year elements
• Element year contains one or more date elements
• Element year contains attribute value that has character data
• Element date contains one or more note elements
• Element date contains attributes month and day, which contain character data
• Element note contains parsed character data and optional attribute time
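For orientation, here is a sketch of a document that should validate against planner.dtd. It is not the planner.xml from Chapter 5; the dates, times and note text are invented sample data.

```xml
<?xml version = "1.0"?>

<!-- Hypothetical sample data, not the chapter's planner.xml -->
<!DOCTYPE planner SYSTEM "planner.dtd">

<planner>
   <year value = "2002">
      <date month = "7" day = "15">
         <!-- time is #IMPLIED, so it may be present or absent -->
         <note time = "1430">Doctor's appointment</note>
         <note>Buy groceries</note>
      </date>
   </year>
</planner>
```

Because month, day and value are declared as CDATA, the DTD does not check that they hold sensible numbers; it enforces only the presence of the attributes and the nesting of the elements.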