Chapter 6 – Document Type Definition (DTD).
Outline
6.1 Introduction
6.2 Parsers, Well-formed and Valid XML Documents
6.3 Document Type Declaration
6.4 Element Type Declarations
    6.4.1 Sequences, Pipe Characters and Occurrence Indicators
    6.4.2 EMPTY, Mixed Content and ANY
6.5 Attribute Declarations
    6.5.1 Attribute Defaults (#REQUIRED, #IMPLIED, #FIXED)
6.6 Attribute Types: strings (CDATA), tokenized, enumerated
    6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN)
    6.6.2 Enumerated Attribute Types
6.7 Conditional Sections
6.8 Whitespace Characters
6.9 Case Study: Writing a DTD for the Day Planner Application
6.1 Introduction
• Document Type Definitions (DTDs)
  • Define structure of XML document
    • i.e., what elements, attributes, etc. are permitted in document
  • XML document not required to have DTD
    • Usually recommended for document conformity
  • Use Extended Backus-Naur Form (EBNF) grammar
    • Wikipedia definition: The Backus–Naur form (also known as BNF, the Backus–Naur formalism, Backus normal form, or Panini–Backus Form) is a metasyntax used to express context-free grammars: that is, a formal way to describe formal languages.
Simple DTD (not in the Deitel text)
    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT rootelement (firstelement, secondelement)>
    <!ELEMENT firstelement (level1)>
    <!ATTLIST firstelement position CDATA #REQUIRED>
    <!ELEMENT level1 (#PCDATA | level2)*>
    <!ATTLIST level1 children (0 | 1) #REQUIRED>
    <!ATTLIST secondelement position CDATA #REQUIRED>
    <!ELEMENT level2 (#PCDATA)>
    <!ELEMENT secondelement (level1)>
XML document using a simple DTD
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE rootelement SYSTEM "verysimplexml.dtd">
    <rootelement>
       <firstelement position="1">
          <level1 children="0">This is level 1 of the nested elements</level1>
       </firstelement>
       <secondelement position="2">
          <level1 children="1">
             <level2>This is level 2 of the nested elements</level2>
          </level1>
       </secondelement>
    </rootelement>
6.2 Parsers, Well-formed and Valid XML Documents
• Parsers
  • Validating
    • Able to read DTD
    • Determine whether XML document conforms to DTD
      • Valid document conforms to DTD
        • Document is then well formed, by definition
      • Documents can be well formed, but not valid
  • Nonvalidating
    • Able to read DTD
    • Cannot check document against DTD for conformity
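The difference between well formed and valid can be seen in a small document. The sketch below is only an illustration; the element names note, to and body are hypothetical and do not come from the Deitel text.

```xml
<?xml version="1.0"?>
<!-- Well formed: one root element, every tag closed and properly nested. -->
<!-- Not valid: the DTD requires a to element before body. -->
<!DOCTYPE note [
   <!ELEMENT note ( to, body )>
   <!ELEMENT to ( #PCDATA )>
   <!ELEMENT body ( #PCDATA )>
]>
<note>
   <body>Lunch at noon?</body>
</note>
```

A nonvalidating parser would accept this document, while a validating parser would report the missing to element. If the closing </note> tag were removed, the document would no longer be well formed and either kind of parser would reject it.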
6.3 Document Type Declaration
• Document Type Declaration
  • Introduces DTDs into XML documents
  • Placed in XML document’s prolog (markup preceding the root element)
  • Begins with <!DOCTYPE
  • Ends with >
  • Can point to
    • External subsets
      • Declarations outside document
      • Exist in different file
        • typically ending with .dtd extension
    • Internal subsets
      • Declarations inside document
      • Visible only within document in which it resides
6.3 Document Type Declaration: External
If the DTD is external to your XML source file, it is referenced from a DOCTYPE declaration with the following syntax:
    <!DOCTYPE root-element SYSTEM "filename">
• Example (from intro.xml, Fig. 6.1)
    <!DOCTYPE myMessage SYSTEM "intro.dtd">
6.3 Document Type Declaration: Internal
If the DTD is included in your XML source file, it is wrapped in a DOCTYPE declaration with the following syntax:
    <!DOCTYPE root-element [element-declarations]>
• Example (from intro2.xml, Fig. 6.7)
    <!DOCTYPE myMessage [
       <!ELEMENT myMessage ( message )>
       <!ELEMENT message ( #PCDATA )>
       <!ATTLIST message id CDATA #REQUIRED>
    ]>
6.3 Document Type Declaration: Combined
• Example (from p. 136 of Deitel)
    <!DOCTYPE myMessage SYSTEM "myDTD.dtd" [
       <!ELEMENT myElement ( #PCDATA )>
    ]>
    <?xml version = "1.0"?>

    <!-- Fig. 6.1: intro.xml -->
    <!-- Using an external subset -->

    <!DOCTYPE myMessage SYSTEM "intro.dtd">

    <myMessage>
       <message>Welcome to XML!</message>
    </myMessage>
Fig. 6.1 XML document declaring its associated DTD.
• DOCTYPE starts document type declaration
• Document type declaration is named myMessage
• Keyword SYSTEM specifies external subset
• intro.dtd is DTD
6.4 Element Type Declarations
• Element type declarations
  • Declare elements in XML documents
  • Begin with <!ELEMENT, end with >
  • Syntax:
        <!ELEMENT element-name category>
    or
        <!ELEMENT element-name (content-model)>
  • Example:
        <!ELEMENT myElement ( #PCDATA )>
    • myElement is generic identifier
    • Parentheses specify element’s content (content specification)
    • Keyword PCDATA
      • Element contains only parsed character data (no child elements)
      • Text is parsed, so characters such as < and & are treated as markup
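Because PCDATA is parsed, a literal < or & inside such an element has to be written as an entity reference. A minimal sketch, reusing the myElement declaration from the slide (the text content is made up for illustration):

```xml
<?xml version="1.0"?>
<!DOCTYPE myElement [
   <!ELEMENT myElement ( #PCDATA )>
]>
<!-- &lt; and &amp; stand for the literal characters < and & -->
<myElement>5 &lt; 7 &amp; 7 &gt; 5</myElement>
```

Writing the raw characters instead (5 < 7 & 7 > 5) would make the document not well formed, because the parser would try to read < as the start of a tag and & as the start of an entity reference.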
    <!-- Fig. 6.2: intro.dtd -->
    <!-- External declarations -->

    <!ELEMENT myMessage ( message )>
    <!ELEMENT message ( #PCDATA )>
Fig. 6.2 Validation using an external DTD.
• Declare element myMessage
• Element myMessage contains child element message
• Declare element message
• Element message contains parsable character data
    <?xml version = "1.0"?>

    <!-- Fig. 6.3 : intro-invalid.xml -->
    <!-- Simple introduction to XML markup -->

    <!DOCTYPE myMessage SYSTEM "intro.dtd">

    <!-- Root element missing child element message -->
    <myMessage>
    </myMessage>
Fig. 6.3 Non-valid XML document.
• Element myMessage’s structure does not adhere to that specified in intro.dtd
6.4.1 Sequences, Pipe Characters and Occurrence Indicators
• Sequences
  • Specify order in which elements occur
  • Comma (,) used as delimiter
        <!ELEMENT classroom ( teacher, student )>
6.4.1 Sequences, Pipe Characters and Occurrence Indicators (cont.)
• Pipe characters (|)
  • Specify choices
        <!ELEMENT dessert ( iceCream | pastry )>
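Sequences and choices can be used together in one DTD. The sketch below reuses dessert, iceCream and pastry from the slide; the elements meal and entree and all text content are hypothetical additions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE meal [
   <!-- Sequence: entree must come first, then dessert -->
   <!ELEMENT meal ( entree, dessert )>
   <!ELEMENT entree ( #PCDATA )>
   <!-- Choice: dessert holds exactly one of iceCream or pastry -->
   <!ELEMENT dessert ( iceCream | pastry )>
   <!ELEMENT iceCream ( #PCDATA )>
   <!ELEMENT pastry ( #PCDATA )>
]>
<meal>
   <entree>Soup</entree>
   <dessert>
      <pastry>Eclair</pastry>
   </dessert>
</meal>
```

Swapping the order of entree and dessert, or placing both iceCream and pastry inside dessert, would make the document invalid while leaving it well formed.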
6.4.1 Sequences, Pipe Characters and Occurrence Indicators (cont.)
• Occurrence indicators
  • Specify element’s frequency
  • Plus sign (+) indicates one or more occurrences
        <!ELEMENT album ( song+ )>
  • Asterisk (*) indicates zero or more occurrences (element is optional and may repeat)
        <!ELEMENT library ( book* )>
  • Question mark (?) indicates zero or one occurrence (element is optional and may appear at most once)
        <!ELEMENT seat ( person? )>
  • No indicator means the element must occur exactly once
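The three occurrence indicators can be compared side by side in a single content model. This is a sketch only: album and song come from the slide, while title, artist, bonus and the text content are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE album [
   <!-- title: exactly once; artist: zero or one;
        song: one or more; bonus: zero or more -->
   <!ELEMENT album ( title, artist?, song+, bonus* )>
   <!ELEMENT title ( #PCDATA )>
   <!ELEMENT artist ( #PCDATA )>
   <!ELEMENT song ( #PCDATA )>
   <!ELEMENT bonus ( #PCDATA )>
]>
<album>
   <title>Road Songs</title>
   <song>First Mile</song>
   <song>Second Mile</song>
</album>
```

The document is valid even though artist and bonus are absent, because ? and * both allow zero occurrences. Removing both song elements would make it invalid, since + requires at least one.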
6.4.2 EMPTY, Mixed Content and ANY
• Content specification types
  • EMPTY
    • Elements do not contain character data
    • Elements do not contain child elements
          <!ELEMENT oven EMPTY>
    • Markup for oven element
          <oven/>
6.4.2 EMPTY, Mixed Content and ANY (cont.)
• Content specification types
  • Mixed content
    • Combination of elements and PCDATA
          <!ELEMENT myMessage ( #PCDATA | message )*>
    • Markup for myMessage
          <myMessage>Here is some text, some<message>other text</message>and<message>even more text</message></myMessage>
    <?xml version = "1.0" standalone = "yes"?>

    <!-- Fig. 6.5 : mixed.xml -->
    <!-- Mixed content type elements -->

    <!DOCTYPE format [
       <!ELEMENT format ( #PCDATA | bold | italic )*>
       <!ELEMENT bold ( #PCDATA )>
       <!ELEMENT italic ( #PCDATA )>
    ]>

    <format>
       This is a simple formatted sentence.
       <bold>I have tried bold.</bold>
       <italic>I have tried italic.</italic>
       Now what?
    </format>
Fig. 6.5 Example of a mixed-content element.
• Specify DTD as internal subset
• Declare format as mixed content element
• Elements bold and italic have PCDATA only for content specification
• Element format adheres to structure in DTD
6.4.2 EMPTY, Mixed Content and ANY (cont.)
• Content specification types
  • ANY
    • Can contain any content
      • PCDATA, elements or combination
      • Can also be empty elements
    • Commonly used in early DTD-development stages
      • Replace with specific content as DTD evolves
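A short sketch of ANY in use follows. The element names draft and note are hypothetical. One point worth keeping in mind is that any child element appearing inside an ANY element still needs its own declaration for the document to be valid.

```xml
<?xml version="1.0"?>
<!DOCTYPE draft [
   <!-- ANY: text, declared child elements, a mix of both, or nothing -->
   <!ELEMENT draft ANY>
   <!ELEMENT note ( #PCDATA )>
]>
<draft>Loose text, <note>a declared child</note>, and more text.</draft>
```

Once the structure settles, the ANY declaration could be replaced by a specific content model such as ( #PCDATA | note )*.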
6.5 Attribute Declarations
• Attribute declaration
  • Specifies element’s attribute list
  • Uses ATTLIST attribute list declaration
An attribute declaration has the following syntax:
    <!ATTLIST element-name attribute-name attribute-type default-value>
DTD example:
    <!ATTLIST payment type CDATA "check">
XML example:
    <payment type="check" />
    <payment type="cash" />
    <?xml version = "1.0"?>

    <!-- Fig. 6.7: intro2.xml -->
    <!-- Declaring attributes -->

    <!DOCTYPE myMessage [
       <!ELEMENT myMessage ( message )>
       <!ELEMENT message ( #PCDATA )>
       <!ATTLIST message id CDATA #REQUIRED>
    ]>

    <myMessage>

       <message id = "445">
          Welcome to XML!
       </message>

    </myMessage>
Fig. 6.7 Declaring attributes.
• Specify DTD as internal subset
• Declare element myMessage with child element message
• Declare that attribute id contains required CDATA
6.5.1 Attribute Defaults (#REQUIRED, #IMPLIED, #FIXED)
• Attribute defaults
  • Specify attribute’s default value
  • #IMPLIED
    • Use (application’s) default value if attribute value not specified
    • [from w3schools: Use the #IMPLIED keyword if you don't want to force the author to include an attribute, and you don't have an option for a default value.]
  • #REQUIRED
    • Attribute must appear in element
    • Document is not valid if attribute is missing
  • #FIXED
    • Attribute value is constant
    • Attribute value cannot differ in XML document
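The defaults can be compared in one attribute list. This sketch builds on the payment element and its type attribute from section 6.5; the attributes id, memo and currency and their values are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE payment [
   <!ELEMENT payment EMPTY>
   <!ATTLIST payment
      id       CDATA #REQUIRED
      memo     CDATA #IMPLIED
      currency CDATA #FIXED "USD"
      type     CDATA "check">
]>
<!-- id must be written out; memo may be left off;
     currency and type are supplied from the DTD when omitted -->
<payment id="p1"/>
```

A parser that reads the declarations reports currency="USD" and type="check" for this element even though neither is written in the tag. Leaving out id, or writing currency="EUR", would make the document invalid.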
6.6 Attribute Types
• Attribute types
  • Strings (CDATA)
    • No constraints on attribute values
    • Except that <, & and the quote character delimiting the value must be written as entity references
  • Tokenized attributes
    • Constraints on permissible characters for attribute values
  • Enumerated attributes
    • Most restrictive
    • Take only one value listed in attribute declaration
6.6 Attribute Types (cont.)
    Type         Value          Explanation
    String       CDATA          The value is character data
    Enumerated   (en1|en2|…)    The value must be one from an enumerated list
    Tokenized    ID             The value is a unique id
                 IDREF          The value is the id of another element
                 IDREFS         The value is a list of other ids
                 NMTOKEN        The value is a valid XML name token
                 NMTOKENS       The value is a list of valid XML name tokens
                 ENTITY         The value is an entity
                 ENTITIES       The value is a list of entities
                 NOTATION       The value is a name of a notation
                 xml:           The value is a predefined xml value
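The list-valued types (IDREFS, NMTOKENS) take several tokens separated by spaces. A minimal sketch, in which the elements team and member and the attributes key, mentors and tags are all hypothetical:

```xml
<?xml version="1.0"?>
<!DOCTYPE team [
   <!ELEMENT team ( member+ )>
   <!ELEMENT member EMPTY>
   <!ATTLIST member
      key     ID       #REQUIRED
      mentors IDREFS   #IMPLIED
      tags    NMTOKENS #IMPLIED>
]>
<team>
   <member key="m1" tags="lead senior"/>
   <member key="m2" tags="junior"/>
   <!-- mentors lists two ids, each of which must match some key -->
   <member key="m3" mentors="m1 m2"/>
</team>
```

Each token in mentors has to match an ID value somewhere in the document, and no two key values may be the same.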
6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN)
• Tokenized attribute types
  • Restrict attribute values
  • ID
    • Uniquely identifies an element
  • IDREF
    • Points to elements with ID attribute
     1  <?xml version = "1.0"?>
     2
     3  <!-- Fig. 6.8: IDExample.xml -->
     4  <!-- Example for ID and IDREF values of attributes -->
     5
     6  <!DOCTYPE bookstore [
     7     <!ELEMENT bookstore ( shipping+, book+ )>
     8     <!ELEMENT shipping ( duration )>
     9     <!ATTLIST shipping shipID ID #REQUIRED>
    10     <!ELEMENT book ( #PCDATA )>
    11     <!ATTLIST book shippedBy IDREF #IMPLIED>
    12     <!ELEMENT duration ( #PCDATA )>
    13  ]>
    14
    15  <bookstore>
    16     <shipping shipID = "s1">
    17        <duration>2 to 4 days</duration>
    18     </shipping>
    19
Fig. 6.8 XML document with ID and IDREF attribute types.
• Each shipping element has a unique identifier (shipID)
• Attribute shippedBy points to shipping element by matching shipID attribute
    20     <shipping shipID = "s2">
    21        <duration>1 day</duration>
    22     </shipping>
    23
    24     <book shippedBy = "s2">
    25        Java How to Program 3rd edition.
    26     </book>
    27
    28     <book shippedBy = "s2">
    29        C How to Program 3rd edition.
    30     </book>
    31
    32     <book shippedBy = "s1">
    33        C++ How to Program 3rd edition.
    34     </book>
    35  </bookstore>
Fig. 6.8 XML document with ID and IDREF attribute types. (Part 2)
• Declare book elements with attribute shippedBy
Fig. 6.9 Error displayed by XML Validator when an invalid ID is referenced.
• Assign shippedBy (line 28) value "s3"
6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN) (cont.)
• ENTITY tokenized attribute type
  • Indicates that attribute has entity for its value
  • Entity declaration
        <!ENTITY digits "0123456789">
  • Entity may be used as follows:
        <useAnEntity>&digits;</useAnEntity>
  • Entity reference &digits; replaced by its value
        <useAnEntity>0123456789</useAnEntity>
     1  <?xml version = "1.0"?>
     2
     3  <!-- Fig. 6.10: entityExample.xml -->
     4  <!-- ENTITY and ENTITY attribute types -->
     5
     6  <!DOCTYPE database [
     7     <!NOTATION html SYSTEM "iexplorer">
     8     <!ENTITY city SYSTEM "tour.html" NDATA html>
     9     <!ELEMENT database ( company+ )>
    10     <!ELEMENT company ( name )>
    11     <!ATTLIST company tour ENTITY #REQUIRED>
    12     <!ELEMENT name ( #PCDATA )>
    13  ]>
    14
    15  <database>
    16     <company tour = "city">
    17        <name>Deitel &amp; Associates, Inc.</name>
    18     </company>
    19  </database>
Fig. 6.10 XML document that contains an ENTITY attribute type.
• Declare entity city that refers to external document tour.html
• NDATA indicates that external-entity content is not XML
• Attribute tour for element company requires ENTITY attribute type
Fig. 6.11 Error generated by XML Validator when a DTD contains a reference to an undefined entity.
• Replace line 16
      <company tour = "city">
  with
      <company tour = "country">
6.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN) (cont.)
• NMTOKEN tokenized attribute type
  • “Name token”
  • Value consists of letters, digits, periods, underscores, hyphens and colon characters (i.e., cannot contain spaces)
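A short sketch of NMTOKEN values follows. The elements catalog and item, the attribute code and the sample values are all hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
   <!ELEMENT catalog ( item+ )>
   <!ELEMENT item ( #PCDATA )>
   <!ATTLIST item code NMTOKEN #REQUIRED>
]>
<catalog>
   <item code="2024-A.1">Digits, hyphen, letter and period are all allowed</item>
   <item code="_internal:7">Underscore and colon are allowed too</item>
   <!-- code="two words" would be invalid: the space splits it into two tokens -->
</catalog>
```

Unlike an ID value, a name token may begin with a digit, which is why 2024-A.1 is accepted here.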
6.6.2 Enumerated Attribute Types
• Enumerated attribute types
  • Declare list of possible values for attribute
        <!ATTLIST person gender ( M | F ) "F">
  • Attribute gender can have either value M or F
  • F is default value
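The declaration above can be tried in a small document. In this sketch the person element and gender attribute come from the slide, while the wrapper element people and the text content are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
   <!ELEMENT people ( person+ )>
   <!ELEMENT person ( #PCDATA )>
   <!ATTLIST person gender ( M | F ) "F">
]>
<people>
   <person gender="M">Value taken from the enumerated list</person>
   <person>No attribute written, so the default F applies</person>
   <!-- gender="X" would be invalid: X is not in the list -->
</people>
```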
6.7 Conditional Sections
• Conditional sections
  • Include declarations
    • Keyword INCLUDE
  • Exclude declarations
    • Keyword IGNORE
  • Often used with entities
    • Parameter entities
      • Preceded by percent character (%)
      • Creates entities specific to DTD
      • Can be used only inside DTD in which they are declared
    <!-- Fig. 6.12: conditional.dtd -->
    <!-- DTD for conditional section example -->

    <!ENTITY % reject "IGNORE">
    <!ENTITY % accept "INCLUDE">

    <![ %accept; [
       <!ELEMENT message ( approved, signature )>
    ]]>

    <![ %reject; [
       <!ELEMENT message ( approved, reason, signature )>
    ]]>

    <!ELEMENT approved EMPTY>
    <!ATTLIST approved flag ( true | false ) "false">

    <!ELEMENT reason ( #PCDATA )>
    <!ELEMENT signature ( #PCDATA )>
Fig. 6.12 Conditional sections in a DTD.
• Entities accept and reject represent strings INCLUDE and IGNORE, respectively
• Include this element message declaration (the %accept; section)
• Exclude this element message declaration (the %reject; section)
    <?xml version = "1.0" standalone = "no"?>

    <!-- Fig. 6.13: conditional.xml -->
    <!-- Using conditional sections -->

    <!DOCTYPE message SYSTEM "conditional.dtd">

    <message>
       <approved flag = "true"/>
       <signature>Chairman</signature>
    </message>
Fig. 6.13 XML document that conforms to conditional.dtd.
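Switching which declaration is active only requires changing the two parameter entity values. The sketch below is a hypothetical variant of conditional.dtd with the values swapped; it is not taken from the Deitel text. Conditional sections are permitted only in the external subset, so this has to live in a separate DTD file.

```xml
<!-- Hypothetical variant of conditional.dtd: the two entity values are swapped -->
<!ENTITY % reject "INCLUDE">
<!ENTITY % accept "IGNORE">

<!-- Skipped, because %accept; now expands to IGNORE -->
<![ %accept; [
   <!ELEMENT message ( approved, signature )>
]]>

<!-- Active, because %reject; now expands to INCLUDE -->
<![ %reject; [
   <!ELEMENT message ( approved, reason, signature )>
]]>

<!ELEMENT approved EMPTY>
<!ATTLIST approved flag ( true | false ) "false">

<!ELEMENT reason ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>
```

Against this variant, a message element would need a reason child between approved and signature, so a document containing only approved and signature would no longer be valid.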
6.8 Whitespace Characters
• Whitespace
  • Either preserved or normalized
  • Depending on context in which it is used
    <?xml version = "1.0"?>

    <!-- Fig. 6.14 : whitespace.xml -->
    <!-- Demonstrating whitespace parsing -->

    <!DOCTYPE whitespace [
       <!ELEMENT whitespace ( hasCDATA,
          hasID, hasNMTOKEN, hasEnumeration, hasMixed )>

       <!ELEMENT hasCDATA EMPTY>
       <!ATTLIST hasCDATA cdata CDATA #REQUIRED>

       <!ELEMENT hasID EMPTY>
       <!ATTLIST hasID id ID #REQUIRED>

       <!ELEMENT hasNMTOKEN EMPTY>
       <!ATTLIST hasNMTOKEN nmtoken NMTOKEN #REQUIRED>

       <!ELEMENT hasEnumeration EMPTY>
       <!ATTLIST hasEnumeration enumeration ( true | false )
          #REQUIRED>

       <!ELEMENT hasMixed ( #PCDATA | hasCDATA )*>
    ]>

Fig. 6.14 Processing whitespace in an XML document.
• Attribute cdata requires CDATA, which preserves whitespace
• Other attributes normalize (do not preserve) whitespace
    <whitespace>

       <hasCDATA cdata = " simple cdata "/>

       <hasID id = " i20"/>

       <hasNMTOKEN nmtoken = " hello"/>

       <hasEnumeration enumeration = " true"/>

       <hasMixed>
          This is text.
          <hasCDATA cdata = " simple cdata"/>
          This is some additional text.
       </hasMixed>

    </whitespace>
Fig. 6.14 Processing whitespace in an XML document. (Part 2)
• Whitespace preserved (attribute cdata)
• Whitespace normalized (attributes id, nmtoken and enumeration)
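The effect of normalization is easier to see by writing out the attribute values as the application would receive them. The fragment below is an abridged illustration rather than parser output, and it assumes a parser that has read the attribute declarations in Fig. 6.14.

```xml
<!-- Attribute values for Fig. 6.14 after normalization (hasMixed omitted) -->
<whitespace>
   <!-- CDATA: leading and trailing spaces are kept -->
   <hasCDATA cdata=" simple cdata "/>
   <!-- ID, NMTOKEN and enumerated types: leading and trailing spaces are removed -->
   <hasID id="i20"/>
   <hasNMTOKEN nmtoken="hello"/>
   <hasEnumeration enumeration="true"/>
</whitespace>
```

Validity is checked against the normalized values, which is why id = " i20" is accepted as an ID even though the raw text begins with a space.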