Creating Document Type Definitions (DTDs) Ellen Pearlman Eileen Mullin Programming the Web Using XML
Learning Objectives • Understanding the function and syntax of a DTD • Learning what distinguishes internal and external subsets of DTDs • Defining the order and frequency of elements • Understanding how to create attributes and apply them to elements • Exploring how entity declarations and notation declarations work
XML’s Schema Languages • There are two common schema languages used in XML. • The first is the Document Type Definition (DTD), which defines the grammar and vocabulary of your markup language. The DTD lists everything that a parser needs to know in order to display and process your XML document. • The second common schema language is XML Schema, which is well-suited for more complex XML documents.
Introducing DTDs • A DTD uses grammar and structure to bring order to the elements of a markup language defined by XML. • To enforce these grammatical rules, a validating parser compares the content of an XML document against the patterns declared in the DTD to determine whether that document is valid. • The matching is very strict: for example, a tag whose name differs only in capitalization from the name declared in the DTD is a mismatch.
Checking for Validation • As you gain experience writing XML documents, consider using a validating parser – a parser that refers to your DTD in scanning your documents for errors. • An interactive validation process pops up warnings and error messages as you're constructing an XML document. • With a parser that performs batch validation, you submit a complete XML document for error-checking and then see a complete report listing all errors and warnings that may apply.
Proofread Your Own Code in Addition to Validating It • While using a validating parser is a good way to catch mistakes, it’s not a substitute for your own careful review of your code. • For example, your parser will note if a song title is missing from an XML document containing lyrics if the songbook DTD states that every song must have a title. If, however, an entire verse is omitted, the parser will not complain as long as the DTD contains no rules governing how many verses are in a song.
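The songbook case above can be sketched roughly as follows. The element names (songbook, song, title, verse) are assumptions made for illustration and do not come from a real songbook DTD. Because the title is mandatory but verse* permits any number of verses, including none, a validating parser would accept the second song even though its verses are missing.

```xml
<?xml version="1.0"?>
<!DOCTYPE songbook [
  <!-- Hypothetical songbook DTD: every song needs a title;
       the number of verses is not constrained -->
  <!ELEMENT songbook (song+)>
  <!ELEMENT song (title, verse*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT verse (#PCDATA)>
]>
<songbook>
  <song>
    <title>Amazing Grace</title>
    <verse>Amazing grace, how sweet the sound</verse>
  </song>
  <!-- Valid against the DTD, but a human proofreader would
       notice that the verses were left out -->
  <song>
    <title>Auld Lang Syne</title>
  </song>
</songbook>
```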
Using DTD Syntax • A DTD consists of a number of declarations. Each declaration is assigned to one of the following categories: • ELEMENT, for defining tags • ATTLIST, for specifying attributes in your tags • ENTITY, for identifying sources of data • NOTATION, for defining data types for non-XML data
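As a minimal sketch, the internal DTD below contains one declaration from each of the four categories. The newsletter vocabulary, the editor entity, and the GIF notation are hypothetical names chosen only for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE newsletter [
  <!-- ELEMENT: defines the tags and what each may contain -->
  <!ELEMENT newsletter (article+)>
  <!ELEMENT article (#PCDATA)>
  <!-- ATTLIST: gives every article a required, unique id attribute -->
  <!ATTLIST article id ID #REQUIRED>
  <!-- ENTITY: a named piece of replacement text -->
  <!ENTITY editor "the editorial team">
  <!-- NOTATION: names a non-XML data format -->
  <!NOTATION GIF SYSTEM "image/gif">
]>
<newsletter>
  <article id="a1">This issue was prepared by &editor;.</article>
</newsletter>
```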
Internal and External DTDs • The declarations that make up a DTD can be stored either at the beginning of an XML document or in an external file. Each method has its own advantages. • When the declarations in a DTD are stored with the document it describes, you can easily move the file to a new location and have the DTD travel with it. • However, it’s often most efficient to create an external DTD that you can reference from many different files.
Writing Element Declarations • An element declaration defines a new tag and specifies what kind of content it can contain. • The declaration begins with an opening angle bracket (<) and an exclamation point (!), followed by the keyword ELEMENT. • The name of the element you’re declaring comes next; it’s often called a generic identifier. • The rest of the declaration, known as the content specification, states what content is allowed in the element.
Model Groups • A model group describes the tokens that an element may contain. These tokens are either other elements (called child elements) or a combination of document text and child elements. • The element declaration
<!ELEMENT author (firstname, lastname)>
creates a tag that contains other tags, as shown below:
<author>
  <firstname>Marge</firstname>
  <lastname>Piercy</lastname>
</author>
Controlling Quantity • You can impose requirements about how often each element can be used. • For example, some books may have more than one author. • Accordingly, you could require that an element called book have at least one author while allowing it to have more: <!ELEMENT book (title, author+, publisher)>
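Besides the plus sign (+) used in the book declaration, DTD syntax provides two other occurrence indicators: a question mark (?) for an optional token and an asterisk (*) for zero or more. A token with no indicator must appear exactly once. The sketch below extends the book declaration; the subtitle and review elements, and all of the sample values, are hypothetical additions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- title:     exactly once
       subtitle?: zero or one
       author+:   one or more
       publisher: exactly once
       review*:   zero or more -->
  <!ELEMENT book (title, subtitle?, author+, publisher, review*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT subtitle (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT review (#PCDATA)>
]>
<book>
  <title>An Example Book</title>
  <author>First Author</author>
  <author>Second Author</author>
  <publisher>Example Press</publisher>
</book>
```

The document is valid even though subtitle and review are absent, because both are optional. Removing both author elements would make it invalid.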
Controlling Order • When you define two or more tokens in an element, you can also control the order in which the child elements should appear. Tokens separated by commas must appear in the order listed. • The declaration for an element called flight might appear as follows:
<!ELEMENT flight (airline, number, from, to, depdate)>
This declaration permits the following code fragment:
<flight>
  <airline>United</airline>
  <number>21</number>
  <from>LGA</from>
  <to>SFO</to>
  <depdate>July 27</depdate>
</flight>
Controlling Order (2) • Another way you can control which elements appear is through a choice connector, which indicates that only one of a selection of elements can be used. Instead of commas, the tokens are separated by a pipe character ( | ) in an element declaration: <!ELEMENT season (winter | spring | summer | fall)> • This tagging might be put to use in a clothes catalog, for instance: <season> <winter>…</winter> </season>
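Sequences and choices can also be nested with parentheses. As a hedged illustration for the clothes catalog (the item, name, and price elements and the sample values are assumptions, not part of the original example), the declaration below requires a name, then exactly one of the four season elements, then a price.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!-- A name, then exactly one season element, then a price -->
  <!ELEMENT item (name, (winter | spring | summer | fall), price)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT winter (#PCDATA)>
  <!ELEMENT spring (#PCDATA)>
  <!ELEMENT summer (#PCDATA)>
  <!ELEMENT fall (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<item>
  <name>Wool scarf</name>
  <winter>Holiday collection</winter>
  <price>24.00</price>
</item>
```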
Free Text • Any elements that may contain freeform document text are indicated by the keyword PCDATA preceded by a hash mark (#) in the element declaration. Some examples include: <!ELEMENT question (#PCDATA)> <!ELEMENT response (#PCDATA)> • It's also possible to define elements that contain free text as well as child elements: <!ELEMENT scrapbook (#PCDATA | caption | quote)*>
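A document that uses the scrapbook declaration might look like the sketch below; the sample text is invented for illustration. Note that in a mixed-content declaration, #PCDATA must be listed first and the group must end with an asterisk, so the text and child elements may appear in any order and any number of times.

```xml
<?xml version="1.0"?>
<!DOCTYPE scrapbook [
  <!-- Mixed content: free text interleaved with caption and quote elements -->
  <!ELEMENT scrapbook (#PCDATA | caption | quote)*>
  <!ELEMENT caption (#PCDATA)>
  <!ELEMENT quote (#PCDATA)>
]>
<scrapbook>
  Our trip began in June.
  <caption>Day one at the lake</caption>
  Everyone agreed with what the guide said:
  <quote>Bring more sunscreen.</quote>
</scrapbook>
```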
Parameters for Attribute List Declarations • Attribute name. The name of your new attribute is subject to the same restrictions as element names. • Attribute type. An attribute’s type can restrict the range of values that the attribute can hold. It can also identify special attributes that may be of importance to the parser. • Required or default values. The attribute is set to any value entered here if and when the document author fails to supply another.
Declarations in Attribute Lists • The declarations in an attribute list identify the name, data type, and default value (if there is one) of each attribute associated with a given element type: • Name. Identifies the attribute; together, the names in the list define the full set of attributes that pertain to a given element type. • Type. Establishes type constraints for the attribute. • Default value. Provides a value to be used for that attribute unless the document supplies a different one.
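The three parts of each attribute declaration (name, type, default) can be seen in the sketch below. The book element, its attribute names, and the placeholder ISBN are assumptions made for illustration. The keywords #REQUIRED, #IMPLIED, and #FIXED are standard DTD syntax.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
  <!-- Columns: attribute name, attribute type, required/default setting -->
  <!ATTLIST book
    isbn     CDATA                   #REQUIRED
    format   (hardcover | paperback) "paperback"
    edition  CDATA                   #IMPLIED
    language CDATA                   #FIXED "en">
  <!-- isbn:     must be supplied by the document author
       format:   one of two listed values; "paperback" if omitted
       edition:  optional, with no default
       language: always "en"; the document may not change it -->
]>
<book isbn="0-00-000000-0" edition="2">An Example Book</book>
```

Here the format attribute is not written in the document, so a parser that reads the DTD reports it with the default value "paperback".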
Writing Parameter Entity Declarations • Parameter entities are used only within a DTD, to help construct it. • You can use a parameter entity to assign a short name to a model group that’s in common use throughout your DTD. • You identify a parameter entity by preceding its name in the declaration with a percent sign (%) and a blank space: <!ENTITY % tablecell "(p | ul | ol)*" >
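Once declared, a parameter entity is referenced by writing a percent sign, its name, and a semicolon. The sketch below reuses tablecell for two hypothetical table elements (th and td); the file name tables.dtd and the supporting declarations are also assumptions. It is written as an external DTD file because XML allows parameter entity references inside a declaration only in the external subset.

```xml
<!-- tables.dtd: a hypothetical external DTD file -->
<!ENTITY % tablecell "(p | ul | ol)*">

<!-- Each reference below expands to (p | ul | ol)* -->
<!ELEMENT th %tablecell;>
<!ELEMENT td %tablecell;>

<!ELEMENT p (#PCDATA)>
<!ELEMENT ul (li+)>
<!ELEMENT ol (li+)>
<!ELEMENT li (#PCDATA)>
```

If the allowed cell content changes later, only the entity declaration needs to be edited.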
Writing Notation Declarations • In an XML document, an entity or element is permitted to contain non-XML format data. When this happens, the element declaration has to specify which formats may be embedded via a definition in a notation declaration. • For example, an XML-based photo catalog might contain graphics file formats (like PNG or TIF) that your processor doesn't recognize. When that happens, you must define the notation: <!NOTATION PNG SYSTEM "photoshop.exe">
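One way the PNG notation might be put to use is sketched below. The catalog and photo elements, the source attribute, and the file name sunset.png are hypothetical. The NDATA keyword ties an external, non-XML file to a declared notation, and an attribute of type ENTITY refers to that file by its entity name.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!NOTATION PNG SYSTEM "photoshop.exe">
  <!-- Unparsed entity: the parser does not read sunset.png itself,
       it only reports the file together with its PNG notation -->
  <!ENTITY sunset SYSTEM "sunset.png" NDATA PNG>
  <!ELEMENT catalog (photo+)>
  <!ELEMENT photo EMPTY>
  <!-- The value of source must be the name of an unparsed entity -->
  <!ATTLIST photo source ENTITY #REQUIRED>
]>
<catalog>
  <photo source="sunset"/>
</catalog>
```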
Referencing DTDs • The declarations that make up a DTD can be stored in one of two locations: • At the top of each of your XML documents. • In a separate data file that’s simply referenced from the XML document. • A DOCTYPE declaration is used in both cases. When the DTD declarations are stored internally within an XML document, the DOCTYPE declaration encloses the list of declarations within a set of square brackets ([ ]).
Referencing an Internal DTD
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (to,from,date,time,subject,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT time (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<message>
  <to>Ross</to>
  <from>Bunny</from>
  <date>July 25</date>
  <time>11:15 a.m.</time>
  <subject>Reminder</subject>
  <body>Please don’t forget to tape Survivor tonight!</body>
</message>
Referencing an External DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <to>Ross</to>
  <from>Bunny</from>
  <date>July 25</date>
  <time>11:15 a.m.</time>
  <subject>Reminder</subject>
  <body>Please don't forget to tape Survivor tonight!</body>
</message>
message.dtd
<!ELEMENT message (to,from,date,time,subject,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT time (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (#PCDATA)>
An Internal Subset of Declarations Is Processed Before an External Subset
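The practical effect of this ordering shows up with entities: when the same entity is declared more than once, the first declaration the parser reads is the one that counts. Since the internal subset is read first, a document can override an entity from its external DTD. The sketch below illustrates this under assumed names; memo.dtd, memo.xml, and the signature entity are all hypothetical.

```xml
<!-- memo.dtd: hypothetical external subset -->
<!ELEMENT memo (#PCDATA)>
<!ENTITY signature "The Management">

<!-- memo.xml: the internal subset is read first,
     so its declaration of signature takes precedence -->
<!DOCTYPE memo SYSTEM "memo.dtd" [
  <!ENTITY signature "Ross and Bunny">
]>
<memo>Regards, &signature;</memo>
```

In this sketch the memo text expands to "Regards, Ross and Bunny", not "Regards, The Management".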
Conditional Sections • Another way you can control whether or not certain declarations are accessible to any given document is with a conditional section. • This includes instructions that say whether or not a particular set of declarations within an external subset are available. • The declarations contained within such a section are only available if all conditions are met.
Creating External DTD Subsets • In order to design and maintain a consistent structure across a large body of documents, you need a way to manage your DTD’s declarations centrally. • To handle this, you can store some or all of your declarations in a separate file that’s named with a .dtd extension. This kind of file is known as an external subset.
<!DOCTYPE termpaper SYSTEM "http://www.genuineclass.com/xml/dtd/termpaper.dtd" [
<!-- the rest of the termpaper DTD local to this document appears here, if needed-->
<!….>
]>
Using Internal DTD Subsets • After you define an external DTD, you may not have many declarations left to include in an internal subset. • These leftover declarations typically include entity declarations for important phrases or images that appear frequently within this particular document but not others. • An internal subset is also a good place to define special characters used here that aren’t needed in the majority of your documents.
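A small internal subset of this kind might look like the sketch below. The entity names, the relative path termpaper.dtd, and the p element are assumptions for illustration; the actual termpaper DTD is not shown in these slides, so the sketch assumes it declares a p element inside termpaper.

```xml
<?xml version="1.0"?>
<!DOCTYPE termpaper SYSTEM "termpaper.dtd" [
  <!-- A phrase that recurs throughout this one paper -->
  <!ENTITY topic "the function and syntax of a DTD">
  <!-- A special character: the section sign, given as a character reference -->
  <!ENTITY sect "&#167;">
]>
<termpaper>
  <p>See &sect; 2 for a discussion of &topic;.</p>
</termpaper>
```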
Using Conditional Sections with Entities • You can control whether to include or omit declarations by using a conditional section in a DTD. The keyword INCLUDE specifies that the enclosed declarations should be included, while the keyword IGNORE dictates that they are excluded:
<![INCLUDE[
<!ELEMENT suggestion (#PCDATA)>
]]>
• Similarly, the following conditional section tells the parser to exclude the declaration of an element called graffiti:
<![IGNORE[
<!ELEMENT graffiti (#PCDATA)>
]]>
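Entities come into play when the INCLUDE or IGNORE keyword is supplied by a parameter entity, which lets each document switch a section on or off. The sketch below follows a common pattern; the file name review.dtd, the report and title elements, and the draft and final entities are hypothetical. The conditional sections sit in the external subset, and the document overrides the switches in its internal subset, which is read first.

```xml
<!-- review.dtd: hypothetical external subset -->
<!-- Default switches: documents are treated as final -->
<!ENTITY % draft "IGNORE">
<!ENTITY % final "INCLUDE">

<!-- Draft documents may carry any number of suggestions -->
<![%draft;[
<!ELEMENT report (title, suggestion*)>
]]>
<!-- Final documents may not -->
<![%final;[
<!ELEMENT report (title)>
]]>
<!ELEMENT title (#PCDATA)>
<!ELEMENT suggestion (#PCDATA)>

<!-- A draft document flips both switches in its internal subset -->
<!DOCTYPE report SYSTEM "review.dtd" [
  <!ENTITY % draft "INCLUDE">
  <!ENTITY % final "IGNORE">
]>
<report>
  <title>Quarterly Summary</title>
  <suggestion>Add a chart to section 2.</suggestion>
</report>
```

A document that leaves the internal subset empty gets the final version of report, in which a suggestion element would be a validity error.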