Creating Document Type Definitions (DTDs)

Ellen Pearlman and Eileen Mullin, Programming the Web Using XML

Learning Objectives • Understanding the function and syntax of a DTD • Learning what distinguishes internal and external subsets of DTDs • Defining the order and frequency of elements • Understanding how to create attributes and apply them to elements • Exploring how entity declarations and notation declarations work.

XML’s Schema Languages • There are two common schema languages used in XML. • The first is the Document Type Definition (DTD), which defines the grammar and vocabulary of your markup language. The DTD lists everything that a validating parser needs to know in order to check the structure of your XML document. • The second common schema language is XML Schema, which is well suited to more complex XML documents.

Introducing DTDs • A DTD uses grammar and structure to bring order to the elements of a markup language defined by XML. • To enforce these grammatical rules, a validating parser compares an XML document against the patterns the DTD defines to determine whether the document is valid. • The matching is strict: for example, because XML names are case-sensitive, a <Title> tag does not match an element declared as title.

Checking for Validation • As you gain experience writing XML documents, consider using a validating parser – a parser that refers to your DTD in scanning your documents for errors. • An interactive validation process pops up warnings and error messages as you're constructing an XML document. • With a parser that performs batch validation, you submit a complete XML document for error-checking and then see a complete report listing all errors and warnings that may apply.

Tracking Validated Files with TIBCO TurboXML

Example: An Online XML Validator

Proofread Your Own Code in Addition to Validating It • While using a validating parser is a good way to catch mistakes, it’s not a substitute for your own careful review of your code. • For example, if the songbook DTD states that every song must have a title, your parser will flag a song whose title is missing from an XML document of lyrics. If, however, an entire verse is omitted, the parser will not complain as long as the DTD contains no rules governing how many verses are in a song.
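To make the songbook example concrete, here is a minimal sketch of such a DTD written as an internal subset. The element names songbook, song, title, and verse are hypothetical, chosen only to match the prose. Because title appears exactly once in the content model, a song without a title fails validation; because verse is followed by *, a song with a verse missing (or with no verses at all) still validates.

```xml
<?xml version="1.0"?>
<!DOCTYPE songbook [
  <!-- Hypothetical songbook DTD. -->
  <!-- title has no occurrence indicator, so every song needs exactly one. -->
  <!-- verse* allows any number of verses, so a dropped verse goes unnoticed. -->
  <!ELEMENT songbook (song+)>
  <!ELEMENT song (title, verse*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT verse (#PCDATA)>
]>
<songbook>
  <song>
    <title>Example Song</title>
    <verse>First verse text</verse>
  </song>
</songbook>
```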

Using DTD Syntax • A DTD consists of a number of declarations. Each declaration is assigned to one of the following categories: • ELEMENT, for defining tags • ATTLIST, for specifying attributes in your tags • ENTITY, for identifying sources of data • NOTATION, for defining data types for non-XML data

Internal and External DTDs • The declarations that make up a DTD can be stored either at the beginning of an XML document or in an external file. Each method has its own advantages. • When the declarations in a DTD are stored with the document it describes, you can easily move the file to a new location and have the DTD travel with it. • However, it’s often most efficient to create an external DTD that you can reference from many different files.

Writing Element Declarations • An element declaration is used to define a new tag and specify what kind of content it can contain. • The declaration begins with an opening angle bracket (<) and an exclamation point (!), followed by the keyword ELEMENT. • The name of the element you’re declaring comes next; it’s often called a generic identifier. • The rest of the declaration states what content is allowed in the element, also known as the content specification.
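As a small sketch of the three parts just described, the hypothetical note element below is declared with the generic identifier note and the content specification (#PCDATA), which allows only text. The document after the internal subset is valid against that declaration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- "<!ELEMENT" opens the declaration, "note" is the generic identifier,
       and "(#PCDATA)" is the content specification. -->
  <!ELEMENT note (#PCDATA)>
]>
<note>Remember to validate this document.</note>
```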

Model Groups • A model group is a parenthesized list of tokens that defines what an element may contain: other elements (called child elements), or a combination of document text and child elements. • The element declaration <!ELEMENT author (firstname, lastname)> defines an author element that contains two child elements, as shown below: <author> <firstname>Marge</firstname> <lastname>Piercy</lastname> </author>

Controlling Quantity • You can impose requirements about how often each element can be used. • For example, some books have more than one author. • Accordingly, you could require that an element called book contain at least one author but allow more, by placing a plus sign (+) after author: <!ELEMENT book (title, author+, publisher)>
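DTDs offer three occurrence indicators in all: + (one or more), ? (zero or one), and * (zero or more); a token with no indicator must appear exactly once. The sketch below extends the book declaration with a hypothetical optional subtitle element and wraps it in a hypothetical library element that may hold any number of books.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!-- + one or more, ? zero or one, * zero or more, none = exactly one -->
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, subtitle?, author+, publisher)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT subtitle (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
]>
<library>
  <!-- Valid: no subtitle (optional) and two authors (one or more). -->
  <book>
    <title>XML Basics</title>
    <author>First Author</author>
    <author>Second Author</author>
    <publisher>Example Press</publisher>
  </book>
</library>
```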

Controlling Order • When you list two or more tokens in an element declaration, separated by commas, you also control the order in which the child elements must appear. • For example, an element called flight might be declared as follows: <!ELEMENT flight (airline, number, from, to, depdate)> A valid flight element must then contain its child elements in exactly that order: <flight> <airline>United</airline> <number>21</number> <from>LGA</from> <to>SFO</to> <depdate>July 27</depdate> </flight>

Controlling Order (2) • Another way you can control which elements appear is through a choice connector, which indicates that only one of a selection of elements can be used. Instead of commas, the tokens are separated by a pipe character ( | ) in an element declaration: <!ELEMENT season (winter | spring | summer | fall)> • This tagging might be put to use in a clothes catalog, for instance: <season> <winter>…</winter> </season>

Ordering Elements : Sequence Connector and Choice Connector
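The sequence connector (,) and the choice connector (|) can be combined in one DTD. In the sketch below, built around the clothes-catalog idea, the catalog, item, and name elements are hypothetical: each item must list a name and then a season, in that order, and each season must contain exactly one of its four children.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (item+)>
  <!-- Sequence: name first, then season. -->
  <!ELEMENT item (name, season)>
  <!ELEMENT name (#PCDATA)>
  <!-- Choice: exactly one of the four seasons. -->
  <!ELEMENT season (winter | spring | summer | fall)>
  <!ELEMENT winter (#PCDATA)>
  <!ELEMENT spring (#PCDATA)>
  <!ELEMENT summer (#PCDATA)>
  <!ELEMENT fall (#PCDATA)>
]>
<catalog>
  <item>
    <name>Wool coat</name>
    <season><winter>Lined for cold weather</winter></season>
  </item>
</catalog>
```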

Free Text • Any elements that may contain freeform document text are indicated by the keyword PCDATA preceded by a hash mark (#) in the element declaration. Some examples include: <!ELEMENT question (#PCDATA)> <!ELEMENT response (#PCDATA)> • It's also possible to define elements that contain free text as well as child elements: <!ELEMENT scrapbook (#PCDATA | caption | quote)*>
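Here is a minimal sketch of the scrapbook declaration in use, with hypothetical caption and quote content. In a mixed-content declaration, #PCDATA must come first, the alternatives must be separated by |, and the group must end with * whenever it names child elements. As a result, the DTD can restrict which child elements appear among the text, but not their order or how many times each appears.

```xml
<?xml version="1.0"?>
<!DOCTYPE scrapbook [
  <!-- Mixed content: text interleaved with caption and quote, in any order. -->
  <!ELEMENT scrapbook (#PCDATA | caption | quote)*>
  <!ELEMENT caption (#PCDATA)>
  <!ELEMENT quote (#PCDATA)>
]>
<scrapbook>Notes from the summer trip.
  <caption>The beach at dawn</caption>
  More notes about the day.
  <quote>Wish you were here!</quote>
</scrapbook>
```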

Attribute List Declarations

Parameters for Attribute List Declarations • Attribute name. The name of your new attribute is subject to the same restrictions as element names. • Attribute type. An attribute’s type can restrict the range of values that the attribute can hold. It can also identify special attributes, such as IDs, that may be of importance to the parser. • Required or default values. The keyword #REQUIRED means the document author must supply a value, and #IMPLIED means the attribute is optional. Alternatively, you can give a default value, which the attribute takes whenever the document author fails to supply another.

Declarations in Attribute Lists • An attribute list declaration identifies the name, data type, and default value (if there is one) of each attribute associated with a given element type; together, these make up the full set of attributes that pertain to that element type: • Name. The name of the attribute. • Type. Establishes type constraints for the attribute. • Default value. Provides a value to be used for that attribute unless the document supplies a different one.
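Putting the three parts together, the sketch below declares attributes for a flight element. The flights wrapper element and the id, class, and meal attributes are hypothetical. The declaration shows an ID attribute that must be present and unique, an enumerated attribute with the default value "economy", and an optional CDATA attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE flights [
  <!ELEMENT flights (flight+)>
  <!ELEMENT flight (#PCDATA)>
  <!-- Each row: attribute name, attribute type, required/default value. -->
  <!ATTLIST flight
    id     ID                    #REQUIRED
    class  (economy | business)  "economy"
    meal   CDATA                 #IMPLIED>
]>
<flights>
  <flight id="UA21" class="business">LGA to SFO</flight>
  <!-- class defaults to "economy"; meal is simply absent. -->
  <flight id="UA22">SFO to LGA</flight>
</flights>
```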

Writing Parameter Entity Declarations • Parameter entities are used only within a DTD, to help construct it. • You can use a parameter entity to assign a short name to a model group that’s in common use throughout your documents. • You identify a parameter entity by preceding its name in the declaration with a percent sign (%) and a blank space: <!ENTITY % tablecell "(p | ul | ol)*" > • Elsewhere in the DTD, you refer to it as %tablecell;.
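A parameter entity is most useful in an external subset, because XML does not allow parameter entity references inside markup declarations in a document's internal subset. The sketch below is a hypothetical external file, table.dtd, in which two cell elements share the tablecell model group; the table, tr, td, th, p, ul, ol, and li element names are assumptions for illustration.

```xml
<!-- table.dtd: a hypothetical external subset. -->
<!-- Declare the parameter entity before any declaration that refers to it. -->
<!ENTITY % tablecell "(p | ul | ol)*" >

<!ELEMENT table (tr+)>
<!ELEMENT tr (td | th)+>

<!-- Both cell types reuse the same content model. -->
<!ELEMENT td %tablecell;>
<!ELEMENT th %tablecell;>

<!ELEMENT p (#PCDATA)>
<!ELEMENT ul (li+)>
<!ELEMENT ol (li+)>
<!ELEMENT li (#PCDATA)>
```

Changing the one entity declaration then changes what every cell type may contain.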

Writing Notation Declarations • An XML document can refer to data in non-XML formats, for example through an entity or an attribute. When this happens, the DTD has to identify which formats may be used by defining each one in a notation declaration. • For example, an XML-based photo catalog might refer to graphics file formats (like PNG or TIFF) that your processor doesn't recognize. When that happens, you must define a notation for each format, for instance one that names a helper application: <!NOTATION PNG SYSTEM "photoshop.exe">
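The sketch below shows one common way a notation is put to use: an unparsed entity tagged with NDATA names the notation, and an attribute of type ENTITY points at that entity. The catalog and photo elements, the src attribute, and the file name sunset.png are hypothetical. The parser does not parse the PNG data itself; it passes the entity and notation information on to the application.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!NOTATION PNG SYSTEM "photoshop.exe">
  <!-- Unparsed entity: the NDATA keyword ties the file to the PNG notation. -->
  <!ENTITY sunset SYSTEM "sunset.png" NDATA PNG>
  <!ELEMENT catalog (photo+)>
  <!ELEMENT photo EMPTY>
  <!-- An ENTITY-typed attribute must name a declared unparsed entity. -->
  <!ATTLIST photo src ENTITY #REQUIRED>
]>
<catalog>
  <photo src="sunset"/>
</catalog>
```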

Referencing DTDs • The declarations that make up a DTD can be stored in one of two locations: • At the top of each of your XML documents. • In a separate data file that’s simply referenced from the XML document. • A DOCTYPE declaration is used in both cases. When the DTD declarations are stored internally within an XML document, the DOCTYPE declaration encloses them within a set of square brackets ([ ]).

Referencing an Internal DTD
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (to,from,date,time,subject,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT time (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<message>
  <to>Ross</to>
  <from>Bunny</from>
  <date>July 25</date>
  <time>11:15 a.m.</time>
  <subject>Reminder</subject>
  <body>Please don’t forget to tape Survivor tonight!</body>
</message>

Referencing an External DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <to>Ross</to>
  <from>Bunny</from>
  <date>July 25</date>
  <time>11:15 a.m.</time>
  <subject>Reminder</subject>
  <body>Please don't forget to tape Survivor tonight!</body>
</message>
message.dtd
<!ELEMENT message (to,from,date,time,subject,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT time (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (#PCDATA)>

An Internal Subset of Declarations Is Processed Before an External Subset
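The sketch below shows why this order matters. XML treats the internal subset as occurring before the external subset, and when an entity is declared more than once, the first declaration is binding; as a result, entity and attribute-list declarations in the internal subset take precedence over those in the external subset. (Element types, by contrast, may be declared only once in total.) The sketch assumes a hypothetical version of message.dtd that, besides its element declarations, also declares <!ENTITY sender "Bunny">.

```xml
<?xml version="1.0"?>
<!-- Assumes message.dtd also contains: <!ENTITY sender "Bunny"> -->
<!DOCTYPE message SYSTEM "message.dtd" [
  <!-- Read first, so this declaration wins: &sender; expands to "Ross". -->
  <!ENTITY sender "Ross">
]>
<message>
  <to>Bunny</to>
  <from>&sender;</from>
  <date>July 25</date>
  <time>11:15 a.m.</time>
  <subject>Reminder</subject>
  <body>Please don't forget to tape Survivor tonight!</body>
</message>
```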

Conditional Sections • Another way to control whether certain declarations are available to a given document is with a conditional section. • A conditional section is marked with a keyword, INCLUDE or IGNORE, that says whether the set of declarations it contains is available; conditional sections are allowed only in an external subset. • The declarations inside an INCLUDE section are processed, while those inside an IGNORE section are skipped.

Creating External DTD Subsets • In order to design and maintain a consistent structure across a large body of documents, you need a way to manage your DTD’s declarations centrally. • To handle this, you can store some or all of your declarations in a separate file that’s named with a .dtd extension. This kind of file is known as an external subset, and a document refers to it in its DOCTYPE declaration: <!DOCTYPE termpaper SYSTEM "http://www.genuineclass.com/xml/dtd/termpaper.dtd" [  <!….> ]>

Using Internal DTD Subsets • After you define an external DTD, you may not have many declarations left to include in an internal subset. • These leftover declarations typically include entity declarations for important phrases or images that appear frequently within this particular document but not in others. • An internal subset is also a good place to define special characters that this document uses but the majority of your documents don’t need.
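A sketch of the kinds of leftover declarations described above: an entity for a frequently used phrase and one for a special character. To keep the example self-contained, it declares its elements internally too; in practice those element declarations would sit in an external subset such as termpaper.dtd. The termpaper, title, and para names and the dtd and eacute entity names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE termpaper [
  <!-- A phrase used often in this particular document. -->
  <!ENTITY dtd "Document Type Definition">
  <!-- A special character this document needs: e with acute accent. -->
  <!ENTITY eacute "&#233;">
  <!-- Normally these would live in the external subset. -->
  <!ELEMENT termpaper (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
]>
<termpaper>
  <title>Writing a &dtd;</title>
  <para>A &dtd; can be drafted over coffee at a caf&eacute;.</para>
</termpaper>
```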

Using Conditional Sections with Entities • You can control whether to include or omit declarations by using a conditional section in a DTD. The keyword INCLUDE marks declarations that should be included, while the keyword IGNORE marks declarations that should be excluded: <![INCLUDE[ <!ELEMENT suggestion (#PCDATA)> ]]> • Similarly, the following conditional section tells the parser to ignore the declaration of an element called graffiti: <![IGNORE[ <!ELEMENT graffiti (#PCDATA)> ]]>
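Conditional sections are often controlled through parameter entities, so that a single entity declaration switches a whole section on or off. Because conditional sections are allowed only in external DTD files, the sketch below is a hypothetical external subset, feedback.dtd; the parameter entities allowSuggestions and allowGraffiti and the feedback element are assumptions.

```xml
<!-- feedback.dtd: a hypothetical external subset. -->
<!-- Switches: each value must be INCLUDE or IGNORE. -->
<!ENTITY % allowSuggestions "INCLUDE">
<!ENTITY % allowGraffiti "IGNORE">

<!-- A graffiti element in a document validates only if its section is included. -->
<!ELEMENT feedback (#PCDATA | suggestion | graffiti)*>

<![%allowSuggestions;[
<!ELEMENT suggestion (#PCDATA)>
]]>

<![%allowGraffiti;[
<!ELEMENT graffiti (#PCDATA)>
]]>
```

Because the internal subset is processed first, a single document could switch the graffiti section on for itself, without editing the shared file, by declaring <!DOCTYPE feedback SYSTEM "feedback.dtd" [ <!ENTITY % allowGraffiti "INCLUDE"> ]>.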

The End