XML Study-Session: Part II Validating XML Documents
Objectives: By completing this study-session, you should be able to: • Validate XML documents against a DTD. • Understand basic DTD syntax. • Create simple DTDs of your own.
What is a DTD? Document Type Definition: • Standard originally developed for SGML. • Provides a description of the XML document’s structure, and serves as a grammar to specify what tags and attributes are valid in an XML document and in what context they are valid. • E.g. The following is an example DTD statement: <!ELEMENT person (name, e-mail*)>
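As a minimal sketch of what the statement above allows, the document below carries the `person` declaration inline. Only the `person` declaration comes from the slide; the declarations for `name` and `e-mail` and the sample values are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- From the slide: one name, followed by zero or more e-mail elements -->
  <!ELEMENT person (name, e-mail*)>
  <!-- Assumed for illustration: both children hold plain text -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT e-mail (#PCDATA)>
]>
<person>
  <name>Jane Doe</name>
  <e-mail>jane@example.com</e-mail>
  <e-mail>jdoe@example.org</e-mail>
</person>
```

Dropping both `e-mail` elements would still be valid; dropping `name` would not.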
Why use a DTD? A DTD lets an application construct, and check, XML that conforms to the structure the DTD specifies. Also: • Self documentation • Portability • Provides defaults for attributes • Entity declaration
Using a DTD in an XML document An XML document may do any of the following: • Refer to a DTD, using its URI. • Include a DTD inline as part of the XML document. • Omit a DTD altogether. Without a DTD, an XML document can be checked for well-formedness, but not for validity. The DTD used by the XML document may be internal or external. An external DTD is stored as a plain-text file, conventionally with a .dtd extension.
Example: Using a DTD inline
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Book [
  <!ELEMENT Book (Title, Author+, Summary*, Note?)>
  <!ATTLIST Book ISBN CDATA #REQUIRED section (fiction|nonfiction) 'fiction'>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ELEMENT Summary (#PCDATA)>
  <!ENTITY Description 'A great American novel.'>
]>
<Book ISBN='1234'> <Title> To Kill a Mockingbird </Title> <Author> Harper Lee </Author> <Summary> &Description; </Summary> </Book>
Doctype declaration The Document Type (Doctype) declaration is used to indicate the DTD used for the document. Syntax may be in any of the following forms, where the URL and the public identifier are written as quoted strings: • <!DOCTYPE rootname [DTD]> • <!DOCTYPE rootname SYSTEM "URL"> • <!DOCTYPE rootname SYSTEM "URL" [DTD]> • <!DOCTYPE rootname PUBLIC "identifier" "URL"> • <!DOCTYPE rootname PUBLIC "identifier" "URL" [DTD]>
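As an illustration of the PUBLIC form, the sketch below uses the XHTML 1.0 Transitional public identifier and URL. The page content itself is a made-up placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- rootname is html; the public identifier comes first, then the URL -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Doctype example</title></head>
  <body><p>Hello.</p></body>
</html>
```

The root element name in the document must match the rootname given in the Doctype declaration.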
Example: External DTD The following is an example of an XML document that uses an external DTD:
<?xml version='1.0' standalone='no'?>
<!DOCTYPE Book SYSTEM 'booklist.dtd'>
<Book ISBN='4576'> <Title> Moby Dick </Title> <Author> Herman Melville </Author> </Book>
Because 'booklist.dtd' is given as a relative URI, the external DTD here must be located in the same directory as the XML document.
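The slides do not show the contents of booklist.dtd. A plausible sketch, assuming it simply mirrors the inline Book DTD from the earlier example (the `Note` declaration is an added assumption), might look like this:

```xml
<!-- booklist.dtd (hypothetical contents, modelled on the inline example) -->
<!ELEMENT Book (Title, Author+, Summary*, Note?)>
<!ATTLIST Book
  ISBN    CDATA                #REQUIRED
  section (fiction|nonfiction) "fiction">
<!ELEMENT Title   (#PCDATA)>
<!ELEMENT Author  (#PCDATA)>
<!ELEMENT Summary (#PCDATA)>
<!-- Assumed: Note holds plain text -->
<!ELEMENT Note    (#PCDATA)>
```

An external DTD file contains only declarations: no `<!DOCTYPE ...>` wrapper and no document content.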
Example: Using DTDs with URLs The following is an example of an XML document that references an external DTD with a URL:
<?xml version='1.0' standalone='no'?>
<!DOCTYPE Book SYSTEM 'http://www.somewebsite.com/booklist.dtd'>
<Book ISBN='4576'> <Title> Moby Dick </Title> <Author> Herman Melville </Author> </Book>
Specifying Elements • In the DTD, an element is declared with the notation: <!ELEMENT elemName elemDefinitionOrType> where elemName is the actual element name, and elemDefinitionOrType indicates whether the content of the element is pure data or a compound type of data and other elements.
Some Element Types • The element type keyword ANY allows the element to contain textual data, nested elements, or any legal XML combination of the two. • The element type keyword #PCDATA (parsed character data) indicates textual data, and is used for regular character data that we want the XML parser to handle normally. • The element type keyword EMPTY indicates that the element is always empty.
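The three keywords can be seen side by side in a small sketch. The `memo` vocabulary below is invented for illustration and does not come from the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (subject, divider, body)>
  <!ELEMENT subject (#PCDATA)>   <!-- text only -->
  <!ELEMENT divider EMPTY>       <!-- never has content -->
  <!ELEMENT body ANY>            <!-- text and/or any declared element -->
]>
<memo>
  <subject>Budget</subject>
  <divider/>
  <body>See <subject>Q3 report</subject> for details.</body>
</memo>
```

Note that elements appearing inside an ANY element must themselves be declared in the DTD for the document to be valid.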
Nesting elements • To define the allowed nestings within a DTD, the following notation is used: <!ELEMENT elemName (nestedElem, nestedElem, …)> where the order of elements is enforced as a validity constraint within an XML document. • By default, an element specified without any modifiers in the DTD must appear exactly once.
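A small sketch of the ordering rule, using a hypothetical `address` element that is not part of the slides:

```xml
<?xml version="1.0"?>
<!DOCTYPE address [
  <!ELEMENT address (street, city, zip)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
]>
<!-- Valid: each child appears exactly once, in the declared order.
     Swapping city and street, or repeating zip, would make the
     document invalid (though still well-formed). -->
<address>
  <street>1 Main St</street>
  <city>Springfield</city>
  <zip>00000</zip>
</address>
```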
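Recurrence Operators: Recurrence operators can be used to indicate how many times an element may or must appear in an XML document: • ? : zero or one time (optional). • * : zero or more times. • + : one or more times. • (no operator) : exactly once.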
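As a sketch of the operators in use, the hypothetical `playlist` DTD below (names invented for illustration) applies the standard DTD operators `?`, `+` and `*` to three different children.

```xml
<?xml version="1.0"?>
<!DOCTYPE playlist [
  <!-- owner: optional (?), track: one or more (+), tag: zero or more (*) -->
  <!ELEMENT playlist (owner?, track+, tag*)>
  <!ELEMENT owner (#PCDATA)>
  <!ELEMENT track (#PCDATA)>
  <!ELEMENT tag (#PCDATA)>
]>
<playlist>
  <track>First song</track>
  <track>Second song</track>
</playlist>
```

This instance is valid even though `owner` and `tag` are absent; removing every `track` would make it invalid.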
Grouping elements • Often, recurrence occurs for a block or group of elements rather than with a single element. • To signify a group, enclose a set of elements within parentheses. Nested parentheses are acceptable. • In this way, a recurrence operator can then be applied to the group. • E.g. <!ELEMENT groupingExample ((group1Elem1, group1Elem2)+, (group2Elem1, group2Elem2)?)+>
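To make the grouping declaration concrete, here is one instance it would accept. The child declarations and the placeholder text are assumptions; only the `groupingExample` content model comes from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE groupingExample [
  <!ELEMENT groupingExample ((group1Elem1, group1Elem2)+, (group2Elem1, group2Elem2)?)+>
  <!-- Assumed for illustration: all four children hold plain text -->
  <!ELEMENT group1Elem1 (#PCDATA)>
  <!ELEMENT group1Elem2 (#PCDATA)>
  <!ELEMENT group2Elem1 (#PCDATA)>
  <!ELEMENT group2Elem2 (#PCDATA)>
]>
<groupingExample>
  <!-- first group repeated twice (+) -->
  <group1Elem1>a</group1Elem1>
  <group1Elem2>b</group1Elem2>
  <group1Elem1>c</group1Elem1>
  <group1Elem2>d</group1Elem2>
  <!-- optional second group (?) present once -->
  <group2Elem1>e</group2Elem1>
  <group2Elem2>f</group2Elem2>
</groupingExample>
```

A `group2Elem1` without a following `group2Elem2` would be invalid, because the pair is only optional as a whole.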
Either Or • In the DTD, an “OR” operator is signified by using |. This allows one thing or the other to occur, and can be used in conjunction with groupings. • E.g. <!ELEMENT aggregateElement (#PCDATA|Element1|Element2)*>
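The declaration above describes mixed content: text freely interleaved with the listed elements. A minimal sketch, assuming `Element1` holds text and `Element2` is empty (neither is specified on the slide):

```xml
<?xml version="1.0"?>
<!DOCTYPE aggregateElement [
  <!ELEMENT aggregateElement (#PCDATA|Element1|Element2)*>
  <!-- Assumed child declarations -->
  <!ELEMENT Element1 (#PCDATA)>
  <!ELEMENT Element2 EMPTY>
]>
<aggregateElement>Some text, then <Element1>a child</Element1>,
then <Element2/> and more text.</aggregateElement>
```

In a mixed-content declaration `#PCDATA` must come first, and the group must end with `*` whenever element names are listed.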
Defining Attributes • Attribute definitions are in the following form: <!ATTLIST enclosingElement attributeName attributeType attributeModifier …> • The attributeType keyword CDATA allows an attribute to take on any string value, and may represent a comment or additional information about an element. • Another attribute type is an enumeration, where any of the specified values may be used, but any other value for the attribute results in an invalid document. • E.g. <!ATTLIST elementName attributeName (value1|value2) attributeModifier …>
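A short sketch contrasting a CDATA attribute with an enumerated one. The `task` element and its attribute names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE task [
  <!ELEMENT task (#PCDATA)>
  <!ATTLIST task
    note     CDATA             #IMPLIED
    priority (low|medium|high) "medium">
]>
<!-- priority="urgent" would be invalid: it is not one of the listed values -->
<task note="any text is allowed here" priority="high">Write the report</task>
```

If `priority` were omitted, a validating parser would supply the declared default, `medium`.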
Attribute Modifiers • We can indicate in the attribute definition whether the attribute is required within an element. • The three modifier keywords are: #IMPLIED, #REQUIRED, and #FIXED. • An implied attribute may be given a value, or left unspecified. • A required attribute must be given a value. • A fixed attribute has a specified value that can never change. The notation for this is: <!ATTLIST elementName attributeName attributeType #FIXED "fixedValue">
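The three modifiers can be compared in one declaration. This is only a sketch; the `invoice` element and its attributes are invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ELEMENT invoice (#PCDATA)>
  <!ATTLIST invoice
    id       CDATA #REQUIRED
    comment  CDATA #IMPLIED
    currency CDATA #FIXED "USD">
]>
<!-- id must be present; comment is omitted; currency is supplied by the
     DTD, and writing any value other than "USD" would be invalid -->
<invoice id="A-17">Total due: 40.00</invoice>
```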
Parameter Entities in DTDs • Parameter entities are entities that can only be used in the DTD. • A simple internal parameter entity has the format: <!ENTITY % name "definition"> • E.g.
<?xml version='1.0' standalone='yes'?>
<!DOCTYPE Book [
  <!ENTITY % sum "<!ELEMENT Summary (#PCDATA)>">
  <!ELEMENT Book (Title, Author+, Summary*, Note?)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  %sum;
]>
…
Parameter Entities in DTDs (contd.) • External parameter entities can be declared using the following: <!ENTITY % name SYSTEM "URI"> or <!ENTITY % name PUBLIC "identifier" "URI"> • E.g. The following ‘orders.dtd’ file could be created:
<!ENTITY % record "(Name, Date, Orders)">
<!ELEMENT Store (Customer|Buyer|Supplier)*>
<!ELEMENT Customer %record;>
<!ELEMENT Buyer %record;>
<!ELEMENT Supplier %record;>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Orders (Product|Price)>
<!ELEMENT Product (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ENTITY % XHTML1-t.dtd PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
%XHTML1-t.dtd;
Using INCLUDE and IGNORE • We can customize our DTDs using the INCLUDE and IGNORE statements (conditional sections), which have the following syntax: <![INCLUDE[ DTD sections ]]> <![IGNORE[ DTD sections ]]> • E.g. In the ‘orders.dtd’ file, add the following lines:
<!ENTITY % includer "INCLUDE">
…(same as before)…
<![%includer;[
<!ELEMENT Product_ID (#PCDATA)>
<!ELEMENT Ship_Date (#PCDATA)>
<!ELEMENT Tax (#PCDATA)>
]]>
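A common way to use these sections is to keep two alternative versions of a declaration and switch between them with a pair of parameter entities. The sketch below is a hypothetical external DTD (the file name, the `draft`/`final` entity names and the `Order` element are assumptions, not part of the orders.dtd example):

```xml
<!-- orders-options.dtd (hypothetical external DTD) -->
<!ENTITY % draft "IGNORE">
<!ENTITY % final "INCLUDE">

<!-- Skipped while %draft; expands to IGNORE -->
<![%draft;[
<!ELEMENT Order (Product, Price, Comment*)>
]]>

<!-- Used while %final; expands to INCLUDE -->
<![%final;[
<!ELEMENT Order (Product, Price)>
]]>

<!ELEMENT Product (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Comment (#PCDATA)>
```

Because the first declaration of an entity wins and the internal subset is read before the external DTD, a document can flip the switch by declaring `<!ENTITY % draft "INCLUDE">` and `<!ENTITY % final "IGNORE">` in its own Doctype declaration. Conditional sections themselves may only appear in the external DTD, not in the internal subset.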
Example: Using the XHTML 1.1 DTD • The XHTML 1.1 DTD is a DTD driver which includes various XHTML 1.1 modules (i.e. DTD sections) using parameter entities. • E.g.
<!-- Tables Module ............................... -->
<!ENTITY % xhtml-table.module "INCLUDE">
<![%xhtml-table.module;[
<!ENTITY % xhtml-table.mod PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Tables 1.0//EN" "xhtml11-table-1.mod">
%xhtml-table.mod;]]>
• The above allows us to customize the XHTML 1.1 DTD to include/exclude support for tables: redefining xhtml-table.module as "IGNORE" skips the module.
Next session: Parsing XML Documents • Parsing techniques • Writing your own XML applications