Tutorial 3: XML. Creating a Valid XML Document
Creating a Valid Document • You validate documents to make certain necessary elements are never omitted. • For example, each customer order should include a customer name, address, and phone number. • A document is validated to prevent errors in its content or structure. • An XML document can be validated using either DTDs (Document Type Definitions) or schemas.
<customers>
  <customer custID="cust201" custType="home">
    <name title="Mr.">David Lynn</name>
    <address>
      <![CDATA[
      211 Fox Street
      Greenville, NH 80021
      ]]>
    </address>
    <phone>(315) 555-1812</phone>
    <email>dlynn@nhs.net</email>
    <orders>
      <order orderID="or10311" orderBy="cust201">
        <orderDate>8/1/2008</orderDate>
        <items>
          <item itemPrice="599.95">DCT5Z</item>
          <item itemPrice="199.95">SM128</item>
          <item itemPrice="29.95" itemQty="2">RCL</item>
        </items>
      </order>
    </orders>
  </customer>
</customers>
Writing the Document Type Declaration • <!DOCTYPE customers [ ]> <customers> • DTD statements are inserted between the square brackets of the DOCTYPE declaration. • The root element of the document must match the root element listed in the DOCTYPE declaration.
Declaring a DTD A DTD can be used to: • ensure all required elements are present • prevent undefined elements from being used • enforce a specific data structure • specify the use of attributes and define their possible values • define default values for attributes • describe how the parser should access non-XML or non-textual content
Declaring a DTD • A document type definition is a collection of rules or declarations that define the content and structure of the document. • A document type declaration attaches those rules to the document’s content. • You create a DTD by first entering a document type declaration into your XML document.
Declaring a DTD • While there can only be one DTD per XML document, it can be divided into two parts: an internal subset and an external subset. • An internal subset consists of declarations placed in the same file as the document content. • An external subset is located in a separate file.
Declaring a DTD To declare an internal DTD subset, use: <!DOCTYPE root [ declarations ]> • Where root is the name of the document’s root element, and declarations are the statements that comprise the DTD.
To declare an external DTD subset with a system or public location, use: <!DOCTYPE root SYSTEM "uri"> or <!DOCTYPE root PUBLIC "id" "uri"> • id is a text string that tells an application how to locate the external subset • uri is the location and filename of the external subset • Unless your application requires a public identifier, you should use the SYSTEM location form.
A DOCTYPE declaration can indicate both an external and an internal subset. The syntax is: <!DOCTYPE root SYSTEM "uri" [ declarations ]> or <!DOCTYPE root PUBLIC "id" "uri" [ declarations ]>
Declaring a DTD • The real power of XML comes from an external DTD that can be shared among many documents. • If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two. • This way, the external subset would define basic rules for all the documents, and the internal subset would define those rules specific to each document.
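The precedence rule above can be sketched with a general entity declared in both subsets. This is only an illustration: the file name shared.dtd, the note element, and the store entity are hypothetical, and the contents of shared.dtd are assumed as described in the comment. The internal subset is read first, and for entity and attribute-list declarations the first declaration read is the one that applies, so the internal value is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- shared.dtd is a hypothetical external subset, assumed to contain:
       <!ELEMENT note (#PCDATA)>
       <!ENTITY store "Main Warehouse">
     It could be shared by many documents. -->
<!DOCTYPE note SYSTEM "shared.dtd" [
  <!-- Internal subset: read before shared.dtd, so this declaration
       of the store entity is the one that takes effect here. -->
  <!ENTITY store "Greenville Outlet">
]>
<note>Shipped from &store;</note>
```

Note that this override pattern applies to entities and attribute lists. Declaring the same element type twice, once in each subset, is a validity error rather than an override.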
Declaring Document Elements • In a valid document, every element must be declared in the DTD. • An element (type) declaration specifies the name of the element and indicates what kind of content the element can contain. The syntax is: <!ELEMENT element content-model> • Where element is the name of the element and content-model specifies what type of content the element contains. • The element name is case sensitive
Five different types of element content for content-model • ANY - No restrictions on the element’s content. • EMPTY - The element cannot store any content. • #PCDATA - The element can only contain parsed character data. • Elements - The element can only contain child elements. • Mixed - The element contains both parsed character data and child elements.
ANY Content: The declared element can store any type of content • The syntax is: <!ELEMENT element ANY> • Example: <!ELEMENT products ANY> • Any of the following would satisfy the above declaration: • <products>SLR100 Digital Camera</products> • <products> <name>SLR100</name> <type>Digital Camera</type></products>
EMPTY content: This is reserved for elements that store no content • The syntax is: <!ELEMENT element EMPTY> • Example: <!ELEMENT img EMPTY> • The following would satisfy the above declaration: • <img />
#PCDATA Content: can store parsed character data • The syntax is: <!ELEMENT element (#PCDATA)> • <!ELEMENT name (#PCDATA)> would permit the following element in an XML document: • <name>Lea Ziegler</name> • An element declared as (#PCDATA) cannot contain child elements.
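As a minimal sketch, the three content models above (ANY, EMPTY, and #PCDATA) can be combined in one small document. The root element name catalog is a hypothetical choice made for this illustration. The name and img declarations follow the examples above.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!-- ANY: character data and any declared elements are allowed -->
  <!ELEMENT catalog ANY>
  <!-- #PCDATA: text only, no child elements -->
  <!ELEMENT name (#PCDATA)>
  <!-- EMPTY: no content at all -->
  <!ELEMENT img EMPTY>
]>
<catalog>
  <name>SLR100</name>
  <img/>
  Loose text is accepted here because catalog is declared ANY.
</catalog>
```

Even under ANY, every child element that appears must itself be declared somewhere in the DTD for the document to be valid.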
Working with Child Elements • The syntax is: <!ELEMENT element (children)> • Where element is the parent element and children is a listing of its child elements. • The declaration <!ELEMENT customer (phone)> allows only a single phone child, so the following would be invalid because it also contains a name element: <customer> <name>Lea Ziegler</name> <phone>555-2819</phone> </customer>
Working with Child Elements • To declare the order of child elements, use: <!ELEMENT element (child1, child2, …)> • Where child1, child2, … is the order in which the child elements must appear within the parent element. • Thus, <!ELEMENT customer (name, phone, email)> indicates that the customer element must contain exactly three child elements, name, phone, and email, in that order.
Working with Child Elements • To allow for a choice of child elements, use: <!ELEMENT element (child1 | child2 | …)> • where child1, child2, etc. are the possible child elements of the parent element. • <!ELEMENT customer (name | company)> • allows the customer element to contain either the name element or the company element. • <!ELEMENT customer ((name | company), phone, email)> • requires either a name or a company element, followed by a phone element and an email element.
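The combined sequence-and-choice declaration above can be tried out with a short document like the following sketch. The company name is an invented sample value. The phone and email values are borrowed from the sample customer earlier in the tutorial.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer [
  <!-- Either name or company, then phone, then email, in that order -->
  <!ELEMENT customer ((name | company), phone, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customer>
  <company>NHS Supply</company>
  <phone>(315) 555-1812</phone>
  <email>dlynn@nhs.net</email>
</customer>
```

Supplying both name and company, or placing email before phone, would make the document invalid under this declaration.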
Modifying Symbols • A modifying symbol specifies the number of occurrences of each element: • ? allows zero or one of the item. • + allows one or more of the item. • * allows zero or more of the item. • Modifying symbols can be applied within sequences or choices. They can also modify entire element sequences or choices by placing the character immediately following the closing parenthesis of the sequence or choice.
Modifying Symbols • <!ELEMENT customers (customer+)> indicates that the customers element must contain at least one element named customer. • <!ELEMENT order (orderDate, items)+> indicates that the child element sequence (orderDate, items) can be repeated one or more times within each order element. • <!ELEMENT customer (name, address, phone, email?, orders)> allows the customer element to contain zero or one email elements.
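Putting the three declarations above together gives a sketch of an element-only DTD for the customers document. The declarations for orders and items, written here as (order+) and (item+), are assumptions made for this illustration, as is treating every leaf element as #PCDATA. Attributes are left out because they have not been declared at this point.

```xml
<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, address, phone, email?, orders)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!-- Assumed: at least one order per orders element -->
  <!ELEMENT orders (order+)>
  <!ELEMENT order (orderDate, items)+>
  <!ELEMENT orderDate (#PCDATA)>
  <!-- Assumed: at least one item per items element -->
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
]>
<customers>
  <customer>
    <name>David Lynn</name>
    <address>211 Fox Street Greenville, NH 80021</address>
    <phone>(315) 555-1812</phone>
    <!-- email is omitted, which email? permits -->
    <orders>
      <order>
        <orderDate>8/1/2008</orderDate>
        <items>
          <item>DCT5Z</item>
        </items>
      </order>
    </orders>
  </customer>
</customers>
```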
Working with Mixed Content • Mixed content elements contain both parsed character data and child elements. The syntax is: <!ELEMENT element (#PCDATA | child1 | child2 | … )*> • The parent element can contain character data or any number of the specified child elements, or it can contain no content at all. • It is better not to work with mixed content if you want a tightly structured document.
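A minimal sketch of a mixed content declaration follows. The element names desc, b, and i are hypothetical and chosen only to show text interleaved with child elements.

```xml
<?xml version="1.0"?>
<!DOCTYPE desc [
  <!-- Text and any number of b or i elements, in any order -->
  <!ELEMENT desc (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<desc>A <b>5 Mpx</b> camera with <i>zoom</i>.</desc>
```

The declaration cannot restrict how many times b or i appear or in what order, which is why mixed content works against a tightly structured document.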
Declaring Element Attributes • For a document to be valid, all the attributes associated with elements must also be declared. To enforce attribute properties, you must add an attribute-list declaration to the document’s DTD.
Declaring Element Attributes The attribute-list declaration: • Lists the names of all attributes associated with a specific element • Specifies the data type of the attribute • Indicates whether the attribute is required or optional • Provides a default value for the attribute, if necessary
Declaring Element Attributes The syntax to declare a list of attributes is: <!ATTLIST element attribute1 type1 default1 attribute2 type2 default2 attribute3 type3 default3 … > • Where element is the name of the element associated with the attributes, attribute is the name of an attribute, type is the attribute’s data type, and default indicates whether the attribute is required and whether it has a default value.
Declaring Element Attributes • Attribute-list declarations can be placed anywhere within the DTD, although they are easier to maintain when they are located adjacent to the declaration for the element with which they are associated.
Working with Attribute Types Attribute values can consist only of character data, but you can control the format of those characters. Three general categories of attribute values are: • CDATA can contain any character except those reserved by XML • Enumerated types are attributes that are limited to a set of possible values • Tokenized types are text strings that follow certain rules for the format and content
CDATA • The syntax is: <!ATTLIST element attribute CDATA default> • Examples: <!ATTLIST item itemPrice CDATA . . . > <!ATTLIST item itemQty CDATA . . . > • Any of the following attribute values are allowed: <item itemPrice="29.95"> . . . </item> <item itemPrice="$29.95"> . . . </item> <item itemPrice="£29.95"> . . . </item>
Enumerated Types • The general form for an enumerated type is: <!ATTLIST element attribute (value1 | value2 | value3 | …) default> where value1, value2, . . . are allowed values • Under the declaration below: <!ATTLIST customer custType (home | business) . . . > any custType attribute whose value is not "home" or "business" causes parsers to reject the document as invalid.
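A small sketch of the custType declaration in use. The #IMPLIED default and the simplified (#PCDATA) content model for customer are assumptions made to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer [
  <!-- Content model simplified for this sketch -->
  <!ELEMENT customer (#PCDATA)>
  <!-- custType may be omitted, but if present must be home or business -->
  <!ATTLIST customer custType (home | business) #IMPLIED>
]>
<!-- Valid. Changing the value to "school" would cause a validating
     parser to reject the document. -->
<customer custType="home">David Lynn</customer>
```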
Working with Attribute Types • Another type of enumerated attribute is notation. It associates the value of the attribute with a <!NOTATION> declaration located elsewhere in the DTD. The notation provides information to the XML parser about how to handle non-XML data.
Tokenized Types are character strings that follow certain rules for format and content • To declare an attribute as a tokenized type, use: attribute token • DTDs support seven tokens: ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES • An ID is used when an attribute value must be unique within a document. For example: <!ATTLIST customer custID ID . . . > • This ensures each customer will have a unique ID.
IDREF token • An attribute declared with the IDREF token must have a value equal to the value of an ID attribute elsewhere in the document. This enables an XML document to contain cross-references between one element and another. • <!ATTLIST element attribute IDREF default> • <!ATTLIST order orderBy IDREF . . . >
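The ID and IDREF declarations above can be sketched together as follows. The element structure is reduced to the bare minimum, and the #REQUIRED defaults and the orderID declaration are assumptions. The attribute values come from the sample order earlier in the tutorial.

```xml
<?xml version="1.0"?>
<!DOCTYPE customers [
  <!-- Structure simplified for this sketch -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (order*)>
  <!ELEMENT order EMPTY>
  <!-- custID must be unique among all ID values in the document -->
  <!ATTLIST customer custID ID #REQUIRED>
  <!-- orderBy must match some ID value in the document -->
  <!ATTLIST order orderID ID    #REQUIRED
                  orderBy IDREF #REQUIRED>
]>
<customers>
  <customer custID="cust201">
    <order orderID="or10311" orderBy="cust201"/>
  </customer>
</customers>
```

ID values must be valid XML names, so they cannot begin with a digit. That is one reason the sample data uses values such as cust201 rather than 201.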
Attribute Defaults There are four possible defaults: • #REQUIRED: The attribute must appear with every occurrence of the element. • #IMPLIED: The attribute is optional. • An optional default value: A validating XML parser will supply the default value if one is not specified. • #FIXED: The attribute is optional, but if it is specified, its value must match the default.
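The four defaults can be seen side by side in one attribute-list declaration. This is only a sketch: the note and currency attributes are hypothetical, and the choice of #REQUIRED for itemPrice and "1" for itemQty is an assumption, since the tutorial does not state the defaults it uses.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    itemPrice CDATA #REQUIRED
    itemQty   CDATA "1"
    note      CDATA #IMPLIED
    currency  CDATA #FIXED "USD">
]>
<!-- Only the required attribute is written out. The parser supplies
     itemQty="1" and currency="USD", and note is simply absent. -->
<item itemPrice="599.95">DCT5Z</item>
```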
Validating a Document with XMLSpy • XMLSpy is an XML development environment created by Altova, which is used for designing and editing professional applications involving XML, XML Schema, and other XML-based technologies. • Install and use the XMLSpy Home Edition, a free application which can be downloaded from the Altova Web site at http://www.altova.com/
Introducing Entities Entities are storage units for a document’s content. The most fundamental entity is the XML document itself and is known as the document entity. Entities can also refer to: • a text string • a DTD • an element or attribute declaration • an external file containing character or binary data
Working with Entities Entities can be declared in a DTD. How to declare an entity depends on how it is classified. There are three factors involved in classifying entities: • The content of the entity • How the entity is constructed • Where the definition of the entity is located
General Parsed Entities • To create an internal parsed entity, use: <!ENTITY entity "value"> • Where entity is the name assigned to the entity and value is the entity’s value. • For example, to store the product description for the Tapan digital camera, use: <!ENTITY DCT5Z "Tapan Digital Camera 5 Mpx – zoom"> or <!ENTITY DCT5Z "<desc>Tapan Digital Camera 5 Mpx – zoom</desc>">
General Parsed Internal Entities • After an entity is declared, it can be referenced anywhere within the document. The syntax is: &entity; • For example, <item>&DCT5Z;</item> is interpreted as <item>Tapan Digital Camera 5 Mpx – zoom</item>
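A complete, minimal document showing an internal parsed entity being declared and referenced. An entity reference starts with an ampersand and must end with a semicolon. The single-element structure is a simplification for this sketch.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- Internal general parsed entity: name followed by its value -->
  <!ENTITY DCT5Z "Tapan Digital Camera 5 Mpx - zoom">
]>
<!-- The parser replaces the reference with the entity's value -->
<item>&DCT5Z;</item>
```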
General Parsed External Entities • For longer text strings, it is preferable to place the content in an external file. To create an external parsed entity, use: <!ENTITY entity SYSTEM "uri"> • For example, in the declaration: <!ENTITY DCT5Z SYSTEM "description.xml"> an entity named DCT5Z gets its value from the description.xml file.
Declare parsed entities in the codes.dtd file for the product codes in the orders.xml document. Each declaration gives the entity name followed by the entity value:
<!ENTITY DCT5Z "Tapan Digital Camera 5 Mpx - zoom">
<!ENTITY SM128 "SmartMedia 128MB Card">
<!ENTITY RCL "Rechargeable Lithium Ion Battery">
<!ENTITY BCE4L "Battery Charger 4pt Lithium">
<!ENTITY WBC500 "WebNow Webcam 500">
<!ENTITY RCA "Rechargeable Alkaline Battery">
<!ENTITY SCL4C "Linton Flatbed Scanner 4C">
Parameter Entities • Parameter entities are used to store the content of a DTD. For internal parameter entities, the syntax is: <!ENTITY % entity "value"> • For external parameter entities, the syntax is: <!ENTITY % entity SYSTEM "uri"> • Once a parameter entity has been declared, you can add a reference to it within the DTD using: %entity;
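The internal form of a parameter entity can be sketched as below. The entity name codes is hypothetical. Its value is a complete declaration, because within the internal subset a parameter entity reference may only appear between declarations, not inside one.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!-- Parameter entity whose value is a complete entity declaration -->
  <!ENTITY % codes '<!ENTITY SM128 "SmartMedia 128MB Card">'>
  <!-- Reference it: the declaration of SM128 is inserted at this point -->
  %codes;
  <!ELEMENT item (#PCDATA)>
]>
<item>&SM128;</item>
```

The value is wrapped in single quotes so that it can contain the double quotes of the inner declaration.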
Add a parameter entity to the DTD within the orders.xml file to load the contents of the codes.dtd file
<!DOCTYPE customers [
  . . .
  <!ENTITY % itemCodes SYSTEM "codes.dtd">
  %itemCodes;
]>
<customers>
  . . .
  <orders>
    <order orderID="or10311" orderBy="cust201">
      <orderDate>8/1/2008</orderDate>
      <items>
        <item itemPrice="599.95">&DCT5Z;</item>
        <item itemPrice="199.95">&SM128;</item>
        <item itemPrice="29.95" itemQty="2">&RCL;</item>
      </items>
    </order>
After parsing, the entity references are replaced by their values:
  <orders>
    <order orderID="or10311" orderBy="cust201">
      <orderDate>8/1/2008</orderDate>
      <items>
        <item itemPrice="599.95" itemQty="1">Tapan Digital Camera 5 Mpx – zoom</item>
        <item itemPrice="199.95" itemQty="1">SmartMedia 128MB Card</item>
        <item itemPrice="29.95" itemQty="2">Rechargeable Lithium Ion Battery</item>
      </items>
    </order>
Parameter Entities • Parameter entity references can only be placed where a declaration would normally occur, such as an internal or external DTD. • Parameter entities used with an internal DTD do not offer any time or effort savings. • However, an external parameter entity can allow XML to use more than one DTD per document by combining declarations from multiple DTDs.
Unparsed Entities • You need to create an unparsed entity in order to reference binary data such as images or video clips, or character data that is not well formed. • The declaration of an unparsed entity includes instructions for how the entity should be treated. • A notation is declared that identifies a resource to handle the unparsed data.
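A minimal sketch of how a notation, an unparsed entity, and an ENTITY-type attribute fit together. The names jpeg, DCT5ZIMG, and image, the file dct5z.jpg, and the use of a MIME type as the notation's system identifier are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- The attribute value must name a declared unparsed entity -->
  <!ATTLIST item image ENTITY #IMPLIED>
  <!-- The notation identifies the kind of data or a handler for it -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- NDATA marks the entity as unparsed and ties it to the notation -->
  <!ENTITY DCT5ZIMG SYSTEM "dct5z.jpg" NDATA jpeg>
]>
<item image="DCT5ZIMG">DCT5Z</item>
```

The parser does not read the image file itself. It only passes the entity's location and notation to the application, which decides what to do with them.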