Defining XML: The Document Type Definition
Document Type Definition • text syntax for defining • elements of XML • attributes (and possibly default values) • structure • <?xml … standalone = "no" … ?> • implies that an external definition exists and may be required to properly understand the content
Why do we need DTDs? • Define classes of xml documents • For particular applications • Agreement on data and structure • Validate xml data • DTD is used to check structure • Document an xml class • DTD provides complete information about an xml class
linking an XML file to a DTD • a document type declaration is added to the xml: <!DOCTYPE message SYSTEM "myDTD.dtd"> • the DOCTYPE links the XML file (message.xml) to the DTD (myDTD.dtd)
What Is a DTD? • Defines a type of xml document • What elements are allowed? • What attributes do they have? • How can they be structured? • DTD is in text format • Usually external to the xml data • Linked by a document type declaration • May be included in the xml data file
Element type declarations <!ELEMENT myElement (#PCDATA)> • <!ELEMENT …> is the "element definition" • myElement is the name of the element being defined • (#PCDATA) is the content that the element can have • #PCDATA = parsed character data
Example <!ELEMENT message (#PCDATA)> One line of text, stored in messageML.dtd • Example of a message document conforming to this DTD: <?xml version = "1.0" ?> <!DOCTYPE message SYSTEM "messageML.dtd"> <message> Welcome to XML! </message>
Internal DTD Example <?xml version = "1.0" ?> <!DOCTYPE message [ <!ELEMENT message (#PCDATA)> ]> <message> Welcome to XML! </message>
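To see the internal DTD example being read by a parser, here is a minimal sketch using `xml.dom.minidom` from the Python standard library. The choice of Python and the variable names are assumptions for illustration, not part of the lecture. Note that this parser reads the document type declaration but does not validate against it.

```python
from xml.dom import minidom

# The internal-DTD document from the slide (real XML needs straight quotes).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message> Welcome to XML! </message>"""

doc = minidom.parseString(DOCUMENT)

# The name in the document type declaration must match the root element.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName

# The only child of <message> is a text node holding the parsed character data.
text = doc.documentElement.firstChild.data.strip()

print(doctype_name, root_name, text)
```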
Defining structure • Element declarations define the content of elements • Content can be text or other elements • Content defines structure • How are the elements nested? • How many elements can be included? • What order do elements come in?
Defining structure <!ELEMENT classroom (teacher, student)> a classroom contains exactly one teacher followed by exactly one student <!ELEMENT dessert (iceCream | pastry)> a dessert contains either one iceCream or one pastry, but not both <!ELEMENT album (track+)> an album contains one or more tracks
occurrence indicators Plus sign (+) Element will appear 1 to many times <!ELEMENT album (track+)> Asterisk (*) Element will appear 0 to many times <!ELEMENT library (book*)> Question mark (?) Element will appear 0 to 1 times <!ELEMENT seat (person?)>
DTD Example 1 <!ELEMENT class (number, (instructor | assistant+), (credit | nocredit) )> a class must contain a number followed by either an instructor or one or more assistants followed by either a credit or a nocredit <class> <number>CM4003</number> <instructor>John McCall</instructor> <credit>15</credit> </class>
DTD Example 2 <!ELEMENT donutBox (jam?, lemon*, ((cream | sugar)+ | iced))> a donutBox contains 0 or 1 jam, followed by 0 to many lemon, followed by either one or more cream or sugar elements (in any mix) or exactly one iced <donutBox> <jam>raspberry</jam> <lemon>sour</lemon> <lemon>half-sour</lemon> <iced>chocolate</iced> </donutBox> <donutBox> <iced>pink</iced> </donutBox>
DTD Example 3 <!ELEMENT farm (farmer+, (dog* | cat?), pig*, (goat | cow)?, (chicken+ | duck*) )> <farm> <farmer>Farmer Maggot</farmer> <cat>Tiddles</cat> <duck>Donald</duck> </farm>
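The sequence, choice and occurrence operators used in these examples behave much like regular expression operators, which suggests a quick way to check a list of child elements by hand. The sketch below is an illustration only: `model_to_regex` and `children_match` are hypothetical helper names, the approach handles element-only content models (no #PCDATA), and it is not how a real validating parser is implemented.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex."""
    # Each element name becomes a group matching "name/"; the slash
    # separates one child from the next in the sequence string.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + "/)",
                     model)
    # A comma means "followed by", which is plain concatenation in a regex.
    # The operators | ? * + and the parentheses keep their meaning.
    return re.compile(re.sub(r"[\s,]", "", pattern))

def children_match(element, model):
    """Return True if the element's direct children satisfy the model."""
    sequence = "".join(child.tag + "/" for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None

DONUT_MODEL = "(jam?, lemon*, ((cream | sugar)+ | iced))"
FARM_MODEL = "(farmer+, (dog* | cat?), pig*, (goat | cow)?, (chicken+ | duck*))"

box = ET.fromstring(
    "<donutBox><jam>raspberry</jam><lemon>sour</lemon>"
    "<lemon>half-sour</lemon><iced>chocolate</iced></donutBox>")
# cream followed by iced mixes both branches of the final choice.
bad_box = ET.fromstring(
    "<donutBox><cream>whipped</cream><iced>pink</iced></donutBox>")
farm = ET.fromstring(
    "<farm><farmer>Farmer Maggot</farmer><cat>Tiddles</cat>"
    "<duck>Donald</duck></farm>")

box_ok = children_match(box, DONUT_MODEL)
bad_box_ok = children_match(bad_box, DONUT_MODEL)
farm_ok = children_match(farm, FARM_MODEL)
print(box_ok, bad_box_ok, farm_ok)
```

Working through the farm example by hand gives the same answer: one farmer satisfies farmer+, the cat takes the cat? branch, pig* and the optional goat or cow match nothing, and the duck satisfies duck*.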
DTD Example 4 mixed content (narrative XML) <!ELEMENT paragraph (#PCDATA|name|profession|date|irony)*> A <paragraph> element may contain any combination of <name>, <profession>, <date> or <irony> elements interspersed with parsed character data. <paragraph> Today’s date is <date month="October" day="1"/> and <name>John McCall</name>, a <profession>lecturer</profession> is delivering a <irony>scintillating</irony> XML lecture.</paragraph>
Defining attributes • attributes are assigned to elements using the <!ATTLIST …> declaration • ATTLIST defines • Which element the attribute belongs to • The name of the attribute • The values the attribute can take • Possible default values • Whether the attribute MUST be present or not
Attribute values • In HTML all attributes are text • DTDs support 10 attribute types • Most common are: • CDATA (literal text) • ID (unique identifier) • NMTOKEN (“no whitespace”) • Enumeration (of all possible values)
Conditions on attributes • #REQUIRED • the attribute must be given a value in the XML • #IMPLIED • the attribute may be omitted from the XML • #FIXED • the value of the attribute is fixed and defined in the DTD • literal • a default value is supplied literally in the DTD
Example attribute declarations <!ELEMENT pig (#PCDATA)> <!ATTLIST pig weight CDATA #REQUIRED> <!ATTLIST pig id_code ID #REQUIRED> <!ATTLIST pig name NMTOKEN #IMPLIED> <!ATTLIST pig sex (M | F) "F"> <!ATTLIST pig canFly CDATA #FIXED "no"> <pig weight = "1000kg" id_code = "pig017"> Porky </pig>
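The effect of default and fixed values can be observed with a parser that reads the internal DTD subset. The sketch below assumes Python's standard `xml.etree.ElementTree`, whose expat-based parser supplies declared defaults but does not validate, so #REQUIRED is not enforced here. The `CDATA` type on `canFly` is also an assumption, since the slide omits the type.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE pig [
  <!ELEMENT pig (#PCDATA)>
  <!ATTLIST pig weight  CDATA   #REQUIRED>
  <!ATTLIST pig id_code ID      #REQUIRED>
  <!ATTLIST pig name    NMTOKEN #IMPLIED>
  <!ATTLIST pig sex     (M | F) "F">
  <!ATTLIST pig canFly  CDATA   #FIXED "no">
]>
<pig weight="1000kg" id_code="pig017"> Porky </pig>"""

pig = ET.fromstring(DOCUMENT)

# Attributes written in the document are reported as given.
weight = pig.get("weight")
# sex and canFly were not written; the parser fills them in from the DTD.
sex = pig.get("sex")
can_fly = pig.get("canFly")
# name is #IMPLIED with no default, so it is simply absent.
name = pig.get("name")

print(weight, sex, can_fly, name)
```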
entities • used to represent text that would cause parsing problems • &lt; represents < • &amp; represents & • &gt; represents > • &quot; represents " • &apos; represents '
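As a small illustration of the five predefined entities, the sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module. By default these functions only handle &, < and >, so the two quote characters are passed in explicitly. The sample string is made up for the example.

```python
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, < and >; the quote characters are opt-in.
TO_ENTITY = {'"': "&quot;", "'": "&apos;"}
FROM_ENTITY = {"&quot;": '"', "&apos;": "'"}

raw = """Tom's "cat" < dog & mouse > flea"""

escaped = escape(raw, TO_ENTITY)
restored = unescape(escaped, FROM_ENTITY)

print(escaped)
print(restored == raw)
```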
defining entities • <!ENTITY label "replacementText"> • <!ENTITY super "supercalifragilisticexpialidocious"> • now &super; is replaced in the XML (or in attribute values) by supercalifragilisticexpialidocious
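Entity replacement can be checked with any parser that reads the internal DTD subset. This is a minimal sketch assuming Python's standard `xml.etree.ElementTree`. The `mood` attribute and the message text are invented for the example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!ATTLIST message mood CDATA #IMPLIED>
  <!ENTITY super "supercalifragilisticexpialidocious">
]>
<message mood="&super;">That lecture was &super;!</message>"""

message = ET.fromstring(DOCUMENT)

# The parser substitutes the replacement text before the application sees it,
# both in element content and in attribute values.
body = message.text
mood = message.get("mood")

print(body)
print(mood)
```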
CDATA or PCDATA? • PCDATA • Parsed Character DATA • will be parsed for entities • CDATA • Character DATA • Will NOT be parsed • CDATA sections are sometimes included in xml to include “literal” sections of code
Writing a CDATA section <![CDATA[ Hi! I’m a CDATA section! I can include anything that would normally upset the parser: <?<<< &&&;; ><></> hahahahahahaha!!! The only thing I have to avoid is a double square closing bracket followed by a greater-than sign, which means the CDATA has ended. ]]>
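The following sketch, again assuming Python's standard `xml.etree.ElementTree`, shows that text inside a CDATA section reaches the application exactly as written. The `<code>` element name and its content are made up for the example.

```python
import xml.etree.ElementTree as ET

# Markup characters inside the CDATA section are not interpreted by the parser.
DOCUMENT = "<code><![CDATA[if (a < b && b > c) { <?<<< &&&;; ><></> }]]></code>"

code = ET.fromstring(DOCUMENT)
literal = code.text

# When the element is written out again, the serializer escapes the same
# characters with entities instead of using a CDATA section.
serialized = ET.tostring(code, encoding="unicode")

print(literal)
print(serialized)
```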
Validation of xml • Validation means checking that an xml document conforms to its DTD • Adds security to automatic processing • Allows free machine-machine exchange of xml • Applied before manipulating xml • See XSLT, SAX, DOM later
Well-formed vs valid • Well-formed xml • The data obeys the xml syntax rules • Valid xml • The data is well-formed xml • The data has a DTD • The data conforms to the DTD • xml data may be well-formed but invalid
xml parser types • validating parser • checks XML is well-formed • conforms to XML specification • checks XML is valid (has and matches a DTD) • non-validating parser • only checks XML is well-formed • may pass invalid XML
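The difference between the two parser types can be seen with a non-validating parser. The sketch below assumes Python's standard `xml.etree.ElementTree`, which checks well-formedness only. `is_well_formed` is a hypothetical helper name and the `<shout>` element is invented to break the DTD.

```python
import xml.etree.ElementTree as ET

DTD = "<!DOCTYPE message [<!ELEMENT message (#PCDATA)>]>"

def is_well_formed(xml_text):
    """Return True if a non-validating parse succeeds."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

valid = DTD + "<message>Welcome to XML!</message>"
# Well-formed, but <shout> is not allowed by the DTD, so this is invalid.
invalid = DTD + "<message><shout>Welcome</shout></message>"
# Missing end tag, so this is not even well-formed.
malformed = DTD + "<message>Welcome to XML!"

results = [is_well_formed(text) for text in (valid, invalid, malformed)]
print(results)
```

The invalid document passes because this parser never compares the content against the declared content model; catching it requires a validating parser.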
Labs • Now split into two sessions • Thursday C26 11.00-13.00 • Friday C18 11.00-13.00 • Choose one as convenient • Assessed Lab will be in a separately arranged session on afternoon of Friday 30th November