XML Syntax - Writing XML and Designing DTDs
HTML – 1st Example <html><head><title>Chocolate Cake</title><body> <b>Ingredient List</b><hr /> <br>2 cups flour <br>1 cup sugar <br>2 bars chocolate <br>1 cup milk <br><br><b>Instructions</b> <hr><br>Mix flour, sugar and milk <br>Eat chocolate <br>Bake at 400 degrees </body></html>
XML Document Structure • Text file containing Elements, Attributes & Text <?xml version="1.0" ?> <Recipe name="Chocolate Cake" type="Dessert" > <IngredientList> <Ingredient>2 cups flour</Ingredient> <Ingredient>1 cup sugar</Ingredient> </IngredientList> <Instruction>Sift the flour</Instruction> </Recipe>
10 Rules – Well Formed XML • 1. Start with the XML declaration <?xml version="1.0" ?> (strictly optional in XML 1.0, but if present it must be the very first thing in the document)
2. Must be only one document element • Valid Example(s) <?xml version="1.0" ?> <recipe> </recipe> or <recipeBook> <recipe></recipe> <recipe></recipe> </recipeBook> • Invalid Example <?xml version="1.0"?> <recipe> </recipe> <recipe> </recipe>
3. Match opening & closing tags • Sloppy tagging is a carry-over from HTML origins • Not well formed: <hr> <p> or <bold><italic></bold></italic> • Browsers forgive, XML parsers do NOT • Well formed: <p></p> or <br /> • <bold><italic></italic></bold> • <recipe></recipe>
4. Comments allowed, but not inside attribute or element tag • <!-- Isn’t XML really cool? --> • <!-- Just like being a student!!! -->
5. Element and attribute names must start with a letter (or an underscore) • <Recipe> OK • <Second third="false"> OK • <2nd> INVALID • <Recipe 2nd="true"> INVALID
6. Attributes must go in the opening tag Valid: <recipe name="Chocolate Cake" category="Dessert"></recipe> Invalid: <recipe></recipe name="Chocolate Cake">
7. Attribute values must be enclosed in matching quotes • Can use either single or double quotes but must use the same type to start and end the attribute value Name="Australian Computer Society" Name='Australian Computer Society'
Let’s finish these rules! • 8. Only simple text for attributes, no nested values. Nesting is allowed in elements, not in attributes. • 9. Use &lt; &amp; &gt; &quot; and &apos; for the special characters < & > " and '. • 10. Write empty elements using <recipe /> syntax if no nested values; can still have attributes in the tag: <recipe type="dessert" />.
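Rules 9 and 10 can be sketched in one small well-formed document. The element names, attribute values and text below are illustrative assumptions, not taken from the original slides:

```xml
<?xml version="1.0" ?>
<!-- Illustrative only: names and content are assumed for this sketch -->
<recipe name="Mum&apos;s &quot;Best&quot; Cake" type="dessert">
  <!-- The characters < and & must always be escaped in text; > is escaped by convention -->
  <instruction>Bake while temperature &gt; 180 &amp; time &lt; 40 minutes</instruction>
  <!-- Empty element that still carries an attribute -->
  <photo file="cake.jpg" />
</recipe>
```

A parser hands the application the unescaped text, so the instruction is read back with the literal characters >, & and <.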
With these 10 rules, we have a “Well Formed” XML document • It means the XML can be read, processed or parsed. • Doesn’t mean the structure makes sense. <recipe model="Holden"> <chapter></chapter> <engine cylinders="4"></engine> </recipe>
Examples • Buggy dictionary • Non-buggy dictionary • FIDA
DTD – Document Type Definition • Allows us to define the exact elements and attributes for the document • These effectively become the rules of our own markup language, the extensible part of XML • A DTD really only defines structure; it can validate very little about the text values of an element or attribute.
Recipe DTD <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)> <!ELEMENT Name (#PCDATA)> <!ELEMENT Ingredient (Qty, Item)> <!ELEMENT Qty (#PCDATA)> <!ATTLIST Qty unit CDATA #REQUIRED> <!ELEMENT Item (#PCDATA)> <!ATTLIST Item optional CDATA "0" isVegetarian CDATA "true">
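To see the Recipe DTD in use, here is a minimal sketch of a document that carries it in its internal subset. The slide does not declare Description, Ingredients or Instructions, so the three declarations marked below are assumptions added only to make the sketch complete:

```xml
<?xml version="1.0" ?>
<!DOCTYPE Recipe [
  <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
  <!ELEMENT Name (#PCDATA)>
  <!-- Assumed declarations: the slide does not define these three elements -->
  <!ELEMENT Description (#PCDATA)>
  <!ELEMENT Ingredients (Ingredient*)>
  <!ELEMENT Instructions (#PCDATA)>
  <!ELEMENT Ingredient (Qty, Item)>
  <!ELEMENT Qty (#PCDATA)>
  <!ATTLIST Qty unit CDATA #REQUIRED>
  <!ELEMENT Item (#PCDATA)>
  <!ATTLIST Item optional CDATA "0"
                 isVegetarian CDATA "true">
]>
<Recipe>
  <Name>Chocolate Cake</Name>
  <Ingredients>
    <Ingredient><Qty unit="cups">2</Qty><Item>flour</Item></Ingredient>
    <Ingredient><Qty unit="bars">2</Qty><Item optional="1">chocolate</Item></Ingredient>
  </Ingredients>
  <Instructions>Mix flour, sugar and milk</Instructions>
</Recipe>
```

Description is left out because the DTD marks it optional with "?", while every Qty must carry a unit because that attribute is #REQUIRED.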
Elements • Basic rules • Start tag <tag_name> and end tag </tag_name> • Tags must be nested • <tag1><tag2>…</tag2></tag1> • Tags may be empty (no enclosed data) • <empty_tag/> • Whitespace in element content usually ignored • <section><p> … </p></section> • <section> <p> … </p></section>
Element Declarations • Used to define new elements and their content • <!ELEMENT name (#PCDATA)> <name> … </name> • Empty element has no content • <!ELEMENT name EMPTY> <name/> • When children allowed - any or model group • <!ELEMENT name ANY> • <!ELEMENT person (name, e-mail*)>
Model Groups • Used to define content of elements • <!ELEMENT person (name, e-mail*)> • Used to define hierarchies of elements • <!ELEMENT name (fname, surname)> <!ELEMENT fname (#PCDATA)> <!ELEMENT surname (#PCDATA)> <!ELEMENT e-mail (#PCDATA)> • Control organisation of elements • Sequence connector - ',' - (A, B, C) [then] • Choice connector - '|' - (A | B | C) [or]
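As a sketch, the person declarations above can be placed in an internal subset and matched by a small document. The names and e-mail addresses are made-up placeholders:

```xml
<?xml version="1.0" ?>
<!DOCTYPE person [
  <!ELEMENT person (name, e-mail*)>
  <!ELEMENT name (fname, surname)>
  <!ELEMENT fname (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT e-mail (#PCDATA)>
]>
<person>
  <!-- name must come first and must contain fname then surname -->
  <name><fname>Tom</fname><surname>Jones</surname></name>
  <!-- e-mail* allows zero or more addresses after the name -->
  <e-mail>tom@example.com</e-mail>
  <e-mail>tjones@example.org</e-mail>
</person>
```

Swapping fname and surname, or putting an e-mail before name, would leave the document well formed but no longer valid against this DTD.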
Model Group Quantity Indicators • Describe constraints on elements in DTD • A? May occur [0..1] • A+ Must occur [1..*] • A* May occur [0..*] • A | B Either A or B • A, B A followed by B • (A, B)+ • ((A,B?) | C+)*
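The combined model ((A,B?) | C+)* is easier to read next to content that satisfies it. This is a sketch only; the root element name list and the EMPTY declarations for A, B and C are assumptions:

```xml
<?xml version="1.0" ?>
<!DOCTYPE list [
  <!-- Zero or more repetitions of: (A then an optional B) or (one or more C) -->
  <!ELEMENT list ((A, B?) | C+)*>
  <!ELEMENT A EMPTY>
  <!ELEMENT B EMPTY>
  <!ELEMENT C EMPTY>
]>
<list>
  <A/><B/>   <!-- A followed by B -->
  <A/>       <!-- A alone, because B is optional -->
  <C/><C/>   <!-- one or more C -->
</list>
```

A B that is not directly preceded by an A would break the model, since B only ever appears as the optional second member of the (A, B?) sequence.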
Attributes • Provide additional information about an element • Enclosed by quotes - either " or ' • Case-sensitive • May be character data or tokenized • value="Blue Peter" (character data) • value = "blue" (single token) • value = "red green blue" (tokens) • Values may be enumerated or defaulted (DTD)
Attribute Declarations • Attributes can be attached to elements • Declared separately in ATTLIST declaration • <!ATTLIST tag … > • Rest of definition specifies • attribute name • attribute type • default value
Attribute Names and Types • Attribute name • <!ATTLIST tag name type default> • <!ATTLIST tag first_attr … second_attr … third_attr … > • Attribute types • CDATA • NMTOKEN • NMTOKENS • ENTITY • ENTITIES • ID • IDREF • IDREFS • NOTATION • name group
Attribute Types • CDATA: Character data • NMTOKEN: Single token • NMTOKENS: Multiple tokens • ENTITY: Attribute is entity ref • ENTITIES: Multiple entity refs • ID: Unique ID • IDREF: Match to ID • IDREFS: Match to multiple IDs • NOTATION: Describe non-XML data • Name group: Restricted list
Attribute Types • CDATA: <!ATTLIST person name CDATA … > • NMTOKEN: <!ATTLIST mug color NMTOKEN … > • NMTOKENS: <!ATTLIST temp values NMTOKENS … > • ENTITY: <!ATTLIST person photo ENTITY … > • ENTITIES: <!ATTLIST album photos ENTITIES … > • ID: <!ATTLIST person id ID … > • IDREF: <!ATTLIST person father IDREF … > • IDREFS: <!ATTLIST person children IDREFS … > • NOTATION: <!ATTLIST image format NOTATION (TeX|TIFF) … > • Name group: <!ATTLIST point coord (X|Y|Z) … >
Attribute Types • CDATA: name="Tom Jones" • NMTOKEN: color="red" • NMTOKENS: values="12 15 34" • ENTITY: photo="MyPic" • ENTITIES: photos="pic1 pic2" • ID: id="P09567" • IDREF: father="P09567" • IDREFS: children="A01 A02" • NOTATION: format="TeX" • Name group: coord="X"
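The ID, IDREF and IDREFS types only make sense together, so here is a minimal sketch that links three person elements. The family wrapper element and the names other than Tom Jones are assumptions made for illustration:

```xml
<?xml version="1.0" ?>
<!DOCTYPE family [
  <!ELEMENT family (person+)>
  <!ELEMENT person EMPTY>
  <!ATTLIST person name     CDATA  #REQUIRED
                   id       ID     #REQUIRED
                   father   IDREF  #IMPLIED
                   children IDREFS #IMPLIED>
]>
<family>
  <!-- Every id value must be unique in the document and start with a letter -->
  <person name="Tom Jones" id="P09567" children="A01 A02"/>
  <!-- father must match the id of some element in the same document -->
  <person name="Ann Jones" id="A01" father="P09567"/>
  <person name="Bob Jones" id="A02" father="P09567"/>
</family>
```

A validating parser would reject a father value that matches no id, which is the main benefit these types offer over plain CDATA.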
Default Attribute Values • Can specify a default attribute value for when it's missing from the XML document, or state that a value must be entered • #REQUIRED Must be specified • #IMPLIED May be specified • "default" Default value if unspecified • #FIXED Only one value allowed • <!ATTLIST tag name type default> • <!ATTLIST seqlist sepchar NMTOKEN #REQUIRED type (alpha|num) "num">
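A sketch of all four default kinds on the seqlist element follows. The version and note attributes and the element content are assumptions added to show #FIXED and #IMPLIED:

```xml
<?xml version="1.0" ?>
<!DOCTYPE seqlist [
  <!ELEMENT seqlist (#PCDATA)>
  <!ATTLIST seqlist sepchar NMTOKEN     #REQUIRED
                    type    (alpha|num) "num"
                    version CDATA       #FIXED "1.0"
                    note    CDATA       #IMPLIED>
]>
<!-- type and version are omitted, so a DTD-aware parser supplies "num" and "1.0" -->
<seqlist sepchar="-">1-2-3</seqlist>
```

Note that sepchar is declared NMTOKEN, so its value is limited to name characters such as letters, digits, ".", "-" and "_"; a comma would not be a valid value.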
Declarations • Instructions for the XML processor • Format - <! … > or <! … [<! … >]> • Document type - <!DOCTYPE … > • Character data - <![CDATA[ … ]]> • Entities - <!ENTITY … > • Notation - <!NOTATION … > • Element - <!ELEMENT … > • Attributes - <!ATTLIST … > • <![INCLUDE[…]]> and <![IGNORE[…]]>
Document Type Declaration • Identifies the name of the document root element • <!DOCTYPE My_XML_Doc> • May also add entity definitions and DTD • <!DOCTYPE My_XML_Doc [ … ]> <My_XML_Doc> ... </My_XML_Doc>
Comment Declaration • Comments are not considered part of XML document and should not be published • <!-- A comment --> • Cannot have additional '--' in comment • Cannot embed inside other declarations
Character Data Declaration • For occasions when text must contain uninterpreted markup characters • Press <<<ENTER>>> • <![CDATA[Press <<<ENTER>>>]]>
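The two ways of writing the same text can be compared side by side. This is a sketch; the help and step element names are assumptions:

```xml
<?xml version="1.0" ?>
<help>
  <!-- Escaped form: each markup character is replaced by an entity reference -->
  <step>Press &lt;&lt;&lt;ENTER&gt;&gt;&gt;</step>
  <!-- CDATA form: same text, nothing inside the section is treated as markup -->
  <step><![CDATA[Press <<<ENTER>>>]]></step>
</help>
```

Both step elements give the application the same character data. The only sequence a CDATA section cannot contain is its own terminator, ]]>.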
Processing Instructions • Information required by an external application • Format - <? … ?> • XML PI - <?xml version='1.0' ?> • Confusingly, this is called the XML declaration: it uses processing-instruction syntax but is, strictly speaking, not a processing instruction
Entities • XML document may be distributed among a number of files • Each unit of information is called an entity • Each entity has a name to identify it • Defined using an entity declaration • Used by calling an entity reference
When to use Entities • Use an entity when the information • Is used in several places • May be represented differently • Is part of a larger document that needs to be split up to be manageable • Conforms to a data format other than XML
Types of Entity • Internal Entity: stored in main document; text content only • External Entity: stored externally to the main document; text or binary; can use to group many internal entities together • General Entity: referred to in XML document • Parameter Entity: referred to in markup declarations in DTD
General Entities • Declared in 'Document Type Declaration' • <!DOCTYPE My_XML_Doc [ <!ENTITY name "replacement"> ]> • <!ENTITY xml "eXtensible Markup Language"> • The &xml; includes entities • The eXtensible Markup Language includes entities
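Put together as one document, the general entity example might look like the sketch below. The #PCDATA declaration for My_XML_Doc is an assumption added so the document has a complete DTD:

```xml
<?xml version="1.0" ?>
<!DOCTYPE My_XML_Doc [
  <!ELEMENT My_XML_Doc (#PCDATA)>
  <!-- Internal general entity: a name bound to replacement text -->
  <!ENTITY xml "eXtensible Markup Language">
]>
<!-- The parser substitutes the replacement text wherever the reference appears -->
<My_XML_Doc>The &xml; includes entities</My_XML_Doc>
```

After parsing, the element content reads "The eXtensible Markup Language includes entities".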
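Parameter Entities • Declared in 'Document Type Declaration' • <!DOCTYPE My_XML_Doc [ <!ENTITY % name "replacement"> ]> • <!ENTITY % param "(para | list)"> • <!ELEMENT section (%param;)*> • Note: inside the internal subset ([ … ]) a parameter entity reference may appear only between declarations, not inside one, so a reference like (%param;) belongs in an external DTD file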
External Entities • External Text Entities • Location specified with SYSTEM keyword • <!ENTITY ent SYSTEM "/ENTS/MYENT.XML"> • May specify with public identifier • <!ENTITY ent PUBLIC "-//EBI//ENTITIES ents//EN" … > • External Binary Entities • Need to identify format of data - NDATA • <!ELEMENT pic EMPTY> <!ATTLIST pic name ENTITY #REQUIRED> <!ENTITY photo SYSTEM "/ENTS/photo.tif" NDATA TIFF> • Referenced by empty element • A photograph <pic name="photo"/>.
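The binary entity pieces above fit together roughly as in this sketch. The doc root element and the NOTATION declaration (with its "image/tiff" identifier) are assumptions: the slide does not show them, but a notation named after NDATA has to be declared for the document to be valid:

```xml
<?xml version="1.0" ?>
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA | pic)*>
  <!ELEMENT pic EMPTY>
  <!ATTLIST pic name ENTITY #REQUIRED>
  <!-- Assumed: declares the TIFF notation that NDATA refers to -->
  <!NOTATION TIFF SYSTEM "image/tiff">
  <!-- Unparsed (binary) entity: the parser reports it but never reads the file as XML -->
  <!ENTITY photo SYSTEM "/ENTS/photo.tif" NDATA TIFF>
]>
<doc>A photograph <pic name="photo"/>.</doc>
```

The attribute value is the bare entity name photo, not a &photo; reference; it is up to the application to fetch and display the file.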
Restrictions on Entities • General text entities • Can appear in element content • <para> … &ent; … </para> • Can appear in attribute value • <para name="&ent;"> … </para> • Can appear in internal entity content • <!ENTITY cod "&ent;"> • Cannot appear in other parts of DTD
Restrictions on Entities (2) • Binary entities • If entity content is not XML, the entity cannot be used as a textual reference • Error - <!ELEMENT sec (para|&photo;)> • Error - <para> &photo; </para> • Binary entity can only appear as an attribute of type ENTITY • <!ENTITY photo SYSTEM "photo.tif" NDATA TIFF> … <!ELEMENT pic (#PCDATA)> <!ATTLIST pic name ENTITY #REQUIRED>
Parameter Entities • Use parameter entities within DTD • <!ENTITY % common "(para|list|table)"> <!ELEMENT chapter ((%common;)*, section*)> <!ELEMENT section (%common;)*> • Safest to include parentheses in entity definition and around entity reference
Putting it all together... • Have now been introduced to the main components and rules of XML and DTDs • Entities, elements, declarations, processing instructions, attribute lists • Use all these components in the 'Document Type Definition' (DTD) to specify the rules about the format of the XML document