XML I
Learning Objectives
• What is XML
• Features of XML
• Uses of XML
• Structure of an XML document
• Document Type Declaration
• Document Type Definitions (DTDs)
What is XML?
• XML stands for Extensible Markup Language.
• It is NOT a version of HTML.
• It is derived from SGML (Standard Generalized Markup Language), which was established in 1986 as a standard for generalized electronic document exchange.
• It has 3 main features: structure, extensibility and validation.
• XML defines a framework for transmitting structured data; an XML document is therefore essentially a structured document for storing information.
• It allows the creation of custom mark-up tags for describing virtually anything.
• XML documents are processed by an XML processor.
Uses of XML
• Its main applied use is the storage and exchange of structured data between the applications that form the core of a system.
• Examples of XML applications are Chemical Markup Language (CML), Extensible Financial Reporting Markup Language (XFRML), and Mathematical Markup Language (MathML).
• Used in e-commerce to store and transmit product and other data, including financial information.
• Used in Open Financial eXchange.
• Used in search engines to store and search data.
• Applied in virtually every sector.
• XML documents can be validated by including or referencing a Document Type Definition (DTD).
XML Syntax Fundamentals
• XML syntax describes the constructs used to define the structure and layout of an XML document, as well as the constraints involved.
• An XML processor is a software module that reads an XML document and provides access to its content and structure.
• XML processors typically process documents on behalf of applications, and are readily available as software plug-ins.
• Internet Explorer 5.0 is an example of an application that processes and displays XML documents.
• Entity: The basic building block of an XML document. Contains either parsed or unparsed data.
• Parsed data consists of characters that are treated as character data or mark-up, and are processed by an XML processor.
• Unparsed data is handled as raw content and is not processed.
• E.g. in <name>John</name>, <name> and </name> are mark-up, while John is character data.
• Markup: Used to provide a description of a document’s storage structure (entities) and logical structure (elements).
• Elements: Describe the logical structure. They have a start tag, e.g. <name>, and an end tag (</name>), or a single empty tag (<name/>).
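The split between mark-up and character data can be seen by handing the slide's element to an XML processor. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the processor; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The element from the slide: <name> and </name> are mark-up, John is character data.
element = ET.fromstring("<name>John</name>")

tag_name = element.tag          # name taken from the start and end tags
character_data = element.text   # text found between the tags

# An empty tag such as <name/> yields an element with no character data.
empty_element = ET.fromstring("<name/>")

print(tag_name, character_data, empty_element.text)
```

The processor exposes the tag name and the text separately, which is the sense in which it "provides access to content and structure".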
XML mark-up components include:
• Tags: The most obvious component of XML syntax, used to delimit elements.
• Processing instructions: Passed by the parser to the application. Begin with <? and end with ?>. The XML declaration, e.g. <?xml version="1.0"?>, uses the same delimiters and indicates that the document is based on XML version 1.0.
• Document type declarations: Used to specify information about the document, including the document’s root element and the Document Type Definition (DTD). Must appear after the XML declaration, but before the root element, e.g.
    <?xml version="1.0"?>
    <!DOCTYPE addressbook SYSTEM "Addressbook.dtd">
    <addressbook>
    <contact>
  The name addressbook declared in line 2 must correspond to <addressbook> in line 3, the root element of the document.
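To illustrate how a processing instruction is passed through to the application, here is a small sketch using Python's standard xml.dom.minidom module. The app-hint target and its data are made up for this example and are not part of any standard.

```python
from xml.dom import minidom

# "app-hint" is a hypothetical processing instruction target, used only for illustration.
source = '<?xml version="1.0"?><?app-hint mode="fast"?><addressbook/>'
document = minidom.parseString(source)

# Collect the processing instructions that the parser hands to the application.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```

In this sketch only the app-hint instruction shows up as a node; the XML declaration on the first line is consumed by the parser itself.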
• Entity references: Used to refer to pieces of data by an alias. They are written between an ampersand (&) and a semicolon (;). E.g. &apos; corresponds to an apostrophe (') while &amp; corresponds to '&'.
• Comments: Used to present information that is technically not part of the document’s content. Begin with <!-- and end with -->.
• Marked (CDATA) sections: Used to block off text that is to be sidestepped by the parser. Defined by enclosing the text within <![CDATA[ and ]]>. E.g. <![CDATA[<name>John</name>]]>. In this example, the name element is not recognized as mark-up and John is not recognized as parsed character data. It is common to use CDATA sections to quote a piece of XML code, e.g. in a tutorial.
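The three constructs above can be tried together in one short document. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module with its default settings; the note element is invented for the example.

```python
import xml.etree.ElementTree as ET

# A hypothetical <note> element combining entity references, a comment and a CDATA section.
source = (
    "<note>"
    "Tom &amp; Jerry&apos;s "
    "<!-- this comment is not part of the content -->"
    "<![CDATA[<name>John</name>]]>"
    "</note>"
)
note = ET.fromstring(source)

text = note.text          # entity references resolved, CDATA kept as literal text
child_count = len(note)   # the quoted <name> tag was never treated as mark-up

print(repr(text), child_count)
```

The entity references come back as the characters they stand for, the comment is dropped, and the CDATA content arrives as plain text rather than as a child element.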
Styling XML for display
• Accomplished in 2 ways:
• With CSS.
• With XSL, which is more complex and more advanced than CSS.
Parsing XML
• Parsers can be validating or non-validating.
• Validating parsers validate XML documents against a DTD or XML Schema.
• Examples of XML parsers are the Lark and Larval XML parsers for Java, Sun’s Project X Parser for Java, IBM’s XML Parser for Java, Oracle XML Parser for Java, and IBM’s XML Parser for C++.
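A non-validating parser only checks that a document is well-formed. As a rough sketch, Python's standard xml.etree.ElementTree module (a non-validating parser, not one of the parsers listed above) can be wrapped in a hypothetical helper named is_well_formed:

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


good = "<contact><name>Tony Benn</name></contact>"
bad = "<contact><name>Tony Benn</contact></name>"  # tags closed in the wrong order

print(is_well_formed(good), is_well_formed(bad))
```

This says nothing about validity: checking a document against a DTD or XML Schema needs a validating parser.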
Example of an XML Document
    <?xml version="1.0"?>
    <!DOCTYPE addressbook SYSTEM "Addressbook.dtd">
    <addressbook>
      <contact>
        <name>Tony Benn</name>
        <address>210 Temple road</address>
        <city>London</city>
        <postcode>NW9 0RT</postcode>
        <phone>02082049565</phone>
      </contact>
      <contact>
        <name>Peter Bloggs</name>
        <address>230 The Vale</address>
        <city>London</city>
        <postcode>NW6 2BT</postcode>
        <phone>02082029517</phone>
      </contact>
    </addressbook>
The above example is a well-formed XML document used to store contact information. However, it is not valid yet: validity requires checking it against a data model such as the DTD it references. Note that the root element (<addressbook>) has nested child elements, each delimited by an opening and a closing tag.
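As an illustration of how an application might read this document, the sketch below parses it with Python's standard xml.etree.ElementTree module and turns each contact into a dictionary. The parser used here is non-validating, so Addressbook.dtd is named in the document but never consulted.

```python
import xml.etree.ElementTree as ET

ADDRESSBOOK_XML = """<?xml version="1.0"?>
<!DOCTYPE addressbook SYSTEM "Addressbook.dtd">
<addressbook>
  <contact>
    <name>Tony Benn</name>
    <address>210 Temple road</address>
    <city>London</city>
    <postcode>NW9 0RT</postcode>
    <phone>02082049565</phone>
  </contact>
  <contact>
    <name>Peter Bloggs</name>
    <address>230 The Vale</address>
    <city>London</city>
    <postcode>NW6 2BT</postcode>
    <phone>02082029517</phone>
  </contact>
</addressbook>"""

root = ET.fromstring(ADDRESSBOOK_XML)

# One dictionary per <contact>, keyed by the child element names.
contacts = [
    {child.tag: child.text for child in contact}
    for contact in root.findall("contact")
]

for entry in contacts:
    print(entry["name"], entry["postcode"])
```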
XML Data Modelling
• Involves describing the structure of XML documents for the purpose of validation.
• After defining a data model, you can create structured XML documents that must adhere to that model in order to be valid.
• Valid vs well-formed XML: It is perfectly legal to create an XML document without a data model, in which case the document can be well-formed, but is not valid.
• There are 2 approaches to creating data models:
• DTDs (Document Type Definitions) and
• XML Schemas
• The data model (DTD or XML Schema) defines the arrangement of mark-up and character data within a valid XML document, i.e. the order and nesting of the elements.
Modelling Data with DTDs
• DTDs (Document Type Definitions) rely on a specialized syntax for describing the structure of an XML vocabulary (a class of documents).
• DTDs can be broken down into 2 subsets:
• Internal or local DTD: Mark-up declarations are contained in the prolog (the section of the document preceding the root element) of the same document.
• External DTD: External mark-up declarations that can be referenced by one or more documents.
• The 2 subsets may be combined, with the internal subset taking precedence.
• The DTD declares every element, attribute and entity used in the XML document.
• It must be declared or referenced in the document type declaration.
Example: Addressbook.dtd
    <!ELEMENT addressbook (contact)+>
    <!ELEMENT contact (name, address, city, postcode, phone)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT address (#PCDATA)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT postcode (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
Start of a document that follows this DTD:
    <addressbook>
      <contact>
        <name>Tony Benn</name>
        <address>210 Temple road</address>
        <city>London</city>
        <postcode>NW9 0RT</postcode>
        <phone>02082049565</phone>
      </contact>
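To make the meaning of these declarations concrete, the sketch below re-implements the rules of Addressbook.dtd by hand in Python. It is not a DTD validator and covers only this one DTD; the function name conforms_to_addressbook_dtd is hypothetical, and a real project would use a validating parser instead.

```python
import xml.etree.ElementTree as ET

# Order required by <!ELEMENT contact (name, address, city, postcode, phone)>
CONTACT_CHILDREN = ["name", "address", "city", "postcode", "phone"]


def conforms_to_addressbook_dtd(source: str) -> bool:
    """Hand-written check that mirrors the element declarations of Addressbook.dtd."""
    root = ET.fromstring(source)
    if root.tag != "addressbook":
        return False
    contacts = list(root)
    # (contact)+ : at least one child, and every child is a contact
    if not contacts or any(c.tag != "contact" for c in contacts):
        return False
    for contact in contacts:
        # exactly the declared children, in the declared order
        if [child.tag for child in contact] != CONTACT_CHILDREN:
            return False
        # (#PCDATA): the leaf elements hold text only, no child elements
        if any(len(child) for child in contact):
            return False
    return True


valid_doc = (
    "<addressbook><contact><name>Tony Benn</name>"
    "<address>210 Temple road</address><city>London</city>"
    "<postcode>NW9 0RT</postcode><phone>02082049565</phone>"
    "</contact></addressbook>"
)
# Same data, but city appears before address, which breaks the declared order.
swapped_doc = valid_doc.replace(
    "<address>210 Temple road</address><city>London</city>",
    "<city>London</city><address>210 Temple road</address>",
)

print(conforms_to_addressbook_dtd(valid_doc), conforms_to_addressbook_dtd(swapped_doc))
```

Both test documents are well-formed; only the first one follows the structure that the DTD describes.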
Document type declaration syntax:
    <!DOCTYPE rootElem SYSTEM ExtDTDRef [InternalDTDDecl]>
where rootElem is the root element, ExtDTDRef is the external DTD reference, and InternalDTDDecl is the internal DTD declaration.
Illustration:
    <!DOCTYPE movies SYSTEM "Movies.dtd" [
      <!ELEMENT actor (#PCDATA)>
    ]>
    <movies>
      <title>Lord of the rings</title>
      <!-- the other child elements go here -->
• External DTDs are more commonly used, and are especially useful when you are creating multiple documents of the same class, when you would like to use an existing DTD, or when you want to make your document as concise as possible.
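The parts of a document type declaration can be inspected programmatically. The sketch below uses Python's standard xml.dom.minidom module on the movies illustration; the closing </movies> tag is added here only so that the fragment parses, and Movies.dtd is never actually loaded.

```python
from xml.dom import minidom

# The movies illustration, completed with a closing tag so that it is well-formed.
source = """<?xml version="1.0"?>
<!DOCTYPE movies SYSTEM "Movies.dtd" [
  <!ELEMENT actor (#PCDATA)>
]>
<movies>
  <title>Lord of the rings</title>
</movies>"""

document = minidom.parseString(source)
doctype = document.doctype

declared_root = doctype.name                    # rootElem in the syntax above
external_ref = doctype.systemId                 # ExtDTDRef in the syntax above
actual_root = document.documentElement.tagName  # the root element actually used

print(declared_root, external_ref, actual_root)
```

Comparing declared_root with actual_root is a quick way to spot a declaration whose name does not match the document's root element.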
• Internal DTDs are preferable in situations where you’re creating only one document, or to reduce the overhead associated with your documents.
Elements and Attributes
• The primary contents described in a DTD are elements and attributes.
• Think of an element as a logical unit of information, and an attribute as a characteristic of that information.
• By looking at a document as a group of information objects, it is usually possible to associate each object with an element. Any leftover information would usually be represented as attributes.
• Another approach is to consider the type of information and how it will be used.
• Attributes provide tighter constraints on information, while elements are very loosely constrained and are better suited to long strings of text.
• Attributes can be constrained against a predefined list of values, and can have default values.
• Attributes are very concise, and are easier to parse.
• However, they cannot contain nested information.
Elements
• Declared with element declarations in the DTD.
• Syntax: <!ELEMENT ElementName Type>
• ElementName corresponds to the tag used to mark up that element in the XML document.
• Type specifies the content. 4 types are supported in XML:
• Empty type: The element doesn’t contain any content, but may carry attributes. In the DTD, it is declared in the form <!ELEMENT ElementName EMPTY>, e.g. <!ELEMENT img EMPTY>. Empty elements are written in the XML document in 2 ways:
• A start tag immediately followed by an end tag, with nothing in between, e.g. <img src="pic.gif"></img>.
• An empty tag, e.g. <img/> or <img src="pic.gif"/>.
• Element-only type: The element contains only child elements. Declared as <!ELEMENT ElementName contentModel>. The content model is specified using a combination of special element declaration symbols and child element names. The symbols represent the relationship of each child to the container element.
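The two ways of writing an empty element are equivalent to an XML processor. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module; the summarize helper is made up for the comparison.

```python
import xml.etree.ElementTree as ET

paired = ET.fromstring('<img src="pic.gif"></img>')   # start tag + end tag
empty_tag = ET.fromstring('<img src="pic.gif"/>')     # single empty tag


def summarize(element):
    """Reduce an element to (tag, attributes, text, number of children)."""
    return (element.tag, dict(element.attrib), element.text, len(element))


print(summarize(paired) == summarize(empty_tag))
```

Both forms produce an element with the same name and attributes, no text and no children.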
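Table of Special Symbols
• ( ) : groups child elements or alternatives
• , : separates children that must appear in the given order
• | : separates alternatives, one of which appears
• ? : the preceding item is optional (zero or one occurrence)
• * : the preceding item may appear zero or more times
• + : the preceding item must appear one or more times
Example: <!ELEMENT resume (intro, (education | experience+)+, hobbies?, references*)>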
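One informal way to read a content model is as a regular expression over the names of the child elements. The sketch below translates the resume content model by hand into a Python regular expression; it is an illustration of what the symbols mean, not how a validating parser is implemented, and matches_resume_model is a hypothetical helper.

```python
import re

# Content model from the slide:
#   (intro, (education | experience+)+, hobbies?, references*)
# Each child name is followed by one space so that names cannot run together.
RESUME_MODEL = re.compile(
    r"intro "
    r"(?:education |(?:experience )+)+"
    r"(?:hobbies )?"
    r"(?:references )*"
)


def matches_resume_model(child_names):
    """Check a sequence of child element names against the content model."""
    sequence = "".join(name + " " for name in child_names)
    return RESUME_MODEL.fullmatch(sequence) is not None


print(matches_resume_model(["intro", "education", "hobbies"]))
print(matches_resume_model(["intro", "hobbies"]))
```

The second call fails because the model requires at least one education or experience child between intro and hobbies.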
Mixed Elements
• Contain both character data and child elements. The simplest mixed element is one declared to contain only character data.
• The simplest case takes the form <!ELEMENT ElementName (#PCDATA)>, e.g. <!ELEMENT city (#PCDATA)>. When child elements are also allowed, the form is <!ELEMENT ElementName (#PCDATA | child1 | child2)*>.
ANY Elements
• The ANY element, so named because it is declared with the keyword ANY, can contain any type of element, or a combination of elements.
• Due to its lack of structure, you should avoid using it.
• Typically used during development of a DTD, but should not appear in a production DTD.
• Form: <!ELEMENT ElementName ANY>
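Mixed content is easiest to understand by looking at where the character data ends up after parsing. This sketch assumes Python's standard xml.etree.ElementTree module; the para and emph elements are invented for the example and would correspond to a declaration such as <!ELEMENT para (#PCDATA | emph)*>.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content element: text interleaved with a child element.
para = ET.fromstring("<para>Call <emph>Tony</emph> today</para>")
emph = para.find("emph")

leading_text = para.text    # character data before the first child
child_text = emph.text      # character data inside the child
trailing_text = emph.tail   # character data that follows the child

print(repr(leading_text), repr(child_text), repr(trailing_text))
```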
Attributes
• Used to specify additional information about elements.
• Within an element, attributes are used to form name/value pairs that describe a particular property of the element.
• Declared in a DTD with attribute list declarations, which take the form: <!ATTLIST ElementName AttrName AttrType Default>
• There are 4 kinds of default that can be specified:
• #REQUIRED: The attribute is required
• #IMPLIED: The attribute is optional
• #FIXED value: The attribute has a fixed value
• default: The default value of the attribute
• #REQUIRED means that you must supply a value for the attribute whenever you use the element.
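The four kinds of default can be summarized as a small decision procedure. The sketch below is a hand-written illustration in Python, not the behaviour of any particular parser; resolve_attribute and its parameter names are hypothetical, and the string "default" stands for the case where a plain default value is declared.

```python
def resolve_attribute(default_kind, declared_value, supplied_value):
    """Return the effective attribute value, or raise ValueError if a rule is broken.

    default_kind   -- "#REQUIRED", "#IMPLIED", "#FIXED" or "default"
    declared_value -- the value written in the DTD (None if there is none)
    supplied_value -- the value in the document (None if the attribute is omitted)
    """
    if default_kind == "#REQUIRED":
        if supplied_value is None:
            raise ValueError("required attribute is missing")
        return supplied_value
    if default_kind == "#IMPLIED":
        return supplied_value  # optional, so None is acceptable
    if default_kind == "#FIXED":
        if supplied_value is not None and supplied_value != declared_value:
            raise ValueError("fixed attribute has a different value")
        return declared_value
    if default_kind == "default":
        return declared_value if supplied_value is None else supplied_value
    raise ValueError(f"unknown default kind: {default_kind}")


print(resolve_attribute("default", "unassigned", None))
print(resolve_attribute("#IMPLIED", None, None))
```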
Attribute Type
• Must be specified, in addition to the attribute default value.
• XML supports 10 attribute types:
• CDATA: Character data (a plain text string)
• Enumerated: One of a series of string values
• NOTATION: A notation declared somewhere else in the DTD
• ENTITY: An external unparsed (typically binary) entity
• ENTITIES: Multiple external unparsed entities separated by whitespace
• ID: A unique identifier
• IDREF: Reference to an ID used somewhere else in the document
• IDREFS: Multiple references to IDs used somewhere else in the document, separated by whitespace
• NMTOKEN: A name consisting of XML token characters (letters, numbers, periods, dashes, colons and underscores)
• NMTOKENS: Multiple names consisting of XML token characters, separated by whitespace
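ID and IDREF are the types a validating parser cross-checks: every ID value must be unique, and every IDREF must match some ID. The sketch below imitates that check by hand in Python; the attribute names id and ref, the friend element and the check_ids helper are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" plays the role of an ID attribute, "ref" of an IDREF.
SAMPLE = """<addressbook>
  <contact id="c1"/>
  <contact id="c2"/>
  <friend ref="c1"/>
  <friend ref="c9"/>
</addressbook>"""


def check_ids(root, id_attr="id", idref_attr="ref"):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    seen = set()
    duplicates = set()
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    dangling = set()
    for element in root.iter():
        target = element.get(idref_attr)
        if target is not None and target not in seen:
            dangling.add(target)
    return duplicates, dangling


duplicates, dangling = check_ids(ET.fromstring(SAMPLE))
print(duplicates, dangling)
```

Here c9 is reported as dangling because no element carries that identifier.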
String Attributes
• The most commonly used attribute type.
• Example: <!ATTLIST player team CDATA #REQUIRED>
• In the above example, the team to which a player belongs is a required character data attribute that must be defined in the player element. <!ATTLIST player team CDATA #IMPLIED> would have made the team optional.
Another example, this time with an enumerated attribute:
    <!ELEMENT movie (Producer, Director, Actor, Writer+, Duration)>
    <!ATTLIST movie type (comedy | thriller) #REQUIRED>
In this example, the movie element contains the child elements listed, but it also has a mandatory attribute called type which has 2 possible values.
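The constraint expressed by the movie attribute list can be spelled out as a hand-written check. This is only a sketch of what a validating parser would enforce for this one declaration, written in Python with the standard xml.etree.ElementTree module; movie_type_problem is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Values allowed by <!ATTLIST movie type (comedy | thriller) #REQUIRED>
ALLOWED_MOVIE_TYPES = {"comedy", "thriller"}


def movie_type_problem(movie):
    """Return a description of the problem, or None if the attribute is acceptable."""
    value = movie.get("type")
    if value is None:
        return "missing required attribute 'type'"
    if value not in ALLOWED_MOVIE_TYPES:
        return f"'{value}' is not one of the enumerated values"
    return None


ok_movie = ET.fromstring('<movie type="comedy"/>')
untyped_movie = ET.fromstring("<movie/>")
odd_movie = ET.fromstring('<movie type="western"/>')

for movie in (ok_movie, untyped_movie, odd_movie):
    print(movie_type_problem(movie))
```

The child elements of movie are left out here to keep the focus on the attribute.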