XML – Extensible Markup Language
Objectives • To understand various ways in which XML can be used • History of XML • Syntax of XML • Difference between HTML, XML and XHTML • XML Document Type Definitions (DTDs) • XML Schemas • To understand types of XML Parsers • Validating vs. Non-Validating Parsers • To understand different XML Parser Interfaces • Tree Based Interface Standard : DOM • Event Based Interface Standard : SAX • Evaluating Parsers • Which parser to use?
History of XML • The World Wide Web Consortium (W3C) is an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards • Tim Berners-Lee, who invented the World Wide Web in 1989, and others founded the W3C in 1994 • SGML: Standard Generalized Markup Language • SGML grew out of GML, which IBM developed in the late 1960s, and became an ISO standard (ISO 8879) in 1986 • SGML is a semantic and structural language for text documents • SGML is complicated • The XML Working Group was formed under the W3C in 1996 • In 1998 the W3C published XML 1.0 • Extensible Markup Language (XML) is a subset of SGML
What is XML? • XML stands for eXtensible Markup Language • XML is a universal method for representing data • Used in applications, on the web and for data exchange • XML is a markup language much like HTML, but used for different purposes • XML is not a replacement for HTML
What is XML? • XML was designed to describe data • XML is a cross-platform, software and hardware independent tool for transmitting or exchanging information. • XML is an open-standards-based technology • Extensible • Both Human and machine readable • XML Standard • XML 1.0 (1998). • XML 1.1 (Feb 2004)
What Exactly is XML used for? • Storing data in a structured manner (tree structure) • Storing configuration information, typically application data that is not stored in a database • Much server software keeps its configuration files in XML format
Contd… • Transmitting data between applications • Overcomes Problems in Client Server applications which are cross-platform in nature • Ex: A Windows program talking to a mainframe • XML is a universal, standardized language used to represent data such that it can be both processed independently and exchanged between programs and applications and between clients and servers • Disparate systems can exchange information in a common format
XML Syntax • The syntax rules of XML are very simple and very strict. • XML tags are not predefined. You must define your own tags <college>GCET</college> • All XML elements must have a closing tag <para>This is a paragraph</para>
Contd… • XML tags are case sensitive • <Msg>This is incorrect</msg> Incorrect • <msg>This is correct</msg> Correct • All XML elements must be properly nested • <name>Jill<lname>Jack</name></lname> Incorrect • <name>Jill<lname>Jack</lname></name> Correct • Attribute values must always be quoted • <pen color=red>reynolds</pen> Incorrect • <pen color="red">reynolds</pen> Correct
XML Syntax • All XML documents must have exactly one root element • <parent> <child> <subchild>.....</subchild> </child> </parent>
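The syntax rules above can be checked mechanically. The following is a minimal sketch, assuming Python and its standard `xml.etree.ElementTree` parser (neither is part of the original slides); the helper name `is_well_formed` is hypothetical. It feeds the correct and incorrect snippets from the slides to the parser and reports which ones it accepts.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Snippets taken from the syntax slides, plus one document with two roots.
samples = {
    "case mismatch": "<Msg>This is incorrect</msg>",
    "matching case": "<msg>This is correct</msg>",
    "bad nesting": "<name>Jill<lname>Jack</name></lname>",
    "good nesting": "<name>Jill<lname>Jack</lname></name>",
    "unquoted attribute": "<pen color=red>reynolds</pen>",
    "quoted attribute": '<pen color="red">reynolds</pen>',
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

This parser only checks well-formedness; it does not validate a document against a DTD or schema.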
XML Comments • Comments in XML are written the same way as in HTML • <!-- This is a comment --> • Example: <?xml version="1.0"?> <!-- Customer details --> <customer> <name>John</name> <email>John@jerry.com</email> </customer>
XML Code <?xml version="1.0"?> <customers> <customer> <name>John</name> <email>John@jerry.com</email> </customer> <customer> <name>Tom</name> <email/> </customer> </customers>
Extensibility in XML • A typical XML document is made up of tags enclosing the data; tag names describe the data • Because the language is extensible, you can create tags that are specific to your need
Contd… • For example, your document may contain tags to structure information about employees • The tags may include <Name>, <Designation>, and <Address> • Data stored in XML is self-descriptive • One can understand the data by just looking at tag names
XML – Exchanging Info Between Apps • Convert information stored in the database (or any other format) to an XML format • Once it is in XML format, other applications/programs can parse (read) the XML document, which carries the original data • XML parsers are freely available and ship with the standard libraries of many modern programming languages
Contd… • [Diagram: XML as the common exchange format linking a spreadsheet package, a database, a statistical processing package, a CAD package, and an application]
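The exchange described above can be sketched in a few lines. This is an illustration only, assuming Python's standard `xml.etree.ElementTree` module; the `rows` list stands in for the result of a database query, and the function names `rows_to_xml` and `xml_to_rows` are hypothetical. One side serializes its records to XML, and the other side parses the document back into records.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the result of a database query.
rows = [
    {"name": "John", "email": "John@jerry.com"},
    {"name": "Tom", "email": ""},
]


def rows_to_xml(records) -> str:
    """Sending side: turn a list of dicts into a <customers> document."""
    root = ET.Element("customers")
    for record in records:
        customer = ET.SubElement(root, "customer")
        for field, value in record.items():
            ET.SubElement(customer, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(document: str):
    """Receiving side: parse the document back into a list of dicts."""
    root = ET.fromstring(document)
    return [
        {child.tag: (child.text or "") for child in customer}
        for customer in root.findall("customer")
    ]


document = rows_to_xml(rows)
received = xml_to_rows(document)
print(document)
print(received)
```

The receiving program needs to know only the tag names, not how or where the sender stored the data.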
[Diagram: an XML application separates content (the XML document), structure (DTD/XSD), and presentation (XSL)] • XSD - XML Schema Definition • DTD - Document Type Definition • XSL - Extensible Stylesheet Language
Document Type Declaration and DTD • A DTD (Document Type Definition) is used to enforce structure requirements for an XML document • The document type declaration (<!DOCTYPE ...>) contains or references the DTD and tells the parser which DTD to use for validation
Contd… <?xml version="1.0"?> <!DOCTYPE customers [ <!ELEMENT customers (customer)> <!ELEMENT customer (name,email)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> ]> <customers> <customer> <name>John Conlon</name> <email>John@jerry.com</email> </customer> </customers>
XML Schema • An XML-based alternative to DTD • Richer and more expressive than DTDs • Written in XML itself, so schemas can be read with ordinary XML tools (though they are usually more verbose than DTDs) • Supports data type validation (DTDs do not support data type validation)
<?xml version="1.0"?> <addressBook> <person> <cname>Harrison Ford</cname> <email>hford@famous.org</email> </person> <person> <cname>Julie</cname> <email>jr@pw.com</email> </person> </addressBook>
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:complexType name="record"> <xs:sequence> <xs:element name="cname" type="xs:string"/> <xs:element name="email" type="xs:string"/> </xs:sequence> </xs:complexType> <xs:element name="addressBook"> <xs:complexType> <xs:sequence> <xs:element name="person" type="record" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
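Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which parses XML but does not perform schema validation. It uses a copy of the address-book schema in which the `xs` prefix is bound to the XML Schema namespace and every attribute value is quoted, and it lists the element declarations it finds.

```python
import xml.etree.ElementTree as ET

# The XML Schema namespace, in ElementTree's "{uri}tag" notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

schema_text = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="record">
    <xs:sequence>
      <xs:element name="cname" type="xs:string"/>
      <xs:element name="email" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="addressBook">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="record"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [
    (element.get("name"), element.get("type"))
    for element in schema.iter(XS + "element")
]

for name, type_name in declared:
    print(name, "->", type_name or "(anonymous type defined inline)")
```

Checking a document against this schema would need a validating library, which the Python standard library does not include.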
XSD Syntax • Simple XML Elements with Pre-defined Data Types • Simple XML Element: An XML element that has no child elements and no attributes. Simple XML elements can be defined in XSD with the following statement: <xsd:element name="element_name" type="xsd:type_name"/>
Contd… where "element_name" is the name of the XML element, and "type_name" is one of the data type names pre-defined in XSD. • XSD pre-defined data types fall into several groups, including: • Numeric data types • Date and time data types • String data types • Binary data types • Boolean data type
XSD Syntax • Simple XML Elements with Extended Data Types • Simple XML Element: An XML element that has no child elements and no attributes. Simple XML elements can be defined by using the pre-defined XSD data types.
They can also be defined by using extended data types, which are defined by "simpleType" statements: <xsd:simpleType name="my_type_name"> <xsd:restriction base="xsd:type_name"> XSD facet statements </xsd:restriction> </xsd:simpleType> <xsd:element name="element_name" type="my_type_name"/> where "element_name" is the name of the XML element, "xsd:type_name" is a pre-defined data type serving as the base data type, and "my_type_name" is the new data type extended from the base data type.
XSD Syntax • Complex XML Elements • Complex XML Element: An XML element that has at least one child element or at least one attribute. Complex XML elements must be defined with complex data types, which are defined by "complexType" statements:
<xsd:element name="element_name" type="my_data_type"/> <xsd:complexType name="my_data_type"> <xsd:sequence> <xsd:element name="child_element_1" type="data_type_1"/> <xsd:element name="child_element_2" type="data_type_2"/> ... </xsd:sequence> <xsd:attribute name="attribute_a" type="data_type_a"/> <xsd:attribute name="attribute_b" type="data_type_b"/> ... </xsd:complexType> where the "attribute" statement is used to define an attribute, and the "sequence" statement is used to define the group of child elements and the order in which they should appear in the XML structure. Note that "attribute" statements must appear after the child element definition statements.
XSD Syntax • Empty XML Elements • Empty XML Element: A special complex XML element that has one or more attributes and no child elements or text content. Empty XML elements must be defined with complex data types in the following format: <xsd:complexType name="my_data_type"> <xsd:attribute name="attribute_a" type="data_type_a"/> <xsd:attribute name="attribute_b" type="data_type_b"/> ... </xsd:complexType>
XSD Syntax • Anonymous Data Types If a data type is specific to a child element in a parent data type, and there is no need to share it with data types outside the parent data type, you can define it as an anonymous data type: a non-named data type defined inline. For example, the following code:
<xsd:complexType name="my_data_type"> <xsd:sequence> <xsd:element name="setting"> <xsd:complexType> <xsd:sequence> <xsd:element name="property" type="xsd:string"/> <xsd:element name="value" type="xsd:integer"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> defines "my_data_type" which has a "setting" element, which has an anonymous data type defined inline.
Well-formed XML Documents • A document is made of elements, with exactly one outermost element, called the root or document element • All other elements, delimited by start- and end-tags, nest properly within each other • Attribute values, if any, are enclosed in quotes
Valid XML Documents • An XML document is valid if it has an associated DTD or Schema and if the document complies with the constraints expressed in it • If an XML document is valid, it is also well-formed
Document Type Definitions (DTDs) • Describe the syntax of a document, stating • which elements may appear in the XML document • what the element contents and attributes are • Need for DTD • A validating parser (a program) can be used to check whether XML data adheres to the rules in the DTD • The parser can do appropriate error handling if there are any violations • A validity error is not necessarily a fatal error, but some applications may treat it as one
Document Type Declarations • A valid XML document must include the reference to DTD which validates it • Types of DTD • Internal DTD: DTD can be embedded into XML document • External DTD: DTD can be in a separate file
Internal DTD • DTD embedded in the XML document • The declarations appear between [ and ] • E.g. AddressBook.xml
<?xml version='1.0' encoding='utf-8'?> <!-- DTD for a AddressBook.xml --> <!DOCTYPE AddressBook [ <!ELEMENT AddressBook (Address+)> <!ELEMENT Address (Name, Street, City)> <!ELEMENT Name (#PCDATA)> <!ATTLIST Name salutation CDATA #REQUIRED> <!ELEMENT Street (#PCDATA)> <!ELEMENT City (#PCDATA)> ]> <AddressBook> <Address> <Name salutation="Mr.">Ram</Name> <Street>M G Road</Street> <City>Bangalore</City> </Address> </AddressBook>
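To see what a parser reads from an internal DTD, the following sketch uses Python's standard `xml.parsers.expat` module (an assumption; the slides do not name a parser). expat is a non-validating parser: it reports the declarations through callbacks but does not enforce them. The handler function names and the `elements` and `attributes` lists are hypothetical.

```python
from xml.parsers import expat

document = """<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE AddressBook [
  <!ELEMENT AddressBook (Address+)>
  <!ELEMENT Address (Name, Street, City)>
  <!ELEMENT Name (#PCDATA)>
  <!ATTLIST Name salutation CDATA #REQUIRED>
  <!ELEMENT Street (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
]>
<AddressBook>
  <Address>
    <Name salutation="Mr.">Ram</Name>
    <Street>M G Road</Street>
    <City>Bangalore</City>
  </Address>
</AddressBook>"""

elements = []    # names from <!ELEMENT ...> declarations
attributes = []  # details from <!ATTLIST ...> declarations


def on_element_decl(name, model):
    # `model` describes the content specification; only the name is kept here.
    elements.append(name)


def on_attlist_decl(element, attribute, attr_type, default, required):
    attributes.append((element, attribute, attr_type, default, bool(required)))


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(document, True)  # True marks this as the final chunk of input

print("Declared elements:", elements)
print("Declared attributes:", attributes)
```

Enforcing these declarations, for example rejecting a Name element that lacks a salutation, requires a validating parser, which the Python standard library does not provide.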
External DTD • DTD is present in separate file • Example • The DTD for AddressBook.xml is contained in a file AddressBook.dtd • AddressBook.xml contains only XML Data with a reference to the DTD file • AddressBook.xml
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd"> <AddressBook> <Address> <Name salutation="Mr.">Ram</Name> <Street>M G Road</Street> <City>Bangalore</City> </Address> </AddressBook>
Anatomy of DTD – Defining new XML tags (Elements) • <!ELEMENT element_name content_specification> • element_name: Specifies the name of the XML tag • content_specification: Specifies the contents of the element • #PCDATA: Parsed character data (text in which markup and entity references are still recognized) • Nested (child) elements • EMPTY: The element has no content • ANY: Any declared element or text may appear (generally avoided) • Note: CDATA is an attribute type, not an element content specification
Example: • <!ELEMENT Street (#PCDATA)> • element Street contains the parsed character Data • <!ELEMENT Address (Name, Street, City)> • element Address contains three nested tags Name, Street and City respectively • <!ELEMENT AddressBook (Address+)> • Element AddressBook contains one or more occurrences of element Address
Anatomy of DTD – Dealing with multiple children • The children of an element are declared with a syntax similar to regular expressions in Perl, as follows (assume a and b are child elements of the element being declared):
a+ - One or more occurrences of a • a* - Zero or more occurrences of a • a? - a or nothing • a, b - a followed by b • a|b - a or b, but not both • (expression) - Surrounding an expression with parentheses means that it is treated as a unit and may take the suffix operator ?, * or +
Some examples • <!ELEMENT ITEM (PRODUCT,NUMBER,(PRICE|CHARGEACCT|SAMPLE))> • <!ELEMENT ITEM (PRODUCT,NUMBER,(PRICE|CHARGEACCT*|SAMPLE)+)> • <!ELEMENT ITEM (#PCDATA|PRODUCTID)*> • <!ELEMENT BOOK (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION|PART)+)>
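Since the content-model syntax resembles regular expressions, an element-only content model can be translated into a real regular expression and used to test a sequence of child names. This is a sketch of the idea in Python, not how a validating parser is actually implemented; the function names are hypothetical, and mixed content containing #PCDATA is not handled.

```python
import re

# A simplified pattern for XML names (assumption: ASCII names only).
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only DTD content model into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Every child name becomes a non-capturing group matching "NAME ".
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # "," means "followed by", which is plain concatenation in a regex.
    # The operators ( ) | ? * + keep the same meaning in both notations.
    return re.compile(body.replace(",", ""))


def children_match(model: str, children: list) -> bool:
    """Check whether the ordered child names satisfy the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


ITEM = "(PRODUCT,NUMBER,(PRICE|CHARGEACCT|SAMPLE))"
BOOK = "(OPENER,SUBTITLE?,INTRODUCTION?,(SECTION|PART)+)"

print(children_match(ITEM, ["PRODUCT", "NUMBER", "SAMPLE"]))
print(children_match(ITEM, ["PRODUCT", "NUMBER"]))
print(children_match(BOOK, ["OPENER", "SECTION", "PART"]))
print(children_match(BOOK, ["OPENER", "INTRODUCTION", "SUBTITLE", "SECTION"]))
```

Under the first ITEM declaration, exactly one of PRICE, CHARGEACCT or SAMPLE must follow NUMBER, so a child list that stops after NUMBER is rejected.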
Anatomy of DTD – Attribute Declarations • Specifies the allowable attributes of each element • <!ATTLIST Tag-name Attr-Name Attr-Type Restriction> • Tag-name: Element name • Attr-Name: Name of the attribute; the attribute is defined for the element Tag-name • Attr-Type: Type of the attribute value, for example CDATA (character data)
Restriction: • Value: A default value for the attribute, given as simple text enclosed in quotes • #IMPLIED: Indicates that there is no default value for this attribute, and this attribute need not be used • #REQUIRED: Indicates that there is no default value for this attribute, but that a value must be assigned to this attribute • #FIXED Value: In this case, Value is the attribute's value, and the attribute must always have this value
Anatomy of DTD – Attribute Declarations • Example • <!ATTLIST Name salutation CDATA #REQUIRED> • The element Name has attribute salutation which is of type CDATA • The attribute salutation must be specified in the Name tag
Anatomy of DTD – Entity Declarations (1 of 2) • A way to escape special characters • Some special characters, such as < and &, cannot appear literally in #PCDATA (and > is usually escaped as well) • A character escaped in this way is written as an "entity reference"
The following kinds of references are used in an XML document • Built-in entities: &amp;, &lt;, &gt;, &apos;, &quot; • Character references: &#243; representing ó • Example • <State>Jammu &amp; Kashmir</State>
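The effect of escaping can be seen with a short sketch, assuming Python's standard library (`xml.sax.saxutils.escape` and `xml.etree.ElementTree`); the variable names are hypothetical. Text is escaped before it is placed in a document, and the parser turns the references back into the original characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Jammu & Kashmir"
escaped = escape(raw)  # replaces &, < and > with built-in entity references
state = ET.fromstring(f"<State>{escaped}</State>")

# A numeric character reference: 243 is the code point of the letter ó.
letter = ET.fromstring("<Letter>&#243;</Letter>")

# Without escaping, the bare ampersand makes the document ill-formed.
try:
    ET.fromstring(f"<State>{raw}</State>")
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False

print(escaped)
print(state.text)
print(letter.text)
print(unescaped_parses)
```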
Anatomy of DTD – Entity Declarations (2 of 2) • Data that is frequently used can be declared as a General Entity • <!ENTITY entity_name entity_contents> • entity_name: Name of the new entity • entity_contents: Contents of the new entity • The entity is then referenced in the document as &entity_name;
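A general entity declared in the internal DTD is expanded wherever it is referenced. The sketch below assumes Python's standard `xml.etree.ElementTree` parser, which expands entities declared in the internal subset; the entity name `college`, its value, and the `notice` element are hypothetical examples.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE notice [
  <!ENTITY college "GCET">
]>
<notice>Issued by &college;. Contact &college; for details.</notice>"""

# The parser replaces each &college; reference with the declared contents.
notice = ET.fromstring(document)
print(notice.text)
```

Declaring the value once means that a later change to it has to be made in only one place.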