XML eXtensible Markup Language
XML • A descendant of SGML (Standard Generalized Markup Language) • A W3C Recommendation since 1998 • A universal language for data on the Web • HTML for the presentation of data • XML for the structuring of data • A meta markup language • Enables the creation of new markup languages to mark up anything imaginable (math formulas, molecular structures of chemicals, etc.) • Gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation • An ideal format for server-to-server transfer of structured data
XML and its Derivatives • FpML (http://www.fpml.org/) • XDBML (master thesis at ITK, 2005) • MathML • ChemML • VoiceML • SMIL (Synchronized Multimedia Integration Language) • XMI (XML Metadata Interchange): XML + UML => a universal format for exchanging OO system analysis and design documents • More …
How XML is similar to HTML • XML uses tags just like HTML, but those tags don’t define text formatting. Instead, the tags are used to create data structures. Let’s see some examples…
Examples of HTML and XML • HTML code: <b> This is bold text… </b> • XML code: <President> Clinton </President> • Using our own custom tag named “President,” we have stored a small piece of information. • Note: XML is case-sensitive! <President> and <president> are two different tags.
Detailed Example • XML documents are organized hierarchically: each tag (node) can have “sub” nodes under it. • Well-formedness: any number of nodes can be created under a given node, but every node must be “closed” with a matching closing tag, like </President>, and nodes must nest properly (no overlapping tags). • Exception: an “empty” element has no separate closing tag; it must end with a forward slash, e.g., <flag id = "Y" />

    <President>
      <Name>Clinton, Bill</Name>
      <Age>52</Age>
      <Terms>2</Terms>
    </President>
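To see these rules enforced, here is a minimal sketch using Python's standard xml.etree.ElementTree parser (an assumption; the slides do not prescribe a tool). A document that is not well-formed, including one whose closing tag differs only in letter case, raises ParseError. The helper is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = """<President>
  <Name>Clinton, Bill</Name>
  <Age>52</Age>
  <Terms>2</Terms>
</President>"""

print(is_well_formed(good))                                    # True
print(is_well_formed("<President>Clinton</president>"))        # False: case differs
print(is_well_formed("<President><Name>Clinton</President>"))  # False: <Name> never closed
print(is_well_formed('<flag id="Y"/>'))                        # True: empty element
```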
XML Elements • A “node” in an XML document is known as an element. • An XML document can have any number of elements. • For example, we could store information about 10 presidents in one document. • However, there must be exactly one root element, e.g., <Presidents> <President> … </President> … </Presidents>
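A minimal sketch of the single-root rule, again assuming Python's standard ElementTree parser: a document wrapped in one <Presidents> root parses, while two top-level <President> elements are rejected. The sample names and the parses helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Hypothetical helper: True if text is well-formed XML (one root element)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample data
one_root = ("<Presidents>"
            "<President>Clinton</President>"
            "<President>Carter</President>"
            "</Presidents>")
two_roots = "<President>Clinton</President><President>Carter</President>"

root = ET.fromstring(one_root)
print(root.tag, len(root))   # Presidents 2
print(parses(two_roots))     # False: a second top-level element is not allowed
```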
Multiple Elements

    <Cars>
      <Car>
        <Manufacturer>Mitsubishi</Manufacturer>
        <Model>Eclipse</Model>
        <Year>1998</Year>
      </Car>
      <Car>
        <Manufacturer>Pontiac</Manufacturer>
        <Model>Sun Fire</Model>
        <Year>1997</Year>
      </Car>
      <Car>
        <Manufacturer>Nissan</Manufacturer>
        <Model>X-Terra</Model>
        <Year>2000</Year>
        <SUV>Yes</SUV>
      </Car>
    </Cars>
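A minimal sketch of reading the Cars document with Python's standard ElementTree (not part of the original slides). It walks the <Car> children of the single root and shows that the optional <SUV> element is simply absent from the first two cars; the summarize helper is hypothetical.

```python
import xml.etree.ElementTree as ET

CARS = """<Cars>
  <Car><Manufacturer>Mitsubishi</Manufacturer><Model>Eclipse</Model><Year>1998</Year></Car>
  <Car><Manufacturer>Pontiac</Manufacturer><Model>Sun Fire</Model><Year>1997</Year></Car>
  <Car><Manufacturer>Nissan</Manufacturer><Model>X-Terra</Model><Year>2000</Year><SUV>Yes</SUV></Car>
</Cars>"""

def summarize(xml_text: str) -> list:
    """Hypothetical helper: one (manufacturer, model, year, is_suv) tuple per <Car>."""
    root = ET.fromstring(xml_text)          # <Cars> is the single root element
    rows = []
    for car in root.findall("Car"):         # each <Car> child, in document order
        rows.append((
            car.findtext("Manufacturer"),
            car.findtext("Model"),
            int(car.findtext("Year")),      # element text is a string; convert explicitly
            car.findtext("SUV") == "Yes",   # findtext returns None when <SUV> is absent
        ))
    return rows

for row in summarize(CARS):
    print(row)
```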
Attributes • Besides having “sub-elements,” every element can also have attributes. • Attributes are declared inside the start tag. You may already know how to use attributes if you have used the <IMG> or <A> tags in HTML. • For example: <A HREF="somepage.html">click here</A>
XML Attributes • Here’s an example of an XML element with an attribute:

    <Vehicle VIN="3232382432832">
      <Year>1997</Year>
      <Manufacturer>Toyota</Manufacturer>
    </Vehicle>

• Any element that holds only simple text could be made an attribute instead. • For example, Manufacturer and Year could also have been made attributes. • However, usually only metadata or simple scalar values should be attributes.
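A minimal sketch, assuming Python's standard ElementTree, of how the Vehicle example looks to a program: the attribute is read with get() from the element's attribute map, while sub-element text is read with findtext(). The Color attribute is hypothetical and deliberately missing.

```python
import xml.etree.ElementTree as ET

vehicle = ET.fromstring(
    '<Vehicle VIN="3232382432832">'
    '<Year>1997</Year><Manufacturer>Toyota</Manufacturer>'
    '</Vehicle>'
)

print(vehicle.get("VIN"))                # attribute value: 3232382432832
print(vehicle.findtext("Manufacturer"))  # sub-element text: Toyota
print(vehicle.get("Color"))              # hypothetical missing attribute: None
print(vehicle.attrib)                    # {'VIN': '3232382432832'}
```

Attribute values always come back as plain strings, which is one reason attributes suit metadata and scalars better than structured data.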
A Complete Example (1)

    <?xml version="1.0"?>
    <!-- Deitel 2000, Fig. 28.1: article.xml -->
    <!-- Article formatted with XML -->
    <article>
      <title>Simple XML</title>
      <date>September 6, 1999</date>
      <author>
        <fname>Tem</fname>
        <lname>Nieto</lname>
      </author>
      <summary>XML is pretty easy.</summary>
      <content>Once you have mastered HTML, XML is easily learned. You must remember that XML is not for displaying information but for managing information. </content>
    </article>
A Complete Example (2a)

    <?xml version = "1.0"?>
    <!-- Deitel 2000, Fig. 28.2: letter.xml -->
    <!-- Business letter formatted with XML -->
    <!DOCTYPE letter SYSTEM "letter.dtd">
    <letter>
      <contact type = "from">
        <name>John Doe</name>
        <address1>123 Main St.</address1>
        <address2></address2>
        <city>Anytown</city>
        <state>Anystate</state>
        <zip>12345</zip>
        <phone>555-1234</phone>
        <flag id = "P"/>
      </contact>
A Complete Example (2b)

      <contact type = "to">
        <name>Joe Schmoe</name>
        <address1>Box 12345</address1>
        <address2>15 Any Ave.</address2>
        <city>Othertown</city>
        <state>Otherstate</state>
        <zip>67890</zip>
        <phone>555-4321</phone>
        <flag id = "B"/>
      </contact>
      <paragraph>Dear Sir,</paragraph>
      <paragraph>It is our privilege to inform you about our new database managed with XML. This new system will allow you to reduce the load of your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>
      <paragraph>Sincerely, Mr. Doe</paragraph>
    </letter>
DTD • DTD = Document Type Definition • Defines the grammatical rules for the document • Not required for XML, but recommended for document conformity • Can be used to check the validity of an XML document (whether it contains the proper elements, attributes, etc.) • Uses an EBNF-like grammar • Referenced by the DOCTYPE declaration, which contains three parts if it refers to an external subset: • Root element name • Flag (e.g., SYSTEM (personal, non-standardized), PUBLIC (standardized, publicly available)) • DTD name and location
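As a minimal sketch of the three DOCTYPE parts, Python's standard expat binding (assumed here; the slide names no tool) reports them through StartDoctypeDeclHandler. Expat is non-validating: it reads the declaration but, by default, does not load letter.dtd or check the document against it, so validation needs a validating parser. The helper doctype_parts is hypothetical.

```python
import xml.parsers.expat

LETTER_HEAD = '''<?xml version="1.0"?>
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter><paragraph>Dear Sir,</paragraph></letter>'''

def doctype_parts(xml_text: str) -> dict:
    """Hypothetical helper: collect root name, SYSTEM id and PUBLIC id of the DOCTYPE."""
    parts = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        parts.update(root=name, system_id=system_id, public_id=public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)   # True: this is the final chunk of input
    return parts

print(doctype_parts(LETTER_HEAD))
# {'root': 'letter', 'system_id': 'letter.dtd', 'public_id': None}
```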
DTD: Example

    <!ELEMENT letter (contact+, paragraph+)>
    <!ELEMENT contact (name, address1, address2, city, state, zip, phone, flag)>
    <!ATTLIST contact type CDATA #IMPLIED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT address1 (#PCDATA)>
    <!ELEMENT address2 (#PCDATA)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT state (#PCDATA)>
    <!ELEMENT zip (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
    <!ELEMENT flag EMPTY>
    <!ATTLIST flag id CDATA #IMPLIED>
    <!ELEMENT paragraph (#PCDATA)>
DTD: Example (cont’d) • !ELEMENT: element type declaration • Specifies that an element is being created • Here, a letter element is declared to contain one or more contact elements followed by one or more paragraph elements, in that order. • Operator + means one or more occurrences • Operator * means zero or more occurrences • Operator ? means zero or one occurrence • If no operator is included, exactly one occurrence is assumed. • Others: “|” separates alternatives (exactly one of the choices appears), while “,” separates items that must appear in sequence
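One way to make the operator semantics concrete is to translate a content model into a regular expression over the sequence of child element names. This is a hypothetical Python sketch, not how any particular XML parser implements validation, and it only handles names, ",", "|", "+", "*", "?" and parentheses (no #PCDATA, ANY or EMPTY).

```python
import re

def model_to_regex(model: str) -> re.Pattern:
    """Hypothetical helper: translate a simple DTD content model such as
    '(contact+, paragraph+)' into a regex over space-terminated child names."""
    parts = []
    for tok in re.findall(r"[\w.-]+|[(),|+*?]", model):
        if tok == ",":
            continue                     # sequence: items are simply concatenated
        elif tok == "(":
            parts.append("(?:")          # DTD group -> non-capturing regex group
        elif tok in {")", "|", "+", "*", "?"}:
            parts.append(tok)            # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # exactly one child named tok
    return re.compile("".join(parts))

def children_match(model: str, children: list) -> bool:
    """True if the list of child element names satisfies the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

letter = "(contact+, paragraph+)"
print(children_match(letter, ["contact", "contact", "paragraph"]))  # True
print(children_match(letter, ["paragraph", "contact"]))             # False: wrong order
print(children_match(letter, ["contact"]))                          # False: no paragraph
```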
DTD: Example (cont’d) • !ATTLIST: attribute-list declaration • Defines the attributes of an element • Here, the type attribute of contact is defined to be: • A string (as given by CDATA) that is optional and has no default value (as given by #IMPLIED). • Apart from standard attribute-value normalization, the XML processor does not interpret the string; it simply passes the value on to the application • Others: • #PCDATA means the element can contain parsed character data (i.e., text) • EMPTY means the element has no content (no child elements and no text) • Commonly used for elements whose information is carried entirely in attributes, such as flag • Further options: • IDs and IDREFs (your next assignment!)
XML Schema [Silberschatz et al. ’02] • XML Schema is a more sophisticated schema language that addresses the drawbacks of DTDs. It supports: • Typing of values • E.g., integer, string, etc. • Also, constraints on min/max values • User-defined types • Is itself specified in XML syntax, unlike DTDs • More standard representation, but verbose • Is integrated with namespaces • Many more features • List types, uniqueness and foreign key constraints, inheritance, … • BUT: significantly more complicated than DTDs, and not yet widely used.
XML Schema: Example

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="bank" type="BankType"/>
      <xsd:element name="account">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="account-number" type="xsd:string"/>
            <xsd:element name="branch-name" type="xsd:string"/>
            <xsd:element name="balance" type="xsd:decimal"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      …..definitions of customer and depositor ….
      <xsd:complexType name="BankType">
        <xsd:sequence>
          <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
Querying and Transforming XML Data [Silberschatz et al. ’02] • Translation of information from one XML schema to another • Querying on XML data • The above two are closely related, and handled by the same tools • Standard XML querying/translation languages • XPath • Simple language consisting of path expressions • XSLT • Simple language designed for translation from XML to XML and XML to HTML • XQuery • An XML query language with a rich set of features • A wide variety of other languages have been proposed, and some served as the basis for the XQuery standard • XML-QL, Quilt, XQL, …
Tree Model of XML Data • Query and transformation languages are based on a tree model of XML data • An XML document is modeled as a tree, with nodes corresponding to elements and attributes • Element nodes have child nodes, which can be attributes or subelements • Text in an element is modeled as a text node child of the element • Children of a node are ordered according to their order in the XML document • Element and attribute nodes (except for the root node) have a single parent, which is an element node • The root node has a single child, which is the root element of the document • We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.
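A minimal sketch of this tree model using Python's standard xml.dom.minidom (an assumption, not part of the slides), on a hypothetical account fragment. Text becomes its own child node, children keep document order, and the document node has the root element as its single child. One difference from the model above: in the DOM, attributes live in the element's attributes map rather than in its list of children.

```python
from xml.dom import minidom

# Hypothetical sample document
doc = minidom.parseString(
    "<account number='A-101'><branch>Downtown</branch><balance>500</balance></account>"
)

def outline(node, depth=0):
    """Return one indented line per node: elements (with attributes) and text nodes."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = "".join(f" @{name}={value}" for name, value in node.attributes.items())
        lines.append("  " * depth + f"element {node.tagName}{attrs}")
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + f"text {node.data!r}")
    for child in node.childNodes:        # children are kept in document order
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(doc.documentElement)))
# element account @number=A-101
#   element branch
#     text 'Downtown'
#   element balance
#     text '500'
print(doc.childNodes[0] is doc.documentElement)   # True: the root element
```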
XPath • XPath is used to address (select) parts of documents using path expressions • A path expression is a sequence of steps separated by “/” • Think of file names in a directory hierarchy • Result of a path expression: the set of values that, along with their containing elements/attributes, match the specified path • E.g. /bank-2/customer/name evaluated on the bank-2 data we saw earlier returns <name>Joe</name> <name>Mary</name> • E.g. /bank-2/customer/name/text( ) returns the same names, but without the enclosing tags
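Python's standard ElementTree implements a limited subset of XPath, which is enough for a minimal sketch of these two expressions. Its paths are relative to the element they are called on, so /bank-2/customer/name becomes customer/name on the <bank-2> root. The document below is a hypothetical stand-in for the bank-2 data, which is not reproduced in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bank-2 data
BANK2 = ("<bank-2>"
         "<customer><name>Joe</name></customer>"
         "<customer><name>Mary</name></customer>"
         "</bank-2>")

root = ET.fromstring(BANK2)            # the <bank-2> element itself
names = root.findall("customer/name")  # ~ /bank-2/customer/name

# Whole elements, tags included
print([ET.tostring(n, encoding="unicode") for n in names])  # ['<name>Joe</name>', '<name>Mary</name>']
# Text only, like .../name/text()
print([n.text for n in names])                               # ['Joe', 'Mary']
```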
XPath • The initial “/” denotes root of the document (above the top-level tag) • Path expressions are evaluated left to right • Each step operates on the set of instances produced by the previous step • Selection predicates may follow any step in a path, in [ ] • E.g. /bank-2/account[balance > 400] • returns account elements with a balance value greater than 400 • /bank-2/account[balance] returns account elements containing a balance subelement • Attributes are accessed using “@” • E.g. /bank-2/account[balance > 400]/@account-number • returns the account numbers of those accounts with balance > 400
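A minimal sketch of the two predicate forms, assuming Python's standard ElementTree on a hypothetical bank-2 fragment. The existence test [balance] is supported directly, but ElementTree's XPath subset has no numeric comparison, so [balance > 400] is applied in Python, and @account-number is read with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 fragment; the third account has no <balance>.
root = ET.fromstring(
    "<bank-2>"
    "<account account-number='A-101'><balance>500</balance></account>"
    "<account account-number='A-102'><balance>300</balance></account>"
    "<account account-number='A-103'/>"
    "</bank-2>"
)

# /bank-2/account[balance]: accounts that contain a balance subelement
with_balance = root.findall("account[balance]")

# /bank-2/account[balance > 400]/@account-number, with the comparison done in Python
rich = [a.get("account-number")
        for a in with_balance
        if float(a.findtext("balance")) > 400]

print(len(with_balance), rich)   # 2 ['A-101']
```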
XPath • Operator “|” is used to implement union • E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower) • gives customers with either accounts or loans • However, “|” cannot be nested inside other operators. • “//” can be used to skip multiple levels of nodes • E.g. /bank-2//name • finds any name element anywhere under the /bank-2 element, regardless of the element in which it is contained. • A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children • “//”, described above, is a short form for specifying “all descendants” • “..” specifies the parent.
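A minimal sketch of "//", ".." and union with Python's standard ElementTree, on a hypothetical bank-2 fragment where name elements sit at two different depths. ElementTree supports ".//" and ".." directly but has no "|" operator, so the union is built by concatenating two result lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 fragment with name elements at two different depths
root = ET.fromstring(
    "<bank-2>"
    "<customer><name>Joe</name></customer>"
    "<branch><manager><name>Ann</name></manager></branch>"
    "</bank-2>"
)

# /bank-2//name  ->  ".//name": name elements at any depth below <bank-2>
print([n.text for n in root.findall(".//name")])     # ['Joe', 'Ann']

# "..": step from each name element up to its parent
print([p.tag for p in root.findall(".//name/..")])   # ['customer', 'manager']

# "|": concatenate two result lists; unlike XPath union, this neither
# removes duplicates nor re-sorts the result into document order.
union = root.findall("customer/name") + root.findall("branch/manager/name")
print([n.text for n in union])                        # ['Joe', 'Ann']
```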
Functions in XPath • XPath provides several functions • The function count() counts the number of elements in the set generated by the path given as its argument • E.g. /bank-2/account[count(customer) > 2] • Returns accounts with more than 2 customers • There are also functions for testing the position (1, 2, …) of a node relative to its siblings, such as position() and last() • Boolean connectives and and or and the function not() can be used in predicates • IDREFs can be referenced using the function id() • id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks • E.g. /bank-2/account/id(@owner) • returns all customers referred to from the owner attribute of account elements.
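A minimal sketch, assuming Python's standard ElementTree, on a hypothetical bank-2 fragment. ElementTree has no count() function, so the count(customer) > 2 test is done in Python; positional predicates such as [1] and [last()] are supported directly and, as in XPath, count among siblings with the same tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 fragment
root = ET.fromstring(
    "<bank-2>"
    "<account number='A-1'><customer>Joe</customer></account>"
    "<account number='A-2'><customer>Joe</customer>"
    "<customer>Mary</customer><customer>Lee</customer></account>"
    "</bank-2>"
)

# /bank-2/account[count(customer) > 2]
busy = [a.get("number") for a in root.findall("account")
        if len(a.findall("customer")) > 2]

# Position tests: the first and the last customer of the second account
first = root.find("account[2]/customer[1]").text
last = root.find("account[2]/customer[last()]").text

print(busy, first, last)   # ['A-2'] Joe Lee
```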
XSL • Extensible Stylesheet Language (XSL) • Defines the layout of an XML document (much like CSS defines the layout of an HTML document) • An XSL style sheet provides the rules for displaying an XML document • XSL also defines rules for transforming an XML document into another XML document (XSLT, XSL Transformations)
XSL: Example • See Program listing
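XSL: Example (cont’d) <xsl:for-each order-by = "+Lastname;+Firstname" select = "contact" xmlns:xsl = "http://www.w3.org/TR/WD-xsl"> • for-each • Iterates over each selected contact element • order-by • Sort keys separated by “;”; + means ascending, - means descending • Note: order-by belongs to this early working-draft XSL syntax; the final XSLT 1.0 Recommendation sorts with xsl:sort instead • select • Defines which elements are selected • xmlns • XML namespace declaration • Identifies the vocabulary this element belongs to (here, the XSL working draft); the URI serves as an identifier, not necessarily as the location of a specification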
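The effect of select="contact" with order-by="+Lastname;+Firstname" can be sketched in Python (not XSL) on a hypothetical contact list, since the original program listing is not reproduced here: sorting on a (Lastname, Firstname) tuple compares the last name first and uses the first name only to break ties. The sort_contacts helper is hypothetical and, unlike order-by, reverses all keys together rather than per key.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact list
root = ET.fromstring(
    "<contacts>"
    "<contact><Lastname>Smith</Lastname><Firstname>Zoe</Firstname></contact>"
    "<contact><Lastname>Jones</Lastname><Firstname>Ann</Firstname></contact>"
    "<contact><Lastname>Smith</Lastname><Firstname>Al</Firstname></contact>"
    "</contacts>"
)

def sort_contacts(contacts, descending=False):
    """Sort by Lastname, then Firstname (like "+Lastname;+Firstname").
    descending=True reverses both keys (like "-Lastname;-Firstname")."""
    return sorted(
        contacts,
        key=lambda c: (c.findtext("Lastname"), c.findtext("Firstname")),
        reverse=descending,
    )

for c in sort_contacts(root.findall("contact")):   # select="contact"
    print(c.findtext("Lastname"), c.findtext("Firstname"))
# Jones Ann
# Smith Al
# Smith Zoe
```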
XSL: Example (cont’d) <Lastname><xsl:value-of select = "Lastname"/></Lastname> • xsl:value-of • Retrieves the data specified in the select attribute • An empty element, hence the closing “/” <xsl:for-each select = "contact[Lastname='Neito']"> • contact[Lastname='Neito'] • [ ] encloses a filter condition: only contact elements whose Lastname is 'Neito' are selected
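A minimal Python sketch (not XSL) of the same selection on hypothetical contact data: Python's standard ElementTree supports the [tag='text'] filter directly, and findtext() plays the role of xsl:value-of by returning a child's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact list
root = ET.fromstring(
    "<contacts>"
    "<contact><Lastname>Neito</Lastname><Firstname>Tem</Firstname></contact>"
    "<contact><Lastname>Doe</Lastname><Firstname>John</Firstname></contact>"
    "</contacts>"
)

# select="contact[Lastname='Neito']"
matches = root.findall("contact[Lastname='Neito']")

# xsl:value-of select="Firstname" for each selected contact
print([c.findtext("Firstname") for c in matches])   # ['Tem']
```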
XSL: Example (cont’d)

    var xmldoc = xmlData.cloneNode( true );

• Copies the xmlData object so that we don’t lose the original • true means copy recursively (deep copy)

    function sort( xsldoc ) {
       xmldoc.documentElement.transformNodeToObject(
          xsldoc.documentElement, xmlData.XMLDocument );
    }

• transformNodeToObject • Applies the XSL style sheet passed as the first argument to the node it is called on (and its descendants), and writes the result into the object passed as the second argument • documentElement gets the root element • XMLDocument accesses the XML document to which xmlData refers
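The deep-versus-shallow distinction behind cloneNode( true ) can be sketched with Python's standard xml.dom.minidom, whose nodes also provide cloneNode(deep). This is an illustration on a hypothetical document, not the MSXML object used in the slide: editing the deep copy leaves the original untouched, while a shallow clone copies only the element itself.

```python
from xml.dom import minidom

# Hypothetical document
doc = minidom.parseString("<contacts><contact>B</contact><contact>A</contact></contacts>")
original = doc.documentElement

deep = original.cloneNode(True)      # recursive copy, like cloneNode( true )
shallow = original.cloneNode(False)  # the element alone, without its children

deep.firstChild.firstChild.data = "changed"   # edit the copy's first text node

print(original.firstChild.firstChild.data)    # B  (original untouched)
print(len(shallow.childNodes))                # 0
```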
XML Resources • W3C XML Standards Body • http://www.w3c.org/xml • Microsoft Developer Network (MSDN) • http://msdn.microsoft.com/xml • The BizTalk Framework • http://www.biztalk.org • IBM’s XML Zone • http://www.ibm.com/developer/xml/