XML

Presentation Transcript


  1. XML http://www.flickr.com/photos/nics_events/2349632625/ Making your marks

  2. SGML and HTML • SGML is a meta-markup language • A language for making languages • Standardized in 1986 • HTML was specified with SGML • Fixed set of tags/attributes. Users cannot customize which means that HTML must have anticipated all possible documents and their structure. • Little structure. Tags can occur almost anywhere in any order • XML is a ‘lite’ beer (a stripped down version of SGML)

  3. XML • XML is not a replacement for HTML • HTML describes layout • XML can be used to describe anything (content or layout). The semantics are user-defined. • XML has no predefined tags • All documents described as XML documents can be parsed with a single parser (not so with SGML) • Our book refers to • TAG SET: an xml-based markup language. Others refer to this as an XML APPLICATION • XML PROCESSOR: a parser that provides xml data to a program • XML DOCUMENT: a document that conforms to XML

  4. XML Overview • XML provides no more than a baseline on which complex semantic models can be built. All those more restricted applications will share some common invariants. • An XML document is a linearization of a tree structure. • At every node in the tree there are several character strings. • The tree structure and the character strings together form the information content of an XML document. Almost everything will follow naturally from that. • Some of the characters in the document are only there to support the linearization, others are part of the information content.

  5. XML Overview <p> <q id="x7">The first q</q> <q id="x8">The second q</q> <q href="#x7">The third q</q> </p>
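
The "linearization of a tree" idea can be made concrete by parsing the slide's document and walking the result. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `describe` is made up for illustration, and the document is written on one line so that no whitespace-only text appears between elements.

```python
import xml.etree.ElementTree as ET

# The document from the slide, on one line (no whitespace between elements).
DOC = ('<p><q id="x7">The first q</q><q id="x8">The second q</q>'
       '<q href="#x7">The third q</q></p>')


def describe(element, depth=0):
    """Return one line per tree node; indentation shows the depth."""
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {element.text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

The angle brackets and quotes only support the linearization; the tag names, attribute values and text are the information content that survives in the tree.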

  6. XML Syntax • The rules for an XML tag set come in two parts • The low-level rules that apply to all XML documents • The rules that apply to a particular tag set. These rules are formalized as either a • Document type definition (DTD) • XML Schema • Generally the low-level rules are easily understood by those familiar with HTML • The tag-set-specific rules tend to be more complex.

  7. XML Tags • Elements in XML are denoted by tags • An element has a type (its name) and may have attributes and content • An element is written as an opening/closing tag pair, or as a single self-closing tag when it is empty • <BREWERS></BREWERS> • <BREWERS/> • A single element will typically have many children. • <BREWERS><PLAYER></PLAYER><PLAYER></PLAYER></BREWERS> • Attributes are name/value pairs • <BREWERS YEAR="2011"> • <PLAYER NUMBER="8">Ryan Braun</PLAYER> • <PLAYER NUMBER="28">Prince Fielder</PLAYER> • <MANAGER NUMBER="10">Ron Roenicke</MANAGER> • </BREWERS>

  8. XML/HTML Syntax • Attributes are name/value pairs that can be attached to an element. • In HTML, you only need to quote an attribute value if it contains a space, or a character that is not allowed in a name. • <body id=main> • In XML, attribute values must always be quoted. • <happiness type="joy" /> • Element types. • In HTML there is a built-in set of element names and allowed attributes. • In XML, there are no built-in names/attributes (a couple of exceptions). • Entities. • Since some characters have a special meaning in HTML (<, >, &, etc.), HTML provides a pre-defined set of character names. These are called 'entities'. • In XML, there are only five built-in character entities: &lt;, &gt;, &amp;, &quot; and &apos; for <, >, &, " and ' respectively. You can define your own entities in a Document Type Definition, or you can use any Unicode character.

  9. XML Syntax • An XML document should begin with an XML declaration (optional in XML 1.0, required in XML 1.1) <?xml version="1.1" encoding="utf-8"?> • XML Names • Must begin with a letter or underscore • Can include digits, hyphens and periods • No length limitations • Case-sensitive (CaSeSeNsItIvE) • Every document has a single root element. Its opening tag must be the first element tag in the document; only the XML declaration, a DOCTYPE, comments and processing instructions may precede it. The ‘root’ is the root node of the document tree.
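
The naming rules on this slide can be sketched as a regular expression. This is an ASCII-only approximation, assumed here for brevity: real XML names also admit colons and a large range of non-ASCII letters, which this pattern ignores. The function name `looks_like_xml_name` is hypothetical.

```python
import re

# ASCII-only approximation of the slide's rules: a letter or underscore
# first, then letters, digits, underscores, periods or hyphens.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def looks_like_xml_name(candidate):
    """True if the whole string fits the simplified naming rule."""
    return NAME_RE.fullmatch(candidate) is not None


for candidate in ["room-number", "_id", "v1.1", "2fast", "-x", ""]:
    print(repr(candidate), looks_like_xml_name(candidate))
```

Case sensitivity needs no extra rule: `Player` and `player` both pass the check and are simply two different names.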

  10. XML Syntax • An XML document that follows all of these low-level rules is ‘well formed’ <?xml version = "1.0" encoding = "utf-8" ?> <ad> <year> 1960 </year> <make> Cessna </make> <model> Centurian </model> <color> Yellow with white trim </color> <location> <city> Gulfport </city> <state> Mississippi </state> </location> </ad>
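
Well-formedness can be checked mechanically: any conforming parser must reject a document that breaks the low-level rules. A minimal sketch with Python's standard `xml.etree.ElementTree` follows; the helper `is_well_formed` and the shortened sample documents are illustrative, not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


GOOD = "<ad><year>1960</year><make>Cessna</make></ad>"
MISSING_CLOSE = "<ad><year>1960</year><make>Cessna</ad>"  # <make> never closed
UNQUOTED_ATTR = "<ad year=1960/>"                         # value not quoted
TWO_ROOTS = "<ad/><ad/>"                                  # more than one root

for sample in (GOOD, MISSING_CLOSE, UNQUOTED_ATTR, TWO_ROOTS):
    print(is_well_formed(sample), sample)
```

This checks syntax only; whether `ad` may contain `year` at all is a question for a DTD or schema.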

  11. XML Syntax • One question that always arises is when to use attributes and when to use a nested element. Issues to consider: • If the information in question could be itself marked up with elements, put it in an element. • If the information is suitable for attribute form, but could end up as multiple attributes of the same name on the same element, use child elements instead. • If the information has a standardized format, use an attribute (dates, times, serial numbers, identifiers, etc.). • If the information should not be normalized for white space, use elements. XML processors normalize attributes in ways that can change the raw text of the attribute value.
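
The white-space point can be observed directly. In this sketch (standard `xml.etree.ElementTree`; the element and attribute names are made up), the same text containing a newline and a tab is stored once as an attribute value and once as element content.

```python
import xml.etree.ElementTree as ET

# The same raw text, once in an attribute and once as element content.
DOC = '<note text="line one\n\tline two"><body>line one\n\tline two</body></note>'

note = ET.fromstring(DOC)
attr_value = note.get("text")          # newline and tab each become a space
element_value = note.find("body").text  # newline and tab are preserved

print(repr(attr_value))
print(repr(element_value))
```

The parser replaces each white-space character in the attribute value with a plain space, while the element content keeps the original characters.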

  12. Examples <player> <number>8</number> <name>Ryan Braun</name> </player> <player number="8"> <name> <first>Ryan</first> <last>Braun</last> </name> </player> <player number="8"> <name>Ryan Braun</name> </player> <player number="8" name="Ryan Braun"/>

  13. XML Document Structure • An XML document often refers to two other files • A document that specifies the structure • A document that specifies the style • The document structure is defined by • DTD • SCHEMA • A well-formed document is not necessarily valid. • Well-formed: conforms to the syntax rules of the XML specification. Denotes syntactic correctness. • Valid: well-formed and also conforms to the DTD/SCHEMA. Denotes correctness with respect to a particular tag set.

  14. DTD • Document Type Definitions • A set of declarations which specify elements and where these elements can appear • The DTD is not an XML document. A DTD is described in a special DTD language. • The DTD language relies heavily on regular-expressions and BNF-like notation.

  15. DTD • An XML document can refer to a DTD by using the DOCTYPE declaration. Note that DTD declarations begin with a bang (<!). • <!DOCTYPE root_element […]> • root_element is the name of the document's root element • The […] content is the DTD • The DOCTYPE declaration must • Be placed between the XML declaration and the root element • Name the root element (the DTD itself defines it)

  16. DTD • There are four possible declarations at the top level of a DTD: • ELEMENT: declares the name of an element and its structure • ATTLIST: declares the attributes of an element • ENTITY: declares an entity • NOTATION: declares a notation

  17. DTD ELEMENTS • What information would you have to give to specify an element's structure? • An element declaration specifies the name of an element and the element’s structure • #PCDATA forms the lower-level character data • General form: • <!ELEMENT element_name (list of child names)> • Example: • <!ELEMENT memo (from, to, date, re, body)> • The vertical bar can be used to indicate OR • <!ELEMENT contact (mother | father | caregiver)>

  18. DTD ELEMENTS • Child elements can have modifiers, +, *, ? which correspond to regular-expression multiplicities • * denotes zero-or-more occurrences • + denotes one-or-more occurrences • ? denotes zero-or-one occurrence (optional) • Example: <!ELEMENT person (parent+, age, spouse?, sibling*)> • Leaf nodes specify their content type: • #PCDATA: (parsed character data: entities will be expanded and if tags or markup appear they will be recognized [or parsed]) • EMPTY: (no content) • ANY: (can have any content) • CDATA (character data) is not an element content type in a DTD: it is an attribute type, and the same keyword marks <![CDATA[ … ]]> sections, inside which entities are not expanded and markup is not recognized • Example of a leaf declaration: <!ELEMENT name (#PCDATA)>
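
Because the modifiers correspond to regular-expression multiplicities, a simple content model can be translated into a regex over the sequence of child names. The sketch below is a hand-rolled illustration, not a DTD parser: it assumes ASCII names, handles only names, commas, `|`, parentheses and the three modifiers, and ignores `#PCDATA`, `EMPTY` and `ANY`. The function names are hypothetical.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def content_model_to_regex(model):
    """Translate e.g. '(parent+, age, spouse?)' into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Each child name becomes a group matching that name plus one space,
    # so +, * and ? keep their usual regex meaning on whole names.
    pattern = NAME.sub(lambda m: f"(?:{re.escape(m.group())} )", compact)
    # A comma means "followed by", which in a regex is plain concatenation.
    return re.compile(pattern.replace(",", ""))


def children_match(model, children):
    """True if the list of child element names satisfies the model."""
    encoded = "".join(f"{name} " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


PERSON = "(parent+, age, spouse?, sibling*)"
print(children_match(PERSON, ["parent", "parent", "age", "sibling"]))
print(children_match(PERSON, ["age"]))
```

The second call fails because `parent+` demands at least one `parent` child.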

  19. DTD Attributes • What information would you have to give to specify an element's attributes? • Attributes are defined by the ATTLIST declaration • <!ATTLIST elem_name att_name att_type modifier default_value> • There are ten att_types, we will use CDATA for now (others include ID, IDREF, IDREFS, ENTITY, ENTITIES…) • Modifiers: • #FIXED: the attribute always has the given default value • #REQUIRED: this attribute must be present • #IMPLIED: no default and not required

  20. DTD Attributes • <!ATTLIST car doors CDATA "4"> • <!ATTLIST car engine_type CDATA #REQUIRED> • <!ATTLIST car price CDATA #IMPLIED> • <!ATTLIST car make CDATA #FIXED "Ford"> • This DTD allows us to interpret the car element • A car is, by default, a 4-door • A car must have an engine_type • A car may have a price • The make of all cars is "Ford" • Example: <car doors = "2" engine_type = "V8"> ... </car> • Consider • <car year="1992" engine_type="V6"/> • <car make="GMC" doors="2" engine_type="V4" price="1235"/>
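
One way to read the four car declarations is as a small set of rules applied to each element's attributes. The sketch below hard-codes those rules in plain Python dictionaries; it does not parse the DTD, and the function name `interpret_car` and the wording of the messages are assumptions made for illustration.

```python
# Hand-written mirror of the four ATTLIST declarations for <car>.
DEFAULTS = {"doors": "4"}      # doors CDATA "4"
REQUIRED = {"engine_type"}     # engine_type CDATA #REQUIRED
OPTIONAL = {"price"}           # price CDATA #IMPLIED
FIXED = {"make": "Ford"}       # make CDATA #FIXED "Ford"
DECLARED = set(DEFAULTS) | REQUIRED | OPTIONAL | set(FIXED)


def interpret_car(attrs):
    """Return (effective attributes, list of validity problems)."""
    problems = []
    for name in attrs:
        if name not in DECLARED:
            problems.append(f"undeclared attribute: {name}")
    for name in sorted(REQUIRED - set(attrs)):
        problems.append(f"missing required attribute: {name}")
    for name, value in FIXED.items():
        if name in attrs and attrs[name] != value:
            problems.append(f"{name} is fixed to {value!r}")
    # Defaults and fixed values fill in whatever the element left out.
    effective = {**DEFAULTS, **FIXED, **attrs}
    return effective, problems


print(interpret_car({"doors": "2", "engine_type": "V8"}))
print(interpret_car({"year": "1992", "engine_type": "V6"}))
print(interpret_car({"make": "GMC", "doors": "2", "engine_type": "V4", "price": "1235"}))
```

Under this reading the first car is valid and picks up `make="Ford"`, the second uses an undeclared `year` attribute, and the third contradicts the fixed make.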

  21. DTD Entities • What information would it take to define a new entity? • Recall that when an entity occurs in an XML document, it is simply a textual replacement. • This is an &lt;example&gt; • Entity declaration syntax: <!ENTITY entity_name "entity_value"> • Example Declaration: <!ENTITY jfk "John Fitzgerald Kennedy"> • Example Use: • &jfk; was born in 1917.
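
The textual replacement can be seen by parsing a document that declares the entity in an internal DTD. This is a small sketch with the standard `xml.etree.ElementTree`; the root element name `bio` and the helper `parses` are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE bio [
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<bio>&jfk; was born in 1917. This is an &lt;example&gt;.</bio>"""

bio = ET.fromstring(DOC)
print(bio.text)


def parses(text):
    """True if the text is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# An entity reference needs both the leading & and the trailing semicolon.
print(parses("<bio>&jfk was born in 1917.</bio>"))
```

The parser hands the program the expanded text; the reference itself is no longer visible.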

  22. General and Parameter Entities • Two types of entities: • General entities (defined in the previous slide) can be used anywhere in the XML document • Parameter entities can be used only in the DTD • Parameter entity syntax: <!ENTITY % entity_name "entity_value"> • Example Declaration: <!ENTITY % abbr "bob | bill | sue | cindy"> • Example Use (parameter entities are referenced with % and ; here inside an external DTD; the element name is illustrative): • <!ELEMENT employee (%abbr;)>

  23. Internal DTD <?xml version="1.0"?><!DOCTYPE note [<!ELEMENT note (to,from,heading,body)><!ELEMENT to (#PCDATA)><!ELEMENT from (#PCDATA)><!ELEMENT heading (#PCDATA)><!ELEMENT body (#PCDATA)>]><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend</body></note>

  24. External DTD • The XML document: <?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> • The file note.dtd: <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

  25. <?xml version="1.0"?> <!DOCTYPE BOOK [ <!ELEMENT p (#PCDATA)> <!ELEMENT BOOK (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)> <!ELEMENT OPENER (TITLE_TEXT)*> <!ELEMENT TITLE_TEXT (#PCDATA)> <!ELEMENT SUBTITLE (#PCDATA)> <!ELEMENT INTRODUCTION (HEADER, p+)+> <!ELEMENT PART (HEADER, CHAPTER+)> <!ELEMENT SECTION (HEADER, p+)> <!ELEMENT HEADER (#PCDATA)> <!ELEMENT CHAPTER (CHAPTER_NUMBER, CHAPTER_TEXT)> <!ELEMENT CHAPTER_NUMBER (#PCDATA)> <!ELEMENT CHAPTER_TEXT (p)+> ]> <BOOK> <OPENER> <TITLE_TEXT>All About Me</TITLE_TEXT> </OPENER> <PART> <HEADER>Welcome To My Book</HEADER> <CHAPTER> <CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER> <CHAPTER_TEXT> <p>Glad you want to hear about me.</p> <p>There's so much to say!</p> <p>Where should we start?</p> <p>How about more about me?</p> </CHAPTER_TEXT> </CHAPTER> </PART> </BOOK>

  26. DTD Expressive limitations <!ELEMENT collection (description,recipe*)> <!ELEMENT description ANY> <!ELEMENT recipe (title,ingredient*,preparation,comment?,nutrition)> <!ELEMENT title (#PCDATA)> <!ELEMENT ingredient (ingredient*,preparation)?> <!ATTLIST ingredient name CDATA #REQUIRED amount CDATA #IMPLIED unit CDATA #IMPLIED> <!ELEMENT preparation (step*)> <!ELEMENT step (#PCDATA)> <!ELEMENT comment (#PCDATA)> <!ELEMENT nutrition EMPTY> <!ATTLIST nutrition protein CDATA #REQUIRED carbohydrates CDATA #REQUIRED fat CDATA #REQUIRED calories CDATA #REQUIRED alcohol CDATA #IMPLIED> • This DTD cannot express that: • The protein attribute must contain a non-negative number • The unit attribute should only be allowed when amount is present • The comment element should be allowed to appear anywhere • Nested ingredient elements should only be allowed when amount is absent

  27. XML Schema • DTDs have limitations • Syntax is not XML and requires a DTD-specific parser • Structural logic is not as expressive as sometimes needed • Limited data types • XML Schema • A document that describes the structure of a family of XML documents. • Identical in purpose to a DTD. • The structure of an XML schema document is itself defined by an XML schema. The namespace is given as • http://www.w3.org/2001/XMLSchema

  28. Namespaces • An XML document may use tags from multiple tag sets. • What if two tag sets have a tag that is defined differently in each tag set? When the tag is used, which tag set is being referred to? • Namespaces resolve conflicts by affixing a prefix to the actual tags • A namespace declaration has the form: • <elementName xmlns[:prefix]="URI"> • For example • <gmcars xmlns:gm="http://www.gm.com/names"> • The gm prefix is associated with tags in 'http://www.gm.com/names' for the gmcars tag and all of the gmcars content • Can now have XML elements such as • <gm:pontiac doors="12"/>

  29. Namespaces • Can have multiple namespaces of course • <cars xmlns:gm="http://www.gm.com/names" xmlns:ford="http://www.ford.com/names"> • Can now use elements such as • <gm:LaCrosse doors="4"/> • <ford:LaCrosse doors="8"/>
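
A parser resolves each prefix to its namespace URI, so the two LaCrosse elements end up with different names. The sketch below uses the standard `xml.etree.ElementTree`, which reports a qualified tag as `{URI}localname`; the closing `</cars>` tag is added here to make a complete document.

```python
import xml.etree.ElementTree as ET

DOC = ('<cars xmlns:gm="http://www.gm.com/names" '
       'xmlns:ford="http://www.ford.com/names">'
       '<gm:LaCrosse doors="4"/><ford:LaCrosse doors="8"/></cars>')

cars = ET.fromstring(DOC)
tags = [child.tag for child in cars]
print(tags)

# Prefixes used in a search are local to the program; only the URIs matter.
NS = {"gm": "http://www.gm.com/names", "ford": "http://www.ford.com/names"}
gm_doors = cars.find("gm:LaCrosse", NS).get("doors")
ford_doors = cars.find("ford:LaCrosse", NS).get("doors")
print(gm_doors, ford_doors)
```

The prefix is only shorthand: a document that bound the same URI to a different prefix would produce the same tag names.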

  30. XML Schema • Every schema has ‘schema’ as the root element • This element must specify the schema namespace • Each schema defines a tag set which is named via the targetNamespace attribute • With elementFormDefault="qualified", the elements of the tag set must be namespace-qualified in instance documents <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://charity.cs.uwlax.edu" elementFormDefault="qualified"> </xs:schema>

  31. XML Schema • The root element of a conforming XML document must then specify the namespaces it uses: • The default namespace (parsers now know which tag set this document uses) • The standard instance namespace (parsers now know to validate against a schema rather than a DTD) • The location of the schema, given as a namespace/location pair (parsers now know which schema to validate against) <?xml version="1.0"?> <classroom xmlns="http://charity.cs.uwlax.edu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://charity.cs.uwlax.edu http://charity.cs.uwlax.edu/classroom.xs"> <room-number>Wing 218</room-number> <capacity>12</capacity> <projector>YES</projector> <karaoke-machine>NO</karaoke-machine> </classroom>

  32. XML Schema • An XML Schema can define two types of elements • SIMPLE: elements that contain only text (a value of some simple type) and have no attributes or nested elements • COMPLEX: Of course, these are non-simple • Schemas have 44 defined data types • Primitives: string, boolean, float, base64Binary, date, etc. • Derived: byte, integer, positiveInteger, … • Derived data types • Are those types that are defined with respect to some other type • Users can define their own derived types

  33. Simple Elements • Just like a variable declaration in programming languages defines a name and a type, an XML element is declared by giving the name and type. • <xs:element name="XXXX" type="YYYY"/> • Common built-in type names: • xs:string • xs:decimal • xs:integer • xs:boolean • xs:date • xs:time

  34. Simple Elements • Consider the following schema declarations • <xs:element name="lastname" type="xs:string"/> • <xs:element name="age" type="xs:integer"/> • <xs:element name="birthdate" type="xs:date"/> • An XML document that uses this tag set could contain • <lastname>Xercesanthony</lastname> • <age>32</age> • <birthdate>1983-03-12</birthdate> • Note that an XML document that uses this tag set could not contain • <age>Thirty two</age> • <birthdate>Aug, third, Nineteen eighty three</birthdate>
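
The accept/reject examples above can be approximated with two small checks. This is a simplified sketch of the lexical rules, not a schema validator: it assumes plain ASCII digits and ignores the optional time zone and negative years that `xs:date` allows. The function names are hypothetical.

```python
import re
from datetime import date


def is_xs_integer(text):
    """Approximate xs:integer: optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None


def is_xs_date(text):
    """Approximate xs:date: YYYY-MM-DD naming a real calendar day."""
    text = text.strip()
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


print(is_xs_integer("32"), is_xs_integer("Thirty two"))
print(is_xs_date("1983-03-12"), is_xs_date("Aug, third, Nineteen eighty three"))
```

A schema processor applies checks of this kind to every typed element, which is why `<age>Thirty two</age>` is rejected even though it is well-formed.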

  35. Simple Elements • Simple elements may have a default value or a fixed value • Default values are automatically assigned if not provided • Fixed values are automatically assigned if not provided and, if specified, cannot be something other than the fixed value. • Schema examples: • <xs:element name="color" type="xs:string" default="red"/> • <xs:element name="pcolor" type="xs:string" fixed="red"/> • Instance examples: • <color>red</color> • <color>green</color> • <pcolor>red</pcolor> • <pcolor>green</pcolor> (invalid: differs from the fixed value)

  36. Complex Elements • A complex element is an XML element that contains other elements and/or has attributes • There are four kinds of complex elements • empty elements • <sku number="1234"/> • elements that contain only other elements • <name><first>Kenny</first><last>Hunt</last></name> • elements that contain only text (and carry attributes; without attributes such an element is simple) • <first verified="yes">Kenny</first> • elements that contain both elements and text • <chapter><title>Chapter 1</title>This is the story of…</chapter>

  37. Complex Elements • Consider an XML document that contains the following element: • <name><first>Kenny</first><last>Hunt</last></name> • What kind of information would have to be specified in order to define the structure of the "name" element? Would the following elements be valid? • <name><last>Hunt</last><first>Kenny</first></name> • <name><first>Kenny</first></name> • <name><last>Hunt</last><first>Kenny</first><mi>A</mi></name> • <name><first>Kenny</first><first>Kenneth</first><last>Hunt</last></name> • <name verified="yes"><first>Kenny</first><hunt>Hunt</hunt></name> • In order to know for sure which of the above are valid, must be able to define • The allowable children • The order of the children • The multiplicities (occurrences) of the children • The attributes that the element might take • This is done by defining a new type and then defining an element of that type

  38. Complex Types • To define a complex type you must give the type some structure and a name. • The basic syntax for defining a new type is: • <xs:complexType name="new_type_name"> • …. • </xs:complexType>

  39. Complex Types • The allowed children and ordering of the children within an element type is controlled by order indicators • <xs:all>…</xs:all> • An unordered list of elements referred to in the all (there are some significant constraints when using this one) • <xs:sequence>…</xs:sequence> • An ordered list of elements referred to in the sequence • <xs:choice>…</xs:choice> • Any one of the elements referred to in the choice

  40. Complex Types <xs:complexType name = "nametype"> <xs:sequence> <xs:element name = "first" type = "xs:string" /> <xs:element name = "last" type = "xs:string" /> </xs:sequence> </xs:complexType> <xs:complexType name = "nametype"> <xs:choice> <xs:element name = "first" type = "xs:string" /> <xs:element name = "last" type = "xs:string" /> </xs:choice> </xs:complexType> <xs:complexType name = "nametype"> <xs:all> <xs:element name = "first" type = "xs:string" /> <xs:element name = "last" type = "xs:string" /> </xs:all> </xs:complexType>
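
The three order indicators differ only in which lists of children they accept. The sketch below states each rule as a plain Python predicate over child element names, assuming the default of exactly one occurrence per declared element; it is an illustration of the rules, not an XML Schema implementation, and the function names are made up.

```python
DECLARED = ["first", "last"]


def matches_sequence(children, declared=DECLARED):
    """xs:sequence: every declared child, in the declared order."""
    return list(children) == list(declared)


def matches_choice(children, declared=DECLARED):
    """xs:choice: exactly one of the declared children."""
    return len(children) == 1 and children[0] in declared


def matches_all(children, declared=DECLARED):
    """xs:all: every declared child once, in any order."""
    return sorted(children) == sorted(declared)


for kids in (["first", "last"], ["last", "first"], ["first"]):
    print(kids, matches_sequence(kids), matches_choice(kids), matches_all(kids))
```

Only `xs:all` accepts `last` before `first`, and only `xs:choice` accepts a lone `first`.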

  41. Complex Elements • Note that we haven't defined a complex element but a complex type. • We can now declare elements of that type. The element declaration can take on two forms. • The type definition is a child element of the element definition • The type definition is referenced by an attribute of the element definition.

  42. Complex Elements <xs:element name="name"> <xs:complexType> <xs:sequence> <xs:element name = "first" type = "xs:string" /> <xs:element name = "last" type = "xs:string" /> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="name" type="nametype"/> <xs:complexType name="nametype"> <xs:sequence> <xs:element name = "first" type = "xs:string" /> <xs:element name = "last" type = "xs:string" /> </xs:sequence> </xs:complexType>

  43. Occurrence Indicator • The number of times an element can occur is constrained by occurrence indicator attributes. • minOccurs: • gives the minimum number of occurrences. Defaults to 1. • must be a non-negative integer • maxOccurs: • gives the maximum number of occurrences. Defaults to 1. • must be a non-negative integer or 'unbounded' <xs:element name="name"> <xs:complexType> <xs:sequence> <xs:element name="first" type="xs:string" minOccurs="1" maxOccurs="1" /> <xs:element name="last" type="xs:string" minOccurs="1" maxOccurs="1" /> </xs:sequence> </xs:complexType> </xs:element>

  44. Occurrence Indicator • Consider these variants. What is their interpretation? <xs:element name="name"> <xs:complexType> <xs:sequence> <xs:element name="first" type="xs:string" minOccurs="1" maxOccurs="unbounded" /> <xs:element name="last" type="xs:string" minOccurs="1" maxOccurs="1" /> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="name"> <xs:complexType> <xs:sequence minOccurs="2" maxOccurs="unbounded"> <xs:element name="first" type="xs:string" minOccurs="1" maxOccurs="unbounded" /> <xs:element name="last" type="xs:string" minOccurs="1" maxOccurs="1" /> </xs:sequence> </xs:complexType> </xs:element>
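
One possible reading of the two variants treats each as a regular expression over the child names: the first allows one or more `first` elements followed by a single `last`, and the second repeats that whole group at least twice. The sketch below encodes this reading with Python's `re`; the encoding of children as space-terminated names is an assumption made for the example.

```python
import re

# Variant 1: first{1,unbounded} then last{1,1}.
VARIANT_1 = re.compile(r"(?:first )+last ")
# Variant 2: the same group, itself repeated two or more times.
VARIANT_2 = re.compile(r"(?:(?:first )+last ){2,}")


def encode(children):
    """Join child names so that each one ends with a single space."""
    return "".join(f"{name} " for name in children)


def accepts(pattern, children):
    return pattern.fullmatch(encode(children)) is not None


print(accepts(VARIANT_1, ["first", "first", "last"]))
print(accepts(VARIANT_2, ["first", "last"]))
print(accepts(VARIANT_2, ["first", "last", "first", "first", "last"]))
```

Under this reading a single first/last pair satisfies the first variant but not the second, which needs at least two such groups.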

  45. Complex Elements • Consider making an element 'extensible' • The xs:any element allows an element of any type to appear. • This serves as a placeholder into which users of the tag set can place data of their own choosing. <xs:element name="name"> <xs:complexType> <xs:sequence> <xs:element name="first" type="xs:string" minOccurs="1" maxOccurs="1" /> <xs:element name="last" type="xs:string" minOccurs="1" maxOccurs="1" /> <xs:any minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <name><first>Kenny</first><last>Hunt</last><mi>A</mi></name> <name><first>Kenny</first><last>Hunt</last><alias>Kenneth</alias></name>

  46. Mixed Types • What if you wanted to specify an XML document that looked like: <letter> Dear Mr.<name>John Smith</name>. Your order <orderid>1032</orderid> will be shipped on <shipdate>2001-07-13</shipdate>. </letter> • This document has both text and elements as children. This is known as a 'mixed' type. <xs:element name="letter"> <xs:complexType mixed="true"> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="orderid" type="xs:positiveInteger"/> <xs:element name="shipdate" type="xs:date"/> </xs:sequence> </xs:complexType> </xs:element>
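
Mixed content shows up clearly once the letter is parsed: the text is interleaved with the child elements. In the standard `xml.etree.ElementTree` model used in this sketch, the text before the first child is the parent's `text` and the text following each child is that child's `tail`. The letter is written without the surrounding line breaks so the strings are easy to compare.

```python
import xml.etree.ElementTree as ET

DOC = ('<letter>Dear Mr.<name>John Smith</name>. Your order '
       '<orderid>1032</orderid> will be shipped on '
       '<shipdate>2001-07-13</shipdate>.</letter>')

letter = ET.fromstring(DOC)

# Text owned by the letter itself: its leading text plus each child's tail.
pieces = [letter.text] + [child.tail for child in letter]
# All character data in document order, child content included.
full_text = "".join(letter.itertext())

print(pieces)
print(full_text)
```

The three child elements carry the typed data; the surrounding text is what `mixed="true"` permits.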

  47. Element Attributes • The syntax for defining an attribute is nearly identical to the syntax for defining a simple element • <xs:attribute name="XXX" type="YYY"/> • Recall that simple elements can't have attributes • Attributes can also have default or fixed values. • Use the default attribute of xs:attribute • Use the fixed attribute of xs:attribute • Attributes can also be required • Use the use attribute of xs:attribute. Values are "optional" and "required" and "prohibited".

  48. Element Attributes • Examples: • <xs:attribute name="verified" type="xs:boolean"/> • <xs:attribute name="expiration" type="xs:date"/> • <xs:attribute name="verified" type="xs:boolean" use="required"/> • <xs:attribute name="verified" type="xs:boolean" default="false"/> (a default cannot be combined with use="required") • Attribute declarations follow the content model inside the complex type: • <xs:element name="name"> • <xs:complexType> • <xs:sequence> • <xs:element name="first" type="xs:string" minOccurs="1" maxOccurs="1" /> • <xs:element name="last" type="xs:string" minOccurs="1" maxOccurs="1" /> • <xs:any minOccurs="0"/> • </xs:sequence> • <xs:attribute name="verified" type="xs:boolean" use="required"/> • </xs:complexType> • </xs:element>

  49. XML Datatypes • There are many pre-defined data types • Derivative types can be formed by • Placing restrictions on the allowed values of another type • Listing values from another type • Building the union of values from other types • Data types have properties or “Facets” • Fundamental Facets: ordered, bounded, cardinality, numeric • Constraining Facets: length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minExclusive, minInclusive, totalDigits, fractionDigits, maxScale, minScale, Assertions, explicitTimeZone
