
Chapter 29

Chapter 29. Semistructured Data and XML Transparencies. Chapter Objectives: What semistructured data is. Concepts of the Object Exchange Model (OEM), a model for semistructured data. Basics of Lore, a semistructured DBMS, and its query language, Lorel. Main language elements of XML.

donovan

Presentation Transcript


  1. Chapter 29 Semistructured Data and XML Transparencies

  2. Chapter - Objectives • What semistructured data is. • Concepts of the Object Exchange Model (OEM), a model for semistructured data. • Basics of Lore, a semistructured DBMS, and its query language, Lorel. • Main language elements of XML. • Difference between well-formed and valid XML documents. • How Document Type Definitions (DTDs) can be used to define the valid syntax of an XML document.

  3. Chapter - Objectives • How Document Object Model (DOM) compares with OEM. • About other related XML technologies. • Limitations of DTDs and how the W3C XML Schema overcomes these limitations. • How RDF and RDF Schema provide a foundation for processing meta-data.

  4. DTD: XML Names and NMTOKEN • Name characters are letters, digits, hyphens, underscores, colons or full stops. • An NMTOKEN is any non-empty sequence of name characters. • NMTOKENS is a list of NMTOKENs separated by white space (space, tab, newline etc.). • Case is significant: PERSON and person are distinct names. • Attribute and element names are NMTOKENs with further restrictions: • Names cannot begin with a digit, hyphen or full stop; they must begin with a letter, underscore or colon. • Names cannot begin with xml (or any variant obtained by case changes); this prefix is reserved for the system.
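
The naming rules on this slide can be sketched as two regular expressions. This is an ASCII-only approximation: the XML 1.0 specification also admits many non-ASCII letters, which the sketch ignores, and the function names `is_nmtoken` and `is_name` are made up for illustration.

```python
import re

# ASCII-only approximation of the slide's rules (an assumption of this sketch).
NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:-]+")
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_nmtoken(text: str) -> bool:
    """True if text is a non-empty run of name characters."""
    return NMTOKEN_RE.fullmatch(text) is not None


def is_name(text: str) -> bool:
    """True if text is an NMTOKEN that may also serve as an element or attribute name."""
    if NAME_RE.fullmatch(text) is None:
        return False
    # Names starting with "xml" in any letter case are reserved.
    return not text.lower().startswith("xml")


print(is_nmtoken("2003-05-12"), is_name("2003-05-12"))
print(is_name("person"), is_name("PERSON"), is_name("XmlData"))
```

A token such as `2003-05-12` is a legal NMTOKEN but not a legal name, because it starts with a digit.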

  5. Element Declarations: EMPTY • The keyword ELEMENT introduces a new element: <!ELEMENT NAME CONTENT_MODEL> • An element name must begin with a letter (or an underscore or colon) and may additionally contain digits and the punctuation characters '.', '-', '_' and ':', as described earlier under NMTOKEN. • If an element can hold no child elements and no text, it is known as an empty element and its CONTENT_MODEL is EMPTY. • This seems trivial but it isn't, because the presence or absence of such an element in an XML file can be used as a flag. • HTML offers several examples, such as HR and IMG, which never have children and include no text. Here we would write <!ELEMENT HR EMPTY> and then <HR/> or <HR></HR> generates a horizontal line. • Empty elements can have attributes, such as the SRC attribute in <IMG/>, which specifies the source of the image.

  6. Element Declarations: ANY • An element declared with content ANY may contain any of the other elements declared in the DTD, as well as text. • This is not quite the same as having no DTD for the file: <!DOCTYPE fred [<!ELEMENT fred ANY >]> <fred> <people>Me and You</people> <people>Them</people></fred> • A validating parser reports an error here because the people element is not declared. • Adding <!ELEMENT people ANY > inside the DTD declaration produces a valid document.

  7. Entities • The DTD of an XML document can contain entity declarations. These are like macro substitutions in other languages. • Entities are defined in the DTD and come in several flavors: • General entities are referenced as &EntName; • Parameter entities are referenced as %EntName; • We have already seen the predefined character entities • &amp; for & • &apos; for ' • &gt; for > • &lt; for < • &quot; for " • These are built in, but you can add other such entities with • <!ENTITY aitself "A" > and &aitself; would be replaced by A

  8. General Entities • As another example, we can declare in the DTD <!ENTITY TODAY "May 12 2003" > and <comment>&TODAY; was very quiet in Irvine</comment> is parsed as <comment>May 12 2003 was very quiet in Irvine</comment> • General entity references can be nested inside a DTD, e.g., one can write <!ENTITY YEAR "2003" > <!ENTITY TODAY "May 12 &YEAR;" > • However, one must use parameter entities and not general entities for macro substitution in other DTD declarations such as <!ATTLIST and <!ELEMENT • Parameter entities are defined as in <!ENTITY % CUSTARDTAGS "(NAME,DATE,ORDERS)" >
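
A quick way to watch general entities expand is to parse the slide's example with Python's standard library. The sketch below assumes the expat-based `xml.etree.ElementTree` parser, which does not validate but does expand entities declared in the internal DTD subset; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The slide's two nested entity declarations, placed in an internal DTD subset.
DOC = """<!DOCTYPE comment [
  <!ENTITY YEAR "2003">
  <!ENTITY TODAY "May 12 &YEAR;">
]>
<comment>&TODAY; was very quiet in Irvine</comment>"""

root = ET.fromstring(DOC)
expanded = root.text  # &TODAY; and the nested &YEAR; are already substituted
print(expanded)
```

The application never sees `&TODAY;` itself: substitution happens inside the parser before the text is handed over.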

  9. Parameter Entities • <!ENTITY % peopletags "(firstname,lastname,dateofbirth)" > <!ELEMENT student %peopletags; > <!ELEMENT teacher %peopletags; > <!ELEMENT administrator %peopletags; > • Defines several people elements that all have the same child elements. • Note the space after the % in the declaration. Parameter-entity references inside a declaration, as used here, are allowed only in the external DTD subset, not in the internal one. • Parameter entities are even more commonly used for attributes, because several elements almost always share the same attributes (often a basic set that is augmented in different ways for different elements). • This basic set can be defined in a parameter entity.

  10. Defining Implied Attributes • Attributes must be declared in the DTD before they can be used. • "Implied" means that the attribute is optional and has no default value. • <!ELEMENT population (#PCDATA)> <!ATTLIST population year CDATA #IMPLIED> • The attribute year may be present or absent in the element population. Valid examples: • <population year="2000">80</population> • <population>80</population>

  11. Defining Required Attributes • <!ELEMENT population (#PCDATA)> <!ATTLIST population year CDATA #REQUIRED> • The population element must carry a year attribute: <population year="1996">80</population> • <!ELEMENT population (#PCDATA)> <!ATTLIST population year (2000|2001) #REQUIRED> • The population element must carry a year attribute whose value is 2000 or 2001: <population year="2000">80</population> • The enumeration values are written without quotes in the declaration.

  12. Defining Default Attributes • <!ELEMENT population (#PCDATA)> <!ATTLIST population year CDATA "2000"> • If the attribute is omitted, the parser supplies the default value 2000. • All of these are valid: • <population year="2001">80</population> • <population year="2000">80</population> • <population>80</population>
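
Attribute defaulting can be observed without a validating parser. The sketch below assumes the expat-based parser behind `xml.etree.ElementTree`, which applies defaults declared in the internal DTD subset; the helper `year_of` is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Internal DTD subset taken from the slide.
DTD = """<!DOCTYPE population [
  <!ELEMENT population (#PCDATA)>
  <!ATTLIST population year CDATA "2000">
]>
"""


def year_of(element_source: str) -> Optional[str]:
    """Parse one population element under the DTD and return its year attribute."""
    return ET.fromstring(DTD + element_source).get("year")


explicit = year_of('<population year="2001">80</population>')
defaulted = year_of('<population>80</population>')
print(explicit, defaulted)
```

Because this parser does not validate, it would not reject a value that violates a #FIXED or enumerated declaration; only the defaulting behaviour is shown here.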

  13. Defining Fixed Attributes • <!ELEMENT population (#PCDATA)> <!ATTLIST population year CDATA #FIXED "2000"> • Invalid: <population year="2001">80</population> • Valid: <population year="2000">80</population> • Valid: <population>80</population>

  14. Defining Unique Attributes • <!ELEMENT animal (name)> <!ATTLIST animal code ID #REQUIRED> • The value of the code attribute must be unique within the XML document. • <animal code="T50"><name>Lion</name> </animal> <animal code="T51"><name>Rabbit</name> </animal>

  15. Referring to Unique Attributes • <!ELEMENT website (url)> <!ATTLIST website animal_refer IDREF #REQUIRED> • The value of the animal_refer attribute must match the value of an ID attribute defined elsewhere in the document. • <website animal_refer="T50"> <url>http://www.lions.com</url> </website>

  16. Referring to Multiple Unique Attributes • <!ELEMENT website (url)> <!ATTLIST website contents IDREFS #REQUIRED> • The contents attribute contains a whitespace-separated list of IDs. • <website contents="T50 T51"> <url>http://www.animals.com</url> </website>
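
What a validating parser enforces for ID, IDREF and IDREFS can be approximated by hand. The sketch below is an assumption-laden stand-in rather than real DTD validation: the `zoo` wrapper element, the second `website` with the dangling reference `T99`, and the function `check_references` are all invented for illustration, and the attribute names are hard-coded from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the second website deliberately refers to a missing ID.
DOC = """<zoo>
  <animal code="T50"><name>Lion</name></animal>
  <animal code="T51"><name>Rabbit</name></animal>
  <website contents="T50 T51"><url>http://www.animals.com</url></website>
  <website contents="T50 T99"><url>http://www.lions.com</url></website>
</zoo>"""


def check_references(root):
    """Return (duplicate_ids, dangling_refs) for the code and contents attributes."""
    seen, duplicates = set(), []
    for animal in root.iter("animal"):
        code = animal.get("code")
        if code in seen:
            duplicates.append(code)  # an ID value used twice
        seen.add(code)
    dangling = [
        ref
        for site in root.iter("website")
        for ref in site.get("contents", "").split()  # IDREFS: whitespace-separated
        if ref not in seen
    ]
    return duplicates, dangling


duplicates, dangling = check_references(ET.fromstring(DOC))
print(duplicates, dangling)
```

Note that the check knows nothing about element types: any ID satisfies any IDREF, which is the "untyped" limitation discussed later in the deck.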

  17. XML Example - the DTD <!ELEMENT addressBook (person)+> <!ELEMENT person (name, email*, link?) > <!ATTLIST person id ID #REQUIRED > <!ATTLIST person gender (male|female) #IMPLIED> <!ELEMENT name (#PCDATA|family|given)*> <!ELEMENT family (#PCDATA)> <!ELEMENT given (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT link EMPTY > <!ATTLIST link manager IDREF #IMPLIED subordinates IDREF #IMPLIED>

  18. DOCTYPE declarations • Internal: the DTD is defined locally, inside the document • External: the DOCTYPE refers to a DTD in an external file • Both can be combined

  19. Internal DTD <?xml version="1.0" standalone="yes" ?> <!--open the DOCTYPE declaration - the open square bracket indicates an internal DTD--> <!DOCTYPE foo [ <!--define the internal DTD--> <!ELEMENT foo (#PCDATA)> <!--close the DOCTYPE declaration--> ]> <foo>Hello World.</foo>

  20. Internal DTD: rules • The document type declaration must be placed between the XML declaration and the first element (root element) in the document. • The keyword DOCTYPE must be followed by the name of the root element in the XML document. • The keyword DOCTYPE must be in upper case.

  21. External DTD • Useful for creating a common DTD that can be shared between multiple documents. • Any change made to the external DTD automatically applies to all the documents that reference it. • Two types: private and public. • Rules: • If the XML document uses any elements, attributes, or entities that are referenced or defined in an external DTD, standalone="no" must be included in the XML declaration.

  22. "Private" External DTDs • Identified by the keyword SYSTEM • Intended for use by a single author or group of authors. • Example: <!DOCTYPE root_element SYSTEM "DTD_location"> where DTD_location is a relative or absolute URL (for example one beginning with "http://" or "file://").

  23. "Private" External DTDs (cont) XML document: <?xml version="1.0" standalone="no" ?> <!DOCTYPE document SYSTEM "subjects.dtd"> <document> … </document> subjects.dtd: <!ELEMENT document …> …

  24. "Public" External DTDs • Identified by the keyword PUBLIC • Intended for broad use. <!DOCTYPE root_element PUBLIC "DTD_name" "DTD_location"> where: • DTD_location: relative or absolute URL • DTD_name: follows the syntax "prefix//owner_of_the_DTD//description_of_the_DTD//ISO 639_language_identifier" • "DTD_location" is used to find the public DTD if it cannot be located by the "DTD_name".

  25. "Public" External DTDs (cont) <?xml version="1.0" standalone="no" ?> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <HTML> <HEAD> <TITLE>A typical HTML file</TITLE> </HEAD> <BODY> … </BODY> </HTML>

  26. "Public" External DTDs (cont) • Valid DTD_name prefixes: • ISO: the DTD is an ISO standard. All ISO standards are approved. • +: the DTD is an approved non-ISO standard. • -: the DTD is an unapproved non-ISO standard.

  27. Combining Internal and External DTDs • A document can use both internal and external DTD subsets. • The internal DTD subset is specified between the square brackets of the DOCTYPE declaration. • The declaration for the external DTD subset is placed before the square brackets immediately after the SYSTEM keyword. • Declaring an ELEMENT with the same name in both the internal and external DTD subsets is invalid

  28. Example • XML document: <?xml version="1.0" standalone="no" ?> <!DOCTYPE document SYSTEM "subjects.dtd" [ <!ATTLIST assessment assessment_type (exam | assignment | prac) #IMPLIED> <!ELEMENT results (#PCDATA)> ]> • subjects.dtd: <!ELEMENT document (title*,subjectID,subjectname,prerequisite?, classes,assessment,syllabus,textbooks*)> <!ELEMENT prerequisite (subjectID,subjectname)> …

  29. DTD Validation • An XML document can be well-formed but still invalid under the rules of its DTD. • e.g. DTD rule: <!ELEMENT name (#PCDATA)> • Acceptable: <name> Giancarlo Succi </name> • Unacceptable: <name> <first_name> Giancarlo </first_name> <last_name> Succi </last_name> </name>
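
The gap between well-formed and valid can be made concrete with a small sketch. Python's standard library parser checks well-formedness only, so the validity rule is imitated by a hand-written function; `satisfies_pcdata_rule` and the `MALFORMED` sample are assumptions added for illustration and are no substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

ACCEPTABLE = "<name> Giancarlo Succi </name>"
UNACCEPTABLE = ("<name><first_name> Giancarlo </first_name>"
                "<last_name> Succi </last_name></name>")
MALFORMED = "<name> Giancarlo Succi </nam>"  # mismatched end tag


def is_well_formed(source: str) -> bool:
    """True if the text parses as XML at all."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


def satisfies_pcdata_rule(source: str) -> bool:
    """Stand-in for <!ELEMENT name (#PCDATA)>: the root has no child elements."""
    return len(ET.fromstring(source)) == 0


for sample in (ACCEPTABLE, UNACCEPTABLE):
    print(is_well_formed(sample), satisfies_pcdata_rule(sample))
print(is_well_formed(MALFORMED))
```

Both slide examples are well-formed; only the first also satisfies the content model.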

  30. Beyond DTDs… • DTD limitations • Simple document structures • Lack of “real” datatypes • Advanced schema languages • XML Schema • Relax NG • …

  31. Limitations of DTDs • No typing of text elements and attributes • All values are strings, no integers, reals, etc. • Difficult to specify unordered sets of subelements • Order is usually irrelevant in databases • (A | B)* allows specification of an unordered set, but • Cannot ensure that each of A and B occurs only once • IDs and IDREFs are untyped • The owners attribute of an account may contain a reference to another account, which is meaningless • owners attribute should ideally be constrained to refer to customer elements

  32. XML Schema • XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports • Typing of values • E.g. integer, string, etc • Also, constraints on min/max values • User defined types • Is itself specified in XML syntax, unlike DTDs • More standard representation, but verbose • Is integrated with namespaces • Many more features • List types, uniqueness and foreign key constraints, inheritance .. • BUT: significantly more complicated than DTDs.

  33. XML Schema – Simple Types • Elements that do not contain other elements or attributes are of type simpleType. <xsd:element name="STAFFNO" type="xsd:string"/> <xsd:element name="DOB" type="xsd:date"/> <xsd:element name="SALARY" type="xsd:decimal"/> • Attributes must be defined last: <xsd:attribute name="branchNo" type="xsd:string"/>

  34. XML Schema – Complex Types • Elements that contain other elements are of type complexType. • The list of children of a complex type is described by a sequence element. <xsd:element name="STAFFLIST"> <xsd:complexType> <xsd:sequence> <!-- children defined here --> </xsd:sequence> </xsd:complexType> </xsd:element>

  35. Cardinality • Cardinality of an element can be represented using the attributes minOccurs and maxOccurs. • To represent an optional element, set minOccurs to 0; to indicate there is no maximum number of occurrences, set maxOccurs to "unbounded". <xsd:element name="DOB" type="xsd:date" minOccurs="0"/> <xsd:element name="NOK" type="xsd:string" minOccurs="0" maxOccurs="3"/>
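
The effect of minOccurs and maxOccurs can be imitated with a simple occurrence count. This is a hand-rolled sketch rather than schema validation: the `STAFF` parent element, the `LIMITS` table and the function `cardinality_errors` are assumptions for illustration. Both attributes default to 1 in XML Schema, so DOB with only minOccurs="0" may appear at most once.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) per child element; None would stand for "unbounded".
LIMITS = {"DOB": (0, 1), "NOK": (0, 3)}


def cardinality_errors(parent):
    """List the children of parent whose occurrence count is out of range."""
    errors = []
    for tag, (low, high) in LIMITS.items():
        count = len(parent.findall(tag))
        if count < low or (high is not None and count > high):
            errors.append(f"{tag}: found {count}")
    return errors


# Hypothetical instance with one NOK element too many.
staff = ET.fromstring(
    "<STAFF><DOB>1970-01-01</DOB>"
    "<NOK>A</NOK><NOK>B</NOK><NOK>C</NOK><NOK>D</NOK></STAFF>"
)
errors = cardinality_errors(staff)
print(errors)
```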

  36. References • Can use references to element and attribute definitions. <xsd:element name="STAFFNO" type="xsd:string"/> …. <xsd:element ref="STAFFNO"/> • If there are many references to STAFFNO, using references keeps the definition in one place and improves the maintainability of the schema.

  37. Defining New Types • Can also define new data types to create elements and attributes. <xsd:simpleType name="STAFFNOTYPE"> <xsd:restriction base="xsd:string"> <xsd:maxLength value="5"/> </xsd:restriction> </xsd:simpleType> • The new type is defined as a restriction of string (to a maximum length of 5 characters).

  38. Groups • Can define both groups of elements and groups of attributes. A group is not a data type but acts as a container holding a set of elements or attributes. <xsd:group name="StaffType"> <xsd:sequence> <xsd:element name="StaffNo" type="StaffNoType"/> <xsd:element name="Position" type="PositionType"/> <xsd:element name="DOB" type="xsd:date"/> <xsd:element name="Salary" type="xsd:decimal"/> </xsd:sequence> </xsd:group>

  39. Constraints • XML Schema provides XPath-based features for specifying uniqueness constraints and corresponding reference constraints that will hold within a certain scope. <xsd:unique name="NAMEDOBUNIQUE"> <xsd:selector xpath="STAFF"/> <xsd:field xpath="NAME/LNAME"/> <xsd:field xpath="DOB"/> </xsd:unique>
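
The meaning of the NAMEDOBUNIQUE constraint can be sketched by evaluating the selector and the two fields by hand. The sample `STAFFLIST` data and the function `duplicate_keys` are invented for illustration, and the sketch assumes the constraint is declared on the element that contains the STAFF children.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first and third STAFF share surname and date of birth.
DOC = """<STAFFLIST>
  <STAFF><NAME><LNAME>White</LNAME></NAME><DOB>1945-10-01</DOB></STAFF>
  <STAFF><NAME><LNAME>Beech</LNAME></NAME><DOB>1960-11-10</DOB></STAFF>
  <STAFF><NAME><LNAME>White</LNAME></NAME><DOB>1945-10-01</DOB></STAFF>
</STAFFLIST>"""


def duplicate_keys(scope):
    """Return the (LNAME, DOB) pairs that occur more than once within scope."""
    seen, duplicates = set(), []
    for staff in scope.findall("STAFF"):  # selector xpath="STAFF"
        key = (staff.findtext("NAME/LNAME"), staff.findtext("DOB"))  # the two fields
        if None in key:
            continue  # xsd:unique only checks elements where every field is present
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


violations = duplicate_keys(ET.fromstring(DOC))
print(violations)
```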

  40. XML Schema Version of Bank <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="bank" type="BankType"/> <xsd:element name="account"> <xsd:complexType> <xsd:sequence> <xsd:element name="account-number" type="xsd:string"/> <xsd:element name="branch-name" type="xsd:string"/> <xsd:element name="balance" type="xsd:decimal"/> </xsd:sequence> </xsd:complexType> </xsd:element> ….. definitions of customer and depositor …. <xsd:complexType name="BankType"> <xsd:sequence> <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:schema>

  41. References http://www.java.sun.com/xml/docs/tutorial/TOC.html http://www.xml.com/pub/a/1999/09/expat/index.html http://xmlfiles.com/dtd/dtd_attributes.asp http://xmlwriter.net/xml_guide/doctype_declaration.shtml

  42. What is an XML Parsing API? • Programming model for accessing an XML document • Sits on top of an XML parsing engine • Language/platform independent

  43. Java XML Parsing Specification • The Java XML Parsing Specification is a request to include a standardised way of parsing XML in the Java standard library. • The specification defines the following packages: • javax.xml.parsers • org.xml.sax • org.xml.sax.helpers • org.w3c.dom • The first is an all-new pluggability layer; the others come from existing packages.

  44. Two ways of using XML parsers: SAX and DOM • The Java XML Parsing Specification specifies two interfaces for XML parsers: • Simple API for XML (SAX) is a flat, event-driven parser • Document Object Model (DOM) is an object-oriented parser which translates the XML document into a Java Object hierarchy

  45. SAX • Simple API for XML • Event-based XML parsing API • Not governed by any standards body • Guy named David Megginson basically owns it… • SAX is simply a programming model that the developers of individual XML parsers implement • SAX parser written in Java would expose the equivalent events • "serial access" protocol for XML

  46. SAX (cont) • A SAX parser reads the XML document as a stream of XML tags: • starting elements, ending elements, text sections, etc. • Every time the parser encounters an XML tag, it calls a method in its HandlerBase object to deal with the tag. • The HandlerBase object is usually written by the application programmer. (HandlerBase is the SAX 1 class; SAX 2 replaces it with DefaultHandler.) • The HandlerBase object is given as a parameter to the parse() method of the SAX parser. It includes all the code that defines what the XML tags actually "do".
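
The callback style described here can be tried out with Python's `xml.sax` module, where `ContentHandler` plays the role that HandlerBase plays in the Java slides. The sketch below is a minimal illustration, not the Java API: the class name `EventRecorder` is made up, and the address book follows the example used in this deck with whitespace removed so that only meaningful text is reported.

```python
import xml.sax

DOC = (b'<?xml version="1.0"?>'
       b"<addressbook>"
       b"<person><name>John Doe</name><email>jdoe@yahoo.com</email></person>"
       b"<person><name>Jane Doe</name><email>jdoe@mail.com</email></person>"
       b"</addressbook>")


class EventRecorder(xml.sax.ContentHandler):
    """Records every callback the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument", None))

    def endDocument(self):
        self.events.append(("endDocument", None))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # The parser may deliver one text run in several pieces, so merge them.
        if self.events and self.events[-1][0] == "characters":
            kind, text = self.events[-1]
            self.events[-1] = (kind, text + content)
        else:
            self.events.append(("characters", content))


recorder = EventRecorder()
xml.sax.parseString(DOC, recorder)  # the handler is passed to the parse call
for kind, value in recorder.events:
    print(kind, value if value is not None else "")
```

The parser keeps no tree in memory; whatever the application needs to remember must be stored by the handler itself, as the `events` list does here.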

  47. How Does SAX work? • Diagram: the Parser reads the XML Document from top to bottom and calls the SAX Objects with one event per construct. • <?xml version="1.0"?> → startDocument • <addressbook> → startElement • <person> → startElement • <name>John Doe</name> → startElement & characters • <email>jdoe@yahoo.com</email> → startElement & characters • </person> → endElement • <person> → startElement • <name>Jane Doe</name> → startElement & characters • <email>jdoe@mail.com</email> → startElement & characters • </person> → endElement • </addressbook> → endElement & endDocument

  48. SAX structure

  49. SAX tutorial http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/index.html Notes: some files are at http://www.ics.uci.edu/~ics185/handouts/slides13-sax/

  50. More info about SAX • Read the tutorial http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/index.html
