XML & XML Query
Ling Wang, Luping Ding
Introduction
• The Web opens new challenges in:
  - information technology
  - database frameworks.
• Why?
  - Data sources on the Web do NOT typically conform to any well-known structure.
  - Traditional database technology is not adequate for dealing with rich data,
    e.g., audio, video, nested data structures, …
Features of Web Data
• Web data is semistructured, with these characteristics:
  - Object-like: a collection of complex objects, as in CODM.
  - Schema-less: does not typically conform to any traditional type structure.
  - Self-describing: the meaning of the data is carried along with the data itself.
• So we need new database technologies to support these Web-based applications.
What is XML?
• XML ---- Extensible Markup Language
  - A markup language for documents containing structured information.
  - A universal format for structured documents and data on the Web.
  - An HTML-like language.
• The XML specification defines a standard way to add markup to documents.
• Key notions: structured information, markup language.
What is XML ---- example
An XML example for customer information:
<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>
XML vs. HTML?
Overview of XML
• Mechanisms for specifying document structure (a set of rules for structuring an XML document):
  - DTD ---- Document Type Definition language (part of the XML standard)
  - XML Schema ---- a more recent specification
• Query languages for XML:
  - XPath, XSLT, XQuery
Basic concept in XML ---- elements & attributes
• XML element
  - Any properly nested piece of text of the form <sometag>…</sometag>.
  - e.g., <street>7301 Smokey Boulevard</street>
• XML attributes
  - Also a tool for representing data.
  - e.g., <customer-details id="AcPharm39156"> </customer-details>
(Figure: the examples are annotated with element name and content, and with attribute name and value.)
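To make the element/attribute distinction concrete, here is a minimal sketch in Python. Python is our choice for illustration; the slides use no programming language. It reads the customer example with the standard library's xml.etree.ElementTree. Attribute values come from the start tag via `get()`, while element content is the text between the tags. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The customer record from the earlier example slide.
CUSTOMER_XML = """
<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>
"""

root = ET.fromstring(CUSTOMER_XML.strip())

# Attributes: name/value pairs attached to a start tag.
customer_id = root.get("id")
country = root.find("address").get("country")

# Element content: the text between <sometag> and </sometag>.
street = root.find("address/street").text

# Elements nest properly, so the document is a tree of child elements.
address_fields = [child.tag for child in root.find("address")]

print(customer_id, country, street)
print(address_fields)  # ['street', 'city', 'state', 'postal']
```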
Basic concept in XML ---- namespace
• Namespaces
  - Why? Element names in XML are not fixed, so names chosen by different authors can conflict.
  - How? Different authors use different namespace identifiers for different domains.
• The general structure is “namespace:local-name” (in a document, a short prefix declared with xmlns:prefix stands in for the namespace):
  - Namespace ---- a URI (uniform resource identifier): a URL (uniform resource locator) or a URN (uniform resource name).
  - Local name ---- same form as regular XML tags, but must not contain a “:”.
Basic concept in XML ---- namespace
• An example of namespaces:
<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>African Coffee Table</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
</item>
• The unprefixed xmlns attribute declares the default namespace: unprefixed elements such as item, name, and feature belong to http://www.acmeinc.com/jp#supplies.
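The namespace example can be checked with a namespace-aware parser. In this Python sketch, both namespace declarations sit inside the `<item>` start tag, as the XML syntax requires. ElementTree expands every qualified name to `{namespace-URI}local-name`, so the outer `item` and the inner `toy:item` no longer collide. The prefixes `sup` and `toy` in the lookup map are our own choice. Only the URIs matter, which is the point of namespace-aware processing.

```python
import xml.etree.ElementTree as ET

# Both xmlns declarations belong inside the <item> start tag.
DOC = """<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>African Coffee Table</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
</item>"""

root = ET.fromstring(DOC)

# Qualified names are expanded to "{URI}local-name".
print(root.tag)  # {http://www.acmeinc.com/jp#supplies}item

# Queries bind their own prefixes to the same URIs; the prefixes used
# in the document itself do not have to match.
ns = {
    "sup": "http://www.acmeinc.com/jp#supplies",
    "toy": "http://www.acmeinc.com/jp#toys",
}
table_name = root.find("sup:name", ns).text
toy_item = root.find("sup:feature/toy:item", ns)
toy_name = toy_item.find("toy:name", ns).text
print(table_name, "/", toy_name)  # African Coffee Table / cyberpet
```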
DTD ---- Document Type Definitions
• Why DTD?
  - An XML file can carry a description of its own format with it.
  - Independent groups of people can agree on a DTD for interchanging data.
  - Applications can verify data received from the outside world,
  - and can also verify their own data.
• How?
  - Internal DTD, included in the XML source file:
    <!DOCTYPE root-element [element-declarations]>
  - External DTD, kept outside the XML source file:
    <!DOCTYPE root-element SYSTEM "filename">
DTD ---- example
• Example XML document with an internal DTD:
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>
DTD ---- example
• XML document with an external DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
• "note.dtd", containing the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
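Python's standard-library ElementTree parses documents but does not validate them against a DTD. To show what a validating parser checks for the note DTD, this sketch hand-codes its rules:

- The sequence `(to,from,heading,body)` is compared against the children in order.
- `(#PCDATA)` means each child holds character data only.
- Element-only content means no stray text directly inside `<note>`.

`conforms_to_note_dtd` is a hypothetical helper written for this illustration, not a library API and not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Hand-coded equivalent of note.dtd (illustration only):
#   <!ELEMENT note (to,from,heading,body)>
#   <!ELEMENT to (#PCDATA)>, and likewise for from, heading, body
NOTE_CHILDREN = ["to", "from", "heading", "body"]

def conforms_to_note_dtd(xml_text: str) -> bool:
    """Return True if the document matches the note content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    children = list(root)
    # (to,from,heading,body): exactly these four children, in this order.
    if [child.tag for child in children] != NOTE_CHILDREN:
        return False
    # Element content: only whitespace may appear between the children.
    if (root.text or "").strip() or any((c.tail or "").strip() for c in children):
        return False
    # (#PCDATA): character data only, so no nested child elements.
    return all(len(child) == 0 for child in children)

doc = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""
print(conforms_to_note_dtd(doc))  # True
```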
DTD ---- Inadequacy
• Inadequacy of DTD:
  - Not designed with namespaces in mind.
  - Uses a syntax quite different from that of XML documents.
  - Offers a very limited set of basic types.
  - Provides only limited means for expressing data consistency constraints:
    - No keys.
    - Referential integrity is weak: attributes can be of type ID, IDREF, or IDREFS, but there is nothing comparable for elements.
DTD ---- Inadequacy
• Inadequacy of DTD (cont.):
  - No way of enforcing referential integrity for elements.
  - To state that the order of elements is immaterial, you must list every ordering as an alternative, e.g., ((a,b)|(b,a)). This becomes unmanageable as the number of elements grows.
  - Element definitions are global to the entire document.
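A quick way to see why listing alternatives does not scale is to generate them. This Python sketch builds a DTD-style content model that accepts n element names in any order, each exactly once, by enumerating all n! permutations. `unordered_content_model` is a hypothetical helper.

For n ≥ 3 the flat list is not a deterministic content model, which the XML 1.0 compatibility rules expect. A real DTD would have to factor out common prefixes, but the number of orderings it must cover still grows as n!.

```python
from itertools import permutations

def unordered_content_model(names: list[str]) -> str:
    """DTD content model accepting `names` in any order, each exactly once."""
    alternatives = ["(" + ",".join(p) + ")" for p in permutations(names)]
    return "(" + "|".join(alternatives) + ")"

print(unordered_content_model(["a", "b"]))  # ((a,b)|(b,a))

# Number of alternatives for n elements: n! (2, 6, 24, 120, 720).
for n in range(2, 7):
    names = [f"e{i}" for i in range(n)]
    print(n, unordered_content_model(names).count("|") + 1)
```

XML Schema's `xsd:all` group, covered later, expresses the same constraint with one declaration per element.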
XML Schema
• XML Schemas: an attempt to solve all of these DTD problems
  - Powerful data typing
  - Range checking
  - Namespace-aware validation based on namespace URIs rather than on prefixes
  - Extensibility and scalability
XML Schema ---- example
• A simple XML Schema example:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="SONG" type="SongType"/>
  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE" type="xsd:string"/>
      <xsd:element name="COMPOSER" type="xsd:string"/>
      <xsd:element name="PRODUCER" type="xsd:string"/>
      <xsd:element name="PUBLISHER" type="xsd:string"/>
      <xsd:element name="LENGTH" type="xsd:string"/>
      <xsd:element name="YEAR" type="xsd:string"/>
      <xsd:element name="ARTIST" type="xsd:string"/>
      <xsd:element name="PRICE" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
XML Schema ---- example
• The root element is “schema”.
• Schema namespace ---- http://www.w3.org/2001/XMLSchema, conventionally bound to the prefix xsd or xs.
• Elements are declared with xsd:element.
  - Element types are divided into simple types and complex types.
  - A simple-type element can contain only text: it has no attributes and no child elements.
  - Syntax: <xs:element name="name" type="type"/>
  - Example: <xs:element name="to" type="xs:string"/>
XML Schema ---- example
• A complex type defines a new type that can have attributes and child elements, which makes it very flexible.
• Syntax:
<xs:element name="name">
  <xs:complexType>
    … element content …
  </xs:complexType>
</xs:element>
• Example:
<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
XML Schema ---- features
• Simple types
  - 44 built-in simple types in the W3C XML Schema language.
  - Divided into seven groups:
    - Numeric types
    - Time types
    - XML types
    - String types
    - The boolean type
    - The URI reference type
    - The binary types
XML Schema ---- features
• Deriving simple types
  - You are not limited to the 44 built-in simple types.
  - New data types can be created by deriving them from existing types, e.g., by restricting a type to a subset of its normal values.
  - Example: a schema that derives a Str255 data type from xsd:string:
<xsd:simpleType name="Str255">
  <xsd:restriction base="xsd:string">
    <xsd:minLength value="1"/>
    <xsd:maxLength value="255"/>
  </xsd:restriction>
</xsd:simpleType>
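Restriction by facets can be read as "take the base type's values and keep only those that pass every facet". This Python sketch models only the two length facets used by Str255. `StringRestriction` is a hypothetical class for illustration, not part of any schema library. XML Schema counts string length in characters, and `len()` on a Python `str` counts code points, which serves that purpose here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StringRestriction:
    """Length facets of an xsd:restriction whose base is xsd:string."""
    min_length: int = 0
    max_length: Optional[int] = None

    def accepts(self, value: str) -> bool:
        # A value is in the derived type only if it passes every facet.
        n = len(value)
        if n < self.min_length:
            return False
        return self.max_length is None or n <= self.max_length

# Str255: xsd:string restricted to 1..255 characters.
STR255 = StringRestriction(min_length=1, max_length=255)

print(STR255.accepts("Acme Pharmaceuticals Co."))  # True
print(STR255.accepts(""))                          # False
```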
XML Schema ---- features
• Creating enumerated types
• Example:
<xsd:simpleType name="PublisherType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Warner-Elektra-Atlantic"/>
    <xsd:enumeration value="Universal Music Group"/>
    <xsd:enumeration value="Sony Music Entertainment,Inc."/>
    <xsd:enumeration value="Capitol Records, Inc."/>
    <xsd:enumeration value="BMG Music"/>
  </xsd:restriction>
</xsd:simpleType>
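An enumeration facet turns a type into a fixed set of allowed literals. In this Python sketch the PublisherType values are copied verbatim from the schema above, and `is_publisher_type` is a hypothetical helper. Because xsd:string preserves whitespace, the match is exact: case and trailing spaces both matter.

```python
# Allowed values, copied verbatim from the PublisherType enumeration facets.
PUBLISHER_TYPE = frozenset({
    "Warner-Elektra-Atlantic",
    "Universal Music Group",
    "Sony Music Entertainment,Inc.",
    "Capitol Records, Inc.",
    "BMG Music",
})

def is_publisher_type(value: str) -> bool:
    # Enumeration: the value must equal one of the listed literals.
    # xsd:string does not normalize whitespace, so "BMG Music " fails.
    return value in PUBLISHER_TYPE

print(is_publisher_type("BMG Music"))  # True
print(is_publisher_type("bmg music"))  # False
```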
XML Schema ---- features
• Creating new types by joining existing types through a union
• Example:
<xsd:simpleType name="MoneyOrDecimal">
  <xsd:union>
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
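A union accepts a value if any of its member types accepts it. This Python sketch mirrors MoneyOrDecimal under a few assumptions:

- Member one uses the XML Schema 1.0 lexical form of xsd:decimal, after whitespace collapsing.
- Member two mirrors the pattern facet. XML Schema patterns are implicitly anchored, and Python's `re` has no `\p{Sc}`/`\p{Nd}` classes, so the sketch checks Unicode categories with `unicodedata.category` instead.

The helper names are hypothetical.

```python
import re
import unicodedata

# Member 1: lexical space of xsd:decimal (XML Schema 1.0).
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def _is_decimal(value: str) -> bool:
    # xsd:decimal collapses whitespace, so surrounding blanks are ignored.
    return _DECIMAL.fullmatch(value.strip(" \t\r\n")) is not None

def _all_nd(chars: str) -> bool:
    return all(unicodedata.category(c) == "Nd" for c in chars)

def _is_money(value: str) -> bool:
    # Member 2: \p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?, anchored at both ends.
    if len(value) < 2 or unicodedata.category(value[0]) != "Sc":
        return False
    whole, dot, frac = value[1:].partition(".")
    if not whole or not _all_nd(whole):
        return False
    return not dot or (len(frac) == 2 and _all_nd(frac))

def is_money_or_decimal(value: str) -> bool:
    # Union semantics: valid if any member type accepts the value.
    return _is_decimal(value) or _is_money(value)

for v in ["12.5", "$12.50", "€7", "$12.5", "twelve"]:
    print(v, is_money_or_decimal(v))
```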
XML Schema ---- features
• Namespaces
  - http://www.w3.org/2001/XMLSchema
    The namespace that identifies the element and attribute names used in a schema. These names are understood by all schema-aware XML processors.
  - http://www.w3.org/2001/XMLSchema-instance
    A small number of special names used in instance documents, not in schemas.
  - Target namespace
    The set of names defined by a particular schema document, i.e., the user-defined names that are to be used in instance documents.
XML Schema ---- features
• Grouping
  - Does order really matter?
  - How?
    - xsd:all group ---- each element in the group occurs at most once (exactly once unless minOccurs="0"), in any order.
    - xsd:choice group ---- exactly one element from the group appears (with default minOccurs/maxOccurs).
    - xsd:sequence group ---- each element in the group appears exactly once, in the specified order (with default minOccurs/maxOccurs).
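The three compositors differ only in how they constrain the list of child element names. This Python sketch checks a flat list of child tags against each compositor, assuming default minOccurs/maxOccurs on the members except where `optional` marks an xsd:all member as minOccurs="0". The function names are hypothetical. A real schema processor also handles nesting and per-member occurrence counts.

```python
from collections import Counter

def matches_sequence(children: list[str], names: list[str]) -> bool:
    # xsd:sequence: every name exactly once, in the declared order.
    return list(children) == list(names)

def matches_choice(children: list[str], names: list[str]) -> bool:
    # xsd:choice: exactly one element, taken from the group.
    return len(children) == 1 and children[0] in names

def matches_all(children: list[str], names: list[str], optional=()) -> bool:
    # xsd:all: each name at most once, in any order; names not listed
    # in `optional` (minOccurs="0") must be present.
    counts = Counter(children)
    if any(tag not in names or n > 1 for tag, n in counts.items()):
        return False
    return all(name in counts for name in names if name not in optional)

print(matches_all(["FAMILY", "GIVEN"], ["GIVEN", "FAMILY"]))       # True
print(matches_sequence(["FAMILY", "GIVEN"], ["GIVEN", "FAMILY"]))  # False
```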
XML Schema ---- features
• Example of an xsd:all group:
<xsd:complexType name="PersonType">
  <xsd:sequence>
    <xsd:element name="NAME">
      <xsd:complexType>
        <xsd:all>
          <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/>
          <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        </xsd:all>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
XML Schema ---- features
• Example of an xsd:choice group:
<xsd:complexType name="SongType">
  <xsd:sequence>
    <xsd:element name="TITLE" type="xsd:string"/>
    <xsd:choice>
      <xsd:element name="COMPOSER" type="PersonType"/>
      <xsd:element name="PRODUCER" type="PersonType"/>
    </xsd:choice>
    <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0"/>
    <xsd:element name="LENGTH" type="xsd:string"/>
    <xsd:element name="YEAR" type="xsd:string"/>
    <xsd:element name="ARTIST" type="xsd:string" maxOccurs="unbounded"/>
    <xsd:element name="PRICE" type="xsd:string" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>
XML Schema ---- features
• Schemas address these limitations of DTDs:
  - a strange, non-XML syntax
  - namespace incompatibility
  - lack of data typing
  - limited extensibility and scalability.
• XML Schemas offer:
  - Powerful data typing
  - Range checking
  - Namespace-aware validation based on namespace URIs rather than on prefixes
  - Extensibility and scalability
XML Constraints ---- DTD
• DTD
  - No keys, and referential integrity is weak.
  - Attribute types ID, IDREF, IDREFS:
    - ID ---- a unique value
    - IDREF ---- a valid ID declared in the same document
    - IDREFS ---- a space-separated list of valid IDs
  - But all of these are string-based, and an IDREF may point to any element that has an ID.
  - Elements: no corresponding mechanism.
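The ID/IDREF rules amount to two checks: ID values are unique, and every IDREF (or each token of an IDREFS) names some declared ID. This Python sketch runs both checks on a hypothetical document. ElementTree does not read attribute types from a DTD, so which attributes count as ID, IDREF, or IDREFS is assumed in the three sets below. `check_id_refs` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; attribute types are assumed, not read from a DTD.
DOC = """<library>
  <author id="a1">Abiteboul</author>
  <author id="a2">Suciu</author>
  <book id="b1" authors="a1 a2"/>
  <review about="b9"/>
</library>"""

ID_ATTRS = {"id"}           # declared as ID
IDREF_ATTRS = {"about"}     # declared as IDREF
IDREFS_ATTRS = {"authors"}  # declared as IDREFS

def check_id_refs(root):
    """Return (duplicate ID values, references to undeclared IDs)."""
    ids, duplicates, refs = set(), [], []
    for el in root.iter():
        for name, value in el.attrib.items():
            if name in ID_ATTRS:
                if value in ids:
                    duplicates.append(value)
                ids.add(value)
            elif name in IDREF_ATTRS:
                refs.append(value)
            elif name in IDREFS_ATTRS:
                refs.extend(value.split())  # space-separated list of IDs
    # Resolve references only after all IDs have been collected.
    dangling = [r for r in refs if r not in ids]
    return duplicates, dangling

print(check_id_refs(ET.fromstring(DOC)))  # ([], ['b9'])
```

Note that `about="a1"` would pass, even though `a1` identifies an author rather than a book. An IDREF only requires that some element carries that ID, which is the weakness noted above.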
XML Constraints ---- Schema
• XML keys:
  - Similar to SQL keys, but more complicated:
    - complex structures
    - a key might be composed of a sequence of values
    - these values may be located at different depths inside an element.
  - Two forms:
    - the unique element ---- like an SQL UNIQUE constraint
    - the key element ---- like an SQL PRIMARY KEY (values must be present, i.e., not null)
  - e.g.:
<key name="PrimaryKeyForClass">
  <selector xpath="Classes/Class"/>
  <field xpath="CrsCode"/>
  <field xpath="Semester"/>
</key>
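The key declaration works in two steps. The selector picks the nodes to constrain, and the fields pick the value tuple that identifies each node. This Python sketch applies PrimaryKeyForClass to a hypothetical instance, assuming the key is declared on a `<Report>` element so that `Classes/Class` is evaluated relative to it. `identity_violations` is a hypothetical helper. It compares values as strings, whereas real schema processors compare typed values.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; the key is assumed to be declared on <Report>.
DOC = """<Report>
  <Classes>
    <Class><CrsCode>CS308</CrsCode><Semester>F1997</Semester></Class>
    <Class><CrsCode>CS308</CrsCode><Semester>F1998</Semester></Class>
    <Class><CrsCode>CS308</CrsCode><Semester>F1997</Semester></Class>
  </Classes>
</Report>"""

def identity_violations(root, selector, fields, is_key=True):
    """Check an xsd:key (is_key=True) or xsd:unique (is_key=False)."""
    seen, violations = set(), []
    for node in root.findall(selector):
        # findtext returns None when the field element is absent.
        value = tuple(node.findtext(f) for f in fields)
        if None in value:
            # key: every field must be present; unique: the node is skipped.
            if is_key:
                violations.append(value)
            continue
        if value in seen:
            violations.append(value)
        seen.add(value)
    return violations

root = ET.fromstring(DOC)
print(identity_violations(root, "Classes/Class", ["CrsCode", "Semester"]))
# [('CS308', 'F1997')]
```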
XML Constraints ---- Schema
• Foreign keys:
  - e.g.:
<element name="…">
  <complexType>
    …
  </complexType>
  <keyref name="NoBogusTranscripts" refer="adm:PrimaryKeyForClass">
    <selector xpath="Students/Student/CrsTaken"/>
    <field xpath="@CrsCode"/>
    <field xpath="@Semester"/>
  </keyref>
</element>
  - Identity constraints (key, unique, keyref) are declared inside an element declaration, after its type.
• Powerful?
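A keyref says that every value tuple picked out by its own selector and fields must also occur among the values of the referenced key. This Python sketch checks NoBogusTranscripts against PrimaryKeyForClass on a hypothetical `<Report>` instance. It assumes both constraints are declared on `<Report>`, and the helper names are hypothetical. As with xsd:keyref, a node that lacks one of the fields is not checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; both constraints are assumed declared on <Report>.
DOC = """<Report>
  <Classes>
    <Class><CrsCode>CS308</CrsCode><Semester>F1997</Semester></Class>
    <Class><CrsCode>MAT123</CrsCode><Semester>F1997</Semester></Class>
  </Classes>
  <Students>
    <Student><CrsTaken CrsCode="CS308" Semester="F1997"/></Student>
    <Student><CrsTaken CrsCode="CS999" Semester="F1997"/></Student>
  </Students>
</Report>"""

def class_keys(root):
    # PrimaryKeyForClass: selector Classes/Class, fields CrsCode, Semester.
    return {(c.findtext("CrsCode"), c.findtext("Semester"))
            for c in root.findall("Classes/Class")}

def bogus_transcripts(root):
    # NoBogusTranscripts: selector Students/Student/CrsTaken,
    # fields @CrsCode and @Semester must match an existing key value.
    keys = class_keys(root)
    bad = []
    for taken in root.findall("Students/Student/CrsTaken"):
        ref = (taken.get("CrsCode"), taken.get("Semester"))
        if None not in ref and ref not in keys:
            bad.append(ref)
    return bad

print(bogus_transcripts(ET.fromstring(DOC)))  # [('CS999', 'F1997')]
```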
Question
• Is the XML data model relational or object-relational?
• Is XML a database?
Part II
• XML Query Language
• The counterpart of SQL in the XML world
XML Query Language
• Desired characteristics (also requirements) for an XML query language
• A good candidate: the XQuery language
• Use cases for the XQuery language
Desired Characteristics
• XML Output
• Declarative ---- what has to be done?
• Query Operations
• No Schema Required
• Preserve Order and Association
• Mutual Embedding with XML
• Support for New Datatypes
• Suitable for Metadata
• Ability to add update capabilities in future versions
Details
• XML Output
  - defines derived databases (virtual views)
  - provides transparency to applications (why?)
• Declarative: the XML query language MUST be declarative, like SQL
  - it specifies what has to be done
  - it MUST NOT enforce a particular evaluation strategy
Details (cont.)
• Query Operations
  - Projection, selection, join, and restructuring should all be possible in a single XML query (why?)
  - for optimization reasons
Query Operations
Example - Sample Data
<bib>
  <book year="1999" isbn="1-55860-622-X">
    <title>Data on the Web</title>
    <author>Abiteboul</author>
    <author>Buneman</author>
    <author>Suciu</author>
  </book>
  <book year="2001" isbn="1-XXXXX-YYY-Z">
    <title>XML Query</title>
    <author>Fernandez</author>
    <author>Suciu</author>
  </book>
</bib>
Example - XML Schema
<xs:group name="Bib">
  <xs:sequence>
    <xs:element name="bib">
      <xs:complexType>
        <xs:group ref="Book" minOccurs="0" maxOccurs="unbounded"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:group>
Example - XML Schema (Cont.)
<xs:group name="Book">
  <xs:sequence>
    <xs:element name="book">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="title" type="xs:string"/>
          <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="year" type="xs:integer"/>
        <xs:attribute name="isbn" type="xs:string"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:group>
Variable Binding
let $bib0 :=
  <bib>
    <book year="1999" isbn="1-55860-622-X">
      <title>Data on the Web</title>
      <author>Abiteboul</author>
      <author>Buneman</author>
      <author>Suciu</author>
    </book>
    <book year="2001" isbn="1-XXXXX-YYY-Z">
      <title>XML Query</title>
      <author>Fernandez</author>
      <author>Suciu</author>
    </book>
  </bib>
Projection
$bib0/book/author
==> <author>Abiteboul</author>,
    <author>Buneman</author>,
    <author>Suciu</author>,
    <author>Fernandez</author>,
    <author>Suciu</author>
• Note: the document order of the author elements is preserved.
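As a rough analogue only, the projection can be reproduced with Python's ElementTree path subset. This is not an XQuery engine, and the names are our own. The path `book/author` returns matching elements in document order and keeps both Suciu elements, because they are distinct nodes.

```python
import xml.etree.ElementTree as ET

# The sample data bound to $bib0.
BIB_XML = """<bib>
  <book year="1999" isbn="1-55860-622-X">
    <title>Data on the Web</title>
    <author>Abiteboul</author>
    <author>Buneman</author>
    <author>Suciu</author>
  </book>
  <book year="2001" isbn="1-XXXXX-YYY-Z">
    <title>XML Query</title>
    <author>Fernandez</author>
    <author>Suciu</author>
  </book>
</bib>"""

bib0 = ET.fromstring(BIB_XML)

# $bib0/book/author: each step keeps its matches in document order.
authors = bib0.findall("book/author")
print([a.text for a in authors])
# ['Abiteboul', 'Buneman', 'Suciu', 'Fernandez', 'Suciu']
```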
Selection
for $b in $bib0/book
where data($b/@year) <= 2000
return $b
==> <book year="1999" isbn="1-55860-622-X">
      <title>Data on the Web</title>
      <author>Abiteboul</author>
      <author>Buneman</author>
      <author>Suciu</author>
    </book>
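The same selection in Python is a filtered loop over the `book` elements. This is a rough analogue, not XQuery semantics. The sketch assumes every `year` attribute is present and holds an integer. In XQuery, the untyped attribute value is instead cast for the numeric comparison.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book year="1999" isbn="1-55860-622-X">
    <title>Data on the Web</title>
    <author>Abiteboul</author>
    <author>Buneman</author>
    <author>Suciu</author>
  </book>
  <book year="2001" isbn="1-XXXXX-YYY-Z">
    <title>XML Query</title>
    <author>Fernandez</author>
    <author>Suciu</author>
  </book>
</bib>"""

bib0 = ET.fromstring(BIB_XML)

# for $b in $bib0/book where data($b/@year) <= 2000 return $b
selected = [b for b in bib0.findall("book") if int(b.get("year")) <= 2000]
print([b.findtext("title") for b in selected])  # ['Data on the Web']
```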
Join - Sample Data
let $review0 :=
  <reviews>
    <book>
      <title>XML Query</title>
      <review>A darn fine book.</review>
    </book>
    <book>
      <title>Data on the Web</title>
      <review>This is great!</review>
    </book>
  </reviews>
Join
for $b in $bib0/book, $r in $review0/book
where data($b/title) = data($r/title)
return <book>{ $b/title, $b/author, $r/review }</book>
==> <book>
      <title>Data on the Web</title>
      <author>Abiteboul</author>
      <author>Buneman</author>
      <author>Suciu</author>
      <review>This is great!</review>
    </book>,
    <book>
      <title>XML Query</title>
      <author>Fernandez</author>
      <author>Suciu</author>
      <review>A darn fine book.</review>
    </book>
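One way to evaluate this join is a nested loop that mirrors the two `for` bindings. This Python sketch shows that strategy. Because the query language is declarative, an engine is free to choose another one, such as a hash join on title. Copying the matched nodes into a new `<book>` mimics XQuery element constructors, which copy their content. `join_reviews` is a hypothetical helper.

```python
import copy
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book year="1999" isbn="1-55860-622-X">
    <title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
  </book>
  <book year="2001" isbn="1-XXXXX-YYY-Z">
    <title>XML Query</title>
    <author>Fernandez</author><author>Suciu</author>
  </book>
</bib>"""

REVIEWS_XML = """<reviews>
  <book><title>XML Query</title><review>A darn fine book.</review></book>
  <book><title>Data on the Web</title><review>This is great!</review></book>
</reviews>"""

def join_reviews(bib0, review0):
    """Nested-loop join of books and reviews on equal titles."""
    results = []
    for b in bib0.findall("book"):             # for $b in $bib0/book,
        for r in review0.findall("book"):      #     $r in $review0/book
            if b.findtext("title") == r.findtext("title"):  # where ...
                out = ET.Element("book")       # return <book>{ ... }</book>
                parts = [b.find("title"), *b.findall("author"), r.find("review")]
                for part in parts:
                    dup = copy.deepcopy(part)  # constructors copy content
                    dup.tail = None            # drop source whitespace
                    out.append(dup)
                results.append(out)
    return results

joined = join_reviews(ET.fromstring(BIB_XML), ET.fromstring(REVIEWS_XML))
for book in joined:
    print(book.findtext("title"), "|", book.findtext("review"))
# Data on the Web | This is great!
# XML Query | A darn fine book.
```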
Restructuring
for $a in distinct-values(data($bib0/book/author))
return
  <biblio>
    <author>{ $a }</author>
    { for $b in $bib0/book, $a2 in data($b/author)
      where $a = $a2
      return $b/title }
  </biblio>
Restructuring (Cont.)
==> <biblio>
      <author>Abiteboul</author>
      <title>Data on the Web</title>
    </biblio>,
    <biblio>
      <author>Buneman</author>
      <title>Data on the Web</title>
    </biblio>,
    <biblio>
      <author>Suciu</author>
      <title>Data on the Web</title>
      <title>XML Query</title>
    </biblio>,
    <biblio>
      <author>Fernandez</author>
      <title>XML Query</title>
    </biblio>
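Restructuring inverts the hierarchy: books-with-authors become authors-with-titles. This Python sketch rebuilds the `<biblio>` result as a rough analogue of the nested query. `group_titles_by_author` is a hypothetical helper. It keeps authors in order of first occurrence, whereas XQuery leaves the result order of `distinct-values` implementation-dependent.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book year="1999" isbn="1-55860-622-X">
    <title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
  </book>
  <book year="2001" isbn="1-XXXXX-YYY-Z">
    <title>XML Query</title>
    <author>Fernandez</author><author>Suciu</author>
  </book>
</bib>"""

def group_titles_by_author(bib0):
    # distinct-values: dict.fromkeys removes repeats, keeping first occurrence.
    names = list(dict.fromkeys(a.text for a in bib0.findall("book/author")))
    biblios = []
    for name in names:                          # for $a in distinct-values(...)
        biblio = ET.Element("biblio")
        ET.SubElement(biblio, "author").text = name
        for b in bib0.findall("book"):          # inner for $b ... where $a = $a2
            if name in [a.text for a in b.findall("author")]:
                ET.SubElement(biblio, "title").text = b.findtext("title")
        biblios.append(biblio)
    return biblios

result = group_titles_by_author(ET.fromstring(BIB_XML))
for biblio in result:
    print(ET.tostring(biblio, encoding="unicode"))
```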
Details (cont.)
• No Schema Required
  - An XML query language should be usable on XML data when no schema (DTD or XML Schema) is known in advance, but it should be able to exploit a schema when one is available.
• Preserve Order and Association
  - An XML query language should preserve the order and association of elements in XML data (why?)