Introduction to XML and XQuery
Guangjun (Kevin) Xie
Road Map
• XML data model
• XML data vs Relational data
• XPath 2.0
• XQuery
• Processing XQuery
York University
XML Data Model: XML Information Set (Infoset)
• An infoset is an abstract data set containing all the information in an XML document
• It provides a consistent set of definitions for referring to the information in a well-formed XML document
• Infosets usually result from parsing XML documents, but they can also be synthetic:
  • built through an API, such as DOM
  • produced by transforming an existing infoset
• An infoset consists of a number of information items.
XML Data Model: XML Infoset
• "Information set" and "information item" are similar in meaning to the generic terms "tree" and "node"
• An information item is an abstract description of some part of an XML document.
• Each information item has a set of associated named properties, written as [property name]
XML Data Model: Information Items
• 11 types of information items:
  • Document Information Item
  • Element Information Items
  • Attribute Information Items
  • Character Information Items
  • Processing Instruction Information Items
  • Unexpanded Entity Reference Information Items
  • Comment Information Items
  • The Document Type Declaration Information Item
  • Unparsed Entity Information Items
  • Notation Information Items
  • Namespace Information Items
• We will discuss the first 3 today
XML Data Model: Document Information Item
• There is exactly one document information item in an infoset
• Other information is accessible through its properties:
  • [children] – the top-level items, including PIs, comments, etc.
  • [document element] – the element item corresponding to the document element
  • [version] – the XML version of the document
  • … etc.
XML Data Model: Element Information Items
• One element item for each element in the XML document
• The "root" element item is the [document element] property of the document information item
• Properties:
  • [namespace name] – the namespace part of the element name
  • [local name] – the local part of the element name
  • [children] – the information items contained in this element
  • [attributes] – the attribute items of this element
  • [parent] – the information item containing this item
  • … etc.
XML Data Model: Attribute Information Items
• One attribute item for each attribute of an XML element
• Properties:
  • [namespace name] – the namespace part of the attribute name
  • [local name] – the local part of the attribute name
  • [attribute type] – the declared type of this attribute
  • [owner element] – the element information item containing this attribute
  • … etc.
XML Data Model: Infoset example
<?xml version="1.0"?>
<msg:message doc:date="19990421"
    xmlns:doc="http://doc.example.org/namespaces/doc"
    xmlns:msg="http://message.example.org/"
>Phone home!</msg:message>
• The information set contains:
  • A document information item.
  • An element information item with namespace name "http://message.example.org/", local part "message", and prefix "msg".
  • An attribute information item with the namespace name "http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value "19990421".
  • Three namespace information items for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces.
  • Two attribute information items for the namespace attributes.
  • Eleven character information items for the character data.
XML Data Model: Infoset Example
[Figure: infoset tree for the example document, with nodes labeled Version=1.0, msg:message, doc:date, xmlns:doc, xmlns:msg, and one node per character of "Phone home!". Legend: document, element, attribute, and character information items.]
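As a rough illustration of the properties listed for the example document, the sketch below parses it with Python's standard `xml.etree.ElementTree`. ElementTree is not an infoset API; it only approximates a few of the properties, and the variable names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# The example document from the slide, written with straight quotes.
DOC = (
    '<?xml version="1.0"?>'
    '<msg:message doc:date="19990421" '
    'xmlns:doc="http://doc.example.org/namespaces/doc" '
    'xmlns:msg="http://message.example.org/"'
    '>Phone home!</msg:message>'
)

# fromstring returns the root element, roughly the [document element].
root = ET.fromstring(DOC)

# ElementTree stores names as "{namespace name}local name".
namespace_name, local_name = root.tag[1:].split("}")

# Namespace declarations are consumed by the parser, so only doc:date
# appears as an ordinary attribute in this API.
attributes = dict(root.attrib)

# One character information item per character of content.
character_count = len(root.text)

print(namespace_name, local_name, attributes, character_count)
```

The prefixes `msg` and `doc` are not kept by this API, which is one reason it is only an approximation of the infoset.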
XML Data vs Relational Data
• Relational databases stem from commercial data processing
  • Information usually has a regular structure
• XML has its roots in text document processing
  • Documents often have an irregular structure
• Both are general models, capable of representing all forms of information.
• Their different heritages have optimized them for different types of applications.
XML Data vs Relational Data: Nesting
• XML Model
  • Deeply nested structure
  • Flexible (not predefined)
  • Queries over nested data are easily handled by the "descendant" axis in XPath 2.0
• Relational Model
  • Flat table structure
  • Primary and foreign keys represent nesting relationships
  • Complex and flexible nesting may result in awkward queries
XML Data vs Relational Data: Metadata
• XML Model
  • Metadata is mixed with ordinary data
  • High ratio of metadata to ordinary data
• Relational Model
  • Metadata is easily factored out
  • Queries that involve metadata are difficult
  • Ex: find the names of columns containing the value "red"
XML Data vs Relational Data: Ordering
• XML Model
  • Intrinsic ordering that can't be derived from values
  • Ex: the order of sentences in a book is essential
  • Poses a challenge for the query language
• Relational Model
  • Ordering depends on values
  • Rows are not considered to have an ordering
XML Data vs Relational Data: Null Values
• XML Model
  • A missing value is represented by the absence of an element
  • Retrieving a missing value results in an empty list
  • Rules are needed for how to handle an empty list
• Relational Model
  • A "null" value represents a missing value
  • Rules define how operators behave in the presence of null
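The two conventions for missing values can be sketched in Python. This is only an analogy: `None` stands in for SQL's null, an empty list stands in for the empty sequence, and the "Untitled draft" record and the `cheaper_than` helper are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "<book><title>Untitled draft</title></book>"  # hypothetical: no price element
    "</bib>"
)

# XML side: a missing price is simply an empty list of price elements.
xml_prices = [[p.text for p in book.findall("price")] for book in BIB]

# Relational side: every row has a price column; None plays the role of null.
rows = [("Data on the Web", 39.95), ("Untitled draft", None)]


def cheaper_than(price, limit):
    """Three-valued comparison: the answer is unknown (None) for a null price."""
    return None if price is None else price < limit


sql_style = [cheaper_than(price, 50) for _, price in rows]

# Existential rule for sequences: true if any item passes the test,
# so an empty list yields False rather than "unknown".
xquery_style = [any(float(p) < 50 for p in prices) for prices in xml_prices]

print(xml_prices, sql_style, xquery_style)
```

The last two lists differ only in the second entry, which is where the rules for null and for the empty list diverge.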
XML Data vs Relational Data: Structural Transformations
• XML Model
  • Queries operate on XML documents and generate new XML documents
  • XPath 2.0 – navigating inside a document
  • XQuery – joining elements, constructing new elements/structures
• Relational Model
  • Queries operate on tables and generate new tables
XML Data vs Relational Data: Data Definition
• XML Model
  • Mixture of primitive data and nested elements
  • Elements may be optional
  • Constraints on cardinality and order
  • Poses challenges for type inference
  • Ex: how to prove that a query's output satisfies a given schema?
• Relational Model
  • Specifies the properties of columns
  • All rows have the same columns
  • Relatively simple
XPath 2.0: What's XPath?
• XPath is a language for addressing parts of an XML document.
• XPath 2.0 provides a way to locate an individual node or a set of nodes in the XML data model.
• XPath 2.0 is closely related to XQuery
  • Same data model, based on the XML data model (infoset)
  • XQuery uses XPath to refer to information in the data model
• XPath 2.0 uses path expressions to navigate XML documents and select nodes in them.
• A path expression evaluates to a sequence of nodes
• Path expressions look very much like the paths used in a traditional computer file system.
• XPath 2.0 is a W3C Recommendation
XPath 2.0: Data model
• Represents various values, including
  • the input and the output of a query
  • all values of expressions used during intermediate calculations
• Based on the XML infoset data model
• Shared with XQuery
• Models XML data as trees
• Sequence-based data model
  • Uses sequences to represent sets of trees or tree fragments
  • Everything is a sequence
  • Sequences never contain other sequences
XPath 2.0: Data model
• A tree whose root node is a Document Node is referred to as a document.
• A tree whose root node is not a Document Node is referred to as a fragment.
XPath 2.0: Data model
• Every instance of the data model is a sequence
• A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values
• A sequence is an ordered collection of zero or more items
• An item is either a node or an atomic value
• A single item appearing on its own is modeled as a sequence containing one item.
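The rules that sequences never nest and that a single item is a sequence of length one can be mimicked with a small Python helper. The function `seq` is a hypothetical name introduced here and is not part of any XPath API; Python lists stand in for sequences.

```python
def seq(*items):
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(seq(*item))  # splice the inner sequence's items
        else:
            flat.append(item)        # a single item (node or atomic value)
    return flat


# Combining sequences yields one flat sequence; empty sequences vanish.
nested = seq(1, seq(2, 3), seq(), "a")

# A single item on its own is a sequence containing one item.
single = seq(42)

print(nested, single)
```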
XPath 2.0: Data model
• There are seven kinds of nodes in the data model:
  • Document node
  • Element node
  • Attribute node
  • Text node
  • Namespace node
  • Processing instruction node
  • Comment node
XPath 2.0: Sample XML Document Books.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
XPath 2.0: Example
/bookstore/book evaluates to a sequence of nodes, each node corresponding to a book element:
<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>
<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>
<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>
<book category="WEB">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>
//book evaluates to the same result
XPath 2.0: Example
//book[@category="WEB"] evaluates to a sequence containing 2 book element nodes:
<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>
<book category="WEB">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>
XPath 2.0: Example
• some $x in //book satisfies $x/price > 49
  evaluates to a sequence containing the single atomic value true (the book priced at 49.99 qualifies)
• every $x in //book satisfies $x/price > 49
  evaluates to a sequence containing the single atomic value false
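The two quantified expressions correspond closely to Python's `any` and `all`. The sketch below is an analogy rather than an XPath evaluation; it uses a trimmed copy of Books.xml (titles and prices only), and `price_over` is a helper name assumed for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Books.xml: only titles and prices are kept.
BOOKSTORE = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title><price>30.00</price></book>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>XQuery Kick Start</title><price>49.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

books = BOOKSTORE.findall(".//book")  # stands in for //book


def price_over(book, limit):
    """Mimic $x/price > limit: true if any price child exceeds the limit."""
    return any(float(p.text) > limit for p in book.findall("price"))


some_result = any(price_over(b, 49) for b in books)   # some ... satisfies
every_result = all(price_over(b, 49) for b in books)  # every ... satisfies

print(some_result, every_result)
```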
XPath 2.0: Example
/bookstore/book[position()=1] evaluates to a sequence containing one element node:
<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>
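The path expressions in these examples can be tried with Python's `xml.etree.ElementTree`, which implements only a small subset of XPath 1.0 (not XPath 2.0). The paths below are therefore written relative to the root element, and the document is a trimmed copy of Books.xml; `titles` is a helper introduced for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Books.xml: category attributes and titles only.
BOOKSTORE = ET.fromstring(
    '<bookstore>'
    '<book category="COOKING"><title lang="en">Everyday Italian</title></book>'
    '<book category="CHILDREN"><title lang="en">Harry Potter</title></book>'
    '<book category="WEB"><title lang="en">XQuery Kick Start</title></book>'
    '<book category="WEB"><title lang="en">Learning XML</title></book>'
    '</bookstore>'
)


def titles(books):
    """Return the title text of each book element, in document order."""
    return [b.findtext("title") for b in books]


all_books = titles(BOOKSTORE.findall("book"))       # like /bookstore/book
same_books = titles(BOOKSTORE.findall(".//book"))   # like //book
web_books = titles(BOOKSTORE.findall(".//book[@category='WEB']"))
first_book = titles(BOOKSTORE.findall("book[1]"))   # like book[position()=1]

print(all_books, web_books, first_book)
```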
XQuery: What's XQuery?
• The language for querying XML data
• XQuery is a language for finding and extracting elements and attributes from XML documents.
• XQuery is to XML what SQL is to relational databases
• Many of the concepts and techniques used in SQL processing and optimization can be applied to XQuery processing and optimization.
XQuery: What's XQuery?
• XQuery is built on XPath 2.0 expressions
• XQuery 1.0 and XPath 2.0 share the same data model
• They support the same functions and operators.
• Understanding XPath 2.0 is essential to understanding XQuery.
• Supported by all the major database vendors
  • IBM
  • Oracle
  • Microsoft
  • etc.
XQuery: What's XQuery?
• Closed with respect to its data model
  • The value of every expression in the language is guaranteed to be in the data model.
  • XPath 2.0 is also closed
• Designed to be a functional language
  • No side effects
  • Processes and produces sequences
• W3C standardization
  • XQuery 1.0 was still a W3C draft when these slides were written
  • It became a W3C Recommendation in January 2007, together with XPath 2.0
XQuery: FLWOR expression
• The for clause binds a variable to each item of a sequence in turn
• The let clause binds a variable to a whole sequence
• The where clause filters the bindings produced by the for and let clauses
• The order by clause sorts the surviving bindings
• The return clause builds the result sequence, evaluated once per binding
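The roles of the five clauses can be illustrated with an ordinary Python loop. This is an analogy under simplifying assumptions: the `books` list of dictionaries is a hypothetical stand-in for a sequence of book elements, and `flwor` is a name chosen here.

```python
# Hypothetical in-memory stand-in for a sequence of book elements.
books = [
    {"title": "TCP/IP Illustrated", "year": 1994, "price": 65.95},
    {"title": "Data on the Web", "year": 2000, "price": 39.95},
    {"title": "Advanced Programming in the Unix environment",
     "year": 1992, "price": 65.95},
]


def flwor(items):
    bindings = []
    for b in items:                      # for: bind b to each item in turn
        p = b["price"]                   # let: bind p to a whole value
        if b["year"] > 1991 and p > 50:  # where: keep only matching bindings
            bindings.append((b, p))
    bindings.sort(key=lambda t: t[0]["title"])  # order by: sort the bindings
    return [b["title"] for b, _ in bindings]    # return: one result per binding


result = flwor(books)
print(result)
```

Sorting happens after filtering and before the results are built, which mirrors the clause order of a FLWOR expression.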
XQuery: sample XML document – bib.xml
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
XQuery: sample XML document – reviews.xml
<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>A very good discussion of semi-structured database systems and XML.</review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix environment</title>
    <price>65.95</price>
    <review>A clear and detailed discussion of UNIX programming.</review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>One of the best books on TCP/IP.</review>
  </entry>
</reviews>
XQuery: sample XML document – prices.xml
<prices>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <source>bstore2.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <source>bstore1.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>bstore2.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>bstore1.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>bstore2.example.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>bstore1.example.com</source>
    <price>39.95</price>
  </book>
</prices>
XQuery: Example 1
• List books published by Addison-Wesley after 1991, including their year and title
Solution in XQuery:
<bib>
  {
    for $b in doc("bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year="{ $b/@year }">
        { $b/title }
      </book>
  }
</bib>
Result:
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
</bib>
XQuery: Example 2
• Create a flat list of all the title-author pairs
Solution in XQuery:
for $b in doc("bib.xml")/bib/book,
    $t in $b/title,
    $a in $b/author
return
  <result>
    { $t }
    { $a }
  </result>
Result:
<result>
  <title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Buneman</last><first>Peter</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Suciu</last><first>Dan</first></author>
</result>
XQuery: Example 3
• For each book in the bibliography, list the title and authors
Solution in XQuery:
for $b in doc("bib.xml")/bib/book
return
  <result>
    { $b/title }
    { $b/author }
  </result>
Result:
<result>
  <title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
</result>
<result>
  <title>The Economics of Technology and Content for Digital TV</title>
</result>
XQuery: Example 4
• For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source
Solution in XQuery:
<books-with-prices>
  {
    for $b in doc("bib.xml")//book,
        $a in doc("reviews.xml")//entry
    where $b/title = $a/title
    return
      <book-with-prices>
        { $b/title }
        <bib-price>{ $b/price/text() }</bib-price>
        <review-price>{ $a/price/text() }</review-price>
      </book-with-prices>
  }
</books-with-prices>
Result:
<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <bib-price>65.95</bib-price>
    <review-price>65.95</review-price>
  </book-with-prices>
  <book-with-prices>
    <title>Advanced Programming in the Unix environment</title>
    <bib-price>65.95</bib-price>
    <review-price>65.95</review-price>
  </book-with-prices>
  <book-with-prices>
    <title>Data on the Web</title>
    <bib-price>39.95</bib-price>
    <review-price>34.95</review-price>
  </book-with-prices>
</books-with-prices>
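The join in Example 4 pairs every book with every entry and keeps the pairs whose titles match. A Python sketch of the same idea is shown below on trimmed copies of the two documents. It builds a dictionary index instead of a nested loop, which assumes each title appears at most once in reviews.xml; that assumption and the variable names are introduced here, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of bib.xml and reviews.xml: titles and prices only.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "<book><title>The Economics of Technology and Content for Digital TV</title>"
    "<price>129.95</price></book>"
    "</bib>"
)
REVIEWS = ET.fromstring(
    "<reviews>"
    "<entry><title>Data on the Web</title><price>34.95</price></entry>"
    "<entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>"
    "</reviews>"
)

# Index review prices by title (assumes titles are unique in the reviews).
review_price = {
    e.findtext("title"): e.findtext("price") for e in REVIEWS.iter("entry")
}

# For each book with a matching entry: (title, bib price, review price).
joined = [
    (b.findtext("title"), b.findtext("price"), review_price[b.findtext("title")])
    for b in BIB.iter("book")
    if b.findtext("title") in review_price
]

print(joined)
```

Books without a review, such as the Digital TV title, drop out of the result, just as they do under the where clause.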
XQuery: Example 5
• List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order
Solution in XQuery:
<bib>
  {
    for $b in doc("bib.xml")//book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    order by $b/title
    return
      <book>
        { $b/@year }
        { $b/title }
      </book>
  }
</bib>
Result:
<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>
XQuery: Example 6
• In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute
Solution in XQuery:
<results>
  {
    let $doc := doc("prices.xml")
    for $t in distinct-values($doc//book/title)
    let $p := $doc//book[title = $t]/price
    return
      <minprice title="{ $t }">
        <price>{ min($p) }</price>
      </minprice>
  }
</results>
Result:
<results>
  <minprice title="Advanced Programming in the Unix environment">
    <price>65.95</price>
  </minprice>
  <minprice title="TCP/IP Illustrated">
    <price>65.95</price>
  </minprice>
  <minprice title="Data on the Web">
    <price>34.95</price>
  </minprice>
</results>
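Example 6 groups the price entries by title and takes the minimum of each group. A comparable grouping can be sketched in Python with a dictionary, shown here on a trimmed copy of prices.xml. The single-pass accumulation is a design choice of this sketch; the XQuery above instead re-scans the document once per distinct title.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of prices.xml: titles and prices only.
PRICES = ET.fromstring(
    "<prices>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>34.95</price></book>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "</prices>"
)

# One pass: keep the smallest price seen so far for each title.
min_price = {}
for book in PRICES.iter("book"):
    title = book.findtext("title")
    price = float(book.findtext("price"))
    min_price[title] = min(price, min_price.get(title, price))

print(min_price)
```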
XQuery: sample XML document – book.xml
<?xml version="1.0"?>
<book>
  <title>Data on the Web</title>
  <author>Serge Abiteboul</author>
  <author>Peter Buneman</author>
  <author>Dan Suciu</author>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <p>Text ... </p>
    <section>
      <title>Audience</title>
      <p>Text ... </p>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
      <p>Text ... </p>
      <figure height="400" width="400">
        <title>Traditional client/server architecture</title>
        <image source="csarch.gif"/>
      </figure>
      <p>Text ... </p>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <p>Text ... </p>
    <figure height="200" width="500">
      <title>Graph representations of structures</title>
      <image source="graphs.gif"/>
    </figure>
    <p>Text ... </p>
    <section>
      <title>Base Types</title>
      <p>Text ... </p>
    </section>
    <section>
      <title>Representing Relational Databases</title>
      <p>Text ... </p>
      <figure height="250" width="400">
        <title>Examples of Relations</title>
        <image source="relations.gif"/>
      </figure>
    </section>
    <section>
      <title>Representing Object Databases</title>
      <p>Text ... </p>
    </section>
  </section>
</book>
XQuery: Example 7
• Prepare a (nested) table of contents, listing all sections and their titles. Preserve the original attributes of each <section> element, if any
Solution in XQuery:
declare function local:toc($book-or-section as element()) as element()*
{
  for $section in $book-or-section/section
  return
    <section>
      { $section/@*, $section/title, local:toc($section) }
    </section>
};

<toc>
  {
    for $s in doc("book.xml")/book
    return local:toc($s)
  }
</toc>
Result:
<toc>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <section>
      <title>Audience</title>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <section>
      <title>Base Types</title>
    </section>
    <section>
      <title>Representing Relational Databases</title>
    </section>
    <section>
      <title>Representing Object Databases</title>
    </section>
  </section>
</toc>
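The recursion in local:toc can be mirrored in Python as a rough sketch. The function below rebuilds each section with its attributes and title and then recurses into its child sections. It runs on a shortened copy of book.xml, and the function and variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of book.xml: two top-level sections, one nested section.
BOOK = ET.fromstring(
    '<book><title>Data on the Web</title>'
    '<section id="intro" difficulty="easy"><title>Introduction</title>'
    '<p>Text ...</p>'
    '<section><title>Audience</title><p>Text ...</p></section>'
    '</section>'
    '<section id="syntax" difficulty="medium">'
    '<title>A Syntax For Data</title><p>Text ...</p></section>'
    '</book>'
)


def toc(parent):
    """Return new <section> elements mirroring the child sections of parent."""
    entries = []
    for section in parent.findall("section"):
        entry = ET.Element("section", dict(section.attrib))  # like $section/@*
        title = ET.SubElement(entry, "title")                # like $section/title
        title.text = section.findtext("title")
        entry.extend(toc(section))                           # recursive call
        entries.append(entry)
    return entries


toc_root = ET.Element("toc")
toc_root.extend(toc(BOOK))
print(ET.tostring(toc_root, encoding="unicode"))
```

Only attributes, titles, and nested sections are copied, so paragraphs and figures are left out of the table of contents.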
Processing XQuery: Approaches for querying XML data
• Mapping XML data into relational data
  • Query with SQL
  • May produce too many relations
  • Loss of information may occur
  • Ex: ordering, explicit hierarchical relationships between elements
• Using specific query languages
  • Usually integrated with SQL and relational data management
  • SQL/XML or XQuery
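To make the first approach concrete, the sketch below flattens a small XML tree into rows of a single "edge" table. This is one possible mapping assumed for illustration, not the scheme of any particular product. The explicit parent_id and position columns carry the hierarchy and ordering that would otherwise be lost.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<book><title>Data on the Web</title>"
    "<author>Serge Abiteboul</author><author>Peter Buneman</author></book>"
)


def shred(root):
    """Flatten a tree into (node_id, parent_id, position, tag, text) rows."""
    rows = []

    def visit(elem, parent_id, position):
        node_id = len(rows) + 1  # ids are assigned in document order
        text = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, position, elem.tag, text))
        for i, child in enumerate(elem, start=1):
            visit(child, node_id, i)  # position records sibling order

    visit(root, None, 1)
    return rows


rows = shred(DOC)

# Recovering the authors in document order needs the position column.
authors = [r[4] for r in sorted(rows, key=lambda r: r[2]) if r[3] == "author"]

for row in rows:
    print(row)
```

Even this tiny document becomes four rows, and following a path such as book/author turns into a self-join on parent_id, which hints at why the mapping approach can produce awkward queries.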
Processing XQuery: IBM System RX SQL/XQuery compiler
• A new XQuery parser is added to the existing relational query processing engine
• All components are extended to process XQuery
Processing XQuery: Oracle XQuery Compilation Engine
• The parser converts XQuery into XQueryX
• XQueryX is an XML representation of XQuery (a separate W3C specification)
• An XML parser constructs a DOM tree from the XQueryX
• Subsequent processing works on the DOM tree
• The corresponding components are extended for XQuery too
Processing XQuery: Microsoft XQuery compilation
• XQuery is compiled into an XML algebra tree, which is an internal representation
• The algebra tree can be optimized and executed by the relational query processor
• Optimizations are rule-based
• A mapper traverses the algebra tree, converting each XML operator into a relational operator sub-tree