CIS 550 Handout 7 -- XPATH and XQuery Fall 2001
URLs -- XPath
• http://www.w3.org/TR/xpath -- This is the “recommendation”. Dense. Few examples. Difficult to extract the “big picture” from the morass of detail.
• http://www.zvon.org/xxl/XPathTutorial/General/examples.html -- A tutorial with some simple examples. Maybe too simple. There are lots of tutorials on the web.
URLs -- XQuery
• http://www.w3.org/TR/xquery/ -- The basic recommendation. Plenty of examples, so work through these first.
• http://www.w3.org/TR/query-semantics/ -- A formal semantics for XQuery. Despite its forbidding title, it is remarkably readable. It also discusses a type system for XQuery.
• http://www.w3.org/TR/xmlquery-use-cases -- A bunch of example queries and their solutions in XQuery (not surprising, since XQuery is Turing-complete!)
How to Identify Nodes in a Tree -- Regular Path Expressions
[Figure: a tree rooted at db, with nodes labeled depts, dept, mgr, emps, emp, emp and three name nodes whose values are “Mary”, “Bill”, “John”]
In the normal syntax of regular expressions:
  db.emps.emp
  db.(depts.dept.mgr | emps.emp)
  db._*.name
N.B. Regular path expressions have nothing to do with the regular expressions in DTDs.
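As a rough illustration of the three expressions above, the sketch below matches an ordinary regular expression against the dotted root-to-node label path of every element. The exact shape of the tree is an assumption reconstructed from the slide's labels, and `labelled_paths` and `select` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed document: the slide only shows the labels, so this shape is a guess.
DOC = """
<db>
  <depts><dept><mgr><name>Mary</name></mgr></dept></depts>
  <emps>
    <emp><name>Bill</name></emp>
    <emp><name>John</name></emp>
  </emps>
</db>
"""

def labelled_paths(elem, prefix=()):
    """Yield (dotted label path, element) for elem and all its descendants."""
    path = prefix + (elem.tag,)
    yield ".".join(path), elem
    for child in elem:
        yield from labelled_paths(child, path)

def select(root, path_regex):
    """Return the elements whose whole root-to-node label path matches."""
    pattern = re.compile(path_regex)
    return [e for p, e in labelled_paths(root) if pattern.fullmatch(p)]

root = ET.fromstring(DOC)
emps = select(root, r"db\.emps\.emp")                       # db.emps.emp
people = select(root, r"db\.(depts\.dept\.mgr|emps\.emp)")  # db.(depts.dept.mgr | emps.emp)
names = select(root, r"db\.([^.]+\.)*name")                 # db._*.name
print(len(emps), len(people), [n.text for n in names])
```

Here `[^.]+` plays the role of the wildcard label `_`, so the last pattern accepts any number of intermediate labels between db and name.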
More examples
With the DTD:
  <!ELEMENT PERSON (NAME, FATHER, MOTHER)>
  <!ELEMENT MOTHER (PERSON?)>
  …
the regular path expression (PERSON.MOTHER)* identifies matrilineal ancestry.
XPATH is a “superset of a subset” of regular path expressions. (It cannot express this set of nodes.) However, it is not limited to moving “down” the tree.
XPath
• Primary goal: to give access to selected nodes of a given document
• XPath's main construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• A navigation step is a triplet: axis + node-test + list of predicates
• Examples
  • /descendant::node()/child::author
  • /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
• XPath also offers some shortcuts
  • no axis means child
  • // is short for /descendant-or-self::node()/
XPath -- child axis navigation
[Figure: a tree below the context node, with nodes numbered 1-7 and labeled aaa, ccc and bbb; the numbers in the examples refer to this figure]
• author is shorthand for child::author. Examples:
  • aaa -- all the child nodes labeled aaa (1,3)
  • aaa/bbb -- all the bbb grandchildren of aaa children (4)
  • */bbb -- all the bbb grandchildren of any child (4,6)
  • . -- the context node
  • / -- the root node
XPath -- child axis navigation (cont)
• /doc -- all the doc children of the root
• ./aaa -- all the aaa children of the context node (equivalent to aaa)
• text() -- all the text children of the context node
• node() -- all the children of the context node (includes text nodes; attribute nodes are not children and are reached with @ or attribute::)
• .. -- parent of the context node
• .// -- the context node and all its descendants
• // -- the root node and all its descendants
• //para -- all the para nodes in the document
• //text() -- all the text nodes in the document
• @font -- the font attribute node of the context node
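The abbreviated child-axis steps can be tried out with Python's standard `xml.etree.ElementTree`, which implements a limited subset of XPath. This is only a sketch: the tree is an assumed reconstruction of the figure (the `n` attributes mirror the node numbers, and the placement of nodes 5 and 7 is a guess), and `ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed tree; the root element <ctx> stands for the context node.
context = ET.fromstring("""
<ctx>
  <aaa n="1"><bbb n="4"/><aaa n="5"/></aaa>
  <ccc n="2"><bbb n="6"/></ccc>
  <aaa n="3"><ccc n="7"/></aaa>
</ctx>
""")

def ids(nodes):
    """Return the slide-style node numbers of the selected elements."""
    return [int(e.get("n")) for e in nodes]

child_aaa = ids(context.findall("aaa"))        # aaa
aaa_bbb = ids(context.findall("aaa/bbb"))      # aaa/bbb
any_bbb = ids(context.findall("*/bbb"))        # */bbb
dot_aaa = ids(context.findall("./aaa"))        # ./aaa, same as aaa
all_aaa = ids(context.findall(".//aaa"))       # aaa at any depth below the context node
print(child_aaa, aaa_bbb, any_bbb, dot_aaa, all_aaa)
```

ElementTree returns matches in document order, so the results line up with the numbers quoted in the examples (1,3 for aaa, 4 for aaa/bbb, 4,6 for */bbb).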
Predicates
• [2] -- the second child node of the context node
• chapter[5] -- the fifth chapter child of the context node
• [last()] -- the last child node of the context node
• chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text on descendant text nodes)
• person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
• From the XPath specification: NOTE: If $x is bound to a node-set, then $x = "foo" does not mean the same as not($x != "foo"). (The former holds if some node in $x has string-value foo; the latter holds only if all of them do.)
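The note from the specification follows from the existential reading of comparisons on node-sets. A minimal sketch, modelling a node-set as a Python list of string-values (`ns_eq` and `ns_ne` are hypothetical names, not part of any XPath library):

```python
def ns_eq(node_set, s):
    """node-set = string: true if SOME node has string-value s."""
    return any(v == s for v in node_set)

def ns_ne(node_set, s):
    """node-set != string: true if SOME node has a string-value other than s."""
    return any(v != s for v in node_set)

x = ["foo", "bar"]          # $x bound to two nodes
some_foo = ns_eq(x, "foo")          # $x = "foo"
all_foo = not ns_ne(x, "foo")       # not($x != "foo")
print(some_foo, all_foo)

empty = []                  # $x bound to the empty node-set
print(ns_eq(empty, "foo"), not ns_ne(empty, "foo"))
```

With a mixed node-set the first test succeeds and the second fails; with an empty node-set it is the other way round, so neither form implies the other.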
Unions of Path Expressions
• employee | consultant -- the union of the employee and consultant nodes that are children of the context node
• For some reason person/(employee|consultant) -- as in regular path expressions -- is not allowed
• However person/node()[boolean(employee|consultant)] is allowed!!
• From the XPATH specification:
  • The boolean function converts its argument to a boolean as follows:
    • a number is true if and only if it is neither positive or negative zero nor NaN
    • a node-set is true if and only if it is non-empty
    • a string is true if and only if its length is non-zero
    • an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
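The conversion rules quoted above are easy to state as code. The sketch below assumes a node-set is modelled as a Python list and the other basic types as `bool`, `float`/`int` and `str`; `xpath_boolean` is a hypothetical name.

```python
import math

def xpath_boolean(value):
    """Convert a modelled XPath 1.0 value to a boolean using the quoted rules."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # true iff neither positive zero, negative zero, nor NaN
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        return len(value) > 0      # true iff the length is non-zero
    if isinstance(value, list):
        return len(value) > 0      # node-set: true iff non-empty
    raise TypeError("conversion for other types depends on the type")

print(xpath_boolean(-0.0), xpath_boolean(float("nan")), xpath_boolean(0.5))
print(xpath_boolean(""), xpath_boolean("false"))
print(xpath_boolean([]), xpath_boolean(["employee"]))
```

Note that the string "false" converts to true, since only the length of the string matters.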
Axis navigation
• So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
  • . -- stay where you are
  • / -- go to the root
  • // -- all descendants of the root
  • .// -- all descendants of the context node
• All other expressions have been abbreviations for child::…, e.g. child::para. child:: is an example of an axis
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
• Some of these (self, parent) describe single nodes, others describe sequences of nodes.
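To make a few of the non-child axes concrete, here is a sketch on top of `xml.etree.ElementTree`. ElementTree elements carry no parent pointer, so the sketch first builds a child-to-parent map. The sample document and all function names are assumptions for illustration, and results are returned in document order (XPath itself treats ancestor and preceding-sibling as reverse axes).

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><d/><e/><f/></b><c/></a>")

# Map every element to its parent so that "upward" axes can be computed.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor(node):
    """ancestor:: the parent, the parent's parent, ... up to the root element."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def siblings(node):
    parent = parent_of.get(node)
    return list(parent) if parent is not None else [node]

def following_sibling(node):
    sibs = siblings(node)
    return sibs[sibs.index(node) + 1:]

def preceding_sibling(node):
    sibs = siblings(node)
    return sibs[:sibs.index(node)]

def descendant(node):
    """descendant:: every element below node, excluding node itself."""
    return list(node.iter())[1:]

e = root.find("b/e")
print([n.tag for n in ancestor(e)])
print([n.tag for n in preceding_sibling(e)], [n.tag for n in following_sibling(e)])
print([n.tag for n in descendant(root)])
```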
XPath Navigation Axes (merci, Arnaud Sahuguet)
[Figure: the axes drawn relative to the context node (self): ancestor, preceding-sibling, following-sibling, child, attribute, preceding, following, namespace, descendant]
XPath abbreviated syntax
  Abbreviation   Expansion
  (nothing)      child::
  @              attribute::
  //             /descendant-or-self::node()/
  .              self::node()
  .//            self::node()/descendant-or-self::node()/
  ..             parent::node()
  /              (document root)
XPath
• Reasonably widely adopted -- in XML-Schema and query languages.
• Neither more expressive nor less expressive than regular path expressions (can’t do (ab)* )
• Particularly messy in some areas:
  • defining order of results
  • overloading of operations,
    • e.g. [chapter/title = "Introduction"]
    • why not ["Introduction" IN chapter/title] ?
XQuery
Proposed by Chamberlin, Robie and Florescu (from the authors’ slides)
• Leverage the most effective features of several existing and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use cases in a single language
• Write queries that fit on a slide
XQuery = XPath + “comprehension” syntax
• XML-QL
    where <pattern> in <XML-expression>      (bind variables)
          <pattern> in <XML-expression>
          …
          <condition>
    construct <expression>                   (use variables)
• Quilt
    for x in <XPath-expression>              (bind variables)
        y in <XPath-expression>
        …
    where <condition>
    return <expression>                      (use variables)
Examples from XQuery
List the titles of books published by Morgan Kaufmann in 1998.
  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann"
    AND $b/year = "1998"
  RETURN $b/title
(XPath expressions are shown in orange on the slide.)
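The FOR / WHERE / RETURN shape of this query maps closely onto a list comprehension. The sketch below uses Python's `xml.etree.ElementTree` on an invented stand-in for bib.xml (the titles and the second publisher are made up for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; all values are invented.
BIB = """
<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>1999</year></book>
  <book><title>Book C</title><publisher>Other Press</publisher><year>1998</year></book>
</bib>
"""

bib = ET.fromstring(BIB)

titles = [
    b.findtext("title")                               # RETURN $b/title
    for b in bib.iter("book")                         # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"   # WHERE ...
    and b.findtext("year") == "1998"
]
print(titles)
```

One difference worth keeping in mind: `findtext` looks only at the first matching child, whereas the `=` in the query is true if any publisher (or year) child matches.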
Examples from XQuery (cont)
List each publisher and the average price of its books.
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
  RETURN
    <publisher>
      <name> {$p/text()} </name>
      <avgprice> {$a} </avgprice>
    </publisher>
• LET binds a variable to a value. It does not cause an iteration.
• Does this create a (well-formed) XML document?
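A rough Python rendering of this query, to show the difference between FOR (one iteration per distinct publisher) and LET (a single binding to a whole set of prices). The data is invented, and `distinct` is a hypothetical helper standing in for the query's distinct().

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for bib.xml; publishers and prices are invented.
BIB = """
<bib>
  <book><publisher>P1</publisher><price>10</price></book>
  <book><publisher>P2</publisher><price>30</price></book>
  <book><publisher>P1</publisher><price>20</price></book>
</bib>
"""

def distinct(values):
    """Keep the first occurrence of each value, preserving order."""
    return list(dict.fromkeys(values))

books = ET.fromstring(BIB).findall("book")

results = []
for p in distinct(b.findtext("publisher") for b in books):   # FOR $p IN distinct(...)
    # LET $a := avg(...): bound once per iteration to an aggregate over a set
    a = mean(float(b.findtext("price")) for b in books if b.findtext("publisher") == p)
    elem = ET.Element("publisher")
    ET.SubElement(elem, "name").text = p
    ET.SubElement(elem, "avgprice").text = str(a)
    results.append(elem)

for elem in results:
    print(ET.tostring(elem, encoding="unicode"))
```

The result is a list of sibling publisher elements with no single enclosing element, which is one way to read the slide's question about well-formedness.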
Examples from XQuery (cont)
List the publishers who have published more than 100 books.
  <big_publishers>
  {
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p
  }
  </big_publishers>
• What about efficiency?
Examples from XQuery (cont)
Invert the structure of the input document so that each distinct author element contains a sequence of book-titles.
  <author_list>
  {
    FOR $a IN distinct(document("bib.xml")//author)
    RETURN
      <author>
        <name> {$a/text()} </name>
        {
          FOR $b IN document("bib.xml")//book[author = $a]
          RETURN $b/title
        }
      </author>
  }
  </author_list>
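The nested FOR in this query is a loop inside element construction. A sketch of the same inversion in Python, on invented data (book titles and author names are placeholders):

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; titles and authors are invented.
BIB = """
<bib>
  <book><title>Book A</title><author>Author X</author><author>Author Y</author></book>
  <book><title>Book B</title><author>Author Y</author></book>
</bib>
"""

books = ET.fromstring(BIB).findall("book")

author_list = ET.Element("author_list")
# distinct(//author): first occurrence of each author name, in document order
names = dict.fromkeys(a.text for b in books for a in b.findall("author"))
for name in names:
    author = ET.SubElement(author_list, "author")
    ET.SubElement(author, "name").text = name
    for b in books:
        # book[author = $a]: true if ANY author child of the book matches
        if any(a.text == name for a in b.findall("author")):
            ET.SubElement(author, "title").text = b.findtext("title")

inverted = {a.findtext("name"): [t.text for t in a.findall("title")] for a in author_list}
print(inverted)
```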
More Examples (Quilt) (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt )
Relational data -- two DTDs:
  <?xml version="1.0" ?>
  <!DOCTYPE items [
    <!ELEMENT items (item_tuple*)>
    <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price? )>
    <!ELEMENT itemno (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT offered_by (#PCDATA)>
    <!ELEMENT start_date (#PCDATA)>
    <!ELEMENT end_date (#PCDATA)>
    <!ELEMENT reserve_price (#PCDATA)>
  ]>

  <?xml version="1.0" ?>
  <!DOCTYPE bids [
    <!ELEMENT bids (bid_tuple*)>
    <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
    <!ELEMENT userid (#PCDATA)>
    <!ELEMENT itemno (#PCDATA)>
    <!ELEMENT bid (#PCDATA)>
    <!ELEMENT bid_date (#PCDATA)>
  ]>
The data
  <items>
    <item_tuple>
      <itemno>1001</itemno>
      <description>Red Bicycle</description>
      <offered_by>U01</offered_by>
      <start_date>1999-01-05</start_date>
      <end_date>1999-01-20</end_date>
      <reserve_price>40</reserve_price>
    </item_tuple>
    <item_tuple>
      <itemno>1002</itemno>
      <description>Motorcycle</description>
      <offered_by>U02</offered_by>
      <start_date>1999-02-11</start_date>
      <end_date>1999-03-15</end_date>
      <reserve_price>500</reserve_price>
    </item_tuple>
    …
  </items>

  <bids>
    <bid_tuple>
      <userid>U02</userid>
      <itemno>1001</itemno>
      <bid>35</bid>
      <bid_date>99-01-07</bid_date>
    </bid_tuple>
    <bid_tuple>
      <userid>U04</userid>
      <itemno>1001</itemno>
      <bid>40</bid>
      <bid_date>99-01-08</bid_date>
    </bid_tuple>
    …
  </bids>
Query 1
  FUNCTION date() { "1999-02-01" }
  <result>
  (
    FOR $i IN document("items.xml")//item_tuple
    WHERE $i/start_date LEQ date()
      AND $i/end_date GEQ date()
      AND contains($i/description, "Bicycle")
    RETURN
      <item_tuple>
        $i/itemno ,
        $i/description
      </item_tuple>
    SORTBY (itemno)
  )
  </result>
• simple function definitions
• dates are formatted so that lexicographic ordering gives the right result
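The remark about date formatting can be checked directly: with YYYY-MM-DD strings, plain string comparison agrees with chronological order. The sketch below reruns the logic of Query 1 in Python on the two items shown in the data excerpt plus one invented item (itemno 9001, labeled hypothetical), since the items in the query's actual output are elided from the excerpt.

```python
import xml.etree.ElementTree as ET

ITEMS = """
<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description>
    <start_date>1999-01-05</start_date><end_date>1999-01-20</end_date></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description>
    <start_date>1999-02-11</start_date><end_date>1999-03-15</end_date></item_tuple>
  <!-- hypothetical item, not in the source data -->
  <item_tuple><itemno>9001</itemno><description>Demo Bicycle</description>
    <start_date>1999-01-25</start_date><end_date>1999-02-10</end_date></item_tuple>
</items>
"""

def date():
    """Counterpart of the query's FUNCTION date()."""
    return "1999-02-01"

items = ET.fromstring(ITEMS)
hits = sorted(                                            # SORTBY (itemno)
    (i.findtext("itemno"), i.findtext("description"))
    for i in items.iter("item_tuple")
    # string comparison is enough because the dates are zero-padded YYYY-MM-DD
    if i.findtext("start_date") <= date() <= i.findtext("end_date")
    and "Bicycle" in i.findtext("description")
)
print(hits)
```

Item 1001 drops out because its auction ended before the reference date, and 1002 because it is not a bicycle and has not started yet.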
Output from Q1
  <?xml version="1.0" ?>
  <result>
    <item_tuple>
      <itemno> 1003 </itemno>
      <description> Old Bicycle </description>
    </item_tuple>
    <item_tuple>
      <itemno> 1007 </itemno>
      <description> Racing Bicycle </description>
    </item_tuple>
  </result>
Query Q2
For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.
  <result>
  (
    FOR $i IN document("items.xml")//item_tuple
    LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
    WHERE contains($i/description, "Bicycle")
    RETURN
      <item_tuple>
        $i/itemno ,
        $i/description ,
        IF ($b)
        THEN <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
        ELSE ""
      </item_tuple>
    SORTBY (itemno)
  )
  </result>
• lots of coercion
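Q2 is essentially an outer join: every bicycle appears in the result, with a high_bid element only when the set bound by LET is non-empty. A sketch in Python, using the two bids for item 1001 from the data excerpt and one invented item without bids (itemno 9001, hypothetical). Because the excerpt elides further bids, the maximum computed here is not the one shown in the full output.

```python
import xml.etree.ElementTree as ET

ITEMS = """
<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>
  <!-- hypothetical item with no bids, not in the source data -->
  <item_tuple><itemno>9001</itemno><description>Demo Bicycle</description></item_tuple>
</items>
"""
BIDS = """
<bids>
  <bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid></bid_tuple>
  <bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid></bid_tuple>
</bids>
"""

items = ET.fromstring(ITEMS)
bids = ET.fromstring(BIDS)

result = ET.Element("result")
for i in sorted(items.iter("item_tuple"), key=lambda e: e.findtext("itemno")):
    if "Bicycle" not in i.findtext("description"):
        continue
    # LET $b := all bids for this item; the list may be empty
    b = [float(t.findtext("bid")) for t in bids.iter("bid_tuple")
         if t.findtext("itemno") == i.findtext("itemno")]
    out = ET.SubElement(result, "item_tuple")
    ET.SubElement(out, "itemno").text = i.findtext("itemno")
    ET.SubElement(out, "description").text = i.findtext("description")
    if b:  # IF ($b): a non-empty set counts as true
        ET.SubElement(out, "high_bid").text = f"{max(b):g}"

print(ET.tostring(result, encoding="unicode"))
```

The explicit `float(...)` conversion is the Python counterpart of the implicit coercions the slide points out.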
Output from Q2
  <result>
    <item_tuple>
      <itemno> 1001 </itemno>
      <description> Red Bicycle </description>
      <high_bid> 55 </high_bid>
    </item_tuple>
    <item_tuple>
      <itemno> 1003 </itemno>
      <description> Old Bicycle </description>
      <high_bid> 20 </high_bid>
    </item_tuple>
    <item_tuple>
      <itemno> 1007 </itemno>
      <description> Racing Bicycle </description>
      <high_bid> 225 </high_bid>
    </item_tuple>
    <item_tuple>
      <itemno> 1008 </itemno>
      <description> Broken Bicycle </description>
    </item_tuple>
  </result>
Query Q3
Find cases where a user with a rating worse (alphabetically greater) than "C" offers an item with a reserve price of more than 1000.
  <result>
  (
    FOR $u IN document("users.xml")//user_tuple,
        $i IN document("items.xml")//item_tuple
    WHERE $u/rating GT 'C'
      AND $i/reserve_price GT 1000
      AND $i/offered_by = $u/userid
    RETURN
      <warning>
        <user_name>$u/name/text()</user_name>,
        <user_rating>$u/rating/text()</user_rating>,
        <item_description>$i/description/text()</item_description>,
        $i/reserve_price
      </warning>
  )
  </result>
• Comparing sets with singletons
• Same rules as in XPath?
• In this case the DTD gives uniqueness