1 / 49

Web Data Management

Web Data Management. XQuery. In this lecture. Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting Resources XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/.

adamma
Download Presentation

Web Data Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Web Data Management XQuery

  2. In this lecture • Summary of XQuery • FLWOR expressions – For, Let, Where, Order by, Return • FOR and LET expressions • Collections and sorting Resources XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/

  3. XQuery • Based on Quilt (which is based on XML-QL) • Uses XPath to express more complex queries • http://www.w3.org/TR/xquery • XML Query data model (of course ) • Ordered !

  4. Sample Data for Queries <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>

  5. bib Data Model for XPath The root The root element book book publisher author . . . . Addison-Wesley Serge Abiteboul

  6. FLWOR (“Flower”) Expressions • FLWOR expression supports iteration and binding of variables to intermediate results • useful for computing joins between two or more documents and for restructuring data FOR ... LET... WHERE...ORDER BY…RETURN... • The for and let clauses generate an ordered sequence of tuples of bound variables, called the tuple stream • The optional where clause serves to filter the tuple stream • The optional order by clause can be used to reorder the tuple stream • The return clause constructs the result of the FLWOR expression

  7. FOR-WHERE-RETURN Find all book titles published after 1995: FOR$xINdocument("bib.xml")/bib/book WHERE$x/year/text() > 1995 RETURN$x/title Result: <title> abc </title> <title> def </title> <title> ghi </title>

  8. Equivalently (perhaps more geekish) FOR-WHERE-RETURN FOR$xINdocument("bib.xml")/bib/book[year/text() > 1995] /title RETURN$x And even shorter: document("bib.xml")/bib/book[year/text() > 1995] /title

  9. FOR-WHERE-RETURN • Find all book titles and the year when they were published: FOR$xINdocument("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer> Result: <answer> <title> abc </title> <year> 1995 </ year > </answer> <answer> <title> def </title> < year > 2002 </ year > </answer> <answer> <title> ghk </title> < year > 1980 </ year > </answer>

  10. FOR-WHERE-RETURN • Notice the use of “{“ and “}” • What is the result without them ? FOR$xINdocument("bib.xml")/bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

  11. Nesting For each author of a book by Morgan Kaufmann, list all books she published: FOR$b IN document(“bib.xml”)/bib,$aIN$b/book[publisher /text()=“Morgan Kaufmann”]/author RETURN <result> { $a, FOR$tIN$b/book[author/text()=$a/text()]/title RETURN$t} </result> In the RETURN clause comma concatenates XML fragments

  12. Result <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

  13. Aggregates Find all books with more than 3 authors: FOR$x IN document("bib.xml")/bib/bookWHEREcount($x/author)>3 RETURN$x count = a function that counts avg = computes the averagesum = computes the sumdistinct-values = eliminates duplicates

  14. Aggregates Same thing: FOR$x IN document("bib.xml")/bib/book[count(author)>3] RETURN$x

  15. Aggregates Print all authors who published more than 3 books – be aware of duplicates ! FOR$b IN document("bib.xml")/bib, $aINdistinct-values($b/book/author/text())WHEREcount($b/book[author/text()=$a])>3 RETURN <author> { $a } </author>

  16. Aggregates <big_publishers> FOR$pINdistinct(document("bib.xml")//publisher) LET$b := document("bib.xml")/book[publisher = $p] WHEREcount($b) > 100 RETURN$p </big_publishers> count = a (aggregate) function that returns the number of elms

  17. Aggregates Find books whose price is larger than average: LET$a=avg(document("bib.xml")/bib/book/@price) FOR$b in document("bib.xml")/bib/book WHERE$b/@price > $a RETURN$b

  18. Flattening • “Flatten” the authors, i.e. return a list of (author, title) pairs FOR$bINdocument("bib.xml")/bib/book,$xIN$b/title/text(),$yIN$b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer> Result:<answer> <title> abc </title> <author> efg </author></answer><answer> <title> abc </title> <author> hkj </author></answer>

  19. Re-grouping • For each author, return all titles of her/his books Result:<answer> <author> efg </author> <title> abc </title> <title> klm </title> . . . . </answer> FOR$b IN document("bib.xml")/bib,$xIN$b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR$yIN$b/book[author/text()=$x]/titleRETURN$y } </answer> What aboutduplicateauthors ?

  20. Re-grouping • Same, but eliminate duplicate authors: FOR$b IN document("bib.xml")/bibLET$a := distinct-values($b/book/author/text())FOR$xIN$aRETURN <answer> <author> {$x} </author> { FOR$yIN$b/book[author/text()=$x]/titleRETURN$y } </answer>

  21. Re-grouping • Same thing: FOR$b IN document("bib.xml")/bib,$xINdistinct-values($b/book/author/text())RETURN <answer> <author> {$x} </author> { FOR$yIN$b/book[author/text()=$x]/titleRETURN$y } </answer>

  22. Another Example Find book titles by the coauthors of “Database Theory”: FOR$b IN document("bib.xml")/bib,$xIN$b/book[title/text() = “Database Theory”],$yIN$b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer> Result: <answer> abc </ answer > < answer > def </ answer > < answer > abc </ answer > < answer > ghk </ answer > Question: Why do we get duplicates ?

  23. Distinct-values Same as before, but eliminate duplicates: FOR$b IN document("bib.xml")/bib,$xIN$b/book[title/text() = “Database Theory”]/author/text(),$yINdistinct-values($b/book[author/text() = $x] /title/text()) RETURN <answer> { $y } </answer> Result: <answer> abc </ answer > < answer > def </ answer > < answer > ghk </ answer > distinct-values = a function that eliminates duplicates Need to apply to a collectionof text values, not of elements – note how query has changed

  24. FOR$x in document(“db.xml”)/db/Product/rowORDER BY$x/price/text()RETURN <answer> { $x/name, $x/price } </answer> SELECT x.name, x.priceFROM Product xORDER BY x.price SQL XQuery SQL and XQuery Side-by-side Find all product names, prices,sort by price Product(pid, name, maker, price)

  25. Xquery’s Answer <answer> <name> abc </name> <price> 7 </price></answer> <answer> <name> def </name> <price> 23 </price></answer> . . . . Notice: this is NOT awell-formed document !(WHY ???)

  26. Producing a Well-Formed Answer <myQuery> { FOR$x in document(“db.xml”)/db/Product/rowORDER BY$x/price/text()RETURN <answer> { $x/name, $x/price } </answer> }</myQuery>

  27. Xquery’s Answer <myQuery> <answer> <name> abc </name> <price> 7 </price> </answer> <answer> <name> def </name> <price> 23 </price> </answer> . . . .</myQuery> Now it is well-formed !

  28. FOR$r in document(“db.xml”)/db,$x in $r/Product/row,$y in $r/Company/rowWHERE$x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name } SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle” SQL XQuery FOR$y in /db/Company/row[city/text()=“Seattle”],$x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name } CoolXQuery SQL and XQuery Side-by-side Product(pid, name, maker, price)Company(cid, name, city, revenues) Find all products made in Seattle

  29. <product> <row> <pid> 123 </pid> <name> abc </name> <maker> efg </maker> </row> <row> …. </row> … </product><product> . . . </product>. . . .

  30. SQL and XQuery Side-by-side For each company with revenues < 1M count the products over $100 SELECT y.name, count(*)FROM Product x, Company yWHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000GROUP BY y.cid, y.name FOR$r in document(“db.xml”)/db,$y in $r/Company/row[revenue/text()<1000000]RETURN <proudCompany> <companyName> { $y/name/text() } </companyName> <numberOfExpensiveProducts> { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } </numberOfExpensiveProducts> </proudCompany>

  31. SQL and XQuery Side-by-side Find companies with at least 30 products, and their average price SELECT y.name, avg(x.price)FROM Product x, Company yWHERE x.maker=y.cidGROUP BY y.cid, y.nameHAVING count(*) > 30 An element FOR$r in document(“db.xml”)/db,$y in $r/Company/rowLET$p := $r/Product/row[maker/text()=$y/cid/text()]WHEREcount($p) > 30RETURN <theCompany> <companyName> { $y/name/text() } </companyName> <avgPrice> avg($p/price/text()) </avgPrice> </theCompany> A collection

  32. Sorting in XQuery <publisher_list> FOR$bIN document("bib.xml")//book[publisher = ‘M.K’] ORDERBY$b/price/text() RETURN <book> { $b/title , $b/price } </book> </publisher_list>

  33. If-Then-Else FOR$h IN //holding RETURN <holding> $h/title, IF$h/@type = "Journal" THEN$h/editor ELSE$h/author </holding> SORTBY (title)

  34. Existential Quantifiers FOR$b IN //book WHERESOME$p IN $b//paraSATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN$b/title

  35. Universal Quantifiers FOR$b IN //book WHEREEVERY$p IN $b//paraSATISFIES contains($p, "sailing") RETURN$b/title

  36. Duplicate Elimination • distinct-values(list-of-text-values) • How do we eliminate duplicate “tuples” ? <row> <a>3</a> <b>100</b> </row> <row> <a>8</a> <b>500</b> </row> <row> <a>3</a> <b>100</b> </row> <row> <a>3</a> <b>200</b> </row> <row> <a>8</a> <b>500</b> </row> <row> <a>3</a> <b>100</b> </row> <row> <a>8</a> <b>500</b> </row> <row> <a>3</a> <b>200</b> </row>

  37. FOR v.s. LET • FOR$x in expr -- binds $x to each element in the list expr • LET$x = expr -- binds $x to the entire list expr • Useful for common subexpressions and for aggregations

  38. FOR v.s. LET Summary: • FOR-LET-WHERE-RETURN = FLWR FOR/LET Clauses List of tuples WHERE Clause List of tuples RETURN Clause Instance of Xquery data model

  39. FOR v.s. LET FOR • Binds node variables iteration LET • Binds collection variables one value

  40. FOR v.s. LET Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ... FOR$xINdocument("bib.xml")/bib/book RETURN <result> $x </result> LET$x:=document("bib.xml")/bib/book RETURN <result> $x </result> Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result>

  41. Collections in XQuery • Ordered and unordered collections • /bib/book/author = an ordered collection • Distinct(/bib/book/author) = an unordered collection • LET$a = /bib/book $a is a collection • $b/author  a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> <author>...</author> ... </result> RETURN <result> $b/author </result>

  42. Collections in XQuery What about collections in expressions ? • $b/@price list of n prices • $b/@price * 0.7  list of n numbers • $b/@price * $b/@quantity  list of n x m numbers ?? • $b/@price * ($b/@quant1 + $b/@quant2)  $b/@price * $b/@quant1 + $b/@price * $b/@quant2 !!

  43. Sorting in XQuery <publisher_list> FOR$pINdistinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR$bIN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/@price </book> SORTBY(priceDESCENDING) </publisher> SORTBY(name) </publisher_list>

  44. Sorting in XQuery • Sorting arugments: refer to the name space of the RETURN clause, not the FOR clause • To sort on an element you don’t want to display, first return it, then remove it with an additional query.

  45. Other Stuff in XQuery • BEFORE and AFTER • for dealing with order in the input • FILTER • deletes some edges in the result tree • Recursive functions • Currently: arbitrary recursion • Perhaps more restrictions in the future ?

  46. Group-By in Xquery ?? • No GROUPBY currently in XQuery • A recent proposal (next) • What do YOU think ?

  47. Group-By in Xquery ?? FOR$bIN document("http://www.bn.com")/bib/book, $yIN$b/@year WHERE$b/publisher="Morgan Kaufmann" RETURNGROUPBY $y WHERE count($b) > 10 IN <year> $y </year>  with GROUPBY SELECT year FROM Bib WHERE Bib.publisher="Morgan Kaufmann" GROUPBY year HAVING count(*) > 10 Equivalent SQL 

  48. Group-By in Xquery ?? FOR $b IN document("http://www.bn.com")/bib/book,$a IN $b/author,$y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>  with GROUPBY FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib,$a IN $b/author,$y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),$a IN $Tup/a/node(),$y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result> Without GROUPBY 

  49. Group-By in Xquery ?? FOR$bIN document("http://www.bn.com")/bib/book,$aIN$b/author,$yIN$b/@year,$tIN$b/title,$pIN$b/publisherRETURNGROUPBY$p, $yIN <result> $p, <year> $y </year>,GROUPBY$aIN <authorEntry> $a,GROUPBY $tIN$t <authorEntry> </result>  Nested GROUPBY’s

More Related