450 likes | 586 Views
Managing XML and Semistructured Data. Part 3: Query Languages. In this section…. Lorel (A Lightweight Object REpository Language - developed at Standford) XPath specification data model Examples [ xpath , axis ] syntax XQuery FLWR expressions FOR and LET expressions
E N D
Managing XML and Semistructured Data Part 3: Query Languages
In this section… • Lorel (A Lightweight Object REpository Language - developed at Standford) • XPath specification • data model • Examples [xpath, axis] • syntax • XQuery • FLWR expressions • FOR and LET expressions • Collections and sorting • (XML-QL the earlier version in AT&T Labs) • Resources: The Lorel Query Language for Semistructured Data by Abiteboul, Quass, McHugh, Widom, Wiener, in International Journal on Digital Libraries, 1997. A formal semantics of patterns in XSLT by Phil Wadler. XML Path Language (XPath) www.w3.org/TR/xpath XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/
Querying XML Data • A core query language (extracting + restructuring) • XPath (core expressions) allows simple navigation through the tree • XQuery is used as the SQL of XML • XSLT (Extensible Stylesheet Language Transformation) = recursive traversal based on pattern matching - will not discuss here
Sample Data for Queries <biblio><paper> … </paper> <book><author> Smith </author> <date> 1999 </date> <title> Database Systems </title></book><book><author> Roux</author><author> Combalusier</author> <date> 1976 </date><title> Database Systems </title></book> </biblio>
A Core Query Language A SQL-like language for querying semi-structured data biblio &o1 book paper book &o12 &o24 &o29 . . . title author date title author author date &30 &o52 &25 &96 Database Systems Roux 1976 &o50 Will illustrate with: XML DB = &o47 &o48 Combalusier Smith 1999 Database Systems
Query 1: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT author: XFROM biblio.book.author X answer author Answer ={author: “Smith”, author: “Roux”, author: “Combalusier”} author author
Query 2: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT row: XFROM biblio._XWHERE “Smith” in X.author row answer . . . Answer ={row: {author:“Smith”, date: 1999, title: “Database…”},row: …} row
Query 3: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT row: ( SELECT author: Y FROM X.author Y)FROM biblio.book X row answer &a1 row Answer ={row: {author:“Smith”},row: {author:“Roux”,author:“Combalusier”,},} &a2 author author author
Query 4: biblio &o1 book paper book &o12 &o24 &o29 . . . title author date author title author date &30 &o52 &25 &96 1976 Database Systems &o50 &o47 &o48 Roux Combalusier 1999 Database Systems Smith SELECT ( SELECT row: {author: Y, title: T} FROM X.author Y, X.title T)FROM biblio.book XWHERE “Roux” in X.author row answer Answer ={row: {author:“Roux”, title: “Database…”},row: {author:“Combalusier”, title: “Database…”},} &a1 row author &a2 title title author
Lorel • Minor syntactic differences in regular path expressions (% instead of _, # instead of _*) • Common path convention:becomes: SELECT biblio.book.authorFROM biblio.bookWHERE biblio.book.year = 1999 SELECT X.authorFROM biblio.book XWHERE X.year = 1999
Lorel • Existential variables: • What happens with books having multiple authors ? Author is existentially quantified: SELECT biblio.book.yearFROM biblio.bookWHERE biblio.book.author = “Roux” SELECT X.yearFROM biblio.book X, X.author YWHERE Y = “Roux”
Lorel • Path variables. @P in: • What happens on graphs with cycles ? • Constructing new results • Several default rules • Casting between datatypes • Very useful in practice SELECT @PFROM biblio.# @P X
XPath • http://www.w3.org/TR/xpath (11/99) • Building block for other W3C standards: • XSL Transformations (XSLT) • XML Link (XLink) • XML Pointer (XPointer) • XML Query • Was originally part of XSL
XPath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute bib/book/@price matches price attribute in book, in bib bib/book/[@price<“55”]/author/lastname matches…
Example for XPath Queries <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>
bib Data Model for XPath The root The root element book book publisher author . . . . Addison-Wesley Serge Abiteboul
XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> Result: empty (there were no papers) /bib/paper/year
XPath: Restricted Kleene Closure //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> Result: <first-name> Rick </first-name> /bib//first-name
XPath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Jeffrey D. Ullman !Rick Hull doesn’t appear because he has firstname, lastname Functions in XPath: • text() = matches the text value • node() = matches any node (= * or @* or text()) • name() = returns the name of the current tag
XPath: Wildcard Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element //author/*
XPath: Attribute Nodes /bib/book/@price Result: “55” @price means that price is has to be an attribute
XPath: Predicates /bib/book/author[firstname] Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
XPath: More Predicates Result: <lastname> … </lastname> <lastname> … </lastname> /bib/book/author[firstname][address[//zip][city]]/lastname
XPath: More Predicates /bib/book[@price < “60”] /bib/book[author/@age < “25”] /bib/book[author/text()]
XQuery FLWOR (flower) Expressions • Based on Quilt(which is based on XML-QL) • http://www.w3.org/TR/xquery/2/2001 • XML Query data model • Ordered ! FOR ... LET... WHERE... ORDER BY… RETURN...
XQuery Query: Find all book titles published after 1995: FOR$xINdocument("bib.xml")/bib/book WHERE$x/year > 1995 RETURN$x/title <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib> * bib.xml is shown on slide 15 Result: <title> Principles of Database…</title>
XQuery Query: Find book titles by the coauthors of “Foundations of Databases”: FOR$xINbib/book[title/text() = “Foundations …”]/author$yINbib/book[author/text() = $x/text()]/title RETURN <answer> $y/text() </answer> Result: <answer> Foundations … </ answer > < answer> Foundations …</ answer > The answer willcontain duplicates !
XQuery Same as before, but eliminate duplicates: FOR$xINbib/book[title/text() = “Database Theory”]/author$yINdistinct(bib/book[author/text() = $x/text()]/title) RETURN <answer> $y/text() </answer> Result: < answer> Foundations …</ answer > distinct = a function that eliminates duplicates
FOR$yIN /db/Company/row[city/text()=“Seattle”]$xIN /db/Product/row[maker/text()=$y/cid/text()]RETURN$x/name CoolXQuery SQL and XQuery Side-by-side Product(pid, name, maker)Company(cid, name, city) Query: Find all products made in Seattle FOR$xIN /db/Product/row$yIN /db/Company/rowWHERE$x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN$x/name SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle” SQL XQuery
Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result> XQuery: Nesting Query: For each author of a book by Morgan Kaufmann, list all books s/he published: FOR$aINdistinct(document("bib.xml")/bib/book[publisher=“Morgan Kaufmann”]/author) RETURN <result> { $a, FOR$tIN /bib/book[author=$a]/title RETURN$t} </result>
XQuery • FOR$xIN expr -- binds $x to each value in the list expr • LET$x = expr -- binds $x to the entire list expr • Useful for common subexpressions and for aggregations <big_publishers> FOR$pINdistinct(document("bib.xml")//publisher) LET$b := document("bib.xml")/book[publisher = $p] WHEREcount($b) > 100 RETURN$p </big_publishers> count = a (aggregate) function that returns the number of elms
XQuery Query: Find books whose price is larger than average: FOR$aIN /bib/book LET$b:=avg(/bib/book/price/text()) WHERE$a/price/text() > $b RETURN$a
XQuery Query: Find all publishers that published more than 100 books: <big_publishers> { FOR$pINdistinct(//publisher/text()) LET$b := document("bib.xml")/book[publisher/text() = $p] WHEREcount($b) > 100 RETURN <publisher> $p </publisher> } </big_publishers> $bis a collection of elements, not a single element count= a (aggregate) function that returns the number of elements
FOR v.s. LET FOR • Binds node variables iteration LET • Binds collection variables one value Examples Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ... FOR$xINdocument("bib.xml")/bib/book RETURN <result> $x </result> Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result> LET$x:=document("bib.xml")/bib/book RETURN <result> $x </result>
Sorting in XQuery <publisher_list> FOR$pINdistinct(document("bib.xml")//publisher) RETURN <publisher> {<name> $p/text() </name> , FOR$bIN document("bib.xml")//book[publisher = $p] RETURN <book> {$b/title , $b/@price} </book> SORTBY(priceDESCENDING) } </publisher> SORTBY(name) </publisher_list>
Sorting in XQuery • Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause • To sort on an element you don’t want to display, first return it, then remove it with an additional query. <publisher_list> FOR$pINdistinct(document("bib.xml")//publisher) RETURN <publisher> { <name> $p/text() </name> , FOR$bIN document("bib.xml")//book[publisher = $p] RETURN <book> { $b/title , $b/price } </book> ORDER BY priceDESCENDING } </publisher> ORDER BY name </publisher_list>
Collections in XQuery • Ordered and unordered collections • /bib/book/author = an ordered collection • Distinct(/bib/book/author) = an unordered collection • LET$b = /bib/book $b is a collection • $b/author a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> <author>...</author> ... </result> RETURN <result> $b/author </result>
If-Then-Else FOR$h IN //holding RETURN <holding> { $h/title, IF$h/@type = "Journal" THEN$h/editor ELSE$h/author } </holding> ORDER BYtitle
Quantifiers FOR$b IN //book WHERESOME$p IN $b//paraSATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN$b/title Existential Quantifiers FOR$b IN //book WHEREEVERY$p IN $b//paraSATISFIES contains($p, "sailing") RETURN$b/title Universal Quantifiers
Other Stuff in XQuery • BEFORE and AFTER • for dealing with order in the input • FILTER • deletes some edges in the result tree • Recursive functions • Currently: arbitrary recursion • Perhaps more restrictions in the future ?
Group-By in XQuery ?? • No GROUPBY currently in XQuery • A recent proposal (next) • What do YOU think ?
Group-By in XQuery ?? FOR$bIN document("http://www.bn.com")/bib/book, $yIN$b/@year WHERE$b/publisher="Morgan Kaufmann" RETURNGROUPBY $y WHERE count($b) > 10 IN <year> $y </year> with GROUPBY SELECT year FROM Bib WHERE Bib.publisher="Morgan Kaufmann" GROUPBY year HAVING count(*) > 10 Equivalent SQL
Group-By in XQuery ?? FOR $b IN document("http://www.bn.com")/bib/book,$a IN $b/author,$y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result> with GROUPBY FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib,$a IN $b/author,$y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),$a IN $Tup/a/node(),$y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result> Without GROUPBY
Group-By in XQuery ?? FOR$bIN document("http://www.bn.com")/bib/book,$aIN$b/author,$yIN$b/@year,$tIN$b/title,$pIN$b/publisherRETURNGROUPBY$p, $yIN <result> $p, <year> $y </year>,GROUPBY$aIN <authorEntry> $a,GROUPBY $tIN$t <authorEntry> </result> Nested GROUPBY’s
FOR/LET Clauses List of tuples of bounded variables WHERE Clause List of pruned tuples of bounded variables RETURN Clause Instance of XQuery data model XQuery Summary:[Demo] FOR-LET-WHERE-RETURN = FLWR