340 likes | 466 Views
Database Systems I Query Languages for XML. Query Languages for XML. XPath is a simple query language based on describing similar paths in XML documents. XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc.
E N D
Query Languages for XML • XPath is a simple query language based on describing similar paths in XML documents. • XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc. • XPath and XQuery expressions are applied to an XML document and return a sequence of qualifying items. • Items can be primitive values or nodes (elements, attributes, documents). • The items returned do not need to be of the same type.
XPath • A path expression returns the sequence of all qualifying items that are reachable from the input item following the specified path. • A path expression is a sequence consisting of tags or attributes and special characters such as slashes (“/”). • Absolute path expressions are applied to some XML document and returns all elements that are reachable from the document’s root element following the specified path. • Relative path expressions are applied to an arbitrary node.
XPath <?XML version=“1.0” standalone =“yes” ?> <bibliography> <book bookID = “b100“> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> … </bibliography> • Applied to the above document, the XPath expression /bibliography/book/author returns the sequence <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> . . .
Attributes • If we do not want to return the qualifying elements, but the value one of their attributes, we end the path expression with @attribute. • Applied to the above document, the XPath expression /bibliography/book/@bookID returns the sequence “b100“ . . .
Axes • XPath provides a variety of axes, i.e. modes of navigation through semistructured data. • At each step of a path expression, we can prefix a tag or attribute name by an axis name and a colon. • For example, the path expression /child::bibliography/child::book/attribute::bookID is equivalent to /bibliography/book/@bookID. • Descendants are all direct and indirect children of a node.
Axes • Axes include • parent, • ancestor, • descendant, • next-sibling, • previous-sibling, • self, and • descendant-or-self. • XPath has the following shorthands for axes: / child, // descendant-or-self, @ attribute, . self, .. parent.
Axes <bibliography> <book bookID = “b100“> <title> Foundations… </title> <author affiliation = “IBM“> Abiteboul </author> <author> Hull </author> . . . </book> <article articleID = “a245“> <header> <author authorID = “a739“> Codd </author> <title> A relational database model </title> </header> <body> . . . </body> </article> </bibliography> • Applied to the above document, the path expression /bibliography//author returns the sequence <author> Abiteboul </author> <author> Hull </author> <author> Codd </author> .
Wildcards • We can use wildcards instead of actual tags and attributes:* means any tag, and @* means any attribute. • Examples/bibliography/*/author returns the sequence <author> Abiteboul </author> <author> Hull </author>./bibliography//author/@* returns the sequence “IBM“ “a739“.
Conditions • We can restrict the qualifying paths to those that satisfy a given condition, surrounded by square brackets. • Conditions can be anything returning a boolean value. • In particular, conditions can be: [<subpath>=<value>] there exists a subpath with the specified value [i] the element is the i-th element of the specified type • Example /bibliography/book[/title=“Foundations…”]/author[2] returns <author> Hull </author>.
for/let clauses sequence of items where clause sequence of items order-by/return clause XQuery • XQuery extends XPath, i.e. every XPath expression is an XQuery expression. • Beyond XPath expressions, XQuery introduces FLWOR expressions. • Format: for let where order-by return
XQuery • FLWOR expressions are similar to SQL select . . from . . . where . . . queries. • XQuery allows zero, one or more for and let clauses. • The where clause is optional. • There is one optional order-by clause. • Finally, there is exactly one return clause. • XQuery is case-sensitive. • XQuery (and XPath) is a W3C standard.
XQuery • XQuery is a functional language. • Any XQuery expression can be used in any place that an expression is expected. • SQL also allows subqueries in many places. However, SQL does, e.g., not allow any subquery to be any operand of any comparison in a WHERE clause. • This implies that every XQuery operator must be defined for operands that are sequences of items, not just for individual items.
XQuery Clauses • for $x in expr • Defines node variable$x. • The expression expr evaluates to a sequence of items. • The variable $x is assigned to each item, in turn, and the body of the for clause is executed once for each assignment. • let $x := expr • Defines collection variable$x. • The expression expr evaluates to a sequence of items. • The variable is bound to the entire sequence of items. • Useful for common subexpressions and for aggregations.
XQuery Clauses • where condition • The condition is a boolean expression. • The clause is applied to some item. • If and only if the condition evaluates to true, the following return clause is executed for that item. • return expression • The result of a FLWOR clause is a sequence of items. • Expression defines the result format for the current (qualifying) item. • The sequence of items produced by expression is appended to the sequence of items produced so far.
Document Nodes • The context for a for or let clause is often provided by a document node. • Typically, the document comes from a file. • The doc function constructs a document node from a file with a given name. • Examplesdoc("bib.xml") doc(“infolab.stanford.edu/~hector/movies.xml”)
Interpretation as XQuery Expression • XQuery expressions can be used wherever an XML expression of any kind is permitted. • Any text string is acceptable as content of a tag or value of an attribute. • If a string contains an XQuery expression that should be evaluated, this substring must be surrounded by curly brackets {}. • Examplefor $b indoc("bib.xml")/bibliography/bookreturn <result id = {$b/@bookID}>{$b/title}</result>
XQuery Examples • Find all books. • for vs. let Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ... for$xindoc("bib.xml")/bibliography/book return <result> {$x} </result> Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result> let$x:=doc("bib.xml")/bibliography/book return <result> {$x} </result>
XQuery Examples • Find all titles of books published after 1995. for$xindoc("bib.xml")/bibliography/book where$x/year > 1995 return$x/title Result: <title> abc </title> <title> def </title> <title> ghi </title>
Ordering the Query Result • The order-by clause allows you to order the results of an XQuery expression. order-by list of expressions • The sort order is based on the value of the first expression. Ties are broken based on the value of the second (if necessary third etc.) expression. • By default, the order is ascending. • A descending sort order can be specified using descending.
Elimination of Duplicates • The built-in function distinct-values eliminates duplicates from a sequence of result items. • In principle, it applies only to primitive (atomic) types. • It can also be applied to elements, but then it will remove their tags, replacing them by quotes “”. • Example If return $b/title produces <title> aaa </title> <title> bbb </title> <title> aaa </title> then distinct-values (return $b/title) produces “aaa” “bbb”.
XQuery Examples • Find all books published by Morgan Kaufman and list them in descending order of their prices. • Uses order-by withoption descending. for$bindoc("bib.xml")/bibliography/book[publisher=“Morgan Kaufmann”]) order-by $b/price descending return $b
XQuery Examples • For each author of a book published by Morgan Kaufmann, list the author and the titles of all books she published. • Uses nested subquery andfunction distinct-values. for$aindistinct-values(doc("bib.xml")/bibliography/book[publisher=“Morgan Kaufmann”]/author) return <result> {$a} {for$tin /bib/book[author=$a]/title return$t} </result>
XQuery Examples Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>
Joins • We can join two or more documents, by using one variable for each of the documents . • We let a variable range over the elements of the corresponding document, within a for-clause. • Need to be careful when comparing elements for equality, since their equality is by element identity, not by element content. • Typically, we want to compare the element content. • The built-in function data(E) returns the content of an element E.
Example • Find all pairs of titles of books from the same year. • Uses two variables ranging over booksand the data function applied to their year elements. let $books:=doc("bib.xml") for$b1indoc("bib.xml")/bibliography/book,$b2indoc("bib.xml")/bibliography/book wheredata($b1/year) = data($b2 /year) return <result>{$b1/title} {$b2/title} </result>
Comparison Operators • XQuery supports the standard comparison operators such as <, >, =. • Comparison operators are applied to a sequence of items. • Comparisons have an existential nature. I.e., they return true if and only if at least one of the items satisfies the condition of the comparison. • for $b in doc("bib.xml")/bibliography/book/ where $b/author/firstname = “A” and $b/author/lastname = “B” return $b Books returned can have one author with firstname A and another author with lastname B.
Comparison Operators • XQuery also supports special comparison operators that only compare sequences consisting of a single item: eq, ne, lt, gt, ge. • These comparisons fail if one of the operands contains more than one item. • XQuery also provides built-in functions for approximate string matching, in particular contains($p, "windsurfing").
Quantification • XQuery supports the existential and the universal quantifier. • Universal quantifierevery $v in expression1 satisfies expression 2 • Existential quantifier some $v in expression1 satisfies expression 2 • Expression1 evaluates to a sequence of items, expression 2 is a boolean expression.
Aggregation • XQuery provides built-in functions for the standard aggregations such as SUM, MIN, COUNT and AVG. • They can be applied to any XQuery expression, i.e. to any sequence of items. • Exampleavg(doc("bib.xml")/bibliography/book/price)count(doc("bib.xml")/bibliography/book/price)Computes the average book price and the number of books, resp.
XQuery Examples • Find books whose price is larger than the average price. • Uses aggregate operator (avg), applied to the result of a path expression. let$a:=avg(doc("bib.xml")/bibliography/book/price) for$bindoc("bib.xml")/bibliography/book where$b/price > $a return$b
XQuery Examples • Find title of books with a paragraph containing the terms “sailing” and “windsurfing”. • Uses existential quantifier (some) and string matching (contains). for$bindoc("bib.xml")//book where some$pin$b//parasatisfies contains($p, "sailing") and contains($p, "windsurfing") return$b/title
XQuery Examples • Find the title of books where every paragraph contains the terms “sailing”. • Uses universal quantifier (every) and string matching (contains). for$bindoc("bib.xml")//book where every$pin$b//parasatisfies contains($p, "sailing") return$b/title
Summary • XQuery is the standard XML query language. • It is a functional language, i.e. any XQuery expression can be used in any place where an expression is expected. • An XQuery expression consists of for, let, where, order and return clauses, of which some are optional. • The main new concept compared to SQL are path expressions that return sets of elements reachable via the given path. • Path expressions are defined in XPath, a sublanguage of XQuery. • In addition, XQuery has equivalent constructs for most of the main SQL constructs, in particular quantifiers and aggregate functions.