XML Data Management 8. XQuery Werner Nutt
Requirements for an XML Query Language David Maier, W3C XML Query Requirements: • Closedness: output must be XML • Composability: wherever a set of XML elements is required, a subquery is allowed as well • Support for key operations: • selection • extraction, projection • restructuring • combination, join • fusion of elements
Requirements for an XML Query Language • Can benefit from a schema, but should also be applicable without one • Retains the order of nodes • Formal semantics: • structure of results should be derivable from the query • defines equivalence of queries • Queries should be representable in XML, so that documents can have embedded queries
How Does One Design a Query Language? • In most query languages, there are two aspects to a query: • Retrieving data (e.g., from … where … in SQL) • Creating output (e.g., select … in SQL) • Retrieval consists of • Pattern matching (e.g., from … ) • Filtering (e.g., where … ) … although these cannot always be clearly distinguished
XQuery Principles • Data Model identical with the XPath data model • documents are ordered, labeled trees • nodes have identity • nodes can have simple or complex types (defined in XML Schema) • A query result is an ordered list/sequence of items (nodes, values, attributes, etc., but not lists) • special case: the empty list ()
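The statement that a sequence never contains another sequence can be checked with literal values alone. The following is a minimal sketch that assumes no input document; the variable name $s is illustrative.

```xquery
(: Nested sequences are flattened: there are no lists of lists. :)
let $s := (1, (2, 3), (), "four")
return (
  count($s),          (: 4, the items are 1, 2, 3, "four" :)
  empty(()),          (: true, () is the empty sequence :)
  count((1, (), 2))   (: 2, the empty sequence contributes nothing :)
)
```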
XQuery Principles (cntd) • XQuery can be used without schemas, but can be checked against DTDs and XML schemas • XQuery is a functional language • no statements • evaluation of expressions • function definitions • modules
The Recipes DTD (Reminder)
  <!ELEMENT recipes (recipe*)>
  <!ELEMENT recipe (title, ingredient+, preparation, nutrition)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (ingredient*, preparation?)>
  <!ATTLIST ingredient name   CDATA #REQUIRED
                       amount CDATA #IMPLIED
                       unit   CDATA #IMPLIED>
  <!ELEMENT preparation (step+)>
  <!ELEMENT step (#PCDATA)>
  <!ELEMENT nutrition EMPTY>
  <!ATTLIST nutrition calories CDATA #REQUIRED
                      fat      CDATA #REQUIRED>
A Query over the Recipes Document
  <titles>
    {for $r in doc("recipes.xml")//recipe
     return $r/title}
  </titles>
returns
  <titles>
    <title>Beef Parmesan with Garlic Angel Hair Pasta</title>
    <title>Ricotta Pie</title>
    …
  </titles>
Query Features
  <titles>
    {for $r in doc("recipes.xml")//recipe
     return $r/title}
  </titles>
• <titles> … </titles>: part to be returned as it is given
• { … }: to be evaluated
• for: iteration
• $r: a variable ($var)
• doc("recipes.xml")//recipe: XPath
• doc(String): returns the input document
• return: sequence of results, one for each variable binding
An Equivalent Stylesheet Template
  <xsl:template match="/">
    <titles>
      <xsl:for-each select="//recipe">
        <xsl:copy-of select="title"/>
      </xsl:for-each>
    </titles>
  </xsl:template>
Features: Summary • The result is a new XML document • A query consists of parts that are returned as is • ... and others that are evaluated (everything in {...}) • Calling the function doc(String) returns an input document • XPath is used to retrieve node sets and values • Iteration over node sets: for binds a variable to all nodes in a node set • Variables can be used in XPath expressions • return returns a sequence of results, one for each binding of a variable
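The features summarized above can be tried without a recipes.xml file by binding a small inline document to a variable. This is a sketch under that assumption: $doc and its two recipes are stand-in data, not the course document.

```xquery
(: $doc stands in for doc("recipes.xml"); the content is illustrative. :)
let $doc :=
  <recipes>
    <recipe><title>Ricotta Pie</title></recipe>
    <recipe><title>Zuppa Inglese</title></recipe>
  </recipes>
return
  (: <titles> is returned as given, the part in braces is evaluated :)
  <titles>{ for $r in $doc//recipe return $r/title }</titles>
(: result:
   <titles><title>Ricotta Pie</title><title>Zuppa Inglese</title></titles> :)
```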
XPath is a Fragment of XQuery
• doc("recipes.xml")//recipe[1]/title
  returns an element:
    <title>Beef Parmesan with Garlic Angel Hair Pasta</title>
• doc("recipes.xml")//recipe[position()<=3]/title
  returns a list of elements:
    <title>Beef Parmesan with Garlic Angel Hair Pasta</title>,
    <title>Ricotta Pie</title>,
    <title>Linguine alla Pescadora</title>
Beware: Attributes in XPath
• doc("recipes.xml")//recipe[1]/ingredient[1]/@name
  → attribute name {"beef cube steak"}
  an attribute, represented as a constructor for an attribute node (not in Saxon)
• string(doc("recipes.xml")//recipe[1]/ingredient[1]/@name)
  → "beef cube steak"
  a value of type string
Beware: Attributes in XPath (cntd.) • <first-ingredient>{string(doc("recipes.xml")//recipe[1] /ingredient[1]/@name)}</first-ingredient> • → <first-ingredient>beef cube steak</first-ingredient> an element with string content
Beware: Attributes in XPath (cntd.)
• <first-ingredient>{doc("recipes.xml")//recipe[1]/ingredient[1]/@name}</first-ingredient>
  → <first-ingredient name="beef cube steak"/>
  an element with an attribute
• Note: The XML that we write down is only the surface structure of the data model that is underlying XQuery
Beware: Attributes in XPath (cntd.)
• <first-ingredient oldName="{doc("recipes.xml")//recipe[1]/ingredient[1]/@name}">Beef</first-ingredient>
  → <first-ingredient oldName="beef cube steak">Beef</first-ingredient>
  Inside an attribute value, an attribute is cast as a string
Constructor Syntax
For all constituents of documents, there are constructors:
  element first-ingredient {                  (: element constructor :)
    attribute oldName {                       (: attribute constructor :)
      string(doc("recipes.xml")//recipe[1]/ingredient[1]/@name)},
    "Beef"
  }
This is equivalent to the notation on the previous slide.
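The equivalence of the two notations can be checked with the standard function deep-equal. The sketch below assumes an inline ingredient element in place of recipes.xml; the names $i, $direct and $computed are illustrative.

```xquery
(: Inline stand-in for the first ingredient of recipes.xml. :)
let $i := <ingredient name="beef cube steak" amount="1.5" unit="pound"/>
(: XML syntax with an enclosed expression in the attribute value :)
let $direct :=
  <first-ingredient oldName="{$i/@name}">Beef</first-ingredient>
(: the same element built with computed constructors :)
let $computed :=
  element first-ingredient {
    attribute oldName { string($i/@name) },
    "Beef"
  }
return deep-equal($direct, $computed)   (: true :)
```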
Iteration with the For-Clause
• Syntax: for $var in xpath-expr
• Example:
    for $r in doc("recipes.xml")//recipe
    return string($r)
• The expression creates a list of bindings for a variable $var
• If $var occurs in an expression exp, then exp is evaluated for each binding
• For-clauses can be nested:
    for $r in doc("recipes.xml")//recipe
    for $v in doc("vegetables.xml")//vegetable
    return ...
What Does This Return?
  for $i in (1,2,3)
  for $j in (1,2,3)
  return element {concat("x",$i * $j)} {$i * $j}
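One way to check an answer is to trace the bindings: the outer loop fixes $i while the inner loop runs through every $j, so nine elements come out in binding order, and products that coincide give elements with the same name. The annotated query below is a sketch of that trace.

```xquery
(: Bindings are generated as (1,1), (1,2), (1,3), (2,1), ... (3,3). :)
for $i in (1, 2, 3)
for $j in (1, 2, 3)
return element { concat("x", $i * $j) } { $i * $j }
(: result:
   <x1>1</x1>, <x2>2</x2>, <x3>3</x3>,
   <x2>2</x2>, <x4>4</x4>, <x6>6</x6>,
   <x3>3</x3>, <x6>6</x6>, <x9>9</x9> :)
```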
Nested For-clauses: Example
  <my-recipes>
    {for $r in doc("recipes.xml")//recipe
     return
       <my-recipe title="{$r/title}">
         {for $i in $r//ingredient
          return
            <my-ingredient>
              {string($i/@name)}
            </my-ingredient>}
       </my-recipe>}
  </my-recipes>
Returns my-recipes with titles as attributes and my-ingredients with names as text content
The Equivalent Stylesheet Template
  <xsl:template match="/">
    <my-recipes>
      <xsl:for-each select=".//recipe">
        <my-recipe title="{title}">
          <xsl:for-each select=".//ingredient">
            <my-ingredient>
              <xsl:value-of select="@name"/>
            </my-ingredient>
          </xsl:for-each>
        </my-recipe>
      </xsl:for-each>
    </my-recipes>
  </xsl:template>
The inner loop selects .//ingredient so that nested ingredients are included, as with $r//ingredient in the query.
The Let Clause
Syntax: let $var := xpath-expr
• binds variable $var to a list of nodes, with the nodes in document order
• does not iterate over the list
• allows one to keep intermediate results for reuse (not possible in SQL)
Example:
  let $oorecps := doc("recipes.xml")//recipe[.//ingredient/@name="olive oil"]
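The difference between let and for shows up when the bound variable is counted. This is a small sketch with literal numbers instead of recipe nodes; $xs is an illustrative name.

```xquery
let $xs := (10, 20, 30)
return (
  (: for creates one binding per item, so count sees a single item each time :)
  (for $x in $xs return count($x)),   (: 1, 1, 1 :)
  (: let creates a single binding to the whole sequence :)
  (let $x := $xs return count($x))    (: 3 :)
)
```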
Let Clause: Example
Calories of recipes with olive oil:
  <calory-content>
    {let $oorecps := doc("recipes.xml")//recipe[.//ingredient/@name="olive oil"]
     for $r in $oorecps
     return <calories>{$r/title/text()} {": "} {string($r/nutrition/@calories)}</calories>}
  </calory-content>
Note the implicit string concatenation
Let Clause: Example (cntd.) The query returns: <calory-content> <calories>Beef Parmesan: 1167</calories> <calories>Linguine alla Pescadora: 532</calories> </calory-content>
The Where Clause
Syntax: where <condition>
• occurs before the return clause
• similar to predicates in XPath
• comparisons on nodes: "is" for node identity ("=" compares the atomized values of the nodes), "<<" and ">>" for document order
• Example:
    for $r in doc("recipes.xml")//recipe
    where $r//ingredient/@name="olive oil"
    return ...
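The node comparisons can be told apart on a tiny example. In XQuery 1.0, "=" compares atomized values, "is" tests node identity, and "<<" tests document order. The sketch assumes an inline document with two recipes; all names and data are illustrative.

```xquery
let $doc :=
  <recipes>
    <recipe><title>A</title><ingredient name="olive oil"/></recipe>
    <recipe><title>B</title><ingredient name="olive oil"/></recipe>
  </recipes>
let $a := $doc/recipe[1]/ingredient
let $b := $doc/recipe[2]/ingredient
return (
  $a/@name = $b/@name,               (: true, the values are equal :)
  $a is $b,                          (: false, two distinct nodes :)
  $a is $doc/recipe[1]/ingredient,   (: true, the same node reached twice :)
  $a << $b                           (: true, $a comes first in document order :)
)
```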
Quantifiers
• Syntax: some/every $var in <node-set> satisfies <expr>
• $var is bound to all nodes in <node-set>
• Test succeeds if <expr> is true for some/every binding
• Note: if <node-set> is empty, then “some” is false and “every” is true
Quantifiers (Example)
• Recipes that have some compound ingredient
    for $r in doc("recipes.xml")//recipe
    where some $i in $r/ingredient satisfies $i/ingredient
    return $r/title
• Recipes where every top level ingredient is non-compound
    for $r in doc("recipes.xml")//recipe
    where every $i in $r/ingredient satisfies not($i/ingredient)
    return $r/title
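The behaviour of the quantifiers on an empty sequence can be checked with literal values. A minimal sketch, assuming no input document:

```xquery
let $empty := ()
return (
  (some  $x in $empty satisfies $x > 0),   (: false, there is no witness :)
  (every $x in $empty satisfies $x > 0),   (: true, there is no counterexample :)
  (some  $x in (1, 5) satisfies $x > 3),   (: true, 5 is a witness :)
  (every $x in (1, 5) satisfies $x > 3)    (: false, 1 is a counterexample :)
)
```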
Element Fusion
“To every recipe, add the attribute calories!”
  <result>
    {let $rs := doc("recipes.xml")//recipe
     for $r in $rs
     return
       <recipe>
         {$r/nutrition/@calories (: an attribute :)}
         {$r/title (: an element :)}
       </recipe>}
  </result>
Element Fusion (cntd.) The query result: <result> <recipe calories="1167"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </recipe> <recipe calories="349"><title>Ricotta Pie</title></recipe> <recipe calories="532"><title>Linguine Pescadoro</title></recipe> <recipe calories="612"><title>Zuppa Inglese</title></recipe> <recipe calories="8892"> <title>Cailles en Sarcophages</title> </recipe> </result>
Fusion with Mixed Syntax
We mix constructor and XML syntax:
  element result
    {let $rs := doc("recipes.xml")//recipe
     for $r in $rs
     return
       <recipe>
         {attribute calories {$r/nutrition/@calories}}
         {$r/title}
       </recipe>}
The Same with Constructor Syntax Only
  element result
    {let $rs := doc("recipes.xml")//recipe
     for $r in $rs
     return
       element recipe {
         attribute calories {$r/nutrition/@calories},
         $r/title
       }
    }
Join
“Pair every ingredient with the recipes where it is used!”
  let $rs := doc("recipes.xml")//recipe
  for $i in $rs//ingredient
  for $r in $rs
  where $r//ingredient/@name=$i/@name   (: join condition :)
  return
    <usedin>
      {$i/@name}
      {$r/title}
    </usedin>
Join (cntd.) The query result: <usedin name="beef cube steak"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </usedin>, <usedin name="onion, sliced into thin rings"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </usedin>, <usedin name="green bell pepper, sliced in rings"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </usedin>
Join Exercise Return all pairs of ingredients such that • the ingredients have the same name, • but occur with different amounts and return • the recipes where each of them is used • together with the amount being used in those recipes, while returning every pair only once. Could a query for these ingredients be expressed in XPath?
Document Inversion
“For every ingredient, return all the recipes where it is used!”
  <result>
    {let $rs := doc("recipes.xml")//recipe
     for $i in $rs//ingredient
     return
       <ingredient>
         {$i/@*}
         {$rs[.//ingredient/@name=$i/@name]/title (: join condition in the predicate :)}
       </ingredient>}
  </result>
Document Inversion (cntd.) The query result: <result> <ingredient amount="1" name="Alchermes liquor" unit="cup"> <title>Zuppa Inglese</title> </ingredient> … <ingredient amount="2" name="olive oil" unit="tablespoon"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> <title>Linguine Pescadoro</title> </ingredient> …
Eliminating Duplicates
The function distinct-values(Node Set)
• extracts the values of a sequence of nodes
• creates a duplicate-free list of values
Note the coercion: nodes are cast as values!
Example:
  let $rs := doc("recipes.xml")//recipe
  return distinct-values($rs//ingredient/@name)
yields, by the Galax engine,
  xdt:untypedAtomic("beef cube steak"),
  xdt:untypedAtomic("onion, sliced into thin rings"),
  ...
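The effect of distinct-values can be seen by counting before and after. The sketch uses an inline document with a repeated ingredient name; the data is illustrative. Counts are compared rather than the values themselves, because the order in which distinct-values returns its items is implementation dependent.

```xquery
let $rs :=
  <recipes>
    <recipe><ingredient name="salt"/><ingredient name="olive oil"/></recipe>
    <recipe><ingredient name="salt"/></recipe>
  </recipes>
return (
  count($rs//ingredient/@name),                    (: 3 attribute nodes :)
  count(distinct-values($rs//ingredient/@name))    (: 2 atomic values :)
)
```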
Avoiding Multiple Results in a Join
We want every ingredient to be listed only once: eliminate duplicates using distinct-values!
  <result>
    {let $rs := doc("recipes.xml")//recipe
     for $in in distinct-values($rs//ingredient/@name)
     return
       <recipes with="{$in}">
         {$rs[.//ingredient/@name=$in]/title}
       </recipes>}
  </result>
Avoiding Multiple Results (cntd.) The query result: <result> <recipes with="beef cube steak"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </recipes> <recipes with="onion, sliced into thin rings"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </recipes>... <recipes with="salt"> <title>Linguine Pescadoro</title> <title>Cailles en Sarcophages</title> </recipes> ...
The Order By Clause
Syntax: order by expr [ascending|descending]
  for $iname in doc("recipes.xml")//@name
  order by $iname descending
  return string($iname)
yields
  "whole peppercorns", "whole baby clams", "white sugar", ...
The Order By Clause (cntd.)
  let $rs := doc("recipes.xml")//recipe
  for $r in $rs
  order by $r/nutrition/@calories
  return $r/title
In which order will the titles come?
The Order By Clause (cntd.)
The interpreter must be told whether the values should be regarded as numbers or as strings (alphanumerical sorting is default)
  for $r in $rs
  order by number($r/nutrition/@calories)
  return $r/title
Note:
• The query returns titles ...
• but the ordering is according to calories, which do not appear in the output
Also possible in SQL! What if combined with distinct-values?
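The difference between the two orderings becomes visible on the calorie values that appear in the result slides. The sketch below sorts them once as strings and once as numbers; the element names as-strings and as-numbers are illustrative, and the string ordering shown assumes the usual codepoint collation.

```xquery
let $cals := ("1167", "349", "532", "612", "8892")
return (
  (: compared as strings, "1167" sorts before "349" :)
  <as-strings>{ for $c in $cals order by $c return $c }</as-strings>,
  (: compared as numbers, 349 is the smallest value :)
  <as-numbers>{ for $c in $cals order by number($c) return $c }</as-numbers>
)
(: result:
   <as-strings>1167 349 532 612 8892</as-strings>,
   <as-numbers>349 532 612 1167 8892</as-numbers> :)
```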
FLWOR Expressions (pronounced “flower”) We have now seen the main ingredients of XQuery: • For and Let clauses, which can be mixed • a Where clause imposing conditions • an Order By clause, which determines the order of results • a Return clause, which constructs the output. Combining these yields FLWOR expressions.
Conditionals
  if (expr) then expr else expr
Example:
  let $is := doc("recipes.xml")//ingredient
  for $i in $is[not(ingredient)]
  let $u := if (not($i/@unit))
            then attribute unit {"pieces"}
            else ()
creates an attribute unit="pieces" if none exists and an empty item list otherwise
Conditionals (cntd.)
We use the conditional to construct variants of ingredients:
  let $is := doc("recipes.xml")//ingredient
  for $i in $is[not(ingredient)]
  let $u := if (not($i/@unit))
            then attribute {"unit"} {"pieces"}
            else ()
  return <ingredient>{$i/@* | $u}</ingredient>
The return clause collects all attributes in a list and adds a unit if needed
Conditionals (cntd.) The query result: <ingredient name="beef cube steak" amount="1.5" unit="pound"/>, ... <ingredient name="eggs" amount="12" unit="pieces"/>,…
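The conditional can also be tried on two inline ingredients, one with a unit and one without. This is a sketch with illustrative data; it places the conditional directly in the element content and uses the comma operator instead of the union, which is a stylistic choice and not the formulation of the slides.

```xquery
let $is := (
  <ingredient name="beef cube steak" amount="1.5" unit="pound"/>,
  <ingredient name="eggs" amount="12"/>
)
for $i in $is
return
  <ingredient>{
    $i/@*,                                   (: copy the existing attributes :)
    if ($i/@unit)
    then ()                                  (: a unit exists, add nothing :)
    else attribute unit { "pieces" }         (: supply the default unit :)
  }</ingredient>
```

The first result keeps unit="pound"; the second receives unit="pieces" in addition to its name and amount attributes.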
Exercises Write queries that produce • A list, containing for every recipe the recipe's title element and an element with the number of calories • The same, ordered according to calories • The same, alphabetically ordered according to title • The same, ordered according to the fat content • The same, with title as attribute and calories as content. • A list, containing for every recipe the top level ingredients, dropping the lower level ingredients
Sample Solution 1
A list, containing for every recipe the recipe's title element and an element with the number of calories
  <result>
    {for $r in doc("recipes.xml")//recipe
     return ($r/title,
             <calories>{number($r//@calories)}</calories>)}
  </result>
The results returned are 2-element lists. The list constructor is “( . , . )”
Sample Solution 6
  <results>
    {for $r in doc("recipes.xml")//recipe
     return
       <recipe>
         {attribute title {$r/title},
          for $i in $r/ingredient
          return if (not($i/ingredient))
                 then $i
                 else <ingredient>{$i/@*}</ingredient>}
       </recipe>}
  </results>
Aggregation
Aggregation functions: count, sum, avg, min, max
Example: The number of recipes with olive oil
  let $doc := doc("recipes.xml")
  return <number>{count($doc//recipe[.//ingredient/@name = "olive oil"])}</number>
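All five aggregation functions can be tried on a plain sequence of numbers. The sketch uses the five calorie values that appear in the element fusion result; $cals is an illustrative name.

```xquery
let $cals := (1167, 349, 532, 612, 8892)
return (
  count($cals),  (: 5 :)
  sum($cals),    (: 11552 :)
  avg($cals),    (: 2310.4 :)
  min($cals),    (: 349 :)
  max($cals)     (: 8892 :)
)
```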