1 / 153

XQuery

XQuery. Querying XML Documents. XQuery. XQuery is a truly declarative language specifically designed for the purpose of querying XML data. As such, XML assumes the role that SQL occupies in the context of relational databases.

aviv
Download Presentation

XQuery

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XQuery Querying XML Documents

  2. XQuery • XQuery is a truly declarative language specifically designed for the purpose of querying XML data. • As such, XML assumes the role that SQL occupies in the context of relational databases. • XQuery exhibits properties known from database (DB) languages as well as from (functional) programming (PL) languages. • Programming language features: • explicit iteration and variable bindings (for…in, let…in) • recursive, user-defined functions • regular expressions, strong [static] typing • ordered sequences (much like lists or arrays) • Database query language features: • filtering • grouping, joins

  3. Main Principles The design of XQuery satisfies the following rules: • Closed-form evaluation. XQuery relies on a data model, and each query maps an instance of the model to another instance of the model. • Composition. XQuery relies on expressions which can be composed to form arbitrarily rich queries. • Type awareness. XQuery may associate an XSD schema to query interpretation. But XQuery also operates on schema-free documents. • XPath compatibility. XQuery is an extension of XPath 2.0 (thus, any XPath expression is also an XQuery expression). • Static analysis. Type inference, rewriting, optimization: the goal is to exploit the declarative nature of XQuery for clever evaluation.

  4. Concepts of XQuery • Declarative/Functional • No execution order! • Document Order • all nodes are in "textual order" • Node Identity • all nodes can be uniquely identified • Atomization • Effective Boolean Value • Type system

  5. A Simple Model for Document Collections A value is a sequence of 0 to n items. An item is either a node or an atomic value. There exist 7 kinds of nodes: • Document, the document root; • Element, named, mark the structure of the document; • Attribute, named and valued, associated to an Element; • Text, unnamed and valued; • Comment; • ProcessingInstruction; • Namespace. The model is quite general: everything is a sequence of items. This covers anything from a single integer value to wide collections of large XML documents.

  6. Examples of Values The following are examples of values • 47 : a sequence with a single item (atomic value); • </a> : a sequence with a single item (Element node); • (1, 2, 3) : a sequence with 3 atomic values. • (47, <a/>, ``Hello'') : a sequence with 3 items, each of different kinds. • () : the empty sequence; • an XML document; • many XML documents (a collection).

  7. Sequences: Details • There is no distinction between an item and a sequence of length 1 (everything is a sequence). • Sequence cannot be nested (a sequence never contains another sequence) • The notion of null value does not exist in the XQuery model: a value is there, or not. • A sequence may be empty • A sequence may contain heterogeneous items (see previous examples). • A sequence can contain duplicates (by value and by identity) • Sequences are ordered: two sequences with the same set of items, but ordered differently, are different.

  8. Items: Details • Nodes have an identity; values don't. • Element and Attribute have type annotations, which may be inferred from the XSD schema (or unknown if the schema is not provided). • Nodes appear in a given order in their document. Attribute order is undefined.

  9. Syntactic Aspects of XQuery • XQuery is a case-sensitive language (keywords must be written in lowercase). • XQuery builds queries as composition of expressions. • An expression takes a value (a sequence of items) and produces a value, and is side-effect free (no modification of the context, in particular variable values). • XQuery comments can be put anywhere. Syntax: (: This is a comment :)

  10. Evaluation Context An expression is always evaluated with respect to a context. It is a slight generalization of XPath and XSLT contexts, and includes: • Bindings of namespace prefixes with namespaces URIs • Bindings for variables • In-scope functions • A set of available collections and a default collection • Date and time • Context (current) node • Position of the context node in the context sequence • Size of the sequence

  11. XML Query Structure • An XQuery basic structure: • a prolog+ an expression • Role of the prolog: • Populate the context where the expression is compiled and evaluated • Prologue contains: • namespace definitions • schema imports • default element and function namespace • function definitions • function library (=module) imports • global and external variables definitions • …

  12. XQuery Grammar XQuery Expr :=Literal | Variable | FunctionCalls | PathExpr | ComparisonExpr | ArithmeticExpr| LogicExpr | FLWRExpr | ConditionalExpr | QuantifiedExpr |TypeSwitchExpr | InstanceofExpr | CastExpr | UnionExpr | IntersectExceptExpr | ConstructorExpr | ValidateExpr Expressions can be nested with full generality! Functional programming heritage.

  13. Literal XQuery grammar has built-in support for: • Strings:"125.0" or ‘125.0’ • Integers:150 • Decimal:125.0 • Double:125.e2 • 19 other atomic types available via XML Schema • Values can be constructed • with constructors in F&O doc: fn:true(), fn:date("2002-5-20") • by casting • by schema validation

  14. Constructing Sequences (1, 2, 2, 3, 3, <a/>, <b/>) • "," is the sequence concatenation operator • Nested sequences are flattened: (1, 2, 2, (3, 3)) => (1, 2, 2, 3,3) • range expressions: (1 to 3) => (1,2,3)

  15. Variable • $ + QName • bound, not assigned • XQuery does not allow variable assignment • created by let, for, some/every, typeswitch expressions, function parameters • example: let $x := ( 1, 2, 3 ) return count($x) • above scoping ends at conclusion of return expression

  16. Function Call • Calling: • my:function(parameter, …) • Signatures: • fn:function-name($parameter-name as parameter-type, ...) as return-type • No overloading on type • Careful with sequences: • my:function(1,2,3)  my:function((1,2,3)) • Large library of built-in functions ("F&O") • Namespace http://www.w3.org/2006/xpath-functions, Prefix fn: • Shared with XSLT, XPath 2.0 • Also type constructor functions xs:atomicType(…)

  17. Retrieving Documents and Collections A query takes in general as input one of several sequences of XML documents, called collections. XQuery identifies its input(s) with the following functions: • doc() takes the URI of an XML document and returns a singleton document tree; • collection() takes a URI and returns a sequence. The result of the doc () function is the root node of the document tree, and its type is Document.

  18. Working with Sequences • XQuery comes with an extensive library of built-in functions to perform common computations over sequences:

  19. Arithmetic Expressions 1 + 4 $a div 5 5 div 6 $b mod 10 1 - (4 * 8.5) -55.5 <a>42</a> + 1 <a>baz</a> + 1 validate {<a xsi:type="xs:integer">42</a>} + 1 validate {<a xsi:type="xs:string">42</a>} + 1 <x>1</x> + 41 → 42.0 () * 42 → () (1,2) - (2,3) → type error!

  20. Arithmetic Expression - Evaluation • Apply the following rules: • atomize all operands. • if either operand is (), => () • if an operand is untyped, cast to xs:double (if unable, => error) • if the operand types differ but can be promoted to common type, do so (e.g.: xs:integer can be promoted to xs:double) • if operator is consistent w/ types, apply it; result is either atomic value or error • if type is not consistent, throw type exception

  21. Logical Expressions expr1 andexpr2 expr1 orexpr2 • return true, false • Different from SQL • twovalue logic, not three value logic • Different from imperative languages • and, or are commutative, but not in Java • false and error => both false orerror possible! (non-deterministically) • For each expression, compute EBV • then use standard two value Boolean logic on the two EBV's as appropriate

  22. Comparisons

  23. Examples of Comparisons • <a>42</a> eq "42” true • <a>42</a> eq 42 error • <a>42</a> eq "42.0" false • <a>42</a> eq 42.0 error • <a>42</a> = 42 true • <a>42</a> = 42.0 true • <a>42</a> eq <b>42</b> true • <a>42</a> eq <b> 42</b> false • <a>baz</a> eq 42 error • (<a>42</a>, <b>43</b>) = 42.0 true • (<a>42</a>, <b>43</b>) = "42" true • <x>42</x> is <x>42</x> false

  24. Examples of Comparisons • () eq 42 () • () = 42 false • ns:shoesize(5) eq ns:hatsize(5) true (shoesize, hatsize are derived types of xs:integer) • (1,2) = 1 true • (1,2) = (2,3) true • (1,2,3) > (2,4,5) true • 2 <= 1 false • (1,2,3) != 3 true • (1,2) != (1,2) true • 2 gt 1.0 true • (0,1) eq 0 error

  25. Algebraic Properties of Comparisons • General comparisons not reflexive, not transitive • (1,3) = (1,2) (but also !=, <, >, <=, >=) • Reasons: • implicit existential quantification, dynamic casts • Negation rule does not hold • fn:not($x = $y) is not equivalent to $x != $y • Value comparisons are almost transitive • Exception: • xs:decimal due to the loss of precision

  26. Conditional Expression if ($book/@year < 1980) then <old>{$x/title}</old> else <new>{$x/title}</new> • Only one branch allowed to raise execution errors • Impacts scheduling and parallelization

  27. Path Expressions • Any XPath expression is a query. • Second order expression expr1 / expr2 • Semantics: • Evaluate expr1 => sequence of nodes • Bind . to each node in this sequence • Evaluate expr2 with this binding => sequence of nodes • Concatenate the partial sequences • Eliminate duplicates • Sort by document order • Implicit iteration • E.g. collection('movies')/ movie[year =2005]/ title

  28. Path Expressions • A standalone step is an expression • Any kind of expression can be a step! • Step in the non-abbreviated syntax: axis ‘::’ nodeTest • Axis control the navigation direction in the tree • attribute, child, descendant, descendant-or-self, parent, self • The other XPath 1.0 axes are optional • Node test by: • Name (e.g. publisher, myNS:publisher, *: publisher, myNS:* , *:* ) • Kind of item (e.g. node(), comment(), text()) • Type test (e.g. element(ns:PO, ns:PoType), attribute(*, xs:integer))

  29. Constructors • XQuery expressions may construct nodes with new identity of all 7 node kinds known in XML: document nodes, elements, attributes, text nodes, comments, processing instructions (and namespace nodes). • Since item sequences are flat, the nested application of node constructors is the only way to hierarchically structure values in XQuery: • Nested elements may be used to group or compose data, and, ultimately, • XQuery may be used as an XSLT replacement, i.e., as an XML transformation language.

  30. Direct Node Constructor XQuery node constructors come in two flavors: • direct constructors and • computed constructors • The syntax of direct constructors exactly matches the XML syntax: any well-formed XML fragment f also is a correct XQuery expression (which, when evaluated, yields f ). Note: Text content and CDATA sections are both mapped into text nodes by the XQuery data model (“CDATA isn't remembered."). E.g. in XQuery: <x><![CDATA[foo & bar]]></x> ≡ <x>foo &amp; bar</x>

  31. Direct Element Constructor • To construct an element with a known name and content, use XML syntax: <book isbn="12345"> <title> The politics of Experience </title> </book> • If the content of an element or attribute must be computed, use a nested expression enclosed in { } <book isbn="{ $x }"> { $b/title } </book>

  32. Computed Element Constructor • If both the name and the content must be computed, use a computed constructor: element {e1} {e2} expression e1 (of type string or QName) determines the element name, e2 determines the sequence of nodes in the element's content. • Example: element { string-join(("foo","bar"),"-") } { 40+2 } → <foo-bar>42</foo-bar>

  33. An Application of Computed Element Constructor Consider a dictionary in XML format (bound to variable $dict) with entries like <entry word="address"> <variant lang="de">Adresse</variant> <variant lang="it">indirizzo</variant> </entry> We can use this dictionary to “translate" the tag name of an XML element $e into Italian as follows, preserving its contents: element { $dict/entry[@word=name($e)]/variant[@lang="it"] } { $e/@*, $e/node() }

  34. Direct and Computed Attribute Constructors • In direct attribute constructors, computed content may be embedded using curly braces. <x a=“{(4,2)}"/> → <x a="4 2"/> <x a=“{{" b=‘}}'/> → <x b=“}" a=“{"/> <x a=“ ‘ " b=‘ “ '/> → <x a=“ ‘ " b="&quot;"/> • A computed attribute constructor: attribute {e1} {e2} ,allows to construct parent-less attributes (impossible in XML) with computed names and content. let $a := attribute {"a"} { sum((40,2)) } return <x>{ $a }</x>

  35. Text Node Constructor Text nodes may be constructed in one of three ways: • Characters in element content, • via <![CDATA[ ]]>, or • using the computed text constructor: text {e}. • Content sequence e is atomized to yield a sequence of type anyAtomicType*. The atomic values are converted to type string and then concatenated with an intervening " ". If e is (), no text node is constructed --- the constructor yields (). text { (1,2,3) } → text { "1 2 3" }

  36. XML Documents vs. Fragments • Unlike XML fragments, an XML document is rooted in its document node. The difference is observable via XPath: Remember the (invisible) document root node! xy.xml: <x><y/></x> doc("xy.xml")/* → <x><y/></x> <x><y/></x>/* → <y/> The context node for the first expression above is the document node for document xy.xml. • A document node may be constructed via document {e}. (document { <x><y/></x> })/* → <x><y/></x>

  37. Well-Formed XML Fragments XML fragments constructed by XQuery expressions are subject to the XML rules of well-formedness, e.g., • no two attributes of the same element may share a name, • attribute nodes precede any other element content. Violating the well-formedness rules: element x { attribute {$id} {0}, attribute {"id"} {1} } (dynamic error!) <x>foo{ attribute id {0} }</x> (type error!)

  38. Processing Element Content • The XQuery element constructor is quite flexible: the content sequence is not restricted and may have type item*. • Yet, the content of an element needs to be of type node*: • Consecutive literal characters yield a single text node containing these characters. • Expression enclosed in { } are evaluated. • Adjacent atomic values are cast to type string and collected in a single text node with intervening " ". • A node is copied into the content together with its content. All copied nodes receive a new identity. • Then, adjacent text nodes are merged by concatenating their content. Text nodes with content "" are dropped.

  39. Example <x>Fortytwo{40 + 2}{ "foo", 3.1415, <y><z/></y>, ("", "!")[1] }</x> The constructed node is: x text y z “Fortytwo42foo 3.1415”

  40. FLOWR Expressions The most powerful expressions in XQuery. A (“flower”) exp. • iterates over sequences (for); • defines and binds variables (let); • sort the result (order); • apply predicates (where); • construct a result (return). An example (without let): for $m in collection('movies')/ movie where $m/ year >= 2005 return <film>{ $m/ title/ text ()} , " directed by " {$m/ director/ last_name/ text ()} </film>

  41. FLOWR Expressions • Syntax: ([ ] means optional) • for $v in e1 [ where e3 ] [ order by ... ] return e2 • let $v := e1 [ where e3 ] [ order by ... ] return e2 • May include sequences of for/let bindings • let $x:=1 let $y:=2 return $x+$y

  42. for for$xin expr1 return expr2 • Semantics: • bind the variable $x to each item returned by expr1 • for each such binding evaluate expr2 • concatenate the resulting sequences • nested sequences are automatically flattened for $x in doc("bib.xml")/bib/book return <result> { $x } </result> Returns <result> <book>...</book></result> <result> <book>...</book></result> • at keyword can be used to count the iteration: for $x at $i in doc("books.xml")/bookstore/book/title return <book>{$i}. {data($x)}</book>

  43. let let$x := expr1 return expr2 • Semantics: • bind the variable $x to the result of the expr1 • add this binding to the current environment • evaluate and return expr2 let $x := doc("bib.xml")/bib/book return <result> { $x } </result> Returns <result> <book>...</book> <book>...</book> </result> • Useful for common sub-expressions and for aggregations

  44. Defining Variables with let • let binds a name to a value, i.e., a sequence obtained by any convenient mean, ranging from literals to complex queries: let $m := doc("movies/Spider -Man.xml")/ movie return $m/ director/ last_name • A variable is just a synonym for its value: let $m := doc("movies/Spider -Man.xml")/ movie for $a in $m/ actor return $a/ last_name • The scope of a variable is that of the FLWR expression where it is defined. Variables cannot be redefined or updated within their scope.

  45. where • where is quite similar to its SQL synonym. The difference lies in the much more flexible structure of XML documents. Find the movies directed by M. Allen for $m in collection(" movies")/ movie where $m/ director / last_name =" Allen" return $m/ title Looks like a SQL query? Yes but predicates are interpreted according to the XPath rules: • if a path does not exists, the result is false, no typing error! • if a path expression returns several nodes: the result is true if there is at least one match. Find movies with Kirsten Dunst (note: many actors in a movie!) for $m in collection(" movies")/ movie where $m/ actor / last_name =" Dunst" return $m/ title

  46. Example Query for $b in doc(‘books.xml’)//book where $b/author/firstname = ‘John’ and $b/author/lastname = ‘Smith’ return <book> { $b/title, $b/price } </book> • May return: <book><title>XML</title><price>29.99</price></book> <book><title>DOM and SAX</title><price>40</price> </book>

  47. What about This? for $b in doc(‘books.xml’)//book where $b/author/firstname = ‘John’ and $b/author/lastname = ‘Smith’ return <book> $b/title, $b/price </book> • Will return: <book>$b/title,$b/price</book> <book>$b/title,$b/price</book>

  48. Equivalent Query for $b in doc(‘books.xml’)//book [author/firstname = ‘John’ and author/lastname = ‘Smith’] return <book> { $b/title, $b/price } </book> • What about this book? <book> <author><firtstname>Mary</firstname> <lastname>Smith</lastname> </author> <author><firtstname>John</firstname> <lastname>Travolta</lastname> </author> </book>

  49. Accurate Ones for $b in doc(‘books.xml’)//book where $b/author[firstname = ‘John’ and lastname = ‘Smith’] return <book> { $b/title, $b/price } </book> • Or for $b in doc(‘books.xml’)//book [author[firstname = ‘John’ and lastname = ‘Smith’]] return <book> { $b/title, $b/price } </book>

  50. Join • For each book, list the prices offered by amazon and bn <books-with-prices>{ for $a in doc(“amazon.xml”)/book, $b in doc(“bn.xml”)/book where $b/@isbn = $a/@isbn return <book> { $a/title } <price-amazon>{ $a/price }</price-amazon>, <price-bn>{ $b/price }</price-bn> </book> }</books-with prices>

More Related