5 Querying XML
• How to access various XML data sources?
• XQuery, XML Query Language, W3C Recommendation, Jan. 2007
  • joint work by the XML Query and XSL Working Groups
    • together with XPath 2.0 and XSLT 2.0
  • started ~1999; 2nd Edition in Dec. 2010
  • influenced by many research groups and query languages
    • Quilt, XPath, XQL, XML-QL, SQL, OQL, Lorel, ...
• A query language for any XML-represented data: both documents and databases
5. Querying XML with XQuery
Outline of this section
• Quick overview of XQuery
• Review of XPath, emphasizing XPath 2.0 vs. 1.0
  • items, types, and sequences
  • tree model, location path expressions; comparison operators
• Central features of XQuery (beyond those of XPath 2.0)
  • element constructors, FLWOR expressions
  • use cases, user-defined functions, querying relational data
• Comparison of XQuery and XSLT 1.0
• XQuery for problem solving
  • examples of application to puzzles
• Summary
Capabilities of XQuery (1)
• XQuery lets you select, reorganize, and transform XML data
  • respecting document content, structure, hierarchy, and order
• Selection, filtering, and search
• Combine and join
  • data from different parts of a document, or from multiple documents
• Sort, group, and aggregate
• Transform, restructure, and create XML data
• Operate on numbers and dates
• Manipulate content strings
Capabilities of XQuery (2)
• Closure property:
  • results of XML queries are also XML (well-formed document fragments)
  • → queries can be combined without limit
• Extensibility:
  • supports user-defined functions on all data types of the data model
• In-place update of XML data is not supported in XQuery 1.0
  • specified separately in “XQuery Update Facility 1.0”, W3C Rec., March 2011
XQuery in a Nutshell
• Functional expression language
• Strongly typed: (optional) type checking of expressions, and validation of results (we'll concentrate on processing)
  • predeclared prefix for type names: xs="http://www.w3.org/2001/XMLSchema"
• Extends XPath 2.0
  • “XQuery 1.0 and XPath 2.0 Functions and Operators”, W3C Rec., Jan. 2007
    • over 100 functions for numbers, strings, dates and times, Booleans, documents & URIs, nodes, and sequences
• XQuery ≈ XPath 2.0 + XSLT' + SQL' (roughly)
Example Query
xquery version "1.0"; (: optional declaration :)
<cheapBooks>
  <Title>Cheap Books</Title>
  { for $b in doc("bib.xml")//book[@price < 50]
    order by $b/title
    return $b }
</cheapBooks>
• Syntax "concise and easily understood"
• An XML-based syntax (XQueryX) has also been specified
  • easier for applications, harder for humans
A possible result
<?xml version="1.0" encoding="UTF-8"?>
<cheapBooks>
  <Title>Cheap Books</Title>
  <book price="26.50">
    <title>Computing with Logic</title>
    <author>David Maier</author>
    <publisher>Benjamin Cummings</publisher>
    <year>1999</year>
  </book>
  <book price="40.00">
    <title>Designing Internet applications</title>
    <author>Michael Leventhal</author>
    <publisher>Prentice Hall</publisher>
    <year>1998</year>
  </book>
</cheapBooks>
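For readers more at home in a general-purpose language, here is a rough Python analogue of the cheapBooks query, using the standard `xml.etree.ElementTree` module. It is only a sketch of the semantics, not XQuery: the inlined `bib.xml` content (including the third, more expensive book) is hypothetical, and `@price` is converted to a number before comparing, mirroring how the untyped attribute is cast in `@price < 50`.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content, inlined so the sketch is self-contained.
BIB = """<bib>
  <book price="40.00"><title>Designing Internet applications</title></book>
  <book price="65.95"><title>TCP/IP Illustrated</title></book>
  <book price="26.50"><title>Computing with Logic</title></book>
</bib>"""


def cheap_books(bib_xml: str, limit: float = 50.0) -> ET.Element:
    """Rough Python analogue of the cheapBooks query."""
    root = ET.fromstring(bib_xml)
    # for $b in doc("bib.xml")//book[@price < 50]
    # (the untyped @price is compared as a number)
    books = [b for b in root.iter("book") if float(b.get("price")) < limit]
    # order by $b/title
    books.sort(key=lambda b: b.findtext("title"))
    # <cheapBooks><Title>Cheap Books</Title>{ ... }</cheapBooks>
    result = ET.Element("cheapBooks")
    ET.SubElement(result, "Title").text = "Cheap Books"
    # ElementTree appends the very same element objects; XQuery would copy them
    result.extend(books)
    return result


out = cheap_books(BIB)
print([b.findtext("title") for b in out.iter("book")])
# ['Computing with Logic', 'Designing Internet applications']
```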
XQuery and XPath
• XQuery is an extension of XPath 2.0
  • common data model, 108 functions and 68 operators
  • → review some XPath first
• XPath is used in several other contexts, too:
  • for pattern matching and selection in XSLT
  • in validation rules of Schematron
  • for uniqueness constraints in XML Schema
  • for addressing in XLink and XPointer
XPath in a Nutshell
• XPath 1.0 (W3C Rec. 11/99)
  • a compact non-XML syntax for addressing parts of XML documents (as node-sets)
  • also operations on strings, numbers, and truth values
• XPath 2.0 (W3C Rec. 1/07) extends and generalizes:
  • data manipulated as sequences of items
  • item = a node or an atomic value of a simple XML Schema datatype
Literal Atomic Values and Their Types
• Examples:
  "-12" instance of xs:string
  -12 instance of xs:integer
  1.2 instance of xs:decimal
  1.2E3 instance of xs:double
  string(1.2E3) instance of xs:string
  number("+12") instance of xs:double
  xs:date("2009-05-11") instance of xs:date
  true() instance of xs:boolean
XPath 2.0/XQuery Type Hierarchy
XPath 2.0/XQuery Type Hierarchy (cont.)
XQuery/XPath 2.0 Sequences
• Expressions operate on, and return, sequences of
  • atomic values (of simple XML Schema types) and
  • nodes
• an item ≡ a singleton sequence
• sequences are flat: no sequences as items
  • (1, (2, 3), (), 1) = (1, 2, 3, 1)
• sequences are ordered, and can contain duplicates
• Unlimited combination of expressions, often with automatic type conversions (e.g. for arithmetic)
Sequence Expressions
• Constant sequences constructed by listing values
  • comma (,) is a catenation operator
  • (1, (2, 3), (), 1) = (1, 2, 3, 1)
• Range expressions for integer sequences:
  • 1 to 4 → (1, 2, 3, 4)
  • 4 to 1 → ()
  • reverse(1 to 4) → (4, 3, 2, 1)
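The flattening and range rules can be modelled in a few lines of Python. This is a sketch only: Python tuples stand in for XQuery sequences, and `seq` and `to` are hypothetical helper names, not part of any XQuery API.

```python
from typing import Any, Tuple


def seq(*items: Any) -> Tuple[Any, ...]:
    """Model the comma operator: nested sequences are flattened, () vanishes."""
    out = []
    for item in items:
        if isinstance(item, tuple):
            out.extend(seq(*item))  # a sequence never nests inside another
        else:
            out.append(item)        # a single item acts as a singleton sequence
    return tuple(out)


def to(start: int, end: int) -> Tuple[int, ...]:
    """Model the range expression `start to end`; empty if start > end."""
    return tuple(range(start, end + 1))


print(seq(1, (2, 3), (), 1))      # (1, 2, 3, 1)
print(to(1, 4), to(4, 1))         # (1, 2, 3, 4) ()
print(tuple(reversed(to(1, 4))))  # (4, 3, 2, 1)
```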
Accessing Documents
• XQuery operates on nodes accessible by input functions
• fn:doc("URI")
  • the document node of the XML document available at URI
  • roughly the same as document("URI") in XSLT 1.0
• fn:collection("URI")
  • a sequence of nodes from URI
  • the association of URIs with nodes is implementation-defined
• predeclared prefix for the default function namespace: fn="http://www.w3.org/2005/xpath-functions"
XQuery/XPath 2.0 Data Model
• Documents are viewed as trees made of six types of nodes:
  • document (an additional root above the document element)
  • element nodes
  • attribute nodes
  • text nodes
  • comments and processing instructions
• Note 1: no entity nodes, and no CDATA sections
• Note 2: namespace nodes have been deprecated
Document Trees
• Defined in Sect. 5 of the XPath 1.0 spec
  • for XSLT 2.0/XPath 2.0 and XQuery, in their joint Data Model
• Element nodes have the elements, text nodes, comments, and processing instructions of their (direct) content as their children
  • NB: attribute nodes are not children (but they have a parent)
  • → they have no siblings either
• The string value of a document or element node is the concatenation of all its text-node descendants, in document order
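Python's `xml.etree.ElementTree` has no separate text nodes (text lives in the `text` and `tail` attributes of elements), but the string value of an element can be approximated with `itertext()`. A minimal sketch, assuming the default parser, which drops comments; note that attribute values do not contribute to the string value.

```python
import xml.etree.ElementTree as ET


def string_value(elem: ET.Element) -> str:
    """Concatenate all descendant text of elem, in document order.

    itertext() walks .text/.tail of the descendants in document order
    and skips the element's own tail, which lies outside the element.
    """
    return "".join(elem.itertext())


doc = ET.fromstring('<p id="x">Hello, <b>XQuery</b> world<!-- ignored --></p>')
print(string_value(doc))  # Hello, XQuery world
```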
Document Order
• Document order of nodes:
  • = their left-to-right pre-order
  • the document root first
  • other nodes in the order of the first character of their XML markup in the document text
  • → an element precedes its attribute nodes, which precede any content nodes of the element
• The order between nodes belonging to different trees is implementation-dependent, but consistent and stable
Location Paths
• XPath can select any parts of a document tree using location paths
  • evaluated with respect to a context item (.)
  • assigned on path steps, after the first one
• A path expression typically starts with $x or doc(…)
• Result: a sequence of nodes, in document order, without duplicates
Path Expressions
• Similar to XPath 1.0: [/[/]]Expr/…/Expr
  • but steps are more liberal:
    • arbitrary expressions are OK, but steps before the last one must produce node sequences
• 6 (of the 13 XPath) axes are required: child, descendant, attribute, self, descendant-or-self, parent
  • others (except namespace) are optional, available if the Full Axis Feature is supported
  • together with the document-order operators (<<, >>), sufficient for expressing queries (→ Exercises)
Location paths
• Consist of location steps separated by '/'
  • each step produces a sequence of items
  • steps are evaluated left to right, with each item in turn as the context item
• Complete location step: AxisName::NodeTest ([PredicateExpr])*
  • the axis specifies the tree relationship between the context node and the selected nodes
  • the node test restricts the type and name of the nodes
  • the result is filtered further by 0 or more predicates
Location steps: Axes
• In total 13 axes (~ directions in the tree)
  • for staying at the context node: self
  • for going downwards: child, descendant, descendant-or-self
  • for going upwards: parent, ancestor, ancestor-or-self
  • for moving towards the start/end of the document: preceding-sibling, following-sibling, preceding, following
  • "special" axes: attribute (namespace deprecated in XPath 2.0)
• Axes required in XQuery implementations: self, child, descendant, descendant-or-self, parent, attribute
Notes on Location Paths (1)
• XPath 2.0 allows unrestricted expressions as steps
  • but intermediate steps must produce nodes only
• Numeric predicates support array-style access: $rows[$i]
• Predicates are evaluated one step at a time. This sometimes causes confusion with shorthand notations:
  • doc("doc.xml")//title[3] selects the third title child of each parent (likely none!). Why?
  • = doc("doc.xml")/descendant-or-self::node()/child::title[3]
  • to get the third title in the document, use (doc("doc.xml")//title)[3]
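The difference between the two forms can be modelled in Python over an `ElementTree` tree: `//title[n]` applies the positional predicate to the title children of each parent separately, while `(//title)[n]` applies it once to all titles in document order. The sample document and helper names are hypothetical, and the document node itself is not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two sections with two titles each.
DOC = ET.fromstring(
    "<doc><sec><title>A</title><title>B</title></sec>"
    "<sec><title>C</title><title>D</title></sec></doc>"
)


def per_parent_nth(root: ET.Element, tag: str, n: int) -> list:
    """Model //tag[n]: the n-th (1-based) tag child of every element."""
    result = []
    for parent in root.iter():  # every element, in document order
        children = [c for c in parent if c.tag == tag]
        if len(children) >= n:
            result.append(children[n - 1])
    return result


def global_nth(root: ET.Element, tag: str, n: int) -> list:
    """Model (//tag)[n]: the n-th tag element of the whole document."""
    matches = list(root.iter(tag))  # document order
    return matches[n - 1:n]


print([t.text for t in per_parent_nth(DOC, "title", 2)])  # ['B', 'D']
print([t.text for t in per_parent_nth(DOC, "title", 3)])  # []
print([t.text for t in global_nth(DOC, "title", 3)])      # ['C']
```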
Notes on Location Paths (2)
• References to attributes and subelements are easy to use as predicates
  • Get divisions that are of class C or have a head: doc("doc.xml")//div[@class="C" or head]
• Values are coerced to Booleans on demand
  • string/sequence: true iff non-empty
  • number: false iff zero or NaN
  • (but a single number as a predicate tests for equality with position())
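The coercion rules above can be summarised as a small Python function. This is a simplified model covering only the cases on the slide: a Python list stands in for a sequence, any object that is not a `str`, `bool`, or number stands in for a node, and the case of several atomic values (an error in XQuery) raises `TypeError`. The positional special case for a single number in a predicate is not modelled.

```python
import math
from numbers import Number


def effective_boolean_value(seq: list) -> bool:
    """Simplified model of coercing a sequence to a Boolean."""
    if not seq:
        return False                  # empty sequence -> false
    first = seq[0]
    if not isinstance(first, (str, bool, Number)):
        return True                   # sequence starting with a node -> true
    if len(seq) > 1:
        raise TypeError("no Boolean value for several atomic values")
    if isinstance(first, bool):       # check bool before Number (bool is an int)
        return first
    if isinstance(first, str):
        return first != ""            # non-empty string -> true
    return not (first == 0 or math.isnan(first))  # zero or NaN -> false
```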
Filter Expressions
• Location steps can be filtered by predicates: doc("foo.xml")/body/(chap | app)[last()]/title
  • the title of the last chapter or appendix, whichever is last
  • the parenthesized step (chap | app) is an XPath 2.0 extended step
• Other sequences, too: (1 to 20)[. mod 5 eq 0] → (5, 10, 15, 20)
• ('.' generalized from the XPath 1.0 shorthand for self::node() into the context item)
Path Steps as a Map Operator
• XPath 2.0 path expressions provide a kind of map facility: they compute a new sequence by evaluating an expression for each item of the input sequence
• Example: get all salaries incremented by 20%: doc("emps.xml")//emp/@salary/(. * 1.2)
• Useful tricks, like providing defaults for missing attributes: $emp/(@salary, text{0.0})[1]/(. * 1.2)
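In Python terms, such a step is simply a map over the items of the previous step. The sketch below contrasts the plain salary example with the default trick; the `emps.xml` content and the function names are hypothetical. Without the default, an emp lacking `@salary` contributes nothing; with it, the emp contributes 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical emps.xml content; the second emp has no salary attribute.
EMPS = ET.fromstring('<emps><emp salary="1000"/><emp/><emp salary="2500"/></emps>')


def raised(root: ET.Element, factor: float = 1.2) -> list:
    """Model //emp/@salary/(. * 1.2): emps without @salary yield nothing."""
    return [float(e.get("salary")) * factor
            for e in root.iter("emp") if "salary" in e.attrib]


def raised_with_default(root: ET.Element, factor: float = 1.2) -> list:
    """Model //emp/(@salary, text{0.0})[1]/(. * 1.2)."""
    # (@salary, text{0.0})[1] picks the attribute if present, else the default
    return [float(e.get("salary", "0.0")) * factor for e in root.iter("emp")]


print(raised(EMPS))               # two values
print(raised_with_default(EMPS))  # three values, the middle one 0.0
```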
Path Steps as a Map (2)
• Limitation: steps are applicable to node sequences only. Example: an invalid attempt to square the numbers 1, 2, ..., 10: (1 to 10)/(. * .)
• Work-around: translate the items to text nodes first: (for $i in 1 to 10 return text{ $i })/(. * .)
  • or simply: for $i in 1 to 10 return $i * $i
• Function calls can also be used as steps: myFun:toTextNodes(1 to 10)/myFun:square(.)
Set Operations on Node Sequences
• Assume variable bindings $s1 = (a, b, c) and $s2 = (c, d, e), where a, b, c, d, e are nodes in document order
• Then (results without duplicates, in document order, based on node identity, i.e. node1 is node2):
  • $s1 union $s2 = (a, b, c, d, e)
  • $s1 intersect $s2 = (c)
  • $s1 except $s2 = (a, b)
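The identity-based set operations can be sketched in Python with `id()` standing in for node identity (the `is` operator) and a precomputed position map standing in for document order. The helper names are hypothetical; the bindings mirror the ones above.

```python
import xml.etree.ElementTree as ET


def doc_order(root: ET.Element) -> dict:
    """Map each element (by identity) to its position in document order."""
    return {id(e): i for i, e in enumerate(root.iter())}


def _in_order(nodes, order):
    # drop duplicates by identity, then sort into document order
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: order[id(n)])


def union(s1, s2, order):
    return _in_order(list(s1) + list(s2), order)


def intersect(s1, s2, order):
    ids2 = {id(n) for n in s2}
    return _in_order([n for n in s1 if id(n) in ids2], order)


def except_(s1, s2, order):  # `except` is a Python keyword
    ids2 = {id(n) for n in s2}
    return _in_order([n for n in s1 if id(n) not in ids2], order)


root = ET.fromstring("<r><a/><b/><c/><d/><e/></r>")
a, b, c, d, e = list(root)
order = doc_order(root)
s1, s2 = [c, a, b], [e, c, d]  # any input order; results come out sorted
print([n.tag for n in union(s1, s2, order)])      # ['a', 'b', 'c', 'd', 'e']
print([n.tag for n in intersect(s1, s2, order)])  # ['c']
print([n.tag for n in except_(s1, s2, order)])    # ['a', 'b']
```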
Node Comparisons
• To compare single nodes,
  • for identity: is
    $book//chap[@id="ch1"] is ($book//chap)[1]
    true iff the chapter with id="ch1" is the first chap
  • for document order: << and >>
    $book//chap[@id="ch2"] >> $book//title[. eq "Intro"]
    true iff the chapter with id="ch2" appears after <title>Intro</title>
• If either operand is empty, then the result is empty (~ false)
Comparing values of sequences and items
• General comparisons between sequences:
  • =, !=, <, <=, >, >=
  • existential semantics: true iff some pair of values from the operand sequences satisfies the condition
  • (1,2) = (2,3); (2,3) = (3,4); (1,2) != (3,4)
• Same as in XPath 1.0: //book[author = "Aho"] → books where some author is Aho
• "Is (some) author of $book Ann or Bob?": $book/author = ("Ann", "Bob")
• Slice of $seq from pos $s to $e:
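The existential semantics is easy to state in Python: a general comparison holds if any pair drawn from the two operand sequences satisfies the operator. A minimal sketch, with tuples standing in for sequences and atomization and type casting ignored; `general_compare` is a hypothetical name. Note the perhaps surprising consequence that `(1,2) != (1,2)` is true.

```python
import operator
from itertools import product


def general_compare(op, left, right) -> bool:
    """Existential semantics: true iff some pair (x, y) satisfies op(x, y)."""
    return any(op(x, y) for x, y in product(left, right))


print(general_compare(operator.eq, (1, 2), (2, 3)))  # True
print(general_compare(operator.ne, (1, 2), (3, 4)))  # True
print(general_compare(operator.eq, (1, 2), (3, 4)))  # False
print(general_compare(operator.ne, (1, 2), (1, 2)))  # True: 1 != 2
print(general_compare(operator.eq, (), ()))          # False: no pairs at all
```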
Set operations for sequences of atomic items
• General comparison as a predicate yields set operations:
  • Union of $A and $B:
  • Intersection of $A and $B:
  • Difference of $A and $B:
• The above comparisons require items of compatible types
Value Comparisons
• For comparing single values:
  • eq, ne, lt, le, gt, ge
  • 1 eq 3 - 2; 10 lt 20
  • $books[@price le 100]
    • the last assumes that a numeric type has been assigned to @price by validation
    • otherwise it has type xs:untypedAtomic, which is cast to xs:string (→ type error)
• → general comparisons are more convenient with unvalidated elements and attributes
Working with Untyped Values
• Text values may receive a specific type in a schema-validated element or attribute; otherwise their type is xs:untypedAtomic
• Automatic atomization and casting make dealing with them easy. Example:
  let $elem := <e>2.718281828</e>
  return ( "Value of",
           concat(substring($elem, 1, 6), ".. is about"),
           round-half-to-even($elem, 2) )
  -> Value of 2.7182.. is about 2.72
General vs. Value Comparisons wrt Types
• Comparisons atomize operands: nodes → typed values
• Assume that $E := <E>007</E>
• General comparisons try to cast xs:untypedAtomic operands to compatible types:
  $E < 6    (: false: xs:untypedAtomic("007") -> xs:double("007") = 7 :),
  $E < "6"  (: true: xs:untypedAtomic("007") -> "007" :),
• Value comparisons cast xs:untypedAtomic values to strings:
  $E lt "6" (: true: xs:string("007") lt "6" :),
  $E lt 6   (: type error: the value is cast to xs:string, which cannot be compared to xs:integer :)
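The two casting strategies can be contrasted in a small Python model. This is a sketch under simplifying assumptions: a `str` subclass `Untyped` (a hypothetical name) stands in for xs:untypedAtomic, only the left operand is treated as untyped, and Python's `TypeError` plays the role of the XQuery type error.

```python
class Untyped(str):
    """Stand-in for an xs:untypedAtomic value, e.g. the content of <E>007</E>."""


def general_lt(x, y) -> bool:
    """General comparison x < y (only a left untyped operand is modelled)."""
    if isinstance(x, Untyped):
        # cast to match the other operand: numeric -> xs:double, else string
        x = float(x) if isinstance(y, (int, float)) else str(x)
    return x < y


def value_lt(x, y) -> bool:
    """Value comparison x lt y: an untyped operand always becomes xs:string."""
    if isinstance(x, Untyped):
        x = str(x)
    if isinstance(x, str) != isinstance(y, str):
        raise TypeError("cannot compare a string with a number")
    return x < y


E = Untyped("007")
print(general_lt(E, 6), general_lt(E, "6"))  # False True
print(value_lt(E, "6"))                      # True
```

Calling `value_lt(E, 6)` raises `TypeError`, matching the last case on the slide.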
What does XQuery add to XPath 2.0?
• A query is an expression
  • any XPath expression is a query
• XQuery adds to XPath expressions
  • element constructors (~ XSLT templates)
  • FLWOR expressions ("flower"; for-let-where-order by-return)
Central XQuery Expressions
• Path expressions
• Sequence expressions
• Comparison operators
• Conditionals: if (..) then .. else ..
• Quantified expressions (some/every $var in … satisfies …)
• Element constructors (~ XSLT templates)
• FLWOR expressions ("flower"; for-let-where-order by-return)
  • XPath 2.0 also has a simpler for-return expression
Example: Quantified Expression
• Find book elements that have at least 10 sections in each of their chapters:
  doc("Books.xml")//book[
    every $c in .//chapter satisfies count($c//section) ge 10 ]
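Python's `all()` corresponds closely to `every ... satisfies`, including the fact that both are vacuously true over an empty sequence: a book with no chapters at all qualifies. A sketch over hypothetical `Books.xml` content, with `book/@id` attributes added only to identify the results.

```python
import xml.etree.ElementTree as ET

TEN = "<section/>" * 10
# Hypothetical Books.xml: b1 qualifies, b2 has a short chapter, b3 has none.
BOOKS = ET.fromstring(
    f"<books><book id='b1'><chapter>{TEN}</chapter></book>"
    f"<book id='b2'><chapter>{TEN}</chapter><chapter><section/></chapter></book>"
    "<book id='b3'/></books>"
)


def books_with_big_chapters(root: ET.Element, minimum: int = 10) -> list:
    """Model //book[every $c in .//chapter satisfies count($c//section) ge 10]."""
    return [
        book.get("id")
        for book in root.iter("book")
        # all() over an empty iterable is True, just like `every`
        if all(len(list(c.iter("section"))) >= minimum
               for c in book.iter("chapter"))
    ]


print(books_with_big_chapters(BOOKS))  # ['b1', 'b3']
```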
Element Constructors
• Direct element constructors ~ XSLT templates:
  • start and end tag enclosing the content
  • literal fragments written directly, expressions enclosed in braces { and } (≈ XSLT 1.0 attribute value templates)
  • often used inside another expression that binds variables used in the element constructor
• (There is no 'current node' in XQuery)
• See the next example
Example
• An emp element with an empid attribute and child elements name and job, from values in variables $id, $n, and $j:
  <emp empid="{$id}">
    <name>{$n}</name>
    <job>{$j}</job>
  </emp>
• Also computed constructors:
  element {"emp"} {
    attribute {"empid"} {$id},
    <name> {$n} </name>,
    <job> {$j} </job> }
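For comparison, building the same element imperatively with Python's `xml.etree.ElementTree` takes one call per node. The sketch is only an analogy to the constructor above; the function name and the values `e42`, `Ann`, and `DBA` are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_emp(emp_id: str, name: str, job: str) -> ET.Element:
    """Rough analogue of <emp empid="{$id}"><name>{$n}</name><job>{$j}</job></emp>."""
    emp = ET.Element("emp", {"empid": emp_id})  # like the attribute value {$id}
    ET.SubElement(emp, "name").text = name       # like the enclosed {$n}
    ET.SubElement(emp, "job").text = job         # like the enclosed {$j}
    return emp


print(ET.tostring(make_emp("e42", "Ann", "DBA"), encoding="unicode"))
# <emp empid="e42"><name>Ann</name><job>DBA</job></emp>
```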
Identity of Component Nodes
• Each node has a node identity, and at most one parent. Existing nodes are copied before they get a new parent.
• Example:
  let $x := <e>Hi</e>, $y := <p>{$x}</p>
  return not($x is $y/e) and deep-equal($x, $y/e)
  -> true
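The difference between identity and structural equality can be mimicked in Python. ElementTree does not copy nodes on insertion, so the sketch makes the copy explicit with `copy.deepcopy`, as an XQuery constructor would do implicitly. `deep_equal` is a hypothetical, simplified stand-in for `fn:deep-equal` that ignores tails and mixed content.

```python
import copy
import xml.etree.ElementTree as ET


def deep_equal(a: ET.Element, b: ET.Element) -> bool:
    """Simplified deep-equal: same tag, attributes, text, and children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and len(a) == len(b)
        and all(deep_equal(x, y) for x, y in zip(a, b))
    )


x = ET.Element("e")
x.text = "Hi"
y = ET.Element("p")
y.append(copy.deepcopy(x))  # model XQuery copying $x before it gets a parent
print(x is y[0], deep_equal(x, y[0]))  # False True
```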
FLWOR ("flower") Expressions
• Constructed from for, let, where, order by, and return clauses (~ SQL select-from-where)
• Syntax: (ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr
• FLWOR binds variables to values, and uses these bindings to construct a result (an ordered sequence of items)
Flow of data in a FLWOR expression (figure; labels: tuple (= row), sequence of items)
for clauses
• for $V1 in Exp1 (, $V2 in Exp2, …)
  • associates each variable Vi with the expression Expi (e.g. a path expression)
• Result: a list of tuples, each containing a binding for each of the variables
  • can be thought of as loops iterating over the items returned by the respective expressions
Example: for clause
for $i in (1, 2), $j in (1 to $i)
return <tuple> <i>{$i}</i> <j>{$j}</j> </tuple>
Result:
<tuple><i>1</i><j>1</j></tuple>
<tuple><i>2</i><j>1</j></tuple>
<tuple><i>2</i><j>2</j></tuple>
let clauses
• let also binds variables to expressions
  • each variable gets the entire sequence as its value (without iterating over the items of the sequence)
  • results in binding a single sequence for each variable
• Compare:
  • for $b in doc("bib.xml")//book → many bindings (to single books)
  • let $bl := doc("bib.xml")//book → a single binding (to a sequence of books)
Example: let clauses
let $s := (<one/>, <two/>, <three/>)
return <out> {$s} </out>
Result:
<out> <one/> <two/> <three/> </out>
Compare with for:
for $s in (<one/>, <two/>, <three/>)
return <out> {$s} </out>
Result:
<out><one/></out> <out><two/></out> <out><three/></out>
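The same contrast in Python terms: `let` binds the whole sequence once and so produces one `<out>`, while `for` produces one binding, and one `<out>`, per item. In this sketch plain strings stand in for the serialized elements, and the function names are hypothetical.

```python
ITEMS = ("one", "two", "three")


def empty_elem(tag: str) -> str:
    return f"<{tag}/>"


def with_let() -> list:
    """let $s := (...) return <out>{$s}</out>: one tuple, so one <out>."""
    s = ITEMS  # the whole sequence is bound once
    return ["<out>" + "".join(empty_elem(t) for t in s) + "</out>"]


def with_for() -> list:
    """for $s in (...) return <out>{$s}</out>: one tuple and one <out> per item."""
    return ["<out>" + empty_elem(s) + "</out>" for s in ITEMS]


print(with_let())  # ['<out><one/><two/><three/></out>']
print(with_for())  # ['<out><one/></out>', '<out><two/></out>', '<out><three/></out>']
```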
for/let clauses
• A FLWOR expression may contain several fors and lets
  • each may refer to variables bound in previous clauses
• The result of the for/let sequence:
  • an ordered list of tuples of bound variables
  • number of tuples = product of the cardinalities of the sequences returned by the for expressions (when these do not depend on each other)
where clause
• Binding tuples generated by for and let clauses are filtered by an optional where clause
  • tuples with a true condition are used to instantiate the return clause
• The where clause may contain several predicates connected by and, or, and fn:not()
  • usually referring to the bound variables
  • sequences as Booleans (similarly to node-sets in XPath 1.0): empty ~ false; non-empty ~ true
where clause (cont.)
• for binds variables to single items → value comparisons, e.g. $color eq "red"
• let binds them to whole sequences → general comparisons, e.g. $colors = "red" (~ some $c in $colors satisfies $c eq "red")
• A number of aggregation functions are available: avg(), sum(), count(), max(), min() (sum() and count() also in XPath 1.0)
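A Python sketch of the two filtering styles, on hypothetical data: with `for`, each tuple holds one item, so a plain equality test per item suffices; with `let`, the single tuple holds the whole sequence, and the general comparison succeeds if some item matches, after which an aggregate such as `count()` sees the whole sequence.

```python
COLORS = ["red", "blue", "red", "green"]  # hypothetical data

# for $color in $colors where $color eq "red" return $color
# Each tuple binds a single item, so a value comparison (eq) fits.
for_result = [color for color in COLORS if color == "red"]

# let $cs := $colors where $cs = "red" return count($cs)
# The single tuple binds the whole sequence; the general comparison
# is true if *some* item equals "red".
cs = COLORS
let_result = [len(cs)] if any(c == "red" for c in cs) else []

print(for_result)  # ['red', 'red']
print(let_result)  # [4]
```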
return clause
• The return clause generates the output of the FLWOR expression
  • instantiated once for each binding tuple
  • often contains element constructors, references to bound variables, and nested sub-expressions
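Putting the clauses together, a FLWOR expression can be pictured as a stream of binding tuples: for/let clauses build the tuples, where filters them, order by sorts them, and return is evaluated once per surviving tuple, with the results concatenated into one flat sequence. The Python model below is a toy illustration of that data flow, not how XQuery processors are implemented; the clause encoding and the name `flwor` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Binding = Dict[str, Any]  # one binding tuple: variable name -> value


def flwor(clauses: List[tuple],
          where: Callable[[Binding], bool] = lambda t: True,
          order_by: Optional[Callable[[Binding], Any]] = None,
          ret: Callable[[Binding], Any] = lambda t: t) -> list:
    """Evaluate a FLWOR-like pipeline over a stream of binding tuples.

    Each clause is ("for", name, fn) or ("let", name, fn); fn receives the
    tuple built so far and returns a Python list standing in for a sequence.
    """
    tuples: List[Binding] = [{}]
    for kind, name, fn in clauses:
        if kind == "for":
            # one new tuple per item: later clauses may use earlier variables
            tuples = [{**t, name: item} for t in tuples for item in fn(t)]
        elif kind == "let":
            # the whole sequence is bound once per existing tuple
            tuples = [{**t, name: fn(t)} for t in tuples]
        else:
            raise ValueError(f"unknown clause kind: {kind}")
    tuples = [t for t in tuples if where(t)]  # where
    if order_by is not None:
        tuples.sort(key=order_by)  # Python's sort is stable, like `stable order by`
    result: List[Any] = []
    for t in tuples:  # return: once per tuple, results flattened
        value = ret(t)
        result.extend(value if isinstance(value, list) else [value])
    return result


# Usage with hypothetical data: titles of books cheaper than 50, sorted.
books = [{"title": "B", "price": 40.0},
         {"title": "A", "price": 26.5},
         {"title": "C", "price": 65.95}]
print(flwor([("for", "b", lambda t: books)],
            where=lambda t: t["b"]["price"] < 50,
            order_by=lambda t: t["b"]["title"],
            ret=lambda t: t["b"]["title"]))  # ['A', 'B']
```

The for-clause example from earlier corresponds to `flwor([("for", "i", lambda t: [1, 2]), ("for", "j", lambda t: list(range(1, t["i"] + 1)))], ret=lambda t: (t["i"], t["j"]))`, which yields the three tuples (1, 1), (2, 1), (2, 2).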