Query Languages for XML. XPath.
The XPath Data Model • Given an XML document, XPath operates on it and produces values that are sequences of items. • An item is either: • A value of primitive type, e.g., integer or string; • A node (defined next).
Principal Kinds of Nodes • Documents represent entire XML documents, identified by a local path name or a URL. • Elements are pieces of a document consisting of an opening tag, its matching closing tag (if any), and everything in between. • Attributes are names that are given values inside opening tags.
Document Nodes • Formed by doc(filename) • filename can be a local name or a URL. • Example: doc("univ.xml") or doc("/mydir/univ.xml") • All XPath queries refer to a doc node, either explicitly or implicitly. • Example: key definitions in XML Schema have XPath expressions that refer to the document described by the schema.
Example Document
<UNIV>
  <STUDENT sno = "007">
    <NAME>James Bond</NAME>
    <MARK theCNO = "CS123">80</MARK>
    <MARK theCNO = "CS456">98</MARK>
  </STUDENT> …
  <COURSE cno = "CS123"> DB</COURSE> …
</UNIV>
• An element node: for example, <NAME>James Bond</NAME>. • An attribute node: for example, sno = "007". • The document node is all of this, plus the header (<?xml version… ?>).
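Nodes as Semistructured Data • [Figure: the example document drawn as a tree. The document node univ.xml has the child UNIV. UNIV has the children STUDENT (attribute sno = "007") and COURSE (attribute cno = "CS123", text DB). STUDENT has the children NAME (text James Bond), MARK (attribute theCNO = "CS123", text 80) and MARK (attribute theCNO = "CS456", text 98).]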
Paths in XML Documents • XPath is a language for describing paths in XML documents. • The result of the described path is a sequence of items.
Path Expressions • Simple path expressions are sequences of slashes (/) and tags, starting with /. • Example: /UNIV/STUDENT/MARK • Construct the result by starting with just the doc node and processing each tag in turn from the left.
Evaluating a Path Expression • Assume the first tag is the root. • Processing the doc node by this tag results in a sequence consisting of only the root element. • e.g., /UNIV • Suppose we have obtained a sequence of items from processing the previous tags, and the next tag is X. • For each item that is an element node, replace the element by all its subelements with tag X.
Example: /UNIV • The result is one item, the UNIV element of the example document.
Example: /UNIV/STUDENT • The result is the STUDENT element shown in the example document (sno = "007"), followed by all the other STUDENT elements.
Example: /UNIV/STUDENT/MARK • The result is the two MARK elements of student 007, followed by the MARK elements of all the other STUDENTs.
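The step-by-step evaluation described above can be sketched in Python with the standard-library xml.etree.ElementTree module. This is only a sketch, not an XPath engine: UNIV_XML is a cut-down copy of the example document (the elided parts are simply left out), and eval_simple_path is a hypothetical helper name introduced here.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""


def eval_simple_path(root, path):
    """Evaluate a simple absolute path such as /UNIV/STUDENT/MARK."""
    tags = path.strip("/").split("/")
    # The first tag is matched against the root element.
    items = [root] if root.tag == tags[0] else []
    for tag in tags[1:]:
        # Replace each element by all of its subelements with the given tag.
        items = [child for item in items for child in item if child.tag == tag]
    return items


root = ET.fromstring(UNIV_XML)
marks = eval_simple_path(root, "/UNIV/STUDENT/MARK")
print([m.text for m in marks])
```

Each pass of the loop corresponds to processing one tag of the path from the left.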
Relative Path • We can use XPath expressions that are relative to the current node or sequence of nodes. • Do not start with /. • Example • If we have arrived at node /UNIV, then we can use relative path STUDENT/NAME or COURSE to describe its subelements. Lu Chaojun, SJTU
Attributes in Paths • Instead of going to subelements with a given tag, you can go to an attribute of the elements you already have. • An attribute is indicated by putting @ in front of its name.
Evaluating Attributes • When a path expression ends in an attribute, the result is typically a sequence of values of primitive type.
Example: /UNIV/STUDENT/MARK/@theCNO • The theCNO attributes of student 007's MARK elements contribute "CS123" and "CS456" to the result, followed by the other theCNO values.
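A path that ends in an attribute can be imitated in Python as follows. ElementTree supports only a subset of XPath and cannot end a path in @theCNO, so this sketch selects the MARK elements and then reads the attribute from each. UNIV_XML is an assumed cut-down copy of the example document.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)  # root is the UNIV element

# Counterpart of /UNIV/STUDENT/MARK/@theCNO: go to the MARK elements,
# then take the value of the theCNO attribute of each one.
cnos = [m.get("theCNO") for m in root.findall("./STUDENT/MARK")]
print(cnos)
```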
Paths that Begin Anywhere • If the path begins with //X, then the first step can begin at the root or any subelement of the root, as long as the tag is X. • In fact, //X can appear anywhere in a path. • e.g., /UNIV//NAME • // is the shorthand of a kind of axis. (see next slide)
Axes: Modes of Navigation • In general, path expressions allow us to start at the root and execute steps to find a sequence of nodes. • At each step, we may follow any one of several axes. • The default axis is child:: --- go to all the children of the current set of nodes. • Shorthand: /
Example: Axes • /UNIV/STUDENT is really shorthand for /child::UNIV/child::STUDENT. • @ is really shorthand for the attribute:: axis. • Thus, /UNIV/STUDENT/@sno is shorthand for /child::UNIV/child::STUDENT/attribute::sno
More Axes • Some other useful axes are: • parent:: = parent(s) of the current node(s) • Shorthand: .. • self:: = the current node(s) • Shorthand: . • descendant-or-self:: = the current node(s) and all descendants • Shorthand: // • ancestor::, ancestor-or-self::, following-sibling::, etc.
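The axes listed above have rough counterparts in Python's xml.etree.ElementTree, sketched here under the assumption that UNIV_XML is a cut-down copy of the example document. ElementTree keeps no parent pointers, so the parent:: axis is imitated with a child-to-parent map (parent_of is a name chosen for this sketch).

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)

# child:: axis: the element children of the current node.
child_tags = [e.tag for e in root]

# descendant-or-self:: axis: the current node and all descendants, in document order.
dos_tags = [e.tag for e in root.iter()]

# The // shorthand: NAME elements at any depth below the root.
names = [n.text for n in root.findall(".//NAME")]

# parent:: axis: build a child -> parent map, then look the parent up.
parent_of = {child: parent for parent in root.iter() for child in parent}
name = root.find("./STUDENT/NAME")
parent_tag = parent_of[name].tag

print(child_tags, dos_tags, names, parent_tag)
```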
Wild-Card * • A star (*) in place of a tag represents any one tag. • Example: /*/*/NAME represents all NAME elements at the third level of nesting. • @* represents any attribute.
Example: /UNIV/* • The result is all the children of UNIV: the STUDENT element shown in the example document, all the other STUDENT elements, the COURSE element shown, and all the other COURSE elements.
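The wild card behaves the same way in ElementTree's XPath subset. The sketch below assumes UNIV_XML is a cut-down copy of the example document. ElementTree has no @* step, so the attributes of one element are read through its attrib mapping instead.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)  # root is the UNIV element

# /UNIV/* : every child element of UNIV, whatever its tag.
any_child = [e.tag for e in root.findall("./*")]

# /*/*/NAME : NAME elements at the third level of nesting.
third_level_names = [e.text for e in root.findall("./*/NAME")]

# Counterpart of @* : all attributes of one element, here the first MARK.
first_mark_attrs = dict(root.find("./STUDENT/MARK").attrib)

print(any_child, third_level_names, first_mark_attrs)
```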
Selection Conditions • A condition inside […] may follow a tag. • If so, then only paths that have that tag and also satisfy the condition are included in the result of a path expression. • Comparisons have an implied “there exists” sense: two sequences are related if any pair of items, one from each sequence, are related by the given comparison operator.
Example: Selection Condition • /UNIV/STUDENT/MARK[. < 90] • The condition that the MARK be < 90 makes the CS123 mark (80), but not the CS456 mark (98), part of the result.
Example: Attribute in Selection • /UNIV/STUDENT/MARK[@theCNO = "CS123"] • Now the MARK element with theCNO = "CS123" is selected, along with any other marks for CS123.
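Both selection examples can be reproduced in Python, assuming UNIV_XML is a cut-down copy of the example document. ElementTree understands the attribute test [@theCNO='CS123'] directly. It does not support numeric comparisons such as [. < 90], so that condition is applied as an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)

# /UNIV/STUDENT/MARK[@theCNO = "CS123"]
cs123_marks = root.findall("./STUDENT/MARK[@theCNO='CS123']")

# /UNIV/STUDENT/MARK[. < 90], with the condition written as a Python filter.
low_marks = [m for m in root.findall("./STUDENT/MARK") if float(m.text) < 90]

print([m.text for m in cs123_marks], [m.get("theCNO") for m in low_marks])
```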
Other Forms of Conditions • Here are some useful forms of conditions: • X[i] = true for the ith X child of its parent. • X[T] = true for X having a subelement with tag T. • X[A] = true for X having attribute A.
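ElementTree's XPath subset supports close counterparts of these three forms. The sketch assumes UNIV_XML is a cut-down copy of the example document. Note that ElementTree writes the attribute test as [@A].

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)

# X[i]: the second MARK child of each STUDENT (positions start at 1).
second_marks = root.findall("./STUDENT/MARK[2]")

# X[T]: STUDENT elements that have a NAME subelement.
students_with_name = root.findall("./STUDENT[NAME]")

# X[A]: children of UNIV that carry a cno attribute.
with_cno = root.findall("./*[@cno]")

print([m.text for m in second_marks],
      [s.get("sno") for s in students_with_name],
      [e.tag for e in with_cno])
```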
XQuery • XQuery extends XPath to a query language that has power similar to SQL. • Uses the same sequence-of-items data model. • XQuery is an expression/functional language. • Any XQuery expression can be an argument of any other XQuery expression. • Like relational algebra
More About Item Sequences • XQuery will sometimes form sequences of sequences. • All sequences are flattened. • Example: (1, 2, (), (3, 4)) = (1, 2, 3, 4), where () is the empty sequence.
FLWR Expressions • One or more for and/or let clauses. • Then an optional where clause. • Exactly one return clause.
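Flattening can be pictured with nested Python lists standing in for sequences of sequences. This is only an analogy: flatten is a hypothetical helper, and XQuery performs this step implicitly.

```python
def flatten(seq):
    """Flatten arbitrarily nested lists or tuples into one flat list."""
    flat = []
    for item in seq:
        if isinstance(item, (list, tuple)):
            # A nested sequence contributes its items, not itself.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten([1, 2, [], [3, 4]]))
```

The empty inner list disappears, just as the empty sequence does.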
Semantics of FLWR Expressions • Each for creates a loop. • let produces only a local definition. • At each iteration of the nested loops, if any, evaluate the where clause. • If the where clause returns TRUE, invoke the return clause, and append its value to the output.
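The loop semantics can be mirrored with ordinary nested loops in Python. The query imitated here (return the student number and course number of every mark above 90) is a hypothetical illustration, not one from these slides, and UNIV_XML is an assumed cut-down copy of the example document.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)
output = []

for s in root.findall("./STUDENT"):      # for $s in .../UNIV/STUDENT
    sno = s.get("sno")                   # let $sno := $s/@sno (no loop)
    for m in s.findall("./MARK"):        # for $m in $s/MARK
        if float(m.text) > 90:           # where $m > 90
            # return: append this iteration's value to the output.
            output.append((sno, m.get("theCNO")))

print(output)
```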
FOR Clauses for var in exp, ... • Variables begin with $. • A for-variable takes on each item in the sequence denoted by the expression, in turn. • Whatever follows this for is executed once for each value of the variable.
Example: FOR • for $c in doc("univ.xml")/UNIV/COURSE/@cno return <CNO> {$c} </CNO> • The braces mean: "Expand the enclosed string by replacing variables and path expressions by their values." • $c ranges over the cno attributes of all courses in our example document. • Result is a sequence of CNO elements: <CNO>CS123</CNO> <CNO>CS456</CNO> . . .
Use of Braces • When a variable name like $x, or an expression, could be text, we need to surround it by braces to avoid having it interpreted literally. • Example: <A>$x</A> is an A-element with value "$x". <A>{$x}</A> is correct. • But return $x is unambiguous. • You cannot return an untagged string without quoting it, as return "$x".
LET Clauses let var := exp, ... • Value of the variable becomes the sequence of items defined by the expression. • Note let does not cause iteration; for does.
Example: LET let $d := doc("univ.xml") let $c := $d/UNIV/COURSE/@cno return <CNO> {$c} </CNO> • Returns one element with all the course numbers, like: <CNO>CS123 CS456 ...</CNO>
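The difference between for and let can be imitated in Python. This sketch is a variation on the slide's example: it uses the theCNO attributes of the MARK elements rather than COURSE/@cno, because the assumed cut-down copy of the example document contains two of them.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)
cnos = [m.get("theCNO") for m in root.findall("./STUDENT/MARK")]

# for $c in ... return <CNO>{$c}</CNO>
# The variable takes each item in turn, so there is one CNO element per item.
per_item = [f"<CNO>{c}</CNO>" for c in cnos]

# let $c := ... return <CNO>{$c}</CNO>
# The variable is bound once to the whole sequence, so there is one CNO element.
whole = f"<CNO>{' '.join(cnos)}</CNO>"

print(per_item)
print(whole)
```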
Order-By Clauses • FLWR is really FLWOR: an order-by clause can precede the return. • Form: order by <expression> • With optional ascending or descending. • The expression is evaluated for each assignment to variables. • Determines placement in output sequence.
Example: Order-By • List all marks for CS123, lowest first. • let $d := doc("univ.xml") for $p in $d/UNIV/STUDENT/MARK[@theCNO="CS123"] order by $p return $p • The for clause generates bindings for $p to MARK elements. • order by orders those bindings by the values inside the elements (automatic coercion). • Each binding is evaluated for the output. • The result is a sequence of MARK elements.
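A Python counterpart of this query sorts the selected MARK elements by their numeric content. To make the ordering visible, the sketch adds a second student (sno 008, mark 65 for CS123). That student is invented for illustration and is not part of the example document.

```python
import xml.etree.ElementTree as ET

# Assumption: cut-down example document plus one INVENTED student (sno="008").
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <STUDENT sno="008">
    <MARK theCNO="CS123">65</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)

# for $p in $d/UNIV/STUDENT/MARK[@theCNO="CS123"]
bindings = root.findall("./STUDENT/MARK[@theCNO='CS123']")

# order by $p: compare by the value inside each element, converted to a number.
ordered = sorted(bindings, key=lambda p: float(p.text))

# return $p: the result is still a sequence of MARK elements.
print([p.text for p in ordered])
```

Passing reverse=True to sorted would correspond to the descending option.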
Predicates • Normally, conditions imply existential quantification. • e.g., for two sequences of items to be equal, we have only to find any pair of items, one from each side, that equate.
Strict Comparisons • To require that the things being compared are sequences of only one element, use comparison operators: eq, ne, lt, le, gt, ge. • Example: $x/NAME eq "James Bond" is true if somebody is the only person named "James Bond".
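The contrast between existential and strict comparison can be sketched with Python lists as sequences. general_eq and value_eq are hypothetical helper names. Treating any non-singleton operand as an error is a simplification made for this sketch, not the full XQuery rule.

```python
def general_eq(left, right):
    """Existential '=': true if any pair of items, one from each side, is equal."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Strict 'eq': both operands must be sequences of exactly one item."""
    if len(left) != 1 or len(right) != 1:
        # Simplification: reject anything that is not a singleton.
        raise ValueError("eq needs exactly one item on each side")
    return left[0] == right[0]


print(general_eq(["CS123", "CS456"], ["CS456"]))
print(value_eq(["James Bond"], ["James Bond"]))
```

Under the existential rule both ("CS123", "CS456") = "CS123" and ("CS123", "CS456") = "CS456" hold at once, which is why the strict operators exist.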
Boolean Values in XQuery • The effective boolean value (EBV) of an expression is: • The actual value if the expression is of type boolean. • FALSE if the expression evaluates to 0, ”” [the empty string], or () [the empty sequence]. • TRUE otherwise.
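The rule above translates almost line by line into Python. ebv is a hypothetical helper that follows the simplified rule as stated on this slide, with lists and tuples standing in for sequences.

```python
def ebv(value):
    """Effective boolean value, following the simplified rule stated above."""
    if isinstance(value, bool):
        return value                      # boolean: its actual value
    if isinstance(value, (int, float)) and value == 0:
        return False                      # the number 0
    if isinstance(value, str) and value == "":
        return False                      # the empty string
    if isinstance(value, (list, tuple)) and len(value) == 0:
        return False                      # the empty sequence
    return True                           # everything else


print(ebv(0), ebv(""), ebv(()), ebv("0"), ebv([1]))
```

The string "0" is not empty, so under this rule it counts as TRUE.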
Comparison of Elements and Values • When an element is compared to a primitive value, the element is treated as its value, if that value is atomic. • Example: /UNIV/STUDENT[@sno="007"]/MARK[@theCNO="CS123"] eq "80" is true if student 007 gets 80 for CS123.
Comparison of Two Elements • It is insufficient that two elements look alike. • Example: /A[@C="C1"]/B eq /A[@C="C2"]/B is false. • For elements to be equal, they must be the same, physically, in the implied document.
Getting Data From Elements • Suppose we want to compare the values of elements, rather than their location in documents. • To extract just the value (e.g., the mark itself) from an element E, use data(E).
Eliminating Duplicates • Use function distinct-values applied to a sequence. • Subtlety: this function strips tags away from elements and compares the string values. • But it doesn't restore the tags in the result. • Example: return distinct-values( let $d := doc("univ.xml") return $d/UNIV/STUDENT/MARK)
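The effect of distinct-values can be imitated in Python by taking the text of each element and dropping repeats. To produce a duplicate, the sketch adds an invented second student (sno 008) who also has a mark of 80. That student is an assumption for illustration only. The sketch keeps values in first-seen order, which XQuery does not promise.

```python
import xml.etree.ElementTree as ET

# Assumption: cut-down example document plus one INVENTED student (sno="008").
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <STUDENT sno="008">
    <MARK theCNO="CS123">80</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)

# Strip the tags: keep only the string value of each MARK element.
values = [m.text for m in root.findall("./STUDENT/MARK")]

# Drop duplicates; dict keys keep their insertion order.
distinct = list(dict.fromkeys(values))

print(values)
print(distinct)
```

The result holds plain values, not MARK elements, which matches the point that the tags are not restored.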
Quantifier Expressions some $x in E1 satisfies E2 • Evaluate the sequence E1. • Let $x (any variable) be each item in the sequence, and evaluate E2. • Return TRUE if E2 has EBV TRUE for at least one $x. • Analogously: every $x in E1 satisfies E2
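Python's built-in any and all behave like some and every. The conditions tested here (a mark above 90) are hypothetical, and UNIV_XML is an assumed cut-down copy of the example document.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""

root = ET.fromstring(UNIV_XML)
marks = root.findall("./STUDENT/MARK")   # the sequence E1

# some $x in E1 satisfies $x > 90
some_above_90 = any(float(x.text) > 90 for x in marks)

# every $x in E1 satisfies $x > 90
every_above_90 = all(float(x.text) > 90 for x in marks)

print(some_above_90, every_above_90)
```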
Aggregations • Take a sequence as argument, and return count, sum, max, etc. • Example: let $d := doc("univ.xml") for $s in $d/UNIV/STUDENT where count($s/MARK) > 100 return $s
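The aggregation example can be mirrored in Python, where len plays the role of count. students_with_more_marks_than is a hypothetical helper whose threshold parameter generalizes the constant 100, and UNIV_XML is an assumed cut-down copy of the example document.

```python
import xml.etree.ElementTree as ET

# Assumption: a cut-down copy of the example document, without the elided parts.
UNIV_XML = """
<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>
"""


def students_with_more_marks_than(root, threshold):
    """for $s in .../STUDENT where count($s/MARK) > threshold return $s"""
    return [s for s in root.findall("./STUDENT")
            if len(s.findall("./MARK")) > threshold]


root = ET.fromstring(UNIV_XML)
print([s.get("sno") for s in students_with_more_marks_than(root, 1)])
print(students_with_more_marks_than(root, 100))
```

Other aggregates follow the same pattern, for example sum(float(m.text) for m in s.findall("./MARK")) for the total of a student's marks.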
Branching Expressions • if (E1) then E2 else E3 is evaluated by: • Compute the EBV of E1. • If true, the result is E2; else the result is E3. • Example: if ($x/@sno eq "007") then $x/NAME else ()