XML query
Introduction
• An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever they have stored in XML.
• XQuery is based on the structure of XML and leverages this structure to provide query capabilities for the same range of data that XML stores.
• XQuery is defined in terms of the XQuery 1.0 and XPath 2.0 Data Model [XQ-DM], which represents the parsed structure of an XML document as an ordered, labeled tree in which nodes have identity and may be associated with simple or complex types.
• XQuery is a functional language.
XML vs. Relational Data
Relation … in XML:
{ row: { name: “John”, phone: 3634 }, row: { name: “Sue”, phone: 6343 }, row: { name: “Dick”, phone: 6363 } }
[Figure: the same data drawn as a tree with three row nodes, each having a name child and a phone child.]
Relational to XML Data
• A relation instance is basically a tree with:
  • Unbounded fanout at level 1 (i.e., any # of rows)
  • Fixed fanout at level 2 (i.e., fixed # of fields)
• XML data is essentially an arbitrary tree:
  • Unbounded fanout at all nodes/levels
  • Any number of levels
  • Variable # of children at different nodes, variable path lengths
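The tree view of a relation can be sketched in a few lines. This is an illustration in Python's standard `xml.etree.ElementTree` module, not XQuery; the root element name `relation` is a hypothetical choice, and the rows are the ones from the previous slide.

```python
import xml.etree.ElementTree as ET

# The relation from the previous slide: every row has the same two fields.
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def relation_to_xml(rows):
    """Build a two-level tree: any number of rows, a fixed number of fields per row."""
    root = ET.Element("relation")  # hypothetical root name, not from the slides
    for name, phone in rows:
        row = ET.SubElement(root, "row")          # level 1: unbounded fanout
        ET.SubElement(row, "name").text = name    # level 2: fixed fanout
        ET.SubElement(row, "phone").text = str(phone)
    return root


relation = relation_to_xml(rows)
xml_text = ET.tostring(relation, encoding="unicode")
print(xml_text)
```

An arbitrary XML tree differs only in that nothing forces every `row` to have the same children or the same depth.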
Query Language for XML
• Must be high-level; “SQL for XML”
• Must conform to XML Schema
  • But also work in the absence of schema info
• Support simple and complex/nested datatypes
• Support universal and existential quantifiers, aggregation
• Operations on sequences and hierarchies of document structures
• Capability to transform and create XML structures
XQuery
• Influenced by XML-QL, Lorel, Quilt, YATL
  • Also, XPath and XML Schema
• Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values
  • Inputs/outputs are objects defined by the XML-Query data model, rather than strings in XML syntax
Overview of XQuery
• Path expressions
• Element constructors
• FLWOR (“flower”) expressions
  • Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.
• Expressions are evaluated w.r.t. a context:
  • Context item (current node)
  • Context position (in the sequence being processed)
  • Context size (of the sequence being processed)
  • Context also includes namespaces, variables, functions, date, etc.
Path Expressions
Examples:
• Bib/paper
• Bib/book/publisher
• Bib/paper/author/lastname
Given an XML document, the value of a path expression p is a sequence of objects.
Path Expression Examples
[Figure: the document Doc drawn as a tree of object identifiers. The root Bib (&o1) has paper and book children (&o12, &o24, &o29); lower levels hold references, author, title, year, page, http, and publisher nodes, down to leaf values such as “Serge”, “Abiteboul”, “Victor”, “Vianu”, and 1997.]
• Bib/paper = <&o12,&o29>
• Bib/book/publisher = <&o51>
• Bib/paper/author/lastname = <&o71,&206>
Note that order of elements matters!
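A small sketch of how such path expressions evaluate, written in Python's `xml.etree.ElementTree` rather than XQuery. The document below is made up for illustration: the `id` attributes and the publisher name are assumptions, loosely modeled on the slide. ElementTree paths are relative to the root element, so `paper` here plays the role of `Bib/paper`.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the slide's document; ids and publisher are invented.
BIB = """
<Bib>
  <paper id="o12"><author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author></paper>
  <book id="o24"><publisher>Example Press</publisher></book>
  <paper id="o29"><author><firstname>Victor</firstname><lastname>Vianu</lastname></author></paper>
</Bib>
"""

root = ET.fromstring(BIB)

# Bib/paper: both papers, in document order.
papers = [p.get("id") for p in root.findall("paper")]

# Bib/book/publisher: one step per slash.
publishers = [e.text for e in root.findall("book/publisher")]

# Bib/paper/author/lastname: results follow the order of the papers.
lastnames = [e.text for e in root.findall("paper/author/lastname")]

print(papers, publishers, lastnames)
```

The results are lists, not sets: each one preserves the order in which the matching nodes appear in the document.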
Element Construction
• An XQuery expression can construct new values or structures
• Example: Consider the path expressions from the previous slide.
  • Each of them returns a newly constructed sequence of elements
  • Key point is that we don’t just return existing structures or atomic values; we can re-arrange them as we wish into new structures
Data Model
• In the XQuery data model, every document is represented as a tree of nodes. The kinds of nodes that may occur are: document, element, attribute, text, namespace, processing instruction, and comment.
• An item is a single node or atomic value. A series of items is known as a sequence. In XQuery, every value is a sequence.
Literals and comments
• A comment is enclosed in (: and :), for example:
(: Hello World :)
doc() function
• Returns the entire document:
doc("books.xml")
Locating nodes
• A path expression consists of a series of one or more steps, separated by a slash, /, or double slash, //.
• doc("books.xml")/bib/book
• doc("books.xml")//book
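The difference between the two separators can be sketched as follows, again in Python's `xml.etree.ElementTree` as a stand-in for XQuery. The sample document, including the `section` wrapper, is an assumption made up so that one book is not a direct child of `bib`.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second book is nested one level deeper than the first.
BOOKS = """
<bib>
  <book><title>A</title></book>
  <section>
    <book><title>B</title></book>
  </section>
</bib>
"""

root = ET.fromstring(BOOKS)

# Like /bib/book: only book elements that are children of bib.
child_books = [b.find("title").text for b in root.findall("book")]

# Like //book: book elements at any depth below the root.
all_books = [b.find("title").text for b in root.findall(".//book")]

print(child_books, all_books)
```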
Predicates
• Predicates are Boolean conditions that select a subset of the nodes computed by a step expression.
• XQuery uses square brackets around predicates.
• For instance, the following query returns only authors for which last="Stevens" is true:
doc("books.xml")/bib/book/author[last="Stevens"]
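• If a predicate contains a single numeric value, it is treated like a subscript. For instance, the following expression returns the first author of each book:
doc("books.xml")/bib/book/author[1]
• With parentheses, the subscript applies to the whole sequence, so the following expression returns only the first author in the document:
(doc("books.xml")/bib/book/author)[1]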
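The effect of a predicate and of a numeric subscript can be checked on a small document. This is a sketch in Python's `xml.etree.ElementTree`, whose limited XPath support happens to cover both forms; the contents of the sample document are an assumption, since the slides do not show books.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml.
BOOKS = """
<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book>
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
  </book>
</bib>
"""

root = ET.fromstring(BOOKS)

# author[last="Stevens"]: keep only authors whose last child has that text.
stevens = root.findall("book/author[last='Stevens']")

# author[1]: the subscript is applied per book, giving one author per book.
first_per_book = [a.find("last").text for a in root.findall("book/author[1]")]

# (.../author)[1]: the subscript is applied to the whole sequence of authors.
first_overall = root.findall("book/author")[0].find("last").text

print(len(stevens), first_per_book, first_overall)
```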
Creating Nodes
document {
  <book year="1977">
    <title>Harold and the Purple Crayon</title>
    <author><last>Johnson</last><first>Crockett</first></author>
    <publisher>HarperCollins Juvenile Books</publisher>
    <price>14.95</price>
  </book>
}
<titles count="{ count(doc('books.xml')//title) }"> { doc("books.xml")//title } </titles>
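The titles constructor above computes an attribute value and fills the new element with nodes selected from the document. A rough Python equivalent, assuming a two-book stand-in for books.xml (the data is hypothetical), might look like this:

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml.
BOOKS = """
<bib>
  <book><title>TCP/IP Illustrated</title></book>
  <book><title>Data on the Web</title></book>
</bib>
"""

root = ET.fromstring(BOOKS)
found = root.findall(".//title")  # like doc("books.xml")//title

# The count attribute is computed from the query result.
titles = ET.Element("titles", {"count": str(len(found))})

# The new element gets copies, so the source document stays unchanged.
for title in found:
    titles.append(copy.deepcopy(title))

print(ET.tostring(titles, encoding="unicode"))
```

Copying the nodes is a deliberate choice here: the constructed element is a new structure, and the original tree still holds its own title elements.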
FLWOR Expressions
• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
• Evaluation pipeline: FOR / LET clauses → list of tuples → WHERE clause → list of tuples → ORDERBY / RETURN clause → instance of XQuery data model
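The pipeline can be mimicked with plain Python lists to make the intermediate "list of tuples" visible. This is only a sketch: the books, the year threshold, and the sort key are assumptions chosen for illustration, and each tuple is modeled as a dictionary of variable bindings.

```python
# Hypothetical input data standing in for a sequence of book elements.
books = [
    {"title": "ghi", "year": 2001},
    {"title": "abc", "year": 1998},
    {"title": "old", "year": 1990},
    {"title": "def", "year": 1996},
]

# FOR / LET clauses: produce a list of tuples (variable bindings).
tuples = [{"b": book} for book in books]

# WHERE clause: filter the list of tuples.
tuples = [t for t in tuples if t["b"]["year"] > 1995]

# ORDERBY clause: reorder the surviving tuples.
tuples.sort(key=lambda t: t["b"]["title"])

# RETURN clause: build one result item per tuple.
result = [t["b"]["title"] for t in tuples]

print(result)
```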
FOR vs. LET
• FOR $x IN list-expr
  • Binds $x in turn to each value in list-expr
• LET $x := list-expr
  • Binds $x to the entire list-expr
  • Useful for common sub-expressions and for aggregations
FOR vs. LET: Example
FOR $x IN document("bib.xml")/bib/book
RETURN <result> { $x } </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
Notice that the result has several elements.
LET $x := document("bib.xml")/bib/book
RETURN <result> { $x } </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
Notice that the result has exactly one element.
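The same contrast in Python, as a sketch using `xml.etree.ElementTree`: a loop plays the role of FOR (one binding per book), and a single assignment plays the role of LET (one binding to the whole sequence). The three-book document is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = "<bib><book>a</book><book>b</book><book>c</book></bib>"
books = ET.fromstring(BIB).findall("book")

# FOR: $x is bound to each book in turn, so RETURN runs once per book.
for_results = []
for x in books:
    result = ET.Element("result")
    result.append(copy.deepcopy(x))
    for_results.append(result)

# LET: $x is bound once to the whole sequence, so RETURN runs exactly once.
x = books
let_result = ET.Element("result")
for book in x:
    let_result.append(copy.deepcopy(book))

print(len(for_results), len(let_result))
```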
XQuery Example 1
Find the titles of all books published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
XQuery Example 2
For each author of a book published by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result> { $a, FOR $t IN /bib/book[author=$a]/title RETURN $t } </result>
distinct = a function that eliminates duplicates (after converting inputs to atomic values)
Results for Example 2
<result> <author> Jones </author> <title> abc </title> <title> def </title> </result>
<result> <author> Smith </author> <title> ghi </title> </result>
Observe how the nested structure of the result elements is determined by the nested structure of the query.
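The nesting in Example 2 maps onto nested loops. The sketch below uses Python's `xml.etree.ElementTree`; the sample data is invented so that it reproduces the result shape on this slide, and "Other House" is a hypothetical second publisher. Duplicate elimination is done on the author's text, in the spirit of the atomic-value conversion mentioned for distinct.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = (
    "<bib>"
    "<book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>"
    "<book><title>def</title><author>Jones</author><publisher>Other House</publisher></book>"
    "<book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>"
    "</bib>"
)
root = ET.fromstring(BIB)

# Outer FOR: distinct authors of Morgan Kaufmann books, in first-seen order.
mk_authors = []
for author in root.findall("book[publisher='Morgan Kaufmann']/author"):
    if author.text not in mk_authors:
        mk_authors.append(author.text)

# One <result> per author; the inner FOR collects that author's titles.
results = []
for name in mk_authors:
    result = ET.Element("result")
    ET.SubElement(result, "author").text = name
    for book in root.findall("book"):
        if any(a.text == name for a in book.findall("author")):
            result.append(copy.deepcopy(book.find("title")))
    results.append(result)

for result in results:
    print(ET.tostring(result, encoding="unicode"))
```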
XQuery Example 3
<big_publishers> { FOR $p IN distinct(document("bib.xml")//publisher) LET $b := document("bib.xml")/bib/book[publisher = $p] WHERE count($b) > 100 RETURN $p } </big_publishers>
• For each publisher p, let the list of books published by p be b
• Count the number of books in b, and return p if the count exceeds 100
count = (aggregate) function that returns the number of elements
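A sketch of the same aggregation in Python. The helper names `make_bib` and `big_publishers`, the publisher names, and the generated data are all hypothetical; the threshold is a parameter so the strict "more than 100" comparison is easy to check.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def make_bib(counts):
    """Build a hypothetical bib document with the given number of books per publisher."""
    bib = ET.Element("bib")
    for publisher, n in counts:
        for i in range(n):
            book = ET.SubElement(bib, "book")
            ET.SubElement(book, "title").text = f"{publisher} book {i}"
            ET.SubElement(book, "publisher").text = publisher
    return bib


def big_publishers(bib, threshold):
    """Return publishers with strictly more than `threshold` books, in first-seen order."""
    # Counting per distinct publisher plays the role of LET $b plus count($b).
    counts = Counter(p.text for p in bib.findall("book/publisher"))
    return [publisher for publisher, n in counts.items() if n > threshold]


bib = make_bib([("P1", 101), ("P2", 100), ("P3", 3)])
big = big_publishers(bib, 100)
print(big)
```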
XQuery Example 4
Find books whose price is higher than the average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
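Example 4 shows why LET suits aggregation: the average is computed once over the whole sequence and then reused for every book. A Python sketch, with made-up titles and prices:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = """
<bib>
  <book><title>abc</title><price>10.00</price></book>
  <book><title>def</title><price>20.00</price></book>
  <book><title>ghi</title><price>60.00</price></book>
</bib>
"""
root = ET.fromstring(BIB)
books = root.findall("book")

# LET $a := avg(...): one value computed from the entire sequence of prices.
prices = [float(b.find("price").text) for b in books]
average = sum(prices) / len(prices)

# FOR $b ... WHERE $b/price > $a RETURN $b
expensive = [b for b in books if float(b.find("price").text) > average]
expensive_titles = [b.find("title").text for b in expensive]

print(average, expensive_titles)
```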
FLWOR Expressions
for $b in doc("books.xml")//book
where $b/@year = "2000"
return $b/title