Query XML Documents with XQuery. Cheng-Chia Chen. Objectives. How XML generalizes relational databases The XQuery language How XML may be supported in databases. XQuery 1.0. XML documents naturally generalize database relations XQuery is the corresponding generalization of SQL.
Query XML Documents with XQuery Cheng-Chia Chen
Objectives • How XML generalizes relational databases • The XQuery language • How XML may be supported in databases
XQuery 1.0 • XML documents naturally generalize database relations • XQuery is the corresponding generalization of SQL
Queries on XML documents • XML documents generalize relational data: • A database is composed of many tables • A table (relation) is composed of tuples • A tuple (record; row) is composed of attributes • An attribute (field; column) contains a primitive data value (which may serve as a key or key reference).
Only Some Trees are Relations • A relation is a tree • of height two with • an unbounded number of children (rows) all of which • have the same fixed number of child nodes (columns) • The database community has been looking for a richer data model than relations. Hierarchical, object-oriented, and multi-dimensional databases have emerged, but none has reached consensus. • A DTD for an RDB (the parameter entity is declared before it is used): • <!ELEMENT PeopleDataBase (PeopleTable, …) > • <!ELEMENT PeopleTable (PeopleRow*) > • <!ENTITY % Column "FirstName, LastName, Age" > • <!ELEMENT PeopleRow (%Column;) > • <!ELEMENT FirstName (#PCDATA)> • <!ELEMENT LastName (#PCDATA)> • <!ELEMENT Age (#PCDATA)>
Trees Are Not Relations • Not all XML document trees satisfy the previous characterization • Trees are ordered, while both rows and columns of tables may be permuted without changing the meaning of the data • Trees may have height > 2. • Trees are in general not homogeneous. • Elements of the same type may have different numbers and types of children. • This does not mean that we cannot store these documents in a traditional database; it only means that a more complex encoding of XML documents into database tables is required.
An example: student grade records
<students>
  <student id="100026">
    <name>Joe Average</name>
    <age>21</age>
    <major>Biology</major>
    <grades>
      <result course="Math 101" grade="C-"/>
      <result course="Biology 101" grade="C+"/>
      <result course="Statistics 101" grade="D"/>
    </grades>
  </student>
  <student id="100078">
    <name>Jack Doe</name>
    <age>18</age>
    <major>Physics</major>
    <major>XML Science</major>
    <grades>
      <result course="Math 101" grade="A"/>
      <result course="XML 101" grade="A-"/>
      <result course="Physics 101" grade="B+"/>
      <result course="XML 102" grade="A"/>
    </grades>
  </student>
</students>
A Corresponding Student Database • Key point of database design: decompose a general XML tree into flat database tables
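The decomposition mentioned on this slide can be sketched in Python using only the standard library. This is an illustration, not the schema from the original slide (whose table figure did not survive): the three-table layout (students, majors, grades) and the function name `flatten` are assumptions chosen for the example. Repeating children (`major`, `result`) each get their own table, keyed by the student id.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the student document from the previous slide.
STUDENTS_XML = """<students>
  <student id="100026">
    <name>Joe Average</name><age>21</age><major>Biology</major>
    <grades>
      <result course="Math 101" grade="C-"/>
      <result course="Biology 101" grade="C+"/>
    </grades>
  </student>
  <student id="100078">
    <name>Jack Doe</name><age>18</age>
    <major>Physics</major><major>XML Science</major>
    <grades>
      <result course="Math 101" grade="A"/>
    </grades>
  </student>
</students>"""


def flatten(xml_text):
    """Split the student tree into three flat tables (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    students, majors, grades = [], [], []
    for s in root.findall("student"):
        sid = s.get("id")
        # One row per student: single-valued children become columns.
        students.append((sid, s.findtext("name"), int(s.findtext("age"))))
        # Repeating children need their own table with a key reference.
        for m in s.findall("major"):
            majors.append((sid, m.text))
        for r in s.findall("grades/result"):
            grades.append((sid, r.get("course"), r.get("grade")))
    return students, majors, grades


students, majors, grades = flatten(STUDENTS_XML)
```

Note that the document order of the children is lost in the tables unless an explicit position column is added, which is one of the reasons trees are not simply relations.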
For relational databases we have well-established theory and practice; can we achieve the same for XML? • database → XML • DDL (Data Definition Language) → XML Schema • DQL (Data Query Language; SQL) → XQuery • How should query languages like SQL be similarly generalized?
XQuery Design Requirements • Must have • at least one XML syntax and • at least one human-readable syntax • Must be declarative • Must be namespace aware • Must coordinate with XML Schema • Must support simple and complex datatypes • Must be able to combine information from multiple documents • Must be able to transform and create XML trees
The XQuery language • Developed by the W3C • Status: W3C Recommendation. • Derived from several previous proposals: • XML-QL, YATL, Lorel, Quilt, which all agree on the fundamental principles. • XQuery relies on XPath and XML Schema datatypes. • Many commercial and non-commercial implementations have been released: • eXist: http://exist.sf.net • AltovaXML (free) • … • Two formats • XQuery: plain-text syntax • XQueryX: XML syntax.
Relationship to XPath • XQuery 1.0 is a strict superset of XPath 2.0 • Every XPath 2.0 expression is directly an XQuery 1.0 expression (a query) • The extra expressive power is the ability to • join information from different sources and • generate new XML fragments
Relationship to XSLT • XQuery and XSLT are • both domain-specific languages (DSLs) for combining and transforming XML data from multiple sources • They are vastly different in design, partly for historical reasons • XQuery is designed from scratch, • XSLT is an intellectual descendant of CSS (Cascading Style Sheets), which is about how to render a document. • Technically, they may emulate each other. • XQuery Grammar
XQuery concepts • A query in XQuery is an expression that: • reads a sequence of XML nodes or atomic values • returns a sequence of XML nodes or atomic values • The principal forms of XQuery expressions are: • path expressions • element constructors • FLWOR ("flower") expressions • list expressions (for, let) • conditional expressions (where) • order by • quantified expressions (some/every) • datatype expressions ( xs:int("123") )
XQuery Modules and Prologs • An XQuery is a main module: • Module ::= VersionDecl? (MainModule | LibraryModule) • VersionDecl: • xquery version "1.0" encoding "UTF-8"; • (: this is the version declaration; the encoding is optional :) • A module may import other library modules. • MainModule ::= Prolog QueryBody • LibraryModule ::= ModuleDecl Prolog • Like XPath expressions, XQuery expressions are evaluated relative to a context • This is explicitly provided by a prolog • Settings define various parameters for the XQuery processor, such as: declare boundary-space (preserve | strip);
More From the Prolog
Prolog ::=                     (: the prolog consists of two parts :)
  ( ( DefaultNamespaceDecl     // declare default function or element/type namespaces
    | Setter                   // settings for XQuery processing
    | NamespaceDecl            // declare prefix-URI bindings
    | Import                   // import schemas or other modules
    ) Separator                // each ends with ';'
  )*
  (: the second part contains variable declarations, function declarations and options :)
  ( ( VarDecl                  // global variable declarations
    | FunctionDecl             // function declarations
    | OptionDecl               // implementation-defined option declarations
    ) Separator                // each ends with ';'
  )*
OptionDecl ::= declare option pfx:optname "aLiteral"
Namespace Declarations • Format: (: declare a prefix-namespace binding :) declare namespace prefix = "aURI"; • Ex: declare namespace bk = "http://www.book.org/"; • Predeclared namespace prefixes: • (: for xml:lang, xml:space, etc. :) • xml = "http://www.w3.org/XML/1998/namespace"; • (: for XML Schema and schema instances :) • xs = "http://www.w3.org/2001/XMLSchema"; • xsi = "http://www.w3.org/2001/XMLSchema-instance"; • (: for XPath functions and data types; this is the default function namespace :) • fn = "http://www.w3.org/2005/xpath-functions"; • (: for locally declared XQuery functions :) • local = "http://www.w3.org/2005/xquery-local-functions"; • Only the xml prefix cannot be redefined.
Setter • [7] Setter ::= • BoundarySpaceDecl | // preserve boundary whitespace between elements? • declare boundary-space (preserve | strip); • DefaultCollationDecl | // default collation for ordering • declare default collation "aURILiteral"; • BaseURIDecl | // base URI for resolving relative URIs • declare base-uri "http://example.org"; • ConstructionDecl | // preserve the types of copied nodes? • declare construction (strip | preserve); • OrderingModeDecl | // ordered or unordered results • declare ordering (ordered | unordered); • EmptyOrderDecl | // treat empty or NaN as the least or greatest sort key • declare default order empty (greatest | least); • CopyNamespacesDecl // copy prefix-namespace bindings • declare copy-namespaces (preserve | no-preserve), (inherit | no-inherit); • see the XQuery specification (3.7.1.3.1.e.2.D) for details • cf. constructed nodes; copied nodes; original elements • ex: <cn>{.//book}</cn>
Default namespace declarations • Default namespaces • Default element/type namespace • Default function namespace • Ex: • (: for elements (and types as well); can be disabled by "" :) • declare default element namespace "http://a.b.c/d"; • declare default function namespace "aURI"; • (: the default is "http://www.w3.org/2005/xpath-functions" if not specified; it can be disabled by "". So f1(.) has the same effect as fn:f1(.) :) • Unprefixed attribute names and variable names are in no namespace.
Import declarations • Import declarations • // import element/type/attribute declared in a schema • // may also declare prefix/default namespace for the target namespace • // default namespace uri can be "" for importing noNamespaceSchema • import schema • import (library) module Ex: • import schema "targetNamespaceURI" (at "schema_loc_uri1", "schema_loc_uri2" )? ; • import schema namespace prefix = "targetNamespaceURI" (at "loc1" )?; • import schema default element namespace "targetNamespaceURI" (at "loc_uri" )? ; • Ex: • // import a schema and use it as the default element/type namespace • import schema default element namespace "http://example.org/abc"; • // import a schema which has no target namespace and use it as default namespace. • import schema default element namespace "" at "http://example.org/xyz.xsd";
Module imports • Import declarations • // import functions and global variables declared in an XQuery library module • import (library) module // may also declare its namespace prefix • Notes: • Only variables and functions are imported; hence in-scope schema definitions and statically known namespaces are not imported. • Module imports are not transitive: A imports B and B imports C do not imply A imports C. • Ex: • import module "http://book.org/"; • // can give hints about module locations • import module "http://book.org/" (at "http://www.book.org/book.xq", "book2.xq")?; • // can also bind a prefix to the imported module's namespace URI • import module namespace book = "http://book.org/" (at "book2.xq")?; • import module namespace math = "http://example.org/math-functions";
Variable and function declarations • VarDecl ::= • declare variable $QName TypeDeclaration? ((":=" ExprSingle) | "external") • declare variable $math:pi as xs:decimal := 3.14159; • declare variable $math:e as xs:double := 2.71E0; • declare variable $ext:var1 external; • FunctionDecl ::= • declare function QName ( ParamList? ) (as SequenceType)? (EnclosedExpr | external)
Ex:
declare function local:summary                     (: name :)
  ( $emps as element(employee)* )                  (: parameters :)
  as element(dept)*                                (: return type :)
{                                                  (: body :)
  for $d in fn:distinct-values($emps/deptno)       (: a FLWOR expression :)
  let $e := $emps[deptno = $d]
  return                                           (: direct element construction! :)
    <dept>
      <deptno>{$d}</deptno>
      <headcount>{fn:count($e)}</headcount>
      <payroll>{fn:sum($e/salary)}</payroll>
    </dept>
};
Example • Suppose we have the following file at location "employees.xml":
<employees>
  <employee><deptno>dept2</deptno><name>Wang</name><salary>100000</salary></employee>
  <employee><deptno>dept1</deptno><name>Lee</name><salary>50000</salary></employee>
  <employee><deptno>dept3</deptno><name>Chen</name><salary>100000</salary></employee>
  <employee><deptno>dept1</deptno><name>Cheng</name><salary>150000</salary></employee>
</employees>
• The following query shows how to use the function local:summary:
xquery version "1.0" encoding "UTF-8";
(: the definition of local:summary is inserted or imported here :)
let $emps := fn:doc("employees.xml")/employees/employee
return <summary>{local:summary($emps)}</summary>
• Result (the order of the dept elements follows fn:distinct-values and is implementation-dependent):
<summary>
  <dept><deptno>dept2</deptno><headcount>1</headcount><payroll>100000</payroll></dept>
  <dept><deptno>dept1</deptno><headcount>2</headcount><payroll>200000</payroll></dept>
  <dept><deptno>dept3</deptno><headcount>1</headcount><payroll>100000</payroll></dept>
</summary>
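For readers more familiar with general-purpose languages, the grouping performed by the summary function can be sketched in Python with the standard library. This is only an analogy, not XQuery: the function name `summary` and the use of a dictionary for grouping are choices made for this sketch, and the sample data assumes the department element is named deptno, as the function expects.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """<employees>
  <employee><deptno>dept2</deptno><name>Wang</name><salary>100000</salary></employee>
  <employee><deptno>dept1</deptno><name>Lee</name><salary>50000</salary></employee>
  <employee><deptno>dept3</deptno><name>Chen</name><salary>100000</salary></employee>
  <employee><deptno>dept1</deptno><name>Cheng</name><salary>150000</salary></employee>
</employees>"""


def summary(employees):
    """Group employees by deptno and build one <dept> element per group."""
    # A dict keeps first-seen order; XQuery leaves the order of
    # fn:distinct-values implementation-dependent.
    totals = {}
    for e in employees:
        dept = e.findtext("deptno")
        count, payroll = totals.get(dept, (0, 0))
        totals[dept] = (count + 1, payroll + int(e.findtext("salary")))

    out = ET.Element("summary")
    for dept, (count, payroll) in totals.items():
        node = ET.SubElement(out, "dept")
        ET.SubElement(node, "deptno").text = dept
        ET.SubElement(node, "headcount").text = str(count)
        ET.SubElement(node, "payroll").text = str(payroll)
    return out


result = summary(ET.fromstring(EMPLOYEES_XML).findall("employee"))
rows = [(d.findtext("deptno"), d.findtext("headcount"), d.findtext("payroll"))
        for d in result]
```

The XQuery version expresses the same grouping declaratively: the `for` over distinct department numbers plays the role of the dictionary keys, and the `let` with a predicate selects each group.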
XQuery Concepts • QueryBody ::= Expr • [31] Expr ::= ExprSingle ("," ExprSingle)* • [32] ExprSingle ::= FLWORExpr | QuantifiedExpr | TypeswitchExpr | IfExpr | OrExpr (QuantifiedExpr, IfExpr and OrExpr are shared with XPath 2.0) • Expressions are evaluated relative to an expression context: • namespaces • variables • functions • date and time • context item, context position, context size • …
XPath Expressions • XPath expressions are also legal XQuery expressions • same data model: sequence of items • items: nodes or atomic values • The XQuery prolog gives the required static context • namespaces, function library, variable bindings • The initial context node, position, and size are undefined • since XQuery is intended to operate on multiple documents. • Initial context information can be obtained by starting from fn:doc()
Datatype Expressions • Same atomic values as XPath 2.0 • Also lots of primitive simple values: xs:string("XML is fun") xs:boolean("true") xs:decimal("3.1415") xs:float("6.02214199E23") xs:dateTime("1999-05-31T13:20:00-05:00") xs:time("13:20:00-05:00") xs:date("1999-05-31") xs:gYearMonth("1999-05") xs:gYear("1999") xs:hexBinary("48656c6c6f0a") xs:base64Binary("SGVsbG8K") xs:anyURI("http://www.brics.dk/ixwt/") xs:QName("rcp:recipe") • Functions for these types can be found in the fn: namespace (fn:XXX()).
XML Expressions • XQuery expressions may produce new XML nodes, which do not exist in input documents. • Expressions may denote • element, • character data, • comment, and • processing instruction nodes • Each node is created with a unique node identity • Constructors may be either • direct (called literal result element in XSLT) or • computed • Both may be nested with each other.
Direct Constructors • Uses the standard XML syntax • Ex: • The expression <foo><bar/>baz</foo> • evaluates to the given XML fragment • Note that the expression (containing two direct elements) <foo/> is <foo/> • evaluates to false • why ? ( unique identity of each node )
Namespaces in Constructors (1/3) • The following four constructions have the same effect. declare default element namespace "http://businesscard.org"; <card> <name>John Doe</name> <title>CEO, Widget Inc.</title> <email>john.doe@widget.com</email> <phone>(202) 555-1414</phone> <logo uri="widget.gif"/> </card>
Namespaces in Constructors (2/3) declare namespace b = "http://businesscard.org"; <b:card> <b:name>John Doe</b:name> <b:title>CEO, Widget Inc.</b:title> <b:email>john.doe@widget.com</b:email> <b:phone>(202) 555-1414</b:phone> <b:logo uri="widget.gif"/> </b:card>
Namespaces in Constructors (3/3) <card xmlns="http://businesscard.org"> <name>John Doe</name> <title>CEO, Widget Inc.</title> <email>john.doe@widget.com</email> <phone>(202) 555-1414</phone> <logo uri="widget.gif"/> </card> <b:card xmlns:b="http://businesscard.org"> <b:name>John Doe</b:name> <b:title>CEO, Widget Inc.</b:title> <b:email>john.doe@widget.com</b:email> <b:phone>(202) 555-1414</b:phone> <b:logo uri="widget.gif"/> </b:card>
Enclosed Expressions • XQuery also allows us to embed XQuery expressions in direct element constructors: • Such an expression is called an attribute value template in XSLT, • but here it can be applied to element contents as well as attribute values.
<foo>1 2 3 4 5</foo>
<foo>{1, 2, 3, 4, 5}</foo>
<foo>{1, "2", 3, 4, 5}</foo>
<foo>{1 to 5}</foo>
<foo>1 {1+1}{" "}{"3"}{" "}{4 to 5}</foo>
<foo bar="1 2 3 4 5"/>
<foo bar="{1, 2, 3, 4, 5}"/>
<foo bar="1 {2 to 4} 5"/>
• '{' and '}' now have special meaning and must be escaped using &#x7B; (or {{) and &#x7D; (or }}).
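Why all of the constructors on this slide yield the same text "1 2 3 4 5" can be sketched with a small Python model. It is a simplification under stated assumptions, not an XQuery implementation: the helper names `enclosed` and `content` are invented for this sketch, and it models only atomic values (adjacent atomic values from one enclosed expression are joined with single spaces, while literal text and separate enclosed expressions are concatenated without any separator).

```python
def enclosed(*items):
    """Model of {e1, e2, ...}: atomic values are joined by single spaces."""
    return " ".join(str(item) for item in items)


def content(*parts):
    """Model of element or attribute content: parts are simply concatenated."""
    return "".join(parts)


# <foo>{1, 2, 3, 4, 5}</foo>
a = content(enclosed(1, 2, 3, 4, 5))
# <foo>{1, "2", 3, 4, 5}</foo>
b = content(enclosed(1, "2", 3, 4, 5))
# <foo>{1 to 5}</foo>
c = content(enclosed(*range(1, 6)))
# <foo>1 {1+1}{" "}{"3"}{" "}{4 to 5}</foo>
d = content("1 ", enclosed(1 + 1), enclosed(" "), enclosed("3"),
            enclosed(" "), enclosed(*range(4, 6)))
# <foo bar="1 {2 to 4} 5"/>
e = content("1 ", enclosed(*range(2, 5)), " 5")
```

In this model every variant evaluates to the string "1 2 3 4 5", which is why the explicit {" "} expressions are needed in the fourth example: no space is inserted between two separate enclosed expressions.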
Direct Constructors vs. Computed Constructors • Direct construction in XML syntax:
<card xmlns="http://businesscard.org">
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe@widget.com</email>
  <phone>(202) 555-1414</phone>
  <logo uri="widget.gif"/>
</card>
• Non-XML alternative syntax (computed constructors):
declare default element namespace "http://businesscard.org";
element card {
  <name>John Doe</name>,    (: may use a direct constructor :)
  element title { text { "CEO, Widget Inc." } },
  element email { text { "john.doe@widget.com" } },
  element phone { text { "(202) 555-1414" } },
  element logo {
    attribute uri { "widget.gif" }
  }
}
Computed QNames • Names of elements/attributes may need to be computed! • Syntax: nodeType (QName | {NameExpr})? { ContentExpr } • nodeTypes: • element, attribute, processing-instruction: need a name • document, comment, text: no name
declare default element namespace "http://businesscard.org";
element { lower-case("CARD") } {
  element { "name" } { text { "John Doe" } },
  element { "title" } { text { "CEO, Widget Inc." } },
  element { "email" } { text { "john.doe@widget.com" } },
  element { "phone" } { text { "(202) 555-1414" } },
  element { "logo" } { attribute { "uri" } { "widget.gif" }, "aLogo" }
}
Bilingual Business Cards
xquery version "1.0" encoding "big5";
declare default element namespace "uri:businesscard.org";
declare variable $lang := "zh_TW";
element { if ($lang="zh_TW") then "名片" else "card" } {
  element { if ($lang="zh_TW") then "姓名" else "name" } { "John Doe" },
  element { if ($lang="zh_TW") then "頭銜" else "title" } { "CEO, Widget Inc." },
  element { "email" } { "john.doe@widget.inc" },
  element { if ($lang="zh_TW") then "電話" else "phone" } { "(202) 456-1414" },
  element logo { attribute { "uri" } { "widget.gif" } }
}
FLWOR Expressions • Used for general queries: • Find the names of all students who have more than one major and order them by student id.
<doubles>
  { for $s in fn:doc("students.xml")//student
    let $m := $s/major
    where fn:count($m) ge 2
    order by $s/@id
    return <double> { $s/name/text() } </double>
  }
</doubles>
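The clauses of this FLWOR expression map onto familiar steps, sketched here in Python with the standard library. This is an analogy only: the function name `double_majors` and the abbreviated sample document are assumptions for the sketch, and an XQuery processor is free to evaluate the clauses in a different way.

```python
import xml.etree.ElementTree as ET

# Abbreviated student data (grades omitted); the ids match the earlier example.
STUDENTS_XML = """<students>
  <student id="100078"><name>Jack Doe</name>
    <major>Physics</major><major>XML Science</major></student>
  <student id="100026"><name>Joe Average</name>
    <major>Biology</major></student>
</students>"""


def double_majors(root):
    selected = []
    for s in root.iter("student"):            # for $s in //student
        m = s.findall("major")                # let $m := $s/major
        if len(m) >= 2:                       # where fn:count($m) ge 2
            selected.append(s)
    selected.sort(key=lambda s: s.get("id"))  # order by $s/@id
    out = ET.Element("doubles")
    for s in selected:                        # return <double>{...}</double>
        ET.SubElement(out, "double").text = s.findtext("name")
    return out


result = double_majors(ET.fromstring(STUDENTS_XML))
serialized = ET.tostring(result, encoding="unicode")
```

With the sample data above, only Jack Doe has two majors, so the result contains a single double element.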
The Difference Between For and Let (1/4) for $x in (1, 2, 3, 4) let $y := ("a", "b", "c") return ($x, $y) 1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c
The Difference Between For and Let (2/4) let $x := (1, 2, 3, 4) for $y in ("a", "b", "c") return ($x, $y) 1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c
The Difference Between For and Let (3/4) for $x in (1, 2, 3, 4) for $y in ("a", "b", "c") return ($x, $y) 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c, 4, a, 4, b, 4, c
The Difference Between For and Let (4/4) let $x := (1, 2, 3, 4) let $y := ("a", "b", "c") return ($x, $y) 1, 2, 3, 4, a, b, c
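The four cases can be reproduced with a short Python sketch: a `for` clause iterates over the items of a sequence one at a time, while a `let` clause binds the whole sequence to the variable once. The function names are invented for this illustration, and Python lists stand in for XQuery's flat sequences.

```python
XS = [1, 2, 3, 4]
YS = ["a", "b", "c"]


def for_let():
    """for $x in XS  let $y := YS  return ($x, $y)"""
    out = []
    for x in XS:
        y = YS                # bound to the whole sequence
        out += [x, *y]        # sequences are flat, so splice y in
    return out


def let_for():
    """let $x := XS  for $y in YS  return ($x, $y)"""
    out = []
    x = XS
    for y in YS:
        out += [*x, y]
    return out


def for_for():
    """for $x in XS  for $y in YS  return ($x, $y)"""
    out = []
    for x in XS:
        for y in YS:
            out += [x, y]
    return out


def let_let():
    """let $x := XS  let $y := YS  return ($x, $y)"""
    x, y = XS, YS
    return [*x, *y]
```

The number of times the return clause runs is the product of the lengths of the sequences bound by `for` clauses: 4, 3, 12 and 1 times in the four cases.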
Computing Joins • Find the distinct names of recipes that make use of at least one item of stuff in the refrigerator.
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
fn:distinct-values(
  for $r in fn:doc("recipes.xml")//rcp:recipe,
      $i in $r//rcp:ingredient/@name,
      $s in fn:doc("fridge.xml")//stuff[text()=$i]
  return $r/rcp:title/text()
)
• The file fridge.xml:
<fridge>
  <stuff>eggs</stuff>
  <stuff>olive oil</stuff>
  <stuff>ketchup</stuff>
  <stuff>unrecognizable moldy thing</stuff>
</fridge>
Computing Joins • Find the distinct names of recipes that make use of at least one item of stuff in the refrigerator. • A more efficient version:
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
for $r in fn:doc("recipes.xml")//rcp:recipe
let $names := $r//rcp:ingredient/@name
let $stuffs := fn:doc("fridge.xml")//stuff/text()   (: could be moved above the for clause :)
where $names = $stuffs                               (: general comparison: true if any pair matches :)
return $r/rcp:title/text()
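The general comparison used in this join is existential: it holds when at least one ingredient name equals at least one fridge item. A rough Python analogue with the standard library is a non-empty set intersection. The recipe data below is invented sample data for this sketch (the real recipes.xml is not reproduced in these slides), and the function name `usable_titles` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

RCP = "http://www.brics.dk/ixwt/recipes"
NS = {"rcp": RCP}

# Invented sample data, shaped like the recipe collection the query expects.
RECIPES_XML = f"""<rcp:collection xmlns:rcp="{RCP}">
  <rcp:recipe><rcp:title>Omelette</rcp:title>
    <rcp:ingredient name="eggs"/><rcp:ingredient name="butter"/></rcp:recipe>
  <rcp:recipe><rcp:title>Toast</rcp:title>
    <rcp:ingredient name="bread"/></rcp:recipe>
  <rcp:recipe><rcp:title>Salad</rcp:title>
    <rcp:ingredient name="dressing">
      <rcp:ingredient name="olive oil"/></rcp:ingredient></rcp:recipe>
</rcp:collection>"""

FRIDGE_XML = """<fridge>
  <stuff>eggs</stuff><stuff>olive oil</stuff><stuff>ketchup</stuff>
</fridge>"""


def usable_titles(recipes_root, fridge_root):
    # let $stuffs := //stuff/text()   (computed once, outside the loop)
    stuffs = {s.text for s in fridge_root.iter("stuff")}
    titles = []
    for r in recipes_root.iter(f"{{{RCP}}}recipe"):
        # let $names := $r//rcp:ingredient/@name   (includes nested ingredients)
        names = {i.get("name") for i in r.iter(f"{{{RCP}}}ingredient")}
        if names & stuffs:            # where $names = $stuffs
            titles.append(r.findtext("rcp:title", namespaces=NS))
    return titles


titles = usable_titles(ET.fromstring(RECIPES_XML), ET.fromstring(FRIDGE_XML))
```

Because each recipe is visited once, its title appears at most once per recipe, unlike the three-variable join, which can return a title once per matching ingredient and therefore needs fn:distinct-values.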
Inverting a Relation • From recipe → ingredients to ingredient → recipes
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
<ingredients>
  { for $i in distinct-values(                              (: for each ingredient i :)
        fn:doc("recipes.xml")//rcp:ingredient/@name         (: find all ingredients :)
      )
    return
      <ingredient name="{$i}">
        { for $r in fn:doc("recipes.xml")//rcp:recipe       (: for each recipe :)
          where $r//rcp:ingredient[@name=$i]                (: which uses ingredient i :)
          return <title>{$r/rcp:title/text()}</title>
        }
      </ingredient>
  }
</ingredients>
Sorting the Results
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
<ingredients>
  { for $i in distinct-values(
        fn:doc("recipes.xml")//rcp:ingredient/@name
      )
    order by $i
    return
      <ingredient name="{$i}">
        { for $r in fn:doc("recipes.xml")//rcp:recipe
          where $r//rcp:ingredient[@name=$i]
          order by $r/rcp:title/text()
          return <title>{$r/rcp:title/text()}</title>
        }
      </ingredient>
  }
</ingredients>
A More Complicated Sorting
for $s in fn:doc("students.xml")//student
order by
  fn:count($s/grades/result[fn:contains(@grade,"A")]) descending,
  fn:count($s/major) descending,
  xs:integer($s/age/text()) ascending
return $s/name/text()
Using Functions
declare function local:grade($g as xs:string) {
  if ($g="A") then 4.0 else if ($g="A-") then 3.7
  else if ($g="B+") then 3.3 else if ($g="B") then 3.0
  else if ($g="B-") then 2.7 else if ($g="C+") then 2.3
  else if ($g="C") then 2.0 else if ($g="C-") then 1.7
  else if ($g="D+") then 1.3 else if ($g="D") then 1.0
  else if ($g="D-") then 0.7 else 0
};
declare function local:gpa($s as element(student)) {
  fn:avg( for $g in $s/grades/result/@grade return local:grade($g) )
};
<gpas>
  { for $s in fn:doc("students.xml")//student
    return <gpa id="{ $s/@id }" gpa="{ local:gpa($s) }"/>
  }
</gpas>
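As a cross-check of what the two functions compute, here is a hedged Python sketch using the standard library. It assumes the grades live under grades/result as in the example student document; the names `grade`, `gpa` and `POINTS` are chosen for this sketch, and returning None for a student without results is an assumption that mirrors fn:avg yielding the empty sequence.

```python
import xml.etree.ElementTree as ET

POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7, "C+": 2.3,
          "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "D-": 0.7}

STUDENTS_XML = """<students>
  <student id="100026"><grades>
    <result course="Math 101" grade="C-"/>
    <result course="Biology 101" grade="C+"/>
    <result course="Statistics 101" grade="D"/>
  </grades></student>
  <student id="100078"><grades>
    <result course="Math 101" grade="A"/>
    <result course="XML 101" grade="A-"/>
    <result course="Physics 101" grade="B+"/>
    <result course="XML 102" grade="A"/>
  </grades></student>
</students>"""


def grade(letter):
    """Map a letter grade to grade points; unknown grades count as 0."""
    return POINTS.get(letter, 0)


def gpa(student):
    """Average the grade points of all results of one student element."""
    points = [grade(r.get("grade")) for r in student.findall("grades/result")]
    return sum(points) / len(points) if points else None


root = ET.fromstring(STUDENTS_XML)
gpas = {s.get("id"): gpa(s) for s in root.iter("student")}
```

For the two example students this gives roughly 1.67 and 3.75 (up to floating-point rounding).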
A Height Function • Find the height of a node tree. • We can use recursion!
declare function local:height($x as node()) {
  if (fn:empty($x/*)) then 0    (: an element without child elements has height 0 :)
  else fn:max(for $y in $x/* return local:height($y)) + 1
};
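The same recursion can be written in Python over an ElementTree element, as a minimal sketch using the standard library. As in the XQuery version, only child elements count; text nodes do not add to the height.

```python
import xml.etree.ElementTree as ET


def height(node):
    """Height of an element tree: 0 for an element without child elements."""
    children = list(node)     # iterating an Element yields its child elements
    if not children:
        return 0
    return max(height(child) for child in children) + 1


leaf = ET.fromstring("<a/>")
nested = ET.fromstring(
    "<students><student><name>Joe</name>"
    "<grades><result/></grades></student></students>"
)
```

Here `height(leaf)` is 0, and `height(nested)` is 3 because the longest path is students, student, grades, result.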
A Textual Outline • An indented textual outline of a recipe • Unusual in that it generates plain text instead of XML output. Cailles en Sarcophages //recipe title pastry // ingredient chilled unsalted butter // sub ingredient flour salt ice water filling baked chicken marinated chicken small chickens, cut up Herbes de Provence dry white wine orange juice minced garlic truffle oil ...