530 likes | 709 Views
XML, XML Schema, Xpath and Xquery. Slides collated from various sources, many from Dan Suciu at Univ. of Washington. XML. W3C standard to complement HTML origins: structured text SGML motivation: HTML describes presentation XML describes content
E N D
XML, XML Schema, Xpath and Xquery Slides collated from various sources, many from Dan Suciu at Univ. of Washington
XML W3C standard to complement HTML • origins: structured text SGML • motivation: • HTML describes presentation • XML describes content • http://www.w3.org/TR/2000/REC-xml-20001006 (version 2, 10/2000) CS561 - Spring 2004.
From HTML to XML HTML describes the presentation CS561 - Spring 2004.
HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999 CS561 - Spring 2004.
XML <bibliography> <book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> … </bibliography> XML describes the content CS561 - Spring 2004.
XML Terminology • tags: book, title, author, … • start tag: <book>, end tag: </book> • elements: <book>…<book>,<author>…</author> • elements are nested • empty element: <red></red> abbrv. <red/> • an XML document: single root element well formed XML document: if it has matching tags CS561 - Spring 2004.
More XML: Attributes <bookprice = “55” currency = “USD”> <title> Foundations of Databases </title> <author> Abiteboul </author> … <year> 1995 </year> </book> attributes are alternative ways to represent data CS561 - Spring 2004.
More XML: Oids and References <personid=“o555”> <name> Jane </name> </person> <personid=“o456”> <name> Mary </name> <childrenidref=“o123 o555”/> </person> <personid=“o123” mother=“o456”><name>John</name> </person> oids and references in XML are just syntax CS561 - Spring 2004.
XML Namespaces • http://www.w3.org/TR/REC-xml-names (1/99) • name ::= [prefix:]localpart <bookxmlns:isbn=“www.isbn-org.org/def”> <title> … </title> <number> 15 </number> <isbn:number> …. </isbn:number> </book> CS561 - Spring 2004.
defined here XML Namespaces • syntactic: <number> , <isbn:number> • semantic: provide URL for schema <tagxmlns:mystyle = “http://…”> … <mystyle:title> … </mystyle:title> <mystyle:number> … </tag> CS561 - Spring 2004.
XML Schemas • http://www.w3.org/TR/xmlschema-1/10/2000 • generalizes DTDs • uses XML syntax • two documents: structure and datatypes • http://www.w3.org/TR/xmlschema-1 • http://www.w3.org/TR/xmlschema-2 • XML-Schema is complex CS561 - Spring 2004.
XML Schemas <xsd:elementname=“paper” type=“papertype”/> <xsd:complexTypename=“papertype”> <xsd:sequence> <xsd:elementname=“title” type=“xsd:string”/> <xsd:elementname=“author” minOccurs=“0”/> <xsd:elementname=“year”/> <xsd:choice> < xsd:elementname=“journal”/> <xsd:elementname=“conference”/> </xsd:choice> </xsd:sequence> </xsd:element> DTD: <!ELEMENT paper (title,author*,year, (journal|conference))> CS561 - Spring 2004.
Elements v.s. Types in XML Schema <xsd:elementname=“person”> <xsd:complexType> <xsd:sequence> <xsd:elementname=“name” type=“xsd:string”/> <xsd:elementname=“address”type=“xsd:string”/> </xsd:sequence> </xsd:complexType></xsd:element> <xsd:elementname=“person”type=“ttt”><xsd:complexType name=“ttt”> <xsd:sequence> <xsd:elementname=“name” type=“xsd:string”/> <xsd:elementname=“address”type=“xsd:string”/> </xsd:sequence></xsd:complexType> DTD: <!ELEMENT person (name,address)> CS561 - Spring 2004.
Elements v.s. Types in XML Schema • Types: • Simple types (integers, strings, ...) • Complex types (regular expressions, like in DTDs) • Element-type-element alternation: • Root element has a complex type • That type is a regular expression of elements • Those elements have their complex types... • ... • On the leaves we have simple types CS561 - Spring 2004.
Local and Global Types in XML Schema • Local type: <xsd:elementname=“person”> [define locally the person’s type] </xsd:element> • Global type: <xsd:elementname=“person” type=“ttt”/> <xsd:complexType name=“ttt”> [define here the type ttt] </xsd:complexType> CS561 - Spring 2004. Global types: can be reused in other elements
Local v.s. Global Elements inXML Schema • Local element: <xsd:complexType name=“ttt”> <xsd:sequence> <xsd:elementname=“address” type=“...”/>... </xsd:sequence> </xsd:complexType> • Global element: <xsd:elementname=“address” type=“...”/> <xsd:complexType name=“ttt”> <xsd:sequence><xsd:elementref=“address”/> ... </xsd:sequence> </xsd:complexType> Global elements: like in DTDs CS561 - Spring 2004.
Regular Expressions in XML Schema Recall the element-type-element alternation: <xsd:complexType name=“....”> [regular expression on elements] </xsd:complexType> Regular expressions: • <xsd:sequence> A B C </...> = A B C • <xsd:choice> A B C </...> = A | B | C • <xsd:group> A B C </...> = (A B C) • <xsd:... minOccurs=“0”maxOccurs=“unbounded”> ..</...> = (...)* • <xsd:... minOccurs=“0”maxOccurs=“1”> ..</...> = (...)? CS561 - Spring 2004.
Attributes in XML Schema <xsd:elementname=“paper” type=“papertype”/> <xsd:complexTypename=“papertype”> <xsd:sequence> <xsd:elementname=“title” type=“xsd:string”/> . . . . . . </xsd:sequence> <xsd:attribute name=“language" type="xsd:NMTOKEN" fixed=“English"/> </xsd:complexType> Attributes are associated to the type, not to the element Only to complex types; more trouble if we want to add attributes to simple types. CS561 - Spring 2004.
“Mixed” Content, “Any” Type <xsd:complexTypemixed="true"> . . . . • Better than in DTDs: can still enforce the type, but now may have text between any elements • Means anything is permitted there <xsd:elementname="anything" type="xsd:anyType"/> . . . . CS561 - Spring 2004.
Derived Types by Extensions <complexTypename="Address"> <sequence> <elementname="street" type="string"/> <elementname="city" type="string"/> </sequence> </complexType> <complexTypename="USAddress"> <complexContent> <extensionbase="ipo:Address"> <sequence> <elementname="state" type="ipo:USState"/> <elementname="zip" type="positiveInteger"/> </sequence> </extension> </complexContent> </complexType> CS561 - Spring 2004. Corresponds to inheritance
Derived Types by Restrictions • (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions… <complexContent> <restrictionbase="ipo:Items“> … [rewrite the entire content, with restrictions]... </restriction> </complexContent> Corresponds to set inclusion CS561 - Spring 2004.
Keys in XML Schema XML: • <purchaseReport> • <regions> • <zipcode="95819"> • <partnumber="872-AA" quantity="1"/> • <partnumber="926-AA" quantity="1"/> • <partnumber="833-AA" quantity="1"/> • <partnumber="455-BX" quantity="1"/> • </zip> • <zip code="63143"> • <partnumber="455-BX" quantity="4"/> • </zip> • </regions> • <parts> • <partnumber="872-AA">Lawnmower</part> • <partnumber="926-AA">Baby Monitor</part> • <partnumber="833-AA">Lapis Necklace</part> • <partnumber="455-BX">Sturdy Shelves</part> • </parts> • </purchaseReport> XML Schema: <keyname="NumKey"> <selectorxpath="parts/part"/> <fieldxpath="@number"/> </key> CS561 - Spring 2004.
Keys in XML Schema • In general, two flavors: <keyname=“someDummyNameHere"> <selectorxpath=“p"/> <fieldxpath=“p1"/> <fieldxpath=“p2"/> . . . <fieldxpath=“pk"/> </key> <uniquename=“someDummyNameHere"> <selectorxpath=“p"/> <fieldxpath=“p1"/> <fieldxpath=“p2"/> . . . <fieldxpath=“pk"/> </key> Note: all Xpath expressions “start” at the element currently being defined The fields must identify a single node CS561 - Spring 2004.
Keys in XML Schema • Unique = guarantees uniqueness • Key = guarantees uniqueness and existence • All Xpath expressions are “restricted”: • /a/b | /a/c OK for selector” • //a/b/*/c OK for field • Note: better than DTD’s ID mechanism CS561 - Spring 2004.
Keys in XML Schema • Examples • <keyname="fullName"> • <selectorxpath=".//person"/> • <fieldxpath="forename"/> • <fieldxpath="surname"/> • </key> • <uniquename="nearlyID"> • <selectorxpath=".//*"/> • <fieldxpath="@id"/> • </unique> Recall: must have A single forename, Single surname CS561 - Spring 2004.
Foreign Keys in XML Schema • Example • <keyrefname="personRef" refer="fullName"> • <selectorxpath=".//personPointer"/> • <fieldxpath="@first"/> • <fieldxpath="@last"/> • </keyref> CS561 - Spring 2004.
XPath • Goal = permit to access some nodes from document • XPath main construct : axis navigation • XPath path consists of one or more navigation steps, separated by / • Navigation step : axis + node-test + predicates • Examples • /descendant::node()/child::author • /descendant::node()/child::author[parent/attribute::booktitle =“XML”][2] • XPath also offers shortcuts • no axis means child • // º /descendant-or-self::node()/ CS561 - Spring 2004.
context node aaa ccc aaa aaa ccc 2 3 1 bbb bbb 4 5 6 7 XPath- Child axis navigation • author is shorthand for child::author. Examples: • aaa -- all the child nodes labeled aaa (1,3) • aaa/bbb -- all the bbb grandchildren of aaa children (4) • */bbb all the bbb grandchildren of any child (4,6) • . -- the context node • / -- the root node CS561 - Spring 2004.
XPath- child axis navigation • /doc -- all the doc children of the root • ./aaa -- all the aaa children of the context node (equivalent to aaa) • text() -- all the text children of the context node • node() -- all the children of the context node (includes text and attribute nodes) • .. -- parent of the context node • .// -- the context node and all its descendants • // -- the root node and all its descendants • //text() -- all the text nodes in the document CS561 - Spring 2004.
Predicates • [2] -- the second child node of the context node • chapter[5] -- the fifth chapter child of the context node • [last()] -- the last child node of the context node • chapter[title=“introduction”] -- the chapter children of the context node that have one or more title children whose string-value is “introduction” (the string-value is the concatenation of all the text on descendant text nodes) • person[.//firstname = “joe”] -- the person children of the context node that have in their descendants a firstname element with string-value “Joe” CS561 - Spring 2004.
Axis navigation • So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were • . -- stay where you are • / go to the root • // all descendants of the root • .// all descendants of the context node • XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self • Some of these (self, parent) describe single nodes, others describe sequences of nodes. CS561 - Spring 2004.
XPath Navigation Axes ancestor preceding-sibling following-sibling self child attribute preceding following namespace descendant CS561 - Spring 2004.
XPath abbreviated syntax (nothing) child:: @ attribute:: // /descendant-or-self::node() . self::node() .// descendant-or-self::node .. parent::node() / (document root) CS561 - Spring 2004.
Summary of XQuery • FLWR expressions • FOR and LET expressions • Collections and sorting Resources XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/ CS561 - Spring 2004.
XQuery • Based on Quilt (which is based on XML-QL) • http://www.w3.org/TR/xquery/2/2001 • XML Query data model (ordered) CS561 - Spring 2004.
FLWR (“Flower”) Expressions FOR ... LET... FOR... LET... WHERE... RETURN... CS561 - Spring 2004.
XQuery Find all book titles published after 1995: FOR$xINdocument("bib.xml")/bib/book WHERE$x/year > 1995 RETURN$x/title Result: <title> abc </title> <title> def </title> <title> ghi </title> CS561 - Spring 2004.
XQuery For each author of a book by Morgan Kaufmann, list all books she published: FOR$aINdistinct(document("bib.xml")/bib/book[publisher=“Morgan Kaufmann”]/author) RETURN <result> $a, FOR$tIN /bib/book[author=$a]/title RETURN$t </result> distinct = a function that eliminates duplicates CS561 - Spring 2004.
XQuery Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result> CS561 - Spring 2004.
XQuery • FOR$x in expr -- binds $x to each element in the list expr • LET$x = expr -- binds $x to the entire list expr • Useful for common subexpressions and for aggregations CS561 - Spring 2004.
XQuery <big_publishers> FOR$pINdistinct(document("bib.xml")//publisher) LET$b := document("bib.xml")/book[publisher = $p] WHEREcount($b) > 100 RETURN$p </big_publishers> count = a (aggregate) function that returns the number of elms CS561 - Spring 2004.
XQuery Find books whose price is larger than average: LET$a=avg(document("bib.xml")/bib/book/@price) FOR$b in document("bib.xml")/bib/book WHERE$b/@price > $a RETURN$b CS561 - Spring 2004.
XQuery Summary: • FOR-LET-WHERE-RETURN = FLWR FOR/LET Clauses List of tuples WHERE Clause List of tuples RETURN Clause CS561 - Spring 2004. Instance of Xquery data model
FOR v.s. LET FOR • Binds node variables iteration LET • Binds collection variables one value CS561 - Spring 2004.
FOR v.s. LET Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ... FOR$xINdocument("bib.xml")/bib/book RETURN <result> $x </result> LET$x:=document("bib.xml")/bib/book RETURN <result> $x </result> Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result> CS561 - Spring 2004.
Collections in XQuery • Ordered and unordered collections • /bib/book/author = an ordered collection • Distinct(/bib/book/author) = an unordered collection • LET$a = /bib/book $a is a collection • $b/author a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> <author>...</author> ... </result> RETURN <result> $b/author </result> CS561 - Spring 2004.
Sorting in XQuery <publisher_list> FOR$pINdistinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR$bIN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/@price </book> SORTBY(priceDESCENDING) </publisher> SORTBY(name) </publisher_list> CS561 - Spring 2004.
Sorting in XQuery • Sorting arguments: refer to name space of RETURN clause, not FOR clause • To sort on an element you don’t want to display, first return it, then remove it with an additional query. CS561 - Spring 2004.