Querying XML . Sameer S. Pradhan. The Problem (DBMS Vs Docs). 3-level hierarchy: table, record and field Order is not part of the information Strings in separate fields are separate Location of data is not generally significant
Querying XML Sameer S. Pradhan
The Problem (DBMS Vs Docs) • 3-level hierarchy: table, record and field • Order is not part of the information • Strings in separate fields are separate • Location of data is not generally significant • Linking is far more often part of the data, not part of the schema representing data
Goals • Data Model • Based on XML Infoset • Query Operators • Query Language
Usage Scenarios • Human readable documents • Data-oriented documents • Mixed-model documents • Administrative data • Filtering streams • Multiple syntactic environments
General Requirements • Syntax Binding • MAY have more than one syntax binding • Declarativity • MUST be declarative • Protocol Independence • MUST be defined independently of any protocols • Error Conditions
XML Query Functionality (1) • Quantifiers • MUST include support for both Universal and Existential Quantifiers • Hierarchy and Sequence • MUST support operations on hierarchy and sequence of document structures • Aggregation • MUST allow computing summary information
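As a rough illustration of what the quantifier and aggregation requirements ask for, the sketch below expresses them over a small document with Python's standard `xml.etree.ElementTree`. The document content and variable names are assumptions made for this example; this is not XML Query syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book document, invented for this sketch
bib = ET.fromstring(
    "<bib>"
    "<book><title>A</title><author>Date</author></book>"
    "<book><title>B</title><author>Date</author><author>Darwen</author></book>"
    "</bib>"
)
books = bib.findall("book")

# Existential quantifier: SOME book has an author named Darwen
some_book_by_darwen = any(
    a.text == "Darwen" for b in books for a in b.findall("author")
)

# Universal quantifier: EVERY book has an author named Date
every_book_has_date = all(
    any(a.text == "Date" for a in b.findall("author")) for b in books
)

# Aggregation: summary information (author count) per title
authors_per_title = {b.findtext("title"): len(b.findall("author")) for b in books}
```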
XML Query Functionality (2) • Combination • MUST be able to combine information from multiple documents or from different parts of the same document • Sorting • MUST be able to sort query results • Structural Preservation • MUST preserve structure of original document
XML Query Functionality (3) • Structural Transformation • MUST be able to transform and create new structures • References • MUST be able to traverse intra- and inter-document references • Text and Element Boundaries • MUST handle text across element boundaries
XML Query Functionality (4) • Operation on Schemas • MUST be able to access Schemas or DTDs • Extensibility • SHOULD support the use of externally defined functions • Operation on Names • MUST perform simple operations on names • MAY perform more powerful operations
XML Query Functionality (5) • Closure • MUST be closed with respect to the XML Query data model
XML Query Data Model (1) • Datatypes • MUST represent XML 1.0 data as well as simple and complex types of XML Schema • References • MUST include support for references, both internal and external • Schema Availability • MUST support queries even in the absence of a Schema
XML Query Data Model (2) • Trees • Node-labeled • Edge-labeled • XML Query data model is a Node-labeled, tree-constructor representation • Node functions • Constructors • Accessors
Node Accessors • A node has eight accessors • isDocNode • isElemNode • isValueNode • isAttrNode • isNSNode • isPINode • isCommentNode • isInfoItemNode
Value Constructors • Fourteen primitive XML Schema datatypes • stringValue • boolValue • floatValue • doubleValue • decimalValue • timeDurValue • recurDurValue • binaryValue • urirefValue • idValue • idrefValue • qnameValue • entityValue • notationValue Note: ValueNode replaces XPath’s TextNode
Example
<?xml version="1.0"?>
<p:part xmlns:p="http://www.mywebsite.com/PartSchema"
        xsi:schemaLocation="http://www.mywebsite.com/PartSchema http://www.mywebsite.com/PartSchema"
        name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>
Data-Model (1)
children(D1) = [ Ref(E1) ]
root(D1) = Ref(E1)
name(E1) = QNameValue("http://www.mywebsite.com/PartSchema", "part", Ref(Def_QName))
children(E1) = [ Ref(E2), Ref(E3) ]
attributes(E1) = { Ref(A1) }
namespaces(E1) = { Ref(N1) }
type(E1) = Ref(Def_part_type)
parent(E1) = Ref(D1)
name(A1) = QNameValue(null, "name", Ref(Def_QName))
value(A1) = Ref(StringValue("nutbolt", Ref(Def_string)))
Data-Model (2)
parent(A1) = Ref(E1)
prefix(N1) = Ref(StringValue("p", Ref(Def_string)))
uri(N1) = URIRefValue("http://www.mywebsite.com/PartSchema", Ref(Def_uriReference))
parent(N1) = Ref(E1)
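The accessor listing for the part example can be approximated with Python's standard `xml.etree.ElementTree`. This is only a sketch: the `describe` helper is hypothetical, ElementTree does not expose namespace nodes or schema types, and the `xsi:schemaLocation` attribute is left out of the sample string.

```python
import xml.etree.ElementTree as ET

# The part document from the Example slide (xsi:schemaLocation omitted)
PART_XML = (
    '<p:part xmlns:p="http://www.mywebsite.com/PartSchema" name="nutbolt">'
    "<mfg>Acme</mfg><price>10.50</price>"
    "</p:part>"
)

def describe(elem):
    """Hypothetical helper mimicking the name/children/attributes accessors."""
    # ElementTree stores a qualified name as "{uri}local"
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
    else:
        uri, local = None, elem.tag
    return {
        "name": (uri, local),                       # compare name(E1)
        "children": [child.tag for child in elem],  # compare children(E1)
        "attributes": dict(elem.attrib),            # compare attributes(E1)
    }

e1 = ET.fromstring(PART_XML)
info = describe(e1)
```

Note that the namespace declaration does not show up among the attributes, which matches the slide's separation of attributes(E1) from namespaces(E1).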
Constraints on Data Model • Node References • Defined by the query system, NOT by the query language • Node Identity • The function ref is one-to-one and onto • ref_equal(ref(n1), ref(n2)) if and only if equal(n1, n2) • Unique parent • Duplicate-free list of children
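The unique-parent and duplicate-free-children constraints can be checked mechanically. The sketch below does so for an ElementTree tree, using Python object identity as a stand-in for node identity; `parent_map` is a hypothetical helper, not part of any query system.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    """Map each element to its parent, rejecting shared or repeated children."""
    parents = {}
    for parent in root.iter():
        for child in parent:
            if child in parents:
                # Same node reached twice: two parents, or listed twice as a child
                raise ValueError("node has more than one parent or is repeated")
            parents[child] = parent
    return parents

root = ET.fromstring("<part><mfg>Acme</mfg><price>10.50</price></part>")
parents = parent_map(root)
mfg, price = list(root)
```

ElementTree itself does not enforce these constraints (the same element object can be appended twice), which is why an explicit check is useful here.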
XQL • XQL stands for XML Query Language • The name was an ad hoc choice, but it has stuck and seems likely to survive for some time
XQL Design (1) • Compact, easy to type and read • Simple for common cases • Embeddable in programs, scripts, URLs • Unique identification of each node • Declarative, NOT procedural • Evaluation at any level in the document • Results in document order, with no repeated nodes
XQL Design (2) • Superset of the XSL pattern syntax • Closure is guaranteed ONLY if the implementation returns well-formed XML documents
XQL: Syntax (1) • Mimics the URI navigation syntax • Notation • / : Root context • ./ : Current context • // : Recursive descent from root • .// : Recursive descent from current node • @ : Attribute • * : Any element
Sample Document <?xml version='1.0'?> <!-- This file represents a fragment of a book store inventory database --> <bookstore specialty='novel'> <book style='autobiography'> <title>Seven Years in Trenton</title> <author> <first-name>Joe</first-name> <last-name>Bob</last-name> <award>Trenton Literary Review Honorable Mention</award> </author> <price>12</price> </book> <my:book style='leather' price='29.50' xmlns:my='http://www.placeholder-name-here.com/schema/'> <my:title>Who's Who in Trenton</my:title> <my:author>Robert Bob</my:author> </my:book> </bookstore>
XQL: Examples (1) • ./author author • /bookstore • //author .//author • book[bookstore/@specialty = @style] • author/first-name • author/* • bookstore//title bookstore/*/title • *[@specialty]
XQL: Examples (2) • book[@style] • book/@style • book[excerpt]/author[degree] • book[excerpt][title] book[excerpt $and$ title] • author[name = …] author[name $eq$ …] • author[. = 'Bob'] author[text() = 'Bob'] • author[first-name!text() = 'Bob'] • degree[index() $lt$ 3] degree[index() < 3]
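Several of the simpler XQL patterns have close analogues in the limited XPath subset of Python's `xml.etree.ElementTree`. The sketch below runs a few of them against the sample bookstore document; it is an approximation in a different language, not an XQL engine, and the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50'
           xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# book[@style]: book elements that carry a style attribute
styled_books = root.findall("book[@style]")
# book/@style: the attribute values themselves
styles = [b.get("style") for b in root.findall("book")]
# author/first-name, evaluated from the bookstore context
first_names = [e.text for e in root.findall("book/author/first-name")]
# author/*: every element child of author
author_children = [e.tag for e in root.findall("book/author/*")]
# .//author: recursive descent from the current node
all_authors = root.findall(".//author")
# my:* style matching needs an explicit prefix-to-URI mapping here
ns = {"my": "http://www.placeholder-name-here.com/schema/"}
my_titles = [e.text for e in root.findall("my:book/my:title", ns)]
```

The unprefixed patterns do not match `my:book` or `my:author`, because those elements live in a different namespace.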
XQL: Examples (3) • x/y[index() = 0] x/y[0] • (x/y)[0] • x[0]/y[0] • book[end()] • author[first-name][2] • price[@intl!value() = 'canada'] • my:* • *:book • book/@my:style <x> <y/> <y/> </x> <x> <y/> <y/> </x>
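The difference between x/y[0], (x/y)[0] and x[0]/y[0] is easy to miss. The sketch below reproduces it with `xml.etree.ElementTree`, whose positional predicates are 1-based where XQL indexes from 0. The wrapper element `<r>` and the `id` attributes are assumptions added so the results can be told apart.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<r>"
    "<x><y id='a'/><y id='b'/></x>"
    "<x><y id='c'/><y id='d'/></x>"
    "</r>"
)

# XQL x/y[0]: the first y within EACH x (ElementTree writes this as [1])
first_y_per_x = [y.get("id") for y in doc.findall("x/y[1]")]

# XQL (x/y)[0]: the first node of the whole x/y result
first_y_overall = [y.get("id") for y in doc.findall("x/y")][0]

# XQL x[0]/y[0]: the first y of the first x only
first_y_of_first_x = [y.get("id") for y in doc.findall("x[1]/y[1]")]

# XQL y[end()]: the last y within each x (ElementTree: last())
last_y_per_x = [y.get("id") for y in doc.findall("x/y[last()]")]
```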
XQL: Examples (4) • author[publications!count() > 10] • books[pub_date < date('1995-01-01')] • books[pub_date < date(@first)] • bookstore/(book | magazine) • //comment()[1] • ancestor(book/author) • author[0, 2 $to$ 4, -1]
XML-QL • SQL-like • Features of query languages for semi-structured data • Supports joins and aggregates
XML-QL: Sample Document <bib> <book year="1995"> <!-- A good introductory text --> <title> An Introduction to Database Systems </title> <author> <lastname> Date </lastname> </author> <publisher> <name> Addison-Wesley </name > </publisher> </book> <book year="1998"> <title> Foundation for Object Databases: The Third Manifesto </title> <author> <lastname> Date </lastname> </author> <author> <lastname> Darwen </lastname> </author> <publisher> <name> Addison-Wesley </name > </publisher> </book> </bib>
XML-QL: Flattening Query (1)
WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t</title>
        <author> $a</author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a</author>
            <title> $t</title>
          </result>
Note: Flattening is not possible with XQL
XML-QL: Result (1) <result> <author> <lastname> Date </lastname> </author> <title> An Introduction to Database Systems </title> </result> <result> <author> <lastname> Date </lastname> </author> <title> Foundation for Object Databases: The Third Manifesto </title> </result> <result> <author> <lastname> Darwen </lastname> </author> <title> Foundation for Object Databases: The Third Manifesto </title> </result>
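The flattening behaviour, one result per (author, title) pair, can be sketched in Python with `xml.etree.ElementTree`. The `flatten` function is a hypothetical helper written for this illustration; it mirrors the intent of the query rather than XML-QL's evaluation model.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>"""

def flatten(bib, publisher):
    """Return one (author, title) row per author of each matching book."""
    rows = []
    for book in bib.findall("book"):
        name = book.findtext("publisher/name", default="").strip()
        if name != publisher:
            continue
        title = book.findtext("title", default="").strip()
        for author in book.findall("author"):
            rows.append((author.findtext("lastname", default="").strip(), title))
    return rows

pairs = flatten(ET.fromstring(BIB), "Addison-Wesley")
```

A book with two authors contributes two rows, which is what makes the result "flat".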
XML-QL: Nested Queries (2) WHERE <book > $p</> IN "www.a.b.c/bib.xml", <title > $t</>, <publisher><name>Addison-Wesley</></> IN $p CONSTRUCT <result> <title> $t </> WHERE <author> $a </> IN $p CONSTRUCT <author> $a</> </>
XML-QL: CONTENT_AS WHERE <book> <title> $t </> <publisher><name>Addison-Wesley </> </> </> CONTENT_AS $p IN "www.a.b.c/bib.xml" CONSTRUCT <result><title> $t </> WHERE <author> $a</> IN $p CONSTRUCT <author> $a</> </>
XML-QL: Result (2) <result> <title> An Introduction to Database Systems </title> <author> <lastname> Date </lastname> </author> </result> <result> <title> Foundation for Object Databases: The Third Manifesto </title> <author> <lastname> Date </lastname> </author> <author> <lastname> Darwen </lastname> </author> </result>
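In contrast to the flattened form, the nested query keeps one result per title and collects that book's authors inside it. A minimal Python sketch of the same shape, assuming a hypothetical helper `titles_with_authors` and the sample bibliography:

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>"""

def titles_with_authors(bib, publisher):
    """Return one record per matching book, with its authors nested inside."""
    results = []
    for book in bib.findall("book"):
        if book.findtext("publisher/name", default="").strip() != publisher:
            continue
        results.append({
            "title": book.findtext("title", default="").strip(),
            # Inner loop plays the role of the nested WHERE ... CONSTRUCT
            "authors": [
                a.findtext("lastname", default="").strip()
                for a in book.findall("author")
            ],
        })
    return results

nested = titles_with_authors(ET.fromstring(BIB), "Addison-Wesley")
```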
XML-QL: Query (3)
WHERE <article>
        <author>
          <firstname> $f </> // firstname $f
          <lastname> $l </>  // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </> // join on same firstname $f
          <lastname> $l </>  // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>
XML-QL: ELEMENT_AS
WHERE <article>
        <author>
          <firstname> $f</> // firstname $f
          <lastname> $l</>  // lastname $l
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml"
      ...
CONSTRUCT $e
XML-QL: Tag Variables WHERE <$p> <title> $t </title> <year>1995</> <$e> Smith </> </> IN "www.a.b.c/bib.xml", $e IN {author, editor} CONSTRUCT <$p> <title> $t </title> <$e> Smith </> </> Note: XQL does not support tag variables
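A tag variable lets the element name itself be part of the answer. The Python sketch below imitates that by iterating over children regardless of tag and recording which tags matched. The sample document, the `smith_1995` helper, and the assumption that publications are direct children of the root are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography with mixed publication kinds
DOC = (
    "<bib>"
    "<book><title>T1</title><year>1995</year><author>Smith</author></book>"
    "<article><title>T2</title><year>1995</year><editor>Smith</editor></article>"
    "<book><title>T3</title><year>1996</year><author>Smith</author></book>"
    "</bib>"
)

def smith_1995(root, roles=("author", "editor")):
    """Return (publication tag, title, role tag) for 1995 items involving Smith."""
    matches = []
    for pub in root:  # pub.tag plays the role of $p
        if pub.findtext("year", default="").strip() != "1995":
            continue
        title = pub.findtext("title", default="").strip()
        for child in pub:  # child.tag plays the role of $e
            if child.tag in roles and (child.text or "").strip() == "Smith":
                matches.append((pub.tag, title, child.tag))
    return matches

found = smith_1995(ET.fromstring(DOC))
```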
XML-QL: Regular Expressions
<!ELEMENT part (name brand part*)>
<!ELEMENT name CDATA>
<!ELEMENT brand CDATA>
WHERE <part*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>
WHERE <$*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>
WHERE <part+.(subpart|component.piece)>$r</> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r</>
Note: XQL does not support regular expressions
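The intent of the first query, finding Ford parts at any nesting depth by following only `part` edges, can be approximated with a short recursive walk. This is a sketch under assumptions: the sample document and the `ford_part_names` helper are hypothetical, and it does not implement general regular path expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document following the slide's recursive part structure
PARTS = (
    "<part><name>engine</name><brand>Ford</brand>"
    "<part><name>piston</name><brand>Acme</brand>"
    "<part><name>ring</name><brand>Ford</brand></part>"
    "</part>"
    "</part>"
)

def ford_part_names(part):
    """Collect names of Ford-branded parts reachable through nested part elements."""
    names = []
    if part.findtext("brand") == "Ford":
        names.append(part.findtext("name"))
    for sub in part.findall("part"):  # follow only part edges, as in <part*>
        names.extend(ford_part_names(sub))
    return names

ford_names = ford_part_names(ET.fromstring(PARTS))
```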
XML-QL: Joins WHERE <person> <name></> ELEMENT_AS $n <ssn> $ssn</> </> IN "www.a.b.c/data.xml", <taxpayer> <ssn> $ssn</> <income></> ELEMENT_AS $i </> IN "www.irs.gov/taxpayers.xml" CONSTRUCT <result> $n $i </>
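The join works because the same variable `$ssn` appears in both patterns. A minimal Python sketch of that value join, using a dictionary lookup: the two sample documents and the `join_on_ssn` helper are hypothetical, and the lookup assumes each ssn occurs at most once among the taxpayers.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two documents named in the query
PEOPLE = (
    "<data>"
    "<person><name>Ann</name><ssn>111</ssn></person>"
    "<person><name>Raj</name><ssn>222</ssn></person>"
    "</data>"
)
TAXPAYERS = (
    "<taxpayers>"
    "<taxpayer><ssn>222</ssn><income>50000</income></taxpayer>"
    "<taxpayer><ssn>333</ssn><income>70000</income></taxpayer>"
    "</taxpayers>"
)

def join_on_ssn(people, taxpayers):
    """Pair each person's <name> with the <income> of the taxpayer sharing the ssn."""
    income_by_ssn = {
        t.findtext("ssn"): t.find("income") for t in taxpayers.findall("taxpayer")
    }
    results = []
    for person in people.findall("person"):
        income = income_by_ssn.get(person.findtext("ssn"))
        if income is not None:  # keep only persons with a matching taxpayer
            result = ET.Element("result")
            result.append(person.find("name"))  # like ELEMENT_AS $n
            result.append(income)               # like ELEMENT_AS $i
            results.append(result)
    return results

joined = join_on_ssn(ET.fromstring(PEOPLE), ET.fromstring(TAXPAYERS))
```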
XML-QL: Ordering
WHERE <pub> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </> IN $p,
      <year> $y </> IN $p,
      <month> $z </> IN $p
ORDER-BY $y,$z
CONSTRUCT $t
Note: XQL does not support ordering
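Ordering by year and then month amounts to sorting on a composite key. A small Python sketch, assuming a hypothetical sample document and that both fields should compare numerically:

```python
import xml.etree.ElementTree as ET

# Hypothetical publications, deliberately out of order
PUBS = (
    "<bib>"
    "<pub><title>C</title><year>1998</year><month>3</month></pub>"
    "<pub><title>A</title><year>1995</year><month>11</month></pub>"
    "<pub><title>B</title><year>1998</year><month>1</month></pub>"
    "</bib>"
)

def titles_in_order(bib):
    """Return titles sorted by (year, month), like ORDER-BY $y,$z."""
    pubs = bib.findall("pub")
    pubs.sort(key=lambda p: (int(p.findtext("year")), int(p.findtext("month"))))
    return [p.findtext("title") for p in pubs]

ordered_titles = titles_in_order(ET.fromstring(PUBS))
```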
XML-QL: Grouping CONSTRUCT <results> { WHERE <bib><book> <title>$t</title> <author><last>$l</last><first>$f</first></author> </book> </bib> IN "www.bn.com/bib.xml" CONSTRUCT <result ID=author($l,$f)> <title>$t</title> <author><last>$l</last><first>$f</first></author></result> } </results> Note: Explicit grouping is not possible with XQL
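The `ID=author($l,$f)` expression acts as a grouping key: results constructed with the same key are merged into one element. A rough Python analogue keys a dictionary on the (last, first) pair. The sample data and the `group_titles_by_author` helper are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography using the <last>/<first> layout from the query
BIB = (
    "<bib>"
    "<book><title>A</title>"
    "<author><last>Date</last><first>Chris</first></author></book>"
    "<book><title>B</title>"
    "<author><last>Date</last><first>Chris</first></author>"
    "<author><last>Darwen</last><first>Hugh</first></author></book>"
    "</bib>"
)

def group_titles_by_author(bib):
    """Collect titles under one entry per distinct (last, first) author key."""
    groups = {}
    for book in bib.findall("book"):
        title = book.findtext("title")
        for author in book.findall("author"):
            key = (author.findtext("last"), author.findtext("first"))
            groups.setdefault(key, []).append(title)
    return groups

by_author = group_titles_by_author(ET.fromstring(BIB))
```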
XML-QL: Functions
FUNCTION findDeclaredIncomes($Taxpayers, $Employees)
  WHERE <taxpayer> <ssn> $s </> <income> $x </> </> IN $Taxpayers,
        <employee> <ssn> $s </> <name> $n </> </> IN $Employees
  CONSTRUCT <result> <name> $n </> <Income> $x </> </>
END
findDeclaredIncomes("www.irs.gov/taxpayers.xml", "www.a.b.c/employees.xml")
XQuery • Builds directly on XPointer • Special type for the results • Ability to return ranges (spans)
XQuery: Syntax • ? : Selects element with given id • ^ : Selects among containers of current node • < : Preceding sibling • > : Following sibling • « : All preceding nodes • » : All following nodes • @ : Attribute • $ : Selects a range by matching a string
XQuery: Queries • descendant(FOOTNOTE & TYPE='CITATION').(REF) • descendant(SEC & descendant(LEVEL = 'SECRET')) • descendant(FOOTNOTE & TYPE='CITATION').(REF){1-2}.link(role=AUTHOR) • descendant(FOOTNOTE & (child(AUTHOR).attr(TYPE) = *(ancestor(CHAPTER).attr(AUTHOR)))) • union(id(foo), id(bar), descendant(SEC)) • intersection(descendant(ITEM & string('dog')), descendant(ITEM & string('cat'))) • difference(fsibling(div), ID(SECRET)) • ^TI P* [^UI OL DL] {1,3} SUMMARY $
Other Query Languages • Lorel (Lightweight Object REpository Language) • YATL • Xtract • Xmlquery • XML Query Engine • And...
QUILT • Most query languages are either document-oriented or database-oriented • QUILT draws on both domains and promises substantial coverage of both • Its central construct is the FLWR expression (pronounced ‘flower’), built from FOR, LET, WHERE and RETURN clauses
References • http://www.w3.org/TR/2000/WD-xmlquery-req-20000131 • http://www.w3.org/TandS/QL/QL98/pp/xql.html • http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/ • http://www.w3.org/TandS/QL/QL98/pp/xquery.html • http://www.fatdog.com/ • http://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf • http://www-db.research.bell-labs.com/user/simeon/xquery.html • http://www-db.stanford.edu/lore/ • http://www.cs.washington.edu/homes/zives/research/xmlquery.pdf • http://www.oasis-open.org/cover/xmlQuery.html (main source)