
Querying XML

Querying XML . Sameer S. Pradhan. The Problem (DBMS Vs Docs). 3-level hierarchy: table, record and field Order is not part of the information Strings in separate fields are separate Location of data is not generally significant




  1. Querying XML Sameer S. Pradhan

  2. The Problem (DBMS Vs Docs) • 3-level hierarchy: table, record and field • Order is not part of the information • Strings in separate fields are separate • Location of data is not generally significant • Linking is far more often part of the data, not part of the schema representing data

  3. Goals • Data Model • Based on XML Infoset • Query Operators • Query Language

  4. Usage Scenarios • Human readable documents • Data-oriented documents • Mixed-model documents • Administrative data • Filtering streams • Multiple syntactic environments

  5. General Requirements • Syntax Binding • MAY have more than one syntax binding • Declarativity • MUST be declarative • Protocol Independence • MUST be defined independently of any protocols • Error Conditions

  6. XML Query Functionality (1) • Quantifiers • MUST include support for both Universal and Existential Quantifiers • Hierarchy and Sequence • MUST support operations on hierarchy and sequence of document structures • Aggregation • MUST allow computing summary information
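
To make the quantifier and aggregation requirements concrete, here is a minimal Python sketch. It is only an analogy: it uses the standard-library `xml.etree.ElementTree` module rather than any XML query language, and the data is a trimmed copy of the bibliography sample that appears later in these slides. The helper name `lastnames` is an assumption introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliography sample from the XML-QL slides.
bib = ET.fromstring(
    "<bib>"
    "<book year='1995'><title>An Introduction to Database Systems</title>"
    "<author><lastname>Date</lastname></author></book>"
    "<book year='1998'>"
    "<title>Foundation for Object/Relational Databases: The Third Manifesto</title>"
    "<author><lastname>Date</lastname></author>"
    "<author><lastname>Darwen</lastname></author></book>"
    "</bib>"
)


def lastnames(book):
    """Hypothetical helper: last names of all authors of one book."""
    return [a.findtext("lastname") for a in book.findall("author")]


books = bib.findall("book")

# Existential quantifier: books where SOME author is Darwen.
some_darwen = [b.get("year") for b in books
               if any(n == "Darwen" for n in lastnames(b))]

# Universal quantifier: books where EVERY author is Date.
all_date = [b.get("year") for b in books
            if all(n == "Date" for n in lastnames(b))]

# Aggregation: summary information (author count) per book.
author_count = {b.get("year"): len(lastnames(b)) for b in books}
```

A declarative query language would express the same three questions without the explicit loops; the sketch only shows what the requirements ask for.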

  7. XML Query Functionality (2) • Combination • MUST be able to combine information from multiple documents or from different parts of the same document • Sorting • MUST be able to sort query results • Structural Preservation • MUST preserve structure of original document

  8. XML Query Functionality (3) • Structural Transformation • MUST be able to transform and create new structures • References • MUST be able to traverse intra- and inter-document references • Text and Element Boundaries • MUST handle text across element boundaries

  9. XML Query Functionality (4) • Operation on Schemas • MUST be able to access Schemas or DTDs • Extensibility • SHOULD support the use of externally defined functions • Operation on Names • MUST perform simple operations on names • MAY perform more powerful operations

  10. XML Query Functionality (5) • Closure • MUST be closed with respect to the XML Query data model

  11. XML Query Data Model (1) • Datatypes • MUST represent XML 1.0 data as well as simple and complex types of XML Schema • References • MUST include support for both internal and external references • Schema Availability • MUST support queries even in the absence of a Schema

  12. XML Query Data Model (2) • Trees • Node-labeled • Edge-labeled • XML Query data model is a Node-labeled, tree-constructor representation • Node functions • Constructors • Accessors

  13. Node Accessors • A node has eight accessors • isDocNode • isElemNode • isValueNode • isAttrNode • isNSNode • isPINode • isCommentNode • isInfoItemNode
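
The eight kind-testing accessors can be pictured as predicates on a node-labeled tree. The following is a minimal sketch under stated assumptions: the `Kind` enum and the `Node` class are hypothetical names invented here, and only the accessor names come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Kind(Enum):
    """Hypothetical tag for the eight node kinds listed on the slide."""
    DOC = auto()
    ELEM = auto()
    VALUE = auto()
    ATTR = auto()
    NS = auto()
    PI = auto()
    COMMENT = auto()
    INFO_ITEM = auto()


@dataclass
class Node:
    kind: Kind
    label: Optional[str] = None  # node-labeled: the label lives on the node, not on an edge
    children: List["Node"] = field(default_factory=list)

    # One boolean accessor per node kind, named as on the slide.
    def isDocNode(self) -> bool: return self.kind is Kind.DOC
    def isElemNode(self) -> bool: return self.kind is Kind.ELEM
    def isValueNode(self) -> bool: return self.kind is Kind.VALUE
    def isAttrNode(self) -> bool: return self.kind is Kind.ATTR
    def isNSNode(self) -> bool: return self.kind is Kind.NS
    def isPINode(self) -> bool: return self.kind is Kind.PI
    def isCommentNode(self) -> bool: return self.kind is Kind.COMMENT
    def isInfoItemNode(self) -> bool: return self.kind is Kind.INFO_ITEM


# <price>10.50</price> as an element node with one value-node child.
price = Node(Kind.ELEM, "price", [Node(Kind.VALUE, "10.50")])
```

Exactly one of the eight predicates is true for any node, which is what lets a query processor dispatch on node kind.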

  14. Value Constructors • Fourteen primitive XML Schema datatypes • stringValue • boolValue • floatValue • doubleValue • decimalValue • timeDurValue • recurDurValue • binaryValue • urirefValue • idValue • idrefValue • qnameValue • entityValue • notationValue Note: ValueNode replaces XPath’s TextNode

  15. Example
      <?xml version="1.0"?>
      <p:part xmlns:p="http://www.mywebsite.com/PartSchema"
              xsi:schemaLocation="http://www.mywebsite.com/PartSchema http://www.mywebsite.com/PartSchema"
              name="nutbolt">
        <mfg>Acme</mfg>
        <price>10.50</price>
      </p:part>

  16. Data-Model (1)
      children(D1)   = [ Ref(E1) ]
      root(D1)       = Ref(E1)
      name(E1)       = QNameValue("http://www.mywebsite.com/PartSchema", "part", Ref(Def_QName))
      children(E1)   = [ Ref(E2), Ref(E3) ]
      attributes(E1) = { Ref(A1) }
      namespaces(E1) = { Ref(N1) }
      type(E1)       = Ref(Def_part_type)
      parent(E1)     = Ref(D1)
      name(A1)       = QNameValue(null, "name", Ref(Def_QName))
      value(A1)      = Ref(StringValue("nutbolt", Ref(Def_string)))

  17. Data-Model (2)
      parent(A1) = Ref(E1)
      prefix(N1) = Ref(StringValue("p", Ref(Def_string)))
      uri(N1)    = URIRefValue("http://www.mywebsite.com/PartSchema", Ref(Def_uriReference))
      parent(N1) = Ref(E1)
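
The accessor equations above can be approximated with Python's standard `xml.etree.ElementTree`. This is a rough sketch, not the XML Query data model itself: the functions `name`, `children`, `attributes`, and `parent` are hypothetical stand-ins written for this example, the `xsi:schemaLocation` attribute is left out because the slide never declares the `xsi` prefix, and ElementTree has no separate document node, so the parent of the root is `None` rather than `D1`.

```python
import xml.etree.ElementTree as ET

# The part example, minus the xsi:schemaLocation attribute (see the note above).
PART_XML = """<p:part xmlns:p="http://www.mywebsite.com/PartSchema" name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>"""

E1 = ET.fromstring(PART_XML)


def name(node):
    """Return (namespace URI or None, local name), loosely like QNameValue."""
    if node.tag.startswith("{"):
        uri, local = node.tag[1:].split("}", 1)
        return uri, local
    return None, node.tag


def children(node):
    """Ordered list of child elements."""
    return list(node)


def attributes(node):
    """Unordered attribute map."""
    return dict(node.attrib)


# ElementTree keeps no parent pointers, so build a child -> parent map once.
PARENT = {child: par for par in E1.iter() for child in par}


def parent(node):
    return PARENT.get(node)
```

With these definitions, `name(E1)` yields the namespace URI and the local name `part`, and `parent(children(E1)[0])` is `E1` again, mirroring `parent(E2) = Ref(E1)` in the model.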

  18. Constraints on Data Model • Node References • Defined by the query system, NOT by the query language • Node Identity • The function ref is one-to-one and onto • ref_equal(ref(n1), ref(n2)) ⇔ equal(n1, n2) • Unique parent • Duplicate-free list of children
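
The unique-parent and duplicate-free-children constraints can be checked mechanically. Below is a small sketch that uses Python object identity (`id`) as a stand-in for `ref`; the function name `satisfies_constraints` is hypothetical, and ElementTree is used only because it happens to allow the constraints to be violated on purpose.

```python
import xml.etree.ElementTree as ET


def satisfies_constraints(root):
    """True if every node is listed as a child exactly once in the whole tree.

    That single condition covers both constraints: a node listed twice under
    one parent breaks "duplicate-free children", and a node listed under two
    parents breaks "unique parent".
    """
    seen = set()
    for parent in root.iter():
        for child in parent:
            if id(child) in seen:  # id() plays the role of ref()
                return False
            seen.add(id(child))
    return True


good = ET.fromstring("<x><y/><y/></x>")  # two distinct y nodes

bad = ET.Element("x")
y = ET.SubElement(bad, "y")
bad.append(y)                            # the SAME y node listed twice
```

Note that `good` contains two `y` elements that are equal in content but are different nodes, which is exactly the distinction the node-identity constraint draws.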

  19. XQL • XQL - XML Query Language • The name was an ad hoc choice, but it has stuck and seems likely to survive for quite some time

  20. XQL Design (1) • Compact, easy to type and read • Simple for common cases • Embeddable in programs, scripts, URLs • Unique identification of each node • Declarative, NOT procedural • Evaluation at any level in the document • Results in document order; no repeated nodes

  21. XQL Design (2) • Superset of the XSL pattern syntax • Closure is guaranteed ONLY if the implementation returns well-formed XML documents

  22. XQL: Syntax (1) • Mimics the URI navigation syntax • Notation • / : Root context • ./ : Current context • // : Recursive descent from root • .// : Recursive descent from current node • @ : Attribute • * : Any element

  23. Sample Document
      <?xml version='1.0'?>
      <!-- This file represents a fragment of a book store inventory database -->
      <bookstore specialty='novel'>
        <book style='autobiography'>
          <title>Seven Years in Trenton</title>
          <author>
            <first-name>Joe</first-name>
            <last-name>Bob</last-name>
            <award>Trenton Literary Review Honorable Mention</award>
          </author>
          <price>12</price>
        </book>
        <my:book style='leather' price='29.50'
                 xmlns:my='http://www.placeholder-name-here.com/schema/'>
          <my:title>Who's Who in Trenton</my:title>
          <my:author>Robert Bob</my:author>
        </my:book>
      </bookstore>

  24. XQL: Examples (1) • ./author ≡ author • /bookstore • //author (compare .//author) • book[bookstore/@specialty = @style] • author/first-name • author/* • bookstore//title (compare bookstore/*/title) • *[@specialty]

  25. XQL: Examples (2) • book[@style] • book/@style • book[excerpt]/author[degree] • book[excerpt][title] ≡ book[excerpt $and$ title] • author[name = …] ≡ author[name $eq$ …] • author[. = 'Bob'] ≡ author[text() = 'Bob'] • author[first-name!text() = 'Bob'] • degree[index() $lt$ 3] ≡ degree[index() < 3]
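
Several of the path and filter patterns from the two example slides above have close analogues in the limited XPath subset of Python's `xml.etree.ElementTree`. The sketch below is an approximation only: ElementTree does not implement XQL, attribute steps such as `book/@style` have to be written as `get()` calls, and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

MY_NS = {"my": "http://www.placeholder-name-here.com/schema/"}

store = ET.fromstring("""<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50'
           xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>""")

book = store.find("book")                         # context node for relative paths

authors_anywhere = store.findall(".//author")     # XQL: .//author
first_names = book.findall("author/first-name")   # XQL: author/first-name
author_children = book.findall("author/*")        # XQL: author/*
styled_books = store.findall("book[@style]")      # XQL: book[@style]
styles = [b.get("style") for b in store.findall("book")]  # XQL: book/@style
books_with_title = store.findall("book[title]")   # XQL: book[title]
my_books = store.findall("my:book", MY_NS)        # namespace-qualified name
```

The unprefixed `author` and `my:author` are different names, so `.//author` matches only the first book's author; the same holds for `book` versus `my:book`.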

  26. XQL: Examples (3) • x/y[index() = 0] ≡ x/y[0] • (x/y)[0] • x[0]/y[0] • book[end()] • author[first-name][2] • price[@intl!value() = 'canada'] • my:* • *:book • book/@my:style • Sample for the x/y examples: <x> <y/> <y/> </x> <x> <y/> <y/> </x>
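
The difference between `x/y[0]`, `(x/y)[0]`, and `x[0]/y[0]` is one of scope: an index inside a path step applies per parent, while an index on a parenthesized path applies to the whole result. A minimal sketch with `xml.etree.ElementTree` follows; note that ElementTree positions are 1-based where XQL indexes are 0-based, and the wrapper element `r` is an assumption added so the two `x` siblings form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The slide's two <x> elements, wrapped in a hypothetical root <r>.
doc = ET.fromstring("<r><x><y/><y/></x><x><y/><y/></x></r>")

# XQL x/y[0]: the first y within EACH x.
first_y_of_each_x = doc.findall("x/y[1]")

# XQL (x/y)[0]: the first y of the combined result.
first_y_overall = doc.findall("x/y")[:1]

# XQL x[0]/y[0]: the first y of the first x.
first_y_of_first_x = doc.findall("x[1]/y[1]")

# XQL y[end()]: the last y within each x.
last_y_of_each_x = doc.findall("x/y[last()]")
```

Here the per-parent forms return two nodes (one per `x`), while the other two return a single node.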

  27. XQL: Examples (4) • author[publications!count() > 10] • books[pub_date < date('1995-01-01')] • books[pub_date < date(@first)] • bookstore/(book | magazine) • //comment()[1] • ancestor(book/author) • author[0, 2 $to$ 4, -1]

  28. XML-QL • SQL-like • Features of query languages for semi-structured data • Supports joins and aggregates

  29. XML-QL: Sample Document
      <bib>
        <book year="1995">
          <!-- A good introductory text -->
          <title> An Introduction to Database Systems </title>
          <author> <lastname> Date </lastname> </author>
          <publisher> <name> Addison-Wesley </name> </publisher>
        </book>
        <book year="1998">
          <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
          <author> <lastname> Date </lastname> </author>
          <author> <lastname> Darwen </lastname> </author>
          <publisher> <name> Addison-Wesley </name> </publisher>
        </book>
      </bib>

  30. XML-QL: Flattening Query (1)
      WHERE <book>
              <publisher><name>Addison-Wesley</name></publisher>
              <title> $t</title>
              <author> $a</author>
            </book> IN "www.a.b.c/bib.xml"
      CONSTRUCT <result>
                  <author> $a</author>
                  <title> $t</title>
                </result>
      Note: Flattening is not possible with XQL

  31. XML-QL: Result (1)
      <result>
        <author> <lastname> Date </lastname> </author>
        <title> An Introduction to Database Systems </title>
      </result>
      <result>
        <author> <lastname> Date </lastname> </author>
        <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
      </result>
      <result>
        <author> <lastname> Darwen </lastname> </author>
        <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
      </result>
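
The flattened result has one entry per (author, title) pair, so a book with two authors contributes two results. The sketch below reproduces that shape in Python with `xml.etree.ElementTree`; it is not an XML-QL implementation, the sample data is retyped here without the padding whitespace, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>""")

# One (author, title) pair per binding of $a and $t, as in the WHERE pattern.
flat = [
    (author.findtext("lastname"), book.findtext("title"))
    for book in bib.findall("book")
    if book.findtext("publisher/name") == "Addison-Wesley"
    for author in book.findall("author")
]
```

The nested `for` over authors is what produces the flattening: each author binding is paired with its book's title independently.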

  32. XML-QL: Nested Queries (2)
      WHERE <book > $p</> IN "www.a.b.c/bib.xml",
            <title > $t</>,
            <publisher><name>Addison-Wesley</></> IN $p
      CONSTRUCT <result>
                  <title> $t </>
                  WHERE <author> $a </> IN $p
                  CONSTRUCT <author> $a</>
                </>

  33. XML-QL: CONTENT_AS
      WHERE <book>
              <title> $t </>
              <publisher><name>Addison-Wesley </> </>
            </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
      CONSTRUCT <result>
                  <title> $t </>
                  WHERE <author> $a</> IN $p
                  CONSTRUCT <author> $a</>
                </>

  34. XML-QL: Result (2)
      <result>
        <title> An Introduction to Database Systems </title>
        <author> <lastname> Date </lastname> </author>
      </result>
      <result>
        <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
        <author> <lastname> Date </lastname> </author>
        <author> <lastname> Darwen </lastname> </author>
      </result>
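
In the nested form, each title appears once with all of its authors grouped beneath it. A Python analogue of that grouping, again using `xml.etree.ElementTree` on a whitespace-trimmed copy of the sample and with assumed variable names, might look like this:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>""")

# Outer query: one result per matching book.
# Inner query: all authors of that book, collected into a list.
nested = [
    (book.findtext("title"),
     [author.findtext("lastname") for author in book.findall("author")])
    for book in bib.findall("book")
    if book.findtext("publisher/name") == "Addison-Wesley"
]
```

The inner comprehension plays the role of the nested WHERE/CONSTRUCT block: it runs once per outer binding instead of multiplying the outer results.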

  35. XML-QL: Query (3)
      WHERE <article>
              <author>
                <firstname> $f </>   // firstname $f
                <lastname> $l </>    // lastname $l
              </>
            </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
            <book year=$y>
              <author>
                <firstname> $f </>   // join on same firstname $f
                <lastname> $l </>    // join on same lastname $l
              </>
            </> IN "www.a.b.c/bib.xml",
            $y > 1995
      CONSTRUCT <article> $a </>

  36. XML-QL: ELEMENT_AS
      WHERE <article>
              <author>
                <firstname> $f</>   // firstname $f
                <lastname> $l</>    // lastname $l
              </>
            </> ELEMENT_AS $e IN "www.a.b.c/bib.xml"
      ...
      CONSTRUCT $e

  37. XML-QL: Tag Variables
      WHERE <$p>
              <title> $t </title>
              <year>1995</>
              <$e> Smith </>
            </> IN "www.a.b.c/bib.xml",
            $e IN {author, editor}
      CONSTRUCT <$p>
                  <title> $t </title>
                  <$e> Smith </>
                </>
      Note: XQL does not support tag variables
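
A tag variable lets the query range over element names as well as element contents. The following Python sketch imitates that with `xml.etree.ElementTree` by treating the tag as ordinary data; the sample document is invented for illustration and is not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: publications of different kinds with a year and a person.
doc = ET.fromstring(
    "<bib>"
    "<book><title>T1</title><year>1995</year><author>Smith</author></book>"
    "<article><title>T2</title><year>1995</year><editor>Smith</editor></article>"
    "<book><title>T3</title><year>1996</year><author>Smith</author></book>"
    "</bib>"
)

matches = [
    (p.tag, p.findtext("title"), e.tag)
    for p in doc                                   # $p: any publication tag
    if p.findtext("year") == "1995"
    for e in p                                     # $e: any child tag ...
    if e.tag in {"author", "editor"}               # ... restricted to {author, editor}
    and (e.text or "").strip() == "Smith"
]
```

Both the book with Smith as author and the article with Smith as editor match, and the bound tag names are available for constructing the output.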

  38. XML-QL: Regular Expressions
      <!ELEMENT part (name, brand, part*)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT brand (#PCDATA)>
      WHERE <part*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
      CONSTRUCT <result>$r</>
      WHERE <$*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
      CONSTRUCT <result>$r</>
      WHERE <part+.(subpart|component.piece)>$r</> IN "www.a.b.c/parts.xml"
      CONSTRUCT <result> $r</>
      Note: XQL does not support regular expressions

  39. XML-QL: Joins
      WHERE <person>
              <name></> ELEMENT_AS $n
              <ssn> $ssn</>
            </> IN "www.a.b.c/data.xml",
            <taxpayer>
              <ssn> $ssn</>
              <income></> ELEMENT_AS $i
            </> IN "www.irs.gov/taxpayers.xml"
      CONSTRUCT <result> $n $i </>
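
The join works because the same variable, `$ssn`, appears in both patterns, so only person/taxpayer pairs with equal SSNs produce a result. Here is a minimal Python sketch of that idea as a hash join over two in-memory documents. The data is invented for illustration, and the sketch assumes each SSN occurs at most once among the taxpayers.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for data.xml and taxpayers.xml.
people = ET.fromstring(
    "<data>"
    "<person><name>Ann</name><ssn>111</ssn></person>"
    "<person><name>Raj</name><ssn>222</ssn></person>"
    "</data>"
)
taxpayers = ET.fromstring(
    "<irs>"
    "<taxpayer><ssn>222</ssn><income>50000</income></taxpayer>"
    "<taxpayer><ssn>333</ssn><income>70000</income></taxpayer>"
    "</irs>"
)

# Index one side by the join key (assumes SSNs are unique per taxpayer).
income_by_ssn = {t.findtext("ssn"): t.find("income")
                 for t in taxpayers.findall("taxpayer")}

results = []
for person in people.findall("person"):
    income = income_by_ssn.get(person.findtext("ssn"))
    if income is not None:               # explicit None test: childless Elements are falsy
        result = ET.Element("result")
        # Copy the bound elements ($n and $i) so the source trees stay intact.
        result.append(copy.deepcopy(person.find("name")))
        result.append(copy.deepcopy(income))
        results.append(result)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
```

Only Raj appears in both documents, so the join yields a single `<result>` holding his `<name>` and `<income>` elements.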

  40. XML-QL: Ordering
      WHERE <pub> $p </> IN "www.a.b.c/bib.xml",
            <title> $t </> IN $p,
            <year> $y </> IN $p,
            <month> $z </> IN $p
      ORDER-BY $y,$z
      CONSTRUCT $t
      Note: XQL does not support ordering
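
ORDER-BY with two variables sorts on the first and breaks ties with the second. A small Python sketch of the same ordering follows, using invented publication data. Converting year and month to integers is a choice made in this example, since comparing them as strings would put month "11" before month "3".

```python
import xml.etree.ElementTree as ET

# Hypothetical publications, deliberately out of order.
bib = ET.fromstring(
    "<bib>"
    "<pub><title>B</title><year>1998</year><month>11</month></pub>"
    "<pub><title>C</title><year>1999</year><month>2</month></pub>"
    "<pub><title>A</title><year>1998</year><month>3</month></pub>"
    "</bib>"
)

# Sort key mirrors ORDER-BY $y,$z: year first, then month.
ordered = sorted(
    bib.findall("pub"),
    key=lambda p: (int(p.findtext("year")), int(p.findtext("month"))),
)

titles = [p.findtext("title") for p in ordered]  # CONSTRUCT $t
```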

  41. XML-QL: Grouping
      CONSTRUCT <results> {
        WHERE <bib><book>
                <title>$t</title>
                <author><last>$l</last><first>$f</first></author>
              </book></bib> IN "www.bn.com/bib.xml"
        CONSTRUCT <result ID=author($l,$f)>
                    <title>$t</title>
                    <author><last>$l</last><first>$f</first></author>
                  </result>
      } </results>
      Note: Explicit grouping is not possible with XQL

  42. XML-QL: Functions
      FUNCTION findDeclaredIncomes($Taxpayers, $Employees)
        WHERE <taxpayer> <ssn> $s </> <income> $x </> </> IN $Taxpayers,
              <employee> <ssn> $s </> <name> $n </> </> IN $Employees
        CONSTRUCT <result> <name> $n </> <Income> $x </> </>
      END
      findDeclaredIncomes("www.irs.gov/taxpayers.xml", "www.a.b.c/employees.xml")

  43. XQuery • Builds directly on XPointer • Special type for the results • Ability to return ranges (spans)

  44. XQuery: Syntax • ? : Selects element with given id • ^ : Selects among containers of current node • < : Preceding sibling • > : Following sibling • « : All preceding nodes • » : All following nodes • @ : Attribute • $ : Selects a range by matching a string

  45. XQuery: Queries • descendant(FOOTNOTE & TYPE='CITATION').(REF) • descendant(SEC & descendant(LEVEL = 'SECRET')) • descendant(FOOTNOTE & TYPE='CITATION').(REF){1-2}.link(role=AUTHOR) • descendant(FOOTNOTE & (child(AUTHOR).attr(TYPE)= *(ancestor(CHAPTER).attr(AUTHOR))) • union(id(foo), id(bar), descendant(SEC)) • intersection(descendant (ITEM & string('dog')), descendant (ITEM & string('cat'))) • difference(fsibling(div), ID(SECRET)) • ^TI P* [^UI OL DL] {1,3} SUMMARY $

  46. Other Query Languages • Lorel (Lightweight Object REpository Language) • YATL • Xtract • Xmlquery • XML Query Engine • And...

  47. QUILT • The problem with most query languages is that they are either document oriented or database oriented • QUILT is derived from both domains and promises substantial coverage of both areas • It has a FLWR (pronounced as ‘flower’) construct

  48. References • http://www.w3.org/TR/2000/WD-xmlquery-req-20000131 • http://www.w3.org/TandS/QL/QL98/pp/xql.html • http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/ • http://www.w3.org/TandS/QL/QL98/pp/xquery.html • http://www.fatdog.com/ • http://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf • http://www-db.research.bell-labs.com/user/simeon/xquery.html • http://www-db.stanford.edu/lore/ • http://www.cs.washington.edu/homes/zives/research/xmlquery.pdf • http://www.oasis-open.org/cover/xmlQuery.html (main source)
