
Querying XML with Locator Semantics

An overview of requirements and concepts for querying XML using locator semantics, including a running example and XQL overview.


Presentation Transcript


  1. Querying XML with Locator Semantics • Peter Fankhauser • joint work with: Matthias Friedrich, Gerald Huck, Ingo Macherius, Jonathan Robie • GMD German National Research Center for Information Technology, Institute for Integrated Publication and Information Systems (GMD-IPSI) • http://xml.darmstadt.gmd.de/

  2. Overview • Requirements for Querying XML • XQL Overview • Locators • Locator Algebra • IPSI XML-Brokering Framework

  3. General Requirements for Querying XML (Excerpt from Dave Maier, W3C QL 98) • Require no schema • flexibly match irregular structure • preserve (irregular) structure • Query & Preserve Order and Association • sibling order • hierarchy • Precise Semantics • rewrite rules • compositional semantics • Closedness/Completeness • XML to XML • when is a QL for XML complete?

  4. Running Example • Bookstore: • Non Uniform Hierarchy • sci-fi: 2 levels • mystery: 3 levels • Customers: Flat Table
     <books_and_customers>
       <bookstore>
         <fiction>
           <sci-fi>
             <book>
               <isbn>0006482805</isbn>
               <title>Do androids dream of electric sheep</title>
               <author>Philip K. Dick</author>
             </book>
           </sci-fi>
           <fantasy>
             <mystery>
               <book>
                 <isbn>0261102362</isbn>
                 <title>The two towers</title>
                 <author>JRR Tolkien</author>
               </book>
             </mystery>
           </fantasy>
         </fiction>
       </bookstore>
       <customers>
         <customer>
           <name>Jason Woolsey</name>
           <boughtbooks>
             <isbn>0261102362</isbn>
             <isbn>0593488321</isbn>
           </boughtbooks>
         </customer>
         <customer>
           <name>P.W. Ellis</name>
           <boughtbooks>
             <isbn>0006482805</isbn>
             <isbn>0261102362</isbn>
           </boughtbooks>
         </customer>
       </customers>
     </books_and_customers>

  5. Functional Requirements for Querying XML (Dave Maier, W3C QL 98) • Selection and Extraction: • all sci-fi books by P.K. Dick • Reduction: • drop all authors but 1st author • Combination: • combine all books with their customers via isbn • Restructuring: • return flat lists of title/author pairs • and vice versa • Multidocument Handling: • get reviews and books from different sites • follow (dereference) links in books to authors

  6. XQL Overview (State W3C QL 98) • Basic Concept: Selection of Subtrees • Originated as QL for DOM • adopted for selectors in XSL-templates (now merged with XPointer to XPel to XPath to ????) • Defined along search contexts = an (ordered) set of document nodes • Path Expressions and Filters: • A query is essentially a navigation in element trees • Navigation and filters modify the search context • Query result is the last search context • Selection of nodes by: • Element- and attribute name • Type (element, attribute, comment, etc.) • Content or value of nodes • Relationship between nodes: hierarchy, sequence, index • Combination by: union, intersection

  7. XQL 98 Examples • Selection and Extraction: • all books by P.K. Dick: //book[author="P.K. Dick"] • Reduction: • drop all but 1st author: //*?/book?/(isbn | author[0] | title) • * matches all elements along paths to book • shallow return operator (?) retains nesting hierarchy • union preserves document order (title before author)
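
The two XQL 98 queries above can be approximated with the limited XPath subset in Python's standard library. This is only a rough stand-in, not an XQL engine: the shallow return operator (?) has no counterpart here, the reduction is written as a plain helper function (`reduce_book` is a hypothetical name), and the author string is taken verbatim from the running example ("Philip K. Dick") because the predicate compares complete text.

```python
import xml.etree.ElementTree as ET

# Bookstore part of the running example.
DOC = """<bookstore><fiction>
  <sci-fi><book><isbn>0006482805</isbn>
    <title>Do androids dream of electric sheep</title>
    <author>Philip K. Dick</author></book></sci-fi>
  <fantasy><mystery><book><isbn>0261102362</isbn>
    <title>The two towers</title>
    <author>JRR Tolkien</author></book></mystery></fantasy>
</fiction></bookstore>"""

root = ET.fromstring(DOC)

# Selection and extraction: rough analogue of //book[author="..."].
dick_books = root.findall(".//book[author='Philip K. Dick']")
titles = [book.findtext("title") for book in dick_books]

def reduce_book(book):
    """Keep isbn, title and only the first author, in document order."""
    kept, seen_author = [], False
    for child in book:
        if child.tag == "author":
            if seen_author:
                continue  # drop every author after the first
            seen_author = True
        if child.tag in ("isbn", "title", "author"):
            kept.append((child.tag, child.text))
    return kept

reduced = reduce_book(dick_books[0])
```

Because the helper walks the children in document order, the title comes before the author in `reduced`, matching the remark on the slide about union preserving document order.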

  8. XQL 98 lacked: • Selection Functionality • comparison operators for fulltext (in progress) • regular path expressions for hierarchy (only // for recursive descent and * for matching all nodes in a search context) • Restructuring • Suggestions: return operators (SAG), XSLT (W3C), Application Level (e.g. WebMethods) • Combination • joins; Suggestions: see below • Graphs • no navigation along ID/IDREF • no multi-documents (dereferencing URIs) • Suggestions: docref, ref, keyref, idref • Delegation • external functions • wrappers

  9. Extended XQL Examples • Combination: • combine all books with customers via isbn: $root//*?/book?[$i:=isbn]/(* | $root//customer?[boughtbooks/isbn=$i]) • New concepts • combination with nodes outside of search context ($root//review) • correlation variables for expressing join predicate [$i:=isbn] • $root used for clarity... • Irregular structure of bookstore is preserved • Multidocuments/Delegation: • get multiple bookstores from a bookmark list (HTTP-GET): docref('http://www.bookstores')/docref(.//@href)//bookstore • the same with a form (HTTP-POST - simplified!): docref('http://www.bookstores/search.cfm', 'country', 'us')//bookstore • the same with a wrapper (application program delivering XML): wrapper("bookstore")//bookstore

  10. Towards a Datamodel for querying XML • Candidate representations of the same document (figure): • XML Serialization: Structured Text • W3C-DOM: Element Tree • OEM: Graph • Relational Tables (generic massive join option) • Locators: Lists of Paths
      XML serialization:
        <document>
          <person id="jonathanr">
            <firstname>Jonathan</firstname>
            <lastname>Robie</lastname>
          </person>
          <person id="joel">
            <firstname>Joe</firstname>
            <lastname>Lapp</lastname>
          <!-- ... -->
        </document>
      Tree and graph figure labels: person, person, article; author, author; firstname, lastname, title, year; Jonathan, Robie, Joe, Lapp, XQL for Dummies, 1999
      Locators:
        document
        document.person
        document.person.@id
        document.person.@id."joel"
        document.person.firstname
        document.person.firstname."Joe"
        document.person.lastname."Lapp"
        document.person
        document.person.@id
        ...

  11. Locators for Bookstore
      bookstore#1
      bookstore#1.fiction#2
      bookstore#1.fiction#2.sci-fi#3
      bookstore#1.fiction#2.sci-fi#3.book#4
      bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5
      bookstore#1.fiction#2.sci-fi#3.book#4.title#6
      bookstore#1.fiction#2.sci-fi#3.book#4.author#7
      …
      bookstore#1.fiction#2.fantasy#8
      bookstore#1.fiction#2.fantasy#8.mystery#9
      bookstore#1.fiction#2.fantasy#8.mystery#9.book#10
      bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.isbn#11
      bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.title#12
      bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.author#13
      ...

  12. Locators <-> XML Serialization • Locators are lists of paths • XML-document -> Locators: • each element-node gets id in document-order (depth first, left to right traversal) • each element-node is located by the entire path from root • attributes are attached to element-nodes • content is attached to leaf-nodes • Locators -> XML-document: • clean up: discard locators $prefix which are followed by at least one locator $prefix.$postfix • generate tree: (1) for all locators generate nested serialization, (2) fill up with content and attributes • Mappings should be total, 1:1
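
The XML-to-locator mapping and the clean-up step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IPSI implementation: the function names `to_locators` and `clean_up` are hypothetical, ids start at 1 as in the bookstore listing, and attributes and text content are ignored.

```python
import xml.etree.ElementTree as ET

def to_locators(xml_text):
    """Number element nodes depth first, left to right, and return
    one locator (the full path from the root) per element."""
    root = ET.fromstring(xml_text)
    locators = []
    counter = 0

    def visit(elem, prefix):
        nonlocal counter
        counter += 1
        path = prefix + (f"{elem.tag}#{counter}",)
        locators.append(path)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return locators

def clean_up(locators):
    """Discard every locator that is a proper prefix of another one."""
    return [
        u for u in locators
        if not any(len(v) > len(u) and v[:len(u)] == u for v in locators)
    ]

DOC = ("<bookstore><fiction>"
       "<sci-fi><book><isbn/><title/><author/></book></sci-fi>"
       "<fantasy><mystery><book><isbn/><title/><author/></book></mystery></fantasy>"
       "</fiction></bookstore>")

locs = to_locators(DOC)
leaves = clean_up(locs)
```

After the clean-up only the six leaf locators remain, which is enough to regenerate the nested serialization because each one still carries its whole path.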

  13. Locator Sets vs. Relations • Commonalities • flat sets • identity defined by identity of components • concatenation to derive new locators/tuples • Differences • arity • locators: variable length • tuples: fixed • access to components: • locators: by navigation • tuples: by position/attribute • data: • locator components: document nodes • tuple components: values

  14. Locator Algebra (0)

  15. Locator Algebra (1) • Preliminaries • L: domain of locator sets, x, y ∈ L • PL: domain of locators, u, v ∈ PL • tail(u): last component of u • prefix(u): u − tail(u) • Tree-Operators • navigation in document tree using DOM methods • root, parent, children: PL → L • applied to locator sets from L using d-join (see below) • Set-Operators • ∪, ∩, −: L × L → L, defined as usual • order preservation due to total ordering on document nodes

  16. Locator Algebra (2) • Select • select[p]: L → L, where p: PL → Boolean • select[p](x) = {u | u ∈ x, p(tail(u))} • Example: select[nodename(.) = "book"](x) = select["book"](x) • Return • corresponds to project • duplicates tail of locator for preserving it in subsequent d-join (see below) • return: PL → PL • return(u) = concat(u, tail(u))
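
A small Python sketch of select and return, assuming locators are modelled as tuples of (name, id) pairs. That representation, and the name `ret` (chosen because `return` is a Python keyword), are illustrative assumptions and not part of the slides.

```python
def tail(u):
    """Last component of a locator."""
    return u[-1]

def nodename(node):
    """Element name of a (name, id) component."""
    return node[0]

def select(p, x):
    """select[p](x) = {u | u in x, p(tail(u))}; the order of x is kept."""
    return [u for u in x if p(tail(u))]

def ret(u):
    """return(u) = concat(u, tail(u)): duplicate the tail component."""
    return u + (tail(u),)

sci_fi = (("bookstore", 1), ("fiction", 2), ("sci-fi", 3))
book = sci_fi + (("book", 4),)
isbn = book + (("isbn", 5),)
x = [sci_fi, book, isbn]

books = select(lambda node: nodename(node) == "book", x)
kept = ret(book)
```

The predicate only ever sees the tail component, so `select` filters whole locators by their last node, and `ret` leaves a copy of that node behind for a later d-join to consume.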

  17. Locator Algebra (3) • Dependent-Join: • d-join[f]: L → L, where f: PL → L • d-join[f](x) = ∪ {concat(prefix(u), f(tail(u))) | u ∈ x} • Example: return all titles of books in their book context: select["title"](d-join[children(.)](select["book"](d-join[return(children(.))](x)))) = /book?/title • Kleene Star: • fixpoint-operator for recursive descent queries • *[f]: L → L, where f: L → L • *[f](x) = f(x) ∪ *[f](f(x)) • Example: select all titles in their original context: select["title"](d-join[children(.)](*[d-join[return(children(.))](.)](x))) = //*?/title • maybe too general for physical algebra
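
The dependent join and the Kleene star can be tried out on a toy tree. The sketch below is one possible reading of the definitions, assuming locators are tuples of node labels and the document tree is a plain dictionary; the node ids and the helper names (`return_children`, `star`) are hypothetical.

```python
# Toy document tree (hypothetical ids): node -> child nodes.
CHILDREN = {
    "store#1": ["book#2", "book#5"],
    "book#2": ["isbn#3", "title#4"],
    "book#5": ["isbn#6", "title#7"],
}

def name(node):
    return node.split("#")[0]

def tail(u):
    return u[-1]

def prefix(u):
    return u[:-1]

def children(node):
    """children(.): one single-component locator per child."""
    return [(c,) for c in CHILDREN.get(node, [])]

def return_children(node):
    """return(children(.)): each child with its tail duplicated."""
    return [(c, c) for c in CHILDREN.get(node, [])]

def select(tag, x):
    return [u for u in x if name(tail(u)) == tag]

def d_join(f, x):
    """Replace the tail of every locator u by each locator in f(tail(u))."""
    return [prefix(u) + v for u in x for v in f(tail(u))]

def star(f, x):
    """*[f](x) = f(x) U *[f](f(x)), unrolled until f yields nothing."""
    result, frontier = [], f(x)
    while frontier:
        result.extend(frontier)
        frontier = f(frontier)
    return result

x = [("store#1",)]
books = select("book", d_join(return_children, x))
titles_in_book = select("title", d_join(children, books))      # /book?/title
descendants = star(lambda y: d_join(return_children, y), x)    # //*?
all_titles = select("title", d_join(children, descendants))    # //*?/title
```

The duplicated tail is what keeps `book#2` in the result: the second d-join consumes one copy as its tail and leaves the other in the prefix.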

  18. Locator Algebra (4) • Varbind, Varget • to realize joins across contexts • varbind[i,f]: L → L, where i ∈ Name, f: PL → L • varbind[i,f](x): for all u ∈ x: vars(u) := vars(u) ∪ {<i,v> | v ∈ f(tail(u))} • varget[i]: PL → L • varget[i](u): {v | <i,v> ∈ vars(u)}
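
A hedged sketch of varbind and varget, followed by a tiny join over the running example. It assumes a locator carries its variable bindings in a frozen set, and it binds isbn values rather than isbn nodes to keep the comparison short; the class `Loc` and the lookup tables `ISBN_OF` and `BOUGHT` are hypothetical stand-ins for navigation in the document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    path: tuple
    vars: frozenset = frozenset()  # set of (variable name, value) pairs

def tail(u):
    return u.path[-1]

def varbind(i, f, x):
    """Add a binding <i, v> to every u in x for each v in f(tail(u))."""
    return [Loc(u.path, u.vars | {(i, v) for v in f(tail(u))}) for u in x]

def varget(i, u):
    """All values bound to variable i on locator u."""
    return [v for (var, v) in sorted(u.vars) if var == i]

# Values taken from the running example; node ids follow the locator listing.
ISBN_OF = {"book#4": ["0006482805"], "book#10": ["0261102362"]}
BOUGHT = {
    "customer#15": ["0261102362", "0593488321"],
    "customer#20": ["0006482805", "0261102362"],
}

books = [Loc(("book#4",)), Loc(("book#10",))]
bound = varbind("$i", lambda node: ISBN_OF[node], books)

# Join: extend each book locator by every customer who bought its isbn.
joined = [
    Loc(b.path + (c,), b.vars)
    for b in bound
    for c in BOUGHT
    if set(BOUGHT[c]) & set(varget("$i", b))
]
paths = [j.path for j in joined]
```

The bindings travel with the locator, which is how the join predicate can refer to the book's isbn while the search context has moved on to customers.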

  19. Join Example (1) (abbreviations: bc = books_and_customers, bs = bookstore, f = fiction, sf = sci-fi, fa = fantasy, mi = mystery, b = book, cs = customers)
      $A = *[d-join[return(children(.))](.)](x) = //*?
        bc#0
        bc#0.bookstore#1
        bc#0.bookstore#1.fiction#2
        bc#0.bookstore#1.fiction#2.sci-fi#3
        ...
      $B = select["book"](d-join[return(children(.))]($A)) = //*?/book
        bc#0.bs#1.f#2.sf#3.b#4
        bc#0.bs#1.f#2.fa#8.mi#9.b#10
        ...
      $C = d-join[return(children(.))]($B) = //*?/book?/*
        bc#0.bs#1.f#2.sf#3.b#4.isbn#5
        bc#0.bs#1.f#2.sf#3.b#4.title#6
        ...
      $D = varbind[$i, select["isbn"](children(.))]($B) = //*?/book[$i:=isbn]?
        bc#0.bs#1.f#2.sf#3.b#4<$i,isbn#5>
        bc#0.bs#1.f#2.fa#8.mi#9.b#10<$i,isbn#11>
        ...
      $E = select["customer"](d-join[children(.)](*[d-join[return(children(.))](.)](d-join[root(.)]($D)))) = //*?/customer
        customers#14.customer#15
        customers#14.customer#20
      $F = d-join[select[select["isbn"](d-join[children(.)](select["boughtbooks"](d-join[children(.)](.)))) = varget[$i](.)]($E)]($D) = //*?/book[$i:=isbn]?/(//*?/customer[boughtbooks/isbn=$i])
        bc#0.bs#1.f#2.sf#3.b#4.cs#14.customer#20
        bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#15
        bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#20

  20. Join Example (2)
      <books_and_customers>
        <bookstore>
          <fiction>
            <sci-fi>
              <book>
                <isbn>0006482805</isbn>
                <title>Do androids dream of electric sheep</title>
                <author>Philip K. Dick</author>
                <customers>
                  <customer>
                    <name>P.W. Ellis</name>
                    <boughtbooks>
                      <isbn>0006482805</isbn>
                      <isbn>0261102362</isbn>
                    </boughtbooks>
                  </customer>
                </customers>
              </book>
            </sci-fi>
            <fantasy>
              <mystery>
                <book>
                  <isbn>0261102362</isbn>
                  <title>The two towers</title>
                  <author>JRR Tolkien</author>
                  <customers>
                    <customer>
                      <name>Jason Woolsey</name>
                      <boughtbooks>
                        <isbn>0261102362</isbn>
                        <isbn>0593488321</isbn>
                      </boughtbooks>
                    </customer>
                    <customer>
                      <name>P.W. Ellis</name>
                      <boughtbooks>
                        <isbn>0006482805</isbn>
                        <isbn>0261102362</isbn>
                      </boughtbooks>
                    </customer>
                  </customers>
                </book>
              </mystery>
            </fantasy>
          </fiction>
        </bookstore>
      </books_and_customers>

  21. Some Equivalence Transformations for L’Algebra • Commutativity: • union(A,B) = union(B,A) (within single document) • but d-join is not commutative • Associativity: • union, intersect, d-join • Idempotence: • union(A,A) = A • Distributivity: • //book/(title | author) = //book/title | //book/author • Neutral Elements: • union: {} • d-join: $root(?)
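
The union laws listed above can be checked on a small example. This sketch assumes locators are tuples of node ids assigned in document order within a single document, so that plain tuple comparison reproduces document order; that encoding is an assumption made for illustration only.

```python
def union(a, b):
    """Set union of two locator sets, returned in document order."""
    return sorted(set(a) | set(b))

# //book/title and //book/author for the sci-fi book (ids as in the listing).
titles = [(1, 2, 3, 4, 6)]
authors = [(1, 2, 3, 4, 7)]
isbns = [(1, 2, 3, 4, 5)]

left = union(titles, authors)
right = union(authors, titles)
nested_1 = union(union(isbns, titles), authors)
nested_2 = union(isbns, union(titles, authors))
```

Sorting after every union is what makes the operation order preserving and, within a single document, independent of the argument order.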

  22. Open Issues • Combination with relational algebra • Graphs/Multidocuments • DAGs: Multiple paths from root-context to node (serialization?) • Role of URIs in locators? • Typing • Role of XSD (XML Schema Description) • Inference • Constructors • attribute to element and vice versa... • Grouping, Skolems • Details • Investigate conformance of locator concept to W3C Infoset • Constraints on locators/mappings to guarantee well-formedness • Political • XQL implementations shipping: underlying semantics node-based, not locator-based

  23. The IPSI XML Brokering Framework (architecture figure; labels only) • Visualization: HTML, CSS, XSL Processor • Server (HTTP, URL): URL + Queries (XQL), XML • Program: Queries (XQL), XML • Query processor: XML Query Language (XQL) • Data model: Document Object Model (W3C-DOM) • DOM, Persistent DOM Warehouse • HTTP/HTML Robot • Generic Wrappers • Specific Wrappers • JEDI Framework

  24. Wrappers • Jedi Framework for Wrappers • Pivot Object Model • Scripting language for control-flow • Access to dynamic sources (ODBC, CORBA) with iterators • Generic Wrappers • Generic Mapping of structured formats to XML • Examples: SGML, XML, HTML, MS-RTF • Jedi Parser • for irregularly formatted sources • context free, attributed grammars • fault-tolerant, efficient parser: unlimited lookahead, interpretation of ambiguous, incomplete grammars by specificity ordering • HTTP-Access • Access plans for delegation integrated with XQL Engine

  25. Mediator: XQL Engine + Persistent DOM • XQL 98 Implementation • efficient recursive descent queries by signature-index • + Joins • + Multi Document Handling • extends XQL with external references (via http-get, http-post) • Multidocument DOM; for every node namespace and URI • + User defined functions • input: context (reference-node-set, reference-node-pointer), parameters: constants, XQL-expressions (lazy evaluation) • output: node-functions, collection-functions (set of nodes), comparison-operators • can attach base-URIs • variables

  26. Application 1: An XML Broker for Golfers (figure: three XML sources, combined by the XML Broker via a query, rendered with XSL)
      Source data:
        <golfplatz id="platz0001">
          <adresse> [...] </adresse>
          <policy> ... </policy>
          <handicap>
            <wochentag>34</wochentag>
            <wochenende>34</wochenende>
          </handicap>
        </golfplatz>
        <www.wetter.de>
          <wetter>
            <plz>87724</plz>
            <datum>981001</datum>
            <temperatur>16</temperatur>
            <regen>90</regen>
            <wind>9</wind>
            <prognose>13</prognose>
          </wetter>
          <!-- ... -->
        </www.wetter.de>
        <www.reiseplanung.de>
          <route>
            <von>53757</von>
            <nach>93333</nach>
            <entfernung>481.9</entfernung>
            <fahrzeit>274</fahrzeit>
            <karte>5375793333.gif</karte>
          </route>
          <!-- ... -->
        </www.reiseplanung.de>
      Broker result:
        <golfdemo>
          <golfplatz>
            <adresse> ... </adresse>
            <greenfee> ... </greenfee>
            ...
          </golfplatz>
          <wetter> ... </wetter>
          <route> ... </route>
        </golfdemo>

  27. Application 2: RELIMO • Integrating Bioinformatics Data (architecture figure; labels only) • XML Application (e.g. Office 2000) • XML Browser (e.g. Mozilla 5) • XSL Formatter (e.g. Lotus-XSL) • XML Broker • RELIBASE with XML RPC • PDB as local PDOM

  28. Application Data • XML Broker for Golfers • Sources: www.golffuehrer.de (500 KB), www.wetter.de (200 KB), www.routen-information.de (200 KB) • Joins (via zip-code) ~ 2 to 3 secs • RELIMO (Germany) • Sources: Relibase (XML-RPC), PDB (5 GB -> 25 MB XML, 30 MB PDOM) • response time (100 MB) 50 to 30000 ms • MIROWEB (ESPRIT) • JEDI for importing several sources to Oracle 8 • Shakespeare • all plays • 10 MB (Tests with duplicated data up to 0.5 GB)

  29. Some Links & Acks • XQL FAQ • http://metalab.unc.edu/xql/ • IPSI XML Research & Development • http://xml.darmstadt.gmd.de • XQL-Engine 1.0.1 download (non-commercial use) • JEDI download (non-commercial use) • XML Brokering Framework Licensing Info (Infonyte) • hemmje@globit.com • www.infonyte.com • Many thanks to • Karl Aberer, Harald Schöning, Guido Mörkotte
