1 / 22

Symmetrically Exploiting XML

Symmetrically Exploiting XML. Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA The 15 th International World Wide Web Conference May 2006 Edinburgh, Scotland. 1970’s Database Controversy.

leann
Download Presentation

Symmetrically Exploiting XML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA The 15th International World Wide Web Conference May 2006 Edinburgh, Scotland

  2. 1970’s Database Controversy • Hierarchical model vs. relational model • Codd: symmetric exploitation of data • part/project works on some, but not all • Path expressions are asymmetric • Currently, all XML query languages use path expressions Part Project Commit Project Part Project Part Symmetrically Exploiting XML: Zhang, Dyreson

  3. author name book book E. F. Codd title publisher price title publisher price 9.99 Automata 46.95 DB Addison Wesley Academic Press Querying Data with Path Expressions • Task • Find books by E. F. Codd • XQuery • return doc("author.xml")//author[name= 'E. F. Codd']/book Symmetrically Exploiting XML: Zhang, Dyreson

  4. Same Data, Different Structure author book book name book book price publisher title author price publisher title author E. F. Codd DB 46.95 Automata 9.99 title publisher price title publisher price name name Addison Wesley Academic Press 9.99 Automata 46.95 DB Codd E. F. Codd Addison Wesley Academic Press • Same task • Find books by E. F. Codd • Need different XQuery • return doc("book.xml")//book[author/name='E. F. Codd'] Symmetrically Exploiting XML: Zhang, Dyreson

  5. Goal • Make same query work on different structures • Useful when there is • lack of schema knowledge • heterogeneous data • irregular data • schema evolution • Factor off problem of different label sets, others are working on it Symmetrically Exploiting XML: Zhang, Dyreson

  6. Existing Axes are Directional ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson

  7. Proposal: A Non-directional Axis ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson

  8. Proposal: A Non-directional Axis ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson

  9. Proposal: A Non-directional Axis ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson

  10. The Closest Axis • Syntax • closest:: • ->name is abbreviation for closest::name • Semantics • a function that takes a context node and returns a sequence of closest nodes Symmetrically Exploiting XML: Zhang, Dyreson

  11. Closest Axis of the First Title • closest::* • Returns a list of five nodes • closest::price • Returns the first price node author name book book title publisher price title publisher price Symmetrically Exploiting XML: Zhang, Dyreson

  12. When the First Book Lacks a Price • Node selection restricted by minimal type distance • The minimal distance between a title and a price is 2 • closest::price • Returns an empty list author name book book title publisher title publisher price Symmetrically Exploiting XML: Zhang, Dyreson

  13. Type Distance is Crucial • closest::name for each book? • Root-to-node path type • author/name • author/book/publisher/name author name book book title publisher title publisher price name Symmetrically Exploiting XML: Zhang, Dyreson

  14. Query Result#1 Query Result#2 Querying with the Closest Axes Same query -- return doc("any.xml")->author[->name='E. F. Codd']->book Closest axis-enabled XQuery evaluation engine Result#3 Query Symmetrically Exploiting XML: Zhang, Dyreson

  15. Querying with Directional Axes Query#1 -- return doc("author.xml")//author[name= 'E. F. Codd']/book Result#1 XQuery evaluation engine Query#2 -- …… Result#2 Result#3 Query#3 -- return doc("book.xml")//book[author/name='E. F. Codd'] Symmetrically Exploiting XML: Zhang, Dyreson

  16. In-memory Implementation • Naïve approach • Compute Closest for every node • Time complexity is O(sn2) • s: number of labels in the signature • n:number of nodes • Converting to a path expression Find the closest price for title Non-directional expression closest::price Directional (path) expression parent::*/child::price author book name title publisher price Symmetrically Exploiting XML: Zhang, Dyreson

  17. Experiment • Compare directional vs. nondirectional for $b in doc("bib.xml")//title/closest::publisher return $b for $b in doc("bib.xml")//title/..//publisher return $b • Implemented closest in eXist (an XML DBMS) Symmetrically Exploiting XML: Zhang, Dyreson

  18. Persistent Implementation • Take advantage of type indexes • LCA-join • Every Closest pair related via an LCA • Idea is to merge lists of types • O(sn) Symmetrically Exploiting XML: Zhang, Dyreson

  19. Related Work • Data integration • TSIMMIS • Garcia-Molina et al. (Journal of Intelligent Information Systems 1997) • YAT • Christophides, Cluet, Simèon (SIGMOD Record June 2000) • Silkroute • Fernandez, Tan, Suciu (WWW 2000) • LCA-related techniques • Schmidt, Kersten, Windhouwer (ICDE 2001) • Cohen, Mamou, Kanza, Sagiv (VLDB 2003) • Li, Yu, Jagadish (VLDB 2004) Symmetrically Exploiting XML: Zhang, Dyreson

  20. Related Research Projects • XML Restructuring • Zhang, Dyreson (IIWeb 2006) • XML Compaction • Zhang, Dyreson, Dang (DASFAA 2006) • Common theme – symmetric exploitation! Symmetrically Exploiting XML: Zhang, Dyreson

  21. Conclusion • Current XQuery depends on path expressions • A path expression is directional (asymmetric) • May break down if structure changes • The closest axis is non-directional (symmetric) • Simple in syntax • Can be easily integrated in XQuery • Can be implemented efficiently • In-memory • Persistent Symmetrically Exploiting XML: Zhang, Dyreson

  22. Thank You!

More Related