220 likes | 304 Views
Symmetrically Exploiting XML. Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA The 15 th International World Wide Web Conference May 2006 Edinburgh, Scotland. 1970’s Database Controversy.
E N D
Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA The 15th International World Wide Web Conference May 2006 Edinburgh, Scotland
1970’s Database Controversy • Hierarchical model vs. relational model • Codd: symmetric exploitation of data • part/project works on some, but not all • Path expressions are asymmetric • Currently, all XML query languages use path expressions Part Project Commit Project Part Project Part Symmetrically Exploiting XML: Zhang, Dyreson
author name book book E. F. Codd title publisher price title publisher price 9.99 Automata 46.95 DB Addison Wesley Academic Press Querying Data with Path Expressions • Task • Find books by E. F. Codd • XQuery • return doc("author.xml")//author[name= 'E. F. Codd']/book Symmetrically Exploiting XML: Zhang, Dyreson
Same Data, Different Structure author book book name book book price publisher title author price publisher title author E. F. Codd DB 46.95 Automata 9.99 title publisher price title publisher price name name Addison Wesley Academic Press 9.99 Automata 46.95 DB Codd E. F. Codd Addison Wesley Academic Press • Same task • Find books by E. F. Codd • Need different XQuery • return doc("book.xml")//book[author/name='E. F. Codd'] Symmetrically Exploiting XML: Zhang, Dyreson
Goal • Make same query work on different structures • Useful when there is • lack of schema knowledge • heterogeneous data • irregular data • schema evolution • Factor off problem of different label sets, others are working on it Symmetrically Exploiting XML: Zhang, Dyreson
Existing Axes are Directional ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson
Proposal: A Non-directional Axis ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson
Proposal: A Non-directional Axis ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson
Proposal: A Non-directional Axis ancestor self preceding following descendent Symmetrically Exploiting XML: Zhang, Dyreson
The Closest Axis • Syntax • closest:: • ->name is abbreviation for closest::name • Semantics • a function that takes a context node and returns a sequence of closest nodes Symmetrically Exploiting XML: Zhang, Dyreson
Closest Axis of the First Title • closest::* • Returns a list of five nodes • closest::price • Returns the first price node author name book book title publisher price title publisher price Symmetrically Exploiting XML: Zhang, Dyreson
When the First Book Lacks a Price • Node selection restricted by minimal type distance • The minimal distance between a title and a price is 2 • closest::price • Returns an empty list author name book book title publisher title publisher price Symmetrically Exploiting XML: Zhang, Dyreson
Type Distance is Crucial • closest::name for each book? • Root-to-node path type • author/name • author/book/publisher/name author name book book title publisher title publisher price name Symmetrically Exploiting XML: Zhang, Dyreson
Query Result#1 Query Result#2 Querying with the Closest Axes Same query -- return doc("any.xml")->author[->name='E. F. Codd']->book Closest axis-enabled XQuery evaluation engine Result#3 Query Symmetrically Exploiting XML: Zhang, Dyreson
Querying with Directional Axes Query#1 -- return doc("author.xml")//author[name= 'E. F. Codd']/book Result#1 XQuery evaluation engine Query#2 -- …… Result#2 Result#3 Query#3 -- return doc("book.xml")//book[author/name='E. F. Codd'] Symmetrically Exploiting XML: Zhang, Dyreson
In-memory Implementation • Naïve approach • Compute Closest for every node • Time complexity is O(sn2) • s: number of labels in the signature • n:number of nodes • Converting to a path expression Find the closest price for title Non-directional expression closest::price Directional (path) expression parent::*/child::price author book name title publisher price Symmetrically Exploiting XML: Zhang, Dyreson
Experiment • Compare directional vs. nondirectional for $b in doc("bib.xml")//title/closest::publisher return $b for $b in doc("bib.xml")//title/..//publisher return $b • Implemented closest in eXist (an XML DBMS) Symmetrically Exploiting XML: Zhang, Dyreson
Persistent Implementation • Take advantage of type indexes • LCA-join • Every Closest pair related via an LCA • Idea is to merge lists of types • O(sn) Symmetrically Exploiting XML: Zhang, Dyreson
Related Work • Data integration • TSIMMIS • Garcia-Molina et al. (Journal of Intelligent Information Systems 1997) • YAT • Christophides, Cluet, Simèon (SIGMOD Record June 2000) • Silkroute • Fernandez, Tan, Suciu (WWW 2000) • LCA-related techniques • Schmidt, Kersten, Windhouwer (ICDE 2001) • Cohen, Mamou, Kanza, Sagiv (VLDB 2003) • Li, Yu, Jagadish (VLDB 2004) Symmetrically Exploiting XML: Zhang, Dyreson
Related Research Projects • XML Restructuring • Zhang, Dyreson (IIWeb 2006) • XML Compaction • Zhang, Dyreson, Dang (DASFAA 2006) • Common theme – symmetric exploitation! Symmetrically Exploiting XML: Zhang, Dyreson
Conclusion • Current XQuery depends on path expressions • A path expression is directional (asymmetric) • May break down if structure changes • The closest axis is non-directional (symmetric) • Simple in syntax • Can be easily integrated in XQuery • Can be implemented efficiently • In-memory • Persistent Symmetrically Exploiting XML: Zhang, Dyreson