
XML Native Query Processing

XML Native Query Processing. Chun-shek Chan Mahesh Marathe Wednesday, February 12, 2003. Topics. XML Indexing “Accelerating XPath Location Steps” Torsten Grust, ACM SIGMOD 2002 XML Query Optimization


Presentation Transcript


  1. XML Native Query Processing • Chun-shek Chan, Mahesh Marathe • Wednesday, February 12, 2003

  2. Topics • XML Indexing • “Accelerating XPath Location Steps”, Torsten Grust, ACM SIGMOD 2002 • XML Query Optimization • “Multi-level Operator Combination in XML Query Processing”, Shurug Al-Khalifa and H. V. Jagadish, ACM CIKM 2002

  3. XML Query Languages • XPath • Developed by the World Wide Web Consortium • Version 1.0 became a W3C Recommendation on November 16, 1999 • Version 2.0 is a working draft.

  4. XML Query Languages • XQuery • Developed by the World Wide Web Consortium as well • Currently a working draft

  5. Axes on XPath Tree • There are 13 axes according to the XPath 2.0 Technical Report • Forward Axes • child, descendant, attribute, self, descendant-or-self, following-sibling, following, namespace (deprecated) • Reverse Axes • parent, ancestor, preceding-sibling, preceding, ancestor-or-self

  6. XML Traversal and Storage • Tree-based traversal • Efficient storage is challenging • Especially for relational databases, which deal with flat tuples and are not designed to handle recursion or nested elements

  7. Proposed Solutions • “Indexing and Querying XML Data for Regular Path Expressions”, Li and Moon, VLDB 2001 • “A Fast Index for Semistructured Data”, Cooper, Sample, Franklin, Hjaltason and Shadmon, VLDB 2001 • “DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases”, Goldman and Widom, VLDB 1997

  8. Problems with Proposed Solutions • The solutions focus on the / and // location steps only, so support for the full set of XPath axes is inadequate. • The proposals rely on technologies outside the relational domain.

  9. Author’s Proposal • XPath Accelerator • Works entirely within a relational database. • Uses ordinary relational (SQL) syntax for queries. • Benefits from advanced index technologies, such as R-trees.

  10. XPath Tree Traversal • Context Node: starting point of any traversal • Location Steps: syntactically separated by /, evaluated from left to right • A step’s axis establishes a subset of document nodes (a document region)

  11. XPath Forward Axes • Child • Descendant • Attribute • Self • Descendant-or-self • Following-sibling • Following • Namespace

  12. XPath Reverse Axes • Parent • Ancestor • Preceding-sibling • Preceding • Ancestor-or-self

  13. Sample XML Tree

  14. Encoding XML Document Regions • Formula: v/descendant ∪ v/ancestor ∪ v/following ∪ v/preceding ∪ v/self • Each document node appears exactly once in this union • What are the ways to uniquely identify different nodes?

  15. Numbering Nodes • Grust: preorder and postorder ranks • Tatarinov: Global, Local, Dewey • Li & Moon: Order-size pairs

  16. Descendants? Ancestors? Preceding? Following? XML Document Regions

  17. XPath Tree Node Descriptor • desc(v) = { pre(v), post(v), par(v), att(v), tag(v) } • window(α,v) = { condition for each field in desc() } • Example: window(child,v) = { (pre(v),∞), [0,post(v)), pre(v), false, * }
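
The descriptor and window idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six-node tree, the function names `build_accel` and `axis`, and the dictionary layout are hypothetical, and the `att` field is left out because the toy tree has no attribute nodes. Each window is written as a pair of comparisons on the pre and post ranks of the context node.

```python
# Hypothetical toy document: a(b(c, d), e(f)), given as (tag, [children]).
TREE = ("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])])

def build_accel(tree):
    """Assign pre/post ranks in one depth-first walk; rows come out in pre order."""
    rows = []
    post_counter = 0

    def visit(node, par):
        nonlocal post_counter
        tag, children = node
        row = {"pre": len(rows), "post": None, "par": par, "tag": tag}
        rows.append(row)
        for child in children:
            visit(child, row["pre"])
        row["post"] = post_counter      # assigned once all children are done
        post_counter += 1

    visit(tree, None)
    return rows

def axis(rows, v, name):
    """Return the tags of all nodes inside the window of axis `name` for context v."""
    tests = {
        "descendant": lambda n: n["pre"] > v["pre"] and n["post"] < v["post"],
        "ancestor":   lambda n: n["pre"] < v["pre"] and n["post"] > v["post"],
        "following":  lambda n: n["pre"] > v["pre"] and n["post"] > v["post"],
        "preceding":  lambda n: n["pre"] < v["pre"] and n["post"] < v["post"],
        "child":      lambda n: n["par"] == v["pre"],
    }
    return [n["tag"] for n in rows if tests[name](n)]

rows = build_accel(TREE)
b = rows[1]                                # context node b
for name in ("descendant", "ancestor", "following", "preceding", "child"):
    print(name, axis(rows, b, name))
```

For every context node, the descendant, ancestor, following and preceding windows together with the node itself cover each row exactly once, which is the partition property stated on slide 14.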

  18. XPath Query Windows

  19. XPath Evaluation • Given an XPath expression e, an axis α, and a context node v produced by e, the step e/α is evaluated as: • query(e/α) = SELECT v’.* FROM query(e) v, accel v’ WHERE v’ INSIDE window(α,v) • This pseudo-SQL code can be flattened into a plain relational query with a flat n-ary self-join.
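
A flattened self-join of this kind can be tried out with SQLite from Python. The sketch below is an assumption-laden illustration, not the paper's schema: the table name `accel` follows the slide, but the column set (pre, post, par, tag), the six hand-numbered rows, and the example path child::b/following::* are made up for the demonstration. Each location step contributes one more alias of `accel` and one window predicate.

```python
import sqlite3

# Hypothetical rows (pre, post, par, tag) for the toy document a(b(c, d), e(f)).
ROWS = [
    (0, 5, None, "a"),
    (1, 2, 0, "b"),
    (2, 0, 1, "c"),
    (3, 1, 1, "d"),
    (4, 4, 0, "e"),
    (5, 3, 4, "f"),
]

def load(rows):
    """Create an in-memory accel table and fill it with the descriptor rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accel (pre INTEGER PRIMARY KEY, post INTEGER, par INTEGER, tag TEXT)"
    )
    conn.executemany("INSERT INTO accel VALUES (?, ?, ?, ?)", rows)
    return conn

# Context node a (pre = 0), then child::b, then following::*.
# Three aliases of accel form the flat 3-ary self-join.
QUERY = """
SELECT DISTINCT v2.pre, v2.tag
FROM accel v0, accel v1, accel v2
WHERE v0.pre = 0
  AND v1.pre > v0.pre AND v1.post < v0.post   -- child window: inside v0 ...
  AND v1.par = v0.pre AND v1.tag = 'b'        -- ... with v0 as parent, name test b
  AND v2.pre > v1.pre AND v2.post > v1.post   -- following window of v1
ORDER BY v2.pre
"""

conn = load(ROWS)
result = conn.execute(QUERY).fetchall()
print(result)  # [(4, 'e'), (5, 'f')]
```

DISTINCT and ORDER BY on pre stand in for XPath's duplicate-free, document-ordered result; in a real system the range predicates on pre and post are what an R-tree or B-tree index would serve.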

  20. XML Instance Loading • Loading XML Instance into the database means mapping its nodes into the descriptor table. • Can use callback procedures described in text to load element nodes into relational table. • Make separate table for element contents.

  21. Potential Issues • Insertion of a node • Nodes must be renumbered to reflect the change, since pre and post ranks shift • Deletion of a node • Only need to remove its entry in the accelerator table
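
The asymmetry between insertion and deletion can be checked on a small example. The following is a rough sketch with hypothetical names (`ranks`, the toy trees) and it assumes unique tags so that a tag can stand in for a node. It renumbers the document after inserting one leaf and counts how many existing nodes received a different (pre, post) pair, then shows that simply dropping a row leaves the remaining windows usable.

```python
def ranks(tree):
    """Map each (unique) tag to its (pre, post) pair; tree is (tag, [children])."""
    pre_order, post_order = [], []

    def visit(node):
        tag, children = node
        pre_order.append(tag)
        for child in children:
            visit(child)
        post_order.append(tag)

    visit(tree)
    return {t: (pre_order.index(t), post_order.index(t)) for t in pre_order}

# Original document a(b(c, d), e(f)) and the same document with a new
# leaf x inserted as the first child of b.
before = ranks(("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])]))
after = ranks(("a", [("b", [("x", []), ("c", []), ("d", [])]), ("e", [("f", [])])]))

# Existing nodes whose descriptor changed because of the insertion.
changed = sorted(t for t in before if before[t] != after[t])
print("renumbered after insert:", changed)

# Deletion: drop the row for c and keep all other ranks, gaps included.
after_delete = {t: r for t, r in before.items() if t != "c"}
b_pre, b_post = after_delete["b"]
descendants_of_b = sorted(
    t for t, (pre, post) in after_delete.items() if pre > b_pre and post < b_post
)
print("descendants of b after delete:", descendants_of_b)
```

In this toy case a single inserted leaf changes the descriptor of every existing node, while deletion only leaves a gap in the rank sequence, which the order-based windows tolerate.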

  22. Node Descriptor Indexing • Efficiently supported by R-trees. • Can also be supported by B-trees.

  23. Example of pre/post rank distribution

  24. Shrink-wrapping the //-axis • Optimizing the window for the descendant axis • For each node, we need to determine the ranges of pre and post ranks for its leftmost and rightmost leaf nodes. • For any node v in a tree t, we have pre(v) − post(v) + size(v) = level(v) • For a leaf node v’, size(v’) = 0, therefore pre(v’) − post(v’) = level(v’) ≤ height(t)

  25. Shrink-wrapping the //-axis • For the rightmost leaf v’ of node v: post(v) = post(v’) + (level(v’) − level(v)) • Using the previous equations, we have: pre(v’) ≤ post(v) + height(t) • For the leftmost leaf v’’ of node v, we have a similar result: post(v’’) ≥ pre(v) − height(t) • These bounds can be used to shrink the window

  26. Shrink-wrapping the //-axis • Original window: { (pre(v),∞), [0,post(v)), *, false, * } • New window: { (pre(v), post(v)+height(t)], [pre(v)−height(t), post(v)), *, false, * } • Similar techniques can be used to optimize the query windows of other axes.
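
The bounds above can be sanity-checked numerically. This sketch, with hypothetical helper names (`annotate`, `descendants_plain`, `descendants_shrunk`) and a made-up toy tree, assumes the root has level 0 and that size(v) counts the descendants of v. It verifies the identity pre − post + size = level on every node and confirms that the shrunk window returns the same descendants as the original one.

```python
TREE = ("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])])

def annotate(tree):
    """Compute pre, post, level and size (number of descendants) for each node."""
    rows = []
    post_counter = 0

    def visit(node, level):
        nonlocal post_counter
        tag, children = node
        row = {"tag": tag, "pre": len(rows), "level": level}
        rows.append(row)
        for child in children:
            visit(child, level + 1)
        row["size"] = len(rows) - row["pre"] - 1   # nodes added below this one
        row["post"] = post_counter
        post_counter += 1

    visit(tree, 0)
    return rows

def descendants_plain(rows, v):
    """Original window: pre in (pre(v), inf), post in [0, post(v))."""
    return [n["tag"] for n in rows if n["pre"] > v["pre"] and n["post"] < v["post"]]

def descendants_shrunk(rows, v, height):
    """Shrunk window: pre in (pre(v), post(v)+h], post in [pre(v)-h, post(v))."""
    return [
        n["tag"] for n in rows
        if v["pre"] < n["pre"] <= v["post"] + height
        and v["pre"] - height <= n["post"] < v["post"]
    ]

rows = annotate(TREE)
height = max(n["level"] for n in rows)

# Identity from slide 24, checked on every node.
identity_holds = all(n["pre"] - n["post"] + n["size"] == n["level"] for n in rows)

# Both windows must select exactly the same nodes.
windows_agree = all(
    descendants_plain(rows, v) == descendants_shrunk(rows, v, height) for v in rows
)
print(identity_holds, windows_agree)
```

The result set does not change; the gain is that the shrunk window is a bounded rectangle in the pre/post plane, so an index scan touches far less of the plane than the half-open original window.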

  27. Shrink-wrapping the //-axis

  28. Finding Leaves in an XML Tree

  29. XPath Traversals with and without shrunk windows

  30. XPath Accelerator v. Edge Map

  31. R-Tree v. B-Tree

  32. Performance for the ancestor axis

  33. Performance: XPath Accelerator v. EE/EA-Join

  34. Capabilities of XPath Accelerator • Runs on top of a relational backend to leverage its stability, scalability, and performance. • Supports the whole family of XPath axes in an adequate manner. • Allows XPath traversals to originate at arbitrary context nodes. • Provides the groundwork for effective cost estimation for XPath queries.

  35. XML Query Optimization • Macro-level algebra: manipulates sets of trees directly • heavyweight, but more directly expressive • Micro-level algebra: manipulates sets of elements • In both algebras, basic operators are “intuitive” unit operations such as selections, projections, joins and set operations.

  36. XQuery Expression and Pattern Tree

  37. Macro-algebra • A macro-algebra would implement this entire expression as a single pattern-tree based selection operator (to select matching books), followed by a projection operator (to return titles).

  38. Micro-algebra • A micro-algebra would break up the selection pattern into one selection operator per node (e.g. (tag=“book”), (tag=“year” && content > 1995)) and one containment join operator per edge. • Result of sequence of joins would then be projected on the book element, after which its title can be obtained as before.

  39. Query Processing Implementation • Identify lists of candidate elements in the database to match each node in the specified structural pattern. • Find combinations of candidate elements, one from each list, that satisfy the required structural relationships. • Apply any conditions that involve multiple nodes in the structural pattern to eliminate some combinations.

  40. Containment Join • Given two sets of elements U and V, a containment join returns pairs of elements (u,v) such that • u ∈ U and v ∈ V • u “contains” v • i.e. node u is an ancestor of node v in the tree representation
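
A containment join over two candidate lists can be sketched as follows. The assumptions are loud ones: elements are represented as hypothetical (pre, post) pairs, so that u contains v exactly when pre(u) < pre(v) and post(u) > post(v), and the stack-based merge shown here is a generic structural-join style algorithm chosen for illustration, not necessarily the access method used in the paper. The sample data imagines a small bibliography with two book elements and three year elements, one of which sits under an article.

```python
def containment_join(U, V):
    """Return all (u, v) with u an ancestor of v; elements are (pre, post) pairs."""
    U, V = sorted(U), sorted(V)          # document order = ascending pre rank
    out, stack, i = [], [], 0
    for v in V:
        # Push every candidate ancestor that starts before v.
        while i < len(U) and U[i][0] < v[0]:
            u = U[i]
            i += 1
            # Keep the stack a chain of nested elements: drop those ending before u.
            while stack and stack[-1][1] < u[1]:
                stack.pop()
            stack.append(u)
        # Drop candidates that end before v; they cannot contain v or anything later.
        while stack and stack[-1][1] < v[1]:
            stack.pop()
        # Whatever remains on the stack contains v.
        out.extend((u, v) for u in stack)
    return out

def containment_join_naive(U, V):
    """Quadratic reference implementation of the same definition."""
    return [(u, v) for v in sorted(V) for u in sorted(U) if u[0] < v[0] and u[1] > v[1]]

# Hypothetical ranks for bib(book(title, year), book(title, year), article(year)).
books = [(1, 2), (4, 5)]
years = [(3, 1), (6, 4), (8, 6)]

pairs = containment_join(books, years)
print(pairs)  # [((1, 2), (3, 1)), ((4, 5), (6, 4))]
```

Both inputs are scanned once in document order, which is why this style of join pairs naturally with indices that deliver candidate lists sorted by position.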

  41. Containment Join Implementation • Three main options: • Scan the entire database • Use an index to find candidate nodes for one end of the join, and navigate from there • Use indices to find candidate nodes for both ends of the join, and compute a containment join between these candidate sets

  42. Projection Merging

  43. Set Operations • Union compatibility is not an issue. • In the relational world, union compatibility is an important consideration with respect to set operations. • In XML, since heterogeneous collections are allowed, this is not an issue.

  44. Union in XML • Given two pattern trees PT1 and PT2, let PTC be a common component of the two pattern trees such that: • PT1 − PTC = PT’1 and PT2 − PTC = PT’2, where PT’1 and PT’2 are both trees • Node i in PTC has some node j in PT’1 such that edge (i,j) is in PT1, if and only if node i also has some node k in PT’2 such that edge (i,k) is in PT2.

  45. Different Pattern Trees and Plans

  46. Micro-operator Merging: New Access Methods • At macro-level, we considered a pattern tree selection as a single heavyweight operator. • At micro-level, the approach is to break up a pattern tree selection into multiple containment join operators.

  47. Performance: Union

  48. Performance: Intersection

  49. Performance by Query Structure

  50. Parent-Child Join Performance
