
Containment and Equivalence for an XPath Fragment

Containment and Equivalence for an XPath Fragment, by Gerome Miklau and Dan Suciu. Presented by Roy Ionas. Seminar objectives: presenting the problem of non-polynomial complexity for containment and equivalence of XPath fragments.


Presentation Transcript


  1. Containment and Equivalence for an XPath Fragment. By Gerome Miklau and Dan Suciu. Presented by Roy Ionas.

  2. SEMINAR OBJECTIVES • PRESENTING THE PROBLEM OF NON-POLYNOMIAL COMPLEXITY FOR CONTAINMENT AND EQUIVALENCE OF XPath FRAGMENTS. • PRESENTING TWO ALGORITHMS THAT IMPROVE THE COST OF THE XPath CONTAINMENT AND EQUIVALENCE PROBLEM. • PRESENTING TREE PATTERNS AS AN EFFECTIVE TOOL FOR PROOFS ABOUT XPath FRAGMENTS.

  3. SO WHAT IS XPath? • A simple language for navigating XML documents and selecting a set of nodes. • With XPath we can query XML data, describe key constraints, express transformations and reference elements in remote documents. • XPath's influence can be found in other XML query languages and standards such as XQuery, XSLT, XML Schema, XLink, XPointer and more...

  4. DEFINITIONS • Simple XPath fragment. • Containment between two XPath fragments. • Equivalence between two XPath fragments. • Computability definitions. • Tree patterns as a proving tool for XPath fragments.

  5. Simple XPath fragment • An XPath expression. • Contains the three most important features for navigation: • Child axis “/” and descendant axis “//”. • Wildcards “*”. • Qualifiers “[]”. • We disregard attributes, conditions... • We identify and compare nodes only by their label. • We disregard order completely. • Example: a//*[b//d][c]

  6. Simple XPath fragment • Are these all the features we have in XPath??? NO!!!!! • Are these all the features we need for representing navigation in XML documents? YES!!!!! At least these are the ones needed for the proofs in this article.

  7. Containment • XPath fragment A is contained in XPath fragment B if, for every XML document, the result of applying A is contained in the result of applying B. • The result is a set of nodes; order is not considered. • Can we test this containment over the entire world of XML documents?? • Is there another way to determine containment between two XPath fragments???

  8. Equivalence • XPath fragments A and B are equivalent if, for every XML document, the result of applying A equals the result of applying B. • The problem of equivalence can be reduced to the problem of containment: • Equivalence = containment in both directions between the patterns. • Conversely, containment can be decided by an algorithm that decides equivalence, with only polynomial-time overhead. • From now on we discuss only the problem of containment; the results are valid for equivalence as well.
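The two definitions can be written compactly as follows. The notation p(t), for the set of nodes that pattern p selects in document t, is an assumption introduced here for illustration and does not appear in the slides.

```latex
\[
p \subseteq p' \iff \forall t:\; p(t) \subseteq p'(t),
\qquad
p \equiv p' \iff \bigl(p \subseteq p'\bigr) \wedge \bigl(p' \subseteq p\bigr).
\]
```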

  9. Computability Definitions • NP - stands for “Nondeterministic Polynomial”. • P class - the class of problems solvable in polynomial time, i.e. problems for which an efficient algorithm exists. • NP class - the class of problems whose proposed solutions can be verified in polynomial time. For the hardest problems in NP no efficient algorithm has been found (yet), and the best known algorithms take exponential time in the worst case. • NP-hard problem - a problem to which every NP problem can be reduced in polynomial time (at least as hard as anything in NP, possibly harder). • NP-complete problem - a problem that belongs to NP and is also NP-hard.

  10. Tree Patterns • An unordered tree over the alphabet of the XPath expression. • XPath node tests become nodes of the tree pattern. • Child axes are drawn as single edges. • Descendant axes are drawn as double-line edges. • A k-tuple of nodes is called the result tuple. • For a tree pattern P, the arity of the result tuple is called the arity of P. • A tree pattern P is Boolean iff its arity is 0.

  11. Tree Patterns • Tree patterns are more elegant and more general than XPath fragments. • We can translate from XPath to tree patterns and vice versa quite easily. Now we can prove properties using graph theory.

  12. Tree Pattern - example • The XPath expression a//*[b//d][c] gives the following tree: root a; a descendant edge from a to the wildcard *; child edges from * to b and to c; a descendant edge from b to d.
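A minimal sketch of how the tree pattern for a//*[b//d][c] could be held in memory. The class `PNode`, its fields and the helper `outline` are hypothetical names chosen for illustration; marking `*` as the result node follows the usual XPath reading, in which the last step outside the qualifiers is the selected node.

```python
from dataclasses import dataclass, field

@dataclass
class PNode:
    """One node of a tree pattern (hypothetical helper class)."""
    label: str                                   # element name, or "*" for a wildcard
    edges: list = field(default_factory=list)    # (axis, PNode) pairs; axis is "/" or "//"
    is_result: bool = False                      # True for nodes in the result tuple

    def add(self, axis, child):
        """Attach child below this node via a child ("/") or descendant ("//") edge."""
        self.edges.append((axis, child))
        return child

def outline(node, axis="", depth=0):
    """Return the pattern as indented text lines, one node per line."""
    mark = "  <- result" if node.is_result else ""
    lines = ["  " * depth + axis + node.label + mark]
    for child_axis, child in node.edges:
        lines.extend(outline(child, child_axis, depth + 1))
    return lines

# Build the pattern for a//*[b//d][c].
root = PNode("a")
star = root.add("//", PNode("*", is_result=True))
b = star.add("/", PNode("b"))
b.add("//", PNode("d"))
star.add("/", PNode("c"))

print("\n".join(outline(root)))
```

Since the pattern is unordered, the order in which b and c are attached below * carries no meaning.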

  13. Using Tree Patterns to navigate XML trees • An embedding maps a tree pattern into an XML tree. • Think of it as a function from pattern nodes to tree nodes that must: • Preserve the root. • Respect node labels (a wildcard matches any label). • Respect edge relationships (child edges map to child edges, descendant edges to descendants). • After embedding, return the nodes that the result nodes are mapped to, together with their subtrees. • For Boolean patterns, return true if such an embedding exists.
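The Boolean case of the embedding conditions above can be sketched as a short recursive check. The tuple encodings of documents and patterns, and the names `matches_at` and `holds`, are assumptions made for this illustration. Because an embedding need not be injective, each branch of the pattern can be matched independently.

```python
def descendants(t):
    """Yield every proper descendant of XML tree node t = (label, [children])."""
    for child in t[1]:
        yield child
        yield from descendants(child)

def matches_at(p, t):
    """True if pattern node p = (label, [(axis, subpattern), ...]) embeds at tree node t."""
    label, edges = p
    if label != "*" and label != t[0]:        # respect node labels; "*" matches anything
        return False
    for axis, sub in edges:
        # child edge: look at children only; descendant edge: look at all proper descendants
        candidates = t[1] if axis == "/" else descendants(t)
        if not any(matches_at(sub, c) for c in candidates):
            return False
    return True

def holds(pattern, tree):
    """Boolean pattern semantics: the pattern root must map to the tree root."""
    return matches_at(pattern, tree)

# Boolean version of a//*[b//d][c]
pattern = ("a", [("//", ("*", [("/", ("b", [("//", ("d", []))])),
                               ("/", ("c", []))]))])

doc_yes = ("a", [("s", [("b", [("x", [("d", [])])]), ("c", [])])])
doc_no = ("a", [("s", [("b", [("d", [])])])])     # no c next to b

print(holds(pattern, doc_yes), holds(pattern, doc_no))
```

In `doc_yes` the wildcard is embedded at node s; in `doc_no` no node has both a b child (with a d below it) and a c child, so no embedding exists.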

  14. Example for embedding [Figure: a tree pattern embedded into an XML tree; node labels shown: a, s, *, t, b, c, d.]

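  15. PROBLEM…. • Testing containment between two XPath fragments is a coNP-complete problem, i.e. the complement problem (non-containment) is NP-complete. • Hardness can be proven by a reduction from the coNP-complete problem of 3-CNF unsatisfiability to containment.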

  16. Do we really care about it??? When do we need to test for containment or equivalence between fragments? • In almost all the applications we described so far. • Inference of keys. • Optimization of XPath queries. I guess we care...

  17. Solving the problem • Finding an algorithm that is both efficient and complete for this problem is quite difficult (it would amount to proving P = NP). • So we can look for an algorithm that is efficient but not complete. • Or for an algorithm that is complete but not always efficient.

  18. First solution : Pattern homomorphism

  19. Pattern Homomorphisms - definition • A homomorphism h from tree pattern p’ to tree pattern p is a function h: Nodes(p’) -> Nodes(p) that satisfies the following conditions: • Root preserving: h maps the root of p’ to the root of p. • Label preserving: for each node x in p’, the label of x is either * or equal to the label of h(x). • Child and descendant relations are preserved: a child edge maps to a child edge, a descendant edge maps to a pair of nodes where the second is a proper descendant of the first. • There are efficient algorithms for deciding whether a homomorphism between two patterns exists. • The algorithm is sound: whenever there exists a homomorphism from p’ to p, then p ⊆ p’. • So the existence of a homomorphism is always a sufficient condition for containment. • But is it a necessary condition?
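A minimal sketch of a homomorphism test, assuming the standard direction (a homomorphism from the containing pattern p’ into the contained pattern p). The tuple encoding and the names `node`, `maps_to` and `homomorphism_exists` are hypothetical; memoising on pairs of nodes keeps the number of distinct checks polynomial.

```python
from functools import lru_cache

def node(label, *edges):
    """Build a hashable pattern node: (label, ((axis, child), ...)); axis is "/" or "//"."""
    return (label, tuple(edges))

def proper_descendants(p):
    """Yield every node strictly below p, through child or descendant edges."""
    for _, child in p[1]:
        yield child
        yield from proper_descendants(child)

@lru_cache(maxsize=None)
def maps_to(q, p):
    """True if the subpattern rooted at q can be mapped homomorphically onto node p."""
    if q[0] != "*" and q[0] != p[0]:           # label of q must be * or equal to label of p
        return False
    for axis, q_child in q[1]:
        if axis == "/":                        # child edge must land on a child edge
            targets = [c for ax, c in p[1] if ax == "/"]
        else:                                  # descendant edge may land on any proper descendant
            targets = proper_descendants(p)
        if not any(maps_to(q_child, t) for t in targets):
            return False
    return True

def homomorphism_exists(p_prime, p):
    """Root-preserving homomorphism from p_prime to p; if True, p is contained in p_prime."""
    return maps_to(p_prime, p)

specific = node("a", ("/", node("b", ("/", node("c")))))   # a/b/c
general = node("a", ("//", node("c")))                     # a//c
print(homomorphism_exists(general, specific))   # homomorphism exists, so a/b/c ⊆ a//c
print(homomorphism_exists(specific, general))   # none in the other direction

# a/*//b and a//*/b select the same nodes (a b at depth >= 2 below a),
# yet no homomorphism exists in either direction.
p1 = node("a", ("/", node("*", ("//", node("b")))))
p2 = node("a", ("//", node("*", ("/", node("b")))))
print(homomorphism_exists(p1, p2), homomorphism_exists(p2, p1))
```

The last pair is a standard illustration that the test is sufficient but not necessary: both patterns require a b at depth two or more below a, so they are equivalent, but the check fails both ways.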

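  20. Example for homomorphism [Figure: two tree patterns over the labels a, b, c and *, with a homomorphism that pairs the two roots a with each other and pairs the wildcard node * with node b.]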

  21. Homomorphism is not a complete solution for containment • A homomorphism between the two tree patterns does not exist even though they are equivalent. [Figure: two tree patterns, each a chain of the nodes a, *, b.]

  22. Cases where homomorphism is also complete • Fragments that contain only * and []. • Fragments that contain only // and []. • Fragments that contain all three features but can be translated, without changing their semantics, to an expression that belongs to one of the above.

  23. Conclusion for homomorphism • Sound. • Efficient. • Incomplete. Next we look for an algorithm that is sound and complete, and that may be efficient in several cases.

  24. ALGORITHM FOR CONTAINMENT

  25. Containment between regular languages • We reduce the problem of containment between two XPath fragments to containment between two regular tree languages, by translating each tree pattern to a tree automaton. • The algorithm is complete: with well-defined rules we can translate from tree pattern to automaton and vice versa without losing information.

  26. Automata for XPath fragment • Defined on ranked trees. • Bottom up structure. • Only the root is an accepting state. • The initial states are the leaves of the tree. • The transitions are of the form:(q1,q2,…,qn;a) -> q
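To make transitions of the form (q1,q2,…,qn;a) -> q concrete, here is a minimal sketch of running a bottom-up finite tree automaton. The encoding of transitions as a dictionary, the function names, and the small Boolean-expression automaton are all assumptions chosen for illustration; they are not the automata built from tree patterns in the slides.

```python
from itertools import product

def reachable_states(tree, transitions):
    """Set of states the automaton can reach at the root of tree = (symbol, (children...))."""
    symbol, children = tree
    child_sets = [reachable_states(c, transitions) for c in children]
    states = set()
    # Try every combination of states reached at the children; a leaf has the empty combination ().
    for combo in product(*child_sets):
        states |= transitions.get((combo, symbol), set())
    return states

def accepts(tree, transitions, accepting):
    """The tree is accepted if some run ends in an accepting state at the root."""
    return bool(reachable_states(tree, transitions) & accepting)

# Hypothetical example automaton: accepts Boolean expression trees that evaluate to true.
# Leaf transitions ((), a) -> q play the role of initial states.
transitions = {((), "t"): {"T"}, ((), "f"): {"F"}}
for x, y in product("TF", repeat=2):
    transitions[((x, y), "and")] = {"T" if x == y == "T" else "F"}
    transitions[((x, y), "or")] = {"F" if x == y == "F" else "T"}

true_leaf, false_leaf = ("t", ()), ("f", ())
expr = ("or", (("and", (true_leaf, false_leaf)), true_leaf))   # (t and f) or t

print(accepts(expr, transitions, accepting={"T"}))
```

The run starts at the leaves and works upward, and only the state reached at the root decides acceptance, which matches the bottom-up structure described above.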

  27. Definitions • FTA - finite tree automaton, an automaton that consists of a set of states and transitions of the form described. • An FTA can be deterministic - DFTA. • Each FTA A with state set Q can be translated to a DFTA B with at most 2^|Q| states. • AFTA - alternating finite tree automaton, which extends the definition of FTA by adding “AND transitions” of the form (q1,q2,…,qm)->qi. • A DFTA can also be built for an AFTA, at no greater cost than determinizing an FTA.

  28. The entire algorithm • Construct the DFTA A accepting the “regular expressions” of P. • Construct the AFTA A’ accepting the “regular expressions” of P’. • Compute the AFTA B = A x A’. • Compute the DFTA C = Det(B). • If lang(A) ⊆ lang(C) then return true, else return false.

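  29. [Figure: two tree patterns rooted at r, over the labels a, b and *, with a question mark between them asking whether one is contained in the other.]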

  30. Step 1: Building FTA A from tree pattern p • States(A) = Nodes(p). • For each node x with children x1,…,xk we add a transition (x1,x2,…,xk;x) -> x. • For each descendant edge e from node x to node y we add (y;e) -> x, and we add a self-loop (y;*) -> y. • The only accepting state is the root.

  31. Example for building FTA [Figure: a tree pattern rooted at r, over the labels a, b and *, next to the FTA built from it.]

  32. Step 2: Building an AFTA A’ from pattern p’ • States(A’) = Nodes(p’) ∪ Edges(p’). • (q,a) -> for every symbol a that has an outgoing edge e. If it is a descendant relationship, then we also add a self-loop on the source node. • (e1,e2,e3..) -> a for every a that has incoming edges.

  33. Example for building AFTA for pattern p’ [Figure: a tree pattern rooted at r, over the labels a, b and *, next to the AFTA built from it.]

  34. Conclusion for the containment algorithm • Sound • Complete. • Not always efficient.

  35. THE END
