On the Minimization of XPath Queries Paper by S. Flesca, F. Furfaro, E. Masciari CmpE 521 Presentation Emre Yurtsever – 2002701372 Muammer Yüzügüldü – 2003700183
Outline • Introduction • Trees & Tree Patterns • Problem Statement • A Framework for minimizing XPath queries • Complexity Results • Tractability Results • Conclusions & Future Works
Introduction • XML queries are usually expressed by means of XPath expressions. • An XPath expression is a way of navigating an XML tree to return the set of nodes reached through the paths the expression specifies.
Introduction – cont’d • An XPath expression can be represented graphically as a tree pattern. [Figure: an XML tree with the elements <bib>, <book>, <authors>, <author>, <title>, <editor> and <genre>]
Introduction – cont’d • For example: “find the titles of all the books for which at least one author is known.” • XPath expression: bib/book[//author]/title [Figure: a tree pattern with the nodes <bib>, <book>, <author> and <title>; the legend marks the descendant edge and the output node]
Introduction – cont’d • The efficiency of evaluating an XPath expression depends on its size. • Optimization ~ Minimization • So the expression should be minimized.
Introduction – cont’d • Example query: “Retrieve the editors that published thrillers and whose authors have written a thriller.” • The second condition is implied by the first; this is detected through “query containment”. • Reduced query: “Retrieve the editors that published thrillers.”
Introduction – cont’d • For some XPath fragments the minimization problem can be solved efficiently because: • it reduces to solving a number of instances of containment between pairs of tree patterns; • for these fragments, containment reduces to finding a homomorphism between the two patterns.
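Trees & Tree Patterns • A tree $t$ is a tuple $(r_t, N_t, E_t, \lambda_t)$ where: • $N_t \subseteq \mathbb{N}$ is the set of nodes. • $\lambda_t : N_t \to \Sigma$ is a node labelling function. • $r_t \in N_t$ is the distinguished root of $t$. • $E_t \subseteq N_t \times N_t$ is the set of edges.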
Trees & Tree Patterns – cont’d • Given a tree $t = (r_t, N_t, E_t, \lambda_t)$, a tree $t' = (r_{t'}, N_{t'}, E_{t'}, \lambda_{t'})$ is a subtree of $t$ if: • $N_{t'} \subseteq N_t$; • the edge $(n_i, n_j)$ belongs to $E_{t'}$ iff $n_i \in N_{t'}$, $n_j \in N_{t'}$ and $(n_i, n_j) \in E_t$.
Trees & Tree Patterns – cont’d • Definition: A tree pattern $p$ is a pair $(t_p, o_p)$, where: • $t_p = (r_p, N_p, E_p, \lambda_p)$ is a tree; • $E_p$ is partitioned into the two disjoint sets $C_p$ and $D_p$ denoting, respectively, the child and descendant branches; • $o_p \in N_p$ is a distinguished output node.
Trees & Tree Patterns – cont’d • Grammar for XPath expressions: exp → exp | exp/exp | exp//exp | exp[exp] | σ | * | . where σ is a symbol in Σ, and the symbol ‘.’ stands for the current node. • Given XPath expression: a[b/*//c]//d [Figure: the tree pattern of this expression, with the nodes a, b, d, * and c]
Trees & Tree Patterns – cont’d • Given a tree $t$ and a tree pattern $p$, an embedding $e$ of $p$ into $t$ is a total function $e : N_p \to N_t$ such that: • $e(r_p) = r_t$, • $\forall (x, y) \in C_p$, $e(y)$ is a child of $e(x)$ in $t$, • $\forall (x, y) \in D_p$, $e(y)$ is a descendant of $e(x)$ in $t$, and • $\forall x \in N_p$, if $\lambda_p(x) = a$ (where $a \neq *$) then $\lambda_t(e(x)) = a$.
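The four conditions above can be sketched as a small existence check. This is a minimal illustration, not code from the paper: the names `TreeNode`, `PatternNode`, `embeds_at` and `has_embedding` are hypothetical, the output node is ignored, and "descendant" is read as proper descendant.

```python
class TreeNode:
    """A node of the data tree t: a label plus a list of children."""
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


class PatternNode:
    """A node of the tree pattern p (hypothetical representation).

    `edge` is the kind of edge from the parent: "/" (in C_p) or "//" (in D_p).
    """
    def __init__(self, label, *children, edge="/"):
        self.label = label
        self.children = list(children)
        self.edge = edge


def descendants(node):
    """Yield every proper descendant of a tree node."""
    for child in node.children:
        yield child
        yield from descendants(child)


def embeds_at(p, n):
    """True if the subpattern rooted at p can be embedded with e(p) = n."""
    # Label condition: a non-wildcard label must match exactly.
    if p.label != "*" and p.label != n.label:
        return False
    for c in p.children:
        candidates = n.children if c.edge == "/" else descendants(n)
        # e need not be injective, so each branch is matched independently.
        if not any(embeds_at(c, m) for m in candidates):
            return False
    return True


def has_embedding(pattern_root, tree_root):
    """True if some embedding e exists with e(r_p) = r_t."""
    return embeds_at(pattern_root, tree_root)


# Tree pattern of the introduction example: author hangs below book on a
# descendant edge, title on a child edge (output node not modelled here).
pattern = PatternNode(
    "bib",
    PatternNode("book", PatternNode("author", edge="//"), PatternNode("title")),
)
with_author = TreeNode(
    "bib",
    TreeNode("book", TreeNode("authors", TreeNode("author")), TreeNode("title")),
)
without_author = TreeNode("bib", TreeNode("book", TreeNode("title")))

print(has_embedding(pattern, with_author))     # True
print(has_embedding(pattern, without_author))  # False
```

Because an embedding is only required to be a total function, two pattern branches may map to the same tree node, which is why the check can treat each branch separately.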
Trees & Tree Patterns – cont’d • Models and Canonical Models of Tree Patterns • The models of a tree pattern $p$ defined over the alphabet $\Sigma$ are the trees of $T$ into which $p$ can be embedded. The set of models of $p$ is $Mod(p) = \{t \in T \mid p(t) \neq \emptyset\}$ • Canonical models of a tree pattern $p$ are models having the same shape as $p$. That is, a canonical …
Trees & Tree Patterns – cont’d Model and Canonical Model of a tree pattern
Trees & Tree Patterns – cont’d • Given two tree patterns $p_1$, $p_2$, we say that $p_1$ is contained in $p_2$ ($p_1 \subseteq p_2$) iff $\forall t$, $p_1(t) \subseteq p_2(t)$. • We say that $p_1$ and $p_2$ are equivalent ($p_1 \equiv p_2$) iff $p_1 \subseteq p_2$ and $p_2 \subseteq p_1$ (i.e. $\forall t$, $p_1(t) = p_2(t)$). • The set of patterns which are equivalent to a given pattern $p$ will be denoted as $Eq(p)$.
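A small worked illustration of containment without equivalence may help here. The two patterns are chosen for illustration and do not come from the slides; $p(t)$ is read, as an assumption, as the set of nodes of $t$ that some embedding assigns to the output node, and $b$ is the output node of both patterns.

```latex
\[
  p_1 = a/b, \qquad p_2 = a/\!/b .
\]
Every child is also a descendant, so each embedding of $p_1$ into a tree $t$
is also an embedding of $p_2$ into $t$:
\[
  \forall t:\; p_1(t) \subseteq p_2(t)
  \quad\Longrightarrow\quad p_1 \subseteq p_2 .
\]
The converse fails. Let $t_0$ be the tree whose root $a$ has a single child
$c$, which in turn has a single child $b$:
\[
  p_1(t_0) = \emptyset, \qquad p_2(t_0) = \{b\}
  \quad\Longrightarrow\quad p_2 \not\subseteq p_1, \;\; p_1 \not\equiv p_2 .
\]
```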
Trees & Tree Patterns – cont’d • Notations on tree patterns. [Figure: a pattern $p$ and its subpatterns $sp_b$, $sp_d$, $sp_a$]
Trees & Tree Patterns – cont’d • [Figure: a tree pattern $p$ whose root has 2 children, with examples of its subpatterns]
Problem Statement “Given a tree pattern $p$, construct a tree pattern $p_{min}$ which is equivalent to $p$ and has minimum size (i.e. $size(p_{min}) = minsize(p)$)”
Problem Statement – cont’d • Property 1: a minimum size tree pattern equivalent to $p$ can be found among the subpatterns of $p$; • Property 2: the containment between two tree patterns $p$, $q$ ($p \subseteq q$) is equivalent to the problem of finding a homomorphism from $q$ to $p$. A homomorphism $h$ from a pattern $q$ to a pattern $p$ is a total mapping from the nodes of $q$ to the nodes of $p$ such that: • $h$ preserves node types (i.e. $\forall u \in N_q$, $\lambda_q(u) \neq$ ‘*’ $\Rightarrow \lambda_q(u) = \lambda_p(h(u))$); • $h$ preserves structural relationships (i.e. whenever $v$ is a child (resp. descendant) of $u$ in $q$, $h(v)$ is a child (resp. descendant) of $h(u)$ in $p$).
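The homomorphism conditions can be sketched as a recursive search. This is a hedged illustration only: the class `P` and the functions `below`, `maps_to` and `homomorphism_exists` are hypothetical names, the output node is not modelled, and the requirement that the root of q maps to the root of p is an assumption added here rather than something stated on the slide.

```python
class P:
    """Pattern node; `edge` is the edge from the parent: "/" or "//"."""
    def __init__(self, label, *children, edge="/"):
        self.label = label
        self.children = list(children)
        self.edge = edge


def below(node):
    """Yield all proper descendants of a pattern node, via edges of any kind."""
    for child in node.children:
        yield child
        yield from below(child)


def maps_to(u, x):
    """True if the part of q rooted at u has a homomorphism h with h(u) = x."""
    # Node types: a non-wildcard label in q must be matched exactly in p.
    if u.label != "*" and u.label != x.label:
        return False
    for v in u.children:
        if v.edge == "/":
            # A child edge of q must land on a child edge of p.
            targets = [c for c in x.children if c.edge == "/"]
        else:
            # A descendant edge of q may land on any proper descendant in p.
            targets = below(x)
        if not any(maps_to(v, y) for y in targets):
            return False
    return True


def homomorphism_exists(q, p):
    """True if there is a homomorphism from q to p (roots mapped to roots)."""
    return maps_to(q, p)


p = P("a", P("b"))             # a/b
q = P("a", P("b", edge="//"))  # a//b

print(homomorphism_exists(q, p))  # True: witnesses a/b contained in a//b
print(homomorphism_exists(p, q))  # False
```

Note the direction: the mapping goes from the containing pattern q to the contained pattern p.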
Problem Statement– cont’d A homomorphism between two tree patterns
Problem Statement– cont’d Two tree patterns not related with homomorphism
A framework for minimizing XPath Queries • Two fundamental contributions: • A proof that Property 1 holds for XP{/, //, [], *} • An algorithm for minimizing a tree pattern query
Proving that Property 1 holds for XP{/, //, [], *} • Various lemmas are introduced. • Lemma 1: Let $p$ and $q$ be two patterns with root $r$ such that $p \subseteq q$. Then, for each subpattern $q_j \in P(q)$ there exists a subpattern $p_i \in P(p)$ such that $p_i \subseteq q_j$.
Proving that Property 1 holds for XP{/, //, [], *} • Lemma 2: Let $p$ and $q$ be two patterns rooted in $r$ such that $p \equiv q$, and let $m$ and $n$, with $m > n$, be the number of children of $r$ in $p$ and $q$, respectively. Then there exists a set $S \subseteq SP(p)$ consisting of $m - n$ subpatterns $sp_i$ such that $p - S \equiv p$.
Proving that Property 1 holds for XP{/, //, [], *} • Lemma 3: Let $p$ and $q$ be two equivalent patterns rooted in $r$ having the same number of child and descendant nodes of $r$, and let $q$ be of minimum size. Then there exists no subpattern $sp_k \in SP(p)$ such that $p - sp_k \equiv p$.
Proving that Property 1 holds for XP{/, //, [], *} • Lemma 4: Let $p$ and $q$ be two equivalent patterns whose roots have the same number of child and descendant nodes, and let $q$ be of minimum size. For each subpattern $p_i \in P(p)$ there exists a unique subpattern $q_j \in P(q)$ directly connected to $r_q$ such that $p_i \equiv q_j$.
Proving that Property 1 holds for XP{/, //, [], *} • Lemma 5: A pattern $p$ in XP{/, //, [], *} is not of minimum size iff at least one of the following conditions holds: • there exists a pair of subpatterns $p_i$, $p_j$ such that $p_i \subseteq p_j$; • there exists a subpattern $p_i$ of $p$ which is not of minimum size.
Proving that Property 1 holds for XP{/, //, [], *} • Theorem 1: Given a pattern $p$ in XP{/, //, [], *}, if $minsize(p) = k$ then there exists a subpattern $p_{min}$ of $p$ such that $p \equiv p_{min}$ and $size(p_{min}) = k$.
An Algorithm for tree pattern minimization
Function Minimize(p: a tree pattern): pmin, a minimum tree pattern equivalent to p
Begin
    pmin = p;
    For each pi ∈ P(pmin) do
        if (pmin − spi ⊆ pmin) then pmin = pmin − spi;
    SPnew = ∅;
    For each spi ∈ SP(pmin) do
        SPnew = SPnew ∪ Minimize(spi);
    pmin = assemble(pmin, SPnew);
    return pmin;
End
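The prune-then-recurse shape of the algorithm above can be sketched in Python. This is not the paper's implementation. As a stated simplification, the containment test is replaced by a homomorphism search: a homomorphism is a sound witness of containment, but in the full fragment XP{/, //, [], *} it can miss some containments, so this sketch may return a pattern that is smaller than the input yet not minimal. The names `P`, `maps_to`, `size` and `minimize` are hypothetical.

```python
class P:
    """Pattern node; `edge` is the edge from the parent, `output` marks o_p."""
    def __init__(self, label, *children, edge="/", output=False):
        self.label = label
        self.children = list(children)
        self.edge = edge
        self.output = output


def below(node):
    """Yield all proper descendants of a pattern node."""
    for child in node.children:
        yield child
        yield from below(child)


def maps_to(u, x):
    """True if the pattern rooted at u has a homomorphism h with h(u) = x."""
    if u.label != "*" and u.label != x.label:
        return False
    if u.output and not x.output:  # the output node must map to the output node
        return False
    for v in u.children:
        if v.edge == "/":
            targets = [c for c in x.children if c.edge == "/"]
        else:
            targets = below(x)
        if not any(maps_to(v, y) for y in targets):
            return False
    return True


def size(p):
    """Number of nodes of the pattern rooted at p."""
    return 1 + sum(size(c) for c in p.children)


def minimize(p):
    """Prune branches of the root that are redundant, then recurse."""
    kept = list(p.children)
    for sp in list(kept):
        rest = [c for c in kept if c is not sp]
        current = P(p.label, *kept, edge=p.edge, output=p.output)
        pruned = P(p.label, *rest, edge=p.edge, output=p.output)
        # pruned ⊆ current is witnessed by a homomorphism current -> pruned
        # (sound, but possibly incomplete: an assumption of this sketch).
        if maps_to(current, pruned):
            kept = rest
    return P(p.label, *(minimize(c) for c in kept), edge=p.edge, output=p.output)


# a[b][b/c] with output node a: the branch [b] is implied by [b/c].
pattern = P("a", P("b"), P("b", P("c")), output=True)
reduced = minimize(pattern)
print(size(pattern), size(reduced))  # 4 3
```

A branch that contains the output node is never pruned, because the output node would then have no admissible image.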
Upper Bound • Algorithm 1 works in $O(b \cdot r \cdot |p|^2 \cdot (w+1)^{d+1})$, where: • $|p|$ is the size of $p$ • $d$ is the number of descendant edges in $p$ • $w$ is the length of the longest chain of ‘*’ in $p$ • $b$ is the number of branches of $p$ • $r$ is the maximum degree of any node of $p$
Complexity Results • In XP{/, //, [], *} it is not possible to define an algorithm that performs much better than Algorithm 1. • Lemma 6: Let $p$ be a pattern in XP{/, //, [], *} and $k$ a positive integer. Testing whether $minsize(p) > k$ is an NP-complete problem.
Complexity Results • Theorem 2: Let $p$ be a pattern in XP{/, //, [], *} and $k$ a positive integer. Testing whether there exists a pattern $p'$ equivalent to $p$ such that $size(p') \leq k$ is coNP-complete.
Tractability Results • Definition: A limited branched tree pattern $p$ is a tree pattern in XP{/, //, [], *} such that: • every non-leaf node of $p$ may have any number of children; • if a node $n$ has $k$ children $n_1, \ldots, n_k$, then at least $k-1$ of the patterns $sp_{n_i}$ (where $i \in [1..k]$) are linear.
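The definition can be checked mechanically. The sketch below is illustrative only: `P`, `is_linear` and `is_limited_branched` are hypothetical names, and "linear" is assumed to mean a pattern in which every node has at most one child, since the slides do not define the term.

```python
class P:
    """Pattern node; `edge` is the edge from the parent: "/" or "//"."""
    def __init__(self, label, *children, edge="/"):
        self.label = label
        self.children = list(children)
        self.edge = edge


def is_linear(p):
    """True if every node of the pattern rooted at p has at most one child."""
    node = p
    while node.children:
        if len(node.children) > 1:
            return False
        node = node.children[0]
    return True


def is_limited_branched(p):
    """True if, at every node with k children, at least k-1 subpatterns are linear."""
    non_linear = [c for c in p.children if not is_linear(c)]
    if len(non_linear) > 1:
        return False
    return all(is_limited_branched(c) for c in p.children)


# r[b/a][b[c][//d]]: only one child of r roots a non-linear subpattern.
ok = P("r", P("b", P("a")), P("b", P("c"), P("d", edge="//")))
# r[a[x][y]][b[x][y]]: two children of r root non-linear subpatterns.
bad = P("r", P("a", P("x"), P("y")), P("b", P("x"), P("y")))

print(is_limited_branched(ok))   # True
print(is_limited_branched(bad))  # False
```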
Example [Figure: a tree pattern with nodes labelled r, a, b, c, d and *]
Tractability Results • Theorem: Let $p$ be a limited branched tree pattern. A minimum pattern $p_{min}$ equivalent to $p$ can be found in polynomial time (w.r.t. the size of $p$). • Linear patterns have minimum size. • The containment between pairs of linear patterns can be decided in polynomial time.
Tractability Results • Algorithm 2
Function Minimize(p: a limited branched tree pattern): pmin, a minimum tree pattern equivalent to p
Begin
    pmin = p;
    B = {b1, ..., bm};
    while (B ≠ ∅)
        b = deepest(B);
        B = B − {b};    (assumed step, not shown on the slide; without it the loop never ends)
        q = spb;
        Redq = ∅;
        For each qi ∈ P(q) do
            For each qj ∈ P(q) do
                if ((i ≠ j) ∧ (qi is linear) ∧ (qj ∉ Redq) ∧ (qj ⊆ qi))
                    Redq = Redq ∪ {qi};
        q = q − Redq;
        pmin = replace(pmin, spb, q);
    end while
    return pmin;
End
Conclusion • The global minimality property has been proved: a minimum tree pattern equivalent to a given tree pattern $p$ can be found among the subpatterns of $p$, and thus obtained by pruning “redundant” branches from $p$. • The complexity of the minimization problem has been characterized, showing that the corresponding decision problem is coNP-complete.
Conclusion • A “tractable” form of tree pattern, which can be minimized in polynomial time, has been studied. • The paper proposes algorithms for this minimization.
Future Works • Extending the minimization framework to deal with XPath queries that must satisfy some constraints, such as join conditions on tree pattern nodes. • The introduction of these constraints makes the minimization problem harder, and the global minimality property no longer holds.
Questions? Thank you...