On the Inverse rules algorithm

abril

  1. On the Inverse rules algorithm
     It is guaranteed to compute the certain answers.
     But what about its efficiency? As presented, it computes tuples using views that cannot contribute to the rewriting, and then discards these tuples.
     We show examples, and then how to address the problems.
     lav-iv

  2. Example:
     A db: parenthood relation par(c, p)
     A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
     A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
     The algorithm inverts the view: par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
     Given n tuples in the view, it produces 2n tuples, then joins them, then discards the results that contain f(-,-).
     The bucket algorithm will spend more time on rewriting, and find: Q’(X, Y) :- v(X, Y)
     and then output the n results.
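
The work described in slide 2 can be sketched in a few lines. This is a minimal illustration, not the algorithm as implemented anywhere: the sample names in `view` and the tuple encoding `("f", c, g)` for the Skolem term f(C, G) are assumptions made for the example.

```python
def invert_view(view_tuples):
    """Apply the inverse rules par(C, f(C,G)), par(f(C,G), G) :- v(C, G)."""
    par = set()
    for c, g in view_tuples:
        skolem = ("f", c, g)      # f(C, G): the unknown intermediate parent
        par.add((c, skolem))
        par.add((skolem, g))
    return par

def is_skolem(term):
    return isinstance(term, tuple) and len(term) > 0 and term[0] == "f"

def grandchildren(par):
    """q(X, Y) :- par(X, Z), par(Z, Y); answers containing f(-,-) are dropped."""
    answers = set()
    for x, z in par:
        for z2, y in par:
            if z == z2 and not is_skolem(x) and not is_skolem(y):
                answers.add((x, y))
    return answers

# Hypothetical view contents: n = 2 tuples.
view = {("ann", "carl"), ("bob", "dana")}
par = invert_view(view)
print(len(par))                      # 2n = 4 derived par facts
print(grandchildren(par) == view)    # True: the n view tuples come back
```

The join touches all 2n derived facts only to return the view's own tuples, which is the inefficiency the slide points at.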

  3. Example (university db):
     Views:
     v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98
     v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
     v3(s, c) :- registered(s, c, q), q<=a94
     v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97
     Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
     Inverting v3: registered(s, c, f(s, c)) :- v3(s, c)
     This may produce any number of facts for registered, but for this query none can be used. Why?

  4. v3(s, c) :- registered(s, c, q), q<=a94
     q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
     • How should the constraint on q in v3 be represented? Could export it as f(s, c) <= a94, then notice the conflict with f(s, c) >= a95 in the query (how is q in the query transformed to f(s, c)?). But what if the view contained no constraint?
     • The view must export variables constrained in the query
     • The query has a join on q with teaches; teaches facts are derived only from other views, so q will be exported as a different function symbol, or as q (which of these here?) ⇒ the join will fail (cannot join f1(-,-) with f2(-,-) or with a regular variable) ⇒ the view must export join variables of the query

  5. The factors that determine usability of a view are the same as in the bucket algorithm, but the inverse rules algorithm tries to use all views anyway.
     Solution: compose the query with the inverse rules, to obtain a new query that uses the views directly.
     Composition:
     Consider the heads of the inverse rules as a db (a collection of facts).
     Look for valuations: mappings of query variables that map query atoms to this db.
     Then replace query goals by views.

  6. Example:
     A db: parenthood relation par(c, p)
     A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
     A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
     The algorithm inverts the view (the rule heads serve as the ‘db’): par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
     Two candidate valuation mappings:
     X → C, Z → f(C, G), Y → G ⇒ q(C, G) :- v(C, G), v(C, G)
     X → f(C, G), Z → G, Y → f(C, G) ⇒ (assuming we add C=G) q(f(G, G), f(G, G)) :- v(G, G), v(G, G)
     The 2nd is discarded: no function symbols are allowed in the result.
     Minimization of the 1st gives q(C, G) :- v(C, G), the same as bucket.
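
The composition step of slides 5 and 6 can be approximated by unifying each query atom with the head of an inverse rule. The sketch below is one possible reading, not the slides' exact procedure: it renames the rule variables apart for each use (the slide reuses C and G directly), it only checks the query head for function symbols, and all helper names (`unify`, `compose`, `RULES`) are hypothetical. Variables are capitalised strings; f(C, G) is the tuple `("f", "C", "G")`.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return a substitution extending s that makes a and b equal, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t

def rename(t, i):
    """Rename rule variables apart for the i-th use of a rule."""
    if is_var(t):
        return f"{t}{i}"
    return tuple(rename(x, i) for x in t) if isinstance(t, tuple) else t

# Inverse rules of v(C, G): (arguments of the par head, view atom in the body).
RULES = [
    (("C", ("f", "C", "G")), ("v", "C", "G")),
    ((("f", "C", "G"), "G"), ("v", "C", "G")),
]

def compose(head, body, rules):
    rewritings = set()
    for choice in product(rules, repeat=len(body)):
        s, views = {}, []
        for i, (args, (rule_head, view)) in enumerate(zip(body, choice)):
            s = unify(args, rename(rule_head, i), s)
            if s is None:
                break
            views.append(rename(view, i))
        else:
            new_head = resolve(head, s)
            if not any(isinstance(t, tuple) for t in new_head):  # no f(-,-) in the result
                rewritings.add((new_head, frozenset(resolve(v, s) for v in views)))
    return rewritings

# q(X, Y) :- par(X, Z), par(Z, Y)
result = compose(("X", "Y"), [("X", "Z"), ("Z", "Y")], RULES)
print(result)
```

Storing the body as a set collapses the duplicated view atom, which plays the role of the minimization step in this small case; a general minimization would need a containment check.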

  7. q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
     registered(s, c, f(s, c)), f(s, c)<=a94 :- v3(s, c)
     Any valuation that uses this fact must map q → f(s, c)
     • The constraint f(s, c) <= a94 conflicts with f(s, c) >= a95, but what if there is no constraint to export?
     • The mapping q → f(s, c) cannot be used to map teaches to any fact derived from other views ⇒ v3 cannot be used
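
The first bullet of slide 7 amounts to a satisfiability check on the bounds collected for one term. A minimal sketch, assuming the quarter codes a94, a95, a97 can be compared as the integers 94, 95, 97 (an encoding chosen here for illustration only); the function name `satisfiable` is hypothetical.

```python
def satisfiable(bounds):
    """bounds: list of (op, value) on a single term, with op in {'<=', '>='}."""
    lo = max((v for op, v in bounds if op == ">="), default=float("-inf"))
    hi = min((v for op, v in bounds if op == "<="), default=float("inf"))
    return lo <= hi

view_bounds = [("<=", 94)]    # f(s, c) <= a94, exported by v3
query_bounds = [(">=", 95)]   # q >= a95, with q mapped to f(s, c)
print(satisfiable(view_bounds + query_bounds))   # False: v3 is ruled out
```

As the slide notes, this test alone is not enough: when the view exports no constraint, v3 is still ruled out by the failed join on q.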

  8. A mapping will fail to define a valuation if
     • A view does not export a join variable, and does not contain the join (why?)
     • The view does not export a variable that is constrained in the query (cannot ‘check’ the constraint in the ‘db’)
     Thus, the results (for a CQ query, possibly with constraints) will be the same as for bucket (assuming it is correct & complete).
     The amount of work invested will probably be similar.
     Composition can also be performed for Datalog queries, but weeding out useless mappings is more difficult.

  9. The MiniCon algorithm: the final one?
     Motivation
     Preliminaries
     The MiniCon algorithm

  10. Motivation
      Previous algorithms (bucket, inverse rules) may be quite expensive to use, especially for systems with many views.
      The bucket algorithm has a narrow peephole in the 1st stage: each bucket is for a single atom ⇒ global constraints are treated only in the 2nd stage ⇒ many useless combinations may be examined.
      The inverse rules algorithm, improved by composition, seems to perform similar work.
      The motivation: find an algorithm that will do more work in preliminary filtering, and will scale up to hundreds of views.

  11. Preliminaries
      The idea:
      • Once a view is put in the bucket of a query atom, switch to considering join variables, and find which other atoms are necessarily covered by the view
      • Along the way, find out also which view head variables need to be equated
      • Given coverage by views, combine views with disjoint covers
      Expected gain:
      • More filtering in the 1st stage
      • Better representation of information
      ⇒ A smaller number of combinations, and a reduced number of containment checks in the 2nd stage

  12. Example:
      A db: parenthood relation par(c, p)
      A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
      A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
      Bucket: one view in each bucket
      par(X, Z): {v(X, G)}
      par(Z, Y): {v(P, Y)}
      When the two view atoms are combined, a containment check discovers that G=Y is needed for containment, and that the 2nd atom is redundant.
      Alternative: given par(X, Z): v(X, G), since Z (a join var) occurs in the 2nd atom of the query, add par(Z, Y) to the coverage of v(X, G), with G=Y.
      In the 2nd stage, just use v(X, Y).

  13. Assumptions, terminology:
      • CQ queries and views; for now, no constants / constraints in query / views
      • View definitions use variables different from those in the query or other views (disjoint sets of variables)
      • b(Q): body atoms of Q; b(V): body atoms of view V
      • A mapping from vars(Q) to vars(V) is interesting only if it maps a non-empty subset of b(Q) to b(V)
      • Considered mappings always map Q head vars to V head vars: head var preservation (hvp)
      • If h maps x in vars(Q) to an existential var in some V, then all atoms of b(Q) that contain x must be mapped to the same V: join variable condition (jvc)
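
The two conditions (hvp) and (jvc) from slide 13 can be written as small predicates. This is a sketch under assumptions: atoms are tuples such as `("par", "X", "Z")`, a mapping h is a dict from query variables to view variables, equalities on head variables are ignored, and the function names are hypothetical.

```python
def satisfies_hvp(h, query_head_vars, view_head_vars):
    """Head variable preservation: mapped head vars of Q land on head vars of V."""
    return all(h[x] in view_head_vars for x in query_head_vars if x in h)

def satisfies_jvc(h, covered, query_body, view_head_vars):
    """Join variable condition: if x maps to an existential var of V,
    every query atom that contains x must be in the covered set."""
    for x, y in h.items():
        if y not in view_head_vars:              # y is existential in V
            for atom in query_body:
                if x in atom[1:] and atom not in covered:
                    return False
    return True

# Q: q(X, Y) :- par(X, Z), par(Z, Y);  V: v(C, G) :- par(C, P), par(P, G)
q_head, q_body = {"X", "Y"}, [("par", "X", "Z"), ("par", "Z", "Y")]
v_head = {"C", "G"}

h_partial = {"X": "C", "Z": "P"}                 # covers only par(X, Z)
h_full = {"X": "C", "Z": "P", "Y": "G"}          # covers both atoms
h_bad = {"X": "P", "Z": "G"}                     # 1st query atom to 2nd view atom

print(satisfies_hvp(h_partial, q_head, v_head))                  # True
print(satisfies_jvc(h_partial, [q_body[0]], q_body, v_head))     # False: Z maps to P
print(satisfies_jvc(h_full, q_body, q_body, v_head))             # True
print(satisfies_hvp(h_bad, q_head, v_head))                      # False: X maps to P
```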

  14. Given Q(X), assume Q’ is a rewriting in terms of views
      Q’: q(X) :- v1(X1), …, vn(Xn) (some vi, vj may be occurrences of the same view v)
      • There exists a containment mapping h from Q to exp(Q’) (it satisfies hvp)
      Let
      • Gi be the set of atoms of b(Q) mapped to b(exp(vi))
      • h/i be h restricted to vars(Gi)
      Then [formula missing from transcript]
      and Gi satisfies (jvc): if h/i maps x of vars(Gi) to an existential variable of vi, then every atom g in b(Q) that contains x is in Gi

  15. The occurrence of vi in Q’ may have some head variables equated
      Example: the original head might be vi(A, B, C), the head in Q’: vi(X, X, Z)
      These equalities are given by a unique least set of equality constraints Ei
      (v/E: the view v, with head variables equated as specified by E)
      Summary (so far): the containment mapping can be decomposed into “disjoint” components (vi, Ei, h/i, Gi)
      All we need to do is find such components, then combine them
      What is the condition for successful combination?
      Does a combination (s.t. [condition missing from transcript]) ever fail?

  16. To find such components, we must use the given view definitions (variables different from those of Q or exp(Q’)).
      Answer: a component and its mapping can be expressed as:
      [Diagram: h/i : Gi → exp(vi(Xi)), factored as hi : Gi → vi/E’i followed by h’i : vi/E’i → exp(vi(Xi))]
      Here:
      • hi is a mapping from Q to the given view definition for vi
      • E’i is the least set of equalities that make hi a good mapping
      • h’i is a variable renaming
      E’i and hi depend only on Q and the definition of vi
      • We can find component mappings from Q to the view defs, then combine & rename, possibly equating more head vars

  17. One more step:
      A component (vi, Ei, hi, Gi) may be further decomposed into smaller components (vi, Ei1, hi1, Gi1), (vi, Ei2, hi2, Gi2) provided
      • Each of Gi1, Gi2 satisfies (jvc), and they are disjoint
      • Each of Ei1, Ei2 is a subset of Ei, and is the least set for the mapping hi1, hi2 (respectively) to be ok
      When these are combined, Ei1 union Ei2 is augmented with the remaining equalities of Ei
      Minimal such components:
      • Easier to find
      • Can be re-used for different combinations

  18. What is a minimal component?
      C = (vi, Ei, hi, Gi) is minimal if
      • hi satisfies (hvp) + (jvc) (assuming the equalities in Ei)
      • There is no component C1 whose last three components are contained in C’s last three components (with at least one proper containment)
      A component: minicon (mini containment) description, MCD
      The algorithm constructs and combines minimal MCDs

  19. The MiniCon Algorithm
      Minimal MCD Construction Algorithm:
      For each g in b(Q), each k in each b(vi)
        Let E(g,k) be the least set of equalities s.t. a mapping h(g,k) from g to k that satisfies (hvp) exists
        // E(g,k) and h(g,k), if they exist, are uniquely determined by g, k
        If E(g,k) and h(g,k) exist
          find all minimal MCDs that extend them:
          (vi, Ei, hi, Gi) extends them if Ei contains E(g,k), hi contains h(g,k), Gi contains g
      For the final set of MCDs, remove duplicates

  20. How do we find minimal MCDs that extend a given mapping?
      I. Extension to one more query atom, one view atom
      extend(vi, E, h, g, k)
      // E: equalities on head vars of vi
      // h: vars(Q) → vars(vi), partial, hvp with E
      // g in b(Q), k in b(vi)
      Try to extend h to map g to k, with hvp, by adding equalities to E.
      Return fail, or the (uniquely determined) E’, h’.
      (The first step in the algorithm of the previous page is this one, given empty E and h.)

  21. How do we find minimal MCDs that extend a given mapping?
      II. Extend repeatedly, as long as needed and successful
      Given vi, g, k, E(g,k) and h(g,k):
      Let C = {(vi, E(g,k), h(g,k), {g})}, MC = {} // C: initial component, (jvc) possibly not satisfied
      While C is not empty
      • remove some c = (vi, E, h, G) from C
      • if (jvc) is satisfied, put c in MC
      • if not, there exist x in vars(Q) s.t. h(x) is existential, and g’ that contains x, with g’ not in G
      • for each k’ in b(vi): if extend(vi, E, h, g’, k’) succeeds, put the extension in C
      Remove duplicates from MC
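
The two steps of slides 20 and 21 (extend by one atom, then repeat until (jvc) holds) can be sketched as follows. This is a simplified reading for CQs without constants, not a reference implementation: atoms are tuples, the equalities E are kept as a dict `rep` from a head variable to its representative, and every name below is hypothetical. The extra pruning mentioned in slide 23 is not applied.

```python
def extend(view_head, rep, h, g, k, q_head):
    """Try to map query atom g onto view atom k; return (rep', h') or None."""
    if g[0] != k[0] or len(g) != len(k):
        return None
    rep, h = dict(rep), dict(h)
    for x, y in zip(g[1:], k[1:]):
        y = rep.get(y, y)
        if x in q_head and y not in view_head:       # (hvp) would be violated
            return None
        if x not in h:
            h[x] = y
        elif h[x] != y:
            if h[x] in view_head and y in view_head:  # equate two head vars
                keep, drop = sorted((h[x], y))
                rep = {v: (keep if r == drop else r) for v, r in rep.items()}
                rep[drop] = keep
                h = {q: (keep if v == drop else v) for q, v in h.items()}
            else:
                return None                           # existential vars cannot be equated
    return rep, h

def uncovered_atom(h, G, q_body, view_head):
    """Return a query atom that (jvc) forces into G, or None if (jvc) holds."""
    for x, y in h.items():
        if y not in view_head:
            for atom in q_body:
                if x in atom[1:] and atom not in G:
                    return atom
    return None

def find_mcds(q_head, q_body, view_name, view_head, view_body):
    found = set()                                     # a set removes duplicates
    for g in q_body:
        for k in view_body:
            start = extend(view_head, {}, {}, g, k, q_head)
            if start is None:
                continue
            work = [(start[0], start[1], frozenset([g]))]
            while work:
                rep, h, G = work.pop()
                bad = uncovered_atom(h, G, q_body, view_head)
                if bad is None:
                    found.add((view_name, frozenset(rep.items()),
                               frozenset(h.items()), G))
                    continue
                for k2 in view_body:
                    ext = extend(view_head, rep, h, bad, k2, q_head)
                    if ext is not None:
                        work.append((ext[0], ext[1], G | {bad}))
    return found

q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
v_body = [("par", "C", "P"), ("par", "P", "G")]
mcds = find_mcds({"X", "Y"}, q_body, "v", {"C", "G"}, v_body)
print(len(mcds))   # 1: the single MCD covering both query atoms
```

Each extension adds one query atom to G, so the inner loop stops after at most |b(Q)| rounds per starting pair.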

  22. Example:
      A db: parenthood relation par(c, p)
      A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
      A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
      MCDs:
      • 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}
        need to extend to par(Z, Y), which can only map to the 2nd view atom
        MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
      • 1st query atom, 2nd view atom: no mapping
      …
      The only MCD is the above

  23. Comment:
      In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both minimal extensions, and Gi1 is contained in Gi2, then the 2nd is thrown away (another minimization).
      I do not know how to explain this optimization, or prove that with it the algorithm is still complete.

  24. 2nd phase: MCD combination, and variable renaming:
      A set of MCDs {(vi, Ei, hi, Gi)} is a candidate if [condition missing from transcript]
      For each candidate set:
      Rename variables: for each view variable y:
        If hi(x) = y (y a view variable), rename y to x
        else rename y to a fresh distinct variable
      Note: if x is in the domain of both hi, hj, then hi(x), hj(x) are head variables of vi, vj (by def of MCD) ⇒ renaming makes them equal

  25. Example (cont’d):
      A db: parenthood relation par(c, p)
      A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
      A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
      MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
      Rename in v: C to X, G to Y
      Rewriting: q(X, Y) :- v(X, Y)

  26. Example:
      A db: parenthood relation par(c, p)
      A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
      A query: Q: q(X, X) :- par(X, Z), par(Z, X) // I am my own grandpa
      MCDs:
      • 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}
        need to extend to par(Z, X), which can only map to the 2nd view atom
        MCD: (v, {C=G}, {X → C, Z → P}, b(Q))
      • 1st query atom, 2nd view atom: no mapping
      …
      The only MCD is the above

  27. Example:
      A db: parenthood relation par(c, p)
      A view: v(C, P) :- par(C, P), par(P, G) // parents where grandparents exist
      A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
      MCDs:
      • h(1,1) = {X → C, Z → P}, E(1,1) = {} ⇒ MCD A1 = ( v(C, P), {}, h(1,1), {par(X, Z)} )
      • h(1,2) = {X → P, Z → G}, E(1,2) = {}, fails (why?)
      • h(2,1) = {Z → C, Y → P}, E(2,1) = {} ⇒ MCD A2 = ( v(C, P), {}, h(2,1), {par(Z, Y)} )
      • h(2,2) = {Z → P, Y → G}, fails (why?)

  28. A view: v(C, P) :- par(C, P), par(P, G)
      A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
      MCDs:
      A1 = ( v(C, P), {}, h(1,1), {par(X, Z)} )
      A2 = ( v(C, P), {}, h(2,1), {par(Z, Y)} )
      Rewritings: (rename views to have distinct vars)
      A1+A2: X → C1, Z → P1, Z → C2, Y → P2: add P1 (in 1st v) = C2 (in 2nd v)
      rewriting: v(C1, P1), v(P1, P2)
      renaming: v(X, Z), v(Z, Y), a correct rewriting
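
The combination and renaming phase can be sketched on the MCDs A1 and A2 above. The candidate test used here (the covers Gi are pairwise disjoint and together equal b(Q)) is an assumption based on slide 11's "combine views with disjoint covers"; the dict layout of an MCD, the fresh-variable names `F1, F2, …` and the function name `combine` are likewise hypothetical. The sketch also assumes each hi is one-to-one.

```python
from itertools import combinations, count

def combine(q_head, q_body, mcds):
    """Build one rewriting per set of MCDs whose covers partition b(Q)."""
    fresh = count(1)
    rewritings = []
    for r in range(1, len(mcds) + 1):
        for subset in combinations(mcds, r):
            covers = [m["G"] for m in subset]
            covered = set().union(*covers)
            if covered != set(q_body) or sum(map(len, covers)) != len(covered):
                continue                      # not covering, or covers overlap
            body = []
            for m in subset:
                rep = dict(m["E"])            # head-var equalities: var -> representative
                back = {y: x for x, y in m["h"].items()}
                args = []
                for y in m["head"]:
                    y = rep.get(y, y)
                    if y not in back:         # head var not hit by h: fresh variable
                        back[y] = f"F{next(fresh)}"
                    args.append(back[y])      # otherwise rename y to the query var x
                body.append((m["view"],) + tuple(args))
            rewritings.append((("q",) + tuple(q_head), body))
    return rewritings

q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
A1 = {"view": "v", "head": ("C", "P"), "E": {}, "h": {"X": "C", "Z": "P"}, "G": {q_body[0]}}
A2 = {"view": "v", "head": ("C", "P"), "E": {}, "h": {"Z": "C", "Y": "P"}, "G": {q_body[1]}}

print(combine(("X", "Y"), q_body, [A1, A2]))
# [(('q', 'X', 'Y'), [('v', 'X', 'Z'), ('v', 'Z', 'Y')])]
```

Because both occurrences of v rename their head variable for Z to the same query variable, the equality P1 = C2 from the slide falls out of the renaming without a separate step.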

  29. When Q or views contain constants:
      MCD formation:
      • A constant a of Q must be mapped to a head variable of vi, or to itself
      • If x is in headvar(Q), it can be mapped to headvar(vi) or to a
      • Whenever x is mapped to a, hi records this fact
      MCD combination: if A1, A2 are defined on x, then allow also
      • Both map x to a
      • One maps x to a, the other to a head var of the view
      • In either case, rename x to a in the rewriting

  30. When Q or views contain comparisons:
      • If views contain comparisons, no change to the algorithm (it finds contained rewritings anyway)
      • If Q contains comparisons, then there may be no Datalog program that computes the certain answers (can express x != y)
      But we can expect that extending the algorithm for comparisons will be a good heuristic, and will find certain answers in many cases.

  31. When Q or views contain comparisons:
      C(Q): constraints of Q (closed under inference)
      MCD formation: (vi, Ei, hi, Gi) (extend the join variable condition)
      • If hi(x) is an existential of vi, and c(x, y) is in C(Q), then hi(y) is defined
      • C(vi) must imply all constraints in hi(C(Q)) that involve at least one existential of vi
      MCD combination:
      Add all constraints of C(Q) not covered by those of the views
