CS240A: Databases and Knowledge Bases
From Differential Fixpoints to Magic Sets
Notes from Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari, Morgan Kaufmann, 1997
Carlo Zaniolo
Department of Computer Science
University of California, Los Angeles
Recursive Predicates
r1: anc(X, Y) ← parent(X, Y).
r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).
r2 is a recursive rule, a left-linear one. r1 is a nonrecursive rule defining a recursive predicate; this is called an exit rule.
An alternative definition for anc:
r3: anc(X, Y) ← parent(X, Y).
r4: anc(X, Z) ← anc(X, Y), anc(Y, Z).
Here r4 is a quadratic rule.
Fixpoint Computation (cont.)
Naïve Fixpoint Algorithm for P (M = ∅, for now)
S := M; S′ := T_P(M)
while S ⊂ S′ { S := S′; S′ := T_P(S) }
We can replace the first T_P with T_E and the second one with T_R, which denote the immediate consequence operators for the exit rules and the recursive rules, respectively.
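The naïve loop above can be sketched in Python for the anc program. This is only an illustration under stated assumptions: relations are modeled as Python sets of pairs, the names `PARENT`, `t_exit`, `t_rec` and `naive_fixpoint` are hypothetical, and the loop applies the full operator T_P(S) = T_E ∪ T_R(S) at every step.

```python
# Hypothetical sample facts; the names are illustrative only.
PARENT = {("anne", "silvia"), ("silvia", "marc")}

def t_exit(parent):
    """T_E: exit rule anc(X, Y) <- parent(X, Y)."""
    return set(parent)

def t_rec(anc, parent):
    """T_R: recursive rule anc(X, Z) <- anc(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}

def naive_fixpoint(parent):
    s = set()                    # S := M (M is empty)
    s_next = t_exit(parent)      # S' := T_E(M)
    while s < s_next:            # while S is a proper subset of S'
        s = s_next               # S := S'
        # S' := T_P(S); every atom of S is derived again here,
        # which is the redundancy the differential fixpoint removes.
        s_next = t_exit(parent) | t_rec(s, parent)
    return s

ANC = naive_fixpoint(PARENT)
```

On the sample facts, `ANC` contains the two parent pairs plus the derived pair for anne and marc.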
Differential Fixpoint (a.k.a. Seminaive Computation)
Redundant computation: the jth iteration step also recomputes all atoms obtained in the (j − 1)th step.
1. S is the set of atoms obtained up to step j − 1.
2. S′ = T(S) is the set of atoms obtained up to step j.
3. δS = S′ − S = T_R(S) − S is the set of new atoms at step j (i.e., the atoms that were not in S at step j − 1).
4. Let T_δ(X) denote the function T_R(X) − X. Then δS = T_δ(S) is the set of new atoms at step j.
5. δ′S = T_δ(S′) = T_R(S′) − S′ is the set of new atoms obtained at step j + 1.
Finite-difference techniques trace the derivations over two steps:
6. T_δ(S′) = T_δ(S ∪ δS) is computed from the rules.
Differential Fixpoint Algorithm (M = ∅, for now)
S := M; δS := T_E(M); S′ := S ∪ δS
while δS ≠ ∅ { δ′S := T_δ(S′); S := S′; δS := δ′S; S′ := S ∪ δS }
anc, δanc, and anc′ respectively denote the ancestor atoms that are in S, δS, and S′ = S ∪ δS.
Rule Differentiation
• To compute δ′S := T_R(S′) − S′ we can use a T_R defined by the following rule:
δ′anc(X, Z) ← anc′(X, Y), parent(Y, Z).
• Since anc′ = anc ∪ δanc, this can be rewritten as:
δ′anc(X, Z) ← δanc(X, Y), parent(Y, Z).
δ′anc(X, Z) ← anc(X, Y), parent(Y, Z).
The second rule can now be eliminated, since it produces only atoms that were already contained in anc′, i.e., in the S′ computed in the previous iteration.
Thus, for linear rules, replace δ′S := T_R(S′) − S′ by δ′S := T_R(δS) − S′.
For nonlinear rules the rewriting is more complex.
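For the linear anc rule, the differential loop with δ′S := T_R(δS) − S′ might look as follows in Python. This is a minimal sketch, assuming relations are sets of pairs; `CHAIN` and `seminaive_anc` are hypothetical names introduced only for this example.

```python
# Hypothetical parent facts forming a chain a -> b -> c -> d.
CHAIN = {("a", "b"), ("b", "c"), ("c", "d")}

def seminaive_anc(parent):
    s = set()                  # S := M (empty)
    delta = set(parent)        # dS := T_E(M), the exit rule
    s_prime = s | delta        # S' := S u dS
    while delta:               # while dS is not empty
        # d'S := T_R(dS) - S'
        # d'anc(X, Z) <- danc(X, Y), parent(Y, Z).
        new_delta = {
            (x, z) for (x, y) in delta for (y2, z) in parent if y == y2
        } - s_prime
        s = s_prime            # S := S'
        delta = new_delta      # dS := d'S
        s_prime = s | delta    # S' := S u dS
    return s_prime
```

Only the atoms found in the previous step are joined with parent, so each iteration touches δS rather than the whole of S′.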
Nonlinear Rules
ancs(X, Y) ← parent(X, Y).
ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
The differentiated recursive rule
r: δ′ancs(X, Z) ← ancs′(X, Y), ancs′(Y, Z).
splits into:
r1: δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).
r2: δ′ancs(X, Z) ← ancs(X, Y), ancs′(Y, Z).
Now we can rewrite r2 as:
r2,1: δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
r2,2: δ′ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
Rule r2,2 produces only 'old' values and can be eliminated. We are left with rules r1 and r2,1:
δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).
δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
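The two surviving rules r1 and r2,1 can be sketched in Python as two joins per iteration. The sketch assumes relations are sets of pairs; the helper `join` and the names `seminaive_quadratic` and `EDGES` are hypothetical, not part of the original notes.

```python
def join(left, right):
    """Equijoin on the middle variable: {(x, z) | (x, y) in left, (y, z) in right}."""
    by_first = {}
    for (y, z) in right:
        by_first.setdefault(y, set()).add(z)
    return {(x, z) for (x, y) in left for z in by_first.get(y, ())}

def seminaive_quadratic(parent):
    s = set()                  # ancs  (S)
    delta = set(parent)        # dancs (dS), from the exit rule
    s_prime = s | delta        # ancs' (S' = S u dS)
    while delta:
        # r1:   d'ancs(X, Z) <- dancs(X, Y), ancs'(Y, Z).
        # r2,1: d'ancs(X, Z) <- ancs(X, Y),  dancs(Y, Z).
        new_delta = (join(delta, s_prime) | join(s, delta)) - s_prime
        s = s_prime
        delta = new_delta
        s_prime = s | delta
    return s_prime

# Hypothetical chain a -> b -> c -> d -> e.
EDGES = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
ANCS = seminaive_quadratic(EDGES)
```

Each join has δS on one side, so no iteration joins the full relation with itself.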
Seminaive Fixpoint (cont.)
• Analogy with symbolic differentiation.
• Performance improvements: it is typically the case that n = |δS| << N = |S| ≈ |S′|.
• The original ancs rule, for instance, requires the equijoin of two relations of size N; after the differentiation we need to compute two equijoins, each joining a relation of size n with one of size N.
General Nonlinear Rules
A recursive rule of rank k has the form:
r: Q0 ← c0, Q1, c1, Q2, …, Qk, ck
It is rewritten as follows:
r1: δ′Q0 ← c0, δQ1, c1, Q′2, …, Q′k, ck
r2: δ′Q0 ← c0, Q1, c1, δQ2, …, Q′k, ck
…
rk: δ′Q0 ← c0, Q1, c1, Q2, …, δQk, ck
Thus the jth rule has the form:
rj: δ′Q0 ← … Q … δQj … Q′
That is, the recursive goals to the left of δQj are unprimed and those to its right are primed.
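The rewriting of a rank-k rule is mechanical, so it can be sketched as a small rule generator in Python. This is an illustration under assumptions: rules are plain strings, each body goal is tagged by hand as recursive or not, and the ASCII spellings `d`, `d'` and `'` stand in for δ, δ′ and the prime. The functions `prime` and `differentiate` are hypothetical helpers.

```python
def prime(atom):
    """Turn 'q(X, Y)' into "q'(X, Y)" (the S' version of the predicate)."""
    name, args = atom.split("(", 1)
    return f"{name}'({args}"

def differentiate(head, body):
    """body is a list of (atom, is_recursive) pairs in rule order.

    Returns one rule per recursive goal: in the j-th rule the j-th recursive
    goal becomes a delta goal, recursive goals to its left stay unprimed,
    recursive goals to its right are primed, and other goals are unchanged.
    """
    rules = []
    rec_positions = [i for i, (_, is_rec) in enumerate(body) if is_rec]
    for j in rec_positions:
        goals = []
        for i, (atom, is_rec) in enumerate(body):
            if not is_rec or i < j:
                goals.append(atom)           # c_i, or Q_i left of the delta
            elif i == j:
                goals.append("d" + atom)     # dQ_j
            else:
                goals.append(prime(atom))    # Q'_i right of the delta
        rules.append(f"d'{head} <- {', '.join(goals)}.")
    return rules

QUADRATIC = differentiate("ancs(X, Z)", [("ancs(X, Y)", True), ("ancs(Y, Z)", True)])
LINEAR = differentiate("anc(X, Z)", [("anc(X, Y)", True), ("parent(Y, Z)", False)])
```

Applied to the quadratic ancs rule this yields the two rules of the nonlinear example, and applied to the linear anc rule it yields the single δ rule.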
Fixpoint Computation
The inflationary immediate consequence operator for P:
P(I) = T_P(I) ∪ I
We have:
P^n(∅) = T_P^n(∅)
lfp(T_P) = T_P^ω(∅) = lfp(P) = P^ω(∅)
Thus the inflationary operator computes the same result as T_P when we start from an empty set M. But when we start from a non-empty M, the inflationary operator preserves the old results, unlike T_P.
Iterated Fixpoint Computation for a program P stratified in n strata
Let P_j, 1 ≤ j ≤ n, denote the rules with their head in the jth stratum. Then let M_j be inductively constructed as follows:
1. M_0 = ∅, and
2. M_j = P_j^ω(M_{j−1}).
The naïve fixpoint algorithm remains the same, but M := M_{j−1} and P is replaced by P_j.
Theorem: Let P be a positive program stratified in n strata, and let M_n be the result produced by the iterated fixpoint computation. Then M_n = lfp(T_P).
For programs with negated goals, the computation by strata is necessary to produce the correct result (i.e., M_n is the stable model for P, which is not discussed here).
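The stratum-by-stratum construction can be sketched in Python with an inflationary iteration per stratum. Everything specific here is an assumption made for illustration: facts are tuples whose first element is the predicate name, and the second stratum uses a hypothetical rule with a negated goal, leaf(X) ← anc(_, X), ¬anc(X, _), that does not appear in the notes.

```python
PARENT = {("anne", "silvia"), ("silvia", "marc")}   # hypothetical facts

def anc_pairs(interp):
    """Extract the (X, Y) pairs of the anc facts in an interpretation."""
    return {(f[1], f[2]) for f in interp if f[0] == "anc"}

def stratum1(interp):
    """Rules with anc in the head (exit rule plus left-linear rule)."""
    anc = anc_pairs(interp)
    out = {("anc", x, y) for (x, y) in PARENT}
    out |= {("anc", x, z) for (x, y) in anc for (y2, z) in PARENT if y == y2}
    return out

def stratum2(interp):
    """Hypothetical rule: leaf(X) <- anc(_, X), not anc(X, _)."""
    anc = anc_pairs(interp)
    has_descendant = {x for (x, _) in anc}
    return {("leaf", y) for (_, y) in anc if y not in has_descendant}

def inflationary_fixpoint(t, m):
    """Iterate I := T(I) u I starting from M until nothing changes."""
    current = set(m)
    while True:
        nxt = current | t(current)
        if nxt == current:
            return current
        current = nxt

def iterated_fixpoint(strata):
    m = set()                                # M_0 is empty
    for t_j in strata:                       # M_j from M_{j-1}
        m = inflationary_fixpoint(t_j, m)
    return m

MODEL = iterated_fixpoint([stratum1, stratum2])
```

The negated goal is evaluated only after the anc stratum is complete, and the inflationary step keeps the anc facts of M_1 while the second stratum adds leaf facts.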
Bottom-Up versus Top-Down Computation
Compiled rules:
anc(X, Y) ← parent(X, Y).
anc(X, Z) ← anc(X, Y), parent(Y, Z).
parent(X, Y) ← father(X, Y).
parent(X, Y) ← mother(X, Y).
Database:
mother(anne, silvia).
mother(silvia, marc).
• The differential fixpoint is computed in a bottom-up fashion. For a query ?anc(X, Y) this is optimal.
• But for many queries, such as ?anc(marc, Y), we want to propagate the 'marc' constraint down. The same holds for the query forms ?anc($X, Y), ?anc(X, $Y), and ?anc($X, $Y).
Specialization for Left-linear Recursive Rules
?anc(tom, Desc).
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).
This is changed into:
?anc(tom, Desc).
anc(Old/tom, Young) ← parent(Old/tom, Young).
anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).
This is similar to the pushing of selections inside recursion performed by query optimizers. It works for left-linear rules with the query form ?anc($Someone, Desc).
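Once Old is fixed to the query constant, the specialized program only ever derives atoms of the form anc(tom, Young), so a bottom-up evaluation can track the second argument alone. A minimal Python sketch of that idea follows, assuming parent is a set of pairs; `FAMILY` and `descendants` are hypothetical names.

```python
# Hypothetical facts; the (sue, joe) pair is unrelated to tom.
FAMILY = {("tom", "ann"), ("ann", "bob"), ("sue", "joe")}

def descendants(parent, root):
    """Seminaive evaluation of the program specialized to anc(root, Young)."""
    # anc(root, Young) <- parent(root, Young).
    delta = {young for (old, young) in parent if old == root}
    seen = set(delta)
    while delta:
        # anc(root, Young) <- anc(root, Mid), parent(Mid, Young).
        delta = {young for (mid, young) in parent if mid in delta} - seen
        seen |= delta
    return seen
```

With the sample facts, `descendants(FAMILY, "tom")` never visits the unrelated sue and joe pair, which is the benefit of pushing the constant into the recursion.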
Right-linear rules
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).
Descendants of Tom: ?anc(tom, X)
• This query can no longer be implemented by specializing the program. Solution: turn the rules into equivalent left-linear ones!
• The situation is symmetric. A query such as ?anc(X, $Y) cannot be supported on the left-linear version of the program, but that program can be transformed into the right-linear rules above, to which specialization applies.
• For each left-linear (right-linear) program there exists an equivalent right-linear (left-linear) program, similar to regular grammars in programming languages.
• Deductive database compilers do that.
Linear Rules that are not left-linear or right-linear
• Specialization only works for left-linear and right-linear programs. It does not work in general, even for linear rules. The same-generation example:
sg(A, A).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
?sg(marc, Who).
• This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
• Different techniques are needed, and many have been proposed.
• The Magic Set Method first computes all the ancestors of marc and then modifies the original rules to visit only those ancestors.
The Same Generation Example
sg(A, A).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
?sg(marc, Who).
• This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
• We can compute a “magic” set containing all the ancestors of marc and add it to the two rules.
• The magic set computation utilizes the bound arguments and goals in the rules (highlighted in blue on the slide). The first argument of sg is bound in the query. Thus X is bound, and through the goal parent(XP, X) the binding is passed to XP in the recursive goal. The variables Y and YP remain unbound.
Magic Sets (cont.)
Magic set rules:
m.sg(marc).
m.sg(XP) ← m.sg(X), parent(XP, X).
Transformed rules:
sg′(X, X) ← m.sg(X).
sg′(X, Y) ← parent(XP, X), sg′(XP, YP), parent(YP, Y), m.sg(X).
Query: ?sg′(marc, Who).
• The rules for the magic predicates are built by using: (1) the query constant as the exit rule (a fact); (2) the bound arguments and predicates from the recursive rules, but with the head and tail switched!
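The magic-set program can be evaluated bottom-up in two phases: first the magic predicate, then the transformed sg′ rules restricted by it. The Python sketch below is one possible reading under assumptions: parent facts are (parent, child) pairs, the family in `KIN` is invented for the example, and `magic_set` and `same_generation` are hypothetical names.

```python
# Hypothetical (parent, child) facts; ("other", "x") is unrelated to marc.
KIN = {
    ("grandpa", "dad"), ("grandpa", "uncle"),
    ("dad", "marc"), ("dad", "sis"),
    ("uncle", "cousin"), ("other", "x"),
}

def magic_set(parent, constant):
    """m.sg(constant).  m.sg(XP) <- m.sg(X), parent(XP, X)."""
    magic = {constant}
    delta = {constant}
    while delta:
        delta = {xp for (xp, x) in parent if x in delta} - magic
        magic |= delta
    return magic

def same_generation(parent, constant):
    magic = magic_set(parent, constant)
    # sg'(X, X) <- m.sg(X).   (the exit rule is now safe)
    sg = {(x, x) for x in magic}
    delta = set(sg)
    while delta:
        # sg'(X, Y) <- parent(XP, X), sg'(XP, YP), parent(YP, Y), m.sg(X).
        # The rule is linear, so only the delta of sg' is joined.
        new = {
            (x, y)
            for (xp, x) in parent if x in magic
            for (yp, y) in parent
            if (xp, yp) in delta
        } - sg
        sg |= new
        delta = new
    # Answer the query ?sg'(constant, Who).
    return {y for (x, y) in sg if x == constant}

WHO = same_generation(KIN, "marc")
```

The sg′ atoms derived here all have marc or one of marc's ancestors as first argument, so the unrelated part of the parent relation is never explored.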
Recursive Methods
• There are many other recursive methods, but the magic set method is the most general and the most widely used in deductive systems, including LDL++.