CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Notes from Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari, Morgan Kaufmann, 1997
Carlo Zaniolo
Department of Computer Science, University of California, Los Angeles
January 2002

Recursive Predicates
r1: anc(X, Y) ← parent(X, Y).
r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).
r2 is a recursive rule, a left-linear one. r1 is a nonrecursive rule defining a recursive predicate; this is called an exit rule.
An alternative definition for anc:
r3: anc(X, Y) ← parent(X, Y).
r4: anc(X, Z) ← anc(X, Y), anc(Y, Z).
Here r4 is a quadratic rule.

Fixpoint Computation
The inflationary immediate consequence operator for P:
Γ_P(I) = T_P(I) ∪ I
We have:
Γ_P^n(∅) = T_P^n(∅)
lfp(T_P) = T_P^ω(∅) = lfp(Γ_P) = Γ_P^ω(∅)

Fixpoint Computation (cont.)
Naïve fixpoint algorithm for Γ_P, where Γ_P(I) = T_P(I) ∪ I (M = ∅, for now):
{ S := M; S′ := Γ_P(M);
  while S ⊂ S′ { S := S′; S′ := Γ_P(S) } }
We can replace the first Γ_P with Γ_E and the second one with Γ_R, respectively denoting the inflationary immediate consequence operators for the exit rules and the recursive ones.
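The naive loop above can be sketched in Python for the anc program. This is only an illustration: the parent facts and the function names (t_exit, t_rec, gamma, naive_fixpoint) are assumptions made for the example, not part of the notes.

```python
# Hypothetical parent facts: (parent, child) pairs forming a small chain.
PARENT = {("anne", "silvia"), ("silvia", "marc"), ("marc", "tom")}

def t_exit(parent):
    # Exit rule r1: anc(X, Y) <- parent(X, Y).
    return set(parent)

def t_rec(anc, parent):
    # Recursive rule r2: anc(X, Z) <- anc(X, Y), parent(Y, Z).
    return {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}

def gamma(s, parent):
    # Inflationary operator: T_P(S) union S, with T_P = T_E union T_R.
    return t_exit(parent) | t_rec(s, parent) | s

def naive_fixpoint(parent):
    s = set()                      # M = empty set
    s_next = gamma(s, parent)
    while s < s_next:              # proper subset: the last step added atoms
        s = s_next
        s_next = gamma(s, parent)  # recomputes every atom found so far
    return s

ANC = naive_fixpoint(PARENT)
```

Every pass re-derives all atoms found in earlier passes, which is the redundancy that the differential fixpoint removes.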

Differential Fixpoint (a.k.a. Seminaive Computation)
Redundant computation: the j-th iteration step also recomputes all atoms obtained in the (j−1)-th step.
Finite-difference techniques trace the derivations over two steps (Γ_R is the inflationary operator for the recursive rules, Γ_R(I) = T_R(I) ∪ I):
1. S: the set of atoms obtained up to step j−1
2. S′: the set of atoms obtained up to step j
3. δS = Γ_R(S) − S = T_R(S) − S denotes the new atoms at step j (i.e., the atoms that were not in S at step j−1)
4. δ′S = Γ_R(S′) − S′ = T_R(S′) − S′ denotes the new atoms obtained at step j+1.

Differential Fixpoint Algorithm (M = ∅, for now)
{ S := M; δS := T_E(M); S′ := S ∪ δS;
  while δS ≠ ∅ { δ′S := T_R(S′) − S′; S := S′; δS := δ′S; S′ := S ∪ δS } }
anc, δanc, and anc′ respectively denote the ancestor atoms that are in S, δS, and S′ = S ∪ δS.

Rule Differentiation
• To compute δ′S := T_R(S′) − S′ we can use a T_R defined by the following rule:
δ′anc(X, Z) ← anc′(X, Y), parent(Y, Z).
• This can be rewritten as:
δ′anc(X, Z) ← δanc(X, Y), parent(Y, Z).
δ′anc(X, Z) ← anc(X, Y), parent(Y, Z).
The second rule can now be eliminated, since it produces only atoms that were already contained in anc′, i.e., in the S′ computed in the previous iteration.
Thus, for linear rules, replace δ′S := T_R(S′) − S′ by δ′S := T_R(δS) − S′.
For nonlinear rules the rewriting is more complex.
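A minimal Python sketch of the differential fixpoint for the linear anc rule, using T_R(δS) − S′ in place of T_R(S′) − S′. The facts and the name seminaive_anc are assumptions chosen for illustration.

```python
# Hypothetical parent facts: (parent, child) pairs.
PARENT = {("anne", "silvia"), ("silvia", "marc"), ("marc", "tom")}

def seminaive_anc(parent):
    s = set()                # S: atoms obtained up to step j-1
    delta = set(parent)      # dS := T_E(M), the exit rule applied once
    s_prime = s | delta      # S' := S union dS
    while delta:
        # Differentiated rule: d'anc(X, Z) <- danc(X, Y), parent(Y, Z).
        # Only the atoms that were new in the last step are joined with parent.
        new = {(x, z) for (x, y) in delta
               for (y2, z) in parent if y == y2} - s_prime
        s = s_prime
        delta = new
        s_prime = s | delta
    return s_prime

ANC = seminaive_anc(PARENT)
```

The result is the same set of atoms as the naive computation; the difference is that each join reads only δS on the recursive side.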

Nonlinear Rules
ancs(X, Y) ← parent(X, Y).
ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
r: δ′ancs(X, Z) ← ancs′(X, Y), ancs′(Y, Z).
r1: δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).
r2: δ′ancs(X, Z) ← ancs(X, Y), ancs′(Y, Z).
Now we can rewrite r2 as:
r2,1: δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
r2,2: δ′ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
Rule r2,2 produces only 'old' values and can be eliminated. We are left with rules r1 and r2,1:
δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).
δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
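The two surviving rules can be sketched in Python as two joins per iteration, one of δS with S′ and one of S with δS. The five-node chain and the helper names join and seminaive_quadratic are assumptions for the example.

```python
# Hypothetical parent facts: a chain a -> b -> c -> d -> e.
PARENT = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}

def join(left, right):
    # Equijoin on the shared middle argument: (X, Y), (Y, Z) gives (X, Z).
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}

def seminaive_quadratic(parent):
    s = set()                # ancs
    delta = set(parent)      # dancs, from the exit rule
    s_prime = s | delta      # ancs'
    while delta:
        # r1:   d'ancs(X, Z) <- dancs(X, Y), ancs'(Y, Z).
        # r2,1: d'ancs(X, Z) <- ancs(X, Y),  dancs(Y, Z).
        new = (join(delta, s_prime) | join(s, delta)) - s_prime
        s, delta = s_prime, new
        s_prime = s | delta
    return s_prime

ANCS = seminaive_quadratic(PARENT)
```

On this chain the loop finishes after three productive-or-empty passes, since the quadratic rule doubles the path length reachable at each step.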

Seminaive Fixpoint (cont.)
• Analogy with symbolic differentiation.
• Performance improvements: it is typically the case that n = |δS| ≪ N = |S| ≈ |S′|.
• The original ancs rule, for instance, requires the equijoin of two relations of size N; after the differentiation we need to compute two equijoins, each joining a relation of size n with one of size N.

General Nonlinear Rules
A recursive rule of rank k has the form:
r: Q0 ← c0, Q1, c1, Q2, …, Qk, ck
It is rewritten as follows:
r1: δ′Q0 ← c0, δQ1, c1, Q′2, …, Q′k, ck
r2: δ′Q0 ← c0, Q1, c1, δQ2, …, Q′k, ck
…
rk: δ′Q0 ← c0, Q1, c1, Q2, …, δQk, ck
Thus the j-th rule has the form (goals to the left of δQj are unprimed, goals to its right are primed):
rj: δ′Q0 ← … Q … δQj … Q′ …
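The rewriting pattern can be sketched as a small rule generator. This is an illustrative helper only: the name differentiate is hypothetical, the nonrecursive conjuncts c0 … ck are omitted, and the ASCII prefix d and suffix ' stand in for δ and the prime.

```python
def differentiate(head, goals):
    """Return the k differentiated rules for a rule with k recursive goals."""
    rules = []
    for j in range(len(goals)):
        body = (
            goals[:j]                          # goals left of position j: unprimed
            + ["d" + goals[j]]                 # the j-th goal reads the delta
            + [g + "'" for g in goals[j + 1:]] # goals right of position j: primed
        )
        rules.append("d'" + head + " <- " + ", ".join(body) + ".")
    return rules

RANK3 = differentiate("Q0", ["Q1", "Q2", "Q3"])
QUADRATIC = differentiate("ancs", ["ancs", "ancs"])
```

For the quadratic ancs rule this yields the same two rules that remain after eliminating the rule that produces only old values.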

Iterated Fixpoint Computation for a program P stratified in n strata
Let P_j, 1 ≤ j ≤ n, denote the rules with their head in the j-th stratum. Then let M_j be inductively constructed as follows:
1. M_0 = ∅, and
2. M_j = Γ_{P_j}^ω(M_{j−1}).
The naïve fixpoint algorithm remains the same, but M := M_{j−1} and P is replaced by P_j.
Theorem: Let P be a positive program stratified in n strata, and let M_n be the result produced by the iterated fixpoint computation. Then M_n = lfp(T_P).
For programs with negated goals the computation by strata is necessary to produce the correct result (i.e., M_n is the stable model for P; not discussed here).
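A minimal sketch of the stratum-by-stratum computation, assuming two strata (parent, then anc) and assuming the database facts are supplied as the starting set rather than written as fact rules of the lowest stratum. All names and facts below are hypothetical.

```python
def saturate(t_op, m):
    # Apply the inflationary operator for one stratum until nothing changes.
    s = set(m)
    while True:
        s_next = s | t_op(s)
        if s_next == s:
            return s
        s = s_next

def stratum_parent(facts):
    # parent(X, Y) <- father(X, Y).   parent(X, Y) <- mother(X, Y).
    return {("parent", x, y) for (p, x, y) in facts if p in ("father", "mother")}

def stratum_anc(facts):
    parent = {(x, y) for (p, x, y) in facts if p == "parent"}
    anc = {(x, y) for (p, x, y) in facts if p == "anc"}
    # anc(X, Y) <- parent(X, Y).
    out = {("anc", x, y) for (x, y) in parent}
    # anc(X, Z) <- anc(X, Y), parent(Y, Z).
    out |= {("anc", x, z) for (x, y) in anc for (y2, z) in parent if y == y2}
    return out

def iterated_fixpoint(strata, database):
    m = set(database)        # assumption: database facts seed the computation
    for t_op in strata:      # M_j is the saturation of stratum j over M_(j-1)
        m = saturate(t_op, m)
    return m

DB = {("mother", "anne", "silvia"), ("mother", "silvia", "marc")}
MODEL = iterated_fixpoint([stratum_parent, stratum_anc], DB)
```

Each stratum is saturated once, in order, so a lower stratum is never revisited after a higher one has started.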

Bottom-Up versus Top-Down Computation
Compiled rules:
anc(X, Y) ← parent(X, Y).
anc(X, Z) ← anc(X, Y), parent(Y, Z).
parent(X, Y) ← father(X, Y).
parent(X, Y) ← mother(X, Y).
Database:
mother(anne, silvia).
mother(silvia, marc).
• The differential fixpoint is computed in a bottom-up fashion. For a query ?anc(X, Y) this is optimal.
• But for many queries, such as ?anc(marc, Y), we want to propagate the 'marc' constraint down. The same holds for the query forms ?anc($X, Y), ?anc(X, $Y), and ?anc($X, $Y).

Specialization for Left-Linear Recursive Rules
?anc(tom, Desc).
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).
This is changed into:
?anc(tom, Desc).
anc(Old/tom, Young) ← parent(Old/tom, Young).
anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).
This is similar to the pushing of selections inside recursion performed by query optimizers. It works for left-linear rules with the query form ?anc($Someone, Desc).
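With Old fixed to tom, the specialized program only ever derives atoms of the form anc(tom, Young), so a bottom-up evaluation can keep just the second argument. A sketch under that reading, with hypothetical facts and a hypothetical function name:

```python
# Hypothetical parent facts: (parent, child) pairs.
PARENT = {("tom", "ann"), ("ann", "bob"), ("sue", "joe")}

def descendants_of(person, parent):
    # anc(tom, Young) <- parent(tom, Young).
    delta = {young for (old, young) in parent if old == person}
    found = set(delta)
    while delta:
        # anc(tom, Young) <- anc(tom, Mid), parent(Mid, Young).
        delta = {young for (mid, young) in parent if mid in delta} - found
        found |= delta
    return found

TOM_DESC = descendants_of("tom", PARENT)
```

Facts unrelated to tom, such as the sue/joe pair here, are never touched by the recursion.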

Right-Linear Rules
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).
Descendants of Tom: ?anc(tom, X).
• This query can no longer be implemented by specializing the program. Solution: turn the rules into equivalent left-linear ones!
• The situation is symmetric. A query such as ?anc(X, $Y) cannot be supported on the left-linear version of the program, but that program can be transformed into the right-linear rules above, to which specialization can apply.
• For each left (right) linear rule there exists an equivalent right (left) linear program, similar to regular grammars in PLs.
• Deductive database compilers do that.

The Magic Set Method
• Specialization only works for left/right-linear programs. It does not work in general, even for linear rules. The same-generation example:
sg(A, A).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
?sg(marc, Who).
• This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
• We can compute a "magic" set containing all the ancestors of marc and add it as a goal to the two rules.

Magic Sets for Non-Recursive Rules
• Find the graduating seniors and their parents' addresses:
spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).
senior(SN) ← student(SN, _, senior), graduating(SN).
• To find the address of the parent named 'Joe Doe':
?spa(SN, 'Joe Doe', Paddr).
• Suppose that computing parent(X, $Y) is safe and not too expensive.

Magic Set Rewriting
spa_q('Joe Doe').
m.senior(SN) ← spa_q(PN), parent(SN, PN).
senior(SN) ← m.senior(SN), student(SN, _, senior), graduating(SN).
The rest remains unchanged:
spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).
?spa(SN, 'Joe Doe', Paddr).
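A sketch of the rewritten program in Python, reading the magic rule as restricting SN to the children of the queried parent, i.e. joining spa_q with the second argument of parent (an assumption about the intended binding). All relation contents are made up for the example.

```python
# Hypothetical base relations.
STUDENT = {("ann", "cs", "senior"), ("bob", "ee", "senior"), ("cat", "cs", "junior")}
GRADUATING = {"ann", "bob"}
PARENT = {("ann", "Joe Doe"), ("bob", "Sue Roe"), ("cat", "Joe Doe")}  # parent(SN, PN)
ADDRESS = {("Joe Doe", "1 Elm St"), ("Sue Roe", "2 Oak Ave")}

# spa_q('Joe Doe'): the constant taken from the query.
spa_q = {"Joe Doe"}

# m.senior(SN) <- spa_q(PN), parent(SN, PN).
m_senior = {sn for (sn, pn) in PARENT if pn in spa_q}

# senior(SN) <- m.senior(SN), student(SN, _, senior), graduating(SN).
senior = {sn for (sn, _, year) in STUDENT
          if sn in m_senior and year == "senior" and sn in GRADUATING}

# spa(SN, PN, Paddr) <- senior(SN), parent(SN, PN), address(PN, Paddr).
spa = {(sn, pn, addr)
       for sn in senior
       for (sn2, pn) in PARENT if sn2 == sn
       for (pn2, addr) in ADDRESS if pn2 == pn}

# ?spa(SN, 'Joe Doe', Paddr).
answer = {t for t in spa if t[1] == "Joe Doe"}
```

The senior relation is computed only for students in m_senior, so bob, whose parent is not the queried one, is never checked against student or graduating.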

The Same Generation Example
sg(A, A).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
?sg(marc, Who).
• This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
• We can compute a "magic" set containing all the ancestors of marc and add it as a goal to the two rules.
• The magic set computation utilizes the bound arguments and goals in the rules. The first argument of sg is bound in the query. Thus X is bound, and through the goal parent(XP, X) the binding is passed to XP in the recursive goal. The variables Y and YP remain unbound.

Magic Sets (cont.)
Magic set rules:
m.sg(marc).
m.sg(XP) ← m.sg(X), parent(XP, X).
Transformed rules:
sg′(X, X) ← m.sg(X).
sg′(X, Y) ← parent(XP, X), sg′(XP, YP), parent(YP, Y), m.sg(X).
Query: ?sg′(marc, Who).
• The rules for the magic predicates are built by using:
(1) the query constant as the exit rule (a fact);
(2) the bound arguments and predicates from the recursive rules, but the head and tail must be switched!
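The magic set rules and the transformed rules can be evaluated bottom-up as in the following Python sketch. The family facts and the function names magic_sg and same_generation are assumptions made for illustration; the inner loop applies the differential rewriting to the single recursive sg′ goal.

```python
# Hypothetical facts parent(P, C): gil -> {ann, bob}, ann -> {marc, sue}, bob -> {tim}.
PARENT = {("ann", "marc"), ("ann", "sue"), ("gil", "ann"),
          ("gil", "bob"), ("bob", "tim"), ("zed", "kim")}

def magic_sg(start, parent):
    # m.sg(marc).
    magic = {start}
    delta = {start}
    while delta:
        # m.sg(XP) <- m.sg(X), parent(XP, X).
        delta = {xp for (xp, x) in parent if x in delta} - magic
        magic |= delta
    return magic

def same_generation(start, parent):
    magic = magic_sg(start, parent)
    # sg'(X, X) <- m.sg(X).   The magic goal makes the exit rule safe.
    sg = {(x, x) for x in magic}
    delta = set(sg)
    while delta:
        # sg'(X, Y) <- parent(XP, X), sg'(XP, YP), parent(YP, Y), m.sg(X).
        new = {(x, y)
               for (xp, yp) in delta
               for (p1, x) in parent if p1 == xp and x in magic
               for (p2, y) in parent if p2 == yp} - sg
        sg |= new
        delta = new
    # ?sg'(marc, Who).
    return {y for (x, y) in sg if x == start}

WHO = same_generation("marc", PARENT)
```

Only marc and his ancestors ever appear as the first argument of sg′, so unrelated facts such as the zed/kim pair play no part in the computation.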

Recursive Methods
• There are many other recursive methods, but the magic set method is the most general and the most widely used in deductive systems, including LDL++.
