PODS 2012: 2012 ACM SIGMOD/PODS Conference, Scottsdale, Arizona, USA
A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies
Benny Kimelfeld, IBM Research – Almaden
Deletion Propagation
• Translate a tuple deletion on the view back to the source relations … properly
• Classic database problem
  • Specializing the more general view-update problem
  • [Dayal & Bernstein 1982; Cosmadakis & Papadimitriou 1984; Keller 1986; Cui & Widom 2001; Buneman & Khanna & Tan 2002; Cong & Fan & Geerts 2006; …]
• Renewed motivation: debug/causality for false positives [K, Vondrák, Williams, 2011]
• Various definitions of “properly” were studied
  • Minimize the view side effect (This Work!)
    • # view tuples lost except the intentional one
  • Minimize the source side effect
    • # source tuples to delete
    • = maximal “responsibility” for an answer [Meliou et al., 2010]
Example: File Access [Cui & Widom 2001; Buneman et al. 2002]
[Figure: Access = UserGroup ⋈ GroupFile]
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
Delete source rows s.t. Emma won’t access a.txt, but maintain maximum access permissions!
Example: File Access [Cui & Widom 2001; Buneman et al. 2002]
[Figure: Access = UserGroup ⋈ GroupFile, with a highlighted solution that is side-effect free (& has minimal side effect)]
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
Delete source rows s.t. Emma won’t access a.txt, but maintain maximum access permissions!
Formal Definitions
• Schema S: rel. symbols R1,…,Rm + functional dependencies (fd) Ri: attribute-set → attribute
• Conjunctive Query (CQ) Q: Q(y1,y2,y3) :– R1(x1,y1), R2(x1,'ibm'), R3(x2,y1,y2,x3), R4(x4,y3)
  • Head variables: y1, y2, y3; existential variables: x1, x2, x3, x4; each Ri(…) is an atom
  • No self joins!
• Input:
  • DB D over S
  • Answer a ∈ Q(D) to delete
• Solution: E ⊆ D s.t. a ∉ Q(E)
  • Side-effect free: Q(E) = Q(D) – {a}
  • Optimal: |Q(E)| is maximal
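The definitions of solution, side-effect free, and optimal can be made concrete on the Access query. The following is a minimal sketch only: the table contents are hypothetical (the slide's tables are not reproduced in this transcript), and the helper names `access`, `is_solution`, and `is_side_effect_free` are illustrative, not taken from the talk.

```python
# Hypothetical source relations D (assumed data, not from the slides).
user_group = {("Emma", "staff"), ("Emma", "dev"), ("Tom", "staff")}
group_file = {("staff", "a.txt"), ("dev", "b.txt")}


def access(ug, gf):
    """Evaluate Access(u,f) :- UserGroup(u,g), GroupFile(g,f)."""
    return {(u, f) for (u, g) in ug for (g2, f) in gf if g == g2}


def is_solution(ug, gf, a):
    """E is a solution if the undesired answer a is not in Q(E)."""
    return a not in access(ug, gf)


def is_side_effect_free(ug, gf, a):
    """Side-effect free: Q(E) = Q(D) - {a}, where D is the database above."""
    return access(ug, gf) == access(user_group, group_file) - {a}


a = ("Emma", "a.txt")
# Candidate E1: delete the GroupFile row (staff, a.txt).
e1 = (user_group, group_file - {("staff", "a.txt")})
# Candidate E2: delete the UserGroup row (Emma, staff).
e2 = (user_group - {("Emma", "staff")}, group_file)

for name, (ug, gf) in [("E1", e1), ("E2", e2)]:
    print(name, is_solution(ug, gf, a), is_side_effect_free(ug, gf, a),
          len(access(ug, gf)))
```

On this assumed data both candidates are solutions, but only E2 is side-effect free: E1 also removes Tom's access to a.txt.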
Complexity Questions
What is the complexity of
• Deciding if a side-effect-free solution exists?
• Finding an optimal solution?
  • Or one w/ approximately minimal side effect?
  • Or one w/ approximately maximal # surviving answers?
  • Not the same [K, Vondrák, Williams, 2011]
Unirelation Algorithm (1Rel): Example [Buneman et al., 2002]
[Figure: Access = UserGroup ⋈ GroupFile]
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
Delete a = (Emma, a.txt)
Unirelation Algorithm (1Rel): Example [Buneman et al., 2002]
[Figure: Access = UserGroup ⋈ GroupFile]
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
Delete a = (Emma, a.txt)
• Better than the previous candidate ⇒ selected solution
• Recall: there is an even better solution (side-effect free)
1Rel: General Case
• Input: D and an undesired answer a ∈ Q(D); Q has k atoms
• solution i (i=1,…,k): delete from Ri each tuple consistent w/ a
• Select the best of solution 1, solution 2, …, solution k
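The general case of 1Rel can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: "consistent w/ a" is read as "agrees with a on every head variable occurring in the atom", "best" is read as "most surviving answers", all atom terms are variables (no constants), and the data and the names `evaluate` and `one_rel` are hypothetical.

```python
def evaluate(head, atoms, db):
    """Answers of a self-join-free CQ; atoms are (relation, variables) pairs."""
    assignments = [{}]
    for rel, variables in atoms:
        extended = []
        for asg in assignments:
            for tup in db[rel]:
                new = dict(asg)
                # Bind each variable, or check it against its earlier binding.
                if all(new.setdefault(v, c) == c for v, c in zip(variables, tup)):
                    extended.append(new)
        assignments = extended
    return {tuple(asg[v] for v in head) for asg in assignments}


def one_rel(head, atoms, db, a):
    """Try one candidate per atom and keep the one with the most answers."""
    target = dict(zip(head, a))
    best = None
    for rel, variables in atoms:
        candidate = dict(db)
        # Keep only tuples that disagree with a on some head variable.
        candidate[rel] = {
            tup for tup in db[rel]
            if any(v in target and target[v] != c for v, c in zip(variables, tup))
        }
        answers = evaluate(head, atoms, candidate)
        if best is None or len(answers) > len(best[1]):
            best = (candidate, answers)
    return best


# Hypothetical data (assumed, not from the slides).
db = {
    "UserGroup": {("Emma", "staff"), ("Emma", "dev"), ("Tom", "staff")},
    "GroupFile": {("staff", "a.txt"), ("dev", "b.txt"), ("dev", "c.txt")},
}
head = ["u", "f"]
atoms = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
solution, answers = one_rel(head, atoms, db, ("Emma", "a.txt"))
print(sorted(answers))
```

On this data 1Rel keeps two of the three other answers, whereas deleting only the UserGroup row (Emma, staff) would keep all three, which matches the slide's remark that 1Rel may miss a side-effect-free solution.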
Head Domination [K, Vondrák, Williams, 2011]
head domination: ∀C ∈ CC(G∃[Q]) ∃j ∈ atoms(Q) s.t. headVars(C) ⊆ vars(j)   (CC = connected components)
• Q(y1,y2) :– R1(x1,y1), R2(x1,y2), R3(x1,y1,y2)
• Q(y1,y2,y3) :– R1(x1,y1), R2(x1,y2), R3(y1,y2), R4(x2,y2,y3)
• Q(y1,y2) :– R1(x,y1), R2(x,y2)   (e.g., Access(u,f))
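The head-domination test can be sketched in a few lines. The sketch assumes one particular reading of the slide's notation: G∃[Q] has the atoms of Q as nodes, with an edge between two atoms that share an existential variable, and headVars(C) is the set of head variables occurring in the atoms of component C. The function name `has_head_domination` is hypothetical.

```python
def has_head_domination(head, atoms):
    """atoms: list of variable tuples, one per atom (no self joins)."""
    head = set(head)
    n = len(atoms)
    existential = [set(atom) - head for atom in atoms]
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        component, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i in component:
                continue
            component.add(i)
            stack.extend(j for j in range(n)
                         if j not in component and existential[i] & existential[j])
        seen |= component
        # Head variables of the component must fit inside a single atom.
        component_head = set().union(*(set(atoms[i]) & head for i in component))
        if not any(component_head <= set(atom) for atom in atoms):
            return False
    return True


# The three queries listed on the slide.
q1 = (["y1", "y2"], [("x1", "y1"), ("x1", "y2"), ("x1", "y1", "y2")])
q2 = (["y1", "y2", "y3"],
      [("x1", "y1"), ("x1", "y2"), ("y1", "y2"), ("x2", "y2", "y3")])
q3 = (["y1", "y2"], [("x", "y1"), ("x", "y2")])
for head, atoms in (q1, q2, q3):
    print(has_head_domination(head, atoms))
```

Under this reading the first two queries have head domination and the third (the Access shape) does not.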
Previous Dichotomy Theorem [KVW 2011]
Let Q be a CQ over a schema S (no self joins)
• Q(y1,y2) :– R1(x1,y1), R2(x1,y2), R3(x1,y1,y2): PTime (1Rel)
• Q(y1,y2,y3) :– R1(x1,y1), R2(x1,y2), R3(y1,y2), R4(x2,y2,y3): PTime (1Rel)
• Q(y1,y2) :– R1(x,y1), R2(x,y2) (e.g., Access(u,f)): NP-hard
Access Example Revisited
Delete (Emma, a.txt)
[Figure: Access = UserGroup ⋈ GroupFile]
• No FDs: NP-hard
• group ← file: PTime
Access Example Revisited
Delete (Emma, a.txt)
[Figure: Access = UserGroup ⋈ GroupFile]
• No FDs: NP-hard
• user → group: PTime
• group ← file: PTime
Access Example Revisited
Delete (Emma, a.txt)
[Figure: Access = UserGroup ⋈ GroupFile]
• No FDs: NP-hard
• user ← group: PTime
• user → group: PTime
• group ← file: PTime
Access Example Revisited
Delete (Emma, a.txt)
[Figure: Access = UserGroup ⋈ GroupFile]
• No FDs: NP-hard
• user ← group: PTime
• group → file: PTime
• user → group: PTime
• group ← file: PTime
Every nontrivial set of FDs brings the problem to PTime
Additional Examples
• Q(y,y1,y2) :– R1(y1,x1), R(x1,y,x2), R2(y2,x2): NP-hard
• Q(y,y1,y2) :– R1(x1,y1), R(x1,y,x2), R2(x2,y2): PTime
• Q(y,y1,y2) :– R1(x1,y1), R(x1,y,x2), R2(x2,y2): NP-hard
Dichotomy with FDs
Let Q be a CQ over a schema S (no self joins)
• Remove a tuple only if it is used for the undesired answer
• Depending on the CQ and FDs, the problem is either straightforward or hard!
FDs Among Variables
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
• FD: user → group gives u → g
• FD: group → file gives g → f
• Implied: u → f, {u,g} → f
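The implied dependencies among variables can be derived with the standard attribute-closure computation. A minimal sketch, assuming the FDs have already been translated from attributes to query variables as on the slide; the function name `closure` and the encoding of an FD as a `(lhs_set, rhs_variable)` pair are illustrative choices.

```python
def closure(variables, fds):
    """All variables functionally determined by `variables` under `fds`.

    fds is a list of (lhs, rhs) pairs: lhs is a set of variables, rhs is one
    variable, matching the slide's "attribute-set -> attribute" form.
    """
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


# user -> group becomes u -> g; group -> file becomes g -> f.
fds = [({"u"}, "g"), ({"g"}, "f")]
print(sorted(closure({"u"}, fds)))       # u determines g and f
print(sorted(closure({"u", "g"}, fds)))  # {u,g} determines f
print(sorted(closure({"f"}, fds)))       # f determines nothing new
```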
The CQ Q+
Tractability Condition: Q+ has functional head domination
Q+: add to Q’s head every x s.t. headVars → x
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
group ← file ⇒ g ← {u,f} ⇒ Access+(u,g,f) :– UserGroup(u,g), GroupFile(g,f)
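The construction of Q+ amounts to extending the head with every variable in the closure of the head variables. A minimal sketch, assuming FDs are given over query variables as `(lhs_set, rhs_variable)` pairs; `closure` and `head_plus` are hypothetical names, and the order of the added variables is arbitrary (the slide writes Access+(u,g,f)).

```python
def closure(variables, fds):
    """Variables functionally determined by `variables` under `fds`."""
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def head_plus(head, atoms, fds):
    """Head of Q+: the head of Q plus every variable x with headVars -> x."""
    determined = closure(head, fds)
    extra = []
    for atom in atoms:
        for v in atom:
            if v in determined and v not in head and v not in extra:
                extra.append(v)
    return list(head) + extra


atoms = [("u", "g"), ("g", "f")]   # UserGroup(u,g), GroupFile(g,f)
fds = [({"f"}, "g")]               # group <- file, i.e. f -> g
print(head_plus(["u", "f"], atoms, fds))   # g joins the head
print(head_plus(["u", "f"], atoms, []))    # no FDs: Q+ equals Q
```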
Functional Head Domination
Tractability Condition: Q+ has functional head domination
• head domination: ∀C ∈ CC(G∃[Q]) ∃j ∈ atoms(Q) s.t. vars(j) ⊇ headVars(C)
• functional head domination: ∀C ∈ CC(G∃[Q]) ∃j ∈ atoms(Q) s.t. vars(j) → headVars(C)
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
{u,g} → {u,f} ⇐ group → file
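Functional head domination replaces the containment test by an FD-implication test. The sketch below rests on the same assumptions as a plain head-domination check would: atoms are the nodes of G∃[Q], two atoms are adjacent when they share an existential variable, and FDs are given over query variables as `(lhs_set, rhs_variable)` pairs. All function names are hypothetical. For the tractability condition the check would be applied to Q+ rather than Q; here it is run directly on the slide's Access example.

```python
def closure(variables, fds):
    """Variables functionally determined by `variables` under `fds`."""
    result = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def has_functional_head_domination(head, atoms, fds):
    """For each component C, some atom j must satisfy vars(j) -> headVars(C)."""
    head = set(head)
    n = len(atoms)
    existential = [set(atom) - head for atom in atoms]
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i in component:
                continue
            component.add(i)
            stack.extend(j for j in range(n)
                         if j not in component and existential[i] & existential[j])
        seen |= component
        component_head = set().union(*(set(atoms[i]) & head for i in component))
        if not any(component_head <= closure(atom, fds) for atom in atoms):
            return False
    return True


atoms = [("u", "g"), ("g", "f")]   # UserGroup(u,g), GroupFile(g,f)
print(has_functional_head_domination(["u", "f"], atoms, [({"g"}, "f")]))
print(has_functional_head_domination(["u", "f"], atoms, []))
```

With group → file, the atom UserGroup(u,g) determines {u,f}, so the check succeeds; with no FDs it reduces to plain head domination and fails for Access.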
Examples
Tractability Condition: Q+ has functional head domination
• Q(y,y1,y2) :– R1(x1,y1), R(x1,y,x2), R2(x2,y2): NP-hard
• Q(y,y1,y2) :– R1(x1,y1), R(x1,y,x2), R2(x2,y2) with {y,y1,y2} → x2:
  Q+(y,y1,y2,x2) :– R1(x1,y1), R(x1,y,x2), R2(x2,y2): PTime (1Rel*)
Example: Key-Preserving Views
Tractability Condition: Q+ has functional head domination
Theorem [Cong, Fan, Geerts, 2006]: Q preserves keys* ⇒ deletion propagation in PTime
For CQs w/o self joins, follows directly from our positive side:
Q preserves keys
⇒ Q+ has no existential vars
⇒ G∃[Q+] has no edges
⇒ Q+ trivially has functional head domination (every connected component is a node, dominated by itself…)
⇒ 1Rel* returns an optimal solution
* Each relation has a key; none of the key attributes are projected out
About the Proof
• The positive side is fairly simple
  • … once the tractability condition is found
• The negative side is intricate
  • Reduction from the special case of the Access CQ
  • Challenge: simulating Access(u,f) by an instance that satisfies all the FDs
  • Central concept: graph separation on the variable graph of the CQ
Q(y1,y2) :– R1(y1,x), R2(x,y2) → Q'(y1,y2) :– R1(y1,x1,x), R2(x,x2,y2), R3(x1,x2)
Conclusions & Ongoing Work
• Studied deletion propagation in the presence of functional dependencies
• Established a dichotomy in complexity:
  • PTime by a straightforward algorithm vs.
  • Hardness (of approximation)
• Generalizes previously established special cases: no FDs, key-preserving views
• Ongoing work: deletion of multiple answers
  • Preview: trichotomy
    • Straightforward
    • Hard but approximable (by a constant factor)
    • Hard to approximate
Questions?