A Course on Probabilistic Databases. Dan Suciu University of Washington. Outline. Part 1. Motivating Applications The Probabilistic Data Model Chapter 2 Extensional Query Plans Chapter 4.2 The Complexity of Query Evaluation Chapter 3 Extensional Evaluation Chapter 4.1
Probabilistic Databases - Dan Suciu A Course on Probabilistic Databases Dan Suciu University of Washington
Outline (Part 1, Part 2, Part 3, Part 4)
• Motivating Applications
• The Probabilistic Data Model (Chapter 2)
• Extensional Query Plans (Chapter 4.2)
• The Complexity of Query Evaluation (Chapter 3)
• Extensional Evaluation (Chapter 4.1)
• Intensional Evaluation (Chapter 5)
• Conclusions
Overview
• Review: Unions of Conjunctive Queries (UCQ)
• Four simple rules for evaluating queries Q
• Big Dichotomy Theorem:
  • Case 1: if the rules succeed, Q is safe and is in PTIME
  • Case 2: if the rules fail, Q is unsafe and is #P-complete
• Compare to the Small Dichotomy Theorem, which applies only to conjunctive queries w/o self-joins:
  • Case 1 holds precisely when Q is hierarchical
  • Case 2 holds precisely when Q is not hierarchical
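The hierarchical condition mentioned above can be tested mechanically. The sketch below assumes the usual definition, which these slides do not spell out: for any two variables x and y, the sets of atoms containing them are either nested or disjoint. The atom encoding and the function name `is_hierarchical` are illustrative choices, and all terms are assumed to be variables.

```python
from itertools import combinations

def is_hierarchical(atoms):
    """atoms: list of (relation_name, tuple_of_variables), no self-joins assumed."""
    # at[v] = indices of the atoms in which variable v occurs
    at = {}
    for i, (_relation, variables) in enumerate(atoms):
        for v in variables:
            at.setdefault(v, set()).add(i)
    # Every pair of variables must have nested or disjoint atom sets
    for x, y in combinations(at, 2):
        ax, ay = at[x], at[y]
        nested = ax <= ay or ay <= ax
        if not nested and not ax.isdisjoint(ay):
            return False
    return True

Q_RS = [("R", ("x",)), ("S", ("x", "y"))]                  # R(x),S(x,y)
H0 = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]     # R(x),S(x,y),T(y)

print(is_hierarchical(Q_RS))  # True
print(is_hierarchical(H0))    # False
```

Under this assumed definition, R(x),S(x,y) falls into Case 1 and R(x),S(x,y),T(y) into Case 2 of the Small Dichotomy Theorem.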
Review: Unions of Conjunctive Queries

Owners of items in either "Office444" or "Hall7":

Q(z) = ∃x1∃t1 (Owner(z,x1) ∧ Location(x1,t1,"Office444")) ∨ ∃x2∃t2 (Owner(z,x2) ∧ Location(x2,t2,"Hall7"))

This is a union of conjunctive queries. Same as:

Q(z) = Owner(z,x1),Location(x1,t1,"Office444") ∨ Owner(z,x2),Location(x2,t2,"Hall7")

Same as:

Q(z) = Owner(z,x) ∧ ∃t [Location(x,t,"Office444") ∨ Location(x,t,"Hall7")]

We will use these laws:
• Distributivity law for ∨, ∧
• Commutativity law for ∃, ∨: (∃x P(x)) ∨ (∃y T(y)) = ∃z (P(z) ∨ T(z))
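As a quick sanity check of the equivalence between the union form and the factored form, here is a minimal sketch on a small deterministic database. The tuples in `owner` and `location` are made-up illustration data, not taken from the course.

```python
# Hypothetical toy data: Owner(person, item), Location(item, time, place)
owner = {("alice", "laptop"), ("bob", "chair"), ("carol", "desk")}
location = {("laptop", 1, "Office444"), ("chair", 2, "Hall7"), ("desk", 3, "Room12")}

def q_union(owner, location):
    """Union of two conjunctive queries, one per location constant."""
    left = {z for (z, x1) in owner
            for (x, _t, place) in location if x == x1 and place == "Office444"}
    right = {z for (z, x2) in owner
             for (x, _t, place) in location if x == x2 and place == "Hall7"}
    return left | right

def q_factored(owner, location):
    """Owner(z,x) joined with a single disjunctive condition on Location."""
    return {z for (z, x) in owner
            if any(item == x and place in ("Office444", "Hall7")
                   for (item, _t, place) in location)}

print(sorted(q_union(owner, location)))     # ['alice', 'bob']
print(sorted(q_factored(owner, location)))  # ['alice', 'bob']
```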
Four Rules for Computing Query Probabilities
• Independent join
• Independent project
• Independent union
• Inclusion/exclusion

The rules apply to Boolean queries only.
Rule 1: Independent Join
P(Q1 ∧ Q2) = P(Q1) P(Q2)
if Q1 and Q2 are independent (meaning: no common atoms).

Rule 2: Independent Project
P(∃z Q) = 1 - Π_{a ∈ Domain} (1 - P(Q[a/z]))
if z is a "separator variable" in Q, meaning that for any two distinct constants a, b, Q[a/z] and Q[b/z] are independent.

Rule 3: Independent Union
P(Q1 ∨ Q2) = 1 - (1 - P(Q1))(1 - P(Q2))
if Q1 and Q2 are independent (meaning: no common atoms).
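The three rules are plain arithmetic once the independence side conditions have been checked. A minimal sketch follows; the function names are illustrative, and the code does not check independence, which is assumed to have been established separately.

```python
from math import prod

def independent_join(p1, p2):
    """Rule 1: P(Q1 ∧ Q2) for independent Q1, Q2."""
    return p1 * p2

def independent_project(ps):
    """Rule 2: P(∃z Q), where ps lists P(Q[a/z]) for each constant a in the domain."""
    return 1 - prod(1 - p for p in ps)

def independent_union(p1, p2):
    """Rule 3: P(Q1 ∨ Q2) for independent Q1, Q2."""
    return 1 - (1 - p1) * (1 - p2)

print(independent_join(0.5, 0.5))      # 0.25
print(independent_union(0.5, 0.5))     # 0.75
print(independent_project([0.5, 0.5])) # 0.75
```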
Example

QU = R(x1),S(x1,y1) ∨ T(x2),S(x2,y2)
   = ∃x1∃y1 R(x1)∧S(x1,y1) ∨ ∃x2∃y2 T(x2)∧S(x2,y2)

Commute ∃ with ∨:
QU = ∃z [R(z)∧S(z,y1) ∨ T(z)∧S(z,y2)]

Independent project: for a ≠ b, QU[a/z] and QU[b/z] are independent, because the atoms R(a), S(a,y1), T(a), S(a,y2) are distinct from R(b), S(b,y1), T(b), S(b,y2):
P(QU) = 1 - Π_{a ∈ Domain} (1 - P[R(a)∧S(a,y1) ∨ T(a)∧S(a,y2)])

Distribute ∧ over ∨:
P(QU) = 1 - Π_{a ∈ Domain} (1 - P[(R(a)∨T(a)) ∧ ∃y. S(a,y)])

Independent join:
P(QU) = 1 - Π_{a ∈ Domain} (1 - P[R(a)∨T(a)] P[∃y. S(a,y)])

Independent union and independent project:
P(QU) = 1 - Π_{a ∈ Domain} (1 - (1 - (1 - P[R(a)])(1 - P[T(a)])) (1 - Π_{b ∈ Domain} (1 - P[S(a,b)])))
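The final expression for P(QU) can be compared against brute-force enumeration of possible worlds on a tiny tuple-independent database. This is only a sketch: the probabilities below are made up, and the two-element domain keeps the enumeration to 256 worlds.

```python
from itertools import product
from math import prod

# Hypothetical tuple probabilities over the domain {1, 2}
DOMAIN = [1, 2]
R = {1: 0.5, 2: 0.3}
T = {1: 0.2, 2: 0.6}
S = {(1, 1): 0.4, (1, 2): 0.7, (2, 1): 0.1, (2, 2): 0.9}

def p_qu_by_rules():
    factors = []
    for a in DOMAIN:
        p_r_or_t = 1 - (1 - R[a]) * (1 - T[a])                # independent union
        p_exists_s = 1 - prod(1 - S[(a, b)] for b in DOMAIN)  # independent project on y
        factors.append(1 - p_r_or_t * p_exists_s)             # independent join
    return 1 - prod(factors)                                  # independent project on z

def p_by_enumeration(holds):
    """Sum the weights of all possible worlds in which the Boolean query holds."""
    facts = ([("R", a, p) for a, p in R.items()]
             + [("T", a, p) for a, p in T.items()]
             + [("S", ab, p) for ab, p in S.items()])
    total = 0.0
    for present in product([False, True], repeat=len(facts)):
        weight = prod(p if keep else 1 - p for (_, _, p), keep in zip(facts, present))
        world = {(name, key) for (name, key, _), keep in zip(facts, present) if keep}
        if holds(world):
            total += weight
    return total

def qu_holds(world):
    # QU = ∃x∃y R(x)∧S(x,y)  ∨  ∃x∃y T(x)∧S(x,y)
    return any((("R", a) in world or ("T", a) in world) and ("S", (a, b)) in world
               for a in DOMAIN for b in DOMAIN)

print(abs(p_qu_by_rules() - p_by_enumeration(qu_holds)) < 1e-9)  # True
```

The rule-based computation touches each tuple once, while the enumeration is exponential in the number of tuples; the latter is included only as a correctness check.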
Rule 4: Inclusion-Exclusion
P(Q1 ∧ Q2 ∧ Q3) = P(Q1) + P(Q2) + P(Q3) - P(Q1 ∨ Q2) - P(Q1 ∨ Q3) - P(Q2 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)

Note: this is the dual of the more popular formula:
P(Q1 ∨ Q2 ∨ Q3) = P(Q1) + P(Q2) + P(Q3) - P(Q1 ∧ Q2) - P(Q1 ∧ Q3) - P(Q2 ∧ Q3) + P(Q1 ∧ Q2 ∧ Q3)
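The dual formula generalizes to n conjuncts: sum over all non-empty subsets, with sign + for odd-sized and - for even-sized subsets. The sketch below checks it on three independent events, where both sides are easy to compute directly; the event probabilities and the helper names are illustrative assumptions.

```python
from itertools import combinations
from math import prod

def prob_conjunction(n, prob_of_disjunction):
    """P(Q1 ∧ ... ∧ Qn) from the probabilities of disjunctions of subsets.

    prob_of_disjunction(subset) must return P(∨_{i in subset} Qi).
    """
    total = 0.0
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(range(n), k):
            total += sign * prob_of_disjunction(subset)
    return total

# Hypothetical check: three independent events with these probabilities
p = [0.5, 0.4, 0.2]

def p_or(subset):
    return 1 - prod(1 - p[i] for i in subset)

print(abs(prob_conjunction(3, p_or) - prod(p)) < 1e-12)  # True
```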
Example

QJ = R(x1),S(x1,y1), T(x2),S(x2,y2)
   = [∃x1∃y1 R(x1)∧S(x1,y1)] ∧ [∃x2∃y2 T(x2)∧S(x2,y2)]

QJ = Q1 ∧ Q2, where
Q1 = R(x1),S(x1,y1)
Q2 = T(x2),S(x2,y2)

By inclusion/exclusion:
P(QJ) = P(Q1) + P(Q2) - P(Q1 ∨ Q2)

• Q1 is a hierarchical conjunctive query w/o self-joins
• Q2 is similar
• Q1 ∨ Q2 = QU, which we saw a couple of slides ago
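To make the inclusion/exclusion step concrete, the sketch below computes P(QJ) = P(Q1) + P(Q2) - P(QU) on a made-up tuple-independent database and compares it with enumeration of all possible worlds. Probabilities, domain, and function names are assumptions for illustration only.

```python
from itertools import product
from math import prod

DOMAIN = [1, 2]
R = {1: 0.5, 2: 0.3}
T = {1: 0.2, 2: 0.6}
S = {(1, 1): 0.4, (1, 2): 0.7, (2, 1): 0.1, (2, 2): 0.9}

def p_exists_s(a):
    """P(∃y S(a,y)) by independent project."""
    return 1 - prod(1 - S[(a, b)] for b in DOMAIN)

def p_unary_join_s(unary):
    """P(∃x∃y U(x)∧S(x,y)) for U in {R, T}: independent project, then join."""
    return 1 - prod(1 - unary[a] * p_exists_s(a) for a in DOMAIN)

def p_qu():
    """P(Q1 ∨ Q2), using the formula derived for QU."""
    return 1 - prod(1 - (1 - (1 - R[a]) * (1 - T[a])) * p_exists_s(a) for a in DOMAIN)

def p_qj_by_rules():
    return p_unary_join_s(R) + p_unary_join_s(T) - p_qu()

def p_qj_by_enumeration():
    facts = ([("R", a, p) for a, p in R.items()]
             + [("T", a, p) for a, p in T.items()]
             + [("S", ab, p) for ab, p in S.items()])
    total = 0.0
    for present in product([False, True], repeat=len(facts)):
        world = {(n, k) for (n, k, _), keep in zip(facts, present) if keep}
        q1 = any(("R", a) in world and ("S", (a, b)) in world for a in DOMAIN for b in DOMAIN)
        q2 = any(("T", a) in world and ("S", (a, b)) in world for a in DOMAIN for b in DOMAIN)
        if q1 and q2:
            total += prod(p if keep else 1 - p for (_, _, p), keep in zip(facts, present))
    return total

print(abs(p_qj_by_rules() - p_qj_by_enumeration()) < 1e-9)  # True
```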
Lesson 3

We need unions in order to handle self-joins!
• Conjunctive Queries = not a "natural" class of queries for probabilistic DBs
• Unions of Conjunctive Queries = the "natural" class of queries
Unsafe Queries: When the Rules Fail

H0 = R(x),S(x,y),T(y)
H1 = R(x0),S(x0,y0) ∨ S(x1,y1),T(y1)
   = ∃z [R(z)∧S(z,y0) ∨ S(x1,z)∧T(z)]

Unlike in QU, here z occurs in different positions in S, so we cannot apply Independent Project.
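The position argument can be turned into a simple syntactic test. The sketch below assumes a common syntactic criterion for a separator variable, which is not stated in these slides: z must occur in every atom, and all atoms over the same relation must have z in at least one common position. The function name and atom encoding are illustrative.

```python
def is_separator(z, atoms):
    """atoms: list of (relation_name, tuple_of_terms) under the quantifier ∃z."""
    common = {}  # relation -> positions where z occurs in every atom of that relation
    for relation, terms in atoms:
        positions = {i for i, term in enumerate(terms) if term == z}
        if not positions:
            return False  # z is missing from this atom
        common[relation] = common.get(relation, positions) & positions
    # Every relation needs at least one shared position for z
    return all(common.values())

# QU after commuting ∃ with ∨: R(z)∧S(z,y1) ∨ T(z)∧S(z,y2)
QU_BODY = [("R", ("z",)), ("S", ("z", "y1")), ("T", ("z",)), ("S", ("z", "y2"))]
# H1 after commuting ∃ with ∨: R(z)∧S(z,y0) ∨ S(x1,z)∧T(z)
H1_BODY = [("R", ("z",)), ("S", ("z", "y0")), ("S", ("x1", "z")), ("T", ("z",))]

print(is_separator("z", QU_BODY))  # True
print(is_separator("z", H1_BODY))  # False
```

In H1 the two S atoms place z in positions 0 and 1 respectively, so S(a,b) can appear in both H1[a/z] and H1[b/z], which is why the independence required by Rule 2 fails.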
Unsafe Queries: When the Rules Fail

H0 = R(x),S(x,y),T(y)
H1 = R(x0),S(x0,y0) ∨ S(x1,y1),T(y1)
H2 = R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),T(y2)
H3 = R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),S3(x2,y2) ∨ S3(x3,y3),T(y3)
. . .

Theorem. Each query Hk is #P-hard.
The proof is in [Dalvi&S, JACM'2012].
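The Amazing Queries Hk

Hk is #P-hard. But if we drop any one conjunctive query, then it is in PTIME.

H3 = R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),S3(x2,y2) ∨ S3(x3,y3),T(y3)

For example, dropping the second conjunctive query leaves R(x0),S1(x0,y0) ∨ [S2(x2,y2),S3(x2,y2) ∨ S3(x3,y3),T(y3)]. The two parts are independent, so Independent union applies, and the second part is:

∃z [S2(x2,z),S3(x2,z) ∨ S3(x3,z),T(z)]
= ∃z [∃x3 S3(x3,z)] ∧ [(∃x2 S2(x2,z),S3(x2,z)) ∨ T(z)]
= etc.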
Where We Are
• We have seen examples of unsafe queries: Hk
• But if a query Q has Hk as a subquery, it is not necessarily unsafe
• When the four rules succeed, then Q is safe
• But inclusion/exclusion is insufficient: we need to replace it with the Möbius inversion formula

We will discuss these issues, then state the Big Dichotomy Theorem.
A Safe Query with H1 as Subquery

DNF:
QV = R(x1),S(x1,y1) ∨ S(x2,y2),T(y2) ∨ R(x3),T(y3)
• The first two disjuncts form H1 (unsafe!)
• The third disjunct, R(x3),T(y3), is a disconnected query

CNF:
QV = q1 ∧ q2, where
q1 = S(x2,y2),T(y2) ∨ R(x3)
q2 = R(x1),S(x1,y1) ∨ T(y3)

Inclusion/exclusion: PTIME!
P(QV) = P(q1 ∧ q2) = P(q1) + P(q2) - P(q1 ∨ q2)
where q1 ∨ q2 = R(x3) ∨ T(y3)
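The PTIME computation of P(QV) can be sketched directly from the CNF: each of q1, q2, and q1 ∨ q2 is an independent union of simpler queries. The toy probabilities and helper names below are assumptions for illustration, and the brute-force enumeration is included only as a check.

```python
from itertools import product
from math import prod

DOMAIN = [1, 2]
R = {1: 0.5, 2: 0.3}
T = {1: 0.2, 2: 0.6}
S = {(1, 1): 0.4, (1, 2): 0.7, (2, 1): 0.1, (2, 2): 0.9}

def union(p1, p2):
    return 1 - (1 - p1) * (1 - p2)

def p_some(unary):
    """P(∃x U(x)) for U in {R, T}."""
    return 1 - prod(1 - p for p in unary.values())

def p_r_s():
    """P(∃x∃y R(x)∧S(x,y)): separator variable x."""
    return 1 - prod(1 - R[a] * (1 - prod(1 - S[(a, b)] for b in DOMAIN)) for a in DOMAIN)

def p_s_t():
    """P(∃x∃y S(x,y)∧T(y)): separator variable y."""
    return 1 - prod(1 - T[b] * (1 - prod(1 - S[(a, b)] for a in DOMAIN)) for b in DOMAIN)

def p_qv_by_rules():
    p_q1 = union(p_s_t(), p_some(R))            # q1 = S,T ∨ R
    p_q2 = union(p_r_s(), p_some(T))            # q2 = R,S ∨ T
    p_q1_or_q2 = union(p_some(R), p_some(T))    # q1 ∨ q2 = R ∨ T
    return p_q1 + p_q2 - p_q1_or_q2

def p_qv_by_enumeration():
    facts = ([("R", a, p) for a, p in R.items()]
             + [("T", a, p) for a, p in T.items()]
             + [("S", ab, p) for ab, p in S.items()])
    total = 0.0
    for present in product([False, True], repeat=len(facts)):
        w = {(n, k) for (n, k, _), keep in zip(facts, present) if keep}
        rs = any(("R", a) in w and ("S", (a, b)) in w for a in DOMAIN for b in DOMAIN)
        st = any(("S", (a, b)) in w and ("T", b) in w for a in DOMAIN for b in DOMAIN)
        rt = any(("R", a) in w for a in DOMAIN) and any(("T", b) in w for b in DOMAIN)
        if rs or st or rt:
            total += prod(p if keep else 1 - p for (_, _, p), keep in zip(facts, present))
    return total

print(abs(p_qv_by_rules() - p_qv_by_enumeration()) < 1e-9)  # True
```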
Inclusion/Exclusion is Insufficient

QW = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧   /* Q1 */
     [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧       /* Q2 */
     [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)]     /* Q3 */

P(QW) = P(Q1) + P(Q2) + P(Q3) - P(Q1∨Q2) - P(Q2∨Q3) - P(Q1∨Q3) + P(Q1∨Q2∨Q3)

• PTIME: P(Q1), P(Q2), P(Q3), P(Q1∨Q2), P(Q2∨Q3)
• #P-hard: Q1∨Q3 = H3 (hard!), and also Q1∨Q2∨Q3 = H3

The two #P-hard terms are equal and occur with opposite signs, so they cancel, but inclusion/exclusion as stated still tries to compute them.
August Ferdinand Möbius, 1790-1868
• Möbius strip
• Möbius function μ in number theory
• Generalized to lattices [Stanley'97, Rota'09]
• And now to queries!
The CNF Lattice

See the formal definition in the book.

Definition. The CNF lattice of Q = Q1 ∧ Q2 ∧ … is:

Example

QW = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧   /* Q1 */
     [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧       /* Q2 */
     [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)]     /* Q3 */

[Figure: Hasse diagram of the CNF lattice of QW, with nodes marked as either in PTIME or #P-hard]
• Top: $\hat{1}$ = max(L)
• Second level: Q1, Q2, Q3
• Third level: Q1∨Q2, Q2∨Q3
• Bottom: Q1∨Q2∨Q3 (= Q1∨Q3)
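The Möbius Function

Def. The Möbius function:
$\mu(\hat{1},\hat{1}) = 1$
$\mu(u,\hat{1}) = -\sum_{u < v \le \hat{1}} \mu(v,\hat{1})$

Möbius inversion formula:
$P(Q) = -\sum_{Q_i < \hat{1}} \mu(Q_i,\hat{1})\, P(Q_i)$

[Figure: two example lattices annotated with $\mu(u,\hat{1})$, listed level by level from the top]
• First lattice: 1; -1, -1, -1; 1, 1; 0
• Second lattice: 1; -1, -1, -1; 2

New rule: replace Inclusion/Exclusion with the Möbius inversion formula.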
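The recursive definition is easy to evaluate on a small lattice. The sketch below hard-codes the CNF lattice of QW as a cover relation (each node mapped to the nodes immediately above it) and computes μ(u, 1̂) for every node; the node names and the `COVERS` encoding are illustrative assumptions.

```python
from functools import lru_cache

TOP = "1"
# node -> nodes immediately above it in the CNF lattice of QW
COVERS = {
    TOP: [],
    "Q1": [TOP], "Q2": [TOP], "Q3": [TOP],
    "Q1vQ2": ["Q1", "Q2"],
    "Q2vQ3": ["Q2", "Q3"],
    "Q1vQ2vQ3": ["Q1vQ2", "Q2vQ3"],  # same node as Q1vQ3
}

def strictly_above(u):
    """All v with u < v, obtained as the transitive closure of COVERS."""
    seen, stack = set(), list(COVERS[u])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(COVERS[v])
    return seen

@lru_cache(maxsize=None)
def mobius(u):
    """mu(u, top): 1 at the top, otherwise minus the sum over all nodes above u."""
    if u == TOP:
        return 1
    return -sum(mobius(v) for v in strictly_above(u))

def prob_by_mobius(prob):
    """Möbius inversion; prob is only called on nodes with non-zero mu."""
    return -sum(mobius(u) * prob(u) for u in COVERS if u != TOP and mobius(u) != 0)

print({u: mobius(u) for u in COVERS})
# {'1': 1, 'Q1': -1, 'Q2': -1, 'Q3': -1, 'Q1vQ2': 1, 'Q2vQ3': 1, 'Q1vQ2vQ3': 0}
```

Because the bottom node has μ = 0, `prob_by_mobius` never asks for the probability of Q1∨Q2∨Q3, which is the #P-hard query H3; only the PTIME nodes contribute.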