230 likes | 263 Views
Containment. CSE 590 DB Rachel Pottinger. Outline. Introduction Motivation Formal definition Algorithms for different complexities An application: rewriting queries using views. Containment, what is it?.
E N D
Containment CSE 590 DB Rachel Pottinger
Outline • Introduction • Motivation • Formal definition • Algorithms for different complexities • An application: rewriting queries using views
Containment, what is it? • For two queries, Q1 and Q2, if all of the answers to Q1 are a subset of those for Q2 for all databases, then Q1 is contained in Q2. • Denoted as Q1 Q2. • For general datalog, this is undecidable (by reduction from decision problems for context free languages)
Why should I care? • Containment is useful in a number of situations, including: • Query minimization • Independence of queries using updates • Rewriting queries using views • Interesting logic problem
More definitions • Equivalence of queries: Q1Q2 if they return the same answers for all databases. This is the same as Q1 Q2 and Q2 Q1 • Conjunctive query - a query that is formed only of conjunctions of predicates. • Q(X,Y):- e(X,Z),e(Z,Y)
Containment Mapping • Let Q1 and Q2 be two conjunctive queries • Q1: I :- J1, …, Jl • Q2: H :- G1, …, Gk • A symbol mapping h is said to be a containment mapping if h turns Q2 into Q1; that is, h(H)= I, and for each i = 1,2,…,k, there is some j such that h(Gi)=Jj. There is no requirement that each Jj be the target of some Gi
Proof Sketch • If there’s a containment mapping from Q2 to Q1, then Q1 Q2 • Suppose maps Vars(Q2)Vars(Q1) • Let D be a database and be an answer • is a mapping from Vars(Q1) D • • Vars(Q2) D • The rest of the proof follows later
Example of homomorphism rules • Q1: fp(X,Y) :- e(Y,X), e(X,Z) • Q2: fp(A,B) :- e(B,A), e(C,A),e(A,D) • For Q1 Q2, map from Q2 to Q1
Test for containment of a conjunctive query (Q1Q2) • Freeze the body of Q1, and put this into a canonical database • Apply Q2 to the canonical database • If Q1 can be derived from Q2 on the canonical database, then Q1 Q2, otherwise not
A chilling example Q1: p(X,Z) :- a(X,Y), a(Y,Z) Q2: p(X,Z) :- a(X,U), a(V,Z) Canonical Database of Q1
Proof continued • If Q1 Q2,then there is a containment mapping • Since Q1 Q2, we know that if we apply Q2 to the canonical database formed from Q1, we’ll get back the same fact we got from applying it to Q1, which makes a mapping from Q2 to Q1.
Conjunctive queries with negation • Negation in the heads of the subgoals, ie: Q(X,Y):- e(X,Z),e(Z,Y) • The Levy and Sagiv test looks at an exponential number of canonical databases, thus is P2 complete • Consider all partitions of Q1; form canonical databases for all of them, D1, … Dk • For each database Di, see if the database makes all subgoals of Q1 true. • For all Di’s passing step 2, see if it the head of Q1 can be derived by applying Q2 • If so, then Q1 Q2, else not
A negative example • Q1: p(X,Z):-a(X,Y), a(Y,Z), a(X,Z) • Q2: p(A,C):-a(A,B),a(B,C), a(A,D)
Conjunctive Queries with Arithmetic Comparisons • Q(X,Y):-e(X,Z),e(Z,Y), Z < Y • Treat the same as the negated subgoals, only a check must be made for each ordering of each partition • Also P2 complete for dense domain such as reals
Example with arithmetic comparisons • Q1:p(X,Z):-a(X,Y), a(Y,Z), X < Y • Q2:p(A,C):-A(A,B),A(B,C), A < C • false, see x = z = 0, y = 1
Other complexity results • queries restricted to queries Q1 and Q2 such that all database predicates have arity at most 2 and every database predicate occurs at most three times in the body of Q1 - P2 • Conjunctive queries where Q1 is fixed- NP complete • Conjunctive queries where Q2 is fixed - polynomial • Conjunctive query containment where Q2 is an acyclic query - polynomial time • Conjunctive queries where every database predicate occurs at most twice in the body of Q1 - linear time
Rewriting Queries Using Views • Useful in query optimization • Good for query minimization • Needed to make the best use of cached information • Necessary in data integration
Views • A view is a relation that is not part of the conceptual model, but is visible to the user. • Useful for common expressions, or protecting data • Example: If you had faculty(name, office, ssn) you may want students to access faculty_office(name, office)
Views (con’t.) • Views can be either materialized or virtual • In data integration, data sources can be thought of as views
An example of rewriting queries using views • Suppose you had two databases: • One has famous people and whether they are right or left handed • One has the birthdays of famous people • You want the birthdays of all of the lefties
Containment in rewriting • Query of q(X):-e(X,Y), e(Y,X) • View of v(A,B):- e(A,C),e(C,B)
A more complicated example • Q(x,u):-p(x,y),p0(y,z),p1(x,w),p2(w,u) • V1(a,b):-p(a,c),p0(c,b),p1(a,d) • V2(a,b):-p1(a,b) • V3(a,b):-p2(a,b)