CSE 636 Data Integration
Answering Queries Using Views
Bucket Algorithm
The Bucket Algorithm
• Each subgoal g of Q must be "covered" by some view
• Make a list of candidate views (a bucket) for each query subgoal
• Consider combinations of candidates, one from each bucket
• Not all combinations are "compatible"
• Keep the compatible ones and minimize them
• Discard the ones contained in another
• Take their union
The Bucket Algorithm
q(X,Y,R) :- ForSale(X,Y,C,"auto"), Review(X,R,"auto"), Y > 1985
Step 1: For each subgoal, put the relevant sources into a bucket:
• V1(name, year) :- ForSale(name, year, "France", "auto"), year > 1990 would be relevant
• V3(name, year) :- ForSale(name, year, "France", "cheese") would be irrelevant
Step 2: Take the Cartesian product of the buckets
• The algorithm produces a maximally contained rewriting
• Step 1 ignores interactions between subgoals
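Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The term encoding (variables written with a leading `?`), the helper names, and the view `V_rev` over Review are assumptions introduced here, since the slide gives no view for the Review subgoal. The arithmetic predicates (year > 1990, Y > 1985) are ignored in this sketch.

```python
from itertools import product

def is_var(term):
    # Assumed encoding: variables start with "?", everything else is a constant.
    return term.startswith("?")

def may_unify(goal, atom):
    """Position-wise compatibility test: same predicate, and constants agree.
    Simplification: consistency of repeated variables is not checked."""
    (gpred, gargs), (apred, aargs) = goal, atom
    if gpred != apred or len(gargs) != len(aargs):
        return False
    return all(is_var(g) or is_var(a) or g == a for g, a in zip(gargs, aargs))

def build_buckets(query_body, views):
    """Step 1: one bucket (list of view names) per query subgoal."""
    return [[name for name, body in views.items()
             if any(may_unify(goal, atom) for atom in body)]
            for goal in query_body]

def candidate_combinations(buckets):
    """Step 2: Cartesian product of the buckets."""
    return list(product(*buckets))

QUERY_BODY = [("ForSale", ("?X", "?Y", "?C", "auto")),
              ("Review", ("?X", "?R", "auto"))]

VIEWS = {
    "V1": [("ForSale", ("?name", "?year", "France", "auto"))],
    "V3": [("ForSale", ("?name", "?year", "France", "cheese"))],
    # Hypothetical view, not on the slide, so that the Review bucket is non-empty.
    "V_rev": [("Review", ("?name", "?rating", "auto"))],
}

buckets = build_buckets(QUERY_BODY, VIEWS)
combos = candidate_combinations(buckets)
print(buckets)
print(combos)
```

V3 is left out of the ForSale bucket because its constant "cheese" clashes with "auto" in the query subgoal. Each surviving combination would still need a containment check before it counts as a rewriting.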
The Bucket Algorithm: Example
V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98
V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)
V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97
q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95
Step 1: For each query subgoal, put the relevant sources into a bucket
The Bucket Algorithm: Example
(Views V1 to V4 and query q as on the previous slide)
Subgoal teaches(P,C,Q): P → Prof, C → Crs, Q → Qtr
Note: Arithmetic predicates don't pose a problem
Buckets:
teaches: V2, V4
reg: (not yet filled)
course: (not yet filled)
The Bucket Algorithm: Example
(Views V1 to V4 and query q as before)
Subgoal reg(S,C,Q): S → Std, C → Crs, Q → Qtr
Note:
• V3 doesn't work: arithmetic predicates not consistent (Qtr ≤ Aut94 in V3, Q ≥ Aut95 in q)
• V4 doesn't work: S not in the output of V4
Buckets:
teaches: V2, V4
reg: V1, V2
course: (not yet filled)
The Bucket Algorithm: Example
(Views V1 to V4 and query q as before)
Subgoal course(C,T): C → Crs, T → Title
Buckets:
teaches: V2, V4
reg: V1, V2
course: V1, V4
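The bucket construction for this example can be sketched in Python. This is an illustration under stated assumptions rather than the algorithm as published: quarters are encoded by year only (Aut98 becomes 98), arithmetic predicates are stored as closed intervals per variable, all terms are variables, and the names `covers` and `build_buckets` are made up for this sketch. A view enters a bucket when every distinguished query variable of the subgoal maps to a head variable of the view and the bounds overlap.

```python
import math

INF = math.inf

# Quarters are encoded by year only (Aut98 -> 98); this encoding is an assumption.
VIEWS = {
    "V1": {"head": ("Std", "Crs", "Qtr", "Title"),
           "body": [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title"))],
           "bounds": {"Crs": (500, INF), "Qtr": (98, INF)}},
    "V2": {"head": ("Std", "Prof", "Crs", "Qtr"),
           "body": [("reg", ("Std", "Crs", "Qtr")), ("teaches", ("Prof", "Crs", "Qtr"))],
           "bounds": {}},
    "V3": {"head": ("Std", "Crs"),
           "body": [("reg", ("Std", "Crs", "Qtr"))],
           "bounds": {"Qtr": (-INF, 94)}},
    "V4": {"head": ("Prof", "Crs", "Title", "Qtr"),
           "body": [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title")),
                    ("teaches", ("Prof", "Crs", "Qtr"))],
           "bounds": {"Qtr": (-INF, 97)}},
}

QUERY = {"head": ("S", "C", "P"),
         "body": [("teaches", ("P", "C", "Q")), ("reg", ("S", "C", "Q")),
                  ("course", ("C", "T"))],
         "bounds": {"C": (300, INF), "Q": (95, INF)}}

def covers(goal, view, query):
    """True if some subgoal of the view can cover the query subgoal."""
    pred, qargs = goal
    for vpred, vargs in view["body"]:
        if vpred != pred:
            continue
        mapping = list(zip(qargs, vargs))
        # Every distinguished query variable must map to a head variable of the view.
        exported = all(v in view["head"] for q, v in mapping if q in query["head"])
        # Bounds on each query variable and on its image in the view must overlap.
        consistent = True
        for q, v in mapping:
            qlo, qhi = query["bounds"].get(q, (-INF, INF))
            vlo, vhi = view["bounds"].get(v, (-INF, INF))
            if max(qlo, vlo) > min(qhi, vhi):
                consistent = False
        if exported and consistent:
            return True
    return False

def build_buckets(query, views):
    """One bucket per query subgoal, keyed by predicate name (distinct in this query)."""
    return {goal[0]: [name for name, view in views.items() if covers(goal, view, query)]
            for goal in query["body"]}

buckets = build_buckets(QUERY, VIEWS)
print(buckets)
```

Under these assumptions the sketch rejects V3 for reg because of the quarter bounds and rejects V4 for reg because Std is not in V4's head, matching the notes on the slides.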
The Bucket Algorithm: Example
Step 2:
• Try all combinations of views, one from each bucket
• Test satisfiability of the arithmetic predicates in each case
  • e.g., two views may not overlap, i.e., they may be inconsistent
• Desired rewriting = union of the surviving ones
Query rewriting 1:
• q1(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T'), V1(S'',C,Q',T)
• No problem from arithmetic predicates (none in V2)
• May or may not be minimal (why?)
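A small Python sketch of Step 2 follows. It enumerates the combinations with `itertools.product` and shows one way to test whether a set of arithmetic predicates on the same variable can hold together. The interval encoding (Aut98 becomes 98) and the function name `intersect` are assumptions made for illustration, and the sketch only handles bounds of the form "variable ≥ constant" or "variable ≤ constant".

```python
from itertools import product
import math

# Buckets as built in Step 1 of the example.
BUCKETS = {"teaches": ["V2", "V4"], "reg": ["V1", "V2"], "course": ["V1", "V4"]}

def combinations(buckets):
    """All ways of picking one view per bucket."""
    return list(product(*buckets.values()))

def intersect(bounds_list):
    """Intersect per-variable closed intervals.
    Returns the merged bounds, or None if some variable is left with an empty range."""
    merged = {}
    for bounds in bounds_list:
        for var, (lo, hi) in bounds.items():
            cur_lo, cur_hi = merged.get(var, (-math.inf, math.inf))
            lo, hi = max(lo, cur_lo), min(hi, cur_hi)
            if lo > hi:
                return None
            merged[var] = (lo, hi)
    return merged

combos = combinations(BUCKETS)
print(len(combos), combos[0])

# Query bound Q >= Aut95 together with a view bound Q <= Aut97: still satisfiable.
overlap = intersect([{"Q": (95, math.inf)}, {"Q": (-math.inf, 97)}])
# Q >= Aut98 and Q <= Aut97 on the same variable: no quarter satisfies both.
conflict = intersect([{"Q": (98, math.inf)}, {"Q": (-math.inf, 97)}])
print(overlap, conflict)
```

The second call illustrates the "two views may not overlap" case: when two chosen views constrain the same quarter variable with disjoint ranges, the combination can be dropped without a containment check.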
The Bucket Algorithm: Example
Unfolding of rewriting 1:
• q1'(S,C,P) :- r(S',C,Q), t(P,C,Q), r(S,C,Q), c(C,T'), r(S'',C,Q'),
    c(C,T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q' ≥ Aut98
• The other r subgoals (black on the slide) can be mapped to r(S,C,Q) (green on the slide): S' → S, S'' → S, Q' → Q
• c(C,T) (black) can be mapped to c(C,T') (green): just extend the above mapping with T → T'
Minimized unfolding of rewriting 1:
• q1m'(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T'), C ≥ 500, Q ≥ Aut98
Minimized rewriting 1:
• q1m(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T')
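The minimization of the unfolding can be reproduced with a brute-force search for containment mappings. The Python sketch below is an assumption-laden illustration: it treats every term as a variable, ignores the arithmetic predicates, and uses made-up helper names (`find_mapping`, `minimize`). A subgoal is dropped when the whole body can be mapped into the remaining subgoals while the head variables stay fixed.

```python
def find_mapping(src_atoms, dst_atoms, fixed):
    """Search for a substitution that is the identity on `fixed` and sends
    every atom of src_atoms onto some atom of dst_atoms. Returns a dict or None."""
    def extend(i, h):
        if i == len(src_atoms):
            return h
        pred, args = src_atoms[i]
        for dpred, dargs in dst_atoms:
            if dpred != pred or len(dargs) != len(args):
                continue
            h2 = dict(h)
            # setdefault binds an unbound variable; a bound one must agree.
            if all(h2.setdefault(a, d) == d for a, d in zip(args, dargs)):
                result = extend(i + 1, h2)
                if result is not None:
                    return result
        return None
    return extend(0, {v: v for v in fixed})

def minimize(body, head_vars):
    """Repeatedly drop a subgoal that the rest of the body can absorb."""
    body = list(dict.fromkeys(body))  # remove exact duplicates, keep order
    changed = True
    while changed:
        changed = False
        for atom in reversed(body):
            rest = [a for a in body if a != atom]
            if find_mapping(body, rest, head_vars) is not None:
                body, changed = rest, True
                break
    return body

# Unfolding q1' of rewriting 1, without the arithmetic predicates.
Q1_UNFOLDED = [("r", ("S'", "C", "Q")), ("t", ("P", "C", "Q")),
               ("r", ("S", "C", "Q")), ("c", ("C", "T'")),
               ("r", ("S''", "C", "Q'")), ("c", ("C", "T"))]

minimized = minimize(Q1_UNFOLDED, head_vars=("S", "C", "P"))
print(minimized)
```

The search is exponential in the worst case, which is acceptable for bodies of this size. In a fuller version the arithmetic predicates would also have to be carried along under the mapping, for example Q' ≥ Aut98 becoming Q ≥ Aut98.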
The Bucket Algorithm: Example
Query rewriting 2:
• q2(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T'), V4(P',C,T,Q')
• q2'(S,C,P) :- r(S',C,Q), t(P,C,Q),
    r(S,C,Q), c(C,T'), C ≥ 500, Q ≥ Aut98,
    r(S'',C,Q'), c(C,T), t(P',C,Q'), Q' ≤ Aut97
• This combo is infeasible: consider the conjunction of arithmetic predicates in V1 and V4
Query rewriting 3:
• q3(S,C,P) :- V2(S',P,C,Q), V2(S,P',C,Q), V4(P'',C,T,Q')
The Bucket Algorithm: Example
Unfolding of rewriting 3:
• q3'(S,C,P) :- r(S',C,Q), t(P,C,Q), r(S,C,Q), t(P',C,Q), r(S'',C,Q'),
    c(C,T), t(P'',C,Q'), Q' ≤ Aut97
• The green subgoals can cover the black ones under the mapping: S' → S, S'' → S, P' → P, P'' → P, Q' → Q
Minimized rewriting 3:
• q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q)
Verify that there are only two rewritings that are not covered by others
Maximally Contained Rewriting:
• q' = q1m ∪ q3m
The Bucket Algorithm: Example 2
Query:
q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)
Views:
V4(A) :- cites(A,B), cites(B,A)
V5(C,D) :- sameTopic(C,D)
V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)
Note: Should we list V4(X) twice in the buckets?
Buckets:
cites(X,Y): V4, V6
cites(Y,X): V4, V6
sameTopic(X,Y): V5, V6
The Bucket Algorithm: Example 2
• Consider all combinations and check for containment of the unfolded rewriting in Q
• V4(X) cannot be combined with anything (why?)
• Try q1(X) :- V4(X), V4(X), V5(X,Y)
• Try q2(X) :- V4(X), V6(X,Y), V5(X,Y)
• Do any of these work?
• When can we discard a view from consideration?
The Bucket Algorithm: Example 2
Here is a successful rewriting:
• q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y)
• By itself, it is not contained in Q
• But with the subgoal X = Y added, it is!
• By minimizing the rewriting, we get:
• q3m(X) :- V6(X,X)
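The containment claims for V6 can be checked mechanically with a containment-mapping search from the query into the unfolded rewriting. The Python sketch below is illustrative only: the helper names are invented, the fresh variable G stands for V6's existential variable, and the second argument of V6 is written Z in the unconstrained case to keep it apart from the query's own Y.

```python
def find_mapping(src_atoms, dst_atoms, fixed):
    """Search for a substitution that is the identity on `fixed` and sends
    every atom of src_atoms onto some atom of dst_atoms. Returns a dict or None."""
    def extend(i, h):
        if i == len(src_atoms):
            return h
        pred, args = src_atoms[i]
        for dpred, dargs in dst_atoms:
            if dpred != pred or len(dargs) != len(args):
                continue
            h2 = dict(h)
            if all(h2.setdefault(a, d) == d for a, d in zip(args, dargs)):
                result = extend(i + 1, h2)
                if result is not None:
                    return result
        return None
    return extend(0, {v: v for v in fixed})

# q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)
QUERY_BODY = [("cites", ("X", "Y")), ("cites", ("Y", "X")), ("sameTopic", ("X", "Y"))]

def unfold_v6(f, h, fresh="G"):
    """Body of V6(f,h): cites(f,G), cites(G,h), sameTopic(f,G)."""
    return [("cites", (f, fresh)), ("cites", (fresh, h)), ("sameTopic", (f, fresh))]

def contained_in_query(unfolded_body):
    """The rewriting is contained in q if q's body maps into the unfolding,
    with the head variable X kept fixed."""
    return find_mapping(QUERY_BODY, unfolded_body, fixed=("X",)) is not None

without_equality = contained_in_query(unfold_v6("X", "Z"))  # V6(X,Z), Z unconstrained
with_equality = contained_in_query(unfold_v6("X", "X"))     # V6(X,X), i.e. X = Y added
print(without_equality, with_equality)
```

In the second case the mapping sends the query's Y to G, which is what makes V6(X,X) a contained rewriting.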
The Bucket Algorithm: Example 2
Remarks:
• V4 didn't contribute to any rewriting, but the bucket algorithm doesn't recognize this ahead of time
• Consider: q2(X,Y) :- cites(X,Y), cites(Y,X)
• Then both cites predicates can be folded into V4
• This is not recognized by the bucket algorithm
The State of Affairs
• Bucket algorithm:
  • deals well with predicates
  • Cartesian product can be large (containment check required for every candidate rewriting)
• Inverse rules:
  • modular (extensible to binding patterns, FDs)
  • no treatment of predicates
  • resulting rewritings need significant further optimization
• Neither scales up
• The MiniCon algorithm:
  • change perspective: look at query variables
References
• Alon Y. Levy, Anand Rajaraman and Joann J. Ordille. Querying Heterogeneous Information Sources Using Source Descriptions. VLDB, 1996
• Laks V.S. Lakshmanan. Lecture Slides
• Alon Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 2000. http://citeseer.ist.psu.edu/halevy00answering.html