300 likes | 449 Views
Local as View: First steps. Introduction and an example Rewriting queries using views The Information Manifold system. Introduction and an example. LAV : local as view The sources are modeled as views, typically conjunctive, of the global (virtual) schema (why conjunctive?)
E N D
Local as View: First steps Introduction and an example Rewriting queries using views The Information Manifold system lav-i
Introduction and an example LAV: local as view The sources are modeled as views, typically conjunctive, of the global (virtual) schema (why conjunctive?) Advantage: adding / removing a source are local to the source, and do not impact the global schema (if sufficiently general) lav-i
A source is associated with a view definition; what are the assumptions on its contents? Sound: contains a subset of the view definition Complete: contains all the data in the definition (possibly more) Exact: contains exactly the data in the view definition The common assumption: sound views (fits the WWW environment) lav-i
Answering a conjunctive query: Rewrite the query in terms of the views Equivalent rewriting: the rewriting is equivalent to the query Contained rewriting: the rewriting is contained in the query Maximally contained rewriting: • contained in query, • not contained in another rewriting For sound views, it is reasonable to search for maximally contained rewritings , then take their union lav-i
Example: a university db (each attribute is explained when it first occurs) course(c#,ti) //c-number, title teaches(pr, c#, qu) // prof, quarter registered(st, c#, qu) //student major(st, dp) //dept advises(pr, st) Views: v0(p, s):- advises(p,s) v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97//winter97 lav-i
Q: select a.pr, a.st, r.qu from registered r, teaches t, advises a where r.c# = t.c# and r.qu = t.qu and a.pr = t.pr a.st = r.st and r.qu >=w98 As a conjunctive query : (with individual variables) jump q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(p,s), q>=w98 Let Q’: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98 Expanding by the view defs: Q’’: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), q>= w97 , advises(p,s), q>=w98 Q’’ is equivalent to Q, hence this is a good rewriting lav-i
Assume v1 is replaced by v2(s,q) :- registered(s,c,q), teaches(p,c,q), q>= w98 Can we answer the query? The expansion of v2(s,q), v0(p,s): q(p,s,q) :-registered(s,c,q), teaches(p’,c,q), q>= w98 , advises(p,s), We use p’ (rather than p), since it is an existential variable of v1 This is not equivalent to the query, nor contained in it Assume q was dropped from both v1 and Q, could we answer Q using v0 and v1? lav-i
Assume v1 is replaced by v3: select r.st, t.pr, r.qu from registered r, teaches t where r.c# = t.c# and r.qu >= win98 Convert to conjunctive! Can we answer the query? Interim summary: To be useable in a query rewriting, views have to • export variables that are subject to (arithmetic or join) conditions in the body of the query , or to contain such joins in their bodies, • treat query head variables as head variables of the view lav-i
Rewriting queries using views Scenario: We have a collection of view definitions V = {v1, v2, …, vn} Given a query Q, we ask: • is there a rewriting that uses the views? • If there is, how can it be computed? Applications: • Query optimization by using materialized views • Data integration lav-i
Example : Q: q(X, U) :- p(X, Y), r(Y, Z), s(X, W), t(W, U) V: v1(A,B) :- p(A,C), r(C,B), s(A, D) A partial rewriting: Q’: q(X, U) :- v1(X, Z), s(X, W), t(W, U) The atom s(A, D) in v1 does not replace s(X, W) in Q (and Q’) Given also v2(A,B) :- s(A, C), t(C, B), r(D, E) Q’’: q(X, U) :- v1(X, Z), v2(X, U) Is a complete equivalentrewriting Note: looking at Q’, it is not evident that by replacing the last two atoms with v2 we obtain an equivalent rewriting; v2 contains r(D, E), and Q’ does not contain r. lav-i
Classification: • Partial rewriting: only part of the query is replaced by views useful in query optimization, not in data integration • Complete rewriting: only views occur in it useful in both scenarios • Equivalent rewriting: yields the same answer on all db’s useful in both cases • Contained rewriting: its expansion is a contained query not interesting for query optimization, useful for integration • Maximal contained rewriting: contained in query, not in another rewriting We assume (unless stated otherwise) conjunctive queries and views, no b.i. preds lav-i
Assume a query Q q(..):- Qbody is given. A view is usable in an equivalent rewritingif it occurs in some partial equivalent rewriting of Q Claim 1 : jump That is, if v(D) is empty, for some db D, then so is Q(D) Proof: If the containment does not hold, then there is a D s.t. v(D) is empty, Q(D) is not, so the rewriting is not equivalent If it holds, then Q’: q(..):- Qbody, v(Y)(Y new vars) is an equivalent partial rewriting (bold letters – vectors of variables/constants) lav-i
Corollary: checking for usability is NP-complete Corollary : if v1,…, vk are usable, then Q’: q(..):- Qbody, v1(Y1), …, vk(Yk)(Yi new and distinct) is an equivalent rewriting Apply the containment map hi to each v(Yi), obtaining Q’’ that is contained in Q’ (and doesnot introduce any new vars) Q’’ contains Q, hence Q, Q’ Q’’ are equivalent Q’’ still contains all atoms of bodyQ; some of these may be removed by standard minimization; then some views may be removed How many must be left? lav-i
Claim 2 : jump If Qbody has n atoms, then a minimized, equivalent or contained, rewriting Q’ contains at mostn (view or regular) atoms Proof: consider the expansion exp(Q’), and the containment mapping from Q to exp(Q’); its image has at most n atoms Q: q(..) :- p1(..), pk(..) pk+1(..) pn(..) Q’: q(..):- p1,(..), , pk(..), v1(..), vm(..) r1,1 .. r1k1 rm,1 .. rmk1 lav-i
Claim 3: The problem: is there a complete equivalent rewriting is NP-complete Proof: given v, construct v’: Qhead(..):- Qbody, vbody Then v’ is usable (NP-complete) iff v’ is a complete rewriting Claim 4: The problem above is NP-complete even if the query and the views do not contain repeated predicates in their bodies (which simplifies the search for containment mappings) And also if we are looking for a complete contained rewriting Proof: next page lav-i
Exact cover by 3-sets : (NP-complete problem) Given s={e1,…,en} and sets s1,…,sk, each of size 3, is there a cover of s by a subset of the {sj} where each element occurs in just one set? A reduction from the above to finding a complete rewriting: The query Q: q( ) :- p1(A1,1,…,A1,k),…, pn(An,1,…, An,k) Note: since each sj contains 3 elements, Sj occurs in 3 atoms – this is a join condition in the query body The view vj (for sj): the 3 atoms (as above) for the elements of sj, and a head that contain all the Sm that occur in its body, but not Sj (in the body, Sj is existential) lav-i
Assume a complete contained rewriting Q’ exists There is a containment mapping h : body(Q) body( exp(Q’)) Note: Sj in exp(vj) is renamed to a new var, say Gj, that does not occur in expansion of other views, since it is existential If h maps pi(..) from Q to the expansion of vj, then the expansion contain pi, so and h(Sj)=Gj the atoms pm, pq in Q contain Sj And the body of vj contains pi, pm, pq, with Gj instead of Sj Since h(Sj) = Gj, h maps pm and pq from Q also to exp(vj) (Since vj does not export Sj, the join condition on Sj can be satisfied only if all three atoms pi, pm, pq are mapped by h to vj) The views with expansions in image(h) provide an exact cover lav-i
The other direction (if an exact cover exists, it gives a complete rewriting) - left for you Also: a complete contained rewriting in this case is always an equivalent rewriting lav-i
Comments: • For contained rewritings, the characterization (claim1, p.12) of usability does not hold (even if v(D) is empty on some D, and Q(D) is not, v may be useful for a contained rewriting) • Claim 2 (p. 14) holds also for contained rewritings : search for rewritings is restricted by size of query body but: if db satisfies functional dependencies, the size bound fails Example: Database: a single relation e(X, Y, Z), with fd: X Y Query Q: q(X, Y, Z):- e(X, Y, Z) Views: v1(X,Y):- e(X, Y, Z),v2(X,Z):- e(X, Y, Z) A complete rewriting, of minimal size : Q’: q(X, Y, Z):- v1(X, Y), v2(X, Z) lav-i
If the query and views contain b.i. predicates (comparisons) : Let Q’ be Q w/o the comparisons If Q’’ is a complete rewriting for Q’, just add to it the comparisons in Q (with variables suitably renamed), to obtain a complete contained rewriting for Q (or a contradiction!) Example: from p. 6 Views:v0(p, s):- advises(p,s) v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97 A query: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(q,s), q>=w98 A rewriting: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98 Can we change the example to obtain a contradiction? lav-i
The Information Manifold system (IM) A LAV system, implemented in Bell Labs around 94-96, supported about 100 WWW sources Main ideas: • Rewriting queries using views for answering queries • Using fine-grained descriptions of sources to eliminate irrelevant sources • Support for restricted-capability sources From now, rewriting always means complete, contained lav-i
Rewriting views using views in IM – the bucket algorithm: The goal: Finding rewritings is difficult in worst case But good heuristics often work well in practice Outline: • Find for each query atom – a subgoal, views that may be targets of a mapping from it; put them in a bucket for the subgoal (these are candidates) (hopefully, this filters out many candidates) • Combine views, one from each bucket; test if • there is a containment mapping; • adding constraints yields a satisfiable query on the views • Minimize each rewriting, eliminate those contained in others • Take the union lav-i
Step 1 – computing the bucket for a query atom : Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q) We add h(vj) to bucket(pi(Ui)) if: • vj’s definition contains an atom pi(Y) ; // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y) • if the k’th var of Ui is a head var of Q, then the k’th var of Y is a head var of vj (this var is needed for the query result) • h renames variables of vj and (some vars of) Q as follows: • If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui // this may be a head var of Q, or used in a join • Otherwise, h(y) is a new distinct var • h(pi(Ui)) = h(pi(Y)) • h(C(Q)) and h(C(vj)) is satisfiable lav-i
Example (same university db) : Views: v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98 v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q) v3(s, c) :- registered(s, c, q), q<=a94 v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97 Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95 Bucket for registered(s, c, q): v1(s, c, q, t1), v2(s, p1, c, q) (but not v3 – for one(two?) reasons, not v4 – for one reason) Bucket for teaches(p, c, q):v2(s1, p, c, q), v4(p, c, t2, q) Bucket for course(c, t): lav-i
Step 2 – combining views, testing for satisfiability of rewriting : Example (cont’d): Combining 1st element of each bucket: Q1: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q), v1(s2, c, q1, t) Minimize by s2s, cc, q1q, t t1 (3rd atom is removed) Q1’: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q) Expand: v1: registered(s, c, q), course(c, t1), c>=500, q>=a98, v2: registered(s, c, q), teaches(p, c, q) With query constraints (under containment mapping)c>=300, q>=a95, this is satisfiable, so we have a contained rewriting lav-i
Another combination: Q2: q(s, p, c) :- v1(s, c, q, t1), v4(p, c, t2, q) , v4(p2, c, t, q2) Can minimize, remove 3rd atom Expansion: v1: registered(s, c, q), course(c, t1), c>=500, q>=a98 v4 : registered(s, c, q), teaches(p, c, q), course(c, t2), q<=a97 The conjunction is unsatisfiable (conditions in two views arecontradictory), this is not a rewriting Taking the union of all the rewritings that pass the filters, we obtain (in this example) a maximally contained rewriting lav-i
Another example: The database contains a single relation : flight(from, to, carrier) The views are v1(F, T) :- flight(F, T, wn) // wn is Southwest airlines v2(F, T) :- flight (F, T, ua) // United airlines v3(F, T, C) :- flight(F, Z, C), flight(Z, T, C) A user wants to fly from Tucson to S.F, w/o changing the airline, and with at most one stop Q: q(C) :- flight(tus, sfo, C) q(C) :- flight(tus, Z, C), flight(Z, sfo, C) lav-i
Buckets are computed for each sub-query: (an extension!) For flight(tsu, sfo, C): v1(tus, sfo),v2(tus, sfo),v3(tus, T1, C), v3(F1, sfo, C) Each of these gives a candidate rewriting for q, of which only q1(wn) :- v1(tus, sfo) q2(ua) :- v2(tus, sfo) remain lav-i
An alternative presentation of the bucket (p. 23) Step 1 – computing the bucket for a query atom : Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q) We add h(vj) to bucket(pi(Ui)) if: • vj’s definition contains an atom pi(Y) & pi(U) and pi(Y) unify // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y) • Some more conditions (next page) are satisfied Thus, if vj’s body contains pi twice, it may be added to the bucket (at most) twice For h, see next page lav-i
The additional conditions & the definition of h: • (condition) if the k’th var of Ui is a head var of Q, or a join var of Q, then the k’th var of Y is a head var of vj (this var is needed for the query result or for the join) (the condition on join vars does not occur in Alon’s survey paper!?) • h renames variables of vj as follows: • If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui • Otherwise, h(y) is a new distinct var • h(pi(Ui)) = h(pi(Y)) this is simply a renaming of the variables of vj • (condition) Since pi(Y) and pi(U) are unifiable, h can be extended to (some variables of) Q, so that h(pi(Ui)) = h(pi(Y)) • (condition) h(C(Q)) and h(C(vj)) is satisfiable (h is defined on some variables of C(Q), so it maps part of C(Q); that part should be consistent with h(C(vj)) ) lav-i