1 / 19

Views as Incomplete Databases – Certain & Possible Answers

Views as Incomplete Databases – Certain & Possible Answers. Views – an incomplete representation Certain and possible answers Complexity results for certain answers. Views – an incomplete representation. Given : a view def V, view extension I Sound V: I is contained in V(D)

fahim
Download Presentation

Views as Incomplete Databases – Certain & Possible Answers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Views as Incomplete Databases – Certain & Possible Answers Views – an incomplete representation Certain and possible answers Complexity results for certain answers certain

  2. Views – an incomplete representation Given: a view def V, view extension I Sound V: I is contained in V(D) Complete V: I contains V(D) Precise V: I = V(D) V may also be mixed: some views are sound, others are complete In general, more than one db D may exist s.t. certain

  3. Example : teams in World Cup Soccer Tournament Global scheme : Team(country, group) (gr – assignment for 1st round) Source1: S-C(C) – the countries that participate Source2 : S-Q(C) -- countries that participated in qualifying games Source3 : S-T(C) – teams whose games will be on T.V For all three, the logical mapping is v(X) :- Team(X, Y) certain

  4. Given V (including a specification in s/c/p) and I poss(V,I) = {D | D is a db for which I is a possible view} Since we have only the views, this is the set of possible databases. For sound views : an infinite set For complete views : contains the empty db For precise views : may be empty -- inconsistent views Example : v1(X, Y) :- R(X, Y, Z), v1={(a, b), (b, c)} v2(X,Z) :- R(X, Y, Z), v2={(a, d), (c, e)} * The above changes when the global db is known to satisfy constraints (e.g. keys) certain

  5. Certain and possible answers Now, assume also a query Q cert(Q, V, I) – seems easier to compute, always finite poss(Q, V, I) – may be infinite and where do we obtain values not in I? A possible approach: a finite representation of a possibly infinite family of partially unknown databases certain

  6. We concentrate on certain answers -- an absolute notion of answering queries using views Cert(Q, V, I) depends on soundness/completeness of views Example : global : p(x, y) v1(x) :- p(x, y), v2(y):- p(x, y) I = {v1(a), v2(b)} Q: q(x, y) :- p(x, y) Sound views : cert(Q, V, I) is empty Precise views : cert(Q, V, I) is {(a, b)} certain

  7. An issue in query processing : For same example, let Q’ : s(x) :- p(x, y) To allow relational algebra manipulation of certain answers, we need more than a simple relational representation! We need algorithms for performing operations on representations of partially unknown db’s (not in this course) certain

  8. From now : sound views, certain answers Was investigated for views defined in L1, query defined in L2, where L1, L2 in {CQ, CQ!=, NR-Datalog, Datalog, FO} Results include: • Complexity – lower bounds • Algorithms – upper bounds certain

  9. Complexity results for certain answers Thm : for V in L1 , Q in L2, the following are equivalent: (a) computing cert(Q, V, I) (b) deciding containment: is Q1 (in L1) contained in Q2 (in L2)? • (a) is decidable iff (b) is • When decidable, combined complexity of (a) = query complexity of (b) • data complexity of (a) <= query complexity of (b) [ Data complexity: function of db size Query complexity: function of query size Combined : both ] certain

  10. Proof (sketch) :  given t, how hard to decide if t is in cert(Q, V, I)? Let I = {vi(tij)}, define Q’ by Q’ contains the rules that define V, and one more “large” rule: (t follows from facts in I) Claim: Hence deciding if t in cert(Q, V, I) is no harder than this containment (Note: for L1 = CQ, need to “massage” Q’ into CQ) certain

  11. How hard to check containment of Q1 in Q2? let p be a new predicate Define V by: rules of Q1, and v(c) :- q1(X), p(X) , let I = {v(c)} Define Q by: rules of Q2 , and q(c) :- q2(X), p(X) Then: (c) is in cert(Q, V, I) iff Q1 is contained in Q2 certain

  12. Consequences : computing certain answers (depends on L1,L2) Is: undecidable for Datalog, FO decidable if: one side <= datalog, other side <= nr-datalog For decidable cases, the above gives combined complexity, We are interested more in data complexity; here it is Co-NP data complexity is bad: impractical to compute, no datalog plan! We will not prove co-NP complexity results same certain

  13. Claim : For Q in Datalog, V in CQ(!=), let V~ be the same view def, with inequalities omitted Then cert(Q, V, I) = cert(Q, V~, I) (Computing the certain answers from I using V w/o the inequalities gives same results) Proof : (b) If t is in cert(Q, V~, I), then for each D in poss(V~, I), t in Q(D) If D also in poss(V, I) -- fine If D not in poss(V, I), exists larger D’ in poss(V, I) s.t. t is in Q(D’) Hence, t is in cert(Q, V, I) certain

  14. Proof of last claim: some s in I, but s not in V(D), because of some inequality Since s is in V(D’’), inequality involves attribute in view body • can add some tuples to D so obtain D1, s.t. s is in V(D1) • adding for all such s gives D’ that contains D, s.t. D’ is in poss(V, I) • If t in Q(D’), since Q has no inequalities, t also in Q(D) certain

  15. For CQ views, Datalog queries, Query plan: datalog program P on V exp(P) – replace views by their definitions (using fresh names for existential variables) P is maximally-contained in Q: • exp(P)(D) is contained in Q(D) • exp(P’)(D) is contained in ep(P)(D) for all other plans P’ Such a plan is best among all plans (This is a language-dependent notion – given a more expressive language, P may not be best any more) But, if a plan delivers cert(Q, V, I) it is absolutely best certain

  16. Thm : For CQ sound views, Datalog queries, the inverse rules algorithm computes cert(Q, V, I) (Thus, for this case, a Datalog query plan can give the absolute best possible answer) Corollary: If P is max-cont(Q) then, for all view instances, I P(I) = cert(Q, V, I) we proceed to prove the theorem certain

  17. Def: A tableau is a collection of atoms, with constants and variables A tableau T represents a db D: there is a valuation from T into D Rep(T) = {D | for some h, D contains H(T) } certain

  18. Claim : For a Datalog query Q, tableau T cert(Q, rep(T)) = the tuples w/o variables in Q(T) Proof : • Can consider only D in rep(T) s.t. D = h(T) every tuple in Q(D’) but not in Q(D) where D’ is larger than h(T) is not in cert(Q, rep(T)) (b) For such D, h(Q(T)) = Q(D)  a ground tuple in Q(T) is in cert(Q, rep(T)) (c) For a non-ground t tuple in Q(T), can find D1, D2 in rep(T) that give different values to variables in t  no instance of this tuple is in cert(Q, rep(T)) certain

  19. The inverse rules of V create from a view I a database with elements that are skolem functions. Consider each skolem term to be a distinct variable • This is a tableau T(V, I) Claim : T(V, I) represents poss(V, I) Proof : easy Corollary : is cert(Q, V, I) This is precisely what the inverse rule algorithm produces: For each I, the inverse rules produce T(V, I), then apply Q end of story Next: one more (last) algorithm, for CQ queries and views, that is fastest so far certain

More Related