On Answering Queries in the Presence of Limited Access Patterns. Chen Li, Stanford University; joint work with Edward Chang, UC Santa Barbara. ICDT 2001, London, UK
A movie database
r(Star, Movie):
  Harrison Ford | Air Force One
  Henry Fonda | On Golden Pond
  Kevin Spacey | American Beauty
  … | …
s(Movie, Award):
  On Golden Pond | Oscar, Best Actor
  On Golden Pond | Oscar, Best Actress
  American Beauty | Oscar, Best Picture
  … | …
Query: Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
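For reference, the complete answer is what Q returns when every tuple is available. The sketch below assumes the relations are Python sets of tuples holding only the rows shown on the slide (the "…" rows are omitted) and evaluates Q with a nested-loop join; `complete_answer` is a hypothetical helper, and the constant is spelled "Henry Fonda" to match the stored data.

```python
# Only the rows shown on the slide; the "…" rows are omitted.
R = {  # r(Star, Movie)
    ("Harrison Ford", "Air Force One"),
    ("Henry Fonda", "On Golden Pond"),
    ("Kevin Spacey", "American Beauty"),
}
S = {  # s(Movie, Award)
    ("On Golden Pond", "Oscar, Best Actor"),
    ("On Golden Pond", "Oscar, Best Actress"),
    ("American Beauty", "Oscar, Best Picture"),
}


def complete_answer(r, s, star):
    """Q(Award) :- r(star, Movie), s(Movie, Award), evaluated with unrestricted access."""
    return {
        award
        for (st, movie) in r
        if st == star
        for (m, award) in s
        if m == movie
    }


print(sorted(complete_answer(R, S, "Henry Fonda")))
# prints ['Oscar, Best Actor', 'Oscar, Best Actress']
```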
Limited access patterns (same movie database) • r(Star, Movie): should provide a star. • s(Movie, Award): should provide a movie.
Answering Q given the restrictions • Q(Award) :- r(henry fonda, Movie), s(Movie, Award) • Bind Star to henry fonda and access r: this returns On Golden Pond. • Bind Movie to On Golden Pond and access s: this returns Oscar, Best Actor and Oscar, Best Actress.
The answer is complete • Q(Award) :- r(henry fonda, Movie), s(Movie, Award) • We did not retrieve all the tuples from the relations. • Still, we computed all the tuples in the answer to the query.
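A minimal sketch of this evaluation, assuming hypothetical wrappers `access_r` and `access_s` that enforce the two binding patterns by filtering in-memory sets (only the slide's visible rows are used). It also records which tuples were actually fetched, to show that the answer is complete even though most of the database is never read.

```python
R = {  # r(Star, Movie), visible rows only
    ("Harrison Ford", "Air Force One"),
    ("Henry Fonda", "On Golden Pond"),
    ("Kevin Spacey", "American Beauty"),
}
S = {  # s(Movie, Award), visible rows only
    ("On Golden Pond", "Oscar, Best Actor"),
    ("On Golden Pond", "Oscar, Best Actress"),
    ("American Beauty", "Oscar, Best Picture"),
}


def access_r(star):
    """Legal access to r(Star^b, Movie^f): a star must be supplied."""
    return {t for t in R if t[0] == star}


def access_s(movie):
    """Legal access to s(Movie^b, Award^f): a movie must be supplied."""
    return {t for t in S if t[0] == movie}


def answer_with_bindings(star):
    """Evaluate Q(Award) :- r(star, Movie), s(Movie, Award) using legal accesses only.
    Returns the answer and the set of tuples actually retrieved."""
    retrieved = set()
    answer = set()
    r_tuples = access_r(star)  # Star is bound by the query constant
    retrieved |= {("r",) + t for t in r_tuples}
    for (_, movie) in r_tuples:
        s_tuples = access_s(movie)  # Movie is bound by a value returned from r
        retrieved |= {("s",) + t for t in s_tuples}
        answer |= {award for (_, award) in s_tuples}
    return answer, retrieved


answer, retrieved = answer_with_bindings("Henry Fonda")
print(sorted(answer))  # ['Oscar, Best Actor', 'Oscar, Best Actress']
print(len(retrieved), "of", len(R) + len(S), "tuples retrieved")  # 3 of 6 tuples retrieved
```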
Change the restriction • Q(Award) :- r(henry fonda, Movie), s(Movie, Award) • We cannot compute the complete answer to Q. • There can always be some tuples that are not retrievable.
General questions • Given a query on relations with limited access patterns, can we compute its complete answer by accessing the relations with legal patterns? • Stable queries • Different classes of queries • Another problem studied: testing query containment in the presence of binding patterns.
Rest of the talk • Binding patterns, query stability • Testing stability of queries: • Conjunctive queries • Unions of conjunctive queries • Conjunctive queries with arithmetic comparisons • Datalog queries • Dynamic computability of complete answer to conjunctive queries • Conclusion and related work
(I) Binding patterns • Attributes with adornments: • b: bound • f: free • Example: r(Star^b, Movie^f), s(Movie^b, Award^f) • A relation can have multiple binding patterns.
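One way to encode adornments is one string per binding pattern, with one character per attribute. The sketch below assumes that encoding, a hypothetical relation `t` with two patterns to illustrate multiple binding patterns, and that binding an extra "f" attribute is allowed (it only filters the result).

```python
# Binding patterns as adornment strings, one character per attribute:
# 'b' = the attribute must be bound, 'f' = it may be left free.
PATTERNS = {
    "r": ["bf"],        # r(Star^b, Movie^f)
    "s": ["bf"],        # s(Movie^b, Award^f)
    "t": ["bf", "fb"],  # hypothetical relation with two binding patterns
}


def is_legal_access(relation, bound_positions):
    """True if binding the given 0-based argument positions satisfies at least
    one binding pattern, i.e. every 'b' position of that pattern is bound.
    Binding an extra 'f' position is assumed to be allowed."""
    return any(
        all(i in bound_positions for i, adornment in enumerate(pattern) if adornment == "b")
        for pattern in PATTERNS[relation]
    )


print(is_legal_access("r", {0}))  # True: a star is supplied
print(is_legal_access("r", {1}))  # False: only a movie is supplied
print(is_legal_access("t", {1}))  # True: matches the second pattern "fb"
```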
Reasons for the restrictions: • Web search forms • Legacy databases • Security concerns • Observation: if a relation does not have an “all-free” binding pattern, then no matter which queries are sent to it, there can always be some tuples that have not been retrieved.
Query stability • A query Q on relations with binding patterns is stable if, for any database, we can compute Q’s complete answer by accessing the relations with legal patterns only. • The complete answer is the answer we would compute if we could retrieve all the tuples from the relations. • We must derive the complete answer from partially retrieved tuples, which requires reasoning.
Assumptions about bindings • Bindings come only from constants in Q and from values returned by the relations: • the definition says “for any database”, • so relations not in the query can be assumed to be empty. • Not allowed: trying arbitrary strings as bindings to access the relations • It does not terminate. • It is impractical.
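A sketch of what these assumptions allow: starting from Q's constants, repeatedly make every legal access whose bindings are already known, until no new tuple appears. This brute-force fixpoint and the dictionary encoding are assumptions made for illustration, not the talk's procedure; it terminates because the database is finite and no arbitrary strings are tried.

```python
from itertools import product


def observable_tuples(db, patterns, constants):
    """Fixpoint of legal accesses.

    db: relation name -> set of tuples (the full database, hidden from us).
    patterns: relation name -> list of adornment strings ('b'/'f').
    constants: values that appear in the query.
    Bindings are drawn only from the query constants and from values in tuples
    already retrieved.
    """
    known = set(constants)
    seen = {name: set() for name in db}
    changed = True
    while changed:
        changed = False
        for name, tuples in db.items():
            for pattern in patterns[name]:
                bound_pos = [i for i, a in enumerate(pattern) if a == "b"]
                pool = list(known)  # snapshot of the values usable as bindings
                for values in product(pool, repeat=len(bound_pos)):
                    # One simulated legal access: tuples agreeing on the bound positions.
                    result = {t for t in tuples
                              if all(t[i] == v for i, v in zip(bound_pos, values))}
                    new = result - seen[name]
                    if new:
                        seen[name] |= new
                        for t in new:
                            known.update(t)
                        changed = True
    return seen


DB = {
    "r": {("Harrison Ford", "Air Force One"), ("Henry Fonda", "On Golden Pond"),
          ("Kevin Spacey", "American Beauty")},
    "s": {("On Golden Pond", "Oscar, Best Actor"), ("On Golden Pond", "Oscar, Best Actress"),
          ("American Beauty", "Oscar, Best Picture")},
}
obs = observable_tuples(DB, {"r": ["bf"], "s": ["bf"]}, {"Henry Fonda"})
print(sorted(obs["r"]))  # [('Henry Fonda', 'On Golden Pond')]
print(len(obs["s"]))     # 2
```

The tuples for Harrison Ford and Kevin Spacey are never observed, which matches the earlier observation about relations without an all-free pattern.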
(II) Testing stability of queries • Conjunctive query (CQ): q(X) :- g_1(X_1), …, g_n(X_n) • Feasible order of some subgoals of a CQ Q: • each subgoal in the order is executable, • that is, the constants in Q and the variables bound by earlier subgoals in the order supply enough bindings to satisfy one binding pattern of the subgoal’s relation. • Example: Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
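A sketch of checking whether a given order of subgoals is feasible. It assumes subgoals are `(relation, args)` pairs and the Datalog convention, consistent with the slides, that variables start with an uppercase letter while constants such as `henry fonda` do not; `executable` and `is_feasible_order` are hypothetical names.

```python
PATTERNS = {"r": ["bf"], "s": ["bf"]}  # r(Star^b, Movie^f), s(Movie^b, Award^f)


def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def executable(subgoal, bound, patterns):
    """A subgoal is executable if some binding pattern has every 'b' argument
    either a constant or a variable that is already bound."""
    rel, args = subgoal
    return any(
        all(not is_var(t) or t in bound for t, a in zip(args, pattern) if a == "b")
        for pattern in patterns[rel]
    )


def is_feasible_order(order, patterns):
    """Check that each subgoal is executable given the variables bound before it."""
    bound = set()
    for subgoal in order:
        if not executable(subgoal, bound, patterns):
            return False
        bound |= {t for t in subgoal[1] if is_var(t)}
    return True


# Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
Q = [("r", ("henry fonda", "Movie")), ("s", ("Movie", "Award"))]
print(is_feasible_order(Q, PATTERNS))        # True
print(is_feasible_order(Q[::-1], PATTERNS))  # False: s needs Movie bound first
```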
Feasible CQs • A CQ is feasible if it has a feasible order of all its subgoals. • Lemma: A feasible CQ is stable. • Testing feasibility of a CQ • A greedy algorithm: Inflationary
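The greedy idea behind Inflationary can be sketched as follows, under the same assumed encoding as above: keep adding any executable subgoal and binding its variables until nothing more can be added. Greedy choice is safe because the set of bound variables only grows, so a subgoal that is executable stays executable. This is an illustration, not the paper's pseudocode.

```python
PATTERNS = {"r": ["bf"], "s": ["bf"]}  # r(Star^b, Movie^f), s(Movie^b, Award^f)


def is_var(term):
    return term[:1].isupper()  # assumed convention: variables are capitalized


def executable(subgoal, bound, patterns):
    rel, args = subgoal
    return any(
        all(not is_var(t) or t in bound for t, a in zip(args, p) if a == "b")
        for p in patterns[rel]
    )


def inflationary(subgoals, patterns):
    """Greedy feasibility test. Returns (is_feasible, order_found)."""
    bound, order, remaining = set(), [], list(subgoals)
    progress = True
    while remaining and progress:
        progress = False
        for g in list(remaining):
            if executable(g, bound, patterns):
                order.append(g)
                remaining.remove(g)
                bound |= {t for t in g[1] if is_var(t)}
                progress = True
    return not remaining, order


Q = [("r", ("henry fonda", "Movie")), ("s", ("Movie", "Award"))]
Q_prime = Q + [("r", ("Star", "Movie"))]
print(inflationary(Q, PATTERNS)[0])        # True
print(inflationary(Q_prime, PATTERNS)[0])  # False: Star is never bound
```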
What if Q is not feasible? • Q’(Award) :- r(henry fonda, Movie), s(Movie, Award), r(Star, Movie) • Not feasible: variable Star can never be bound. • Q’ is equivalent to the old query Q(Award) :- r(henry fonda, Movie), s(Movie, Award), which is feasible. • Hence the new query Q’ is stable!
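The equivalence of Q and Q’ can be checked with containment mappings (Q1 ⊆ Q2 iff some mapping sends Q2’s head onto Q1’s head and every subgoal of Q2 onto a subgoal of Q1). The backtracking search below is a brute-force sketch under the same assumed encoding, where a query is a `(head_args, body)` pair; here the mapping from Q’ to Q sends Star to henry fonda.

```python
def is_var(term):
    return term[:1].isupper()  # assumed convention: variables are capitalized


def extend(mapping, src_args, dst_args):
    """Extend a variable mapping so that src_args maps onto dst_args, or return None."""
    m = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if m.setdefault(s, d) != d:
                return None  # the variable is already mapped elsewhere
        elif s != d:
            return None  # constants must map to themselves
    return m


def contained_in(q1, q2):
    """q1 ⊆ q2 iff there is a containment mapping from q2 to q1.
    A query is (head_args, [(relation, args), ...])."""
    (head1, body1), (head2, body2) = q1, q2
    if len(head1) != len(head2):
        return False

    def search(i, m):
        if m is None:
            return False
        if i == len(body2):
            return True
        rel, args = body2[i]
        return any(search(i + 1, extend(m, args, args1))
                   for rel1, args1 in body1
                   if rel1 == rel and len(args1) == len(args))

    return search(0, extend({}, head2, head1))


def equivalent(q1, q2):
    return contained_in(q1, q2) and contained_in(q2, q1)


Q = (("Award",), [("r", ("henry fonda", "Movie")), ("s", ("Movie", "Award"))])
Q_prime = (("Award",), Q[1] + [("r", ("Star", "Movie"))])
print(equivalent(Q, Q_prime))  # True
```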
Testing stability of a CQ • Theorem: A CQ Q is stable iff its minimal equivalent Qm is feasible. • Minimal equivalent query Qm • Qm is unique (up to renaming of variables).
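The theorem suggests a direct test: minimize Q, then run the greedy feasibility check on Qm. The sketch below minimizes by dropping one redundant subgoal at a time (a subgoal is redundant when the query without it is still equivalent); the encoding and all function names are assumptions carried over from the earlier sketches.

```python
PATTERNS = {"r": ["bf"], "s": ["bf"]}  # r(Star^b, Movie^f), s(Movie^b, Award^f)


def is_var(term):
    return term[:1].isupper()  # assumed convention: variables are capitalized


def extend(m, src, dst):
    """Extend a variable mapping so that src maps onto dst, or return None."""
    m = dict(m)
    for s, d in zip(src, dst):
        if (is_var(s) and m.setdefault(s, d) != d) or (not is_var(s) and s != d):
            return None
    return m


def contained_in(q1, q2):
    """q1 ⊆ q2 iff a containment mapping exists from q2 into q1."""
    (h1, b1), (h2, b2) = q1, q2

    def search(i, m):
        if m is None:
            return False
        if i == len(b2):
            return True
        rel, args = b2[i]
        return any(search(i + 1, extend(m, args, a1)) for r1, a1 in b1 if r1 == rel)

    return len(h1) == len(h2) and search(0, extend({}, h2, h1))


def minimize(q):
    """Drop subgoals one at a time while the smaller query stays equivalent.
    Dropping a subgoal can only enlarge the answer, so we only test smaller ⊆ current."""
    head, body = q[0], list(q[1])
    i = 0
    while i < len(body):
        smaller = (head, body[:i] + body[i + 1:])
        if contained_in(smaller, (head, body)):
            body = smaller[1]  # subgoal i was redundant; re-test the new subgoal i
        else:
            i += 1
    return head, body


def feasible(body, patterns):
    """Greedy (Inflationary-style) test: add executable subgoals until none applies."""
    bound, remaining = set(), list(body)
    while True:
        ready = [g for g in remaining
                 if any(all(not is_var(t) or t in bound
                            for t, a in zip(g[1], p) if a == "b")
                        for p in patterns[g[0]])]
        if not ready:
            return not remaining
        for g in ready:
            remaining.remove(g)
            bound |= {t for t in g[1] if is_var(t)}


def cq_stable(q, patterns):
    """CQStable: Q is stable iff its minimal equivalent Qm is feasible."""
    return feasible(minimize(q)[1], patterns)


Q_prime = (("Award",), [("r", ("henry fonda", "Movie")), ("s", ("Movie", "Award")),
                        ("r", ("Star", "Movie"))])
Q_free_star = (("Award",), [("r", ("Star", "Movie")), ("s", ("Movie", "Award"))])
print(cq_stable(Q_prime, PATTERNS))      # True
print(cq_stable(Q_free_star, PATTERNS))  # False
```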
Main idea of the proof • Construct two databases, D1 and D2, of the relations. • They have the same observable tuples, but yield different answers to the query. • Thus, we cannot tell whether the computed answer is complete or not. [Figure: databases D1 and D2, with the same observable tuples but different answers to Q]
Two algorithms for CQs • Algorithm CQStable: • minimize Q to get its minimal equivalent Qm; • test the feasibility of Qm by calling Inflationary. • Algorithm CQStable*: • compute all executable subgoals of Q; • if all subgoals become executable, then Q is stable; • otherwise, test equivalence between Q and the new query formed by the executable subgoals. • CQStable* is more efficient than CQStable. • Testing stability of a CQ is NP-complete.
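A sketch of the CQStable* steps as listed: compute the executable subgoals E by greedy closure, accept if E covers every subgoal, and otherwise test whether Q is equivalent to Q_E. Because Q_E keeps a subset of Q’s subgoals, only Q_E ⊆ Q needs a containment-mapping search. The encoding and helper names are assumptions, as in the earlier sketches.

```python
PATTERNS = {"r": ["bf"], "s": ["bf"]}  # r(Star^b, Movie^f), s(Movie^b, Award^f)


def is_var(term):
    return term[:1].isupper()  # assumed convention: variables are capitalized


def extend(m, src, dst):
    """Extend a variable mapping so that src maps onto dst, or return None."""
    m = dict(m)
    for s, d in zip(src, dst):
        if (is_var(s) and m.setdefault(s, d) != d) or (not is_var(s) and s != d):
            return None
    return m


def contained_in(q1, q2):
    """q1 ⊆ q2 iff a containment mapping exists from q2 into q1."""
    (h1, b1), (h2, b2) = q1, q2

    def search(i, m):
        if m is None:
            return False
        if i == len(b2):
            return True
        rel, args = b2[i]
        return any(search(i + 1, extend(m, args, a1)) for r1, a1 in b1 if r1 == rel)

    return len(h1) == len(h2) and search(0, extend({}, h2, h1))


def executable_subgoals(body, patterns):
    """Greedy closure: every subgoal that can eventually be executed."""
    bound, remaining, E = set(), list(body), []
    while True:
        ready = [g for g in remaining
                 if any(all(not is_var(t) or t in bound
                            for t, a in zip(g[1], p) if a == "b")
                        for p in patterns[g[0]])]
        if not ready:
            return E
        for g in ready:
            remaining.remove(g)
            E.append(g)
            bound |= {t for t in g[1] if is_var(t)}


def cq_stable_star(q, patterns):
    """CQStable*: stable if all subgoals are executable; otherwise test Q ≡ Q_E."""
    head, body = q
    E = executable_subgoals(body, patterns)
    if len(E) == len(body):
        return True
    # Q ⊆ Q_E holds since Q_E has a subset of Q's subgoals; test Q_E ⊆ Q.
    # If a head variable is not bound in E, no containment mapping exists.
    return contained_in((head, E), q)


Q_prime = (("Award",), [("r", ("henry fonda", "Movie")), ("s", ("Movie", "Award")),
                        ("r", ("Star", "Movie"))])
Q_free_star = (("Award",), [("r", ("Star", "Movie")), ("s", ("Movie", "Award"))])
print(cq_stable_star(Q_prime, PATTERNS))      # True
print(cq_stable_star(Q_free_star, PATTERNS))  # False
```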
Other classes of queries • Unions of CQs: two algorithms • CQs with arithmetic comparisons: • an algorithm for testing stability • Datalog queries: • testing stability is undecidable • we give a sufficient condition for stability of Datalog queries
(III) Dynamic computability of the complete answer to CQs • For a nonstable CQ Q, we might still compute Q’s complete answer for certain databases.
An example • Q1: ans(B) :- r(a, B, C), s(C, D) • Not stable • For the following database, we can still compute Q1’s complete answer: {b1, b2}.
r(A^b, B^f, C^f):
  a | b1 | c1
  a | b2 | c2
  a | b2 | c3
  … | … | …
s(C^f, D^b):
  c1 | d1
  c2 | d2
  … | …
p(D^f):
  d1
  d2
  …
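The reasoning for Q1 can be replayed in code. This sketch uses only the rows shown in the tables (the "…" rows are omitted) and hypothetical access functions for the three relations. Since r(a, B, C) is executable, every candidate B value is retrieved; the answer is certified complete once every candidate is confirmed by an s-tuple found through D values scanned from the all-free relation p.

```python
# Only the rows shown on the slide; the "…" rows are omitted.
R = {("a", "b1", "c1"), ("a", "b2", "c2"), ("a", "b2", "c3")}  # r(A^b, B^f, C^f)
S = {("c1", "d1"), ("c2", "d2")}                                # s(C^f, D^b)
P = {"d1", "d2"}                                                # p(D^f)


def access_r(a):
    """Legal access to r: A must be bound."""
    return {t for t in R if t[0] == a}


def access_s(d):
    """Legal access to s: D must be bound."""
    return {t for t in S if t[1] == d}


def access_p():
    """p is all-free, so it can be scanned without bindings."""
    return set(P)


def try_complete_q1(a):
    """Try to compute ans(B) :- r(a, B, C), s(C, D) using legal accesses only.
    Returns (answer_found, certified_complete)."""
    r_tuples = access_r(a)                      # all r-tuples with A = a
    candidates = {b for (_, b, _) in r_tuples}  # the answer is a subset of these
    s_seen = set()
    for d in access_p():                        # D values from p become bindings for s
        s_seen |= access_s(d)
    joinable_c = {c for (c, _) in s_seen}
    confirmed = {b for (_, b, c) in r_tuples if c in joinable_c}
    return confirmed, confirmed == candidates


answer, complete = try_complete_q1("a")
print(sorted(answer), complete)  # ['b1', 'b2'] True
```

For Q2 no analogous certificate exists: any C value, for example c3, could join with an s-tuple whose D value is never returned by p.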
Change the head argument • Q2: ans(D) :- r(a, B, C), s(C, D) • Still not stable • For the same database, we cannot compute Q2’s complete answer.
Difference between Q1 and Q2 • Q1: ans(B) :- r(a^b, B^f, C^f), s(C^f, D^b) • Q2: ans(D) :- r(a^b, B^f, C^f), s(C^f, D^b) • Q1’s head argument B is bound by the executable subgoal r(a, B, C). • Q2’s head argument D is not bound by the executable subgoal r(a, B, C).
Generalization • q(X) :- g_1(X_1), …, g_k(X_k), g_{k+1}(X_{k+1}), …, g_n(X_n) • Executable subgoals: E = g_1(X_1), …, g_k(X_k) • If all arguments in X are bound in E: • we might be able to compute the complete answer; • the computability is database dependent. • If some arguments in X are not bound in E: • we can never compute the complete answer, • unless the relation computed from the subgoals in E is empty.
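The rule above can be sketched as a small classifier for a nonstable CQ: compute the executable subgoals E and the variables they bind, then check whether every head argument is among them. The encoding follows the earlier sketches and is an assumption; applied to Q1 and Q2 it reproduces the difference discussed on the previous slide.

```python
# r(A^b, B^f, C^f), s(C^f, D^b), p(D^f), as on the example slides
PATTERNS = {"r": ["bff"], "s": ["fb"], "p": ["f"]}


def is_var(term):
    return term[:1].isupper()  # assumed convention: variables are capitalized


def executable_subgoals(body, patterns):
    """Greedy closure: return the executable subgoals E and the variables they bind."""
    bound, remaining, E = set(), list(body), []
    while True:
        ready = [g for g in remaining
                 if any(all(not is_var(t) or t in bound
                            for t, a in zip(g[1], p) if a == "b")
                        for p in patterns[g[0]])]
        if not ready:
            return E, bound
        for g in ready:
            remaining.remove(g)
            E.append(g)
            bound |= {t for t in g[1] if is_var(t)}


def classify(head, body, patterns):
    """Apply the generalization rule to a nonstable CQ q(X) :- body."""
    _, bound = executable_subgoals(body, patterns)
    if all(not is_var(x) or x in bound for x in head):
        return "database dependent: the complete answer might be computable"
    return "never computable, unless the result of E is empty"


Q1 = (("B",), [("r", ("a", "B", "C")), ("s", ("C", "D"))])
Q2 = (("D",), [("r", ("a", "B", "C")), ("s", ("C", "D"))])
print(classify(*Q1, PATTERNS))  # database dependent: ...
print(classify(*Q2, PATTERNS))  # never computable, ...
```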
A decision tree • It guides the planning process for computing the complete answer to a query. • Two approaches to traversing the tree: • optimistic • pessimistic
Conclusion • Stability of queries with binding patterns • Various classes of queries: • CQs (two algorithms) • Unions of CQs (two algorithms) • CQs with arithmetic comparisons (one algorithm) • Datalog (undecidable) • Dynamic computability of a CQ’s complete answer • Another contribution: a decidability result for testing relative query containment with binding restrictions
Related work • Answering queries using views with binding patterns [RSU95] • Query optimization [YLUGM99, FLMS99] • Computing the maximal answer to queries [DL97, LC00] • Our work considers whether the complete answer to a query is computable.