Minimizing View Sets without Losing Query-Answering Power
Chen Li, Stanford University
Joint work with Mayank Bawa and Jeff Ullman
ICDT'2001, London, UK

A web-caching scenario
[Figure: client, cache, and server; arrows labeled "user query", "source query", and "answer"]
Source relation: Book(Title, Author, Pub, Price)
Cached query results at the client:
Q1(T,A,Pr) :- book(T,A,Pub,Pr)
Q2(T,A,Pr) :- book(T,A,prenhall,Pr)
Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)

Which query results can be removed?
Book(Title, Author, Pub, Price)
Cached query results:
Q1(T,A,Pr) :- book(T,A,Pub,Pr)
Q2(T,A,Pr) :- book(T,A,prenhall,Pr)
Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
• Q2 ⊆ Q1 (Q2 is contained in Q1).
• Remove Q2? Then we cannot answer the query:
Q(T,Pr) :- book(T,smith,prenhall,Pr)

How about removing Q3?
Book(Title, Author, Pub, Price)
Cached query results:
Q1(T,A,Pr) :- book(T,A,Pub,Pr)
Q2(T,A,Pr) :- book(T,A,prenhall,Pr)
Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
Compute Q3 using Q2:
Q3(A1,A2) :- Q2(T,A1,Pr1), Q2(T,A2,Pr2)
We are not losing any query-answering power!

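As a quick sanity check of the rewriting above, the following sketch evaluates Q3 twice on a toy instance: once from the source relation and once from the cached Q2 result. The rows in `book` are made up for illustration, and agreement on one instance does not prove equivalence; it only illustrates it.

```python
# Toy instance of book(Title, Author, Pub, Price); the rows are invented.
book = {
    ("db", "smith", "prenhall", 50),
    ("db", "jones", "prenhall", 50),
    ("ai", "lee", "mit", 40),
}

# Q2(T,A,Pr) :- book(T,A,prenhall,Pr)
q2 = {(t, a, pr) for (t, a, pub, pr) in book if pub == "prenhall"}

# Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
# evaluated directly on the source relation.
q3_direct = {
    (a1, a2)
    for (t1, a1, pub1, _) in book
    for (t2, a2, pub2, _) in book
    if t1 == t2 and pub1 == "prenhall" and pub2 == "prenhall"
}

# Q3(A1,A2) :- Q2(T,A1,Pr1), Q2(T,A2,Pr2)
# evaluated on the cached Q2 result only.
q3_from_q2 = {(a1, a2) for (t1, a1, _) in q2 for (t2, a2, _) in q2 if t1 == t2}

print(q3_direct == q3_from_q2)
```
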
Observations:
• Traditional query containment [Chandra and Merlin, 1977] does not help.
• We should consider query-answering power.
• General questions:
• How to describe "query-answering power"?
• How to minimize a view set without losing its query-answering power?

Rest of the talk • Answering queries using views • Query-answering power • p-containment • Relationship with traditional query containment • Minimizing a view set • p-containment relative to a set of queries • Conclusion and open problems
Answering queries using views
• Conjunctive queries and views: h(X) :- g1(X1),…,gn(Xn)
• Example:
V1(T,A,Pr) :- book(T,A,Pub,Pr)
V2(T,A,Pr) :- book(T,A,prenhall,Pr)
V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)

Query answerability
• A query Q is answerable by a view set V if we can rewrite Q using the views in V [LMSS95].
• Example:
V2(T,A,Pr) :- book(T,A,prenhall,Pr)
V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
V3 is answerable by V2:
V3(A1,A2) :- V2(T,A1,Pr1), V2(T,A2,Pr2)

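One way to check a candidate rewriting is to unfold it: replace each view atom by the view's body and compare the result with the original query. The sketch below does only the unfolding step. The tuple encoding of atoms, the helper names `is_var` and `unfold`, and the renaming scheme for non-head variables are assumptions made for illustration. For this example the unfolded rewriting coincides with the body of V3, so no further equivalence test is needed; in general one would still have to test equivalence.

```python
from itertools import count

def is_var(term):
    # Assumed convention: variables start with an uppercase letter,
    # constants (e.g. prenhall) with a lowercase one.
    return term[0].isupper()

def unfold(rewriting_body, views):
    """Replace every view atom by that view's body.

    views maps a view name to (head_vars, body); head_vars are assumed
    to be distinct variables. Non-head variables of a view are renamed
    with a per-atom suffix so separate uses do not share them.
    """
    fresh = count()
    unfolded = []
    for name, args in rewriting_body:
        head, body = views[name]
        sub = dict(zip(head, args))
        n = next(fresh)
        for pred, terms in body:
            unfolded.append((pred, tuple(
                sub.get(t, f"{t}_{n}") if is_var(t) else t for t in terms
            )))
    return unfolded

views = {
    "v1": (("T", "A", "Pr"), [("book", ("T", "A", "Pub", "Pr"))]),
    "v2": (("T", "A", "Pr"), [("book", ("T", "A", "prenhall", "Pr"))]),
}

# V3(A1,A2) :- V2(T,A1,Pr1), V2(T,A2,Pr2)
rewriting = [("v2", ("T", "A1", "Pr1")), ("v2", ("T", "A2", "Pr2"))]
v3_body = [
    ("book", ("T", "A1", "prenhall", "Pr1")),
    ("book", ("T", "A2", "prenhall", "Pr2")),
]

print(unfold(rewriting, views) == v3_body)
```
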
Algorithms
• Bucket algorithm [LRO96]
• Inverse-rule algorithm [DG97,Qia96]
• MiniCon algorithm [PL00]
• SVB algorithm [Mit99]
• CoreCover algorithm [ALU00]
Testing whether a query is answerable by a set of views is NP-complete.

Views are expensive to maintain • Require storage space. • Need to be kept up-to-date. We want to minimize a given view set while keeping its query-answering power.
p-containment
• A view set V is p-contained in another view set W if W can answer all the queries that are answerable by V.
• "p" stands for "power."
• Denoted: V ⪯p W
• Two view sets V and W are equipotent if V ⪯p W and W ⪯p V.
• They have the same power to answer queries.

Example:
V1(T,A,Pr) :- book(T,A,Pub,Pr)
V2(T,A,Pr) :- book(T,A,prenhall,Pr)
V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
{V1,V2,V3} ⪯p {V1,V2}
{V1,V2} ⪯p {V1,V2,V3}
Therefore {V1,V2,V3} and {V1,V2} are equipotent.

Lemma: V ⪯p W iff each view in V is answerable by W.
• This implies an algorithm for testing p-containment, assuming the view sets are finite.
• Theorem: Testing V ⪯p W is NP-complete.

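p-containment and query containment
V1(T,A,Pr) :- book(T,A,Pub,Pr)
V2(T,A,Pr) :- book(T,A,prenhall,Pr)
V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
• Query containment does not imply p-containment: {V1} and {V2}. V2 is contained in V1, but {V2} is not p-contained in {V1}, because V1 does not expose Pub.
• p-containment does not imply query containment: {V2} and {V3}. {V3} is p-contained in {V2}, but V3 is not contained in V2 (their heads even have different arities).
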
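The query-containment half of this comparison can be checked mechanically with the classical containment-mapping test for conjunctive queries [Chandra and Merlin, 1977]. The following is a minimal sketch of that test, not code from the talk; the tuple encoding and the names `is_var`, `extend`, and `contained_in` are assumptions made for illustration.

```python
def is_var(term):
    # Assumed convention: variables are capitalized, constants are not.
    return term[0].isupper()

def extend(mapping, src, dst):
    """Extend a variable mapping so that the term tuple src maps onto dst.

    Returns the extended mapping, or None if a constant or an earlier
    assignment conflicts.
    """
    m = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if m.setdefault(s, d) != d:
                return None
        elif s != d:
            return None
    return m

def contained_in(q_small, q_big):
    """True if q_small ⊆ q_big, i.e. there is a containment mapping
    from q_big to q_small. A query is (head_terms, body_atoms)."""
    (head_s, body_s), (head_b, body_b) = q_small, q_big
    if len(head_s) != len(head_b):
        return False
    start = extend({}, head_b, head_s)
    if start is None:
        return False

    def search(i, m):
        # Map the i-th atom of q_big onto some atom of q_small.
        if i == len(body_b):
            return True
        pred, args = body_b[i]
        for pred2, args2 in body_s:
            if pred2 == pred and len(args2) == len(args):
                m2 = extend(m, args, args2)
                if m2 is not None and search(i + 1, m2):
                    return True
        return False

    return search(0, start)

v1 = (("T", "A", "Pr"), [("book", ("T", "A", "Pub", "Pr"))])
v2 = (("T", "A", "Pr"), [("book", ("T", "A", "prenhall", "Pr"))])
v3 = (("A1", "A2"), [("book", ("T", "A1", "prenhall", "Pr1")),
                     ("book", ("T", "A2", "prenhall", "Pr2"))])

print(contained_in(v2, v1), contained_in(v1, v2), contained_in(v3, v2))
```

The test only covers containment. Deciding p-containment additionally needs an answerability test, which this sketch does not attempt.
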
Minimizing a view set
• Keep removing views as long as the remaining set stays equipotent to the original set.
• A view set might have multiple equipotent minimals:
V1(A) :- r(A,B)
V2(B) :- r(A,B)
V3(A,B) :- r(A,X), r(Y,B)
{V1,V2,V3} has two equipotent minimals: {V1,V2} and {V3}.

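The removal loop can be sketched as follows, relying on the earlier lemma: a view can be dropped exactly when the remaining views still answer it. The function `answerable` is a hypothetical oracle; here it is hard-coded for the three-view example (V3 can be rewritten as V1(A), V2(B), and V1 and V2 are projections of V3) rather than computed by a real rewriting algorithm.

```python
def minimize(views, answerable):
    """Greedily drop views that the remaining views can still answer.

    answerable(view, others) is an assumed oracle returning True when
    `view` can be rewritten using only the views in `others`.
    One pass suffices: a view that cannot be dropped now cannot be
    dropped later, because the remaining set only shrinks.
    """
    kept = list(views)
    for v in list(views):
        rest = [w for w in kept if w != v]
        if answerable(v, rest):
            kept = rest
    return kept

def answerable(view, others):
    # Hard-coded oracle for the slide's example only (an assumption):
    #   V3(A,B) :- V1(A), V2(B)      V1(A) :- V3(A,B)      V2(B) :- V3(A,B)
    others = set(others)
    if view == "V3":
        return {"V1", "V2"} <= others
    if view in ("V1", "V2"):
        return "V3" in others
    return False

print(minimize(["V1", "V2", "V3"], answerable))
print(minimize(["V3", "V1", "V2"], answerable))
```

The two calls differ only in the order in which views are considered, which is what decides which of the two minimals is reached.
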
p-containment relative to queries
Queries: Q = {Q1,Q2,…}
Views: V = {V1,V2,…,Vm}, W = {W1,W2,…,Wn}
V is p-contained in W w.r.t. Q if the queries in Q that are answerable by V are also answerable by W.

Example of relative p-containment
Relations: car(Make,Dealer), loc(Dealer,City)
Queries:
Q1(D,C) :- car(toyota,D), loc(D,C)
Q2(D,C) :- car(honda,D), loc(D,C)
Views:
V = {V1,V2}, V1 = Q1, V2 = Q2
W = {W1}
W1(M,D,C) :- car(M,D), loc(D,C)

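To see why W can answer both queries in this example, note that each Qi is a selection on the Make column of W1. The sketch below checks this on a toy instance; the dealer and city rows are invented, and the helper names `q_direct` and `q_from_w1` are hypothetical.

```python
# Toy instances of car(Make, Dealer) and loc(Dealer, City); rows are invented.
car = {("toyota", "d1"), ("honda", "d2"), ("toyota", "d3")}
loc = {("d1", "sf"), ("d2", "pa"), ("d3", "pa")}

# W1(M,D,C) :- car(M,D), loc(D,C)
w1 = {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

def q_direct(make):
    # Q(D,C) :- car(make,D), loc(D,C), evaluated on the base relations.
    return {(d, c) for (m, d) in car for (d2, c) in loc if m == make and d == d2}

def q_from_w1(make):
    # Q(D,C) :- W1(make,D,C), evaluated on the materialized view only.
    return {(d, c) for (m, d, c) in w1 if m == make}

print(q_direct("toyota") == q_from_w1("toyota"))
print(q_direct("honda") == q_from_w1("honda"))
```
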
Testing relative p-containment • Q is finite: test by the definition. • Q is infinite?
Parameterized queries
• Motivation: web search forms.
• A parameterized query (PQ) is a conjunctive query with placeholders.
• Example: q(D) :- car($M,D), loc(D,$C)
• The placeholders $M and $C are replaced by constants.
• Instances:
q(D) :- car(toyota,D), loc(D,sf)
q(D) :- car(honda,D), loc(D,pa)
• The domain of each placeholder is infinite.
• Thus, a PQ represents an infinite number of queries.

Q: q(D) :- car($M,D), loc(D,$C)
• v1(M,D,C) :- car(M,D), loc(D,C)
• Answers all instances of Q.
• v2(M,D) :- car(M,D), loc(D,sf)
• Answers some instances of Q.
• The answerable instances of Q are the instances of: q(D) :- car($M,D), loc(D,sf)
• v3(M) :- car(M,D), loc(D,sf)
• Answers no instances of Q.

Assume the queries are generated by one PQ.
• The results extend easily to a finite set of PQs.
• Complete answerability of a PQ using views:
• V can answer all instances of a PQ Q.
• Example:
q(D) :- car($M,D), loc(D,$C)
v1(M,D,C) :- car(M,D), loc(D,C)

An algorithm for testing complete answerability
• Replace each placeholder with a new distinct constant to get a canonical instance I.
• Test whether I is answerable by V.
Example:
PQ: q(D) :- car($M,D), loc(D,$C)
View: v1(M,D,C) :- car(M,D), loc(D,C)
Canonical instance: q(D) :- car(m0,D), loc(D,c0)
Rewriting: q(D) :- v1(m0,D,c0)

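The first step of this algorithm, building the canonical instance, can be sketched as below. The tuple encoding, the function name `canonical_instance`, and the scheme for naming fresh constants (lowercased placeholder name plus "0", mirroring m0 and c0 above) are assumptions. The sketch also assumes the generated names do not collide with each other or with constants already used in the query or views. The second step, the answerability test, is not implemented here.

```python
def canonical_instance(body):
    """Replace each placeholder ($-prefixed term) by a fresh constant.

    Returns the frozen body and the placeholder-to-constant mapping.
    The same placeholder always maps to the same constant.
    """
    fresh = {}

    def freeze(term):
        if term.startswith("$"):
            # Assumed naming scheme: $M -> m0, $C -> c0.
            return fresh.setdefault(term, term[1:].lower() + "0")
        return term

    frozen = [(pred, tuple(freeze(t) for t in args)) for pred, args in body]
    return frozen, fresh

# PQ: q(D) :- car($M,D), loc(D,$C)
pq_body = [("car", ("$M", "D")), ("loc", ("D", "$C"))]
frozen_body, constants = canonical_instance(pq_body)

print(frozen_body)
print(constants)
```
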
Partial answerability
• Some instances of Q are answerable by V:
q(D) :- car($M,D), loc(D,$C)
v2(M,D) :- car(M,D), loc(D,sf)
• Theorem: All the instances of a PQ that are answerable using V are instances of a finite set of PQs, each of which is completely answerable by V.
• For the example above, this set consists of:
q(D) :- car($M,D), loc(D,sf)

[Figure: all instances of a parameterized query Q; the instances answerable by V={V1,…,Vn} are covered by PQ1, PQ2, …, PQk]
An algorithm for finding the finite set of PQs.

Testing p-containment w.r.t. a PQ
• To test whether V is p-contained in W w.r.t. a PQ Q:
• Find the PQs whose instances are exactly the instances of Q that are answerable by V.
• For each of these PQs, test whether it is completely answerable by W.
• Details are in the paper.

Conclusion • Introduced p-containment, which is different from query containment. • Showed how to minimize a view set without losing query-answering power. • Developed an algorithm for testing relative p-containment w.r.t. instances of PQs. • Extended to MCR-containment.
Open problems • Find a view subset with lowest “cost.” • If views are not given, find the best views to materialize.