430 likes | 536 Views
Materializing Views With Minimal Size To Answer Queries. Rada Chirkova (North Carolina State University) and Chen Li (University of California, Irvine). Materializing Minimal-Size Views. Context: relational databases
E N D
Materializing ViewsWith Minimal SizeTo Answer Queries Rada Chirkova (North Carolina State University) and Chen Li (University of California, Irvine)
Materializing Minimal-Size Views • Context: relational databases • The problem: minimize the amount of data required to answer queries, by: • automatically designing new relations (views), and • precomputing and storing (materializing) the new relations • Central issue: inventing new views to materialize • Applications include: • Mediators in data-integration systems • “Database as a service” in enterprise computing Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 2
Example: Modified TPC-H Query Q(name,o_date,priority,comment,o_key,quantity, shipmode) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). V1(name,o_date,priority,comment,o_key) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). V2(o_key,quantity,shipmode) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 3
Partial Answer to the Query Q Name O_Date Priority Comment O_Key Quantity Shipmode Tom 3/14/95 0 close… 134721 26 REG AIR Tom 3/14/95 0 close… 134721 75 REG AIR Tom 3/14/95 0 close… 134721 43 AIR Jack 12/21/94 0 final… 571683 43 MAIL Jack 12/21/94 0 final… 571683 33 AIR Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 4
Minimal-Size Views for the Query Q Q(name,o_date,priority,comment,o_key,quantity, shipmode) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). V1(name,o_date,priority,comment,o_key) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). V2(o_key,quantity,shipmode) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 5
Questions • How do we know that views V1 and V2 are minimal-size views for the query Q? On what databases? • How to find a set of minimal-size views, given a set of queries and a database: • Is the problem decidable? For what inputs? • What is the complexity of the problem? • Are there good efficient algorithms for finding minimal-size views? Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 6
Preliminaries • Two queries are equivalent if they return the same answers on any database. • An equivalent rewriting of a query Q in terms of views V is a query that: • is defined using the relations in V only, and • is equivalent to Q • A conjunctive query (view) can be defined using only equality selections, projections, and joins • A disjunctive query (view) can be defined as a union of a finite number of conjunctive queries (views) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 7
Problem Specification • Input: • Database instance D with schema R • Workload Q of queries on D • Output (optimal solution): a set Vof views, such that: • each query in Q has an equivalent rewriting in terms of V, and • the total size of the views, SViÎ Vsize(Vi), is minimalon D Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 8
Assumptions • Single database instance • Set semantics • Finite query workloads • Conjunctive queries • Disjunctive views and rewritings Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 9
Main Results • Decidability and upper bounds on the complexity of the problem • Relationship between: • a restriction on the language of the queries, and • the language of optimal views • Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 10
Conjunctive Views and Rewritings Theorem. Given a query workload Q and a database D. It is possible to construct a finite search space of views that includes all views in all optimal solutions for Q on D. The number of views in the search space is at most doubly-exponential in the size of the input query workload Q. Corollary. The problem of finding a minimal-size conjunctive viewset is decidable for finite workloads of conjunctive queries, assuming all rewritings are conjunctive. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 11
Self-Joins in Queries Q1(X,Y) :- p(X,Z), p(Z,T), s(Z,Y). // self-join Q2(X,Y) :- p(X,Z), r(Z,T), s(Z,Y). // no self-joins • Result 1. For some databases and queries, there is a set of disjunctive views that is better than any conjunctive solution. • Example for a single query with self-joins • Result 2. The problem of finding an optimal solution in the space of disjunctive views is decidable, assuming conjunctive rewritings. • Result 3. It is not necessary to consider disjunctive rewritings. • Result 4. The size of the search space of views is at most triply-exponential in the size of the input query workload. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 12
Queries Without Self-Joins: The Problem Is in NP Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views conjunctive views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views conjunctive views subexpression views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views conjunctive views subexpression views full-reducer views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
1. Conjunctive Views Are Enough Theorem. Given a database D and a set of queries Qwithout self-joins. Suppose a set Vof disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that: • all views in V’ are conjunctive, and • size (V’) £ size (V). Corollary. For any database and any set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive views. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 14
What We Have Shown disjunctive views conjunctive views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 15
Idea of the Proof • Given:Q(…) :- S1(…), S2(…), …, Sn(…); rewriting P of Q that usesV: V = V1È V2È … È Vt • Then there exists: V’ = V’1È V’2È … È V’t such that: • for some mapping m, each V’i is an image of Vi,and • each V’i alone can replace any Vj in the rewriting of Q Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 16
Details of the Proof (1) • P º Q, P = P1È P2 È ... È Ps • There exists a conjunctive query Pi:Piº Q • Pi(…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). • Fix any Vijin Pi; consider, in P, Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). • Because Pr is contained in Q, there exists a mapping bfrom Q to the expansion of Pr • We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (1) • P º Q, P = P1È P2 È ... È Ps • There exists a conjunctive query Pi:Piº Q • Pi(…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). • Fix any Vijin Pi; consider, in P, Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). • Because Pr is contained in Q, there exists a mapping bfrom Q to the expansion of Pr • We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (1) • P º Q, P = P1È P2 È ... È Ps • There exists a conjunctive query Pi:Piº Q • Pi(…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). • Fix any Vijin Pi; consider, in P, Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). • Because Pr is contained in Q, there exists a mapping bfrom Q to the expansion of Pr • We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (1) • P º Q, P = P1È P2 È ... È Ps • There exists a conjunctive query Pi:Piº Q • Pi(…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). • Fix any Vijin Pi; consider, in P, Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). • Because Pr is contained in Q, there exists a mapping bfrom Q to the expansion of Pr • We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (2) • We can always change b, to redirect all subgoals of Q that map into subgoals of more than oneVij in Pr • Then, we can replace Pr with P’r: Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). P’r(…):- Vij(…), G(…). • And P’rº Q Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 18
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) • Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b’ b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (4) • Thus, we can replace Pr with P’r: Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). P’r(…):- Vij(…), G(…). • And P’rº Q Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 20
2. Subexpression Views Are Enough Theorem. Given a database D and a set of queries Q without self-joins. Suppose a set Vof disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that: • all views in V’ are conjunctive subexpression-type, and • size (V’) £ size (V). Corollary. For any database and set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive subexpression-type views. The size of the search space of views is at most singly-exponential in the size of the input query workload Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 21
3. Full-Reducer Views Are Enough A view V is a full-reducer view for a query Q if V and Q have the same body. Theorem. Given a database D and a single queryQ without self-joins. Suppose a set Vof disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that: • all views in V’ are conjunctive full-reducer views for Q, and • size (V’) £ size (V). Corollary. For any database and any query without self-joins, some optimal disjunctive solution is a set of conjunctive full-reducer views. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 22
Using Full-Reducer Views To Rewrite Sets of Queries For query workloads with more than one query, we can merge optimal full-reducer views for individual queries in the workload - and the number of subgoals in the merged views never exceeds the number of subgoals in full-reducer views. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 23
What We Have Shown disjunctive views conjunctive views subexpression views full-reducer views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 24
The Problem Is in NP Theorem. Given a database instance, for any finite workload of conjunctive queries without self-joins, the problem of finding a minimal-size disjunctive viewset is in NP. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 25
Generating Minimal-Size Views • Input: a conjunctive query without self-joins and a database • Output: a minimal-size disjunctive viewset for the query on the database • Method: produce a minimal-size set of conjunctive full-reducer views, • by doing exhaustive search in the space of the views • using a dynamic-programming algorithm (cf. query optimization in System R) • The algorithm returns an optimal solution • Can be modified to work for non-singleton query workloads Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 26
Heuristics for Generating Views • Consider only those views that “cover” up to a fixed number of subgoals of the query • Consider only those views that have up to a fixed number of head attributes • Apply the algorithm separately to several subsets of subgoals of the query, then combine the solutions Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 27
Main Results • Decidability and upper bounds on the complexity of the problem • Relationship between: • a restriction on the language of the queries, and • the language of optimal views • Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 28
Some Directions of Future Work • Rewriting queries in more expressive languages: • built-in predicates • disjunctive queries • … • Using more expressive languages of views and rewritings • Maximally-contained rewritings of queries in terms of views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 29
Reference Jia Li, Rada Chirkova, and Chen Li. Minimizing Data-Communication Costs by Decomposing Query Results in Client-Server Environments. UCI ICS Technical Report, 2003. http://www-db.ics.uci.edu/pages/raccoon/ Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 30