170 likes | 277 Views
Join Synopses for Approximate Query Answering. Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy. Purpose. Provide good approximate answers for join queries. Make use of join synopsis. Allocate space for join synopses. Maintain join synopses. Why?????.
E N D
Join Synopses for Approximate Query Answering Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy
Purpose • Provide good approximate answers for join queries. • Make use of join synopsis. • Allocate space for join synopses. • Maintain join synopses.
Why????? • Reduce overhead for large DB’s. • Approximate answer is best. • Reduce access to base relation. • Fast response time.
Proposed • Make use of pre-computed samples of DJ’s. (Join Synopses). • For queries with foreign key joins, provide good approximate join aggregates using small number of join synposes. • Allocation of space among the sets. • If workload characteristics is available. • Provide confidence bounds.
Aqua System • Goal- improve response time for queries by reducing access to base relation. • Maintains smaller sized statistical summaries – “SYNOPSES” • Provides confidence bounds. • It has 3 components 1:Statistics Collection 2:Query Rewriting 3:Maintenance. • It sits on top of a DBMS.
Problem with Joins • Natural set of synopses would be random samples from each of the base relation. • Non-Uniform Result Sample.- for the join of base samples to be a uniform random sample of original relation the probability must equal. • Small join result sizes. • Using samples on base relations is not feasible.
Join Synopses • Pre-compute samples of join results. • Compute samples of the results of a small set of distinguishing joins. • Can obtain random samples for all possible joins in the schema. • This scheme is for foreign key joins. • Model the DB as a graph.
Join Synopses • There is a 1-1 correspondence between a tuple in a relation ‘r’ & a tuple in the output of any foreign key join involving ‘r’ & any of its descendants in the graph. • A sample Sr of a relation ‘r’ can be used to produce another relation J(Sr) called a join synopsis of ‘r’. ( provides random samples). • The subgraph of G on the ‘k’ nodes in any k-way foreign key join must be a connected subgraph with a single root node.
Join Synopses • For each node u in G, corresponding to a relation r1, define J(u) to be the output of the maximum foreign key join r1xr2x..xrk with source r1. • Let Su be a uniform random sample of r1. • The join synopsis J(Su) is the output of Suxr2xr3…..xrk. • J(Su) is a uniform random sample of J(u) with |Su| tuples. • Thus we can extract from our synopsis a uniform random sample of the output of any k-way foreign key join. • From 1 join synopsis for a node whose foreign key join has k relations, we can extract a URS of the output of between k-1 & pow(2,k-1)-1 distinct foreign ey joins.
Allocation • Allocate space among various join synopses when certain properties of query workload are known. • Identify heuristics for the common case when such properties are not known. • Let ‘S’ be a set of queries with selects, aggregates, group by’s & foreign key joins. • For each relation Ri, find fraction Fi of queries in S for which Ri is the source relation in a foreign key join. • Select join synopsis sizes so as to minimize the average relative error. • It is known that the error bounds are inversely proportional to sqrt(n).(n- number of tuples in join sample).
Allocation • The average relative error bound over the queries is proportional to sum(Fi/sqrt(ni)). • In the absence of query work load information heuristic strategies can be used. • There are 3- • EqJoin • CubeJoin • PropJoin
Maintenance of Join Synopses • Need to maintain synchronization when base relation is updated. • If a tuple is added do this. • Let Pu be current probability for including a newly arrived tuple for relation u in the random sample Su. • Let uxr2xr3x….xrk be the max foreign key join with source u. • We add Tp (new tuple) to Su with probability Pu. • If Tp is added to Su, we add to J(Su) the tuple Tpxr2xr3x….rk.
Maintenance of Join Synopses • On delete of a tuple Tp from u, do this • Find if Tp is in Su. If there, then delete it from Su & remove the tuple in J(Su) corresponding to Tp. • If sample becomes too small , repopulate the sample by rescanning relation u. • This algorithm performs look ups to base relation with probability Pu. • We never update join synopses for any ancestors of u.
Experimental Evaluation • Test bed – TPC-D decision support benchmark. DB of around 300 MB. • Machine – 296MHz UltraSPARC-II, 256 MB of memory, Solaris 5.6. • Query used is based on Q5 & an aggregate computed on join of Lineitem, Customer, Order, Supplier, Nation, Region. • The query used is
Conclusion • Focus on computing approximate answers to aggregates computed on multi-way joins. • For DB’s with schema’s that involve only foreign key joins, join synopses has been proposed as s solution. • Join synopses can be maintained effectively during updates.