ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
CSE6339 – Data Exploration
Raghavendra Madala
In this presentation…
• Introduction
• Icicles
• Icicle Maintenance
• Icicle-Based Estimators
• Quality Guarantee
• Performance Evaluation
• Conclusion
Introduction
Analysis of data in data warehouses is useful for decision support
• OLAP systems provide interactive response times to aggregate queries
• Approximate query answering (AQUA) systems provide very fast alternatives to OLAP systems
Approaches
• Sampling-based
• Histogram-based
• Probabilistic-based
• Wavelet-based
• Clustering-based
Join Synopsis
A join synopsis is a uniform random sample
• Uniform sampling assumes that all tuples are equally important
• OLAP queries, however, follow a predictable, repetitive pattern
• Uniform sampling therefore wastes precious main memory on tuples that queries rarely touch
• The join of random samples of the base relations may not be a random sample of the join of the base relations; this observation is the basis for Join Synopsis by Gibbons et al.
Why Icicles?
• They capture the data locality of aggregate queries on foreign-key joins
• An icicle is expected to contain more tuples from regions that are accessed more frequently
• The space allotted to the sample relation is better utilized if it holds more tuples from the actual result sets
• A dynamic algorithm adapts the sample to the queries being executed in the workload
Icicles
An icicle is a uniform random sample of a multiset of tuples L (an extension of R), where L is the union of the relation R and all sets of tuples that were required to answer the queries in the workload
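To make the definition concrete, here is a minimal Python sketch. It assumes tuples are represented by hypothetical integer ids and that each query's result is available as a list of ids; the names build_extension and draw_icicle are illustrative and do not come from the paper.

```python
import random
from collections import Counter

def build_extension(relation, query_results):
    """Return the multiset L: every tuple of R once, plus one extra copy
    for each workload query whose answer required that tuple."""
    extension = list(relation)
    for result in query_results:
        extension.extend(result)
    return extension

def draw_icicle(extension, size, seed=0):
    """Draw a uniform random sample (without replacement) from the multiset L."""
    rng = random.Random(seed)
    return rng.sample(extension, size)

# Hypothetical relation of six tuple ids and two queries that both touch ids 1 and 2.
R = [1, 2, 3, 4, 5, 6]
workload_results = [[1, 2], [1, 2, 3]]

L = build_extension(R, workload_results)
frequency = Counter(L)          # number of copies of each tuple in L
icicle = draw_icicle(L, size=4)
```

Ids 1 and 2 each appear three times in L, so in any single draw they are three times as likely to be picked as id 6, which appears once.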
Icicle Maintenance
The intuition is to incrementally maintain a sample, called an icicle, such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries exactly.
Icicle Maintenance Algorithm
Efficient incremental maintenance is possible for the following reasons
• A uniform random sample of L (the extension of relation R) ensures that a tuple's chance of selection into the icicle is proportional to its frequency
• Incremental maintenance of the icicle requires only the segment of R that satisfies each new query
• The reservoir sampling algorithm processes each tuple appended to L as a stream
Icicle Maintenance Algorithm
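A minimal sketch of reservoir-based icicle maintenance follows. It uses the textbook reservoir sampling scheme (Algorithm R) and is not a transcription of the paper's pseudocode. The class name Icicle, its methods, and the integer relation are hypothetical, and each query is assumed to be a simple predicate over single tuples.

```python
import random

class Icicle:
    """Fixed-size uniform sample of the growing multiset L."""

    def __init__(self, relation, size, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        self.sample = []
        self.seen = 0              # |L|: tuples streamed so far
        self.extend(relation)      # L starts out as R itself

    def extend(self, tuples):
        """Stream tuples into L, keeping the sample uniform over all of L."""
        for t in tuples:
            self.seen += 1
            if len(self.sample) < self.size:
                self.sample.append(t)
            else:
                # Keep t with probability size / |L|, evicting a random slot.
                j = self.rng.randrange(self.seen)
                if j < self.size:
                    self.sample[j] = t

    def observe_query(self, predicate, relation):
        """Append the segment of R that satisfies the new query to L."""
        self.extend(t for t in relation if predicate(t))

# Hypothetical workload: 20 queries that all touch the same 10% of R.
relation = range(1000)
icicle = Icicle(relation, size=50)
for _ in range(20):
    icicle.observe_query(lambda t: t < 100, relation)

hot = sum(1 for t in icicle.sample if t < 100)
```

After this workload, 2100 of the 3000 tuples in L come from the hot region, so most of the 50 sample slots are expected to hold hot tuples, compared with about 5 for a static uniform sample of R.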
Icicle Maintenance Example
Icicle-Based Estimators
• An icicle is a non-uniform sample of the original data
• A frequency must be maintained for every tuple
• Average, Count, and Sum each need a different estimation mechanism
Estimators for Aggregate Queries
• Average: the average over the distinct tuples in the sample that satisfy the query
• Count: the sum of the expected contributions of all tuples in the icicle that satisfy the query
• Sum: the product of the Average and Count estimates
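The three estimators can be sketched as follows. The slide does not spell out the "expected contribution" of a tuple; this sketch assumes that a sampled tuple t with frequency f(t) contributes |L| / (n · f(t)), where n is the icicle size, which makes the count unbiased when each slot is a uniform draw from L. The function name estimate and the (id, price) tuples are hypothetical.

```python
def estimate(icicle, frequency, extension_size, predicate, value):
    """Estimate AVG, COUNT and SUM of value(t) over tuples of R satisfying predicate.

    icicle         -- list of sampled tuples (may contain duplicates)
    frequency      -- dict mapping each tuple to its frequency f(t)
    extension_size -- |L|, the sum of all frequencies
    """
    n = len(icicle)
    matching = [t for t in icicle if predicate(t)]
    if not matching:
        return {"avg": None, "count": 0.0, "sum": 0.0}

    # Average over distinct tuples, so repeated copies are not over-weighted.
    distinct = set(matching)
    avg = sum(value(t) for t in distinct) / len(distinct)

    # Assumed expected contribution of each sampled copy: |L| / (n * f(t)).
    count = sum(extension_size / (n * frequency[t]) for t in matching)

    return {"avg": avg, "count": count, "sum": avg * count}

# Hypothetical (id, price) tuples; tuple 1 was used by two earlier queries.
t1, t2, t3, t4 = (1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)
frequency = {t1: 3, t2: 1, t3: 1, t4: 1}
extension_size = sum(frequency.values())      # |L| = 6
icicle = [t1, t1, t1, t2, t3, t4]             # here the icicle holds all of L

result = estimate(icicle, frequency, extension_size,
                  predicate=lambda t: t[1] <= 20.0,
                  value=lambda t: t[1])
```

In this toy case the icicle holds all of L, so the count estimate recovers the two qualifying tuples (up to floating-point rounding), and the sum estimate recovers their total price.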
Maintaining the Frequency Relation
• Add a frequency attribute to the relation R
• The frequency of each tuple is initially set to 1
• The frequency is incremented each time a tuple is used to answer a query
• Frequencies of the relevant tuples are updated only when the icicle is updated with a new query
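The bookkeeping above can be sketched in a few lines of Python. This is an illustrative in-memory stand-in for the frequency attribute, assuming hashable tuple ids; the class name FrequencyRelation and its methods are hypothetical.

```python
class FrequencyRelation:
    """Tracks how often each tuple of R has been needed to answer a query."""

    def __init__(self, relation):
        # Every tuple starts with frequency 1 (its single copy in R).
        self.frequency = {t: 1 for t in relation}

    def record_query(self, result_tuples):
        """Increment the frequency of each tuple used by a newly observed query."""
        for t in result_tuples:
            self.frequency[t] += 1

    def extension_size(self):
        """|L| equals the sum of all frequencies."""
        return sum(self.frequency.values())

# Hypothetical relation of three tuple ids and one query that used ids 1 and 2.
freq = FrequencyRelation([1, 2, 3])
freq.record_query([1, 2])
```

Calling record_query only when the icicle absorbs a new query keeps the frequencies consistent with the multiset L that the icicle samples from.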
Quality Guarantees
• When the queries in the workload exhibit data locality, the icicle contains more tuples from the frequently accessed subsets of the relation
• The accuracy of an estimate improves as the number of tuples used to compute it increases
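As a rough illustration of the second point, consider the textbook case of an average computed from independent draws. This is an assumption for intuition only, not a result stated on the slide: sigma denotes the standard deviation of the aggregated attribute among qualifying tuples, and n_Q the number of sample tuples that satisfy the query.

```latex
\[
\operatorname{SE}(\hat{\mu}) = \frac{\sigma}{\sqrt{n_Q}}
\]
```

Under that assumption, quadrupling the number of qualifying sample tuples halves the standard error, which is why concentrating the sample on frequently queried regions helps the queries that target them.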
Performance Evaluation
Plot definitions:
• Static sample: a uniform random sample of the relation
• Icicle: the icicle as it evolves with the workload
• Icicle-complete: the tuned icicle, used again on the same workload
Performance Evaluation
Qworkload: template for generating workloads
    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LI, C, O, S, N, R
    WHERE C_Custkey = O_Custkey
      AND O_Orderkey = LI_Orderkey
      AND LI_Suppkey = S_Suppkey
      AND C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998
Template for obtaining approximate answers
    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LICOS-icicle, N, R
    WHERE C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998
Conclusion
• Icicles are a class of samples that are sensitive to workload characteristics
• They adapt quickly to a changing workload
• Icicles are useful when the workload focuses on relatively small subsets of the relation
• An icicle is a trade-off between accuracy and cost
References
• V. Ganti, M. L. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000.
Thank you!