Raghavendra Madala

ICICLES: Self-tuning Samples for Approximate Query Answering, by Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan. CSE6339 – Data exploration. Raghavendra Madala.

Presentation Transcript


  1. ICICLES: Self-tuning Samples for Approximate Query Answering
     By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
     CSE6339 – Data exploration
     Raghavendra Madala

  2. In this presentation…
     • Introduction
     • Icicles
     • Icicle Maintenance
     • Icicle-Based Estimators
     • Quality Guarantee
     • Performance Evaluation
     • Conclusion

  3. Introduction
     Analysis of data in data warehouses is useful in decision support.
     • OLAP systems provide interactive response times to aggregate queries
     • AQUA: approximate query answering systems provide very fast alternatives to OLAP systems

  4. Approaches
     • Sampling-based
     • Histogram-based
     • Probabilistic-based
     • Wavelet-based
     • Clustering-based

  5. A Join Synopsis Is a Uniform Random Sample
     • All tuples are assumed to be equally important
     • OLAP queries, however, follow a predictable, repetitive pattern
     • Uniform sampling therefore wastes precious main memory
     • The join of random samples of base relations may not be a random sample of the join of the base relations. This is the basis for the Join Synopsis by Gibbons
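
The last point on this slide can be checked with a small simulation. The sketch below is illustrative only: the `orders` and `customers` relations, their sizes, and the 10% sampling rate are assumptions made for the example, not figures from the presentation. It compares joining two independent samples with sampling the referencing relation and joining it against the full referenced relation, which is the idea behind a join synopsis.

```python
import random

rng = random.Random(0)

# Hypothetical relations: 1000 orders, each referencing one of 100 customers.
customers = list(range(100))
orders = [(order_id, order_id % 100) for order_id in range(1000)]

# Independent 10% uniform samples of each base relation.
order_sample = rng.sample(orders, 100)
customer_sample = set(rng.sample(customers, 10))

# Joining the two samples keeps an order only if its customer was also
# sampled, so roughly 10% of 10% = 1% of the full join survives.
join_of_samples = [o for o in order_sample if o[1] in customer_sample]

# Join-synopsis idea: sample the referencing relation, then join it with
# the full referenced relation. Every sampled order finds its customer.
all_customers = set(customers)
join_synopsis = [o for o in order_sample if o[1] in all_customers]

print(len(join_of_samples), len(join_synopsis))
```

The second number is always 100, a true 10% sample of the 1000-row join, while the first is expected to be near 10.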

  6. Why Icicles?
     • They capture the data locality of aggregate queries on foreign-key joins
     • An icicle is expected to contain more tuples from regions that are accessed more frequently
     • The space of the sample relation is better utilized if more samples from the actual result set are present
     • A dynamic algorithm changes the sample to suit the queries being executed in the workload

  7. Icicles
     An icicle is a uniform random sample of a multiset of tuples L (an extension of R), which is the union of a relation R and all sets of tuples that were required to answer the queries in the workload.
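
The definition can be written compactly. The notation below is an assumption of this note rather than the slides' own: $Q_1, \dots, Q_k$ are the queries seen so far, $R(Q_i)$ is the set of tuples of $R$ required to answer $Q_i$, $\uplus$ denotes multiset union, and $f(t)$ is the number of copies of tuple $t$ in $L$.

```latex
L = R \,\uplus\, R(Q_1) \,\uplus\, \cdots \,\uplus\, R(Q_k),
\qquad
f(t) = 1 + \bigl|\{\, i : t \in R(Q_i) \,\}\bigr|,
\qquad
|L| = |R| + \sum_{i=1}^{k} |R(Q_i)| = \sum_{t \in R} f(t).
```

Because the icicle is a uniform sample of $L$, a single draw from $L$ returns a copy of $t$ with probability $f(t)/|L|$.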

  8. Icicle Maintenance
     The intuition is to incrementally maintain a sample, called an icicle, such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries exactly.

  9. Icicle Maintenance Algorithm
     Efficient incremental maintenance is possible for the following reasons:
     • A uniform random sample of L (the extension of relation R) ensures that the probability of a tuple's selection in the icicle is proportional to its frequency
     • Incremental maintenance of the icicle requires only the segment of R that satisfies each new query
     • The reservoir sampling algorithm is applied to the stream of tuples being appended to L
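
The maintenance step can be sketched as follows. This is a minimal illustration using the textbook reservoir-sampling update, not the authors' exact pseudocode (the algorithm figure is not part of this transcript). The function name `update_icicle`, the integer tuples, and the toy workload are all assumptions.

```python
import random

def update_icicle(icicle, l_size, query_result, rng):
    """Stream the tuples that answered one query through the reservoir.

    icicle       -- list of sampled tuples, modified in place
    l_size       -- size of the multiset L before this query
    query_result -- tuples of R required to answer the new query
    Returns the size of L after the query's tuples are appended.
    """
    n = len(icicle)
    for t in query_result:
        l_size += 1  # t is appended to L
        # Reservoir step: the new tuple enters with probability n / |L|
        # and evicts a uniformly chosen resident of the icicle.
        if rng.random() < n / l_size:
            icicle[rng.randrange(n)] = t
    return l_size

rng = random.Random(42)
relation = list(range(1000))        # hypothetical relation R
icicle = rng.sample(relation, 50)   # start as a uniform sample of R
l_size = len(relation)              # L starts out equal to R

hot_region = list(range(100))       # every query touches tuples 0..99
for _ in range(20):
    l_size = update_icicle(icicle, l_size, hot_region, rng)

hot_in_icicle = sum(1 for t in icicle if t < 100)
print(l_size, hot_in_icicle)
```

After 20 queries over the same 100 tuples, 2100 of the 3000 entries of L belong to the hot region, so about 70% of the icicle is expected to come from it, compared with 10% in a static uniform sample.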

  10. Icicle Maintenance Algorithm

  11. Icicle Maintenance Example

  12. Icicle-Based Estimators
      • An icicle is a non-uniform sample of the original data
      • Frequency must be maintained over all tuples
      • Different estimation mechanisms are used for Average, Count, and Sum

  13. Estimators for Aggregate Queries
      • Average: the average over the distinct tuples in the sample that satisfy the query
      • Count: the sum of the expected contributions of all tuples in the icicle that satisfy the query
      • Sum: the product of the Average and Count estimates
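
One way to read these three definitions in code is sketched below. The weighting is an assumption of this sketch: each of the n icicle slots is taken to stand for |L|/n entries of L, and a tuple with frequency f occupies f entries of L, so each sampled copy contributes |L|/(n·f) tuples of R to Count. The function and parameter names are hypothetical.

```python
def estimate(icicle, freq, l_size, predicate, value):
    """Return (count, average, sum) estimates for one aggregate query.

    icicle    -- list of sampled tuples (may contain duplicates)
    freq      -- dict mapping each sampled tuple to its frequency in L
    l_size    -- size of the multiset L
    predicate -- function deciding whether a tuple satisfies the query
    value     -- function extracting the aggregated attribute
    """
    n = len(icicle)
    matching = [t for t in icicle if predicate(t)]
    distinct = set(matching)
    if not distinct:
        return 0.0, None, 0.0

    # Average: taken over distinct matching tuples, so a tuple that was
    # sampled several times is not counted more than once.
    average = sum(value(t) for t in distinct) / len(distinct)

    # Count: each sampled copy contributes l_size / (n * frequency).
    count = sum(l_size / (n * freq[t]) for t in matching)

    # Sum: product of the two estimates.
    return count, average, average * count

# Toy data: tuples are integers and the aggregated value is 10 * tuple.
icicle = [1, 1, 2, 3]
freq = {1: 2, 2: 1, 3: 1}   # frequencies of the sampled tuples only
count, average, total = estimate(
    icicle, freq, l_size=8,
    predicate=lambda t: t <= 2,
    value=lambda t: 10 * t,
)
print(count, average, total)
```

Under this weighting a tuple with frequency f is expected to appear n·f/|L| times in the icicle, so its expected total contribution to Count is exactly one tuple.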

  14. Maintaining the Frequency Relation
      • Add a Frequency attribute to the relation R
      • The frequency of each tuple is initially set to 1
      • The frequency is incremented each time a tuple is used to answer a query
      • Frequencies of the relevant tuples are updated only when the icicle is updated with a new query
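
The bookkeeping on this slide might look like the following sketch. A plain dictionary stands in for the Frequency attribute, and the names `init_frequencies` and `record_query` are hypothetical.

```python
def init_frequencies(relation):
    """Every tuple of R starts with frequency 1 (its own copy in L)."""
    return {t: 1 for t in relation}

def record_query(freq, query_result):
    """Increment the frequency of each tuple used to answer a query."""
    for t in query_result:
        freq[t] += 1

relation = ["a", "b", "c", "d"]   # hypothetical relation R
freq = init_frequencies(relation)

record_query(freq, ["a", "b"])    # first query needs a and b
record_query(freq, ["a"])         # second query needs only a

l_size = sum(freq.values())       # equals |R| + 2 + 1
print(freq, l_size)
```

The frequencies always sum to the size of L, so the frequency relation also gives the size that reservoir sampling needs during icicle maintenance.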

  15. Quality Guarantees
      • When the queries in the workload exhibit data locality, the icicle consists of more tuples from the frequently accessed subsets of the relation
      • The accuracy of an estimate improves as the number of tuples used to compute it increases

  16. Performance Evaluation
      Plot definitions:
      • Static sample: a uniform random sample of the relation
      • Icicle: the icicle evolves with the workload
      • Icicle-complete: the tuned icicle, run again on the same workload

  17. Performance Evaluation
      Qworkload: template for generating workloads
          SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
          FROM LI, C, O, S, N, R
          WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey
            AND LI_Suppkey = S_Suppkey AND C_Nationkey = N_Nationkey
            AND N_Regionkey = R_Regionkey AND R_Name = [region]
            AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998
      Template for obtaining approximate answers
          SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
          FROM LICOS-icicle, N, R
          WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey
            AND R_Name = [region]
            AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998

  18. Performance Evaluation

  19. Performance Evaluation

  20. Conclusion
      • Icicles are a class of samples that are sensitive to workload characteristics
      • They adapt quickly to a changing workload
      • Icicles are useful when the workload focuses on relatively small subsets of the relation
      • An icicle is a trade-off between accuracy and cost

  21. References
      • V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000.

  22. Thank you!
