
Anthony Okorodudu CSE 6392 2006-2-7

ICICLES: Self-tuning Samples for Approximate Query Answering, by Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan.


Presentation Transcript


  1. ICICLES: Self-tuning Samples for Approximate Query Answering
     By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
     Anthony Okorodudu, CSE 6392, 2006-2-7

  2. Outline
     • Introduction
     • Uniform random sampling
     • Icicles
     • Icicle maintenance
     • Maintaining frequency relation
     • Estimators for aggregate queries
     • Quality guarantee
     • Performance evaluation

  3. Introduction
     • Analysis of data in data warehouses is useful for decision support
     • Users of decision support systems want interactive response times
     • Most decision support applications can tolerate approximate results
     • Approximate query answering systems, such as AQUA, address this need

  4. Approximate Querying
     • Various approaches to answering queries approximately
     • Sampling-based
     • Histogram-based
     • Clustering
     • Probabilistic
     • Wavelet-based

  5. Uniform Random Sampling
     • Sales → S_sales: a 50% uniform random sample of the Sales relation
     • Scale factor: 2 (the inverse of the 50% sampling fraction)
     • SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
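
The scaling step on this slide can be sketched in Python. This is a minimal illustration, not code from the presentation: the helper names `uniform_sample` and `approx_sum`, the Bernoulli-style sampling, and the toy `sales` data are all assumptions made for the example.

```python
import random

def uniform_sample(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

def approx_sum(sample, fraction, predicate, column):
    """Estimate SUM(column) over the full relation by scaling the sample sum."""
    scale = 1.0 / fraction  # 50% sample -> scale factor 2
    return scale * sum(row[column] for row in sample if predicate(row))

# Hypothetical Sales relation: alternating TX and CA rows, illustrative data only.
sales = [{"state": "TX" if i % 2 == 0 else "CA", "sales": 10} for i in range(1000)]
s_sales = uniform_sample(sales, fraction=0.5)

def is_texas(row):
    return row["state"] == "TX"

estimate = approx_sum(s_sales, 0.5, is_texas, "sales")
exact = sum(row["sales"] for row in sales if is_texas(row))
print(f"approximate SUM = {estimate:.0f}, exact SUM = {exact}")
```

The estimate varies with the random sample, so it will typically be close to the exact sum without matching it.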

  6. Biased Sampling
     • Sales → S_sales: sample relation for an aggregation query workload regarding Texas branches

  7. Icicles
     • A class of samples that captures the data locality of aggregate queries over foreign key joins
     • A join synopsis is the join of a uniform random sample of the fact table with a set of dimension tables
     • The sample relation's space is better utilized if it holds more tuples from the actual result sets of the queries

  8. Icicle-Based Estimators
     • An icicle is a non-uniform sample of the original relation
     • Traditional scaling up is not appropriate for icicles
     • A frequency must be maintained for each tuple

  9. Reasoning
     • The accuracy of an approximate answer improves with the number of sample tuples used to compute it
     • If many queries in the workload use the frequent set of tuples, the average quality of the answers improves
     • After drastic workload changes, or for queries that do not conform to the workload, answers are less accurate than those from a static sample

  10. Icicle Maintenance
     • The probability of a tuple's presence in the icicle is proportional to its importance in answering the queries in the workload
     • A tuple is selected for the icicle based on its frequency
     • A workload in which all tuples are retrieved equally frequently is a uniform workload
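
One way to obtain a sample in which a tuple's presence tracks its frequency is to run reservoir sampling over the relation followed by the result tuples of each workload query. The Python sketch below assumes that scheme; the slides do not spell out the algorithm, so the class name `Icicle`, its fields, and the toy workload are hypothetical and should not be read as the authors' exact method.

```python
import random

class Icicle:
    """Fixed-size reservoir over the stream R, Q1(R), Q2(R), ... (assumed scheme)."""

    def __init__(self, relation, size, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        self.seen = 0     # tuples streamed so far: |R| plus all query results
        self.sample = []  # tuple ids in the icicle (duplicates allowed)
        self.freq = {}    # tuple id -> occurrences in the stream so far
        self._stream(relation)

    def _stream(self, tuple_ids):
        for tid in tuple_ids:
            self.seen += 1
            self.freq[tid] = self.freq.get(tid, 0) + 1
            if len(self.sample) < self.size:
                self.sample.append(tid)
            else:
                # Reservoir step: the newcomer replaces a random slot
                # with probability size / seen.
                slot = self.rng.randrange(self.seen)
                if slot < self.size:
                    self.sample[slot] = tid

    def observe_query(self, result_tuple_ids):
        """Feed the tuples a workload query selected from R into the reservoir."""
        self._stream(result_tuple_ids)

# Hypothetical workload: 200 queries that all touch the same 50 "hot" tuples.
relation = list(range(1000))
hot = list(range(50))
icicle = Icicle(relation, size=100)
for _ in range(200):
    icicle.observe_query(hot)
hot_share = sum(tid < 50 for tid in icicle.sample) / len(icicle.sample)
print(f"share of hot tuples in the icicle: {hot_share:.2f}")
```

Under a uniform workload every tuple is streamed equally often, so this reservoir stays a uniform sample; under a focused workload the hot tuples crowd out the rest.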

  11. Icicle Maintenance

  12. Icicle Maintenance Example
     • SELECT average(*) FROM widget-tuners WHERE date.month = 'April'

  13. Estimators for Aggregate Queries
     • Traditional estimators cannot be used because of selection bias and duplicates in the icicle
     • Average is the average over the distinct tuples in the sample that satisfy the query; it does not require the frequency attribute
     • Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
     • Sum is the product of the average and the count
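
The three estimators can be sketched in Python as follows. The slide does not define "expected contribution", so the weight used here is an assumption: each sampled occurrence stands for `stream_size / n` streamed occurrences, and a tuple streamed `freq` times accounts for that many of them (a Horvitz-Thompson-style weighting). The function name `estimate`, its parameters, and the toy data are hypothetical.

```python
def estimate(icicle_rows, freq, stream_size, predicate, column):
    """Return (avg, count, sum) estimates from a non-uniform icicle.

    icicle_rows: list of (tuple_id, row) pairs, duplicates allowed
    freq:        tuple_id -> times the tuple was streamed (R plus query results)
    stream_size: total number of streamed tuples (assumed to be tracked)
    """
    n = len(icicle_rows)
    hits = [(tid, row) for tid, row in icicle_rows if predicate(row)]
    if not hits:
        return None, 0.0, 0.0
    # Average: plain mean over DISTINCT qualifying tuples; no frequencies needed.
    distinct = {tid: row for tid, row in hits}
    avg = sum(row[column] for row in distinct.values()) / len(distinct)
    # Count: sum the assumed expected contribution of every sampled occurrence.
    count = sum((stream_size / n) / freq[tid] for tid, _ in hits)
    # Sum: product of the average and the count.
    return avg, count, avg * count

# Toy data: tuples 0 and 1 were each streamed three times, tuples 2 and 3 once.
rows = {0: {"v": 10}, 1: {"v": 20}, 2: {"v": 30}, 3: {"v": 40}}
freq = {0: 3, 1: 3, 2: 1, 3: 1}
icicle_rows = [(0, rows[0]), (0, rows[0]), (1, rows[1]), (2, rows[2])]
avg, count, total = estimate(icicle_rows, freq, stream_size=8,
                             predicate=lambda row: True, column="v")
print(avg, count, total)
```

Dividing by the frequency is what corrects for the selection bias: a tuple that appears often in the icicle because queries keep selecting it still counts as roughly one tuple of the original relation.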

  14. Quality Guarantees
     • If the queries in the workload exhibit data locality, the icicle contains tuples from the frequently accessed subset of the relation
     • The answer to a new query is more accurate than one from a uniform random sample if the query accesses tuples that are frequent in the icicle

  15. Performance Evaluation
     Qworkload: template for generating workloads
       SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
       FROM LI, C, O, S, N, R
       WHERE C_Custkey = O_Custkey
         AND O_Orderkey = LI_Orderkey
         AND LI_Suppkey = S_Suppkey
         AND C_Nationkey = N_Nationkey
         AND N_Regionkey = R_Regionkey
         AND R_Name = [region]
         AND O_Orderdate >= Date[startdate]
         AND O_Orderdate <= 12-31-1998
     Template for obtaining approximate answers
       SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
       FROM LICOS-icicle, N, R
       WHERE C_Nationkey = N_Nationkey
         AND N_Regionkey = R_Regionkey
         AND R_Name = [region]
         AND O_Orderdate >= Date[startdate]
         AND O_Orderdate <= 12-31-1998

  16. Performance Evaluation

  17. Performance Evaluation: Mixed Workload

  18. Conclusion
     • Icicles are a new class of samples that are sensitive to workload characteristics
     • Icicles adapt quickly to a changing workload
     • Experiments show that icicles perform well when the workload focuses on relatively small subsets of the relation

  19. References
     • V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.

  20. Thanks
