ICICLES: Self-tuning Samples for Approximate Query Answering, by Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan

Presentation Transcript


  1. ICICLES: Self-tuning Samples for Approximate Query Answering • By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan • Presented by Harikrishnan Karunakaran and Sulabha Balan • CSE 6339

  2. Outline • Introduction • Icicles • Icicle Maintenance • Icicle-Based Estimators • Quality & Performance • Conclusion

  3. Introduction • Analysis of data in data warehouses is useful in decision support • Users of decision support systems want interactive OLAP (Online Analytical Processing) systems • Approximate Query Answering (AQUA) systems were developed to reduce response time to acceptable levels • Users are tolerant of approximate results

  4. Approximate Querying • Various Approaches • Sampling-based • Histogram-based • Clustering • Probabilistic • Wavelet-based

  5. Uniform Random Sampling • Relation Sales and its 50% uniform sample S_sales • Scale factor: 2 • SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
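
The scale-up on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the row layout (dicts with `state` and `sales` keys) and the helper names `uniform_sample` and `approx_sum` are hypothetical and do not come from the slides or the paper.

```python
import random


def uniform_sample(rows, fraction, seed=0):
    """Draw a uniform random sample holding `fraction` of the rows."""
    rng = random.Random(seed)
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


def approx_sum(sample, fraction, column, predicate):
    """Estimate SUM(column) over the full relation from a uniform sample.

    Every sampled tuple stands for 1/fraction tuples of the relation,
    so the sample sum is multiplied by that scale factor (2 for a 50% sample).
    """
    scale = 1.0 / fraction
    return scale * sum(row[column] for row in sample if predicate(row))


if __name__ == "__main__":
    sales = [{"state": "TX" if i % 2 == 0 else "CA", "sales": 10} for i in range(100)]
    s_sales = uniform_sample(sales, 0.5)
    print(approx_sum(s_sales, 0.5, "sales", lambda r: r["state"] == "TX"))
```

With `fraction=1.0` the estimate equals the exact sum; smaller fractions trade accuracy for speed.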

  6. Biased Sampling • Relation Sales and its sample S_sales • Sample relation tuned for an aggregation query workload regarding Texas branches

  7. Methodology • All tuples in a Uniform Random Sample are treated as equally important for answering queries • The sample needs to be tuned to contain the tuples that are more relevant to answering the queries in a workload • A dynamic algorithm is needed that changes the sample to suit the queries being executed in the workload

  8. Join Synopsis • Join of a Uniform Random Sample of a Fact Table with a set of accompanying Dimension Tables • Example query: SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice) FROM LI, C, O, S, N, R WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND R_Name = 'North America' AND O_Orderdate >= 01-01-1998 AND O_Orderdate <= 12-31-1998;
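
A join synopsis can be sketched as follows. This is a simplified illustration, not the construction from the cited paper: it assumes a one-level star schema in which every dimension is reached directly through a foreign key of the fact table, and the function name `join_synopsis` and the dict-based table layout are hypothetical.

```python
import random


def join_synopsis(fact_rows, dimensions, sample_size, seed=0):
    """Join a uniform random sample of the fact table with its dimension tables.

    `dimensions` maps a foreign-key column of the fact table to a lookup
    table {key: dimension_row}. Each sampled fact row is widened with the
    attributes of the dimension rows it references.
    """
    rng = random.Random(seed)
    sampled = rng.sample(fact_rows, sample_size)
    synopsis = []
    for row in sampled:
        joined = dict(row)
        for fk_column, table in dimensions.items():
            joined.update(table[row[fk_column]])
        synopsis.append(joined)
    return synopsis


if __name__ == "__main__":
    lineitems = [{"orderkey": i % 10, "extendedprice": float(i)} for i in range(100)]
    orders = {k: {"orderdate": f"1998-01-{k + 1:02d}"} for k in range(10)}
    print(join_synopsis(lineitems, {"orderkey": orders}, 5))
```

Because each fact row references exactly one row per dimension, the joined result is still a uniform sample of the full join, which is why one precomputed synopsis can serve many foreign-key join queries.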

  9. Need for Icicles • With join synopses, any aggregate query on the fact table can be answered approximately using exactly one of a small number of synopses • A Uniform Random Sample of the Relation wastes memory on tuples the workload rarely touches • OLAP queries exhibit locality in their data access

  10. Icicles • A class of samples that captures the data locality of aggregate queries on foreign key joins • Identifies the focus of a query workload and samples accordingly • An icicle is a uniform random sample of a multiset of tuples L, which is the union of R and all sets of tuples that were required to answer queries in the workload (an extension of R) • It is therefore a non-uniform sample of the original relation R

  11. Icicle L

  12. Icicle Maintenance Algorithm

  13. Icicle Maintenance Algorithm • The algorithm is efficient because: • A Uniform Random Sample of L ensures that a tuple’s chance of selection into the icicle is proportional to its frequency • Incremental maintenance of the icicle requires only the segment of R that satisfies the new query from the workload • Reservoir Sampling Algorithm
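
The maintenance idea can be sketched with a textbook reservoir sampler (Algorithm R) that keeps running as the multiset L grows. This is a minimal sketch under assumptions, not the paper's exact algorithm: the class name `Icicle` and its methods are hypothetical, and tuples are plain dicts.

```python
import random


class Icicle:
    """Fixed-size uniform random sample of the growing multiset L."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0  # number of tuples appended to L so far
        self.rng = random.Random(seed)

    def offer(self, tup):
        """Standard reservoir step for one more tuple of L."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(tup)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.sample[slot] = tup

    def load_relation(self, relation):
        """L starts out as a copy of the relation R."""
        for tup in relation:
            self.offer(tup)

    def absorb_query(self, relation, predicate):
        """Append to L only the segment of R that satisfies the new query."""
        for tup in relation:
            if predicate(tup):
                self.offer(tup)


if __name__ == "__main__":
    relation = [{"id": i, "month": "April" if i % 12 == 3 else "other"} for i in range(1200)]
    icicle = Icicle(capacity=100, seed=1)
    icicle.load_relation(relation)
    for _ in range(20):
        icicle.absorb_query(relation, lambda t: t["month"] == "April")
    print(sum(t["month"] == "April" for t in icicle.sample), "of 100 sample tuples are from April")
```

Tuples that many queries touch appear many times in L, so they are more likely to occupy slots in the reservoir, which is how the sample drifts toward the focus of the workload.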

  14. Icicle Maintenance Example • SELECT average(*) FROM widget-tuners WHERE date.month = ‘April’

  15. Icicle-Based Estimators • Although uniform sampling (of L) is used, the resulting icicle is a biased sample of R • A Frequency Relation is maintained over all tuples in the relation • Different estimation mechanisms are used for Average, Count and Sum

  16. Estimators • Average: the average taken over the set of distinct sample tuples that satisfy the predicate of the query is a good estimate of the true average • Count: the sum of the Expected Contributions of all tuples in the sample that satisfy the given query • Sum: the estimate is given by the product of the average and the count estimates
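
The three estimators can be sketched as below. The slides do not give the formula for a tuple's expected contribution, so this sketch assumes a Hansen-Hurwitz style weight of |L| / (n · f(t)), where |L| is the total frequency, n the sample size, and f(t) the tuple's frequency; that weight, the `SampleTuple` type, and the function names are assumptions made for illustration, not the paper's definitions.

```python
from collections import namedtuple

# tid: tuple identifier, value: aggregated attribute, freq: frequency f(t)
SampleTuple = namedtuple("SampleTuple", ["tid", "value", "freq"])


def estimate_average(sample, predicate):
    """Average over the distinct sample tuples that satisfy the predicate."""
    distinct = {t.tid: t.value for t in sample if predicate(t)}
    if not distinct:
        return None
    return sum(distinct.values()) / len(distinct)


def estimate_count(sample, total_freq, predicate):
    """Sum of per-tuple contributions, assumed here to be |L| / (n * f(t))."""
    n = len(sample)
    return sum(total_freq / (n * t.freq) for t in sample if predicate(t))


def estimate_sum(sample, total_freq, predicate):
    """Product of the average and count estimates."""
    avg = estimate_average(sample, predicate)
    if avg is None:
        return 0.0
    return avg * estimate_count(sample, total_freq, predicate)
```

Dividing by f(t) undoes the over-representation of frequently used tuples, so a tuple that appears twice as often in L counts half as much each time it shows up in the sample.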

  17. Maintaining Frequency Relation • A Frequency attribute is added to the Relation • The starting Frequency is set to 1 for all tuples • The Frequency is incremented each time the tuple is used to answer a query • Frequencies of relevant tuples are updated only when the icicle is updated with a new query
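
The bookkeeping on this slide can be sketched as a small dictionary-backed structure. The class name `FrequencyRelation` and its methods are hypothetical, and tuples are identified by an assumed `tid` key.

```python
class FrequencyRelation:
    """Tracks, for every tuple of R, how often it occurs in the multiset L."""

    def __init__(self, tuple_ids):
        # Every tuple of R is in L once before any query has run.
        self.freq = {tid: 1 for tid in tuple_ids}

    def record_query(self, matching_ids):
        """Increment the frequency of each tuple used to answer a new query."""
        for tid in matching_ids:
            self.freq[tid] += 1

    def total(self):
        """Size of L: the sum of all frequencies."""
        return sum(self.freq.values())


if __name__ == "__main__":
    fr = FrequencyRelation(range(5))
    fr.record_query([1, 3])
    fr.record_query([3])
    print(fr.freq, fr.total())
```

Calling `record_query` at the same moment the icicle absorbs a query keeps the frequencies consistent with the contents of L.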

  18. Quality Guarantees • When queries exhibit data locality, the icicle contains more tuples from frequently accessed subsets of the relation • The accuracy of an approximate answer improves as the number of sample tuples used to compute it increases • Queries that are ‘focused’ with respect to the workload therefore obtain more accurate approximate answers from the icicle

  19. Quality Guarantees contd...

  20. Performance Evaluation • Qworkload, the template for generating workloads: SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice) FROM LI, C, O, S, N, R WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998 • Template for obtaining approximate answers: SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice) FROM LICOS-icicle, N, R WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998

  21. Performance Evaluation contd...

  22. Performance Evaluation contd... • The Error Plots for Comparison • Static uniform random sample on Join Synopsis • Icicle as it evolves with the workload • Icicle-Complete which is formed after entire workload has been executed once

  23. Performance Evaluation contd... Focused Queries

  24. Performance Evaluation contd... Mixed Workload

  25. Observations (focused) • Rapid decrease in relative error of query answers from icicles with queries focused on a set of core tuples • Icicle plot shows a convergence to the Icicle-Complete plot • Quick Convergence of Icicle plot towards Icicle-Complete means Icicle adapts fast

  26. Observations (mixed) • The improvement due to the use of icicles is not significant • It can be concluded that icicles are, at worst, as good as static samples

  27. Conclusion • Icicles provide a class of samples that adapt to the characteristics of the workload • An icicle is never worse than a static sample • An icicle focuses on relatively small subsets of the relation

  28. Inferences • There are no significant gains in the case of a Uniform Workload • There is a trade-off between accuracy and cost • Benefits are restricted to scenarios where the queries in the workload become increasingly focused

  29. References • V. Ganti, M. L. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000. • S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. ACM SIGMOD Record, 1999.

  30. Thank You Questions?
