ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
Anthony Okorodudu
CSE 6392
2006-2-7
Outline
• Introduction
• Uniform random sampling
• Icicles
• Icicle maintenance
• Maintaining frequency relation
• Estimators for aggregate queries
• Quality guarantee
• Performance evaluation
Introduction
• Analysis of data in data warehouses is useful for decision support
• Users of decision support systems want interactive systems
• Most decision support systems can tolerate approximate results
• Approximate query answering systems (e.g., AQUA)
Approximate Querying
• Various approaches to approximate query answering
• Sampling-based
• Histogram-based
• Clustering
• Probabilistic
• Wavelet-based
Uniform Random Sampling
• S_sales is a 50% uniform random sample of Sales, so aggregates over the sample are scaled by a factor of 2
SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
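The scale-up on this slide can be sketched in Python. This is a minimal illustration, not code from the paper: the `sales` rows, the helper names `uniform_sample` and `scaled_sum`, and the fixed seed are all assumptions made for the example.

```python
import random

def uniform_sample(rows, fraction, seed=0):
    """Draw a uniform random sample (without replacement) of the given fraction."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

def scaled_sum(sample, fraction, predicate, column):
    """Estimate SUM(column) over matching rows by scaling the sample sum by 1/fraction."""
    scale = 1.0 / fraction
    return scale * sum(row[column] for row in sample if predicate(row))

# Hypothetical Sales relation: every row has the same sales value,
# so the exact answer is easy to check by hand.
sales = [{"state": "TX" if i % 2 == 0 else "CA", "sales": 10} for i in range(100)]
s_sales = uniform_sample(sales, 0.5)  # 50% sample, scale factor 2

is_texas = lambda row: row["state"] == "TX"
estimate = scaled_sum(s_sales, 0.5, is_texas, "sales")
exact = sum(row["sales"] for row in sales if is_texas(row))
```

The estimate varies with the sample drawn; only when the "sample" is the whole relation (fraction 1.0) does it equal the exact sum.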
Biased Sampling
• S_sales: sample relation of Sales for an aggregation query workload regarding Texas branches
Icicles
• A class of samples that captures the data locality of aggregate queries over foreign-key joins
• A join synopsis is the join of a uniform random sample of the fact table with a set of dimension tables
• The sample relation's space is better utilized if it holds more tuples from the actual result sets of the queries
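The join synopsis described above can be sketched as follows. This is a hedged illustration only: the table layout (dimension tables as dictionaries keyed by the foreign-key value), the function name `join_synopsis`, and the toy `orders` and `customers` data are assumptions, not taken from the paper.

```python
import random

def join_synopsis(fact_rows, dimensions, fraction, seed=0):
    """Join a uniform random sample of the fact table with its dimension tables.

    `dimensions` maps a foreign-key column of the fact table to a dict
    from key value to the matching dimension row (hypothetical layout).
    """
    rng = random.Random(seed)
    sample = rng.sample(fact_rows, round(len(fact_rows) * fraction))
    synopsis = []
    for row in sample:
        joined = dict(row)
        # Each foreign key matches exactly one dimension row,
        # so the join keeps one output row per sampled fact row.
        for fk_column, dim_table in dimensions.items():
            joined.update(dim_table[row[fk_column]])
        synopsis.append(joined)
    return synopsis

# Hypothetical fact table (orders) and dimension table (customers).
orders = [{"order_id": i, "cust_id": i % 3, "price": 5 * i} for i in range(20)]
customers = {0: {"nation": "US"}, 1: {"nation": "SG"}, 2: {"nation": "IN"}}
synopsis = join_synopsis(orders, {"cust_id": customers}, 0.25)
```

Because the joins follow foreign keys, the synopsis has the same number of rows as the fact-table sample, which is why it can stand in for a sample of the full join.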
Icicle-Based Estimators
• An icicle is a non-uniform sample of the original relation
• Traditional scaling up is not appropriate for icicles
• A frequency must be maintained for each tuple
Reasoning
• The accuracy of an approximate answer is proportional to the number of sample tuples used to compute it
• If many queries in the workload use the frequent set of tuples, the average quality of the answers improves
• After drastic workload changes, or for queries that do not conform to the workload, answers are less accurate than with a static sample
Icicle Maintenance
• The probability of a tuple's presence in the icicle is proportional to its importance in answering the queries in the workload
• A tuple is selected for the icicle based on its frequency
• A workload in which all tuples are retrieved equally frequently is a uniform workload
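One way to obtain frequency-proportional presence is reservoir sampling over the relation followed by the result tuples of each workload query. The sketch below takes that approach as an assumption; the slides do not spell out the algorithm. The class name `Icicle`, its methods, and the toy workload are hypothetical, and the sketch assumes each query's result tuples are available when the query is observed.

```python
import random
from collections import Counter

class Icicle:
    """Fixed-size sample in which a tuple's expected number of copies
    is proportional to its frequency (1 + number of queries selecting it)."""

    def __init__(self, relation, size, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        self.sample = []
        self.seen = 0               # total tuples fed so far
        self.frequency = Counter()  # per-tuple frequency
        self._feed(relation)        # start from a uniform sample of the relation

    def _feed(self, tuples):
        # Standard reservoir sampling step for each incoming tuple.
        for t in tuples:
            self.seen += 1
            self.frequency[t] += 1
            if len(self.sample) < self.size:
                self.sample.append(t)
            else:
                slot = self.rng.randrange(self.seen)
                if slot < self.size:
                    self.sample[slot] = t

    def observe_query(self, relation, predicate):
        """Feed the tuples selected by a workload query into the icicle."""
        self._feed(t for t in relation if predicate(t))

# Hypothetical workload: 50 queries that all touch the same 10 "hot" tuples.
relation = list(range(100))
icicle = Icicle(relation, size=20)
for _ in range(50):
    icicle.observe_query(relation, lambda t: t < 10)
hot = sum(1 for t in icicle.sample if t < 10)
```

With a uniform workload every tuple keeps the same frequency, and the icicle remains an ordinary uniform random sample.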
Icicle Maintenance
Icicle Maintenance Example
SELECT average(*) FROM widget-tuners WHERE date.month = 'April'
Estimators for Aggregate Queries
• Traditional estimators cannot be used because of selection bias and duplicates in the icicle
• Average is the average over the distinct tuples in the sample that satisfy the query; it does not require the frequency attribute
• Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
• Sum is the product of the average and the count
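The three estimators above can be sketched as follows. The weighting used for "expected contribution" is one plausible reading, stated here as an assumption: if a tuple's expected number of copies in an icicle of size n is n · f(t) / F, where F is the total frequency over the relation, then each copy stands for F / (n · f(t)) tuples of the relation. The function name and the toy data are hypothetical.

```python
def icicle_estimates(icicle, frequency, total_frequency, predicate, value):
    """Return (average, count, sum) estimates for tuples matching predicate.

    icicle:          list of sampled tuples, possibly with duplicates
    frequency:       dict from tuple to its frequency
    total_frequency: sum of frequencies over the whole relation
    """
    matching = [t for t in icicle if predicate(t)]
    distinct = set(matching)
    # Average uses distinct tuples only, so duplicates do not bias it.
    average = sum(value(t) for t in distinct) / len(distinct) if distinct else 0.0
    n = len(icicle)
    # Count: each copy is weighted by the number of relation tuples it stands for
    # (assumed weighting, see the lead-in).
    count = sum(total_frequency / (n * frequency[t]) for t in matching)
    return average, count, average * count

# Hypothetical data: tuples are (id, value); "a" and "b" are frequently queried.
frequency = {("a", 10): 2, ("b", 30): 2, ("c", 50): 1,
             ("d", 70): 1, ("e", 90): 1, ("f", 110): 1}
icicle = [("a", 10), ("a", 10), ("b", 30), ("c", 50)]
avg, cnt, total = icicle_estimates(
    icicle, frequency, sum(frequency.values()),
    predicate=lambda t: t[1] <= 30, value=lambda t: t[1])
# avg == 20.0, cnt == 3.0, total == 60.0
```

Note that the count is an estimate: in this toy relation the exact count of matching tuples is 2, while the sample happened to draw "a" twice.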
Quality Guarantees
• If the queries in the workload exhibit data locality, the icicle contains tuples from the frequently accessed subset of the relation
• The answer to a new query is more accurate than with a uniform random sample if the query accesses tuples that are frequent in the icicle
Performance Evaluation
Qworkload: Template for generating workloads
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey
  AND O_Orderkey = LI_Orderkey
  AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <= 12-31-1998
Template for obtaining approximate answers
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <= 12-31-1998
Performance Evaluation
Performance Evaluation: Mixed Workload
Conclusion
• Icicles are a new class of samples that are sensitive to workload characteristics
• Icicles adapt quickly to a changing workload
• Experiments show that icicles work well when the workload focuses on relatively small subsets of the relation
References
• V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000.
Thanks