Dynamic Sample Selection for Approximate Query Processing. Brian Babcock, Surajit Chaudhuri & Gautam Das. Presented by: Mariam John, CSE 6392, 02/14/2006.

Presentation Transcript


  1. Dynamic Sample Selection for Approximate Query Processing. Brian Babcock, Surajit Chaudhuri & Gautam Das. Presented by: Mariam John, CSE 6392, 02/14/2006

  2. Contents • Introduction • Dynamic Sample Selection • Policies for Sample Selection • Small Group Sampling • Pre-Processing Phase • Summary

  3. Why do we do Approximate Query Processing? • Multi-gigabyte data repositories • Data Analysis Application • Data mining • Decision Support Analysis • Fast query response time • Acceptability of inexact query response

  4. Problem • Constructing an optimal sample that represents the underlying data well. • Uniform sampling • Non-uniform sampling

  5. Non-uniform sampling • Purpose is to produce more accurate results across a particular set of queries. • Can give more accurate results than uniform sampling for queries that match the bias. • Optimal bias differs from query to query.

  6. Dynamic Sample Selection [Diagram with two panels, "Standard Sampling" and "Dynamic Sample Selection", built from boxes labeled DATA and SAMPLE]

  7. Dynamic Sample Selection • Pre-Processing Phase [Diagram labels: Query Workload, Data, Select Strata, Build Sample, Sample Data, Metadata]

  8. Dynamic Sample Selection • Runtime Phase [Diagram labels: Query, Choose Samples, Rewrite Query, Sample Data, Metadata]

  9. Dynamic Sample Selection • How to identify the set of biased samples to be created? • Occurs during the pre-processing phase • How to determine which of the various samples to use to answer a query? • Occurs during the runtime phase • The simplest and most efficient strategy is to let the syntax of the incoming query guide the choice of samples.
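
A minimal sketch of a syntax-guided choice, assuming the metadata is a simple mapping from column name to the sample table built for that column. The function name, the table names, and the rule of always including the overall sample are illustrative assumptions, not taken from the slides.

```python
def choose_sample_tables(group_by_columns, metadata):
    """Pick sample tables by looking only at the query's GROUP BY columns."""
    chosen = ["overall_sample"]  # hypothetical name; always used in this sketch
    for column in group_by_columns:
        table = metadata.get(column)  # None if no sample was built for the column
        if table is not None:
            chosen.append(table)
    return chosen


# Hypothetical metadata: column name -> sample table created for that column.
metadata = {"region": "sg_region", "product": "sg_product"}
tables = choose_sample_tables(["region", "year"], metadata)
print(tables)
```

Because the decision needs only the column names in the query, it costs a few dictionary lookups and never touches the data itself.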

  10. Small Group Sampling • A specific dynamic sample selection technique that targets aggregate queries with GROUP BY clauses. • Small group sampling approach: • Overall sample: perform uniform sampling on large groups. • Small group tables: one or more sample tables for smaller groups.

  11. Small Group Sampling • Set of small groups depends on: • grouping columns • selection predicates

  12. Small Group Sampling Idea behind Small Group Sampling: • Determine for which values in each column to create small group tables. • Create small group tables for each column of a table along with the overall sample. • During runtime, choose a subset of sample tables to answer a query most accurately. • Query is rewritten to run against the sample tables instead of the base tables.
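
The rewriting step can be illustrated with a small, runnable sketch. It assumes a single grouping column, a COUNT(*) aggregate, and hypothetical table names (`sg_region`, `overall_sample`). Rows from the small group table are counted as they are, and rows from the uniform sample are scaled by the inverse sampling rate. To keep the example short, overlap between the two tables is avoided with a `NOT IN` filter rather than the bitmask field described on a later slide.

```python
import sqlite3


def rewrite_count_query(column, small_table, overall_table, sample_rate):
    """Rewrite 'SELECT column, COUNT(*) ... GROUP BY column' to use sample tables."""
    scale = 1.0 / sample_rate
    return (
        # Small groups: every row is stored, so counts need no scaling.
        f"SELECT {column}, COUNT(*) * 1.0 AS estimate FROM {small_table} "
        f"GROUP BY {column} "
        f"UNION ALL "
        # Large groups: scale up the uniform sample, skipping small-group values.
        f"SELECT {column}, COUNT(*) * {scale} AS estimate FROM {overall_table} "
        f"WHERE {column} NOT IN (SELECT {column} FROM {small_table}) "
        f"GROUP BY {column}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sg_region (region TEXT)")
conn.execute("CREATE TABLE overall_sample (region TEXT)")
conn.executemany("INSERT INTO sg_region VALUES (?)", [("north",)] * 3)
conn.executemany(
    "INSERT INTO overall_sample VALUES (?)",
    [("east",)] * 6 + [("west",)] * 4 + [("north",)],
)

sql = rewrite_count_query("region", "sg_region", "overall_sample", sample_rate=0.1)
estimates = dict(conn.execute(sql).fetchall())
print(estimates)
```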

  13. Pre-processing Phase • For every column, identify the rare values within it and create small group tables. • Pre-processing phase produces three outputs: • Overall sample table • Small group tables • Metadata table
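
A minimal in-memory sketch of this phase. The rule used here for "rare" (a value appearing in fewer than `rare_fraction` of the rows), the parameter names, and the shape of the metadata are assumptions made for illustration. The slides do not specify them.

```python
import random
from collections import Counter


def preprocess(rows, columns, sample_rate=0.1, rare_fraction=0.05, seed=0):
    """Build an overall sample, per-column small group tables, and metadata."""
    rng = random.Random(seed)
    # Overall sample: keep each row independently with probability sample_rate.
    overall_sample = [row for row in rows if rng.random() < sample_rate]

    small_group_tables = {}
    metadata = {}
    for column in columns:
        counts = Counter(row[column] for row in rows)
        # Assumed rule: a value is rare if it covers under rare_fraction of rows.
        rare_values = {v for v, n in counts.items() if n < rare_fraction * len(rows)}
        if rare_values:
            # The small group table keeps every row holding a rare value.
            small_group_tables[column] = [r for r in rows if r[column] in rare_values]
            metadata[column] = rare_values
    return overall_sample, small_group_tables, metadata


rows = (
    [{"region": "east"} for _ in range(60)]
    + [{"region": "west"} for _ in range(37)]
    + [{"region": "north"} for _ in range(3)]
)
overall, small, meta = preprocess(rows, ["region"])
print(meta, len(small["region"]))
```

Columns with no rare values get no small group table in this sketch, so the metadata doubles as the list of columns for which one exists.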

  14. Pre-processing Phase • Rows can appear in multiple sample tables. • A bitmask field identifies the set of sample tables to which a row was added. • This avoids double counting of rows assigned to multiple sample tables.
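
One way such a bitmask could be used is sketched below. It assumes each row carries a `bitmask` field in which bit i is set when the row was added to small group table i. Tables are processed in order, and a row is skipped whenever it belongs to a table that has already been processed. The function and field names are hypothetical.

```python
def estimate_count(small_group_tables, overall_sample, sample_rate, used_bits):
    """Estimate COUNT(*) from the chosen tables, counting each base row once."""
    total = 0.0
    seen_mask = 0  # bits of the small group tables processed so far
    for bit in used_bits:
        for row in small_group_tables[bit]:
            # Skip rows already counted through an earlier small group table.
            if row["bitmask"] & seen_mask == 0:
                total += 1  # small group tables hold all matching rows: weight 1
        seen_mask |= 1 << bit
    for row in overall_sample:
        # Skip sampled rows that are also stored in a small group table in use.
        if row["bitmask"] & seen_mask == 0:
            total += 1 / sample_rate  # scale up uniformly sampled rows
    return total


r1 = {"id": 1, "bitmask": 0b01}  # only in small group table 0
r2 = {"id": 2, "bitmask": 0b11}  # in small group tables 0 and 1
r3 = {"id": 3, "bitmask": 0b10}  # only in small group table 1
plain = [{"id": 4, "bitmask": 0}, {"id": 5, "bitmask": 0}]

small_group_tables = {0: [r1, r2], 1: [r2, r3]}
overall_sample = [r2] + plain  # r2 was also picked by the uniform sample

total = estimate_count(small_group_tables, overall_sample, 0.5, used_bits=[0, 1])
print(total)
```

In this example `r2` is stored in three places but contributes to the total only once, through table 0.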

  15. Summary • Dynamic Sample Selection • Takes advantage of available disk space • Creates multiple biased sample tables during the pre-processing phase • Picks the best samples at runtime for query processing. • Small Group Sampling • The idea is to treat large and small groups differently • Creates an overall sample table for large groups and a set of small group tables for the rare values in each column.
