CONTROL: Continuous Output and Navigation Technology with Refinement On-Line
CONTROL group: Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie (UC Berkeley)
Batch vs. On-Line Processing
• Batch Processing
  • Gives 100% accurate answers, but users must wait for the entire query to finish...
• On-Line Processing
  • Gives progressively refining answers as the query runs!
  • Allows users to control processing
• Applications of On-Line Processing
  • Large, ad-hoc queries in domains where approximate answers are acceptable ("big picture")
Demo Outline
• On-Line Aggregation
  • Refining estimates
  • Statistics give confidence
• User Control
  • The user can speed up the processing of certain groups
  • The user can stop the processing at any time
• On-Line Visualization
  • Displays an approximation of an image based on data while the data is being fetched
  • Shows the estimated density and distribution of data
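The "refining estimates" and "statistics give confidence" items can be illustrated with a running average that reports an interval as rows arrive. This is a minimal sketch, not the CONTROL implementation: the function name `online_average`, the `report_every` parameter, Welford's variance update, and the normal-approximation interval (z = 1.96) are all assumptions made for illustration, and the sketch presumes rows arrive in random order.

```python
import math
import random


def online_average(values, report_every=1000, z=1.96):
    """Yield (rows_seen, running_mean, half_width) as rows stream in.

    Welford's update and the normal (CLT-based) interval are illustrative
    choices, not taken from the slides. Rows are assumed to arrive in
    random order, otherwise the interval is not meaningful.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 and n > 1:
            variance = m2 / (n - 1)  # sample variance
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
    rng.shuffle(rows)  # stand-in for randomly delivered data
    for n, mean, hw in online_average(rows):
        print(f"{n:6d} rows: avg = {mean:.2f} +/- {hw:.2f}")
```

Because the estimator is a generator, the caller can stop consuming it as soon as the interval is tight enough, which mirrors the "stop the processing at any time" control.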
On-Line Agg.: Query Processing
• New Access Methods
  • Randomly delivered data
  • Index Striding
    • We can take advantage of B-Trees to access the groups
  • Heap Striding
    • More generally, on-line permutation
• Non-blocking Join Algorithms
  • Ripple Join Family
    • RIPL = Rectangles of Increasing Perimeter Length
    • Join progressively larger samples of two tables
Access Methods for On-Line Agg.
[Figure: heap file order (AAABABACDCDAAA...) vs. fair sample output (ABCDABCDABCD...)]
• Index Stride
  • Round-robin through the groups to get a fair sample
  • Works with an index on the grouping column
• Heap Stride (On-Line Permutation)
  • Reorder tuples on the fly to get a fair sample
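The round-robin idea behind index striding can be sketched as follows. This is an illustration under stated assumptions: bucketing rows in memory stands in for an index on the grouping column, and the `weights` argument is a hypothetical way to model the "speed up certain groups" control; neither detail comes from the slides.

```python
from collections import defaultdict, deque


def index_stride(rows, group_key, weights=None):
    """Yield rows round-robin across groups.

    In-memory buckets stand in for an index on the grouping column.
    `weights` (hypothetical) maps a group to the number of rows it
    receives per round; groups not listed get one row per round.
    """
    buckets = defaultdict(deque)
    for row in rows:
        buckets[group_key(row)].append(row)
    weights = weights or {}
    while buckets:
        # sorted() returns a list, so deleting from the dict below is safe.
        for group in sorted(buckets):
            queue = buckets[group]
            for _ in range(max(1, weights.get(group, 1))):
                if not queue:
                    break
                yield queue.popleft()
            if not queue:
                del buckets[group]


if __name__ == "__main__":
    heap_order = "AAABABACDCDAAA"
    print("".join(index_stride(heap_order, lambda r: r)))
    print("".join(index_stride(heap_order, lambda r: r, weights={"A": 2})))
```

Small groups (B, C, D here) are represented early in the output instead of waiting behind the dominant group, which is what makes the early per-group estimates fair.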
Multi-Table On-Line Aggregation
[Figure: sampled regions of R × S, traditional join vs. ripple join]
• Progressively refining join: Ripple Join
  • Ever-larger rectangles in R × S
  • Comes in naive, block, and hash flavors
• Benefits:
  • Samples from both relations simultaneously
  • Gives better statistical confidences much faster
  • Intimate relationship between delivery and estimation
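A minimal sketch of the naive flavor, assuming a square ripple (one new tuple from each input per step) and inputs that already arrive in random order. The names `ripple_join` and `predicate` are illustrative, and the running estimator and confidence computation that would sit on top of the join are omitted.

```python
_DONE = object()  # sentinel marking an exhausted input


def ripple_join(r_rows, s_rows, predicate):
    """Naive square ripple join: yield (step, r, s) for matching pairs.

    Each step draws one new tuple from each input and joins it against
    everything seen so far from the other side, so the explored region
    of R x S grows as an ever-larger square. Every pair is tested once.
    """
    seen_r, seen_s = [], []
    r_iter, s_iter = iter(r_rows), iter(s_rows)
    step = 0
    while True:
        r = next(r_iter, _DONE)
        s = next(s_iter, _DONE)
        if r is _DONE and s is _DONE:
            return
        step += 1
        if r is not _DONE:
            # New R tuple against all previously seen S tuples.
            for old_s in seen_s:
                if predicate(r, old_s):
                    yield step, r, old_s
            seen_r.append(r)
        if s is not _DONE:
            # New S tuple against all seen R tuples, including this step's.
            for old_r in seen_r:
                if predicate(old_r, s):
                    yield step, old_r, s
            seen_s.append(s)


if __name__ == "__main__":
    for step, r, s in ripple_join([1, 2, 3], [2, 3, 4, 1], lambda a, b: a == b):
        print(step, r, s)
```

Results stream out after every step instead of after a full scan of either table, which is the non-blocking behavior on-line aggregation relies on. The block and hash flavors change how many tuples are drawn per step and how matches are found, not this overall shape.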
On-Line Aggregation User Interface
[Screenshot: estimates for each group, user controls, and a graph of estimates with confidence intervals]
On-Line Visualization: CLOUDS
• CLOUDS displays an approximation of an image based on data while the data is being fetched
[Figure panels: Conventional Algorithm, CLOUDS Algorithm, CLOUDS (with Index)]
• Note that CLOUDS predicts the high density of cities in the Midwest
Quantifying the benefit of CLOUDS
• CLOUDS gives a better approximate image faster than the conventional algorithm
[Plot: Error vs. Time (seconds) for Conventional and CLOUDS]