Using State Modules for Adaptive Query Processing

Vijayshankar Raman (IBM Almaden Research Center), Amol Deshpande and Joseph M. Hellerstein (University of California, Berkeley)

All the material is taken directly or adapted from the paper “Using State Modules for Adaptive Query Processing” by Vijayshankar Raman, Amol Deshpande, Joseph M. Hellerstein

Contents • Background • Overview • Framework • Variations in Adaptations • Illustrated Examples and Experiments • Conclusion

Background • Uncertainties in query execution • Cardinality estimates are highly imprecise • Demands on memory, system load and network bandwidth are typically unknown until runtime • Data distributions and arrival rates often cannot be known in advance • User preferences in interactive systems change over time • Adaptive execution is a necessity in stream systems

Background • The Federated Facts and Figures (FFF) query system combines data from diverse, distributed data sources • Volatility of distributed data sources • Volatility of user interests during online query processing

What do we mean by adaptability? • No static execution plan: the plan changes dynamically as the environment changes, while still guaranteeing correct results • Adaptability requires flexibility, such as • Choice of access methods (AMs) and join algorithms • Ordering of operators • Choice of query spanning tree

How to achieve flexibility? • Proposed solution • Refine the granularity of the query model • Break down large operators, exposing their internals to the control of the optimizer • Separate the state data structures from the join and encapsulate them • Consequence • The optimizer has more decisions to make: it gains flexibility and freedom at the expense of taking on more responsibility

Overview • Adaptive execution of select-project-join (SPJ) queries • Routing constraints are needed for on-the-fly adaptation • Focus on the adaptive processing of joins • Introduce the dynamic routing framework • Add flexibility step by step by revising and relaxing the routing constraints • Support other join algorithms • Support multiple access methods • Support cyclic queries • Non-symmetric treatment of input relations

Join Operator • A logical construct, treated as a black box • Typically involves multiple physical operations • Q1: Which physical operations are involved?

Different Levels of Adaptation • Join of three tables • Q2: What is the advantage of (b) compared to (a)? • Q3: What is the advantage of (c) compared to (b)?

Comparison & Discussion • Both (a) and (b) use only the index access method on T and a pre-chosen implementation for the RS and ST joins • (c) allows all access methods (tuples from AMs are routed to SteMs rather than to joins) and allows a variety of routing decisions that permit different join algorithms and join orders • Q4: Decomposing the join operator enables adaptation. Why?

Does the routing framework work? • The appendix shows that every SPJ query can be executed by carefully routing tuples between AMs, SteMs and selections • Caution: arbitrary routing can result in • Duplicate results • Missing results • Infinite loops • Solution: routing constraints • Flexibility comes at the price of routing constraints • The main topic of the following part is developing a set of routing constraints that guarantee correct results

Framework Components (overview) • Four kinds of modules • Selection modules (SMs): evaluate query predicates • Access modules (AMs): wrap an access method over a data source • State modules (SteMs): encapsulate the data structures used inside traditional join algorithms • Eddy module: routes tuples between the other modules • Each module runs asynchronously

Functionality of Main Modules

Query Planning • Check that the query is valid • Create an AM for each access method • Create an SM for each predicate • Create a SteM for each base table • Create any seed tuples needed for scans
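The planning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Plan` and `plan_query`, the tuple layouts, the specific validity check (every table has an access method and every predicate references known tables), and the choice of one seed tuple per scan AM are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    # All field names are hypothetical, chosen for this sketch only.
    access_modules: list = field(default_factory=list)
    selection_modules: list = field(default_factory=list)
    stems: dict = field(default_factory=dict)
    seed_tuples: list = field(default_factory=list)


def plan_query(tables, access_methods, predicates):
    """tables: list of table names.
    access_methods: list of (table, kind) pairs, kind in {"scan", "index"}.
    predicates: list of (text, referenced_tables) pairs."""
    known = set(tables)
    # Step 1: validity check (assumed meaning: no dangling references,
    # and every table is reachable through at least one access method).
    if {t for t, _ in access_methods} != known:
        raise ValueError("every table needs an access method, and only known tables may have one")
    for text, referenced in predicates:
        if not set(referenced) <= known:
            raise ValueError(f"predicate {text!r} references an unknown table")

    plan = Plan()
    # Step 2: one AM per access method.
    plan.access_modules = [{"table": t, "kind": k} for t, k in access_methods]
    # Step 3: one SM per predicate.
    plan.selection_modules = [{"predicate": text} for text, _ in predicates]
    # Step 4: one SteM per base table (an empty list stands in for its state).
    plan.stems = {t: [] for t in tables}
    # Step 5: seed tuples for scans (assumed: one seed per scan AM).
    plan.seed_tuples = [{"seed_for": t} for t, k in access_methods if k == "scan"]
    return plan
```

For example, a query over R and S where S has both a scan and an index AM would get three AMs, two SteMs and two seed tuples.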

Example of N-way Symmetric Hash Join • Demonstrates how to implement an n-way symmetric hash join with SteMs • Q5: Comparing (ii) with (i), which one would you choose?

Executing Arbitrary SPJ Queries with SteMs 1. Acyclic SPJ queries with a single scan AM on each table • Example: n-ary symmetric hash join (SHJ) • Required rules: SteMs are implemented with hash indices, and the Eddy obeys the following routing constraints • BuildFirst: a singleton tuple from table T must first be routed to build into SteM_T • SteM BounceBack: a SteM bounces back all build tuples and no probe tuples • Atomicity: the build and the probes of a tuple are coupled • BoundedRepetition: no tuple is routed to the same module more than once
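To make the four constraints concrete, here is a minimal single-threaded sketch of an n-ary symmetric hash join built from SteMs. It is an illustration, not the paper's implementation: the class `SteM`, the function `eddy_join`, the dict-based tuple format and the fixed "first connecting predicate" routing policy are all assumptions of this sketch. Each arriving tuple is built first (BuildFirst), handled to completion before the next one arrives (Atomicity), and a partial result never probes a SteM for a table it already covers (BoundedRepetition).

```python
from collections import defaultdict


class SteM:
    """Hypothetical state module: hash indices over the tuples of one table."""

    def __init__(self, key_cols):
        self.index = {col: defaultdict(list) for col in key_cols}

    def build(self, row):
        for col, idx in self.index.items():
            idx[row[col]].append(row)

    def probe(self, col, value):
        return list(self.index[col].get(value, []))


def eddy_join(stream, join_preds, key_cols):
    """stream: iterable of (table, row) in arrival order, rows are dicts.
    join_preds: (table_a, col_a, table_b, col_b) equi-joins forming an acyclic graph.
    key_cols: {table: [columns to index]}."""
    stems = {table: SteM(cols) for table, cols in key_cols.items()}
    results = []
    for table, row in stream:
        stems[table].build(row)          # BuildFirst
        partials = [{table: row}]        # partial results: {table: row}
        while partials:                  # Atomicity: finish before next arrival
            p = partials.pop()
            if len(p) == len(stems):
                results.append(p)
                continue
            for a, ca, b, cb in join_preds:
                if a in p and b not in p:
                    src, sc, dst, dc = a, ca, b, cb
                elif b in p and a not in p:
                    src, sc, dst, dc = b, cb, a, ca
                else:
                    continue
                # BoundedRepetition: dst is not yet covered by p.
                for match in stems[dst].probe(dc, p[src][sc]):
                    partials.append({**p, dst: match})
                break
    return results
```

Because a tuple only matches tuples that were built before it arrived, every result is produced exactly once, by whichever of its constituent tuples arrives last. That is why the result set does not depend on the arrival order.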

Relaxing the constraints to allow other join algorithms • SteMs need not be implemented with hash indices • Build and probe operations are decoupled • Potential problems?

2. Competitive AMs • Example: queries on sources with more than one AM • Goal: run multiple AMs per source and let the Eddy dynamically choose one AM or switch between AMs • Duplicate results? • Required rule • SteM BounceBack: SteM_S must bounce back a build tuple s unless it is a duplicate of another tuple s’ that is already in SteM_S
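The revised BounceBack rule for build tuples amounts to duplicate elimination inside the SteM. A small sketch follows, assuming tuples are flat dicts with hashable values; the class name `DedupSteM` and the boolean return convention are hypothetical.

```python
class DedupSteM:
    """Hypothetical SteM that absorbs duplicate build tuples."""

    def __init__(self):
        self._seen = set()
        self.rows = []

    def build(self, row):
        """Return True if the tuple is bounced back to the Eddy (it was new),
        False if it is absorbed as a duplicate of a stored tuple."""
        key = tuple(sorted(row.items()))
        if key in self._seen:
            return False
        self._seen.add(key)
        self.rows.append(row)
        return True
```

With two competing AMs on the same source delivering overlapping tuples, only the first copy of each tuple is bounced back and goes on to probe other SteMs, so the overlap produces no duplicate results.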

3. Index AMs • Case: a data source has an index AM • Potential problem? • Required rules • SteM BounceBack: • SteM_S must bounce back a build tuple s unless it is a duplicate of another tuple s’ that is already in SteM_S • SteM_S must bounce back a probe tuple r unless S has a scan AM, or SteM_S already contains all matches for r
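The probe-side rule can be sketched as follows. This is a sketch under assumptions: the class `IndexAwareSteM` and its method `mark_complete` are hypothetical, and the idea that the SteM is told when all matches for a key have been fetched from the index AM is an assumption of this example rather than something stated on the slide.

```python
from collections import defaultdict


class IndexAwareSteM:
    """Hypothetical SteM on table S that decides whether to bounce back probes."""

    def __init__(self, has_scan_am):
        self.has_scan_am = has_scan_am
        self.rows_by_key = defaultdict(list)
        self.complete_keys = set()  # keys whose matches are all stored here

    def build(self, key, row):
        self.rows_by_key[key].append(row)

    def mark_complete(self, key):
        # Assumed signal: the index AM has delivered every match for this key.
        self.complete_keys.add(key)

    def probe(self, key):
        """Return (matches, bounce_back). The probe tuple is bounced back
        unless S has a scan AM or all matches for the key are already here."""
        matches = list(self.rows_by_key.get(key, []))
        bounce_back = not (self.has_scan_am or key in self.complete_keys)
        return matches, bounce_back
```

A bounced-back probe tuple tells the Eddy that the tuple still has to be routed to the index AM on S, since the SteM alone cannot guarantee that it saw every match.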

More Adaptation 4. Cyclic Queries • Static spanning tree choices hurt in two ways: • The spanning tree choice is typically made based on selectivities • A static spanning tree choice can also constrain the generation of partial query results • Required rules • ProbeCompletion constraint: a tuple t that has been bounced back after probing into SteM_S must not probe into any other SteM afterwards. The routing policy must, however, keep t in the dataflow, routing it to other modules, until it has been probed into an AM on S • Prior probers and the probe completion table 5. Relaxing the BuildFirst Constraint • What if one of the input tables is much larger than the others?

Summary of Constraints

Conclusion The salient points of our experimental study are as follows: • Even a simple join algorithm like the index join encapsulates multiple physical operations, and this causes a head-of-line blocking problem. This problem can be avoided by breaking the join module into SteMs. • SteMs allow the Eddy to efficiently learn to choose between competitive access methods, while doing almost no redundant work. • SteMs allow the Eddy to dynamically choose the join spanning tree for cyclic queries. • SteMs allow the Eddy to dynamically switch between an index join algorithm and a symmetric hash join algorithm during query execution. • With SteMs, the Eddy can adaptively choose the way it reorders tuples in interactive environments. Thank you
