Selectivity-Based Partitioning

Selectivity-Based Partitioning Alkis Polyzotis UC Santa Cruz

Query Optimization • Integral component of declarative query processing • Key problem: join ordering • Most important (and most complex!) module of a DBMS R1 R2 R3 R4 Parser R1 R2 R3 R4 Optimizer ( (R2 R3)R1) R4 Execution Engine

“Monolithic” Query Optimization • Output: a single join order based on join selectivities between tables Plan:(P E) D

Partition-Based Query Optimization • Output: multiple join orders based on selectivities between fragments of tables Plan: ( (P D2) E )  ( (E D1) P )

Selectivity-Based Partitioning • Divide-and-Union paradigm • Optimization problem and analysis • Partitioning algorithm • Experimental results

Roadmap • Preliminaries • Problem Definition • Partitioning Algorithm • Optimal Splits • Iterative Partitioning • Experimental Results • Conclusions

Data and Query Model • Chain-join queries • Example: R1 R2 R3 R4 • Relations may have optional selections • Relation  Frequency matrix • Left-deep evaluation plans • Example: R3 R2 R4 R1 R1 R4 R3 R2

Problem Definition • Given: query Q, maximum partition count N • Goal: find partitioning of Q in nN partitions that minimizes query cost • On-the-fly partitioning vs. Off-line partitioning • Difficult optimization problem! • Determine the pivot relation • Determine the number of partitions • Compute a partitioning of the pivot • Determine the orderings of partitioned plans R1R21 R4 R3 R1 R2 R3 R4 R3R22 R1 R4

Query Cost Function • One possibility: optimizer’s cost model • Accurate cost estimation • Solution depends on low-level system details • Difficult to gain intuitions • Our approach: query cost = number of intermediate results • Simple function that admits analysis • Sound connections to realistic cost models (Cluet and Moerkotte, ICDT’95) Cost(R3 R2 R4 R1 ) = |R3 R2| + |R3 R2 R4|

Partitioning Algorithm - Overview • State space: partitioned join orders • Partitioning algorithm: • Explore a set of states • Compute optimal partitioning for each state • Return global optimum • Our approach: order joins then partition • Another possibility: partition then order joins

Distributing Tuples • Goal: Distribute tuples to minimize cost • Optimal distribution depends on: • Frequency matrices of other relations • Position (m,l)

Optimal Split Theorem • Distribute each value (m,l) independently • Place (m,l) in partition that minimizes g(L,T,m,l)

Partitioning Algorithm - Overview • State space: partitioned join orders • Partitioning algorithm: • Explore a set of states • Compute optimal partitioning for each state • Return global optimum

Search Algorithm • Exhaustive search is impractical [ Pivot, Leading orders, Trailing orders ] • Search heuristics: • Tighter search space: [ Pivot, Optimal Leading orders ] • Iterative Partitioning • Guided search by using lower bounds on cost of partitions

Encoding of State Space • State: [ Pivot , Optimal leading orders ] • Transition: insert relation in a leading order

R21 R21 R22 R22 Iterative Partitioning • Key idea: (Partition, Optimize)+ • Compute optimal split for leading/trailing orders • Optimize trailing orders for the current split • Theorem: query cost can only decrease • Idea extended to more detailed cost models Leading Trailing R3 R5 R4 R3 R4 R5 R1 R2 R3 R4 R1 R5 R5 R1

Search Algorithm • Initial states: single-relation leading orders • Search process: • Compute partitions with IP • Open more states with transition function • Transitions are guided by lower bound on cost function • Same lower bound can also prune states • Stopping criteria: • Search space is exhausted • Time budget is exhausted

System Integration Monolithic Partition-based Parser Parser Partitioner Optimizer Optimizer Execution Engine Execution Engine

Effect of Skew Synthetic Data

Execution Time Synthetic Data (Skew=1.5)

Varying Time Budget Synthetic Data (Skew=1.5)

Results on Real-Life Data SwissProt

Conclusions • Monolithic optimization  Missed opportunities • Selectivity-Based Partitioning • Divide & Union approach • Multiple join orders per query • Join selectivity between relation fragments • Partitioning Algorithm • Iterative Partitioning • Experimental Results • Significant reduction of intermediate results

Future Work • Extension to multiple pivots • Partition-then-order optimization • Efficient execution of partitioned plans • Off-line workload-aware partitioning

Thank you!

Partitioning Model • General case: Multi-relation partitioning • Our approach: Single-relation partitioning R1R21 R4 R3 R1 R2 R3 R4 R31R22 R1 R4 R1R22R32 R4

Selectivity-Based Partitioning

Selectivity-Based Partitioning

Presentation Transcript

Oxidizer Selectivity

Ecological Selectivity

Partitioning

Scalability-Based Manycore Partitioning

Selectivity Estimation Example

Global Clustering-Based Performance-Driven Circuit Partitioning

Partitioning

Partitioning

Utility-Based Partitioning of Shared Caches

PARTITIONING:

Partitioning

Partitioning

Selectivity

Partitioning

Global Clustering-Based Performance-Driven Circuit Partitioning

Partitioning

Partitioning Sequences Based on Association Measures

Utility-Based Partitioning of Shared Caches

Partitioning

Gear Selectivity

Selectivity / Channel Rejection

Improved Cut Sequences for Partitioning Based Placement