280 likes | 613 Views
Parallel Query Optimization. Memory. One tuple at a time. Bucket B. Bucket A. Bucket Sizes and I/O Costs. Bucket B does not fit in the memory in its entirety, It must be loaded several times. Memory. One tuple at a time. Bucket B(1). Bucket A(1). Bucket A(2). Bucket B(2). Bucket A(3).
E N D
Parallel Query Optimization Parallel Query Optimization
Memory One tuple at a time Bucket B Bucket A Bucket Sizes and I/O Costs • Bucket B does not fit in the memory in its entirety, It must be loaded several times. Parallel Query Optimization
Memory One tuple at a time Bucket B(1) Bucket A(1) Bucket A(2) Bucket B(2) Bucket A(3) Bucket B(3) Fit in Memory • Bucket B fits in memory. It needs to be loaded only once. Parallel Query Optimization
Hash-Based Join Parallel Query Optimization
GRACE Algorithm Parallel Query Optimization
Data Skew • System performance is very sensitive to the skewness in tuple distribution. Parallel Query Optimization
Zipf-like Distribution Total: 1,000,000tuples Parallel Query Optimization
Partition Tuning • Best Fit Decreasing Strategy: • In this partition tuning strategy, the hash buckets are first sorted into decreasing order according to size. • In each iteration, the currently largest bucket is assigned to the currently smallest partition (or PN). • This process is repeated until all the buckets have been allocated. • This is a dynamic load balancing technique. Parallel Query Optimization
Best Fit Decreasing Strategy Parallel Query Optimization
Adaptive Load Balancing (ABJ+) Parallel Query Optimization
ABJ+ vs. GRACE Parallel Query Optimization
L_LBO in Multi-way Join Queries • L_LBO: Linear Tree with Load Balancing • A multi-way join query is treated as a sequential order of two-way (or single) joins by using ABJ+. Parallel Query Optimization
B_NLB in Multi-way Join Queries • B_NLB: Bushy Tree without Load Balancing • It tries to join as many pairs of relations as possible. Split Phase: Each PN partitions its portion of each relation into small subbuckets and each subbuckets is transferred to PN corresponding to the bucket ID. Join Phase: Each PN performs the local joins. Parallel Query Optimization
NLBO in Multi-way Join Queries • NLBO: No Load Balancing Optimization • Like B_NLB, it tries to join as many pairs of relations as possible. Hash Phase: Each PN partitions its portion of each relation into small subbuckets and stores them back to its own disks. Partition Tuning Phase: It allocates the buckets to the PNs using the Best Fit Decreasing Strategy. Join Phase: Each PN performs the local joins. Parallel Query Optimization
LBO in Multi-way Join Queries • LBO: Load Balancing Optimization Hash Phase: hashed and stored back into local disks. Optimization Phase: using best fit decreasing strategy and a greedy algorithm to select joins which will be executed concurrently. Executing Phase: Stage 1: Tune the partitions. Stage 2: Perform the join operation. Stage 3: Update the join graph, then go to Optimization Phase. Parallel Query Optimization
Optimization Phase of LBO Parallel Query Optimization
Effect of Bucket Skew Parallel Query Optimization
LBO-FR • LBO-SFR: LBO with Fragment & Replicate Feature • LBO-FR is similar to LBO, except it partitions bucket pairs into subbucket pairs if those buckets are too large. • Example: suppose bucket pair (S1, R1) is too large and |S1| > |R2|. Parallel Query Optimization
LBO-SFR • LBO-SFR: LBO with Symmetric Fragment & Replicate Feature |S1,1,1|>|R1,1,1| |S1|>|R1| |S1,1,1|<|R1,1,1| Parti. S1 Parti. R1 Parti. S1 Parallel Query Optimization
Effect of Bucket Skew Parallel Query Optimization