280 likes | 462 Views
Revisiting Pipelined Parallelism in Multi-Join Query Processing. Bin Liu and Elke A. Rundensteiner Worcester Polytechnic Institute (binliu|rundenst)@cs.wpi.edu. http://www.davis.wpi.edu/dsrg. Multi-Join Queries. Data Integration over Distributed Data Sources
E N D
Revisiting Pipelined Parallelism in Multi-Join Query Processing Bin Liu and Elke A. Rundensteiner Worcester Polytechnic Institute (binliu|rundenst)@cs.wpi.edu http://www.davis.wpi.edu/dsrg VLDB 2005
Multi-Join Queries • Data Integration over Distributed Data Sources • i.e., Extract Transform Load (ETL) Services Data Source Data Warehouse Data Source … … Data Warehouse Data Source Persistent Storage (1) High IO costs given large intermediate results (2) Disk access undesirable since one time process VLDB 2005
Applying Parallelism • Processed in Main Memory of a PC Cluster • Make use of aggregated resources (main memory, CPU) Network Clusters of Machines VLDB 2005
Three Types of Parallelism Pipelined: Operators be composed into producer and consumer relationship Independent: Independent operators run simultaneously on distinct machines Partitioned: Single operator replicated and run on multiple machines VLDB 2005
key value ID Date … OID Item … 0011 5/13 … 0011 IPC … (2) Probe hash tables and output results 0012 5/14 … 0012 HPS … (1) Build hash tables of Orders based on ID … … … … … … Orders LineItems OID Item … ID Date … 0011 IPC … 0011 5/13 … 0012 HPS … 0012 5/14 … … … … … … … LineItems Orders Basics of Hash Join • Two-Phase Hash Join [SD89, LTS90] • Demonstrated High Performance • Potential High Degree of Parallelism VLDB 2005
key value key value key value (2) Probe hash tables and output results Split (1) Build hash tables of Orders based on ID OID Item … ID Date … 0011 IPC … 0011 5/13 … 0012 HPS … 0012 5/14 … … … … … … … LineItems Orders Partitioned Hash Join • Partition (Inputs) Hash Tables across Processors • Have Each Processing Node Run in Parallel VLDB 2005
Left-Deep Query Tree Steps: B8 P8 (9) Scan R9 – Probe P8 – Output B7 P7 R9 (8) Scan R8 – Probe P7 – Build B8 R8 B2 P2 (3) Scan R3 – Probe P2 – Build B3 B1 P1 (2) Scan R2 – Probe P1 – Build B2 R3 R1 R2 (1) Scan R1 – Build R1 Left-Deep Tree [SD90] Example Join Graph R8 R2 R9 R6 R1 R3 R7 R5 R4 … VLDB 2005
Right-Deep Query Tree B8 P8 R9 B7 P7 (2) Scan R1, Probe P1, Probe P2, … , Probe P8 R8 B2 P2 • Scan R2 – Build R1, • Scan R3 – Build R2, • …, • Scan R9 – Build R8 B1 P1 R3 R2 R1 Right-Deep Tree [SD90] Example Join Graph R8 R2 R9 R6 R1 R3 R7 R5 R4 VLDB 2005
Tradeoffs Between Left and Right Trees • Right-Deep Good potential for pipelined parallelism. Intermediate results exist only as a stream. Size of building relations can be predicted accurately. Large memory consumption. • Left-Deep Less memory consumption Less pipelined parallelism VLDB 2005
B8 P8 R9 B7 P7 R8 B4 P4 R5 B3 P3 B2 P2 R4 R3 B1 P1 R1 R2 State-of-the-Art Solutions • Implicit Assumption : Prefer Maximal Pipelined Parallelism !!! VLDB 2005
B8 P8 B8 P8 R9 R9 B7 P7 B7 P7 R8 R8 B4 P4 B4 P4 R5 R5 B3 P3 B3 P3 B2 P2 B2 P2 R4 R4 R3 R3 B1 P1 B1 P1 R1 R1 R2 R2 State-of-the-Art Solutions • Break tree into several pieces, and • Process one piece at a time (as pipeline) • I.e., Static Right-Deep[SD90], • ZigZag [ZZBS93], • Segmented Right-Deep [CLYY92]. • What if : Memory Constrained Environments ? • Strategy : Pipeline ! VLDB 2005
t P12 Computation Machines P12 P13 P14 P22 P23 P24 P32 P33 P34 t R4 Probing Building R3 Partition Partition Partition Partition R2 R1 R1 R2 R3 R4 Pipelined Execution Redirection Cost: The intermediate results generated may need to be partitioned to a different machine. Optimal Degree of Parallelism? I.e., It may not be necessary to partition R2 over a large number of machines if it only has 1000 tuples? VLDB 2005
Pipelined Cost Model • Compute n-way join over k machines • Probing relation R0, building relations, R1, R2, …, Rn • Ii represents the intermediate results after joining with Ri • Total Work (Wb+Wp) & Total Processing Time (Tb+Tp) VLDB 2005
P4 R9 P2 P3 P1 R7 R1 R0 R3 R2 R7 R6 R4 R5 R1 R0 Break Pipelined Parallelism • To Break Long Pipeline and Introduce Independent Parallelism • Large number of small pipelines • High interdependence between pipelined segments • i.e., P1 > P2, P3 > P4, P2 > P4, VLDB 2005
P3 R8 R3 R2 R9 R7 R4 R1 P1 I1 R0 R5 I2 P2 R6 R5 R0 R1 R8 R7 R4 R3 R2 R6 R9 Segmented Bushy Tree • Basic Idea • Compose large pipelined segment • Run pipelined segments independently • Compose bushy tree with minimal interdependency To balance pipelined and independent parallelism VLDB 2005
Heuristics Cost-Based Composing Segmented Tree Input: A connected join graph G with n nodes. Number m specifies maximum number of nodes in each graph. Output: Segmented bushy tree with at least n/m subtrees. completed = false; WHILE (!completed) { Choose node V with largest cardinality that has not yet been grouped as probing relation; Enumerate all subgraphs starting from V with at most m nodes; Choose best subgraph, mark nodes in this group as having been selected in original join graph; IF !(exist K, K is a connected subgraph of G with unselected nodes) && (K.size() >= 2) { completed = true; } } Compose segmented bushy tree from all groups; VLDB 2005
G2 G1 G1 R8 R3 R2 R8 R3 R2 R8 R3 R2 R9 R7 R4 R1 R9 R7 R4 R1 R9 R7 R4 R1 R6 R5 R0 R6 R5 R0 R6 R5 R0 Example (1) R7, R8, R9, R6 (2) R7, R9, R6, R8 (3) R7, R4, R8, R5 ... (1) R1, R0, R2, R3 (2) R1, R2, R0, R3 (3) R1, R2, R3, R4 ... VLDB 2005
G2 G1 R8 R3 R2 R9 R7 R4 R1 I1 R0 R5 I2 R6 R5 R0 G3 R1 R8 R7 R4 R3 R2 R6 R9 Example : Segmented Bushy Tree VLDB 2005
Machine Allocation • Based on building relation sizes of each segment • Nb: total amount of building work. • ki: number of machines allocated to pipeline i k3 Nb = k1 I1 R0 R5 I2 k2 R1 R8 R7 R4 R3 R2 R6 R9 VLDB 2005
Insufficient Main Memory • Break query based on main memory availability • Compose segmented bushy tree for each part R14 R19 R12 R8 R3 R2 R13 R15 R18 R10 R9 R7 R4 R1 R16 R17 R11 R6 R5 R0 VLDB 2005
Experimental Setup • 10 Machine Cluster • Each machine has 2 2.4GHz Xeon CPUs, 2G Memory. • Connect by gigabit ethernet switch PIII 800M Hz PC, 256M Memory Application Oracle 8i 10 Machine Cluster ... 2 PIII 1G CPUs, 1G Memory PIII 800M Hz PC, 256M Memory Controller VLDB 2005
Experimental Setup (cont.) • Generated Data Set with Integer Join Values • Around 40 bytes per tuple • Randomly Generated Join Queries • Acyclic join graph with 8, 12, 16 nodes • Each node represents one join relation • Each edge represents one join condition • Average join ratio is 1 • Cardinality of each relation is from 1K ~ 100K • Up to 600MB per query VLDB 2005
Pipelined vs. Segmented (I) VLDB 2005
Pipelined vs. Segmented (II) VLDB 2005
Insufficient Main Memory VLDB 2005
Related Work • [SD90] Tradeoffs in processing complex join queries via hashing in multiprocessor database machines. VLDB 1990. • [CLYY92] Using segmented right deep trees for execution of pipelined hash joins. VLDB 1992. • [MLD94] Parallel hash based join algorithms for a shared everything environment. TKDE 1994. • [MD97] Data placement in shared nothing parallel database systems. VLDB 1997. • [WFA95] Parallel evaluation of multi-join queries. SIGMOD 1995. • [HCY94] On parallel execution of multiple pipelined hash joins. SIGMOD 1994. • [DNSS92] Practical skew handling in parallel joins. VLDB 1992. • [SHCF03] Flux: an adaptive partitioning operator for continuous query systems. ICDE, 2003. VLDB 2005
Conclusions • Observation: Maximal pipelined hash join processing • Redirection costs? optimal degree of parallelism? • Hypothesis: Worthwhile to incorporate independent parallelism into processing • Both, so several shorter pipelines in parallel • Solution: Segmented bushy tree processing • Heuristics and cost-driven algorithm developed • Validation : Extensive experimental studies • Achieve around 50% improvement over pure pipelined processing VLDB 2005