Handling Data Skew in Parallel Joins in Shared-Nothing Systems Yu Xu, Pekka Kostamaa, Xin Zhou (Teradata), Liang Chen (University of California) SIGMOD'08 Presented by Kisung Kim
Introduction • Parallel processing continues to be important in large data warehouses • Shared-nothing architecture • Multiple nodes communicate via a high-speed interconnect network • Each node has its own private memory and disks • Parallel Unit (PU) • Virtual processors doing the scans, joins, locking, transaction management, … • Relations are horizontally partitioned across all PUs • Hash partitioning is commonly used [Figure: nodes, each running several PUs over their own data]
Introduction • Partitioning column • R: x • S: y • Hash function • h(i) = i mod 3 + 1
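The hash partitioning on the slide can be sketched in a few lines. This is an illustrative toy, not Teradata code; rows are modeled as dicts and PUs as lists, and `partition` is a hypothetical helper name.

```python
# Toy sketch of hash partitioning across 3 PUs using the slide's
# hash function h(i) = i mod 3 + 1 (PU numbers run 1..n_pus).

def h(i, n_pus=3):
    """Map a partitioning-column value to a PU number in 1..n_pus."""
    return i % n_pus + 1

def partition(rows, col, n_pus=3):
    """Horizontally partition rows (dicts) by hashing the given column."""
    pus = {p: [] for p in range(1, n_pus + 1)}
    for row in rows:
        pus[h(row[col], n_pus)].append(row)
    return pus
```

With R partitioned on x and values 0..5, rows x = 0, 3 land on PU 1, x = 1, 4 on PU 2, and x = 2, 5 on PU 3.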
Two Join Geographies • Redistribution plan • Redistribute the tables based on join attributes if they are not partitioned by the join attributes • Join is performed on each PU in parallel
Two Join Geographies • Duplication plan • Duplicate the tuples of the smaller relation from each PU to all PUs • Join is performed on each PU in parallel
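The two geographies above can be simulated on toy data to show that both produce the same join result. A minimal sketch under the same assumptions as before (rows as dicts, PU partitions as dict-of-lists, `h` is the slide's hash function); the function names are my own, not from the paper.

```python
# Toy comparison of the two join geographies for R.a = S.b.

def h(v, n=3):
    """The slide's hash function, mapping a value to a PU in 1..n."""
    return v % n + 1

def redistribution_plan(R_parts, S_parts, n=3):
    """Re-hash both relations on the join attribute, then join per PU."""
    R2 = {p: [] for p in range(1, n + 1)}
    S2 = {p: [] for p in range(1, n + 1)}
    for part in R_parts.values():
        for r in part:
            R2[h(r["a"], n)].append(r)
    for part in S_parts.values():
        for s in part:
            S2[h(s["b"], n)].append(s)
    return [(r, s) for p in range(1, n + 1)
            for r in R2[p] for s in S2[p] if r["a"] == s["b"]]

def duplication_plan(R_parts, S_parts):
    """Broadcast the smaller relation (here S) to every PU, join locally."""
    S_all = [s for part in S_parts.values() for s in part]
    return [(r, s) for part in R_parts.values()
            for r in part for s in S_all if r["a"] == s["b"]]
```

Both plans return the same pairs; they differ only in how many rows cross the network and how much spool space each PU needs.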
Redistribution Skew • Hot PU • After redistribution, some PUs have a larger number of tuples than others • Performance bottleneck for the whole system • Caused by relations with many rows sharing the same value in the join attributes • Adding more nodes will not solve the skew problem • Examples • In the travel-booking industry, a big customer often makes a large number of reservations on behalf of its end users • In online e-commerce, a few professionals make millions of transactions a year • …
Redistribution Skew • Relations in these applications are still almost evenly partitioned • When the join attribute is not the partitioning column, severe redistribution skew happens • The duplication plan is a solution only when one of the join relations is fairly small • Our solution • Partial Redistribution & Partial Duplication (PRPD) join
PRPD Join • Assumptions • DBAs evenly partition their data for efficient parallel processing • Skewed rows tend to be evenly partitioned on each PU • The system knows the set of skewed values • Intuition • Deal with the skewed rows and non-skewed rows of R differently
PRPD • L1: set of skewed values of R.a • L2: set of skewed values of S.b • Step 1 • Scan R_i and split its rows into three sets • R_i^loc: all skewed rows of R_i (R.a value in L1), kept locally • R_i^dup: every row of R_i whose R.a value matches a value in L2, duplicated to all PUs • R_i^redis: all other rows of R_i, hash-redistributed on R.a • Three spools on each PU_i • R_i^loc spool: all rows from the local split R_i^loc • R_i^dup spool: all rows of R duplicated to PU_i • R_i^redis spool: all rows of R redistributed to PU_i • Similarly for S
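Step 1's split on a single PU can be sketched directly from the three rules above. This is an illustrative reading of the slide (rows as dicts, the helper name `split_R` is my own); it assumes overlapping values have already been removed from one of the two lists, as the paper's system does before the split.

```python
# PRPD Step 1 on one PU: split R_i by the skewed-value lists
# L1 (values skewed in R.a) and L2 (values skewed in S.b).

def split_R(R_i, L1, L2):
    """Return (loc, dup, redis) splits of this PU's fragment of R."""
    loc, dup, redis = [], [], []
    for r in R_i:
        if r["a"] in L1:        # skewed in R: keep locally
            loc.append(r)
        elif r["a"] in L2:      # matches S's skewed values: duplicate to all PUs
            dup.append(r)
        else:                   # everything else: hash-redistribute on R.a
            redis.append(r)
    return loc, dup, redis
```

With the deck's example L1 = {1}, L2 = {2}, a fragment [a=1, a=2, a=3] splits into loc = [a=1], dup = [a=2], redis = [a=3]. S is split symmetrically with the roles of L1 and L2 swapped.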
PRPD: Example • L1 = {1}, L2 = {2} [Figure: example data]
PRPD Step 1 • R_i^loc: store locally [Figure: on each PU_i, the split R_i^loc goes into the local spool R_i^loc]
PRPD Step 1 • R_i^dup: duplicate [Figure: each PU_i sends its split R_i^dup to all PUs; every PU collects the received rows in its spool R_i^dup]
PRPD Step 1 • R_i^redis: redistribute [Figure: each PU_i hash-redistributes its split R_i^redis on R.a; every PU collects the received rows in its spool R_i^redis]
PRPD Step 2 • On each PU_i, join the spools pairwise and union the results: (R_i^loc ⋈ S_i^dup) ∪ (R_i^dup ⋈ S_i^loc) ∪ (R_i^redis ⋈ S_i^redis)
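In PRPD's second step, each PU computes three local joins and unions them: the locally kept skewed rows of R against the duplicated rows of S, the duplicated rows of R against the locally kept skewed rows of S, and the redistributed rows against each other. A minimal sketch under that reading of the slide (helper names are my own):

```python
# PRPD Step 2 on one PU: union of three local equi-joins on R.a = S.b.

def prpd_step2(R_loc, R_dup, R_redis, S_loc, S_dup, S_redis):
    """Combine this PU's six spools into its share of the join result."""
    def join(Rs, Ss):
        return [(r, s) for r in Rs for s in Ss if r["a"] == s["b"]]
    # Skewed R rows stayed local, so their matching S rows were duplicated
    # here (and symmetrically for skewed S rows); all remaining rows were
    # hash-redistributed on the join attribute and meet on the same PU.
    return join(R_loc, S_dup) + join(R_dup, S_loc) + join(R_redis, S_redis)
```

The three sub-joins touch disjoint value ranges (L1, L2, and everything else), so no result row is produced twice.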
PRPD • All sub-steps in each step can run in parallel • Overlapping skewed values • Should a value in both L1 and L2 go to R_i^loc or R_i^dup? • The system includes each overlapping skewed value in only one of L1 and L2 • It calculates the size of the matching rows on each side and chooses the smaller one
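One plausible reading of "calculate the size and choose the small one" is sketched below. This is my interpretation, not the paper's code: keeping a value in L1 means S's matching rows get duplicated, and keeping it in L2 means R's matching rows get duplicated, so the value is dropped from whichever list would cause the larger duplication. Row-size estimates are passed in as plain dicts for illustration.

```python
# Hypothetical tie-break for values skewed in both R.a and S.b.

def assign_overlaps(L1, L2, size_R, size_S):
    """size_R[v] / size_S[v]: estimated bytes of rows with value v in R / S.
    Returns adjusted (L1, L2) with each overlapping value in exactly one list."""
    L1, L2 = set(L1), set(L2)
    for v in L1 & L2:
        # v in L1 => duplicate S's v-rows; v in L2 => duplicate R's v-rows.
        if size_S.get(v, 0) <= size_R.get(v, 0):
            L2.discard(v)   # cheaper to keep v in L1
        else:
            L1.discard(v)   # cheaper to keep v in L2
    return L1, L2
```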
Comparison with Redistribution Plan • Uses more total spool space than the redistribution plan • PRPD duplicates some rows • Less networking cost • Skewed rows are kept locally • PRPD does not send all skewed rows to a single PU • R_i^loc: kept locally, less network cost • R_i^dup: duplicated, more spool space • R_i^redis: same as the redistribution plan
Comparison with Duplication Plan • Less spool space than the duplication plan • Only partial duplication • More networking cost • When data skew is not significant, the PRPD plan still needs to redistribute a large relation • Less join cost • The duplication plan always joins a complete copy of the duplicated relation
PRPD: Hybrid of Two Plans • L1 = Ø, L2 = Ø • Same as the redistribution plan • L1 = Uniq(R.a) ⊃ Uniq(S.b) • Same as the duplication plan (duplicate S)
PRPD: Hybrid of Two Plans • n: the number of PUs • x: fraction of skewed rows in relation R (assuming they share a single skewed value) • Number of rows of R per PU after redistribution in the redistribution plan • Hot PU: x|R| + (1−x)|R|/n • Non-hot PU: (1−x)|R|/n • Number of rows of R per PU in PRPD (skewed rows stay evenly spread across the PUs) • Every PU: x|R|/n + (1−x)|R|/n = |R|/n • Ratio of the number of rows on the hot PU in the redistribution plan over the number of rows per PU in PRPD: (x|R| + (1−x)|R|/n) / (|R|/n) = (n−1)x + 1
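From the slide's quantities (n PUs, skew fraction x, one skewed value), the hot PU in the redistribution plan holds x|R| + (1−x)|R|/n rows while every PRPD PU holds |R|/n, so the ratio works out to (n−1)x + 1. A quick numeric check (my own sketch, not the paper's code):

```python
# Hot-PU load ratio: redistribution plan vs PRPD, for n PUs and skew x.

def hot_pu_ratio(n, x):
    """How many times more rows the redistribution plan's hot PU holds
    than any PU under PRPD; equals (n - 1) * x + 1."""
    redis_hot = x + (1 - x) / n   # hot PU's share of R under redistribution
    prpd_pu = 1.0 / n             # every PU's share of R under PRPD
    return redis_hot / prpd_pu
```

On the paper's 80-PU test system with 5% skew this gives 79 × 0.05 + 1 = 4.95, i.e., the hot PU is nearly 5× as loaded as any PRPD PU, and the gap grows linearly with both system size and skew.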
Experimental Evaluation • Compare PRPD with redistribution plan • Redistribution plan is more widely used than duplication plan • Schema & test query
Generating Skewed Data • Originally 25 unique nations in TPC-H • We increased the number of unique nations to 1000 • 5% skewness
Query Execution Time • 10 nodes, 80 PUs • Each node: Pentium IV 3.6 GHz CPUs, 4 GB memory, 8 PUs • 1 million rows for the Supplier relation • 1 million rows for the Customer relation • The query result is around 1 billion rows
Query Execution Time • 1 Hot PU
Query Execution Time • 2 Hot PUs
Different Number of PUs • Speedup ratio of PRPD over the redistribution plan • As the skewness increases, the speedup ratio increases • The larger the system, the larger the speedup
Conclusions • Effectively handling data skew in joins is an important challenge in parallel DBMSs • We propose the PRPD join • A hybrid of the redistribution and duplication plans • PRPD can also be used in queries with multiple joins