Handling Data Skew in Parallel Joins in Shared-Nothing Systems Yu Xu, Pekka Kostamaa, Xin Zhou (Teradata), Liang Chen (University of California) SIGMOD'08 Presented by Kisung Kim
Introduction • Parallel processing continues to be important in large data warehouses • Shared-nothing architecture • Multiple nodes communicate via a high-speed interconnect network • Each node has its own private memory and disks • Parallel Unit (PU) • Virtual processors doing the scans, joins, locking, transaction management, … • Relations are horizontally partitioned across all PUs • Hash partitioning is commonly used [Figure: nodes, each running several PUs over their own data]
Introduction • Partitioning column • R: x • S: y • Hash function • h(i) = i mod 3 + 1
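The hash partitioning on the slide can be sketched in a few lines. This is an illustrative toy, not Teradata code; rows are modeled as dicts and PUs as lists, and `partition` is a hypothetical helper name.

```python
# Toy sketch of hash partitioning across 3 PUs using the slide's
# hash function h(i) = i mod 3 + 1 (PU numbers run 1..n_pus).

def h(i, n_pus=3):
    """Map a partitioning-column value to a PU number in 1..n_pus."""
    return i % n_pus + 1

def partition(rows, col, n_pus=3):
    """Horizontally partition rows (dicts) by hashing the given column."""
    pus = {p: [] for p in range(1, n_pus + 1)}
    for row in rows:
        pus[h(row[col], n_pus)].append(row)
    return pus
```

With R partitioned on x and values 0..5, rows x = 0, 3 land on PU 1, x = 1, 4 on PU 2, and x = 2, 5 on PU 3.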
Two Join Geographies • Redistribution plan • Redistribute the tables based on join attributes if they are not partitioned by the join attributes • Join is performed on each PU in parallel
Two Join Geographies • Duplication plan • Duplicate the tuples of the smaller relation from each PU to all PUs • Join is performed on each PU in parallel
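The two geographies above can be simulated on toy data to show that both produce the same join result. A minimal sketch under the same assumptions as before (rows as dicts, PU partitions as dict-of-lists, `h` is the slide's hash function); the function names are my own, not from the paper.

```python
# Toy comparison of the two join geographies for R.a = S.b.

def h(v, n=3):
    """The slide's hash function, mapping a value to a PU in 1..n."""
    return v % n + 1

def redistribution_plan(R_parts, S_parts, n=3):
    """Re-hash both relations on the join attribute, then join per PU."""
    R2 = {p: [] for p in range(1, n + 1)}
    S2 = {p: [] for p in range(1, n + 1)}
    for part in R_parts.values():
        for r in part:
            R2[h(r["a"], n)].append(r)
    for part in S_parts.values():
        for s in part:
            S2[h(s["b"], n)].append(s)
    return [(r, s) for p in range(1, n + 1)
            for r in R2[p] for s in S2[p] if r["a"] == s["b"]]

def duplication_plan(R_parts, S_parts):
    """Broadcast the smaller relation (here S) to every PU, join locally."""
    S_all = [s for part in S_parts.values() for s in part]
    return [(r, s) for part in R_parts.values()
            for r in part for s in S_all if r["a"] == s["b"]]
```

Both plans return the same pairs; they differ only in how many rows cross the network and how much spool space each PU needs.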
Redistribution Skew • Hot PU • After redistribution, some PUs have a larger number of tuples than others • Performance bottleneck for the whole system • Caused by relations with many rows sharing the same value in the join attributes • Adding more nodes will not solve the skew problem • Examples • In the travel-booking industry, a big customer often makes a large number of reservations on behalf of its end users • In online e-commerce, a few professionals make millions of transactions a year • …
Redistribution Skew • Relations in these applications are still almost evenly partitioned • When the join attribute is not the partitioning column, severe redistribution skew happens • The duplication plan is a solution only when one of the join relations is fairly small • Our solution • Partial Redistribution & Partial Duplication (PRPD) join
PRPD Join • Assumptions • DBAs evenly partition their data for efficient parallel processing • Skewed rows tend to be evenly partitioned on each PU • The system knows the set of skewed values • Intuition • Deal with the skewed rows and non-skewed rows of R differently
PRPD • L1: set of skewed values of R.a • L2: set of skewed values of S.b • Step 1 • Scan R_i and split its rows into three sets • R_i^loc: all skewed rows of R_i (R.a value in L1), kept locally • R_i^dup: every row of R_i whose R.a value matches a value in L2, duplicated to all PUs • R_i^redis: all other rows of R_i, hash-redistributed on R.a • Three spools on each PU_i • R_i^loc spool: all rows from the local split R_i^loc • R_i^dup spool: all rows of R duplicated to PU_i • R_i^redis spool: all rows of R redistributed to PU_i • Similarly for S
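Step 1's split on a single PU can be sketched directly from the three rules above. This is an illustrative reading of the slide (rows as dicts, the helper name `split_R` is my own); it assumes overlapping values have already been removed from one of the two lists, as the paper's system does before the split.

```python
# PRPD Step 1 on one PU: split R_i by the skewed-value lists
# L1 (values skewed in R.a) and L2 (values skewed in S.b).

def split_R(R_i, L1, L2):
    """Return (loc, dup, redis) splits of this PU's fragment of R."""
    loc, dup, redis = [], [], []
    for r in R_i:
        if r["a"] in L1:        # skewed in R: keep locally
            loc.append(r)
        elif r["a"] in L2:      # matches S's skewed values: duplicate to all PUs
            dup.append(r)
        else:                   # everything else: hash-redistribute on R.a
            redis.append(r)
    return loc, dup, redis
```

With the deck's example L1 = {1}, L2 = {2}, a fragment [a=1, a=2, a=3] splits into loc = [a=1], dup = [a=2], redis = [a=3]. S is split symmetrically with the roles of L1 and L2 swapped.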
PRPD: Example • L1 = {1}, L2 = {2} [Figure: example data]
PRPD Step 1 • R_i^loc: store locally [Figure: on each PU_i, the split R_i^loc goes into the local spool R_i^loc]
PRPD Step 1 • R_i^dup: duplicate [Figure: each PU_i sends its split R_i^dup to all PUs; every PU collects the received rows in its spool R_i^dup]
PRPD Step 1 • R_i^redis: redistribute [Figure: each PU_i hash-redistributes its split R_i^redis on R.a; every PU collects the received rows in its spool R_i^redis]
PRPD Step 2 • On each PU_i, join the spools pairwise and union the results: (R_i^loc ⋈ S_i^dup) ∪ (R_i^dup ⋈ S_i^loc) ∪ (R_i^redis ⋈ S_i^redis)
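In PRPD's second step, each PU computes three local joins and unions them: the locally kept skewed rows of R against the duplicated rows of S, the duplicated rows of R against the locally kept skewed rows of S, and the redistributed rows against each other. A minimal sketch under that reading of the slide (helper names are my own):

```python
# PRPD Step 2 on one PU: union of three local equi-joins on R.a = S.b.

def prpd_step2(R_loc, R_dup, R_redis, S_loc, S_dup, S_redis):
    """Combine this PU's six spools into its share of the join result."""
    def join(Rs, Ss):
        return [(r, s) for r in Rs for s in Ss if r["a"] == s["b"]]
    # Skewed R rows stayed local, so their matching S rows were duplicated
    # here (and symmetrically for skewed S rows); all remaining rows were
    # hash-redistributed on the join attribute and meet on the same PU.
    return join(R_loc, S_dup) + join(R_dup, S_loc) + join(R_redis, S_redis)
```

The three sub-joins touch disjoint value ranges (L1, L2, and everything else), so no result row is produced twice.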
PRPD • All sub-steps in each step can run in parallel • Overlapping skewed values • Should a value in both L1 and L2 go to R_i^loc or R_i^dup? • The system includes each overlapping skewed value in only one of L1 and L2 • It calculates the size of the matching rows on each side and chooses the smaller one
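One plausible reading of "calculate the size and choose the small one" is sketched below. This is my interpretation, not the paper's code: keeping a value in L1 means S's matching rows get duplicated, and keeping it in L2 means R's matching rows get duplicated, so the value is dropped from whichever list would cause the larger duplication. Row-size estimates are passed in as plain dicts for illustration.

```python
# Hypothetical tie-break for values skewed in both R.a and S.b.

def assign_overlaps(L1, L2, size_R, size_S):
    """size_R[v] / size_S[v]: estimated bytes of rows with value v in R / S.
    Returns adjusted (L1, L2) with each overlapping value in exactly one list."""
    L1, L2 = set(L1), set(L2)
    for v in L1 & L2:
        # v in L1 => duplicate S's v-rows; v in L2 => duplicate R's v-rows.
        if size_S.get(v, 0) <= size_R.get(v, 0):
            L2.discard(v)   # cheaper to keep v in L1
        else:
            L1.discard(v)   # cheaper to keep v in L2
    return L1, L2
```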
Comparison with Redistribution Plan • Uses more total spool space than the redistribution plan • PRPD duplicates some rows • Less networking cost • Skewed rows are kept locally • PRPD does not send all skewed rows to a single PU • R_i^loc: kept locally, less network cost • R_i^dup: duplicated, more spool space • R_i^redis: same as the redistribution plan
Comparison with Duplication Plan • Less spool space than the duplication plan • Only partial duplication • More networking cost • When data skew is not significant, the PRPD plan still needs to redistribute a large relation • Less join cost • The duplication plan always joins a complete copy of the duplicated relation
PRPD: Hybrid of Two Plans • L1 = Ø, L2 = Ø • Same as the redistribution plan • L1 = Uniq(R.a) ⊃ Uniq(S.b) • Same as the duplication plan (duplicate S)
PRPD: Hybrid of Two Plans • n: the number of PUs • x: fraction of skewed rows in relation R (assuming they share a single skewed value) • Number of rows of R per PU after redistribution in the redistribution plan • Hot PU: x|R| + (1−x)|R|/n • Non-hot PU: (1−x)|R|/n • Number of rows of R per PU in PRPD (skewed rows stay evenly spread across the PUs) • Every PU: x|R|/n + (1−x)|R|/n = |R|/n • Ratio of the number of rows on the hot PU in the redistribution plan over the number of rows per PU in PRPD: (x|R| + (1−x)|R|/n) / (|R|/n) = (n−1)x + 1
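From the slide's quantities (n PUs, skew fraction x, one skewed value), the hot PU in the redistribution plan holds x|R| + (1−x)|R|/n rows while every PRPD PU holds |R|/n, so the ratio works out to (n−1)x + 1. A quick numeric check (my own sketch, not the paper's code):

```python
# Hot-PU load ratio: redistribution plan vs PRPD, for n PUs and skew x.

def hot_pu_ratio(n, x):
    """How many times more rows the redistribution plan's hot PU holds
    than any PU under PRPD; equals (n - 1) * x + 1."""
    redis_hot = x + (1 - x) / n   # hot PU's share of R under redistribution
    prpd_pu = 1.0 / n             # every PU's share of R under PRPD
    return redis_hot / prpd_pu
```

On the paper's 80-PU test system with 5% skew this gives 79 × 0.05 + 1 = 4.95, i.e., the hot PU is nearly 5× as loaded as any PRPD PU, and the gap grows linearly with both system size and skew.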
Experimental Evaluation • Compare PRPD with redistribution plan • Redistribution plan is more widely used than duplication plan • Schema & test query
Generating Skewed Data • Originally 25 unique nations in TPC-H • We increased the number of unique nations to 1000 • 5% skewness
Query Execution Time • 10 nodes, 80 PUs • Each node: Pentium IV 3.6 GHz CPUs, 4 GB memory, 8 PUs • 1 million rows for the Supplier relation • 1 million rows for the Customer relation • The query result is around 1 billion rows
Query Execution Time • 1 Hot PU
Query Execution Time • 2 Hot PUs
Different Number of PUs • Speedup ratio of PRPD over the redistribution plan • As the skewness increases, the speedup ratio increases • The larger the system, the larger the speedup
Conclusions • Effectively handling data skew in joins is an important challenge in parallel DBMSs • We propose the PRPD join • A hybrid of the redistribution and duplication plans • PRPD can also be used in queries with multiple joins