190 likes | 323 Views
Elastic Data Partitioning for Cloud-based SQL Processing Systems. Lipyeow Lim Information & Computer Science Department University of Hawai`i at M ā noa. Outline. Shared Nothing Parallel DBMS. results. query. Parallel DB layer. DBMS. DBMS. DBMS. DBMS. Network.
E N D
Elastic Data Partitioning for Cloud-based SQL Processing Systems Lipyeow Lim Information & Computer Science Department University of Hawai`i at Mānoa Lipyeow Lim -- University of Hawai`i at Manoa
Outline Lipyeow Lim -- University of Hawai`i at Manoa
Shared Nothing Parallel DBMS • results • query Parallel DB layer • DBMS • DBMS • DBMS • DBMS Network Lipyeow Lim -- University of Hawaii at Manoa
Cloud-basedArchitecture Virtual Machines CPU CPU CPU CPU Memory Memory Memory Memory Disk Disk Disk Disk (Virtualized) Network Amazon EC2 Physical Resources Lipyeow Lim -- University of Hawaii at Manoa
“Scaling” Up and Down • results • query Parallel DB layer • DBMS • DBMS • DBMS • DBMS • DBMS • DBMS Network Lipyeow Lim -- University of Hawaii at Manoa
Problem Statement Given • A relation T • A partitioning function F on a fixed partitioning key • An initial number p of partitions/fragments • An initial mapping of p fragments to p database nodes • A target number q of partitions Find • a mapping of {T1, T2, .. Tp} to {T1, T2, ... Tq} and • an assignment of the q fragments to q database nodes Such that we minimize • The number of tuples re-partitioned • The number of tuples moved between database nodes Lipyeow Lim -- University of Hawai`i at Manoa
Partitioning a Relation 2 2 4 6 7 7 7 13 20 : 2 2 4 6 7 7 7 13 20 : • Partitioning attribute/key. • Partitioning type. Eg. Range or Hash • Partitioning constraint. Eg. Equi-width, equi-size • Number of partitions/fragments. hash function Lipyeow Lim -- University of Hawai`i at Manoa
Horizontal Fragmentation: Range Partition Partition 1 Partition 2 • Range Partition on rating column • Partition 1: 0 <= rating < 5 • Partition 2: 5 <= rating <= 10 Lipyeow Lim -- University of Hawaii at Manoa
Range Partition: Query Processing Partition 1 • Which partitions? • Better than non-parallel ? SELECT * FROM Sailors S SELECT * FROM Sailors S WHERE rating = 2 Partition 2 SELECT * FROM Sailors S WHERE age > 30 SELECT * FROM Sailors S WHERE rating < 2 and age < 30 Lipyeow Lim -- University of Hawaii at Manoa
Horizontal Fragmentation: Hash Partition Partition 1 • Hash partitioning using hash function • Partition = rating mod 2 Partition 2 Lipyeow Lim -- University of Hawaii at Manoa
Hash Partition: Query Processing Partition 1 • Which partitions? • Better than non-parallel ? SELECT * FROM Sailors S SELECT * FROM Sailors S WHERE rating = 2 Partition 2 SELECT * FROM Sailors S WHERE age > 30 SELECT * FROM Sailors S WHERE rating < 2 and age < 30 Lipyeow Lim -- University of Hawaii at Manoa
Method N: Naive Resize Lipyeow Lim -- University of Hawai`i at Manoa
Method C : Chunk-based Lipyeow Lim -- University of Hawai`i at Manoa
Method T : Tree-based Lipyeow Lim -- University of Hawai`i at Manoa
Method H : Hash-based Lipyeow Lim -- University of Hawai`i at Manoa