Parallel Database System: The Future of High Performance Database Systems. Presented by: Suresh Babu L. Outline: Why Parallel Databases? Scale-Up and Speedup. Parallel DB Architectures. Parallel Data Flow. Data Partitioning. Parallelism with Relational Operators. The State of the Art.
Parallel Database System: The Future of High Performance Database Systems • Presented by: Suresh Babu L
Outline • Why Parallel Databases? • Scale-Up and Speedup • Parallel DB Architectures • Parallel Data Flow • Data Partitioning • Parallelism with Relational Operators • The State of the Art
Why Parallel Databases? • Edgar F. Codd
Parallel Access to Data • Parallelism: divide a big problem into many smaller ones to be solved in parallel. • At 10 MB/s of bandwidth, 1 Terabyte takes 1.2 days to scan. • With 1,000 x parallelism (10 GB/s of bandwidth), the same 1 Terabyte is a 100 second scan.
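The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 Terabyte = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly even spread of data across devices; the function name `scan_seconds` is made up for illustration.

```python
# Scan-time arithmetic for the figures on this slide (decimal units assumed).
TERABYTE = 10**12   # bytes
MB = 10**6          # bytes

def scan_seconds(data_bytes, bandwidth_bytes_per_s, degree_of_parallelism=1):
    """Time to scan data spread evenly over identical devices working in parallel."""
    aggregate_bandwidth = bandwidth_bytes_per_s * degree_of_parallelism
    return data_bytes / aggregate_bandwidth

serial = scan_seconds(TERABYTE, 10 * MB)           # one 10 MB/s device
parallel = scan_seconds(TERABYTE, 10 * MB, 1000)   # 1,000 such devices

print(f"serial:   {serial:,.0f} s = {serial / 86400:.2f} days")
print(f"parallel: {parallel:,.0f} s")
```

The serial scan comes out at 100,000 seconds, about 1.16 days, which the slide rounds to 1.2 days.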
Parallel DBMS: Intro • Pipeline parallelism: any sequential program feeds its output to any other sequential program. • Partitioned parallelism: outputs split N ways, inputs merge M ways, with a copy of the sequential program working on each partition.
Pipelined and Partitioned Parallelism • Both are natural in a DBMS! • Pipeline parallelism: Source Data → Scan → Sort → Merge • Partitioned data allows partitioned parallelism: five Source Data partitions, each with its own Scan and Sort, feed a single Merge.
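The scan, sort, and merge data flow on this slide can be sketched as follows. This is a minimal illustration, not how any particular DBMS implements it: the partitions are small in-memory lists, the names `scan_and_sort` and `parallel_sort` are hypothetical, and a thread pool stands in for separate processors (in CPython, threads will not speed up CPU-bound sorting).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def scan_and_sort(partition):
    """Per-partition pipeline: scan the source data, then sort it."""
    scanned = (row for row in partition)   # Scan: stream rows out of the partition
    return sorted(scanned)                 # Sort: independent of the other partitions

def parallel_sort(partitions):
    """Partitioned parallelism: one scan+sort per partition, then a single merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_runs = list(pool.map(scan_and_sort, partitions))
    return list(heapq.merge(*sorted_runs))  # Merge: combine the sorted runs

# Five partitions of made-up source data, as in the slide's figure.
partitions = [[42, 7, 19], [3, 88, 15], [61, 1, 30], [25, 9, 54], [70, 12, 4]]
result = parallel_sort(partitions)
print(result)
```

Each partition is sorted without reference to the others, so only the final merge sees all of the data.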
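Scale-Up And Speed-Up • Speedup: the same job on N times the hardware ideally finishes in 1/N of the time. • Scale-up: an N times larger job on N times the hardware ideally finishes in the same time. (Figure labels: 100GB, 100GB, 100GB, 1TB)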
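The two metrics are usually stated as ratios of elapsed times. A minimal sketch, assuming the common definitions (speedup compares the same problem on a small and a big system; scaleup compares a small problem on the small system with a proportionally bigger problem on the big system). The timings are hypothetical, chosen only for illustration.

```python
def speedup(small_system_time, big_system_time):
    """Same problem on both systems; linear speedup on N times the hardware is N."""
    return small_system_time / big_system_time

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """Problem grows with the system; linear scaleup is 1.0."""
    return small_system_small_problem_time / big_system_big_problem_time

# Hypothetical elapsed times in seconds.
print(speedup(1000.0, 125.0))    # 8x the hardware, same job
print(scaleup(1000.0, 1250.0))   # 8x the hardware, 8x bigger job
```

With these numbers the speedup is 8.0 (linear for 8x hardware) and the scaleup is 0.8 (somewhat below the linear value of 1.0).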
Barriers to Achieving Linear Speedup and Scaleup • A bad speedup curve (plotted against processors & discs) results from 3 factors: • Startup • Interference • Skew
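Two of these barriers can be illustrated with a toy model. The sketch below is an assumption made for illustration, not taken from the slides: a partitioned job finishes when its slowest worker finishes (skew), and each worker adds a fixed serial startup cost. Interference is not modeled. The name `elapsed_time` and all numbers are hypothetical.

```python
def elapsed_time(partition_work, startup_per_worker=0.0):
    """Toy model: serial startup for every worker, then wait for the slowest one."""
    startup = startup_per_worker * len(partition_work)   # assumed serial startup cost
    return startup + max(partition_work)

total_work = 1200.0
even = [total_work / 4] * 4               # 300 units on each of 4 workers
skewed = [600.0, 200.0, 200.0, 200.0]     # same total work, one overloaded worker

print(total_work / elapsed_time(even))                            # no skew, no startup
print(total_work / elapsed_time(skewed))                          # skew
print(total_work / elapsed_time(even, startup_per_worker=25.0))   # startup cost
```

With four workers, the balanced case reaches a speedup of 4.0, the skewed case only 2.0, and the startup case 3.0.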
Architectures for Parallel DBs • Shared memory: IBM/370, Sequent, SGI, Sun • Shared disk: VMScluster, Sysplex
Architectures for Parallel DBs (contd.) • Shared nothing: Tandem, Teradata, SP2
Architectures (contd.) • Shared Nothing: Teradata 400 nodes; 80x12 nodes; Tandem 110 nodes; IBM / SP2 / DB2 128 nodes; Informix/SP2 100 nodes; ATT & Sybase 8x14 nodes • Shared Disk: Oracle 170 nodes; Rdb 24 nodes • Shared Memory: Informix 9 nodes; RedBrick ? nodes
Parallel Data Flow and Relational Systems • Figure: four Source Data partitions, each with its own Scan and Sort, feed a single Merge.
Data Partitioning • Three main techniques: • Round Robin • Hash Partitioning • Range partitioning
Round Robin Partitioning • Figure: partitions P1, P2, …, Pn
Hash Partitioning • Figure: partitions P1, P2, …, Pn
Range Partitioning • Figure: partitions P1, P2, …, Pn holding the key ranges a…c, d…g, …, w…z
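The three partitioning techniques can be sketched side by side. This is a minimal in-memory illustration under stated assumptions: partitions are Python lists, the function names are hypothetical, `zlib.crc32` is an arbitrary choice of stable hash, and the range boundaries "c" and "g" follow the a…c, d…g ranges in the figure.

```python
import bisect
import zlib

def round_robin_partition(rows, n):
    """Row i goes to partition i mod n, regardless of its content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key=lambda r: r):
    """A hash of the partitioning key picks the partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(key(row)).encode("utf-8"))   # stable across runs
        parts[h % n].append(row)
    return parts

def range_partition(rows, upper_bounds, key=lambda r: r):
    """upper_bounds[i] is the highest key in partition i; a last partition takes the rest."""
    parts = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        parts[bisect.bisect_left(upper_bounds, key(row))].append(row)
    return parts

names = ["alice", "dave", "bob", "zoe", "carol", "erin", "walt"]
print(round_robin_partition(names, 3))
print(hash_partition(names, 3))
print(range_partition(names, ["c", "g"], key=lambda name: name[0]))
```

Round robin spreads rows evenly but ignores their values. Hash partitioning sends equal keys to the same partition. Range partitioning keeps neighboring keys together, which is also where uneven key distributions can cause skew.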
Parallelism with Relational Operators • Two basic operations: • Merge • Split
Split Operation • Used to partition or replicate the stream produced by a relational operator
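The split operation, together with its counterpart merge, can be sketched on in-memory streams. The function names and the use of `zlib.crc32` for routing are assumptions made for illustration. A real system would route tuples between processes or nodes, not between Python lists.

```python
import zlib
from itertools import chain

def split_partition(stream, n, key=lambda t: t):
    """Split (partition): route each tuple to exactly one of n output streams."""
    outputs = [[] for _ in range(n)]
    for tup in stream:
        outputs[zlib.crc32(repr(key(tup)).encode("utf-8")) % n].append(tup)
    return outputs

def split_replicate(stream, n):
    """Split (replicate): send a full copy of the stream to each of n outputs."""
    tuples = list(stream)
    return [list(tuples) for _ in range(n)]

def merge(streams):
    """Merge: combine several input streams into one output stream."""
    return list(chain.from_iterable(streams))

rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
parts = split_partition(rows, 2, key=lambda t: t[0])
copies = split_replicate(rows, 2)

print(sorted(merge(parts)) == rows)              # partitioning loses no tuples
print(copies[0] == rows and copies[1] == rows)   # every output gets every tuple
```

Because split and merge sit between operators, the operators themselves can stay ordinary sequential code.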
Example of Parallelizing Relational Operators • Figure: SCAN A and SCAN B feed a JOIN, whose output is INSERTed into C.
The State of the Art • Teradata • Tandem NonStop SQL • Gamma • The Super Database Computer • Bubba
Specialized Parallel Relational Operators • Algorithms for traditional relational operators are rewritten to improve their parallel execution and to better handle data and execution skew. • Consider the join: • Sort-merge join • Hash join
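A partitioned hash join shows how such an operator can be restructured for parallel execution. The sketch below is a simplified illustration, not the algorithm of any system named in these slides: both relations are hash-partitioned on the join key, then each pair of partitions is joined independently with a build/probe hash join. All names are hypothetical, and the partition pairs are processed in a loop here where a real system would run them on separate nodes.

```python
import zlib
from collections import defaultdict

def bucket(key, n):
    """Stable hash so equal keys from both relations land in the same partition."""
    return zlib.crc32(repr(key).encode("utf-8")) % n

def partition_by_key(rows, n, key_index):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[bucket(row[key_index], n)].append(row)
    return parts

def local_hash_join(left, right, left_key, right_key):
    """Build/probe hash join on one pair of partitions."""
    table = defaultdict(list)
    for left_row in left:                              # build phase
        table[left_row[left_key]].append(left_row)
    return [left_row + right_row                       # probe phase
            for right_row in right
            for left_row in table.get(right_row[right_key], [])]

def parallel_hash_join(a, b, a_key, b_key, n=4):
    a_parts = partition_by_key(a, n, a_key)
    b_parts = partition_by_key(b, n, b_key)
    joined = []
    for i in range(n):   # each pair is independent and could run on its own node
        joined.extend(local_hash_join(a_parts[i], b_parts[i], a_key, b_key))
    return joined

A = [(1, "ann"), (2, "bob"), (3, "cy")]
B = [(1, "sales"), (3, "ops"), (3, "it"), (4, "hr")]
print(sorted(parallel_hash_join(A, B, 0, 0)))
```

If many rows share one join key, they all land in the same partition. That is the data skew this slide refers to.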
THANK YOU • QUESTIONS?