
Parallel Database Systems

Parallel Database Systems. The Future Of High Performance Database Systems David DeWitt and Jim Gray 1992 Presented By – Ajith Karimpana. Parallel Databases. History of Parallel Databases Why Parallel Databases ? How are they implemented ? Where are they implemented ?



Presentation Transcript


  1. Parallel Database Systems The Future Of High Performance Database Systems David DeWitt and Jim Gray 1992 Presented By – Ajith Karimpana

  2. Parallel Databases • History of Parallel Databases • Why Parallel Databases ? • How are they implemented ? • Where are they implemented ? • Future of Parallel Databases

  3. Parallel Databases: The History

  4. History of Parallel Databases • Mainframes dominated most database and transaction processing tasks. • Parallel Machines were practically written off. • Specialized Database Machines came up with trendy hardware. • Relational Data Model brought about a revolution.

  5. History of Parallel Databases Relational Data Model Revolution • Uniform operations applied to uniform streams of data. • Each operator produces a new relation. • Pipelined Parallelism • Partitioned Parallelism

  6. History of Parallel Databases • Pipelined Parallelism: streaming the output of one operator into the input of another operator. • Partitioned Parallelism: partitioning the input data among multiple processors and memories, such that an operator is split into many independent operators, each working on a part of the data.
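
The two forms of parallelism on this slide can be sketched in Python. This is only an illustration under stated assumptions: the operator names (`scan`, `select`, `project`), the `EMPLOYEES` data and the worker count are hypothetical, generators stand in for streaming operators, and threads stand in for independent processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample relation: (name, age) tuples.
EMPLOYEES = [("ann", 41), ("bob", 25), ("cho", 33), ("dev", 29), ("eve", 52)]


def scan(rows):
    """Source operator: yields tuples one at a time."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Selection operator: consumes its input stream as tuples arrive."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Projection operator: keeps only the requested column positions."""
    for row in rows:
        yield tuple(row[c] for c in columns)


def run_pipeline(rows):
    # Pipelined parallelism: the output of each operator streams into
    # the input of the next one, tuple by tuple.
    return list(project(select(scan(rows), lambda r: r[1] >= 30), [0]))


def run_partitioned(rows, n_workers=3):
    # Partitioned parallelism: split the input into n_workers parts,
    # run the same operator pipeline on each part, then merge the results.
    parts = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(run_pipeline, parts)
    return [row for part in results for row in part]
```

Both functions return the same set of tuples; the partitioned version may return them in a different order, because each worker only sees its own slice of the input.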

  7. Parallel Databases: WHY ?

  8. Parallel Databases – Why ? The Philosophy – The ideal database machine would be a single infinitely fast processor with an infinite memory with infinite bandwidth – and it would be infinitely cheap (free). But do we have such an ideal machine ? NO So the challenge is to build an infinitely fast processor out of infinitely many processors of finite speed, and to build an infinitely large memory with infinite memory bandwidth from infinitely many storage units of finite speed. Answer To This Challenge – Parallel Databases

  9. Parallel Databases: The Implementation

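  10. Parallel Databases- Implementation Parallel Database Implementation – The Basic Techniques Two Key Properties: • Linear Speedup – an N-times larger system performs the same task in 1/N of the elapsed time. • Linear Scale up – an N-times larger system performs an N-times larger task in the same elapsed time.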

  11. Parallel Databases- Implementation Two Kinds of Scale up – • Batch – Same query running on N-times larger database. • Transactional – N-times as many clients, submitting N-times as many requests against an N-times larger database.
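
Speedup and scale up, as DeWitt and Gray define them, are both ratios of elapsed times. The sketch below states them as two small functions; the function and parameter names are illustrative, and the timings in the usage note are made-up numbers, not measurements.

```python
def speedup(small_system_elapsed, big_system_elapsed):
    """Same problem on both systems.

    Speedup is linear when an N-times larger system yields a value of N.
    """
    return small_system_elapsed / big_system_elapsed


def scaleup(small_system_small_problem_elapsed, big_system_big_problem_elapsed):
    """N-times larger problem on an N-times larger system.

    Scale up is linear when the value is 1, i.e. the elapsed time is unchanged.
    """
    return small_system_small_problem_elapsed / big_system_big_problem_elapsed
```

For example, if a query takes 120 s on one node and 15 s on eight nodes, `speedup(120.0, 15.0)` is 8.0, which is linear. If an 8-times larger database on eight nodes takes 125 s, `scaleup(120.0, 125.0)` is 0.96, slightly below linear.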

  12. Parallel Databases- Implementation Threats To Linear Speedup/Scale up • Startup • Interference • Skew

  13. Parallel Databases- Implementation Hardware Architecture Shared Memory Shared Disk

  14. Parallel Databases- Implementation Hardware Architecture Shared Nothing

  15. Parallel Databases- Implementation Parallel Dataflow Approach To SQL Software • SQL data model was originally proposed to improve programmer productivity by offering a nonprocedural database language. • SQL came with Data Independence since the programs do not specify how the query is to be executed. • Relational Queries with their properties can be executed as a dataflow graph and can use both pipelined and partitioned parallelism.

  16. Parallel Databases- Implementation Data Partitioning • Partitioning a relation involves distributing its tuples over several disks. Three Kinds – • Round-robin Partitioning • Range Partitioning • Hash Partitioning
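
A minimal sketch of the three partitioning schemes, assuming each disk can be modelled as a Python list. The function names, the `key` callback and the `boundaries` list are hypothetical; Python's built-in `hash` stands in for whatever hash function a real system would use.

```python
from bisect import bisect_right


def round_robin_partition(tuples, n_disks):
    """The i-th tuple goes to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


def range_partition(tuples, key, boundaries):
    """Sorted boundaries split the key space into len(boundaries) + 1 ranges.

    Disk 0 holds keys below boundaries[0]; the last disk holds keys at or
    above boundaries[-1].
    """
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        disks[bisect_right(boundaries, key(t))].append(t)
    return disks


def hash_partition(tuples, key, n_disks):
    """The disk is chosen by hashing the partitioning attribute.

    Note: Python's hash() is stable for integers but randomized per
    process for strings, so integer keys are assumed here.
    """
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[hash(key(t)) % n_disks].append(t)
    return disks
```

With the single-attribute tuples `(3,), (12,), (25,), (7,), (18,), (30,)`, round-robin over three disks spreads them evenly, range partitioning with boundaries `[10, 20]` groups neighbouring keys together, and hashing on the key leaves one disk empty in this small sample.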

  17. Parallel Databases- Implementation Range Round-Robin Hashing

  18. Parallel Databases- Implementation Round-Robin • Ideal for applications that read the entire relation sequentially for each query. • Not ideal for point and range queries, since each of the n disks must be searched. Hash • Ideal for point queries on the partitioning attribute. • Ideal for sequential scans of the entire relation. • Not ideal for point queries on non-partitioning attributes. • Not ideal for range queries on the partitioning attribute. Range • Ideal for point and range queries on the partitioning attribute.
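
The trade-offs listed on this slide come down to how many disks a query has to search. The sketch below returns the disk numbers to visit for a point query and for a range query under each scheme. The scheme names, function names and `boundaries` parameter are assumptions made for illustration, and integer keys are assumed so that Python's `hash` is deterministic.

```python
from bisect import bisect_right


def disks_for_point_query(scheme, value, n_disks, boundaries=None,
                          on_partitioning_attribute=True):
    """Disks that must be searched to find tuples with attribute == value."""
    if scheme == "round_robin" or not on_partitioning_attribute:
        # No relationship between the value and its location: search everything.
        return list(range(n_disks))
    if scheme == "hash":
        return [hash(value) % n_disks]
    if scheme == "range":
        return [bisect_right(boundaries, value)]
    raise ValueError(f"unknown scheme: {scheme}")


def disks_for_range_query(scheme, low, high, n_disks, boundaries=None):
    """Disks that must be searched for low <= partitioning attribute <= high."""
    if scheme == "range":
        first = bisect_right(boundaries, low)
        last = bisect_right(boundaries, high)
        return list(range(first, last + 1))
    if scheme in ("round_robin", "hash"):
        # Neighbouring key values are scattered, so every disk is searched.
        return list(range(n_disks))
    raise ValueError(f"unknown scheme: {scheme}")
```

With three disks and range boundaries `[10, 20]`, a point query for key 14 touches one disk under hash or range partitioning and all three under round-robin. A range query for keys 12 to 25 touches two disks under range partitioning and all three otherwise.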

  19. Parallel Databases- Implementation Handling Of Skew The distribution of tuples when a relation is partitioned (except for Round-Robin) may be skewed, with a high percentage of tuples placed in some partitions and fewer tuples in other partitions. 2 Kinds – Data Skew (Attribute-value Skew) Execution Skew (Partition Skew)
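
Both kinds of skew can be made visible by counting tuples per partition. In the sketch below, `skew_factor` (largest partition divided by average partition size) is a hypothetical measure chosen for illustration, not one taken from the presentation, and the key sets in the usage note are made up.

```python
from bisect import bisect_right
from collections import Counter


def partition_sizes_range(keys, boundaries):
    """Number of tuples per disk under range partitioning."""
    counts = Counter(bisect_right(boundaries, k) for k in keys)
    return [counts.get(d, 0) for d in range(len(boundaries) + 1)]


def partition_sizes_hash(keys, n_disks):
    """Number of tuples per disk under hash partitioning (integer keys assumed)."""
    counts = Counter(hash(k) % n_disks for k in keys)
    return [counts.get(d, 0) for d in range(n_disks)]


def skew_factor(sizes):
    """Largest partition relative to the average; 1.0 means perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))
```

Partition skew: the keys 0 to 99 with badly chosen boundaries `[80, 90]` give sizes `[80, 10, 10]`, while boundaries `[33, 66]` give `[33, 33, 34]`. Attribute-value skew: if eight of twelve tuples share the key 7, hashing over four disks gives sizes `[1, 1, 1, 9]`, because every tuple with the same value lands on the same disk whatever the hash function.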

  20. Parallel Databases- Implementation Parallelism With Relational Operators Consider a simple sequential query –

  21. Parallel Databases- Implementation A Relational Dataflow Graph

  22. Parallel Databases- Implementation

  23. Parallel Databases- Implementation Famous Implementations Of Parallel Databases • Teradata • Tandem NonStop SQL • Gamma • The Super Database Computer • Bubba • nCUBE

  24. Parallel Databases: The Future

  25. Parallel Databases- The Future Research Problems • Parallel Query Optimization • Application Program Parallelism • Physical Database Design • On-line Data Reorganization and Utilities Future Directions • Many commercial success stories. • But research issues still remain unresolved. • Some applications are not well supported by relational data model. • Object-oriented design ??

  26. Parallel Databases Thank You Grilling Time !!
