Bulk Synchronous Parallel Processing Model Jamie Perkins
Overview • Four W’s – Who, What, When and Why • Goals for BSP • BSP Design and Program • Cost Functions • Languages and Machines
A Bridge for Parallel Computation • Von Neumann model • Designed to insulate hardware and software • BSP model (Bulk Synchronous Parallel) • Proposed by Leslie Valiant of Harvard University in 1990 • Developed by W.F. McColl of Oxford • Designed to be a “bridge” for parallel computation
Goals for BSP • Scalability – performance of HW & SW must scale from a single processor to thousands of processors • Portability – SW must run unchanged, with high performance, on any general-purpose parallel architecture • Predictability – performance of SW on different architectures must be predictable in a straightforward way
BSP Design • Three Components • Node • Processor and Local Memory • Router or Communication Network • Message Passing or Point-to-Point communication • Barrier or Synchronization Mechanism • Implemented in hardware
BSP Design • Fixed memory architecture • Hashing to allocate memory in “random” fashion • Fast access to local memory • Uniformly slow access to remote memory
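To make the hashed-memory idea concrete, here is a minimal sketch (not from the slides): a multiplicative hash maps a global address to a pseudo-random (processor, local slot) pair, so consecutive addresses spread evenly over the machine. Real BSP memory schemes use stronger hash families; the constant and layout below are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Owner of a global address: which processor holds it, and in which local slot. */
typedef struct { uint32_t proc; uint32_t offset; } location_t;

/* Multiplicative ("Fibonacci") hash spreads addresses across processors,
 * so no single node's memory becomes a hot spot. */
static location_t hash_home(uint64_t global_addr, uint32_t nprocs, uint32_t local_slots) {
    uint64_t h = global_addr * 0x9E3779B97F4A7C15ULL;
    location_t loc;
    loc.proc   = (uint32_t)(h % nprocs);
    loc.offset = (uint32_t)((h / nprocs) % local_slots);
    return loc;
}

int main(void) {
    for (uint64_t a = 0; a < 8; a++) {
        location_t loc = hash_home(a, 4, 1024);
        printf("global %llu -> proc %u, slot %u\n",
               (unsigned long long)a, loc.proc, loc.offset);
    }
    return 0;
}
```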
Illustration of a BSP Computer: nodes (each a processor P with local memory M) connected by a communication network, with a barrier synchronization unit. (Figure from http://peace.snu.ac.kr/courses/parallelprocessing/)
BSP Program • Composed of S supersteps • Superstep consists of: • A computation where each processor uses only locally held values • A global message transmission from each processor to any subset of the others • A barrier synchronization
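The sketch below shows one superstep in C using the core BSPlib primitives (bsp_begin, bsp_push_reg, bsp_put, bsp_sync); each processor computes locally, sends its result to its right neighbour, and then hits the barrier. Exact parameter types vary slightly between BSPlib implementations, so treat this as a sketch rather than a definitive program.

```c
#include <bsp.h>
#include <stdio.h>

int main(void) {
    bsp_begin(bsp_nprocs());               /* start the SPMD section on all processors */

    int p = bsp_nprocs();
    int s = bsp_pid();

    int incoming = -1;
    bsp_push_reg(&incoming, sizeof(int));  /* make 'incoming' remotely writable */
    bsp_sync();                            /* registration takes effect after a barrier */

    /* Superstep: (1) computation on locally held values ... */
    int value = s * s;

    /* ... (2) global message transmission: put my value into my right neighbour ... */
    bsp_put((s + 1) % p, &value, &incoming, 0, sizeof(int));

    /* ... (3) barrier synchronization: all puts are delivered once everyone arrives. */
    bsp_sync();

    printf("processor %d received %d\n", s, incoming);

    bsp_pop_reg(&incoming);
    bsp_end();
    return 0;
}
```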
Strategies for programming on BSP • Balance the computation between processes • Balance the communication between processes • Minimize the number of supersteps
BSP Program: processors P1–P4 executing superstep 1 (computation, then communication, then barrier) followed by superstep 2. (Figure from http://peace.snu.ac.kr/courses/parallelprocessing/)
Advantages of BSP • Eliminates the need for programmers to manage memory, assign communication and perform low-level synchronization (given sufficient parallel slackness) • Bulk synchronization allows automatic optimization of the communication pattern • The BSP model provides a simple cost function for analyzing the complexity of algorithms
Cost Function • g – “gap”, or bandwidth inefficiency • L – “latency”, minimum time needed for one superstep • w_i – largest amount of work performed by any processor in superstep i • h_i – largest number of packets sent or received by any processor in superstep i • Execution time for superstep i = w_i + g·h_i + L
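As a concrete illustration (not from the slides), the small C sketch below sums w_i + g·h_i + L over the supersteps of a program; the machine parameters g and L and the per-superstep profiles are hypothetical numbers.

```c
#include <stdio.h>

/* Per-superstep profile: w = max local work over processors,
 * h = max number of packets sent or received by any processor. */
typedef struct { double w; double h; } superstep_t;

/* Total BSP cost: sum over supersteps of (w_i + g*h_i + L). */
static double bsp_cost(const superstep_t *steps, int nsteps, double g, double L) {
    double total = 0.0;
    for (int i = 0; i < nsteps; i++)
        total += steps[i].w + g * steps[i].h + L;
    return total;
}

int main(void) {
    /* Hypothetical machine: g = 4 time units per packet, L = 100 units per barrier. */
    superstep_t steps[] = { {1000.0, 50.0}, {2000.0, 10.0} };
    printf("predicted cost = %.0f time units\n", bsp_cost(steps, 2, 4.0, 100.0));
    return 0;
}
```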
Languages & Machines • Languages: BSP++, C, C++, Fortran, JBSP, Opal • Machines: IBM SP1, SGI Power Challenge (shared memory), Cray T3D, Hitachi SR2001, TCP/IP clusters
Thank You – Any Questions?
References • http://peace.snu.ac.kr/courses/parallelprocessing/ • http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad • http://www.cs.mu.oz.au/677/notes/node41.html • McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994. • United States Patent 5,083,265. • Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103-111.