
Parallel Computing Overview


Presentation Transcript


  1. Parallel Computing Overview
  CS 524 – High-Performance Computing (Au 05-06), Asim Karim @ LUMS

  2. Parallel Computing
  • Multiple processors work cooperatively to solve a computational problem
  • Examples of parallel computing range from specially designed parallel computers and algorithms to geographically distributed networks of workstations cooperating on a task
  • Some problems cannot be solved by present-day serial computers at all, or take an impractically long time to solve on them
  • Parallel computing exploits the concurrency and parallelism inherent in the problem domain
    • Task parallelism
    • Data parallelism (see the sketch below)
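To make the data-parallelism bullet concrete, here is a minimal, hedged sketch (not part of the original slides): the same operation is applied to every element of an array, and OpenMP divides the iterations among threads. The array name, size, and operation are illustrative assumptions only.

    ! Minimal data-parallelism sketch: one operation, many data elements,
    ! iterations divided among OpenMP threads.
    program data_parallel_sketch
      implicit none
      integer, parameter :: n = 100000
      real, allocatable :: x(:)
      integer :: i

      allocate(x(n))
    !$omp parallel do private(i) shared(x)
      do i = 1, n
         x(i) = sqrt(real(i))      ! same operation applied to different data
      end do
    !$omp end parallel do
      print *, x(1), x(n)
    end program data_parallel_sketch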

  3. Development Trends
  • Advances in IC technology and processor design
    • CPU performance has doubled roughly every 18 months for the past 20+ years (Moore’s Law)
    • Clock rates have risen from 4.77 MHz for the 8088 (1979) to 3.6 GHz for the Pentium 4 (2004)
    • FLOPS have grown from a handful (1945) to 35.86 TFLOPS (Earth Simulator by NEC, 2002 to date)
    • Decrease in cost and size
  • Advances in computer networking
    • Bandwidth has increased from a few bits per second to > 10 Gb/s
    • Decrease in size and cost, and increase in reliability
  • Need
    • Solution of larger and more complex problems

  4. Issues in Parallel Computing
  • Parallel architectures
    • Design of bottleneck-free hardware components
  • Parallel programming models
    • Parallel view of the problem domain for effective partitioning and distribution of work among processors
  • Parallel algorithms
    • Efficient algorithms that take advantage of parallel architectures
  • Parallel programming environments
    • Programming languages, compilers, portable libraries, development tools, etc.

  5. Two Key Algorithm Design Issues
  • Load balancing
    • The execution time of a parallel program is the time elapsed from the start of processing by the first processor to the end of processing by the last processor
    • Requires partitioning of the computational load among processors (see the partitioning sketch below)
  • Communication overhead
    • Processors are much faster than communication links
    • Requires careful partitioning of data among processors
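As an illustration of the load-partitioning point above, here is a hedged sketch (not from the slides) of a block partition of the iteration range 1..n among p processors; the routine name block_range and the 0-based rank argument are assumptions made for illustration.

    ! Split iterations 1..n as evenly as possible among p processors:
    ! the first mod(n,p) ranks get one extra iteration each.
    subroutine block_range(n, p, rank, ibeg, iend)
      implicit none
      integer, intent(in)  :: n, p, rank     ! rank is 0-based, as in MPI
      integer, intent(out) :: ibeg, iend
      integer :: q, r
      q = n / p                              ! base block size
      r = mod(n, p)                          ! leftover iterations
      if (rank < r) then
         ibeg = rank*(q+1) + 1
         iend = ibeg + q
      else
         ibeg = r*(q+1) + (rank-r)*q + 1
         iend = ibeg + q - 1
      end if
    end subroutine block_range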

  6. Parallel MVM: Row-Block Partition
      do i = 1, N
        do j = 1, N
          y(i) = y(i) + A(i,j)*x(j)
        end do
      end do
  [Figure: A partitioned into row blocks owned by P0–P3; x and y partitioned into corresponding blocks]
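A hedged MPI Fortran sketch (not from the slides) of how the row-block partition above might be implemented: each process owns n/p rows of A and the matching blocks of x and y, and because every row needs all of x, the distributed x blocks are first gathered with MPI_Allgather. Assuming n divisible by the number of processes keeps the sketch simple.

    program mvm_rowblock
      use mpi
      implicit none
      integer, parameter :: n = 8
      integer :: ierr, rank, p, nloc, i, j
      real, allocatable :: Aloc(:,:), xloc(:), yloc(:), xfull(:)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
      nloc = n / p                        ! rows owned by this process
      allocate(Aloc(nloc,n), xloc(nloc), yloc(nloc), xfull(n))
      Aloc = 1.0; xloc = real(rank+1); yloc = 0.0    ! arbitrary test data

      ! every process needs all of x: gather the distributed blocks
      call MPI_Allgather(xloc, nloc, MPI_REAL, xfull, nloc, MPI_REAL, &
                         MPI_COMM_WORLD, ierr)

      do i = 1, nloc                      ! purely local computation afterwards
         do j = 1, n
            yloc(i) = yloc(i) + Aloc(i,j)*xfull(j)
         end do
      end do

      call MPI_Finalize(ierr)
    end program mvm_rowblock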

  7. Parallel MVM: Column-Block Partition
      do j = 1, N
        do i = 1, N
          y(i) = y(i) + A(i,j)*x(j)
        end do
      end do
  [Figure: A partitioned into column blocks owned by P0–P3; x and y partitioned into corresponding blocks]
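For comparison, a hedged MPI Fortran sketch (not from the slides) of the column-block partition above: each process owns n/p columns of A and the matching block of x, computes a full-length partial y, and the partial vectors are then summed across processes (here with MPI_Allreduce). Again, n divisible by the number of processes is assumed.

    program mvm_colblock
      use mpi
      implicit none
      integer, parameter :: n = 8
      integer :: ierr, rank, p, nloc, i, j
      real, allocatable :: Aloc(:,:), xloc(:), ypart(:), y(:)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
      nloc = n / p                        ! columns owned by this process
      allocate(Aloc(n,nloc), xloc(nloc), ypart(n), y(n))
      Aloc = 1.0; xloc = real(rank+1); ypart = 0.0   ! arbitrary test data

      do j = 1, nloc                      ! local columns produce a partial y
         do i = 1, n
            ypart(i) = ypart(i) + Aloc(i,j)*xloc(j)
         end do
      end do

      ! sum the partial y vectors contributed by all processes
      call MPI_Allreduce(ypart, y, n, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)

      call MPI_Finalize(ierr)
    end program mvm_colblock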

  8. Parallel MVM: Block Partition
  • Can we do any better?
  • Assume the same distribution of x and y
  • Can A be partitioned to reduce communication?
  [Figure: A partitioned into 2-D blocks among P0–P3, with x and y distributed as before]

  9. Parallel Architecture Models
  • Bus-based shared memory or symmetric multiprocessor [SMP] (e.g. suraj, dual/quad-processor Xeon machines)
  • Network-based distributed memory (e.g. Cray T3E, our Linux cluster)
  • Network-based distributed shared memory (e.g. SGI Origin 2000)
  • Network-based clusters of shared-memory nodes (e.g. SMP clusters)

  10. Bus-Based Shared-Memory (SMP)
  [Figure: processors P connected by a bus to a shared memory]
  • Any processor can access any memory location at equal cost (symmetric multiprocessor)
  • Tasks “communicate” by writing/reading commonly accessible locations (see the sketch below)
  • Easier to program
  • Cannot scale beyond about 30 processors (bus bottleneck)
  • Examples: most workstation vendors make SMPs (Sun, IBM, Intel-based SMPs); Cray T90, SV1 (uses a cross-bar)
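A minimal, hedged sketch (not from the slides) of what "communicate by writing/reading commonly accessible locations" looks like in OpenMP: each thread writes into its own slot of a shared array, and after the parallel region one thread reads everything back. The array name and size are illustrative assumptions.

    program shared_comm_sketch
      use omp_lib
      implicit none
      real :: slot(0:255)                 ! shared array: one slot per possible thread
      integer :: t

      slot = 0.0
    !$omp parallel private(t) shared(slot)
      t = omp_get_thread_num()
      slot(t) = real(t + 1)               ! each thread writes a shared location
    !$omp end parallel

      ! after the region, one thread reads what all threads wrote
      print *, 'sum of contributions =', sum(slot)
    end program shared_comm_sketch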

  11. Network-Connected Distributed-Memory
  [Figure: processors P, each with its own memory M, connected by an interconnection network]
  • Each processor can access only its own memory
  • Explicit communication by sending and receiving messages (see the sketch below)
  • More tedious to program
  • Can scale to thousands of processors
  • Examples: Cray T3E, clusters
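A minimal, hedged sketch (not from the slides) of the explicit send/receive style: rank 0 sends an array to rank 1 with MPI. The buffer contents and message tag are arbitrary; run with at least two processes (e.g. mpirun -np 2).

    program msg_sketch
      use mpi
      implicit none
      integer :: ierr, rank
      real :: buf(4)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      if (rank == 0) then
         buf = (/ 1.0, 2.0, 3.0, 4.0 /)
         call MPI_Send(buf, 4, MPI_REAL, 1, 99, MPI_COMM_WORLD, ierr)
      else if (rank == 1) then
         call MPI_Recv(buf, 4, MPI_REAL, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
         print *, 'rank 1 received:', buf
      end if
      call MPI_Finalize(ierr)
    end program msg_sketch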

  12. Network-Connected Distributed-Shared-Memory
  [Figure: processors P with physically distributed memories M, joined by an interconnection network into one shared address space]
  • Each processor can directly access any memory location
  • Physically distributed memory
  • Non-uniform memory access costs
  • Example: SGI Origin 2000

  13. Network-Connected Distributed Shared-Memory (SMP Clusters)
  [Figure: two bus-based SMP nodes, each with processors P and memory M, connected by an interconnection network]
  • Network of SMPs
  • Each SMP can access only its own memory
  • Explicit communication between SMPs
  • Can take advantage of both shared-memory and distributed-memory programming models
  • Can scale to hundreds of processors
  • Examples: SMP clusters

  14. Parallel Programming Models
  • Global-address (or shared-address) space model
    • POSIX threads (Pthreads)
    • OpenMP
  • Message passing (or distributed-address) model
    • MPI (Message Passing Interface)
    • PVM (Parallel Virtual Machine)
  • Higher-level programming environments
    • High Performance Fortran (HPF)
    • PETSc (Portable, Extensible Toolkit for Scientific Computation)
    • POOMA (Parallel Object-Oriented Methods and Applications)

  15. Other Parallel Programming Models
  • Task and channel
    • Similar to message passing
    • Instead of communicating between named tasks (as in the message passing model), tasks communicate through named channels
  • SPMD (single program, multiple data)
    • Each processor executes the same program code, operating on different data (see the sketch below)
    • Most message passing programs are SPMD
  • Data parallel
    • Operations on chunks of data (e.g. arrays) are parallelized
  • Grid
    • The problem domain is viewed as parcels, with the processing for each parcel (or group of parcels) allocated to different processors
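A hedged SPMD sketch (not from the slides): every process runs the same program but uses its rank to pick which part of the data it works on, and a reduction combines the per-process results. The problem (summing 1..n) and the names are illustrative, and n is assumed divisible by the number of processes.

    program spmd_sketch
      use mpi
      implicit none
      integer, parameter :: n = 16
      integer :: ierr, rank, p, nloc, ibeg, i
      real :: local_sum, global_sum

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)

      nloc = n / p                        ! each rank's share of the data
      ibeg = rank*nloc + 1                ! same code, different index range
      local_sum = 0.0
      do i = ibeg, ibeg + nloc - 1
         local_sum = local_sum + real(i)
      end do

      call MPI_Reduce(local_sum, global_sum, 1, MPI_REAL, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, 'sum of 1..', n, ' is ', global_sum
      call MPI_Finalize(ierr)
    end program spmd_sketch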

  16. Example
      real a(n,n), b(n,n)
      do k = 1, NumIter
        do i = 2, n-1
          do j = 2, n-1
            a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
          end do
        end do
        do i = 2, n-1
          do j = 2, n-1
            b(i,j) = a(i,j)
          end do
        end do
      end do

  17. Shared-Address Space Model: OpenMP
      real a(n,n), b(n,n)
c$omp parallel shared(a,b) private(i,j,k)
      do k = 1, NumIter
c$omp do
        do i = 2, n-1
          do j = 2, n-1
            a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
          end do
        end do
c$omp do
        do i = 2, n-1
          do j = 2, n-1
            b(i,j) = a(i,j)
          end do
        end do
      end do
c$omp end parallel
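As a usage note (assuming gfortran; any OpenMP-capable Fortran compiler behaves similarly): the directives above are ignored unless OpenMP is enabled at compile time, e.g. gfortran -fopenmp jacobi.f, and the thread count is usually set through the environment, e.g. OMP_NUM_THREADS=4 ./a.out. The file name jacobi.f is hypothetical.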

  18. Message Passing Pseudo-code
      real aLoc(NdivP,n), bLoc(0:NdivP+1,n)
      me = get_my_procnum()
      do k = 1, NumIter
        if (me .ne. P-1) send(me+1, bLoc(NdivP, 1:n))
        if (me .ne. 0)   recv(me-1, bLoc(0, 1:n))
        if (me .ne. 0)   send(me-1, bLoc(1, 1:n))
        if (me .ne. P-1) recv(me+1, bLoc(NdivP+1, 1:n))
        if (me .eq. 0) then
          ibeg = 2
        else
          ibeg = 1
        endif
        if (me .eq. P-1) then
          iend = NdivP-1
        else
          iend = NdivP
        endif
        do i = ibeg, iend
          do j = 2, n-1
            aLoc(i,j) = (bLoc(i-1,j) + bLoc(i,j-1) + bLoc(i+1,j) + bLoc(i,j+1))/4
          end do
        end do
        do i = ibeg, iend
          do j = 2, n-1
            bLoc(i,j) = aLoc(i,j)
          end do
        end do
      end do
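A hedged sketch (not the course's code) of how the send/recv pseudo-calls above could be written with real MPI routines: MPI_Sendrecv pairs each send with the matching receive so the exchange cannot deadlock, and MPI_PROC_NULL lets the first and last ranks skip their missing neighbor. The subroutine name halo_exchange is an assumption; note that passing row sections of a column-major array implies copy-in/copy-out, which is acceptable for blocking calls.

    subroutine halo_exchange(bLoc, NdivP, n, me, p)
      use mpi
      implicit none
      integer, intent(in) :: NdivP, n, me, p
      real, intent(inout) :: bLoc(0:NdivP+1, n)
      integer :: up, down, ierr

      up   = me - 1
      down = me + 1
      if (me == 0)     up   = MPI_PROC_NULL   ! no neighbor above the first rank
      if (me == p - 1) down = MPI_PROC_NULL   ! no neighbor below the last rank

      ! send the last owned row down, receive ghost row 0 from above
      call MPI_Sendrecv(bLoc(NdivP,1:n), n, MPI_REAL, down, 0, &
                        bLoc(0,1:n),     n, MPI_REAL, up,   0, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
      ! send the first owned row up, receive ghost row NdivP+1 from below
      call MPI_Sendrecv(bLoc(1,1:n),       n, MPI_REAL, up,   1, &
                        bLoc(NdivP+1,1:n), n, MPI_REAL, down, 1, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
    end subroutine halo_exchange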
