Understand MPI principles and explore Co-Array Fortran, Unified Parallel C, and Titanium for efficient parallel computing on distributed memory clusters.
Chapter 7: MPI and Other Local View Languages. Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder.
Figure 7.2 Replacement code (for lines 16–48 of Figure 7.1) to distribute data using a scatter operation.
Figure 7.3 Each message must be copied as it moves across four address spaces, each contributing to the overall latency.
Code Spec 7.11 MPI_Bcast(). MPI routine to broadcast data from one root process to all other processes in the communicator.
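As a minimal sketch of how MPI_Bcast() is typically called (requires an MPI installation to compile with mpicc and run with mpirun; the variable name value and the payload 42 are illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42;             /* only the root has the data initially */

    /* broadcast one int from rank 0 to every process in the communicator */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d now has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}
```

Note that every process in the communicator calls MPI_Bcast() with the same root argument; the call is a receive on non-root ranks, not a separate receive routine.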
Figure 7.4 Example of collective communication within a group.
Figure 7.5 A 2D relaxation replaces—on each iteration—all interior values by the average of their four nearest neighbors.
Figure 7.6 MPI code for the main loop of the 2D SOR computation.
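The main loop of the 2D SOR computation depends on exchanging edge (halo) rows between neighboring processes each iteration. A hedged sketch of that exchange with blocking calls, assuming a 1D row decomposition; the names val, up, down, ROWS, and WIDTH are illustrative, not the book's:

```c
/* exchange halo rows with the neighbors above and below; up/down are
   neighbor ranks, or MPI_PROC_NULL at the edges of the process grid */
MPI_Status status;

/* send my top row up, receive my bottom halo from below */
MPI_Sendrecv(&val[1][0],      WIDTH, MPI_DOUBLE, up,   0,
             &val[ROWS+1][0], WIDTH, MPI_DOUBLE, down, 0,
             MPI_COMM_WORLD, &status);

/* send my bottom row down, receive my top halo from above */
MPI_Sendrecv(&val[ROWS][0],   WIDTH, MPI_DOUBLE, down, 1,
             &val[0][0],      WIDTH, MPI_DOUBLE, up,   1,
             MPI_COMM_WORLD, &status);
```

MPI_Sendrecv() pairs the send and receive in one call, which avoids the deadlock that can occur when every process first posts a blocking MPI_Send() to a neighbor that is itself blocked sending.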
Figure 7.8 A 2D SOR MPI program using non-blocking sends and receives.
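The non-blocking variant of the same halo exchange can be sketched as follows (same illustrative names as above; this is the general pattern Figure 7.8 uses, not the book's exact code): the sends and receives are posted first, interior points that do not touch the halo are updated while messages are in flight, and MPI_Waitall() completes all four operations before the halo rows are read.

```c
MPI_Request req[4];
MPI_Status  stat[4];

MPI_Irecv(&val[0][0],      WIDTH, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[0]);
MPI_Irecv(&val[ROWS+1][0], WIDTH, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[1]);
MPI_Isend(&val[1][0],      WIDTH, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[2]);
MPI_Isend(&val[ROWS][0],   WIDTH, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[3]);

/* ... update interior points that do not depend on the halo rows ... */

MPI_Waitall(4, req, stat);
/* now it is safe to update the border points that read the halo rows */
```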
Partitioned Global Address Space (PGAS) Languages • Offer a higher level of abstraction than message passing • Built on top of distributed memory clusters • The cluster's memory is treated as a single global address space • Allow definition of global data structures • Programmers must still distinguish local from global data • No need to manage message passing details or explicitly distributed data structures • Implemented on a more efficient one-sided communication substrate
Main PGAS Languages • Co-Array Fortran • https://bluewaters.ncsa.illinois.edu/caf • Unified Parallel C • http://upc.lbl.gov/ • Titanium • http://titanium.cs.berkeley.edu/
Co-Array Fortran (CAF) • Extends Fortran • Originally called F- - • Elegant and simple • Uses co-arrays (communication arrays) • real, dimension(n,n)[p,*] :: a, b, c • a, b, c are co-arrays • Memory for a co-array is distributed across the processes as determined by the co-dimension in square brackets
Unified Parallel C (UPC) • Global view of the address space • Shared arrays are distributed in a cyclic or block-cyclic arrangement (aids load balancing) • Extends C pointers to 4 types, classified by where the pointer resides and where it points • private pointing to private • private pointing to shared • shared pointing to private • shared pointing to shared
UPC pointer declarations • Private pointer pointing locally: int *p1; • Private pointer pointing into shared space: shared int *p2; • Shared pointer pointing locally: int *shared p3; • Shared pointer pointing into shared space: shared int *shared p4;
UPC • Has a forall construct • upc_forall • Distributes the iterations of a normal C for loop across all threads, guided by an affinity expression • A global (collective) operation, whereas most other UPC operations are local
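A minimal upc_forall sketch (UPC code, so it needs a UPC compiler such as Berkeley UPC; the array names and size are illustrative). The fourth expression in the header is the affinity test: iteration i runs on the thread that owns a[i], so the writes to a[i] are all local.

```c
#include <upc_relaxed.h>

#define N 100
shared double a[N], b[N], c[N];

void add(void) {
    int i;
    /* each thread executes only the iterations whose affinity
       expression &a[i] names data it owns */
    upc_forall (i = 0; i < N; i++; &a[i])
        a[i] = b[i] + c[i];
}
```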
Titanium • Extends Java • Object oriented • Adds regions, which support safe memory management • Unordered iteration • foreach • Allows concurrency over the multiple indices in a block