
Parallel Systems




  1. Parallel Systems Dr. Guy Tel-Zur

  2. Agenda • Barnes-Hut (final remarks) • Continue slides 5 from the previous lecture • MPI Virtual Topologies • ScaLAPACK • Mixing programming languages • Impressions from the SC12 conference • Home Assignment #2 • Please give it serious thought and send me your proposals for the final presentation!

  3. A simple Jacobi iteration • Analyze program from: http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/src/jacobi/C/main.html

  4. In this example, you will put together some of the previous examples to implement a simple Jacobi iteration for approximating the solution to a linear system of equations. In this example, we solve the Laplace equation in two dimensions with finite differences. This may sound involved, but it really amounts only to a simple computation, combined with the previous example of a parallel mesh data structure. Any numerical analysis text will show that iterating

while (not converged) {
    for (i,j)
        xnew[i][j] = (x[i+1][j] + x[i-1][j] + x[i][j+1] + x[i][j-1])/4;
    for (i,j)
        x[i][j] = xnew[i][j];
}

  5. will compute an approximation for the solution of Laplace's equation. There is one last detail: this replacement of xnew with the average of the values around it is applied only in the interior; the boundary values are left fixed. In practice, this means that if the mesh is n by n, then the values

x[0][j]   x[n-1][j]   x[i][0]   x[i][n-1]

are left unchanged. Of course, these refer to the complete mesh; you'll have to figure out what to do for the decomposed data structures (xlocal). Because the values are replaced by averaging around them, these techniques are called relaxation methods. We wish to compute this approximation in parallel. Write a program to apply this approximation. For convergence testing, compute

diffnorm = 0;
for (i,j)
    diffnorm += (xnew[i][j] - x[i][j]) * (xnew[i][j] - x[i][j]);
diffnorm = sqrt(diffnorm);

  6. You'll need to use MPI_Allreduce for this. (Why not use MPI_Reduce?) Have process zero write out the value of diffnorm and the iteration count at each iteration. When diffnorm is less than 1.0e-2, consider the iteration converged. Also, if you reach 100 iterations, exit the loop.
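To make the answer to that question concrete, here is a minimal sketch of the convergence test (my own fragment, not part of the tutorial code; the helper name not_converged and its arguments are hypothetical). MPI_Allreduce leaves the global norm on every rank, so every rank can evaluate the same exit condition; MPI_Reduce would leave it only on the root, and the other ranks could not decide when to stop without an extra broadcast.

#include <math.h>
#include <stdio.h>
#include "mpi.h"

/* Returns 1 while the iteration should continue. Every rank calls it and
   every rank gets the same answer, because MPI_Allreduce delivers the
   global sum of squared differences to all ranks. */
static int not_converged( double local_sum_sq, int itcnt, int rank )
{
    double gdiffnorm;
    MPI_Allreduce( &local_sum_sq, &gdiffnorm, 1, MPI_DOUBLE,
                   MPI_SUM, MPI_COMM_WORLD );
    gdiffnorm = sqrt( gdiffnorm );
    if (rank == 0)
        printf( "At iteration %d, diff is %e\n", itcnt, gdiffnorm );
    return gdiffnorm > 1.0e-2 && itcnt < 100;
}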

  7. For simplicity, consider a 12 x 12 mesh on 4 processors. The example solution uses the boundary values from the previous exercise; they are -1 on the top and bottom, and the rank of the process on the side. The initial data (the values of x that are being relaxed) are also the same; the interior points have the same value as the rank of the process. This is shown below:

-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 3  3  3  3  3  3  3  3  3  3  3  3
 3  3  3  3  3  3  3  3  3  3  3  3
 2  2  2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2  2  2
 1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1
 0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

  8.
#include <stdio.h>
#include <math.h>
#include "mpi.h"

/* This example handles a 12 x 12 mesh, on 4 processors only. */
#define maxn 12

int main( argc, argv )
int argc;
char **argv;
{
    int        rank, value, size, errcnt, toterr, i, j, itcnt;
    int        i_first, i_last;
    MPI_Status status;
    double     diffnorm, gdiffnorm;
    double     xlocal[(12/4)+2][12];
    double     xnew[(12/3)+2][12];

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    if (size != 4) MPI_Abort( MPI_COMM_WORLD, 1 );

  9.
    /* xlocal[0][] holds the lower ghost row, xlocal[maxn/size+1][] the upper */
    /* Note that top and bottom processes have one less row of interior points */
    i_first = 1;
    i_last  = maxn/size;
    if (rank == 0)        i_first++;
    if (rank == size - 1) i_last--;

    /* Fill the data as specified */
    for (i=1; i<=maxn/size; i++)
        for (j=0; j<maxn; j++)
            xlocal[i][j] = rank;
    for (j=0; j<maxn; j++) {
        xlocal[i_first-1][j] = -1;
        xlocal[i_last+1][j]  = -1;
    }

  10.
    itcnt = 0;
    do {
        /* Send up unless I'm at the top, then receive from below */
        /* Note the use of xlocal[i] for &xlocal[i][0] */
        if (rank < size - 1)
            MPI_Send( xlocal[maxn/size], maxn, MPI_DOUBLE, rank + 1, 0,
                      MPI_COMM_WORLD );
        if (rank > 0)
            MPI_Recv( xlocal[0], maxn, MPI_DOUBLE, rank - 1, 0,
                      MPI_COMM_WORLD, &status );
        /* Send down unless I'm at the bottom */
        if (rank > 0)
            MPI_Send( xlocal[1], maxn, MPI_DOUBLE, rank - 1, 1,
                      MPI_COMM_WORLD );
        if (rank < size - 1)
            MPI_Recv( xlocal[maxn/size+1], maxn, MPI_DOUBLE, rank + 1, 1,
                      MPI_COMM_WORLD, &status );

  11.
        /* Compute new values (but not on boundary) */
        itcnt++;
        diffnorm = 0.0;
        for (i=i_first; i<=i_last; i++)
            for (j=1; j<maxn-1; j++) {
                xnew[i][j] = (xlocal[i][j+1] + xlocal[i][j-1] +
                              xlocal[i+1][j] + xlocal[i-1][j]) / 4.0;
                diffnorm += (xnew[i][j] - xlocal[i][j]) *
                            (xnew[i][j] - xlocal[i][j]);
            }
        /* Only transfer the interior points */
        for (i=i_first; i<=i_last; i++)
            for (j=1; j<maxn-1; j++)
                xlocal[i][j] = xnew[i][j];

        MPI_Allreduce( &diffnorm, &gdiffnorm, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD );
        gdiffnorm = sqrt( gdiffnorm );
        if (rank == 0)
            printf( "At iteration %d, diff is %e\n", itcnt, gdiffnorm );
    } while (gdiffnorm > 1.0e-2 && itcnt < 100);

    MPI_Finalize( );
    return 0;
}

  12. The Makefile

# Generated automatically from Makefile.in by configure.
ALL: jacobi
SHELL = /bin/sh
DIRS =

jacobi: jacobi.c
	mpicc -o jacobi jacobi.c -lm

profile.alog: jacobi.c
	mpicc -o jacobi.log -mpilog jacobi.c -lm
	mpirun -np 4 jacobi.log
	/bin/mv jacobi.log_profile.log profile.alog

clean:
	/bin/rm -f jacobi jacobi.o jacobi.log
#	for dir in $(DIRS) ; do \
#	    ( cd $$dir ; make clean ) ; done

  13. MPI Virtual Topologies • http://www.pdc.kth.se/education/historical/previous-years-summer-schools/2009/handouts/lect5.pdf

  14. Cartesian Constructor
int MPI_Cart_create( MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart )

  15. Cartesian Convenience - MPI_Dims_create
int MPI_Dims_create( int nnodes, int ndims, int *dims )
MPI_Dims_create creates a division of processors in a Cartesian grid.
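As a minimal sketch of how these two calls are usually combined (my own example, not from the lecture): MPI_Dims_create factors the process count into a balanced grid, and MPI_Cart_create then builds the Cartesian communicator from those dimensions.

#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int size, rank, coords[2];
    int dims[2]    = {0, 0};   /* 0 = let MPI_Dims_create choose this dimension */
    int periods[2] = {0, 0};   /* non-periodic in both dimensions */
    MPI_Comm grid;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    /* Factor size into a balanced 2-D decomposition, e.g. 12 -> 4 x 3 */
    MPI_Dims_create( size, 2, dims );

    /* Build the Cartesian communicator and query this process's coordinates */
    MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, 1, &grid );
    MPI_Comm_rank( grid, &rank );
    MPI_Cart_coords( grid, rank, 2, coords );

    printf( "rank %d of %d sits at (%d,%d) in a %d x %d grid\n",
            rank, size, coords[0], coords[1], dims[0], dims[1] );

    MPI_Comm_free( &grid );
    MPI_Finalize( );
    return 0;
}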

  16. See more: http://www.rc.usf.edu/tutorials/classes/tutorial/mpi/chapter10.html

  17. An Example

#include "mpi.h"
#include <stdio.h>
#define SIZE 16
#define UP    0
#define DOWN  1
#define LEFT  2
#define RIGHT 3

int main(argc,argv)
int argc;
char *argv[];
{
    int numtasks, rank, source, dest, outbuf, i, tag=1,
        inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,},
        nbrs[4], dims[2]={4,4},
        periods[2]={0,0}, reorder=0, coords[2];

    MPI_Request reqs[8];
    MPI_Status stats[8];
    MPI_Comm cartcomm;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

  18.
    if (numtasks == SIZE) {
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cartcomm);
        MPI_Comm_rank(cartcomm, &rank);
        MPI_Cart_coords(cartcomm, rank, 2, coords);
        MPI_Cart_shift(cartcomm, 0, 1, &nbrs[UP], &nbrs[DOWN]);
        MPI_Cart_shift(cartcomm, 1, 1, &nbrs[LEFT], &nbrs[RIGHT]);

        printf("rank= %d coords= %d %d neighbors(u,d,l,r)= %d %d %d %d\n",
               rank, coords[0], coords[1], nbrs[UP], nbrs[DOWN], nbrs[LEFT],
               nbrs[RIGHT]);

        outbuf = rank;
        for (i=0; i<4; i++) {
            dest   = nbrs[i];
            source = nbrs[i];
            MPI_Isend(&outbuf, 1, MPI_INT, dest, tag,
                      MPI_COMM_WORLD, &reqs[i]);
            MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag,
                      MPI_COMM_WORLD, &reqs[i+4]);
        }
        MPI_Waitall(8, reqs, stats);

        printf("rank= %d inbuf(u,d,l,r)= %d %d %d %d\n",
               rank, inbuf[UP], inbuf[DOWN], inbuf[LEFT], inbuf[RIGHT]);
    }
    else
        printf("Must specify %d processors. Terminating.\n", SIZE);

    MPI_Finalize();
}
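A usage note (my reading of the example, not on the slide): the program must be launched with exactly 16 processes, e.g. mpirun -np 16 ./cartesian (the executable name is hypothetical). Because periods={0,0}, MPI_Cart_shift returns MPI_PROC_NULL for a neighbour that would fall off the grid; sends to and receives from MPI_PROC_NULL complete immediately without transferring data, which is why inbuf is pre-initialized to MPI_PROC_NULL and the exchange loop needs no special edge-case handling.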

  19. Jacobi Iteration using MPI virtual topologies • Reference: Intermediate MPI from “Using MPI” book • http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples/intermediate/main.htm • Similar: an MPI parallelization of the solution of the 2D Poisson's equation • http://www.ee.bgu.ac.il/~tel-zur/2003B/lecture10.pdf
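Before turning to those references, here is a minimal sketch (my own, not taken from the book) of how the ghost-row exchange from the earlier Jacobi code could be rewritten on top of a 1-D virtual topology. comm1d is assumed to come from MPI_Cart_create over MPI_COMM_WORLD with one non-periodic dimension; the function and variable names are hypothetical.

#include "mpi.h"

#define MAXN 12

/* Exchange ghost rows with the neighbours found via MPI_Cart_shift.
   xlocal has nrows interior rows plus two ghost rows: xlocal[0] (lower)
   and xlocal[nrows+1] (upper). At the ends of the grid the neighbour is
   MPI_PROC_NULL, so the corresponding transfer is a no-op. */
void exchange_ghost_rows( double xlocal[][MAXN], int nrows, MPI_Comm comm1d )
{
    int up, down;

    MPI_Cart_shift( comm1d, 0, 1, &down, &up );

    /* send my top interior row up, receive my lower ghost row from below */
    MPI_Sendrecv( xlocal[nrows],   MAXN, MPI_DOUBLE, up,   0,
                  xlocal[0],       MAXN, MPI_DOUBLE, down, 0,
                  comm1d, MPI_STATUS_IGNORE );
    /* send my bottom interior row down, receive my upper ghost row from above */
    MPI_Sendrecv( xlocal[1],       MAXN, MPI_DOUBLE, down, 1,
                  xlocal[nrows+1], MAXN, MPI_DOUBLE, up,   1,
                  comm1d, MPI_STATUS_IGNORE );
}

Using MPI_Sendrecv here also removes the careful send/receive ordering the rank-arithmetic version needs to avoid deadlock.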

  20. ScaLAPACK

  21. http://www.netlib.org/scalapack/
Hands-On Exercises for ScaLAPACK: http://acts.nersc.gov/scalapack/hands-on/
PBLAS Quick Reference Card: http://www.netlib.org/scalapack/pblas_qref.html

  22. C interface to BLACS http://www.netlib.org/blacs/cblacsqref.ps See next slide
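As a hedged sketch of how the C interface to BLACS is typically used (my own example; Cblacs_pinfo, Cblacs_get, Cblacs_gridinit, Cblacs_gridinfo, Cblacs_gridexit and Cblacs_exit are the usual C BLACS entry points, but check the quick-reference card for the authoritative signatures). BLACS ships no standard C header, so the prototypes are normally declared by hand:

extern void Cblacs_pinfo( int *mypnum, int *nprocs );
extern void Cblacs_get( int context, int what, int *val );
extern void Cblacs_gridinit( int *context, char *order, int nprow, int npcol );
extern void Cblacs_gridinfo( int context, int *nprow, int *npcol,
                             int *myrow, int *mycol );
extern void Cblacs_gridexit( int context );
extern void Cblacs_exit( int cont );

int main( void )
{
    int iam, nprocs, ctxt, nprow = 2, npcol = 2, myrow, mycol;

    Cblacs_pinfo( &iam, &nprocs );                 /* who am I, how many processes   */
    Cblacs_get( -1, 0, &ctxt );                    /* obtain the default system context */
    Cblacs_gridinit( &ctxt, "Row", nprow, npcol ); /* 2 x 2 process grid, row-major  */
    Cblacs_gridinfo( ctxt, &nprow, &npcol, &myrow, &mycol );

    /* ... set up array descriptors and call PBLAS/ScaLAPACK routines here ... */

    Cblacs_gridexit( ctxt );
    Cblacs_exit( 0 );
    return 0;
}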

  23. BLACS

  24. http://netlib.org/scalapack/slug/node184.html

  25. Mixing C and FORTRAN • Demo on my other laptop: ~/tests/cprog.c and ffunction.f • http://www.cae.tntech.edu/help/programming/mixed_languages • Important! Column-major order vs. row-major order: C => row-major, Fortran => column-major • Ref: http://en.wikipedia.org/wiki/Row-major_order
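The demo files are not reproduced on the slide; the following is a guess at what the C side of a minimal cprog.c / ffunction.f pair typically looks like. Everything here is hypothetical (the file contents, the trailing-underscore name mangling, the -lgfortran link step); it only illustrates the two points above: Fortran arguments are passed by reference, and the two languages disagree on array layout.

/* cprog.c -- hypothetical sketch of the C side of the demo.
   It assumes a Fortran subroutine along the lines of
       subroutine ffunction(a, n)
       integer n
       double precision a(n, n)
   compiled by a compiler that appends a trailing underscore to external
   names (gfortran's default), hence the prototype ffunction_ below.
   One common build line: gfortran -c ffunction.f && cc cprog.c ffunction.o -lgfortran */
#include <stdio.h>

/* Fortran routines take every argument by reference */
extern void ffunction_( double *a, int *n );

int main( void )
{
    int n = 3;
    double a[3][3];

    /* Fill a so that a[i][j] = 10*i + j. Fortran will see the transpose,
       because C stores rows contiguously and Fortran stores columns
       contiguously: C's a[i][j] is Fortran's a(j+1, i+1). */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[i][j] = 10.0 * i + j;

    ffunction_( &a[0][0], &n );

    printf( "back in C: a[0][1] = %g\n", a[0][1] );
    return 0;
}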

  26. Example: PDGEMV() The sample uses PDGEMV(), which computes a distributed matrix-vector product y = alpha*A*x + beta*y.
A = [1 4 7 10 13 ; 3 6 9 12 15 ; 5 8 11 14 17 ; 7 10 13 16 19 ; 9 12 15 18 21] and x = [1 ; 1 ; 0 ; 0 ; 1]T, where x is a column vector and T denotes transpose. Call the PDGEMV() routine to compute y = A*x. The correct result is y = [18 ; 24 ; 30 ; 36 ; 42]T.
Code sample in C: pdgemv.c
The function call in C looks like:
    double alpha = 1.0;
    double beta  = 0.0;
    pdgemv_("N", &M, &M, &alpha, A, &ONE, &ONE, descA, x, &ONE, &ONE, descx,
            &ONE, &beta, y, &ONE, &ONE, descy, &ONE);
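As a quick sanity check of the first entry (plain serial arithmetic, nothing ScaLAPACK-specific): the first row of A is (1, 4, 7, 10, 13), so y[1] = 1*1 + 4*1 + 7*0 + 10*0 + 13*1 = 18, matching the stated result; the remaining rows work out the same way to 24, 30, 36 and 42.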

  27. PDGEMV() Check pdgemv_.c here:
http://www.netlib.org/scalapack/explore-html/da/dca/pdgemv___8c_source.html
http://www.cs.utexas.edu/~plapack/Code/ScaLAPACK/PBLAS/pdgemv_.c

  28. Purpose
=======
PDGEMV performs one of the matrix-vector operations

  sub( Y ) := alpha*sub( A )*sub( X ) + beta*sub( Y ), or
  sub( Y ) := alpha*sub( A )'*sub( X ) + beta*sub( Y ),

where sub( A ) denotes A(IA:IA+M-1,JA:JA+N-1).

When TRANS = 'N',
  sub( X ) denotes X(IX:IX,JX:JX+N-1), if INCX = M_X,
                   X(IX:IX+N-1,JX:JX), if INCX = 1 and INCX <> M_X,
and
  sub( Y ) denotes Y(IY:IY,JY:JY+M-1), if INCY = M_Y,
                   Y(IY:IY+M-1,JY:JY), if INCY = 1 and INCY <> M_Y;

otherwise,
  sub( X ) denotes X(IX:IX,JX:JX+M-1), if INCX = M_X,
                   X(IX:IX+M-1,JX:JX), if INCX = 1 and INCX <> M_X,
and
  sub( Y ) denotes Y(IY:IY,JY:JY+N-1), if INCY = M_Y,
                   Y(IY:IY+N-1,JY:JY), if INCY = 1 and INCY <> M_Y.

Alpha and beta are scalars, and sub( X ) and sub( Y ) are subvectors and sub( A ) is an m by n submatrix.
