MPI Virtual Topologies
Kadin Tseng, Scientific Computing and Visualization Group, Boston University

Presentation Transcript

  1. MPI Virtual Topologies Kadin Tseng Scientific Computing and Visualization Group Boston University

  2. Virtual Topologies Frequently, realistic science and engineering problems are inherently two- or higher-dimensional. For parallel computing, these higher physical dimensions often lead to domain decompositions of a comparable number of dimensions in the computational domain. Ordinarily, the process ranks of the decomposition are expressed in linear order (i.e., 0, 1, 2, …). To let a code refer to these ranks in multi-dimensional coordinates that mirror the domain decomposition, the MPI library provides a set of virtual topology routines. There are two kinds of virtual topologies: Cartesian and general (graph). Only the Cartesian topology routines will be discussed in the following ...

  3. Example: A 9 x 4 Array • Consider the case of calculations associated with a 9x4 matrix. • The parenthesized number-pairs, (i, j), denote array row and column indices, respectively. • Assume six processes are available for parallel computation. • One domain decomposition leads to six 3x2 submatrices, as shown. • The number following each (i, j) pair denotes the process rank to which array element (i, j) belongs:

  (0,0) 0  (0,1) 0  (0,2) 1  (0,3) 1
  (1,0) 0  (1,1) 0  (1,2) 1  (1,3) 1
  (2,0) 0  (2,1) 0  (2,2) 1  (2,3) 1
  (3,0) 2  (3,1) 2  (3,2) 3  (3,3) 3
  (4,0) 2  (4,1) 2  (4,2) 3  (4,3) 3
  (5,0) 2  (5,1) 2  (5,2) 3  (5,3) 3
  (6,0) 4  (6,1) 4  (6,2) 5  (6,3) 5
  (7,0) 4  (7,1) 4  (7,2) 5  (7,3) 5
  (8,0) 4  (8,1) 4  (8,2) 5  (8,3) 5
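A quick sketch (not part of the original slides): with 3x2 blocks numbered row-major as above, the owning rank of element (i, j) can be computed directly; owner below is a hypothetical helper for illustration only.

  #include <stdio.h>

  /* Hypothetical helper: rank owning element (i,j) of the 9x4 array,
     decomposed into 3x2 blocks with ranks numbered row-major. */
  int owner(int i, int j) {
      return (i/3)*2 + (j/2);   /* block-row * blocks-per-row + block-column */
  }

  int main(void) {
      printf("%d %d\n", owner(5,2), owner(8,1));  /* prints 3 4, matching the table */
      return 0;
  }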

  4. Domain Decomposition • Domain decomposition yields a linear-order representation of the process topology. • The number in each cell denotes the process, or thread, number. • Each cell represents a 3x2 subarray of the 9x4 array. • Work within each cell is performed by a single process.

  0  1
  2  3
  4  5

  5. 2D Cartesian Topology • Can map the linear rank order into a 2D Cartesian topology (i,j) via an MPI function call. • Example: linear rank 3 can be addressed by the 2D Cartesian coordinates (1,1). • Each cell represents a 3x2 matrix block whose work will be performed by the indicated process rank. • MPI indices start from 0. • MPI ranks follow the row-major convention.

  (0,0) 0  (0,1) 1
  (1,0) 2  (1,1) 3
  (2,0) 4  (2,1) 5
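A minimal sketch (not from the original slides) of how this mapping can be created with MPI_Cart_create and queried with MPI_Cart_coords; run with six processes, each rank prints its own Cartesian coordinates.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[]) {
      int rank, coords[2];
      int dims[2]    = {3, 2};   /* 3 block-rows x 2 block-columns */
      int periods[2] = {0, 0};   /* non-periodic in both dimensions */
      MPI_Comm grid_comm;

      MPI_Init(&argc, &argv);
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid_comm);
      MPI_Comm_rank(grid_comm, &rank);
      MPI_Cart_coords(grid_comm, rank, 2, coords);
      printf("rank %d -> (%d,%d)\n", rank, coords[0], coords[1]);  /* rank 3 -> (1,1) */
      MPI_Finalize();
      return 0;
  }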

  6. Cartesian Topology Solver Example • Laplace Equation • Laplace Equation Discretized • Computational Domain • Five-point finite-difference stencil • Jacobi Scheme • Parallel Jacobi Scheme • Unknowns At Border Cells • Message Passing For Boundary Cells • For Individual Threads . . . • MPI Functions Needed For Job • Successive Over Relaxation (SOR) Scheme • Parallel SOR Scheme • Scalability

  7. Laplace Equation Laplace Equation:

  \[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad 0 \le x \le 1, \quad 0 \le y \le 1 \qquad (1) \]

  Boundary Conditions:

  \[ u(x,0) = \sin(\pi x), \quad u(x,1) = \sin(\pi x)\, e^{-\pi}, \quad u(0,y) = u(1,y) = 0 \qquad (2) \]

  Analytical solution:

  \[ u(x,y) = \sin(\pi x)\, e^{-\pi y} \qquad (3) \]

  8. Laplace Equation Discretized Discretizing Equation (1) by centered differences yields:

  \[ u^{n+1}_{i,j} = \frac{\Delta y^2 \left( u^n_{i+1,j} + u^n_{i-1,j} \right) + \Delta x^2 \left( u^n_{i,j+1} + u^n_{i,j-1} \right)}{2\, (\Delta x^2 + \Delta y^2)} \qquad (4) \]

  where n and n+1 denote the current and next time step, respectively, while

  \[ \Delta x = x_{i+1} - x_i, \qquad \Delta y = y_{j+1} - y_j \qquad (5) \]

  For simplicity, we take \( \Delta x = \Delta y \), so that (4) reduces to the four-point average \( u^{n+1}_{i,j} = \left( u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} \right)/4 \) used in the codes.

  9. Computational Domain (figure: the unit-square computational domain, with x and index i along the horizontal axis and y and index j along the vertical axis)

  10. Five-point Finite-Difference Stencil (figure: stencil of a center cell o at (i, j) and its four neighbors x) Interior (or solution) cells: where the solution of the Laplace equation is sought. Exterior (or boundary) cells: blue cells denote cells where non-homogeneous boundary conditions are imposed, while homogeneous boundary conditions are shown as green cells.

  11. Jacobi Scheme • Make an initial guess for u at all interior points (i,j) at time n. • Use the 5-pt stencil to compute u at time n+1 at all interior points (i,j). • Stop if the prescribed convergence threshold is reached; otherwise continue to the next step. • Update: set u at time n to u at time n+1 for all interior i and j. • Go to step 2. This is a simple iterative scheme that serves well as an intuitive instructional procedure, but its slow convergence renders it impractical for real applications.

  12. Solution Contour Plot (figure: contour plot of the converged solution u)

  13. Jacobi Iterative Solver – F77

      PROGRAM Jacobi
!     Solve Laplace equation using Jacobi iteration method
      IMPLICIT NONE
      INTEGER m, mp, mp2, iter, i, j, K, P
      REAL*8 TOL, gdel, PI
      REAL start_time, end_time
      CHARACTER*40 header, outfile
      INTEGER MAX_M, MAXSTEPS, INCREMENT
      PARAMETER (P=1, K=0, TOL=1.d-6, PI=3.14159265)
      PARAMETER (MAX_M=512, MAXSTEPS=50000, INCREMENT=100)
      REAL*8 u((MAX_M+2)*(MAX_M+2))
      REAL*8 unew((MAX_M+2)*(MAX_M+2))
      m = MAX_M
      mp = m/P
      gdel = 1.0d0
      iter = 0

  14. Jacobi . . . F77 (continued)

      CALL CPUtime(start_time)     ! start timer, measured in seconds
      CALL bc(m, mp, u, K, P)      ! initialize and define B.C. for u
      DO WHILE (gdel .gt. TOL)     ! iterate until error below threshold
        iter = iter + 1            ! increment iteration counter
        IF (iter .gt. MAXSTEPS) THEN
          header = 'Number of iterations exceeds limit'
          CALL error(header,-1)
        ENDIF
        CALL update_jacobi(m, mp, u, unew, gdel)
        IF (MOD(iter,INCREMENT) .eq. 0) THEN
          WRITE(*,"('iter,gdel:',i6,e12.4)") iter, gdel
        ENDIF
      ENDDO
      CALL CPUtime(end_time)       ! stop timer

  15. Jacobi . . . F77 (continued)

      WRITE(*,"(40('#'))")
      WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
     &      end_time - start_time, P
      WRITE(*,"(' Stopped at iteration =',i5)") iter
      WRITE(*,"(' The maximum error =',f12.6)") gdel
      WRITE(*,"(40('#'))")
      outfile = 'plots'
      OPEN(unit=40, file=outfile, form='formatted', status='unknown')
      WRITE(40,"(3i5)") m+2, mp+2, P
      CLOSE(40)
      WRITE(outfile, "('plots.',i1)") K
      OPEN(unit=41, file=outfile, form='formatted', status='unknown')
      WRITE(41,"(6e13.4)") (u(i), i=1,(m+2)*(mp+2))
      CLOSE(41)
      END

  16. bc.f

      SUBROUTINE bc(m, n, v, k, p)
      IMPLICIT NONE
      INTEGER p, m, n, k, i
      CHARACTER*40 header
      REAL*8 v(0:m+1, 0:n+1), PI
      PARAMETER (PI = ACOS(-1.0d0))
      CALL init_array(m, n, v)              ! init. v
      IF (p .eq. 1) THEN                    ! serial
        DO i=0,m+1
          v(i,0)   = sin(PI*i/(m+1))        ! at y = 0
          v(i,n+1) = v(i,0)*exp(-PI)        ! at y = 1
        ENDDO
      ELSE IF (p .gt. 1) THEN               ! multithreads
        IF (k .eq. 0) THEN                  ! 1st thread
          DO i=0,m+1
            v(i,0) = sin(PI*i/(m+1))        ! at y = 0
          ENDDO
        ENDIF
        IF (k .eq. p-1) THEN                ! last thread
          DO i=0,m+1                        ! at y = 1
            v(i,n+1) = sin(PI*i/(m+1))*exp(-PI)
          ENDDO
        ENDIF
      ELSE                                  ! invalid p
        header = 'p invalid in subroutine BC'
        CALL error(header,-1)
      ENDIF
      END

  17. update_jacobi.f

      SUBROUTINE update_jacobi(m, n, v, vnew, del)
      IMPLICIT NONE
      INTEGER i, j, m, n
      REAL*8 v(0:m+1,0:n+1), vnew(0:m+1,0:n+1), del
      del = 0.0d0
      DO j=1,n
        DO i=1,m
          vnew(i,j) = ( v(i,j+1) + v(i+1,j)
     &                + v(i-1,j) + v(i,j-1) )*0.25
          del = del + DABS(vnew(i,j)-v(i,j))   ! local error sum
        ENDDO
      ENDDO
      DO j=1,n
        DO i=1,m
          v(i,j) = vnew(i,j)                   ! update v
        ENDDO
      ENDDO
      END

  18. Jacobi Iterative Solver – C

  #include "solvers.h"
  INT main() {
  /********** MAIN PROGRAM *********************************
   * Solve Laplace equation using Jacobi iterative method   *
   * Kadin Tseng, Boston University, August, 2000           *
   *********************************************************/
    INT iter, m, mp;
    REAL gdel, **u, **un;
    CHAR line[10];
    fprintf(OUTPUT,"Enter size of interior points, m :");
    (void) fgets(line, sizeof(line), stdin);
    (void) sscanf(line, "%d", &m);
    fprintf(OUTPUT,"m = %d\n",m);
    mp = m/P;
    u  = allocate_2D(m, mp);   /* allocate mem for 2D array */
    un = allocate_2D(m, mp);

  19. Jacobi … Solver – C (cont'd)

    gdel = 1.0;               /* initialize global error to 1 */
    iter = 0;                 /* initialize iteration counter */
    bc(m, mp, u, K, P);       /* initialize and define B.C. for u */
    replicate(m, mp, u, un);  /* copy u to un */
    while (gdel > TOL) {      /* iterate until error below threshold */
      iter++;                 /* increment iteration counter */
      if (iter > MAXSTEPS) {
        fprintf(OUTPUT,"Iteration terminated (exceeds %6d", MAXSTEPS);
        fprintf(OUTPUT," )\n");
        return (1);           /* nonconvergent solution */
      }
      update_jacobi(m, mp, u, un, &gdel);  /* update u by Jacobi scheme */
      if (iter%INCREMENT == 0) {
        fprintf(OUTPUT,"iter,gdel: %6d, %lf\n",iter,gdel);
      }
    }

  20. Jacobi … Solver – C (cont'd)

    fprintf(OUTPUT,"Stopped at iteration %d\n",iter);
    fprintf(OUTPUT,"The maximum error = %f\n",gdel);
    write_file( m, mp, u, K, P );  /* write u to file for use in MATLAB plots */
    return (0);
  }

  21. bc.c

  void bc(INT m, INT n, REAL **u, INT k, INT p) {
  /*********** Boundary Conditions *******************************
   * PDE: Laplacian u = 0; 0<=x<=1; 0<=y<=1                       *
   * B.C.: u(x,0)=sin(pi*x); u(x,1)=sin(pi*x)*exp(-pi);           *
   *       u(0,y)=u(1,y)=0                                        *
   * SOLUTION: u(x,y)=sin(pi*x)*exp(-pi*y)                        *
   ***************************************************************/
    INT i;
    init_array( m, n, u);     /* initialize u to 0 */
    if (p == 1) {             /* serial */
      for (i = 0; i <= m+1; i++) {
        u[i][0]   = sin(PI*i/(m+1));    /* at y = 0; all x */
        u[i][n+1] = u[i][0]*exp(-PI);   /* at y = 1; all x */
      }
    } else if (p < 1) {
      printf("p is invalid\n");
    }

  22. bc.c (continued)

    if (p > 1) {              /* parallel processing */
      if (k == 0) {           /* the 1st process */
        for (i = 0; i <= m+1; i++) {
          u[i][0] = sin(PI*i/(m+1));             /* at y = 0; all x */
        }
      }
      if (k == p-1) {         /* the last process */
        for (i = 0; i <= m+1; i++) {
          u[i][n+1] = sin(PI*i/(m+1))*exp(-PI);  /* at y = 1; all x */
        }
      }
    }
  }

  23. update_jacobi.c

  INT update_jacobi( INT m, INT n, REAL **u, REAL **unew, REAL *del) {
    INT i, j;
    *del = 0.0;
    for (i = 1; i <= m; i++) {
      for (j = 1; j <= n; j++) {
        unew[i][j] = ( u[i  ][j+1] + u[i+1][j  ]
                     + u[i-1][j  ] + u[i  ][j-1] )*0.25;
        *del += fabs(unew[i][j] - u[i][j]);  /* accumulate local error sum */
      }
    }
    for (i = 1; i <= m; i++) {
      for (j = 1; j <= n; j++) {
        u[i][j] = unew[i][j];                /* update u */
      }
    }
    return (0);
  }

  24. Domain Decompositions (figures: left, a 1D decomposition with threads 0 through 3 as horizontal strips; right, a 2D decomposition with threads 0 through 3 arranged in a 2x2 grid)

  25. Unknowns At Border Cells – 1D The five-point finite-difference stencil applied at a thread's border cells requires cells from neighboring threads and/or boundary cells, so message passing is required. (figure: stencils centered near the borders between threads 0, 1, and 2, reaching into neighboring threads' cells)

  26. Message Passing to Fill Boundary Cells (figure: the current thread, k, exchanges rows with threads k-1 and k+1 to fill its boundary cells)
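A sketch of the exchange pictured above (assuming the non-periodic 1D topology and the transposed row storage used later in the C SOR code, where each thread owns rows 1..mp of vt plus ghost rows 0 and mp+1; fill_ghost_rows is a hypothetical helper name). At the ends of the chain MPI_Cart_shift returns MPI_PROC_NULL, so those Sendrecv legs quietly do nothing.

  #include <mpi.h>

  /* Fill thread k's ghost rows from neighbors k-1 ("below") and k+1 ("above"). */
  void fill_ghost_rows(MPI_Comm grid_comm, double **vt, int mp, int m) {
      int below, above;
      MPI_Status status;
      MPI_Cart_shift(grid_comm, 0, 1, &below, &above);
      MPI_Sendrecv(vt[mp]   + 1, m, MPI_DOUBLE, above, 0,   /* send last row up */
                   vt[0]    + 1, m, MPI_DOUBLE, below, 0,   /* ghost row from below */
                   grid_comm, &status);
      MPI_Sendrecv(vt[1]    + 1, m, MPI_DOUBLE, below, 1,   /* send first row down */
                   vt[mp+1] + 1, m, MPI_DOUBLE, above, 1,   /* ghost row from above */
                   grid_comm, &status);
  }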

  27. For Individual Threads . . . Recast the 5-pt finite-difference stencil for individual threads: on thread k of p threads, with m' = m/p local columns,

  \[ v^{n+1}_{i,j} = \frac{v^n_{i+1,j} + v^n_{i-1,j} + v^n_{i,j+1} + v^n_{i,j-1}}{4}, \qquad i = 1, \ldots, m; \; j = 1, \ldots, m' \]

  Boundary Conditions: • For simplicity, assume m divisible by p. • The inter-thread B.C. are time-dependent: they change at each iteration. • These B.C. are obtained by message passing. • Additional boundary conditions on next page.

  28. Relationship Between u and v Physical boundary conditions apply at the ends of the decomposition (y = 0 on thread 0, y = 1 on thread p-1). With the 1D decomposition in y, the global solution u relates to thread k's local solution v by

  \[ u(i,\, k\, m' + j) = v^{(k)}(i, j), \qquad i = 0, \ldots, m+1; \; j = 1, \ldots, m' \]
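To make the mapping concrete, a small sketch (global_col is a hypothetical helper, not from the original codes): with mp interior columns per thread, thread k's local column j holds global column k*mp + j of u.

  #include <stdio.h>

  /* Hypothetical helper: global column index of u corresponding to
     thread k's local column j, with mp interior columns per thread. */
  int global_col(int k, int mp, int j) { return k*mp + j; }

  int main(void) {
      /* e.g. with mp = 3: thread 1's local column 2 is global column 5 */
      printf("%d\n", global_col(1, 3, 2));
      return 0;
  }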

  29. MPI Functions Used For Job • MPI_Sendrecv ( = MPI_Send + MPI_Recv) – to set boundary conditions for individual threads • MPI_Allreduce – to find the global maximum error and determine whether convergence has been reached • MPI_Cart_create – to create a Cartesian topology • MPI_Cart_coords – to find the Cartesian coordinates equivalent to a given rank • MPI_Cart_rank – to find the rank equivalent to given Cartesian coordinates • MPI_Cart_shift – to find the current thread's adjoining neighbor threads
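A sketch (not from the slides) of the convergence test named above: each rank owns only part of the domain, so its local error must be reduced to a global maximum before it can be compared against the threshold.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[]) {
      int me;
      double del, gdel;
      const double TOL = 1.e-6;      /* threshold, as in the solvers */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &me);
      del = 1.0/(me + 1);            /* stand-in for a computed local error */
      MPI_Allreduce(&del, &gdel, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
      if (me == 0)
          printf("global error %g %s TOL\n", gdel, gdel > TOL ? ">" : "<=");
      MPI_Finalize();
      return 0;
  }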

  30. Successive Over Relaxation • Make an initial guess for u at all interior points (i,j). • Define a scalar \( \omega_n \) (the relaxation factor; in the accompanying code it follows a Chebyshev acceleration recurrence). • Use the 5-pt stencil to compute \( u'_{i,j} = \left( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} \right)/4 \) at all interior points (i,j). • Compute \( u^{n+1}_{i,j} = (1 - \omega_n)\, u^n_{i,j} + \omega_n\, u'_{i,j} \). • Stop if the prescribed convergence threshold is reached. • Update: \( u^n = u^{n+1} \). • Go to step 2. In step 3, compute u' with u at time n+1 wherever possible to accelerate convergence. This inhibits parallelism.
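A standalone sketch (not from the slides) of the Chebyshev acceleration recurrence used for \( \omega \) in the SOR programs below, with \( \rho_J \) taken from those programs; the sequence approaches the optimal SOR relaxation factor.

  #include <stdio.h>
  #include <math.h>

  int main(void) {
      int m = 512, n;
      double pi     = acos(-1.0);
      double rhoj   = 1.0 - pi*pi*0.5/((m+2.0)*(m+2.0)); /* Jacobi spectral radius */
      double rhojsq = rhoj*rhoj;
      double omega  = 1.0;                       /* first (red) sweep */
      omega = 1.0/(1.0 - 0.5*rhojsq);            /* first black sweep */
      for (n = 0; n < 10; n++)
          omega = 1.0/(1.0 - 0.25*rhojsq*omega); /* subsequent sweeps */
      printf("omega after 10 sweeps: %f (fixed point: %f)\n",
             omega, 2.0/(1.0 + sqrt(1.0 - rhojsq)));
      return 0;
  }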

  31. Red-Black SOR Scheme Note that, by virtue of the 5-pt stencil, the solution at a black cell depends only on its 4 adjacent red cells; conversely, a red cell depends only on its 4 adjoining black cells. (figure: checkerboard of red and black cells with a 5-pt stencil centered on each color) This suggests the following parallel algorithm: • Compute v at black cells at time n+1 in parallel, using v at red cells at time n. • Compute v at red cells at time n+1 in parallel, using v at black cells at time n+1. • Repeat steps 1 and 2 until v is fully converged. The order of steps 1 and 2 can be alternated.

  32. Parallel SOR F77 Program

      PROGRAM sor_parallel_f77
      IMPLICIT NONE
      INTEGER m, mp, iter, p, i, j, k
      REAL*8 start_time, end_time    ! REAL*8: MPI_Wtime returns double precision
      CHARACTER*40 header, outfile
      REAL*8 TOL, omega, rhoj, rhojsq, PI, del, delr, delb, gdel
      INTEGER grid_comm, me, iv, coord, dims, periods, ndim, ierr
      INTEGER below, above
      LOGICAL reorder
      INTEGER MAX_M, MAXSTEP, INCREMENT
      PARAMETER (MAX_M = 512, MAXSTEP = 50000, INCREMENT=100)
      PARAMETER (TOL = 1.d-6, PI = 3.14159265)
      PARAMETER (periods = 0, ndim = 1, reorder = .false.)
      REAL*8 v((MAX_M+1+2)*(MAX_M+2))
      INCLUDE 'mpif.h'
      outfile = 'plots'

  33. Parallel SOR F77 Program (cont'd)

      CALL MPI_Init(ierr)                    ! start MPI
      CALL MPI_Comm_rank(MPI_COMM_WORLD, k, ierr)
      CALL MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
      start_time = MPI_Wtime()               ! start timer, in seconds
      m = MAX_M
      mp = m/p                               ! m'
      gdel = 1.0d0                           ! initialize global error to 1
      iter = 0                               ! initialize iteration counter
      rhoj = 1.0d0 - PI*PI*0.5/(m+2)/(m+2)   ! Chebyshev coefficients
      rhojsq = rhoj*rhoj

  34. Parallel SOR F77 Program (cont'd)

!     create Cartesian topology for the matrix
      dims = p
      CALL MPI_Cart_create(MPI_COMM_WORLD, ndim, dims,
     &     periods, reorder, grid_comm, ierr)
      CALL MPI_Comm_rank(grid_comm, me, ierr)
      CALL MPI_Cart_coords(grid_comm, me, ndim, coord, ierr)
      iv = coord
      CALL bc(m, mp, v, iv, p)               ! set up boundary conditions
      CALL MPI_Cart_shift(grid_comm, 0, 1, below, above, ierr)
      omega = 1.0d0
      CALL update_sor( m, mp, v, omega, delr, 'r')
      CALL update_bc_2( m, mp, v, iv, below, above)
      omega = 1.0d0/(1.0d0 - 0.50d0*rhojsq)
      CALL update_sor( m, mp, v, omega, delb, 'b')
      CALL update_bc_2( m, mp, v, iv, below, above)

  35. Parallel SOR F77 Program (cont'd)

      DO WHILE (gdel .gt. TOL)               ! loop until gdel <= TOL
        iter = iter + 1                      ! increment iteration counter
        omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
        CALL update_sor( m, mp, v, omega, delr, 'r')
        CALL update_bc_2( m, mp, v, iv, below, above)
        omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
        CALL update_sor( m, mp, v, omega, delb, 'b')
        CALL update_bc_2( m, mp, v, iv, below, above)
        IF (MOD(iter,INCREMENT) .eq. 0) THEN ! check error once in a while
          del = (delr + delb)*4.d0
          CALL MPI_Allreduce( del, gdel, 1, MPI_DOUBLE_PRECISION,
     &         MPI_MAX, MPI_COMM_WORLD, ierr )    ! find global max error
          IF (iter .ge. MAXSTEP) THEN
            header = 'Number of iterations exceeded limit'
            CALL error(header,-1)
          ENDIF
          IF (k .eq. 0) WRITE(*,'(i5,3d13.5)') iter, del, gdel, omega
        ENDIF
      ENDDO

  36. Parallel SOR F77 Program (cont'd)

      end_time = MPI_Wtime()                 ! stop timer
      IF (k .eq. 0) THEN
        WRITE(*,"(40('#'))")
        WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
     &        end_time - start_time, p
        WRITE(*,"(' Stopped at iteration ',i5)") iter
        WRITE(*,"(' The maximum error =',f12.6)") gdel
        WRITE(*,"(40('#'))")
        OPEN(unit=40, file=outfile, form='formatted', status='unknown')
        WRITE(40,"(3i5)") m+2, mp+2, p
        CLOSE(40)
      ENDIF
      WRITE(outfile, "('plots.',i1)") k
      OPEN(unit=41, file=outfile, form='formatted', status='unknown')
      WRITE(41,"(6e13.4)") (v(i), i=1,(m+2)*(mp+2))
      CLOSE(41)
      CALL MPI_Finalize(ierr)
      END          ! PROGRAM sor_parallel_f77

  37. update_sor.f

      SUBROUTINE update_sor(m, n, u, omega, del, redblack)
!***************************************************************
!* Updates Laplace solution u based on red-black iterative scheme
!* m        - (INPUT)  1st dimension size
!* n        - (INPUT)  2nd dimension size
!* u        - (INPUT)  solution array; u = u(0:m+1,0:n+1)
!* omega    - (INPUT)  iterative parameter
!* del      - (OUTPUT) error between two iterative u
!* redblack - (INPUT)  which color to update; 'r' for red
!***************************************************************
      IMPLICIT NONE
      CHARACTER redblack
      INTEGER m, n, i, ib, ie, j, jb, je
      REAL*8 omega, del, up
      REAL*8 u(0:m+1,0:n+1)
      del = 0.0d0

  38. update_sor.f (cont'd)

      IF (redblack .eq. 'r') THEN
!       process RED odd points ...
        jb = 1
        je = n
        ib = 1
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i ,j+1) + u(i+1,j )
     &           + u(i-1,j ) + u(i ,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO
!       process RED even points ...
        jb = 2
        je = n
        ib = 2
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i ,j+1) + u(i+1,j )
     &           + u(i-1,j ) + u(i ,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO

  39. update_sor.f (cont'd)

      ELSE IF (redblack .eq. 'b') THEN
!       process BLACK odd points ...
        jb = 2
        je = n
        ib = 1
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i ,j+1) + u(i+1,j )
     &           + u(i-1,j ) + u(i ,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO
!       process BLACK even points ...
        jb = 1
        je = n
        ib = 2
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i ,j+1) + u(i+1,j )
     &           + u(i-1,j ) + u(i ,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO
      ELSE
        STOP
      ENDIF
      END

  40. Subroutine update_bc_2

      SUBROUTINE update_bc_2( m, mp, v, k, below, above )
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER m, mp, k, below, above, ierr
      REAL*8 v(0:m+1,0:mp+1)
      INTEGER status(MPI_STATUS_SIZE)
      CALL MPI_Sendrecv(
     &     v(1,mp),   m, MPI_DOUBLE_PRECISION, above, 0,
     &     v(1,0),    m, MPI_DOUBLE_PRECISION, below, 0,
     &     MPI_COMM_WORLD, status, ierr )
      CALL MPI_Sendrecv(
     &     v(1,1),    m, MPI_DOUBLE_PRECISION, below, 1,
     &     v(1,mp+1), m, MPI_DOUBLE_PRECISION, above, 1,
     &     MPI_COMM_WORLD, status, ierr )
      RETURN
      END          ! SUBROUTINE update_bc_2

  41. Parallel SOR C Program

  #include "solvers.h"
  #include "mpi.h"
  INT main(INT argc, CHAR *argv[]) {
  /*************** MAIN PROGRAM *********************************
   * Solve Laplace equation using Successive Over Relaxation     *
   * and Chebyshev Acceleration (see Numerical Recipes for       *
   * detail)                                                     *
   * Kadin Tseng, Boston University, August, 2000                *
   ***************************************************************/
    INT iter, m, mp, p, k, below, above;
    REAL omega, rhoj, rhojsq, del, delr, delb, gdel;
    CHAR line[80], red, black;
    MPI_Comm grid_comm;
    INT me, iv, coord[1], dims, periods, ndim, reorder;
    REAL **v, **vt;
    MPI_Init(&argc, &argv);              /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &k);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &p);   /* get number of processes */

  42. Parallel SOR C Program (cont'd)

    periods = 0;
    ndim = 1;
    reorder = 0;
    red = 'r';
    black = 'b';
    if (k == 0) {
      fprintf(OUTPUT,"Enter size of interior points, m :\n");
      (void) fgets(line, sizeof(line), stdin);
      (void) sscanf(line, "%d", &m);
      fprintf(OUTPUT,"m = %d\n",m);
    }
    MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
    mp = m/p;
    v  = allocate_2D(m, mp);    /* allocate mem for 2D array */
    vt = allocate_2D(mp, m);    /* allocate mem for transposed array */
    gdel = 1.0;                 /* initialize global error to 1 */
    iter = 0;                   /* initialize iteration counter */
    rhoj = 1.0 - PI*PI*0.5/(m+2)/(m+2);   /* Chebyshev coefficient */
    rhojsq = rhoj*rhoj;
    /* create Cartesian topology for matrix */
    dims = p;                   /* the 1D Cartesian grid has one process per strip */

  43. Parallel SOR C Program (cont'd)

    MPI_Cart_create(MPI_COMM_WORLD, ndim, &dims, &periods, reorder,
                    &grid_comm);
    MPI_Comm_rank(grid_comm, &me);                /* current proc rank */
    MPI_Cart_coords(grid_comm, me, ndim, coord);  /* current proc coord. */
    iv = coord[0];
    bc( m, mp, v, iv, p);       /* set up boundary conditions */
    transpose(m, mp, v, vt);    /* transpose v into vt */
    replicate(mp, m, vt, v);
    MPI_Cart_shift(grid_comm, 0, 1, &below, &above);
    omega = 1.0;
    update_sor( mp, m, vt, omega, &delr, red);
    update_bc_2( mp, m, vt, iv, below, above);
    omega = 1.0/(1.0 - 0.50*rhojsq);
    update_sor( mp, m, vt, omega, &delb, black);
    update_bc_2( mp, m, vt, iv, below, above);

  44. Parallel SOR C Program (cont'd)

    while (gdel > TOL) {
      iter++;                   /* increment iteration counter */
      omega = 1.0/(1.0 - 0.25*rhojsq*omega);
      update_sor( mp, m, vt, omega, &delr, red);
      update_bc_2( mp, m, vt, iv, below, above);
      omega = 1.0/(1.0 - 0.25*rhojsq*omega);
      update_sor( mp, m, vt, omega, &delb, black);
      update_bc_2( mp, m, vt, iv, below, above);
      if (iter%INCREMENT == 0) {
        del = (delr + delb)*4.0;
        MPI_Allreduce( &del, &gdel, 1, MPI_DOUBLE, MPI_MAX,
                       MPI_COMM_WORLD);    /* find global max error */
        if (k == 0) {
          fprintf(OUTPUT,"iter gdel omega: %5d %13.5f %13.5f\n",
                  iter, gdel, omega);
        }
      }
      if (iter > MAXSTEPS) {
        fprintf(OUTPUT,"Iteration terminated (exceeds %6d", MAXSTEPS);
        fprintf(OUTPUT," )\n");
        return (1);             /* nonconvergent solution */
      }
    }

  45. Parallel SOR C Program (cont'd)

    if (k == 0) {
      fprintf(OUTPUT,"Stopped at iteration %d\n",iter);
      fprintf(OUTPUT,"The maximum error = %f\n",gdel);
    }
    transpose(mp, m, vt, v);      /* transpose vt back into v */
    write_file( m, mp, v, k, p);  /* write v to file for use in MATLAB plots */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return (0);
  }

  46. Parallel SOR C Program (cont'd)

  INT update_sor( INT m, INT n, REAL **u, REAL omega, REAL *del,
                  CHAR redblack) {
  /***************************************************************
   * Updates u according to successive over relaxation method     *
   * m        - (INPUT)  size of interior rows                    *
   * n        - (INPUT)  size of interior columns                 *
   * u        - (INPUT)  array                                    *
   * omega    - (INPUT)  adjustable constant used to speed up     *
   *                     convergence of SOR                       *
   * del      - (OUTPUT) error norm between 2 solution steps      *
   * redblack - (INPUT)  either 'r' for red or 'b' for black      *
   ***************************************************************/
    INT i, ib, ie, j, jb, je;
    REAL up;
    *del = 0.0;

  47. Function update_bc_2

  #include "solvers.h"
  #include "mpi.h"
  INT update_bc_2( INT mp, INT m, REAL **vt, INT k, INT below, INT above ) {
    MPI_Status status;
    /* send last interior row up; receive ghost row from below */
    MPI_Sendrecv( vt[mp]+1,   m, MPI_DOUBLE, above, 0,
                  vt[0]+1,    m, MPI_DOUBLE, below, 0,
                  MPI_COMM_WORLD, &status );
    /* send first interior row down; receive ghost row from above */
    MPI_Sendrecv( vt[1]+1,    m, MPI_DOUBLE, below, 1,
                  vt[mp+1]+1, m, MPI_DOUBLE, above, 1,
                  MPI_COMM_WORLD, &status );
    return (0);
  }

  48. Parallel SOR C Program (cont'd)

    if (redblack == 'r') {
      /* process RED odd points ... */
      jb = 1; je = n; ib = 1; ie = m;
      for ( j = jb; j <= je; j+=2 ) {
        for ( i = ib; i <= ie; i+=2 ) {
          up = ( u[i  ][j+1] + u[i+1][j  ]
               + u[i-1][j  ] + u[i  ][j-1] )*0.25;
          u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
          *del += fabs(up-u[i][j]);
        }
      }
      /* process RED even points ... */
      jb = 2; je = n; ib = 2; ie = m;
      for ( j = jb; j <= je; j+=2 ) {
        for ( i = ib; i <= ie; i+=2 ) {
          up = ( u[i  ][j+1] + u[i+1][j  ]
               + u[i-1][j  ] + u[i  ][j-1] )*0.25;
          u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
          *del += fabs(up-u[i][j]);
        }
      }

  49. Parallel SOR C Program (cont'd)

    } else if (redblack == 'b') {
      /* process BLACK odd points ... */
      jb = 2; je = n; ib = 1; ie = m;
      for ( j = jb; j <= je; j+=2 ) {
        for ( i = ib; i <= ie; i+=2 ) {
          up = ( u[i  ][j+1] + u[i+1][j  ]
               + u[i-1][j  ] + u[i  ][j-1] )*0.25;
          u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
          *del += fabs(up-u[i][j]);
        }
      }
      /* process BLACK even points ... */
      jb = 1; je = n; ib = 2; ie = m;
      for ( j = jb; j <= je; j+=2 ) {
        for ( i = ib; i <= ie; i+=2 ) {
          up = ( u[i  ][j+1] + u[i+1][j  ]
               + u[i-1][j  ] + u[i  ][j-1] )*0.25;
          u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
          *del += fabs(up-u[i][j]);
        }
      }
    } else {
      return (1);   /* invalid color */
    }
    return (0);
  }

  50. SOR Timings On Origin2000 (figure: wall-clock timing chart on the SGI Origin2000; not reproduced in the transcript)
