OpenMP: Case Studies

OpenMP: Case Studies • Partial Differential Equations - Laplace Equation and Programs - 2D Helmholtz Equation

Partial Differential Equation – 2D Laplace Equation Helmholtz Equation Heat Equation Wave Equation

Implementation • Compute • Compare • Update

Serial Code program lpcache integer imax,jmax,im1,im2,jm1,jm2,it,itmax parameter (imax=12,jmax=12) parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2) parameter (itmax=10) real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol parameter (umax=10.0,tol=1.0e-6) ! Initialize do j=1,jmax do i=1,imax-1 u(i,j)=0.0 du(i,j)=0.0 enddo u(imax,j)=umax enddo

! Main computation loop do it=1,itmax dumax=0.0 do j=2,jm1 do i=2,im1 du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j) dumax=max(dumax,abs(du(i,j))) enddo enddo do j=2,jm1 do i=2,im1 u(i,j)=u(i,j)+du(i,j) enddo enddo write (*,*) it,dumax enddo stop end

Shared Memory Parallelization

OpenMP Code program lpomp use omp_lib integer imax,jmax,im1,im2,jm1,jm2,it,itmax parameter (imax=12,jmax=12) parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2) parameter (itmax=10) real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol parameter (umax=10.0,tol=1.0e-6) !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j) !$OMP DO do j=1,jmax do i=1,imax-1 u(i,j)=0.0 du(i,j)=0.0 enddo u(imax,j)=umax enddo !$OMP END DO !$OMP END PARALLEL

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j) do it=1,itmax dumax=0.0 !$OMP DO REDUCTION (max:dumax) do j=2,jm1 do i=2,im1 du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j) dumax=max(dumax,abs(du(i,j))) enddo enddo !$OMP END DO !$OMP DO do j=2,jm1 do i=2,im1 u(i,j)=u(i,j)+du(i,j) enddo enddo !$OMP END DO enddo !$OMP END PARALLEL end

DEMO /home/jemmyhu/CES706/openmp/laplace/test

2D Helmholtz Equation • Finite Difference Method Finite Difference Method to solve the • 2D Helmholtz equation: Δu + λu = f • With homogeneous Dirichlet-boundary conditions is solved with a finite difference method. • The Laplace operator Δ is discretized with the central 5-point • difference star. The corresponding linear equations • AU = F • with a banded coefficient matrix are solved iteratively with the • (simple) Jacobi method.

Jacobi-method:2D Helmholtz equation The iterative solution in centred finite-difference form where is the discrete Laplace operator and and are known. If is the th iteration for this solution, we get the "residual" vector or and we take the th iteration of such that the new residual is zero

Jacobi Solver – Serial code with 2 loop nests k = 1; while (k <= maxit && error > tol) { error = 0.0; /* copy new solution into old */ for (j=0; j<m; j++) for (i=0; i<n; i++) uold[i + m*j] = u[i + m*j]; /* compute stencil, residual and update */ for (j=1; j<m-1; j++) for (i=1; i<n-1; i++){ resid =( ax * (uold[i-1 + m*j] + uold[i+1 + m*j]) + ay * (uold[i + m*(j-1)] + uold[i + m*(j+1)]) + b * uold[i + m*j] - f[i + m*j] ) / b; /* update solution */ u[i + m*j] = uold[i + m*j] - omega * resid; /* accumulate residual error */ error =error + resid*resid; } /* error check */ k++; error = sqrt(error) /(n*m); } /* while */

Run Result on silky: /home/jemmyhu/CES706/openmp/jacobi/c [jemmyhu@silky:~/CES706/openmp/jacobi/c] time ./jacobi_ser_1-icc.exe < input.medium Input n,m - grid dimension in x,y direction : Input alpha - Helmholts constant : Input relax - Successive over-relaxation parameter: Input tol - error tolerance for iterrative solver: Input mits - Maximum iterations for solver: -> 200, 200, 0.8, 1, 1e-07, 1000 Total Number of Iteratuons 1001 Residual 0.000000441056646 elapsed time : 0.400464 MFlops : 1272.65 Solution Error : 0.00218961 real 0m0.415s user 0m0.408s sys 0m0.000s

Jacobi Solver – Parallel code with 2 parallel regions k = 1; while (k <= maxit && error > tol) { error = 0.0; /* copy new solution into old */ #pragma omp parallel for private(i) for (j=0; j<m; j++) for (i=0; i<n; i++) uold[i + m*j] = u[i + m*j]; /* compute stencil, residual and update */ #pragma omp parallel for reduction(+:error) private(i,resid) for (j=1; j<m-1; j++) for (i=1; i<n-1; i++){ resid = ( ax * (uold[i-1 + m*j] + uold[i+1 + m*j]) + ay * (uold[i + m*(j-1)] + uold[i + m*(j+1)]) + b * uold[i + m*j] - f[i + m*j] ) / b; /* update solution */ u[i + m*j] = uold[i + m*j] - omega * resid; /* accumulate residual error */ error =error + resid*resid; } /* error check */ k++; error = sqrt(error) /(n*m); } /* while */

Run Result on silky: /home/jemmyhu/CES706/openmp/jacobi/c [jemmyhu@silky:~/CES706/openmp/jacobi/c] export OMP_NUM_THREADS=2 [jemmyhu@silky:~/CES706/openmp/jacobi/c] time ./jacobi_omp_1-icc.exe < input.medium Input n,m - grid dimension in x,y direction : Input alpha - Helmholts constant : Input relax - Successive over-relaxation parameter: Input tol - error tolerance for iterrative solver: Input mits - Maximum iterations for solver: -> 200, 200, 0.8, 1, 1e-07, 1000 Total Number of Iteratuons 1001 Residual 0.000000441056646 elapsed time : 0.309421 MFlops : 1647.11 Solution Error : 0.00218961 real 0m0.331s user 0m0.456s sys 0m0.000s

Jacobi Solver – Parallel code with 2 parallel loops in 1 PR k = 1; while (k <= maxit && error > tol) { error = 0.0; #pragma omp parallel private(resid, i){ #pragma omp for for (j=0; j<m; j++) for (i=0; i<n; i++) uold[i + m*j] = u[i + m*j]; /* compute stencil, residual and update */ #pragma omp for reduction(+:error) for (j=1; j<m-1; j++) for (i=1; i<n-1; i++){ resid = ( ax * (uold[i-1 + m*j] + uold[i+1 + m*j]) + ay * (uold[i + m*(j-1)] + uold[i + m*(j+1)]) + b * uold[i + m*j] - f[i + m*j]) / b; /* update solution */ u[i + m*j] = uold[i + m*j] - omega * resid; /* accumulate residual error */ error =error + resid*resid; } } /* end parallel */ k++; error = sqrt(error) /(n*m); } /* while */

Run Result on silky: /home/jemmyhu/CES706/openmp/jacobi/c/ [jemmyhu@silky:~/CES706/openmp/jacobi/c] time ./jacobi_omp_2-icc.exe < input.medium Input n,m - grid dimension in x,y direction : Input alpha - Helmholts constant : Input relax - Successive over-relaxation parameter: Input tol - error tolerance for iterrative solver: Input mits - Maximum iterations for solver: -> 200, 200, 0.8, 1, 1e-07, 1000 Total Number of Iteratuons 1001 Residual 0.000000441056646 elapsed time : 0.195231 MFlops : 2610.51 Solution Error : 0.00218961 real 0m0.223s user 0m0.372s sys 0m0.008s

OpenMP: Case Studies