/home/jemmyhu/CES706/openmp/Fortran/synch/

! Serial code, used later to demonstrate data sharing
program sharing_seq
  implicit none
  integer, parameter :: N = 50000000
  integer(selected_int_kind(17)) :: x(N)
  integer(selected_int_kind(17)) :: total
  integer :: i

  do i = 1, N
     x(i) = i
  end do

  total = 0
  do i = 1, N
     total = total + x(i)
  end do

  write(*,*) "total = ", total
end program

[jemmyhu@saw-login1:~ ]$ ./sharing-atomic-seq
total = 1250000025000000
program sharing_par1
  implicit none
  integer, parameter :: N = 50000000
  integer(selected_int_kind(17)) :: x(N)
  integer(selected_int_kind(17)) :: total
  integer :: i

!$omp parallel
!$omp do
  do i = 1, N
     x(i) = i
  end do
!$omp end do

  total = 0
!$omp do
  do i = 1, N
     total = total + x(i)
  end do
!$omp end do
!$omp end parallel

  write(*,*) "total = ", total
end program

! Parallel code with OpenMP do directives.
! The result varies from run to run because every thread executes
! total = 0 and then reads and updates the shared variable total
! without any synchronization (a data race).

[jemmyhu@saw-login1:~] ./sharing-atomic-par1
total = 312500012500000
[jemmyhu@saw-login1:~] ./sharing-atomic-par1
total = 937500012500000
[jemmyhu@saw-login1:~] ./sharing-atomic-par1
total = 312500012500000
[jemmyhu@saw-login1:~] ./sharing-atomic-par1
total = 312500012500000
[jemmyhu@saw-login1:~] ./sharing-atomic-par1
total = 937500012500000
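A common remedy for this race is a reduction clause (the pi program later in these slides uses one): each thread accumulates into a private copy of the sum, and OpenMP combines the copies when the loop ends. The C sketch below is a minimal illustration, assuming compilation with -fopenmp (without it the pragmas are ignored and the loops run serially with the same result). The function name sum_reduction is hypothetical.

```c
#include <stdlib.h>

/* Sum 1..n with a reduction: each thread keeps a private copy of total,
   and the private copies are added together when the loop finishes. */
long long sum_reduction(int n) {
    long long *x = malloc((size_t)n * sizeof *x);
    if (x == NULL) return -1;             /* allocation failed */
    long long total = 0;                  /* set once, before any thread runs */

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        x[i] = i + 1;                     /* x(i) = i in the 1-based Fortran */

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += x[i];                    /* updates the thread's private copy */

    free(x);
    return total;
}
```

Unlike atomic, a reduction needs no synchronization per iteration; threads only combine results once at the end.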
Synchronization categories
• Mutual exclusion synchronization: critical, atomic
• Event synchronization: barrier, ordered, master
• Custom synchronization: flush, locks (runtime library)
Named Critical Sections
A named critical section must synchronize with other critical sections of the same name, but it can execute concurrently with critical sections of a different name.

cur_max = min_infinity
cur_min = plus_infinity
!$omp parallel do
do i = 1, n
   ! The outer test is a cheap filter; it is repeated inside the
   ! critical section because another thread may have changed the value.
   if (a(i) .gt. cur_max) then
!$omp critical (MAXLOCK)
      if (a(i) .gt. cur_max) then
         cur_max = a(i)
      endif
!$omp end critical (MAXLOCK)
   endif
   if (a(i) .lt. cur_min) then
!$omp critical (MINLOCK)
      if (a(i) .lt. cur_min) then
         cur_min = a(i)
      endif
!$omp end critical (MINLOCK)
   endif
enddo
!$omp end parallel do
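The same idea in C: two differently named critical sections protect the maximum and the minimum independently, so a thread updating one never waits for a thread updating the other. This is a minimal sketch assuming -fopenmp; the function name minmax_named_critical is hypothetical, and unlike the slide it drops the unsynchronized pre-test outside the critical section, trading speed for a race-free read.

```c
#include <float.h>

/* Running max and min of a[0..n-1], each guarded by its own named
   critical section (maxlock / minlock). */
void minmax_named_critical(const double *a, int n,
                           double *out_max, double *out_min) {
    double cur_max = -DBL_MAX;   /* stands in for min_infinity */
    double cur_min =  DBL_MAX;   /* stands in for plus_infinity */

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp critical (maxlock)
        {
            if (a[i] > cur_max) cur_max = a[i];
        }
        #pragma omp critical (minlock)
        {
            if (a[i] < cur_min) cur_min = a[i];
        }
    }
    *out_max = cur_max;
    *out_min = cur_min;
}
```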
program sharing_par2
  use omp_lib
  implicit none
  integer, parameter :: N = 50000000
  integer(selected_int_kind(17)) :: x(N)
  integer(selected_int_kind(17)) :: total
  integer :: i

!$omp parallel
!$omp do
  do i = 1, N
     x(i) = i
  end do
!$omp end do

  ! Only one thread resets total; the barrier implied by end single
  ! keeps other threads from adding to total before it is reset.
!$omp single
  total = 0
!$omp end single
!$omp do
  do i = 1, N
!$omp atomic
     total = total + x(i)
  end do
!$omp end do
!$omp end parallel

  write(*,*) "total = ", total
end program

! Parallel code with OpenMP do directives, synchronized with the atomic
! directive. This gives the correct answer, but it costs more because
! every update of total is serialized.

[jemmyhu@saw-login1:~] ./sharing-atomic-par2
total = 1250000025000000
[jemmyhu@saw-login1:~] ./sharing-atomic-par2
total = 1250000025000000
[jemmyhu@saw-login1:~] ./sharing-atomic-par2
total = 1250000025000000
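In C the atomic update has the same shape. This is a minimal sketch assuming -fopenmp; the function name sum_atomic is hypothetical, and the array x is dropped so that i is added directly. Initializing total before the parallel region means no thread can reset it while others are adding.

```c
/* Sum 1..n where every update of the shared total is an atomic
   read-modify-write: correct, but each addition is serialized. */
long long sum_atomic(int n) {
    long long total = 0;   /* initialized once, outside the parallel region */

    #pragma omp parallel for
    for (int i = 1; i <= n; ++i) {
        #pragma omp atomic
        total += i;
    }
    return total;
}
```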
Barriers synchronize all threads of a team within a parallel region; a barrier may not be placed inside a work-sharing construct. A typical use is to ensure that one phase of work has been completed by every thread before any thread moves on to the next phase:

!$omp parallel private(index)
   index = generate_next_index()
   do while (index .ne. 0)
      call add_index(index)
      index = generate_next_index()
   enddo
   ! Wait for all the indices to be generated
!$omp barrier
   index = get_next_index()
   do while (index .ne. 0)
      call process_index(index)
      index = get_next_index()
   enddo
!$omp end parallel
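A two-phase pattern in C shows what the barrier buys: each thread first writes its own slot, then reads its neighbour's slot, and the barrier guarantees every write has happened before any read. This is a minimal sketch; the function name barrier_ring_ok, the team size of 4, and the serial stand-ins for the runtime routines (used only when compiled without -fopenmp) are all hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stand-ins so the sketch compiles without -fopenmp. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 256

/* Returns 1 if every thread saw its right-hand neighbour's phase-1 value. */
int barrier_ring_ok(void) {
    int slot[MAX_THREADS];
    int ok = 1;

    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        int nt = omp_get_num_threads();

        slot[id] = id * 10;                /* phase 1: produce */

        #pragma omp barrier                /* all producers finish first */

        int right = (id + 1) % nt;         /* phase 2: consume a neighbour */
        if (slot[right] != right * 10) {
            #pragma omp atomic write
            ok = 0;
        }
    }
    return ok;
}
```

Without the barrier, a fast thread could read slot[right] before its neighbour has written it.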
Ordered Sections
• Impose an order across the iterations of a parallel loop.
• Identify a portion of code within each loop iteration that must be executed in the original, sequential order of the loop iterations.
• Restrictions:
   – If a parallel loop contains an ordered directive, the parallel loop directive itself must contain the ordered clause.
   – An iteration of a parallel loop may encounter at most one ordered section.

!$omp parallel do ordered
do i = 1, n
   a(i) = … complex calculation here …
   ! Wait until the previous iteration has finished its ordered section
!$omp ordered
   print *, a(i)
   ! Signal the completion of the ordered section from this iteration
!$omp end ordered
enddo
!$omp end parallel do
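A C sketch of the same pattern: the per-iteration work runs in parallel, but the ordered block appends results in the original loop order. It assumes -fopenmp; the function name ordered_squares is hypothetical and i * i stands in for the "complex calculation". Because ordered regions run one at a time in iteration order, the shared counter needs no extra protection.

```c
/* Computes i*i for i = 0..n-1 in parallel and stores the results in out[]
   in sequential order; returns the number of values written. */
int ordered_squares(int n, int *out) {
    int count = 0;                        /* next free position in out[] */

    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        int sq = i * i;                   /* may run in any order */

        #pragma omp ordered
        {
            out[count++] = sq;            /* runs in order 0, 1, 2, ... */
        }
    }
    return count;
}
```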
The problem with this example is that operations on variables a and b are not ordered with respect to each other. For instance, nothing prevents the compiler from moving the flush of b on thread 1 or the flush of a on thread 2 to a position completely after the critical section (assuming that the critical section on thread 1 does not reference b and the critical section on thread 2 does not reference a). If either re-ordering happens, the critical section can be active on both threads simultaneously.
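The fix described in the OpenMP specification's note on flush is to flush both variables in one flush(a,b), which keeps the operations on a and b ordered; with that change at most one thread enters, and neither thread entering is also a correct outcome. The C sketch below follows that corrected pattern with two sections; the function name flush_pair_entries and the atomic counter are hypothetical additions for illustration. The specification itself strongly discourages flush with a list, so treat this as a demonstration only.

```c
/* Corrected two-flag entry test: each thread sets its own flag, flushes
   a and b together, then checks the other flag. Returns how many
   threads entered the protected region (expected: 0 or 1). */
int flush_pair_entries(void) {
    int a = 0, b = 0;
    int entered = 0;

    #pragma omp parallel sections num_threads(2) shared(a, b, entered)
    {
        #pragma omp section
        {                                  /* thread 1 */
            b = 1;
            #pragma omp flush(a, b)
            if (a == 0) {
                #pragma omp atomic
                entered++;                 /* critical section */
            }
        }
        #pragma omp section
        {                                  /* thread 2 */
            a = 1;
            #pragma omp flush(a, b)
            if (b == 0) {
                #pragma omp atomic
                entered++;                 /* critical section */
            }
        }
    }
    return entered;
}
```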
Lock: low-level synchronization functions
• Why use locks?
   1) The synchronization protocol required by a problem cannot be expressed with OpenMP's high-level synchronization constructs.
   2) The parallel overhead incurred by OpenMP's high-level synchronization constructs is too large.
The simple lock routines are as follows:
• omp_init_lock initializes a simple lock.
• omp_destroy_lock uninitializes a simple lock.
• omp_set_lock waits until a simple lock is available, and then sets it.
• omp_unset_lock unsets a simple lock.
• omp_test_lock tests a simple lock, and sets it if it is available.

Formats          C/C++ (omp.h)                          Fortran
data type        omp_lock_t                             integer(kind=omp_lock_kind) svar
initialize       void omp_init_lock(omp_lock_t *lock);  subroutine omp_init_lock(svar)

Nestable locks use omp_nest_lock_t in C/C++; in Fortran, nvar must be an integer variable of kind=omp_nest_lock_kind.
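One case for reason 1) above: a histogram with one lock per bin, so threads updating different bins never wait for each other. A single critical section would serialize every update, and the number of bins is only known at run time. This is a minimal sketch assuming -fopenmp; the function name locked_histogram and the serial stand-ins (used only without -fopenmp) are hypothetical, and it assumes non-negative input values.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stand-ins so the sketch compiles without -fopenmp. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
#endif

/* Counts data[i] % nbins into bins[]; returns 0 on success, -1 on error. */
int locked_histogram(const int *data, int n, int *bins, int nbins) {
    omp_lock_t *locks = malloc((size_t)nbins * sizeof *locks);
    if (locks == NULL) return -1;
    for (int b = 0; b < nbins; ++b) {
        bins[b] = 0;
        omp_init_lock(&locks[b]);          /* one lock per bin */
    }

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int b = data[i] % nbins;           /* assumes data[i] >= 0 */
        omp_set_lock(&locks[b]);           /* wait until this bin is free */
        bins[b]++;
        omp_unset_lock(&locks[b]);
    }

    for (int b = 0; b < nbins; ++b)
        omp_destroy_lock(&locks[b]);
    free(locks);
    return 0;
}
```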
program LIB_ENV
  use omp_lib
  implicit none
  integer :: nthreads
  logical :: dynamics, nnested
  integer :: myid

  write(*,*) "start"
  nthreads = omp_get_num_threads()
  dynamics = omp_get_dynamic()
  nnested = omp_get_nested()
  write(*,*) "nthreads, dynamics, nnested : ", nthreads, dynamics, nnested

  write(*,*) "before"
!$omp parallel private(myid)
!$omp master
  nthreads = omp_get_num_threads()
  dynamics = omp_get_dynamic()
  nnested = omp_get_nested()
  write(*,*) "nthreads, dynamics, nnested : ", nthreads, dynamics, nnested
!$omp end master
  myid = omp_get_thread_num()
  write(*,*) "myid : ", myid
!$omp end parallel
  write(*,*) "after"
end program
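The key point of LIB_ENV is that omp_get_num_threads reports the size of the current team: 1 in the sequential part, the full team size inside a parallel region. A minimal C sketch of that query, assuming -fopenmp; the function name team_sizes and the serial stand-in (used only without -fopenmp) are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stand-in so the sketch compiles without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Team size outside and inside a parallel region, as in LIB_ENV. */
void team_sizes(int *outside, int *inside) {
    *outside = omp_get_num_threads();     /* sequential part: always 1 */

    #pragma omp parallel
    {
        #pragma omp master
        *inside = omp_get_num_threads();  /* size of the current team */
    }                                     /* implicit barrier: value visible */
}
```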
/home/jemmyhu/CES706/openmp/Fortran/data-scope
[jemmyhu@saw-login1:~] f90 -openmp -o openmp_lib_env-f90 openmp_lib_env.f90
[jemmyhu@saw-login1:~] ./openmp_lib_env
start
nthreads, dynamics, nnested : 1 F F
before
nthreads, dynamics, nnested : 8 F F
myid : 0
myid : 3
myid : 2
myid : 1
myid : 4
myid : 7
myid : 6
myid : 5
after
/home/jemmyhu/CES706/openmp/Fortran/data-scope/openmp_lib_env-2.f90
…
  write(*,*) "changes before"
  call omp_set_dynamic(.TRUE.)
  call omp_set_nested(.TRUE.)
!$omp parallel private(myid)
!$omp master
  nthreads = omp_get_num_threads()
  dynamics = omp_get_dynamic()
  nnested = omp_get_nested()
  write(*,*) "nthreads, dynamics, nnested : ", nthreads, dynamics, nnested
!$omp end master
  myid = omp_get_thread_num()
  write(*,*) "myid : ", myid
!$omp end parallel
  write(*,*) "after"
…
[jemmyhu@saw-login1:~] ./openmp_lib_env-2
start
nthreads, dynamics, nnested : 1 F F
before
nthreads, dynamics, nnested : 8 F F
myid : 0
myid : 2
myid : 4
myid : 1
myid : 5
myid : 6
myid : 7
myid : 3
after
changes before
nthreads, dynamics, nnested : 8 T T
myid : 2
myid : 0
myid : 4
myid : 1
myid : 3
myid : 6
myid : 7
myid : 5
after
Intel compiler on silky
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_NUM_THREADS=4
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] ./openmp_lib_env-2-ifort
start
nthreads, dynamics, nnested : 1 F F
before
nthreads, dynamics, nnested : 4 F F
myid : 0
myid : 2
myid : 1
myid : 3
after
changes before
myid : 2
myid : 1
myid : 3
nthreads, dynamics, nnested : 4 T T
myid : 0
after
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope]
Intel compiler on silky
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_NUM_THREADS=4
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] ./openmp_lib_env-ifort
start
nthreads, dynamics, nnested : 1 F F
before
nthreads, dynamics, nnested : 4 F F
myid : 0
myid : 1
myid : 2
myid : 3
after
Intel compiler on silky
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_DYNAMIC="TRUE"
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_NESTED="TRUE"
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] ./openmp_lib_env-ifort
start
nthreads, dynamics, nnested : 1 T T
before
nthreads, dynamics, nnested : 4 T T
myid : 1
myid : 2
myid : 0
myid : 3
after
[jemmyhu@silky:~/CES706/openmp/Fortran/data-scope]
Pathscale on Opteron
[jemmyhu@wha781 data-scope]$ ./openmp_lib_env-f90
start
nthreads, dynamics, nnested : 1 F F
before
nthreads, dynamics, nnested : 2 F F
myid : 0 myid : 0 1
after
[jemmyhu@wha781 data-scope]$ export OMP_DYNAMIC="TRUE"
[jemmyhu@wha781 data-scope]$ export OMP_NESTED="TRUE"
[jemmyhu@wha781 data-scope]$ ./openmp_lib_env-f90
** OpenMP warning: dynamic thread adjustment not available (ignored OMP_DYNAMIC)
start
nthreads, dynamics, nnested : 1 F T
before
nthreads, dynamics, nnested : 2 F T
myid : 1
myid : 0
after
[jemmyhu@wha781 data-scope]$
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* EXIT_SUCCESS */
#include <omp.h>     /* OpenMP header file */

#define NUM_STEPS 100000000

int main(int argc, char *argv[])
{
    int i;
    double x, pi;
    double sum = 0.0;
    double step = 1.0/(double) NUM_STEPS;
    int nthreads;

    /* do computation -- using all available threads */
    #pragma omp parallel
    {
        #pragma omp master
        {
            nthreads = omp_get_num_threads();
        }

        /* schedule(runtime): the schedule is read from OMP_SCHEDULE */
        #pragma omp for private(x) reduction(+:sum) schedule(runtime)
        for (i=0; i < NUM_STEPS; ++i) {
            x = (i+0.5)*step;
            sum = sum + 4.0/(1.0+x*x);
        }
        /* the implicit barrier at the end of the for construct
           makes the reduced sum available here */

        #pragma omp master
        {
            pi = step * sum;
        }
    }

    /* print results */
    printf("parallel program results with %d threads:\n", nthreads);
    printf("pi = %g (%17.15f)\n", pi, pi);
    return EXIT_SUCCESS;
}