Introduction to OpenMP Part II • White Rose Grid Computing Training Series • Deniz Savas, Alan Real, Mike Griffiths • RTP Module • February 2012
Synchronisation Pitfalls when using shared variables (Race Conditions) • A variable that is used (read from) but never updated (written to) can safely be declared as a shared variable in a parallel region. • Problems arise when this rule is violated, i.e. when threads change the value of a shared variable within the parallel region. Such problems are known as data races and should be avoided at the programming level. However, for situations where avoidance is not possible or not efficient, there are a variety of OMP directives that can be used to resolve them. These are BARRIER, ATOMIC, CRITICAL and FLUSH, which we will discuss later.
Synchronisation example • a=a+1 on 2 threads where a is a shared variable with initial value 10. • Each thread executes the same three steps: load a, add a 1, store a. The loaded copy and the intermediate result are private data; only a itself is shared data. • Case 1 (thread 2 behind thread 1): thread 1 loads 10 and stores 11, then thread 2 loads 11 and stores 12, so a=12. • Case 2 (thread 2 at similar time to thread 1): both threads load 10 and both store 11, so a=11.
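The a=a+1 example above can be sketched in C as follows. This is a minimal illustration, not part of the original slides: the function name count_with_atomic is hypothetical, and the ATOMIC directive used here is only introduced later in the course. With the directive in place every increment is applied; if it were removed, updates could be lost exactly as in Case 2.

```c
#include <assert.h>

/* Hypothetical helper: increments a shared counter n times in parallel.
 * The atomic directive makes each load-add-store sequence indivisible,
 * so no increment is lost regardless of how the threads interleave. */
long count_with_atomic(long n)
{
    long a = 0;                 /* shared variable, as in the slide */
    assert(n >= 0);

    #pragma omp parallel for shared(a)
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        a += 1;
    }
    return a;
}
```

With GCC the OpenMP pragmas are enabled by the -fopenmp flag; without it the pragmas are ignored and the loop simply runs serially.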
Synchronization related directives • We have seen the potential problems arising from the interaction of multiple threads, particularly the race conditions that occur when multiple threads attempt to write to the same shared variable simultaneously. Such events may render our results useless, being determined by the toss of a coin, according to which thread runs ahead of which. • The following OMP directives, namely CRITICAL, BARRIER, ATOMIC and FLUSH, help us to avoid these synchronization related problems.
OMP Barrier • Syntax C: #pragma omp barrier Fortran: !$OMP BARRIER • This directive defines a point that all threads must reach before execution of the program continues. It is a useful tool where you need to ensure that the work relating to one set of tasks is completed before embarking on a new set of tasks. • Beware, overuse of this feature may reduce efficiency. • It may also give rise to DEADLOCK situations. • Nevertheless, it is very useful for ensuring the correct working of complex programs. • Most of the work-sharing directives have an implied barrier at the end of their block (unless NOWAIT is used), i.e. OMP END DO, OMP END SECTIONS, OMP END WORKSHARE. Note that they do not have an implied barrier at the beginning, only at the end, and not even there if NOWAIT is specified, e.g. !$OMP END WORKSHARE NOWAIT
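The "finish one set of tasks before starting the next" use of a barrier can be sketched as below. This is an illustrative example under stated assumptions: the function name barrier_demo, the array stage1 and the cap of four threads are all hypothetical, and the small serial fallback exists only so the sketch also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption): lets the sketch build without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

enum { MAX_THREADS = 4 };

/* Returns 1 if every thread saw its neighbour's phase-1 result. */
int barrier_demo(void)
{
    int stage1[MAX_THREADS] = {0};
    int ok = 1;

    #pragma omp parallel num_threads(MAX_THREADS) shared(stage1, ok)
    {
        int id = omp_get_thread_num();
        int nt = omp_get_num_threads();
        assert(nt <= MAX_THREADS);

        stage1[id] = id + 1;        /* phase 1: each thread fills its own slot */

        #pragma omp barrier         /* nobody starts phase 2 until all slots are written */

        int right = (id + 1) % nt;  /* phase 2: read the neighbour's slot */
        if (stage1[right] != right + 1) {
            #pragma omp critical
            ok = 0;
        }
    }
    return ok;
}
```

Without the barrier a fast thread could read its neighbour's slot before it had been written, so the check would fail intermittently.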
OMP BARRIER • To avoid deadlocks, NEVER use !$OMP BARRIER inside any of these blocks! • !$OMP MASTER . . . . !$OMP END MASTER • !$OMP SECTIONS . . . . !$OMP END SECTIONS • !$OMP CRITICAL . . . . !$OMP END CRITICAL • !$OMP SINGLE . . . . !$OMP END SINGLE
NOWAIT clause • We have seen during the earlier discussion of the BARRIER directive that END DO/FOR, END SECTIONS, END SINGLE and END WORKSHARE all imply a barrier, where executing threads must wait until every one of them has finished its work and arrived there. • The NOWAIT clause removes this restriction, allowing threads that finish early to proceed straight to the instructions following the work-sharing construct without waiting for the other threads to catch up. • This reduces idle time and increases efficiency, but at the risk of producing wrong results! SO BE VERY CAREFUL! • Syntax: • Fortran: !$OMP DO do loop !$OMP END DO NOWAIT • C/C++: #pragma omp for nowait for loop • Similarly for END SECTIONS, END SINGLE and END WORKSHARE.
NOWAIT example • Two loops with no dependencies present an ideal opportunity for the NOWAIT clause.
!$OMP PARALLEL
!$OMP DO
      do j=1,n
         a(j) = c * b(j)
      end do
!$OMP END DO NOWAIT
!$OMP DO
      do i=1,m
         x(i) = sqrt(y(i)) * 2.0
      end do
!$OMP END PARALLEL
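The same pattern in C might look like the sketch below. The function name two_loops_nowait and its argument list are hypothetical, and the second loop body is a simplified stand-in for the sqrt expression in the Fortran version so that the example needs no maths library.

```c
#include <assert.h>

/* Two independent loops: neither reads anything the other writes,
 * so the barrier after the first loop can safely be dropped. */
void two_loops_nowait(int n, const double *b, double c, double *a,
                      int m, const double *y, double *x)
{
    assert(n >= 0 && m >= 0);

    #pragma omp parallel
    {
        #pragma omp for nowait      /* no barrier: threads move straight on */
        for (int j = 0; j < n; j++)
            a[j] = c * b[j];

        #pragma omp for             /* implied barrier at the end of this loop */
        for (int i = 0; i < m; i++)
            x[i] = 2.0 * y[i] + 1.0;
    }
}
```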
NOWAIT warning • Use with EXTREME CAUTION • Too easy to remove a barrier which is necessary. • Results in non-deterministic behaviour: • Sometimes the right result • Sometimes wrong results • Behaviour changes under debugger • Possibly a good coding style to use NOWAIT everywhere and make all barriers explicit • Not done in practice.
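NOWAIT warning example
!$OMP DO
      do j=1,n
         a(j)=b(j) + c(j)
      end do
!$OMP END DO
!$OMP DO
      do j=1,n
         d(j)=e(j) * f
      end do
!$OMP END DO
!$OMP DO
      do j=1,n
         z(j) = (a(j) + a(j+1)) * 0.5
      end do
In the third loop, a(j+1) may have been computed by a different thread from the one that computed a(j). The first barrier can be removed, but not the second: the third loop depends on a( ), and the second barrier is what guarantees that every thread has finished the first loop before any thread reads a( ).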
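A C sketch of the same three loops, with the first barrier removed and the second kept, is given below. The function name three_loops and its signature are hypothetical. As a further assumption, the first loop here fills n+1 elements of a so that the read of a[j+1] in the last loop stays within bounds.

```c
#include <assert.h>

/* a, b and c hold n+1 elements (assumption); d, e and z hold n. */
void three_loops(int n, const double *b, const double *c,
                 const double *e, double f,
                 double *a, double *d, double *z)
{
    assert(n >= 0);

    #pragma omp parallel
    {
        #pragma omp for nowait      /* safe: the next loop does not touch a */
        for (int j = 0; j <= n; j++)
            a[j] = b[j] + c[j];

        #pragma omp for             /* barrier kept: the next loop reads a */
        for (int j = 0; j < n; j++)
            d[j] = e[j] * f;

        #pragma omp for
        for (int j = 0; j < n; j++)
            z[j] = (a[j] + a[j + 1]) * 0.5;
    }
}
```

Adding nowait to the second loop as well would let a thread start the third loop while another thread was still writing a, which is exactly the kind of intermittent error the warning describes.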
OMP CRITICAL (Mutual Exclusion) A thread waits at the start of a critical section until no other thread is executing a section with the same critical name. This construct can be used to mark sections of code that, for example, change global flags once a particular task has been performed, so that the same work is not repeated. It is also useful for sectioning off code such as the updating of heaps and stacks, where simultaneous updating by competing threads may prove disastrous! The OMP ATOMIC directive is a better choice if the synchronization worries relate to a single specific memory location.
OMP CRITICAL EXAMPLE
!$OMP PARALLEL SHARED( MYDATA )
!$OMP CRITICAL (updatepart)
! Perform operations on the global/shared array WORK
! Which redefines WORK and then sets new flags to
! indicate what the next call to partition will see in MYDATA.
      CALL PARTITION ( I , MYDATA)
!$OMP END CRITICAL (updatepart)
! Now perform the work , that can be done in isolation
! Without affecting the other threads
      CALL SOLVE(MYDATA)
!$OMP END PARALLEL
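The stack-update use of CRITICAL mentioned earlier can be sketched in C as follows. The function name push_all, the critical name stack_update and the squared value used as a placeholder for real work are all hypothetical.

```c
#include <assert.h>

/* Pushes n values onto a shared stack; returns the final stack height.
 * stack must have room for at least n entries. */
int push_all(int n, int *stack)
{
    int top = 0;                    /* shared stack pointer */
    assert(n >= 0);

    #pragma omp parallel for shared(stack, top)
    for (int i = 0; i < n; i++) {
        int value = i * i;          /* independent work, done outside the critical section */

        #pragma omp critical (stack_update)
        {
            stack[top] = value;     /* only one thread at a time may update the stack */
            top++;
        }
    }
    return top;
}
```

The order of the entries depends on thread scheduling, but every value appears exactly once because the write and the increment of top cannot be interleaved between threads.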
OMP Atomic Unlike most of the other OMP directives, this directive applies to the single statement immediately following it, rather than to a block of statements. It ensures that a specific shared memory location is updated atomically, so that it is not exposed to simultaneous writes that may give rise to race conditions. It may be more efficient than using CRITICAL directives, e.g. if different array elements can be protected separately. By using the atomic directive we can be confident that no race will arise on the variable being updated. Only the update of that variable is protected; the evaluation of the expression on the right-hand side is not. Note that the ATOMIC directive does not impose any conditions on the order in which the threads execute the statement; it merely ensures that no two threads update the location simultaneously. See OMP ORDERED later.
ATOMIC directive Syntax
Fortran: !$OMP ATOMIC
         statement
where statement must be one of: x = x op expr, x = expr op x, x = intr(x, expr) or x = intr(expr, x). x is a scalar shared variable, op is one of +, *, -, /, .and., .or., .eqv., .neqv. and intr is one of the MAX, MIN, IAND, IOR or IEOR intrinsic functions.
C: #pragma omp atomic
   statement
where statement must be one of: x binop= expr, x++, ++x, x-- or --x. binop is one of +, *, -, /, &, ^, |, << or >> and expr is an expression of scalar type that does not reference the object designated by x.
ATOMIC example
!$OMP PARALLEL DO PRIVATE(xlocal,ylocal)
      DO i=1,n
         call work(xlocal,ylocal)
!$OMP ATOMIC
         x(index(i))=x(index(i))+xlocal
         y(i)=y(i)+ylocal
      END DO
Prevents simultaneous updates of an element of x by multiple threads. The ATOMIC directive allows different elements of x to be updated simultaneously, whereas a CRITICAL region would "serialise" all the updates. Note: the update on y is not atomic, as ATOMIC only applies to the statement that immediately follows the directive.
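An equivalent indexed update in C might be sketched as below. The function name scatter_add and its arguments are hypothetical; w[i] stands in for the xlocal value that the Fortran example obtains from work.

```c
#include <assert.h>

/* Adds w[i] into x[index[i]] for every i. Several iterations may hit the
 * same element of x, so each update is made atomic. Updates to different
 * elements can still proceed at the same time. */
void scatter_add(int n, const int *index, const double *w, double *x)
{
    assert(n >= 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double xlocal = w[i];       /* private to the iteration */

        #pragma omp atomic
        x[index[i]] += xlocal;
    }
}
```

Because floating-point addition is not associative, the result can differ in the last bits between runs when the contributions are not exactly representable; the atomic directive prevents lost updates but does not fix the order of the additions.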
Lock Routines Occasionally we need more flexibility than is offered by the CRITICAL and ATOMIC directives (although locks are not as easy to use). A lock is a special variable that may be set by a thread. No other thread may set the lock until the thread which set it has unset it. Setting a lock may be blocking (set_lock) or non-blocking (test_lock). A lock must be initialised before it is used and may be destroyed when no longer required. Lock variables should not be used for any other purpose.
Syntax
Fortran:
      SUBROUTINE OMP_INIT_LOCK(var)
      SUBROUTINE OMP_SET_LOCK(var)
      LOGICAL FUNCTION OMP_TEST_LOCK(var)
      SUBROUTINE OMP_UNSET_LOCK(var)
      SUBROUTINE OMP_DESTROY_LOCK(var)
var should be an INTEGER of the same size as addresses (e.g. INTEGER*8 on a 64-bit machine).
C/C++:
      #include <omp.h>
      void omp_init_lock(omp_lock_t *lock);
      void omp_set_lock(omp_lock_t *lock);
      int omp_test_lock(omp_lock_t *lock);
      void omp_unset_lock(omp_lock_t *lock);
      void omp_destroy_lock(omp_lock_t *lock);
Lock example
      call omp_init_lock(ilock)
!$OMP PARALLEL SHARED(ilock)
      :
      do while (.not. omp_test_lock(ilock))
         call something_else()
      end do
      call work()
      call omp_unset_lock(ilock)
      :
!$OMP END PARALLEL
OMP_TEST_LOCK sets the lock if it is not already set and returns .TRUE.; otherwise it returns .FALSE. without waiting.
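The same test-lock pattern in C could be sketched as follows. The function name lock_demo and the spin counter that stands in for something_else are hypothetical, and the serial fallback definitions are an assumption added only so that the sketch also builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption): minimal stand-ins for the lock routines. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static int  omp_test_lock(omp_lock_t *l)    { if (*l) return 0; *l = 1; return 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
static int  omp_get_num_threads(void)       { return 1; }
#endif

/* Returns 1 if every thread in the team entered the locked region once. */
int lock_demo(void)
{
    omp_lock_t lock;
    int entered = 0;                /* shared, only touched while holding the lock */
    int team = 0;

    omp_init_lock(&lock);

    #pragma omp parallel shared(lock, entered, team)
    {
        long spins = 0;
        while (!omp_test_lock(&lock))
            spins++;                /* stand-in for other useful work */

        entered++;                  /* the protected work */
        team = omp_get_num_threads();

        omp_unset_lock(&lock);
        (void)spins;
    }

    omp_destroy_lock(&lock);
    assert(team >= 1);
    return entered == team;
}
```

Using omp_set_lock instead of the test loop would give the blocking behaviour, at the cost of leaving waiting threads idle.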
FLUSH directive Ensures that a variable is written to/read from main memory. Variable will be flushed out of the register file (and usually out of cache). Also called a memory fence. Allows use of “normal” variables for synchronisation. Avoids the need for use of volatile type qualifiers in this context.
FLUSH syntax Fortran: !$OMP FLUSH [(list)] C/C++: #pragma omp flush [(list)] list specifies a list of variables to be flushed. If no list is present, all shared variables are flushed. FLUSH directives are implied by a BARRIER, at entry to and exit from CRITICAL and ORDERED sections, and at the end of PARALLEL, DO/FOR, SECTIONS and SINGLE directives (except when a NOWAIT clause is present).
FLUSH example
!$OMP PARALLEL PRIVATE(myid,i,neighb)
      :
      do j=1, niters
         do i=lb(myid),ub(myid)
            ! a(i+1) may be updated on a different thread:
            ! must wait for previous iteration to finish on neighbour
            a(i)=( a(i+1)+a(i))*0.5
         end do
         ndone(myid)=ndone(myid)+1
         ! Make sure write is to main memory
!$OMP FLUSH (ndone)
         ! Waits for neighbour
         do while (ndone(neighb).lt.ndone(myid))
            ! Make sure read is from main memory
!$OMP FLUSH (ndone)
         end do
      end do
Choosing Synchronisation • Use ATOMIC if possible. Allows the most optimisation. • If not possible, use CRITICAL. Use different names wherever possible. • If appropriate, use variable flushing. • As a last resort, use lock routines. Should be a rare occurrence in practice.
Practical: Molecular dynamics • Aim: to introduce atomic updates • Code is a simple MD simulation of the melting of solid argon. • Computation is dominated by the calculation of force pairs in the subroutine forces. • Parallelise this routine using a DO/FOR directive and atomic updates. • Watch out for PRIVATE and REDUCTION variables.
Practical: Image processing Aim: Introduction to the use of parallel DO/for loops. Simple image processing algorithm to reconstruct an image from an edge-detected version. Use parallel DO/for directives to run in parallel.
OpenMP resources • Web sites: http://www.openmp.org Official web site, including language specifications, links to compilers, tools and mailing lists. http://www.compunity.org OpenMP community site: links, events, resources. • Book: "Parallel programming in OpenMP", Chandra et al., Morgan Kaufmann, ISBN 1558606718. • PGI Users Guide on 'iceberg': https://iceberg.shef.ac.uk/docs/pgi52doc/pgi52ug.pdf