CPE779: More on OpenMP
Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois
PARALLEL DO: Restrictions
• The number of times that the loop body is executed (the trip count) must be available at runtime before the loop is executed.
• The loop body must allow every iteration to run to completion: nothing in the body may exit the loop early (for example, a break in C or an EXIT in Fortran).
CPE 779 Parallel Computing
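As an illustration of these restrictions in C, where the directive is written `#pragma omp parallel for`, the sketch below uses a loop whose trip count `n` is known on entry and whose body has no early exit. The function name `scale_in_place` is hypothetical and not taken from the slides.

```c
/* Hypothetical helper: multiply every element of a[0..n-1] by s.
   The trip count (n) is fixed before the loop starts and the body
   contains no break, so the loop meets both restrictions. */
void scale_in_place(double *a, int n, double s)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] *= s;
    }
}
```

A loop such as `while (!done)` or a `for` loop containing `break` would not qualify, because the runtime could not divide its iterations among threads in advance.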
PARALLEL Directive: Syntax
Fortran:
    !$omp parallel [clause [,] [clause …]]
        structured block
    !$omp end parallel
C/C++:
    #pragma omp parallel [clause [clause …]]
        structured block
Parallel Regions Details
• When a parallel directive is encountered, threads are spawned which execute the code of the enclosed structured block (the parallel region).
• The number of threads can be specified just like for the parallel do directive.
• The parallel region is replicated and each thread executes a copy of the replicated region.
Example
    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();   /* 0, 1, 2 or 3 */
        pooh(ID, A);
    }
    printf("all done\n");
Execution: omp_set_num_threads(4) requests four threads. Each thread runs its own copy of the region with its own ID, so pooh(0,A), pooh(1,A), pooh(2,A) and pooh(3,A) execute concurrently. After the region ends, a single thread prints "all done".
PARALLEL vs PARALLEL DO
• Arbitrary structured blocks vs loops
• Coarse grained vs fine grained
• Replication vs work division
Work division:
    !$omp parallel do
    do I = 1,10
       print *, 'Hello world', I
    enddo
Output: 10 Hello world messages
Replication:
    !$omp parallel
    do I = 1,10
       print *, 'Hello world', I
    enddo
    !$omp end parallel
Output: 10*T Hello world messages, where T = number of threads
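The difference between replication and work division can also be observed by counting how often the loop body runs. The C sketch below is an illustration only: the function names are hypothetical, and the shared counter is protected with the atomic directive that these slides introduce later.

```c
/* Work division: the n iterations are split among the threads,
   so the body runs n times in total. */
int count_divided(int n)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        count++;
    }
    return count;
}

/* Replication: every thread runs the whole loop,
   so the body runs n * T times for T threads. */
int count_replicated(int n)
{
    int count = 0;
    #pragma omp parallel
    {
        /* i is declared inside the region, so each thread has its own. */
        for (int i = 0; i < n; i++) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

With n = 10, the first function always returns 10, while the second returns a multiple of 10 that depends on how many threads the runtime provides.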
Synchronization: Motivation
• Concurrent access to shared data may result in data inconsistency. The mechanism required to maintain data consistency is mutual exclusion.
• Sometimes code sections executed by different threads need to be sequenced in some particular order. This is event synchronization.
Mutual Exclusion
• Mechanisms for ensuring the consistency of data that is accessed concurrently by several threads:
  • Critical directive: specifies a region of code that must be executed by only one thread at a time.
  • Atomic directive: specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it.
  • Library lock routines
Critical Section: Syntax
Fortran:
    !$omp critical [(name)]
        structured block
    !$omp end critical [(name)]
C/C++:
    #pragma omp critical [(name)]
        structured block
Example
    cur_max = MINUS_INFINITY
    !$omp parallel do
    do i = 1, n
       …
    !$OMP CRITICAL
       if (a(i) .gt. cur_max) then
          cur_max = a(i)
       endif
    !$OMP END CRITICAL
       …
    enddo
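For readers working in C, the same pattern might look like the sketch below. It is a hypothetical counterpart of the Fortran loop above, not code from the slides: the function name `array_max` is invented, and `-DBL_MAX` is assumed as a stand-in for MINUS_INFINITY.

```c
#include <float.h>

/* Hypothetical C counterpart of the Fortran loop above. */
double array_max(const double *a, int n)
{
    double cur_max = -DBL_MAX;   /* assumed stand-in for MINUS_INFINITY */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Only one thread at a time may compare and update cur_max. */
        #pragma omp critical
        {
            if (a[i] > cur_max) {
                cur_max = a[i];
            }
        }
    }
    return cur_max;
}
```

Both the comparison and the assignment sit inside the critical section. Protecting only the assignment would let another thread change `cur_max` between the test and the update.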
Atomic Directive
• The body of an atomic directive is a single assignment statement.
• There are restrictions on the statement which ensure that it can be translated into an atomic sequence of machine instructions to read, modify and write a memory location.
• An atomic statement must follow a specific syntax. See the most recent OpenMP specs for this.
Example
Using atomic:

C$OMP PARALLEL PRIVATE(B)
      B = DOIT(I)
C$OMP ATOMIC
      X = X + B
C$OMP END PARALLEL

The same update using a named critical section:

C$OMP PARALLEL PRIVATE(B)
      B = DOIT(I)
C$OMP CRITICAL(XB)
      X = X + B
C$OMP END CRITICAL(XB)
C$OMP END PARALLEL
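A C sketch of the atomic form is shown below. It is a loop-based variant written for illustration rather than a transcription of the Fortran example: `doit` is a hypothetical placeholder for DOIT, and `atomic_sum` is an invented name.

```c
/* Hypothetical stand-in for DOIT: any per-iteration computation. */
static double doit(int i)
{
    return (double)i;
}

/* Adds doit(0) ... doit(n-1) into a shared total. */
double atomic_sum(int n)
{
    double x = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double b = doit(i);   /* private: declared inside the loop */
        /* Only the update of x is protected; doit() runs in parallel. */
        #pragma omp atomic
        x += b;
    }
    return x;
}
```

The statement under the directive has the simple `x += expr` shape, which is one of the forms the atomic directive accepts. A larger block of statements would need a critical section instead.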
Library Lock Routines
• Routines to:
  • create a lock: omp_init_lock
  • acquire a lock, waiting until it becomes available if necessary: omp_set_lock
  • release a lock, resuming a waiting thread if one exists: omp_unset_lock
  • try to acquire a lock, but return instead of waiting if it is not available: omp_test_lock
  • destroy a lock: omp_destroy_lock
Example
    omp_lock_t lck;
    omp_init_lock(&lck);
    #pragma omp parallel private(tmp, id)   /* id must be private too */
    {
        id = omp_get_thread_num();
        tmp = do_lots_of_work(id);
        omp_set_lock(&lck);      /* wait for our turn to print */
        printf("%d %d", id, tmp);
        omp_unset_lock(&lck);    /* let the next thread print */
    }
    omp_destroy_lock(&lck);
Library Lock Routines
• Locks are the most flexible of the mutual exclusion primitives because there are no restrictions on where they can be placed.
• The previous routines don't support nested acquires: a thread that tries to acquire a lock it already holds will deadlock. A separate set of routines exists to allow nesting.
• Nesting of locks is useful for code like recursive routines.
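The separate set of routines uses the omp_nest_lock_t type together with omp_init_nest_lock, omp_set_nest_lock, omp_unset_nest_lock, omp_test_nest_lock and omp_destroy_nest_lock. The sketch below shows a recursive routine that re-acquires a lock it already holds. The names `add_down`, `nested_lock_demo`, `lck` and `total` are hypothetical, and the serial stand-ins under `#else` are an addition so that the sketch still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch still builds without OpenMP. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l)    { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l)     { (*l)++; }
static void omp_unset_nest_lock(omp_nest_lock_t *l)   { (*l)--; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

static omp_nest_lock_t lck;
static long total;   /* shared data protected by lck */

/* Hypothetical recursive routine: adds n + (n-1) + ... + 1 to total.
   Each level re-acquires the lock its caller already holds; with a
   plain omp_lock_t the second omp_set_lock would deadlock. */
static void add_down(int n)
{
    if (n == 0) return;
    omp_set_nest_lock(&lck);
    total += n;
    add_down(n - 1);
    omp_unset_nest_lock(&lck);
}

/* Every thread runs the recursion once, so the result is
   T * n * (n + 1) / 2 for T threads. */
long nested_lock_demo(int n)
{
    total = 0;
    omp_init_nest_lock(&lck);
    #pragma omp parallel
    {
        add_down(n);
    }
    omp_destroy_nest_lock(&lck);
    return total;
}
```

A nestable lock is released for other threads only when the owning thread has called omp_unset_nest_lock as many times as it called omp_set_nest_lock, which is why each recursion level pairs one set with one unset.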
Mutual Exclusion Features
• These apply to critical, atomic and the library lock routines alike:
  • No fairness guarantee.
  • Guarantee of progress.
  • Be careful when nesting: there are lots of chances for deadlock.