




  1. CPE 779: More on OpenMP
     Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois

  2. PARALLEL DO: Restrictions
     • The number of times the loop body is executed (the trip count) must be known at run time before the loop begins executing.
     • The loop body must be such that every iteration runs to completion: the loop may not be exited early, for example by branching out of it.
     CPE 779 Parallel Computing
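As a rough illustration of these restrictions in C, where the directive is spelled `parallel for`, the sketch below uses a loop whose trip count is fixed on entry and whose body never leaves the loop early. The function name `scale` is made up for this example and does not come from the slides.

```c
/* Multiply every element of a[0..n-1] by factor.
 * The trip count (n) is known before the loop starts and the body has
 * no break or goto out of the loop, so the iterations can be divided
 * among the threads. */
void scale(double *a, int n, double factor) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] *= factor;
    }
}

/* By contrast, a search loop such as the one below does not qualify as
 * written: how many iterations run depends on the data, and the loop
 * is exited early.
 *
 *     for (i = 0; i < n; i++)
 *         if (a[i] < 0.0) break;
 */
```

Compiled without OpenMP support, the pragma is ignored and `scale` simply runs serially with the same result.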

  3. PARALLEL Directive: Syntax
     Fortran:
         !$omp parallel [clause [,] [clause …]]
             structured block
         !$omp end parallel
     C/C++:
         #pragma omp parallel [clause [clause …]]
             structured block

  4. Parallel Regions Details
     • When a parallel directive is encountered, threads are spawned which execute the code of the enclosed structured block (the parallel region).
     • The number of threads can be specified just as for the parallel do directive.
     • The parallel region is replicated, and each thread executes a copy of the replicated region.

  5. Example
         double A[1000];
         omp_set_num_threads(4);
         #pragma omp parallel
         {
             int ID = omp_get_thread_num();
             pooh(ID, A);
         }
         printf("all done\n");
     Each of the four threads executes the block once, so pooh(0,A), pooh(1,A), pooh(2,A) and pooh(3,A) run concurrently. "all done" is printed once, after all threads have finished.

  6. PARALLEL vs PARALLEL DO
     • Arbitrary structured blocks vs. loops
     • Coarse-grained vs. fine-grained
     • Replication vs. work division
     PARALLEL DO:
         !$omp parallel do
         do I = 1,10
             print *, 'Hello world', I
         enddo
     Output: 10 Hello world messages
     PARALLEL:
         !$omp parallel
         do I = 1,10
             print *, 'Hello world', I
         enddo
         !$omp end parallel
     Output: 10*T Hello world messages, where T = number of threads
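The replication vs. work-division contrast can be checked with a small C sketch that counts loop-body executions instead of printing. The function names are hypothetical, and the `_OPENMP` guard is an assumption made here so that the code also builds (and runs serially) without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Work division: the 10 iterations are split among the threads,
 * so the body runs 10 times in total. */
int count_parallel_for(void) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 1; i <= 10; i++) {
        #pragma omp atomic
        count++;
    }
    return count;
}

/* Replication: every thread executes the whole loop, so the body runs
 * 10 times per thread. The team size is stored in *nthreads. */
int count_parallel_region(int *nthreads) {
    int count = 0;
    int team = 1;               /* stays 1 when built without OpenMP */
    #pragma omp parallel
    {
#ifdef _OPENMP
        #pragma omp single
        team = omp_get_num_threads();
#endif
        for (int i = 1; i <= 10; i++) {
            #pragma omp atomic
            count++;
        }
    }
    *nthreads = team;
    return count;
}
```

Calling `count_parallel_region(&t)` should return `10 * t`, while `count_parallel_for()` should return 10 regardless of the number of threads.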

  7. Synchronization - Motivation
     • Concurrent access to shared data may leave the data inconsistent, so a mechanism is required to maintain data consistency: mutual exclusion.
     • Sometimes code sections executed by different threads need to be sequenced in some particular order: event synchronization.

  8. Mutual Exclusion
     Mechanisms for ensuring the consistency of data that is accessed concurrently by several threads:
     • Critical directive: specifies a region of code that must be executed by only one thread at a time.
     • Atomic directive: specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it.
     • Library lock routines

  9. Critical Section: Syntax
     Fortran:
         !$omp critical [(name)]
             structured block
         !$omp end critical [(name)]
     C/C++:
         #pragma omp critical [(name)]
             structured block

  10. Example
          cur_max = MINUS_INFINITY
          !$omp parallel do
          do i = 1, n
              …
              !$OMP CRITICAL
              if (a(i) .gt. cur_max) then
                  cur_max = a(i)
              endif
              !$OMP END CRITICAL
              …
          enddo
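A possible C counterpart of the Fortran example above is sketched below. The function name `max_with_critical` is hypothetical, and `-DBL_MAX` is an assumed stand-in for MINUS_INFINITY.

```c
#include <float.h>

/* Largest element of a[0..n-1]; returns -DBL_MAX for an empty array. */
double max_with_critical(const double *a, int n) {
    double cur_max = -DBL_MAX;      /* plays the role of MINUS_INFINITY */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Only one thread at a time may compare and update cur_max. */
        #pragma omp critical
        {
            if (a[i] > cur_max) {
                cur_max = a[i];
            }
        }
    }
    return cur_max;
}
```

Because every iteration enters the critical section, this version largely serializes the loop. It is meant to show the directive, not to be a fast way of computing a maximum.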

  11. Atomic Directive
      • The body of an atomic directive is a single assignment statement.
      • There are restrictions on the statement which ensure that it can be translated into an atomic sequence of machine instructions to read, modify and write a memory location.
      • An atomic statement must follow a specific syntax; see the most recent OpenMP specification for details.
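As an illustration of a statement that fits the atomic directive, the C sketch below updates histogram bins with a single `x += expr` statement. The function name `histogram` and the assumption that every value lies in [0, nbins) are made up for this example.

```c
/* Count how many values fall in each of nbins buckets.
 * Assumes 0 <= values[i] < nbins for every i. */
void histogram(const int *values, int n, int *bins, int nbins) {
    for (int b = 0; b < nbins; b++) {
        bins[b] = 0;
    }
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* One update statement on one memory location: the
         * read-modify-write of bins[...] is performed atomically. */
        #pragma omp atomic
        bins[values[i]] += 1;
    }
}
```

Unlike a critical section, the atomic update only protects the single memory location being modified, so threads updating different bins need not wait for one another.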


  13. Library Lock Routines
      Routines to:
      • create a lock: omp_init_lock
      • acquire a lock, waiting until it becomes available if necessary: omp_set_lock
      • release a lock, resuming a waiting thread if one exists: omp_unset_lock
      • try to acquire a lock, but return instead of waiting if it is not available: omp_test_lock
      • destroy a lock: omp_destroy_lock

  14. Example
          omp_lock_t lck;
          omp_init_lock(&lck);
          #pragma omp parallel private(tmp, id)
          {
              id = omp_get_thread_num();
              tmp = do_lots_of_work(id);
              omp_set_lock(&lck);
              printf("%d %d", id, tmp);
              omp_unset_lock(&lck);
          }
          omp_destroy_lock(&lck);
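A self-contained variation on the example above, exercising all five lock routines, might look like the following sketch. The function name `locked_sum` is hypothetical, and the serial stand-ins in the `#else` branch are an assumption added here so the code also builds without OpenMP; they are not part of the OpenMP library.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption of this sketch, not real OpenMP code). */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static int  omp_test_lock(omp_lock_t *l)    { if (*l) return 0; *l = 1; return 1; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Sum 1..n, protecting the shared total with a lock. */
long locked_sum(int n) {
    omp_lock_t lck;
    long total = 0;
    omp_init_lock(&lck);
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        if (i % 2 == 0) {
            omp_set_lock(&lck);              /* wait until available */
        } else {
            while (!omp_test_lock(&lck)) {
                /* Lock is busy: a real program could do other
                 * useful work here instead of just retrying. */
            }
        }
        total += i;                          /* protected update */
        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);
    return total;
}
```

Even and odd iterations acquire the lock in different ways only to show both routines side by side; either one alone would be enough to protect `total`.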

  15. Library Lock Routines
      • Locks are the most flexible of the mutual exclusion primitives because there are no restrictions on where they can be placed.
      • The previous routines do not support nested acquires: a thread that tries to acquire a lock it already holds will deadlock. A separate set of routines (the omp_*_nest_lock family) exists to allow nesting.
      • Nesting of locks is useful for code such as recursive routines.
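To make the point about recursive routines concrete, here is a minimal sketch using the nestable lock routines. The names `counter_t`, `visit` and `count_visits` are hypothetical, and the serial stand-ins in the `#else` branch are an assumption so the sketch also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption of this sketch): track nesting depth. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l)    { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l)     { (*l)++; }
static void omp_unset_nest_lock(omp_nest_lock_t *l)   { (*l)--; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

typedef struct {
    omp_nest_lock_t lock;
    long visited;
} counter_t;

/* Recursive routine: each level acquires the lock, although the caller
 * one level up (same thread) already holds it. */
static void visit(counter_t *c, int depth) {
    omp_set_nest_lock(&c->lock);     /* nested acquire is allowed */
    c->visited++;
    if (depth > 0) {
        visit(c, depth - 1);
    }
    omp_unset_nest_lock(&c->lock);   /* one release per acquire */
}

/* Run `calls` recursive visits, possibly on different threads. */
long count_visits(int calls, int depth) {
    counter_t c;
    c.visited = 0;
    omp_init_nest_lock(&c.lock);
    #pragma omp parallel for
    for (int k = 0; k < calls; k++) {
        visit(&c, depth);
    }
    omp_destroy_nest_lock(&c.lock);
    return c.visited;
}
```

With the simple lock routines, the second `omp_set_lock` call made by the same thread inside the recursion would never return; the nestable variant instead counts acquisitions and releases the lock only when the outermost level unsets it.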

  16. Mutual Exclusion Features
      These apply to critical, atomic and the library lock routines alike:
      • No fairness guarantee.
      • Guarantee of progress.
      • Be careful when nesting: there are many opportunities for deadlock.

