CPE779: More on OpenMP

Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois.

Presentation Transcript


  1. CPE779: More on OpenMP Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois

  2. PARALLEL DO: Restrictions
     • The number of times the loop body is executed (the trip count) must be known at run time before the loop starts executing.
     • The loop body must allow every iteration to run to completion; for example, it must not branch out of the loop early.
     CPE 779 Parallel Computing
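
The two restrictions can be illustrated in C, where the counterpart of PARALLEL DO is `#pragma omp parallel for`. This is a minimal sketch; the function names `scale` and `count_until_negative` are made up for illustration and do not come from the slides.

```c
/* Hypothetical helper: scales n elements in place.
 * The trip count (n) is known before the loop starts and no iteration
 * leaves the loop early, so the loop meets both restrictions. */
void scale(double *a, int n, double factor)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] *= factor;
    }
}

/* Counter-example, left serial on purpose: the loop stops at the first
 * negative value, so the number of iterations depends on the data and
 * is not available before the loop executes. */
int count_until_negative(const double *a, int n)
{
    int i = 0;
    while (i < n && a[i] >= 0.0) {
        i++;
    }
    return i;
}
```

The second loop would have to be restructured, for instance by first locating the stopping point, before its work could be divided among threads.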

  3. PARALLEL Directive: Syntax
     Fortran:
         !$omp parallel [clause [,] [clause …]]
             structured block
         !$omp end parallel
     C/C++:
         #pragma omp parallel [clause [clause …]]
             structured block

  4. Parallel Regions: Details
     • When a parallel directive is encountered, threads are spawned which execute the code of the enclosed structured block (the parallel region).
     • The number of threads can be specified just as for the parallel do directive.
     • The parallel region is replicated, and each thread executes a copy of the replicated region.

  5. Example
         double A[1000];
         omp_set_num_threads(4);
         #pragma omp parallel
         {
             int ID = omp_get_thread_num();
             pooh(ID, A);
         }
         printf("all done\n");
     With four threads, the region is executed four times, as pooh(0,A), pooh(1,A), pooh(2,A), and pooh(3,A), one call per thread; "all done" is printed once, after the parallel region ends.

  6. PARALLEL vs. PARALLEL DO
     • Arbitrary structured blocks vs. loops
     • Coarse-grained vs. fine-grained
     • Replication vs. work division
     With parallel do:
         !$omp parallel do
         do I = 1,10
             print *, 'Hello world', I
         enddo
     Output: 10 Hello world messages
     With parallel:
         !$omp parallel
         do I = 1,10
             print *, 'Hello world', I
         enddo
         !$omp end parallel
     Output: 10*T Hello world messages, where T = number of threads
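
The difference between replication and work division can also be checked by counting loop-body executions instead of printing. The following C sketch is an illustration only; the struct `counts` and the function `compare_parallel_constructs` are hypothetical names, and the atomic updates are there solely to make the counters safe.

```c
/* Results of the two experiments (names are illustrative). */
struct counts {
    int threads;      /* size of the team that ran the parallel region */
    int replicated;   /* loop bodies executed under "parallel" */
    int divided;      /* loop bodies executed under "parallel for" */
};

struct counts compare_parallel_constructs(void)
{
    struct counts c;
    int threads = 0, replicated = 0, divided = 0;
    int i;

    /* "parallel": every thread runs the whole 10-iteration loop. */
    #pragma omp parallel
    {
        int j;  /* declared inside the region, so private to each thread */
        #pragma omp atomic
        threads++;
        for (j = 0; j < 10; j++) {
            #pragma omp atomic
            replicated++;
        }
    }

    /* "parallel for": the 10 iterations are divided among the threads. */
    #pragma omp parallel for
    for (i = 0; i < 10; i++) {
        #pragma omp atomic
        divided++;
    }

    c.threads = threads;
    c.replicated = replicated;
    c.divided = divided;
    return c;
}
```

Whatever team size the runtime chooses, `replicated` should come out as 10 times `threads`, while `divided` stays at 10.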

  7. Synchronization: Motivation
     • Concurrent access to shared data may leave the data inconsistent. A mechanism is required to keep it consistent: mutual exclusion.
     • Sometimes code sections executed by different threads need to run in a particular order: event synchronization.

  8. Mutual Exclusion
     • Mechanisms for ensuring the consistency of data that is accessed concurrently by several threads:
       • Critical directive: specifies a region of code that must be executed by only one thread at a time.
       • Atomic directive: specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it.
       • Library lock routines

  9. Critical Section: Syntax
     Fortran:
         !$omp critical [(name)]
             structured block
         !$omp end critical [(name)]
     C/C++:
         #pragma omp critical [(name)]
             structured block

  10. Example
          cur_max = MINUS_INFINITY
          !$omp parallel do
          do i = 1, n
              …
              !$OMP CRITICAL
              if (a(i) .gt. cur_max) then
                  cur_max = a(i)
              endif
              !$OMP END CRITICAL
              …
          enddo
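
The same maximum search might look as follows in C. This is a sketch under assumptions: `-DBL_MAX` stands in for the slide's MINUS_INFINITY, the function name `max_with_critical` is invented here, and the elided parts of the loop body are simply left out.

```c
#include <float.h>

/* Returns the largest of the n values in a (or -DBL_MAX if n is 0). */
double max_with_critical(const double *a, int n)
{
    double cur_max = -DBL_MAX;   /* assumed stand-in for MINUS_INFINITY */
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* Only one thread at a time may compare against and update
         * the shared cur_max. */
        #pragma omp critical
        {
            if (a[i] > cur_max) {
                cur_max = a[i];
            }
        }
    }
    return cur_max;
}
```

Because every iteration enters the critical section, the comparisons themselves run one at a time; the construct pays off when the rest of the loop body does substantial independent work.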

  11. Atomic Directive
      • The body of an atomic directive is a single assignment statement.
      • There are restrictions on the statement which ensure that it can be translated into an atomic sequence of machine instructions to read, modify, and write a memory location.
      • An atomic statement must follow a specific syntax; see the most recent OpenMP specification for details.

  12. Example
      With atomic:
          C$OMP PARALLEL PRIVATE(B)
                B = DOIT(I)
          C$OMP ATOMIC
                X = X + B
          C$OMP END PARALLEL
      With a named critical section:
          C$OMP PARALLEL PRIVATE(B)
                B = DOIT(I)
          C$OMP CRITICAL(XB)
                X = X + B
          C$OMP END CRITICAL(XB)
          C$OMP END PARALLEL
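
A C version of the atomic update could be sketched as below. Note the assumptions: the slide uses a plain parallel region, whereas this sketch uses a parallel loop so that each iteration supplies its own index, and `doit` is a hypothetical stand-in for DOIT that just returns its argument.

```c
/* Hypothetical stand-in for DOIT(I) in the slide. */
static double doit(int i)
{
    return (double)i;
}

/* Accumulates doit(0) + ... + doit(n-1) into a shared variable. */
double sum_with_atomic(int n)
{
    double x = 0.0;
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        double b = doit(i);   /* private: declared inside the loop body */
        /* The single update statement below is performed atomically. */
        #pragma omp atomic
        x += b;
    }
    return x;
}
```

Only the statement `x += b` is protected; the call to `doit` runs concurrently in all threads.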

  13. Library Lock Routines
      • Routines to:
        • create a lock: omp_init_lock
        • acquire a lock, waiting until it becomes available if necessary: omp_set_lock
        • release a lock, resuming a waiting thread if one exists: omp_unset_lock
        • try to acquire a lock, but return instead of waiting if it is not available: omp_test_lock
        • destroy a lock: omp_destroy_lock
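
As an illustration of omp_test_lock, the sketch below polls the lock instead of blocking on it. The function name `sum_with_test_lock` is hypothetical, and the `#else` branch holds serial stand-ins that are an assumption of this sketch, present only so that it also builds when the compiler is not run with OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (not part of OpenMP) for builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static int  omp_test_lock(omp_lock_t *l)    { if (*l) return 0; *l = 1; return 1; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Adds 0 + 1 + ... + (n-1) into a shared total guarded by a lock. */
long sum_with_test_lock(int n)
{
    omp_lock_t lck;
    long total = 0;
    int i;

    omp_init_lock(&lck);
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* omp_test_lock returns nonzero once the lock has been acquired. */
        while (!omp_test_lock(&lck)) {
            /* Lock busy: a real program could do other useful work here. */
        }
        total += i;
        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);
    return total;
}
```

With omp_set_lock in place of the polling loop, the result would be the same; omp_test_lock is worth the extra code only when there is something useful to do while the lock is held elsewhere.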

  14. Example
          omp_lock_t lck;
          omp_init_lock(&lck);               /* create the lock */
          #pragma omp parallel private(tmp, id)
          {
              id = omp_get_thread_num();
              tmp = do_lots_of_work(id);
              omp_set_lock(&lck);            /* wait for our turn to print */
              printf("%d %d", id, tmp);
              omp_unset_lock(&lck);          /* let the next thread print */
          }

  15. Library Lock Routines
      • Locks are the most flexible of the mutual exclusion primitives because there are no restrictions on where they can be placed.
      • The previous routines do not support nested acquires: a thread that tries to acquire a lock it already holds deadlocks. A separate set of routines exists to allow nesting.
      • Nesting of locks is useful for code such as recursive routines.
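
The separate set of routines is the omp_*_nest_lock family, which operates on omp_nest_lock_t. The sketch below shows a recursive routine re-acquiring a lock it already holds. The names `add_levels` and `run_nested_lock_demo` are hypothetical, and the `#else` branch contains serial stand-ins assumed here only so that the sketch also builds without OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (not part of OpenMP) for builds without OpenMP. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l)    { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l)     { (*l)++; }
static void omp_unset_nest_lock(omp_nest_lock_t *l)   { (*l)--; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

static omp_nest_lock_t lock;   /* guards total */
static long total;

/* Hypothetical recursive routine: adds depth + (depth-1) + ... + 1 to
 * total. Each level re-acquires the lock its caller already holds,
 * which would deadlock with a plain omp_lock_t. */
static void add_levels(int depth)
{
    if (depth == 0) {
        return;
    }
    omp_set_nest_lock(&lock);
    total += depth;
    add_levels(depth - 1);
    omp_unset_nest_lock(&lock);
}

/* Runs add_levels(depth) once per iteration, in parallel. */
long run_nested_lock_demo(int calls, int depth)
{
    int i;
    total = 0;
    omp_init_nest_lock(&lock);
    #pragma omp parallel for
    for (i = 0; i < calls; i++) {
        add_levels(depth);
    }
    omp_destroy_nest_lock(&lock);
    return total;
}
```

A nestable lock becomes available to other threads only after it has been released as many times as it was acquired, so each outermost call to `add_levels` updates `total` without interference.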

  16. Mutual Exclusion Features
      • These apply to critical, atomic, and the library lock routines alike:
        • No fairness guarantee.
        • Guarantee of progress.
        • Be careful when nesting: there are many opportunities for deadlock.
