OpenMP fundamentals Nikita Panov (nikita.v.panov@intel.com)
OpenMP is • An application programming interface (API) that supports shared-memory programming for C/C++ and Fortran • Pros: • Simple • Cross-platform • Small overhead • Data parallelism support
Usage • Compiler directives: • C/C++ • #pragma omp directive [clause, …] • Fortran • !$OMP directive [clause, …] • C$OMP directive [clause, …] • *$OMP directive [clause, …]
Parallel execution • Parallel Regions • Main OpenMP directive • #pragma omp parallel #pragma omp parallel { printf( "hello world from thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads() ); }
Parallel execution • Most of the OpenMP instructions are preprocessor directives • The main construct is "#pragma omp parallel [clause]"
OpenMP parallel model • Memory is shared • The task is divided among threads. Variables can be • shared by all threads • private, available to only one thread • Careless or incorrect variable usage can lead to wrong execution results.
OpenMP parallel model • Fork-join model • Program execution starts with the master thread • At an OpenMP directive the master thread creates additional threads • After the parallel region finishes, all threads are synchronized • The master thread continues executing the sequential part
Main OpenMP constructs • #pragma omp for • Each thread gets its own share of the data – data parallelism • #pragma omp sections • Each section is executed in a separate thread – functional parallelism • #pragma omp single • Sequential execution: only one thread executes this code
OpenMP sections #pragma omp sections [ clause [ clause ] ... ] new-line { [#pragma omp section new-line ] structured-block1 [#pragma omp section new-line structured-block2 ] ... }
OpenMP sections Functional Parallelism #pragma omp parallel #pragma omp sections nowait { thread1_work(); #pragma omp section thread2_work(); #pragma omp section thread3_work(); #pragma omp section thread4_work(); }
OpenMP for directive • #pragma omp for [ clause [ clause ] ... ] The following loop will be executed in parallel (the iterations are divided among the executing threads)
OpenMP for directive #pragma omp parallel private(f) { f=7; #pragma omp for for (i=0; i<20; i++) a[i] = b[i] + f * (i+1); } /* omp end parallel */
OpenMP for directive Available clauses: private( list ) reduction( operator: list ) schedule( type [ , chunk ] ) nowait (for #pragma omp for) At the end of the loop all threads are synchronized unless the nowait clause is specified • schedule defines how the iteration space is divided (the default behaviour depends on the OpenMP version)
OpenMP variables private ( list ) Each listed variable has a local copy in every execution thread shared ( list ) All threads share the same instance of the variable • firstprivate( list ) • Each local copy is initialized with the master thread's value • lastprivate( list ) • The master copy receives the value from the sequentially last iteration • … All variables are shared by default, except local variables inside called functions and loop iteration variables
Example int x; x = 0; // Initialize x to zero #pragma omp parallel for firstprivate(x) // Copy value of x from master for (i = 0; i < 10000; i++) { x = x + i; } printf( "x is %d\n", x ); // Prints 0: the private copies are discarded /* Needs lastprivate(x) to copy a value back out to master */
OpenMP schedule clause schedule( type [ , chunk ] ) static: every thread gets a fixed amount of data dynamic: the amount of data depends on the thread execution speed guided: threads dynamically get decreasing amounts of data runtime: the schedule type is defined at runtime
Main OpenMP functions int omp_get_num_threads(void); int omp_get_thread_num(void); … http://www.openmp.org/
OpenMP synchronization Implicit synchronization is performed at the end of any parallel or worksharing construct (for worksharing constructs it can be disabled with the nowait clause)
OpenMP synchronization • critical – can be executed only by one thread at a time • atomic – a special version of the critical section for atomic operations • barrier – a synchronization point • ordered – sequential execution • master – only the master thread executes the following code • …
OpenMP critical cnt = 0; f = 7; #pragma omp parallel { #pragma omp for for (i=0; i<20; i++) { if (b[i] == 0) { #pragma omp critical cnt++; } /* endif */ a[i] = b[i] + f * (i+1); } /* end for */ } /* omp end parallel */
More information OpenMP Homepage: http://www.openmp.org/ • Introduction to OpenMP - tutorial from WOMPEI 2000 (link) • Writing and Tuning OpenMP Programs on Distributed Shared Memory Machines (link) R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, R. Menon: Parallel Programming in OpenMP. Academic Press, San Diego, USA, 2000, ISBN 1-55860-671-8 R. Eigenmann, Michael J. Voss (Eds.): OpenMP Shared Memory Parallel Programming. Springer LNCS 2104, Berlin, 2001, ISBN 3-540-42346-X