OpenMP (Open Multi-Processing)
David Valentine, Computer Science, Slippery Rock University
The buzz for OpenMP
• There are more than a dozen events at SC12 with “OpenMP” in their titles.
• OpenMP is celebrating 15 years: booth #2237
• API designed for C/C++ and Fortran
• Shared-memory parallelism, in a multicore world
• As such, it offers an incremental learning curve for current programmers:
• Start with serial code
• Grab the obvious parallelizable sections to get the quickest results (Amdahl’s Law).
Shared Memory Parallelism
• Our world has already gone multicore.
• How best can we take advantage of the cores already on the desktop without jumping into the weeds of low-level thread manipulation?
• There are several choices:
• OpenMP
• Cilk
• Threading Building Blocks (TBB)
OpenMP (Open Multi-Processing): OpenMP.org
• Started in 1997 as a continuation of ANSI X3H5
• Supported by industry (HP, IBM, Intel, Sun, et al.) and government (DoE)
• Designed for shared-memory, multicore systems
• Thread-based parallelism
• Explicit programmer control
• Fork-join model
OpenMP Fork-Join Model
• Explicit programmer control
• Can use the thread number (omp_get_thread_num()) to assign different tasks per thread in the parallel region
[Fork-join diagram omitted; image credit: Wikipedia]
OpenMP
• Made of 3 components:
• Compiler directives (20 as of 3.1)
• #pragma omp parallel will spawn a parallel region
• Run-time library routines (32)
• int myNum = omp_get_thread_num();
• Environment variables (9)
• setenv OMP_NUM_THREADS 8
OpenMP Goals
• Their 4 stated goals are:
• Standardization
• Lean and Mean
• Ease of Use
• Portability
• CS2 students see their programs “go parallel” with just 2 or 3 lines of additional code!
• At this level we are just exposing them to the concept of multicore, shared-memory parallelism
General Code Structure (from https://computing.llnl.gov/tutorials/openMP/#Abstract)

#include <omp.h>

main () {
    int var1, var2, var3;
    // Serial code ...

    // Beginning of parallel section. Fork a team of threads.
    // Specify variable scoping.
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        // Parallel section executed by all threads
        // Other OpenMP directives
        // Run-time library calls
        // All threads join master thread and disband
    } //parallel block

    // Resume serial code ...
} //main
The obligatory Hello World example

#include <omp.h>
#include <stdio.h>

int main() {
    printf("Getting started...\n\n");

    #pragma omp parallel
    printf("Hello World from thread %i of %i\n",
           omp_get_thread_num(), omp_get_num_threads());

    printf("\nThat's all Folks!\n");
    return 0;
}

• Compile with OpenMP enabled:
• Visual Studio: Project – Properties – Configuration Properties – C/C++ – Language – OpenMP Support – YES
• Or gcc uses -fopenmp
For CS1/CS2
• NB: most programs are severely I/O bound
• But, we are looking for only:
• a simple counting loop (for),
• where each iteration is independent, and
• which has enough work to distribute across our cores
• The first two requirements are easy; the third one can involve “handicapping” the loop work
• We won’t show them nearly all of OpenMP; we just want to whet their appetites here
• Tell them the Truth, tell them nothing but the Truth, but for heaven’s sake don’t tell them ALL the Truth!
e.g. Trapezoidal Rule

float trap(float xLo, float xHi, int numIntervals) {
    float area;   //area under the curve (the integral)
    float width;  //width of each trapezoid
    float x;      //our points of evaluation
    float sum;    //sum up all our f(x)'s

    sum = 0.0;                          //init our summing var
    width = (xHi - xLo)/numIntervals;   //width of each trap
    for (int i = 1; i < numIntervals; i++) { //get the interior points
        x = xLo + i*width;              //each iter. independent of others
        sum += f(x);                    //add the interior value
    } //for
    sum += (f(xLo) + f(xHi))/2.0;       //add the endpoints
    area = width * sum;                 //calc the total area
    return area;                        //return the approximation
} //trap
e.g. Trapezoidal Rule
• Students add two lines:
• #include <omp.h>
• the #pragma
• When they see the cores all “redline” at 100%, they are hooked.

#pragma omp parallel for private(x) reduction(+:sum)
for (int i = 1; i < numIntervals; i++) { //get the interior points
    x = xLo + i*width;  //each iteration independent of others
    sum += f(x);        //add the interior value
} //for