Javier Delgado
Grid Enablement of Scientific Applications
Professor S. Masoud Sadjadi
Shared Memory Programming with OpenMP
OpenMP Programming - GCB
Outline
• Motivation for OpenMP
• Basics
• Work Sharing Constructs
• Synchronization
• Data Sharing and Scope
• Example Program
Motivation
• Message Passing Model not optimized for Shared Memory
  • Hard to code
  • “All or nothing”
• Traditional Threading Libraries not suitable
  • Overly complicated
  • Little Fortran support
Brief History
• ANSI X3H5
  • Not formally adopted
  • Only basic parallelism support (i.e. loops)
• Pthreads
  • Too complicated for HPC applications
  • Little support for Fortran
• Custom/Proprietary solutions
  • Not portable
• OpenMP: improves upon X3H5, keeping Scientific Applications in mind
Outline
• Motivation for OpenMP
• Basics
  • What it is
  • Design Goals
  • Model
• Work Sharing Constructs
• Synchronization
• Data Sharing and Scope
• Example Program
OpenMP: What is it?
• API for multi-threaded, shared memory parallelism
  • Compiler Directives
  • Runtime Library Routines
  • Environment Variables
• Abstraction of low-level threading constructs
• Optimized for HPC
• Extensions for Fortran, C, and C++
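The three components listed above can be seen together in a minimal C sketch. The function names `max_team_size` and `sum_to` are hypothetical helpers made up for illustration; `omp_get_max_threads` and the `OMP_NUM_THREADS` environment variable are standard OpenMP. The `_OPENMP` guard is an assumption added so that the file also builds with a compiler that ignores the directives.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* declares the runtime library routines */
#endif

/* Hypothetical helper: upper bound on the team size of the next
   parallel region. */
int max_team_size(void)
{
    int limit = 1;                   /* serial build: a single thread */
#ifdef _OPENMP
    limit = omp_get_max_threads();   /* runtime library routine */
#endif
    assert(limit >= 1);
    return limit;
}

/* Hypothetical helper: sums 1..n using a compiler directive. */
long sum_to(int n)
{
    long total = 0;

    /* Compiler directive: split the loop among threads and combine
       the per-thread partial sums. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += i;

    return total;
}
```

Setting the environment variable before launch, for example `OMP_NUM_THREADS=4`, changes what `max_team_size` reports without recompiling.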
Design Goals
• Leanness
  • Simple and limited set of directives
• Incremental parallelization of serial applications
• Simplicity for implementing scientific applications
Model
• Shared Memory, thread-based parallelism
• Programmer has full control
• Fork-join execution pattern
Fork-Join Model
[Figures: fork-join execution diagrams; sources: http://www.mhpcc.edu and http://dimsboiv.uqac.ca]
Fork-Join Model
• All threads execute the parallel region
• I/O atomicity and synchronization are the programmer's responsibility
• If one thread fails in the parallel region, they all do
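A minimal C sketch of the fork-join pattern follows. The function name `count_team_members` is hypothetical. Every thread in the team runs the block once, so the counter ends up equal to the team size (1 if the compiler ignores the directives).

```c
#include <assert.h>

/* Hypothetical helper: runs one parallel region and returns how many
   threads executed it. */
int count_team_members(void)
{
    int members = 0;   /* shared by all threads in the team */

    /* Fork: the master thread creates a team, and every thread in the
       team executes the block below. */
    #pragma omp parallel
    {
        /* The shared counter is updated by one thread at a time. */
        #pragma omp atomic
        members++;
    }
    /* Join: implicit barrier at the end of the region; only the master
       thread continues from here. */

    assert(members >= 1);
    return members;
}
```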
Outline
• Motivation for OpenMP
• Basics
• Work Sharing Constructs
  • Loops
  • Sections
• Synchronization
• Data Sharing and Scope
• Example Program
Loops
• Distribute iterations amongst threads
• #pragma omp for [clause ... ]
• Clauses
  • SCHEDULE
  • NOWAIT
  • ORDERED
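As a sketch of the loop construct in C, the hypothetical function `add_and_scale` below places two `omp for` loops inside one parallel region. The two loops write to different arrays, which is the assumption that makes `nowait` safe on the first one.

```c
#include <assert.h>

/* Hypothetical helper: sum[i] = a[i] + b[i] and twice[i] = 2 * a[i]. */
void add_and_scale(const double *a, const double *b,
                   double *sum, double *twice, int n)
{
    assert(n >= 0);

    #pragma omp parallel
    {
        /* Iterations are divided among the threads of the team.
           nowait drops the barrier at the end of this loop, so a thread
           that finishes early moves straight on to the next loop. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            sum[i] = a[i] + b[i];

        /* Independent of the first loop; keeps its implicit barrier. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            twice[i] = 2.0 * a[i];
    }
}
```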
Scheduling
• The SCHEDULE clause describes the mapping of loop iterations to threads
• Types
  • STATIC: divide iterations evenly amongst threads
  • DYNAMIC: assign chunks of iterations to threads as the threads become available
  • GUIDED: dynamically reassign, with exponentially declining “chunk” size
  • RUNTIME: divide according to an environment variable (OMP_SCHEDULE)
Static Scheduling
[Figure: timeline of iterations assigned to Threads 1-3; chunk size: 2 iterations; source: http://navet.ics.hawaii.edu/~casanova]
Dynamic Scheduling
[Figure: timeline of iterations assigned to Threads 1-3; chunk size: 2 iterations; source: http://navet.ics.hawaii.edu/~casanova]
Guided Scheduling
[Figure: timeline of iterations assigned to Threads 1-3; chunk size: 2 iterations; source: http://navet.ics.hawaii.edu/~casanova]
• Note the changing chunk sizes (red borders)
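The schedule types can be tried on a loop whose iterations cost different amounts of work. The sketch below is an illustration only: `is_prime` and `count_primes_below` are hypothetical names, and the chunk size of 2 simply mirrors the figures. Swapping `dynamic` for `static` or `guided` changes the assignment of iterations to threads but not the result.

```c
#include <assert.h>

/* Trial division: the cost grows with k, so iterations are uneven. */
static int is_prime(int k)
{
    if (k < 2)
        return 0;
    for (int d = 2; (long)d * d <= k; d++)
        if (k % d == 0)
            return 0;
    return 1;
}

/* Hypothetical helper: counts the primes in [0, n). */
int count_primes_below(int n)
{
    assert(n >= 0);
    int count = 0;

    /* dynamic, 2: a thread that becomes idle takes the next chunk of
       two iterations. */
    #pragma omp parallel for schedule(dynamic, 2) reduction(+:count)
    for (int k = 0; k < n; k++)
        count += is_prime(k);

    return count;
}
```

With `schedule(runtime)` instead, the choice would be read from `OMP_SCHEDULE` when the program starts.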
Sections
• Allow programmer to specify sections of code that can be executed concurrently
• Example:
    wake_up
    SECTIONS
      SECTION
        make_coffee || make_tea
      SECTION
        cook_cereal
    END SECTIONS
    eat_breakfast
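In C the same idea might look like the sketch below, where the hypothetical function `min_and_max` runs two independent scans as separate sections. Each section writes to its own variable, which is the assumption that makes the two blocks safe to run concurrently.

```c
#include <assert.h>

/* Hypothetical helper: finds the minimum and maximum of v[0..n-1]. */
void min_and_max(const int *v, int n, int *min_out, int *max_out)
{
    assert(n > 0);
    int lo = v[0];
    int hi = v[0];

    #pragma omp parallel sections
    {
        /* Each section is executed once, by some thread of the team. */
        #pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (v[i] < lo)
                    lo = v[i];
        }

        #pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (v[i] > hi)
                    hi = v[i];
        }
    }
    /* Implicit barrier: both sections have finished here. */

    *min_out = lo;
    *max_out = hi;
}
```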
Workshare
• Define a section of code whose units of work (e.g. the elements of an array assignment) are divided among the threads
• Fortran only
• Example: Vector Operations on entire arrays
    C(1:N) = A(1:N) + B(1:N)
Synchronization
• Programmer is responsible for correctness of shared variables
• Example: if two threads update x (initially 0) at the same time, x can end up with a value of 1 instead of 2
    shared int x
    fork()
    x = x + 1    |    x = x + 1
    read(x)
Synchronization
• Solution 1: MASTER or SINGLE directive
  • Only one thread executes the “critical” portion of code
• Solution 2: CRITICAL or ATOMIC directive
  • Only one thread executes at a time
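A minimal C sketch of the second solution follows. Both functions are hypothetical helpers that increment a shared counter n times from a parallel loop. Without the `atomic` or `critical` directive the updates could be lost; with it the result is always n.

```c
#include <assert.h>

/* Hypothetical helper: protects a single memory update with atomic. */
long count_with_atomic(int n)
{
    assert(n >= 0);
    long x = 0;   /* shared */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        x += 1;
    }
    return x;
}

/* Hypothetical helper: protects a whole block with critical. */
long count_with_critical(int n)
{
    assert(n >= 0);
    long x = 0;   /* shared */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Only one thread at a time may be inside this block. */
        #pragma omp critical
        {
            x = x + 1;
        }
    }
    return x;
}
```

`atomic` applies to a single simple update, while `critical` can guard an arbitrary block, usually at a higher cost.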
Other Synchronization Directives
• Barrier: force all threads to wait until every thread has reached the same point
• Flush: require a consistent view of memory
• Ordered: execute a block inside a loop in sequential iteration order
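As an illustration of the ordered directive, the hypothetical function `record_in_order` below computes values in parallel but appends them to an output array in iteration order. The `schedule(static, 1)` clause is an assumption chosen to keep threads from waiting long for their turn.

```c
#include <assert.h>

/* Hypothetical helper: stores i*i for i = 0..n-1 into out, appending
   in iteration order. Returns the number of values stored. */
int record_in_order(int *out, int n)
{
    assert(n >= 0);
    int next = 0;   /* shared write position */

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        int value = i * i;   /* this part may run concurrently */

        /* Ordered blocks run one at a time, in the order the
           iterations would have in a serial loop. */
        #pragma omp ordered
        {
            out[next] = value;
            next++;
        }
    }
    return next;
}
```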
Variable Scope
• Shared Memory -> shared variables ... by default ... usually
• Globals (shared):
  • File scope variables
  • static variables
• Privates:
  • Loop index
  • Stack variables in subroutines called from parallel regions
Data Scope Attributes
• Shared: all threads access the same variable
• Private: a new, uninitialized object is created for each thread
• FirstPrivate: same as Private, but each copy is initialized with the value from the master thread
• LastPrivate: same as Private, but the value from the last iteration (or section) is copied back to the original variable when the construct completes
• Reduction: after execution, perform a (specified) reduction and give its value to a variable
• etc.
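The attributes above can be combined on one loop. The following C sketch uses the hypothetical function `scope_demo`; the variable names and the starting value 10 are made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: returns the sum of (i + 10) for i = 0..n-1 and
   stores (n-1)*(n-1) in *last_square. */
int scope_demo(int n, int *last_square)
{
    assert(n > 0);
    int offset = 10;   /* firstprivate: each copy starts at 10 */
    int last = 0;      /* lastprivate: keeps the final iteration's value */
    int total = 0;     /* reduction: per-thread partial sums are added */
    int scratch;       /* private: each thread gets its own, unset copy */

    #pragma omp parallel for private(scratch) firstprivate(offset) \
            lastprivate(last) reduction(+:total)
    for (int i = 0; i < n; i++) {
        scratch = i + offset;   /* written before it is read */
        last = i * i;
        total += scratch;
    }

    *last_square = last;
    return total;
}
```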
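Example Program
    program calc_pi
      integer n, i
      double precision w, x, sum, pi, f, a
      ! timing variables (the timing calls are not shown in the source)
      double precision start, finish, timef
      ! integrand: the integral of 4/(1+a*a) over [0,1] equals pi
      f(a) = 4.0d0 / (1.0d0 + a*a)
      n = 100000000
      w = 1.0d0 / n
      sum = 0.0d0
    !$OMP PARALLEL PRIVATE(x,i), SHARED(w,n), &
    !$OMP REDUCTION(+:sum)
    !$OMP DO
      do i = 1, n
        ! midpoint of the i-th interval; double precision literals
        ! keep the .5 from being lost for large i
        x = w * (i - 0.5d0)
        sum = sum + f(x)
      end do
    !$OMP END DO
    !$OMP END PARALLEL
      pi = w * sum
      print *, "value of pi:", pi
    end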
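For readers working in C, the same midpoint-rule computation might be sketched as follows. This is a translation for illustration, not part of the original deck; the function name `compute_pi` and passing the number of intervals as a parameter are assumptions.

```c
#include <assert.h>

/* Hypothetical helper: approximates pi by integrating 4/(1+x*x) over
   [0,1] with the midpoint rule on n intervals. */
double compute_pi(int n)
{
    assert(n > 0);
    const double w = 1.0 / n;   /* interval width, shared and read-only */
    double sum = 0.0;

    /* x is declared inside the loop, so it is private to each thread;
       the partial sums are combined by the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        double x = w * (i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }

    return w * sum;
}
```

Because the partial sums are added in a different order depending on the number of threads, the last few digits of the result can vary from run to run.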
Disadvantages
• Scalability of Shared Memory Architecture
  • Hardware Limitations
  • Software (OS) Limitations (to an extent)
• Price of Shared Memory Supercomputers
SM Cost Examples
• Sun Fire E25K Server
  • 16 UltraSPARC IV+ @ 1.8 GHz
  • 2 x 73 GB Storage
  • 64 GB Memory
  • Price: $1,125,047.00 (source: Sun website)
• IBM System P
  • 16-way processor @ 2.1 GHz
  • 73 GB Storage
  • 8 GB Memory
  • Price: $473,770.00 (source: commercial vendor)
Prices obtained on April 28, 2008