1. OpenMP: Introduction
Greg Wolffe / Christian Trefftz
Grand Valley State University
Supercomputing 2008
Education Program
2. Moore's Law: Value ramp
3. Performance: Inflection Point
4. Multi-Core Architecture
5. Power Consumption: Superlinear
6. Power Efficiency
7. Conclusion → Parallelism
Hardware
Going parallel (many-core)
Servers / laptops / devices
Software?
Herb Sutter of Microsoft in Dr. Dobb's Journal:
"The free lunch is over." Software performance will no longer increase from one hardware generation to the next unless the software is parallel.
8. Threads and Parallel Programming
Process: a unit of work managed by the OS, with its own address space and OS-managed resources.
Thread: a flow of control within a process that executes the instructions of a program. Each thread has its own program counter and a private memory region (a stack), but shares the other resources of the process, including the heap.
Threads are the natural unit of execution for parallel programs on shared memory hardware.
9. Demo: Multiple Threads
Multi-core in action
Multi-threaded program
OpenMP-multipleThreads.cpp
OpenMP-multipleThreads.exe
C++ w/OpenMP using Visual Studio
Multi-core CPU in laptops
Windows Performance Monitor
Visualization
10. OpenMP
Portable API for shared-memory, thread-based parallelism
C/C++ and Fortran
Fork-Join model
11. OpenMP Fundamentals
Environment variables
export OMP_NUM_THREADS=4
Library functions
omp_set_num_threads (4);
Compiler directives (#pragmas)
#pragma omp parallel num_threads (4)
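The library call and the directive clause above can be compared in a short sketch. The function names `team_size_with_clause` and `team_size_with_library_call` are hypothetical, and the serial stand-ins in the `#else` branch are an assumption added so the code also builds without `-fopenmp`. The runtime may grant fewer threads than requested, so the sketch reports the observed team size rather than assuming it.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (an assumption of this sketch) for builds without -fopenmp.
static inline void omp_set_num_threads(int) {}
static inline int omp_get_num_threads() { return 1; }
#endif

// Requests a team size with the num_threads clause on the directive itself.
int team_size_with_clause(int requested) {
    int observed = 0;
    #pragma omp parallel num_threads(requested)
    {
        // One thread records the size of the team that was actually created.
        #pragma omp single
        observed = omp_get_num_threads();
    }
    return observed;
}

// Requests a team size with the library function; it applies to later regions.
int team_size_with_library_call(int requested) {
    omp_set_num_threads(requested);
    int observed = 0;
    #pragma omp parallel
    {
        #pragma omp single
        observed = omp_get_num_threads();
    }
    return observed;
}
```

The environment variable form (`export OMP_NUM_THREADS=4`) needs no code change: it sets the default that the other two mechanisms override.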
12. OpenMP Programming
Source code
#include <omp.h>
#pragma omp parallel
{
// parallel region
}
Compiling
g++ -fopenmp filename.cc -o filename
13. Directive Responsibility
Work-sharing
Data scoping
Synchronization
Scheduling
14. OpenMP: Work Sharing
Parallel region: partition work
Each thread executes same code
Parallel for loop: partition iterations
Threads share iterations of loop
Parallel section: functional parallelism
Threads perform different tasks
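A minimal sketch of two of these work-sharing forms, assuming hypothetical function names `scale_in_place` and `sum_and_max` that do not come from the slides. The first partitions loop iterations among threads; the second gives each section a different task.

```cpp
#include <utility>
#include <vector>

// Parallel for: the iterations of the loop are divided among the threads.
void scale_in_place(std::vector<double>& v, double factor) {
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        v[i] *= factor;  // each iteration touches a distinct element
    }
}

// Parallel sections: each section is a separate task run by one thread.
std::pair<long, int> sum_and_max(const std::vector<int>& values) {
    long total = 0;
    int largest = values.empty() ? 0 : values[0];
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int v : values) total += v;  // only this section writes total
        }
        #pragma omp section
        {
            for (int v : values) {
                if (v > largest) largest = v;  // only this section writes largest
            }
        }
    }
    return std::make_pair(total, largest);
}
```

Each section writes a different variable, so no synchronization is needed in this particular sketch.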
15. Demo: Hello, World
Sequential code
Uh, we'll skip that
Multi-threaded code
OpenMP-helloWorld.cc
Compile/Execute
Uses default number of threads
Note: I/O not protected!
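The demo file OpenMP-helloWorld.cc is not reproduced in the slides, so the following is only a sketch of what such a program might look like. The function name `hello_world` is hypothetical, and the serial stand-ins are an assumption added so the code also builds without `-fopenmp`. Unlike the demo, this version wraps the output in a critical section so that lines from different threads cannot interleave.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (an assumption of this sketch) for builds without -fopenmp.
static inline int omp_get_thread_num() { return 0; }
static inline int omp_get_num_threads() { return 1; }
#endif

// Every thread in the default-sized team prints one greeting.
// Returns the number of greetings printed, i.e. the team size.
int hello_world() {
    int greetings = 0;
    #pragma omp parallel
    {
        const int id = omp_get_thread_num();
        const int team = omp_get_num_threads();
        // Protect the shared output stream and the shared counter.
        #pragma omp critical
        {
            std::printf("Hello, World from thread %d of %d\n", id, team);
            ++greetings;
        }
    }
    return greetings;
}
```

Removing the critical directive reproduces the unprotected behaviour the slide warns about: the output may be interleaved and the counter may be wrong.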
16. OpenMP: Data
Shared: threads access a single copy of the data object
Private: each thread gets its own uninitialized copy
Firstprivate: each private copy is initialized from the master's value
Lastprivate: master's copy is updated with the value from the sequentially last iteration (or last section)
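The four scoping attributes can be seen together in one loop. This is a sketch only: the function name `scoping_demo` and its parameters are hypothetical, and it assumes `n > 0` and that `out` holds at least `n` elements.

```cpp
#include <vector>

// Fills out[i] = 2*i + offset and returns the value written by the final iteration.
int scoping_demo(int n, int offset, std::vector<int>& out) {
    int* data = out.data();  // shared: all threads write into the same array
    int scratch = 0;         // private: each thread gets its own uninitialized copy
    int last = -1;           // lastprivate: receives the value from iteration n-1
    #pragma omp parallel for private(scratch) firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; ++i) {
        scratch = 2 * i;             // must be assigned before it is read
        data[i] = scratch + offset;  // offset starts with the caller's value
        last = data[i];
    }
    return last;
}
```

For example, `scoping_demo(5, 10, out)` returns 18, the value computed in the last iteration (2*4 + 10), regardless of which thread ran it.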
17. Problem: Numerical Integration
Mathematically:
Approximately:
Rectangle:
Height f(x_i)
Width Δx
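The formulas on this slide did not survive extraction. Given the rectangle description, they are presumably the standard rectangle-rule statements sketched below. The interval [a, b] and the rectangle count N are assumed symbols, and the specific integrand in the second line is only inferred from the file name pi.cc used in the demo that follows.

```latex
\int_a^b f(x)\,dx \;\approx\; \sum_{i=0}^{N-1} f(x_i)\,\Delta x,
\qquad \Delta x = \frac{b-a}{N}

\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(1) = \pi
```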
18. Demo: Parallel Loop
Sequential code
pi.cc
Multi-threaded code
OpenMP-forLoop.cc
Compile/Execute
Note: some data should not be shared!
Compute Speedup
Vary problem size
Use more threads
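The demo sources are not included in the slides. A minimal sketch of a parallel integration loop, assuming the integrand 4/(1+x^2) on [0, 1] (suggested by the file name pi.cc) and a hypothetical function name `estimate_pi`, might look like this:

```cpp
// Midpoint-rectangle estimate of the integral of 4/(1+x^2) over [0, 1].
double estimate_pi(long num_rectangles) {
    const double width = 1.0 / static_cast<double>(num_rectangles);
    double sum = 0.0;
    // reduction gives every thread a private partial sum and adds them at the end.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < num_rectangles; ++i) {
        // x is declared inside the loop body, so each thread has its own copy.
        const double x = (static_cast<double>(i) + 0.5) * width;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * width;
}
```

The two variables the slide's note is most likely about are `x` and `sum`: declaring `x` outside the loop without a private clause, or accumulating into `sum` without the reduction, would let threads overwrite each other's values. Speedup can then be measured by timing calls with different values of `num_rectangles` and `OMP_NUM_THREADS`.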
19. Critical Section Problem
Shared memory system → shared data
Shared data → concurrent access
Concurrent access → corrupted variables
Critical section problem: ensure correct access to shared data
20. Data Corruption
21. Concurrency Control
Synchronization
Mutex: ensures exclusive access to a critical section of code
Barrier: causes a group of threads to pause until all have reached a defined point
Signalling
Conditional wait: a thread waits for some event; another thread signals when it occurs
Broadcast: signals a group of waiting threads
22. Mutual Exclusion
// program code
entry
// access shared data
exit
// program code
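The entry/exit pattern above can be written with an OpenMP lock, where acquiring the lock is the entry step and releasing it is the exit step. This is one possible mapping, not necessarily the one used in the talk. The function name `deposit_many` is hypothetical, and the serial stand-ins are an assumption added so the code also builds without `-fopenmp`.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (an assumption of this sketch) for builds without -fopenmp.
typedef int omp_lock_t;
static inline void omp_init_lock(omp_lock_t*) {}
static inline void omp_set_lock(omp_lock_t*) {}
static inline void omp_unset_lock(omp_lock_t*) {}
static inline void omp_destroy_lock(omp_lock_t*) {}
#endif

// Adds `amount` to a shared balance `deposits` times, one deposit per iteration.
long deposit_many(int deposits, long amount) {
    long balance = 0;  // shared data
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel for
    for (int i = 0; i < deposits; ++i) {
        // program code (touches no shared data)
        omp_set_lock(&lock);    // entry: wait until no other thread holds the lock
        balance += amount;      // access shared data
        omp_unset_lock(&lock);  // exit: let the next waiting thread in
        // program code
    }
    omp_destroy_lock(&lock);
    return balance;
}
```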
23. Barrier
24. Conditional Wait
25. Broadcast
26. OpenMP: Synchronization
#pragma omp master
{}
#pragma omp critical
{}
#pragma omp atomic
count++;
#pragma omp barrier
reduction (+: sum)
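A sketch of these constructs applied to small tasks follows. All function names are hypothetical. The three counting functions produce the same result and differ only in how the shared counter is protected; the last function combines `master` with an explicit `barrier`.

```cpp
// Counts the even numbers in [0, n) with an atomic update of the shared counter.
long count_even_atomic(long n) {
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

// Same count, protecting the update with a critical section.
long count_even_critical(long n) {
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            #pragma omp critical
            { count++; }
        }
    }
    return count;
}

// Same count, using per-thread partial counts combined by a reduction.
long count_even_reduction(long n) {
    long count = 0;
    #pragma omp parallel for reduction(+ : count)
    for (long i = 0; i < n; ++i) {
        if (i % 2 == 0) count++;
    }
    return count;
}

// The master thread sets a shared value; the barrier makes every thread wait
// until that has happened. Returns how many threads then saw the value.
int master_then_barrier() {
    int shared_value = 0;
    int threads_that_saw_it = 0;
    #pragma omp parallel
    {
        #pragma omp master
        { shared_value = 42; }
        #pragma omp barrier  // master has no implied barrier of its own
        #pragma omp critical
        {
            if (shared_value == 42) ++threads_that_saw_it;
        }
    }
    return threads_that_saw_it;
}
```

As a rough guide, `atomic` suits a single simple update, `critical` suits a larger block, and a reduction avoids contention on the shared variable altogether.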
27. Demo: Critical Section
Unprotected code
OpenMP-criticalSection.cc
Execute
Note: use synchronization constructs to solve critical section problem
Verify correctness
28. OpenMP: Scheduling
Static: splits the iteration space into blocks of size chunk, assigned to threads round-robin before the loop runs
Dynamic: assigns blocks to threads as they become idle (suits uneven workloads)
Guided: like dynamic, but the block size starts large and shrinks roughly exponentially until all iterations are assigned
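The three strategies are selected with the `schedule` clause. The sketch below uses a made-up uneven workload (`busy_work`, where iteration i costs roughly i steps) and hypothetical function names. The chunk size of 4 is an arbitrary choice for illustration.

```cpp
// Simulated uneven workload: iteration i performs about i steps of arithmetic.
static long busy_work(int i) {
    long acc = 0;
    for (int k = 0; k <= i; ++k) acc += k % 7;
    return acc;
}

// Static: blocks of 4 iterations are dealt out to threads in advance.
long run_static(int n) {
    long total = 0;
    #pragma omp parallel for schedule(static, 4) reduction(+ : total)
    for (int i = 0; i < n; ++i) total += busy_work(i);
    return total;
}

// Dynamic: an idle thread grabs the next block of 4 iterations.
long run_dynamic(int n) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+ : total)
    for (int i = 0; i < n; ++i) total += busy_work(i);
    return total;
}

// Guided: blocks start large and shrink as the remaining work decreases.
long run_guided(int n) {
    long total = 0;
    #pragma omp parallel for schedule(guided) reduction(+ : total)
    for (int i = 0; i < n; ++i) total += busy_work(i);
    return total;
}
```

All three return the same total; only the assignment of iterations to threads, and therefore the running time, differs.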
29. Problem: Mandelbrot Set
Set of points c in the complex plane for which the iteration z → z² + c, starting from z = 0, remains bounded
Varying number of iterations = varying workload
30. Demo: Uneven Workload
Graphical code
OpenMP-Mandelbrot.cc
Execute and Time
Sequential and parallel
Experiment with various scheduling strategies to improve performance
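The graphical demo is not reproduced in the slides. A non-graphical sketch of the computation, assuming the usual escape-time test (|z| > 2) and hypothetical names `escape_time` and `mandelbrot_grid`, shows where the uneven workload comes from and where a scheduling clause would go. The viewing window of [-2, 1] by [-1, 1] is also an assumption.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Number of iterations of z = z*z + c (from z = 0) before |z| exceeds 2,
// capped at max_iter. Points inside the set reach the cap.
int escape_time(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    int iter = 0;
    while (iter < max_iter && std::norm(z) <= 4.0) {  // norm(z) is |z| squared
        z = z * z + c;
        ++iter;
    }
    return iter;
}

// Escape times for a width x height grid covering [-2, 1] x [-1, 1].
std::vector<int> mandelbrot_grid(int width, int height, int max_iter) {
    std::vector<int> counts(static_cast<std::size_t>(width) * height);
    // Rows near the set take far longer than rows far from it, so rows are
    // handed out dynamically; try static or guided here to compare timings.
    #pragma omp parallel for schedule(dynamic)
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const double re = -2.0 + 3.0 * col / width;
            const double im = -1.0 + 2.0 * row / height;
            counts[static_cast<std::size_t>(row) * width + col] =
                escape_time(std::complex<double>(re, im), max_iter);
        }
    }
    return counts;
}
```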
31. Acknowledgements
Multi-core
www.intel.com/multi-core
Multi-Core Programming
Shameem Akhter and Jason Roberts
Intel Press, 2006.
OpenMP
www.openmp.org
Parallel Programming in OpenMP
Chandra, Dagum, Kohr, Maydan, McDonald, and Menon
Morgan Kaufmann Publishers, 2001.