Shared-memory Parallel Programming
Taura Lab M1 Yuuki Horita
Parallel and Distributed Programming
Agenda
• Introduction
• Sample Sequential Program
• Multi-thread Programming
• OpenMP
• Summary
Parallel Programming Model
• Message Passing Model
  • covered in the previous talk (by Imatake-kun)
• Shared Memory Model
  • memory is shared by all process elements
  • multiprocessors (SMP, SunFire, …)
  • DSM (Distributed Shared Memory)
  • process elements communicate with each other through the shared memory
Shared Memory Model
[Diagram: multiple PEs connected to one shared Memory]
Shared Memory Model
• Simplicity
  • no need to think about where the computation's data is located
• Fast communication (on a multiprocessor)
  • no network is needed for communication between process elements
• Dynamic load sharing
  • follows for the same reason as simplicity: any PE can take over any task, since all data is visible to it
Shared Memory Parallel Programming
• Multi-thread programming
  • Pthreads
• OpenMP
  • a parallel programming model for shared-memory multiprocessors
Sample Sequential Program
FDM (Finite Difference Method)

…
loop {
    for (i = 1; i < N-1; i++) {       /* interior points only, so the    */
        for (j = 1; j < N-1; j++) {   /* i±1, j±1 accesses stay in bounds */
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                    + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}
…
Parallelization Procedure
• Decomposition: sequential computation → tasks
• Assignment: tasks → process elements
• Orchestration: communication and synchronization among process elements
• Mapping: process elements → processors
Parallelize the Sequential Program
• Decomposition
  • each grid-point update in the loop nest is a task

…
loop {
    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                    + a[i-1][j] + a[i+1][j] + a[i][j]);   /* ← a task */
        }
    }
}
…
Parallelize the Sequential Program
• Assignment
  • divide the tasks equally among the process elements
[Diagram: the grid is divided into four blocks, one per PE]
Parallelize the Sequential Program
• Orchestration
  • the process elements need to communicate and to synchronize
[Diagram: the four PEs exchanging data at the block boundaries]
Parallelize the Sequential Program
• Mapping
  • assign each process element to a processor of the multiprocessor
[Diagram: four PEs mapped onto a multiprocessor]
Multi-thread Programming
• A process element is a thread
  • cf. a process
• Memory is shared among all threads created by the same process
• Threads can communicate with each other through shared memory
Fork-Join Model
• Serialized section: the program starts with a single Main Thread
• Fork: the Main Thread creates new threads at the start of a parallelized section
• Parallelized section: all threads run in parallel
• Join: the other threads join the Main Thread at the end of the section
• Serialized section: the Main Thread continues processing alone
Libraries for Thread Programming
• Pthreads (C/C++)
  • pthread_create()
  • pthread_join()
• Java Thread
  • Thread Class / Runnable Interface
Pthreads API (fork/join)
• pthread_t  // thread variable
• pthread_create(
      pthread_t *thread,        // thread variable
      pthread_attr_t *attr,     // thread attributes
      void *(*func)(void *),    // start function
      void *arg                 // argument passed to the function
  )
• pthread_join(
      pthread_t thread,         // thread variable
      void **thread_return      // the return value of the thread
  )
Pthreads Parallel Programming

#include …

void do_sequentially(void) {
    /* sequential execution */
}

int main() {
    …
    do_sequentially();   // we want to parallelize this call
    …
}
Pthreads Parallel Programming

#include …
#include <pthread.h>

void *do_in_parallel(void *arg) {   /* start function takes and returns void * */
    /* parallel execution */
    return NULL;
}

int main() {
    pthread_t tid;
    …
    pthread_create(&tid, NULL, do_in_parallel, NULL);  /* fork a new thread  */
    do_in_parallel(NULL);                              /* main does its share */
    pthread_join(tid, NULL);                           /* wait for the thread */
    …
}
Exclusive Access Control

int sum = 0;

thread_A() {        thread_B() {
    sum++;              sum++;
}                   }

A possible interleaving (lost update):

    Thread A            Thread B            sum
    a ← read sum (0)                        0
                        a ← read sum (0)    0
    a = a + 1
                        a = a + 1
    write a → sum                           1
                        write a → sum       1

Both threads increment, yet sum ends up 1 instead of 2.
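A minimal compilable sketch of this lost-update race (the thread count and iteration count are arbitrary choices for illustration; compile with -pthread):

#include <stdio.h>
#include <pthread.h>

#define NITERS 1000000

int sum = 0;                     /* shared and unprotected */

void *increment(void *arg) {
    for (int i = 0; i < NITERS; i++)
        sum++;                   /* read-modify-write is not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* usually prints less than 2000000 because updates are lost */
    printf("sum = %d (expected %d)\n", sum, 2 * NITERS);
    return 0;
}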
Pthreads API (Exclusive Access Control)
• Variable
  pthread_mutex_t
• Initialization Function
  pthread_mutex_init(
      pthread_mutex_t *mutex,
      pthread_mutexattr_t *mutexattr
  )
• Lock Functions
  pthread_mutex_lock(pthread_mutex_t *mutex)
  pthread_mutex_unlock(pthread_mutex_t *mutex)
Exclusive Access Control

int sum = 0;
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);

thread_A() {                          thread_B() {
    pthread_mutex_lock(&mutex);           pthread_mutex_lock(&mutex);
    sum++;                                sum++;
    pthread_mutex_unlock(&mutex);         pthread_mutex_unlock(&mutex);
}                                     }

Only one thread can hold the lock at a time: it acquires the lock, increments sum, and releases the lock; the other thread then acquires the lock and does the same. Now sum reliably ends up 2.
Pthreads API (Condition Variable)
• Variable
  pthread_cond_t
• Initialization Function
  pthread_cond_init(
      pthread_cond_t *cond,
      pthread_condattr_t *condattr
  )
• Condition Functions
  pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
  pthread_cond_broadcast(pthread_cond_t *cond)
  pthread_cond_signal(pthread_cond_t *cond)
Condition Wait

Thread A (waiting side):
pthread_mutex_lock(&mutex);
while ( /* condition is not satisfied */ ) {
    pthread_cond_wait(&cond, &mutex);   /* releases the lock and sleeps;
                                           re-acquires it when woken */
}
pthread_mutex_unlock(&mutex);

Thread B (updating side):
pthread_mutex_lock(&mutex);
update_condition();
pthread_cond_broadcast(&cond);   /* or pthread_cond_signal(&cond) */
pthread_mutex_unlock(&mutex);

On each wakeup, Thread A re-checks whether the condition is satisfied before proceeding.
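A minimal sketch of this pattern with a concrete condition (the flag `ready` and the function names are illustrative, not from the slides):

#include <stdio.h>
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
int ready = 0;                        /* the condition */

void *waiter(void *arg) {
    pthread_mutex_lock(&mutex);
    while (!ready)                    /* re-check: wakeups may be spurious */
        pthread_cond_wait(&cond, &mutex);
    printf("condition satisfied\n");
    pthread_mutex_unlock(&mutex);
    return NULL;
}

void *setter(void *arg) {
    pthread_mutex_lock(&mutex);
    ready = 1;                        /* update the condition under the lock */
    pthread_cond_broadcast(&cond);    /* wake all waiting threads */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, waiter, NULL);
    pthread_create(&b, NULL, setter, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}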
Synchronization
• Synchronization (a barrier) in the sample program

n = 0;
…
pthread_mutex_lock(&mutex);
n++;                                   /* this thread has arrived */
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);  /* wait for the others */
}
pthread_cond_broadcast(&cond);         /* everyone is here: wake them all */
pthread_mutex_unlock(&mutex);
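Wrapped in a function, the same counter scheme gives a simple single-use barrier (a sketch; the struct and function names are my own, not part of the Pthreads API):

#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int n;          /* threads that have arrived so far */
    int nthreads;   /* threads expected */
} barrier_t;

void barrier_init(barrier_t *b, int nthreads) {
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->n = 0;
    b->nthreads = nthreads;
}

void barrier_wait(barrier_t *b) {      /* single-use barrier */
    pthread_mutex_lock(&b->mutex);
    b->n++;
    while (b->n < b->nthreads)         /* wait until everyone arrives */
        pthread_cond_wait(&b->cond, &b->mutex);
    pthread_cond_broadcast(&b->cond);  /* the last arrival wakes the rest */
    pthread_mutex_unlock(&b->mutex);
}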
Characteristics of Pthreads
• troublesome to describe exclusive access control and synchronization
• easy to introduce deadlocks
• still hard work to parallelize a given sequential program
What's OpenMP?
• a specification of compiler directives, library routines, and environment variables for specifying shared-memory parallelism in Fortran and C/C++ programs
• Fortran ver 1.0 API – Oct. 1997
• C/C++ ver 1.0 API – Oct. 1998
Background of OpenMP
• spread of shared-memory multiprocessors
• need for common directives for shared-memory multiprocessors
  • each vendor had provided its own, different set of directives
• need for a simpler and more flexible interface for developing parallel applications
  • Pthreads makes it hard for developers to write parallel applications
OpenMP API
• Directives
• Libraries
• Environment Variables
Directives
• C/C++:   #pragma omp directive_name …
• Fortran: !$OMP directive_name …

If the user's compiler doesn't support OpenMP, the directives are ignored, so the program can still be built and run as a sequential program.
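A minimal sketch of this dual use (the file name and message are my own; the `_OPENMP` macro is defined by OpenMP-supporting compilers and lets library calls be guarded the same way):

#include <stdio.h>
#ifdef _OPENMP            /* defined only when compiling with OpenMP */
#include <omp.h>
#endif

int main(void) {
#pragma omp parallel      /* ignored by compilers without OpenMP support */
    printf("hello\n");    /* printed once per thread, or just once sequentially */
    return 0;
}

/* e.g. gcc -fopenmp hello.c → parallel;  gcc hello.c → sequential */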
Parallel Region
• the part of the program executed in parallel by a team of threads

#pragma omp parallel
{
    /* parallel region */
}

• threads are created (fork) at the beginning of the parallel region
• they join at the end of the parallel region
Parallel Region (threads)
• number of threads
  • omp_get_num_threads(): get the current number of threads
  • omp_set_num_threads(int nthreads): set the number of threads to nthreads
  • OMP_NUM_THREADS environment variable
• thread ID (0 to number of threads − 1)
  • omp_get_thread_num(): get the calling thread's ID
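A small sketch putting these routines together (requesting 4 threads is an arbitrary choice):

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(4);              /* request 4 threads */
#pragma omp parallel
    {
        int id = omp_get_thread_num();   /* 0 .. team size - 1 */
        int n  = omp_get_num_threads();  /* actual team size */
        printf("hello from thread %d of %d\n", id, n);
    }
    return 0;
}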
Work-Sharing Constructs
• specify how tasks are assigned to threads inside a parallel region
• for
  • shares the iterations of a loop among threads
• sections
  • shares independent sections of code among threads
• single
  • a block executed by only one thread
(a sketch of sections and single follows below)
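A minimal sketch of sections and single (the functions taskA and taskB are illustrative placeholders):

#include <stdio.h>

void taskA(void) { printf("task A\n"); }
void taskB(void) { printf("task B\n"); }

int main(void) {
#pragma omp parallel
    {
#pragma omp sections       /* each section is run by one thread */
        {
#pragma omp section
            taskA();
#pragma omp section
            taskB();
        }                  /* implicit barrier here */
#pragma omp single         /* executed by exactly one thread */
        printf("done\n");
    }
    return 0;
}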
Example of Work Sharing

omp_set_num_threads(4);
#pragma omp parallel
#pragma omp for
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

The combined form is equivalent:

omp_set_num_threads(4);
#pragma omp parallel for
for (i = 1; i < N-1; i++) {
    … /* same loop nest as above */
}

But i and j are shared among the threads by default here; the resulting memory access conflicts on i and j slow the computation down, and a shared index can even corrupt the results. This is fixed below with private(i, j).
Data Scoping Attributes
• specify the data scoping in a parallel construct or work-sharing construct
• shared(var_list)
  • the variables in var_list are shared among threads
• private(var_list)
  • the variables in var_list are private to each thread
• reduction(operator : var_list)
  • the variables are private within the construct, and the private copies are combined and reflected after the construct
  • ex) #pragma omp for reduction(+: sum)
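A minimal sketch of reduction, summing an array (the array name and size are illustrative):

#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], sum = 0.0;
    for (int i = 0; i < N; i++) a[i] = 1.0;

    /* each thread accumulates into a private copy of sum;
       the copies are combined with + after the loop */
#pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);   /* 1000.0 */
    return 0;
}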
Example of Data Scoping Attributes

omp_set_num_threads(4);
#pragma omp parallel for private(i, j)
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

Declaring i and j private gives each thread its own copies of the indices, removing the conflict.
Synchronization
• barrier
  • wait until all threads reach this line
  • #pragma omp barrier
• critical
  • execute a block exclusively, one thread at a time
  • #pragma omp critical [(name)] { … }
• atomic
  • update a scalar variable atomically
  • #pragma omp atomic, followed by a single update statement (e.g. x++;)
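A minimal sketch contrasting atomic and critical (the variable names are illustrative):

#include <stdio.h>

int main(void) {
    int hits = 0;
    double total = 0.0;

#pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
#pragma omp atomic             /* cheap: atomic update of one scalar */
        hits++;

#pragma omp critical           /* general: any block, one thread at a time */
        { total += 1.0 / (i + 1); }
    }
    printf("hits = %d, total = %f\n", hits, total);
    return 0;
}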
Synchronization (Pthreads vs. OpenMP)
• synchronization (a barrier) in the sample program

<Pthreads>
pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);

<OpenMP>
#pragma omp barrier
Summary of OpenMP
• incremental parallelization of sequential programs
• portability
• easier to write parallel applications than with Pthreads or MPI
Message Passing Model / Shared Memory Model
[Comparison slide: the two models side by side]
Thank you!