650 likes | 810 Views
MultiCore Processing Workshop. Multithreaded Programming using POSIX Threads( Pthreads ) Syed Akbar Mehdi. Outline. Preliminaries and Introduction Thread Management Synchronization Exercises. Part 1. Preliminaries. OS Basics Virtual Address Space Program Execution Basics
E N D
MultiCore Processing Workshop Multithreaded Programming using POSIX Threads(Pthreads) Syed Akbar Mehdi
Outline • Preliminaries and Introduction • Thread Management • Synchronization • Exercises.
Part 1. Preliminaries OS Basics Virtual Address Space Program ExecutionBasics Processes vs Threads POSIX Threads
Computer System Organization • Computer-system operation • One or more CPUs, device controllers connect through common bus providing access to shared memory • Concurrent execution of CPUs and devices competing for memory cycles
What is an OS? Half-Life 2 • software between applications and reality: • abstracts hardware and makes useful and portable • makes finite into (near)infinite • provides protection Visual Studio MS Word OS hardware
What is a Process? • A process is an “instance” of a program running. • Modern OSes run multiple processes simultaneously • Examples (can all run simultaneously): • gccfile_A.c – compiler running on file A • gccfile_B.c – compiler running on file B • emacs – text editor • firefox – web browser • Non-examples (implemented as one process): • Multiple firefox tabs are part of one process. • Why processes? • Simplicity of programming • Higher throughput (better CPU utilization), lower latency
What is a Process? • Each proc. Pi has own view of machine • Its own address space. • Its own open files. • Its own virtual CPU (through preemptive multitasking) • *(char *)0xc000 different in P1 & P2 • Greatly simplifies programming model • gcc does not care that firefox is running • Sometimes want interaction between processes • Simplest is through files: emacs edits file, gcc compiles it • More complicated: Shell/command, Window manager/app.
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } var1 = 2 var2 = 3 int main( ) { } int foo1 (int) { } int foo2 (int) { } main() Stack Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } int main( ) { } int foo1 (int) { } int foo2 (int) { } var1 = 2 var2 = 3 main() a = 2 lvar = 102 foo1() Stack Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } int main( ) { } int foo1 (int) { } int foo2 (int) { } var1 = 2 var2 = 3 main() a = 2 lvar = 102 foo1() Stack foo2() b = 102 Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } int main( ) { } int foo1 (int) { } int foo2 (int) { } var1 = 2 var2 = 3 main() a = 2 lvar = 102 foo1() Stack Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } var1 = 10200 var2 = 3 int main( ) { } int foo1 (int) { } int foo2 (int) { } main() Stack Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } int main( ) { } int foo1 (int) { } int foo2 (int) { } var1 = 10200 var2 = 3 main() a = 3 lvar = 103 foo1() Stack Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } int main( ) { } int foo1 (int) { } int foo2 (int) { } var1 = 10200 var2 = 3 main() a = 3 lvar = 103 foo1() Stack foo2() b = 103 Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } int main( ) { } int foo1 (int) { } int foo2 (int) { } var1 = 10200 var2 = 3 main() a = 3 lvar = 103 foo1() Stack Heap gvar = 100 Global Text
Basic Execution Environment IP FP SP int gvar = 100; int foo2 (int b) { return b * gvar; } int foo1 (int a) { int lvar = a + gvar; return foo2(lvar); } int main ( ) { int var1, int var2; var1 = 2; var2 = 3; var1 = foo1(var1); var2 = foo1(var2); return 0; } var1 = 10200 var2 = 10300 int main( ) { } int foo1 (int) { } int foo2 (int) { } main() Stack Heap gvar = 100 Global Text
What is a thread? • What’s needed to run code on CPU • “execution stream in an execution context” • Execution stream: sequential seq. of instructions • CPU execution context (1 thread) • State: stack, heap, registers • Position: Instruction Pointer(IP) register • OS execution context (n threads): • identity + open file descriptors, page table, …
What is a thread? • All threads in a process share the same address space. • *(char *)0xc000 means “the same” in thread T1 and T2. • All threads share the same file descriptors. • Which implies that they share network sockets. • All threads have access to the same heap and same global variables. • Write access to global variables should be protected by a synchronization mechanism. • Each thread has its separate stack, Instruction Pointer and Local variables. • Therefore each thread has its own independent flow of execution
Pthreads • Historically, hardware vendors have implemented their own proprietary versions ofthreads. • These implementations differed significantly from each other resulting in reduced portability. • In order to take full advantage of the capabilities provided by threads, a standardized programming interface was required. • For UNIX systems, this interface has been specified by the IEEE POSIX 1003.1c standard (1995). • Implementations adhering to this standard are referred to as POSIX threads, or Pthreads. • Most hardware vendors now offer Pthreads in addition to their proprietary API's. • Pthreads are defined as a set of C language programming types and procedure calls, implemented with a pthread.hheader/include file and a thread library.
Pthreads • The subroutines which comprise the Pthreads API can be informally grouped into four major groups: • Thread management: Routines that work directly on threads • Mutexes: Routines that deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion" • Condition variables: Routines that address communications between threads that share a mutex. • Synchronization: Routines that manage read/write locks and barriers.
Part 2. Thread Management Creating and Terminating Threads Passing Arguments to Threads Joining and Detaching Threads Setting Thread Attributes Miscellaneous Routines
Creating and Terminating Threads • The following functions are used for creating and terminating threads: • pthread_create (thread,attr,start_routine,arg) • pthread_exit (status) • pthread_attr_init (attr) • pthread_attr_destroy (attr)
Creating Threads • Initially, your main() program comprises a single, default thread. • All other threads must be explicitly created by the programmer. • The maximum number of threads that may be created by a process is implementation dependent. • Once created, threads are peers, and may create other threads. • There is no implied hierarchy or dependency between threads.
Creating Threads Return a non zero value in success intpthread_create(pthread_t *thr, const pthread_attr_t *attr, void *(*start_routine)(void), void *arg) Will contain the newly created thread’s id. Must be passed by reference pthread_t *thr Give the attributes that this thread will have. Use NULL for the default ones. const pthread_attr_t *attr The name of the function that the thread will run. Must have a void pointer as its return and parameters values void *(*start_routine)(void) The argument for the function that will be the body of the Pthreads void *arg Pointers of the type void can reference ANY type of data, but they CANNOT be used in any type of operations that reads or writes its data without a cast
Terminating Threads • There are several ways in which a Pthread may be terminated. • The thread returns from its starting routine • This means the main() function for the initial thread. • The thread makes a call to the pthread_exit subroutine. • Typically, the pthread_exit() routine is called after a thread has completed its work and is no longer required to exist. • The thread is canceled by another thread via the pthread_cancel routine. • The entire process is terminated due to a call to either the exec or exit subroutines. • If main() finishes before the threads it has created. • If it uses pthread_exit(), the other threads will continue to execute. • If main simply returns they will be automatically terminated.
Misc. Useful Functions pthread_tpthread_self(void) Return the id of the calling thread. Returns a pthread_t type which is usually an integer type variable OpenMP Counterpart intomp_get_thread_num(void); voidpthread_exit(void *arg); This function will indicate the end of a Pthread and the returning value will be put in arg
“Hello World” Example #include <pthread.h> #define NUM_THREADS 4 void* work(void *i){ printf("Hello, world from %i\n", pthread_self()); pthread_exit(NULL); } int main(int argc, char **argv){ int i; pthread_t id[NUM_THREADS]; for(i = 0; i < NUM_THREADS; ++i){ if(pthread_create(&id[i], NULL, work, NULL)){ printf("Error creating the thread\n"); exit(-1); } } printf("After creating the thread. My id is: %i\n", pthread_self()); return 0;} Hello, world from 2 Hello, world from 3 After creating the thread. My id is: 1 Hello, world from 4 What happened to thread 5??? Hello, world from 2 Hello, world from 3 Hello, world from 4 After creating the thread. My id is: 1 Hello, world from 5
Single Argument Passing Cast its value as a void pointer (a tricky pass by value) Cast its address as a void pointer (pass by reference). The value that the address is pointing should NOT change between Pthreads creation Multiple Argument Passing Heterogonous: Create an structure with all the desired arguments and pass an element of that structure as a void pointer. Homogenous: Create an array and then cast it as a void pointer Passing Arguments to Threads
Passing a Single Argument #include <pthread.h> #define NUM_THREADS 10 void *work(void *i){ int f = *((int *)(i)); printf("Hello, world from %i with value %i\n", pthread_self(), f); pthread_exit(NULL); } int main(int argc, char **argv){ int i; pthread_t id[NUM_THREADS]; for(i = 0; i < NUM_THREADS; ++i){ if(pthread_create(&id[i], NULL, work, (void *)(&i))){ printf("Error creating the thread\n"); exit(-1);} } return 0; } Hello, world from 2 with value 1 Hello, world from 3 with value 2 Hello, world from 6 with value 5 Hello, world from 5 with value 5 Hello, world from 4 with value 4 Hello, world from 8 with value 9 Hello, world from 9 with value 9 Hello, world from 10 with value 9 Hello, world from 7 with value 6 Hello, world from 11 with value 10 Wrong Method!!!!
Passing a Single Argument #include <pthread.h> #define NUM_THREADS 10 void *work(void *i){ int f = (int)(i); printf("Hello, world from %i with value %i\n", pthread_self(), f); pthread_exit(NULL); } int main(int argc, char **argv){ int i; pthread_t id[NUM_THREADS]; for(i = 0; i < NUM_THREADS; ++i){ if(pthread_create(&id[i], NULL, work, (void *)(i))){ printf("Error creating the thread\n"); exit(-1); } } return 0; } Hello, world from 2 with value 0 Hello, world from 3 with value 1 Hello, world from 4 with value 2 Hello, world from 5 with value 3 Hello, world from 6 with value 4 Hello, world from 7 with value 5 Hello, world from 8 with value 6 Hello, world from 10 with value 8 Hello, world from 11 with value 9 Right Method 1
Passing a Single Argument #include <pthread.h> #define NUM_THREADS 10 void *work(void *i){ int f = *((int *)(i)); printf("Hello, world from %i with value %i\n", pthread_self(), f); pthread_exit(NULL); } int main(int argc, char **argv){ int i; int y[NUM_THREADS]; pthread_t id[NUM_THREADS]; for(i = 0; i < NUM_THREADS; ++i){ y[i] = i; if(pthread_create(&id[i], NULL, work, (void *)(&y[i]))){ printf("Error creating the thread\n"); exit(-1); } } return 0; } Hello, world from 2 with value 0 Hello, world from 4 with value 2 Hello, world from 5 with value 3 Hello, world from 6 with value 4 Hello, world from 7 with value 5 Hello, world from 8 with value 6 Hello, world from 9 with value 7 Hello, world from 3 with value 1 Hello, world from 10 with value 8 Hello, world from 11 with value 9 Right Method 2
Thread Joining • Joining is a way to accomplish “coarse grained” synchronization between threads. • The pthread_join() subroutine blocks the calling thread until the thread with the specified “id” terminates. • A joining thread can match one pthread_join() call. • It is a logical error to attempt multiple joins on the same thread.
Thread Joining The Joining of All Loose Ends: pthread_join intpthread_join(pthread_tid, void **tr); Make sure that the thread that has this id returns. Otherwise waits for it pthread_t id The id of a created thread OpenMP Counterpart #pragma omp barrier void **tr A pointer to the result of the thread T3 T3 T2 T2 T1 T1 Returns a non zero value in success Main Main Premature thread death Join point Why use it?If the main thread dies, then all other threads will die with it. Even if they have not completed their work
Thread Joining #include <pthread.h> #define NUM_THREADS 4 void *work(void *i){ printf("Hello, world from %i\n", pthread_self()); pthread_exit(NULL); } int main(int argc, char **argv){ int i; pthread_t id[NUM_THREADS]; for(i = 0; i < NUM_THREADS; ++i){ if(pthread_create(&id[i], NULL, work, NULL)){ exit(-1); } } printf("After creating the thread. My id is: %i\n“, pthread_self()); for(i = 0; i < NUM_THREADS; ++i){ if(pthread_join(id[i], NULL)){ exit(-1); } } printf("After joining\n"); return 0; } Hello, world from 2 Hello, world from 3 Hello, world from 4 After creating the thread. My id is: 1 Hello, world from 5 After joining
Thread Attributes • By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object. • Thread attributes help the programmer customize the behavior of thread execution. • pthread_attr_init and pthread_attr_destroy are two functions used to initialize/destroy the thread attribute object. • Other routines are then used to query/set specific attributes in the thread attribute object.
Thread Attributes intpthread_attr_init(pthread_attr_t *attr); • Initialize an attribute with the default values for the attribute object • Default Schedule: SCHED_OTHER (?) • Default Scope: PTHREAD_SCOPE_SYSTEM (?) • Default Join State: PTHREAD_CREATE_JOINABLE (?) intpthread_attr_destroy(pthread_attr_t *attr); De-allocate any memory and state that the attribute object occupied. It is safe to delete the attribute object after the thread has been created intpthread_attr_setdetachstate(pthread_attr_t *attr, intJOIN_STATE); • Set the attached parameter on the attribute object with the JOIN_STATE variable • PTHREAD_CREATE_JOINABLE: It can be joined at a join point. State must be saved after function ends • PTHREAD_CREATE_DETACHED: It cannot be joined at a join point. State and resources are de-allocated immediately
Thread Attributes intpthread_attr_setschedpolicy(pthread_attr_t *attr, intpolicy) • Set the scheduling policy of the thread: • SCHED_OTHER Regular scheduling • SCHED_RR Round-robin (SU) • SCHED_FIFO First-in First-out (SU) intpthread_attr_setschedparam(pthread_attr_t *attr, const structsched_param *pr) • Contains the schedule priority of the thread • Default: 0 intpthread_attr_setinheritsched(pthread_attr_t *attr, intinherit) Tell if the scheduling parameters will be inherit from the parent or the ones in the attribute object will be used PTHREAD_EXPLICIT_SCHED Scheduling parameters from the attribute object will be used. PTHREAD_INHERIT_SCHED inherit the attributes from its parent. intpthread_attr_setscope(pthread_attr_t *attr, intscope) • Contention parameter • PTHREAD_SCOPE_SYSTEM • PTHREAD_SCOPE_PROCESS
Thread Attributes #include <pthread.h> #define NUM_THREADS 4 structargs{int a; float b; char c;}; void *work(void *i){ structargs *a = (structargs *)(i); printf("(%3i, %.3f, %3c) --> %i\n", a->a, a->b, a->c, pthread_self()); pthread_exit(NULL); } int main(int argc, char **argv){ int i; structargs a[NUM_THREADS]; pthread_t id[NUM_THREADS]; pthread_attr_t ma; pthread_attr_init(&ma); pthread_attr_setdetachstate(&ma, PTHREAD_CREATE_JOINABLE); for(i = 0; i < NUM_THREADS; ++i){ a[i].a = i; a[i].b = 1.0 /(i+1); a[i].c = 'a' + (char)(i); pthread_create(&id[i], &ma, work, (void *)(&a[i])); } pthread_attr_destroy(&ma); for(i = 0; i < NUM_THREADS; ++i){pthread_join(id[i], NULL);} return 0; } ( 0, 1.000, a) --> 2 ( 3, 0.250, d) --> 5 ( 2, 0.333, c) --> 4 ( 1, 0.500, b) --> 3
Miscellaneous Useful Functions intpthread_attr_getstackaddr (const pthread_attr_t *attr, void **stackaddr) Return the stack address that this P-thread will be using intpthread_attr_getstacksize(const pthread_attr_t *attr, size_t *stacksize) Return the stack size that this P-thread will be using intpthread_detach (pthread_tthr, void **value_ptr) Make the thread that is identified by thr not joinable intpthread_once(pthread_once_t *once_control, void (*init_routine)(void)); Make sure that the init_routine is executed by a single thread and only once. The once_control is a synchronization mechanism that can be defined as: pthread_once_t once_control = PTHREAD_ONCE_INIT; OpenMP Counterpart #pragma omp single voidpthread_yield () Relinquish the use of the processor
Exercises • Compile and run the example code from the slides • Implement vector addition using Pthreads. • Implement matrix multiplication using Pthreads. • Try chunking and cyclic distribution for different matrix sizes and different core counts and observe the performance.
Part 3. Thread Synchronization Mutexes Read-Write Locks Condition Variables
Synchronization Types Mutual Exclusion Lock. Only the thread that has the lock can access the protected region Mutex pthreads A counter of resources that are available. Zero means no resources are left. Binary (Mutex) and Counting Semaphores Act as a guard of some resource. Consists of a mutex with some kind of notification scheme Java Monitors Synchronization occurs when a condition is met. Always used in conjunction with a mutex. Inter-thread (process) communication. pthreads Conditional Variables Permits only reads on a data or writes on data in a group. In other words, lock out the writers when the readers are on the shared data and vice versa. Reader / Writer Locks pthreads
Mutex • Mutex is an abbreviation for "mutual exclusion". • A mutex variable acts like a "lock" protecting access to a shared data resource. • Only one thread can lock (or own) a mutex variable at any given time. • If multiple threads try to lock a mutex at the same time, the first thread gets access and others are blocked. • They must wait their turn to lock the mutex. • Used to protect access to shared data visible to all threads • This usually means global variables and data structures.