CSE 160 - Lecture 15 Introduction to Threads, Synchronization and Mutual Exclusion
Heavyweight Processes • Complete stand-alone programs • Code segment • Data Segment • Static data • Heap • Malloc’ed data • Stack • Registers
How can two heavyweight processes communicate? • Through a shared memory segment: Process 1 and Process 2 each hold a pointer (myshmPtr) to the segment • Or through a communication socket
Shared Memory Segment • Works only on a single CPU or a shared-memory multiprocessor • A “named” segment of memory that processes attach to • shmat() function call on Unix • Each process is given a pointer to the beginning of the shared memory segment • The structure of the segment's contents is not specified
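The attach-and-share idea can be sketched with the System V calls shmget(), shmat(), shmdt() and shmctl(). This is a minimal sketch, not the lecture's code: the helper name shm_roundtrip is hypothetical, and it assumes a Unix system with System V shared memory enabled. A forked child writes into the segment and the parent reads the value back.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes `value` into a shared segment and
 * the parent reads it back. Returns the value the parent saw, or -1 on
 * error. */
int shm_roundtrip(int value)
{
    /* IPC_PRIVATE creates a fresh segment. Unrelated processes would
     * instead agree on a key, which acts as the segment's "name". */
    int shmid = shmget(IPC_PRIVATE, 3 * sizeof(int), IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    /* Attach: we get a pointer to the beginning of the segment. */
    int *myshmPtr = shmat(shmid, NULL, 0);
    if (myshmPtr == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    myshmPtr[1] = 0;   /* the layout (three ints) is our own convention */

    pid_t pid = fork();
    if (pid == -1) {
        shmdt(myshmPtr);
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: the attachment is inherited across fork(). */
        myshmPtr[1] = value;
        shmdt(myshmPtr);
        _exit(0);
    }

    /* Parent: wait for the child, then read what it wrote. */
    waitpid(pid, NULL, 0);
    int seen = myshmPtr[1];

    shmdt(myshmPtr);
    shmctl(shmid, IPC_RMID, NULL);   /* remove the segment */
    return seen;
}
```

Here waitpid() orders the write before the read, so no lock is needed. Two processes that access the segment at the same time would need mutual exclusion.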
Concurrent Access Problem
• Shared memory segment holds: int x; int y; int z; Process 1 and Process 2 each have myshmPtr pointing to its start (assumed here to be a char *)
• Process 1 and Process 2 each execute:
    ptrY = (int *)(myshmPtr + sizeof(int));
    *ptrY = 1;
    if (*ptrY > 0)
        (*ptrY)--;
• What value is y after these programs execute?
Mutual Exclusion • In general, the temporal (time) order in which processes execute code relative to each other is unknown • Portions of code that modify shared variables are called critical sections • Access to critical shared variables must be regulated so that only one process at a time may be in the critical section • This is called serialization of access or mutual exclusion
Implementing Mutual Exclusion • Spin locks: while (lock == 1) /* wait */ ; lock = 1; <critical section> lock = 0; • Busy waiting is inefficient • Naïve implementation has pitfalls (how?)
Atomic Operations
• Implementing locks, semaphores, and monitors requires atomic building blocks
    again: load  r0, <lock>
           cmp   r0, 0
           jne   again
           add   r0, 1
           store <lock>, r0
• A second process could be swapped in between the load and the store (or run simultaneously on an SMP)
• Need to make sure all operations complete without interruption (atomically)
Test and Set • CPU designers recognize this need and provide special atomic hardware instructions • Test and set: test a location for zero and set it to nonzero, as one indivisible step • Fetch and increment: fetch a location and add one to it, as one indivisible step
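A spin lock built on test and set can be sketched with C11's atomic_flag, whose atomic_flag_test_and_set operation returns the previous value while setting the flag. This is a sketch under assumptions: the names spin_lock, spin_unlock and spin_lock_demo are illustrative, and the compiler is assumed to support <stdatomic.h> and Pthreads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter;

static void spin_lock(void)
{
    /* Test and set returns the old value: keep spinning while the flag
     * was already set, i.e. while someone else holds the lock. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ; /* busy wait */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        spin_lock();
        shared_counter++;          /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Runs two threads that each increment the counter `iterations` times.
 * Returns the final count, or -1 if a thread could not be created. */
long spin_lock_demo(long iterations)
{
    pthread_t threads[2];
    int started = 0;

    shared_counter = 0;
    for (; started < 2; started++)
        if (pthread_create(&threads[started], NULL, worker, &iterations) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == 2 ? shared_counter : -1;
}
```

Unlike the naive while (lock == 1) loop, the test and the set happen in one indivisible step, so two threads cannot both see the lock as free.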
Semaphores • Introduced by Dijkstra. • Give higher-level test and set semantics • Two operations, P and V. • P(semaphore): if > 0, decrement semaphore; otherwise, wait • V(semaphore): increment semaphore by one • Semaphore is initialized to a value > 0 (1 for mutual exclusion) • Provides the functionality needed to implement mutual exclusion • Standard OS construct • semget(), semctl(), semop() system calls
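P and V can be sketched as thin wrappers over a semaphore API. The slide names the System V calls; for brevity this sketch swaps in POSIX unnamed semaphores (sem_init, sem_wait, sem_post), which is an assumption and works on Linux but not on every Unix. The names P, V and semaphore_demo are illustrative.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_THREADS 16

static sem_t sem;      /* binary semaphore guarding `total` */
static long total;

/* P: wait until the semaphore is > 0, then decrement it. */
static void P(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        ; /* retry if interrupted by a signal */
}

/* V: increment the semaphore, waking one waiter if there is one. */
static void V(sem_t *s)
{
    sem_post(s);
}

static void *add_many(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        P(&sem);
        total++;       /* critical section */
        V(&sem);
    }
    return NULL;
}

/* Returns the final total, or -1 on bad arguments or setup failure. */
long semaphore_demo(int nthreads, long iterations)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (sem_init(&sem, 0, 1) != 0)   /* initial value 1: one holder at a time */
        return -1;
    total = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&threads[started], NULL, add_many, &iterations) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    sem_destroy(&sem);
    return started == nthreads ? total : -1;
}
```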
More Mutual Exclusion • Monitors • Higher-level than Semaphores making them less prone to error • To gain access to shared resource, programs must always go through the monitor. • Condition variables • Gain access to a resource, when a particular condition occurs (more later).
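C has no monitor construct, but the idea can be approximated by convention: keep the lock inside the data structure and touch the data only through functions that take the lock. The account_monitor type and its functions below are hypothetical names for this sketch.

```c
#include <pthread.h>

/* Monitor-style account: the mutex lives with the data it protects, and
 * callers are expected to use only the functions below. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_monitor;

void account_init(account_monitor *m, long initial)
{
    pthread_mutex_init(&m->lock, NULL);
    m->balance = initial;
}

void account_deposit(account_monitor *m, long amount)
{
    pthread_mutex_lock(&m->lock);
    m->balance += amount;
    pthread_mutex_unlock(&m->lock);
}

/* Returns 1 if the withdrawal succeeded, 0 if funds were insufficient.
 * The check and the update happen under the same lock. */
int account_withdraw(account_monitor *m, long amount)
{
    int ok;

    pthread_mutex_lock(&m->lock);
    ok = m->balance >= amount;
    if (ok)
        m->balance -= amount;
    pthread_mutex_unlock(&m->lock);
    return ok;
}

long account_balance(account_monitor *m)
{
    long b;

    pthread_mutex_lock(&m->lock);
    b = m->balance;
    pthread_mutex_unlock(&m->lock);
    return b;
}

void account_destroy(account_monitor *m)
{
    pthread_mutex_destroy(&m->lock);
}
```

Because callers never lock or unlock directly, they cannot forget to release the lock, which is the error a monitor is meant to prevent.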
Threads • For SMP, could always use heavyweight processes • Performance penalties • More burden on the programmer to manage shared structures (“pointer hell”) • Threads allow concurrency within a single process • Lighter-weight access
Processes and Threads • Process includes address space. • Thread is program counter and stack pointer. • Process may have many threads. • All the threads share the same address space. • Processes are heavyweight, threads are lightweight. • Processes/threads need not map one-to-one onto processors.
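Three Threads Within a Process [Figure: one address space holding code (function f, function g), data, heap, and three stacks; each thread has its own program counter (PC1, PC2, PC3) and stack pointer (SP1, SP2, SP3)]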
Thread Execution Model [Figure: a pool of threads scheduled onto a pool of processors] • Each thread of control can be scheduled by the OS when it is in a runnable state. • Threads within one process can run concurrently • Mutual exclusion is very important
Thread Execution Model: Key Points • Pool of processors, pool of threads. • Threads are peers. • Dynamic thread creation. • Can support many more threads than processors. • Threads dynamically switch between processors. • Threads share access to memory. • Synchronization needed between threads.
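Dynamic thread creation and shared memory can be sketched with pthread_create() and pthread_join(). In this sketch (function and type names are illustrative) eight threads are created regardless of how many processors exist; each keeps its id on its own stack and writes to its own slot of a shared array, so no lock is needed.

```c
#include <pthread.h>

#define NUM_THREADS 8

static int results[NUM_THREADS];   /* shared: one copy for all threads */

typedef struct {
    int id;
} thread_arg;

static void *square_id(void *p)
{
    int id = ((thread_arg *)p)->id;   /* local: lives on this thread's stack */
    results[id] = id * id;            /* each thread writes a distinct slot */
    return NULL;
}

/* Creates NUM_THREADS peer threads, waits for them, and returns the sum of
 * the slots written by the threads that were successfully started. */
int run_threads(void)
{
    pthread_t threads[NUM_THREADS];
    thread_arg args[NUM_THREADS];
    int started = 0;
    int sum = 0;

    for (; started < NUM_THREADS; started++) {
        args[started].id = started;
        if (pthread_create(&threads[started], NULL, square_id,
                           &args[started]) != 0)
            break;
    }
    /* pthread_join also guarantees the thread's writes are visible here. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < started; i++)
        sum += results[i];
    return sum;
}
```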
Why Use Threads? • Representing Concurrent Entities • Concurrency is part of the problem specification. • Examples: systems programming and user interfaces. • Single or multiple processors. • This kind of multithreaded programming is difficult. • Multiprocessing for Performance • Concurrency is under programmer’s control. • Programs could be written sequentially. • This kind of multithreaded programming should be easier.
Commercial Thread Libraries • Win32 threads (Windows NT and Windows 95). • Pthreads (POSIX Thread Interface): SGI IRIX, Sun Solaris, HP-UX, IBM AIX, Linux, etc. • Solaris threads (SunOS 5.x). • All designed primarily for systems programming.
Example: Pthreads • POSIX Threads – available on many platforms • Thread management: pthread_create(), pthread_join(), pthread_exit(), pthread_kill(), pthread_cancel() • Mutexes: pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_trylock() • Events: pthread_cond_init(), pthread_cond_wait(), pthread_cond_timedwait(), pthread_cond_signal() • Scheduling: pthread_setschedparam(), pthread_attr_setschedpolicy()
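The mutex calls can be sketched in a few lines. This example (trylock_demo is an illustrative name) assumes a default, non-recursive mutex: pthread_mutex_trylock() returns EBUSY instead of blocking when the mutex is already held, and 0 when it acquires the lock.

```c
#include <errno.h>
#include <pthread.h>

/* Returns 1 if trylock behaved as expected (EBUSY while held, 0 once
 * free), 0 if not, and -1 if the mutex could not be initialized. */
int trylock_demo(void)
{
    pthread_mutex_t m;
    int busy_rc, free_rc;

    if (pthread_mutex_init(&m, NULL) != 0)
        return -1;

    pthread_mutex_lock(&m);
    busy_rc = pthread_mutex_trylock(&m);   /* held: returns EBUSY at once */
    pthread_mutex_unlock(&m);

    free_rc = pthread_mutex_trylock(&m);   /* free: acquires, returns 0 */
    if (free_rc == 0)
        pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return busy_rc == EBUSY && free_rc == 0;
}
```

trylock is useful when a thread has other work it can do instead of waiting for the lock.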
Condition Variables • Would like to be “woken up” when a particular condition occurs • Calling pthread_cond_wait(cond, mutex) atomically releases the mutex and puts the thread to sleep • When the condition is signalled, the thread wakes up and reacquires the mutex before pthread_cond_wait() returns
Conditional Waiting
action() {
    lock();
    while (x != 0)
        wait(s);
    unlock();
}
counter() {
    lock();
    x--;
    if (x == 0)
        signal(s);
    unlock();
}
• In counter(), both signal(s) and unlock() must occur before wait(s) in action() returns
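The action()/counter() pattern maps onto Pthreads roughly as follows. This is a sketch, not the lecture's code: countdown_demo is a hypothetical driver, and lock(), wait() and signal() are assumed to correspond to pthread_mutex_lock(), pthread_cond_wait() and pthread_cond_signal().

```c
#include <pthread.h>

#define MAX_COUNTERS 16

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t s = PTHREAD_COND_INITIALIZER;
static int x;

static void *counter(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&m);
    x--;
    if (x == 0)
        pthread_cond_signal(&s);
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *action(void *seen)
{
    pthread_mutex_lock(&m);
    while (x != 0)                   /* re-check: wakeups can be spurious */
        pthread_cond_wait(&s, &m);   /* releases m while asleep */
    *(int *)seen = x;                /* read x while still holding m */
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Starts one waiter and n counters. Returns the value of x the waiter saw
 * (expected 0), or -1 on bad arguments or if the waiter cannot start. */
int countdown_demo(int n)
{
    pthread_t waiter, counters[MAX_COUNTERS];
    int created[MAX_COUNTERS] = {0};
    int seen = -1;

    if (n < 1 || n > MAX_COUNTERS)
        return -1;
    x = n;
    if (pthread_create(&waiter, NULL, action, &seen) != 0)
        return -1;
    for (int i = 0; i < n; i++) {
        created[i] = pthread_create(&counters[i], NULL, counter, NULL) == 0;
        if (!created[i])
            counter(NULL);   /* fall back to decrementing in this thread */
    }
    for (int i = 0; i < n; i++)
        if (created[i])
            pthread_join(counters[i], NULL);
    pthread_join(waiter, NULL);
    return seen;
}
```

The while loop, rather than an if, matters: the waiter rechecks the condition each time it wakes, so it proceeds only when x is really 0.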
A Simple Example: Array Summation
int array_sum(int n, int data[]) {
    int mid;
    int low_sum, high_sum;
    mid = n/2;
    low_sum = 0;
    high_sum = 0;
    #pragma multithreadable
    {
        for (int i = 0; i < mid; i++)
            low_sum = low_sum + data[i];
        for (int j = mid; j < n; j++)
            high_sum = high_sum + data[j];
    }
    return low_sum + high_sum;
}
#include <pthread.h>

typedef struct {
    int n, *data, mid;
    int *high_sum, *low_sum;
} args_block;

/* A thread routine takes a void * argument and returns void *. */
void *sum_0(void *p) {
    args_block *args = p;
    for (int i = 0; i < args->mid; i++)
        *args->low_sum = *args->low_sum + args->data[i];
    return NULL;
}

void *sum_1(void *p) {
    args_block *args = p;
    for (int j = args->mid; j < args->n; j++)
        *args->high_sum = *args->high_sum + args->data[j];
    return NULL;
}

int array_sum(int n, int data[]) {
    int mid;
    int low_sum = 0, high_sum = 0;   /* the sums must start at zero */
    args_block args;
    pthread_t threads[2];
    mid = n/2;
    args.n = n; args.data = data; args.mid = mid;
    args.low_sum = &low_sum; args.high_sum = &high_sum;
    /* pthread_create(thread, attributes, routine to execute, thread args) */
    pthread_create(&threads[0], NULL, sum_0, &args);
    pthread_create(&threads[1], NULL, sum_1, &args);
    for (int i = 0; i < 2; i++)      /* wait for threads to complete */
        pthread_join(threads[i], NULL);
    return low_sum + high_sum;
}
Commodity Multithreaded Applications • Example Problems: Spreadsheets, CAD/CAM, simulation, video/photo editing and production, games, voice/handwriting recognition, real-time 3D rendering, job scheduling, etc. etc. • Need to run as fast as sequential on one processor. • Need to run significantly faster on multiprocessors. • No recompilation, no relinking, no reconfiguration. • Need to adapt dynamically to changing resources. • Need to be reliable and timely.
Last Thoughts on Threading • Threads provide a way to expose parallelism within a task. • Advantages • Straightforward parallelism • Common construction (Java, Win32, Pthreads) • Shared variables eliminate copying • Disadvantages • Mutual exclusion is hard to think about • Not scalable beyond a single SMP • (Active research to eliminate this)
An Aside: Automatic Parallelization? • Write a sequential program. • Compiler transforms the sequential program into an efficient parallel (multithreaded) program. • A very very very very very very very difficult problem. • Decades of work on this problem. • Some success with some regular scientific programs. • Not a general solution (and probably never will be). • Not applicable to large, irregular, dynamic programs. • Compilers must overuse locking to ensure correctness. • Compilers need help determining which code blocks can operate independently (e.g., OpenMP directives).
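As an illustration of the kind of help a directive gives the compiler, the array summation can be written with an OpenMP reduction. This is a sketch assuming an OpenMP-capable compiler (for example, gcc with -fopenmp); without that flag the pragma is ignored and the loop runs sequentially with the same result.

```c
/* Sums data[0..n-1]. The directive tells the compiler that iterations are
 * independent apart from the accumulation into `sum`, which it may split
 * into per-thread partial sums and combine at the end. */
int array_sum(int n, const int data[])
{
    int sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += data[i];

    return sum;
}
```

The programmer states what is independent; the compiler and runtime handle thread creation, work division, and the final combination.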