The Cunning Plan • We’ll look into: • What concurrent programming is • Why you care • How it’s done • We’re going to skim over *all* the interesting details
One-Slide Summary • There are many different ways to do concurrent programming • There can be more than one right answer • We need synchronization primitives (e.g. semaphores) to deal with shared resources • Message passing is complicated
What? • Concurrent Programming • using multiple threads on a single machine • the OS simulates concurrency, or • using multiple cores/processors • using message passing • memory is not shared between threads • more general in terms of hardware requirements, etc.
What (Shorter version) • There are a million different ways to do concurrent programming • We’ll focus on three: • co-begin blocks • threads • message passing
Concurrent Programming: Why? • Because it is intuitive for some problems (say you’re writing httpd) • Because we need better-than-sequential performance • Because the problem is inherently distributed (e.g. BitTorrent)
Coding Challenges • How do you divide the problem across threads? • easy: matrix multiplication using threads • hard: heated plate using message passing • harder: n-body simulation for large n
One Slide on Co-Begin • We want to execute commands simultaneously. Solution: a run-in-parallel block: int x; int y; // ... run-in-parallel { functionA(&x) | functionB(&x,&y) } • functionA and functionB run concurrently; the block finishes when both do
Threads • Most common in everyday applications • Instead of a run-in-parallel block, we want explicit ways to create and destroy threads • Threads can all see a program’s global variables (i.e. they share memory)
Some Syntax (Java):
Thread mythread = new Thread(new Runnable() {
  public void run() {
    // your code here
  }
});
mythread.start();
mythread.join();
Some Syntax (pthreads):
void * foo(void * x) {
  // your code here
  return NULL;
}
// ...
int bar = 5;
pthread_t my_id;
pthread_create(&my_id, NULL, foo, (void *)&bar);
// ...
pthread_join(my_id, NULL);
Example: Matrix Multiplication • Given matrices A and B • Compute C = A × B • One entry worked out: 9 * 2 = 18, 7 * 5 = 35, 4 * -3 = -12; sum = 41
Matrix Multiplication ‘Analysis’ • We have: size(A) = (p, q), size(B) = (q, r), size(C) = (p, r), with p = 4, q = 3, r = 4 • Complexity: • p×r elements in C • O(q) operations per element • Note: calculating each element of C is independent of the other elements
Matrix Multiplication using Threads
pthread_t threads[P][R];
struct location locs[P][R];
for (i = 0; i < P; ++i) {
  for (j = 0; j < R; ++j) {
    locs[i][j].row = i;
    locs[i][j].col = j;
    pthread_create(&threads[i][j], NULL, calc_cell, (void *)&locs[i][j]);
  }
}
for (i = 0; i < P; ++i) {
  for (j = 0; j < R; ++j) {
    pthread_join(threads[i][j], NULL);
  }
}
Matrix Multiplication using Threads
for each element in C:
  create a thread:
    call the function 'calc_cell'
for each created thread:
  wait until the thread finishes
// Profit
Postmortem • Relatively easy to parallelize: • matrices A and B are ‘read only’ • each thread writes to a unique entry in C • entries in C do not depend on each other • What are some problems with this? • overhead of creating threads • use of shared memory
Synchronization • So far, we have only covered how to create & destroy threads • What else do we need? (See title)
Synchronization • We want to do things like: • event A must happen before event B, and • events A and B cannot occur simultaneously • Is there a problem here? • Thread 1: counter = counter + 1 • Thread 2: counter = counter + 1
Semaphores • A number n (initialized to some value) • Can only increment with sem.V() and decrement with sem.P() • n > 0 : P() doesn’t block • n ≤ 0 : P() blocks • V() unblocks some waiting process
More Semaphore Goodness • Semaphores are straightforward to implement on most types of systems • Easy to use for resource management (set n equal to the number of resources) • Some additional features are common (e.g. bounded semaphores)
Semaphore Example • Let’s try this again: • Main: Semaphore wes = new Semaphore(0) // start threads 1 and 2 simultaneously • Thread 1: counter = counter + 1; wes.V() • Thread 2: wes.P(); counter = counter + 1
Semaphore Example 2 • Suppose we want two threads to “meet up” at specific points in their code (a rendezvous): • Semaphore aArrived = new Semaphore(0) • Semaphore bArrived = new Semaphore(0) // start threads A and B simultaneously • Thread A: foo_a1; aArrived.V(); bArrived.P(); foo_a2 • Thread B: foo_b1; bArrived.V(); aArrived.P(); foo_b2 • Beware the variant that swaps each thread’s P() before its V() (Thread A: bArrived.P(); aArrived.V() and Thread B: aArrived.P(); bArrived.V()): both threads block forever
Deadlock • ‘Deadlock’ refers to a situation in which one or more threads are waiting for something that will never happen • Theorem: You will, at some point in your life, write code that deadlocks
Readers/Writers Problem • Let’s do a slightly bigger example • Problem: • some finite buffer b • multiple writer threads(only one can write at a time) • multiple reader threads(many can read at a time) • can only read if no writing is happening
Readers/Writers Solution #1
int readers = 0
Semaphore mutex = new Semaphore(1)
Semaphore roomEmpty = new Semaphore(1)
Writers:
  roomEmpty.P()
  // write here
  roomEmpty.V()
Readers/Writers Solution #1
Readers:
  mutex.P()
  readers++
  if (readers == 1) roomEmpty.P()
  mutex.V()
  // read here
  mutex.P()
  readers--
  if (readers == 0) roomEmpty.V()
  mutex.V()
Starvation • Starvation occurs when a thread is continuously denied resources • Not the same as deadlock: it might eventually get to run, but it needs to wait longer than we want • In the previous example, a writer might ‘starve’ if there is a continuous onslaught of readers
Guiding Question • Earlier, I said sem.V() unblocks some waiting thread • If we don’t unblock in FIFO order, that means we could cause starvation • Do we care?
Synchronization Summary • We can use semaphores to enforce synchronization: • ordering • mutual exclusion • queuing • There are other constructs as well • See your local OS Prof
Message Passing • Threads and co. rely on shared memory • Semaphores make very little sense if they cannot be shared between n > 1 threads • What about systems in which we can’t share memory?
Message Passing • Processes (not threads) are created for us (they just exist) • We can do the following: • blocking_send(int destination, char * buffer, int size) • blocking_receive(int source, char * buffer, int size)
Message Passing: Motivation • We don’t care whether the processes run on different machines: • same machine - use virtual memory tricks to make messages very quick • different machines - copy and send over the network
Heated Plate Simulation • Suppose you have a metal plate: • Three sides are chilled to 273 K • One side is heated to 373 K
Heated Plate Simulation Problem: Calculate the heat distribution after some time t: t=10 t=30 t=50
Heated Plate Simulation • We model the problem by dividing the plate into small squares: • For each time step, take the average of a square’s four neighbors
Heated Plate Simulation • Problem: need to communicate for each time step • Sending messages is expensive… P1 P2 P3
Heated Plate Simulation • Problem: need to communicate for each time step • Sending messages is expensive… • Solution: send fewer, larger messages, and limit the longest message path P1 P2 P3
How to cause deadlock in MPI
Process 1:
  char * buff = "Goodbye";
  char * buff2 = new char[15];
  send(2, buff, 8)
  recv(2, buff2, 15)
Process 2:
  char * buff = ", cruel world\n";
  char * buff2 = new char[8];
  send(1, buff, 15)
  recv(1, buff2, 8)
If both sends block until the matching receive is posted, neither process ever reaches its recv
Postmortem • Our heated plate solution does not rely on shared memory • Sending messages becomes complicated in a hurry (easy to do the wrong thing) • We need to reinvent the wheel constantly for different interaction patterns
Example Summary • Matrix Multiplication Example • used threads and implicitly shared memory • this is common for everyday applications (especially useful for servers, GUI apps, etc.) • Heated Plate Example • used message passing • this is more common for big science and big business (also e.g. peer-to-peer) • it is not how you’d write your average firefox
Guiding Question If you’re writing a GUI app(let’s call it “firefox”)would you prefer to use threads or message passing?
Summary • Looked at three ways to do concurrent programming: • co-begin • threads, implicitly shared memory, semaphores • message passing • Concerns of scheduling, deadlock, starvation