This article discusses the analysis of multithreaded programs, including their impact on program development and the challenges they introduce. It explores the use of threads for performance optimization and program structuring. Practical implications and future directions are also discussed.
Analysis of Multithreaded Programs
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
What is a multithreaded program?
• NOT general parallel programs
  • No message passing
  • No tuple spaces
  • No functional programs
  • No concurrent constraint programs
• NOT just multiple threads of control
  • No continuations
  • No reactive systems
• Multiple parallel threads of control that read and write shared mutable memory and synchronize by acquiring and releasing locks
Why do programmers use threads?
• Performance (parallel computing programs)
  • Single computation
  • Execute subcomputations in parallel
  • Example: parallel sort
• Program structuring mechanism (activity management programs)
  • Multiple activities
  • Thread for each activity
  • Example: web server
• These properties have a big impact on analyses
Practical Implications
• Threads are useful and increasingly common
  • POSIX threads standard for C, C++
  • Java has built-in thread support
  • Widely used in industry
• Threads introduce complications
  • Programs viewed as more difficult to develop
  • Analyses must handle new model of execution
• Lots of interesting and important problems!
Outline
• Examples of multithreaded programs
  • Parallel computing program
  • Activity management program
• Analyses for multithreaded programs
• Handling data races
• Future directions
Example - Divide and Conquer Sort
Input: 7 4 6 1 3 5 8 2
Divide: 7 4 | 6 1 | 3 5 | 8 2
Conquer: 4 7 | 1 6 | 3 5 | 2 8
Combine: 1 4 6 7 | 2 3 5 8
Result: 1 2 3 4 5 6 7 8
Divide and Conquer Algorithms
• Lots of recursively generated concurrency
• Recursively solve subproblems in parallel
• Combine results in parallel
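“Sort n Items in d, Using t as Temporary Storage” (Cilk-style code: spawn starts a call that may run in parallel with its caller, and sync waits until all calls spawned by the current procedure have returned)

    /* assumed: merge(a, b, c, out) merges sorted runs [a,b) and [b,c) into out */
    void sort(int *d, int *t, int n) {
      if (n > CUTOFF) {
        /* sort the four quarters of d in parallel */
        spawn sort(d, t, n/4);
        spawn sort(d+n/4, t+n/4, n/4);
        spawn sort(d+2*(n/4), t+2*(n/4), n/4);
        spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
        sync;
        /* merge sorted quarters of d into halves of t in parallel;
           the first half is exactly the first two quarters, 2*(n/4) elements */
        spawn merge(d, d+n/4, d+2*(n/4), t);
        spawn merge(d+2*(n/4), d+3*(n/4), d+n, t+2*(n/4));
        sync;
        /* merge sorted halves of t back into d */
        merge(t, t+2*(n/4), t+n, d);
      } else {
        insertionSort(d, d+n);   /* simple sort for small problem sizes */
      }
    }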
Walking through sort on the example input 7 4 6 1 3 5 8 2 (n = 8):
• Divide the array into subarrays and recursively sort the subarrays in parallel. The subproblems are identified using pointers into the middle of array d: d, d+n/4, d+2*(n/4) (which is d+n/2 when n is a multiple of 4), and d+3*(n/4).
• The sorted results are written back into the input array: d = 4 7 | 1 6 | 3 5 | 2 8.
• Merge the sorted quarters of d into the halves of t: t = 1 4 6 7 | 2 3 5 8.
• Merge the sorted halves of t back into d: d = 1 2 3 4 5 6 7 8.
• Use a simple sort (insertionSort) for small problem sizes, that is, when n ≤ CUTOFF.
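To make the algorithm runnable, here is a minimal sketch in Java using the standard fork/join framework (java.util.concurrent.RecursiveAction). This is a swapped-in technique, not the slides' Cilk code: ForkJoinTask.invokeAll forks its tasks and waits for all of them, which plays the role of a group of spawns followed by sync. The class names ParallelSort and Merge and the CUTOFF value of 4 are hypothetical choices for illustration. The four sorts and the two parallel merges each touch disjoint index ranges of d and t, which is why they need no locks.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch of the divide-and-conquer sort using Java fork/join.
// invokeAll(...) forks its tasks and waits for all of them ("spawn ...; sync;").
class ParallelSort extends RecursiveAction {
    static final int CUTOFF = 4;    // hypothetical threshold for insertion sort
    private final int[] d, t;       // data and temporary storage (same length)
    private final int lo, n;        // this task sorts d[lo, lo + n)

    ParallelSort(int[] d, int[] t, int lo, int n) {
        this.d = d; this.t = t; this.lo = lo; this.n = n;
    }

    // Entry point: sorts the whole array in place.
    static void sort(int[] d) {
        ForkJoinPool.commonPool().invoke(new ParallelSort(d, new int[d.length], 0, d.length));
    }

    @Override
    protected void compute() {
        if (n > CUTOFF) {
            int q = n / 4;
            // Sort the four quarters of d in parallel (disjoint ranges).
            invokeAll(new ParallelSort(d, t, lo, q),
                      new ParallelSort(d, t, lo + q, q),
                      new ParallelSort(d, t, lo + 2 * q, q),
                      new ParallelSort(d, t, lo + 3 * q, n - 3 * q));
            // Merge sorted quarters of d into the two halves of t in parallel.
            invokeAll(new Merge(d, lo, lo + q, lo + 2 * q, t, lo),
                      new Merge(d, lo + 2 * q, lo + 3 * q, lo + n, t, lo + 2 * q));
            // Merge the sorted halves of t back into d.
            merge(t, lo, lo + 2 * q, lo + n, d, lo);
        } else {
            insertionSort(d, lo, lo + n);
        }
    }

    // Task wrapper so that one merge can run in parallel with another.
    static final class Merge extends RecursiveAction {
        private final int[] src, dst;
        private final int a, b, c, out;
        Merge(int[] src, int a, int b, int c, int[] dst, int out) {
            this.src = src; this.a = a; this.b = b; this.c = c; this.dst = dst; this.out = out;
        }
        @Override
        protected void compute() { merge(src, a, b, c, dst, out); }
    }

    // Merges sorted runs src[a, b) and src[b, c) into dst starting at index out.
    static void merge(int[] src, int a, int b, int c, int[] dst, int out) {
        int i = a, j = b, k = out;
        while (i < b && j < c) dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
        while (i < b) dst[k++] = src[i++];
        while (j < c) dst[k++] = src[j++];
    }

    // Sorts d[from, to) in place.
    static void insertionSort(int[] d, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            int x = d[i];
            int j = i - 1;
            while (j >= from && d[j] > x) { d[j + 1] = d[j]; j--; }
            d[j + 1] = x;
        }
    }
}
```

For example, calling ParallelSort.sort on {7, 4, 6, 1, 3, 5, 8, 2} leaves the array as {1, 2, 3, 4, 5, 6, 7, 8}.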
Key Properties of Parallel Computing Programs
• Structured form of multithreading
  • Parallelism confined to a small region
  • Single thread coming in
  • Multiple threads exist during the computation
  • Single thread going out
• Deterministic computation
• Tasks update disjoint parts of a data structure in parallel without synchronization
• May also have parallel reductions
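A parallel reduction can be sketched in the same Java fork/join style (again a swapped-in technique, not code from the slides). Each task reads a disjoint slice of the array and returns a partial sum, and partial sums are combined only after the subtasks finish, so the computation stays deterministic without locks. The class name ParallelSum and its CUTOFF of 1000 are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of a parallel reduction (sum) in divide-and-conquer style.
class ParallelSum extends RecursiveTask<Long> {
    static final int CUTOFF = 1000;  // hypothetical sequential threshold
    private final int[] a;
    private final int lo, hi;        // this task sums a[lo, hi)

    ParallelSum(int[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    static long sum(int[] a) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(a, 0, a.length));
    }

    @Override
    protected Long compute() {
        if (hi - lo <= CUTOFF) {
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = lo + (hi - lo) / 2;
        ParallelSum left = new ParallelSum(a, lo, mid);
        left.fork();                                        // run left half in parallel ("spawn")
        long right = new ParallelSum(a, mid, hi).compute(); // do right half in this thread
        return left.join() + right;                         // wait ("sync"), then combine
    }
}
```

Summing the integers 1 through 10000 this way gives 50005000, the same result as a sequential loop.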
Main Loop and Client Threads (figure): The main loop repeatedly accepts a new connection and starts a new client thread for it. Each client thread repeatedly waits for input and produces output, and several client threads run at the same time.
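Main Loop

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    class Main {
      static public void loop(ServerSocket s) throws IOException {
        Counter c = new Counter();        // shared by all worker threads
        while (true) {
          Socket p = s.accept();          // accept new connection
          Worker t = new Worker(p, c);    // start new client thread
          t.start();
        }
      }
    }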
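Worker Threads

    import java.io.*;
    import java.net.Socket;

    class Worker extends Thread {
      Socket s;
      Counter c;
      Worker(Socket s, Counter c) { this.s = s; this.c = c; }
      public void run() {
        // closing the streams at the end of the try block also closes the socket
        try (DataOutputStream out = new DataOutputStream(s.getOutputStream());
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
          while (true) {
            String inputLine = in.readLine();   // wait for input
            c.increment();                      // increment shared counter
            if (inputLine == null) break;
            out.writeBytes(inputLine + "\n");   // produce output
          }
        } catch (IOException e) {
          // connection failed; this client thread simply exits
        }
      }
    }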
Synchronized Shared Counter

    class Counter {
      int contents = 0;
      // synchronized: acquire this object's lock, increment the counter, release the lock
      synchronized void increment() {
        contents++;
      }
    }
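To see what the synchronized counter buys, the sketch below starts several threads that all increment one shared Counter. It reuses the slide's Counter and adds a hypothetical synchronized get() accessor. The driver class CounterDemo and its parameters are also hypothetical.

```java
// Counter from the slide, plus a hypothetical synchronized get() accessor.
class Counter {
    int contents = 0;
    synchronized void increment() { contents++; }
    synchronized int get() { return contents; }
}

// Hypothetical driver: starts `threads` threads that each increment the shared
// counter `perThread` times, waits for all of them, and returns the total.
class CounterDemo {
    static int run(int threads, int perThread) {
        Counter c = new Counter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();   // wait for this thread to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.get();
    }
}
```

Because each increment() holds the counter's lock, CounterDemo.run(4, 100000) returns exactly 400000. Without synchronized, contents++ is a separate read, add, and write, so concurrent increments can be lost and the total typically comes out lower.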
Simple Activity Management Programs
• Fixed, small number of threads
• Based on functional decomposition, for example:
  • Device management thread
  • User interface thread
  • Compute thread
Key Properties of Activity Management Programs
• Threads manage interactions
  • One thread per client or activity
  • Blocking I/O for interactions
• Unstructured form of parallelism
• Object is the unit of sharing
  • Mutable shared objects (mutual exclusion)
  • Private objects (no synchronization)
  • Read-only shared objects (no synchronization)
  • Inherited objects passed from parent to child
Why analyze multithreaded programs?
• Discover errors or certify their absence (multithreading introduces new kinds of errors)
• Discover or verify application-specific properties (interactions between threads complicate analysis)
• Enable optimizations (multithreading makes new kinds of optimizations possible and complicates traditional ones)
Classic Errors in Multithreaded Programs
• Deadlocks
• Data races
Deadlock
Deadlock occurs if threads wait for resources (typically mutual exclusion locks) in a cycle.

    Thread 1:        Thread 2:
    lock(l);         lock(m);
    lock(m);         lock(l);
    x = x + y;       y = y * x;
    unlock(m);       unlock(l);
    unlock(l);       unlock(m);
One interleaving that deadlocks:
• Threads 1 and 2 start execution.
• Thread 1 acquires lock l.
• Thread 2 acquires lock m.
• Thread 1 holds l and waits for m, while Thread 2 holds m and waits for l. Neither thread can proceed.
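A standard way to rule out this circular wait (not shown on the slides) is lock ordering: every thread acquires locks in one fixed global order. The Java sketch below uses java.util.concurrent.locks.ReentrantLock. The class name OrderedLocks, the initial values x = 1 and y = 2, and the runBoth() driver are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the two-thread example with a fixed lock order (l before m).
class OrderedLocks {
    private final ReentrantLock l = new ReentrantLock();
    private final ReentrantLock m = new ReentrantLock();
    int x = 1, y = 2;   // hypothetical initial values

    void thread1() {
        l.lock();
        try {
            m.lock();
            try { x = x + y; } finally { m.unlock(); }
        } finally { l.unlock(); }
    }

    void thread2() {
        l.lock();       // the deadlocking version acquired m first
        try {
            m.lock();
            try { y = y * x; } finally { m.unlock(); }
        } finally { l.unlock(); }
    }

    // Runs both threads to completion; with one lock order there is no circular wait.
    void runBoth() {
        Thread t1 = new Thread(this::thread1);
        Thread t2 = new Thread(this::thread2);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The final values still depend on which thread runs first (x is 3, and y is 6 if Thread 1 goes first or 2 if Thread 2 goes first), but the program always terminates.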
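Data Races
A data race occurs if two parallel threads access the same memory location and at least one access is a write.
• Data race: A[i] = v || A[j] = w (|| denotes parallel execution), when i and j refer to the same element
• No data race: A[i] = v; A[j] = w; (the two writes execute sequentially)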
Synchronization and Data Races
There is no data race if synchronization separates the accesses.

    Thread 1:        Thread 2:
    lock(l);         lock(l);
    x = x + 1;       x = x + 2;
    unlock(l);       unlock(l);

Synchronization protocol:
• Associate a lock with the data
• Acquire the lock to update the data atomically
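The lock-per-data protocol maps directly onto Java's ReentrantLock. This is a minimal sketch, assuming a hypothetical class GuardedX and a starting value of x = 0. The lock l guards x, and both updates from the example run while holding it.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the protocol: lock l is associated with x, and every access
// to x happens while holding l.
class GuardedX {
    private final ReentrantLock l = new ReentrantLock(); // guards x
    private int x = 0;                                   // hypothetical initial value

    void add(int k) {
        l.lock();
        try {
            x = x + k;   // read-modify-write cannot interleave with another holder of l
        } finally {
            l.unlock();
        }
    }

    int get() {
        l.lock();
        try { return x; } finally { l.unlock(); }
    }

    // Thread 1 adds 1 and Thread 2 adds 2, as in the example; returns the final x.
    static int demo() {
        GuardedX g = new GuardedX();
        Thread t1 = new Thread(() -> g.add(1));
        Thread t2 = new Thread(() -> g.add(2));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return g.get();
    }
}
```

Whichever thread acquires l first, the two updates do not interleave, so GuardedX.demo() always returns 3.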
Why are data races errors?
• There exist correct programs that contain races
• But most races are programming errors
  • Code intended to execute atomically
  • Synchronization omitted by mistake
• Consequences can be severe
  • Nondeterministic, timing-dependent errors
  • Data structure corruption
  • Complicates analysis and optimization
Overview of Analyses for Multithreaded Programs
Key problem: interactions between threads
• Flow-insensitive analyses
• Escape analyses
• Dataflow analyses
• Explicit parallel flow graphs
• Interference summary analysis
• State space exploration
Program With Allocation Sites (figure: a program made up of the procedures main(i,j), compute(d,e), evaluate(i,j), multiplyAdd(a,b,c), abs(r), scale(n,m), multiply(m), and add(u,v); procedure bodies not shown)