Concurrency Testing: Challenges, Algorithms, and Tools Madan Musuvathi, Microsoft Research
Concurrency is HARD • A concurrent program should • Function correctly • Maximize throughput • Finish as many tasks as possible • Minimize latency • Respond to requests as soon as possible • While handling nondeterminism in the environment
Concurrency is Pervasive • Concurrency is an age-old problem of computer science • Most programs are concurrent • At least the ones that you expect to get paid for, anyway
Solving the Concurrency Problem • We need • Better programming abstractions • Better analysis and verification techniques • Better testing methodologies (the weakest link)
Testing is more important than you think • My first-ever computer program: • Wrote it in Basic • Not the world’s best programming language • With no idea about program correctness • I didn’t know first-order logic, loop-invariants, … • I hadn’t heard about Hoare, Dijkstra, … • But still managed to write correct programs, using the write, test, [debug, write, test]+ cycle
How many of you have … • written a program > 10,000 lines? • written a program, compiled it, and called it done without testing it on a single input? • written a program, compiled it, and called it done without testing it on a few interesting inputs?
Imagine a world where you can’t pick the inputs during testing … • You write the program • Check its correctness by staring at it • Give the program to the computer • The computer tests on inputs of its choice • factorial(5) = 120 • factorial(5) = 120 the next 100 times • factorial(7) = 5040 • The computer runs this program again and again on these inputs for a week • The program didn’t crash and therefore it is correct int factorial(int x) { int ret = 1; while (x > 1) { ret *= x; x--; } return ret; }
This is the world of concurrency testing • You write the program • Check its correctness by staring at it • Give the program to the computer • The computer generates some interleavings • The computer runs this program again and again on these interleavings • The program didn’t crash and therefore it is correct Parent_thread() { if (p != null) { p = new P(); Set(initEvent); } } Child_thread() { if (p != null) { Wait(initEvent); } }
Demo: How do we test concurrent software today?
CHESS Proposition • Capture and expose nondeterminism to a scheduler • Threads can run at different speeds • Asynchronous tasks can start at arbitrary time in the future • Hardware/compiler can reorder instructions • Explore the nondeterminism using several algorithms • Tackle the astronomically large number of interleavings • Remember: Any algorithm is better than no control at all
CHESS in a nutshell • CHESS is a user-mode scheduler • Controls all scheduling nondeterminism • Replaces the OS scheduler • Guarantees: • Every program run takes a different thread interleaving • Reproduce the interleaving for every run • Download CHESS source from http://chesstool.codeplex.com/
CHESS architecture • The CHESS exploration engine drives the CHESS scheduler, which sits between the program and Windows • Unmanaged programs are instrumented through Win32 wrappers; managed programs through .NET wrappers on the CLR • Every run takes a different interleaving • Reproduce the interleaving for every run
Running Example Thread 1: Lock(l); bal += x; Unlock(l); Thread 2: Lock(l); t = bal; Unlock(l); Lock(l); bal = t - y; Unlock(l);
Introduce Schedule() points • Instrument calls to the CHESS scheduler • Each call is a potential preemption point Thread 1: Schedule(); Lock(l); bal += x; Schedule(); Unlock(l); Thread 2: Schedule(); Lock(l); t = bal; Schedule(); Unlock(l); Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
First-cut solution: Random sleeps • Introduce a random sleep at schedule points • Does not introduce new behaviors • Sleep models a possible preemption at each location • Sleeping for a finite amount guarantees starvation-freedom Thread 1: Sleep(rand()); Lock(l); bal += x; Sleep(rand()); Unlock(l); Thread 2: Sleep(rand()); Lock(l); t = bal; Sleep(rand()); Unlock(l); Sleep(rand()); Lock(l); bal = t - y; Sleep(rand()); Unlock(l);
Improvement 1: Capture the “happens-before” graph • Delays that result in the same “happens-before” graph are equivalent • Avoid exploring equivalent interleavings Thread 1: Schedule(); Lock(l); bal += x; Schedule(); Unlock(l); Thread 2: Schedule(); Lock(l); t = bal; Schedule(); Unlock(l); Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Improvement 2: Understand synchronization semantics • Avoid exploring delays that are impossible • Identify when threads can make progress • CHESS maintains a run queue and a wait queue • Mimics OS scheduler state Thread 1: Schedule(); Lock(l); bal += x; Schedule(); Unlock(l); Thread 2: Schedule(); Lock(l); t = bal; Schedule(); Unlock(l); Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
CHESS modes: speed vs coverage • Fast-mode • Introduce schedule points before synchronizations, volatile accesses, and interlocked operations • Finds many bugs in practice • Data-race mode • Introduce schedule points before memory accesses • Finds race-conditions due to data races • Captures all sequentially consistent (SC) executions
CHESS Design Choices • Soundness • Any bug found by CHESS should be possible in the field • Should not introduce false errors (both safety and liveness) • Completeness • Any bug found in the field should be found by CHESS • In theory, we need to capture all sources of nondeterminism • In practice, we need to effectively explore the astronomically large state space
Capture all sources of nondeterminism? No. • Scheduling nondeterminism? Yes • Timing nondeterminism? Yes • Controls when and in what order the timers fire • Nondeterministic system calls? Mostly • CHESS uses precise abstractions for many system calls • Input nondeterminism? No • Rely on users to provide inputs • Program inputs, return values of system calls, files read, packets received, … • Good tradeoff in the short term • But can’t find race-conditions in error-handling code
Capture all sources of nondeterminism? No. • Hardware relaxations? Yes • Hardware can reorder instructions • Non-SC executions possible in programs with data races • Sober [CAV ‘08] can detect and explore such non-SC executions • Compiler relaxations? No • Very few people understand what compilers can do to programs with data races • Far fewer than those who understand the general theory of relativity
Two kinds of exploration algorithms • Reduction algorithms • Explore one out of a large number of equivalent interleavings • Prioritization algorithms • Pick “interesting” interleavings before you run out of resources • Remember: anything is better than nothing
Schedule Exploration Algorithms: Reduction Algorithms
Enumerating Thread Interleavings Using Depth-First Search Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2; Explore (State s) { T = set of threads in s; foreach t in T { s’ = schedule t in s; Explore(s’); } } [Diagram: the tree of interleavings, each node labeled with the current (x, y) values]
Behaviorally equivalent interleavings • Reach the same final state (x = 1, y = 3) • A: x = 1; y = 2; if (x == 1) { y = 3; } • B: x = 1; if (x == 1) { y = 2; y = 3; } (the other thread’s write y = 2 lands inside the branch)
Behaviorally inequivalent interleavings • Reach different final states: (1, 3) vs (1, 2) • A: x = 1; y = 2; if (x == 1) { y = 3; } • B: x = 1; if (x == 1) { y = 3; } y = 2;
Behaviorally inequivalent interleavings • Don’t necessarily have to reach different states • A: if (x == 1) { y = 3; } x = 1; y = 2; (branch not taken) • B: x = 1; if (x == 1) { y = 3; } y = 2; (branch taken) • Both end in (1, 2)
Execution Equivalence • Two executions are equivalent if they can be obtained by commuting independent operations • r1 = y; x = 1; r2 = y; r3 = x • x = 1; r2 = y; r1 = y; r3 = x • r1 = y; r2 = y; x = 1; r3 = x • r2 = y; x = 1; r3 = x; r1 = y
Formalism • Execution is a sequence of transitions • Each transition is of the form <tid, var, op> • tid: thread performing the transition • var: the memory location accessed in the transition • op: READ | WRITE | READWRITE • Two steps are independent if • They are executed by different threads and • Either they access different variables or both READ the same variable
Equivalence makes the schedule space a Directed Acyclic Graph Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2; [Diagram: equivalent interleavings converge on the same states, folding the tree into a DAG]
DFS in a DAG (CS 101) Explore (Sequence s) { T = set of threads enabled in s; foreach t in T { s’ = s . <t,v,o>; Explore(s’); } } With a visited table: HashTable visited; Explore (Sequence s) { T = set of threads enabled in s; foreach t in T { s’ = s . <t,v,o>; if (s’ in visited) continue; visited.Add(s’); Explore(s’); } } With canonical sequences: HashTable visited; Explore (Sequence s) { T = set of threads enabled in s; foreach t in T { s’ = s . <t,v,o>; s’’ = canon(s’); if (s’’ in visited) continue; visited.Add(s’’); Explore(s’); } } The sleep sets algorithm explores a DAG without maintaining the table
Sleep Set Algorithm Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2; • Identify transitions that take you to visited states [Diagram: transitions leading to already-visited states are put to sleep]
Sleep Set Algorithm Explore (Sequence s, sleep C) { T = set of transitions enabled in s; T’ = T – C; foreach t in T’ { C = C + t s’ = s . t ; C’ = C – {transitions dependent on t} Explore(s’, C’); } }
Summary • Sleep sets ensure that a stateless search does not explode a DAG into a tree
Persistent Set Reduction Thread 1: x = 1; x = 2; Thread 2: y = 1; y = 2;
With Sleep Sets Thread 1: x = 1; x = 2; Thread 2: y = 1; y = 2;
With Persistent Sets • Assumption: we are only interested in the reachability of final states (for instance, no global assertions) Thread 1: x = 1; x = 2; Thread 2: y = 1; y = 2;
Persistent Sets • A set of transitions P is persistent in a state s if • In the state space X reachable from s by only exploring transitions not in P • Every transition in X is independent of P • P “persists” in X • It is sound to explore only P from s
With Persistent Sets Thread 1: x = 1; x = 2; Thread 2: y = 1; y = 2;
Dynamic Partial-Order Reduction Algorithm [Flanagan & Godefroid] • Identifies persistent sets dynamically • After executing a transition, insert a schedule point before the most recent conflict Thread 1: y = 1; x = 1; Thread 2: x = 2; z = 3; [Diagram: after running y=1; x=1; x=2; z=3, a schedule point is inserted before the conflicting accesses to x]
Schedule Exploration Algorithms: Prioritization Algorithms
Schedule Prioritization • Preemption bounding • Few preemptions are sufficient for finding lots of bugs • Preemption sealing • Insert preemptions where you think bugs are • Random • If you don’t have additional information about the state space, random is the best • Still do partial-order reduction
CHESS checks for various correctness criteria • Assertion failures • Deadlocks • Livelocks • Data races • Atomicity violations • (Deterministic) Linearizability violations
Concurrency Correctness Criterion: Linearizability Checking in CHESS
Motivation • Writing good test oracles is hard • Is this a correct assertion to check for? • Now what if there are 5 threads, each performing 5 queue operations?
We want to magically • Check if a Bank behaves like a Bank should • Check if a queue behaves like a queue • Answer: Check for linearizability
Linearizability • The correctness notion closest to “thread safety” • A concurrent component behaves as if it is protected by a single global lock • Each operation appears to take effect instantaneously at some point between the call and return