Concurrency Testing Challenges, Algorithms, and Tools

Madan Musuvathi, Microsoft Research


Presentation Transcript


  1. Concurrency Testing: Challenges, Algorithms, and Tools. Madan Musuvathi, Microsoft Research

  2. Concurrency is HARD • A concurrent program should • Function correctly • Maximize throughput • Finish as many tasks as possible • Minimize latency • Respond to requests as soon as possible • While handling nondeterminism in the environment

  3. Concurrency is Pervasive • Concurrency is an age-old problem of computer science • Most programs are concurrent • At least the ones that you expect to get paid for, anyway

  4. Solving the Concurrency Problem • We need • Better programming abstractions • Better analysis and verification techniques • Better testing methodologies (the weakest link)

  5. Testing is more important than you think • My first-ever computer program: • Wrote it in Basic • Not the world’s best programming language • With no idea about program correctness • I didn’t know first-order logic, loop-invariants, … • I hadn’t heard about Hoare, Dijkstra, … • But still managed to write correct programs, using the write, test, [debug, write, test]+ cycle

  6. How many of you have … • written a program > 10,000 lines? • written a program, compiled it, called it done without testing the program on a single input? • written a program, compiled it, called it done without testing the program on a few interesting inputs?

  7. Imagine a world where you can’t pick the inputs during testing … • You write the program • Check its correctness by staring at it • Give the program to the computer • The computer tests on inputs of its choice • factorial(5) = 120 • factorial(5) = 120 the next 100 times • factorial(7) = 5040 • The computer runs this program again and again on these inputs for a week • The program didn’t crash and therefore it is correct int factorial ( int x ) { int ret = 1; while(x > 1){ ret *= x; x --; } return ret; }

  8. This is the world of concurrency testing • You write the program • Check its correctness by staring at it • Give the program to the computer • The computer generates some interleavings • The computer runs this program again and again on these interleavings • The program didn’t crash and therefore it is correct Parent_thread() { if (p != null) { p = new P(); Set (initEvent); } } Child_thread(){ if (p != null) { Wait (initEvent); } }

  9. Demo: How do we test concurrent software today?

  10. CHESS Proposition • Capture and expose nondeterminism to a scheduler • Threads can run at different speeds • Asynchronous tasks can start at arbitrary time in the future • Hardware/compiler can reorder instructions • Explore the nondeterminism using several algorithms • Tackle the astronomically large number of interleavings • Remember: Any algorithm is better than no control at all

  11. CHESS in a nutshell • CHESS is a user-mode scheduler • Controls all scheduling nondeterminism • Replace the OS scheduler • Guarantees: • Every program run takes a different thread interleaving • Reproduce the interleaving for every run • Download CHESS source from http://chesstool.codeplex.com/

  12. CHESS architecture • Every run takes a different interleaving • Reproduce the interleaving for every run [Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine]

  13. Running Example Thread 1: Lock (l); bal += x; Unlock(l); Thread 2: Lock (l); t = bal; Unlock(l); Lock (l); bal = t - y; Unlock(l);
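The bug in the running example can be found by hand-enumerating the orders in which the three critical sections run. The following is a minimal Python sketch, assuming each Lock/Unlock block executes atomically; the step names `deposit`, `read`, and `write` and the starting values are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def run(order, bal=100, x=10, y=30):
    """Execute the three critical sections in the given order."""
    t = None
    for step in order:
        if step == "deposit":      # Thread 1: bal += x
            bal += x
        elif step == "read":       # Thread 2, first block: t = bal
            t = bal
        elif step == "write":      # Thread 2, second block: bal = t - y
            bal = t - y
    return bal

def interleavings():
    """All orders that keep Thread 2's read before its write."""
    for order in permutations(["deposit", "read", "write"]):
        if order.index("read") < order.index("write"):
            yield order

results = {order: run(order) for order in interleavings()}
for order, bal in results.items():
    print(order, "->", bal)
```

Of the three legal orders, the one where the deposit lands between Thread 2's read and write loses the deposit, because Thread 2 overwrites `bal` with a value computed from a stale `t`.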

  14. Introduce Schedule() points • Instrument calls to the CHESS scheduler • Each call is a potential preemption point Thread 1: Schedule(); Lock (l); bal += x; Schedule(); Unlock(l); Thread 2: Schedule(); Lock (l); t = bal; Schedule(); Unlock(l); Schedule(); Lock (l); bal = t - y; Schedule(); Unlock(l);

  15. First-cut solution: Random sleeps • Introduce random sleep at schedule points • Does not introduce new behaviors • Sleep models a possible preemption at each location • Sleeping for a finite amount guarantees starvation-freedom Thread 1: Sleep(rand()); Lock (l); bal += x; Sleep(rand()); Unlock(l); Thread 2: Sleep(rand()); Lock (l); t = bal; Sleep(rand()); Unlock(l); Sleep(rand()); Lock (l); bal = t - y; Sleep(rand()); Unlock(l);
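The random-sleep idea can be tried directly on the running example. This is a rough Python sketch using the standard `threading` module, not the CHESS implementation; for brevity it sleeps only before each lock acquisition, and `run_once`, `maybe_preempt`, and the numeric values are hypothetical names and parameters.

```python
import random
import threading
import time

def run_once(x=10, y=30, start=100, max_sleep=0.002):
    """Run the two threads once with random delays at schedule points."""
    state = {"bal": start}
    lock = threading.Lock()

    def maybe_preempt():
        # A finite random sleep models a possible preemption here.
        time.sleep(random.uniform(0, max_sleep))

    def thread1():
        maybe_preempt()
        with lock:
            state["bal"] += x

    def thread2():
        maybe_preempt()
        with lock:
            t = state["bal"]
        maybe_preempt()
        with lock:
            state["bal"] = t - y

    workers = [threading.Thread(target=thread1), threading.Thread(target=thread2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return state["bal"]

outcomes = {run_once() for _ in range(20)}
print(outcomes)
```

Which outcomes appear varies from run to run, which is the weakness of this approach: the sleeps only make rare interleavings more likely, they do not control or reproduce them.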

  16. Improvement 1: Capture the “happens-before” graph • Delays that result in the same “happens-before” graph are equivalent • Avoid exploring equivalent interleavings Thread 1: Schedule(); Lock (l); bal += x; Schedule(); Unlock(l); Thread 2: Schedule(); Lock (l); t = bal; Schedule(); Unlock(l); Schedule(); Lock (l); bal = t - y; Schedule(); Unlock(l); [Animation: Thread 2's two blocks are shown again, each delayed by Sleep(5)]

  17. Improvement 2: Understand synchronization semantics • Avoid exploring delays that are impossible • Identify when threads can make progress • CHESS maintains a run queue and a wait queue • Mimics OS scheduler state Thread 1: Schedule(); Lock (l); bal += x; Schedule(); Unlock(l); Thread 2: Schedule(); Lock (l); t = bal; Schedule(); Unlock(l); Schedule(); Lock (l); bal = t - y; Schedule(); Unlock(l); [Animation: Thread 2's two blocks are shown again in a second position]

  18. CHESS modes: speed vs coverage • Fast-mode • Introduce schedule points before synchronizations, volatile accesses, and interlocked operations • Finds many bugs in practice • Data-race mode • Introduce schedule points before memory accesses • Finds race-conditions due to data races • Captures all sequentially consistent (SC) executions

  19. CHESS Design Choices • Soundness • Any bug found by CHESS should be possible in the field • Should not introduce false errors (both safety and liveness) • Completeness • Any bug found in the field should be found by CHESS • In theory, we need to capture all sources of nondeterminism • In practice, we need to effectively explore the astronomically large state space

  20. Capture all sources of nondeterminism? No. • Scheduling nondeterminism? Yes • Timing nondeterminism? Yes • Controls when and in what order the timers fire • Nondeterministic system calls? Mostly • CHESS uses precise abstractions for many system calls • Input nondeterminism? No • Rely on users to provide inputs • Program inputs, return values of system calls, files read, packets received,… • Good tradeoff in the short term • But can’t find race-conditions on error handling code

  21. Capture all sources of nondeterminism? No. • Hardware relaxations? Yes • Hardware can reorder instructions • Non-SC executions possible in programs with data races • Sober [CAV ‘08] can detect and explore such non-SC executions • Compiler relaxations? No • Very few people understand what compilers can do to programs with data races • Far fewer than those who understand the general theory of relativity

  22. Schedule Exploration Algorithms

  23. Two kinds • Reduction algorithms • Explore one out of a large number of equivalent interleavings • Prioritization algorithms • Pick “interesting” interleavings before you run out of resources • Remember: anything is better than nothing

  24. Schedule Exploration Algorithms Reduction Algorithms

  25. Enumerating Thread Interleavings Using Depth-First Search Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2; Explore (State s) { T = set of threads in s; foreach t in T { s’ = schedule t in s; Explore(s’); } } [Figure: tree of (x,y) states explored from the initial state 0,0; its leaves are 1,2 1,1 1,1 2,1 2,2 2,2]
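The depth-first enumeration on this slide can be written out as a small runnable Python sketch. It assumes each assignment is one atomic step and that both variables start at 0; the representation of threads as lists of `(variable, value)` pairs is a choice made for this example only.

```python
def explore(pcs, state, threads, finals):
    """Depth-first search over every thread interleaving."""
    enabled = [t for t in range(len(threads)) if pcs[t] < len(threads[t])]
    if not enabled:
        # Every thread has finished: record the final (x, y) state.
        finals.append((state["x"], state["y"]))
        return
    for t in enabled:
        var, val = threads[t][pcs[t]]
        next_state = dict(state, **{var: val})             # schedule t in s
        next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]   # advance thread t
        explore(next_pcs, next_state, threads, finals)

threads = [
    [("x", 1), ("y", 1)],   # Thread 1
    [("x", 2), ("y", 2)],   # Thread 2
]
finals = []
explore((0, 0), {"x": 0, "y": 0}, threads, finals)
print(len(finals), sorted(finals))
```

Two threads of two steps each already give six interleavings; the count grows combinatorially with more threads and steps, which is why the later slides focus on reduction and prioritization.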

  26. Behaviorally equivalent interleavings • Reach the same final state (x = 1, y = 3) • Interleaving A: x = 1; y = 2; if(x == 1) { y = 3; } • is equivalent to • Interleaving B: x = 1; if(x == 1) { y = 2; y = 3; }

  27. Behaviorally inequivalent interleavings • Reach different final states (1,3) vs (1,2) • Interleaving A: x = 1; y = 2; if(x == 1) { y = 3; } • is not equivalent to • Interleaving B: x = 1; if(x == 1) { y = 3; } y = 2;

  28. Behaviorally inequivalent interleavings • Don’t necessarily have to reach different states • Interleaving A: if(x == 1) { (branch not taken) x = 1; y = 2; • is not equivalent to • Interleaving B: x = 1; if(x == 1) { y = 3; } y = 2;

  29. Execution Equivalence • Two executions are equivalent if they can be obtained by commuting independent operations • Example executions: (r1 = y; x = 1; r2 = y; r3 = x) (x = 1; r2 = y; r1 = y; r3 = x) (r1 = y; r2 = y; x = 1; r3 = x) (r2 = y; x = 1; r3 = x; r1 = y)

  30. Formalism • Execution is a sequence of transitions • Each transition is of the form <tid, var, op> • tid: thread performing the transition • var: the memory location accessed in the transition • op: READ | WRITE | READWRITE • Two steps are independent if • They are executed by different threads and • Either they access different variables or READ the same variable
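The independence relation above translates almost directly into code, and it is enough to decide whether two executions are equivalent. The Python sketch below is illustrative only: the `canon` function (a lexicographically least reordering) is one possible canonical form, and the assignment of the four operations from the Execution Equivalence slide to three threads is an assumption, since the transcript does not say which thread runs which operation.

```python
from typing import NamedTuple

class Transition(NamedTuple):
    tid: int
    var: str
    op: str   # "READ", "WRITE", or "READWRITE"

def independent(a, b):
    """Different threads, and either different variables or both READs."""
    if a.tid == b.tid:
        return False
    if a.var != b.var:
        return True
    return a.op == "READ" and b.op == "READ"

def canon(seq):
    """Least reordering of seq reachable by commuting independent steps."""
    remaining, out = list(seq), []
    while remaining:
        # A step may move to the front if it commutes with all earlier steps.
        ready = [i for i, t in enumerate(remaining)
                 if all(independent(u, t) for u in remaining[:i])]
        out.append(remaining.pop(min(ready, key=lambda i: remaining[i])))
    return tuple(out)

# Assumed thread assignment (hypothetical): T1 runs r1 = y, T2 runs x = 1,
# and T3 runs r2 = y followed by r3 = x.
r1 = Transition(1, "y", "READ")
w = Transition(2, "x", "WRITE")
r2 = Transition(3, "y", "READ")
r3 = Transition(3, "x", "READ")

executions = [[r1, w, r2, r3], [w, r2, r1, r3], [r1, r2, w, r3], [r2, w, r3, r1]]
forms = {canon(e) for e in executions}
print(len(forms))
```

Under that assumed thread assignment, all four executions share one canonical form, while moving `r3 = x` ahead of `x = 1` produces a different one because the read and the write to `x` do not commute.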

  31. Equivalence makes the schedule space a Directed Acyclic Graph Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2; [Figure: graph of (x,y) states rooted at the initial state 0,0]

  32. DFS in a DAG (CS 101) • Plain DFS: Explore (Sequence s) { T = set of threads enabled in s; foreach t in T { s’ = s . <t,v,o> ; Explore(s’); } } • DFS with a visited table: HashTable visited; Explore (Sequence s) { T = set of threads enabled in s; foreach t in T { s’ = s . <t,v,o> ; if (s’ in visited) continue; visited.Add(s’); Explore(s’); } } • DFS with a visited table of canonical forms: HashTable visited; Explore (Sequence s) { T = set of threads enabled in s; foreach t in T { s’ = s . <t,v,o> ; s’’ = canon(s’); if (s’’ in visited) continue; visited.Add(s’’); Explore(s’); } } • Sleep sets algorithm explores a DAG without maintaining the table

  33. Sleep Set Algorithm Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2; • Identify transitions that take you to visited states [Figure: graph of (x,y) states rooted at 0,0 with leaves 1,2 1,1 2,1 2,2]

  34. Sleep Set Algorithm Explore (Sequence s, sleep C) { T = set of transitions enabled in s; T’ = T – C; foreach t in T’ { C = C + t; s’ = s . t; C’ = C – {transitions dependent on t}; Explore(s’, C’); } }
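The pseudocode above can be exercised on the two-thread example (Thread 1: x = 1; y = 1; Thread 2: x = 2; y = 2). The following Python sketch follows the slide's structure but is only an illustration: transitions are identified by hypothetical `(thread id, step index)` pairs, and two transitions are treated as dependent when they belong to the same thread or write the same variable.

```python
THREADS = {1: [("x", 1), ("y", 1)], 2: [("x", 2), ("y", 2)]}

def enabled(pcs):
    """Next transition (tid, index) of every unfinished thread."""
    return [(tid, pc) for tid, pc in sorted(pcs.items()) if pc < len(THREADS[tid])]

def dependent(a, b):
    """Same thread, or writes to the same variable."""
    return a[0] == b[0] or THREADS[a[0]][a[1]][0] == THREADS[b[0]][b[1]][0]

def explore(pcs, trace, sleep, use_sleep_sets, done):
    if not enabled(pcs):
        done.append(trace)                       # a complete execution
        return
    todo = [t for t in enabled(pcs) if t not in sleep]   # T' = T - C
    sleep = set(sleep)
    for t in todo:
        sleep.add(t)                             # C = C + t
        if use_sleep_sets:
            # C' = C - {transitions dependent on t}
            child_sleep = {u for u in sleep if not dependent(u, t)}
        else:
            child_sleep = set()
        child_pcs = dict(pcs)
        child_pcs[t[0]] += 1                     # s' = s . t
        explore(child_pcs, trace + [t], child_sleep, use_sleep_sets, done)

def final_state(trace):
    state = {"x": 0, "y": 0}
    for tid, idx in trace:
        var, val = THREADS[tid][idx]
        state[var] = val
    return (state["x"], state["y"])

plain, reduced = [], []
explore({1: 0, 2: 0}, [], set(), False, plain)
explore({1: 0, 2: 0}, [], set(), True, reduced)
print(len(plain), len(reduced))
```

On this example the plain search completes six executions and the sleep-set search completes four, while both reach the same set of final states; no visited table is kept in either case.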

  35. Summary • Sleep sets ensure that a stateless execution does not explode a DAG into a tree

  36. Persistent Set Reduction Thread 1 Thread 2 x = 1; x= 2; y= 1; y = 2;

  37. With Sleep Sets Thread 1 Thread 2 x = 1; x= 2; y= 1; y = 2;

  38. With Persistent Sets • Assumption: we are only interested in the reachability of final states (for instance, no global assertions) Thread 1 Thread 2 x = 1; x= 2; y= 1; y = 2;

  39. Persistent Sets • A set of transitions P is persistent in a state s if • In the state space X reachable from s by only exploring transitions not in P • Every transition in X is independent of the transitions in P • P “persists” in X • It is sound to only explore P from s

  40. With Persistent Sets Thread 1 Thread 2 x = 1; x= 2; y= 1; y = 2;

  41. Dynamic Partial-Order Reduction Algorithm [Flanagan & Godefroid] • Identifies persistent sets dynamically • After executing a transition, insert a schedule point before the most recent conflict Thread 1 Thread 2 y= 1; x= 1; x= 2; z= 3; y=1 x=1 x=2 x=1 x=2 z=3 z=3

  42. Schedule Exploration Algorithms Prioritization Algorithms

  43. Schedule Prioritization • Preemption bounding • Few preemptions are sufficient for finding lots of bugs • Preemption sealing • Insert preemptions where you think bugs are • Random • If you don’t have additional information about the state space, random is the best • Still do partial-order reduction
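Preemption bounding can be illustrated with a small enumerator. The Python sketch below is not CHESS code; it assumes a preemption is a context switch away from a thread that could still run, and the function name `schedules` and its parameters are hypothetical.

```python
def schedules(lengths, bound):
    """Yield every schedule (list of thread ids) with at most `bound` preemptions.

    lengths[i] is the number of steps thread i executes.
    """
    def go(pcs, current, used, prefix):
        enabled = [t for t in range(len(lengths)) if pcs[t] < lengths[t]]
        if not enabled:
            yield prefix
            return
        for t in enabled:
            # Switching away from a thread that is still enabled costs one
            # preemption; switching after it finishes is free.
            cost = 1 if current in enabled and t != current else 0
            if used + cost > bound:
                continue
            nxt = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            yield from go(nxt, t, used + cost, prefix + [t])

    yield from go(tuple(0 for _ in lengths), None, 0, [])

for k in range(3):
    print(k, len(list(schedules((2, 2), k))))
```

For two threads of two steps each, a bound of 0 leaves only the two schedules that run one thread to completion before the other; raising the bound admits more of the six interleavings, so small bounds are explored first.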

  44. Concurrency Correctness Criterion

  45. CHESS checks for various correctness criteria • Assertion failures • Deadlocks • Livelocks • Data races • Atomicity violations • (Deterministic) Linearizability violations

  46. Concurrency Correctness Criterion: Linearizability Checking in CHESS

  47. Motivation • Writing good test oracles is hard

  48. Motivation • Writing good test oracles is hard • Is this a correct assertion to check for? • Now what if there are 5 threads each performing 5 queue operations

  49. We want to magically • Check if a bank behaves like a bank should • Check if a queue behaves like a queue • Answer: Check for linearizability

  50. Linearizability • The correctness notion closest to “thread safety” • A concurrent component behaves as if it is protected by a single global lock • Each operation appears to take effect instantaneously at some point between the call and return
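The definition above suggests a brute-force check for small histories: try every ordering of the operations, keep those that respect real-time order, and ask whether any of them is a legal sequential run. The Python sketch below does this for a FIFO queue. It is an illustration rather than the CHESS checker; the history format `(call_time, return_time, name, argument, result)` and the convention that `deq` on an empty queue returns `None` are assumptions made for this example.

```python
from collections import deque
from itertools import permutations

def legal_sequential(ops):
    """Replay ops one at a time against a reference FIFO queue."""
    q = deque()
    for _, _, name, arg, result in ops:
        if name == "enq":
            q.append(arg)
        else:  # "deq"
            expected = q.popleft() if q else None
            if expected != result:
                return False
    return True

def respects_real_time(order):
    """An operation that returned before another was called must come first."""
    return all(not (later[1] < earlier[0])
               for i, earlier in enumerate(order)
               for later in order[i + 1:])

def linearizable(history):
    return any(respects_real_time(p) and legal_sequential(p)
               for p in permutations(history))

# enq(1) and enq(2) overlap in time, so either order is allowed.
overlapping = [(0, 3, "enq", 1, None), (1, 2, "enq", 2, None), (4, 5, "deq", None, 2)]
# enq(1) returns before enq(2) is called, so deq must return 1.
sequential = [(0, 1, "enq", 1, None), (2, 3, "enq", 2, None), (4, 5, "deq", None, 2)]
print(linearizable(overlapping), linearizable(sequential))
```

Trying all permutations is exponential in the history length, so this only works for the short histories a test harness would generate.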
