Taming Concurrency: A Program Verification Perspective

Shaz Qadeer, Microsoft Research

Reliable concurrent software? • Correctness problem • does the program behave correctly for all inputs and all interleavings? • Bugs due to concurrency are insidious • non-deterministic, timing dependent • data corruption, crashes • difficult to detect, reproduce, eliminate

Undecidable problem! • P satisfies S • Why is verification of concurrent programs more difficult than verification of sequential programs?

P satisfies S • Assertions: Provide contracts to decompose the problem into a collection of decidable problems • pre-condition and post-condition for each procedure • loop invariant for each loop • Abstractions: Provide an abstraction of the program for which verification is decidable • Finite-state systems • finite automata • Infinite-state systems • pushdown automata, counter automata, timed automata, Petri nets

Interference
• pre x = 0;
    int t;
    t := x;
    t := t + 1;
    x := t;
• post x = 1;
Correct

Interference
• pre x = 0;
    A                 B
    int t;            int t;
    t := x;           t := x;
    t := t + 1;       t := t + 1;
    x := t;           x := t;
• post x = 2;
Incorrect!
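The failing post-condition can be checked by brute force. The following is a minimal sketch, assuming each of the three assignments is one indivisible step and modelling threads A and B as indices 0 and 1; the helper names `run` and `all_schedules` are illustrative and not part of the slides.

```python
from itertools import combinations

def run(schedule):
    """Execute one interleaving; `schedule` lists which thread (0 or 1) takes each step."""
    x = 0            # shared variable (pre: x = 0)
    t = [0, 0]       # private t of thread A (index 0) and thread B (index 1)
    pc = [0, 0]      # per-thread program counter
    for tid in schedule:
        if pc[tid] == 0:
            t[tid] = x          # t := x
        elif pc[tid] == 1:
            t[tid] += 1         # t := t + 1
        else:
            x = t[tid]          # x := t
        pc[tid] += 1
    return x

def all_schedules(steps=3):
    """Yield every interleaving of two threads that take `steps` steps each."""
    total = 2 * steps
    for a_slots in combinations(range(total), steps):
        yield [0 if i in a_slots else 1 for i in range(total)]

finals = [run(s) for s in all_schedules()]
print(sorted(set(finals)))   # the post-condition x = 2 fails whenever 1 appears
```

Under this model there are 20 interleavings, and only the two fully serial ones (all of A before B, or all of B before A) end with x = 2; in every other interleaving both threads read 0 and the final value is 1.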

Controlling interference
• pre x = 0;
    A                 B
    int t;            int t;
    acquire(l);       acquire(l);
    t := x;           t := x;
    t := t + 1;       t := t + 1;
    x := t;           x := t;
    release(l);       release(l);
• post x = 2;
Correct!
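One way to see why the lock restores the post-condition is to explore every schedule in which a thread blocked on `acquire(l)` simply cannot be chosen. This is a sketch under that assumption, with a hypothetical `final_values` helper and the lock modelled as a `holder` field rather than a real mutex.

```python
PROGRAM = ["acquire", "read", "inc", "write", "release"]

def final_values():
    """Return the set of final values of x over every schedule of threads A and B."""
    results = set()

    def explore(x, t, pc, holder):
        if pc == [len(PROGRAM)] * 2:
            results.add(x)                    # both threads have finished
            return
        for tid in (0, 1):
            if pc[tid] == len(PROGRAM):
                continue                      # this thread has finished
            op = PROGRAM[pc[tid]]
            if op == "acquire" and holder is not None:
                continue                      # blocked: the lock l is held
            nx, nt, nholder = x, list(t), holder
            if op == "acquire":
                nholder = tid
            elif op == "read":
                nt[tid] = x                   # t := x
            elif op == "inc":
                nt[tid] += 1                  # t := t + 1
            elif op == "write":
                nx = nt[tid]                  # x := t
            else:
                nholder = None                # release(l)
            npc = list(pc)
            npc[tid] += 1
            explore(nx, nt, npc, nholder)

    explore(0, [0, 0], [0, 0], None)
    return results

print(final_values())
```

Because the second thread cannot pass `acquire` until the first has released, every explored schedule serializes the two read-modify-write sequences, and the only final value found is 2.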

Interference makes program verification difficult • Annotation explosion with the assertion approach • State explosion with the abstraction approach

Annotation explosion • For sequential programs • assertion for each loop • assertion refers only to variables in scope • For concurrent programs • assertion for each control location • assertion may need to refer to private state of other threads

State explosion
                          Sequential           Concurrent
Finite-state systems      $P \cdot G$          PSPACE-complete
Pushdown systems          $(P \cdot G)^3$      Undecidable
P = # of program locations, G = # of global states, n = # of threads

Taming interference • Atomicity for combating annotation explosion • Interference-bounding for combating state explosion

Bank account
Critical_Section l;
/*# guarded_by l */ int balance;

/*# atomic */ void deposit (int x) {
  acquire(l);
  int r = balance;
  balance = r + x;
  release(l);
}

/*# atomic */ int read( ) {
  int r;
  acquire(l);
  r = balance;
  release(l);
  return r;
}

/*# atomic */ void withdraw(int x) {
  int r = read();
  acquire(l);
  balance = r - x;
  release(l);
}

Atomicity violation in StringBuffer (Flanagan-Q, PLDI 03)
public final class StringBuffer {
  private int count;
  private char[ ] value;
  . .
  public synchronized StringBuffer append (StringBuffer sb) {
    if (sb == null) sb = NULL;
    int len = sb.length( );
    int newcount = count + len;
    if (newcount > value.length) expandCapacity(newcount);
    sb.getChars(0, len, value, count); //use of stale len !!
    count = newcount;
    return this;
  }
  public synchronized int length( ) { return count; }
  public synchronized void getChars(. . .) { . . . }
}

Inadequate atomicity is a good predictor of undesirable interference (compared to data races)! • Numerous tools for detecting atomicity violations • static (Type systems, ESPC, QED) • dynamic (Atomizer, AVIO, AtomAid, Velodrome, …) • Significant effort cutting across communities • architecture and operating systems • programming languages and compilers • testing and formal methods

Definition of atomicity
• Serialized execution of deposit
    x  y  acq(l)  r=bal  bal=r+n  rel(l)  z
• Non-serialized executions of deposit
    acq(l)  x  r=bal  y  bal=r+n  z  rel(l)
    acq(l)  x  y  r=bal  bal=r+n  z  rel(l)
• deposit is atomic if for every non-serialized execution, there is a serialized execution with the same behavior

Reduction (Lipton, CACM 75)
• Each step swaps two adjacent actions of different threads; only the intermediate state between them changes (S becomes T)
    S0 acq(l) S1 x S2 r=bal S3 y S4 bal=r+n S5 z S6 rel(l) S7
    S0 acq(l) S1 x S2 y T3 r=bal S4 bal=r+n S5 z S6 rel(l) S7
    S0 x T1 acq(l) S2 y T3 r=bal S4 bal=r+n S5 z S6 rel(l) S7
    S0 x T1 y T2 acq(l) T3 r=bal S4 bal=r+n S5 z S6 rel(l) S7
    S0 x T1 y T2 acq(l) T3 r=bal S4 bal=r+n S5 rel(l) T6 z S7

Four atomicities • R: right commutes • lock acquire • L: left commutes • lock release • B: both right + left commutes • variable access holding lock • A: atomic action, non-commuting • access unprotected variable

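Sequential composition
• Theorem: Sequence (R+B)* ; (A+ε) ; (L+B)* is atomic
• [Figure: an execution from S0 to S5 of a sequence R* ; A ; L* interleaved with actions x and Y of other threads, before and after reduction]
• Composition table (row ; column):
     ;  |  B  L  R  A  C
     B  |  B  L  R  A  C
     R  |  R  A  R  A  C
     L  |  L  L  C  C  C
     A  |  A  A  C  C  C
     C  |  C  C  C  C  C
• Examples
    R ; B ; A ; L  has atomicity A  (R;B = R, then R;A = A, then A;L = A)
    R;A;L ; R;A;L  has atomicity C  (each half is A, and A;A = C)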
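The composition table lends itself to a direct encoding. The sketch below assumes the row-then-column reading of the table (the row is the first operand of `;`), treats B as the identity of `;`, and reads the middle factor of the theorem as an optional atomic action; the names `COMPOSE`, `atomicity` and `theorem_holds` are illustrative.

```python
from functools import reduce
from itertools import product
import re

# Rows of the slide's table: ROWS[a][i] is the atomicity of "a ; COLS[i]".
COLS = "BLRAC"
ROWS = {"B": "BLRAC", "R": "RARAC", "L": "LLCCC", "A": "AACCC", "C": "CCCCC"}
COMPOSE = {(a, b): ROWS[a][COLS.index(b)] for a in ROWS for b in COLS}

def atomicity(seq):
    """Fold ';' over a string of atomicities, starting from the identity B."""
    return reduce(lambda a, b: COMPOSE[(a, b)], seq, "B")

# Shape required by the theorem, with the middle atomic action optional.
PATTERN = re.compile(r"[RB]*A?[LB]*")

def theorem_holds(max_len=6):
    """Check that no sequence of the theorem's shape (up to max_len) composes to C."""
    for n in range(1, max_len + 1):
        for seq in map("".join, product("RBAL", repeat=n)):
            if PATTERN.fullmatch(seq) and atomicity(seq) == "C":
                return False
    return True

print(atomicity("RBAL"))     # acquire; access; atomic action; release
print(atomicity("RALRAL"))   # two atomic blocks in sequence
print(theorem_holds())
```

This is only a bounded sanity check of the table against the theorem, not a proof; it exhausts sequences over R, B, A, L up to the chosen length.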

Bank account
Critical_Section l;
/*# guarded_by l */ int balance;

/*# atomic */ void deposit (int x) {
  acquire(l);          // R
  int r = balance;     // B
  balance = r + x;     // B
  release(l);          // L
}                      // deposit: A

/*# atomic */ int read( ) {
  int r;
  acquire(l);          // R
  r = balance;         // B
  release(l);          // L
  return r;            // B
}                      // read: A

/*# atomic */ void withdraw(int x) {
  int r = read();      // A
  acquire(l);          // R
  balance = r - x;     // B
  release(l);          // L
}                      // withdraw: C

Incorrect!

Bank account
Critical_Section l;
/*# guarded_by l */ int balance;

/*# atomic */ void deposit (int x) {
  acquire(l);          // R
  int r = balance;     // B
  balance = r + x;     // B
  release(l);          // L
}                      // deposit: A

/*# atomic */ int read( ) {
  int r;
  acquire(l);          // R
  r = balance;         // B
  release(l);          // L
  return r;            // B
}                      // read: A

/*# atomic */ void withdraw(int x) {
  acquire(l);          // R
  int r = balance;     // B
  balance = r - x;     // B
  release(l);          // L
}                      // withdraw: A

Correct!
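The practical difference between the two versions of withdraw can be replayed deterministically. In this sketch the lock is not modelled explicitly: a `yield` marks a point where the lock is free and another thread may run, which is an assumption of the model. The names `Account`, `withdraw_two_sections`, `withdraw_one_section` and `run_with_deposit_in_gap`, and the amounts used, are illustrative.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

def withdraw_two_sections(acct, x):
    """withdraw that calls read() first: two separate critical sections."""
    r = acct.balance          # r = read(); the lock is released afterwards
    yield                     # lock is free here, so other threads may run
    acct.balance = r - x      # second critical section writes using a stale r

def withdraw_one_section(acct, x):
    """withdraw that reads and writes inside one critical section."""
    r = acct.balance
    acct.balance = r - x
    yield                     # the lock is released only after the update

def deposit(acct, x):
    acct.balance = acct.balance + x   # a single critical section
    yield

def run_with_deposit_in_gap(withdraw):
    """Start a withdraw of 30, let a deposit of 50 run at its first yield, then finish."""
    acct = Account(100)
    w = withdraw(acct, 30)
    next(w)                           # run withdraw up to its yield point
    for _ in deposit(acct, 50):       # another thread deposits in the gap
        pass
    for _ in w:                       # resume withdraw to completion
        pass
    return acct.balance

print(run_with_deposit_in_gap(withdraw_two_sections))
print(run_with_deposit_in_gap(withdraw_one_section))
```

Starting from 100, a serial execution of one deposit of 50 and one withdrawal of 30 must end at 120 in either order. The two-section version ends at 70 in this schedule because the deposit is overwritten, which matches no serial order; the one-section version ends at 120.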

Taming interference • Atomicity for combating annotation explosion • Interference-bounding for combating state explosion

Interference bounding • An approach to the state-explosion problem • Explore all executions with a bounded amount of interference • increase the interference bound iteratively • Good idea if low interference bound • can be exploited by algorithms • is enough to expose bugs

Context-bounded verification [Wu-Q, PLDI 04] [Rehof-Q, TACAS 05] • Interference proportional to number of context switches • Explore all executions with few context switches • Unbounded computation within each context • Different from bounded model checking • [Figure: an execution divided into contexts separated by context switches]

Context-bounding today • CHESS [Musuvathi-Q, PLDI 07] • JMoped [Suwimonteerabuth et al., SPIN 08] • SPIN for multithreaded C programs [Zaks-Joshi, SPIN 08] • CBA [Lal-Reps, CAV 08] • Static Driver Verifier

Testing concurrent programs is HARD • Bugs hidden in rare thread interleavings • Today, concurrency testing = stress testing • Poor coverage of interleavings • Unpredictable coverage results in “Heisenbugs” • The mark of reliability of the system still remains its ability to withstand stress

CHESS in a nutshell • Replace the OS scheduler with a demonic scheduler • Systematically explore all scheduling choices • [Figure: Concurrent Program, Win32 API, Kernel Scheduler, Demonic Scheduler]

CHESS architecture
• CHESS runs the scenario in a loop
    While(not done) { TestScenario() }
    TestScenario() { … }
• Each run takes a different interleaving
• Each run is repeatable
• Intercept synchronization and threading calls
  • To control and introduce nondeterminism
• Detect
  • Assertion violations
  • Deadlocks
  • Data races
  • Livelocks
• [Figure: Program, CHESS, Win32 API, Kernel: Threads, Scheduler, Synchronization Objects]

CHESS methodology generalizes • CHESS works for • Unmanaged programs (written in C, C++) • Managed programs (written in C#,…) • Singularity applications • With appropriate wrappers, can work for Java, Linux applications • [Figure: three stacks with CHESS in the middle: Win32 Program over Win32 / OS, .NET Program over .NET CLR, Singularity Program over Singularity]

CHESS: Systematic testing for concurrency
• CHESS runs the scenario in a loop
    While(not done){ TestScenario() }
    TestScenario(){ … }
• Each run is a different interleaving
• Each run is repeatable
• [Figure: Program, CHESS, Win32 API, Kernel: Threads, Scheduler, Synchronization Objects]

State-space explosion
    Thread 1      …      Thread n
    x = 1;               x = 1;
    …                    …
    x = k;               x = k;
    (n threads, k steps each)
• Number of executions = $O(n^{nk})$
• Exponential in both n and k
• Typically: n < 10, k > 100
• Limits scalability to large programs
Goal: Scale CHESS to large programs (large k)
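To get a feel for these numbers, the count can be computed exactly for the simplest case. This sketch assumes straight-line threads that never block, for which the number of interleavings is the multinomial coefficient (nk)! / (k!)^n; that closed form is a standard combinatorial fact rather than something stated on the slide, and the function names are illustrative.

```python
from math import factorial

def interleavings(n, k):
    """Exact number of interleavings of n non-blocking threads with k steps each."""
    return factorial(n * k) // factorial(k) ** n

def slide_bound(n, k):
    """The upper bound n^(nk) quoted on the slide."""
    return n ** (n * k)

for n, k in [(2, 3), (2, 10), (3, 10)]:
    exact = interleavings(n, k)
    assert exact <= slide_bound(n, k)
    print(n, k, exact, slide_bound(n, k))
```

Even for two threads the exact count grows exponentially with k, which is why exhaustive exploration does not scale to the typical k > 100.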

Preemption-bounding
    Thread 1                 Thread 2
    x = 1;
    if (p != 0) {
                             p = 0;       (preemption: Thread 1 is switched out)
      x = p->f;                           (non-preemption: Thread 2 has ended)
    }
• Prioritize executions with small number of preemptions
• Two kinds of context switches:
  • Preemptions: forced by the scheduler
    • e.g. Time-slice expiration
  • Non-preemptions: a thread voluntarily yields
    • e.g. Blocking on an unavailable lock, thread end

Preemption-bounding in CHESS • The scheduler has a budget of c preemptions • Nondeterministically choose the preemption points • Resort to non-preemptive scheduling after c preemptions • Once all executions explored with c preemptions • Try with c+1 preemptions
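The budgeted search can be illustrated on a toy model. The following is a minimal sketch and not the CHESS implementation: threads are lists of step functions over a state dictionary, there is no blocking, and a switch away from a thread that could still run counts as a preemption while a switch after a thread ends is free. All names (`explore`, the `t1_*` and `t2_*` steps, the `go` flag) are hypothetical; the two threads mirror the p != 0 example from the preemption-bounding slide.

```python
def explore(threads, init_state, bound):
    """Run every schedule that uses at most `bound` preemptions.

    Returns (failures, executions): failures is a list of (schedule, exception).
    """
    failures = []
    executions = 0

    def dfs(state, pcs, cur, used, schedule):
        nonlocal executions
        runnable = [t for t in range(len(threads)) if pcs[t] < len(threads[t])]
        if not runnable:
            executions += 1
            return
        for t in runnable:
            # Switching away from a thread that could continue is a preemption.
            cost = used + (1 if (cur in runnable and t != cur) else 0)
            if cost > bound:
                continue
            new_state = dict(state)
            try:
                threads[t][pcs[t]](new_state)
            except Exception as exc:
                failures.append((schedule + [t], exc))
                executions += 1
                continue
            new_pcs = list(pcs)
            new_pcs[t] += 1
            dfs(new_state, new_pcs, t, cost, schedule + [t])

    dfs(dict(init_state), [0] * len(threads), None, 0, [])
    return failures, executions

# Thread 1: x = 1; if (p != 0) { x = p->f; }      Thread 2: p = 0;
def t1_assign(s): s["x"] = 1
def t1_check(s): s["go"] = s["p"] is not None
def t1_deref(s):
    if s["go"]:
        s["x"] = s["p"]["f"]          # fails if p was cleared after the check
def t2_clear(s): s["p"] = None

THREADS = [[t1_assign, t1_check, t1_deref], [t2_clear]]
INIT = {"x": 0, "p": {"f": 42}, "go": False}

for c in (0, 1):                      # raise the budget iteratively
    failures, executions = explore(THREADS, INIT, c)
    print(c, executions, [schedule for schedule, _ in failures])
```

With a budget of 0 the model explores only the two non-preemptive schedules and finds nothing; with a budget of 1 it reaches the schedule in which Thread 2 runs between the check and the dereference.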

Property 1: Polynomial bound
• Terminating program with fixed inputs and deterministic threads
• n threads, k steps each, c preemptions
• Choose c preemption points
• Permute n+c atomic blocks
• Number of executions <= $\binom{nk}{c} \cdot (n+c)! = O((n^2 k)^c \cdot n!)$
• Exponential in n and c, but not in k
• [Figure: Thread 1 and Thread 2, each running x = 1; … x = k;, cut at the chosen preemption points into atomic blocks]
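The bound is easy to evaluate numerically. This short sketch computes the left-hand side of the inequality on the slide; the function name `preemption_bounded_executions` and the sample values of n, k and c are illustrative choices.

```python
from math import comb, factorial

def preemption_bounded_executions(n, k, c):
    """Upper bound from the slide: choose c preemption points, permute n+c blocks."""
    return comb(n * k, c) * factorial(n + c)

small = preemption_bounded_executions(2, 100, 2)
large = preemption_bounded_executions(2, 200, 2)
print(small, large, large / small)
```

For n = 2 and c = 2, doubling k from 100 to 200 multiplies the bound by roughly 4, consistent with growth of order k^c rather than exponential growth in k.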

Property 2: Simple error traces • Finds smallest number of preemptions to the error • Number of preemptions better metric of error complexity than execution length

Property 3: Coverage metric • If search terminates with preemption-bound of c, then any remaining error must require at least c+1 preemptions • Intuitive estimate for • The complexity of the bugs remaining in the program • The chance of their occurrence in practice

Property 4: Many bugs with 2 preemptions • Acknowledgement: testers from PCP team

Coverage vs. Preemption-bound

CHESS status • Being used by testers in many Microsoft product groups • Demo and session at Professional Developer Conference 2008 • CHESS for Win32 • http://research.microsoft.com/projects/chess • CHESS for .NET • Coming soon

So far … • Interference makes program verification difficult • Taming interference • atomicity • interference-bounding

Whither next? • Concurrent programs as a composition of modules implementing atomic actions • Interference-bounding for general concurrent systems • Symbolic context-bounded verification

Structuring concurrent programs • In addition to atomicity, we need • behavioral abstraction analogous to pre/post-conditions for sequential code fragments • intuitive and simple contract language • A calculus of atomic actions (Elmas-Tasiran-Q, POPL 08) • QED verifier

Interference-bounding for general concurrent systems • Task = unit-of-work • Shared-memory systems • Task (usually) corresponds to a thread • Message-passing systems • Task corresponds to a sequence of messages • Need linguistic support for the task abstraction in software models and implementations

Context-bounded reachability is NP-complete
                          Unbounded            Context-bounded
Finite-state systems      PSPACE-complete      NP-complete
Pushdown systems          Undecidable          NP-complete
P = # of program locations, G = # of global states, n = # of threads, c = # of context switches

Symbolic context-bounded verification • The transition relations of tasks are usually encoded symbolically • reachability analysis is PSPACE-complete even for a single task • Scalable tools for single symbolic task • Bebop, Moped, Getafix • Can we extend these techniques to deal with multiple tasks? • Lal and Reps (TACAS 2008, CAV 2008) • Suwimonteerabuth et al. (SPIN 2008)

Questions?
