Summarizing Procedures in Concurrent Programs

Shaz Qadeer, Sriram K. Rajamani, Jakob Rehof (Microsoft Research)

Motivation • How do you scale program analyses for sequential programs? • Summarize at procedure boundaries • Sharir-Pnueli ‘81, Reps-Horwitz-Sagiv ‘95 • Used in compiler dataflow analyses • Used in error detection tools • SLAM (Ball-Rajamani ‘00) • ESP (Das-Lerner-Seigle ‘02)

Summarization is efficient! • Boolean program with: • g globals • n procedures, each with at most m locals • |E| = size of the CFG of the program • Complexity: $O(|E| \cdot 2^{O(g+m)})$ • Complexity linear in the number of procedures!

Summarization gives termination! • Possibly recursive boolean programs • Infinite state systems • Checking terminates with summarization!

Question Can summarization help analysis of concurrent programs?

Difficulty Assertion checking for multithreaded programs is undecidable • Even if all variables are boolean • Further, even if there are only two threads! • Emptiness of the intersection of two CFLs reduces to this problem (Ramalingam ‘00)

Our work • New model checking algorithm using summarization • useful for concurrent programs • Summaries provide re-use and efficiency for analyzing concurrent programs • Enable termination of analysis in a large class of concurrent programs • includes programs with recursion, shared variables and concurrency

Difficulties in summarizing concurrent programs • What is a summary? • For sequential programs • Summary of procedure P = Set of all pre-post state pairs (s,s’) obtained by invoking P • This doesn’t work for concurrent programs • Does not model concurrent updates by other threads
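
The sequential notion of a summary can be sketched as follows. This is a minimal illustration, not code from the talk: `flip_if_set` is a hypothetical boolean procedure invented for the example, and `summarize` simply enumerates every boolean pre-state and records the post-state the procedure produces.

```python
from itertools import product

def flip_if_set(g, x):
    """Hypothetical boolean procedure (not from the talk): global g, parameter x."""
    if x:
        g = not g
    return g, x

def summarize(proc, arity):
    """Summary = set of all (pre-state, post-state) pairs over boolean inputs."""
    return {(pre, proc(*pre)) for pre in product([False, True], repeat=arity)}

summary = summarize(flip_if_set, 2)
```

A sequential analysis that reaches a call site with a known pre-state can look up the post-state instead of re-analyzing the body. With a second thread that may write `g` in the middle of the call, these pairs no longer describe every behavior, which is the difficulty the slide points out.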

Insight • In a well synchronized concurrent program • A thread’s computation can be viewed as a sequence of transactions • While analyzing a transaction, interleavings with other threads need not be considered • Key idea: Summarize transactions!

How do you identify transactions? Lipton’s theory of reduction

Four atomicities • R: right movers • lock acquire • L: left movers • lock release • B: both right + left movers • variable access holding lock • N: non-movers • access unprotected variable
[Figure: executions over states S0 to S7 in which acq(this), r=bal and rel(this) are commuted past the actions x, y, z of other threads]

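Transaction Any sequence of actions whose atomicities are in R*(N+ε)L* is a transaction: right movers, then at most one non-mover, then left movers
[Figure: a transaction over states S0 to S7, its actions labelled R, N and L and divided into precommit and postcommit phases]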
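
The transaction pattern can be checked mechanically. The sketch below is an illustration under stated assumptions: each action is encoded as one letter (R, L, B, N), the pattern is read as right movers, then at most one non-mover, then left movers, and both-movers are allowed on either side of the commit point because they commute in both directions. The names `TRANSACTION`, `is_transaction` and `split_transactions` are invented for the example.

```python
import re

# One letter per action: R right mover, L left mover, B both mover, N non-mover.
# Assumption: B may appear on either side of the (optional) non-mover.
TRANSACTION = re.compile(r"[RB]*N?[LB]*")

def is_transaction(atomicities):
    """True if the whole sequence fits the reduction pattern."""
    return TRANSACTION.fullmatch(atomicities) is not None

def split_transactions(atomicities):
    """Greedily cut one thread's action sequence into maximal transactions."""
    if set(atomicities) - set("RLBN"):
        raise ValueError("atomicities must use only R, L, B, N")
    pieces, start = [], 0
    while start < len(atomicities):
        # The pattern always consumes at least one valid letter, so this advances.
        end = TRANSACTION.match(atomicities, start).end()
        pieces.append(atomicities[start:end])
        start = end
    return pieces
```

For instance, acq(this); r=bal; rel(this) is encoded as "RBL" and is a single transaction, while two such blocks back to back split into two transactions because an acquire after a release cannot be moved into the earlier one.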

Transactions and summaries • Corollary of Lipton’s theorem: no need to schedule other threads in the middle of a transaction • If a procedure body occurs in a transaction, we can summarize it!

Resource allocator (1)
    bool available[N];
    mutex m;
    int getResource() {
          int i = 0;
      L0: acquire(m);
      L1: while (i < N) {
      L2:   if (available[i]) {
      L3:     available[i] = false;
      L4:     release(m);
      L5:     return i;
            }
      L6:   i++;
          }
      L7: release(m);
      L8: return i;
    }
Choose N = 2
Summaries: <pc, i, m, (a[0],a[1])> → <pc’, i’, m’, (a[0]’,a[1]’)>
    <L0, 0, 0, (0,0)> → <L8, 2, 0, (0,0)>
    <L0, 0, 0, (0,1)> → <L5, 1, 0, (0,0)>
    <L0, 0, 0, (1,0)> → <L5, 0, 0, (0,0)>
    <L0, 0, 0, (1,1)> → <L5, 0, 0, (0,1)>
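
The four summaries for N = 2 can be reproduced by running the procedure body as one atomic block, which is justified here because the whole body executes while holding mutex m. This is a sketch for illustration: `get_resource` is a Python transliteration of the slide's getResource(), and the tuple layout (pc, i, m, available) mirrors the slide's notation.

```python
from itertools import product

N = 2

def get_resource(available):
    """Run getResource() atomically; return (final pc, i, m, available)."""
    a = list(available)
    i = 0
    # L0: acquire(m)
    while i < N:                  # L1
        if a[i]:                  # L2
            a[i] = 0              # L3
            # L4: release(m), so m is 0 again in the post-state
            return ("L5", i, 0, tuple(a))
        i += 1                    # L6
    # L7: release(m)
    return ("L8", i, 0, tuple(a))

# One summary per initial value of the available array.
summaries = {("L0", 0, 0, pre): get_resource(pre)
             for pre in product((0, 1), repeat=N)}
```

Each entry pairs a pre-state at L0 with the state at the return point, so a caller in any thread can apply the entry in a single step without considering interleavings inside the body.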

What if transaction boundaries and procedure boundaries do not coincide? Two level model checking algorithm

Two level algorithm • First level maintains stack • Second level maintains stack-less summaries • Summaries can start and end anywhere in a procedure

Resource allocator (2)
    bool available[N];
    mutex m[N];
    int getResource() {
          int i = 0;
      L0: while (i < N) {
      L1:   acquire(m[i]);
      L2:   if (available[i]) {
      L3:     available[i] = false;
      L4:     release(m[i]);
      L5:     return i;
            } else {
      L6:     release(m[i]);
            }
      L7:   i++;
          }
      L8: return i;
    }
Choose N = 2
Summaries: <pc, i, (m[0],m[1]), (a[0],a[1])> → <pc’, i’, (m[0]’,m[1]’), (a[0]’,a[1]’)>
    <L0, 0, (0,0), (0,0)> → <L1, 1, (0,0), (0,0)>
    <L0, 0, (0,0), (0,1)> → <L1, 1, (0,0), (0,1)>
    <L0, 0, (0,0), (1,0)> → <L5, 0, (0,0), (0,0)>
    <L0, 0, (0,0), (1,1)> → <L5, 0, (0,0), (0,1)>
    <L1, 1, (0,0), (0,0)> → <L8, 2, (0,0), (0,0)>
    <L1, 1, (0,0), (0,1)> → <L5, 1, (0,0), (0,0)>
    <L1, 1, (0,0), (1,0)> → <L8, 2, (0,0), (1,0)>
    <L1, 1, (0,0), (1,1)> → <L5, 1, (0,0), (1,0)>
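
With one lock per element, a summary stops where the transaction stops, which can be in the middle of the procedure. The sketch below reproduces the eight summaries under stated assumptions: `step` is a hand-written transliteration of the slide's code that tags each action as R (acquire), L (release) or B (local or lock-protected access), and a transaction is cut when a right mover would follow a left mover. It illustrates the summaries only; it is not the two level algorithm itself.

```python
from itertools import product

N = 2
FREE = (0,) * N

def step(pc, i, m, a):
    """Execute one action; return (atomicity, next state)."""
    m, a = list(m), list(a)
    atomicity = "B"                          # default: local or protected access
    if pc == "L0":
        nxt = "L1" if i < N else "L8"
    elif pc == "L1":
        m[i], atomicity, nxt = 1, "R", "L2"  # acquire(m[i])
    elif pc == "L2":
        nxt = "L3" if a[i] else "L6"
    elif pc == "L3":
        a[i], nxt = 0, "L4"
    elif pc == "L4":
        m[i], atomicity, nxt = 0, "L", "L5"  # release(m[i])
    elif pc == "L6":
        m[i], atomicity, nxt = 0, "L", "L7"  # release(m[i])
    elif pc == "L7":
        i, nxt = i + 1, "L0"
    else:
        raise ValueError(pc)
    return atomicity, (nxt, i, tuple(m), tuple(a))

def run_transaction(state):
    """Run from state until the procedure returns or the transaction ends."""
    phase = "pre"
    while state[0] not in ("L5", "L8"):
        atomicity, nxt = step(*state)
        if atomicity == "R" and phase == "post":
            break          # an acquire after a release starts a new transaction
        if atomicity == "L":
            phase = "post"
        state = nxt
    return state

# Transactions start at L0 (i = 0) or at L1 (i = 1). Starts at L1 range over
# every value of available because other threads may run between transactions.
summaries = {}
for pc, i in (("L0", 0), ("L1", 1)):
    for a in product((0, 1), repeat=N):
        start = (pc, i, FREE, a)
        summaries[start] = run_transaction(start)
```

The entries that end at L1 are the ones that do not line up with the procedure boundary: the first loop iteration releases m[0], so the acquire of m[1] has to open a fresh transaction.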

Two level model checking algorithm: in pictures Let’s first review the sequential CFL algorithm…

[Figure: the sequential CFL algorithm on main( ) and bar( )]

Two level model checking algorithm: in pictures

[Figure: main( ) and bar( )]

Three kinds of summaries: • MAX • MAXCALL • MAXRETURN
[Figure: threads T1 and T2 running main( ) and bar( ), with MAXCALL, MAX and MAXRETURN summaries marked and the end of a transaction indicated]

Concurrency + recursion
    int g = 0;
    mutex m;
    void foo(int r) {
      L0: if (r == 0) {
      L1:   foo(r);
          } else {
      L2:   acquire(m);
      L3:   g++;
      L4:   release(m);
          }
      L5: return;
    }
    void main() {
          int q = choose({0,1});
      M0: foo(q);
      M1: acquire(m);
      M2: assert(g >= 1);
      M3: release(m);
      M4: return;
    }
    P = main() || main()
Summaries for foo: <pc,r,m,g> → <pc’,r’,m’,g’>
    <L0,1,0,0> → <L5,1,0,1>
    <L0,1,0,1> → <L5,1,0,2>
Summaries for main: <pc,q,m,g> → <pc’,q’,m’,g’>
    <M0,1,0,0> → <M1,1,0,1>
    <M0,1,0,1> → <M1,1,0,2>
    <M1,1,0,1> → <M4,1,0,1>
    <M1,1,0,2> → <M4,1,0,2>
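
Why summaries give termination on this example can be seen from a small fixpoint computation. The sketch below is an assumption-laden illustration rather than the talk's algorithm: it tracks only (r, g), treats acquire(m); g++; release(m) as one transaction, and iterates until no new summary for foo appears. The names `foo_posts` and `summarize_foo` are invented for the example.

```python
def foo_posts(r, g, summaries):
    """Post-states of foo from pre-state (r, g), reusing known summaries at calls."""
    if r == 0:
        # L1: foo(r). The callee has the same pre-state, so only summaries
        # already discovered for (r, g) can contribute.
        return {post for (pre, post) in summaries if pre == (r, g)}
    # L2-L4: acquire(m); g++; release(m) is a single transaction.
    return {(r, g + 1)}

def summarize_foo(pre_states):
    """Least fixpoint of the summary set over the given pre-states."""
    summaries = set()
    changed = True
    while changed:
        changed = False
        for pre in pre_states:
            for post in foo_posts(*pre, summaries):
                if (pre, post) not in summaries:
                    summaries.add((pre, post))
                    changed = True
    return summaries

pre_states = [(r, g) for r in (0, 1) for g in (0, 1)]
foo_summaries = summarize_foo(pre_states)
```

The call with r = 0 recurses forever in a concrete run, but it never produces a summary, so the iteration stabilizes with exactly the two entries for r = 1. Every post-state has g >= 1, which is what the assertion at M2 needs.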

What if the same procedure is called from different phases of a transaction? Instrument the transaction phase into the state of the program

Transactional context
    int gm = 0, gn = 0;
    mutex m, n;
    void bar() {
      N0: acquire(m);
      N1: gm++;
      N2: release(m);
    }
    void foo1() {
      L0: acquire(n);
      L1: gn++;
      L2: bar();
      L3: release(n);
      L4: return;
    }
    void foo2() {
      M0: acquire(n);
      M1: gn++;
      M2: release(n);
      M3: bar();
      M4: return;
    }
    P = foo1() || foo2()
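
The effect of the calling context can be made concrete by tracking the transaction phase across the call to bar. In this sketch, which is an illustration and not the talk's implementation, each action carries an assumed atomicity (R for acquire, L for release, B for a lock-protected increment), the phase switches from "pre" to "post" at the first left mover, and a right mover seen in the "post" phase starts a new transaction. Non-movers do not occur in this example and are not handled.

```python
BAR = [("acquire(m)", "R"), ("gm++", "B"), ("release(m)", "L")]
FOO1 = [("acquire(n)", "R"), ("gn++", "B"), ("call bar", None), ("release(n)", "L")]
FOO2 = [("acquire(n)", "R"), ("gn++", "B"), ("release(n)", "L"), ("call bar", None)]

def phase_at_bar(body):
    """Return (phase on entry to bar, number of transactions in the body)."""
    # Inline bar's actions at the call site, keeping a marker for the entry.
    actions = []
    for name, atomicity in body:
        actions.append((name, atomicity))
        if atomicity is None:
            actions.extend(BAR)
    phase, transactions, entry_phase = "pre", 1, None
    for name, atomicity in actions:
        if atomicity is None:
            entry_phase = phase            # record the phase at the call
        elif atomicity == "R" and phase == "post":
            transactions += 1              # acquire after release: new transaction
            phase = "pre"
        elif atomicity == "L":
            phase = "post"
    return entry_phase, transactions
```

Here `phase_at_bar(FOO1)` gives `("pre", 1)` and `phase_at_bar(FOO2)` gives `("post", 2)`: the same body of bar extends the caller's transaction in foo1 but opens a new one in foo2, so a summary for bar has to record the phase in which it was entered.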

Recap of technical problems • How do you identify transactions? • Using the theory of reduction (Lipton ’75) • What if transaction boundaries do not coincide with procedure boundaries? • Two level model checking algorithm • First level maintains stack • Second level maintains stack-less summaries • What if the same procedure is called from different phases of a transaction? • Instrument the transaction phase into the state of the program

Termination • A function is transactional if no transaction ends in the “middle” of its execution (including all transitive callees) • Theorem: For concurrent boolean programs, if all recursive functions are transactional, then the algorithm terminates.

Sequential case • If we feed a sequential program to our algorithm, it functions exactly like the Reps-Horwitz-Sagiv (POPL ‘95) algorithm • Our algorithm generalizes the RHS algorithm to concurrent programs!

Related work • Summarizing sequential programs • Sharir-Pnueli ‘81, Reps-Horwitz-Sagiv ‘95, Ball-Rajamani ‘00 • Concurrency + procedures • Bouajjani-Esparza-Touili ‘02 • Esparza-Podelski ‘00 • Reduction • Lipton ‘75 • Qadeer-Flanagan ‘03

(joint work with Tony Andrews)

SLAM model checker • Source code: sequential C program • C data structures, pointers, procedure calls, parameter passing, scoping, control flow • Automatic abstraction • Boolean program: push down model • Data flow analysis implemented using BDDs • FSM abstraction: finite state machines

Zing model checker • Source code: device driver (taking concurrency into account), web services code • Abstraction • Rich control constructs: thread creation, function call, exception, objects, dynamic allocation • Model checking is undecidable!

What is Zing? • Zing is a framework for software model-checking • Language, compiler, runtime, tools • Supports key software concepts • Enables easier extraction of models from code • Supports research in exploring large state spaces • Operates seamlessly with the VS.Net design environment

Current status • Summarization: • Theory: to appear in POPL 04 • Implementation: in progress • Zing: • Compiler, model checker and conformance checker operational • State-delta and transaction-based reduction implemented • Plans: • Symbolic reasoning • Automatic abstraction

Bluetooth demo

Zing State Explorer BPEL4WS checking BPEL Processes Buyer Seller Zing Model Auction House Reg Service
