VyrdMC: Driving Runtime Refinement Checking Using Model Checkers

Tayfun Elmas, Serdar Tasiran
Koç University, Istanbul, Turkey
PLDI 2005, June 12-15, Chicago, U.S.

Coverage metric for concurrent programs: Motivation
• Verification/testing of concurrent programs is difficult
  • Exhaustive methods: State space too big, compounded by thread interleavings
  • Testing: Scalable but not exhaustive
• Our work: Hybrid methods
  • Testing + model checking
• Coverage metrics: Link between testing and model checking
  • Quantify adequacy of testing/verification
  • Communicate partial results, testing goals between tools
  • Direct tools toward unexplored, distinct new executions

Idea
• This paper: Metric directed at concurrency errors ONLY
• Focus: “High-level” data races
  • Atomicity violations
  • Refinement violations
  • All variables may be lock-protected, but operations not implemented atomically

Idea
• Observation: Bug occurs whenever
  • Method1 executes up to line X, context switch occurs
  • Method2 starts execution from line Y
  • Provided there is a data dependency between
    • Method1’s code “right before” line X: BlockX
    • Method2’s code “right after” line Y: BlockY
• Bug description follows pattern above
  • No other requirements on program state, other threads, method arguments, etc.
• A “one-bit” data abstraction captures error scenario
  • depdt: Is there a data dependency between BlockX and BlockY?

Idea
• To actually trigger the error scenario, many other conditions may need to be set up:
  • Many other threads
  • Particular, complicated program state
• But the testing target is easy to describe
• Programmer thinks BlockY cannot follow BlockX because of assumptions about
  • the environment accessing the component
  • the synchronization mechanism used
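The slides do not say how the depdt bit is computed. A minimal sketch, assuming the standard notion of data dependency (the two blocks access a common variable and at least one of the accesses is a write), could look as follows. The names DepdtSketch and Accesses, and the variable labels such as "sb.count", are hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.Set;

class DepdtSketch {
    /** Variables a code block reads and writes (hypothetical summary type). */
    record Accesses(Set<String> reads, Set<String> writes) {}

    /**
     * One-bit abstraction: true if BlockX and BlockY touch a common variable
     * and at least one of the two accesses is a write. This definition is an
     * assumption, not taken from the slides.
     */
    static boolean depdt(Accesses blockX, Accesses blockY) {
        return !Collections.disjoint(blockX.writes(), blockY.reads())
            || !Collections.disjoint(blockX.writes(), blockY.writes())
            || !Collections.disjoint(blockX.reads(), blockY.writes());
    }

    public static void main(String[] args) {
        // BlockX: "int len = sb.length()" reads sb.count and writes the local len.
        Accesses blockX = new Accesses(Set.of("sb.count"), Set.of("len"));
        // BlockY: the start of sb.setLength(newLength) reads and writes sb.count.
        Accesses blockY = new Accesses(Set.of("sb.count", "newLength"), Set.of("sb.count"));
        System.out.println("depdt = " + depdt(blockX, blockY));
    }
}
```

The bit ignores everything else about the program state, which is what makes the testing target easy to describe.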

Outline
• Motivating examples
  • The java.lang.StringBuffer bug
  • The bug in Boxwood.Cache
• The Location Pairs (LP) Metric
• Reducing the Coverage Goal
• Approximating reachable LPs

public synchronized StringBuffer append(StringBuffer sb) {
    int len = sb.length();
    int newCount = count + len;
    if (newCount > value.length)
        ensureCapacity(newCount);
    sb.getChars(0, len, value, count);
    count = newCount;
    return this;
}

public synchronized void setLength(int newLength) {
    ...
    if (count < newLength) {
        ...
    } else {
        count = newLength;
    }
}
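The problematic schedule can be replayed deterministically in a single thread. The sketch below uses MiniBuffer, a hypothetical and heavily simplified stand-in for StringBuffer (it is not the JDK class), and performs the steps of append by hand so that setLength runs between reading the length and copying the characters.

```java
import java.util.Arrays;

class AppendRaceSketch {
    /** Hypothetical, simplified stand-in for StringBuffer; not the JDK implementation. */
    static final class MiniBuffer {
        private char[] value = new char[16];
        private int count;

        MiniBuffer(String s) {
            s.getChars(0, s.length(), value, 0);
            count = s.length();
        }

        synchronized int length() {
            return count;
        }

        synchronized void setLength(int newLength) {
            if (newLength > value.length) {
                value = Arrays.copyOf(value, newLength);
            }
            if (count < newLength) {
                Arrays.fill(value, count, newLength, '\0');
            }
            count = newLength;
        }

        synchronized void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
            if (srcEnd > count) {
                throw new StringIndexOutOfBoundsException("srcEnd > count");
            }
            System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
        }
    }

    /** Replays the schedule; returns true if the stale length is detected. */
    static boolean replay() {
        MiniBuffer sb = new MiniBuffer("hello");
        char[] dst = new char[16];
        int len = sb.length();      // Method1 (append): reads sb's length, then sb's lock is released
        sb.setLength(0);            // context switch: Method2 (setLength) runs on sb
        try {
            sb.getChars(0, len, dst, 0);   // Method1 resumes with a stale len
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale length detected: " + replay());
    }
}
```

Every method here is synchronized, yet append as a whole is not atomic with respect to its argument, because the lock on sb is released between length() and getChars().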

Concurrency Bug in Cache
[Figure: contents of the Cache and the Chunk Manager for one handle at each step of the schedule: Flush() starts, Write(handle, AB) starts, Write(handle, AB) ends, Flush() ends.]
• Different byte-arrays for the same handle
• Corrupted data in persistent storage

private static void CpToCache(
        byte[] buf, CacheEntry te, int lsn, Handle h) {
    for (int i=0; i<buf.length; i++) {
        te.data[i] = buf[i];
    }
    ...
    te.lsn = lsn;
}

public static void Flush(int lsn) {
    ...
    lock (clean) {
        BoxMain.alloc.Write(h, te.data, te.data.length, 0, 0, WRITE_TYPE_RAW);
    }
}
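The effect of a Flush landing in the middle of the copy loop can be sketched as a deterministic replay. This is an illustration in Java rather than the Boxwood code; it assumes the copy loop does not hold the lock that Flush holds, and the byte values X, Y, A, B are illustrative labels only.

```java
import java.util.Arrays;

class TornWriteSketch {
    /**
     * Replays one schedule: the copy loop writes its first byte, Flush
     * persists the entry, then the copy loop finishes.
     * Returns { cached contents, persisted contents }.
     */
    static byte[][] replay() {
        byte[] cached = {'X', 'Y'};          // te.data before Write (illustrative values)
        byte[] buf = {'A', 'B'};             // new contents passed to Write

        cached[0] = buf[0];                  // copy loop, iteration i = 0
        byte[] persisted = cached.clone();   // Flush writes te.data to storage at this point
        cached[1] = buf[1];                  // copy loop, iteration i = 1

        return new byte[][] {cached, persisted};
    }

    public static void main(String[] args) {
        byte[][] result = replay();
        System.out.println("cache:     " + new String(result[0]));
        System.out.println("persisted: " + new String(result[1]));
        System.out.println("consistent: " + Arrays.equals(result[0], result[1]));
    }
}
```

The persisted array mixes old and new bytes, so the cache and the storage end up holding different byte-arrays for the same handle.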

public synchronized StringBuffer append(StringBuffer sb) {
    int len = sb.length();
    int newCount = count + len;
    if (newCount > value.length) {
        ensureCapacity(newCount);
    }
    sb.getChars(0, len, value, count);
    count = newCount;
    return this;
}

Actions of append, with locations L1 and L2 marked between them:
    acquire(this)
    invoke sb.length()
  L1
    int len = sb.length()
  L2
    int newCount = count + len
    if (newCount > value.length)
    expandCapacity(newCount);
    invoke sb.getChars()
    sb.getChars(0, len, value, count)
    count = newCount
    return this

Coverage FSM State
(LX, pend1, LY, pend2, depdt)
• LX: Location in the CFG of Method 1
• pend1: Is an “interesting” action in Method 1 expected next?
• LY: Location in the CFG of Method 2
• pend2: Is an “interesting” action in Method 2 expected next?
• depdt: Do actions following LX and LY have a data dependency?

Coverage FSM
[Figure: fragment of a coverage FSM. States shown: (L1, !pend1, L3, !pend2, !depdt), (L1, !pend1, L3, !pend2, depdt), (L1, pend1, L3, !pend2, !depdt), (L2, !pend1, L3, pend2, !depdt). Edge labels: t1: L1 → L2 and t2: L3 → L4.]

Coverage Goal
• The “pend1” bit gets set when
  • The depdt bit is TRUE
  • Method2 takes an action
  • Intuition: Method1’s dependent action must follow
• Must cover all (reachable) transitions of the form
  • p = (LX_p, TRUE, LY_p, pend2_p, depdt_p) → q = (LX_q, pend1_q, LY_q, pend2_q, depdt_q)
  • p = (LX_p, pend1_p, LY_p, TRUE, depdt_p) → q = (LX_q, pend1_q, LY_q, pend2_q, depdt_q)
• Separate coverage FSM for each method pair
  • Cover transitions in each FSM
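The coverage goal can be read as a predicate over FSM transitions: a transition counts toward the goal when its source state has pend1 or pend2 set. A minimal sketch of how a measurement tool might track this is given below. The types State and Transition and the coverage ratio are assumptions made for illustration, not the authors' tool; the example states in main reuse tuples that appear in the Coverage FSM figure, but the edge between them is only illustrative.

```java
import java.util.HashSet;
import java.util.Set;

class CoverageGoalSketch {
    /** Coverage-FSM state (LX, pend1, LY, pend2, depdt). */
    record State(String lx, boolean pend1, String ly, boolean pend2, boolean depdt) {}

    /** A transition p -> q of the coverage FSM for one method pair. */
    record Transition(State p, State q) {}

    /** Transitions leaving a state with pend1 or pend2 set must be covered. */
    static boolean mustCover(Transition t) {
        return t.p().pend1() || t.p().pend2();
    }

    /** Fraction of must-cover transitions that have been observed (assumed metric). */
    static double coverage(Set<Transition> reachable, Set<Transition> observed) {
        int goal = 0;
        int hit = 0;
        for (Transition t : reachable) {
            if (mustCover(t)) {
                goal++;
                if (observed.contains(t)) {
                    hit++;
                }
            }
        }
        return goal == 0 ? 1.0 : (double) hit / goal;
    }

    public static void main(String[] args) {
        State p = new State("L1", true, "L3", false, false);
        State q = new State("L2", false, "L3", true, false);
        Set<Transition> reachable = Set.of(new Transition(p, q));
        Set<Transition> observed = new HashSet<>();
        System.out.println("before: " + coverage(reachable, observed));
        observed.add(new Transition(p, q));
        System.out.println("after:  " + coverage(reachable, observed));
    }
}
```

One such set of reachable and observed transitions would be kept per method pair, matching the separate coverage FSMs on the slide.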

Important Details
• Action: Atomically executed code fragment
  • Defined by the language
• Method calls:
  • Call action: Method call, all lock acquisitions
  • Return action: Total net effect of method, atomically executed + lock releases
• But what if there is interesting concurrency inside called method?
  • Considered separately when that method is considered as one in the method pair

Reducing the Coverage FSM
• Method-local actions:
  • Basic block consisting of method-local actions considered a single atomic action
• Atomic blocks:
  • User can annotate code blocks “atomic”
  • Block considered as single action
• Pure blocks:
  • A “pure” execution of pure block does not affect global state
  • Example: Acquire lock, read global variable, decide resource not free, release lock
  • Considered a “no-op”
  • Modeled by “bypass transition” in coverage FSM. Does not need to be covered
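The pure-block example on the slide (acquire lock, read global variable, decide resource not free, release lock) can be written out as a short sketch. The class PureBlockSketch and its members are hypothetical names used only to illustrate the pattern.

```java
class PureBlockSketch {
    private final Object lock = new Object();
    private boolean resourceFree;   // global state, guarded by lock

    /**
     * Tries to take the resource. When the resource is not free, the block
     * only reads global state and exits: a "pure" execution that a coverage
     * FSM could model as a bypass transition.
     */
    boolean tryTake() {
        synchronized (lock) {           // acquire lock
            if (!resourceFree) {        // read global variable
                return false;           // decide resource not free; lock released on exit
            }
            resourceFree = false;       // impure execution: global state changes
            return true;
        }
    }

    /** Makes the resource available again. */
    void release() {
        synchronized (lock) {
            resourceFree = true;
        }
    }

    boolean isFree() {
        synchronized (lock) {
            return resourceFree;
        }
    }

    public static void main(String[] args) {
        PureBlockSketch r = new PureBlockSketch();
        System.out.println("take while busy: " + r.tryTake());
        r.release();
        System.out.println("take while free: " + r.tryTake());
    }
}
```

Only the successful path changes global state, so only that path contributes interesting actions to the location pairs.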

Evidence, estimate on # of locations
• Errors captured by metric
  • 100% metric ⇒ Bug guaranteed to be triggered
• Preliminary study
  • Bugs in Java class libraries
  • Bug found in Boxwood
  • Bug found in Scan file system
  • Bugs in E. Farchi, Y. Nir, S. Ur, “Concurrent Bug Patterns and How to Test Them,” 17th Intl. Parallel and Distributed Processing Symposium (IPDPS ’03)
• # of location pairs after factoring out atomic and pure blocks
  • 100s
• How many are covered by random testing? How does coverage scale over time?
  • Don’t know yet. Students implementing coverage measurement tool.

Future Work: Approximating Reachable LP Set
• Caveat: LP reachability undecidable
• Metric only intended as aid to programmer
  • What have I tested?
  • What should I try to test?
  • Make sure LP does not lead to error if it looks like it can be exercised.
• Future work: Better approximate reachable LP set
  • Do conservative reachability analysis of coverage FSM for one method pair using predicate abstraction.
