Cooperative Concurrency Bug Isolation

Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation
Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu
University of Wisconsin–Madison

Cooperative Concurrency Bug Isolation
• Concurrency bugs are synchronization mistakes in multi-threaded programs.
• Several types:
  • Atomicity violation
  • Data race
  • Deadlock, etc.
[Figure: two-thread interleavings of read(x) and write(x), each marked as a correct or a failing outcome.]

Concurrency bugs are common in the field
• Developers are poor at parallel programming
• Interleaving testing is inefficient
• Applications with concurrency bugs are shipped to users

Concurrency bugs lead to failures in the field
• Disasters in the past
  • Therac-25, the Northeast blackout of 2003
• More threats in the multi-core era

Failure diagnosis is critical

Concurrency Bug Failure Example
Concurrency Bug from Apache HTTP Server

Concurrency Bug Failure Example
Shared variable: idx
thread 1                                thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                        …
                                        log_writer() {
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
                                          …
                                          temp = idx;
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: correct run.
Concurrency Bug from Apache HTTP Server

Concurrency Bug Failure Example
Shared variable: idx
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: failure.
Concurrency Bug from Apache HTTP Server
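The two interleavings of log_writer() can be replayed deterministically in a single thread, which makes the effect of the bad ordering easy to see. The sketch below is only an illustration: the buffer size, the split of log_writer() into step_copy() and step_advance(), and the run_serial()/run_interleaved() helpers are assumptions made for this example, not code from Apache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64   /* hypothetical log buffer size */

/* Shared log state, as on the slide: a buffer and a write index. */
static char   buf[BUF_SIZE];
static size_t idx;

/* First half of log_writer(): copy the message to the current end of the log. */
static void step_copy(const char *s) {
    assert(idx + strlen(s) < BUF_SIZE);
    memcpy(&buf[idx], s, strlen(s));
}

/* Second half of log_writer(): advance the shared index past the message. */
static void step_advance(const char *s) {
    size_t temp = idx;
    idx = temp + strlen(s);
}

static void reset_log(void) {
    memset(buf, 0, sizeof buf);
    idx = 0;
}

/* Correct run: thread 1 finishes log_writer() before thread 2 starts. */
const char *run_serial(const char *s1, const char *s2) {
    reset_log();
    step_copy(s1); step_advance(s1);   /* thread 1 */
    step_copy(s2); step_advance(s2);   /* thread 2 */
    return buf;
}

/* Failure run: both copies happen before either index update. */
const char *run_interleaved(const char *s1, const char *s2) {
    reset_log();
    step_copy(s1);                     /* thread 1 */
    step_copy(s2);                     /* thread 2 overwrites thread 1's entry */
    step_advance(s1);                  /* thread 1 */
    step_advance(s2);                  /* thread 2 */
    return buf;
}
```

With messages "aa" and "bb", the serial order leaves both entries in the log, while the interleaved order loses thread 1's entry because both copies land at the same offset.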

Diagnosing Concurrency Bug Failures is Challenging
• The failure is non-deterministic and rare
• Programmers have trouble reproducing the failure
• The root cause involves more than one thread

Existing work and its limitations
• Failure replay
  • High runtime overhead
  • Developers need to manually locate faults
• Run-time bug detection
  • (mostly) High runtime overhead
  • Not guided by the failure
  • Many false positives
How to achieve low-overhead and accurate failure diagnosis?

Our work: CCI
• Goal: diagnosing production-run concurrency bug failures
• Major components:
  • predicate instrumentor
  • sampler
  • statistical debugging
[Figure: CCI pipeline with program source, compiler, predicates, sampler, counts and success/failure outcomes, statistical debugging, and predictors.]
Predictors: true in most failure runs, false in most correct runs.
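The idea of a predictor ("true in most failure runs, false in most correct runs") can be made concrete with a toy score. The sketch below is a simplified stand-in and not the ranking formula used by CBI's statistical debugging; the PredicateCounts layout and predictor_score() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Per-predicate counts aggregated over many runs (hypothetical layout). */
typedef struct {
    int true_in_failed;   /* failing runs in which the predicate was observed true */
    int failed_runs;      /* total failing runs */
    int true_in_passed;   /* correct runs in which the predicate was observed true */
    int passed_runs;      /* total correct runs */
} PredicateCounts;

/* Score in [-1, 1]: values near 1 mean the predicate is true in most
   failure runs and false in most correct runs. */
double predictor_score(PredicateCounts c) {
    assert(c.failed_runs > 0 && c.passed_runs > 0);
    double fail_rate = (double)c.true_in_failed / c.failed_runs;
    double pass_rate = (double)c.true_in_passed / c.passed_runs;
    return fail_rate - pass_rate;
}
```

A predicate that is true in 9 of 10 failing runs and 1 of 10 correct runs scores about 0.8, while one that is true equally often in both kinds of run scores 0 and would not be reported as a predictor.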

CCI Overview
• Three different types of predicates: Prev, Havoc, and FunRe.
• Each predicate has its supporting sampling strategy.
• Same statistical debugging as in CBI.
• Experiments show CCI is effective in diagnosing concurrency failures.

Outline
• Motivation
• CCI Overview
• CCI Predicates and Sampling Strategies
  • CCI-Prev and its sampling strategy
  • CCI-Havoc and its sampling strategy
  • CCI-FunRe and its sampling strategy
• Evaluation
• Conclusion

CCI-Prev Intuition
[Figure: data race and atomicity violation examples, each shown as two-thread interleavings of read(x) and write(x), with one correct and one failing ordering.]
Just record which thread accessed the location last.

CCI-Prev Predicate
It tracks whether two successive accesses to a shared memory location were by two distinct threads or by the same thread.

CCI-Prev Predicate on the Correct Run
thread 1                                thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;              /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                        …
                                        log_writer() {
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
                                          …
                                          temp = idx;              /* I */
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: correct run.
Concurrency Bug from Apache HTTP Server

CCI-Prev Predicate on the Failure Run
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;              /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;              /* I */
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: failure.
Concurrency Bug from Apache HTTP Server

CCI-Prev Predicate Instrumentation
thread 1                                thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
                                        …
                                        log_writer() {
                                          …
                                        }
                                        …
  lock(glock);
  remote = test_and_insert(&idx, curTid);   /* uses a global hash table */
  record(I, remote);
  temp = idx;                               /* I */
  unlock(glock);
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
Outcome: failure.
Concurrency Bug from Apache HTTP Server
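The slide names test_and_insert() and a global hash table but does not show their internals. A minimal sketch of how they might look is given below, assuming a fixed-size open-addressing table keyed by address and a C11 atomic_flag spin lock standing in for glock. The table size, the hashing, prev_read(), and the choice to treat a first access as "not remote" are all assumptions of this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256   /* hypothetical capacity; must be a power of two */

typedef struct { const void *addr; int last_tid; } PrevEntry;

static PrevEntry   prev_table[TABLE_SIZE];    /* the global hash table */
static atomic_flag glock = ATOMIC_FLAG_INIT;  /* stands in for glock on the slide */

static void glock_acquire(void) { while (atomic_flag_test_and_set(&glock)) { /* spin */ } }
static void glock_release(void) { atomic_flag_clear(&glock); }

/* Returns true if the previous recorded access to addr was by another thread,
   then records cur_tid as the latest accessor. The caller must hold glock. */
static bool test_and_insert(const void *addr, int cur_tid) {
    size_t i = ((uintptr_t)addr >> 3) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        PrevEntry *e = &prev_table[i];
        if (e->addr == NULL) {             /* first access: no previous thread */
            e->addr = addr;
            e->last_tid = cur_tid;
            return false;
        }
        if (e->addr == addr) {
            bool remote = (e->last_tid != cur_tid);
            e->last_tid = cur_tid;
            return remote;
        }
        i = (i + 1) & (TABLE_SIZE - 1);    /* linear probing */
    }
    return false;                          /* table full: treat as untracked */
}

/* Instrumented read of a shared int. *remote receives the predicate value
   that record(I, remote) would log on the slide. */
int prev_read(const int *shared, int cur_tid, bool *remote) {
    assert(shared != NULL && remote != NULL);
    glock_acquire();
    *remote = test_and_insert(shared, cur_tid);
    int value = *shared;
    glock_release();
    return value;
}
```

Holding the lock across both the table update and the read keeps the recorded "previous thread" consistent with the access order, which is why the slide wraps `temp = idx` in lock(glock)/unlock(glock).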

CCI-Prev Sampling Strategy
• Does traditional sampling work?
  • NO.
• Thread-coordinated
• Bursty
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;              /* I */
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …

CCI-Havoc Intuition
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;              /* I */
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Just record what value was observed during the last access.

CCI-Havoc Predicate
It tracks whether the value of a given shared location changes between two consecutive accesses by one thread.
It uses only thread-local information.

CCI-Havoc Predicate on the Correct Run
thread 1                                thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;              /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                        …
                                        log_writer() {
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
                                          …
                                          temp = idx;              /* I */
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: correct run.
Concurrency Bug from Apache HTTP Server

CCI-Havoc Predicate on the Failure Run
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;              /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;              /* I */
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: failure.
Concurrency Bug from Apache HTTP Server

CCI-Havoc Predicate Instrumentation
thread 1                                thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
                                        …
                                        log_writer() {
                                          …
                                        }
                                        …
  temp = idx;                          /* I */
  changed = test(&idx, temp);          /* uses the hash table for thread 1 */
  record(I, changed);
  insert(&idx, temp);
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
Outcome: failure.
Concurrency Bug from Apache HTTP Server
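The slide's test() and insert() operate on a per-thread hash table. One way they could be sketched is shown below, assuming a fixed-size thread-local table that maps an address to the value this thread last observed there. The names havoc_test(), havoc_insert(), and havoc_slot(), the table size, and the use of `long` values are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HAVOC_SLOTS 256   /* hypothetical capacity; must be a power of two */

typedef struct { const void *addr; long value; } HavocEntry;

/* One table per thread, so no lock is needed. */
static _Thread_local HavocEntry havoc_table[HAVOC_SLOTS];

/* Finds the slot for addr, or the empty slot where it would go. */
static HavocEntry *havoc_slot(const void *addr) {
    size_t i = ((uintptr_t)addr >> 3) & (HAVOC_SLOTS - 1);
    for (size_t probes = 0; probes < HAVOC_SLOTS; probes++) {
        HavocEntry *e = &havoc_table[i];
        if (e->addr == NULL || e->addr == addr) return e;
        i = (i + 1) & (HAVOC_SLOTS - 1);   /* linear probing */
    }
    return NULL;                           /* table full */
}

/* test(): has the value changed since this thread's last recorded access?
   Returns false when the thread has no earlier record for addr. */
bool havoc_test(const void *addr, long current) {
    HavocEntry *e = havoc_slot(addr);
    return e != NULL && e->addr == addr && e->value != current;
}

/* insert(): remember the value this thread just observed at addr. */
void havoc_insert(const void *addr, long value) {
    assert(addr != NULL);
    HavocEntry *e = havoc_slot(addr);
    if (e != NULL) {
        e->addr = addr;
        e->value = value;
    }
}
```

If another thread writes the location between two accesses by the same thread, havoc_test() returns true at the second access, which matches the failure run on the slide.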

CCI-Havoc Sampling Strategy
• Bursty
• Thread-independent
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …

CCI-FunRe Predicate
It tracks whether the execution of one function overlaps with the execution of the same function from a different thread.

CCI-FunRe Predicate Example
Non-overlapping executions (correct):
thread 1                                thread 2
…
log_writer() {
  …
  return SUCCESS;
}
…
                                        …
                                        log_writer() {
                                          …
                                          return SUCCESS;
                                        }
                                        …
Overlapping executions (failure):
thread 1                                thread 2
…                                       …
log_writer() {                          log_writer() {
  …                                       …
  return SUCCESS;                         return SUCCESS;
}                                       }
…                                       …

CCI-FunRe Predicate Instrumentation
thread 1                                thread 2
…                                       …
log_writer() {                          log_writer() {
  oldCount = atomic_inc(Count);           oldCount = atomic_inc(Count);
  record("log_writer", oldCount);         record("log_writer", oldCount);
  …                                       …
  atomic_dec(Count);                      atomic_dec(Count);
  return SUCCESS;                         return SUCCESS;
}                                       }
…                                       …
Outcome: failure.

CCI-FunRe Sampling Strategy
thread 1                                thread 2
…                                       …
log_writer() {                          log_writer() {
  …                                       oldCount = atomic_inc(Count);
  return SUCCESS;                         record("log_writer", oldCount);
}                                         …
…                                         atomic_dec(Count);
                                          return SUCCESS;
                                        }
                                        …
Outcome: failure.
Function execution accounting is not suitable for sampling, so this part is unconditional.

CCI-FunRe Sampling Strategy
• Function execution accounting:
  • unconditional
• FunRe predicate recording:
  • thread-independent
  • non-bursty

Experimental Evaluation
• Implementation
  • Static instrumentor based on the CBI framework
• Real-world concurrency bug failures from:
  • Apache HTTP server, Cherokee
  • Mozilla-JS, PBZIP2
  • SPLASH-2: FFT, LU
• Parameters used
  • Roughly 1/100 sampling rate

Failure Diagnosis Evaluation
• Methodology
  • Uses concurrency bug failures that occurred in the real world
  • Each application runs 3000 times on a multi-core machine
  • Random sleeps are added to get some failure runs
  • Sampling is enabled
• Statistical debugging then returns a list of predictors
  • Which predictor in the list can diagnose the failure?

Failure Diagnosis Results (with sampling)
[Table: diagnosis capability of the Prev, Havoc, and FunRe predicates.]

Runtime Overhead
[Table: runtime overhead of the Prev, Havoc, and FunRe predicates.]

Conclusion
• CCI is capable of diagnosing many production-run concurrency bug failures and is suitable for that setting.
• Future predicates can leverage our effective sampling strategies.
• Experiments confirm the design tradeoffs.

Questions about CCI?

CBI on Concurrency Bug Failures
Shared variable: idx
thread 1                                thread 2
…
log_writer() {
                                        …
                                        log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                          …
                                          memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                          …
                                          temp = idx;
                                          idx = temp + strlen(s);
                                          …
                                          return SUCCESS;
                                        }
                                        …
Outcome: failure.
• CBI does not work!
• To diagnose production-run concurrency bug failures, interleaving-related events should be tracked.
Concurrency Bug from Apache HTTP Server

CCI-Prev Predicate Instrumentation with Sampling
if (gsample) {
  lock(glock);
  changed = test_and_insert(&cnt, curTid);
  record(I, changed);
  temp = cnt;
  unlock(glock);
} else {
  temp = cnt;
  [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
}

CCI-Prev Predicate Instrumentation with Sampling
if (gsample) {
  lock(glock);
  changed = test_and_insert(&cnt, curTid, &stale);
  record(stale ? P1 : P2, changed);
  temp = cnt;
  unlock(glock);
  gLength++;
  lLength++;
  if ((iset == curTid && lLength > lMAX) || gLength > gMAX) {
    clear();
    iset = unusedTid;
    gsample = false;
  }
} else {
  temp = cnt;
  [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
}

CCI-Havoc Predicate Instrumentation with Sampling
if (sample) {
  changed = test(&cnt, cnt, &stale);
  record(stale ? P1 : P2, changed);
  temp = cnt;
  insert(&cnt, cnt);
  length++;
  if (length > lMAX) {
    clear();
    sample = false;
  }
} else {
  temp = cnt;
  [[ sample = true; length = 0; ]]?
}
No global lock is used.
