Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation
Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu
University of Wisconsin–Madison
Cooperative Concurrency Bug Isolation
• Concurrency bugs are synchronization mistakes in multi-threaded programs.
• Several types:
  • Atomicity violation
  • Data race
  • Deadlock, etc.
[Figure: interleavings of read(x) and write(x) between thread 1 and thread 2, contrasting passing and failing outcomes]
Concurrency bugs are common in the field
• Developers are poor at parallel programming
• Interleaving testing is inefficient
• Applications with concurrency bugs are shipped to users
Concurrency bugs lead to failures in the field
• Disasters in the past
  • Therac-25, the Northeast blackout of 2003
• More threats in the multi-core era
Concurrency Bug Failure Example
[Figure: a failing run caused by a concurrency bug from Apache HTTP Server]
Concurrency Bug Failure Example (correct run)
Concurrency bug from Apache HTTP Server; idx is shared by both threads.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }
                                            log_writer() {
                                              …
                                              memcpy(&buf[idx], s, strlen(s));
                                              …
                                              temp = idx;
                                              idx = temp + strlen(s);
                                              …
                                              return SUCCESS;
                                            }
Concurrency Bug Failure Example (failure run)
Concurrency bug from Apache HTTP Server; idx is shared by both threads.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
                                            log_writer() {
                                              …
                                              memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }
                                              …
                                              temp = idx;
                                              idx = temp + strlen(s);
                                              …
                                              return SUCCESS;
                                            }
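The two interleavings above can be replayed deterministically in a single thread to see why the second one corrupts the log. This is only a sketch: the helper names (copy_step, advance_step, serial_run, interleaved_run), the buffer size, and the messages "AAAA" and "BB" are illustrative assumptions, not code from Apache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shared log state, named after buf and idx on the slide. */
static char buf[64];
static size_t idx;

/* First half of log_writer: copy the message to the current end of the log. */
static void copy_step(const char *s) {
    assert(idx + strlen(s) < sizeof buf);
    memcpy(&buf[idx], s, strlen(s));
}

/* Second half of log_writer: temp = idx; idx = temp + strlen(s); */
static void advance_step(size_t temp, const char *s) {
    idx = temp + strlen(s);
}

static void reset_log(void) {
    memset(buf, 0, sizeof buf);
    idx = 0;
}

/* Correct run: thread 1 finishes both halves before thread 2 starts. */
const char *serial_run(void) {
    reset_log();
    copy_step("AAAA"); advance_step(idx, "AAAA");  /* thread 1 */
    copy_step("BB");   advance_step(idx, "BB");    /* thread 2 */
    return buf;
}

/* Failure run: thread 2's memcpy happens before thread 1 updates idx. */
const char *interleaved_run(void) {
    reset_log();
    copy_step("AAAA");            /* thread 1: memcpy at idx 0 */
    copy_step("BB");              /* thread 2: memcpy also at idx 0 */
    advance_step(idx, "AAAA");    /* thread 1: idx 0 -> 4 */
    advance_step(idx, "BB");      /* thread 2: idx 4 -> 6 */
    return buf;
}
```

In the interleaved replay, thread 2 overwrites the start of thread 1's message, and the two bytes that idx later skips over are never written. Both runs still end with the same idx, so the final index alone does not reveal the corruption.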
Diagnosing Concurrency Bug Failures is Challenging
• The failure is non-deterministic and rare
• Programmers have trouble reproducing the failure
• The root cause involves more than one thread
Existing work and its limitations
• Failure replay
  • High runtime overhead
  • Developers need to manually locate faults
• Run-time bug detection
  • (mostly) High runtime overhead
  • Not guided by the failure
  • Many false positives
How can we achieve low-overhead and accurate failure diagnosis?
Our work: CCI
• Goal: diagnosing production-run concurrency bug failures
• Major components:
  • predicate instrumentor
  • sampler
  • statistical debugging
• A predictor is a predicate that is true in most failure runs and false in most correct runs.
[Figure: CCI pipeline. Program source goes through a compiler that adds predicates and a sampler; runs report counts and success/failure outcomes; statistical debugging turns these into predictors.]
CCI Overview
• Three different types of predicates: Prev, Havoc, and FunRe.
• Each predicate has its supporting sampling strategy.
• Same statistical debugging as in CBI.
• Experiments show CCI is effective in diagnosing concurrency failures.
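To make "true in most failure runs, false in most correct runs" concrete, the sketch below computes an Increase-style score of the kind described in the CBI literature: the failure rate among runs where the predicate was true, minus the failure rate among runs where the predicate site was sampled at all. The slides do not say which score CCI ranks by, so treating Increase as the metric is an assumption, and the struct and field names are hypothetical.

```c
#include <assert.h>

/* Per-predicate counts aggregated over many runs (hypothetical field names). */
struct pred_counts {
    unsigned true_fail;  /* failing runs where the predicate was observed true */
    unsigned true_ok;    /* successful runs where it was observed true */
    unsigned obs_fail;   /* failing runs where the predicate site was sampled */
    unsigned obs_ok;     /* successful runs where the site was sampled */
};

/* Failure(P) - Context(P); returns 0 when either ratio is undefined. */
double increase_score(struct pred_counts c) {
    /* A predicate can only be true in a run where its site was sampled. */
    assert(c.true_fail <= c.obs_fail && c.true_ok <= c.obs_ok);
    unsigned t = c.true_fail + c.true_ok;
    unsigned o = c.obs_fail + c.obs_ok;
    if (t == 0 || o == 0) return 0.0;
    double failure = (double)c.true_fail / t;  /* how often "true" means failure */
    double context = (double)c.obs_fail / o;   /* baseline failure rate at the site */
    return failure - context;
}
```

A predicate that is true almost only in failing runs scores close to 1, while a predicate whose truth says nothing beyond the baseline failure rate scores close to 0.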
Outline
• Motivation
• CCI Overview
• CCI Predicates and Sampling Strategies
  • CCI-Prev and its sampling strategy
  • CCI-Havoc and its sampling strategy
  • CCI-FunRe and its sampling strategy
• Evaluation
• Conclusion
CCI-Prev Intuition
[Figure: data race and atomicity violation examples, each shown as interleavings of read(x) and write(x) between thread 1 and thread 2 in a passing and a failing variant]
Just record which thread made the last access.
CCI-Prev Predicate
It tracks whether two successive accesses to a shared memory location were by two distinct threads or by the same thread.
CCI-Prev Predicate on the Correct Run
Concurrency bug from Apache HTTP Server. At each predicate site, the previous access to idx came from the same thread.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;            // site: previous access by thread 1 (local)
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }
                                            log_writer() {
                                              …
                                              memcpy(&buf[idx], s, strlen(s));
                                              …
                                              temp = idx;            // site: previous access by thread 2 (local)
                                              idx = temp + strlen(s);
                                              …
                                              return SUCCESS;
                                            }
CCI-Prev Predicate on the Failure Run
Concurrency bug from Apache HTTP Server. At each predicate site, the previous access to idx came from the other thread.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
                                            log_writer() {
                                              …
                                              memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;            // site: previous access by thread 2 (remote)
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }
                                              …
                                              temp = idx;            // site: previous access by thread 1 (remote)
                                              idx = temp + strlen(s);
                                              …
                                              return SUCCESS;
                                            }
CCI-Prev Predicate Instrumentation
Concurrency bug from Apache HTTP Server. test_and_insert consults a global hash table.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
      …
                                            log_writer() { … }
      lock(glock);
      remote = test_and_insert(&idx, curTid);
      record(I, remote);
      temp = idx;
      idx = temp + strlen(s);
      unlock(glock);
      …
      return SUCCESS;
    }
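The test_and_insert step can be pictured as a small table that remembers, for each address, which thread touched it last. The sketch below is an illustration under assumptions rather than CCI's actual data structure: the table is direct-mapped with a made-up size, a colliding or first-time access is treated as local, and the function name is hypothetical. In real instrumentation the caller would hold the global lock around both this call and the memory access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PREV_SLOTS 256  /* hypothetical table size */

/* Last thread seen accessing each tracked address. */
struct prev_entry { const void *addr; int tid; };
static struct prev_entry prev_table[PREV_SLOTS];

/* Record that thread tid is accessing addr, and report whether the previous
 * recorded access to addr came from a different thread ("remote"). */
bool prev_test_and_insert(const void *addr, int tid) {
    assert(addr != NULL);
    size_t slot = ((uintptr_t)addr >> 3) % PREV_SLOTS;
    struct prev_entry *e = &prev_table[slot];
    /* Unknown address (first access or evicted by a collision): report local. */
    bool remote = (e->addr == addr) && (e->tid != tid);
    e->addr = addr;
    e->tid = tid;
    return remote;
}
```

Replaying the failure run with thread ids 1 and 2 yields remote at both "temp = idx" sites, whereas the correct run yields local at both.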
CCI-Prev Sampling Strategy
• Does traditional sampling work?
  • NO. The predicate relates successive accesses made by different threads, so sampling each access independently would rarely observe both accesses of a pair.
• Thread-coordinated
• Bursty
[Figure: the failure-run interleaving of the two log_writer() calls, with the read of idx in thread 2 marked as the predicate site]
CCI-Havoc Intuition
[Figure: the failure-run interleaving of the two log_writer() calls, with the read of idx in thread 2 marked as the predicate site]
Just record what value was observed during the last access.
CCI-Havoc Predicate
It tracks whether the value of a given shared location changes between two consecutive accesses by one thread.
It uses only thread-local information.
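CCI-Havoc Predicate on the Correct Run
Concurrency bug from Apache HTTP Server. At each predicate site, idx still holds the value the same thread saw at its previous access.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;            // site: idx unchanged since thread 1's memcpy
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }
                                            log_writer() {
                                              …
                                              memcpy(&buf[idx], s, strlen(s));
                                              …
                                              temp = idx;            // site: idx unchanged since thread 2's memcpy
                                              idx = temp + strlen(s);
                                              …
                                              return SUCCESS;
                                            }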
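CCI-Havoc Predicate on the Failure Run
Concurrency bug from Apache HTTP Server. In thread 2, idx changes between the memcpy and the following read because thread 1 updates it in between.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
                                            log_writer() {
                                              …
                                              memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;            // site: idx unchanged since thread 1's memcpy
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }
                                              …
                                              temp = idx;            // site: idx changed since thread 2's memcpy
                                              idx = temp + strlen(s);
                                              …
                                              return SUCCESS;
                                            }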
CCI-Havoc Predicate Instrumentation
Concurrency bug from Apache HTTP Server. test and insert operate on a hash table that belongs to thread 1.
    thread 1                                thread 2
    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
      …
                                            log_writer() { … }
      temp = idx;
      idx = temp + strlen(s);
      changed = test(&idx, temp);
      record(I, changed);
      insert(&idx, temp);
      …
      return SUCCESS;
    }
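The test and insert pair can be sketched as a per-thread table of last-observed values. The layout below is an assumption for illustration only (direct-mapped, hypothetical size and names, values stored as long); what it tries to convey is that each thread consults only its own table, so no lock is required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HAVOC_SLOTS 64  /* hypothetical table size */

/* One table per thread; no other thread ever reads or writes it. */
struct havoc_table {
    const void *addr[HAVOC_SLOTS];
    long value[HAVOC_SLOTS];
};

static size_t havoc_slot(const void *addr) {
    return ((uintptr_t)addr >> 3) % HAVOC_SLOTS;
}

/* True if this thread saw a different value at addr on its previous access.
 * An address the table does not know about is reported as unchanged. */
bool havoc_test(const struct havoc_table *t, const void *addr, long now) {
    assert(t != NULL && addr != NULL);
    size_t s = havoc_slot(addr);
    return t->addr[s] == addr && t->value[s] != now;
}

/* Remember the value this thread just observed at addr. */
void havoc_insert(struct havoc_table *t, const void *addr, long value) {
    assert(t != NULL && addr != NULL);
    size_t s = havoc_slot(addr);
    t->addr[s] = addr;
    t->value[s] = value;
}
```

In the failure run, thread 2 would insert the value of idx it saw during memcpy and then find a different value at "temp = idx", so the predicate is recorded as changed.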
CCI-Havoc Sampling Strategy
• Bursty
• Thread-independent
[Figure: the failure-run interleaving of the two log_writer() calls]
CCI-FunRe Predicate
It tracks whether the execution of one function overlaps with the execution of the same function from a different thread.
CCI-FunRe Predicate Example
[Figure: two scenarios. In the passing scenario, thread 1 finishes log_writer() { … return SUCCESS; } before thread 2 enters it. In the failing scenario, the two executions of log_writer() overlap.]
CCI-FunRe Predicate Instrumentation
Both thread 1 and thread 2 run the same instrumented function:
    log_writer() {
      oldCount = atomic_inc(Count);
      record("log_writer", oldCount);
      …
      atomic_dec(Count);
      return SUCCESS;
    }
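The entry and exit accounting maps naturally onto C11 atomics. The sketch below uses atomic_fetch_add and atomic_fetch_sub from <stdatomic.h> in place of the slide's atomic_inc and atomic_dec; the function names funre_enter and funre_exit are hypothetical, and a real instrumentor would keep one counter per function rather than a single global.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of threads currently inside the instrumented function. */
static atomic_int active_count;

/* Call on function entry. Returns the FunRe predicate: true if another
 * thread was already executing the function when this one entered. */
bool funre_enter(void) {
    int old_count = atomic_fetch_add(&active_count, 1);
    assert(old_count >= 0);
    return old_count > 0;
}

/* Call on every exit path of the function. */
void funre_exit(void) {
    int old_count = atomic_fetch_sub(&active_count, 1);
    assert(old_count > 0);  /* every exit must match an earlier entry */
}
```

Because the old count is returned atomically with the increment, two threads that enter at the same time cannot both observe zero.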
CCI-FunRe Sampling Strategy
    thread 1                                thread 2
    log_writer() {
      …
      return SUCCESS;
    }
                                            log_writer() {
                                              oldCount = atomic_inc(Count);
                                              record("log_writer", oldCount);
                                              …
                                              atomic_dec(Count);
                                              return SUCCESS;
                                            }
Function execution accounting is not suitable for sampling, so this part is unconditional.
CCI-FunRe Sampling Strategy
• Function execution accounting:
  • unconditional
• FunRe predicate recording:
  • thread-independent
  • non-bursty
Experimental Evaluation
• Implementation
  • Static instrumentor based on the CBI framework
• Real-world concurrency bug failures from:
  • Apache HTTP Server, Cherokee
  • Mozilla-JS, PBZIP2
  • SPLASH-2: FFT, LU
• Parameters used
  • Roughly 1/100 sampling rate
Failure Diagnosis Evaluation
• Methodology
  • Uses concurrency bug failures that occurred in the real world
  • Each application runs 3000 times on a multi-core machine
  • Random sleeps are added to obtain some failure runs
  • Sampling is enabled
• Statistical debugging then returns a list of predictors
  • Which predictor in the list can diagnose the failure?
Failure Diagnosis Results (with sampling)
[Table: diagnosis capability of CCI-FunRe, CCI-Havoc, and CCI-Prev; table contents not recoverable]
Runtime Overhead
[Table: runtime overhead of CCI-FunRe, CCI-Havoc, and CCI-Prev; table contents not recoverable]
Conclusion
• CCI is capable of, and suitable for, diagnosing many production-run concurrency bug failures.
• Future predicates can leverage our effective sampling strategies.
• Experiments confirm the design tradeoffs.
Questions about CCI?
CBI on Concurrency Bug Failures
[Figure: the failure-run interleaving of the two log_writer() calls on the shared idx; concurrency bug from Apache HTTP Server]
CBI does not work!
To diagnose production-run concurrency bug failures, interleaving-related events should be tracked.
CCI-Prev Predicate Instrumentation with Sampling
    if (gsample) {
      lock(glock);
      changed = test_and_insert(&cnt, curTid);
      record(I, changed);
      temp = cnt;
      unlock(glock);
    } else {
      temp = cnt;
      [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
    }
CCI-Prev Predicate Instrumentation with Sampling (with burst termination)
    if (gsample) {
      lock(glock);
      changed = test_and_insert(&cnt, curTid, &stale);
      record(stale ? P1 : P2, changed);
      temp = cnt;
      unlock(glock);
      gLength++;
      lLength++;
      if ((iset == curTid && lLength > lMAX) || gLength > gMAX) {
        clear();
        iset = unusedTid;
        gsample = false;
      }
    } else {
      temp = cnt;
      [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
    }
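The burst logic above can be read as a small state machine shared by all threads. The sketch below is a simplified, single-threaded model under several assumptions: the [[ … ]]? block is interpreted as "run with the sampling probability" and is modelled by a caller-supplied flag, only the initiating thread's local length is tracked, the limits L_MAX and G_MAX are made-up values, and the function name is hypothetical. Locking and the hash-table clear are omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum { L_MAX = 2, G_MAX = 5, NO_THREAD = -1 };  /* hypothetical limits */

static bool gsample = false;   /* is a burst active for all threads? */
static int  iset = NO_THREAD;  /* thread that started the current burst */
static int  lLength, gLength;  /* accesses by the initiator / by all threads */

/* Called at every instrumented access by thread tid. fire stands in for the
 * random draw that may start a burst. Returns true if the access is sampled. */
bool prev_sample_step(int tid, bool fire) {
    assert(tid != NO_THREAD);
    if (!gsample) {
        if (fire) { gsample = true; iset = tid; lLength = gLength = 0; }
        return false;  /* the access that starts a burst is not itself recorded */
    }
    gLength++;
    if (tid == iset) lLength++;
    if ((tid == iset && lLength > L_MAX) || gLength > G_MAX) {
        gsample = false;  /* burst over; real code would also clear the table */
        iset = NO_THREAD;
    }
    return true;
}
```

Once one thread starts a burst, accesses from every thread are sampled until the burst ends. That shared flag is the thread-coordinated part, and it lets a run of consecutive cross-thread accesses be observed together.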
CCI-Havoc Predicate Instrumentation with Sampling
    if (sample) {
      changed = test(&cnt, cnt, &stale);
      record(stale ? P1 : P2, changed);
      temp = cnt;
      insert(&cnt, cnt);
      length++;
      if (length > lMAX) {
        clear();
        sample = false;
      }
    } else {
      temp = cnt;
      [[ sample = true; length = 0; ]]?
    }
No global lock used!