"AFix is an automated tool for fixing atomicity-violation bugs in concurrent programs. It statically adds locks to remove buggy interleavings and provides best-effort patches for correctness, performance, and readability."
AFix: Automated Atomicity-Violation Fixing
Guoliang Jin, Linhai Song, Wei Zhang, Shan Lu, and Ben Liblit
University of Wisconsin–Madison

Need to Find and Fix Concurrency Bugs
• The multicore era is already here
• Programmers struggle to reason about concurrency
• More and more concurrency bugs
• Many concurrency bugs can be automatically detected
• But bugs need to be fixed
Example:
  Thread 1: if (ptr != NULL) { ptr->field = 1; }
  Thread 2: ptr = NULL;
  If Thread 2's write lands between Thread 1's check and its use of ptr: Segmentation Fault

Bug-fixing
• Bug-fixing process is lengthy and resource consuming
• Nearly 70% of patches are buggy in their first releases
• Automated fixing is desired, but difficult in general
[Figure: the bug-fixing process: understand a bug (bug1 … bugn), generate a patch, review & test the patch; patch quality covers correctness, performance, and readability]

Automated Concurrency-Bug Fixing
• Fixing concurrency bugs automatically is feasible
  • Program is correct in most interleavings
  • Only need to remove some bad interleavings
Three interleavings of Thread 1 (if (ptr != NULL) { ptr->field = 1; }) and Thread 2 (ptr = NULL;):
• Thread 2 runs entirely before Thread 1: correct
• Thread 1 runs entirely before Thread 2: correct
• Thread 2 runs between Thread 1's check and its write: Segmentation Fault

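The claim that only a few interleavings are bad can be checked by brute force on the slide's example. The sketch below is a small Python model, not part of AFix: the step names (`t1_check`, `t1_write`, `t2_null`) and the `run` helper are hypothetical, and a dereference of a null pointer is modeled by returning the string "segfault".

```python
from itertools import permutations

def run(schedule):
    """Execute one interleaving of the three steps and report the outcome."""
    ptr = {"field": 0}        # stands in for a valid, non-NULL pointer
    passed_check = False
    for step in schedule:
        if step == "t1_check":        # Thread 1: if (ptr != NULL)
            passed_check = ptr is not None
        elif step == "t1_write":      # Thread 1: ptr->field = 1
            if passed_check:
                if ptr is None:
                    return "segfault"
                ptr["field"] = 1
        elif step == "t2_null":       # Thread 2: ptr = NULL
            ptr = None
    return "ok"

def interleavings():
    """All orders of the three steps that keep Thread 1's program order."""
    steps = ["t1_check", "t1_write", "t2_null"]
    return [s for s in permutations(steps)
            if s.index("t1_check") < s.index("t1_write")]

results = {schedule: run(schedule) for schedule in interleavings()}
bad = [schedule for schedule, outcome in results.items() if outcome == "segfault"]
```

In this model `bad` contains only the schedule where `t2_null` falls between the check and the write, which is the interleaving a patch has to rule out.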
AFix: Automated Atomicity-Violation Fixing
• Why atomicity-violation bugs?
  • One of the most common types of concurrency bug
• Strategy
  • Statically add locks to remove buggy interleavings
• Goal
  • Automate the whole bug-fixing process
  • Provide best-effort atomicity-violation patches
    • Correctness
    • Performance
    • Readability

AFix Overview
[Figure: the manual bug-fixing progress, with input from CTrigger covering the bug-understanding stage]

CTrigger Bug-Detector Review
• A single-variable atomicity-violation detection & testing tool
• It reports a list of buggy instruction triples
• Abbreviated as {(p1, c1, r1), …, (pn, cn, rn)}
  • p: previous access (Thread 1: if (ptr != NULL))
  • c: current access (Thread 1: ptr->field = 1;)
  • r: remote access (Thread 2: ptr = NULL;)

AFix Overview
[Figure: the AFix pipeline laid over the manual bug-fixing progress. Input from CTrigger: (p1, c1, r1) … (pn, cn, rn). Patch generation: patch1 … patchn, merged into merged patch1 … merged patchm, followed by adding runtime support. Patch testing. Stages: bug understanding, patch generation, patch testing.]

Outline
• Motivation
• Overview
• AFix
  • One bug patching
  • Patch merging
  • Runtime support
  • Patch testing
• Evaluation
• Conclusion

One Bug Patching
(p, c, r) Patching
• Make the p–c code region mutually exclusive with r
  • Put p and c into a critical section
  • Put r into a critical section
  • Select or introduce a lock for the two critical sections

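The effect of such a patch can be illustrated on the earlier ptr example. This is a minimal Python sketch, not AFix output: `patch_lock`, `Shared`, and `run_once` are hypothetical names, `None` stands in for NULL, and a `TypeError` stands in for the segmentation fault.

```python
import threading

patch_lock = threading.Lock()   # hypothetical lock introduced by the patch

class Shared:
    def __init__(self):
        self.ptr = {"field": 0}  # stands in for a non-NULL pointer

def thread1(shared, errors):
    with patch_lock:                      # lock before p, unlock after c
        if shared.ptr is not None:        # p: previous access
            try:
                shared.ptr["field"] = 1   # c: current access
            except TypeError:             # would be the null dereference
                errors.append("null dereference")

def thread2(shared):
    with patch_lock:                      # lock before r, unlock after r
        shared.ptr = None                 # r: remote access

def run_once():
    """Run both threads once and return any errors Thread 1 observed."""
    shared, errors = Shared(), []
    t1 = threading.Thread(target=thread1, args=(shared, errors))
    t2 = threading.Thread(target=thread2, args=(shared,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return errors
```

Because both critical sections take the same lock, r can run before p or after c but never between them, so `run_once()` should always return an empty list.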
Put p and c into a Critical Section: naïve
• A naïve solution
  • Add lock on edges reaching p
  • Add unlock on edges leaving c
• Potential new bugs
  • Could lock without unlock
  • Could unlock without lock
  • etc.
[Figure: control-flow graphs with p and c marked]

Put p and c into a Critical Section: AFix
• Assume p and c are in the same function f
• Step 1: find protected nodes in critical section
  • In f’s CFG, find nodes on any p → c path
• Step 2: add lock operations
  • Add lock on each edge unprotected node → protected node
  • Add unlock on each edge protected node → unprotected node
• Avoids the potential bugs of the naïve solution

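The two steps above can be sketched on a toy control-flow graph. This is an illustration under assumptions, not AFix's implementation: the CFG is a plain dictionary of successor lists, "on any p → c path" is taken to mean "reachable from p and able to reach c", and the node names (`entry`, `a`, `early_ret`, and so on) are made up for the example.

```python
def reachable(graph, start):
    """Nodes reachable from start (including start) by following edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

def reverse(graph):
    """The same graph with every edge flipped."""
    rev = {node: [] for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            rev.setdefault(succ, []).append(node)
    return rev

def place_locks(cfg, p, c):
    # Step 1: protected nodes lie on some path from p to c.
    protected = reachable(cfg, p) & reachable(reverse(cfg), c)
    # Step 2: lock when entering the protected set, unlock when leaving it.
    lock_edges, unlock_edges = [], []
    for node, succs in cfg.items():
        for succ in succs:
            if node not in protected and succ in protected:
                lock_edges.append((node, succ))
            elif node in protected and succ not in protected:
                unlock_edges.append((node, succ))
    return protected, sorted(lock_edges), sorted(unlock_edges)

# Hypothetical function: after p, node a either returns early or goes on to c.
cfg = {
    "entry": ["p", "skip"],
    "p": ["a"],
    "a": ["c", "early_ret"],
    "c": ["exit"],
    "skip": ["exit"],
    "early_ret": ["exit"],
    "exit": [],
}
protected, lock_edges, unlock_edges = place_locks(cfg, "p", "c")
```

Here the early-return edge `("a", "early_ret")` gets its own unlock, which is the "lock without unlock" case that the naïve placement would miss. The sketch ignores corner cases such as p being the function's entry node.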
p and c Adjustment
• p and c adjustment when they are in different functions
  • Observation: people put lock and unlock in one function
  • Find the longest common prefix of p’s and c’s stack traces
  • Adjust p and c accordingly
Original accesses:
  void close() { … log = CLOSE; }   (p)
  void open() { … log = OPEN; }   (c)
  void newlog() { … close(); open(); … }
Stack traces:
  p: close() newlog() …
  c: open() newlog() …
After adjustment:
  void newlog() { … p: close(); c: open(); … }

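The adjustment can be sketched as a walk over the two stack traces. The representation below is an assumption for illustration: each trace is a list written outermost frame first and ending with the access statement itself, and `adjust_p_c` is a hypothetical helper name. Recursion and repeated call sites are not handled.

```python
def adjust_p_c(p_trace, c_trace):
    """Return (function, adjusted_p, adjusted_c), or None if no frame is shared.

    Each trace lists frames outermost first and ends with the access itself,
    so the last element never counts as part of the common prefix.
    """
    limit = min(len(p_trace), len(c_trace)) - 1
    depth = 0
    while depth < limit and p_trace[depth] == c_trace[depth]:
        depth += 1
    if depth == 0:
        return None
    # The innermost shared frame hosts both lock and unlock; p and c become
    # whatever each trace does next inside that frame.
    return p_trace[depth - 1], p_trace[depth], c_trace[depth]

# The slide's example: p is in close(), c is in open(), both called by newlog().
p_trace = ["newlog", "close", "log = CLOSE"]
c_trace = ["newlog", "open", "log = OPEN"]
adjusted = adjust_p_c(p_trace, c_trace)
```

For the slide's example this yields `newlog` as the host function, with the calls to `close` and `open` taking over the roles of p and c.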
(p, c, r) Patching: put r into a critical section
• Lock-acquisition before r, lock-release after r
• Only if r cannot be reached from the p–c critical section
  • Case 1: r lies inside the p–c critical section itself
    fpc() { lock(L1) p ... r … c unlock(L1) }
  • Case 2: r is in a function called from the p–c critical section
    fpc() { lock(L1) p ... foo() {…r} … c unlock(L1) }
    r’s call stack: … → fpc → foo → … → r

(p, c, r) Patching: select or introduce a lock
• Use the same lock for the critical sections
• Lock type:
  • Lock with timeout: in case of potential new deadlock
  • Reentrant lock: in case of recursion
  • Otherwise: normal lock
• Lock instance:
  • Global lock instances are easy to reuse

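The difference between the lock types can be seen with Python's standard `threading` module. This is only an analogy for the C locks that a patch would use: `plain`, `reentrant`, and `nested` are hypothetical names, and the 0.05 second timeout is an arbitrary value chosen for the demonstration.

```python
import threading

# Lock with timeout: a second acquire on a held, non-reentrant lock would
# block forever; with a timeout it gives up and reports failure instead.
plain = threading.Lock()
plain.acquire()
second_try = plain.acquire(timeout=0.05)   # returns False after the timeout
plain.release()

# Reentrant lock: the same thread may re-acquire it, which matters when the
# critical section is entered recursively.
reentrant = threading.RLock()

def nested(depth):
    """Re-enter the critical section recursively; returns the depth reached."""
    with reentrant:
        return 0 if depth == 0 else 1 + nested(depth - 1)

levels = nested(3)
```

A failed timed acquire is the point at which a runtime can stop waiting and check whether the patch's lock is involved in a deadlock.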
Patch Merging
• One programming mistake can lead to multiple bug reports
• They should be fixed all together
  void buf_write() {
    int tmp = buf_len + str_len;
    if (tmp > MAX) return;
    memcpy(&buf[buf_len], str, str_len);
    buf_len = tmp;
  }
[Figure: the accesses in buf_write() labeled p1, c1, p2, r1, c2, r2, before and after patching]
Patching each report separately leads to:
• Too many lock/unlock operations
• Potential new deadlocks
• May hurt performance and readability

Patch Merging: redundant patch
• Redundant patch, when the p1–c1 and p2–c2 critical sections
  • are in the same function: redundant when one protected region is a subset of the other
  • are in different functions: consult the stack traces again
Before merging:
  lock(L1) p1 lock(L2) p2 c2 unlock(L2) c1 unlock(L1)
  lock(L1) r1 unlock(L1)
  lock(L2) r2 unlock(L2)
After merging:
  lock(L1) p1 p2 c2 c1 unlock(L1)
  lock(L1) r1 unlock(L1)
  lock(L1) r2 unlock(L1)

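The same-function subset rule can be sketched as follows. This is a simplified illustration, not AFix's algorithm: each patch is keyed by its lock name and described only by the set of nodes its p–c critical section protects, and `merge_redundant` is a hypothetical helper that returns which lock each patch ends up using.

```python
def merge_redundant(regions):
    """Map each patch's lock to the lock it should use after merging.

    regions: dict from lock name to the set of nodes protected by that
    patch's p-c critical section (all assumed to be in one function).
    """
    # Visit larger regions first so a covering patch is always seen before
    # the patches it makes redundant.
    order = sorted(regions, key=lambda name: len(regions[name]), reverse=True)
    kept, lock_of = [], {}
    for name in order:
        cover = next((k for k in kept if regions[name] <= regions[k]), None)
        if cover is None:
            kept.append(name)
            lock_of[name] = name
        else:
            lock_of[name] = cover    # redundant: reuse the covering lock
    return lock_of

# The slide's example: the p2-c2 region sits inside the p1-c1 region.
regions = {
    "L1": {"p1", "p2", "c2", "c1"},
    "L2": {"p2", "c2"},
}
lock_of = merge_redundant(regions)
```

With `lock_of["L2"] == "L1"`, the inner lock(L2)/unlock(L2) pair disappears and r2 is guarded by L1, matching the merged form on the slide.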
Patch Merging: related patch
• Related patch
  • Merge if p, c, or r is in some other patch’s critical sections
Before merging:
  lock(L1) p1 lock(L2) p2 c1 unlock(L1) c2 unlock(L2)
  lock(L1) r1 unlock(L1)
  lock(L2) r2 unlock(L2)
After merging:
  lock(L1) p1 p2 c1 c2 unlock(L1)
  lock(L1) r1 unlock(L1)
  lock(L1) r2 unlock(L1)

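The related-patch rule can be sketched as repeated pairwise merging until nothing changes. The data layout is an assumption for illustration: each patch records its own accesses (p, c, r) and the set of nodes covered by its critical sections, and `merge_related` and `related` are hypothetical helper names.

```python
def related(a, b):
    """True if an access of one patch falls inside the other's critical sections."""
    return bool(a["accesses"] & b["protected"]) or bool(b["accesses"] & a["protected"])

def merge_related(patches):
    """Group related patches; each resulting group would share one lock."""
    groups = [
        {"names": [p["name"]],
         "accesses": set(p["accesses"]),
         "protected": set(p["protected"])}
        for p in patches
    ]
    changed = True
    while changed:                       # repeat until no pair is related
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if related(groups[i], groups[j]):
                    groups[i]["names"] += groups[j]["names"]
                    groups[i]["accesses"] |= groups[j]["accesses"]
                    groups[i]["protected"] |= groups[j]["protected"]
                    del groups[j]
                    changed = True
                    break
            if changed:
                break
    return groups

# The slide's overlapping regions, plus one unrelated, made-up patch.
patches = [
    {"name": "patch1", "accesses": {"p1", "c1", "r1"}, "protected": {"p1", "p2", "c1", "r1"}},
    {"name": "patch2", "accesses": {"p2", "c2", "r2"}, "protected": {"p2", "c1", "c2", "r2"}},
    {"name": "patch3", "accesses": {"p3", "c3", "r3"}, "protected": {"p3", "c3", "r3"}},
]
groups = merge_related(patches)
```

In this example patch1 and patch2 collapse into one group covering p1, p2, c1, and c2, while patch3 keeps its own lock.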
Runtime Support and Testing
• Runtime support to handle deadlock
• Lightweight patch-oriented deadlock detection
• Determines whether a timeout is caused by a potential deadlock
• Only detects deadlocks caused by the patches
• Has low overhead and is suitable for production runs
• Helps patch refinement
• Traditional deadlock detection
• In-house patch testing

Evaluation: Overall Patch Quality
• Patched failure rates: 0% (except PBZIP2 and FFT)
• Patched overheads: <0.6% (except PBZIP2)
• With timeout-triggered deadlock detection

Conclusion
• Fixing atomicity violations automatically is feasible
  • By removing bad interleavings
  • Must be careful in the details
• Uses some heuristics, with excellent results in practice
  • Completely eliminates detected bugs in targeted class
  • Overheads too low to reliably measure
  • Produces small, simple, understandable patches
• Future research should co-design detectors and fixers

Questions about AFix*?
*Disclaimer from AFix: “I represent humans’ efforts towards fixing the world automatically using tools. However, the world is so imperfect that I do not know whether the world is fully fixable, thus I make no 100% guarantee.”