Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics

Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou • Appeared in ASPLOS’08 • Presented by Michelle Goodstein, LBA Reading Group, 3/27/08

Introduction • Multi-core computers are common • More programmers are having to write concurrent programs • Concurrent programs have different bugs than sequential programs • However, without a study, it is hard to know what those bugs are • First real-world study of concurrency bugs

Introduction • Knowing the types of concurrency bugs that actually occur in software will: • Help create better bug detection schemes • Inform the testing process software goes through • Provide information to programming language designers

Introduction • Current state of affairs • Reproducing concurrency bugs is difficult • Test cases are critical to being able to diagnose a bug • Most detection research focuses on: • data races • deadlock bugs • some new work on detecting atomicity violations • Few studies on real world concurrency bugs • Most use programs that were buggy by design for the study • Most studies on bug characteristics focus on non-concurrent bugs

Methodology • 4 representative open-source applications: • MySQL • Apache • Mozilla • OpenOffice • Each application has • 9-13 years of development history • 1-4 million lines of code

Methodology • Randomly selected bugs from bug databases that contained at least one keyword related to concurrency (e.g., “race”, “concurrency”, “deadlock”, “synchronization”) • From these, randomly chose 500 bugs that have • Root causes explained well and in detail • Source code available • Bug fix info available

Methodology • Remove any bugs not truly caused by concurrency • Result: 105 concurrency bugs (74 non-deadlock, 31 deadlock) • Deadlock and non-deadlock bugs are studied separately
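
The keyword pass described above can be sketched as follows. This is only an illustration: the slides do not say how the matching was done, so the whole-word, case-insensitive regular expression is an assumption, and the bug IDs and report texts are hypothetical.

```python
import re

# Keywords quoted on the slide; the slide's "etc." means the real list was longer.
KEYWORDS = ["race", "concurrency", "deadlock", "synchronization"]

# Assumption: whole-word, case-insensitive matching.
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)

# Hypothetical bug reports, invented for this sketch.
reports = {
    101: "Crash caused by a race between the logger and the rotation thread",
    102: "Stack trace is truncated in the error dialog",
    103: "Deadlock when two sessions alter the same table",
    104: "Typo in the synchronization chapter of the manual",
}

# First pass: keep every report that mentions at least one keyword.
candidates = [bug_id for bug_id, text in reports.items() if PATTERN.search(text)]
print(candidates)  # [101, 103, 104]
```

Report 104 matches a keyword without describing a concurrency bug, which is why a manual pass to remove such reports is still needed after the keyword search.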

Methodology • Evaluated bugs along 3 dimensions • Bug pattern: {atomicity-violation, order-violation, other} • Manifestation: required conditions for bug to occur, # threads involved, # variables, # accesses • Bug fix strategy: Look at final patch, mistakes in intermediate patches, and whether transactional memory (TM) can help • Results organized as a collection of findings

Motivation • 34/105 concurrency bugs cause program crashes • 37/105 concurrency bugs cause programs to hang • Concurrency bugs are important

Bug Patterns

Findings: Bug Patterns • Atomicity Violation • Order Violation

Findings: Bug Patterns • Most (72/74) of the examined non-deadlock concurrency bugs are either atomicity violations or order violations • Focusing on atomicity violations and order violations should detect most non-deadlock concurrency bugs • In fact, 24/74 are order violations • Since current tools don’t address order violations, new tools must be developed
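
The two patterns can be illustrated with a small deterministic sketch. It is not taken from the studied applications: the "threads" are Python generators stepped by a hypothetical `run` helper, so each interleaving is chosen explicitly instead of being left to the scheduler, and all names are invented for the example.

```python
def run(schedule, threads):
    """Step generator-based 'threads' in the order listed in schedule."""
    for name in schedule:
        next(threads[name], None)

def atomicity_demo(schedule):
    """Reader checks a pointer, then uses it; the two steps are meant to be atomic."""
    shared = {"ptr": "data"}
    log = []

    def reader():
        ok = shared["ptr"] is not None   # step 1: check
        yield
        if ok:
            log.append(shared["ptr"])    # step 2: use (may no longer be valid)
        yield

    def writer():
        shared["ptr"] = None             # invalidates the pointer
        yield

    run(schedule, {"R": reader(), "W": writer()})
    return log

def order_demo(schedule):
    """User assumes the initializer has already run."""
    shared = {"conn": None}
    log = []

    def initializer():
        shared["conn"] = "open"
        yield

    def user():
        log.append(shared["conn"])       # expects "open"
        yield

    run(schedule, {"I": initializer(), "U": user()})
    return log

print(atomicity_demo(["R", "R", "W"]))  # ['data']  check and use stay together
print(atomicity_demo(["R", "W", "R"]))  # [None]    writer slips in between
print(order_demo(["I", "U"]))           # ['open']  intended order
print(order_demo(["U", "I"]))           # [None]    use before initialization
```

In the atomicity case the bug appears only when another thread runs between two steps of the same thread. In the order case the bug appears when two steps in different threads run in the unintended order.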

Bug Manifestations

Findings: Bug Manifestations • Most (101/105) bugs involved ≤ 2 threads • Most communication occurs among a small number of threads • Enforcing certain partial orderings among a small number of threads can expose bugs • Heavy workloads can increase competition for resources, and make it more likely to observe a partial ordering that causes a bug • Pairwise testing can find many bugs

Findings: Bug Manifestations • Some (7/31) deadlock bugs involve only 1 thread! • Easy to detect/avoid
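
A single-thread deadlock typically means a thread waits for a resource it already holds. The sketch below is an assumed example, not one of the studied bugs: it re-acquires a non-reentrant `threading.Lock` and uses a timeout so the script terminates instead of hanging. The reentrant lock shown afterwards is one possible way to avoid the problem, not a fix the slides prescribe.

```python
import threading

lock = threading.Lock()  # non-reentrant: the holder cannot acquire it again

def inner():
    # A plain lock.acquire() here would block forever.
    # The timeout exists only so this sketch terminates.
    got = lock.acquire(timeout=0.1)
    if got:
        lock.release()
    return got

def outer():
    with lock:           # first acquisition
        return inner()   # second acquisition by the same thread

self_deadlock = not outer()
print(self_deadlock)     # True: the thread would have waited on itself

rlock = threading.RLock()  # reentrant: the holder may acquire it again

def outer_reentrant():
    with rlock:
        got = rlock.acquire(timeout=0.1)
        if got:
            rlock.release()
        return got

print(outer_reentrant())  # True: no self-deadlock
```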

Findings: Bug Manifestations • Many (49/74) non-deadlock bugs involve 1 variable. However, 34% involve ≥ 2 variables • Focusing on 1 variable is a good simplification • However, new tools are also necessary to discover multivariable concurrency bugs
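
A multivariable bug can be sketched with two variables that must change together. The example is hypothetical (a buffer and its recorded length) and uses generators as explicitly scheduled "threads", with an invented `run` helper, so the outcome of each interleaving is fixed.

```python
def run(schedule, threads):
    """Step generator-based 'threads' in the order listed in schedule."""
    for name in schedule:
        next(threads[name], None)

def two_variable_demo(schedule):
    # Invariant the code relies on: length == len(buf)
    state = {"buf": "abc", "length": 3}
    seen = []

    def writer():
        state["buf"] = "abcdef"   # update variable 1
        yield
        state["length"] = 6       # update variable 2
        yield

    def reader():
        seen.append((state["buf"], state["length"]))
        yield

    run(schedule, {"W": writer(), "R": reader()})
    return seen

def consistent(pair):
    buf, length = pair
    return len(buf) == length

for schedule in (["R", "W", "W"], ["W", "W", "R"], ["W", "R", "W"]):
    pair = two_variable_demo(schedule)[0]
    print(schedule, pair, consistent(pair))
```

Only the third schedule breaks the invariant: the reader sees the new buffer with the old length. A check that looks at one variable at a time has no notion of this invariant, which is the kind of gap the slide points to.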

Findings: Bug Manifestations • Most (30/31) deadlock bugs involved ≤ 2 resources • Pairwise testing of the order in which resources are acquired and released should help reveal deadlocks
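
One possible reading of pairwise testing over resources is to look for two resources that different threads acquire in opposite orders. The sketch below checks that over hypothetical acquisition traces; the trace format and the function names are assumptions made for the example, not something the slides define.

```python
from itertools import combinations

def held_before(trace):
    """Return pairs (first, second): `second` was acquired while `first` was held."""
    held, pairs = [], set()
    for op, lock in trace:
        if op == "acquire":
            pairs.update((h, lock) for h in held)
            held.append(lock)
        else:  # "release"
            held.remove(lock)
    return pairs

def inversions(traces):
    """Find resource pairs that two threads acquire in opposite orders."""
    found = set()
    for (_, trace1), (_, trace2) in combinations(traces.items(), 2):
        order2 = held_before(trace2)
        for a, b in held_before(trace1):
            if (b, a) in order2:
                found.add(frozenset((a, b)))
    return found

# Hypothetical traces: T1 takes A then B, T2 takes B then A.
risky = {
    "T1": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    "T2": [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")],
}
# Same work, but both threads agree on the order A then B.
safe = {
    "T1": risky["T1"],
    "T2": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
}

print(inversions(risky))  # one inverted pair: A and B
print(inversions(safe))   # set()
```

Because almost all studied deadlocks involve at most two resources, checking pairs covers most of them while keeping the number of combinations small.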

Findings: Bug Manifestations • Most (92%) bugs manifest once a certain partial ordering is enforced among ≤ 4 memory accesses • Testing small groups of accesses takes polynomial time and exposes most bugs
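
To see why small groups of accesses are tractable, the sketch below enumerates every ordering of 4 accesses from 2 threads that respects each thread's program order, and replays each one. The lost-update counter is an assumed example, not one of the studied bugs, and the helper names are invented.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run_counter(schedule):
    """Replay read/write accesses of an unsynchronized x = x + 1."""
    x = 0
    tmp = {}
    for thread, op in schedule:
        if op == "read":
            tmp[thread] = x        # load x into a thread-local temporary
        else:
            x = tmp[thread] + 1    # store the incremented temporary
    return x

thread_a = [("A", "read"), ("A", "write")]
thread_b = [("B", "read"), ("B", "write")]

results = [run_counter(s) for s in interleavings(thread_a, thread_b)]
buggy = sum(1 for r in results if r != 2)
print(len(results), buggy)  # 6 4
```

With only 4 accesses there are 6 orderings to try, and 4 of them lose an update. Exhaustively covering a group this small is cheap, whereas enumerating full executions is not.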

Bug Fixes

Findings: Bug Fixes • Adding or changing locks is the fix for only a minority (20/74) of non-deadlock concurrency bugs • Locks aren’t enough to fix all concurrency bugs. • Locks don’t promise ordering, just atomicity • Addition of locks can hurt performance or create new deadlock bugs
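
The point that locks give atomicity but not ordering can be sketched as follows. Wrapping both accesses in the same lock would still let the user thread run first, so this example uses a `threading.Event` to make the user wait for initialization. The event is one possible ordering mechanism chosen for the illustration, not the fix used in any studied patch, and the names are hypothetical.

```python
import threading

shared = {"conn": None}
ready = threading.Event()   # signals that initialization has happened
result = []

def initializer():
    shared["conn"] = "open"
    ready.set()             # publish: the connection is now usable

def user():
    ready.wait()            # block until the initializer has run
    result.append(shared["conn"])

# Start the user first on purpose: without the event it could read None.
user_thread = threading.Thread(target=user)
init_thread = threading.Thread(target=initializer)
user_thread.start()
init_thread.start()
user_thread.join()
init_thread.join()

print(result)  # ['open']
```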

Findings: Bug Fixes • Most common fix (19/31) for deadlock bugs has 1 thread give up acquiring a resource, such as a lock • This may get rid of the deadlock bug, but can introduce non-deadlock bugs • Code may no longer be correct
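
A minimal sketch of the give-up strategy, assuming a non-blocking try-lock: if the second lock is busy, the thread backs out instead of waiting while holding the first. The function and variable names are hypothetical, and the busy lock is simulated by acquiring it in the same thread.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
counter = {"value": 0}

def try_update(first, second, action):
    """Run action under both locks; give up instead of waiting for `second`."""
    with first:
        if not second.acquire(blocking=False):
            return False        # give up: never waits while holding `first`
        try:
            action()
            return True
        finally:
            second.release()

def increment():
    counter["value"] += 1

# Both locks free: the update runs.
print(try_update(lock_a, lock_b, increment), counter["value"])  # True 1

# Simulate another thread holding lock_b: the update is skipped.
lock_b.acquire()
print(try_update(lock_a, lock_b, increment), counter["value"])  # False 1
lock_b.release()
```

The second call cannot deadlock, but the increment silently does not happen. If the caller ignores the return value, the deadlock has been traded for a missed update, which is the correctness risk the slide warns about.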

Bug fixes: Buggy Patches • 17/57 Mozilla bugs have ≥ 1 buggy patch • On average, 0.4 buggy patches are released for every final correct patch • Of 23 distinct buggy patches for the 17 bugs: • 6 decrease probability of occurrence but do not eliminate original bug • 5 create new concurrency bugs • 12 create new non-concurrency bugs

Findings: Bug fixes • In many (41/105) cases, TM can help avoid concurrency bugs

Findings: Bug fixes • Also in many cases (44/105), TM might be able to help with concurrency bugs • TM would need to handle long regions, rollback of I/O, and the unusual “nature” of the code

Findings: Bug fixes • In 20/105 cases, TM provides little help • TM cannot help with many order-violation bugs • While TM could be useful in preventing concurrency bugs, it will not fix all of them

Conclusion • First real-world concurrency bug study • Multiple findings on • Type of concurrency bugs • Conditions for manifestation • Techniques for fixing concurrency bugs • Several heuristics proposed for: • Bug detection • Testing • Language design (i.e., TM) • Future work can focus on detecting common types of errors • Multi-variable bugs • Order violation bugs • Multiple-access bugs
