
RELAY: Static Race Detection on Millions of Lines of Code

Learn about RELAY, a static tool that analyzes programs before they run to detect data races. RELAY is scalable and has been used to analyze the Linux kernel, finding 53 races in a sample of 149 warnings.



Presentation Transcript


  1. RELAY: Static Race Detection on Millions of Lines of Code
     Jan Voung (speaker), Ranjit Jhala, and Sorin Lerner
     UC San Diego

  2. Definition of data race
     • It is an event between 2 threads, where there are
       • unordered accesses to the same memory location
       • and at least one of the accesses is a write

         thread 1: … g = V; …
         thread 2: … temp = g; …

  3. Bugs from races
     • Example bug: incorrect resource bounds

         thread 1: …; free(); size = size - n; …
         thread 2: … temp = size; …

  4. RELAY against data races
     • RELAY finds data races
     • RELAY is a static tool:
       • analyzes the program before it runs
     • RELAY is scalable:
       • analyzed the Linux kernel (4.5 million LOC)
       • in 5 hours on 32 CPUs
     • Found 53 races in a sample of 149 warnings

  5. Outline
     • Introduction
     • Computing locksets & recording accesses
       • Relative locksets
       • Guarded accesses
     • Experiments with Linux
       • Categorizing warnings: false positives
       • Filters targeting categories

  6. Checking with Locksets
     • Locks are a mechanism for mutual exclusion
       • Only one thread holds a particular lock at a time.
     • No race if the same lock must have been acquired for each of two different shared accesses

     [Figure: thread 1 and thread 2 side by side, each with a "locks held" column, plus a "common locks held" column; entries are annotated with the lock l. Code shown: lock(l); temp = g; unlock(l); … read(g); lock(l); g = 0; …]
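The rule on slide 6 amounts to a set intersection. The following is a minimal sketch, not RELAY's implementation: it assumes locksets fit in a bitmask, and the names `lockset_t`, `LOCK_L`, `common_locks`, and `protected_by_common_lock` are hypothetical.

```c
#include <stdbool.h>

/* Assumed encoding: bit i set means lock i is held. */
typedef unsigned int lockset_t;

/* Hypothetical id for the lock `l` used on the slide. */
enum { LOCK_L = 1u << 0 };

/* Locks held at both accesses ("common locks held"). */
lockset_t common_locks(lockset_t a, lockset_t b)
{
    return a & b;
}

/* True when at least one lock is held at both accesses,
 * so the two accesses cannot run at the same time. */
bool protected_by_common_lock(lockset_t a, lockset_t b)
{
    return common_locks(a, b) != 0;
}
```

With this encoding, two accesses made under `l` give `protected_by_common_lock(LOCK_L, LOCK_L) == true`, while an access made with no lock held gives `false` against any other access.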

  7. A more realistic (but simple) example

         work(void *d) {
           o = d->priv;
           lock(o->lock);
           read_stats(o);
         }

         read_stats(x) {
           if (x->f1++ < 10) {       /* read/write with lock: no race */
             unlock(x->lock);
             x->stats = ...;         /* write without lock: race */
           } else
             unlock(x->lock);
         }

  8. Key components of RELAY
     • GOAL: scalability
     • KEY: modularity

     [Figure: the work / read_stats example code shown next to boxes labeled work( ) { … } and read_stats( ) { … }]

  9. Key components of RELAY
     1) Relative locksets: locks acquired/released in the function; the caller handles locks held before the call
        L+: MUST have been acquired
        L-: MAY have been released
     2) Guarded accesses: pair accesses with relative locksets to catch races
     3) Summaries
     4) Symbolic execution: what is the “same” memory location? Normalize to globals and formals.

         work(void *d) {
           o = d->priv;
           lock(o->lock);            /* L+ = {d->priv->lock}, L- = {} */
           read_stats(o);
         }

         read_stats(x) {             /* L+ = {}, L- = {} */
           if (x->f1++ < 10) {
             unlock(x->lock);        /* L+ = {}, L- = {x->lock} */
             x->stats = ...;
           } else
             unlock(x->lock);        /* L+ = {}, L- = {x->lock} */
         }
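The L+ / L- bookkeeping on slide 9 can be sketched as three small transfer functions. This is an illustration under stated assumptions rather than RELAY's code: locksets are bitmasks, the join takes the intersection of L+ (MUST) and the union of L- (MAY), and the names `rel_lockset_t`, `rl_lock`, `rl_unlock`, and `rl_join` are hypothetical.

```c
/* Assumed encoding: bit i set means lock i is in the set. */
typedef unsigned int lockset_t;

typedef struct {
    lockset_t plus;   /* L+: locks that MUST have been acquired */
    lockset_t minus;  /* L-: locks that MAY have been released */
} rel_lockset_t;

/* lock(l): l is now definitely held and no longer "released". */
rel_lockset_t rl_lock(rel_lockset_t s, lockset_t l)
{
    s.plus  |= l;
    s.minus &= ~l;
    return s;
}

/* unlock(l): l is no longer held and may have been released. */
rel_lockset_t rl_unlock(rel_lockset_t s, lockset_t l)
{
    s.plus  &= ~l;
    s.minus |= l;
    return s;
}

/* Join of two branches: keep only locks acquired on BOTH paths,
 * and every lock released on EITHER path. */
rel_lockset_t rl_join(rel_lockset_t a, rel_lockset_t b)
{
    rel_lockset_t r = { a.plus & b.plus, a.minus | b.minus };
    return r;
}
```

Running `read_stats` through these functions (both branches call `rl_unlock` on `x->lock`, then join) yields L+ = {} and L- = {x->lock}, which matches the annotation on the slide.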

  10. How RELAY runs
      1) Assume symbolic execution ran.
      2) Compute relative locksets
      3) Compute guarded accesses

          work(void *d) {
            o = d->priv;
            lock(o->lock);
            read_stats(o);
          }

          read_stats(x) {             /* L+ = {}, L- = {} */
            if (x->f1++ < 10) {
              unlock(x->lock);        /* L+ = {}, L- = {x->lock} */
              x->stats = ...;
            } else
              unlock(x->lock);        /* L+ = {}, L- = {x->lock} */
          }

      read_stats(x) summary: L+ = {}, L- = {x->lock}
        x->f1:    L+ = {}, L- = {}
        x->stats: L+ = {}, L- = {x->lock}

  11. How RELAY runs

          work(void *d) {
            o = d->priv;
            lock(o->lock);
            read_stats(o);
          }

      read_stats(x) summary: L+ = {}, L- = {x->lock}
        x->f1:    L+ = {}, L- = {}
        x->stats: L+ = {}, L- = {x->lock}

  12. Applying summaries

          work(void *d) {             /* L+ = {}, L- = {} */
            o = d->priv;
            lock(o->lock);            /* BEFORE: L+ = {d->priv->lock}, L- = {} */
            read_stats(o);            /* AFTER:  L+ = {}, L- = {d->priv->lock} */
          }

      work summary: L+ = {}, L- = {d->priv->lock}

      read_stats(x) summary (the DIFFERENCE): L+ = {}, L- = {x->lock}
        x->f1:    L+ = {}, L- = {}
        x->stats: L+ = {}, L- = {x->lock}

  13. Applying summaries

          work(void *d) {             /* L+ = {}, L- = {} */
            o = d->priv;
            lock(o->lock);            /* BEFORE: L+ = {d->priv->lock}, L- = {} */
            read_stats(o);            /* AFTER:  L+ = {}, L- = {d->priv->lock} */
          }

      work summary: L+ = {}, L- = {d->priv->lock}
        d->priv:        L+ = {}, L- = {}
        d->priv->f1:    L+ = {d->priv->lock}, L- = {}
        d->priv->stats: L+ = {}, L- = {d->priv->lock}

      read_stats(x) summary (the DIFFERENCE): L+ = {}, L- = {x->lock}
        x->f1:    L+ = {}, L- = {}
        x->stats: L+ = {}, L- = {x->lock}
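Applying a callee summary at a call site, as on slides 12 and 13, can be sketched as one composition step. This is a hedged illustration: the composition formula is an assumption chosen to reproduce the BEFORE/AFTER values on the slides, the renaming of `x->lock` to `d->priv->lock` is assumed to have been done already (both map to the same bit), and `rel_lockset_t` and `rl_apply` are hypothetical names.

```c
/* Assumed encoding: bit i set means lock i is in the set. */
typedef unsigned int lockset_t;

typedef struct {
    lockset_t plus;   /* L+: locks that MUST have been acquired */
    lockset_t minus;  /* L-: locks that MAY have been released */
} rel_lockset_t;

/* Compose the caller's lockset just BEFORE the call with a lockset
 * expressed relative to the callee's entry (its summary, or the
 * lockset of one of its guarded accesses). */
rel_lockset_t rl_apply(rel_lockset_t before, rel_lockset_t callee)
{
    rel_lockset_t after;
    /* Held afterwards: held before and not released by the callee,
     * or acquired by the callee. */
    after.plus  = (before.plus  & ~callee.minus) | callee.plus;
    /* Possibly released: released before and not re-acquired,
     * or released by the callee. */
    after.minus = (before.minus & ~callee.plus)  | callee.minus;
    return after;
}
```

With BEFORE = (L+ = {lock}, L- = {}) and the `read_stats` summary (L+ = {}, L- = {lock}), the result is L+ = {}, L- = {lock}. Applying the same step to the callee's access `x->f1` (L+ = {}, L- = {}) gives L+ = {lock}, L- = {}, which agrees with the entry for `d->priv->f1` on slide 13.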

  14. Checking for Races

          work(void *d) {
            o = d->priv;
            lock(o->lock);
            read_stats(o);
          }

      Two copies of the work(void *d) summary are shown side by side and compared row by row. Each copy reads:

      summary: L+ = {}, L- = {d->priv->lock}
        d->priv:        L+ = {}, L- = {}
        d->priv->f1:    L+ = {d->priv->lock}, L- = {}
        d->priv->stats: L+ = {}, L- = {d->priv->lock}

      • row 1 (d->priv): reads only => no race
      • row 2 (d->priv->f1): common lock => no race
      • row 3 (d->priv->stats): no common lock => race
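The three rows on slide 14 can be reproduced with a small pairwise check over guarded accesses. This is a sketch under assumptions: "same location" is approximated by string equality on the normalized name (which ignores aliasing), locksets are bitmasks, and `guarded_access_t` and `ga_race` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed encoding: bit i set means lock i is in the set. */
typedef unsigned int lockset_t;

typedef struct {
    const char *lval;      /* normalized location, e.g. "d->priv->f1" */
    bool        is_write;  /* true if the access writes the location */
    lockset_t   plus;      /* L+ at the access: locks that must be held */
} guarded_access_t;

/* Report a possible race between accesses from two different threads. */
bool ga_race(const guarded_access_t *a, const guarded_access_t *b)
{
    if (strcmp(a->lval, b->lval) != 0)
        return false;                      /* different locations */
    if (!a->is_write && !b->is_write)
        return false;                      /* reads only */
    return (a->plus & b->plus) == 0;       /* no common lock */
}
```

Comparing the accesses of `work` against a second thread that also runs `work`: `d->priv` is only read, `d->priv->f1` is written under `d->priv->lock` in both threads, and `d->priv->stats` is written with an empty L+, so only the last pair is reported.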

  15. Modular Unsoundness
      • Pointer-arithmetic corner cases
      • Accesses in assembly code
      • Function pointers
      • Not enforcing must-alias for lockset intersection
      • Filters
      Revisit each and improve

  16. Outline
      • Introduction
      • Computing locksets & recording accesses
        • Relative locksets
        • Guarded accesses
      • Experiments with Linux
        • Categorizing warnings: false positives
        • Filters targeting categories

  17. Linux experiments
      • 5000+ warnings
      • Sample 90 and categorize
      • Design and apply filters to zoom in on races

  18. Categories of false positives
      • Initialization: thread allocates object and initializes it before sharing
      • Aliasing: mixed up different data structures
      • Unsharing: objects removed from shared structures
      • Recursive locks: “Big kernel lock”
      • Non-lock synchronization: spawn, wait, signal, etc.
      • Conditional locking: locking correlated with return value, conditionals, etc.

  19. Example filter: Thread “ownership”
      • To reduce initialization false positives:
        • remove accesses originating from the thread that allocated the object.

          thread 1: x = malloc(); init(x); share(x); update(x)
          thread 2: x = get(); update(x)
          thread 3: x = get(); update(x)

      [Figure label: "filtered", marking the accesses removed by the filter]
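The ownership rule on slide 19 can be sketched as a pass over recorded accesses. This is a coarse illustration, not RELAY's filter: it assumes each access already carries the id of the thread that allocated the object, and the names `access_t` and `filter_owner_accesses` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  thread;        /* thread performing the access */
    int  alloc_thread;  /* thread that allocated the accessed object */
    bool filtered;      /* set by the filter below */
} access_t;

/* Mark every access made by the allocating thread as filtered.
 * Returns the number of accesses that remain for race checking. */
size_t filter_owner_accesses(access_t *acc, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        acc[i].filtered = (acc[i].thread == acc[i].alloc_thread);
        if (!acc[i].filtered)
            kept++;
    }
    return kept;
}
```

For the example on the slide, with thread 1 as the allocator, the access from `init(x)` in thread 1 is filtered and the `update(x)` accesses in threads 2 and 3 are kept. The trade-off is that a real race involving the allocating thread after `share(x)` would also be dropped by this coarse version.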

  20. Before filters: 11% data races

  21. After filters: 80% data races

  22. The absolute numbers
      [Chart; categories shown: initialization; non-aliasing, unsharing; recursive locks; non-lock sync.]

  23. Related Work
      • Dynamic techniques
        • Locksets and extensions [Savage et al. 97, Choi et al. 02, Yu et al. 05, Elmas et al. 07]
        • Atomicity [Flanagan et al. 04, Wang et al. 06]
        • Benign vs. harmful [Narayanasamy et al. 07]
      • Static techniques for Java
        • Type systems [Flanagan et al. 99, Boyapati et al. 02]
        • Aliasing, must-not aliasing [Naik et al. 06, 07]
      • Static techniques for C
        • Scalability, ranking [Engler et al. 03]
        • Aliasing and sharing [Pratikakis et al. 06, Kahlon et al. 07]

  24. Summary
      • Relative locksets: Modular summary-based analysis
      • Can analyze 46K functions of the Linux kernel
        • modular => parallelizable
        • on a grid of 32 CPUs: approx. 5 hours
      • Modular unsoundness
        • finds 53 races (or 25 after all filters)
        • future work: better analyses, better filters
        • whether races are benign or not is another question!
