Finding and Fixing Bugs in Software

Stefan Muller, 15-740/18-740, Oct. 17, 2012

Problem and Solutions
• Problem: Software is buggy!
• More specific problem: We want to make sure software doesn’t have bad property X.
  • X could be: double frees, uses of freed memory, race conditions, buffer overflows, security vulnerabilities…
• 2 solutions:
  • Static analysis: analyze the code to see if it can have property X (last paper, sort of)
  • Dynamic analysis: watch the code as it runs and stop it if it shows property X (first 3 papers)

Watchdog
• Prevents use-after-free errors by dynamically adding instructions.
• For each pointer, stores an identifier in hardware.
• When a pointer is dereferenced, check that its identifier is valid.
• When a pointer's memory is freed, invalidate its identifier.
• Optimizations for fast checking of identifiers.
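The identifier scheme can be modeled in plain software. The sketch below is only an illustration of the check-on-dereference, invalidate-on-free idea; the names (checked_ptr, id_valid, checked_malloc, and so on) and the fixed-size identifier table are assumptions made for the example, not Watchdog's hardware design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical software model of identifier-based use-after-free checks.
   All names here are illustrative; Watchdog keeps this state in hardware. */

#define MAX_IDS 1024

static bool   id_valid[MAX_IDS];  /* is each allocation identifier still live? */
static size_t next_id = 0;        /* identifiers are never reused in this toy model */

typedef struct {
    void  *addr;  /* the raw pointer */
    size_t id;    /* identifier assigned when the memory was allocated */
} checked_ptr;

/* Allocate memory and tag the pointer with a fresh, valid identifier. */
checked_ptr checked_malloc(size_t size) {
    checked_ptr p = { NULL, 0 };
    if (next_id >= MAX_IDS) return p;   /* toy model: out of identifiers */
    p.addr = malloc(size);
    if (p.addr == NULL) return p;
    p.id = next_id++;
    id_valid[p.id] = true;
    return p;
}

/* The check performed before every dereference. */
bool checked_can_deref(checked_ptr p) {
    return p.addr != NULL && p.id < MAX_IDS && id_valid[p.id];
}

/* Free the memory and invalidate its identifier.
   Copies of the pointer carry the same identifier, so they fail the check too. */
void checked_free(checked_ptr p) {
    if (!checked_can_deref(p)) return;  /* also rejects double frees */
    id_valid[p.id] = false;
    free(p.addr);
}
```

Because a copied pointer shares its identifier with the original, freeing through one copy makes checked_can_deref fail for every other copy, which is the use-after-free case the slide describes.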

RADISH
• Race Detection in Software and Hardware
• Uses vector clocks to track which reads and writes to memory are guaranteed to happen before now in all threads.
• Each core stores a clock for every core including itself.
• Each byte(*) of memory is associated with clocks showing when it was last read and written.
  • Cached along with the contents of memory. Stored in software when data is evicted.
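A vector-clock race check can be sketched in a few lines. This is a simplified software model under stated assumptions: four cores, one full clock per location for the last write and one for accumulated reads, and hypothetical names (vclock, loc_meta, write_races, and so on). RADISH's actual hardware encoding is not shown on the slide and may differ.

```c
#include <stdbool.h>

#define NUM_CORES 4   /* assumed core count for the example */

/* One logical clock component per core. */
typedef struct { unsigned c[NUM_CORES]; } vclock;

/* a <= b component-wise: everything recorded in a is known to b. */
bool vc_leq(const vclock *a, const vclock *b) {
    for (int i = 0; i < NUM_CORES; i++)
        if (a->c[i] > b->c[i]) return false;
    return true;
}

/* Component-wise maximum; applied when one core synchronizes with another. */
void vc_join(vclock *dst, const vclock *src) {
    for (int i = 0; i < NUM_CORES; i++)
        if (src->c[i] > dst->c[i]) dst->c[i] = src->c[i];
}

/* Hypothetical per-location metadata: when it was last written and read. */
typedef struct {
    vclock last_write;
    vclock last_read;   /* join of the clocks of all reads so far */
} loc_meta;

/* A read races if the last write is not ordered before the reader's clock. */
bool read_races(const loc_meta *m, const vclock *now) {
    return !vc_leq(&m->last_write, now);
}

/* A write races if any earlier read or write is not ordered before it. */
bool write_races(const loc_meta *m, const vclock *now) {
    return !vc_leq(&m->last_write, now) || !vc_leq(&m->last_read, now);
}

void record_read(loc_meta *m, const vclock *now)  { vc_join(&m->last_read, now); }
void record_write(loc_meta *m, const vclock *now) { m->last_write = *now; }
```

In this model, a core that has not joined the writer's clock sees a race on the location; after vc_join (standing in for a synchronization event) the same access is ordered and passes the check.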

ParaLog
• Associate a lifeguard with a running thread
  • Lifeguard checks execution of the thread for bugs
  • Run lifeguard in parallel on another core
• Running many threads + lifeguards in parallel causes problems.
  • Atomicity of accesses to lifeguard metadata
  • Out-of-order execution: in some cases, it matters that events that happen first are seen first by the lifeguard.

  Thread 1        Thread 2
  x = *p          free(p)
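The ordering requirement can be illustrated with a small model in which each logged event may name one event from another thread that its lifeguard must wait for. The event layout and the names (event, processed, can_process) are assumptions for this sketch, and the example assumes the load x = *p really did execute before free(p); the slide does not describe how ParaLog encodes dependences.

```c
#include <stdbool.h>

#define NUM_THREADS 2   /* assumed thread count for the example */

/* Hypothetical log entry delivered to a lifeguard. */
typedef struct {
    int thread;      /* application thread that produced the event */
    int seq;         /* position in that thread's event stream */
    int dep_thread;  /* thread this event depends on, or -1 for none */
    int dep_seq;     /* event in dep_thread that must be processed first */
} event;

/* Number of events each lifeguard has processed so far. */
static int processed[NUM_THREADS];

/* An event may be processed only in per-thread order, and only after the
   cross-thread event it depends on has been seen by that thread's lifeguard. */
bool can_process(const event *e) {
    if (e->seq != processed[e->thread]) return false;
    if (e->dep_thread < 0) return true;
    return processed[e->dep_thread] > e->dep_seq;
}

/* Record that the lifeguard has handled the event. */
void process(const event *e) {
    processed[e->thread] = e->seq + 1;
}
```

With the load logged as event 0 of thread 0 and the free logged as event 0 of thread 1 depending on it, the second lifeguard stalls until the first has processed the load, so the free is never observed before the dereference it followed.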

Common Themes
• How to reduce the penalty to access metadata?
  • Caching!
• Use existing architecture features
  • RADISH uses cache coherence messages to update clocks.
  • ParaLog uses cache coherence messages to ascertain dependences between events.
• Modify other features to aid analysis
  • Watchdog has a separate cache for identifier info.
  • RADISH adds additional logic and hardware state to store and compute with per-core clocks.
  • ParaLog maintains a TLB mapping commonly used application data to the location of related metadata.
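The caching theme can be sketched as a small translation cache in front of a slower metadata lookup. Everything specific here is an assumption for illustration: the 4 KiB page size, the 16-entry direct-mapped table, the fixed-offset metadata layout, and the names (mtlb, slow_meta_page, metadata_addr). None of it is taken from the papers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT         12        /* assumed 4 KiB pages */
#define MTLB_ENTRIES       16        /* assumed direct-mapped cache size */
#define METADATA_BASE_PAGE 0x40000u  /* hypothetical start of the metadata region */

typedef struct {
    bool      valid;
    uintptr_t app_page;   /* application page number */
    uintptr_t meta_page;  /* page number holding its metadata */
} mtlb_entry;

static mtlb_entry    mtlb[MTLB_ENTRIES];
static unsigned long slow_lookups;   /* counts misses, for demonstration */

/* Stand-in for the expensive software lookup performed on a miss. */
static uintptr_t slow_meta_page(uintptr_t app_page) {
    slow_lookups++;
    return METADATA_BASE_PAGE + app_page;  /* hypothetical fixed-offset layout */
}

/* Map an application address to the address of its metadata,
   consulting the small cache first and refilling it on a miss. */
uintptr_t metadata_addr(uintptr_t app_addr) {
    uintptr_t page   = app_addr >> PAGE_SHIFT;
    uintptr_t offset = app_addr & (((uintptr_t)1 << PAGE_SHIFT) - 1);
    mtlb_entry *e = &mtlb[page % MTLB_ENTRIES];
    if (!e->valid || e->app_page != page) {
        e->valid     = true;
        e->app_page  = page;
        e->meta_page = slow_meta_page(page);
    }
    return (e->meta_page << PAGE_SHIFT) | offset;
}
```

Repeated accesses to the same page hit the cached entry and skip the slow path; only a first touch or a conflicting page pays for slow_meta_page.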

Summary

Tradeoffs (Discuss!)
• How much metadata to store
• Hardware vs. Software
  • Hardware is fast, but software is flexible and allows a reduction in space usage.
  • We’ve seen ways to store some metadata in hardware, but use a different system (maybe software) when that overflows.
• Where to run checks
  • Use a separate core and run the application in ~real-time, or instrument the application with runtime checks?

ConSeq
• Works backward from (potential) failures to find concurrency errors that trigger them.
• Identify failure sites (e.g. assert failures, bad outputs…). (Static)
• Identify a critical read that affects the value of local memory at that failure site. (Static)
• Find alternative interleavings that might result in different values at the critical read by observing a (probably correct) run of the program. (Dynamic)
  • Use vector clocks to identify other writes that may produce such alternate values.

ConSeq (example)

  Thread 1                            Thread 2
  p = malloc(sizeof(some_t));
  for (int i = 0; i < 5; i++)
      a[i] = 0;
  assert(p != NULL);                  p = NULL;

Figure labels: Trigger; Other write to be interleaved; Failure
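The effect of interleaving on the critical read can be shown with a deterministic, single-threaded replay of the example. This is a sketch only: the schedule enum and assertion_holds are hypothetical names, a static object stands in for the malloc'd block so the outcome does not depend on allocation succeeding, and ConSeq itself analyzes real multithreaded runs rather than replaying them like this.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int unused; } some_t;  /* stand-in for the slide's some_t */

/* Two possible placements of Thread 2's write relative to Thread 1's assert. */
typedef enum {
    NULL_WRITE_AFTER_ASSERT,   /* the (probably correct) observed run */
    NULL_WRITE_BEFORE_ASSERT   /* the alternative interleaving */
} schedule;

/* Replays the example under the given schedule and reports whether the
   critical read of p would let assert(p != NULL) pass. */
bool assertion_holds(schedule s) {
    static some_t object;      /* stands in for the malloc'd block */
    int a[5];
    some_t *p = &object;       /* Thread 1: p = malloc(sizeof(some_t)); */

    for (int i = 0; i < 5; i++) {
        a[i] = 0;              /* Thread 1: loop body */
        if (s == NULL_WRITE_BEFORE_ASSERT && i == 2)
            p = NULL;          /* Thread 2: p = NULL lands mid-loop */
    }
    (void)a;

    bool ok = (p != NULL);     /* Thread 1: critical read feeding the assert */

    if (s == NULL_WRITE_AFTER_ASSERT)
        p = NULL;              /* Thread 2: p = NULL after the assert */
    (void)p;
    return ok;
}
```

Under the first schedule the read sees the allocated pointer and the assert passes; under the second, the same read sees NULL, which is the kind of alternative value at a critical read that the analysis searches for.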

Static vs. Dynamic
• ConSeq is only run during testing, so no production runtime overhead.
• However, give up two properties:
  • Soundness: no false negatives
  • Completeness: no false positives
• Can usually only have one of these anyway. ConSeq instead seeks to balance both with performance.
• Is this a good tradeoff?
