LiteRace: Effective Sampling for Lightweight Data-Race Detection
Guy Martin, OSLab
Agenda
• Introduction
• Background
• LiteRace Overview
• LiteRace Implementation
• Evaluation
• Limitations
• Conclusion
[MaMN09] - LiteRace
Introduction
• Data race: occurs when multiple threads perform conflicting accesses to the same data without proper synchronization.
• Sampling: the act of gathering representative samples; the process of collecting specimens; the process of taking a small portion of something as a specimen [Babylon online dictionary].
• Cold-region hypothesis: data races are likely to occur when a thread is executing a "cold" (infrequently accessed) region of the program.
• Hot region: a frequently accessed region of the program.
Introduction: Happens-Before
• Happens-before (→) is a partial order on the events of a particular execution of a multithreaded program.
• a → b if a and b are events from the same sequential thread of execution and a executed before b.
• a → b if a and b are synchronization operations from different threads such that the semantics of the synchronization dictate that a precedes b.
• The relation is transitive: if a → b and b → c, then a → c.
• A data race is a pair of accesses to the same memory location, where at least one of the accesses is a write and neither one happens-before the other.
Introduction: Happens-Before
[Figure: two timelines of Thread 1 and Thread 2 performing Lock L, Write X, and Unlock L operations. In one, the accesses to memory location X are properly synchronized, so there is no data race. In the other, there is no happens-before relation between the two write operations on location X, so there is a data race on X.]
Background
• Problem: dynamic race detectors impose high runtime overhead on the programs they analyze.
  • RaceTrack: 2x to 3x slowdown
  • Intel Thread Checker: 200x overhead
• Why? Happens-before and lockset-based algorithms check all accesses to shared memory.
• Sources of runtime overhead in happens-before detection:
  • Instrumentation of all memory operations and all synchronization operations
  • Upkeep of metadata for each accessed memory location (use of vector clocks)
• Solution: LiteRace, a lightweight data-race detector based on the cold-region hypothesis that samples and analyzes only selected portions of a program's execution.
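To make the per-location metadata cost concrete, here is a minimal sketch of a happens-before detector built on vector clocks. It is an illustration only, not LiteRace's or RaceTrack's implementation: the `Event` type, the `find_races` function, and the trace format are assumptions made for this example. Note that every read or write consults and extends the access history for its location, which is where the overhead comes from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    thread: int   # index of the executing thread, 0-based
    kind: str     # "lock", "unlock", "read", or "write"
    target: str   # lock name or memory location


def find_races(trace, num_threads):
    """Return index pairs (earlier, later) of racing accesses in trace.

    trace is a list of Event in the order they were observed (assumed format).
    """
    clocks = [[0] * num_threads for _ in range(num_threads)]
    released = {}   # lock name -> vector clock saved at its last unlock
    history = {}    # location -> [(index, event, clock snapshot)] metadata
    races = []
    for i, ev in enumerate(trace):
        vc = clocks[ev.thread]
        vc[ev.thread] += 1                       # program order in one thread
        if ev.kind == "lock" and ev.target in released:
            # Acquire: inherit everything that preceded the last unlock.
            prev = released[ev.target]
            for t in range(num_threads):
                vc[t] = max(vc[t], prev[t])
        elif ev.kind == "unlock":
            released[ev.target] = list(vc)
        elif ev.kind in ("read", "write"):
            for j, old, old_vc in history.get(ev.target, []):
                conflicting = "write" in (ev.kind, old.kind)
                ordered = old_vc[old.thread] <= vc[old.thread]
                if conflicting and old.thread != ev.thread and not ordered:
                    races.append((j, i))
            history.setdefault(ev.target, []).append((i, ev, list(vc)))
    return races


synchronized = [
    Event(0, "lock", "L"), Event(0, "write", "X"), Event(0, "unlock", "L"),
    Event(1, "lock", "L"), Event(1, "write", "X"), Event(1, "unlock", "L"),
]
unsynchronized = [Event(0, "write", "X"), Event(1, "write", "X")]
print(find_races(synchronized, 2))
print(find_races(unsynchronized, 2))
```

The first trace has an unlock-to-lock edge between the two writes, so no race is reported; the second has none, so the pair of writes is flagged.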
Background - Motivation
• Dynamic data-race detection tools cannot find all data races in a given program.
• They find data races only on the thread interleavings and paths explored at runtime, so some false negatives are tolerable.
• Sampling provides a useful knob that allows users to trade runtime overhead for coverage.
• Users can increase or decrease the sampling rate according to the behavior of the analyzed program.
LiteRace Overview
• Which events to log?
  • Synchronization operations, along with a logical timestamp to reflect the happens-before relation
  • Reads and writes to memory, in program order
• Any missed synchronization operation can result in missing edges in the happens-before graph, but selective sampling of memory accesses is acceptable.
[Figure: timeline of Thread 1 and Thread 2 with Lock L, Write X, and Unlock L operations, annotated "False data race reported on X".]
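The asymmetry between the two kinds of events can be checked with a small happens-before graph. The sketch below is illustrative only: the tuple-based trace format and the function names are assumptions, and the quadratic pairwise check is chosen for brevity, not efficiency. Dropping a memory access from the log can only hide a race, while dropping a synchronization operation removes an edge and produces a false report.

```python
def happens_before_edges(trace):
    """Build graph edges from a trace of (thread, op, target) tuples."""
    edges = {i: set() for i in range(len(trace))}
    last_in_thread = {}   # thread -> index of its previous event
    last_unlock = {}      # lock name -> index of its most recent unlock
    for i, (thread, op, target) in enumerate(trace):
        if thread in last_in_thread:
            edges[last_in_thread[thread]].add(i)   # program-order edge
        last_in_thread[thread] = i
        if op == "lock" and target in last_unlock:
            edges[last_unlock[target]].add(i)      # unlock -> later lock edge
        if op == "unlock":
            last_unlock[target] = i
    return edges


def reaches(edges, src, dst):
    """Depth-first search: is there a happens-before path from src to dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return False


def races(trace):
    """Return index pairs of conflicting accesses with no ordering path."""
    edges = happens_before_edges(trace)
    found = []
    for i, (t1, op1, x1) in enumerate(trace):
        for j in range(i + 1, len(trace)):
            t2, op2, x2 = trace[j]
            both_accesses = op1 in ("read", "write") and op2 in ("read", "write")
            if (both_accesses and x1 == x2 and t1 != t2
                    and "write" in (op1, op2) and not reaches(edges, i, j)):
                found.append((i, j))
    return found


full_log = [
    (1, "lock", "L"), (1, "write", "X"), (1, "unlock", "L"),
    (2, "lock", "L"), (2, "write", "X"), (2, "unlock", "L"),
]
memory_op_dropped = [e for k, e in enumerate(full_log) if k != 1]
sync_op_dropped = [e for k, e in enumerate(full_log) if k != 3]
print(races(full_log), races(memory_op_dropped), races(sync_op_dropped))
```

With Thread 2's Lock L missing from the log, the two writes to X are no longer connected in the graph, so a race is reported even though the program was correctly synchronized.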
LiteRace Overview
• Two copies are created for each function.
• A dispatch check is inserted at the entry of each function.
• The initial sampling rate for every region is 100%.
• Whenever a region is executed, its sampling rate is decreased until it reaches a lower bound.
• Separate sampling information is maintained for each thread.
• Sampled portions of the code are logged and analyzed offline using a happens-before algorithm.
[Figure: LiteRace instrumentation of an x86 function. The LiteRace dispatch check (sampler) usually branches to the un-instrumented copy (original code + log synchronization ops) and, when the region is cold, to the instrumented copy (original code + log synchronization ops + log memory ops).]
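The adaptive rate described on this slide can be sketched as a small per-thread, per-region table. The halving factor and the floor value used here are assumptions picked for the example, as are the `RegionSampler` class and its method names; the slides state only that the rate starts at 100% and decreases to a lower bound.

```python
from collections import defaultdict


class RegionSampler:
    """Tracks a sampling rate per (thread, region), starting at 100%."""

    def __init__(self, decay=0.5, floor=0.001):
        self.decay = decay    # assumed multiplicative decay per execution
        self.floor = floor    # assumed lower bound on the sampling rate
        self.rates = defaultdict(lambda: 1.0)

    def on_execute(self, thread_id, region):
        """Return the rate in effect for this execution, then lower it."""
        key = (thread_id, region)
        rate = self.rates[key]
        self.rates[key] = max(self.floor, rate * self.decay)
        return rate


sampler = RegionSampler(decay=0.5, floor=0.1)
hot = [sampler.on_execute(1, "parse") for _ in range(6)]
cold_for_other_thread = sampler.on_execute(2, "parse")
print(hot, cold_for_other_thread)
```

Because the table is keyed by thread as well as region, a function that is hot for one thread is still sampled at the full rate the first time another thread runs it.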
LiteRace Implementation
• LiteRace maintains a buffer for each thread containing two counters for each function:
  • Frequency counter: tracks the number of times a function has been executed and determines the sampling rate
  • Sampling counter: determines when to sample a function
• Happens-before relation tracing: each synchronization operation is logged with a SyncVar, which uniquely identifies the synchronization object, and a logical timestamp representing the order in which threads perform operations on that object.
[Figure: LiteRace code instrumentation. The dispatch function decrements the sampling counter and tests whether it is 0. NO: run the un-instrumented function (original code + log synchronization ops). YES: run the instrumented function (original code + log synchronization and memory ops) and set the sampling counter to a new value based on the current sampling rate.]
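The dispatch logic in the flowchart can be sketched as follows. This is a simplified model, not the actual x86 dispatch code: `FunctionCounters`, `interval_for`, and `dispatch` are hypothetical names, and the schedule (sample every call for the first 10 executions, then every 10th, then every 100th, capped at every 1000th) is an assumption made so the example has something concrete to compute. In a real setting each thread would hold its own `FunctionCounters` per function.

```python
class FunctionCounters:
    """The two counters one thread keeps for one function."""

    def __init__(self):
        self.frequency = 0   # how many times this thread ran the function
        self.sampling = 1    # calls left until the next sampled call


def interval_for(frequency, max_interval=1000):
    """Assumed schedule: the hotter the function, the longer the interval."""
    interval = 1
    while interval < max_interval and frequency >= interval * 10:
        interval *= 10
    return interval


def dispatch(counters, uninstrumented, instrumented, *args):
    """Dispatch check: pick which copy of the function to run."""
    counters.frequency += 1
    counters.sampling -= 1
    if counters.sampling == 0:
        # Reset the sampling counter from the current sampling rate.
        counters.sampling = interval_for(counters.frequency)
        return instrumented(*args)    # logs sync and memory operations
    return uninstrumented(*args)      # logs sync operations only


calls = []
counters = FunctionCounters()
for _ in range(30):
    dispatch(counters,
             lambda: calls.append("plain"),
             lambda: calls.append("sampled"))
print(calls.count("sampled"), "of", len(calls), "calls sampled")
```

The check itself is a decrement and a comparison, which matters because it runs on every function entry.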
LiteRace Implementation
• Tracking the happens-before relation with SyncVar: an Unlock operation on a particular mutex is guaranteed to have a smaller timestamp than a subsequent Lock operation on that same mutex in another thread.
[Figure: threads T1 and T2 with synchronization timestamps ts1 to ts4.]
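One way to obtain the timestamp guarantee is to take the timestamp while the mutex is held: after acquiring for Lock, before releasing for Unlock. The sketch below shows that idea under stated assumptions: `SyncVarClock`, `LoggedMutex`, and `run_demo` are hypothetical names, and a single shared log is used for simplicity in place of per-thread logs.

```python
import threading


class SyncVarClock:
    """Hands out increasing logical timestamps, one sequence per SyncVar."""

    def __init__(self):
        self._guard = threading.Lock()
        self._last = {}

    def next_timestamp(self, sync_var):
        with self._guard:
            ts = self._last.get(sync_var, 0) + 1
            self._last[sync_var] = ts
            return ts


class LoggedMutex:
    """A mutex that logs (thread, op, SyncVar, timestamp) for each operation."""

    def __init__(self, sync_var, clock, log):
        self.sync_var = sync_var
        self._mutex = threading.Lock()
        self._clock = clock
        self._log = log

    def lock(self, thread_name):
        self._mutex.acquire()
        # Stamp after acquiring, so this is later than the previous unlock.
        ts = self._clock.next_timestamp(self.sync_var)
        self._log.append((thread_name, "lock", self.sync_var, ts))

    def unlock(self, thread_name):
        # Stamp before releasing, so this is earlier than the next lock.
        ts = self._clock.next_timestamp(self.sync_var)
        self._log.append((thread_name, "unlock", self.sync_var, ts))
        self._mutex.release()


def run_demo(rounds=20):
    clock, log = SyncVarClock(), []
    mutex = LoggedMutex("L", clock, log)

    def worker(name):
        for _ in range(rounds):
            mutex.lock(name)
            mutex.unlock(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("T1", "T2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log, key=lambda record: record[3])


records = run_demo()
print(records[:4])
```

Sorted by timestamp, the records strictly alternate lock and unlock, so an offline analyzer can recover the unlock-to-lock edges from the timestamps alone.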
Evaluation
• LiteRace was implemented and evaluated on a Windows Server 2003 machine with two dual-core AMD Opteron processors and 4 GB of RAM.
• LiteRace is implemented using Microsoft's Phoenix compiler and directly instruments the x86 executable of a program.
• LiteRace found about 70% of the data races in its benchmarks while incurring about 28% performance overhead relative to unlogged execution, and is up to 25 times faster than an implementation that logs all memory operations.
Limitations
• LiteRace introduces false negatives because it analyzes only samples of the runtime interleavings.
• The overhead introduced (28%) is still high and unsuitable for certain applications.
• The dispatch check is inserted at the entry of every function, so it needs to be as small and efficient as possible.
Conclusion
• Dynamic race-detection tools incur high overhead in the programs they analyze.
• LiteRace is a sampling-based technique that aims to reduce the runtime overhead of dynamic race detectors.
• By logging less than 2% of memory accesses, LiteRace can detect more than 70% of data races using an offline happens-before race detector.
• The sampling rate offers a knob that the programmer can use to trade off performance against data-race coverage.