DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages
Dan Marino (UC Los Angeles), Abhay Singh (University of Michigan), Todd Millstein (UC Los Angeles), Madan Musuvathi (MSR, Redmond), Satish Narayanasamy (University of Michigan)

State of the Art: SC for Data-Race-Free Memory Models
• sequential consistency (SC) [Lamport 79]
  • intuitive for programmers
  • limits compiler and hardware optimizations
• DRF0 [Adve & Hill 90] models balance performance and ease of programming
  • SC behavior guaranteed for race-free programs
  • most optimizations allowed
  • e.g. Java and C++0x memory models [Manson et al. 2005] [Boehm et al. 2008]

Program Behavior under DRF0
    X* x = null;
    bool init = false;

    // Thread t              // Thread u
    A: x = new X();          C: if (init)
    B: init = true;          D:   x->f++;

• optimizing compiler and hardware: "B doesn't depend on A. It might be faster to reorder them!"
• with B reordered before A, the execution B, C, D, A dereferences x while it is still null: NullPointer!
• declaring init atomic removes the data race, so DRF0 then guarantees SC behavior

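The effect of the reordering can be checked by brute force. The Python sketch below is an illustrative model, not part of the talk: it enumerates every interleaving of the two threads, first with thread t in program order (A, B) and then with B hoisted above A, and reports whether any schedule reaches D with x still null. The helper names `interleavings`, `run`, and `can_crash` are hypothetical.

```python
from itertools import combinations

def interleavings(t_ops, u_ops):
    """Yield every merge of two sequences that keeps each one's internal order."""
    n = len(t_ops) + len(u_ops)
    for t_slots in combinations(range(n), len(t_ops)):
        t_iter, u_iter = iter(t_ops), iter(u_ops)
        yield [next(t_iter) if i in t_slots else next(u_iter) for i in range(n)]

def run(schedule):
    """Execute one schedule; return True if D dereferences a null x."""
    x, init, took_branch = None, False, False
    for op in schedule:
        if op == "A":
            x = object()          # x = new X()
        elif op == "B":
            init = True           # init = true
        elif op == "C":
            took_branch = init    # if (init)
        elif op == "D" and took_branch:
            if x is None:         # x->f++ on a null pointer
                return True
    return False

def can_crash(t_ops, u_ops=("C", "D")):
    """True if at least one interleaving ends in a null dereference."""
    return any(run(s) for s in interleavings(t_ops, u_ops))

print(can_crash(("A", "B")))  # thread t in program order
print(can_crash(("B", "A")))  # B hoisted above A
```

In this model no interleaving of the program-order version can crash, because C only sees init set after A has already run. Once B precedes A, the schedule B, C, D, A does.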
Deficiencies of DRF0
• weak or no semantics for racy programs
• unintentional data races easy to introduce
• problematic for
  • debuggability: programmer must assume non-SC behavior for all programs
  • safety: optimization + data race = jump to arbitrary code! [Boehm et al., PLDI 2008]
  • compiler correctness: Java must maintain safety at the cost of complexity [Ševčík & Aspinall, ECOOP 2008]

Our Solution: The DRFx Memory Model
• a data race is a programming error; the Memory Model (MM) Exception that reports it is a fatal runtime error
• debuggability: SC for all executions
• safety: halt program before non-SC behavior is exhibited
• compiler correctness: most sequentially valid optimizations permitted

DRFx Allows Relaxed Data Race Detection
• source program ⇒ observed behavior
  • data race free ⇒ SC behavior
  • has data races ⇒ SC behavior or MM Exception
• allowing either outcome for racy programs simplifies detection
• precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrellas 2003]

Detecting an SC Violation
    X* x = null;
    bool init = false;

    // Thread t              // Thread u
    A: x = new X();          C: if (init)
    B: init = true;          D:   x->f++;

• insight: the compiler can communicate to the runtime the regions in which reordering may have occurred
• the runtime must detect conflicting accesses in regions that execute concurrently and raise an MM Exception
• if the compiled program's execution is region-serializable, the source program's behavior is SC
• compiled code, with regions delimited by region fences:

    // Thread t              // Thread u
    region fence             region fence
    B: init = true;          C: if (init)
    A: x = new X();          D:   x->f++;
    region fence             region fence

• races need not be reported between regions that do not execute concurrently: a data race, but no SC violation

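The runtime check described on this slide can be sketched as a test over pairs of regions. The Python model below is an assumption-laden illustration, not the hardware design from the talk: each region carries read and write sets plus the logical times of its opening and closing fences, and an MM Exception is signalled only when two regions from different threads both overlap in time and access a common location with at least one write. The class `Region`, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    thread: str
    start: int                     # logical time of the opening region fence
    end: int                       # logical time of the closing region fence
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def concurrent(a, b):
    """Regions of different threads whose fence intervals overlap."""
    return a.thread != b.thread and a.start < b.end and b.start < a.end

def conflicts(a, b):
    """Both touch a common location and at least one of them writes it."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)

def mm_exception(regions):
    """True if some pair of concurrent regions has conflicting accesses."""
    return any(
        concurrent(a, b) and conflicts(a, b)
        for i, a in enumerate(regions)
        for b in regions[i + 1:]
    )

# Thread t's region holds A and B (in either order); thread u's holds C and D.
t = Region("t", start=0, end=4, writes={"x", "init"})
u_overlapping = Region("u", start=2, end=6, reads={"init", "x"}, writes={"x.f"})
u_later = Region("u", start=5, end=8, reads={"init", "x"}, writes={"x.f"})

print(mm_exception([t, u_overlapping]))  # regions overlap in time
print(mm_exception([t, u_later]))        # thread u's region starts after t's ends
```

The second case is the "data race, but no SC violation" situation: the accesses still race in the source program, but thread t's region has finished before thread u's begins, so any reordering inside it is invisible to thread u.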
DRFx Compiler and Runtime Requirements
• DRFx compiler
  • communicate regions in which optimizations were made by using fence instructions
  • synchronization in their own region
  • no speculative memory accesses
• DRFx execution environment
  • trap on conflicting accesses in concurrent regions
  • global order on region fences
  • memory order consistent with fence order

Formalization
• compiler requirements
  • how program is split into regions
  • permitted optimizations: all non-speculative, sequentially valid optimizations
• execution environment requirements
  • when conflict may/must be reported
  • memory orderings allowed w.r.t. fences
• prove
  • no MM exception ⇒ SC behavior for source program
  • MM exception ⇒ data race in source program

Efficient & Simple Conflict Detection
• perform detection in hardware
  • like transactional memory hardware, but simpler
    • no rollback
    • we control region boundaries
• compiler bounds number of memory locations dynamically accessed in a region
  • limits optimization opportunities
  • distinguish "bounding" region fence
  • hardware can merge regions separated by a bounding fence when resources are available

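The merging of regions across bounding fences can be pictured with a small model. The Python sketch below is only an illustration under stated assumptions: regions are sets of memory locations, the detector can track at most `capacity` locations at once, and a fence is labelled either "bounding" (may be elided) or "hard" (must be kept, for example around synchronization). The function `merge_regions`, the "hard" label, and the capacity parameter are hypothetical and are not taken from the talk.

```python
def merge_regions(regions, fences, capacity):
    """Merge neighbouring regions across bounding fences while they fit.

    regions  -- non-empty list of sets of memory locations, in program order
    fences   -- fences[i] is the kind of fence between regions[i] and
                regions[i + 1]: "bounding" or "hard"
    capacity -- most locations the (hypothetical) detector can track at once
    """
    merged = [set(regions[0])]
    for kind, region in zip(fences, regions[1:]):
        combined = merged[-1] | region
        if kind == "bounding" and len(combined) <= capacity:
            merged[-1] = combined          # fence elided, region grows
        else:
            merged.append(set(region))     # fence kept, new region starts
    return merged

# Three loop iterations split by bounding fences, then a synchronization region.
loop_iterations = [{"a[0]", "sum"}, {"a[1]", "sum"}, {"a[2]", "sum"}, {"lock"}]
fences = ["bounding", "bounding", "hard"]
print(merge_regions(loop_iterations, fences, capacity=3))
```

With a capacity of three locations, the first two iterations merge into one region, the third starts a new region because the union would need four entries, and the synchronization access stays in its own region.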
Compiler Implementation
• built conservative DRFx-compliant compiler
  • LLVM [Lattner & Adve 2004]
  • naïve bounding analysis: bounding fence at all loop back edges
  • disable speculative optimizations
• measured performance
  • PARSEC benchmark suite
  • stock x86 hardware, no architectural simulator

DRFx Overhead on PARSEC Benchmarks
[Chart: slowdown over unmodified, fully optimizing LLVM]

Related Work
• memory models, e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990]
• hardware race detection [Adve et al. 1991], [Muzahid et al. 2009], [Prvulovic & Torrellas 2003]
• software race detection, e.g. [Yu et al. 2005], [Flanagan & Freund 2009], [Elmas et al. 2007]
• detecting SC violations [Gharachorloo & Gibbons, SPAA 1991]
• conflict exception [Lucia et al., ISCA 2010]
  • stronger guarantee: serializability of sync-free regions
  • requires unbounded detection scheme
  • focused on hardware

DRFx Conclusion
• regions: a lightweight form of data race detection, reported through the MM Exception
• easy to understand
  • programmer gets understandable behavior for all programs
  • compiler may perform most sequentially valid optimizations within regions
• efficient
  • straightforward hardware support
  • compiler restrictions ⇒ only 0% to 7% slowdown