Evaluating the Impact of Thread Escape Analysis on Memory Consistency Optimizations Chi-Leung Wong, Zehra Sura, Xing Fang, Kyungwoo Lee, Samuel P. Midkiff, Jaejin Lee and David Padua University of Illinois at Urbana-Champaign IBM T.J. Watson Research Center Purdue University Seoul National University
Outline • Memory Models • The Pensieve System • Escape Analyses • Qualitative Impact of Escape Analyses on Delay Set Analysis and Synchronization Analysis • Experimental Results • Conclusion
Memory Models • Consider the following code segments: • Thread 1 : data = 100; data_ready = true; • Thread 2 : while (!data_ready); t = data; • Can t == 0? • Yes, if the writes in Thread 1 are reordered to: data_ready = true; data = 100; • Such reordering can be done by the compiler and the hardware • Memory models tell us the answer • Sequential consistency says no
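The two-thread example above can be tried in plain Java. This is a minimal sketch outside the Pensieve system: the class and method names are hypothetical, and it assumes the flag is declared volatile, which under the Java Memory Model forbids the reordering shown on the slide. Without volatile, the reader could observe t == 0 or spin forever.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
final class DataReadyDemo {
    static int data = 0;
    // Assumption: volatile flag. The write to data cannot move after the
    // write to dataReady, and a reader that sees true also sees data == 100.
    static volatile boolean dataReady = false;

    static int runOnce() {
        data = 0;
        dataReady = false;
        final int[] t = new int[1];
        Thread writer = new Thread(() -> {
            data = 100;        // Thread 1: data = 100;
            dataReady = true;  // Thread 1: data_ready = true;
        });
        Thread reader = new Thread(() -> {
            while (!dataReady) { /* spin: while (!data_ready); */ }
            t[0] = data;       // Thread 2: t = data;
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();     // join() makes t[0] visible to the caller
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return t[0];
    }

    public static void main(String[] args) {
        System.out.println("t = " + runOnce());
    }
}
```

Pensieve aims to give the same guarantee for ordinary (non-volatile) accesses by inserting fences only where the analyses say they are needed.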
Objective of the Pensieve Project • Sequential consistency (SC) on top of Intel x86 memory models • Implementation based on Jikes RVM • All analyses done at JIT time • Need to minimize both analysis time and application execution time
Enforcing SC • Done by enforcing orders between memory accesses • Not all orderings need to be enforced • Only enforce the orders that are really needed • Delay Set Analysis (DSA) [SS88] computes such orders • Our approach: an approximation of DSA • Orders are enforced by inserting fences in the generated code
Original DSA • Program edge • x executes before y in the same thread • Conflict edge • x and x’ are conflicting accesses • The order of the accesses affects the program outcome • In this paper, two accesses conflict if: • they are to the same memory location • at least one of them is a write [Figure: example program and conflict edges among accesses x, y, x’, y’]
Original DSA (Cont’d) • Critical cycle • Minimal • Cannot form a smaller cycle using a subset of its nodes • Mixed • Contains both program edges and conflict edges • Enforce program edges on a critical cycle [Figure: example cycles labeled Mixed, Not mixed, Minimal, Not minimal]
Approximate DSA • Approximation of a critical cycle • x precedes y • Conflicting accesses: • x and x’ • y and y’ • y’ precedes x’ • Enforce program edges on an approximate critical cycle [Figure: cycle over x, y, y’, x’]
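The approximate critical cycle test above can be sketched in Java. This is not the Pensieve implementation: the class, the Access type, and the assumption of straight-line per-thread code (so that "precedes" is just a position comparison) are all illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the approximate cycle test; not Pensieve's code.
final class ApproxDsaSketch {
    /** One memory access: thread id, position in that thread, location, kind. */
    static final class Access {
        final int thread; final int pos; final String loc; final boolean write;
        Access(int thread, int pos, String loc, boolean write) {
            this.thread = thread; this.pos = pos; this.loc = loc; this.write = write;
        }
    }

    /** Conflict edge: different threads, same location, at least one write. */
    static boolean conflict(Access a, Access b) {
        return a.thread != b.thread && a.loc.equals(b.loc) && (a.write || b.write);
    }

    /** Program edge, assuming straight-line code within each thread. */
    static boolean precedes(Access a, Access b) {
        return a.thread == b.thread && a.pos < b.pos;
    }

    /** True if some x', y' exist with x~x', y~y' conflicting and y' preceding x'. */
    static boolean onApproxCycle(Access x, Access y, List<Access> all) {
        for (Access yp : all) {
            if (!conflict(y, yp)) continue;
            for (Access xp : all) {
                if (conflict(x, xp) && precedes(yp, xp)) return true;
            }
        }
        return false;
    }

    /** Program edges (x, y) whose order must be enforced. */
    static List<Access[]> delays(List<Access> all) {
        List<Access[]> out = new ArrayList<>();
        for (Access x : all) {
            for (Access y : all) {
                if (precedes(x, y) && onApproxCycle(x, y, all)) out.add(new Access[] {x, y});
            }
        }
        return out;
    }

    /** The data / data_ready example: thread 1 writes, thread 2 reads. */
    static List<Access> dataReadyExample() {
        return Arrays.asList(
            new Access(1, 0, "data", true), new Access(1, 1, "data_ready", true),
            new Access(2, 0, "data_ready", false), new Access(2, 1, "data", false));
    }
}
```

On the data / data_ready example, both program edges lie on such a cycle, so the sketch reports two delays: the two writes in thread 1 and the two reads in thread 2.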
The Pensieve System [Figure: system diagram with boxes Source Program; Program Analyses (Thread Escape Analysis, Synchronization Analysis, Delay Set Analysis); Orders to Enforce; Code Optimizations; Fence Insertion & Optimization; Target Program]
Escape Analyses • Identify objects that may be accessed by two or more threads • Output: a set of variables • {v | v points to an object that may be accessed by >= 2 threads}
Impact on Delay Set Analysis • x, y, y’, x’ must all be escaping accesses • A cycle cannot form if any one of them is not an escaping access • Fewer escaping accesses imply fewer possible pairs (x, y) • Fewer checks to be done • Fewer delays [Figure: cycle over x, y, y’, x’]
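The effect on the number of candidate pairs can be illustrated with a small Java sketch. All names here are hypothetical, and the escaping set is simply given as input rather than computed by an escape analysis.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: count the (x, y) pairs DSA would have to check.
final class EscapeFilterSketch {
    static final class Access {
        final int thread; final int pos; final String var;
        Access(int thread, int pos, String var) {
            this.thread = thread; this.pos = pos; this.var = var;
        }
    }

    /** Pairs (x, y) in one thread, x before y, both through escaping variables. */
    static int candidatePairs(List<Access> accesses, Set<String> escaping) {
        int pairs = 0;
        for (Access x : accesses) {
            if (!escaping.contains(x.var)) continue;  // x cannot be on a cycle
            for (Access y : accesses) {
                if (x.thread == y.thread && x.pos < y.pos && escaping.contains(y.var)) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    /** One thread accessing a, b, c in program order. */
    static List<Access> example() {
        return Arrays.asList(new Access(1, 0, "a"), new Access(1, 1, "b"), new Access(1, 2, "c"));
    }

    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }
}
```

With all three variables treated as escaping there are three pairs to check; if a more precise analysis shows that only a and b escape, a single pair remains.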
Impact on Synchronization Analysis • Synchronization analysis reduces the number of conflict edges considered by DSA • Considers synchronized constructs and calls to start() and join() • Our system only considers t1.join() • if it can be matched with some t2.start() call • and t1 and t2 are not escaping • More precise escape information • more join() calls matched • more precise DSA results
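A small Java example of the start()/join() pattern this analysis targets. The class and field names are hypothetical. The thread reference is a local variable that never escapes, so the join() can be matched with the start() on the same thread object, and the read after join() cannot run concurrently with the worker's write.

```java
// Hypothetical example of a matched start()/join() pair.
final class StartJoinDemo {
    static int result;

    static int compute() {
        // worker is a non-escaping local: no other thread can start or join it.
        Thread worker = new Thread(() -> { result = 42; });
        worker.start();
        try {
            worker.join();  // everything the worker did happens-before this return
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // Ordered after the worker's write, so this pair of accesses to
        // result need not be treated as a concurrent conflict.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```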
Escape Analyses Comparison • In this study, we compare 4 algorithms: • Connectivity Analysis (Pensieve) • Field Base Analysis (Pensieve) • For comparison purposes • Bogda’s Analysis • Removing Unnecessary Synchronization in Java. (OOPSLA 1999) • Ruf’s Analysis • Effective Synchronization Removal for Java. (PLDI 2000)
Connectivity Escape Analysis • An object is escaping if it is both • Reachable by more than one thread, in one of two possible ways: • Reachable from a static field • Passed through a thread constructor • Accessed by more than one thread • Does not assume that the this reference is escaping in run() by default • Field insensitive for most memory accesses • i.e., does not distinguish x.f from x.g • Except accesses to Runnable objects
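The two conditions above can be sketched as a reachability pass followed by a filter. This is a simplified illustration and not the Pensieve algorithm: objects are plain string labels, the points-to graph and the per-object set of accessing threads are assumed to be given, and the roots stand for objects referenced by static fields or passed to thread constructors.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: escaping = reachable from a root AND accessed by > 1 thread.
final class ConnectivitySketch {
    /** Objects reachable from the roots through the points-to graph. */
    static Set<String> reachable(Map<String, List<String>> pointsTo, Collection<String> roots) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!seen.add(obj)) continue;  // already visited
            for (String next : pointsTo.getOrDefault(obj, Collections.<String>emptyList())) {
                work.push(next);
            }
        }
        return seen;
    }

    /** Reachable objects that are also accessed by more than one thread. */
    static Set<String> escaping(Map<String, List<String>> pointsTo, Collection<String> roots,
                                Map<String, Set<Integer>> accessedBy) {
        Set<String> out = new TreeSet<>();
        for (String obj : reachable(pointsTo, roots)) {
            Set<Integer> threads = accessedBy.getOrDefault(obj, Collections.<Integer>emptySet());
            if (threads.size() > 1) out.add(obj);
        }
        return out;
    }

    /** A is a root and points to B; C is unreachable from any root. */
    static Set<String> example() {
        Map<String, List<String>> pointsTo = new HashMap<>();
        pointsTo.put("A", Arrays.asList("B"));
        Map<String, Set<Integer>> accessedBy = new HashMap<>();
        accessedBy.put("A", new HashSet<>(Arrays.asList(1, 2)));
        accessedBy.put("B", new HashSet<>(Arrays.asList(1)));
        accessedBy.put("C", new HashSet<>(Arrays.asList(1, 2)));
        return escaping(pointsTo, Arrays.asList("A"), accessedBy);
    }
}
```

In the example, B is reachable but touched by one thread only, and C is touched by two threads but not reachable from a root, so only A is reported as escaping.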
Field Base Escape Analysis • An object is escaping if it is • Reachable from a static field, or • Passed through a thread constructor • Does not assume that the this reference is escaping in run() by default • Similar to connectivity analysis, but field sensitive • Suppose O1 and O2 are of the same type • O1.f is different from O1.g • O1.f is the same as O2.f
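One way to read the difference in sensitivity is as a choice of key under which accesses are grouped. The sketch below is an assumption made for illustration, not the representation used in Pensieve: the field-insensitive key ignores the field name, while the field-base key uses the declaring type and field name and ignores which object is accessed.

```java
// Hypothetical abstraction keys; the actual analyses may represent this differently.
final class AccessKeys {
    /** Field insensitive: x.f and x.g fall under the same key. */
    static String fieldInsensitiveKey(String object, String type, String field) {
        return object;
    }

    /** Field base: O1.f and O2.f share a key, O1.f and O1.g do not. */
    static String fieldBaseKey(String object, String type, String field) {
        return type + "." + field;
    }

    public static void main(String[] args) {
        System.out.println(fieldBaseKey("O1", "T", "f").equals(fieldBaseKey("O2", "T", "f")));
        System.out.println(fieldBaseKey("O1", "T", "f").equals(fieldBaseKey("O1", "T", "g")));
    }
}
```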
Bogda’s Escape Analysis • An object is escaping if it is reachable: • By a static field • By a Runnable object • Via more than 1 field reference
Ruf’s Escape Analysis • An object is escaping if it is both • Reachable from either • A static field or • A Runnable object • Synchronized by more than one thread • Adapted for our own use • “synchronized” replaced by “accessed”
Experimental Settings (Machine) • Intel (Dell PowerEdge 6600 SMP) • 4 Intel hyperthreaded 1.5 GHz Xeon processors • with 1 MB cache each • 6 GB system memory
Experimental Settings (Software) • Original • default Jikes RVM implementation • base case for performance comparison • Enforcing SC • Empty • Arg Escaping • Connectivity analysis • Field-base analysis • Bogda’s analysis (bogda) • Ruf’s analysis
Measurements • Escape Analysis Time • Impact on Delay Set Analysis Time • Impact on Synchronization Analysis Time • Slowdown due to fence insertion • Delay Set Analysis only • Delay Set Analysis with Synchronization Analysis
Escape + DSA + Synchronization Analysis Time / Compilation Time
Conclusions • Evaluated the interaction between escape analysis and synchronization/delay set analysis • Montecarlo and jbb motivate enabling field sensitivity for connectivity analysis