Link-Time Path-Sensitive Memory Redundancy Elimination Manel Fernández and Roger Espasa {mfernand,roger}@ac.upc.es Computer Architecture Department Universitat Politècnica de Catalunya Barcelona, Spain
Motivation • The memory “gap” • Processor speed increases faster than memory speed • L1-cache latency continues to increase • Memory operations remain a significant bottleneck • Memory redundancy • Instructions that repeatedly access the same location • Lots of memory operations are redundant • Hardware designers exploit memory redundancy • E.g., caches take advantage of temporal reuse • The compiler must be very aggressive in memory optimizations
Memory redundancy • Memory instructions that repeatedly access the same location • Lots of memory operations are redundant • Sources of redundancy • Source code structure • Programmers introduce redundancy • Traditional compilation • Separate compilation units • Limitations in the compilation model • Code generation introduces redundancy • What percentage of memory operations are redundant at run time? • Example: … = *p; (redundancy source) if ( … ) { *q = … (intervening store) … = *p; (redundant load) }
Dynamic memory redundancy • [Charts: load redundancy; store redundancy]
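One way to estimate dynamic redundancy from an execution trace is sketched below. The trace format, the function name `measure_redundancy`, and the definitions used here are assumptions made for illustration, not the measurement method behind the charts: a load counts as redundant when the same location was already loaded or stored earlier in the trace, and a store counts as redundant when it writes the value the location is already known to hold.

```python
_UNKNOWN = object()  # marker for "location not seen yet"


def measure_redundancy(trace):
    """Return (redundant load fraction, redundant store fraction).

    trace: iterable of (op, address, value) with op in {"load", "store"}.
    This format is hypothetical; a real trace would come from a profiler.
    """
    last_value = {}  # address -> value most recently loaded or stored
    loads = stores = red_loads = red_stores = 0
    for op, addr, value in trace:
        if op == "load":
            loads += 1
            if addr in last_value:
                # The value was seen before, so it could have stayed in a register.
                red_loads += 1
        elif op == "store":
            stores += 1
            if last_value.get(addr, _UNKNOWN) == value:
                # The store rewrites the value the location already holds.
                red_stores += 1
        last_value[addr] = value
    load_frac = red_loads / loads if loads else 0.0
    store_frac = red_stores / stores if stores else 0.0
    return load_frac, store_frac
```

For example, a trace that loads address 0x10 twice and stores the same value to 0x20 twice reports one redundant load and one redundant store.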
Talk outline • Motivation • Memory redundancy elimination (MRE) • Evaluation • Summary
Memory redundancy elimination (MRE) • Removal of memory instructions that repeatedly access the same location • Targeted at redundancy type • Load redundancy elimination (LRE) in a path-sensitive fashion • Based on path-sensitive memory disambiguation • Store redundancy elimination (SRE) • Targeted at redundancy distance • Eliminating close/distant redundancy • In the context of a binary optimizer • Overcome limitations of traditional compilers • Need to deal with “executable code” problems
Load redundancy elimination (LRE) • Fundamental problems • Alias analysis for disambiguation • Liveness analysis for register bypassing • Cost-benefit analysis for applying LRE • Profile information is needed • Eliminating close redundancy • Within extended basic blocks (EBBs) • Eliminating distant redundancy • Intraprocedural dataflow analysis [HorspoolHo97] • For fully/partially-redundant loads • Redundancy on all/some paths • Partial-LRE requires insertion of speculative loads • Diagram, with a marked hot path: ... I1 load (p0), r1; move r1, r0 (inserted) ... I2 load (p0), r2 (struck out), replaced by move r0, r2 ... • [HorspoolHo97] R. N. Horspool and H. C. Ho. Partial redundancy elimination driven by a cost-benefit analysis, CSSE’97
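Eliminating close redundancy inside a straight-line region can be sketched as follows. This is a minimal illustration under stated assumptions, not Alto's implementation: the tuple instruction format and the name `eliminate_redundant_loads` are hypothetical, every store is conservatively assumed to alias every load, and the value is forwarded from the register that still holds it rather than through a separately allocated bypass register such as r0.

```python
def eliminate_redundant_loads(block):
    """Replace redundant loads in straight-line code with register moves.

    Hypothetical instruction format:
      ("load", addr_reg, dst)   dst <- memory[addr_reg]
      ("store", src, addr_reg)  memory[addr_reg] <- src
      ("move", src, dst)        dst <- src
      ("op", dst)               any other instruction that defines dst
    """
    available = {}  # address register -> register currently holding that value
    out = []

    def kill(reg):
        # A redefinition of reg invalidates entries that use it as address or holder.
        for addr in [a for a, h in available.items() if reg in (a, h)]:
            del available[addr]

    for ins in block:
        kind = ins[0]
        if kind == "load":
            _, addr, dst = ins
            holder = available.get(addr)
            if holder is None:
                out.append(ins)              # not redundant: keep the load
                kill(dst)
                if dst != addr:
                    available[addr] = dst
            elif holder != dst:
                out.append(("move", holder, dst))  # bypass the value
                kill(dst)
            # holder == dst: dst already holds the value, so the load is dropped
        elif kind == "store":
            _, src, addr = ins
            out.append(ins)
            available.clear()                # conservative: the store may alias anything
            available[addr] = src            # the stored value is now known
        else:
            out.append(ins)
            kill(ins[-1])                    # last field is the defined register
    return out
```

With the block `load (p0), r1; op r3; load (p0), r2`, the second load becomes `move r1, r2`. An intervening store through another register, or a redefinition of p0, leaves the block unchanged.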
Memory disambiguation • Register use-def chains • Symbolic descriptors for every use • Disambiguation by instruction inspection • Fails on path-sensitive redundancies • Need to deal with path-sensitive information • Partial-LRE is not sufficient either • Diagram: ... I0 def p0 ... I1 load (p0),r1 ... ... I3 add p0,8,p0 ... IØØ-def p0 ... I2 load (p0),r2 ... √ ?
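The idea of comparing symbolic descriptors can be sketched as below. The `Descriptor` type, the `disambiguate` function, and the fixed 8-byte access size are assumptions for illustration; the slides do not give the actual descriptor format. A descriptor names the single definition of the base register that reaches a use (found through use-def chains) plus a constant offset. When more than one definition reaches the use, as at I2 in the diagram, the base is unknown and inspection alone cannot decide.

```python
from typing import NamedTuple, Optional


class Descriptor(NamedTuple):
    # Label of the single reaching definition of the base register,
    # or None when several definitions reach the use (hypothetical encoding).
    base_def: Optional[str]
    offset: int


def disambiguate(a, b, size=8):
    """Classify two memory references as "same", "different" or "unknown"."""
    if a.base_def is None or b.base_def is None or a.base_def != b.base_def:
        return "unknown"        # bases cannot be related by inspection
    if a.offset == b.offset:
        return "same"           # same base value, same offset
    if abs(a.offset - b.offset) >= size:
        return "different"      # accesses cannot overlap
    return "unknown"            # partial overlap


# I1 uses p0 as defined by I0; along the path through I3, p0 is I0's value plus 8.
at_i1 = Descriptor("I0", 0)
after_i3 = Descriptor("I0", 8)
# At I2 both I0's and I3's definitions of p0 reach the load.
at_i2 = Descriptor(None, 0)
```

Here `disambiguate(at_i1, at_i2)` gives "unknown", which is the failure the slide points at, while the per-path descriptors can each be resolved.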
Path-sensitive redundancy • Path-sensitive memory disambiguation • Established for only a subset of all the possible paths • Subsumes generic disambiguation • Path-sensitive LRE • Partial-LRE is now adapted for dealing with path-sensitive redundancies • Availability on edge (AVEDGij) • Diagram: ... I0 def p0 ... I1 load (p0),r1 move r1, r0 ... ... I3 add p0,8,p0 load (p0),r0 ... IØØ-def p0 ... move r0, r2 I2 load (p0),r2 ... √ x ---------------
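Availability on an edge can be illustrated with a textbook forward availability analysis for one load expression. The slide only names AVEDGij, so the formulation below, the function name `edge_availability`, and the three-block CFG are assumptions for illustration rather than the talk's exact equations. An edge where the value is available can be served by a register move; an edge where it is not needs an inserted speculative load.

```python
def edge_availability(cfg, gen, kill, entry):
    """Compute availability of one loaded value on every CFG edge.

    cfg:  block -> list of successor blocks
    gen:  blocks that leave the value available at their exit
    kill: blocks that may modify the address register or the location
    """
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)

    av_out = {b: True for b in cfg}  # optimistic start, lowered until stable
    changed = True
    while changed:
        changed = False
        for b in cfg:
            av_in = b != entry and all(av_out[p] for p in preds[b])
            new = b in gen or (av_in and b not in kill)
            if new != av_out[b]:
                av_out[b] = new
                changed = True
    # Availability on edge (i, j) is availability at the exit of i.
    return {(i, j): av_out[i] for i in cfg for j in cfg[i]}


# Hypothetical CFG: A loads (p0); B changes p0; C holds the candidate load.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
avedg = edge_availability(cfg, gen={"A"}, kill={"B"}, entry="A")
```

In this example the value is available on edge A→C but not on B→C, so the load in C is redundant only along the path that bypasses B.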
Store redundancy elimination (SRE) • Approach similar to LRE • SRE on EBBs • Full- and Partial-SRE • New formulation of the analysis • No path-sensitive elimination! • Elimination of dead stores • Other optimizations produce a lot of dead stores • Form of dead code elimination • Based on heuristics • Includes a basic analysis for useless stack locations • Diagrams: ... I1 store r1, (p0) ... I2 store r2, (p0) ... ---------------- ... I1 load (p0), r0 ... I2 store r0, (p0) ... ----------------
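The first diagram pattern, a store that is overwritten before it can be read, can be removed with a backward scan over straight-line code. The sketch below is an illustration under assumptions, not the talk's new analysis formulation: the instruction tuples and the name `remove_overwritten_stores` are hypothetical, all accesses are assumed to have the same width, and any load is conservatively treated as a possible read of every location.

```python
def remove_overwritten_stores(block):
    """Drop stores whose location is stored again before any possible read.

    Hypothetical instruction format:
      ("load", addr_reg, dst), ("store", src, addr_reg),
      ("move", src, dst), ("op", dst)
    """
    overwritten = set()  # address registers stored to later, with no read in between
    kept = []
    for ins in reversed(block):
        kind = ins[0]
        if kind == "store":
            _, _src, addr = ins
            if addr in overwritten:
                continue                 # a later store overwrites it first
            overwritten.add(addr)
        elif kind == "load":
            overwritten.clear()          # conservative: the load may read any location
        else:
            # The register is redefined here, so earlier uses of the same
            # register name refer to a different address.
            overwritten.discard(ins[-1])
        kept.append(ins)
    kept.reverse()
    return kept
```

Applied to `store r1, (p0); op r5; store r2, (p0)`, the first store is dropped. If a load or a redefinition of p0 sits between the two stores, both are kept.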
Methodology • Benchmark suite • SPECint95 • Compiled on an AlphaServer with full optimizations • Instrumented using Pixie to get profiling information • Aggressively re-optimized using Alto • Experimental framework • Alto executable optimizer • Evaluation • Dynamic number of loads/stores • Actual execution time • AlphaServer GS-140, Alpha EV6-21264
Execution time Relative execution time on an AlphaServer GS-140, Alpha EV6-21264 525MHz
Dynamic replay traps Relative number of replay traps on the sim-alpha simulator, modeling an Alpha EV6-21264
Summary • A high percentage of memory operations are redundant • Memory redundancy elimination (MRE) • Removal of redundant memory operations • Load redundancy elimination (LRE) in a path-sensitive fashion • Based on path-sensitive memory disambiguation • Store redundancy elimination (SRE) • Including elimination of dead stores • For executable code or at link time • Overcome limitations of traditional compilers • Valuable results on real execution time • Future directions • Explore better alias analysis mechanisms • Additional techniques for MRE
Load redundancy elimination (LRE) • I2 can be removed! • I1 loads a value from memory into r1 • I2 loads from the same location into r2 • Location (p0) is not modified between I1 and I2 • r1 can be safely bypassed to r2 • Diagram: ... I1 load (p0), r1; move r1, r0 (inserted) ... I2 load (p0), r2 (struck out), replaced by move r0, r2 ...
LRE on executable code • Alias analysis! • Is (p1) at I1 the same memory location as (p2) at I2? • Register liveness analysis! • Is there any available register between I1 and I2 that can be used to bypass r1 to r2? • Diagram: ... I1 load (p1), r1; move r1, r0 (inserted) ... I2 load (p2), r2 (struck out), replaced by move r0, r2 ...
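The register question can be answered with a backward liveness pass over the code between the two loads. The following is a minimal sketch for straight-line code; the `(uses, defs)` instruction encoding, the candidate pool, and the name `find_bypass_register` are assumptions for illustration. A register qualifies as the bypass register if its old value is not needed anywhere from just after I1 up to I2, and nothing in between overwrites it.

```python
def find_bypass_register(instrs, i1, i2, live_out, pool):
    """Return a register that can carry a value from instrs[i1] to instrs[i2].

    instrs:   list of (uses, defs) pairs, each a set of register names
    i1, i2:   indices of the first and the redundant load, with i1 < i2
    live_out: registers live at the end of the sequence
    pool:     candidate registers, tried in order
    """
    live_after = [None] * len(instrs)
    live = set(live_out)
    for k in range(len(instrs) - 1, -1, -1):   # standard backward liveness
        uses, defs = instrs[k]
        live_after[k] = set(live)
        live = (live - defs) | uses

    for reg in pool:
        old_value_dead = all(reg not in live_after[k] for k in range(i1, i2))
        not_clobbered = all(reg not in instrs[k][1] for k in range(i1 + 1, i2))
        if old_value_dead and not_clobbered:
            return reg
    return None                                 # no free register: LRE cannot be applied


# Hypothetical sequence: r1 is overwritten between the loads, so it cannot
# be forwarded directly and a separate bypass register is needed.
code = [
    ({"p1"}, {"r1"}),              # I1: load (p1), r1
    ({"r1", "r3"}, {"r1"}),        #     add r1, r3, r1
    ({"p2"}, {"r2"}),              # I2: load (p2), r2
    ({"r1", "r2", "r4"}, {"r5"}),  #     later use of r1, r2, r4
]
```

In this sequence r3 and r4 are still live after I1, so a pool of `["r3", "r4", "r0"]` yields r0.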
LRE: Eliminating close redundancy • For extended basic blocks (EBBs) • Alias analysis: for disambiguation • Register liveness analysis: for bypassing • Profile-guided LRE • There is not always a benefit in removing a redundant load • Need to evaluate cost-benefit of applying LRE! • Diagram, with a marked hot path: ... I1 load (p0), r1; move r1, r0 (inserted) ... I2 load (p0), r2 (struck out), replaced by move r0, r2 ...
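The cost-benefit question can be made concrete with a small profile-based estimate. This is a sketch, not the cost equation from the talk or from [HorspoolHo97]: the function name `lre_is_profitable` and the cycle costs (3 for a load, 1 for a move) are assumptions for illustration. The transformation inserts a move after I1 and turns I2 into a move, so it pays off only when I2 executes often enough relative to I1.

```python
def lre_is_profitable(freq_i1, freq_i2, load_cost=3, move_cost=1):
    """Decide whether to bypass I1's value to I2, given execution counts.

    freq_i1, freq_i2: profiled execution counts of the two loads
    load_cost, move_cost: assumed cycle costs, not measured values
    """
    # Each execution of I2 saves a load but pays for the replacement move.
    benefit = freq_i2 * (load_cost - move_cost)
    # Each execution of I1 pays for the inserted move into the bypass register.
    cost = freq_i1 * move_cost
    return benefit > cost
```

With these assumed costs, two loads that both run 1000 times are worth optimizing, while a hot I1 (1000 executions) feeding a cold I2 (100 executions) is not.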
LRE: Eliminating distant redundancy • For eliminating fully- and partially-redundant loads • Requires insertion of speculative loads • Dataflow analysis [HorspoolHo97] • Extended cost equation • Complex search for available registers • Diagram: load (p0), r0 move r0 ,r1 ---------------- move r1 ,r0 ... ... I2 load (p0),r1 ... I1 store r1 ,(p0) ... • [HorspoolHo97] R. N. Horspool and H. C. Ho. Partial redundancy elimination driven by a cost-benefit analysis, CSSE’97
Path-sensitive LRE • Path-sensitive redundancy • Redundancy occurs only on some execution paths • Partial-LRE is not sufficient • Memory disambiguation • Using register use-def chains • Symbolic descriptors for every use • Path-sensitive memory disambiguation is needed! • Diagram: ... I0 def p0 ... I1 load (p0),r1 ... ... I3 add p0,8,p0 ... IØØ-def p0 ... I2 load (p0),r2 ...
Path-sensitive memory disambiguation • Path-sensitive information • Disambiguation is established for only a subset of all the possible paths • For detecting path-sensitive exact memory dependencies • Partial-LRE • Algorithm is now adapted for dealing with path-sensitive redundancies • Availability on edge (AVEDGij) • Diagram: ... I0 def p0 ... I1 load (p0),r1 move r1, r0 ... ... I3 add p0,8,p0 load (p0),r0 ... IØØ-def p0 ... move r0, r2 I2 load (p0),r2 ... √ x ---------------
A combined algorithm • Short-distance MRE • Basic MRE within EBBs • Long-distance MRE • Full: Full-MRE • Partial: Partial-MRE • Complete: Path-sensitive LRE, Partial SRE, Dead store elimination • Phases shown in the diagram: Easy optimizations (including Basic-MRE), shown three times • Function inlining • Long-distance MRE (Full/Partial/Complete)