An overview of Sigma, a family of cache-analysis tools developed jointly by IBM and UMD, and DynSigma, which uses Dyninst to control memory reference tracing at runtime. Topics include compact trace generation with loop and stride extraction, memory profiling, cache prediction, restructuring hints, and sampled tracing.
Using Dyninst to Dynamically Control Memory Reference Tracing Jeff Odom
Sigma Goals • Collaboration between IBM and UMD • Family of tools to understand caches • Focus on detailed statistics • Complement existing hardware counters • Ability to handle real applications • MPI and OpenMP programs • Fortran and C • Provide hints about restructuring • Padding (both inter and intra data structures) • Blocking
Approach • Run instrumented program • Capture full information about memory use • Produce compact trace • Extract loops and memory strides • Post execution tools • Detailed simulator • Full discrete event simulator • Memory profiler • Share of accesses due to each data structure • Cache Prediction Tool • Predict cache misses using symbolic equations
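The memory profiler bullet above (share of accesses due to each data structure) can be illustrated with a minimal sketch. It assumes each data structure is described by a name, start address, and size, and that the trace is a flat list of addresses. `Region` and `accessShares` are hypothetical names chosen for illustration, not part of Sigma.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one data structure's address range.
struct Region {
    std::string name;
    std::uint64_t start;  // first byte of the structure
    std::uint64_t size;   // size in bytes
};

// Return, for each region name, the fraction of trace addresses that fall
// inside it. Addresses matching no region are counted under "(other)".
std::map<std::string, double> accessShares(
        const std::vector<Region>& regions,
        const std::vector<std::uint64_t>& trace) {
    std::map<std::string, double> shares;
    if (trace.empty()) return shares;

    std::map<std::string, std::size_t> counts;
    for (std::uint64_t addr : trace) {
        std::string owner = "(other)";
        for (const Region& r : regions) {
            assert(r.size > 0);
            if (addr >= r.start && addr - r.start < r.size) {
                owner = r.name;
                break;  // first matching region wins
            }
        }
        ++counts[owner];
    }
    for (const auto& kv : counts)
        shares[kv.first] = static_cast<double>(kv.second) / trace.size();
    return shares;
}
```

A linear scan over regions keeps the sketch short. A real profiler handling many variables would more likely use a sorted range lookup.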
Representing Program Execution • Capture full execution behavior • Record all basic blocks and memory addresses • Produces large traces (due to looping) • Trace compression • Maintain pattern buffer • Scan for repeating patterns • Extract memory strides • Repeat algorithm for nested loops [Figure: compressed trace layout, a repeat record (RPT) enclosing basic-block (BLK) and address (ADR) entries, with fields Base, Count, Length, Stride]
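As an illustration of the stride-extraction step, the sketch below collapses a raw address sequence into (base, count, stride) records and expands them back. It is a simplified greedy pass, not Sigma's actual algorithm: it does not model the pattern buffer or nested-loop repeats, and `StrideRun`, `compressStrides`, and `expandStrides` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One compressed record: `count` addresses starting at `base`,
// each `stride` bytes after the previous one.
struct StrideRun {
    std::uint64_t base;
    std::size_t count;
    std::int64_t stride;
};

// Greedy single pass: extend each run while the address delta stays constant.
std::vector<StrideRun> compressStrides(const std::vector<std::uint64_t>& addrs) {
    std::vector<StrideRun> runs;
    std::size_t i = 0;
    while (i < addrs.size()) {
        StrideRun run{addrs[i], 1, 0};
        if (i + 1 < addrs.size()) {
            run.stride = static_cast<std::int64_t>(addrs[i + 1] - addrs[i]);
            std::size_t j = i + 1;
            while (j < addrs.size() &&
                   static_cast<std::int64_t>(addrs[j] - addrs[j - 1]) == run.stride) {
                ++j;
            }
            run.count = j - i;
        }
        runs.push_back(run);
        i += run.count;
    }
    return runs;
}

// Inverse of compressStrides: regenerate the raw address sequence.
std::vector<std::uint64_t> expandStrides(const std::vector<StrideRun>& runs) {
    std::vector<std::uint64_t> out;
    for (const StrideRun& r : runs) {
        assert(r.count > 0);
        for (std::size_t k = 0; k < r.count; ++k)
            out.push_back(r.base + static_cast<std::uint64_t>(r.stride) * k);
    }
    return out;
}
```

Regular loop accesses shrink to a single record regardless of trip count, while irregular accesses degrade toward one record per one or two addresses. This is why regularity of the data matters so much for trace size.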
Not Enough • A few seconds of execution generates gigabytes of trace • Regularity of data critical to compression • Lossy tracing • Statistically “rebuild” trace from sampled set
Sampling • Leverages Sigma • Most scientific apps are loop based • Regular data access gives better compression • Time step boundary • Outermost loop • Non-uniform memory access OK
Sigma + Dyninst • Dyninst natural choice • Vary sample rate without recompilation • Adaptive/progressive rate during execution • Leverage existing Sigma infrastructure • Only generate trace • Offline simulation step unchanged
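The idea of a sample rate that can change during execution can be sketched as a small controller consulted at each time step boundary. This is an assumption-laden illustration, not DynSigma's code: `StepSampler` is a hypothetical name, and the sketch only decides whether a step is traced. In a Dyninst-based tool, a true result would presumably correspond to having the tracing instrumentation active for that step.

```cpp
#include <cassert>

// Decides, per time step, whether tracing should be enabled.
// A credit accumulator spreads traced steps evenly instead of sampling randomly.
class StepSampler {
public:
    explicit StepSampler(double rate) { setRate(rate); }

    // The rate may be changed at any time step boundary, with no rebuild needed.
    void setRate(double rate) {
        assert(rate >= 0.0 && rate <= 1.0);
        rate_ = rate;
    }

    // Call once at the start of each time step; true means "trace this step".
    bool beginStep() {
        credit_ += rate_;
        if (credit_ >= 1.0) {
            credit_ -= 1.0;
            return true;
        }
        return false;
    }

private:
    double rate_ = 0.0;
    double credit_ = 0.0;
};
```

An adaptive policy would call `setRate` based on feedback, such as how well recent steps compressed. That policy is left out here because the slides do not specify one.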
DynSigma • Mutator parses executable, inserts instrumentation, generates aux files • Instructions/module • Stack/global variables • Functions/line # • Group points by basic block (NEW) • Find load/store instrumentation via BPatch_basicBlock::findPoint() • Mutatee generates trace • Inserted Sigma library
Sample Application • Seismic simulation from SPEC-HPC 2002 • Models multiple seismic processes • Process results pipelined • Variable time steps • Different data pattern for each process • C & Fortran • Fortran – data processing • C – dynamic memory management, IO
Why go to all the trouble? • How about just one time step?
Size does matter • Includes 0:12 mutator overhead
Conclusions • Compressed traces may be very large for short runtimes • Sampling single time step no good • Concentrate on main processing loop • Small (1%) samples accurate enough
Ongoing & Future Work • Measure another application • Determining time steps at runtime • Extending code coverage with counters • Adaptive sampling rates • Multi-pass memory profiling • Irregular accesses • Sampling • Multithreaded applications