
Using Dyninst to Dynamically Control Memory Reference Tracing

Slides on Sigma, a collaboration between IBM and UMD on a family of tools for understanding cache behavior, and on DynSigma, which uses Dyninst to control memory reference tracing at runtime. Topics include trace compression, sampling, memory profiling, and cache prediction.





Presentation Transcript


  1. Using Dyninst to Dynamically Control Memory Reference Tracing Jeff Odom

  2. Sigma Goals • Collaboration between IBM and UMD • Family of tools to understand caches • Focus on detailed statistics • Complement existing hardware counters • Ability to handle real applications • MPI and OpenMP programs • Fortran and C • Provide hints about restructuring • Padding (both inter- and intra-data structure) • Blocking

  3. Approach • Run instrumented program • Capture full information about memory use • Produce compact trace • Extract loops and memory strides • Post-execution tools • Detailed simulator • Full discrete event simulator • Memory profiler • Share of accesses due to each data structure • Cache Prediction Tool • Predict cache misses using symbolic equations
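The memory profiler bullet above (share of accesses due to each data structure) can be illustrated with a minimal sketch. It assumes the trace has already been expanded into a flat list of addresses and that each data structure is described by a hypothetical `(name, base, size)` tuple; neither the function name `access_shares` nor this input layout comes from Sigma itself.

```python
from bisect import bisect_right
from collections import Counter


def access_shares(regions, addresses):
    """Return the fraction of accesses that fall in each named region.

    regions   -- list of (name, base, size) tuples; assumed not to overlap
    addresses -- iterable of integer addresses taken from a trace
    """
    ordered = sorted(regions, key=lambda r: r[1])
    bases = [base for _, base, _ in ordered]
    counts = Counter()
    for addr in addresses:
        # Find the last region whose base is <= addr.
        i = bisect_right(bases, addr) - 1
        if i >= 0:
            name, base, size = ordered[i]
            if addr < base + size:
                counts[name] += 1
                continue
        # Address belongs to no known data structure.
        counts["<unknown>"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}
```

A real profiler would weight these counts by simulated cache cost rather than raw access counts; the sketch only shows the attribution step.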

  4. Representing Program Execution • Capture full execution behavior • Record all basic blocks and memory addresses • Produces large traces (due to looping) • Trace compression • Maintain pattern buffer • Scan for repeating patterns • Extract memory strides • Repeat algorithms for nested loops [Figure: compressed trace record (RPT) containing basic blocks BLK1 to BLK3 and their address (ADR) entries; labeled fields: Base, Count, Length, Stride]
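As a rough illustration of the stride-extraction step, the sketch below collapses a single stream of addresses into `(base, count, stride)` runs and expands them back. It is a simplified stand-in, not Sigma's algorithm: it ignores basic blocks, the pattern buffer, and nested loops, and the function names are hypothetical.

```python
def compress_strides(addresses):
    """Greedily collapse an address list into (base, count, stride) runs."""
    runs = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            # A single trailing address forms a run of length one.
            runs.append((base, 1, 0))
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while the same stride keeps repeating.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        runs.append((base, count, stride))
        i += count
    return runs


def expand_strides(runs):
    """Rebuild the original address list from (base, count, stride) runs."""
    out = []
    for base, count, stride in runs:
        out.extend(base + k * stride for k in range(count))
    return out
```

For a regular loop the run list stays small no matter how many iterations execute, which is why regular data access compresses well; irregular accesses degrade toward one run per one or two addresses.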

  5. Not Enough • A few seconds of execution generates gigabytes of trace • Regularity of data critical to compression • Lossy tracing • Statistically “rebuild” trace from sampled set

  6. Sampling • Leverages Sigma • Most scientific apps are loop based • Regular data access gives better compression • Time step boundary • Outermost loop • Non-uniform memory access OK

  7. Sigma + Dyninst • Dyninst natural choice • Vary sample rate without recompilation • Adaptive/progressive rate during execution • Leverage existing Sigma infrastructure • Only generate trace • Offline simulation step unchanged
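The idea of sampling at time step boundaries with a rate that can change during execution might be modeled as follows. This is only a sketch of the decision logic, under the assumption that something calls `should_trace()` once per iteration of the outermost loop; the class name and its methods are hypothetical, and the actual insertion or removal of instrumentation through Dyninst is not shown.

```python
from fractions import Fraction


class TimeStepSampler:
    """Decide, once per time step, whether that step should be traced."""

    def __init__(self, rate):
        self._credit = Fraction(0)
        self.set_rate(rate)

    def set_rate(self, rate):
        """Change the fraction of time steps traced; may be called mid-run."""
        rate = Fraction(rate)
        if not 0 < rate <= 1:
            raise ValueError("rate must be in (0, 1]")
        self.rate = rate

    def should_trace(self):
        """Accumulate credit each step and trace when a whole unit is reached."""
        self._credit += self.rate
        if self._credit >= 1:
            self._credit -= 1
            return True
        return False


# Example: trace one step in four, then switch to one in two without restarting.
sampler = TimeStepSampler(Fraction(1, 4))
first = [sampler.should_trace() for _ in range(8)]
sampler.set_rate(Fraction(1, 2))
second = [sampler.should_trace() for _ in range(4)]
```

Exact fractions are used instead of floats so that a rate such as 1/100 traces exactly one step in every hundred, with no rounding drift.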

  8. DynSigma • Mutator parses executable, inserts instrumentation, generates aux files • Instructions/module • Stack/global variables • Functions/line # • Group points by basic block (NEW) • Find load/store instrumentation via BPatch_basicBlock::findPoint() • Mutatee generates trace • Inserted Sigma library

  9. Sample Application • Seismic simulation from SPEC-HPC 2002 • Models multiple seismic processes • Process results pipelined • Variable time steps • Different data pattern for each process • C & Fortran • Fortran – data processing • C – dynamic memory management, IO

  10. L1 cache memtime by data structure

  11. L2 cache memtime by data structure

  12. L1 + L2 memtime by data structure

  13. L1 + L2 memtime by data structure

  14. Why go to all the trouble? • How about just one time step?

  15. Size does matter • Includes 0:12 mutator overhead

  16. Conclusions • Compressed traces may be very large for short runtimes • Sampling single time step no good • Concentrate on main processing loop • Small (1%) samples accurate enough

  17. Ongoing & Future Work • Measure another application • Determining time steps at runtime • Extending code coverage with counters • Adaptive sampling rates • Multi-pass memory profiling • Irregular accesses • Sampling • Multithreaded applications
