Instrumentation and Performance Analysis for Finding Memory Bottlenecks
Jeff Hollingsworth, Luiz Derose, K Ekanadham
University of Maryland, Thomas J. Watson Research Center
Using Data Cache Sampling • Hardware Requirements: • Periodic interrupt on cache miss • Ability to determine miss address • Associate count with each object • Variable or dynamically allocated memory • Interrupt after every n cache misses • Obtain address of miss • Find object containing it and increment count • Advantage: simplicity
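The sampling loop on this slide (interrupt after every n misses, obtain the miss address, find the containing object, increment its count) can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the class name `MissSampler`, the `(name, start, size)` object description, and the assumption that objects do not overlap are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter


class MissSampler:
    """Attribute every n-th cache miss to the data object containing its address."""

    def __init__(self, objects, interval):
        # objects: iterable of (name, start_address, size_in_bytes).
        # Assumption: the address ranges do not overlap.
        self.ranges = sorted((start, start + size, name) for name, start, size in objects)
        self.starts = [r[0] for r in self.ranges]
        self.interval = interval
        self.pending = 0          # misses seen since the last sample
        self.counts = Counter()   # per-object sample counts

    def lookup(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, end, name = self.ranges[i]
            if addr < end:
                return name
        return None

    def on_miss(self, addr):
        """Stand-in for the miss interrupt handler: sample every interval-th miss."""
        self.pending += 1
        if self.pending < self.interval:
            return
        self.pending = 0
        name = self.lookup(addr)
        if name is not None:
            self.counts[name] += 1


# Hypothetical objects and miss addresses, for illustration only.
sampler = MissSampler([("A", 0x1000, 0x100), ("B", 0x2000, 0x80)], interval=2)
for miss_addr in [0x1000, 0x1010, 0x2000, 0x2004, 0x3000, 0x10FF]:
    sampler.on_miss(miss_addr)
```

The sorted range table with binary search keeps each lookup logarithmic in the number of objects, which matters because the lookup runs inside the interrupt path.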
Experimental Evaluation • Implemented in simulation • Simulator uses ATOM binary rewriting tool • Instrument load/stores for cache simulation • Instrument basic blocks for virtual cycle count • Simulates necessary hardware support • Sampling and n-way search run under simulation • Tested using SPEC 95 applications • tomcatv, swim, su2cor, mgrid, applu, compress, ijpeg • sampled 1 in 50,000 misses
Quality of Results

Application  Variable  Actual Rank  Actual %  Sample Rank  Sample %
tomcatv      RY        1            22.5      2            17.6
tomcatv      RX        2            22.5      1            37.1
tomcatv      AA        3            15.0      5            10.1
tomcatv      DD        4            10.0      3            15.0
tomcatv      X         5            10.0      6            9.8
tomcatv      Y         6            10.0      7            0.2
tomcatv      D         7            10.0      4            10.2
applu        A         1            22.9      2            23.0
applu        B         2            22.9      3            19.9
applu        C         3            22.6      1            25.8
applu        D         4            17.4      4            16.7
applu        rsd       5            6.9       5            7.7
Varying Sampling Interval

Application  Variable  Actual Rank  Actual %  Sample Rank  Sample %
tomcatv      RY        1            22.5      1            22.6
tomcatv      RX        2            22.5      2            22.5
tomcatv      AA        3            15.0      3            15.6
tomcatv      DD        4            10.0      7            9.4
tomcatv      X         5            10.0      6            9.7
tomcatv      Y         6            10.0      4            10.5
tomcatv      D         7            10.0      5            9.8

Lesson re-learned: randomly vary sampling interval
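Why a randomly varied interval helps can be shown with a toy miss stream. The sketch below assumes a hypothetical program in which four arrays miss in strict rotation; the stream, the interval of 8, and the jitter range of 4 to 12 are illustrative assumptions, not values from the experiments. A fixed interval that is a multiple of the rotation period always lands on the same array, while a jittered interval spreads samples across all of them.

```python
import random
from collections import Counter


def sample_misses(miss_stream, next_interval):
    """Count the object hit by each sampled miss.

    next_interval() returns the number of misses to wait before the next sample.
    """
    counts = Counter()
    countdown = next_interval()
    for obj in miss_stream:
        countdown -= 1
        if countdown == 0:
            counts[obj] += 1
            countdown = next_interval()
    return counts


# Hypothetical miss stream: four arrays miss in strict rotation.
stream = ["RX", "RY", "AA", "DD"] * 10_000

# Fixed interval: 8 is a multiple of the rotation period, so sampling aliases.
fixed = sample_misses(stream, lambda: 8)

# Randomly varied interval with the same mean (seeded for repeatability).
rng = random.Random(0)
jittered = sample_misses(stream, lambda: rng.randint(4, 12))

print("fixed   :", dict(fixed))
print("jittered:", dict(jittered))
```

With the fixed interval every sample falls on one array; with the jittered interval each array receives roughly a quarter of the samples, which matches its true share of misses in this toy stream.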
Sigma Goals • A research project • Less of a production tool than others from ACTC • Family of tools to understand caches • Focus on detailed statistics • Complement existing hardware counters • Ability to handle real applications • MPI and OpenMP programs • Fortran and C • Provide hints about restructuring • Padding (both inter and intra data structures) • Blocking
Approach • Run instrumented program • Capture full information about memory use • Produce compact trace • Extracts loops and memory strides • Post execution tools • Memory profiler • share of accesses due to each data structure • Cache Prediction Tool • Predict cache misses using symbolic equations • Detailed simulator • Full discrete event simulator
Cache Prediction Tool • Predict cache misses • Operate on compact traces • Only expand to full trace if needed • Use algorithms developed for compilers • Re-use vectors • Cache miss equations • Capacity, cold, and conflict misses are identified
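To make the three miss categories concrete, here is a small sketch that classifies misses by simulation. Note that this swaps in a different technique from the one on the slide: the tool predicts misses from re-use vectors and cache miss equations, whereas this sketch replays an address trace through a direct-mapped cache and uses the conventional definitions (cold: first touch of a block; capacity: would also miss in a fully associative LRU cache of the same size; conflict: everything else). The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def classify_misses(addresses, num_lines, line_size):
    """Classify accesses to a direct-mapped cache as hit, cold, capacity, or conflict."""
    seen = set()                  # every block touched so far (detects cold misses)
    lru = OrderedDict()           # fully associative LRU cache with the same capacity
    direct = [None] * num_lines   # direct-mapped cache: one block per line
    result = {"hit": 0, "cold": 0, "capacity": 0, "conflict": 0}

    for addr in addresses:
        block = addr // line_size

        # Reference model: would a fully associative LRU cache have hit?
        fa_hit = block in lru
        if fa_hit:
            lru.move_to_end(block)
        else:
            lru[block] = True
            if len(lru) > num_lines:
                lru.popitem(last=False)   # evict the least recently used block

        idx = block % num_lines
        if direct[idx] == block:
            result["hit"] += 1
        elif block not in seen:
            result["cold"] += 1
        elif fa_hit:
            result["conflict"] += 1       # only the mapping caused this miss
        else:
            result["capacity"] += 1       # even full associativity would miss
        direct[idx] = block
        seen.add(block)
    return result


# Two blocks that map to the same line of a 4-line cache with 16-byte lines.
ping_pong = classify_misses([0, 64, 0, 64], num_lines=4, line_size=16)
# Five distinct blocks cycled through a 4-line cache.
overflow = classify_misses([0, 16, 32, 48, 64, 0], num_lines=4, line_size=16)
```

Conflict misses of the kind seen in `ping_pong` are the ones that padding can remove, while capacity misses like the one in `overflow` call for blocking.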
Iteration Space • Re-use vectors • define points in the iteration space that access the same data • Miss equations • describe points in the iteration space that cause misses on conflicts
Structure of SIGMA Data Collection
[Figure: block diagram of the data-collection pipeline. Labels: source files, Sigma Compile/Link, .lst files, dumpMap, .addr, Instrumented binary, Program Execution, trace files, Cache Simulator, Prediction Tool, MemoryRef Tool]
Representing Program Execution • Capture full execution behavior • Record all basic blocks and memory addresses • Produces large traces (due to looping) • Trace compression • Maintain pattern buffer • Scan for repeating patterns • Extract memory strides • Repeat algorithms for nested loops
[Figure: example compressed trace made of RPT, BLK1, BLK2, BLK3, and ADR records; field labels: Base, Count, Length, Stride]
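The stride-extraction step can be illustrated with a greedy run detector. This is a simplified sketch under stated assumptions: it handles only a flat address sequence (no basic-block records, pattern buffer, or nested-loop repeats), and the `(base, count, stride)` record is a hypothetical subset of the fields named on the slide.

```python
def compress_strides(addresses):
    """Greedily fold an address sequence into (base, count, stride) records."""
    records = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            # A single trailing address: no stride to infer.
            records.append((base, 1, 0))
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while consecutive addresses keep the same stride.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        records.append((base, count, stride))
        i += count
    return records


def expand(records):
    """Rebuild the original address sequence from compressed records."""
    return [base + k * stride for base, count, stride in records for k in range(count)]


# Hypothetical trace: two strided array sweeps followed by a lone access.
trace = [100, 104, 108, 112, 300, 304, 308, 500]
records = compress_strides(trace)
print(records)
```

The compression is lossless, since `expand(records)` reproduces the input, and the number of records tracks the regularity of the access pattern rather than the loop trip count.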
Trace Information • Compression ratio a function of regularity • Slowdown depends on fraction of instructions that load/store memory
Cache Prediction Tool • Use compressed traces • Convert memory refs back to array refs • Solve Cache Miss Equations • compute re-use vectors • define misses as a system of linear equations • use Omega library to solve • Provides • count of misses • information about iterations that cause misses
Using Dyninst to Gather Data • Extend dyninst to support memory ops • Load/store/prefetch instrumentation points • Done and working on Power and SPARC • Extend dyninst AST to include effective addr • Allows code to use memory address • Dyninst for SIGMA Instrumentation provides • Multi-platform support • Dynamic control of instrumentation • Selection of specific functions, loops, memory ops • Possible use of CFGs to optimize instrumentation