The SIGMA Tools
Jeff Hollingsworth (University of Maryland)
Luiz DeRose and K. Ekanadham (IBM Research)
SIGMA Goals
• Family of tools to understand caches
  • Focus on detailed statistics
  • Complement existing hardware counters
• Ability to handle real applications
  • MPI and OpenMP programs
  • Fortran and C
• Provide hints about restructuring
  • Padding (both inter and intra data structures)
  • Blocking
Approach
• Run instrumented program
  • Capture full information about memory use
  • Produce compact trace
  • Extract loops and memory strides
• Post-execution tools
  • Memory profiler
    • Share of accesses due to each data structure
  • Cache Prediction Tool
    • Predict cache misses using symbolic equations
  • Detailed simulator
    • Full discrete event simulator
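As a rough illustration of what the memory profiler reports, the sketch below attributes each traced address to a data structure by address range and computes each structure's share of accesses. The symbol table layout, the addresses, and the name access_shares are hypothetical; they are not SIGMA's actual file formats or interfaces.

```python
from bisect import bisect_right
from collections import Counter

def access_shares(symbols, addresses):
    """Return {structure name: fraction of accesses}.

    symbols   -- list of (name, start_address, size_in_bytes); hypothetical layout
    addresses -- iterable of traced memory addresses
    """
    ordered = sorted(symbols, key=lambda s: s[1])
    starts = [s[1] for s in ordered]
    counts = Counter()
    for addr in addresses:
        # Find the last structure whose start address is <= addr.
        idx = bisect_right(starts, addr) - 1
        if idx >= 0:
            name, start, size = ordered[idx]
            if addr < start + size:
                counts[name] += 1
                continue
        counts["<unknown>"] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}

# Made-up example: two arrays and four traced accesses.
symbols = [("a", 0x1000, 0x400), ("b", 0x2000, 0x400)]
trace = [0x1000, 0x1008, 0x2000, 0x1010]
shares = access_shares(symbols, trace)
```

Here three of the four accesses fall inside "a", so its share is 0.75 and "b" gets 0.25.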
Structure of SIGMA Data Collection
[Figure: data collection flow. Components shown: source files, SigmaCompile/Link, .lst files, dumpMap, .addr, instrumented binary, program execution, trace files, Cache Simulator, Prediction Tool, MemoryRef Tool]
New Dyninst Features for SIGMA
• Fortran Common Blocks
  • Class BPatch_cblock
    • Represents a unique definition of a common block
    • getComponents – returns members of the common block
    • getFunctions – returns functions that define this block
  • Class BPatch_type
    • getCblocks – returns list of BPatch_cblock
• Global Variables
  • Named common blocks now visible
• Fortran-specific Debug Symbols
  • Now parsed and visible
Representing Program Execution
• Capture full execution behavior
  • Record all basic blocks and memory addresses
  • Produces large traces (due to looping)
• Trace compression
  • Maintain pattern buffer
  • Scan for repeating patterns
  • Extract memory strides
  • Repeat algorithm for nested loops
[Figure: example compressed trace. Record sequence: RPT BLK1 ADR ADR ADR BLK2 ADR ADR BLK3. Field labels: Base, Count, Length, Stride. Sample values: 250, 100, 200, 300, 300, 500 and 7, 4, 4, 4, 4, 4]
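The stride-extraction step can be sketched as follows. This is a simplified, single-level illustration that collapses runs of constant-stride addresses into (base, count, stride) descriptors; it does not model the pattern buffer, basic-block records, or nested-loop handling, and the function names are hypothetical rather than SIGMA's own.

```python
def compress_strides(addresses):
    """Collapse an address stream into (base, count, stride) runs (greedy, lossless)."""
    runs = []
    i, n = 0, len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            # A single trailing address: store it as a run of length 1.
            runs.append((base, 1, 0))
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while successive differences match the stride.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        runs.append((base, count, stride))
        i += count
    return runs

def expand(runs):
    """Inverse of compress_strides: regenerate the full address stream."""
    return [base + k * stride for base, count, stride in runs for k in range(count)]

# Made-up addresses: two strided runs followed by one isolated access.
addrs = [100, 104, 108, 112, 300, 308, 316, 500]
runs = compress_strides(addrs)
```

Regular loops yield a few long runs, while irregular access patterns degrade toward one descriptor per address or pair of addresses.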
Trace Information
• Compression ratio is a function of regularity
• Slowdown depends on the fraction of instructions that load/store memory
Using SIGMA Trace Generation
• Compiling - modify makefile
  • .f to .o rules
    • prepend $(SIGMA)/bin/sigmaCompile $<
  • Link step
    • prepend $(SIGMA)/bin/sigmaLink
• Running
  • Two environment variables
    • SIGMA_TRACELEVEL
    • SIGMA_TRACEDIR
• Selected instrumentation
  • Only sigmaCompile selected files
    • No overhead for uninstrumented files
  • Explicit calls to enable/disable
    • Some overhead remains
Cache Prediction Tool
• Use compressed traces
  • Convert memory refs back to array refs
• Compute Miss Equations
  • Re-use vectors (Ghosh & Martonosi)
  • Direct set of linear constraints (Chatterjee et al.)
• To Compute Misses
  • Define misses as a system of linear equations
  • Use Omega library to solve
• Provides
  • Count of misses
  • Information about iterations that cause misses
Iteration Space
• Re-use vectors
  • Define points in the iteration space that access the same data
• Miss equations
  • Describe points in the iteration space that cause misses on conflicts
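To make the idea of a re-use vector concrete, the sketch below enumerates a small two-level iteration space and records the difference between any two iterations that touch the same array element. This is brute-force enumeration for illustration only, not the symbolic derivation used in compilers; the function name and the example references A[i1][i2] and A[i1-1][i2] are assumptions.

```python
from collections import defaultdict

def reuse_vectors(refs, n1, n2):
    """Return the set of (d1, d2) vectors between iterations touching the same element.

    refs -- list of functions mapping an iteration (i1, i2) to an element index
    """
    seen = defaultdict(list)   # element -> iterations that already accessed it
    vectors = set()
    for i1 in range(n1):
        for i2 in range(n2):
            for ref in refs:
                elem = ref(i1, i2)
                for (j1, j2) in seen[elem]:
                    vectors.add((i1 - j1, i2 - j2))
                seen[elem].append((i1, i2))
    return vectors

# Hypothetical loop body that reads A[i1][i2] and A[i1-1][i2].
refs = [lambda i1, i2: (i1, i2), lambda i1, i2: (i1 - 1, i2)]
vectors = reuse_vectors(refs, 4, 4)
```

For this pair of references the only vector found is (1, 0): the element read as A[i1][i2] is read again one outer-loop iteration later as A[i1-1][i2].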
Predicting Cache Misses
• Operate on compact traces
  • Only expand to full trace if needed
• Use algorithms developed for compilers
  • Re-use vectors
  • Cache miss equations
• Miss types are identified
  • Capacity, cold, and conflict
Cache Terminology
• Memory consists of lines L
• Cache is k-way associative
• Each line maps to a set S
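A minimal sketch of the line and set mapping, assuming the usual modulo placement. The line size and number of sets are illustrative defaults, not values taken from the slides.

```python
def line_and_set(addr, line_size=64, num_sets=128):
    """Map a byte address to (memory line number, cache set index).

    line_size and num_sets are assumed example parameters.
    """
    line = addr // line_size       # which memory line holds this byte
    return line, line % num_sets   # lines are spread across sets by modulo

first = line_and_set(0x12345)
# An address exactly line_size * num_sets bytes away is a different line in the same set.
second = line_and_set(0x12345 + 64 * 128)
```

Two lines that land in the same set compete for that set's ways, which is what makes conflicts possible.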
Array References
• A reference Rv(i1,i2) refers to
  • the vth array reference in a loop
  • the i1th iteration of the outer loop
  • the i2th iteration of the inner loop
• Rv(i1,i2) precedes Ru(j1,j2) if
  • i1 < j1, or
  • i1 = j1 and i2 < j2, or
  • i1 = j1 and i2 = j2 and v < u
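The precedence rule is a lexicographic order on (outer iteration, inner iteration, reference number), which the sketch below expresses with tuple comparison. The function name and argument layout are illustrative.

```python
def precedes(v, i, u, j):
    """True if reference Rv(i1, i2) executes before Ru(j1, j2).

    v, u -- position of the reference within the loop body
    i, j -- (outer, inner) iteration pairs
    """
    # Tuples compare lexicographically: outer index, then inner index, then reference number.
    return (i[0], i[1], v) < (j[0], j[1], u)

earlier_outer = precedes(1, (0, 5), 0, (1, 0))   # i1 < j1
same_iter = precedes(0, (2, 3), 1, (2, 3))       # same iteration, v < u
reversed_refs = precedes(1, (2, 3), 0, (2, 3))   # same iteration, v > u
```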
A Replacement Miss
• There exists a reference Ra(i1,i2) such that
  • Ra(i1,i2) refers to line L and maps to set S
• There exists another reference Rb(j1,j2) such that
  • Rb(j1,j2) refers to line L and maps to set S
  • Rb(j1,j2) precedes Ra(i1,i2)
• There exist at least k references Rn(k1,k2), where k is the cache associativity, such that
  • Rn(k1,k2) maps to set S
  • Rn(k1,k2) refers to line Ln, where Ln is distinct from all other Ln’s and from L
  • Rb(j1,j2) precedes Rn(k1,k2), which precedes Ra(i1,i2)
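The condition above can be checked by brute force on an expanded trace, as a contrast to the symbolic approach. The sketch below assumes LRU replacement (the slides do not state the policy) and modulo set mapping; it labels each access as cold, replacement, or hit by counting the distinct other lines touched in the same set since the previous access to the line. The function name is hypothetical.

```python
def classify_accesses(lines, num_sets, assoc):
    """Label each access in a sequence of memory line numbers.

    Assumes LRU replacement and set = line % num_sets.
    """
    recency = {}   # set index -> distinct lines seen, least recently used first
    labels = []
    for line in lines:
        stack = recency.setdefault(line % num_sets, [])
        if line not in stack:
            labels.append("cold")            # first touch of this line
        else:
            # Distinct other lines in this set touched since the last access to `line`.
            distance = len(stack) - 1 - stack.index(line)
            labels.append("replacement" if distance >= assoc else "hit")
            stack.remove(line)
        stack.append(line)                   # `line` is now most recently used
    return labels

# One set, 2-way associative: line 0 is evicted after lines 2 and 3 are touched.
labels = classify_accesses([0, 1, 0, 2, 3, 0, 3], num_sets=1, assoc=2)
```

Tallying the labels per reference, or summing them over a loop nest, gives miss counts of the kind the prediction tool reports.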
Using Miss Data
• For each reference, get
  • Set of iterations that produce cold misses
  • Set of iterations that produce replacement misses
• Counting misses
  • Can count misses at each reference
  • Combined counts for a loop nest
Status
• Trace generation running
• Cache prediction running for small loops
Future Work
• Multiple loop nests
• Multi-level caches
• Irregular programs