Performance Modeling and Analysis with PEBIL

Performance Modeling and Analysis with PEBIL Michael Laurenzano, AnantaTiwari, Laura Carrington Performance Modeling and Characterization (PMaC) Laboratory San Diego Supercomputer Center

Outline • Motivation • Performance modeling in High Performance Computing (HPC) • How does binary instrumentation fit in? • PEBIL = PMaC’sEfficient Binary Instrumentation for Linux/x86 • Binary instrumentation overview • Use case: memory tracing • Use case: function profiling

HPC Target System PMaC HPC Performance Models Performance Model – a calculable expression of the runtime, efficiency, memory use, etc. of an HPC program on some machine HPC Target System HPC Application HPC Application Application signature – detailed summaries of the fundamental operations to be carried out by the application Machine Profile – characterizations of the rates at which a machine can carry out fundamental operations Requirements of HPC Application – Application Signature Characteristics of HPC system – Machine Profile Measured or projected via simple benchmarks on 1-2 nodes of the system Collected via trace tools Performance of Application on Target system Convolution Methods map Application Signatures to Machine Profiles produce performance prediction

Application Signature • Application signature – fundamental operations used by the application • Requires low-level details of application • Details attached to specific structures within the application • Measurement? (e.g. timers or hardware counters) • Measuring at fine grain with reasonable overheads & transparently is HARD Use binary instrumentation

Binary Instrumentation • Instrumentation – inserting extra code into a program, usually to inspect some aspect of behavior • Binary instrumentation – instrumentation of the compiled object/executable void incrementby(int& n, int c){ counter++; // instrumentation code n += c; }

The Case for Binary Instrumentation • Low-level details of application • Program is in its binary form • Compilers transform and optimize • Basic program structure • Memory access • Vectorization • Data dependencies • The executable might be all we have • Easy to tie details to application structures int identity(int n){ int c = 0; while (c < n) c++; return c; } int identity(int n){ return n; }

Runtime Overhead is a Big Deal • PEBIL… the E stands for Efficient • We want to model real HPC applications • Relatively long runtimes: minutes, hours, days? • Lots of CPUS: O(105) in largest supercomputers • High slowdowns create problems • Too long for queue • Unsympathetic administrators/managers • Inconvenience • Unnecessarily use resources Mitigate problems by minimizing runtime overhead

Example Use Cases • Memory address trace collection • Capture all application loads/stores • Use a buffer, batch process them • Very widely used • Performance/energy models (e.g., PMaC) • Cache design • Memory bug detection • For efficiency, this is often used with sampling • Function/loop measurement • Insert calls to measurement routines around functions/loops • TAU uses this feature

PEBIL Design • Efficiency is priority #1 • Designed around a few use cases • Execution counting • Memory tracing • Static binary rewriter • Write instrumented + runnable executable to disk • Keep original behavior intact • Gather information as a side-effect • Instrument once, run many times • No instrumentation cost at runtime • Code patching (not just-in-time compiled!)

How Binary Instrumentation Works (Basic block counting) Original Instrumented 0000c000 <foo>: c000: 48 89 7d f8 mov %rdi,-0x8(%rbp) c004: 5e pop %rsi c005: 75 f8 jne 0xc004 c007: c9 leaveq c008: c3 retq Basic Block 1 Basic Block 2 Basic Block 3 0000d000 <foo>: d000: e9 de ad be efjmp 0x1000 # to instrumentation d005: 48 89 7d f8 mov %rdi,-0x8(%rbp) d000: e9 de ad be efjmp 0x1010 # to instrumentation d00a: 5e pop %rsi d00b: 75 00 00 00 f8 jne 0xd009 d000: e9 de ad be efjmp 0x1020 # to instrumentation d00a: c9 leaveq d00b: c3 retq // do stuff // jump back

Use case: Memory Address Collection • Collect the address of every load/store issued by the application • Put addresses in a buffer, process addresses in batch • Fewer function calls • Less cache pollution for (i = 0; i < n; i++){ A[i] = B[i]; } if (cur + 2 > BUF_SIZE) clear_buf(); buffer[cur + 0] = &(A[i]); buffer[cur + 1] = &(B[i]);

Optimization – Sampling w/ Instrumentation Point Disabling • Processing addresses is usually expensive • Cache simulation (multiple caches), locality analysis, address stream compression • Use interval-based sampling • Process the first X of every Y addresses (Y >= X) • Obvious result: reduced processing overhead • Not so obvious: reduced collection overhead by skipping address collection during sampled regions • Different approaches • PEBIL – swap instrumentation with nops • Very lightweight, limited functionality • PIN / Dyninst – Arbitrarily remove, re-instrument • Heavyweight, rich functionality

Memory Trace Overhead w/ Sampling OpenMP NAS Parallel Benchmarks (8 threads)

Use case: Inserting Profiling Routines • Insert calls to timers/tracking code around functions and loops • Want l low overhead, especially where no instrumentation is introduced • Small overhead = accurate profile • “throttle” instrumentation points that are called too frequently • Don’t just ignore them, disable them! • Collaboration w/ Tuning Analysis and Utilities (TAU) project void compute(){ // function id 0 for (i = 0; i < n; i++){ // loop id 1 A[i] = B[i]; } for (i = 0; i < n; i++){ // loop id 2 A[i] += C[i]; } } profile_begin(0); profile_begin(1); profile_end(1); profile_begin(2); profile_end(2); profile_end(0);

Contact Info download https://github.com/mlaurenzano/PEBIL email michaell@sdsc.edu, lcarring@sdsc.edu

Performance Modeling and Analysis with PEBIL

Performance Modeling and Analysis with PEBIL

Presentation Transcript

Performance Modeling

Simulation Modeling and Analysis

Simulation Modeling and Analysis

Energy Performance Analysis with RETScreen

Modeling and Analysis

Simulation Modeling and Analysis

Isogeometric modeling and analysis

Static Analysis and Modeling

Performance Modeling and Analysis with PEBIL

Performance Analysis with Vampir

Performance Analysis with Vampir

Analysis and Performance

MODELING AND ANALYSIS

Platform Modeling and Analysis

Performance Analysis with Real Case

Software Performance Modeling

Performance Modeling

Application Performance Analysis and Modeling

Simulation Modeling and Analysis

Performance Analysis with Parallel Performance Wizard

Simulation Modeling and Analysis

Performance Modeling