IA32/EM64T/IPF. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi Enterprise Platform Group Intel Corporation.
IA32/EM64T/IPF. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi. Enterprise Platform Group, Intel Corporation. Presented at MICRO-37: Portland, OR, Dec. 6th, 2004.
Goal: Accurate Performance Prediction Target: LARGE Applications • With little/no manual intervention • Within reasonable time
Instruction Counts: Some Itanium Applications [Chart; bars for: SPECINT (average), SPECFP (average), Fluent L2, Amber rt, LS-Dyna 3cars, RenderMan magic]
Whole-Program Simulation is Slow [Chart; bars for: SPECINT (average), SPECFP (average), Fluent L2, Amber rt, LS-Dyna 3cars, RenderMan magic]
Solution: Select Simulation Points • Manually • Randomly • Anywhere • From uniform regions • Fine-grain sampling (SMARTS: CMU) • By program-phase analysis (SimPoint:UCSD, iPart: Intel/MRL)
Running Commercial Applications on Simulators is Hard • Resource Requirements: Disks etc. • Need to modify/re-configure the simulator • OS dependencies • Need support for specific kernel and device drivers • License checking • Need special action
Solution: Native Execution with Instrumentation Use PIN to select simulation points (PinPoints) and generate traces PIN: A dynamic-instrumentation system • A tool for writing tools • No special compiler/linker flags required
PIN-Tools: Profiling, Trace Generation and more… [Diagram labels: PIN-based profiler; Profile; Simulation Point Selection; PinPoints; PIN-based Trace Generator; PIN-based Branch Predictor; Your Simulator Here]
Simulation Point Selection with SimPoint [UCSD]. Why SimPoint? • Instrumentation based • Microarchitecture independent • Works well (results later). Applied to multi-threaded programs. [Diagram labels: PIN-based profiler; Basic Block Vectors; SimPoint Tools; PinPoints]
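The Basic Block Vectors that feed SimPoint can be illustrated with a small sketch. The Python below is not the PIN-based profiler. It is a minimal illustration that assumes a hypothetical trace of (block id, instruction count) pairs, a hypothetical `slice_size`, and Manhattan distance as the similarity measure. Each slice of execution becomes a normalized vector recording the share of instructions spent in each basic block, and slices with similar vectors are candidates for the same phase.

```python
from collections import Counter


def build_bbvs(trace, slice_size):
    """Split a trace into slices and return one normalized BBV per slice.

    trace: iterable of (block_id, num_instructions) in execution order
           (a hypothetical input format chosen for this sketch).
    slice_size: approximate number of instructions per slice.
    """
    bbvs = []
    counts = Counter()
    executed = 0
    for block_id, n_instr in trace:
        # Weight each block by the instructions it contributes.
        counts[block_id] += n_instr
        executed += n_instr
        if executed >= slice_size:
            bbvs.append({b: c / executed for b, c in counts.items()})
            counts = Counter()
            executed = 0
    if executed:
        # Keep the trailing partial slice as well.
        bbvs.append({b: c / executed for b, c in counts.items()})
    return bbvs


def manhattan(u, v):
    """Manhattan distance between two sparse BBVs (0 = same code mix)."""
    keys = set(u) | set(v)
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)


# Hypothetical trace: a phase alternating blocks A and B, then a phase in block C.
example_trace = [("A", 10), ("B", 10)] * 5 + [("C", 20)] * 5
example_bbvs = build_bbvs(example_trace, slice_size=100)
for i, bbv in enumerate(example_bbvs):
    print(i, bbv, manhattan(example_bbvs[0], bbv))
```

SimPoint then clusters such vectors and picks one representative slice per cluster; that clustering step is omitted here.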
Multiple Sources of Error. Goal: Accurate Performance Prediction. Phase detection is not enough! Need Trace Generation and Simulation. [Diagram labels: PinPoints; Traces; Simulation Stats (CPI). Error sources: phase detection; non-repeatability; warm-up, modeling]
Main Contributions • A Toolkit that automatically: • Profiles, finds phases/simulation regions (PinPoints) • Validates that PinPoints are representative • Generates traces for simulators. Available for Itanium/IA32/EM64T • Evaluations in a production environment
The PinPoints Toolkit. [Diagram labels: Phase Detection + PinPoint Selection; PinPoints file; H/W counters-based Validation (pfmon: Itanium, PAPI: IA32); Compute CPI (weighted sum for PinPoints; whole program); Match?; Trace Generation/Simulation]
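The validation step compares a weighted sum of per-PinPoint CPIs against the whole-program CPI. The sketch below shows that arithmetic only. The weights and CPI values are made up for illustration, and the function names are hypothetical rather than part of the toolkit. In the toolkit the measurements come from hardware counters.

```python
def predicted_cpi(pinpoints):
    """Weighted-sum CPI estimate.

    pinpoints: list of (weight, cpi) pairs, where weight is the share of
    execution that the PinPoint represents. Weights are renormalized so
    they do not have to sum to exactly 1.
    """
    total_weight = sum(w for w, _ in pinpoints)
    return sum(w * cpi for w, cpi in pinpoints) / total_weight


def relative_error(predicted, measured):
    """Relative CPI prediction error against the whole-program measurement."""
    return abs(predicted - measured) / measured


# Hypothetical numbers, not results from the talk.
example_pinpoints = [(0.5, 1.0), (0.3, 2.0), (0.2, 1.5)]
example_whole_program_cpi = 1.45

estimate = predicted_cpi(example_pinpoints)
error = relative_error(estimate, example_whole_program_cpi)
print(f"predicted CPI = {estimate:.3f}, relative error = {error:.1%}")
```

A small relative error suggests the selected PinPoints are representative. A large one signals that phase detection or non-repeatability needs a closer look before traces are generated.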
Evaluations. Applications: Built w/ Intel’s compilers (high opt) • HPC: Fluent, AMBER, LS-Dyna, RenderMan • SPEC2000: Processed 8-9 times. Test Configurations: Linux (RedHat)
PinPoints Generated • PinPoints << 1% of program execution • Turnaround time (Traces) : Few days
Results: Overview • PinPoints: Whole-Program CPI prediction (SPEC2000 and HPC applications): • Average CPI prediction error ~5% • PinPoints better than random selection • Predicting speedup between microarchitectures • PinPoints can be used to evaluate microarchitecture variations • PinPoints Traces: Prediction of native SPEC2000 ratios • INT within 8%, FP within 3%. More results in the paper.
SPEC2000 CPI Prediction. Average Error: Madison: 2.8%, Merced: 3.2%, McKinley: 2.7%
HPC Applications CPI Prediction. Average Error: Madison: 5.0%
PinPoints: Speedup Prediction Across Multiple Microarchitectures. Same Binaries/PinPoints
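Because the same binaries and PinPoints are used on each microarchitecture, the instruction count cancels and speedup follows from CPI and clock frequency alone. The sketch below works through that relation. The function names and the numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def exec_time(instructions, cpi, freq_hz):
    """Execution time in seconds: instructions * cycles-per-instruction / Hz."""
    return instructions * cpi / freq_hz


def speedup(cpi_base, freq_base_hz, cpi_new, freq_new_hz):
    """Speedup of the new machine over the base machine for the same binary.

    With an identical instruction stream the instruction count cancels,
    leaving (cpi_base / freq_base) / (cpi_new / freq_new).
    """
    return (cpi_base / freq_base_hz) / (cpi_new / freq_new_hz)


# Hypothetical CPIs and clock rates, not measurements from the talk.
example_speedup = speedup(cpi_base=2.0, freq_base_hz=1.0e9,
                          cpi_new=1.0, freq_new_hz=1.5e9)
print(f"speedup = {example_speedup:.2f}x")
```

Feeding the PinPoints-predicted CPIs and the measured whole-program CPIs into the same function gives a predicted and an actual speedup that can be compared directly.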
Putting it All Together: From PinPoints to Projections. Does simulation of traces for PinPoints predict native performance? Error: Cumulative. [Diagram labels: PinPoints; Traces; Simulation Stats (CPI). Error sources: phase detection; non-repeatability; warm-up, modeling]
Native SPEC2000 Ratios [Spring 2004]. Itanium: Madison 1.5GHz/6MB L3
Performance Prediction from PinPoints Traces. Itanium: Madison 1.5GHz/6MB L3
Summary. PinPoints toolkit: Automatic simulation region selection, tracing, and validation. Dynamic instrumentation (PIN); LARGE programs • PinPoints: << 1% of execution. Capture whole-program CPI • Average error < 5% for SPEC2000, HPC apps • Better than random selection • PinPoints traces: Predict SPEC2000 Ratios • INT within 8%, FP within 3%
Try it out! (PIN + PinPoints) toolkit: http://rogue.colorado.edu/Pin
Backup: Simulator Warm-up • Strategy 1: Large slice-size (250 million instructions) • Too coarse-grain for phase detection • Too much simulation time • Strategy 2: 7 warm-up traces per simulation trace (30 million instructions). Art (SPECFP2000): First pinpoint touches most of the working set • Simulate all pinpoint traces in succession