Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation

IA32/EM64T/IPF. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi. Enterprise Platform Group, Intel Corporation.


Presentation Transcript


  1. IA32/EM64T/IPF. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi. Enterprise Platform Group, Intel Corporation. Presented at MICRO-37: Portland, OR, Dec. 6th, 2004

  2. Goal: Accurate Performance Prediction. Target: LARGE Applications • With little/no manual intervention • Within reasonable time

  3. Instruction Counts: Some Itanium Applications [Chart categories: SPECINT (average), SPECFP (average), Fluent L2, Amber rt, LS-Dyna 3cars, RenderMan magic]

  4. Whole-Program Simulation is Slow [Chart categories: SPECINT (average), SPECFP (average), Fluent L2, Amber rt, LS-Dyna 3cars, RenderMan magic]

  5. Solution: Select Simulation Points • Manually • Randomly • Anywhere • From uniform regions • Fine-grain sampling (SMARTS: CMU) • By program-phase analysis (SimPoint: UCSD, iPart: Intel/MRL)

  6. Running Commercial Applications on Simulators is Hard • Resource requirements (disks etc.): need to modify/re-configure the simulator • OS dependencies: need support for specific kernel and device drivers • License checking: needs special action

  7. Solution: Native Execution with Instrumentation. Use PIN to select simulation points (PinPoints) and generate traces. PIN: a dynamic-instrumentation system • A tool for writing tools • No special compiler/linker flags required

  8. PIN-Tools: Profiling, Trace Generation and more… [Diagram labels: PIN-based profiler; PIN-based Trace Generator; Profile; PinPoints; PIN-based Branch Predictor; Simulation Point Selection; Your Simulator Here]

  9. Simulation Point Selection with SimPoint [UCSD]. Why SimPoint? • Instrumentation based • Microarchitecture independent • Works well (results later) • Applied to multi-threaded programs [Diagram labels: PIN-based profiler; Basic Block Vectors; SimPoint Tools; PinPoints]
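As an illustration of the "Basic Block Vectors" named on slide 9, here is a minimal Python sketch of how a profiler's output could be folded into per-slice vectors that a phase-analysis tool then compares. It is not the PinPoints or SimPoint code: the `(block_id, n_instructions)` trace format, the function names, and the tiny slice size are all assumptions made for the example.

```python
from collections import Counter

def basic_block_vectors(trace, slice_size):
    """Split a trace of (block_id, n_instructions) events into slices of
    roughly slice_size instructions and return one normalized vector per slice.
    The trace format is hypothetical, chosen only for this sketch."""
    vectors, current, filled = [], Counter(), 0
    for block_id, n_instr in trace:
        current[block_id] += n_instr      # weight each block by instructions executed
        filled += n_instr
        if filled >= slice_size:          # slice is full: normalize and start a new one
            vectors.append({b: c / filled for b, c in current.items()})
            current, filled = Counter(), 0
    if filled:                            # keep a trailing partial slice
        vectors.append({b: c / filled for b, c in current.items()})
    return vectors

def manhattan(u, v):
    """Manhattan distance between two sparse vectors stored as dicts."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

# Toy trace: two slices with the same code mix, then a slice in a different phase.
trace = [("A", 4), ("B", 6), ("A", 4), ("B", 6), ("C", 10)]
bbvs = basic_block_vectors(trace, slice_size=10)
print(len(bbvs), manhattan(bbvs[0], bbvs[1]), manhattan(bbvs[0], bbvs[2]))
```

Slices that execute the same mix of blocks end up at distance zero, while a slice from a different phase is far away. That is the property a clustering step would rely on when picking one representative slice per phase.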

  10. Multiple Sources of Error. Goal: Accurate Performance Prediction. Phase detection is not enough! Need trace generation and simulation. Error sources: phase detection; non-repeatability; warm-up, modeling. [Diagram labels: PinPoints; Traces; Simulation Stats (CPI)]

  11. Main Contributions • A toolkit that automatically: • Profiles, finds phases/simulation regions (PinPoints) • Validates that PinPoints are representative • Generates traces for simulators • Available for Itanium/IA32/EM64T • Evaluations in a production environment

  12. The PinPoints Toolkit [Diagram labels: Phase Detection + PinPoint Selection; PinPoints file; H/W counters-based Validation (pfmon: Itanium; PAPI: IA32); Compute CPI Weighted Sum for PinPoints; Whole Program; Match?; Trace Generation/Simulation]
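The validation step on slide 12 ("Compute CPI Weighted Sum for PinPoints", then "Match?" against the whole program) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each PinPoint is taken to carry a weight equal to the fraction of execution its phase represents, and the function names and numbers are made up for the example, not measurements from the talk.

```python
def predicted_cpi(pinpoints):
    """Weighted average CPI over (weight, cpi) pairs.
    Weights are assumed to be each phase's share of execution; they are
    renormalized here in case they do not sum exactly to 1."""
    total_weight = sum(w for w, _ in pinpoints)
    return sum(w * cpi for w, cpi in pinpoints) / total_weight

def relative_error(predicted, actual):
    """Relative CPI prediction error against the whole-program measurement."""
    return abs(predicted - actual) / actual

# Hypothetical values: three PinPoints and a whole-program CPI from hardware counters.
pinpoints = [(0.5, 1.0), (0.25, 2.0), (0.25, 4.0)]
pred = predicted_cpi(pinpoints)           # 0.5*1.0 + 0.25*2.0 + 0.25*4.0 = 2.0
err = relative_error(pred, actual=2.1)
print(f"predicted CPI = {pred:.2f}, error = {err:.1%}")
```

A small error suggests the selected regions are representative of the whole run. A large one points back at one of the error sources listed on slide 10.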

  13. Evaluations. Applications: Built w/ Intel’s compilers (high opt). HPC: Fluent, AMBER, LS-Dyna, RenderMan. SPEC2000: Processed 8-9 times. Test Configurations: Linux (RedHat)

  14. PinPoints Generated • PinPoints << 1% of program execution • Turnaround time (Traces) : Few days

  15. Results: Overview • PinPoints: Whole-program CPI prediction (SPEC2000 and HPC applications): • Average CPI prediction error ~5% • PinPoints better than random selection • Predicting speedup between microarchitectures • PinPoints can be used to evaluate microarchitecture variations • PinPoints traces: Prediction of native SPEC2000 ratios • INT within 8%, FP within 3%. More results in the paper

  16. CPI: Actual vs. Predicted. SPEC2000: Itanium-Madison

  17. SPEC2000 CPI Prediction. Average Error: Madison: 2.8%, Merced: 3.2%, McKinley: 2.7%

  18. HPC Applications CPI Prediction. Average Error: Madison: 5.0%

  19. Comparison With Random Selection [48 unique program runs]

  20. Comparison With Random Selection [18 unique program runs]

  21. Speedup: Merced → McKinley. SPEC2000

  22. PinPoints Speedup Prediction: SPEC2000: Merced → McKinley

  23. PinPoints: Speedup Prediction Across Multiple Microarchitectures. Same Binaries/PinPoints
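The slides do not say how speedup was derived from the PinPoints CPI figures. One plausible reading, sketched below in Python, relies on the "Same Binaries/PinPoints" note: if both machines run the same binary, the instruction count is the same, so the run-time ratio reduces to a ratio of CPI divided by clock frequency. The function name, parameter names, and all numbers are hypothetical.

```python
def predicted_speedup(cpi_old, freq_old_ghz, cpi_new, freq_new_ghz):
    """Speedup of the new machine over the old one for the same binary.
    Time per instruction is CPI / frequency; with equal instruction counts
    the total-time ratio equals the per-instruction time ratio."""
    time_old = cpi_old / freq_old_ghz     # nanoseconds per instruction
    time_new = cpi_new / freq_new_ghz
    return time_old / time_new

# Hypothetical CPIs (e.g. weighted sums over the same PinPoints) and clock rates.
speedup = predicted_speedup(cpi_old=2.0, freq_old_ghz=0.8,
                            cpi_new=1.5, freq_new_ghz=1.0)
print(f"predicted speedup = {speedup:.2f}x")
```

Comparing this figure with the speedup measured on real hardware gives the kind of check that slides 21 to 23 report.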

  24. Putting it All Together: From PinPoints to Projections. Does simulation of traces for PinPoints predict native performance? Error: cumulative. Error sources: phase detection; non-repeatability; warm-up, modeling. [Diagram labels: PinPoints; Traces; Simulation Stats (CPI)]

  25. CPI Prediction with Simulation. SPEC2000: Itanium Madison

  26. Native SPEC2000 Ratios [Spring 2004]. Itanium: Madison 1.5GHz/6MB L3

  27. Performance Prediction from PinPoints Traces. Itanium: Madison 1.5GHz/6MB L3

  28. Summary • PinPoints toolkit: Automatic simulation region selection, tracing, and validation • Dynamic instrumentation (PIN); LARGE programs • PinPoints: << 1% of execution; capture whole-program CPI • Average error < 5% for SPEC2000, HPC apps • Better than random selection • PinPoints traces: Predict SPEC2000 ratios • INT within 8%, FP within 3%

  29. Try it out! (PIN + PinPoints) toolkit: http://rogue.colorado.edu/Pin

  30. Backup: Simulator Warm-up • Strategy 1: Large slice-size (250 million instructions) • Too coarse-grain for phase detection • Too much simulation time • Strategy 2: 7 warm-up traces per simulation trace (30 million instructions) • Art (SPECFP2000): First pinpoint touches most of the working set • Simulate all pinpoint traces in succession
