1 / 17

Why do we need micro-benchmarks?

This research paper presents MicroProbe, a framework for automated generation of micro-benchmarks to systematically characterize the energy consumption and performance of CMP/SMT processor systems. The paper discusses the need for micro-benchmarks, the limitations of manual benchmark generation, and the distinguishing features of MicroProbe. It also showcases three case studies demonstrating the effectiveness of MicroProbe in EPI characterization, max-power stressmark generation, and counter-based processor power modeling.

johnsbrown
Download Presentation

Why do we need micro-benchmarks?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-BenchmarksR. Bertran*+, A. Buyuktosunoglu*, M. Gupta*, M. Gonzalez+, P. Bose**IBM T.J. Watson Research Center+Barcelona Supercomputing Center

  2. Why do we need micro-benchmarks? What is the maximum power consumption? Any performance bug? Any reliability issues? … Micro-benchmarks! • Time consuming and tedious • Error prone task • Trial and error process • Several micro-benchmarks are required • Deep expertise limited to few designers • Detailed knowledge of the underlying architecture is required AUTOMATED SOLUTION NEEDED! 2

  3. MicroProbe:a micro-benchmark generation framework

  4. MicroProbe Workflow Inputs Outputs Micro-benchmark generation policy MicroProbe Framework User Micro- Bench-mark Micro- Bench-mark Micro- Bench-mark Micro- Bench-mark Endless loop for each instruction of the ISA Endless loop 50% INT 50% FP Max Power stressmark Architecture Definition files External tools Real platforms Simulators Models

  5. MicroProbe: Distinguishing Features 5

  6. MicroProbe Usage and Design Overview Research idea Micro-benchmark Micro-benchmark Micro-benchmark Micro-benchmark generation policies (user-defined scripts) Loop stressing the floating point unit Sequence of loads hitting 50% L1 and 50% L2 Generate a stress- mark for each functional unit of the architecture Search for the sequence of 2 loads and 2 integer operations with maximum IPC MicroProbe Framework (Python API) Architecture module Code generation module Design space exploration module ISA definitions Micro-architecture analytical models ISA definitions Micro-architecture analytical models ISA definitions Micro-architecture analytical models Micro-benchmark synthesizer Search drivers Search drivers Search drivers Micro-architecture definitions Automatic bootstrap process Micro-architecture definitions Micro-architecture definitions Properties Properties Passes Properties Passes Passes External tools

  7. Max-power Stressmark Generation Use MicroProbe to generate max-power stressmark Characterize energy per instruction (EPI) and IPC (Architecture Module) mulldo xvnmsubmdp lxvw4x Select N instructions with max (IPC* EPI) Loop: … mulldo mulldo lxvw4x lxvw4x xvnmsubmdp xvnmsubmdp … Form a basic endless loop (e.g. 4K) using selected instructions (Code Generation Module) Loop: … mulldo lxvw4x mulldo xvnmsubmdp lxvw4x xvnmsubmdp … Generate micro-benchmarks with different orders of the selected N instructions Evaluate using Design Space Exploration Module Pick the highest power microbenchmark 7

  8. CASE studies MicroProbe:A Micro-benchmark Generation Framework 8

  9. Experimental Methodology • Platform: • Processor: POWER7 @ 3GHz • 8-core 4-way SMT • 32KB L1, 256KB L2 and 4MB L3 per core • Memory: 32 GB DDR3 SDRAM @ 800MHz • OS: RHEL 5.7 + Linux 3.0.1 • EnergyScale architecture • Power measurements in miliwatts • Sampling rate up to 1ms • In-house software collects power and performance counter traces [C. Lefurgy et al, IBM] 9

  10. Case Study 1: EPI Characterization High differences in EPI across instructions stressing different micro-architecture components High differences in EPI across instructions stressing the same micro-architecture components and at the same rate (IPC) 10

  11. Case Study 2: Max-power Stressmark Generation Loops Loops Loops Loops Loops Loops Loops Loops Loops Loops Loops Loops Use MicroProbe Use complex instructions accessing different functional units with high IPC Use a computational intensive kernel Generate all possible combinations of complex instructions stressing different units ? MicroProbe Expert manual Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x xvmaddadp … Loop: … mullw lxvd2x mullw xvmaddadp xvmaddadp lxvd2x … Loop: … mullw mullw xvmaddadp xvmaddadp lxvd2x lxvd2x … Expert DSE MicroProbe Heuristic: Max(EPI * IPC) DAXPY Selected intructions: mullw xvmaddadp lxvd2x Selected instructions: mulldo, xvnmsubmdp, lxvw4x MicroProbe 11

  12. Max-power Stressmark Generation 12

  13. Case Study 3: Counter-based Processor Power Model 1 Bottom-up Power modeling method Dynamic Power f(PMCs) Func.Unit micro- Benchmarks CMP1–SMT1 Intercept SMT1 Random micro- Benchmarks CMP1–SMT1 2 SMT effect Random micro- Benchmarks CMP1–SMT2/4 Intercept SMT2-4 CMP effect Random micro- Benchmarks CMP1/8–SMT2/4 Linear Regression f(CMP) 3 Uncore power Model: Dynamic Power f(PMCs) Uncore power SMT effect SMT enabled CMP effect # cores 13

  14. Counter-based Processor Power ModelValidation Within acceptable error margins: < 4% on average

  15. Counter-based Processor Power ModelValidation on Corner Cases • Models trained using non-micro-architecture aware training sets show high errors and variability • Models trained using the micro-architecture aware training set show acceptable error margins: < 5% on average

  16. Conclusions • MicroProbe is a productive micro-benchmark generation framework • Adaptive and flexible • Includes micro-architecture semantics • Integrates design space exploration • Presented three case studies: • Instruction-based EPI characterization • Automated max-power stressmark generation • CMP/SMT-aware bottom-up counter-based processor power model 16

  17. QUESTIONS? MicroProbe:A Micro-benchmark Generation Framework 17

More Related