Automated Microprocessor Stressmark Generation. Ajay M. Joshi* Lieven Eeckhout** Lizy K. John* Ciji Isen* *The University of Texas at Austin **Ghent University, Belgium HPCA 2008, Feb 19, Salt Lake City, UT.
Energy, power, power density, temperature, voltage variation, … • First-class design constraints • Embedded processors • High-performance processors • Understanding and analysis of primary importance • Average: typical • Maximum: worst-case
Why care about worst-case? • Processor must operate properly under extreme conditions • Examples • Max power → power supply, DPM • Max temperature → thermal package, DTM • Max dI/dt → power delivery • Localized max power → hot spots → circuit failure, timing errors, etc. • Max temperature differentials → sensor placement
How to characterize worst-case? • Stressmarks • Hand-coded synthetic stress codes • Examples • Max power: Alpha’s Toast • Max dI/dt: Alpha’s Thumper • Limitations • Time-consuming to develop • Requires intimate understanding of system • Tied to a specific processor • Difficult to do in early design stages
A possible solution • Automatic stressmark generation • In two steps • BenchMaker • Generate synthetic benchmark from abstract workload model • StressMaker • Explore workload space by ‘turning knobs’ using BenchMaker and search for stressmarks
Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation through case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work
BenchMaker • [Figure: the abstract workload model (instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size) feeds a benchmark synthesizer, which produces a synthetic benchmark that runs on hardware or a simulator]
Instruction mix • Fraction short int • Fraction long int • Fraction short fp • Fraction long fp • Fraction int loads • Fraction int stores • Fraction fp loads • Fraction fp stores
ILP • Probability that the inter-operation dependency distance is 1; 2; 3 to 4; 5 to 6; 7 to 8; 9 to 16; 17 to 32; greater than 32
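As a rough illustration of how this dependency-distance distribution could be collected, the Python sketch below bins a list of distances into the eight buckets named on the slide. The function names and the input format (a plain list of integer distances) are assumptions made for illustration; they are not taken from BenchMaker.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the first seven buckets from the slide; the eighth is ">32".
BUCKET_UPPER_BOUNDS = [1, 2, 4, 6, 8, 16, 32]
BUCKET_LABELS = ["1", "2", "3-4", "5-6", "7-8", "9-16", "17-32", ">32"]


def bucket_label(distance):
    """Map a dependency distance (in instructions, >= 1) to its bucket label."""
    if distance < 1:
        raise ValueError("dependency distance must be at least 1")
    return BUCKET_LABELS[bisect_left(BUCKET_UPPER_BOUNDS, distance)]


def dependency_distribution(distances):
    """Return the fraction of dependencies that fall into each bucket."""
    if not distances:
        return {label: 0.0 for label in BUCKET_LABELS}
    counts = Counter(bucket_label(d) for d in distances)
    return {label: counts[label] / len(distances) for label in BUCKET_LABELS}
```

Calling `dependency_distribution` on a measured list of producer-to-consumer distances yields eight probabilities that sum to one.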
I & D stream behavior • No. unique I & D addresses • Fraction of memory operations with a local stride (at 32-byte block level) of 0, 1, 2, …, 8, or greater than 8
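The stride characteristic can be sketched in Python as follows. This is only an illustration under stated assumptions: "local stride" is taken to mean the distance, in 32-byte blocks, between consecutive accesses issued by the same static memory instruction; the absolute value is used; and the trace format of `(pc, address)` pairs is hypothetical.

```python
from collections import Counter

BLOCK_BYTES = 32  # block granularity stated on the slide


def local_stride_fractions(trace):
    """Fraction of memory operations per local stride (0..8 or '>8').

    trace: iterable of (pc, address) pairs in program order (assumed format).
    The first access of each static instruction has no previous access and
    is therefore not counted.
    """
    last_block = {}
    counts = Counter()
    for pc, address in trace:
        block = address // BLOCK_BYTES
        if pc in last_block:
            stride = abs(block - last_block[pc])  # assumption: magnitude only
            counts[stride if stride <= 8 else ">8"] += 1
        last_block[pc] = block
    total = sum(counts.values())
    if total == 0:
        return {}
    return {stride: n / total for stride, n in counts.items()}
```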
Branch behavior • Probability for a transition rate of 0%-10%, 10%-20%, etc. • Avg and stdev of the basic block size distribution
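A minimal Python sketch of the transition-rate characteristic, assuming the usual meaning of the term: the fraction of a branch's executions whose outcome differs from that branch's previous outcome. The function names and the boolean outcome list are illustrative assumptions.

```python
def transition_rate(outcomes):
    """Fraction of consecutive executions where taken/not-taken flips.

    outcomes: list of booleans for one static branch, in execution order.
    """
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return flips / (len(outcomes) - 1)


def transition_bucket(rate):
    """Map a rate in [0, 1] to one of ten buckets such as '20%-30%'."""
    index = min(int(rate * 10), 9)  # a rate of exactly 1.0 goes in the last bucket
    return f"{index * 10}%-{index * 10 + 10}%"
```

A branch that is always taken has rate 0 and is easy to predict; so is one that strictly alternates (rate 1), while rates near 50% tend to be the hardest.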
Abstract workload model • Only 40 characteristics • Explicit goal • In contrast to prior work • Microarchitecture-independent
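One way to picture the model is as a small record. The Python sketch below groups the characteristics listed on the preceding slides; under this reading they add up to 40 (8 + 8 + 2 + 10 + 10 + 2). The class name, field names, and default values are placeholders invented for illustration, not BenchMaker's actual data structure.

```python
from dataclasses import dataclass, field
from typing import List


def _uniform(n):
    """Placeholder default: a uniform distribution over n buckets."""
    return lambda: [1.0 / n] * n


@dataclass
class AbstractWorkloadModel:
    # 8 fractions: short/long int, short/long fp, int/fp loads, int/fp stores
    instruction_mix: List[float] = field(default_factory=_uniform(8))
    # 8 buckets: 1, 2, 3-4, 5-6, 7-8, 9-16, 17-32, >32
    dependency_distance: List[float] = field(default_factory=_uniform(8))
    # 2 footprints: number of unique instruction and data addresses
    instruction_footprint: int = 1024
    data_footprint: int = 1024
    # 10 buckets: local stride of 0, 1, ..., 8, and >8
    stride: List[float] = field(default_factory=_uniform(10))
    # 10 buckets: transition rate 0%-10%, ..., 90%-100%
    branch_transition: List[float] = field(default_factory=_uniform(10))
    # 2 values: average and standard deviation of basic block size
    bb_size_mean: float = 6.0
    bb_size_stdev: float = 2.0

    def num_characteristics(self):
        """Count the individual knobs in the model."""
        return (len(self.instruction_mix) + len(self.dependency_distance) + 2
                + len(self.stride) + len(self.branch_transition) + 2)
```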
Synthetic benchmark generator • Program spine • Instruction types • Inter-operation dependencies • Stride assignment • Branch transition • Register assignment • Code generation • [Figure: example instruction sequence: add sub br add ld mul br add ld sub ld st br]
Synthetic benchmark generator • Input: abstract workload model • Output: synthetic benchmark • C program with embedded assembly code • Benefit: synthetic benchmark converges after 10 million dynamic instructions
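The first two generator steps (program spine and instruction types) can be sketched in Python as below. This is a simplified illustration, not the authors' generator: it assumes basic block sizes are drawn from a normal distribution with the model's average and standard deviation, that every block ends in a branch, and it skips dependencies, strides, branch transitions, register assignment, and code emission. All names are hypothetical.

```python
import random


def generate_spine(instruction_mix, bb_size_mean, bb_size_stdev,
                   num_blocks, seed=0):
    """Return a list of basic blocks, each a list of instruction type names.

    instruction_mix: dict mapping an instruction type to its fraction.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    kinds = list(instruction_mix)
    weights = [instruction_mix[kind] for kind in kinds]
    spine = []
    for _ in range(num_blocks):
        # Assumption: block size is normally distributed, at least 1.
        size = max(1, round(rng.gauss(bb_size_mean, bb_size_stdev)))
        # Fill the block body according to the instruction mix.
        body = rng.choices(kinds, weights=weights, k=size - 1)
        spine.append(body + ["br"])  # each basic block ends in a branch
    return spine
```

For example, `generate_spine({"add": 0.5, "ld": 0.3, "st": 0.2}, 6, 2, num_blocks=3)` returns three blocks whose bodies follow the given mix.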
Experimental setup • sim-alpha: validated Alpha 21264 simulator • Wattch for power modeling • HotSpot for thermal modeling • SPEC CPU2000 • 100M-instruction simulation points • Commercial workloads • SPECjbb2005, DBT2, DBMS
Synthetic clone benchmark preserves characteristics • [Figure: IPC (axis 0.0 to 2.0) and EPI (axis 0 to 35) of the original benchmark vs. the synthetic clone benchmark for vpr, gcc, mcf, gzip, dbt2, twolf, bzip2, crafty, dbms, vortex, perlbmk, jbb2005]
Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation using case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work
StressMaker • [Figure: abstract workload space exploration proposes an abstract workload configuration; BenchMaker turns it into a synthetic benchmark; a microprocessor model evaluates the objective function (e.g., max power); the end result is a stressmark]
Workload space exploration • Huge space • Heuristic search using genetic algorithm • Bio-inspired algorithm • Reduces likelihood for local optima • Iterative algorithm • Start from randomly generated solutions • Probabilistically retain solutions with highest objective function value • Generate new solutions using crossover & mutation • End result: stressmark
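The iterative search above can be sketched in Python as a generic genetic algorithm. This is an illustration under assumptions, not StressMaker itself: each solution is a list of knob values in [0, 1], the `objective` callable stands in for "generate a benchmark and measure it on the processor model", and tournament selection, single-point crossover, and Gaussian mutation are choices made here for brevity.

```python
import random


def genetic_search(objective, num_knobs, population_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Return the best knob vector found; requires num_knobs >= 2."""
    rng = random.Random(seed)
    # Start from randomly generated solutions.
    population = [[rng.random() for _ in range(num_knobs)]
                  for _ in range(population_size)]
    best = max(population, key=objective)

    def select():
        # Tournament of two: the fitter of two random solutions survives.
        a, b = rng.sample(population, 2)
        return a if objective(a) >= objective(b) else b

    for _ in range(generations):
        children = []
        while len(children) < population_size:
            parent1, parent2 = select(), select()
            cut = rng.randrange(1, num_knobs)  # single-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Mutation: occasionally perturb a knob, clamped to [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        candidate = max(population, key=objective)
        if objective(candidate) > objective(best):
            best = candidate
    return best
```

With a toy objective such as `sum`, `genetic_search(sum, num_knobs=4)` drifts toward knob values near 1; in a real flow the objective would be far more expensive, which is why the number of evaluations matters.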
Max-power stressmark • [Figure: per-component power (Watts, axis 0 to 30) of the StressMaker stressmark vs. SPEC CPU / commercial workloads for lsq, alu, fetch, clock, icache, issue, bpred, regfile, dcache, window, rename, dispatch, dcache2, resultbus; labeled workloads include art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, mcf] • 8-wide OOO processor; 81.5 Watts in total • assuming Wattch (0.18 um, 1.2 GHz, aggressive clock gating)
Max-power stressmark characteristics • Keep functional units busy • Uniform mix of instruction types • Keep issue logic busy • High ILP • No pipeline flushes • High branch predictability • Keep caches busy • Good locality • Similar to hand-crafted stressmarks [Gowan et al., DAC’98] [Vishwanath, Intel Tech Journal, 2000]
Evaluation of genetic algorithm • Speed • Three orders of magnitude faster than exhaustive search • Effectiveness • Max-power stressmark through StressMaker achieves 99% of max-power stressmark through exhaustive search: 48 Watts for 4-wide OOO processor
Max single-cycle power • Estimate max instantaneous (single-cycle) current drawn from the power supply • StressMaker’s stressmark: 72W • Its average power consumption: 32W • [4-wide OOO processor] • Maximum power assuming all units are 100% active: 85W • StressMaker gets 85% of theoretical maximum
dI/dt stressmark • Current swings cause ripples in supply voltage • dI/dt stressmark alternates between high and low power consumption [Joseph et al., HPCA’03] [Alpha’s Thumper] • StressMaker • Generate N-instruction max-power stressmark: 72W • Generate N-instruction min-power stressmark: 16W • Concatenate both • Cyclic behavior with period 2N
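The concatenation step can be pictured with a short Python sketch. It is a deliberate simplification: each instruction slot is treated as one time step at its phase's power level, using the 72W and 16W figures from the slide, whereas a real trace would come from the power simulator. The function name is hypothetical.

```python
def didt_power_trace(n, repeats, high_watts=72.0, low_watts=16.0):
    """Idealized power trace: n steps at high power, then n steps at low power.

    Repeating this pattern gives cyclic behavior with period 2 * n.
    """
    one_period = [high_watts] * n + [low_watts] * n
    return one_period * repeats
```

With the slide's numbers, the swing between phases is 72 - 16 = 56W per half period; choosing N sets how often that swing occurs.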
Thermal stressmarks • Thermal hotspots • Max component power • Thermal differentials • Thermal sensor placement [Lee et al., ICCD’05] • Examples • L2 vs. I-fetch: 44.6°C difference • No stress on L2, high ILP, high branch predictability • L2 vs. register remap: 48.4°C difference • Lots of L2 accesses: stress L2 and minimal stress on register remap
Why automate the process? • [Figure: power (Watts, axis 0 to 100) of the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, each run on the 2-wide, 4-wide, and 8-wide OOO processors] • Stressmark is processor-specific
Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation using case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work
Related work • VLSI test vectors • at circuit level, not at (micro)architectural level • Hand-crafted stressmarks • Current practice • Max-power, dI/dt, thermal hotspots, temp differentials • Performance model validation • Microbenchmarks • Benchmark synthesis • Statistical simulation
Conclusion: two contributions • BenchMaker • Abstract workload model • Generates proxies for real-life benchmarks • High accuracy • StressMaker • Automated stressmark generation • Case studies: max-power, max single-cycle power, dI/dt, thermal hotspots, etc.
Future work • Compare StressMaker against hand-crafted stressmarks • Fine-tune abstract workload model • Bit toggling data values and instruction opcodes • Interactions between threads and programs • Multi-threaded and multi-core processors
Thank you. Questions?