Explore the innovative Razor technology for dynamic voltage scaling at the circuit level, minimizing energy consumption while maintaining performance. Learn how timing speculation enhances efficiency and reduces power overhead. Discover the benefits of voltage/frequency adaptation and in-situ error detection and correction for optimal processor operation.
Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
Advanced Computer Architecture Laboratory, The University of Michigan
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
Krisztián Flautner, ARM Ltd.
December 3rd, 2003
Dynamic Voltage Scaling and Design Uncertainty
• DVS - Adapting voltage/frequency to meet performance demands of workload
• Lower processor voltage during periods of low utilization
• Lower Voltage is a Good Thing™ for power
• Minimum voltage is limited by Safety Margins
• Error-free operation must be guaranteed!
• Technology trends are Maximizing the Minimums
• Process and temperature variation
• Capacitive and inductive noise
• Key Observation: worst-case conditions are also highly improbable
• Significant gain for circuits optimized for common case
• Efficient mechanisms needed to tolerate infrequent worst-case scenarios
[Figure: Intra-die variations in ILD thickness]
Shaving Voltage Margins with Razor
• Goal: reduce voltage margins with in-situ error detection and correction for delay failures
• Proposed Approach:
• Remove safety margins and tolerate occasional errors
• Tune processor voltage based on error rate
• Purposely run below critical voltage
• Data-dependent latency margins
• Trade-off: voltage power savings vs. overhead of correction
• Analogous to wireless power modulation
[Figure labels: Traditional DVS, Zero margin, Sub-critical]
Razor Timing Error Detection
• Second sample of logic value used to validate earlier sample
• Key design issues:
- Maintaining pipeline forward progress
- Meta-stable results in main flip-flop
- Short path impact on shadow-latch
- Recovering pipeline state after errors
- Power overhead of error detection and correction
[Figure labels: Main FF, Shadow Latch, MEM; clocks clk and clk_del]
Razor Short Path Constraint
[Figure labels: Main FF, Shadow Latch, MEM; clocks clk and clk_del; Hold Constraint (~1/2 cycle)]
Centralized Razor Pipeline Error Recovery
• One cycle penalty for timing failure
• Global synchronization may be difficult for fast, complex designs
[Figure: IF, ID, EX, MEM, WB (reg/mem) pipeline with PC and Razor FFs; signals: error, recover, clock; timeline of inst1-inst6 over cycles 0-6]
Distributed Razor Pipeline Error Recovery
• Multiple cycle penalty for timing failure
• Scalable design since all recovery communication is local
• Builds on existing branch / data speculation recovery framework
[Figure: IF, ID, EX, MEM (read-only), WB (reg/mem) pipeline with PC, Razor FFs, and a Stabilizer FF; signals: error, bubble, recover, flushID, Flush Control; timeline of inst1-inst8 over cycles 0-9]
Error Rate Studies – Empirical Results
[Figure annotations: 35% energy savings with 1.3% error; 22% saving once every 20 seconds!]
Error Rate Studies – SPICE-Level Simulations
• Based on SPICE-level simulations of a Kogge-Stone adder
[Figure annotation: 200 mV]
Prototype Razor Implementation
• 4 stage 64-bit Alpha pipeline:
• 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
• Tunable via software from 50-200 MHz, 1.1-1.8 V
• Razor applied to combinational logic
• Razor overhead:
• Total of 192 Razor flip-flops out of 2408 total (9%)
• Error-free power overhead: ~3%
[Figure: Razor I - Prototype, 3 mm x 3.3 mm; labeled blocks: I-Cache, Register File, D-Cache, IF, ID, EX, MEM, WB]
Effects of Razor DVS
Total Energy, Etotal = Eproc + Erecovery
[Figure: Energy and Pipeline Throughput (IPC) vs. Decreasing Supply Voltage; curves: Energy of Processor Operations (Eproc), Energy of Pipeline Recovery (Erecovery), Total Energy (Etotal) with Optimal Etotal marked, Energy of Processor w/o Razor Support]
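The trade-off Etotal = Eproc + Erecovery can be explored with a toy sweep. This is only a sketch under stated assumptions: the error-rate curve (`error_rate`, `V_KNEE`, `SLOPE`) is a made-up exponential chosen for illustration, not measured data, and energy is normalized to one operation at the nominal voltage. The 18x recovery cost is the figure quoted for the EX-stage analysis; the 1.1-1.8 V range is the prototype's tuning range.

```python
import math

V_NOM = 1.8           # nominal supply voltage of the prototype (V)
RECOVERY_COST = 18.0  # recovery energy in units of one add (from the EX-stage analysis)
V_KNEE = 1.2          # HYPOTHETICAL: voltage at which every operation fails
SLOPE = 0.05          # HYPOTHETICAL: how quickly errors vanish above the knee (V)

def error_rate(vdd):
    """HYPOTHETICAL error model: fraction of operations that miss timing."""
    return min(1.0, math.exp(-(vdd - V_KNEE) / SLOPE))

def e_proc(vdd):
    """Energy of processor operations, normalized; scales with Vdd^2."""
    return (vdd / V_NOM) ** 2

def e_recovery(vdd):
    """Energy spent on pipeline recovery, paid once per timing error."""
    return error_rate(vdd) * RECOVERY_COST * e_proc(vdd)

def e_total(vdd):
    """Etotal = Eproc + Erecovery."""
    return e_proc(vdd) + e_recovery(vdd)

# Sweep the tuning range in 10 mV steps and pick the minimum-energy point.
voltages = [mv / 1000 for mv in range(1100, 1801, 10)]
best_vdd = min(voltages, key=e_total)
```

With any error curve of this general shape, the minimum of `e_total` lies below the zero-error voltage: a small error rate is tolerated because the quadratic savings in `e_proc` outweigh the recovery energy.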
EX-Stage Analysis – Optimal Voltage Sweep
Recovery cost includes energy to recover entire pipeline (18x an add)
Conclusions
• In-situ detection/correction of timing errors
• Eliminate process, temperature, and safety margins
• Tune processor voltage based on error rate
• Purposely run below critical voltage to capture data-dependent latency margins
• Implemented with architecture/circuit support
• Double-sampling metastability-tolerant Razor flip-flops validate logic results
• Pipeline initiates recovery after circuit timing errors, no voltage/clock re-tuning needed
• Trade-off: supply voltage power savings vs. overhead of correction
• Running with error is good!
[Figure: RAZOR FF (Main Flip-Flop, Shadow Latch, comparator, clk, clk_del, Error_L, Error) and distributed recovery pipeline (IF, ID, EX, MEM (read-only), WB (reg/mem), Stabilizer FF, Flush Control; signals: error, bubble, recover, flushID)]
Future Directions
• Research opportunities
• Razor for caches/memory and control logic
• Voltage control algorithms, especially per-stage tuning
• Typical-case energy optimized designs (instead of worst-case latency optimized)
• Turnkey application of Razor technology
• Prototype design, fabrication, evaluation
• Razor I – Q4 2003 – Razor-ized combinational logic, global tuning
• Razor II – Q3 2004 – Razor-ized caches and control logic, per-stage tuning
• Other applications
• Single-event upset (SEU) protection using Razor error detection/re-execution
• Over-clocking for performance improvement (large gains among hobbyists)
Questions?
Other Approaches to Dynamic Voltage Scaling
• Traditional DVS
• Valid voltage / delay combinations “blessed” at design time
• Approach leaves a significant amount of energy “on the table”
• Temperature, process, data, and safety margins placed on voltage
• Other approaches miss some margins
• Slack detector – automatic tuning
• ARM’s Intelligent Energy Manager (IEM)
• Processor voltage automatically tuned to external ambient conditions
• Inverter chain designed to track most restrictive critical path, margin still required
[Figure labels: Mem Control, Data cache, IO Unit, Floating point and graphics, Ex Unit, Control Unit, Cache control, L2 tags, L2 Cache]
Razor Flip-Flop Implementation
• Compare latched data with shadow-latch on delayed clock
• Upon failure: place data from shadow-latch in main latch
• Ensure shadow latch always correct using conservative design techniques
• Correct value in shadow latch guarantees forward progress
• Recover pipeline using microarchitectural recovery mechanism
[Figure: RAZOR FF between Logic Stage L1 and Logic Stage L2; Main Flip-Flop (D, Q, clk), Shadow Latch (clk_del), comparator, 0/1 mux, Error_L, Error]
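The double-sampling behavior described above can be sketched as a small behavioral model. This is an illustration only, not the circuit: setup/hold times and meta-stability are ignored, and the names `razor_sample`, `RazorSample`, `arrival`, `period`, and `t_delay` are hypothetical. Times are measured from the start of the cycle; the main flip-flop samples at `period` and the shadow latch at `period + t_delay`.

```python
from dataclasses import dataclass

@dataclass
class RazorSample:
    main_value: int      # value captured by the main flip-flop at clk
    shadow_value: int    # value captured by the shadow latch at clk_del
    error: bool          # comparator output: main and shadow disagree
    restored_value: int  # value held in the main flip-flop after correction

def razor_sample(old_value, new_value, arrival, period, t_delay):
    """Model one cycle of a Razor flip-flop.

    old_value: stale logic output still present if the new result is late.
    new_value: correct result of the logic stage, valid from time `arrival`.
    """
    if arrival > period + t_delay:
        # The design must guarantee this never happens: the shadow latch
        # is assumed always correct ("conservative design techniques").
        raise ValueError("data arrives after clk_del; shadow latch would be wrong")
    main = new_value if arrival <= period else old_value
    shadow = new_value
    error = main != shadow
    # Upon failure, the shadow-latch data is placed in the main flip-flop.
    restored = shadow if error else main
    return RazorSample(main, shadow, error, restored)

on_time = razor_sample(old_value=3, new_value=9, arrival=0.8, period=1.0, t_delay=0.5)
late = razor_sample(old_value=3, new_value=9, arrival=1.2, period=1.0, t_delay=0.5)
```

Here `on_time` raises no error, while `late` flags an error and restores the correct value from the shadow latch. A late arrival that happens to equal the stale value produces no error, which is one way to see why the latency margin is data-dependent.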
Razor Flip-Flop Circuit
[Figure: circuit schematic; labels: D, Q, clk, clk_b, clk_del, clk_del_b, Shadow Latch, Meta-stability detector (Inv_n, Inv_p), Error_L]
Overcoming Short Path Constraints
• Delayed clock imposes a short-path constraint
• Razor necessary only for latches on slow paths
• Pad fast path for latches with mixed path delays
• Trade-off between DVS headroom and short path constraints
Constraint: Min. Path Delay > tdelay + thold
[Figure: timing of clock and clock_del (tdelay, thold, Min. path delay) showing intended path vs. short path; long paths end in Razor_ff, short paths end in ff and are padded with extra delay]
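The constraint Min. Path Delay > tdelay + thold can be checked per flip-flop, and the shortfall tells how much padding a fast path needs. A minimal sketch; the function names and the example numbers are hypothetical, and all times share one unit (for example ns).

```python
def violates_short_path(min_path_delay, t_delay, t_hold):
    """True if the fastest path could corrupt the shadow latch.

    The shadow latch samples t_delay after the main clock edge, so the
    next cycle's data must not arrive before t_delay + t_hold.
    """
    return not (min_path_delay > t_delay + t_hold)

def padding_needed(min_path_delay, t_delay, t_hold):
    """Lower bound on the extra delay to add to the fastest path.

    The constraint is strict, so the padded path must end up slightly
    longer than this bound.
    """
    return max(0.0, t_delay + t_hold - min_path_delay)

# Hypothetical example: clk_del lags clk by 2.5, hold time is 0.5.
fast_path_bad = violates_short_path(1.0, t_delay=2.5, t_hold=0.5)
fast_path_pad = padding_needed(1.0, t_delay=2.5, t_hold=0.5)
slow_path_bad = violates_short_path(4.0, t_delay=2.5, t_hold=0.5)
```

A larger `t_delay` gives more room to run below the critical voltage but forces more padding on short paths, which is the trade-off named in the last bullet.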
Hardware Measurement Setup
[Figure: Slow Pipeline A and Slow Pipeline B (clocked at clk/2) and Fast Pipeline (clocked at clk, with stabilize stage); each has a 48-bit LFSR feeding 18-bit inputs to an 18x18 block with a 36-bit output; outputs compared (!=) and mismatches counted in a 40-bit Error Counter]
Simulation Methodology
• Challenge: instruction latency depends on circuit evaluation latency
• May vary with changes in stage inputs, stage logic, voltage, temperature…
• Dynamic timing simulation combines architectural/circuit simulation
• Initial implementation utilized a hand-generated EX-stage circuit model
• Effort ongoing to automate extraction/decomposition/integration into SimpleScalar
Supply Voltage Control System
• Current design utilizes a very simple proportional control function
• Control algorithm implemented in software
Ediff = Eref - Esample
[Figure: control loop; labels: Pipeline, error signals, Esample, Eref, Ediff, Voltage Control Function, Voltage Regulator, Vdd, reset]
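One step of a proportional control function built on Ediff = Eref - Esample might look like the sketch below. The slide does not give the actual control law, so the update rule, the `gain` parameter, and the clamping are assumptions; the 1.1-1.8 V limits in the example are the prototype's tuning range.

```python
def proportional_step(vdd, e_ref, e_sample, gain, v_min, v_max):
    """Return the next supply voltage from the sampled error rate.

    e_ref:    target error rate (Eref)
    e_sample: error rate observed over the last sampling window (Esample)
    gain:     HYPOTHETICAL proportional gain, in volts per unit error rate
    """
    e_diff = e_ref - e_sample
    # Positive e_diff: fewer errors than the target, so lower the voltage.
    # Negative e_diff: too many errors, so raise the voltage.
    new_vdd = vdd - gain * e_diff
    # Keep the request inside the regulator's range.
    return min(v_max, max(v_min, new_vdd))

# Too many errors (5% observed vs. 1% target): voltage goes up.
raised = proportional_step(1.5, e_ref=0.01, e_sample=0.05, gain=1.0, v_min=1.1, v_max=1.8)
# No errors observed: voltage goes down.
lowered = proportional_step(1.5, e_ref=0.01, e_sample=0.0, gain=1.0, v_min=1.1, v_max=1.8)
```

Because the controller runs in software, the gain and the sampling window are easy to retune; a purely proportional law keeps the sketch close to the "very simple" function the slide mentions.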
Pipeline Recovery
[Figure: timing diagram of inst moving through IF, ID, EX, MEM, MEM, WB; signals: clk, clk_d, ID.d, EX.d, MEM.d, error; cases labeled No Error and Error; on error: Redo instruction in MEM]
Voltage Scaling under Dynamic Workloads
• Adapt frequency/voltage to performance demands of workload
• Software controlled processor speed
• Lower processor voltage during periods of low operating frequency
• Quadratic reduction in dynamic power and energy
• Super-quadratic reduction in leakage
[Figure: Utilization, Vdd, Freq, and Voltage plotted against Time]
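The quadratic claim follows from the standard CMOS switching-power relation P = C · Vdd² · f, which is textbook background rather than something stated on the slide. A minimal sketch comparing the two ends of the prototype's tuning range (1.8 V at 200 MHz and 1.1 V at 50 MHz); the capacitance `C_EFF` is an arbitrary placeholder that cancels out of the ratios, and leakage is not modeled.

```python
def dynamic_power(c_eff, vdd, freq_hz):
    """Switching power of CMOS logic: P = C * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq_hz

def energy_per_cycle(c_eff, vdd):
    """Switching energy per clock cycle: E = C * Vdd^2 (independent of f)."""
    return c_eff * vdd ** 2

C_EFF = 1e-9  # HYPOTHETICAL effective switched capacitance (F)

# Ratios of the low operating point to the high operating point.
energy_ratio = energy_per_cycle(C_EFF, 1.1) / energy_per_cycle(C_EFF, 1.8)
power_ratio = dynamic_power(C_EFF, 1.1, 50e6) / dynamic_power(C_EFF, 1.8, 200e6)
```

Lowering frequency alone cuts power but not the energy per operation; only the lower voltage that the lower frequency permits reduces energy, which is why voltage and frequency are scaled together.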
Simulation Flow
• Automatic creation of very detailed power/delay C-models
[Figure: flow diagram; labels: High-level HDL Specification, Circuit Extraction with Parasitics, Variable Voltage SDF generation, Power/Delay C-model, Architecture Specification, SimpleScalar + DTA, Voltage Control Algorithm, Detailed Power/Delay Analysis; pipeline IF, ID, EX, MEM, WB with PC and FFs]
Simulation Methodology
• Dynamic timing simulation combines architectural/circuit simulation
• Contrast to static timing simulation which is only concerned with critical path
• SimpleScalar/Alpha architectural-level simulation
• Gate-level simulation of per-stage logic blocks
• Logic block model describes cells, local and global interconnect
• Cells characterized with SPICE at varied slew/cap-load/voltage
• Each cycle, circuit simulator evaluates delay of each stage's logic block
More Details on Meta-Stability
• Sub-critical operation invites meta-stability
• Meta-stability detector itself can become meta-stable
• Double-latch the error signal to obtain a sufficiently small probability
• Flush entire pipe
• No forward progress
• Reduce frequency
[Figure labels: pos, neg, error, fail, Dynamic Or / Latch, restore, bubble, flush; D, Q, clk, clk_b, clk_del, clk_del_b]
Short Path Failure
[Figure: timing diagram of inst1 and inst2 (I1, I2) moving through IF, ID, EX, MEM, WB; signals: clk, clk_d, ID.d, EX.d, MEM.d, error; Short Path marked]