1 / 39

Slack Redistribution for Graceful Degradation Under Voltage Overscaling

Slack Redistribution for Graceful Degradation Under Voltage Overscaling. Andrew B. Kahng † , Seokhyeong Kang † , Rakesh Kumar ‡ and John Sartori ‡ † VLSI CAD LABORATORY, UCSD ‡ PASSAT GROUP, UIUC. Outline. Background and motivation Voltage scaling and BTWC designs

fionn
Download Presentation

Slack Redistribution for Graceful Degradation Under Voltage Overscaling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Slack Redistribution for Graceful Degradation Under Voltage Overscaling Andrew B. Kahng†, Seokhyeong Kang†, Rakesh Kumar‡ and John Sartori‡ †VLSI CAD LABORATORY, UCSD ‡PASSAT GROUP, UIUC

  2. Outline • Background and motivation • Voltage scaling and BTWC designs • Limitation of Traditional CAD Flow • Power-Aware Slack Redistribution • Our design optimization goal • Related work: BlueShift • Our Heuristic • Experimental Framework and Results • Design methodology • Testbed • Results and analysis • Conclusions and Ongoing Work

  3. Reducing Power with Voltage Scaling Power is a first-order design constraint Voltage scaling can significantly reduce power Voltage scaling may result in timing violations Timing errors begin to occur Voltage • Voltage scaling is limited because of timing errors

  4. Better-Than-Worst-Case Design • Better-Than-worst-case (BTWC) design approach • Optimize for normal operating conditions • Trade off reliability and power/performance • Have error detection/correction mechanism (e.g., Razor*) Traditional IC design BTWC design CPU, Heal Thyself... • Does not allow timing errors in STA • Error correction architecture allows timing errors • Fixed target frequency and operating voltage • Overclocking or voltage overscaling * Ernst et al. “Razor: A low power pipeline based on circuit-level timing speculation”, Proc. MICRO 2003. • BTWC design allows tradeoffs between reliability and power

  5. Voltage Scaling with Error Correction • Error correction incurs power overhead Minimum power at point b A B B Voltage v Voltage v A PE(v) : Error rate at voltage v pwr(v) : Power consumption at v • Overscaling is possible for Better-Than-Worst-Case designs

  6. Limitations of Traditional CAD Flow • Conventional designs exhibit critical operating points • Many paths have near-critical slack → wall of (critical) slack • Scaling beyond COP causes massive errors that cannot be corrected • Conventional designs fail critically when voltage is scaled down • Error rate should be increased gracefully : • gradual slope slack

  7. Outline • Background and motivation • Voltage scaling and BTWC designs • Limitation of Traditional CAD Flow • Power-Aware Slack Redistribution • Our design optimization goal • Related work: BlueShift • Our Heuristic • Experimental Framework and Results • Design methodology • Testbed • Results and analysis • Conclusions and Ongoing Work

  8. Our Design Optimization Goal • Problem: Minimize power for a given error rate • Goal: Achieve a ‘gradual slope’ slack distribution • Approach: • Frequently-exercised paths: upsize cells • Rarely-exercised paths: downsize cells with gradual failure characteristic • We make a gradual slope slack distribution

  9. Related Work: BlueShift • BlueShift speed up • Paths with the highest frequency of timing errors • FBB (forward body-biasing) & Timing override • BlueShift* : maximize frequency for a given error rate ER < Target NO Compute error rate Speed up paths Gate-level simulation YES Finish • Limitation • Repetitive gate level simulation – impractical • Design overhead of FBB * Grescamp et al. “Blueshift: Designing processors for timing speculation from the ground up”, HPCA 2009 • BlueShift is impractical with modern SOC designs

  10. Our Heuristic • Optimize slack distribution by cell swaps, exploiting switching activity information • Iteratively scale target voltagethe until error rate exceeds a target, and optimize negative slack paths ER < ERtarget Set initial voltage Error rate estimation Power Reduction Optimize Paths NO • Our heuristic: • Voltage scaling → Optimize paths → Power reduction YES Voltage scaling Finish

  11. Heuristic Implementation – Voltage Scaling • Optimize with fixed target voltage ER < ERtarget Set initial voltage Error rate estimation Power Reduction Optimize Paths • Lower voltage incrementally • Load a pre-characterized library at each voltage point • With iterative voltage scaling, we can find minimum operating voltage NO Voltage scaling YES Finish

  12. Heuristic Implementation – Optimize Paths • Main idea: increase slack of frequently-exercised paths in order of increasing switching activity • Procedure • Pick a critical path p with maximum switching activity • Resize cell instance ci in p • If slack of path p is not improved, cell change is restored • Repeat 2. ~ 3. for all cell instances in path p • Repeat 2.~ 4. for all critical paths ER < ERtarget Set initial voltage Error rate estimation Power Reduction Optimize Paths NO YES Voltage scaling • OptimizePaths procedure reduces error rates and enables further voltage scaling Finish

  13. Heuristic Implementation – Power Reduction • Main idea: Downsize cells on rarely-exercised paths in order of decreasing toggle rate • Procedure • Pick a cell c with minimum toggle rate • Downsize cell c with logically equivalent cell • Incremental timing analysis and check error rate • If error rate is increased, cell change is restored • Repeat 1. ~ 4. ER < ERtarget Power Reduction Set initial voltage Error rate estimation Optimize Paths NO YES Voltage scaling • PowerReduction procedure reduces power without affecting error rate Finish

  14. Heuristic Implementation – Error Rate Estimation • Error rate contribution of one flip-flop • Error rate estimation: use toggle rate from SAIF(Switching Activity Interchange Format) • Error rate of an entire design • α : compensation parameter Error rate estimation ER < ERtarget Set initial voltage Power Reduction Optimize Paths NO • We estimate error rates without functional simulation YES Voltage scaling Finish

  15. Power Reduction Through Slack Redistribution • Power consumption @BTWC • Minimum power Pmin is obtained at minimum operating voltage Vmin • OptimizePaths • Minimize error rate • Enable to scale voltage further 1 2 • ReducePower • Downsize cells • Obtain additional power reduction

  16. Outline • Background and motivation • Voltage scaling and BTWC designs • Limitation of Traditional CAD Flow • Power-Aware Slack Redistribution • Our design optimization goal • Related work: BlueShift • Our Heuristic • Experimental Framework and Results • Design methodology • Testbed • Results and analysis • Conclusions and Ongoing Work

  17. Design Methodology • Benchmark generation • Virtutech Simics – Full system simulation and capture test vectors • Library characterization • Cadence SignalStorm – Synopsys Liberty generation for each voltage • Functional simulation • Cadence NC Verilog –Gate level simulation • ECO P&R • Cadence SOCEncounter – ECO implementation • Heuristic (Slack Optimization) • Implement in C++ and use Tcl socket interface with Synopsys PrimeTime

  18. Testbed • Target design : sub-modules of OpenSPARC T1 • Benchmark • Ammp, bzip2, equake, sort and twolf • Make test vectors with 1 billion cycles for each sub-module • Implementation • TSMC 65GP technology with standard SP&R flow

  19. List of Experiments • Design techniques • SP&R with 0.8 GHz (loose constraints) • SP&R with 1.2 GHz (tight constraints) • Blueshift: timing override • Slack Optimizer • Experiments compare all design techniques with respect to: • Power consumption at each voltage point • Actual error rates from gate level simulation • Power consumption at each target error rate • Estimated processor-wide power consumption

  20. Error Rate and Power Results • Error rate at each operating voltage (test case : lsu_dctl) • Power consumption at each operating voltage

  21. Comparison of Power and Slack Results • Power consumption at each target error rate • Slack distribution

  22. Power Reduction and Area Overhead • Power reduction after optimization (@ 2% error rate) • Area overhead of design approaches Max. 32.8 %, Avg. 12.5% power reduction

  23. Processor-wide Results * *Kahng et al. “Designing a Processor From the Ground Up to Allow Voltage/Reliability Tradeoffs”, HPCA 2010. • Slack optimization extends range of voltage scaling and reduces Razor recovery cost

  24. Conclusions and Ongoing Work • Showed limitations of a BTWC design • Presented design technique – slack redistribution • Optimize frequently exercised critical paths • De-optimize rarely-exercised paths • Demonstrated significant power benefits of gradual slack design • Reduced power 33% on maximum , 12.5% on average • Ongoing work • Reliability-power tradeoffs for embedded memory • Applying to heterogeneous multi-core architecture

  25. THANK YOU

  26. BACKUP

  27. CPU, Heal Thyself • Razor* system • Timing errors can be corrected • Manage the trade-off between system voltage and error rate • New design methodology is needed * Razor: A low power pipeline based on circuit-level timing speculation. In International Symposium on Micro architecture, December 2003.

  28. Razor – How it works • Razor Implementation • Razor: A low power pipeline based on circuit-level timing speculation. In International Symposium on Microarchitecture, December 2003. • Main flip-flop latches at T, but Shadow latch latches at T+skew • If a timing violation occurs, main flip-flop will latch incorrect value, but shadow latch should latch correct value • Comparator signals error and the late arriving value is fed back into the main flip-flop

  29. BTWC: Voltage Scaling • Error correction needs additional clock cycles and incurs power overhead • Overclocking case Maximum performance at point c • Voltage scaling case Minimum power at point c PE(f) : Error rate at frequency f perf(f) : Performance at f PE(v) : Error rate at voltage v pwr(v) : Power consumption at v

  30. Limitation of Voltage Scaling • At some voltage, circuit breaks down Voltage scaling must halt after only 10% scaling.

  31. Reason for Steep Error Degradation • Critical paths are bunched up in traditional designs.

  32. Slack Re-distribution Example Negative Slack Positive Slack Error Rate = 1% Error Rate = 25% Negative Slack Positive Slack 0.0 -0.1

  33. Heuristic Implementation – Error Rate Estimation • Error rate contribution of one flip-flop • Error rate of an entire design • Actual vs. estimated error rates (1) • α : compensation parameter (2)

  34. Gradual Slack Distribution Slack optimization achieves gradual slack distribution.

  35. Processor Error Rate and Power Designs with comparable error rates have much higher power/area overheads.

  36. Reliability/Power Tradeoff Slack-optimized design enjoys continued power reduction as error rate increases.

  37. Enhancing Razor-based Design Slack optimization extends range of voltage scaling and reduces Razor recovery cost.

  38. Moore’s Law • Power consumption of processor node doubles every 18 months.

  39. Power Scaling • With current design techniques, processor power soon on par with nuclear power plant

More Related