1 / 19

On-Line Adjustable Buffering for Runtime Power Reduction

On-Line Adjustable Buffering for Runtime Power Reduction. Andrew B. Kahng Ψ Sherief Reda † Puneet Sharma Ψ Ψ University of California, San Diego † Brown University. Outline. Introduction Adjustable Buffering Methodology Experiments & Results Conclusions. Power: First-Class Objective.

sean-madden
Download Presentation

On-Line Adjustable Buffering for Runtime Power Reduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On-Line Adjustable Buffering for Runtime Power Reduction Andrew B. KahngΨ Sherief Reda† Puneet SharmaΨ ΨUniversity of California, San Diego †Brown University

  2. Outline • Introduction • Adjustable Buffering Methodology • Experiments & Results • Conclusions

  3. Power: First-Class Objective • Power bottleneck to Moore’s law • Power-frequency tradeoff exists in CMOS circuits • Much higher power required to operate at high frequency • Techniques to exploit power-frequency tradeoff are of interest • Allow high freq. operation • Can give significant power reduction when max. performance not required • Mainstream approach: Dynamic voltage and frequency scaling (DVFS) Power-frequency tradeoff with VDD scaling

  4. Ideal frequency-power from VDD scaling Actual frequency-power from DVFS Dynamic VDD & Freq. Scaling • Scale down VDD and freq. when high performance not needed • Limitations of DVFS • VDD cannot be scaled down indefinitely • Range of VDD scaling is small and diminishing • Extremely high power at high VDD reduce max. VDD • High Vth to reduce leakage, noise margins, variability, soft errors  increase min. VDD • Discrete allowed voltages • Our objective: enable additional modes to exploit frequency-power tradeoff • Useable when VDD cannot be scaled further • Useable without DVFS

  5. Select Transform 32X 16X / 32X Proposal: Adjustable Buffering • Our approach, like DVFS, provides runtime-selectable low-power modes  supplement or replace DVFS • Key idea: Lot of logic added for performance, not functionality  Turn this logic off when high-performance not needed • Poor interconnect scaling  large number of repeaters • 20-30% of cells are repeaters • Fat repeaters are used to improve delay but consume a lot of power • We modify repeaters to dynamically adjust their driving capacity

  6. Outline • Introduction • Adjustable Buffering Methodology • Experiments & Results • Conclusions

  7. Traditional Inverter (INVX8) Adjustable Inverter Adjustable Repeater Design We add PMOS-NMOS pair to turn half the devices off dynamically Control Gate Control Gate “LPM” = ON only half devices operational (low-power mode). “LPM” = OFF all devices operational (high-performance mode). • What power components are likely to reduce in low-power mode? • Short-circuit power: during switching, PMOS & NMOS ON momentarily  short circuit between VDD and VSS • High when transition time (slew) is large • Subthreshold leakage: when one of PMOS-NMOS pair between VDD and VSSON

  8. Adjustable Repeater Requirements • Low area overhead • Added PMOS-NMOS pair (LPM devices) takes area • LPM signal to be routed or locally generated • Layout of the new cell must be simple and low area overhead • High performance in high-performance mode • On-resistance of LPM devices may reduce performance • Good power reduction in low-power mode

  9. Control Gate Control Gate Area Overhead • Problem: High performance needed when LPM signal OFF  use large control gates  large area overhead Delay overhead: increase in delay of adjustable repeater over traditional repeater Solution:Share control gates among multiple repeaters

  10. V’DD V’SS LPM devices shared by two inverters Control Gate Sharing Fewer control gates but virtual VDD (V’DD) and VSS (V’SS) need routing • How many control gates needed? • Compute simultaneous switching rate (SSR) by finding the max. #repeaters that have overlapping timing windows. Time = O(RlogR) (R = #repeaters) • Find total width of all repeater devices controlled by CGs (=WR) • For good performance, width of control gates = 4 x SSR x WR Typical SSR=~10%  small area overhead

  11. Ensuring High Performance • Problem: Adjustable repeaters ~5% slower when LPM signal OFF  Up to ~5% reduction in circuit performance • Solution:do not use adjustable repeaters on timing-critical paths • Additional constraint: slew constraints not violated when LPM signal is OFF or ON. • We characterize adjustable repeaters (i.e., find delay, slew, power, input capacitance) and then substitute traditional repeaters with adjustable repeaters subject to delay and slew constraints.  No loss in circuit performance & no slew violations

  12. Power Reduction in Low-Power Mode Short-circuit energy and leakage reduce OFF OFF Traditional Inverter Adjustable Inverter Reduction in short-circuit energy and leakage for INVX8

  13. Outline • Introduction • Adjustable Buffering Methodology • Experiments & Results • Conclusions

  14. Experimental Validation • Circuits: s38417 (8,890 cells), AES (15,272), OpenRisc (46,732) • Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis); Cadence SoC Encounter (P&R), SignalStorm (library characterization); Artisan TSMC 90nm library models • Other settings: power and timing analysis at slow corner, VDD of 1.1V and 0.9V, activity factor of 0.01.

  15. Results: Power Reduction • We perform comparative analysis of: • Circuit with DVFS • Circuit with DVFS + LPM VDD=0.9 LPM=0 VDD=1.1 LPM=0 VDD=0.9 LPM=1 VDD=1.1 LPM=1 • Both dynamic and leakage power reduce • 6-12% reduction in total power at low-power mode

  16. Routing overhead • LPM, LPM routed to control gates • routing overhead depends on locations of control gates • # control gates small  overhead small • V’DD, V’SS routed to all repeaters • For overhead estimation, nets assumed to be Steiner trees Results: Area Overhead • Logic area overhead due to control gates • Depends on SSR • Smaller if control gates can be placed in whitespace

  17. Outline • Introduction • Adjustable Buffering Methodology • Experiments & Results • Conclusions

  18. Conclusions • Presented a novel technique that dynamically trades off power and performance by turning off devices not needed at less than max. performance • Both leakage and dynamic power reduce; total power reduction is 6-12% on our testcases • By sharing of control gates, area overhead reduced to <5.57% • No adverse affect on performance of the circuit when LPM signal OFF • Future work: • Actual layout of adjustable repeaters with routing of V’DD, V’SS, LPM nets to accurately estimate power, performance, area impacts • Customization of more cells especially clock repeaters to further improve power-performance tradeoff

  19. Thank You • Questions?

More Related