Word-Size Optimization for Low Energy, Variable Workload Sub-threshold Systems

Sudhanshu Khanna, Anurag Nigam. ECE 632, Fall 2008, University of Virginia. <sk4fs, an2z>@virginia.edu

Presentation Transcript


  1. Word-Size Optimization for Low Energy, Variable Workload Sub-threshold Systems. Sudhanshu Khanna, Anurag Nigam. ECE 632 – Fall 2008, University of Virginia. <sk4fs, an2z>@virginia.edu

  2. Introduction • Energy-constrained sub-Vt systems: medical devices, environmental sensors • Need to lower E in order to enable “lifelong” operation • Small “form-factor” => area reduction • Total E = Active E + Sleep E

  3. Top-Level Problems Addressed • Energy reduction (active mode, sleep mode) • Area reduction • Adaptation of super-threshold designs to sub-threshold

  4. Current Approaches • Voltage regulated from an off-chip (expensive) DC-DC converter [Figure: system with off-chip DC-DC converter] Ref: K. Craig, R. Matthews, EE632 Fall 2008

  5. Our Approach • Make the “starting point” design more E-efficient, specifically for sleep-mode operation

  6. Sure way of lowering CV^2: lower V => sub-threshold • Can we optimize the logic system for sub-Vt operation, or should it be the same? [Figure labels: 1.2V, 0.2V, Logic System, Logic System]

  7. Sure way of lowering CV^2: lower V => sub-threshold • Make the system as small as feasible. Use it over and over till the required operation is done. Then go to sleep and leak less!! • How do we make the system smaller: USE A SMALLER WORD-SIZE • Will using the SMALL system over and over increase the ACTIVE energy??? [Figure labels: 1.2V, 0.2V, Logic System, Smaller Logic System]

  8. Smaller Word-Size: Problems Addressed • For sure, a small word-size means: lower area, lower sleep energy, higher delay • We need to find: • How much is the area/sleep E benefit? • What is the impact of multi-cycle operation on active E? • Can we somehow make small word-size systems faster without losing the sleep E and area advantage?

  9. Smaller Word-Size: Our Contribution • For sure, a small word-size means: lower area, lower sleep energy, higher delay • How much is the area/sleep E benefit? > 20x area benefit, > 33x sleep energy benefit • What is the impact of multi-cycle operation on active E? Multi-cycle operation increases active E, but the final value of the active E is about the same as or less than that of a 32-bit system. • Can we somehow make small word-size systems faster without losing the sleep E and area advantage? Yes, delay degradation can be overcome while still being more energy efficient!

  10. Systems Compared • Addition of two 32-bit numbers using: • Large word-size (32-bit): Kogge-Stone Adder, Ripple Carry Adder, Full-Adder • Small word-size (1-bit): 1-bit taken for simplicity; the trends would be valid for other word-sizes, e.g. 16-bit, 8-bit, etc. • Addition is taken as a sample digital function. However, the trends found can be generalized to other digital functions as well.

  11. 32-bit Kogge-Stone Adder (KSA), 32-bit Ripple Carry Adder (RCA) [Block diagram: 32-bit registers for parallel inputs PA and PB, a 32-bit KSA or RCA, and a 32-bit sum register driving OUT; signals CLK and Reset] • PA = Parallel input A • PB = Parallel input B • OUT = Parallel output from Sum Register

  12. Small Word-Size System • Let the smaller word-size be n (n < 32). Then the system will look just like a 32-bit system, but smaller! • In general, an n-bit word system will have n-bit operands • In case n = 1, the system will take 32 clock cycles to add two 32-bit numbers. Hence the higher delay. [Block diagram: n-bit registers around an n-bit full adder, clocked by CLK; for n = 1, 1-bit registers around a 1-bit full adder, i.e. the 1-bit Serial Adder (SA)]

  13. A conceptual fully-serial 1-bit system [Block diagram labels: Analog Input, Serial ADC, 1-bit inputs from other part of chip, 1-Bit Registers, Serial Multiplier, 1-bit Full Adder, Serial DAC, Analog Output, CLK; the simulated 1-bit SA is marked within the system]

  14. 32-bit Serial Adder (SA) using Full-Adder [Block diagram: 32-bit shift registers for PA, PB and OUT, each exchanging 1 bit per cycle with a 1-bit full adder; a carry flip-flop feeds Cout back to Cin; all clocked by CLK] • Regular 32-bit word system, but the parallel adder is replaced by a 1-bit full adder => LOWER SLEEP ENERGY • Takes 32 cycles, but is amenable for use in an unmodified 32-bit word system
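The serial addition described on this slide can be illustrated with a short behavioral sketch. It is not the authors' circuit or simulation setup; the function name `serial_add` and its return format are hypothetical, and the shift registers are modeled simply as integers read one bit per cycle, LSB first.

```python
def serial_add(a: int, b: int, width: int = 32):
    """Behavioral sketch of a bit-serial adder (hypothetical helper).

    One 1-bit full adder is reused for `width` clock cycles; a carry
    flip-flop holds Cout from one cycle and feeds it back as Cin.
    Returns (sum modulo 2**width, final carry, cycles used).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    carry = 0    # carry flip-flop, assumed reset to 0 before the operation
    result = 0   # models the sum shift register
    cycles = 0
    for i in range(width):
        bit_a = (a >> i) & 1          # bit shifted out of register PA
        bit_b = (b >> i) & 1          # bit shifted out of register PB
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i        # bit shifted into the sum register
        cycles += 1
    return result, carry, cycles
```

For example, `serial_add(5, 7)` uses 32 cycles to produce the sum 12, which is where the higher delay of the serial system comes from.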

  15. Important Metric: Energy per operation • Energy drawn for addition of two 32-bit numbers is measured for all 4 systems: • 32-bit KSA, 32-bit RCA, 32-bit SA (large word-size systems) • 1-bit SA (small word-size system) • Clock and register power taken into account

  16. Active Energy @ VDD = 300mV [Plot: active energy of the compared systems; annotation: HIGH Edyn ~ Etot ~ 6pJ, but leakage current is 1.7x lower] • Shows that active energy of 1-bit system < 32-bit systems • 40% active energy benefit @ 22nm • 33x reduction in leakage current (note that the plot shows only active energy)

  17. Conclusions @ 300mV • 1-bit SA has 40% lower active E than the best 32-bit system • 1-bit SA has 33x lower leakage current than the best 32-bit system • 32-bit SA has 1.7x lower leakage current than 32-bit KSA => if the word-size is limited to 32, serial addition will save energy if the application has a lot of sleep time, e.g. in sensor nodes • Thus multi-cycle operation doesn’t increase active energy too much • Hence once sleep time is added, the benefits of small-word systems will increase
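A rough way to see why the benefit grows with sleep time is to plug the two ratios from this slide (40% lower active E, 33x lower leakage) into Total E = Active E + Sleep E. The sketch below is only an illustration: the normalization (baseline active energy of 1.0 per operation, baseline leakage energy of 1.0 per unit of sleep time) and the function name are assumptions, not measured values from the presentation.

```python
def energy_ratio(sleep_time: float,
                 active_ratio: float = 0.6,          # 40% lower active E (slide 17)
                 leakage_ratio: float = 1.0 / 33.0   # 33x lower leakage (slide 17)
                 ) -> float:
    """Total energy of the 1-bit system relative to the best 32-bit system.

    Assumed normalization (hypothetical): the 32-bit baseline spends 1.0
    unit of active energy per operation and 1.0 unit of leakage energy
    per unit of sleep time, both at the same VDD.
    """
    baseline_total = 1.0 + 1.0 * sleep_time
    small_total = active_ratio + leakage_ratio * sleep_time
    return small_total / baseline_total


# Relative total energy for increasing amounts of sleep per operation.
for t in (0.0, 1.0, 10.0, 100.0):
    print(f"sleep={t:6.1f}  E(1-bit)/E(32-bit)={energy_ratio(t):.3f}")
```

With no sleep the ratio is just the active-energy ratio; as sleep time dominates, it approaches the leakage ratio, so a duty-cycled sensor node would see the larger share of the benefit under these assumptions.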

  18. VDD increases => delay decreases • Can be used to make small word-size systems faster!!! • But what is the impact of the VDD increase on energy??? [Figure labels: Logic System at 1.2V and 0.2V, small-word Logic System at 0.2V (already compared), small-word Logic System at 0.4V]

  19. Energy @ constant delay [Figure labels: Logic System at 0.2V, small-word Logic System at 0.4V] • Delay is equal • Now we compare energy at constant delay • Small word-size is more energy efficient even after the VDD increase • But the margins of energy benefit do go down • The same is not true in super-Vt! WHY??? • Difference in the on-current equation in super-Vt and sub-Vt
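The slide does not write out the on-current equations it refers to. As a hedged reminder, the commonly used textbook forms are given below (alpha-power law above threshold, exponential dependence below threshold); the symbols I_0, n, alpha and C are generic model parameters assumed here, not values taken from the presentation.

```latex
\begin{align*}
\text{Super-}V_t:\quad & I_{on} \propto (V_{DD} - V_T)^{\alpha}, \qquad 1 \le \alpha \le 2 \\
\text{Sub-}V_t:\quad   & I_{on} \approx I_0 \exp\!\left(\frac{V_{DD} - V_T}{n\,kT/q}\right) \\
\text{Delay:}\quad     & t_d \propto \frac{C\,V_{DD}}{I_{on}} \\
\text{Active energy:}\quad & E_{dyn} \propto C\,V_{DD}^{2}
\end{align*}
```

Under these assumed models, raising VDD from 0.2V to 0.4V in sub-Vt increases CV^2 by 4x per unit of capacitance while I_on grows exponentially, so a much smaller (lower-C) system can recover its delay at a modest energy cost. In super-Vt the current grows only polynomially with VDD, so the same trade is less favorable.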

  20. [Plots: Sub-Vt and Super-Vt panels, each annotated with LARGE SLOPE and SMALL SLOPE regions] VDD change => no impact on E!!

  21. Pareto-Optimal E-D Curve [Plot: super-Vt and sub-Vt regions, with the cross-over where the 1-bit system becomes optimal] • Super-Vt -> 32-bit system is Pareto-optimal • Sub-Vt -> 1-bit system is Pareto-optimal

  22. Generality of Trends • The 1-bit system is used as an example. Energy and area benefits will be achieved in any small word-size system. • The shift in the Pareto-optimal curve happens because of the difference in the Ion equation. • Hence this behavior can be observed in other parts of a digital system as well, and not just addition. • Opens energy-saving opportunities in more areas of digital design

  23. Conclusions @ constant delay • While going into sub-Vt operation, re-examine the word-size of the system being used. • Optimal word-size goes down: a small word-size gives lower E and area and matches delay • Energy: less • Leakage: less • Area ($$$): less • Delay: same [Figure labels: Logic System at 0.2V, small-word Logic System at 0.4V]

  24. Different Word-Size Systems • 1-bit (Digital Audio System, Sharp) • 4-bit (MARC4 microcontroller, Intel 4040) • 8-bit (microcontrollers, Intel 8080 processor) • 16-bit (Intel 8086 processor) • 64-bit (Athlon 64, Opteron processor)

  25. FIR Filter • Used in many real-time DSP systems (audio, video processing) • 4-tap FIR filter; K(i): filter coefficients • Serial implementation of a parallel FIR filter

  26. [Block diagram: parallel 4-tap FIR filter. X(n) passes through three delay elements to give X(n-1), X(n-2), X(n-3); four multipliers apply K0, K1, K2, K3; a 4-input parallel adder produces Y(n)] • K0, K1, K2, K3: filter coefficients stored in memory

  27. [Block diagram: serial FIR filter. X(n), the serial input data, and the filter coefficients (K3, K2, K1, K0) from memory feed a serial-parallel multiplier; its serial output goes to a 1-bit serial adder and a register, producing Y(n)]
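To make the relation between the parallel and serial filters concrete, here is a small behavioral sketch. It assumes the usual 4-tap FIR definition Y(n) = K0*X(n) + K1*X(n-1) + K2*X(n-2) + K3*X(n-3), which the slides show only as a diagram, and unsigned samples; all function names are hypothetical, and the serial version models only the bit-by-bit multiplication, not the timing of the authors' design.

```python
def fir_parallel(samples, coeffs):
    """Direct-form FIR: y(n) = sum_i K(i) * x(n - i), with x(n) = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for i, k in enumerate(coeffs):
            if n - i >= 0:
                acc += k * samples[n - i]
        out.append(acc)
    return out


def serial_parallel_multiply(x, k, width=8):
    """Multiply unsigned x, fed LSB first one bit per cycle, by coefficient k."""
    acc = 0
    for bit in range(width):
        if (x >> bit) & 1:
            acc += k << bit   # add the shifted coefficient when the input bit is 1
    return acc


def fir_serial(samples, coeffs, width=8):
    """Same filter, but every product is built up bit-serially."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for i, k in enumerate(coeffs):
            if n - i >= 0:
                acc += serial_parallel_multiply(samples[n - i], k, width)
        out.append(acc)
    return out
```

Both versions compute the same outputs; the serial one trades extra cycles per sample for a much smaller datapath, which is the same trade-off the adder comparison explores.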

  28. QUESTIONS
