Circuits and Architectures to Deliver Low Power and High Speed Systems. By: Jabulani Nyathi, Washington State University, School of EECS. April 30, 2009.
Outline • CMOS Scaling: its benefits and the challenges it brings about • Various Techniques for Limiting Leakage Currents: their shortfalls • Bridging the Speed-Power Gap: the Tunable Body Biasing Scheme • Emerging Devices and Technologies • Concluding Remarks
CMOS Scaling and its Benefits • Aggressive CMOS scaling has been a very positive development, allowing: • Fast-switching devices, and thus high-speed computing • Massive integration due to miniaturization • Multiple chips are no longer needed to implement a microprocessor and its peripherals • Multiple computing elements can now sit on a single die, resulting in a system on a chip
CMOS Scaling and its Challenges • CMOS scaling results in increased leakage currents (5X/node) and increased dynamic power dissipation • The interconnect does not scale as fast as the transistor, so highly integrated designs require elaborate clock distribution schemes • IPs within a system on a chip would be difficult to synchronize with a single clock source
Scaling Implications [Figure: Module1 and Module2 with local interconnects, global interconnects, and scaled global interconnects]
Research Motivation • Bridge the speed-power gap by exploring the feasibility of optimizing devices to operate effectively at both sub-threshold and above-threshold voltages • Emerging ultra-low-power technologies can benefit from increased speed • Wearable computers, sensor networks, implantable medical technology • Emphasis on design for energy efficiency
Existing Low Power Design Approaches • Solve energy dissipation problem from a region of operation standpoint • Sub-threshold design • DTMOS: shows a 5.5 times increase in current • Dynamic threshold provides energy efficiency • SBB: 4.4 times frequency increase • Above threshold (Super-threshold) design • MTCMOS: high and low threshold devices • VT Scheme: reduce power by 50% using ABB and “sleep”/“active” modes • Architectural • Gating Techniques: 45% of total power
SBB, DTMOS, TBB [Figure: output voltage clamping in traditional DTMOS/SBB; labeled voltages 1.8 V and 600 mV]
Proposed Approach • Change approach to include all possible operating regions: Tunable Body Biasing (TBB) • Sub-threshold and super-threshold operation bridged • Ultra-low energy and low speed or high energy and high speed • Utilize body biasing to improve performance of sub-threshold operation • Target increased performance at sub-threshold and slightly above threshold. • Save energy by eliminating idle time and process continuously with variable power supplies (perform just in time task completion) • Target applications • Mobile, battery operated (power constrained), variable processing devices • Cell phones, PDAs, notebooks, wireless sensors, embedded systems, ASICs, medical technology, etc.
TBB Implementation • Goals • Attain ON state current gain while minimizing OFF state leakage current increase • Highlight advantages of sub-threshold operation while allowing super-threshold operation if needed • Control bulk terminal to tunable potentials depending on VDD and desired region of operation • MOS Bulk Control Circuits • Multiplexer-based approach • Two transistors per bulk control circuit • Utilizes Vthn0
TBB Bulk Control Circuits • Relies on the way pass-transistors pass a good or poor logic “1” and logic “0” • Requires external control signals • SubVt and SubVt_b
TBB Bulk Control Circuit Simulation • Super-threshold: pBulk = VDD - Vthn0 • Sub-threshold: pBulk = 0 V
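The two bulk potentials above can be sketched as a simple two-way selection. This is only a behavioral illustration, not the transistor-level circuit: the function name `pmos_bulk_voltage` and the numeric values for VDD and Vthn0 are assumptions chosen for the example and do not come from the slides.

```python
def pmos_bulk_voltage(vdd, vthn0, sub_vt):
    """Behavioral model of the pMOS bulk potential selected by SubVt.

    sub_vt=True  -> sub-threshold mode, bulk tied to 0 V
    sub_vt=False -> super-threshold mode, bulk at VDD - Vthn0
    """
    if sub_vt:
        return 0.0
    return vdd - vthn0


# Hypothetical values (not from the slides): Vthn0 = 0.5 V.
super_bulk = pmos_bulk_voltage(vdd=1.8, vthn0=0.5, sub_vt=False)
sub_bulk = pmos_bulk_voltage(vdd=0.4, vthn0=0.5, sub_vt=True)
print(f"super-threshold pBulk = {super_bulk:.2f} V")
print(f"sub-threshold pBulk   = {sub_bulk:.2f} V")
```

In the actual circuit the selection is made by pass-transistors driven by SubVt and SubVt_b; the sketch only captures which potential ends up on the bulk.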
Device Optimization • TBB encourages varying supply voltages • How will devices be sized for optimal operation at any supply voltage? • Maintain symmetric switching • Examine inverter at varying supply voltages
Sub-threshold Noise Margins • Noise margins are significant for proper logic levels • The TBB and traditional static CMOS inverters have comparable noise margins • TBB VIH is 12.5% worse • TBB VIL is 14.3% better
Propagation Delay
Gate   Traditional Delay   TBB Delay   % Decrease
TG     98 ns               14 ns       86
Inv    125 ns              20 ns       84
NAND   133 ns              18 ns       86
NOR    163 ns              25 ns       85
XOR    289 ns              40 ns       89
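As a quick cross-check of the delay figures above, the percentage decrease and the equivalent speedup can be recomputed from the two delay columns. The helper names are illustrative. With the delays as listed, the XOR row works out to roughly 86% rather than the 89 shown, so one of the XOR entries may differ from the original measurement.

```python
# Delays in nanoseconds, copied from the table: (traditional, TBB).
delays_ns = {
    "TG": (98, 14),
    "Inv": (125, 20),
    "NAND": (133, 18),
    "NOR": (163, 25),
    "XOR": (289, 40),
}


def percent_decrease(before, after):
    """Relative reduction in delay, as a percentage of the original delay."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the TBB gate is than the traditional one."""
    return before / after


for gate, (trad, tbb) in delays_ns.items():
    print(f"{gate:5s} decrease = {percent_decrease(trad, tbb):5.1f}%  "
          f"speedup = {speedup(trad, tbb):4.2f}x")
```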
Review of Sub-Vth Circuit Benefits • So far, the presentation has shown: • TBB requires control of MOS bulks to span the operating regions of interest, and the implementation is successful • Study of simple logic gates showed: • TBB gives a dramatic speed increase (up to 7x) • Static CMOS design style is suitable for sub-threshold and super-threshold operation • Sizing of efficient devices for the TBB approach is possible • However, how will a complex system perform? • Design with previous knowledge (logic style, sizing) • Analyze post-layout simulations
Complex System-on-Chip Design Using TBB • The work addresses the challenges of: • Global interconnect delays • Clock distribution • Synchronization of unrelated clocks • Power dissipation
Conclusion • The TBB scheme, a new kind of body biasing, has been devised to span all regions of operation from ultra-low power to high speed • Forward-biasing causes exponential sub-threshold current gain • Leads to a 7 times frequency increase in simple logic gates • Focus on sub-threshold and slightly above threshold to utilize leakage • Bulk control circuits are effective • 4% area and 8.9% power dissipation increase • Static CMOS is the ideal overall design style • Device sizing at either sub-threshold or super-threshold allows efficient operation with variable supply voltages
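The exponential current gain mentioned above follows from the textbook sub-threshold model, in which drain current scales as exp((VGS - Vth) / (n·vT)). A minimal sketch, assuming that model, a slope factor n = 1.5, and a temperature of 300 K (none of these values come from the slides), shows how a modest reduction in Vth from forward body biasing multiplies the current:

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_gain(delta_vth, n=1.5, temp_k=300.0):
    """Current gain when the threshold voltage drops by delta_vth volts.

    Assumes I ~ exp((Vgs - Vth) / (n * vT)); n = 1.5 is a hypothetical
    slope factor, not a value taken from the slides.
    """
    return math.exp(delta_vth / (n * thermal_voltage(temp_k)))


for dv_mv in (25, 50, 75, 100):
    gain = subthreshold_gain(dv_mv / 1000.0)
    print(f"Vth lowered by {dv_mv:3d} mV -> current x{gain:.1f}")
```

Under these assumptions, a threshold shift of a few tens of millivolts is enough to produce gains of the same order as the speedups reported for the simple gates.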
Concluding Remarks • Tunable operation lets the designer choose the operating point (kHz, MHz, GHz); energy dissipation is affected accordingly • Other schemes do not offer this flexibility • TBB can lead to significant energy savings • LFSR results show TBB gives: • A maximum 5.7 times speed increase (sub-threshold) • Comparable energy at super-threshold and favorable energy at sub-threshold • Favorable EDP in all operating regions • Operation at the same speed with less energy dissipation • Idle state leakage current can be minimized by collapsing the supply voltage
Integrating Research Into Instruction • Data Path Circuits • Memory Design • Sub-System • ROUTER CHIP
Incorporating Research into Instruction • A long-term objective is to place some of the integrated chips on development boards such as those Digilent Inc. produces. • The integrated chips become part of a system and can be used in some of our low-level courses. • Most important is the use of these programmable boards to showcase the research outcomes, particularly to visiting prospective students. • A sample development board:
Reducing Interconnect Delays • Improved latency and bandwidth • Global interconnects are pipelined at or near the rate of computation
Sources of Power Consumption • The most straightforward way to reduce power consumption from any source is to reduce VDD • Controlling frequency directly manipulates dynamic power • Controlling device threshold manipulates leakage current, affecting leakage and short-circuit power
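The first two points can be illustrated with the standard switching-power expression P = α·C·VDD²·f. The sketch below is only an illustration: the activity factor, capacitance, and operating point are hypothetical values, and leakage and short-circuit power are not modeled.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power in watts: alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point: alpha = 0.1, C = 1 nF, VDD = 1.8 V, f = 100 MHz.
base = dynamic_power(alpha=0.1, c_load=1e-9, vdd=1.8, freq=100e6)
half_v = dynamic_power(alpha=0.1, c_load=1e-9, vdd=0.9, freq=100e6)
half_f = dynamic_power(alpha=0.1, c_load=1e-9, vdd=1.8, freq=50e6)

print(f"halving VDD leaves {half_v / base:.2f} of the baseline power")
print(f"halving f   leaves {half_f / base:.2f} of the baseline power")
```

Because VDD enters quadratically while frequency enters linearly, lowering the supply is the stronger lever on dynamic power.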
Traditional vs. Tunable Body Biasing The synchronizer/buffer shows an increase in performance at sub-threshold voltages when using tunable body biasing
Pursuit of Low Power Operation • It is likely that not all IP blocks in an SoC need to operate at high speed • Power dissipation for those IP blocks could be reduced by operating at a lower voltage • TBB offers the possibility to dynamically operate at either sub-threshold or super-threshold voltages
Variable Voltage SoC • Consider an SoC with 50 IP blocks, each requiring communication at a rate of 10 MHz • Each IP could operate at sub-threshold levels • The channel could operate at super-threshold voltages while the IP blocks are in sub-threshold • [Figure labels: Vdd1, Vdd2, Vdd3, Vdd4, Vdd5]
Idle vs Operating Power • During idle periods, it is advantageous to reduce leakage current by: • Reducing the power supply voltage, or • Increasing the threshold voltage (e.g. bulk voltage manipulation)
Speed at Varying VDD • TBB is 5.7x faster at 376.2 mV • TBB is 20% faster at 1.8 V
Energy-delay Product • The EDP of TBB outperforms the traditional scheme in all operating regions, significantly so in super-threshold
Regions of Operation • 3.9 MHz with 0.6 fJ/cycle • 222.2 MHz with 103 fJ/cycle • 1.1 GHz with 3.85 nJ/cycle
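For reference, an energy-delay product can be derived from each listed operating point by multiplying the energy per cycle by the cycle time. This is a sketch under the assumption that the delay is one clock period (1/f); the slides do not state how EDP was measured, and the function name is illustrative.

```python
# (frequency in Hz, energy per cycle in joules), as listed on the slide.
operating_points = [
    (3.9e6, 0.6e-15),
    (222.2e6, 103e-15),
    (1.1e9, 3.85e-9),
]


def energy_delay_product(freq_hz, energy_j):
    """EDP for one cycle: energy times cycle time (assumed to be 1 / f)."""
    return energy_j / freq_hz


for freq, energy in operating_points:
    edp = energy_delay_product(freq, energy)
    print(f"{freq / 1e6:8.1f} MHz  E = {energy:.3e} J  EDP = {edp:.3e} J*s")
```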
Contributions of this work • The proposed scheme alleviates the communication bottleneck and offers a way to synchronize multiple SoC clocks • Performs data transfers at up to 10 GHz • The proposed scheme maintains high performance under the influence of any clock skew • 6.5 GHz for any process corner and any skew • Low-power FIFO scheme with a small impact on area when used in SoCs with many modules
Contributions of this work • Process corners have a minor impact on performance, resulting in a 10% reduction of speed • The optimal voltage for minimum energy consumption per transaction is at 2Vth • Introduction of TBB to address leakage and dynamic power dissipation • 500% increase in performance at sub-threshold voltages with a modest 80% increase in power • 5-10% less power dissipation than traditional body biasing
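The trade-off between the 500% performance increase and the 80% power increase can be expressed as energy per operation, which scales as power divided by throughput. The sketch below assumes "500% increase" means 6x the baseline throughput; if it was meant as 5x, the ratio would be 0.36 instead. The function name is illustrative.

```python
def relative_energy_per_op(perf_increase_pct, power_increase_pct):
    """Energy per operation relative to baseline: power ratio / speed ratio."""
    speed_ratio = 1.0 + perf_increase_pct / 100.0
    power_ratio = 1.0 + power_increase_pct / 100.0
    return power_ratio / speed_ratio


ratio = relative_energy_per_op(500, 80)
print(f"energy per operation: {ratio:.2f}x the baseline")
```

Under this reading, each operation costs well under half the baseline energy even though instantaneous power goes up.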
Summary of Proposed FIFO Scheme • Linear FIFO scheme that addresses: • Signal propagation across the communication channel • Sustained throughput over long distances • Successful synchronization: synchronizes equal, rational, and arbitrary clocks • 6.5 GHz sustained performance after process corner analysis using 3 stages • Compared to the CN scheme: fewer devices per stage, fewer stages needed, 25% higher performance, 12% lower power • Operates at both super- and sub-threshold voltages • Lower instantaneous power demands from local clocks (less di/dt) • Optimal energy per transaction at 0.7 V in a 65 nm process • Sub-threshold operation reduces power by 3 orders of magnitude • Tunable Body Biasing provides 50% increased performance in sub-threshold while maintaining super-threshold operation
TBB Scalability • At 90 nm, the % difference is much less • At 180 nm, TBB sub-threshold static power % is large • [Figure callouts: "Total TBB sub-threshold power is large"; "Total TBB sub-threshold power isn’t so large"]