Low Power Architecture and Implementation of Multicore Design. Khushboo Sheth, Kyungseok Kim, Fan Wang, Siddharth Dantu. Advisor: Dr. V Agrawal. ELEC6270 Low Power Design of Electronic Circuits Team Project. VLSI D&T Seminar, Nov. 8, 2006.
Project Objectives
• Design and verify a 16-bit ALU with synchronous clocked inputs and outputs.
• Study the low-voltage power and delay characteristics of the design.
• Redesign the ALU for minimum power and highest speed.
Components of Power Dissipation
• Dynamic
  • Power due to signal transitions
    • Logic power (due to logic transitions)
    • Glitch power (due to glitches)
  • Short-circuit power
• Static
  • Leakage power (due to leakage currents)
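Power Components in a CMOS Circuit
[Figure: CMOS gate model between VDD and ground, with input vi(t), output vo(t), on-resistance Ron, a large off-resistance and load capacitance CL; the dynamic, short-circuit and leakage power paths are marked.]
Power = $C V_{DD}^2$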
1-bit ALU Design
[Figure: block diagram of the 1-bit ALU core with registers Reg A, Reg B and Reg C.]
1-bit ALU Core Timing (Vdd = 2.5 V)
[Figure: schematic and timing waveforms of the 1-bit ALU core: combinational logic and a DFF, with signals A, B, CYIN, CLK, opcode[3:0], C, CY, Z and COMPOUT, and labeled nets NX156, NX60, NX16 and NX80.]
Opcodes:
• 0000: a+b
• 0001: a-b
• 0010: equal
• 0011: not equal
• 0100: xor
• 0101: nor
• 0110: or
• 0111: and
• 1000: c<=a
• 1001: c<=b
• 1010: nand
• others: all-zero output
Longest path in combinational logic: c <= a+b (opcode 0000)
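The opcode table above can be read as a small behavioral model. The sketch below is an illustration in Python, not the team's netlist: the function name `alu1` is hypothetical, subtraction is assumed to be implemented as a + (not b) + carry-in, and the two compare opcodes are assumed to drive COMPOUT while leaving C at zero.

```python
def alu1(opcode: int, a: int, b: int, cyin: int = 0):
    """Behavioral sketch of one ALU bit slice. Returns (c, cy, compout)."""
    c = cy = compout = 0
    if opcode == 0b0000:            # a + b (full adder)
        c = a ^ b ^ cyin
        cy = (a & b) | (cyin & (a ^ b))
    elif opcode == 0b0001:          # a - b, ASSUMED to be a + ~b + cyin
        nb = b ^ 1
        c = a ^ nb ^ cyin
        cy = (a & nb) | (cyin & (a ^ nb))
    elif opcode == 0b0010:          # equal, ASSUMED to drive COMPOUT
        compout = int(a == b)
    elif opcode == 0b0011:          # not equal, ASSUMED to drive COMPOUT
        compout = int(a != b)
    elif opcode == 0b0100:          # xor
        c = a ^ b
    elif opcode == 0b0101:          # nor
        c = 1 - (a | b)
    elif opcode == 0b0110:          # or
        c = a | b
    elif opcode == 0b0111:          # and
        c = a & b
    elif opcode == 0b1000:          # c <= a
        c = a
    elif opcode == 0b1001:          # c <= b
        c = b
    elif opcode == 0b1010:          # nand
        c = 1 - (a & b)
    # any other opcode leaves all outputs at zero
    return c, cy, compout
```

Calling `alu1(0b0000, 1, 1)` exercises the add path, which the slide identifies as the longest path through the combinational logic.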
1-bit ALU Core: Vdd Sweep from 2.5 V to 0 V
[Figure: analog-mode waveforms of output C (net NX156) for Vdd = 2.5, 2.0, 1.5, 1.0, 0.5 and 0.0 V.]
1-bit ALU Core Logic Operation Voltage @ 200 MHz
• Supply voltage sweep near PMOS Vth = -0.5625 V (vs. NMOS Vth = 0.365 V)
• Sweep from Vsupply = 0.50 V to 1.00 V (linear increment 0.05 V, 11 points)
• Vsupply = 0.80 V: wrong operation
• Vsupply = 0.85 V: correct operation
[Figure: analog-domain input and output waveforms for opcode 1000 (c<=a) at Vsupply = 0.80 V and 0.85 V, annotated with overshoot and ripples.]
1-bit ALU Average Power vs. Delay @ 200 MHz
[Figure: 1-bit ALU block average power, 1-bit ALU core average power and 1-bit ALU core delay plotted against supply voltage from 0.0 to 2.5 V. Data labels on the plot: 354.563, 2.2493, 179.9153, 1.4203, 82.8828, 0.4955, 31.0283, 0.4123, 0.7204, 0.5427.]
Power = $C V_{DD}^2$
16-Bit ALU (Single Core) Design
[Figure: input register, combinational logic (16-bit ALU) and output register, clocked by CK; the switched capacitance is labeled Cref.]
• Supply voltage = Vref
• Total capacitance switched per cycle = Cref
• Clock frequency = f
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
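As a rough illustration of the reference-power formula, the Python sketch below evaluates P = C·V²·f and the saving this simple model predicts when only the supply voltage is lowered. The function names and the 1 pF capacitance are assumptions made for the example; the 2.5 V, 1.25 V and 10 MHz values are the ones used elsewhere in the slides. The model ignores short-circuit and leakage power, so simulated savings can differ.

```python
def dynamic_power(c_farads: float, vdd: float, f_hz: float) -> float:
    """Dynamic power P = C * Vdd^2 * f, in watts."""
    return c_farads * vdd ** 2 * f_hz


def predicted_saving(v_new: float, v_ref: float) -> float:
    """Fractional power saving when only Vdd changes (C and f fixed)."""
    return 1.0 - (v_new / v_ref) ** 2


# Hypothetical switched capacitance of 1 pF, chosen only for illustration.
p_ref = dynamic_power(1e-12, 2.5, 10e6)
saving = predicted_saving(1.25, 2.5)
print(f"Pref = {p_ref * 1e6:.1f} uW, predicted saving at 1.25 V = {saving:.0%}")
```

Halving the supply from 2.5 V to 1.25 V gives a predicted saving of 75% under this model, which is the quadratic baseline that the simulated results are compared against.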
16-Bit ALU Vectors
* Vector 4 activates the critical path; carry-out = 1
16-Bit ALU Simulation Result
• Circuit information: 694 gates
• Clock frequency applied: 10 MHz
• Temperature: 27 °C
• Vectors applied: 6 vectors
• TSMC025 technology: Vthn = 0.365 V, Vthp = -0.562 V
• Simulator: ELDO (SPICE)
• Simulation time: 700 ns
16-Bit ALU: Functionally Correct Operation at 2.5 V, 1.25 V, 0.85 V and 0.625 V for 6 Vectors
Circuit Fails at 0.45 V (< Vth), Simulated with a Single Vector Pair
16-Bit ALU Power Savings and Delay Increase with Reference @ 2.5 V
16-Bit ALU Power Savings and Delay Increase with Reference @ 1.25 V
Different Technology Impact on Power Saving
16-Bit ALU simulation setup:
• Supply voltage: 2.5 V
• Simulation transient time: 700 ns
• 6 vectors
• Temperature: 27 °C
Temperature Influence on Power
• Circuit information: 734 gates
• Clock frequency applied: 10 MHz; Vdd = 2.5 V
• Vectors applied: 6 vectors
• Simulation time: 700 ns
• TSMC035 technology
Multicore Design Methodology
• Lower the supply voltage.
  • This slows down the circuit.
  • Use parallel computing to gain the speed back.
• Multi-core means placing two or more complete cores within a single module.
• This architecture is a "divide and conquer" strategy. By splitting the work between multiple execution cores, a multi-core design can perform more work within a given clock cycle.
• A power reduction of more than 60% is observed.
Source: http://www.eng.auburn.edu/~vagrawal/D&TSEMINAR_SPR06/SLIDES/Agrawal_DTSem06.ppt
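The methodology above can be put as a short idealized derivation. It assumes N identical copies of the combinational logic, each switching the reference capacitance and clocked at f/N, and it neglects the extra registers, the multiplexer, and short-circuit and leakage power. The symbols $P_N$ and $V_N$ (power and supply voltage of the N-core design) are introduced here for illustration.

```latex
\begin{align*}
P_{ref} &= C_{ref} V_{ref}^2 f \\
P_N &= N \cdot C_{ref} V_N^2 \cdot \frac{f}{N} = C_{ref} V_N^2 f \\
\frac{P_N}{P_{ref}} &= \left( \frac{V_N}{V_{ref}} \right)^2
\end{align*}
```

Under these assumptions the throughput stays at f while each copy has N times longer to finish, which is what allows the lower supply voltage; the power then scales with the square of the voltage ratio, before overhead is counted.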
Parallel Architecture
[Figure: four copies of the 16-bit ALU combinational logic (Copy 1 to Copy 4) between input and output, each with a register clocked at f/4 by Ck0 to Ck3, a 4-to-1 multiplexer driven by the mux control, and a register clocked at f by CK.]
Control Signals, N = 4
[Figure: timing diagram of clock CK divided into Phase 1 to Phase 4; the mux control steps through 00, 01, 10, 11 and then repeats.]
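The repeating select pattern can be sketched as a round-robin counter. This Python snippet is only an illustration: the function name `mux_control` is hypothetical, and it models the select sequence alone, not the exact phase alignment between the f/4 clocks and the multiplexer in the actual design.

```python
def mux_control(cycle: int, n: int = 4) -> str:
    """Select code for a given cycle of the full-rate clock CK.

    The select value is the cycle index modulo n, printed as a
    fixed-width binary string (2 bits for n = 4).
    """
    width = max(1, (n - 1).bit_length())
    return format(cycle % n, f"0{width}b")


# Eight consecutive CK cycles for N = 4 copies
schedule = [mux_control(k) for k in range(8)]
print(" ".join(schedule))  # prints: 00 01 10 11 00 01 10 11
```

Each copy is therefore selected once every four CK cycles, matching the f/4 clock rate per copy.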
16-Bit ALU Multi-core Power Savings and Delay Increase with Reference @ 2.5 V
• Circuit information: 2617 gates
• Clock frequency applied: 10 MHz
• Temperature: 27 °C
• Vectors applied: 6 vectors
• TSMC025 technology: Vthn = 0.365 V, Vthp = -0.562 V
• Simulator: ELDO (SPICE)
• Simulation time: 700 ns
16-Bit ALU Multi-core Power Savings and Delay Increase with Reference @ 1.25 V
Power and Delay Comparison: Reference Design at 2.5 V vs. Multi-core Design at Different Voltages
Summary
• For the single-core ALU design we get more than 60% power savings at reduced voltage, but at the cost of performance.
• With a reference of 2.5 V, power drops faster than the square of the supply voltage.
• With a reference of 1.25 V, the power drop closely follows the square of the supply voltage.
• The multi-core design recovers the speed lost at reduced voltage while consuming less power.
References
• ELEC6270 Low Power Design of Electronic Circuits, class slides from Dr. Agrawal.
• Dr. Agrawal's presentation at the VLSI D&T Seminar, Spring 06, "Multi-Core Parallelism for Low-Power Design."
• www.tomshardware.com
• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.
• L. Shang and R. P. Dick, "Thermal crisis: challenges and potential solutions," IEEE Potentials, vol. 25, no. 5, 2006.
• International Technology Roadmap for Semiconductors. http://public.itrs.net
• Alokik Kanwal, "A review of Carbon Nanotube Field Effect Transistors," Version 2.0, 2003.
• K. K. Likharev, "Single Electron Devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606-632, Apr. 1999.
• A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
• Alexander Wolfe, "Quad-core processor forecast," TechWeb.