Progress Update Energy-Performance Characterization of CMOS/MTJ Hybrid Circuits Fengbo Ren 05/28/2010
Modern MTJ • Magnetic tunnel junction: a variable-resistance device controlled by bias voltage/current • Low resistance (parallel state): RP • High resistance (anti-parallel state): RAP • TMR = (RAP - RP) / RP • Spin-transfer-torque (STT) switching • The direction of the write current determines the state that is written. • The write current density must exceed a threshold for switching to occur.
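The TMR definition above can be checked with a short calculation. This is a minimal sketch; the function name `tmr` is an illustrative choice, and the resistance values are the ones quoted later in the deck's simulation setup (Rp = 700, Rap = 1400).

```python
def tmr(r_p: float, r_ap: float) -> float:
    """Tunnel magnetoresistance ratio, (R_AP - R_P) / R_P."""
    if r_p <= 0:
        raise ValueError("R_P must be positive")
    return (r_ap - r_p) / r_p


# Resistances from the simulation-setup slide.
ratio = tmr(700.0, 1400.0)
print(f"TMR = {ratio:.0%}")
```

A ratio of 1.0 corresponds to the 100% TMR listed in the simulation setup.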
Motivations for Hybrid Logic • Significant application in MRAM design. • Why logic? • CMOS-compatible • Switching current: 200 uA to 2 mA • 90 nm transistor: 1 mA per um of gate width • Non-volatility, high stability • Bringing the MTJ's non-volatility into CMOS may suppress leakage in active mode and reduce idle-mode leakage to a minimum. • 3D stacking • Replacing CMOS with MTJs may increase density.
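The CMOS-compatibility argument is simple arithmetic: dividing the MTJ switching current by the transistor drive current per unit width gives the gate width needed to drive a write. The sketch below assumes drive current scales linearly with gate width; the names are illustrative.

```python
ON_CURRENT_MA_PER_UM = 1.0  # 90 nm drive current quoted on the slide


def gate_width_um(switching_current_ma: float) -> float:
    """Gate width needed to source the given current (linear scaling assumed)."""
    return switching_current_ma / ON_CURRENT_MA_PER_UM


# Lower and upper ends of the quoted switching-current range.
for i_ma in (0.2, 2.0):
    print(f"{i_ma} mA needs about {gate_width_um(i_ma):.1f} um of gate width")
```

Under this assumption the quoted range maps to roughly 0.2 um to 2 um of gate width, which is why a 90 nm transistor can plausibly drive the write current.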
Questions? • What architecture can best utilize the MTJ's non-volatility to improve energy efficiency? • Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit? • How much leakage power can be saved by introducing MTJs into CMOS? • Any overhead? How large is the MTJ switching power? • What is the trend for MTJ/CMOS hybrid circuits with technology scaling?
Logic-in-Memory MTJ (LIM-MTJ) Logic Style • LIM-MTJ • Uses differential MTJs in Dynamic Current-Mode Logic (DyCML) • Outputs are evaluated from the resistance difference between the pull-down networks, through cross-coupled PMOS transistors. • Claimed to have lower dynamic and static power than SCMOS. • Figure: schematic of LIM-MTJ 1-bit full adder.
Energy-Performance Characterization • LIM-MTJ vs. SCMOS and DyCML • LIM-MTJ has no energy-performance advantage over the equivalent CMOS implementations. • Figures: schematic of SCMOS 1-bit full adder; schematic of DyCML 1-bit full adder.
MTJ Switching Energy Analysis • Switching energy: E = IW² ∙ R ∙ t • IW = JC ∙ A • JC is the critical current density. • A is the junction area: A = π ∙ W ∙ L = K ∙ L², where L is the junction size. • R = δ / A • δ is the resistance-area product, an intrinsic MTJ parameter: δ = 20 Ω ∙ um² • t is the switching time.
MTJ Switching Energy Analysis • JC is a function of the current pulse width. • Switching time is a function of the current density. • Δ is the thermal stability factor (Δ ≥ 40). • t0 is the intrinsic switching time: t0 = 1 ns. • JC0 is the intrinsic critical current density: JC0 = JC at t = t0. • Modern MTJs have been shown to have JC0 = 2-7 MA/cm².
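The equation relating JC to pulse width did not survive in this transcript. As a hedged illustration only, the sketch below uses the widely quoted thermal-activation form JC(t) = JC0 ∙ (1 - ln(t/t0)/Δ), which is an assumption and not necessarily the expression on the original slide. It does satisfy the stated condition JC = JC0 at t = t0. The function name and default values are illustrative.

```python
import math


def critical_current_density(t_ns: float, jc0: float = 5.0,
                             delta: float = 40.0, t0_ns: float = 1.0) -> float:
    """Assumed thermal-activation model: JC(t) = JC0 * (1 - ln(t / t0) / delta).

    jc0 and the return value share the same unit (e.g. MA/cm^2).
    """
    if t_ns <= 0:
        raise ValueError("pulse width must be positive")
    return jc0 * (1.0 - math.log(t_ns / t0_ns) / delta)


# Longer pulses need less current density under this assumed model.
for t in (1.0, 3.0, 10.0):
    print(f"t = {t:>4} ns -> JC = {critical_current_density(t):.2f} MA/cm^2")
```

With Δ ≥ 40 the logarithmic term is small, so under this model JC drops only a few percent per decade of pulse width.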
MTJ Switching Energy Analysis • Switching energy • A function of switching time (t), given JC0, δ, L, and Δ • Reference MTJ • JC0 = 5 MA/cm², δ = 20 Ω ∙ um², L = 135 nm, W = 65 nm • RP = 725 Ω, IC = 1.4 mA at t = 1 ns • Switching energy > 1 pJ • CMOS/MTJ hybrid logic circuits that require frequent MTJ switching are hardly energy efficient.
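The reference-MTJ numbers can be reproduced from the definitions on the earlier slide (IW = JC ∙ A, R = δ / A, A = π ∙ W ∙ L). The sketch below assumes the switching energy is the resistive dissipation IW² ∙ R ∙ t with constant resistance over the pulse; the function and variable names are illustrative.

```python
import math


def mtj_switching(jc_ma_cm2, ra_ohm_um2, l_nm, w_nm, t_ns):
    """Return (resistance in ohm, write current in A, energy in J)."""
    area_um2 = math.pi * (w_nm * 1e-3) * (l_nm * 1e-3)  # A = pi * W * L
    jc_a_um2 = jc_ma_cm2 * 1e6 / 1e8                    # MA/cm^2 -> A/um^2
    i_write = jc_a_um2 * area_um2                       # IW = JC * A
    resistance = ra_ohm_um2 / area_um2                  # R = delta / A
    energy = i_write ** 2 * resistance * t_ns * 1e-9    # assumed E = I^2 * R * t
    return resistance, i_write, energy


r, i, e = mtj_switching(jc_ma_cm2=5.0, ra_ohm_um2=20.0, l_nm=135.0, w_nm=65.0, t_ns=1.0)
print(f"R = {r:.0f} ohm, I = {i * 1e3:.2f} mA, E = {e * 1e12:.2f} pJ")
```

Under these assumptions the result agrees with the slide: a resistance near 725 Ω, a current near 1.4 mA, and an energy somewhat above 1 pJ.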
MTJ Switching Energy Analysis • Switching energy with scaling • Scaling parameters: δ, L, JC0 • fJ-level switching: δ ≤ 5 Ω ∙ um², JC0 ≤ 0.6 MA/cm², and L ≤ 33 nm
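Combining IW = JC ∙ A, R = δ / A, and A = K ∙ L² gives E = JC0² ∙ δ ∙ K ∙ L² ∙ t at t = t0, so energy scales linearly with δ and quadratically with JC0 and L. The sketch below evaluates the scaling targets under two assumptions that are not stated on the slide: the pulse stays at t = 1 ns, and K keeps the aspect ratio of the reference MTJ (K = π ∙ 65/135).

```python
import math

K_REF = math.pi * 65 / 135  # assumed: same W/L aspect ratio as the reference MTJ


def switching_energy_j(jc0_ma_cm2, ra_ohm_um2, l_nm, t_ns=1.0, k=K_REF):
    """E = JC0^2 * delta * K * L^2 * t, returned in joules."""
    jc_a_um2 = jc0_ma_cm2 * 1e6 / 1e8        # MA/cm^2 -> A/um^2
    area_um2 = k * (l_nm * 1e-3) ** 2        # A = K * L^2
    return jc_a_um2 ** 2 * ra_ohm_um2 * area_um2 * t_ns * 1e-9


reference = switching_energy_j(5.0, 20.0, 135.0)
scaled = switching_energy_j(0.6, 5.0, 33.0)
print(f"reference: {reference * 1e12:.2f} pJ")
print(f"scaled:    {scaled * 1e15:.2f} fJ")
print(f"reduction: {reference / scaled:.0f}x")
```

Under these assumptions the scaled device lands below 1 fJ, with most of the reduction coming from the quadratic JC0 term.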
LUT-based Logic • Store the truth table in memory • Read out the logic value selected by the inputs. • Reconfigurable • Can implement any logic function, e.g. as in an FPGA • Replace the storage cells with MTJs • No MTJ switching during logic operation; the MTJs only need to be configured once. • Non-volatile, minimal standby power. • Instant boot-up. • Figure: example of a 3-input LUT.
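The LUT idea can be sketched in software: the truth table is written once (configuration), and each logic evaluation is only a read addressed by the inputs. The example below builds two 3-input LUTs for the sum and carry outputs of a 1-bit full adder, the function used later in the deck. The helper names are illustrative and this is a behavioral sketch, not a circuit model.

```python
from itertools import product


def make_lut(fn, n_inputs=3):
    """Configuration step: store one bit per input combination."""
    return [fn(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]


def read_lut(table, *bits):
    """Logic operation: select the stored bit addressed by the inputs (MSB first)."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]


SUM_LUT = make_lut(lambda a, b, cin: a ^ b ^ cin)
CARRY_LUT = make_lut(lambda a, b, cin: (a & b) | (cin & (a ^ b)))

for a, b, cin in product((0, 1), repeat=3):
    s, c = read_lut(SUM_LUT, a, b, cin), read_lut(CARRY_LUT, a, b, cin)
    print(f"{a}+{b}+{cin} -> carry={c} sum={s}")
```

The two tables hold 16 bits in total, which matches the 16 storage cells (EDFFs or MTJs) counted in the full-adder implementations.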
MTJ Reading Circuit • Conventional reading circuit based on a current-mirror sense amplifier (SA) • Slow (2 stages) • Power hungry (DC current) • Figure labels: ∆V, VIP, VIN
MTJ Reading Circuit • Reading circuit based on cross-coupled inverters (XSA) • Fast • ∆V is generated and amplified at the same time • Power efficient • No DC current, only charging and discharging of capacitance • 1 MTJ and 1 Rref are accessed per read • ∆V develops in the evaluation phase and is amplified by the cross-coupled inverters
1 Bit Full Adder (CMOS_LUT) • Transistor Count • 16xEDFF • 4xMUX4 • 2xMUX2 • 672 Transistors
1 Bit Full Adder (MTJ_LUT1) • Transistor Count • 16xREAD1XMTJ • 4xMUX4 • 2xMUX2 • 2xWRTCKT • 448 Transistors • 33% Reduction • 16 MTJ
READ1XMTJ • 15 transistors + 1 MTJ • Needs a writing circuit
1 Bit Full Adder (MTJ_LUT2) • Transistor Count • 2x READ8XMTJ • 1x 9-WORD DECODER • 2x MUX2 • 1x INV • 1x WRTCKT • 174 Transistors • 76% Reduction • 16 MTJ
READ8XMTJ • MTJs share one reading circuit • 1 MTJ + 1 Rref are accessed per read • 1 MTJ is accessed per write • 23 transistors + 8 MTJs
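The reduction figures can be rechecked directly from the transistor counts on the three full-adder slides. This is a small sanity-check sketch with illustrative names. From the stated totals, MTJ_LUT1 comes out at about 33%, while 174 versus 672 comes out at about 74%, slightly below the 76% quoted for MTJ_LUT2, so one of those two numbers on the slide may be off.

```python
CMOS_LUT = 672   # transistors, CMOS_LUT full adder
MTJ_LUT1 = 448   # transistors, MTJ_LUT1 full adder
MTJ_LUT2 = 174   # transistors, MTJ_LUT2 full adder


def reduction(baseline: int, count: int) -> float:
    """Fractional reduction in transistor count relative to the baseline."""
    return 1.0 - count / baseline


# Transistors spent on MTJ reading circuits in each architecture.
lut1_read = 16 * 15  # 16 x READ1XMTJ at 15T each
lut2_read = 2 * 23   # 2 x READ8XMTJ at 23T each

print(f"MTJ_LUT1: {reduction(CMOS_LUT, MTJ_LUT1):.1%} fewer transistors, {lut1_read}T in readers")
print(f"MTJ_LUT2: {reduction(CMOS_LUT, MTJ_LUT2):.1%} fewer transistors, {lut2_read}T in readers")
```

Sharing one reading circuit across eight MTJs is what shrinks the reader cost from 240 to 46 transistors.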
Simulation Setup • Three LUT architectures are compared • CMOS-LUT • MTJ-LUT1: MTJ reading circuit + MUX • MTJ-LUT2: shared MTJ reading circuit + decoder • Each is configured to implement a 1-bit full adder • Two 3-input LUTs • ASU Predictive Technology Model (PTM) • 90 nm, 65 nm (bulk) • 45 nm, 32 nm (SOI) • MTJ characteristics • Rp = 700 Ω, Rap = 1400 Ω, TMR = 100%, Icap2p = 223 uA, Icp2ap = 500 uA • Verilog-A MTJ model from Richard.
Configuration Power • CMOS-LUT • 1 GHz • MTJ-LUT • 250 MHz • 750 uA writing current • About 3 ns writing time per MTJ • MTJ-based LUTs consume 10x more configuration power • Due to the switching energy of the 16 MTJs
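A rough estimate of the MTJ write energy behind the configuration cost follows from the numbers on this slide and the simulation setup. The sketch below assumes resistive dissipation I² ∙ R ∙ t in the MTJ only, with the resistance held at either Rp or Rap for the whole pulse, and ignores losses in the write driver. These simplifications are assumptions, not the deck's simulation method.

```python
I_WRITE_A = 750e-6   # writing current from the slide
T_WRITE_S = 3e-9     # approximate writing time per MTJ
N_MTJ = 16           # MTJs in the two 3-input LUTs


def write_energy_j(r_ohm: float) -> float:
    """Energy dissipated in one MTJ during one write pulse (assumed I^2 * R * t)."""
    return I_WRITE_A ** 2 * r_ohm * T_WRITE_S


# Bracket the estimate with the two resistance states from the simulation setup.
for label, r in (("Rp", 700.0), ("Rap", 1400.0)):
    per_mtj = write_energy_j(r)
    print(f"{label}: {per_mtj * 1e12:.2f} pJ per MTJ, {N_MTJ * per_mtj * 1e12:.1f} pJ for all 16")
```

The per-MTJ estimate is on the order of a picojoule, in line with the conclusion that writing a modern MTJ costs around several pJ.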
Delay • MTJ-LUT2 has 2.5x longer delay
Leakage Power • MTJ-LUT1 has slightly higher leakage power • MTJ-LUT2 has about 5x lower total leakage power • 10x lower storage leakage (due to MTJs) • 2x lower logic leakage (MUX replaced by decoder)
Energy (Operation Frequency: 100 MHz) • LUT2 • 4x total energy saving at 32 nm • 1/10 leakage_storage, ½ leakage_logic, higher dynamic_logic • Dynamic_storage overhead decreases as technology scales down.
Energy (Operation Frequency: 250 MHz) • LUT2 • 3x total energy saving at 32 nm • 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic • Dynamic_storage overhead decreases as technology scales down.
Energy (Operation Frequency: 500 MHz) • LUT2 • 2x total energy saving at 32 nm • 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic • Dynamic_storage overhead decreases as technology scales down.
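The shrinking saving from 100 MHz to 500 MHz is consistent with a simple per-cycle energy model: E = P_leak / f + E_dyn, so leakage counts for less as frequency rises. The sketch below illustrates only that trend. The absolute power and energy values are made-up placeholders, not data from the slides; only the 5x total-leakage ratio is taken from the leakage slide.

```python
def energy_per_cycle_j(p_leak_w: float, e_dyn_j: float, f_hz: float) -> float:
    """Per-cycle energy: leakage integrated over one period plus dynamic energy."""
    return p_leak_w / f_hz + e_dyn_j


# Placeholder values for illustration only (NOT measured or simulated results).
CMOS = {"p_leak_w": 10e-6, "e_dyn_j": 20e-15}
MTJ = {"p_leak_w": 10e-6 / 5, "e_dyn_j": 20e-15}  # 5x lower leakage, same dynamic energy


def saving(f_hz: float) -> float:
    """Ratio of CMOS to MTJ per-cycle energy at the given frequency."""
    return energy_per_cycle_j(f_hz=f_hz, **CMOS) / energy_per_cycle_j(f_hz=f_hz, **MTJ)


for f in (100e6, 250e6, 500e6):
    print(f"{f / 1e6:.0f} MHz: {saving(f):.2f}x")
```

With any leakage advantage and comparable dynamic energy, the ratio falls toward 1 as frequency increases, which mirrors the 4x, 3x, 2x progression reported across the three energy slides.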
Standby Power • Dynamic sleep transistor • 50mV voltage drop across sleep transistor • 5-20X reduction
Conclusions • What architecture can best utilize the MTJ's non-volatility to improve energy efficiency? • LUT-based logic, which requires no MTJ switching during operation. • Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit? • Yes. • How much leakage power can be saved by introducing MTJs into CMOS? • About a 10x reduction. • Any overhead? How large is the MTJ switching power? • Yes. MTJ reading energy is an overhead. The writing energy of a modern MTJ is around several pJ. • What is the trend for MTJ/CMOS hybrid circuits with technology scaling? • They will play a significant role in suppressing leakage below 45 nm.