
Takahiro Hanyu Laboratory for Brainware Systems

ESSDERC 2010 ITRS workshop on Emerging Spin and Carbon-based Nanoelectronic Logic Devices @ Barcelo Renacimiento Hotel, Seville, Spain, Sep. 17, 2010. Magnetic FPGAs: Challenge of Nonvolatile Logic-in-Memory Architecture Using MOSFETs and Magnetic Tunnel Junctions. Takahiro Hanyu





Presentation Transcript


  1. ESSDERC 2010 ITRS workshop on Emerging Spin and Carbon-based Nanoelectronic Logic Devices @ Barcelo Renacimiento Hotel, Seville, Spain, Sep. 17, 2010. Magnetic FPGAs: Challenge of Nonvolatile Logic-in-Memory Architecture Using MOSFETs and Magnetic Tunnel Junctions. Takahiro Hanyu, Laboratory for Brainware Systems, Research Institute of Electrical Communication (RIEC), Tohoku University, Japan. Acknowledgements: This work was supported by the Japan Society for the Promotion of Science (JSPS) through its "Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program; Prof. Hideo Ohno)." This work was also supported by the Laboratory for Nanoelectronics and Spintronics, Tohoku University, Japan.

  2. Outline • Impact of Nonvolatile (NV) Logic-in-Memory (LIM) Architecture • Design of an MTJ-Based NV LIM Circuit • Application 1: NV-FPGA • Application 2: NV-TCAM • Conclusions & Future Prospects

  3. Background: increasing delay and power. • Logic and memory modules are separated, so many interconnections run between them; wire delay dominates chip performance, and global wires require large drivers. • On-chip memory modules are volatile, so power must be supplied to them continuously, which causes leakage current. Result: long delay, large power, and large static power.

  4. Nonvolatile logic-in-memory architecture • Logic-in-memory architecture (proposed in 1969): storage elements are distributed over a logic-circuit plane. • Magnetic tunnel junction (MTJ) device: nonvolatility, unlimited endurance, fast writability, scalability, CMOS compatibility, and 3-D stacking capability. • Structure: an MTJ layer stacked on a CMOS layer. ● Storage is nonvolatile, so leakage current is cut off: static power is eliminated. ● MTJ devices are placed on top of the CMOS layer: chip area is reduced. ● Storage and logic are merged, so the global-wire count is reduced: wire delay and dynamic power are reduced.

  5. Outline • Impact of Nonvolatile (NV) Logic-in-Memory (LIM) Architecture • Design of an MTJ-Based NV LIM Circuit • Application 1: NV-FPGA • Application 2: NV-TCAM • Conclusions & Future Prospects

  6. Model of a MOS/MTJ-hybrid circuit: the storage (MTJ devices) holds either ◆ a circuit configuration or ◆ pattern data, written through the configuration inputs. The logic-circuit plane (CMOS) combines the stored data with the data inputs to produce the outputs. Typical applications: ◆ circuit-configuration type: field-programmable gate array (FPGA); ◆ pattern-data type: content-addressable memory (CAM).

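  7. Design example: a multiplexer (MUX) chooses either x1+x2 or x1·x2, computed from the data inputs x1 and x2, as the output. Its select signal y is a configuration input held in a 1-bit configuration memory. How should this logic circuit be designed?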
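Before choosing a circuit, it helps to pin down the target behavior. The following is a minimal Python sketch of the design example as a behavioral model. The function name `configurable_gate` is hypothetical. The slide does not say which value of y selects which term, so the mapping y = 1 → x1+x2 is an assumption.

```python
def configurable_gate(x1: int, x2: int, y: int) -> int:
    """Behavioral model of the design example: a MUX selected by a stored bit y.

    Hypothetical mapping (not given on the slide): y = 1 selects x1 + x2 (OR),
    y = 0 selects x1 * x2 (AND).
    """
    or_term = x1 | x2
    and_term = x1 & x2
    return or_term if y == 1 else and_term


# Truth table of each configuration for (x1, x2) = 00, 01, 10, 11.
for y in (0, 1):
    outputs = [configurable_gate(a, b, y) for a in (0, 1) for b in (0, 1)]
    print(f"y={y}: {outputs}")
```

Both the CMOS implementation and the MOS/MTJ-hybrid implementation must reproduce these two truth tables. They differ only in how the stored bit y is held and combined with the inputs.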

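  8. CMOS implementation: NOR and NAND gates (between VDD and GND) generate the x1+x2 and x1·x2 terms, and a MUX selects the output. An SRAM cell (controlled by CTRL, storing y and y') holds the configuration bit. The logic and storage parts are separated from each other. Transistor count: 20 + α (for nonvolatile devices). Is this small?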

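  9. Principle of MOS/MTJ-hybrid circuitry: a MOS transistor driven by input x and an MTJ storing bit y both act as resistances. RMOS = RH (high resistance) if x = 0 and RL (low resistance) if x = 1; RMTJ = RAP (high resistance) if y = 0 and RP (low resistance) if y = 1. Because y changes the resistance of the network, the logic function (e.g., NAND or NOR) is configurable by the data stored in the MTJ.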

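  10. MOS/MTJ-hybrid circuit implementation: a merged logic-and-storage network feeds an output generator with load resistors Rload from VDD. In this network, input transistors (x1, x1', x2, x2') and MTJs storing y and y' (resistances Rx1, Rx1', Rx2, Rx2', Ry, Ry', with CTRL inputs on the y/y' branches) form two complementary paths carrying currents I and I'. A clocked (CLK) current comparator produces out = 0 if I > I' and out = 1 if I' > I, with out' as the complementary output (voltages Vout and Vout'). Transistor count: 11. Merging logic and storage makes the circuit compact.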

  11. Outline • Impact of Nonvolatile (NV) Logic-in-Memory (LIM) Architecture • Design of an MTJ-Based NV LIM Circuit • Application 1: NV-FPGA • Application 2: NV-TCAM • Conclusions & Future Prospects

  12. Typical application 1: nonvolatile FPGA. NV devices, such as those in NV lookup tables (LUTs), are distributed across the FPGA. ☺ Leakage current can be eliminated and latency kept short. How should it be designed? With MOS/MTJ-hybrid circuits, a separate NVM is not required!

  13. Conventional nonvolatile FPGA: each MTJ is read by its own sense amplifier (SA). The SA converts the low-voltage MTJ signal into the high-voltage swing that the CMOS combinational logic producing the output requires. How can logic operations be performed directly on the low-swing signals from the MTJ devices?

  14. MOS/MTJ-hybrid circuitry (proposed): current-mode logic (CML). Logic operations are performed even at low voltage swing by sensing small differences in current. The MTJs feed current-mode combinational logic directly, and an SA converts the result to a high-voltage output. The device count is reduced to 28% with only minor performance degradation.

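  15. MOS/MTJ-hybrid structure: a selection-transistor tree driven by A, A', B, and B' connects one of four MTJs (R00, R01, R10, R11) to the sense path. The resulting current IF is compared with the current IREF through a reference resistor RREF. Truth table: ZAB = 0 is stored as RAB = RAP and ZAB = 1 as RAB = RP, with RAP > RREF > RP. A 2-input LUT function is thus realized with 10 NMOS transistors and 4 MTJs (plus 1 resistor).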

  16. Operation example (XOR): the XOR truth table (Z = 1 only when A ≠ B) is stored as R00 = R11 = RAP and R01 = R10 = RP. The sense amplifier outputs Z = 1 if IF > IREF and Z = 0 otherwise, with RREF lying between RP and RAP. Logic operations at low voltage swing are thus performed by a MOS/MTJ-hybrid network.
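The read path of slides 15 and 16 can be modeled in a few lines. The selection tree picks one MTJ, and the sense amplifier compares its current with the reference current. This is a minimal sketch; the class name `NvLut2`, the 0.1 V read voltage, and the device values are hypothetical. R_P = 2 kΩ with a 100% TMR ratio borrows the parameters quoted later for the TCAM simulation, and the transistor on-resistance is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical device values (not from these slides): R_P = 2 kOhm and a TMR
# ratio of 100 %, so R_AP = R_P * (1 + TMR) = 4 kOhm.
R_P = 2_000.0
TMR = 1.0
R_AP = R_P * (1.0 + TMR)
R_REF = (R_P + R_AP) / 2.0  # any R_REF with R_AP > R_REF > R_P would do
V_READ = 0.1  # hypothetical voltage across the selected path [V]


@dataclass
class NvLut2:
    """2-input LUT: one MTJ per input combination (R00, R01, R10, R11)."""
    mtj: Dict[Tuple[int, int], float]  # (A, B) -> resistance [ohm]

    @classmethod
    def program(cls, truth_table: Dict[Tuple[int, int], int]) -> "NvLut2":
        # Z_AB = 1 is stored as R_P (low resistance), Z_AB = 0 as R_AP.
        return cls({ab: (R_P if z else R_AP) for ab, z in truth_table.items()})

    def read(self, a: int, b: int) -> int:
        # The selection-transistor tree connects exactly one MTJ to the sense path
        # (its on-resistance is ignored in this sketch).
        i_f = V_READ / self.mtj[(a, b)]
        i_ref = V_READ / R_REF
        # Sense-amplifier decision: Z = 1 if I_F > I_REF, otherwise Z = 0.
        return 1 if i_f > i_ref else 0


xor = NvLut2.program({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
print([xor.read(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Reprogramming only the four resistances, for example to a NAND table, changes the function without touching the transistors. This is the "configurable by stored data" property in its LUT form.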

  17. Dynamic current-mode logic (DyCML)-based circuit: a precharge-evaluate sense amplifier (SA, with capacitors C1 and C2 and load CL) is attached to the MOS/MTJ-hybrid network that performs the LUT operation. The circuit is precharged while CLK = 0 and evaluates while CLK = 1: if IF < IREF, (Z, Z') = (0, 1); if IF > IREF, (Z, Z') = (1, 0). This reduces dynamic power dissipation.
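The two-phase protocol can be sketched at the logic level. During precharge no decision is available; during evaluation the outputs are complementary and set by the sign of IF − IREF. This is a minimal sketch; the function names are hypothetical. It assumes that both output nodes are held high during precharge, as in conventional DyCML, which the slide does not state explicitly.

```python
from typing import List, Tuple


def dycml_outputs(clk: int, i_f: float, i_ref: float) -> Tuple[int, int]:
    """Logic levels of (Z, Z') in one clock phase.

    Assumption: during precharge (CLK = 0) both output nodes are held high, as in
    conventional DyCML; the slide only states precharge at CLK = 0 and
    evaluation at CLK = 1.
    """
    if clk == 0:
        return (1, 1)  # precharge: no decision is available yet
    # Evaluate: the sense amplifier resolves the sign of I_F - I_REF.
    return (1, 0) if i_f > i_ref else (0, 1)


def run_cycles(i_f_per_cycle: List[float], i_ref: float) -> List[Tuple[int, int]]:
    """One precharge phase followed by one evaluate phase per input current."""
    trace = []
    for i_f in i_f_per_cycle:
        trace.append(dycml_outputs(0, i_f, i_ref))
        trace.append(dycml_outputs(1, i_f, i_ref))
    return trace


# Currents in arbitrary units: a high-resistance (AP) path, then a low-resistance (P) path.
print(run_cycles([2.5, 5.0], i_ref=3.3))
```

Only the evaluate-phase outputs carry the LUT result. This matches the P/E labeling of the measured waveforms.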

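  18. Spin-injection write operation: an MTJ in the selection-transistor tree is selected by word lines WL0 to WL3. The write current ITMR is then driven through it by the bit lines BL and BL'; for example, writing W = '1' sets BL = '0' and BL' = '1'. The MTJ resistance RTMR switches between RAP and RP when ITMR exceeds the critical currents ICAP and ICP.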
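Spin-injection writing is bipolar: the current direction selects the target state, and only currents above a critical magnitude switch the device. The following minimal sketch models that behavior. The critical-current values, the 500 µA write current, the sign convention, and the names `Mtj` and `write_bit` are all hypothetical. The choice W = 1 → RP follows the LUT convention Z = 1 ↔ RP from slide 15.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical critical write currents (not given on the slide) [A].
I_C_AP_TO_P = 200e-6  # magnitude that switches AP -> P
I_C_P_TO_AP = 300e-6  # magnitude that switches P -> AP


@dataclass
class Mtj:
    parallel: bool  # True: low resistance R_P; False: high resistance R_AP

    def apply_current(self, i: float) -> None:
        # Assumed sign convention: positive current favors P, negative favors AP.
        if i >= I_C_AP_TO_P:
            self.parallel = True
        elif i <= -I_C_P_TO_AP:
            self.parallel = False
        # Sub-critical currents (e.g. read currents) leave the state unchanged.


def write_bit(cells: List[Mtj], wl: int, w: int, i_write: float = 500e-6) -> None:
    """Write bit w into the MTJ selected by word line wl (WL0 to WL3).

    The sign of the current stands in for driving BL and BL' to opposite levels;
    w = 1 -> R_P follows the LUT convention Z = 1 <-> R_P.
    """
    cells[wl].apply_current(i_write if w == 1 else -i_write)


cells = [Mtj(parallel=False) for _ in range(4)]
for wl, bit in enumerate([0, 1, 1, 0]):  # XOR contents; WL index = 2*A + B (assumed)
    write_bit(cells, wl, bit)
print([c.parallel for c in cells])  # [False, True, True, False]
```

The threshold check is what keeps read operations non-destructive. A small read current in either direction leaves the stored state unchanged.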

  19. Test chip features: a fabricated 2-input LUT, with the selection-transistor tree in the MOS layer and 4 MTJ devices stacked over it.

  20. Measured waveforms (basic operations): P = precharge, E = evaluate. Inputs A and B step through all four combinations in successive P/E cycles. Outputs Z and Z' are shown for the NAND, NOR, XOR, and XNOR configurations (0.78 V/div, 100 µs/div).

  21. Immediate wake-up behavior: after a standby period with VDD = 0, the circuit returns to active operation and produces correct outputs Z and Z' immediately (0.78 V/div, 50 µs/div). Immediate wake-up behavior has also been measured successfully.

  22. Comparison of performances (proposed LUT, nonvolatile SRAM [3], and SRAM/MRAM):
     • Device counts: 29 MOSs + 4 MTJs; 102 MOSs + 8 MTJs; 46 MOSs + 1 MRAM*1)
     • Area*2): 287 µm²; 455 µm²*1); 702 µm²
     • Delay*3): 100 ps; 140 ps; 185 ps
     • Active power*3): 22.5 mW; 26.7 mW; 17.5 mW
     • Standby power: 0 mW for all three
     • Standby-to-active delay: 42 ns/bit; 0 ns/bit; 0 ns/bit
     • Standby-to-active energy: 19 pJ/bit; 0 pJ/bit; 0 pJ/bit
     *1) The SRAM/MRAM design consists of four SRAM cells (24 MOSs), three 2-input multiplexers (18 MOSs), and two output buffers (4 MOSs); the MRAM and its peripheral circuits are not considered in this evaluation. *2) Estimated for a 0.14 µm process. *3) HSPICE simulation in a 0.14 µm MOS/MTJ-hybrid process.
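The "reduced to 28%" device-count figure on slide 14 is consistent with a transistor-count ratio from this comparison. Assume that 29 MOSs + 4 MTJs is the proposed LUT and 102 MOSs + 8 MTJs is the nonvolatile-SRAM LUT; this assignment is an inference from slides 14 and 15, not stated in the table. The arithmetic is then:

```python
# Assumed assignment: 29 + 4 is the proposed LUT, 102 + 8 the nonvolatile-SRAM LUT.
proposed_mos, proposed_mtj = 29, 4
nv_sram_mos, nv_sram_mtj = 102, 8

mos_ratio = proposed_mos / nv_sram_mos
all_device_ratio = (proposed_mos + proposed_mtj) / (nv_sram_mos + nv_sram_mtj)
print(f"MOS transistors: {mos_ratio:.1%}")           # 28.4%
print(f"MOS + MTJ devices: {all_device_ratio:.1%}")  # 30.0%
```

Counting the MTJs as well gives 30%. The 28% figure therefore appears to count MOS transistors only.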

  23. MRR vs. operation margin in the NV-LUT: shmoo plot of the operation margin against the MRR in a 6-input LUT. A large MRR gives a sufficient operation margin.

  24. Outline • Impact of Nonvolatile (NV) Logic-in-Memory (LIM) Architecture • Design of an MTJ-Based NV LIM Circuit • Application 1: NV-FPGA • Application 2: NV-TCAM • Conclusions & Future Prospects

  25. Application 2: ternary content-addressable memory (TCAM). A TCAM performs a fully parallel masked equality search. The search-line/word-line driver broadcasts an input key (e.g., 0 1 0 0 1 ・・・ 1 0) to all stored words, whose bits may be 0, 1, or X (don't care). Each word's output OUT1 ... OUTn then reports 1 (match) or 0 (mismatch), and bit-line and output drivers complete the array. In the slide's example, stored word 2 matches while words 1 and n mismatch. Both the search and the comparison are fully parallel. TCAM is a "functional memory": a powerful data-search engine useful in applications such as database machines and virus checkers in network routers. TCAM must be implemented more compactly and with lower power dissipation.
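The masked equality search can be stated precisely in software. A stored X matches either key bit, and a word matches only if every bit matches. This is a minimal sketch with hypothetical function names, and the loop over words is a sequential stand-in for the hardware's parallel comparison. The example words and key are those of the bit-serial example on slide 29.

```python
from typing import List


def ternary_match(stored: str, key: str) -> bool:
    """Masked equality: a stored 'X' (don't care) matches either key bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same length")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def tcam_search(words: List[str], key: str) -> List[int]:
    """One output per word, as on OUT1..OUTn: 1 = match, 0 = mismatch.

    The hardware compares all words in the same cycle; this loop is only a
    sequential stand-in for that parallel comparison.
    """
    return [int(ternary_match(w, key)) for w in words]


words = ["00X", "010", "0X1", "10X", "110", "1X1", "X0X", "X10", "XX1"]
print(tcam_search(words, "111"))  # [0, 0, 0, 0, 0, 1, 0, 0, 1]
```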

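  26. NV-TCAM cell circuit: two stored bits b1 and b2 are gated by the search signals S' (shared with WL1) and S (shared with WL2), giving the terms S'·b1 and S·b2. The cell current IZ is wired-ORed onto the match line (ML): ML = b1·S' + b2·S.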
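The cell equation ML = b1·S' + b2·S implies a ternary encoding. Because the match line is a wired-OR across the word, ML = 1 can only mean "some cell conducts", i.e., a mismatch. Each stored value must therefore set the bit that conducts for the opposite search bit. The encoding table below is derived from that reasoning and is not stated on the slide; the names are hypothetical.

```python
from typing import Dict, Tuple

# Derived (assumed) encoding: stored 0 conducts when S = 1, stored 1 when S = 0,
# and X never conducts.
ENCODE: Dict[str, Tuple[int, int]] = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def cell_ml(b1: int, b2: int, s: int) -> int:
    """ML = b1*S' + b2*S for one cell (1 = this cell signals a mismatch)."""
    return (b1 & (1 - s)) | (b2 & s)


def word_out(stored: str, key: str) -> int:
    """Wired-OR of all cells on the match line; OUT = 1 (match) only if ML stays 0."""
    ml = 0
    for bit, s in zip(stored, key):
        b1, b2 = ENCODE[bit]
        ml |= cell_ml(b1, b2, int(s))
    return 1 - ml


# Single-bit cases (B, S) = (0,0), (0,1), (1,0), (1,1), (X,0), (X,1).
print([word_out(b, s) for b in "01X" for s in "01"])  # [1, 0, 0, 1, 1, 1]
```

The single-bit results (four matches, two mismatches) agree with the equality-search waveforms shown later for stored data B = 1, 0, and X.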

  27. CMOS-based TCAM cell circuit: an equality-detection (ED) circuit plus two 1-bit storage cells. • Transistor count: 12 (ED: 4T; 2-bit storage: 8T). • Input/output wires: 8 (BL: 2, WL: 1, VDD and VSS: 2, SL: 2, ML: 1). • Power must always be supplied, leaving many leakage-current paths. How can the cell be made compact while cutting off the leakage current?

  28. MOS/MTJ-hybrid TCAM cell circuit (S. Matsunaga, K. Hiyama, A. Matsumoto, S. Ikeda, H. Hasegawa, K. Miura, J. Hayakawa, T. Endoh, Hideo Ohno, and Takahiro Hanyu, "Standby-Power-Free Compact Ternary Content-Addressable Memory Cell Chip Using Magnetic Tunnel Junction Devices," Applied Physics Express (APEX), vol. 2, no. 2, pp. 023004-1 to 023004-3, 2009). The cell combines 2-bit storage (MTJs) with logic (MTJs and MOSs) and connects to ML/BL, SL'/WL1, and SL/WL2. • Storage is merged into the logic circuit: compact (2T-2MTJ). • Shared wires: 4 (ML/BL and SL/WL are shared; no VDD line is needed). • 3-D stacked structure: a large reduction in circuit area. The result is a compact, nonvolatile TCAM cell using MTJ devices.

  29. Power-gating scheme of a bit-serial NV-TCAM: the search word 111 is compared one bit per cycle (1st-, 2nd-, and 3rd-bit search). Each stored word has a sense amplifier (SA) and an accumulator (ACC) that holds its running result. Results after the 1st, 2nd, and 3rd bit:
     • 00X: Mismatch, Mismatch, Mismatch
     • 010: Mismatch, Mismatch, Mismatch
     • 0X1: Mismatch, Mismatch, Mismatch
     • 10X: Match, Mismatch, Mismatch
     • 110: Match, Match, Mismatch
     • 1X1: Match, Match, Match
     • X0X: Match, Mismatch, Mismatch
     • X10: Match, Match, Mismatch
     • XX1: Match, Match, Match
     Legend: TCAM cells and sense amplifiers are shown in either active mode or standby mode (static power suppressed); accumulators are in active mode. The longer the TCAM word, the more effective the standby-power reduction becomes.
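The bit-serial search with per-word accumulators can be simulated directly. Each cycle evaluates one bit column and ANDs its result into the accumulator, and the printed rows should reproduce the table above. This is a minimal sketch with a hypothetical function name. The "cells powered per cycle" count is a simplified accounting under the assumption that only the active bit column is powered. It is consistent with the slide's remark that the benefit grows with word length, but it is not the authors' power model.

```python
from typing import List


def bit_serial_search(words: List[str], key: str) -> List[List[str]]:
    """Bit-serial TCAM search with one accumulator (ACC) per word.

    Cycle i evaluates only bit column i; the accumulator ANDs that bit's result
    into the running result, so a word that has mismatched stays "Mismatch".
    """
    acc = [True] * len(words)  # every ACC starts from "match"
    history = []
    for i, k in enumerate(key):  # one bit position per cycle
        for w, word in enumerate(words):
            acc[w] = acc[w] and (word[i] == "X" or word[i] == k)
        history.append(["Match" if a else "Mismatch" for a in acc])
    return history


words = ["00X", "010", "0X1", "10X", "110", "1X1", "X0X", "X10", "XX1"]
for cycle, result in enumerate(bit_serial_search(words, "111"), start=1):
    print(f"bit {cycle}: {dict(zip(words, result))}")

# Simplified accounting (assumption): cells that must be powered per search cycle.
n_words, n_bits = len(words), len("111")
print("bit-serial:", n_words, "cells; bit-parallel:", n_words * n_bits, "cells")
```

Under this accounting the powered fraction is 1/(word length). For the 144-bit words evaluated on slide 33 that fraction would be 1/144.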

  30. TCAM cell circuit test chip: chip features (labeled dimensions 3.0 mm and 9.8 mm) include the TCAM cell, a reference cell, and the match-line sense amplifier (MLSA) with its dynamic current comparator and output generator. a) A CMOS-based TCAM cell with 12 transistors and a cell size of 17.54 µm² in a 0.18 µm CMOS process has been reported (ref. 8). Scaled down, the conventional TCAM cell would occupy about 10.61 µm² in a 0.14 µm CMOS process. Thus, the fabricated TCAM cell is reduced to 30% of the conventional one. Moreover, the minimum size of the proposed TCAM cell can be considered 1/6 of the conventional one. b) Faster write operation is possible by increasing the write current, for example with an average current of 327 µA for a 10 ns write.
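The 10.61 µm² estimate follows from scaling area with the square of the feature size. A minimal sketch under that simple linear-shrink assumption, with a hypothetical helper `scale_area`:

```python
def scale_area(area_um2: float, node_from_um: float, node_to_um: float) -> float:
    """Area under a simple linear shrink: it scales with the square of the feature size."""
    return area_um2 * (node_to_um / node_from_um) ** 2


conventional_018 = 17.54  # um^2, 12-transistor CMOS TCAM cell in 0.18 um CMOS
conventional_014 = scale_area(conventional_018, 0.18, 0.14)
print(f"{conventional_014:.2f} um^2")  # 10.61 um^2
```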

  31. Waveforms of equality-search operations: P = precharge phase, E = evaluate phase. For stored data B = 1, B = 0, and B = X, search data S = 0 and S = 1 are applied in successive P/E cycles. OUT shows a match for (B, S) = (1, 1), (0, 0), (X, 0), and (X, 1) and a mismatch for (1, 0) and (0, 1) (780 mV and 10 ms scale labels). Bit-level equality search is successfully demonstrated.

  32. Waveforms of sleep/wake-up operations: with stored data B = 0, VDD is cycled through active, power-off (standby), and active periods. For S = 0 the output is OUT = 1 (match) both before and after power-off; for S = 1 it is OUT = 0 (mismatch) both before and after (780 mV and 10 ms scale labels). Instant sleep/wake-up behavior is successfully demonstrated.

  33. Comparison of a 144-bit × 256-word bit-serial TCAM: HSPICE simulation of a 90 nm CMOS/MTJ technology at 125 MHz, with RP = 2 kΩ and a TMR ratio of 100%. [Comparison chart; plotted values include 103%, 1.2%, and 43%.] Ultra-low-power, high-performance bit-serial TCAM is achieved by the MOS/MTJ-hybrid circuit with fine-grain power gating.

  34. Outline • Impact of Nonvolatile (NV) Logic-in-Memory (LIM) Architecture • Design of an MTJ-Based NV LIM Circuit • Application 1: NV-FPGA • Application 2: NV-TCAM • Conclusions & Future Prospects

  35. Conclusions • We propose a MOS/MTJ-hybrid circuit style (a nonvolatile logic-in-memory circuit using MTJ devices). • Two typical logic-in-memory applications, an NV-LUT circuit and an NV-TCAM, are compact and have no static power dissipation. • Their basic behavior is confirmed with test chips fabricated in an MTJ/CMOS process. • This could open an ultra-low-power logic-circuit paradigm. Future prospects and issues: • Establish the fabrication line. • Establish the CAD tools. • Explore appropriate application fields (impact on "reliability enhancement").

  36. Reliability enhancement using MTJs: an MTJ device connected to a transistor's source (VS) adjusts its VGS. The MTJ is a nonvolatile storage element with a programmable resistance (RTMR: Rmax or Rmin), so VGS can be adjusted by controlling RMTJ. • Vth-variation compensation after fabrication is realized. • Small overhead: MTJ devices can be placed above the CMOS layer. • Nonvolatility: the compensation state is held without a power supply. Vth-variation compensation is thus realized with small overhead by using MTJ devices.
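The compensation idea can be illustrated with source degeneration: a larger source resistance lowers VGS = VG − ID·RMTJ and hence the drain current. Choosing each MTJ's state can therefore offset a Vth mismatch between two transistors. This is a minimal sketch under several assumptions:
- a square-law NMOS in saturation;
- the MTJ sits between the source and ground;
- hypothetical values for k, VG, Rmin, and Rmax.

The exhaustive choice over the two MTJ states is an illustration, not the authors' calibration procedure.

```python
import itertools
import math

# Hypothetical parameters for illustration only (not from the slide).
K = 1e-3                  # transconductance parameter k = mu*Cox*W/L [A/V^2]
V_G = 1.0                 # gate voltage [V]
R_MIN, R_MAX = 1e3, 2e3   # MTJ resistances, TMR = 100 % [ohm]


def drain_current(v_th: float, r_s: float) -> float:
    """Square-law saturation current with source resistor r_s (the MTJ).

    Solves I = (K/2) * (V_G - I*r_s - v_th)^2 for the physical root.
    """
    v_ov = V_G - v_th
    if v_ov <= 0:
        return 0.0
    if r_s == 0:
        return 0.5 * K * v_ov ** 2
    # With u = V_GS - v_th = v_ov - I*r_s: (K*r_s/2)*u^2 + u - v_ov = 0.
    kr = K * r_s
    u = (-1.0 + math.sqrt(1.0 + 2.0 * kr * v_ov)) / kr
    return 0.5 * K * u ** 2


def best_mtj_setting(v_th_a: float, v_th_b: float):
    """Try every MTJ-state pair and keep the one with the smallest current mismatch."""
    options = {"Rmin": R_MIN, "Rmax": R_MAX}
    best = None
    for (name_a, r_a), (name_b, r_b) in itertools.product(options.items(), repeat=2):
        mismatch = abs(drain_current(v_th_a, r_a) - drain_current(v_th_b, r_b))
        if best is None or mismatch < best[1]:
            best = ((name_a, name_b), mismatch)
    return best


setting, mismatch = best_mtj_setting(0.35, 0.45)
uncompensated = abs(drain_current(0.35, R_MIN) - drain_current(0.45, R_MIN))
print(setting, f"{mismatch:.2e} A vs {uncompensated:.2e} A uncompensated")
```

With these numbers, the lower-Vth transistor gets the high-resistance MTJ state. With only two resistance states the correction is coarse, so how well it cancels a given Vth spread depends on the TMR ratio.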

  37. Evaluation in a comparator: output voltages Vo and Vo' vs. VIN − VT (−0.2 V to 0.2 V) for the conventional and proposed comparators. The shift range of the output cross-point is reduced from 60 mV to 23 mV (38%), showing the robustness of the proposed comparator against Vth variation. Fabricated chip: 0.18 µm CMOS/MTJ process (measurement ongoing).
