Energy Reduction for STT-RAM Using Early Write Termination Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang Electrical and Computer Engineering Department *Department of Computer Science University of Pittsburgh ICCAD 2009
Introduction • Traditional SRAM cache • Limited by density, leakage and scalability • STT-RAM cache? • High density (~4x that of SRAM) • High speed (same read speed as SRAM) • Non-volatile • No write endurance problem
STT-RAM: Cell • Magnetic Tunnel Junction (MTJ) • Relative magnetization direction • Different resistances → logic 0 or 1 • Write: spin-polarized current • Much less write current than conventional MRAM [Figure: MTJ stack (reference layer, MgO, free layer) shown in the high-resistance (logic 1) and low-resistance (logic 0) states]
STT-RAM: Cell Array • Array structure similar to SRAM • Bidirectional write current [Figure: cell array of MTJs with bit lines (BL), source lines (SL) and word lines (WL); current directions marked for write 0 and write 1]
STT-RAM Cache: Challenge • High dynamic energy • 6~14x more energy per write access [Dong et al. DAC 2008, Sun et al. HPCA 2009] • Write contributes >74% of total dynamic energy (74.2%) • Need to reduce write energy in STT-RAM cache!
Opportunity • Many bits are unchanged in a write access: redundant bit-writes [Zhou et al. ISCA 2009] • Redundant bit-writes in 16MB STT-RAM cache: 88% • How to exploit this opportunity?
Exploiting Redundant Bit-Writes • Need to know the old value… • Read & compare before write [Zhou et al. ISCA 2009] • Can we do better?
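The read-and-compare idea can be illustrated with a minimal sketch: XOR the old and new words to find which bits actually change, and treat the rest as redundant bit-writes. The function name `changed_bits` and the 8-bit word width are assumptions for illustration only; they do not come from the slides.

```python
def changed_bits(old: int, new: int, width: int = 8) -> list[int]:
    """Return the bit positions (0 = LSB) that differ between old and new."""
    diff = (old ^ new) & ((1 << width) - 1)  # 1 wherever the stored bit must flip
    return [i for i in range(width) if (diff >> i) & 1]


# Hypothetical example words: only bit 2 differs.
old_word = 0b1011_0010
new_word = 0b1011_0110

to_write = changed_bits(old_word, new_word)
redundant = 8 - len(to_write)
print(to_write, redundant)  # [2] 7
```

In this example seven of the eight bit-writes are redundant, but finding that out this way costs a read before every write.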
Observation • MTJ resistance changes abruptly near the end of the write cycle • Cell still holds the old value in the early stage of the write cycle • Read is much faster than write [Y. Chen et al. ISQED 2008] • Possible to sense the old value in the early stage of the write cycle
Early Write Termination: Idea • On a write access… • Start the write cycle as normal • Sense the old value at an early stage • Terminate the write cycle if the old value is the same as the new value • Does not require a preceding read & compare!
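The steps above can be sketched as a purely behavioral model. This is not the authors' circuit: the `Cell` class and the `ewt_write` and `ewt_write_word` names are hypothetical, and the analog sensing is reduced to reading the stored bit.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: int  # stored bit, 0 or 1


def ewt_write(cell: Cell, new_value: int) -> bool:
    """Write one bit; return True if the write was terminated early."""
    # The write cycle has already started. Early in the cycle the cell
    # still holds its old value, so it can be sensed at this point.
    sensed_old = cell.value
    if sensed_old == new_value:
        return True  # redundant bit-write: terminate the write cycle
    cell.value = new_value  # otherwise let the write cycle complete
    return False


def ewt_write_word(cells: list[Cell], bits: list[int]) -> int:
    """Write a word bit by bit; return how many bit-writes were terminated."""
    return sum(ewt_write(c, b) for c, b in zip(cells, bits))


cells = [Cell(b) for b in [1, 0, 1, 1]]
terminated = ewt_write_word(cells, [1, 0, 0, 1])
print(terminated, [c.value for c in cells])  # 3 [1, 0, 0, 1]
```

Note that no separate read happens before the write: the comparison is folded into the write cycle itself.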
EWT Circuit • Conversion circuit • Basic differential amplifier • Input lower → output higher • Input higher → output lower [Figure: EWT circuit. Labels: BL, SL, WL, MTJ, Rwire, write 0 and write 1 current paths; Vin0 and Vin1 feeding conversion circuits that produce Vsense0 and Vsense1; pass gates; new value; sense amplifier with inputs Vsense0/Vref0 and Vsense1/Vref1; output: Terminate?]
How EWT Works [Figure: EWT circuit during a write 0, with high and low levels marked; timing annotation: 0.536ns]
Old Value | New Value | Vin0   | Vsense0 | SA output | Action
0         | 0         | lower  | higher  | 1         | Terminate
1         | 0         | higher | lower   | 0         | Continue
Advantages of EWT • No performance penalty! • Carried out within a write cycle • No need to read & compare before a write • Write access may finish early → slight speedup • Low energy overhead (3.23%) • Low complexity • Easy to integrate with existing designs
Latency Modeling • Cell • Derived from recent works [Dong et al. DAC 2008] • Peripheral • Derived from CACTI [Thoziyoor et al. ISCA 2008, Dong et al. DAC 2008]
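Dynamic Energy Modeling • Baseline: derived from recent works [Dong et al. DAC 2008] • EWT • Read energy: same as baseline • Write energy: variable • $E_{\text{write}} = E_{\text{peripheral}} + E_{\text{EWT}} + N_{\text{changed}} \times E_{\text{changed}} + N_{\text{unchanged}} \times E_{\text{unchanged}}$ • $E_{\text{peripheral}}$: peripheral energy (derived from CACTI) • $E_{\text{EWT}}$: extra energy introduced by EWT circuits (HSPICE) • $N_{\text{changed}} \times E_{\text{changed}}$: cell change • $N_{\text{unchanged}} \times E_{\text{unchanged}}$: terminated cell change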
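As a rough illustration of the write-energy model, the sketch below adds the four terms named on the slide. Reading them as a plain sum is an assumption, the function name `write_energy` is hypothetical, and every numeric value is a made-up placeholder in arbitrary units, not a measured figure.

```python
def write_energy(n_changed: int, n_unchanged: int,
                 e_changed: int, e_unchanged: int,
                 e_peripheral: int, e_ewt: int) -> int:
    """Energy of one write access under EWT, in arbitrary units."""
    return (e_peripheral + e_ewt
            + n_changed * e_changed        # cells that really switch
            + n_unchanged * e_unchanged)   # cells whose write is terminated


# Hypothetical 512-bit line in which 448 bits are unchanged.
ewt = write_energy(n_changed=64, n_unchanged=448,
                   e_changed=10, e_unchanged=1,
                   e_peripheral=200, e_ewt=10)

# Without EWT every cell pays the full write energy and there is no EWT circuit.
base = write_energy(n_changed=512, n_unchanged=0,
                    e_changed=10, e_unchanged=1,
                    e_peripheral=200, e_ewt=0)

print(ewt, base)  # 1298 5320
```

The saving grows with the share of unchanged bits and with the gap between the full and terminated per-cell energies.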
Leakage Energy Modeling • STT-RAM is non-volatile • Power gate the idle banks • Assume 1ns delay to “wake up” • Used in both baseline and EWT
Experimental Setup • Simics-based simulator • 4-core CMP, 1GHz • 32KB private L1 cache • 16MB shared L2 cache using STT-RAM, 16 banks • 4GB main memory • Enhanced cache model: STT-RAM & EWT
Results: Performance • Normalized cycles per instruction (CPI): 1% speedup • Slight performance improvement
Results: Write Energy • Normalized write energy: 70% saving • Up to 80% write energy reduction
Results: Dynamic Energy • Normalized dynamic energy, Base vs. EWT: 52% reduction
Results: Total Energy • Normalized total energy: 33% reduction
Results: Energy-Delay Product • Normalized ED²: 34% reduction
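The reported figures can be cross-checked with a short calculation. This sketch assumes that ED2 means energy × delay², that delay scales with CPI at a fixed clock, and that the rounded slide values (33% total energy reduction, 1% speedup) are good enough inputs. The function name `normalized_ed2` is hypothetical.

```python
def normalized_ed2(energy_ratio: float, delay_ratio: float) -> float:
    """ED2 of EWT relative to the baseline: energy ratio times delay ratio squared."""
    return energy_ratio * delay_ratio ** 2


energy_ratio = 1 - 0.33  # total energy, EWT / baseline
delay_ratio = 1 - 0.01   # CPI, EWT / baseline

reduction = 1 - normalized_ed2(energy_ratio, delay_ratio)
print(f"{reduction:.0%}")  # 34%
```

Under these assumptions the result agrees with the 34% reduction shown on the slide.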
Conclusion • Addresses a key challenge for STT-RAM caches: dynamic energy • EWT: exploits redundant bit-writes without performance penalty • Low overhead and complexity • Modeling and evaluation • Up to 80% write energy reduction • 34% ED² reduction