Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks. Rodolfo Azevedo¹, John D. Davis², Karin Strauss², Parikshit Gopalan², Mark Manasse², Sergey Yekhanin². ¹University of Campinas & ²Microsoft Research.
The “End” of the Road for DRAM • DRAM scaling wall • Fabrication limitations • Variability • Increasing error correction overhead (more transient errors) • Increasing active/standby/refresh power • Industry is looking for byte-addressable alternatives …but their main gating factor is memory lifetime
Coming on the Horizon: NEW *RAM! • Phase Change Memory (PCM), CBRAM, Memristors, etc. • Fabrication friendly • Value stability • “Zero” standby power • Shorter lifetime (10^8 writes) vs. DRAM (10^15 writes) • Mismatch in memory cell failure mechanisms [Figure: a 4 KB page, dead page vs. zombie page]
Cell Failure Remediation Mismatch I am NOT Dead Yet!
Why Should You Care About Zombies? • Not all dead things are bad for you! • Lots of good cells in “dead” pages • Single-level cell (SLC) & multi-level cell (MLC) mechanisms • The first resistance drift + cell failure mechanism for MLC PCM • Adaptive error correction mechanisms • Maximizes memory capacity over the lifetime
Outline • Block Pairing • Zombie Memory • Single-Level Cell: Zombie ECP, Zombie ERC, Zombie XOR • Multi-Level Cell: Zombie MLC • How Long do Zombies Live? (Evaluation) • Conclusions
The Basics [Figure: a primary page paired with a zombie page] • Reintegrating zombies back into the memory system • Phase Change Memory + 6 Error Correcting Pointers (ECP) • Other error correction schemes can be used • 512-bit blocks + 64 bits of error correction, 64 blocks per 4 KB page • Differential writes • Simulation details are in the paper (SPEC CPU2006)
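The block geometry and the differential-write policy above can be checked with a small sketch. It assumes one bit per cell (SLC) and models wear as a per-cell program counter; the names `differential_write`, `cells`, and `wear` are hypothetical and not taken from the paper.

```python
from typing import List

# Geometry from the slide; the wear bookkeeping is a hypothetical model.
BLOCK_DATA_BITS = 512   # data bits per block
BLOCK_EC_BITS = 64      # error-correction bits per block
BLOCKS_PER_PAGE = 64

PAGE_BYTES = BLOCKS_PER_PAGE * BLOCK_DATA_BITS // 8   # 64 * 512 / 8 = 4096 bytes
EC_FRACTION = BLOCK_EC_BITS / BLOCK_DATA_BITS          # 64 / 512 = 0.125


def differential_write(cells: List[int], wear: List[int], new_bits: List[int]) -> int:
    """Program only the cells whose value changes; return how many were programmed.

    `cells` holds the bit currently stored in each cell and `wear` counts how many
    times each cell has been programmed. Unchanged cells are not rewritten, so
    they accumulate no wear.
    """
    if not (len(cells) == len(wear) == len(new_bits)):
        raise ValueError("cells, wear, and new_bits must have the same length")
    programmed = 0
    for i, bit in enumerate(new_bits):
        if cells[i] != bit:
            cells[i] = bit
            wear[i] += 1
            programmed += 1
    return programmed


cells, wear = [0, 0, 1, 1], [0, 0, 0, 0]
print(differential_write(cells, wear, [0, 1, 1, 0]), wear)  # 2 [0, 1, 0, 1]
```

Only two of the four cells are programmed, which is why differential writes slow down wear on blocks that are rewritten with mostly unchanged data.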
Error Correcting Pointers Review • Use a pointer + replacement bit for each failed cell • 9-bit pointer + 1 replacement bit • Additional metadata • ISCA ’10: 12% EC overhead [Figure: 512-bit good block vs. worn block, with failed cells and ECP entries]
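A simplified model of ECP helps make the pointer arithmetic concrete. This sketch assumes the "additional metadata" is a single "entries full" flag, that replacement bits never fail themselves, and that entries are unordered; the class and function names are hypothetical.

```python
import math
from typing import Dict, List

BLOCK_BITS = 512
POINTER_BITS = math.ceil(math.log2(BLOCK_BITS))  # 9 bits can address any of 512 cells
NUM_ENTRIES = 6                                  # ECP-6, as used in the talk


def ecp_overhead_bits(entries: int = NUM_ENTRIES, extra_meta_bits: int = 1) -> int:
    """Each entry is a pointer plus one replacement bit; `extra_meta_bits` is an
    assumed single 'full' flag standing in for the additional metadata."""
    return entries * (POINTER_BITS + 1) + extra_meta_bits


class ECPBlock:
    """Simplified ECP block: failed (stuck) cells are redirected to replacement bits."""

    def __init__(self, entries: int = NUM_ENTRIES) -> None:
        self.capacity = entries
        self.entries: Dict[int, int] = {}  # failed cell index -> replacement bit

    def add_failure(self, index: int) -> bool:
        """Allocate a pointer for a newly failed cell; False means the block is worn out."""
        if index in self.entries:
            return True
        if len(self.entries) >= self.capacity:
            return False
        self.entries[index] = 0
        return True

    def write(self, cells: List[int], data: List[int]) -> None:
        for i, bit in enumerate(data):
            if i in self.entries:
                self.entries[i] = bit   # store the bit in the entry, not the stuck cell
            else:
                cells[i] = bit

    def read(self, cells: List[int]) -> List[int]:
        out = list(cells)
        for i, bit in self.entries.items():
            out[i] = bit                # the pointer overrides the stuck cell's value
        return out
```

Under these assumptions `ecp_overhead_bits()` returns 6 × (9 + 1) + 1 = 61 bits, about 11.9% of a 512-bit block, in line with the ~12% on the slide. A block becomes "dead" in this model once `add_failure` returns False.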
Adaptive Block Pairing • Pairing with different-sized spare blocks • EC bits in the primary point to the spare • Reuse intrinsic error correction in the spare block • Re-pairing at the sub-block and block levels • Re-pair with different spare blocks • Gives the zombie a second chance [Figure: primary blocks paired with spare blocks drawn from zombie block pools; good, worn, and spare blocks]
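One way to picture the zombie block pools is a small allocator keyed by spare size. This is a hypothetical sketch: the 1/4, 1/2, and full-block sizes come from the Zombie ECP & ERC slide, but the "smallest spare that is at least `min_size`" policy, the class `SparePools`, and the string spare IDs are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Spare sizes as fractions of a block (1/4, 1/2, full).
SPARE_SIZES = (0.25, 0.5, 1.0)


class SparePools:
    """Hypothetical allocator for spare (sub-)blocks harvested from zombie blocks."""

    def __init__(self) -> None:
        self.pools: Dict[float, Deque[str]] = {size: deque() for size in SPARE_SIZES}

    def donate(self, spare_id: str, size: float) -> None:
        """Add a spare carved out of a dead (zombie) block to its size pool."""
        if size not in self.pools:
            raise ValueError(f"unsupported spare size {size}")
        self.pools[size].append(spare_id)

    def take(self, min_size: float = SPARE_SIZES[0]) -> Optional[Tuple[float, str]]:
        """Return the smallest available spare of at least `min_size`, or None.

        A primary can first be paired with a small spare; when that spare's error
        correction is exhausted, it is re-paired with a different spare of the
        same size or a larger one.
        """
        for size in SPARE_SIZES:
            if size >= min_size and self.pools[size]:
                return size, self.pools[size].popleft()
        return None


pools = SparePools()
pools.donate("zombie7.q0", 0.25)
pools.donate("zombie9", 1.0)
print(pools.take())      # (0.25, 'zombie7.q0')
print(pools.take(0.5))   # (1.0, 'zombie9'): no half-block spare is free
print(pools.take())      # None
```

Starting small keeps more spare capacity available for other primaries, while re-pairing with a larger spare gives a worn primary access to more error-correction bits.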
Zombie XOR • Pairs primary and spare blocks, XORing aligned bits to produce the data • Bias wear to the spare block to maximize primary lifetime • Reuse the spare's error correction bits to correct aligned cell failures in the primary and spare • Re-pair with a “new” spare [Figure: primary and spare blocks linked by a pairing pointer, with failed cells and ECP entries]
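The XOR pairing idea can be sketched at the bit level. This assumes each data bit is simply `primary[i] ^ spare[i]`, that stuck cells and their values are known to the writer, and that the write policy programs the spare by default; the functions `xor_write` and `xor_read` are hypothetical, and the paper's exact encoding may differ.

```python
from typing import Dict, List

# Failed cells in a block: cell index -> stuck value.
Stuck = Dict[int, int]


def xor_read(primary: List[int], spare: List[int], stuck_p: Stuck, stuck_s: Stuck) -> List[int]:
    """Each data bit is the XOR of the aligned primary and spare cells."""
    return [stuck_p.get(i, p) ^ stuck_s.get(i, s)
            for i, (p, s) in enumerate(zip(primary, spare))]


def xor_write(data: List[int], primary: List[int], spare: List[int],
              stuck_p: Stuck, stuck_s: Stuck) -> List[int]:
    """Store `data` so that xor_read() returns it wherever possible.

    Wear is biased toward the spare: the primary cell is only programmed when the
    aligned spare cell is stuck. Returns the indices where both aligned cells are
    stuck with the wrong XOR; these would need the spare's error correction bits.
    """
    uncorrected = []
    for i, d in enumerate(data):
        p = stuck_p.get(i, primary[i])
        if i not in stuck_s:
            spare[i] = d ^ p               # program the spare, leave the primary alone
        elif i not in stuck_p:
            primary[i] = d ^ stuck_s[i]    # spare cell stuck: fix up the primary instead
        elif p ^ stuck_s[i] != d:
            uncorrected.append(i)          # both stuck and wrong: needs an EC entry
    return uncorrected


data = [1, 0, 1, 1]
primary, spare = [0, 0, 0, 0], [0, 0, 0, 0]
stuck_p, stuck_s = {1: 1, 3: 1}, {2: 0, 3: 1}
print(xor_write(data, primary, spare, stuck_p, stuck_s))  # [3]
print(xor_read(primary, spare, stuck_p, stuck_s))          # [1, 0, 1, 0]
```

Only bit 3, where both aligned cells are stuck and disagree with the data, is left for the spare's error correction; every other failure is absorbed by programming the partner cell.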
Zombie MLC • Must handle drift and cell failures • Rank modulation* to handle drift [Figure: fixed guard bands for levels 11, 10, 01, 00 vs. relative cell values, with a number mapped to a string and a codeword; reprint of D. Ielmini et al., IEDM 2007] *N. Papandreou et al., IMW 2011
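Rank modulation stores information in the relative order of cell resistances rather than in absolute levels, so drift that preserves the order does not change the stored value. The sketch below uses the Lehmer code to map an integer to a permutation; this mapping, the resistance levels, and the `r ** 1.1` stand-in for drift are illustrative assumptions, not the encoding used in the paper or by Papandreou et al.

```python
import math
from typing import List, Sequence


def int_to_perm(value: int, n: int) -> List[int]:
    """Map 0 <= value < n! to a permutation of range(n) via the Lehmer code."""
    if not 0 <= value < math.factorial(n):
        raise ValueError("value out of range")
    items, perm = list(range(n)), []
    for k in range(n, 0, -1):
        idx, value = divmod(value, math.factorial(k - 1))
        perm.append(items.pop(idx))
    return perm


def perm_to_int(perm: Sequence[int]) -> int:
    """Inverse of int_to_perm."""
    items, value = sorted(perm), 0
    for pos, x in enumerate(perm):
        idx = items.index(x)
        value += idx * math.factorial(len(perm) - 1 - pos)
        items.pop(idx)
    return value


def program(perm: Sequence[int], levels: Sequence[float]) -> List[float]:
    """Cell perm[r] gets the r-th lowest resistance level."""
    cells = [0.0] * len(perm)
    for rank, cell in enumerate(perm):
        cells[cell] = levels[rank]
    return cells


def read_perm(cells: Sequence[float]) -> List[int]:
    """Recover the permutation by sorting cells by resistance; no fixed thresholds."""
    return sorted(range(len(cells)), key=lambda i: cells[i])


perm = int_to_perm(17, 4)                         # [2, 3, 1, 0]
cells = program(perm, [1e3, 1e4, 1e5, 1e6])       # hypothetical resistance levels (ohms)
drifted = [r ** 1.1 for r in cells]               # toy order-preserving upward drift
print(perm_to_int(read_perm(drifted)))            # 17
```

Four cells encode one of 4! = 24 values, and the read succeeds as long as drift never swaps two cells' relative order.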
Zombie MLC • Must handle drift and cell failures • Rank modulation* to handle drift • Anchor symbols are added to handle cell failures • Known anchor location and/or known values • Optimal encoding: # replacement cells = # failed cells • Coordinate shuffle equation (over a finite field) [Figure: an original non-uniform string encoded into a codeword with anchors, for 1 cell stuck-at 0 and for 2 cells stuck-at 0; see the paper for the 3 stuck-at cells mechanism] *N. Papandreou et al., IMW 2011
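To illustrate how one extra anchor cell can absorb one stuck-at cell (replacement cells = failed cells), here is the classic additive-masking idea over Z_4. This is a swapped-in, generic technique, not the paper's coordinate-shuffle construction; it assumes 2-bit cells (`Q = 4`) and that the writer knows the stuck cell's position and value. The functions `encode` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple

Q = 4  # levels per cell, assuming 2-bit MLC


def encode(data: List[int], stuck: Optional[Tuple[int, int]] = None) -> List[int]:
    """Return the data cells followed by one anchor cell holding a shift.

    `stuck` is (position, value) of at most one stuck-at cell; positions
    0..len(data)-1 are data cells and len(data) is the anchor cell.
    """
    if any(not 0 <= x < Q for x in data):
        raise ValueError("symbols must be in range(Q)")
    shift = 0
    if stuck is not None:
        pos, value = stuck
        if pos == len(data):
            shift = value                    # anchor is stuck: adopt its value as the shift
        else:
            shift = (value - data[pos]) % Q  # make the stuck cell already hold the right symbol
    return [(x + shift) % Q for x in data] + [shift]


def decode(codeword: List[int]) -> List[int]:
    """Subtract the anchor's shift from every data cell."""
    *cells, shift = codeword
    return [(c - shift) % Q for c in cells]


data = [3, 0, 2, 1]
cw = encode(data, stuck=(1, 3))   # cell 1 is stuck at level 3
print(cw)                         # [2, 3, 1, 0, 3]
print(decode(cw) == data)         # True
```

The shift is chosen so the stuck cell already holds the symbol the codeword needs, so the failed cell never has to be programmed; the decoder only needs the anchor.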
Zombie ECP & ERC • Pairing + existing error correction mechanisms • Adaptive: 1/4, 1/2, and full block pairing • ECP [ISCA ’10]: Use the spare block to add more Error Correcting Pointers to the primary block • ERC [PIT ’74, HPCA ’13]: Change to an erasure model • Instead of correcting ⌊(d-1)/2⌋ errors (error model), can correct d-1 erasures (errors at known locations) • Bias wear to the spare block to maximize primary lifetime
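The gain from switching to an erasure model follows from the minimum distance d of a code: it guarantees ⌊(d-1)/2⌋ corrected errors but d-1 filled erasures. The sketch shows both bounds and a d = 2 even-parity code filling one erasure it could not correct as an error. It assumes failed cells can be located (for example, by reading back after a write); the function names are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import Dict, List, Optional


def correctable(d: int) -> Dict[str, int]:
    """Guarantees of a code with minimum distance d."""
    return {"errors": (d - 1) // 2, "erasures": d - 1}


def parity_encode(bits: List[int]) -> List[int]:
    """Append an even-parity bit, giving a code with d = 2."""
    return bits + [reduce(xor, bits, 0)]


def parity_recover(word: List[Optional[int]]) -> List[int]:
    """Fill one erased position (None) of a parity codeword and return the data bits."""
    missing = [i for i, b in enumerate(word) if b is None]
    if len(missing) > 1:
        raise ValueError("a d = 2 code can fill at most d - 1 = 1 erasure")
    word = list(word)
    if missing:
        word[missing[0]] = reduce(xor, (b for b in word if b is not None), 0)
    return word[:-1]


print(correctable(2))                       # {'errors': 0, 'erasures': 1}
word = parity_encode([1, 0, 1, 1])          # [1, 0, 1, 1, 1]
word[2] = None                              # a failed cell at a known position
print(parity_recover(word))                 # [1, 0, 1, 1]
```

For the same redundancy, knowing where the failures are roughly doubles how many of them the code can handle.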
Zombie SLC Write Capacity 58% longer life
Zombie SLC Write Capacity 92% longer life
Zombie SLC Performance • < 0.5% slowdown on SPEC workloads • < 6% slowdown on SPEC workloads
Zombie MLC Write Capacity 17X longer life
Zombie MLC Write Capacity 11X longer life
Zombie MLC Performance < 4% slowdown on SPEC workloads
Zombies Can Be Rehabilitated! • Zombie framework • Using dead blocks to extend memory lifetime • Versatile and adaptive • Low implementation overhead • MLC: First drift + cell failure solution • Using fixed positions and/or fixed values for anchors • Lifetime improvement of 11X-17X • SLC: Multiple mechanisms • Maximize lifetime or capacity • Lifetime improvement of 58-92%
Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks Questions? For more details: Read the paper, read the tech report, and/or talk to us rodolfo@ic.unicamp.br & {john.d, kstrauss, parik, manasse, yekhanin}@microsoft.com
Mitigating Drift-Induced Soft Errors • Previous assumptions: • Fixed guard band for each cell value • Uniform distribution of resistance values • ~2 seconds of data lifetime • Relaxing the drift-induced soft error constraint: • Rank modulation (no fixed guard band) • Non-uniform distribution of resistance values • Cluster the low levels and spread apart the high levels • ~5 days of data lifetime (worst-case wear is 5 seconds) • More knobs: • Tighten the resistance distribution • Use different drift coefficients
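Why the margin and the drift coefficient matter can be seen from the widely used empirical power-law drift model, R(t) = R0 · (t/t0)^ν. Solving for when a cell reaches the next read threshold R_th gives t = t0 · (R_th/R0)^(1/ν). The parameters below (R0, a 2x margin, ν = 0.10 or 0.05) are hypothetical and do not reproduce the 2-second or 5-day figures on the slide.

```python
def resistance(r0: float, nu: float, t: float, t0: float = 1.0) -> float:
    """Empirical power-law drift R(t) = R0 * (t / t0) ** nu, for t >= t0."""
    return r0 * (t / t0) ** nu


def time_to_cross(r0: float, r_threshold: float, nu: float, t0: float = 1.0) -> float:
    """Time at which a cell that read r0 at time t0 drifts up to r_threshold."""
    if r_threshold <= r0:
        raise ValueError("threshold must lie above the programmed resistance")
    return t0 * (r_threshold / r0) ** (1.0 / nu)


# Hypothetical parameters: a 2x resistance margin to the next threshold.
print(time_to_cross(1e4, 2e4, nu=0.10))   # 1024.0 seconds
print(time_to_cross(1e4, 2e4, nu=0.05))   # 1048576.0 seconds, about 12 days
```

Because the margin enters with exponent 1/ν, both knobs are powerful: halving ν squares the retention time here, and tightening the distribution so the margin grows from 2x to 4x (at ν = 0.10) also raises it from 1024 s to 4^10 ≈ 1.05 × 10^6 s.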