Zombie Memory : Extending Memory Lifetime by Reviving Dead Blocks

Rodolfo Azevedo¹, John D. Davis², Karin Strauss², Parikshit Gopalan², Mark Manasse², Sergey Yekhanin². ¹University of Campinas, ²Microsoft Research.

Presentation Transcript


  1. Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks. Rodolfo Azevedo¹, John D. Davis², Karin Strauss², Parikshit Gopalan², Mark Manasse², Sergey Yekhanin². ¹University of Campinas & ²Microsoft Research

  2. The “End” of the Road for DRAM • DRAM scaling wall • Fabrication limitations • Variability • Increasing error correction overhead (more transient errors) • Increasing active/standby/refresh power • Industry looking for byte-addressable alternatives …but, main gating factor is memory lifetime

  3. Coming on the Horizon: NEW *RAM! • Phase Change Memory (PCM), CBRAM, Memristors, etc. • Fabrication friendly • Value stability • “Zero” standby power • Shorter lifetime (10^8 writes) vs. DRAM (10^15 writes) • Mismatch in memory cell failure mechanisms [Figure labels: Zombie Page, Dead Page, 4 KB Page]

  4. Cell Failure Remediation Mismatch I am NOT Dead Yet!

  5. Why Should You Care About Zombies? • Not all dead things are bad for you! • Lots of good cells in “dead” pages • Single-level cell (SLC) & multi-level cell (MLC) mechanisms • The first resistance drift + cell failure mechanism for MLC PCM • Adaptive error correction mechanisms • Maximizes memory capacity over the lifetime

  6. Zombies in the Paper

  7. Outline • Block Pairing • Zombie Memory • Zombie ECP • Zombie ERC • Zombie XOR • Zombie MLC • How Long do Zombies Live? (Evaluation) • Conclusions • Zombie ECP Single-Level Cell • Zombie ERC Multi-Level Cell

  8. The Basics • Reintegrating Zombies back into the memory system • Phase Change Memory + 6 Error Correcting Pointers (ECP) • Other error correction schemes can be used • 512-bit blocks + 64 bits of error correction, 64 blocks per 4 KB page • Differential writes • Simulation details in the paper, SPEC CPU2006 [Figure labels: Zombie Page, Primary Page]

  9. Error Correcting Pointers Review • Use pointer + replacement bit for cell failure • 9-bit pointer + 1 replacement bit • Additional metadata • 12% EC overhead • ISCA ‘10 [Figure labels: Failed Cell, Good Block, 512-bit block, Worn Block, ECP Entry]
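
The pointer-plus-replacement-bit idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `stuck` dictionary used as a failure model, and the single status bit counted as the "additional metadata" are all assumptions made for the example.

```python
BLOCK_BITS = 512    # data bits per block (from the slides)
POINTER_BITS = 9    # 2**9 == 512, so 9 bits address any cell in the block
MAX_ENTRIES = 6     # six ECP entries per block (from the slides)


def ecp_write(data, stuck, max_entries=MAX_ENTRIES):
    """Store `data` in a block whose failed cells are listed in `stuck`.

    `stuck` maps a cell position to the value that cell is stuck at
    (a hypothetical failure model). Returns (cells, entries), or None
    when the block has more failed cells than ECP entries.
    """
    if len(stuck) > max_entries:
        return None
    cells = list(data)
    entries = []
    for pos in sorted(stuck):
        cells[pos] = stuck[pos]           # the failed cell keeps its stuck value
        entries.append((pos, data[pos]))  # pointer + replacement bit
    return cells, entries


def ecp_read(cells, entries):
    """Patch the raw cell values with the replacement bits."""
    data = list(cells)
    for pos, bit in entries:
        data[pos] = bit
    return data


def ecp_overhead_bits(n_entries=MAX_ENTRIES, pointer_bits=POINTER_BITS):
    """Entries of (pointer + 1 replacement bit) plus one assumed status bit."""
    return n_entries * (pointer_bits + 1) + 1


data = [1, 0] * (BLOCK_BITS // 2)
stuck = {2: 0, 101: 1, 511: 1}
cells, entries = ecp_write(data, stuck)
overhead = ecp_overhead_bits() / BLOCK_BITS
```

Under these assumptions the metadata comes to 61 bits per 512-bit block, which is consistent with the roughly 12% overhead quoted on the slide.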

  10. Adaptive Block Pairing • Pairing with different sized spare blocks • EC bits in the primary point to the spare • Reuse intrinsic error correction in the spare block • Re-pairing at the sub-block and block levels • Re-pair with different spare blocks • Gives Zombie a second chance [Figure labels: Primary, Spare, Zombie block pools, Good Block, Worn Block, Spare Block]

  11. Zombie XOR • Pairs primary and spare blocks; aligned bits are XORed to produce the data • Bias wear to spare block to maximize primary lifetime • Reuse spare error correction bits to correct aligned cell failures in the primary and spare • Re-pair with “new” spare [Figure labels: Primary, Spare, Failed Cell, Good Block, Worn Block, Spare Block, ECP Entry, Pairing Pointer]
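
A rough sketch of the XOR pairing described on this slide, in Python. The slide only states that aligned bits are XORed, that wear is biased toward the spare, and that aligned failures need the spare's correction bits; the write policy below (rewrite the spare first, flip a healthy primary cell only when the aligned spare cell is stuck at the wrong value) and all names are assumptions made for illustration.

```python
def xor_read(primary, spare):
    """Data is the bitwise XOR of aligned primary and spare cells."""
    return [p ^ s for p, s in zip(primary, spare)]


def xor_write(data, primary, primary_stuck, spare_stuck):
    """Write `data` into a primary/spare pair.

    primary       -- current primary cell values (stuck cells included)
    primary_stuck -- set of failed positions in the primary (hypothetical model)
    spare_stuck   -- dict: failed spare position -> stuck value (hypothetical)

    Returns (primary, spare, need_ecp), where need_ecp lists positions in
    which both aligned cells are stuck and the XOR comes out wrong, so an
    error-correction entry would still be needed.
    """
    primary = list(primary)
    # Biasing wear to the spare: leave the primary alone where possible.
    spare = [d ^ p for d, p in zip(data, primary)]
    need_ecp = []
    for pos, val in spare_stuck.items():
        spare[pos] = val                      # a stuck cell cannot be rewritten
        if primary[pos] ^ val != data[pos]:
            if pos in primary_stuck:
                need_ecp.append(pos)          # aligned failure in both blocks
            else:
                primary[pos] ^= 1             # healthy primary cell absorbs it
    return primary, spare, need_ecp


data = [1, 0, 1, 1, 0, 0, 1, 0]
primary = [0, 0, 1, 0, 1, 1, 0, 0]
new_primary, spare, need_ecp = xor_write(
    data, primary, primary_stuck={1, 4}, spare_stuck={1: 1, 2: 1, 6: 1}
)
recovered = xor_read(new_primary, spare)
```

In this toy run only position 1, where both aligned cells are stuck, reads back incorrectly and would fall to the spare's error-correction bits.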

  12. Zombie MLC • Must handle drift and cell failures • Rank modulation* to handle drift [Figure labels: 11, 10, 01, 00; Relative cell values; Fixed guard bands; Number, String, Codeword. Reprint of D. Ielmini et al., IEDM 2007] *N. Papandreou et al., IMW 2011

  13. Zombie MLC • Must handle drift and cell failures • Rank modulation* to handle drift • Anchor symbols are added to handle cell failures • Known anchor location and/or known values • Optimal encoding: # replacement cells = # failed cells • Coordinate shuffle equation (over a finite field) • See the paper for the 3 stuck-at cells mechanism [Figure: encoding examples for 2 cells stuck-at 0 and for 1 cell stuck-at 0, showing the original string, the original non-uniform string, the codeword, and the anchors over bit positions 1-12] *N. Papandreou et al., IMW 2011
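
The following Python toy illustrates why decoding by relative order can tolerate drift that defeats fixed guard bands. It is not the code from the paper: the level values, the multiplicative drift factor, and the assumption that the decoder knows how many cells hold each symbol are all invented for the example, and the anchor mechanism is not modeled.

```python
def decode_fixed(levels, thresholds):
    """Fixed guard bands: a symbol is the number of thresholds below the level."""
    return [sum(v > t for t in thresholds) for v in levels]


def decode_rank(levels, counts):
    """Rank-based decoding: sort cells by level and hand out symbols in order.

    `counts[s]` is the number of cells known to hold symbol s (an assumption
    of this sketch; it plays the role of a fixed symbol composition).
    """
    order = sorted(range(len(levels)), key=lambda i: levels[i])
    symbols = [0] * len(levels)
    start = 0
    for sym, n in enumerate(counts):
        for i in order[start:start + n]:
            symbols[i] = sym
        start += n
    return symbols


word = [3, 0, 2, 1, 0, 3, 1, 2]           # 2-bit symbols, two cells per symbol
written = [s + 1.0 for s in word]          # hypothetical nominal cell levels
drifted = [v * 1.3 for v in written]       # toy drift: every level grows by 30%
thresholds = [1.5, 2.5, 3.5]               # hypothetical fixed guard bands

fixed_before = decode_fixed(written, thresholds)
fixed_after = decode_fixed(drifted, thresholds)
rank_after = decode_rank(drifted, counts=[2, 2, 2, 2])
```

Because this drift preserves the ordering of the cells, the rank-based decoder still recovers `word`, while the fixed thresholds misread the middle levels.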

  14. Zombie ECP & ERC • Pairing + existing error correction mechanisms • Adaptive: 1/4, 1/2, and full block pairing • ECP [ISCA ‘10]: Use spare block to add more Error Correcting Pointers to the primary block • ERC [PIT ‘74, HPCA ‘13] : Change the model to an erasure model • Instead of correcting (d-1)/2 errors (error model), can correct d-1 errors • Bias wear to spare block to maximize primary lifetime
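
The d-1 versus (d-1)/2 point can be seen with the simplest possible code. The Python sketch below uses a 3-copy repetition code (minimum distance d = 3) purely as a stand-in; it is not the code used by Zombie ERC, and the function names are made up for the example. When the failed positions are known, as with cells known to be stuck, they can be treated as erasures.

```python
def correctable_errors(d):
    """Errors at unknown positions that a distance-d code can correct."""
    return (d - 1) // 2


def correctable_erasures(d):
    """Erasures (failures at known positions) a distance-d code can correct."""
    return d - 1


def decode_errors(copies):
    """Error model: positions of bad copies are unknown, so take a majority vote."""
    return 1 if sum(copies) * 2 > len(copies) else 0


def decode_erasures(copies, erased):
    """Erasure model: skip the copies known to be bad and trust any other one."""
    for i, bit in enumerate(copies):
        if i not in erased:
            return bit
    return None  # every copy erased: nothing left to read


d = 3
stored_bit = 1
copies = [0, 0, 1]              # two of the three copies are wrong
as_errors = decode_errors(copies)                    # majority vote is fooled
as_erasures = decode_erasures(copies, erased={0, 1}) # known-bad cells skipped
```

With d = 3, the error model handles one bad copy while the erasure model handles two, which matches the slide's (d-1)/2 versus d-1.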

  15. How Long do Zombies Live?

  16. Zombie SLC Write Capacity

  17. Zombie SLC Write Capacity 58% longer life

  18. Zombie SLC Write Capacity 58% longer life

  19. Zombie SLC Write Capacity 92% longer life

  20. ZombieSLC Performance • < 0.5% slowdown on SPEC workloads • < 6% slowdown on SPEC workloads

  21. I’m NOT Dead YET!

  22. I’m Still NOT Dead YET!

  23. I’m STILL NOT Dead YET!

  24. Squeezed Blood From a Turnip!

  25. ZombieMLC Write Capacity

  26. ZombieMLC Write Capacity 17X longer life

  27. ZombieMLC Write Capacity 11X longer life

  28. ZombieMLC Performance < 4% slowdown on SPEC workloads

  29. Zombies Can Be Rehabilitated! • Zombie framework • Using dead blocks to extend memory lifetime • Versatile and adaptive • Low implementation overhead • MLC: First drift + cell failure solution • Using fixed positions and/or fixed values for anchors • Lifetime improvement 11X – 17X • SLC: Multiple mechanisms • Maximize lifetime or capacity • Lifetime improvement of 58-92%

  30. Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks Questions? For more details: Read the paper, read the tech report, and/or talk to us rodolfo@ic.unicamp.br & {john.d, kstrauss, parik, manasse, yekhanin}@microsoft.com

  31. More About Zombie…

  32. Zombie SLC Performance

  33. Zombie MLC Performance

  34. Mitigating Drift-Induced Soft Errors • Previous Assumptions: • Fixed guard band for cell value • Uniform distribution of resistance values. • ~2 second data lifetime…. • Relaxing the drift-induced soft error constraint • Rank modulation (no fixed guard band) • Non-uniform distribution of resistance values • Cluster the low levels and spread apart the high levels • ~5 Days of data lifetime (worst-case wear is 5 seconds) • More knobs: • Tighten resistance distribution • Use different drift coefficients