
Data Dependent Sparing to Manage Better-Than-Bad Blocks *

Rakan Maddah¹, Sangyeun Cho²,¹, and Rami Melhem¹. ¹Computer Science Department, University of Pittsburgh; ²Memory Division, Samsung Electronics Co.





Presentation Transcript


  1. Data Dependent Sparing to Manage Better-Than-Bad Blocks* Rakan Maddah¹, Sangyeun Cho²,¹, and Rami Melhem¹ ¹Computer Science Department, University of Pittsburgh ²Memory Division, Samsung Electronics Co. *Published in IEEE CAL. Manuscript accessible at: http://people.cs.pitt.edu/~rmaddah/cal.pdf

  2. Introduction • Bad block management is a vital technique for memories with relatively low write endurance • NAND Flash can sustain 10^3 to 10^5 program/erase cycles • Phase Change Memory (PCM) can sustain 10^6 to 10^8 set/reset cycles • Bad block: a block with enough defective cells to produce more errors than the error correction code can correct • The common practice is to replace a bad block with a good spare block after the first write failure • 20% sparing is typical for server products

  3. Motivation • Reconsider the bad block management technique • PCM as well as NAND flash exhibit a stuck-at fault model • A failed cell gets stuck permanently at either 0 or 1 • A stuck-at cell can still be read but not reprogrammed • Failures in the context of the stuck-at fault model are data dependent!

  4. Data-Dependent Failures • A write to a storage block with more faults than the error correction code can correct does not necessarily fail! [Figure: physical state of a block, with three write requests and the errors remaining after each write]

  5. Data-Dependent Failures • Example: With an ECC code of capability 2, only 1 write out of the 3 fails [Figure: physical state of a block, with three write requests and the errors remaining after each write]
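
The slide's figure did not survive in this transcript, so the following is only a minimal sketch of the same idea with made-up values. A stuck-at cell causes an error only when the requested bit differs from the value the cell is stuck at. The stuck-cell positions, the three data words, and the function names are assumptions chosen for illustration; they are not the bit patterns shown on the original slide.

```python
def errors_after_write(data, stuck):
    """Count cells whose stuck value differs from the requested data.

    `stuck` maps a cell index to the value (0 or 1) that cell is stuck at.
    """
    return sum(1 for index, value in stuck.items() if data[index] != value)


def write_succeeds(data, stuck, ecc_capability):
    """A write succeeds when the ECC can correct every resulting error."""
    return errors_after_write(data, stuck) <= ecc_capability


# Hypothetical 8-cell block with three stuck cells (more than the ECC capability of 2).
stuck = {0: 1, 2: 0, 5: 1}

# Hypothetical write requests.
writes = [
    [1, 0, 0, 1, 0, 1, 0, 0],  # matches all stuck cells: 0 errors
    [0, 1, 0, 0, 1, 1, 1, 0],  # differs at cell 0 only: 1 error
    [0, 0, 1, 0, 0, 0, 1, 1],  # differs at cells 0, 2 and 5: 3 errors
]

results = [write_succeeds(w, stuck, ecc_capability=2) for w in writes]
```

In this sketch the block holds three faults, yet only the third write exceeds the ECC capability, so one write out of three fails.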

  6. Block Write Failure • Block write failure probability vs. number of faults within a 4KB storage block, when the error correction mechanism covers up to 20 errors [Figure; x-axis: # faults]
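
The plot itself is not in the transcript. As a rough illustration of why the failure probability rises gradually rather than jumping to 1, here is a sketch under a simplifying assumption that is not stated on the slide: each stuck cell disagrees with the data being written independently with probability 1/2 (uniformly random data). Under that assumption the number of errors is binomial, and a write fails when it exceeds the ECC capability. The function name is hypothetical.

```python
from math import comb


def write_failure_probability(num_faults, ecc_capability=20):
    """P(more than `ecc_capability` of `num_faults` stuck cells mismatch).

    Assumes each stuck cell mismatches the written bit independently
    with probability 1/2 (an illustrative assumption).
    """
    if num_faults <= ecc_capability:
        return 0.0  # every possible error pattern is correctable
    failing_patterns = sum(
        comb(num_faults, k) for k in range(ecc_capability + 1, num_faults + 1)
    )
    return failing_patterns / 2 ** num_faults
```

For example, under this assumption `write_failure_probability(21)` is 1/2^21 (all 21 stuck cells must mismatch), while `write_failure_probability(41)` is exactly 0.5 by symmetry of the binomial distribution.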

  7. Block Classification • Classify storage blocks into three categories: • Good: a block with no write failures • Better-Than-Bad: a block with rare write failures • Bad: a block with frequent write failures • More lifetime can still be squeezed from better-than-bad blocks! • Observation: retiring a block after the first write failure is overly conservative

  8. Data Dependent Sparing • Delay block retirement • Temporarily borrow a spare block after a write failure • Attempt a later write request on the original (faulty) block and reclaim the spare block if the write succeeds • Retire a block when frequent write failures start to occur, i.e. when a better-than-bad block becomes bad [Figure: write requests and later writes mapped between primary storage blocks and spare blocks]

  9. Execution Flow • Write a block, then check the result with read verification • Write successful? • Yes: reclaim the assigned spare, if any • No: keep track of “goodness” and check whether failure frequency > threshold • Yes: retire the block and replace it with a spare • No: obtain a spare to write to; do not retire the block
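
The flow on this slide can be sketched as a small state update per write. This is an illustrative reading, not the authors' implementation: the `Block` class, `handle_write`, and `spare_pool` are hypothetical names, and "failure frequency" is assumed here to mean failed writes divided by total writes to the block, which the slides do not define. The 10% default mirrors the threshold quoted in the evaluation.

```python
class Block:
    """Per-block bookkeeping (hypothetical structure)."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.writes = 0      # total write attempts on the original block
        self.failures = 0    # attempts that failed read verification
        self.spare = None    # spare currently borrowed, if any
        self.retired = False


def handle_write(block, write_ok, spare_pool, threshold=0.10):
    """Apply one write outcome; `write_ok` is the read-verification result."""
    block.writes += 1
    if write_ok:
        # Success on the original block: hand back any borrowed spare.
        if block.spare is not None:
            spare_pool.append(block.spare)
            block.spare = None
        return "written"

    block.failures += 1
    if block.spare is None:
        if not spare_pool:
            raise RuntimeError("no spare blocks left")
        block.spare = spare_pool.pop()  # the data goes to the spare

    # Assumed definition: failed writes over all writes to this block.
    if block.failures / block.writes > threshold:
        block.retired = True  # the spare now replaces the block for good
        return "retired"
    return "borrowed"  # keep the block; retry it on a later write
```

With this definition a block that fails on its very first write would be retired at once, so a real design would likely measure frequency over a window or only after some minimum number of writes; that detail is left out of the sketch.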

  10. Alternative Design Strategies • Spare Allocation Strategies • Temp-Sparing: a healthy spare block temporarily substitutes for a better-than-bad block • Role-Exchange: a spare block permanently replaces the failing block, which is added to the pool of spare blocks • Block Mapping Strategies • If temp-sparing is adopted, keep a table that stores pointers to spare blocks • If role-exchange is adopted, update the address remapping table in the SSD • Strategies for Determining Block “Goodness” • A counter per better-than-bad block • A global data structure that approximates individual counters, e.g. a counting Bloom filter [Figure: primary storage and spare storage]
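
As an illustration of the last option, a counting Bloom filter can stand in for per-block failure counters. The sketch below is a generic counting Bloom filter, not the structure used by the authors; the class name, sizes, and the SHA-256 based hashing are assumptions made for the example. The estimate can overcount when different blocks share counters, but it never undercounts.

```python
import hashlib


class CountingBloomFilter:
    """Approximate per-block failure counts in a fixed-size counter array."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, block_id):
        # Derive several counter positions from the block id (illustrative hashing).
        positions = set()
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{block_id}".encode()).digest()
            positions.add(int.from_bytes(digest[:8], "big") % len(self.counters))
        return positions

    def record_failure(self, block_id):
        for i in self._indexes(block_id):
            self.counters[i] += 1

    def estimated_failures(self, block_id):
        # The smallest counter is the least polluted by other blocks.
        return min(self.counters[i] for i in self._indexes(block_id))


cbf = CountingBloomFilter()
for _ in range(3):
    cbf.record_failure(block_id=7)
estimate = cbf.estimated_failures(block_id=7)
```

The trade-off is memory against accuracy: a shared array avoids storing one counter per block, at the cost of occasionally classifying a block as worse than it is.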

  11. Evaluation • Monte Carlo Simulation • Simulation of 2000 storage blocks of size 4KB each • Assign a lifetime to each storage cell, drawn from a Gaussian distribution • PCM: mean 10^8 and stdev 25x10^6 • NAND Flash: mean 8.27x10^5 and stdev 2.48x10^5 • Assume perfect wear leveling • Protect each storage block with a BCH code of capability n
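
One building block of such a simulation can be sketched as follows: draw a lifetime for every cell of a block, then ask how many cells have failed after a given number of writes. This is not the authors' simulator. It assumes one bit per cell and that every block write wears every cell, neither of which is stated on the slide, and the function names are hypothetical.

```python
import random

CELLS_PER_BLOCK = 4 * 1024 * 8  # 4KB block, assuming one bit per cell


def sample_cell_lifetimes(mean, stdev, num_cells=CELLS_PER_BLOCK, rng=None):
    """Draw one Gaussian lifetime (in writes) per cell, clamped at zero."""
    rng = rng or random.Random()
    return [max(0.0, rng.gauss(mean, stdev)) for _ in range(num_cells)]


def faults_after(lifetimes, writes):
    """Number of cells that have worn out after `writes` block writes."""
    return sum(1 for life in lifetimes if life <= writes)


def first_possible_failure(lifetimes, ecc_capability):
    """Write count at which faults first exceed the ECC capability.

    Before this point no write can fail; after it, failure depends on the data.
    """
    return sorted(lifetimes)[ecc_capability]


# Example with the PCM parameters from the slide and a small block for speed.
rng = random.Random(1)
lifetimes = sample_cell_lifetimes(1e8, 25e6, num_cells=1000, rng=rng)
onset = first_possible_failure(lifetimes, ecc_capability=20)
```

Static sparing would retire the block at the first failing write after `onset`, whereas data dependent sparing keeps using it while failures remain rare.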

  12. Lifetime Improvement • Lifetime of PCM blocks with BCH-20 and a 10% failure frequency threshold. “DD” denotes data dependent sparing and “SS” denotes static sparing [Figure: DD(PCM) vs. SS(PCM); annotated values: 18.1% and 78%]

  13. Sensitivity to Over-Provisioning • Lifetime increase achieved by data dependent sparing at various levels of over-provisioning compared with static sparing with 20% over-provisioning

  14. Sensitivity of BCH Capability • Lifetime increase achieved by data dependent sparing relative to static sparing for various BCH code capabilities

  15. Sparing Overhead Reduction • Required over-provisioning for data dependent sparing to match static sparing lifetime

  16. Conclusion • Data Dependent Sparing is a new bad block management technique • Introduces the concept of better-than-bad blocks • Delays the retirement of blocks by keeping better-than-bad blocks engaged in write operations • Data Dependent Sparing can be used either to extend the lifetime of storage devices or to achieve a target lifetime with fewer spares

  17. Thank You!
