Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis • Yu Cai (1), Erich F. Haratsch (2), Onur Mutlu (1) and Ken Mai (1) • (1) DSSC, ECE Department, Carnegie Mellon University • (2) LSI Corporation
Evolution of NAND Flash Memory • Flash memory is widening its range of applications • Portable consumer devices, laptop PCs and enterprise servers • [Figure labels: CMOS scaling; more bits per cell. Source: Seaung Suk Lee, “Emerging Challenges in NAND Flash Technology”, Flash Summit 2011 (Hynix)]
Reliability and Endurance Challenges for NAND Flash Memories • Endurance continues to deteriorate • NAND flash memory sustains only a few thousand reliable P/E cycles • The error correction capability required of ECC keeps increasing • Big gap between MLC flash endurance and storage reliability requirements • Enterprise storage needs >50k P/E cycles
Future NAND Flash Storage Architecture • Pipeline: noisy memory (raw bit error rate) -> memory signal processing -> error correction -> BER < 10^-15 • Memory signal processing: read voltage adjusting, data scrambler, data recovery, soft-information estimation • Error correction: Hamming codes, BCH codes, Reed-Solomon codes, LDPC codes, other flash-friendly codes • Need to understand NAND flash error patterns
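The gap between a raw bit error rate and a post-correction target such as BER < 10^-15 can be illustrated with a small calculation. The sketch below is not from the slides: it assumes independent bit errors and a hypothetical block code that corrects up to `t` bit errors in each `n_bits`-bit codeword, and the function name and parameters are illustrative only.

```python
import math


def codeword_failure_prob(raw_ber: float, n_bits: int, t: int) -> float:
    """Probability that more than t of n_bits bits are in error.

    This is the upper tail of a binomial distribution. It assumes
    independent, identically distributed bit errors, which real flash
    only approximates.
    """
    if not 0.0 < raw_ber < 1.0:
        raise ValueError("raw_ber must be strictly between 0 and 1")
    log_p = math.log(raw_ber)
    log_q = math.log1p(-raw_ber)
    total = 0.0
    for k in range(t + 1, n_bits + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space
        # so that very small tail terms do not underflow prematurely.
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        term = math.exp(log_term)
        total += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if k > n_bits * raw_ber and term < total * 1e-18:
            break
    return min(total, 1.0)
```

For instance, `codeword_failure_prob(1e-3, 8192, 40)` estimates how often a hypothetical 8192-bit codeword with 40-bit correction would be uncorrectable at a raw BER of 10^-3. Raising `t` or lowering `raw_ber` drives the result down, which is the trade-off this slide points at.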
Test System Infrastructure • Algorithms: wear leveling, address mapping, garbage collection, ECC (BCH, RS, LDPC) • Signal processing • Control firmware • Flash operations: reset, erase block, program page, read page • [Diagram blocks: host computer, software platform, USB driver, host USB PHY, USB daughter board, USB PHY chip, mother board, FPGA, USB controller, NAND controller, flash board, flash memories]
NAND Flash Testing Platform • [Photo labels: USB daughter board; USB jack; HAPS-52 mother board; Virtex-II Pro (USB controller); NAND daughter board; 3x-nm NAND flash; Virtex-V FPGA (NAND controller)]
NAND Flash Usage and Error Model • Lifetime: start -> P/E cycle 0 -> … -> P/E cycle i -> … -> P/E cycle n -> end of life • Within a P/E cycle: erase block (erase errors) -> program page, Page0 - Page128 (program errors) -> retention 1, t1 days (retention errors) -> read page (read errors) -> … -> retention j, tj days (retention errors) -> read page (read errors)
Testing Methodology • Erase errors • Count the number of cells that fail to be erased to the “11” state • Program interference errors • Compare the data read immediately after page programming with the data read after the whole block has been programmed • Read errors • Repeatedly read a given block and compare the data between consecutive reads • Retention errors • Compare the data read before and after the retention period • Characterize short-term retention errors at room temperature • Characterize long-term retention errors by baking in an oven at 125℃
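Each of these tests reduces to comparing two dumps of the same page and counting the bits that differ. A minimal sketch of that comparison follows. The function names are hypothetical, the page dumps are assumed to be `bytes` objects of equal length, and the erase check counts bits within a single page rather than whole cells.

```python
def count_bit_errors(reference: bytes, observed: bytes) -> int:
    """Number of bit positions where two equal-length page dumps differ."""
    if len(reference) != len(observed):
        raise ValueError("page dumps must have the same length")
    # XOR leaves a 1 in every differing bit position; count those ones.
    return sum(bin(a ^ b).count("1") for a, b in zip(reference, observed))


def bit_error_rate(reference: bytes, observed: bytes) -> float:
    """Fraction of bits that differ between two page dumps."""
    if not reference:
        raise ValueError("page dumps must not be empty")
    return count_bit_errors(reference, observed) / (8 * len(reference))


def count_unerased_bits(page_after_erase: bytes) -> int:
    """Bits that did not read back as 1 after an erase.

    Assumes an erased page reads as all ones, matching the "11" state.
    """
    erased = b"\xff" * len(page_after_erase)
    return count_bit_errors(erased, page_after_erase)
```

With this helper, a retention test would call `bit_error_rate(before, after)` on the reads taken before and after the retention period, and a read test would compare consecutive reads of the same page.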
Flash Error Rates Comparison • Error rates increase with P/E cycles • Retention errors are the dominant error type • Retention error rates increase as retention time increases
Retention Error Mechanism • Electron loss from the floating gate causes retention errors • Cells with more programmed electrons suffer more from retention errors • Threshold voltage is more likely to shift by one interval than by multiple intervals • [Figure: floating gate with stress-induced leakage current (SILC); Vth distributions in LSB/MSB notation from erased to fully programmed (11, 10, 00, 01), separated by read references REF1, REF2, REF3]
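The one-interval shift can be made concrete with a toy read model. This is only a sketch under stated assumptions: the state order 11, 10, 00, 01 by increasing threshold voltage is inferred from the slides, and the reference voltages and function names are hypothetical values in arbitrary units.

```python
from bisect import bisect_right

# Assumed state order by increasing threshold voltage, in LSB/MSB notation.
STATES = ("11", "10", "00", "01")
# Hypothetical read references REF1 < REF2 < REF3, in arbitrary units.
REFS = (1.0, 2.0, 3.0)


def read_state(vth: float) -> str:
    """Return the 2-bit state whose threshold-voltage interval contains vth."""
    return STATES[bisect_right(REFS, vth)]


def state_after_leak(vth: float, leak: float) -> str:
    """State read back after the cell loses `leak` units of threshold voltage."""
    return read_state(vth - leak)
```

In this model a cell programmed to 01 at `vth = 3.2` reads back as 00 after losing 0.4 units, a one-interval error. It would have to lose more than 1.2 units to be misread as 10, which is why small leakage mostly produces single-interval shifts.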
Retention Error Value Dependency (3 months) • Cells with more programmed electrons (i.e., states 00 and 01) tend to suffer more from retention noise
2-bit MLC Background Overview • Internal architecture of 2-bit NAND flash memory • [Figure labels: LSB-even page sets; LSB-odd page sets; MSB-even page sets; MSB-odd page sets]
Retention Error Location Dependency • LSB pages have lower BER than MSB pages • Even pages have lower BER than odd pages • [Figure: Vth distributions in LSB/MSB notation with read references REF1, REF2, REF3; even-page cells and odd-page cells]
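One way to see how the data mapping could contribute to the LSB/MSB difference is to list which page bit flips when a cell drops by one state. The sketch below is an illustration, not the authors' stated explanation. It assumes the state order 11, 10, 00, 01 by increasing threshold voltage, with the first digit as the LSB and the second as the MSB, and the function names are hypothetical.

```python
from typing import Dict, List

# Assumed order by increasing threshold voltage; first digit LSB, second MSB.
STATES = ("11", "10", "00", "01")


def flipped_bits(src: str, dst: str) -> List[str]:
    """Names of the page bits that differ between two 2-bit states."""
    names = ("LSB", "MSB")
    return [name for name, a, b in zip(names, src, dst) if a != b]


def one_level_down_transitions() -> Dict[str, List[str]]:
    """Map each one-level downward shift to the page bit(s) it corrupts."""
    result = {}
    for higher, lower in zip(STATES[1:], STATES[:-1]):
        result[f"{higher}->{lower}"] = flipped_bits(higher, lower)
    return result
```

Under these assumptions, two of the three one-level downward shifts corrupt the MSB and only one corrupts the LSB, so MSB pages are exposed to more state boundaries.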
Program Interference • Program interference errors are caused by extra electrons injected when neighboring cells are programmed • Cells with fewer programmed electrons suffer more from interference errors • Threshold voltage is less likely to shift up by more than one level • [Figure: floating gate with additional electrons injected; Vth distributions in LSB/MSB notation from erased to fully programmed (11, 10, 00, 01), separated by read references REF1, REF2, REF3]
Program Interference Error Value Dependency • Cells with fewer programmed electrons (i.e., states 11 and 10) tend to suffer more from neighboring-cell interference
Program Interference Error Location Dependency • Program interference errors appear in even-MSB pages • The BER of bottom pages is orders of magnitude higher
Write Interference on the Bottom Wordline • The potential at the drain edge of the SGS transistor is raised by channel boosting • Electrons are accelerated between SGS and WL0 and are likely to be injected into the floating gate of WL0 • Hot-carrier injection (HCI) noise is generated by source/drain hot electrons in WL0 • The threshold voltage of cells on WL0 shifts right, and can even shift across more than one level (e.g. 11->01 or 00) • [Figure: NAND string bias during programming; labels: bitline, SGD, WL31, WL n, WL0, SGS, GND, Vpgm (20 V), Vpass (10 V), Vdd, 0 V; channel voltage between 0 V and 10 V]
Read Error Analysis • [Figure: floating gate; Vth distributions in LSB/MSB notation from erased to fully programmed, with read references REF1, REF2, REF3]
Erase Errors Analysis • Repeated erases can significantly reduce erase errors • They remove residual electrons • [Figure: cell under erase bias; labels: 0 V, +18 V, n+ regions]
Conclusions & Future Work • Flash errors can occur during any operation • Erase errors, program errors, retention errors and read errors • Retention errors are the dominant errors • Flash errors show explainable patterns • Cycle dependency, value dependency and location dependency • Understanding modern flash memory error patterns will enable the design of effective error tolerance mechanisms • Value-asymmetry-aware coding techniques • Cell-location-aware wear leveling mechanisms