Using Flash memories as SIMO channels for extending the lifetime of Solid-State Drives. Maria Varsamou and Theodore Antonakopoulos. 17th International Conference on Electronics, Circuits, and Systems. Department of Electrical and Computers Engineering, University of Patras, Greece
Using Flash memories as SIMO channels for extending the lifetime of Solid-State Drives. Maria Varsamou and Theodore Antonakopoulos. 17th International Conference on Electronics, Circuits, and Systems. Department of Electrical and Computers Engineering, University of Patras, Greece. e-mail: mtvars@upatras.gr, antonako@upatras.gr. website: www.loe.ee.upatras.gr
Presentation Outline • Introduction • Solid-State Drives • Flash memory technology • Solid-State Drives lifetime • Experiment for NAND flash characterization • Method for extending the flash endurance • Experimental results • Conclusions
Solid-State Drives (SSDs) • SSDs have become a mature solution for consumer and enterprise applications • SSDs have to demonstrate similar or better performance compared to magnetic disks • SSD performance metrics: data reliability (retention and endurance); I/O performance (kIOPS and latency) • SSDs use Flash memories (SLC/MLC) • SSD performance depends on: the Flash technology used; the supported workload; the internal architecture; high-level functions • Flash memory exhibits time-varying behavior in terms of raw BER and wears out as the workload (P/E cycles) increases [Figure: SSD block diagram with Host Interface, Main Processor, DRAM memory, DMA engines, FCC blocks, and Flash Channels #1 to #M with Flash Dies]
Flash memory cell • VT is shifted by injecting electrons into the floating gate • VT is shifted back by removing the electrons • The floating gate (FG) is isolated in oxide • Programming = electrons stored on the FG = high VT (programmed state “0”) • Erasing = electrons removed from the FG = low VT (erased state “1”) • Threshold voltage shift: ΔVT = ΔQFG / CCG [Figure labels: Control gate, Floating gate, Bulk, Ids; array cell read circuit with Vcc, R, Vcell, icell, Vcg; “1” and “0” characteristics versus Vcg]
Flash memory error conditions • Retention: the capability of keeping the stored information over time • Endurance: the capability of maintaining the stored information after erase, program, and read cycling • High voltages are applied during block erase (all pages of a block) and page program (all cells of a page and adjacent pages); the resulting high electric field across the tunnel oxide causes oxide aging • More frequent error conditions: variations in the stored charge (more permanent errors); variations in the detected voltage during read (more temporary errors); shift in operating margin (more permanent errors) • The probability of a 1 → 0 error is much higher than the probability of a 0 → 1 error • 1 → 0 is the typical error condition during the lifetime of a flash cell • 0 → 1 errors occur only at the end of the lifetime of a flash cell
Typical NAND IC Architecture • A number of NAND Flash cells forms a page • A number of pages forms a block • Read/Write per page • Erase per block • Overwriting is not permitted • NAND Interfaces • ONFI 1.0 Asynchronous 40 Mbytes/sec • ONFI 2.0 Synchronous 166/200 Mbytes/sec
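The page/block rules on this slide (program and read per page, erase per block, no overwriting) can be sketched as a toy model. This is only an illustration: the class name `NandBlock`, the default geometry, and the P/E counter are assumptions for the example, not part of the presentation or of any ONFI interface.

```python
class NandBlock:
    """Toy model of one NAND block (hypothetical, for illustration only)."""

    def __init__(self, pages_per_block=64, page_size=4320):
        # Both defaults are illustrative values, not device specifications.
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        self.pe_cycles = 0                      # completed erase cycles
        self._pages = [None] * pages_per_block  # None marks an erased page

    def erase(self):
        """Erase works on the whole block and counts as one P/E cycle."""
        self._pages = [None] * self.pages_per_block
        self.pe_cycles += 1

    def program(self, page, data):
        """Program one page; overwriting a programmed page is rejected."""
        if len(data) != self.page_size:
            raise ValueError("data must fill exactly one page")
        if self._pages[page] is not None:
            raise RuntimeError("page already programmed; erase the block first")
        self._pages[page] = bytes(data)

    def read(self, page):
        """Read one page; an erased page reads back as all ones (0xFF)."""
        stored = self._pages[page]
        return stored if stored is not None else b"\xff" * self.page_size
```

Updating a single page therefore means writing it to a free page elsewhere or erasing the whole block first, which is why the erase count per block (P/E cycles) is the natural measure of wear.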
NAND Read/Write [Figure panels: Page Write, Page Read]
Experimental Setup for Flash Characterization [Figure: block diagram of the setup. Labels: Host Computer, MATLAB, JTAG, USB, Ethernet, ML507, Virtex5 FPGA, PPC440, Main Memory, Kernel, TCP/IP, Flash board, Flash Memory Chips, ONFI 2.0 Flash Interface]
Raw BER of SLC Flash memory [Figure: plot with axis labels Bit Error Ratio (BER) and Block, and series labels Worst Page and Best Page]
Methods for extending the Lifetime of SSDs [Equation: SSD lifetime expressed in terms of S, E, V, and A] • S: user space; E: endurance (number of P/E cycles); V: user-written space per time unit; A: write amplification • Error Correction Codes (BCH, RS, LDPC, etc.; additional parity information) • Wear-leveling (system level, intra-block) • Exploiting the characteristics of the error insertion mechanism (proposed method)
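The lifetime expression itself did not survive in the transcript. A commonly used estimate that is consistent with the four symbols defined on the slide is lifetime = S · E / (V · A); the sketch below assumes that form, and the function name and the numbers in the example are purely illustrative.

```python
def ssd_lifetime(S, E, V, A):
    """Estimated lifetime, in the time unit used for V.

    ASSUMED form (not recovered from the slide): S * E / (V * A), with
    S = user space, E = endurance in P/E cycles,
    V = user-written space per time unit, A = write amplification.
    """
    if V <= 0 or A <= 0:
        raise ValueError("V and A must be positive")
    return S * E / (V * A)


# Illustrative numbers only: 128 GB of user space, 100,000 P/E cycles,
# 50 GB written per day, write amplification of 2.
days = ssd_lifetime(S=128, E=100_000, V=50, A=2.0)
```

Under this assumed form the lifetime scales linearly with E, which is why a method that raises the usable endurance of the Flash memory directly extends the lifetime of the drive.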
SLC Endurance Measurements • Page size: 4320 bytes • Experiment: erase block, write all pages with random data, read all pages, compare • Target user BER: 10^-15 [Figure: SLC Channel Model, with Tx and Rx states 1 and 0]
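The write/read/compare step of the experiment can be mimicked in software. The sketch below assumes an idealized asymmetric channel in which only stored 1s can flip to 0, with a made-up flip probability `p10`; it is not the measured SLC channel model, and all names are hypothetical.

```python
import random


def z_channel_read(bits, p10, rng):
    """Read back stored bits: each 1 flips to 0 with probability p10.

    Stored 0s are returned unchanged (assumed idealization).
    """
    return [0 if (b == 1 and rng.random() < p10) else b for b in bits]


def measure_page(page_bits=4320 * 8, p10=1e-3, seed=0):
    """Write one page of random data, read it back, and compare."""
    rng = random.Random(seed)
    written = [rng.getrandbits(1) for _ in range(page_bits)]
    read = z_channel_read(written, p10, rng)
    one_to_zero = sum(1 for w, r in zip(written, read) if w == 1 and r == 0)
    zero_to_one = sum(1 for w, r in zip(written, read) if w == 0 and r == 1)
    raw_ber = (one_to_zero + zero_to_one) / page_bits
    return one_to_zero, zero_to_one, raw_ber
```

Counting the two error directions separately is what exposes the asymmetry of the channel; in this idealized model the 0 → 1 count is zero by construction.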
Extending the endurance • BCH (n, k) code: corrects up to t bit errors • The BCH correction capability can be extended to 2t when the suspect bit positions are marked as erasures • Errors can be: write- or read-related; permanent or temporary • The SLC channel inserts errors that only change bits from ‘1’ to ‘0’ • The corrupted page is read additional times, and erasures are estimated from the bit differences between the reads [Figure: SISO versus SIMO view of the channel: one Write with noise nW, followed by Read #1 to Read #N, each with read noise nR]
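A minimal sketch of the erasure-estimation idea, assuming the simplest possible rule: any bit position on which the repeated reads disagree is flagged as an erasure. The rule, the function names, and the use of the first read as the working word are assumptions; the slides do not give the exact estimator. The second function states the standard errors-and-erasures condition for a code with minimum distance of at least 2t + 1.

```python
def estimate_erasures(reads):
    """Flag the positions where N reads of the same page disagree.

    reads: list of equal-length bit lists.
    Returns (word, erasures): the first read and the flagged positions.
    """
    n = len(reads[0])
    if any(len(r) != n for r in reads):
        raise ValueError("all reads must have the same length")
    erasures = [i for i in range(n) if len({r[i] for r in reads}) > 1]
    return list(reads[0]), erasures


def decodable(num_errors, num_erasures, t):
    """Errors-and-erasures bound for d_min >= 2t + 1: 2e + s <= 2t."""
    return 2 * num_errors + num_erasures <= 2 * t


# Three reads of a 4-bit word; position 2 is unstable across reads.
word, erasures = estimate_erasures([[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]])
```

With no unlocated errors, up to 2t erasures satisfy the bound, which matches the 2t figure quoted above. Temporary (read-related) errors are the ones most likely to show up as disagreements between reads, while permanent errors repeat identically in every read and still count as ordinary errors.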
Proposed Correction Method • Hardware complexity • The correction mechanism is activated only when the user data cannot be recovered • In this case, a small delay is introduced, comparable to the delay introduced by BCH decoding
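The activation rule can be sketched as a fallback path around the normal decoder. Everything here is hypothetical: `read_page`, `decode`, and `decode_with_erasures` stand in for controller functions that the slides do not specify, `decode` is assumed to return `None` on failure, and the disagreement rule for erasures is the simplest one possible.

```python
def read_with_fallback(read_page, decode, decode_with_erasures, extra_reads=3):
    """Return (data, number_of_reads) for one page.

    Normal path: one read and one ordinary decode.
    Fallback path: entered only when the ordinary decode fails.
    """
    first = read_page()
    data = decode(first)
    if data is not None:
        return data, 1  # normal operation: no extra reads, no extra delay

    # Fallback: re-read the page and flag positions where the reads differ.
    reads = [first] + [read_page() for _ in range(extra_reads)]
    erasures = [i for i in range(len(first)) if len({r[i] for r in reads}) > 1]
    return decode_with_erasures(first, erasures), len(reads)
```

Because the extra reads happen only after a decoding failure, pages that decode normally pay no additional latency in this sketch.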
Performance of the proposed method • As the number of read cycles increases, the method's performance improves, but with diminishing gain.
Effect on Flash Controller I/O Performance • Today’s high-performance SSDs support: a large number of Flash channels, usually 16; a few Gbytes of SLC memory per channel; a host interface data rate of a few Gbps • Expected I/O rate for 16 channels: ~300 kIOPS without pipelining; ~600 kIOPS with pipelining • Measured I/O rate: ~120 kIOPS • Limited by: internal architecture; latency introduced by ECC; Flash-related functions (wear leveling, garbage collection, etc.) • The proposed method for extending flash endurance does not decrease the SSD’s storage efficiency (no additional parity) and does not affect the I/O performance as long as the ECC in use can correct all errors. [Figure: Flash Controller Architecture, with Host Interface, Main Processor, DRAM memory, DMA engines, FCC blocks, and Flash Channels #1 to #M with Flash Dies]
Conclusions • The lifetime of an SSD can be extended by improving the endurance of its Flash memories • A method was proposed that exploits the error characteristics of SLC Flash memory to identify possible error locations. The method: • Sustains the memory endurance for a few tens of thousands of P/E cycles • Has limited hardware complexity • Requires no additional parity bits (no decrease in the SSD’s storage efficiency) • Does not affect the SSD’s I/O performance during normal operation, as long as the ECC scheme in use can recover any corrupted data