FTL Design Exploration in Reconfigurable High-Performance SSD (RHPSSD) for Server Applications

International Conference on Supercomputing, June 12, 2009
Ji-Yong Shin (1, 2), Zeng-Lin Xia (1), Ning-Yi Xu (1), Rui Gao (1), Xiong-Fei Cai (1), Seungryoul Maeng (2), and Feng-Hsiung Hsu (1)
(1) Microsoft Research Asia
(2) Korea Advanced Institute of Science and Technology

Introduction and Background (1/3)
• Growing popularity of flash memory and SSDs
  • Low latency
  • Low power
  • Solid-state reliability
• SSDs widening their range of applications
  • Embedded devices
  • Desktop and laptop PCs
  • Servers and supercomputers
• SSDs expected to revolutionize the storage subsystem

Introduction and Background (2/3)
• Flash memory
  • Erase needed before write
  • Units of read/write and erase differ
    • Read/write: page (typically 2 to 4KB)
    • Erase: block (typically 64 pages)
  • Latencies of read, write, and erase differ
    • Read (25us) < write (250us) < erase (500us)
  • Erase carried out on demand: cleaning or garbage collection
  • Wear leveling necessary
    • Memory cells wear out when erased
    • A block typically endures 100K erase operations
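The flash constraints above (page-granularity writes, block-granularity erases, no in-place overwrite, limited erase endurance) can be modeled with a toy erase block. This is a minimal sketch, not the simulator's code: the `FlashBlock` class and its FREE/VALID/INVALID page states are hypothetical names, while the latencies, 64 pages per block, and 100K endurance come from the slide.

```python
from dataclasses import dataclass, field
from typing import List

# Figures quoted on the slide
READ_US, WRITE_US, ERASE_US = 25, 250, 500   # per-operation latency (us)
PAGES_PER_BLOCK = 64                         # typical pages per erase block
ENDURANCE = 100_000                          # typical erase cycles per block

FREE, VALID, INVALID = "free", "valid", "invalid"


@dataclass
class FlashBlock:
    """Toy erase block (hypothetical model): pages are written one at a
    time, but only an erase of the whole block returns them to FREE."""
    pages: List[str] = field(default_factory=lambda: [FREE] * PAGES_PER_BLOCK)
    erase_count: int = 0

    def read(self, page_no: int) -> int:
        if self.pages[page_no] != VALID:
            raise ValueError("page holds no valid data")
        return READ_US

    def program(self, page_no: int) -> int:
        # No in-place overwrite: the page must be FREE (erased) first.
        if self.pages[page_no] != FREE:
            raise ValueError("page must be erased before it is written")
        self.pages[page_no] = VALID
        return WRITE_US

    def invalidate(self, page_no: int) -> None:
        # A newer copy was written elsewhere; this page stays unusable
        # until the block is erased (cleaning).
        self.pages[page_no] = INVALID

    def erase(self) -> int:
        # Erase is block-wide and consumes one endurance cycle.
        if self.erase_count >= ENDURANCE:
            raise RuntimeError("block has reached its erase endurance")
        self.pages = [FREE] * PAGES_PER_BLOCK
        self.erase_count += 1
        return ERASE_US
```

Rewriting page 0 without an intervening `erase()` raises `ValueError`, which is exactly why an FTL writes updates out of place and cleans blocks later.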

Introduction and Background (3/3)
• Flash translation layer (FTL)
  • Provides an abstraction of flash memory characteristics
  • Maintains the logical-to-physical address mapping
  • Carries out cleaning operations
  • Conducts wear leveling
• FTL in a multiple-flash-chip environment
  • Manages parallelism and wear level among chips
[Figure: the host machine sends IO requests to the FTL, which issues flash requests to multiple flash memory modules]
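The three FTL duties listed above (mapping, out-of-place updates, cleaning) can be illustrated with a small page-mapped FTL for a single plane. This is a hedged sketch, not the paper's FTL: the class name, the tiny 4-page block, the single reserve block, and greedy victim selection (clean the block with the fewest valid pages) are all assumptions chosen for clarity.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4  # tiny for illustration; the slides cite 64 as typical


class PageMappingFTL:
    """Hypothetical page-mapped FTL for one plane: out-of-place writes,
    a logical-to-physical map, and greedy cleaning."""

    def __init__(self, num_blocks: int, logical_pages: int):
        # Two blocks of over-provisioning guarantee that the greedy victim
        # always has at least one invalid page to reclaim.
        if logical_pages > (num_blocks - 2) * PAGES_PER_BLOCK:
            raise ValueError("not enough spare blocks")
        self.logical_pages = logical_pages
        self.l2p = {}                        # logical page -> (block, page)
        self.valid = defaultdict(dict)       # block -> {page: logical page}
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.next_page = 0, 0   # block currently being filled
        self.erase_counts = [0] * num_blocks
        self.cleanings = 0
        self.migrations = 0

    def read(self, lpn: int):
        """Return the physical (block, page) holding `lpn`, or None."""
        return self.l2p.get(lpn)

    def write(self, lpn: int) -> None:
        if not 0 <= lpn < self.logical_pages:
            raise IndexError("logical page out of range")
        old = self.l2p.get(lpn)
        if old is not None:                  # update out of place:
            del self.valid[old[0]][old[1]]   # the old copy becomes invalid
        if self.next_page == PAGES_PER_BLOCK:
            self._open_new_block()
        self._program(lpn)

    def _program(self, lpn: int) -> None:
        self.l2p[lpn] = (self.active, self.next_page)
        self.valid[self.active][self.next_page] = lpn
        self.next_page += 1

    def _open_new_block(self) -> None:
        if len(self.free_blocks) > 1:
            self.active, self.next_page = self.free_blocks.pop(0), 0
            return
        # Only the reserve block is left: clean the block with the fewest
        # valid pages (greedy victim selection).
        victim = min((b for b in range(len(self.erase_counts))
                      if b not in self.free_blocks),
                     key=lambda b: len(self.valid[b]))
        self.active, self.next_page = self.free_blocks.pop(0), 0
        for lpn in list(self.valid[victim].values()):
            self._program(lpn)               # migrate each valid page
            self.migrations += 1
        self.valid[victim].clear()
        self.erase_counts[victim] += 1       # erase the whole victim block
        self.free_blocks.append(victim)
        self.cleanings += 1
```

For example, `PageMappingFTL(num_blocks=4, logical_pages=8)` driven by 100 writes cycling over logical pages 0 to 7 triggers cleaning repeatedly, yet every logical page still maps to exactly one valid physical page.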

Motivation (1/2)
• Server and supercomputing environments
  • Require a high-performance storage subsystem
  • Usually run a fixed set of applications
• SSD performance characteristics
  • Highly dependent on FTL design and workloads
A customized SSD can boost servers and supercomputers

Motivation (2/2)
Based on the Reconfigurable High-Performance SSD (RHPSSD) architecture, we will explore FTL design considerations and tradeoffs and propose guidelines for customizing the FTL

Reconfigurable High-Performance SSD (RHPSSD)
• RHPSSD architecture
  • High performance
    • 36 independent flash channels
    • 4GB/s PCI Express host-to-SSD interface
  • Flexibility from the FPGA for reconfiguring the FTL
1. Maintaining high parallelism for performance
2. Wear leveling for endurance
   • Among all blocks
   • Among chips, dies, and planes
[Figure: RHPSSD architecture. The host connects over PCI Express (4GB/s) to an FPGA holding the FTL or flash controller and a flash channel controller for each flash channel, alongside random access memory. Flash daughter boards carry flash chips, each with an independent channel; each chip contains dies, and each die contains planes.]
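To make the channel/die/plane hierarchy concrete, the sketch below decodes a flat plane index into a (channel, die, plane) triple. It assumes one flash chip per channel (the figure labels each chip with an independent channel), the 4GB chip's 2 dies × 2 planes as defaults, and channel-first interleaving; the interleaving order and the `plane_address` helper are hypothetical, not taken from the RHPSSD design.

```python
from typing import NamedTuple


class PlaneAddr(NamedTuple):
    channel: int
    die: int
    plane: int


def plane_address(index: int, channels: int = 36, dies: int = 2,
                  planes: int = 2) -> PlaneAddr:
    """Decode a flat plane index, assuming one chip per channel and
    channel-first interleaving: consecutive indices land on different
    channels, which can operate in parallel."""
    total = channels * dies * planes
    if not 0 <= index < total:
        raise IndexError(f"plane index must be in [0, {total})")
    rest, channel = divmod(index, channels)
    plane, die = divmod(rest, dies)
    return PlaneAddr(channel, die, plane)
```

Under these assumptions the 4GB-chip configuration exposes 36 × 2 × 2 = 144 planes, and passing `dies=4` for the 8GB chip gives 288.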

FTL Design Exploration and Analysis
• Simulation-based method to discover:
  • Logical page to physical flash plane allocation
  • Effect of hot/cold data separation
  • Wear leveling and cleaning
    • Cleaning analysis for different allocations
    • Wear leveling in different clusters

Simulation Environment and Workloads (1/2)
• Simulation environment
  • Modified DiskSim 4.0 with the SSD plug-in from MSR SVC
  • Various FTL algorithms implemented
• Basic configurations
  • RHPSSD architecture
  • Flash chip
    • Latencies: read 25us, write 250us, erase 500us
    • Two chip types for different SSD capacities
      • 4GB chip (2 dies with 2 planes each)
      • 8GB chip (4 dies with 2 planes each)
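With these latencies, a simplified cost model shows why cleaning is expensive and why fewer valid pages per victim block matter. Assuming each of the victim's v valid pages is read and rewritten before the block is erased, and ignoring bus transfers, copy-back operations, and contention (all simplifying assumptions, not the simulator's model):

```latex
\begin{aligned}
T_{\text{clean}}(v) &= v\,(t_r + t_w) + t_e,
  \qquad t_r = 25\,\mu\text{s},\; t_w = 250\,\mu\text{s},\; t_e = 500\,\mu\text{s} \\
T_{\text{clean}}(0) &= 500\,\mu\text{s} \\
T_{\text{clean}}(16) &= 16 \times (25 + 250) + 500 = 4900\,\mu\text{s} \\
T_{\text{clean}}(48) &= 48 \times 275 + 500 = 13700\,\mu\text{s}
\end{aligned}
```

A victim with no valid pages costs only the erase; each valid page adds 275us of migration.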

Simulation Environment and Workloads (2/2) • Traces used for simulation

Logical Page to Physical Plane Allocation (1/2)
• Allocation is directly related to parallelism
• Static allocation
  • Binds each logical page address to a specific plane
  • Striping methods
    • Wide striping, page striping unit: high parallelism, more cleaning
    • Narrow striping, block striping unit: low parallelism, less cleaning
• Dynamic allocation
  • Allocates each page request to an idle plane at runtime
  • Binds the logical address to
    • a chip: fewer degrees of freedom
    • the SSD: maximum degrees of freedom
[Figure: wide striping vs. narrow striping]
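The two allocation styles can be sketched as two small functions. The static mapping deals runs of `unit_pages` logical pages round-robin over `width` planes; treating stripe width and striping unit as these two parameters is an assumption, since the slide does not spell out how narrow stripes are formed. The dynamic allocator picks an idle plane from a candidate set (the planes of one chip, or all planes in the SSD); the `busy_until` bookkeeping and the fallback to the soonest-free plane are hypothetical choices.

```python
from typing import Iterable, Sequence


def static_plane(lpn: int, width: int, unit_pages: int) -> int:
    """Static striping (hypothetical parameterisation): runs of
    `unit_pages` consecutive logical pages go round-robin over `width`
    planes. unit_pages=1 is a page striping unit; unit_pages equal to
    the pages per block is a block striping unit."""
    return (lpn // unit_pages) % width


def dynamic_plane(busy_until: Sequence[float], now: float,
                  candidates: Iterable[int]) -> int:
    """Dynamic allocation: send the page to an idle plane among
    `candidates`. If none is idle, fall back to the plane that
    becomes free soonest."""
    pool = list(candidates)
    idle = [p for p in pool if busy_until[p] <= now]
    return min(idle or pool, key=lambda p: busy_until[p])
```

Over 4 planes, eight sequential logical pages map to planes `[0, 1, 2, 3, 0, 1, 2, 3]` with a page unit but all land on plane 0 with a 64-page block unit, which is the parallelism gap the slide describes. Restricting `candidates` to one chip's planes models the chip-bound dynamic variant.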

Logical Page to Physical Plane Allocation (2/2)
[Figure: response time normalized to STATIC W-PAGE]

Hot/Cold Data Separation (1/2)
• Separates pages according to their temperature (update frequency) in each plane
  • Blocks holding hot data are likely to fill up with invalid pages
  • Blocks holding cold data are likely to remain unchanged
• Known to reduce erase operations and valid page migration
• Also leads to smaller response times
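A minimal per-plane sketch of the separation: hot and cold pages are appended to different open blocks, so hot blocks tend to become entirely invalid together. The classifier (a page is hot once it has been written `hot_threshold` times) and the 4-page block are assumptions for illustration; the slides do not specify how temperature is detected.

```python
from collections import Counter

PAGES_PER_BLOCK = 4  # tiny for illustration


class HotColdPlane:
    """Hypothetical per-plane hot/cold separation with an assumed
    write-count classifier. Each class has its own list of blocks;
    the last block in each list is the open one being filled."""

    def __init__(self, hot_threshold: int = 2):
        self.hot_threshold = hot_threshold
        self.write_counts = Counter()
        self.blocks = {"hot": [[]], "cold": [[]]}

    def write(self, lpn: int) -> str:
        self.write_counts[lpn] += 1
        temp = "hot" if self.write_counts[lpn] >= self.hot_threshold else "cold"
        frontier = self.blocks[temp]
        if len(frontier[-1]) == PAGES_PER_BLOCK:
            frontier.append([])          # open a new block for this class
        frontier[-1].append(lpn)         # record which logical page was written
        return temp
```

Writing pages 0 to 7 once and then rewriting pages 0 and 1 four times leaves the cold blocks as `[[0, 1, 2, 3], [4, 5, 6, 7]]` and the hot blocks as `[[0, 1, 0, 1], [0, 1, 0, 1]]`. Every page in the first hot block is now superseded, so it can be erased without migrating any valid page.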

Hot/Cold Data Separation (2/2)
[Figure: improvement (%) after applying the separation]

Wear Leveling and Cleaning
• High performance and even wear across the SSD are separate concerns
• Static allocation
  • Logical addresses are bound to a plane, so no page can migrate outside its dedicated plane (only local wear leveling)
  • Choosing an allocation that wears out each plane evenly is important
• Dynamic allocation
  • Wear leveling can be carried out in different clusters (chip, SSD)
    • A cluster is the scope within which block lifetimes are kept even
    • The larger the cluster, the more even the wear across the SSD as a whole
    • The larger the cluster, the greater the overhead
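One way to see how cluster scope changes wear-leveling decisions is a free-block picker that only considers planes inside the cluster. This is a sketch under assumptions: blocks are addressed as hypothetical (plane, block) pairs, and choosing the least-worn free block is one common wear-leveling heuristic, not necessarily the algorithm evaluated in the paper.

```python
from typing import Dict, Iterable, Set, Tuple

Block = Tuple[int, int]  # (plane id, block id): hypothetical addressing


def pick_free_block(free_blocks: Iterable[Block],
                    erase_counts: Dict[Block, int],
                    cluster: Set[int]) -> Block:
    """Return the least-worn free block whose plane lies in the
    wear-leveling cluster. A one-plane cluster gives only local wear
    leveling (as with static allocation); a chip- or SSD-wide cluster
    can steer writes to younger blocks on other planes."""
    candidates = [b for b in free_blocks if b[0] in cluster]
    if not candidates:
        raise LookupError("no free block in this cluster")
    return min(candidates, key=lambda b: erase_counts[b])
```

With free blocks on planes 0 to 2 whose erase counts are 10 and 7 (plane 0), 3 (plane 1), and 1 (plane 2), a plane-0 cluster returns the 7-erase block, a two-plane cluster returns the plane-1 block, and an SSD-wide cluster returns the plane-2 block.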

Number of Cleanings and Erase Distribution without Wear Leveling
[Figure: number of operations normalized to W-PAGE]

Wear Leveling in Different Clusters
• Wear-leveling cluster
  • Group of blocks within which the wear-leveling algorithm keeps block ages even
• The larger the cluster, the worse the performance
• The larger the cluster, the more even the block ages
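"More even" can be quantified by comparing the spread of erase counts inside each plane with the spread across the whole SSD. The metrics below (max-minus-min and population standard deviation) are assumed measures for illustration, not necessarily the ones used in the paper.

```python
from statistics import pstdev
from typing import Dict, List


def wear_report(per_plane: List[List[int]]) -> Dict[str, float]:
    """Summarise erase counts given as one list per plane."""
    within = max(max(counts) - min(counts) for counts in per_plane)
    everything = [n for counts in per_plane for n in counts]
    return {
        # Worst imbalance inside any single plane (what local leveling fixes)
        "max_within_plane_spread": within,
        # Imbalance across the whole SSD (what a large cluster fixes)
        "ssd_wide_spread": max(everything) - min(everything),
        "ssd_wide_stdev": pstdev(everything),
    }
```

For two planes with erase counts `[10, 10, 10]` and `[2, 2, 2]`, each plane is perfectly level (spread 0), yet the SSD-wide spread is 8 and the standard deviation is 4.0: the situation a plane-sized cluster cannot correct.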

Summary
• Static vs. dynamic allocation
  • Static wide striping: for workloads dominated by sequential IO
    • Page striping unit: small response time, more cleaning
    • Block striping unit: large response time, less cleaning
    • Tradeoff between response time and cleaning operations
  • Dynamic allocation: for workloads dominated by random IO
• Hot/cold data separation
  • Effective for evenly distributed IO
• Wear-leveling cluster
  • Large cluster: large overhead, even distribution of wear
  • Small cluster: small overhead, uneven distribution of wear
  • Tradeoff between response time and even wear
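The summary guidelines can be read as a small decision table. The sketch below encodes them as a lookup; the `WorkloadProfile` fields and their boolean simplification are hypothetical, and this is a restatement of the slide's rules, not a tuned policy from the paper. Following the wear-leveling slide, static allocation is paired with local (per-plane) wear leveling only.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical knobs summarising a server workload."""
    mostly_sequential: bool     # sequential IO dominates
    evenly_distributed: bool    # IO spread evenly across the address space
    favor_response_time: bool   # latency matters more than cleaning count
    favor_even_wear: bool       # even wear matters more than overhead


def suggest_ftl(w: WorkloadProfile) -> dict:
    """Map a workload profile to the summary's FTL guidelines."""
    if w.mostly_sequential:
        unit = "page" if w.favor_response_time else "block"
        allocation = f"static wide striping, {unit} striping unit"
        # Static allocation pins addresses to a plane: local wear leveling only.
        wear_cluster = "plane (local only)"
    else:
        allocation = "dynamic"
        wear_cluster = ("SSD (large cluster)" if w.favor_even_wear
                        else "chip (small cluster)")
    return {
        "allocation": allocation,
        "hot_cold_separation": w.evenly_distributed,
        "wear_leveling_cluster": wear_cluster,
    }
```

A random-IO, evenly distributed workload that prioritises lifetime would get dynamic allocation, hot/cold separation, and an SSD-wide wear-leveling cluster.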

Conclusion
• Algorithms for each FTL function studied for high-performance SSDs
• Tradeoffs and simple guidelines presented for designing customized FTLs for different workloads and SSD lifetime requirements
• Please read the paper for more details

Thank you. Questions?
