FTL Design Exploration in Reconfigurable High-Performance SSD (RHPSSD) for Server Applications
International Conference on Supercomputing, June 12, 2009
Ji-Yong Shin (1,2), Zeng-Lin Xia (1), Ning-Yi Xu (1), Rui Gao (1), Xiong-Fei Cai (1), Seungryoul Maeng (2), and Feng-Hsiung Hsu (1)
(1) Microsoft Research Asia, (2) Korea Advanced Institute of Science and Technology
Introduction and Background (1/3)
• Growing popularity of flash memory and SSDs
  • Low latency
  • Low power
  • Solid-state reliability
• SSDs widening their range of applications
  • Embedded devices
  • Desktop and laptop PCs
  • Servers and supercomputers
• SSDs expected to revolutionize the storage subsystem
Introduction and Background (2/3)
• Flash memory
  • Erase needed before write
  • Units of read/write and erase differ
    • Read/write: page (typically 2 to 4KB)
    • Erase: block (typically 64 pages)
  • Latencies for read, write, and erase differ
    • Read (25us) < write (250us) < erase (500us)
  • Erase carried out on demand: cleaning or garbage collection
  • Wear leveling necessary
    • Memory cells wear out when erased
    • Typically a block endures 100K erase operations
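The flash constraints above can be modeled in a few lines. This is a minimal sketch, not the paper's simulator: a hypothetical `FlashBlock` of 64 pages that refuses to reprogram a page until the whole block is erased, accumulating the read/write/erase latencies quoted above.

```python
from dataclasses import dataclass, field

# Latencies from the slide, in microseconds.
READ_US, WRITE_US, ERASE_US = 25, 250, 500
PAGES_PER_BLOCK = 64  # typical block size from the slide


@dataclass
class FlashBlock:
    """Hypothetical model of a single flash block (illustration only)."""

    # None marks an erased (free) page; anything else is programmed data.
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)
    erase_count: int = 0  # cells wear out as this grows
    busy_us: int = 0      # accumulated latency of operations on this block

    def read(self, page: int):
        self.busy_us += READ_US
        return self.pages[page]

    def program(self, page: int, data) -> None:
        # Erase-before-write: a programmed page cannot be overwritten in place.
        if self.pages[page] is not None:
            raise ValueError(f"page {page} must be erased before it is rewritten")
        self.pages[page] = data
        self.busy_us += WRITE_US

    def erase(self) -> None:
        # Erase works on the whole block, never on a single page.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1
        self.busy_us += ERASE_US
```

Rewriting page 0 therefore takes `erase()` followed by `program()`, and the erase also wipes the other 63 pages. This is why an FTL redirects updates to free pages instead of rewriting in place.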
Introduction and Background (3/3)
• Flash translation layer (FTL)
  • Provides an abstraction of flash memory characteristics
  • Maintains the logical-to-physical address mapping
  • Carries out cleaning operations
  • Conducts wear leveling
• FTL in a multiple-flash-chip environment
  • Manages parallelism and wear level among chips
[Figure: the host machine sends IO requests to the FTL, which issues flash requests to multiple flash memory modules]
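The slides do not say which mapping granularity the FTL uses. As an assumption, the sketch below uses page-level mapping with out-of-place updates and greedy cleaning (reclaim the block with the fewest valid pages), a common textbook design; `PageMappedFTL` and its fields are hypothetical. It covers the mapping and cleaning duties listed above and tracks the per-block erase counts a wear leveler would use.

```python
class PageMappedFTL:
    """Hypothetical page-level FTL: out-of-place writes plus greedy cleaning."""

    def __init__(self, num_blocks: int, pages_per_block: int):
        self.ppb = pages_per_block
        self.l2p = {}  # logical page number -> (block, page)
        self.valid = [[False] * pages_per_block for _ in range(num_blocks)]
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]  # reverse map
        self.erase_count = [0] * num_blocks
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.next_page = 0, 0  # current write point
        self.migrations = 0  # valid pages copied during cleaning

    def write(self, lpn: int) -> None:
        # Out-of-place update: invalidate the old copy, then write to a free page.
        if lpn in self.l2p:
            old_b, old_p = self.l2p[lpn]
            self.valid[old_b][old_p] = False
        b, p = self._allocate()
        self.valid[b][p], self.owner[b][p] = True, lpn
        self.l2p[lpn] = (b, p)

    def _allocate(self):
        if self.next_page == self.ppb:  # active block is full
            if self.free_blocks:
                self.active, self.next_page = self.free_blocks.pop(0), 0
            else:
                self._clean()  # reclaims a block and makes it the write point
        slot = (self.active, self.next_page)
        self.next_page += 1
        return slot

    def _clean(self) -> None:
        # Greedy victim selection: the block with the fewest valid pages.
        victim = min(range(len(self.valid)), key=lambda b: sum(self.valid[b]))
        live = [self.owner[victim][p] for p in range(self.ppb) if self.valid[victim][p]]
        if len(live) == self.ppb:
            raise RuntimeError("no invalid pages to reclaim")
        # Simplification: buffer live pages in controller RAM, erase the block, and
        # write them back to its first pages (a real FTL copies them to a spare block).
        self.erase_count[victim] += 1
        self.valid[victim] = [False] * self.ppb
        self.owner[victim] = [None] * self.ppb
        for i, lpn in enumerate(live):
            self.valid[victim][i], self.owner[victim][i] = True, lpn
            self.l2p[lpn] = (victim, i)
        self.migrations += len(live)
        self.active, self.next_page = victim, len(live)
```

Writing eight logical pages ten times each into a device of 4 blocks with 4 pages per block exercises cleaning repeatedly. Afterwards `l2p` still holds exactly eight entries, each pointing at a valid page, and `erase_count` shows which blocks absorbed the erases.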
Motivation (1/2)
• Server and supercomputing environments
  • High-performance storage subsystem required
  • Applications are usually fixed
• SSD performance characteristics
  • Highly dependent on FTL design and workloads
A customized SSD can boost servers and supercomputers
Motivation (2/2)
Based on the Reconfigurable High-Performance SSD architecture, we will explore FTL design considerations and tradeoffs, and propose guidelines for customizing the FTL.
Reconfigurable High-Performance SSD (RHPSSD)
• RHPSSD architecture
  • High performance
    • 36 independent flash channels
    • 4GB/s PCI Express host-to-SSD interface
  • Flexibility from the FPGA for reconfiguring the FTL
• Design considerations
  1. Maintaining high parallelism for performance
  2. Wear leveling for endurance
    • Among all blocks
    • Among chips, dies, and planes
[Figure: RHPSSD architecture. The host connects over PCI Express (4GB/s) to an FPGA that hosts the FTL or flash controller and a flash channel controller for each flash channel, backed by random access memory; flash daughter boards carry flash chips, each on an independent channel; each chip contains dies, and each die contains planes.]
FTL Design Exploration and Analysis
• Simulation-based method to discover:
  • Logical page to physical flash plane allocation
  • Effect of hot/cold data separation
  • Wear leveling and cleaning
    • Cleaning analysis for different allocations
    • Wear leveling in different clusters
Simulation Environment and Workloads (1/2)
• Simulation environment
  • Modified DiskSim 4.0 with the SSD plug-in from MSR SVC
  • Various FTL algorithms implemented
• Basic configurations
  • RHPSSD architecture
  • Flash chip
    • Latencies (read: 25us, write: 250us, erase: 500us)
    • Two types of chips for different SSD capacities
      • 4GB chip (2 dies with 2 planes each)
      • 8GB chip (4 dies with 2 planes each)
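The two chip configurations can be captured as a small configuration record. The sketch assumes binary units (1GB = 2^30 bytes), a 4KB page (the upper end of the 2 to 4KB range on the background slide), and 64 pages per block; `ChipConfig` and its fields are hypothetical names, not DiskSim parameters.

```python
from dataclasses import dataclass

# Flash latencies used in the simulations, in microseconds.
LATENCY_US = {"read": 25, "write": 250, "erase": 500}


@dataclass(frozen=True)
class ChipConfig:
    capacity_gb: int
    dies: int
    planes_per_die: int
    page_kb: int = 4           # assumption: 4KB pages
    pages_per_block: int = 64  # typical value from the background slide

    @property
    def planes(self) -> int:
        return self.dies * self.planes_per_die

    @property
    def plane_gb(self) -> float:
        return self.capacity_gb / self.planes

    @property
    def blocks_per_plane(self) -> int:
        # Chip capacity in KB divided by (block size in KB x number of planes).
        block_kb = self.page_kb * self.pages_per_block
        return self.capacity_gb * 1024 * 1024 // (block_kb * self.planes)


CHIP_4GB = ChipConfig(capacity_gb=4, dies=2, planes_per_die=2)
CHIP_8GB = ChipConfig(capacity_gb=8, dies=4, planes_per_die=2)
```

Under these assumptions each plane holds 1GB (4096 blocks) in both chip types, so the 8GB chip differs only in offering twice as many planes for the allocator to spread requests over.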
Simulation Environment and Workloads (2/2)
• Traces used for simulation
Logical Page to Physical Plane Allocation (1/2)
• Allocation is directly related to parallelism
• Static allocation
  • Binds each logical page address to a specific plane
  • Striping methods
    • Wide striping, page striping unit: high parallelism, more cleaning
    • Narrow striping, block striping unit: low parallelism, less cleaning
• Dynamic allocation
  • Allocates each page request to an idle plane at runtime
  • Binds the logical address to a
    • Chip: fewer degrees of freedom
    • SSD: maximum degree of freedom
[Figure: wide striping vs. narrow striping]
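The allocation policies can be sketched as plane-selection functions. This is an interpretation, not the paper's code: the striping unit is taken to be the number of consecutive logical pages mapped to one plane before moving to the next, and dynamic allocation picks whichever permitted plane becomes idle first. Both function names are hypothetical.

```python
from typing import Sequence


def static_plane(lpn: int, num_planes: int, stripe_unit_pages: int) -> int:
    """Static allocation: a logical page is always bound to the same plane.

    stripe_unit_pages=1 models a page striping unit; stripe_unit_pages=64
    (one block) models a block striping unit.
    """
    return (lpn // stripe_unit_pages) % num_planes


def dynamic_plane(busy_until_us: Sequence[int], allowed: range, now_us: int) -> int:
    """Dynamic allocation: send the write to the allowed plane that is idle
    soonest. `allowed` is the planes of one chip (chip binding) or every
    plane in the SSD (SSD binding)."""
    return min(allowed, key=lambda p: max(busy_until_us[p], now_us))
```

With four planes, eight sequential pages map to planes 0, 1, 2, 3, 0, 1, 2, 3 under a page striping unit and can proceed in parallel, whereas a block striping unit sends all eight to plane 0. Narrowing `allowed` from every plane in the SSD to one chip's planes is what reduces the degree of freedom of dynamic allocation.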
Logical Page to Physical Plane Allocation (2/2)
[Figure: response time normalized to STATIC W-PAGE]
Hot/Cold Data Separation (1/2)
• Separates pages by temperature (update frequency) within each plane
  • Blocks holding hot data are likely to fill up with invalid pages
  • Blocks holding cold data are likely to stay as they are
• Known to reduce erase operations and valid-page migrations
• Also leads to shorter response times
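The effect of separation can be reproduced on a toy trace. The slides do not say how temperature is determined; as an assumption, the sketch below treats a page as hot once it has been written twice, appends hot and cold writes to separate logs, and counts blocks whose pages are all stale (erasable without migrating any valid page). All names are hypothetical.

```python
from collections import Counter


def place_writes(writes, pages_per_block, separate, hot_threshold=2):
    """Append each write to one shared log or to per-temperature logs and
    return the blocks as lists of (lpn, seq) pairs."""
    counts = Counter()
    logs = {"hot": [], "cold": []}
    for seq, lpn in enumerate(writes):
        counts[lpn] += 1
        # Hypothetical classifier: a page is hot from its hot_threshold-th write on.
        hot = separate and counts[lpn] >= hot_threshold
        logs["hot" if hot else "cold"].append((lpn, seq))
    blocks = []
    for log in logs.values():
        blocks += [log[i:i + pages_per_block] for i in range(0, len(log), pages_per_block)]
    return blocks


def erasable_blocks(blocks, writes):
    """Count blocks whose pages are all stale (superseded by a later write)."""
    last = {lpn: seq for seq, lpn in enumerate(writes)}
    return sum(all(last[lpn] != seq for lpn, seq in block) for block in blocks)


# Toy trace: cold pages 0-7 are written once; hot pages 100-103 every round.
workload = []
for r in range(4):
    workload += [2 * r, 2 * r + 1, 100, 101, 102, 103]
```

On this trace, with 4 pages per block, the shared log yields 1 fully stale block out of 6, while separated logs yield 2 out of 6, because the repeated hot updates no longer share blocks with long-lived cold pages.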
Hot/Cold Data Separation (2/2)
[Figure: improvement after applying the separation (%)]
Wear Leveling and Cleaning
• High performance and even wear of the SSD are separate concerns
• Static allocation
  • Logical addresses are bound to a plane, so pages cannot migrate outside their dedicated plane (only local wear leveling)
  • Choosing an allocation that wears out each plane evenly is important
• Dynamic allocation
  • Wear leveling can be carried out in different clusters (chip, SSD)
    • A cluster is the scope within which block lifetimes are kept even
  • The larger the cluster, the more even the wear level across the SSD as a whole
  • The larger the cluster, the greater the overhead
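The static-allocation point can be checked with simple arithmetic. Assuming a static striping function that maps `lpn // stripe_unit_pages` round-robin over the planes (a hypothetical interpretation of the striping unit), the sketch counts how many page writes each plane absorbs. Because pages cannot leave their plane, these counts bound how evenly the planes can wear, however good the local wear leveler is.

```python
from collections import Counter


def writes_per_plane(workload, num_planes: int, stripe_unit_pages: int) -> list:
    """Page writes absorbed by each plane under static striping.

    Under static allocation, local wear leveling can only spread these writes
    over the blocks of the same plane.
    """
    counts = Counter((lpn // stripe_unit_pages) % num_planes for lpn in workload)
    return [counts[p] for p in range(num_planes)]


# Skewed workload: a 64-page hot region rewritten ten times.
hot_region = [lpn for _ in range(10) for lpn in range(64)]
```

With a page striping unit the hot region spreads evenly (160 writes per plane on four planes), but with a 64-page block striping unit all 640 writes land on plane 0, whose blocks wear out while the other planes stay young.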
Number of Cleaning Operations and Erase Distribution without Wear Leveling
[Figure: number of operations normalized to W-PAGE]
Wear Leveling in Different Clusters
• Wear-leveling cluster
  • Group of blocks within which the wear-leveling algorithm keeps block ages even
• The larger the cluster, the worse the performance
• The larger the cluster, the more even the block ages
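A toy model shows why larger clusters even out wear. As an assumption, each erase request names the plane whose data needs a fresh block, and the wear leveler erases the least-worn block anywhere in that plane's cluster. The cost of moving data across planes, which is the overhead the slide mentions, is not modeled. `wear_spread` is a hypothetical name.

```python
def wear_spread(erase_requests, num_planes: int, blocks_per_plane: int,
                planes_per_cluster: int) -> int:
    """Return max - min erase count over all blocks after serving the requests.

    Each request is a plane index; the erase goes to the least-worn block in
    that plane's cluster (a cluster is a run of `planes_per_cluster` planes).
    """
    wear = [[0] * blocks_per_plane for _ in range(num_planes)]
    for plane in erase_requests:
        first = (plane // planes_per_cluster) * planes_per_cluster
        candidates = [(q, b) for q in range(first, first + planes_per_cluster)
                      for b in range(blocks_per_plane)]
        q, b = min(candidates, key=lambda qb: wear[qb[0]][qb[1]])
        wear[q][b] += 1
    flat = [w for row in wear for w in row]
    return max(flat) - min(flat)


# 8 planes (two chips of 2 dies x 2 planes), 4 blocks each; all erases hit plane 0.
requests = [0] * 64
```

With all erases aimed at one plane, the spread is 16 for plane-local leveling, 4 for chip-level clusters (4 planes), and 0 for an SSD-wide cluster, in line with the slide's claim that larger clusters give more even block ages.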
Summary
• Static vs. dynamic allocation
  • Static wide striping: for workloads dominated by sequential IO
    • Page striping unit: short response time, more cleaning
    • Block striping unit: long response time, less cleaning
    • Tradeoff between response time and cleaning operations
  • Dynamic: for workloads dominated by random IO
• Hot/cold data separation
  • Effective for evenly distributed IO
• Wear-leveling cluster
  • Large cluster: large overhead, even distribution of wear
  • Small cluster: small overhead, uneven distribution of wear
  • Tradeoff between response time and even wear level
Conclusion
• Algorithms for each FTL function studied for high-performance SSDs
• Tradeoffs and simple guidelines presented for designing a customized FTL for different workloads and SSD lifetime requirements
• Please read the paper for more details
Thank you. Questions?