BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage

Original work of Hyojun Kim and Seongjun Ahn, Software Laboratory of Samsung Electronics, Korea
Presented at: FAST '08, March 2008
Neha Sahay and Sreeram Potluri

Flash!! Flash!!
• The speed of traditional hard disks is bounded by the speed of their mechanical parts.
• The decreasing cost of flash (by 50% per year) presents us with an alternative.
• Advantages of flash:
  • High random read performance
  • Very low power consumption
  • Smaller and portable
  • Shock resistance
  • Robust
• Disadvantages of flash:
  • Very poor random write performance
  • Limited lifetime (100,000 erases for SLC NAND and 10,000 for MLC NAND)

Outline
• Characteristics of Flash
• Flash Translation Layer
• Existing Techniques and Related Work
• BPLRU
• Implementation Details
• Evaluation
• Conclusion

Characteristics of Flash
• Flash is organized into planes, blocks and pages.
• A page must be erased before it is programmed. Random in-place rewrites are not allowed.
• Reads and writes work on pages, but erases work on whole blocks.
• Effectively, pages are written sequentially within a block.
• The erase operation takes much longer than a read or a write.
• Requires wear-leveling.
• An FTL masks these properties and emulates a normal hard disk.
Flash memory has poor performance for random writes, while it has good read and sequential write performance.
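The erase-before-program constraint can be illustrated with a toy model. This is only a sketch: the class name `NandBlock`, the 4-page block size, and the strictly sequential programming rule are simplifying assumptions chosen for illustration, not details taken from the slides.

```python
class NandBlock:
    """Toy model of one erasable NAND block (4 pages assumed for brevity)."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block  # None means "erased"
        self.next_page = 0                     # next programmable page index
        self.erase_count = 0

    def program(self, page_index, data):
        # A page can be programmed only once after an erase, and this model
        # assumes pages are programmed in ascending order within the block.
        if page_index != self.next_page:
            raise ValueError("pages must be programmed sequentially after an erase")
        self.pages[page_index] = data
        self.next_page += 1

    def erase(self):
        # Erase works on the whole block and is what wears the flash out.
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1


block = NandBlock()
block.program(0, "A")
block.program(1, "B")
try:
    block.program(0, "A2")   # in-place rewrite is rejected
    rewrite_allowed = True
except ValueError:
    rewrite_allowed = False
block.erase()                # the whole block must be erased first
block.program(0, "A2")
```

Updating a single page costs an erase of the whole block in this model, which is why small random writes are so expensive without help from an FTL or a write buffer.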

Flash Translation Layer
• Emulates a hard disk and provides logical sector updates.
• Types:
  • Page Mapping
    • Maintains mapping information at the page level.
    • Requires a large amount of memory for mapping information.
  • Block Mapping
    • Maintains mapping information at the block level.
    • A page update requires a whole block update.
  • Hybrid Mapping
    • Maintains block-level mapping, but the page position is not fixed inside a block.
    • Requires additional offset-level information.
  • Other Mapping Techniques
    • Exploit write locality using some reserved locations.
    • Fine-grained algorithms can be applied to these reserved locations, with simple block mapping used for the rest.
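The memory trade-off between page mapping and block mapping can be made concrete with a little arithmetic. The figures below (1 GiB of flash, 2 KiB pages, 64 pages per block, 4-byte table entries) and the helper names are assumptions chosen for illustration, not numbers from the slides.

```python
KIB = 1024

def mapping_table_bytes(capacity_bytes, unit_bytes, entry_bytes=4):
    """Size of a flat mapping table with one entry per mapping unit."""
    return (capacity_bytes // unit_bytes) * entry_bytes

def block_map_lookup(logical_page, pages_per_block=64):
    """Block mapping keeps only the block number; the page offset is fixed."""
    return divmod(logical_page, pages_per_block)  # (logical block, offset)

capacity = 1024 * 1024 * KIB   # 1 GiB (assumed)
page_size = 2 * KIB            # assumed
block_size = 64 * page_size    # 128 KiB (assumed)

page_map = mapping_table_bytes(capacity, page_size)    # one entry per page
block_map = mapping_table_bytes(capacity, block_size)  # one entry per block

print(page_map // KIB, "KiB for page mapping")    # 2048 KiB
print(block_map // KIB, "KiB for block mapping")  # 32 KiB
```

Under these assumptions the block-level table is 64 times smaller, at the price that updating one page means rewriting its whole block.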

Flash Translation Layer
• Log-Block FTL
  • Writes go to a log block that uses a fine-grained mapping policy.
  • Once the log block is full, it is merged with the older data block and the result is written to a new block.
  • The older data block and the log block then become free blocks.
  • Two kinds of merge: full merge and switch merge.
[Figure: a data block and a log block, with pages marked valid or invalid, merged into a new block.]
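The difference between a full merge and a switch merge can be sketched as follows. This is a simplified model under stated assumptions: 4 pages per block, a log block represented as a list of `(offset, data)` writes in arrival order, and a hypothetical `merge` helper that only counts page copies.

```python
PAGES_PER_BLOCK = 4  # assumed for illustration

def merge(data_block, log_writes):
    """Merge a log block into its data block.

    data_block: list of page contents in the old data block.
    log_writes: list of (offset, data) pairs in the order they were logged.
    Returns (new_block, kind, pages_copied).
    """
    offsets = [off for off, _ in log_writes]
    if offsets == list(range(PAGES_PER_BLOCK)):
        # Switch merge: the log block already holds the whole block in
        # order, so it simply becomes the new data block. No page copies.
        return [data for _, data in log_writes], "switch", 0
    # Full merge: copy the newest version of every page into a new block.
    new_block = list(data_block)
    for off, data in log_writes:
        new_block[off] = data  # later log entries override earlier ones
    return new_block, "full", PAGES_PER_BLOCK


old = ["a", "b", "c", "d"]
full = merge(old, [(2, "C"), (0, "A")])
switch = merge(old, [(0, "A"), (1, "B"), (2, "C"), (3, "D")])
```

In this model the switch merge copies nothing, while the full merge copies every page of the block before the two old blocks can be erased.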

Flash Aware Caches
• Use a RAM buffer inside SSDs.
• Clean First LRU (CFLRU)
  • Chooses a clean page as the victim rather than a dirty page.
• Flash Aware Buffer Policy (FAB)
  • Buffers that belong to the same erasable block are grouped together.
  • The block with the maximum number of buffers is evicted.
  • Works well for sequential writes. More effective than LRU.
• Related Work: DULO, proposed by Jiang et al.
  • Exploits both temporal and spatial locality.
  • Dual locality caching.
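The two victim-selection rules above can be sketched in a few lines. Both functions are simplified illustrations with hypothetical names: `cflru_victim` searches the whole LRU list for a clean page (the published CFLRU restricts this search to a window near the LRU end), and the tie-break in `fab_victim` is an arbitrary choice made here.

```python
from collections import Counter

def cflru_victim(lru_order):
    """lru_order: list of (page, is_dirty) pairs from LRU to MRU.
    Prefer the least recently used clean page; fall back to plain LRU."""
    for page, is_dirty in lru_order:
        if not is_dirty:
            return page
    return lru_order[0][0]

def fab_victim(buffered_pages, pages_per_block=4):
    """buffered_pages: logical page numbers currently in the buffer.
    Return the block number that holds the most buffered pages
    (ties go to the lowest block number, an arbitrary choice here)."""
    counts = Counter(page // pages_per_block for page in buffered_pages)
    return min(counts, key=lambda blk: (-counts[blk], blk))


clean_first = cflru_victim([(7, True), (3, False), (8, True)])
all_dirty = cflru_victim([(7, True), (8, True)])
biggest_block = fab_victim([0, 1, 2, 5, 9, 10])
```

Here page 3 is chosen over the older dirty page 7, and block 0 is chosen because it holds three buffered pages.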

BPLRU – Block Padding LRU
• Applied to the write buffer inside SSDs.
• Reads are simply redirected to the FTL.
• Converts random writes to sequential writes.
• Three-pronged approach:
  • Block-level LRU
  • Page Padding
  • LRU Compensation

Block-Level LRU
• RAM buffers are grouped in blocks that have the same size as the erasable block in NAND.
• All pages in the same erasable block range are grouped into one buffer block.
• The least recently used block, instead of a page, is selected as the victim.
[Figure: buffer blocks ordered from the MRU block to the LRU block when page 12 is referenced.]

Block-Level LRU
• Example write sequence: 0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 2, 6, 10.
• 2 log blocks and 2 pages can reside in the write buffer.
• 12 merges occur in the FTL, but only 7 merges with block-level LRU.
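A minimal sketch of block-level LRU, replaying the write sequence from the example. The class name `BlockLevelLRU`, the 4-page block size, and the 8-page buffer capacity are assumptions made for illustration. They are not the slide's configuration, so the flush count below is not meant to reproduce the 12 versus 7 merge figures.

```python
from collections import OrderedDict

class BlockLevelLRU:
    def __init__(self, capacity_pages, pages_per_block):
        self.capacity = capacity_pages
        self.ppb = pages_per_block
        self.blocks = OrderedDict()  # block -> set of offsets; first = LRU
        self.used = 0
        self.flushed = []            # (block, sorted offsets) in flush order

    def write(self, page):
        blk, off = divmod(page, self.ppb)
        if blk in self.blocks:
            self.blocks.move_to_end(blk)   # the whole block becomes MRU
            if off in self.blocks[blk]:
                return                     # page overwritten inside the buffer
        if self.used == self.capacity:
            # Evict every buffered page of the least recently used block.
            victim, offsets = self.blocks.popitem(last=False)
            self.used -= len(offsets)
            self.flushed.append((victim, sorted(offsets)))
        self.blocks.setdefault(blk, set()).add(off)
        self.used += 1


cache = BlockLevelLRU(capacity_pages=8, pages_per_block=4)
for page in [0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 2, 6, 10]:
    cache.write(page)

print(cache.flushed)        # [(4, [0]), (0, [0, 1]), (2, [0, 1])]
print(list(cache.blocks))   # [3, 4, 0, 1, 2], ordered from LRU to MRU
```

Because a write to any page refreshes its whole block, pages of the same erasable block leave the buffer together and reach the FTL as one group instead of as scattered single-page writes.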

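Page Padding
• Replaces an expensive full merge with a cheaper switch merge.
• The pages missing from the victim block are read from flash, so that the whole block can be written sequentially.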
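Page padding can be sketched as filling the holes of a victim block before it is flushed. The function name `pad_block`, the `read_page_from_flash` callback, and the 4-page block size are hypothetical and used only for illustration.

```python
def pad_block(buffered, read_page_from_flash, pages_per_block=4):
    """Build a complete, in-order block image for a victim block.

    buffered: dict mapping page offset -> buffered data.
    read_page_from_flash: callable(offset) -> data currently stored in flash.
    Returns (pages, padded_reads).
    """
    pages = []
    padded_reads = 0
    for off in range(pages_per_block):
        if off in buffered:
            pages.append(buffered[off])                # newest data from RAM
        else:
            pages.append(read_page_from_flash(off))    # pad from flash
            padded_reads += 1
    return pages, padded_reads


flash_copy = ["a", "b", "c", "d"]
pages, padded_reads = pad_block({1: "B", 3: "D"}, lambda off: flash_copy[off])
```

The padded block is written from the first page to the last, so the log block fills in order and the FTL can finish with a switch merge. The cost is the extra page reads and writes for the padding.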

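LRU Compensation
• Compensates for sequential writes: a block that has been written sequentially is unlikely to be rewritten soon, so it is moved to the LRU position.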
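One way to sketch LRU compensation is to track, per block, whether its pages have arrived strictly in order. The interpretation used here (a block is demoted once it has been written completely and sequentially), the class name `CompensatedLRU`, and the 4-page block size are assumptions for illustration.

```python
from collections import OrderedDict

class CompensatedLRU:
    def __init__(self, pages_per_block=4):
        self.ppb = pages_per_block
        # block -> next expected offset while still sequential, else None.
        # The first key is the LRU block, the last key is the MRU block.
        self.order = OrderedDict()

    def write(self, page):
        blk, off = divmod(page, self.ppb)
        expected = self.order.get(blk, 0)   # a new block expects offset 0
        sequential = expected is not None and off == expected
        self.order[blk] = off + 1 if sequential else None
        self.order.move_to_end(blk)         # normal LRU: block becomes MRU
        if sequential and off + 1 == self.ppb:
            # Block written fully in order: move it to the LRU position
            # so that it is the next victim.
            self.order.move_to_end(blk, last=False)


lru = CompensatedLRU()
for page in [5, 0, 1, 2, 3, 8, 10]:
    lru.write(page)

print(list(lru.order))   # [0, 1, 2]
```

Block 0 was written most recently among blocks 0 and 1, yet it sits at the LRU end because its pages 0 to 3 arrived in order. Plain block-level LRU would have kept block 1 as the next victim.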

Implementation
• Two-level indexing using two sets of nodes: Block Header Nodes and Sector Nodes.
• Each node holds two link pointers for the LRU list (nPrev, nNext), the block number (nLbn), the number of sectors in the block (nNumOfSct), and a sector buffer (aBuffer).
• For Sector Nodes, aBuffer[] contains the contents of the written sector.
• For Block Header Nodes, aBuffer[] contains a secondary index table pointing to the child nodes.
• Sector nodes can be searched faster, at the cost of memory overhead.
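The two-level index can be sketched with two small record types that reuse the field names from the slide. Everything else is an assumption for illustration: 8 sectors per block, a Python dict as the first-level lookup, and the helper names `put_sector` and `get_sector`. The LRU links are declared but left unused to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SECTORS_PER_BLOCK = 8  # assumed for illustration

@dataclass
class SectorNode:
    aBuffer: bytes                       # contents of the written sector

@dataclass
class BlockHeaderNode:
    nLbn: int                            # logical block number
    nPrev: Optional["BlockHeaderNode"] = None   # LRU list links (unused here)
    nNext: Optional["BlockHeaderNode"] = None
    nNumOfSct: int = 0                   # number of buffered sectors
    # Secondary index table: one slot per sector offset in the block.
    aBuffer: List[Optional[SectorNode]] = field(
        default_factory=lambda: [None] * SECTORS_PER_BLOCK)

def put_sector(index, lsn, data):
    lbn, off = divmod(lsn, SECTORS_PER_BLOCK)
    header = index.get(lbn)
    if header is None:
        header = index[lbn] = BlockHeaderNode(nLbn=lbn)
    if header.aBuffer[off] is None:
        header.nNumOfSct += 1            # count each sector slot only once
    header.aBuffer[off] = SectorNode(aBuffer=data)

def get_sector(index, lsn):
    lbn, off = divmod(lsn, SECTORS_PER_BLOCK)
    header = index.get(lbn)
    node = header.aBuffer[off] if header else None
    return node.aBuffer if node else None


index = {}
put_sector(index, 10, b"x")
put_sector(index, 10, b"y")   # overwrite of the same sector
put_sector(index, 11, b"z")
```

A lookup touches one block header and one table slot, which is the faster search the slide mentions. The per-block table of mostly empty slots is the memory overhead.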

Evaluation: MS Office Installation task (NTFS)
• 43% higher throughput than FAB with a 16-MB buffer.
• 41% lower erase count than FAB with a 16-MB buffer.

Evaluation: Temporary Internet files of Internet Explorer (NTFS)
• Performance is slightly worse than FAB for buffers smaller than 8 MB.
• For buffers larger than 8 MB, performance improves.
• Erase count is always lower than with FAB.

Evaluation: HDD test of PCMark 05 (NTFS)
• Performance and erase count are very similar to the previous Temporary Internet Files test.

Evaluation: Random writes by Iometer (NTFS)
• The Iometer workload has no locality.
• FAB shows better write performance, and its advantage grows with bigger buffer sizes.
• BPLRU shows better erase counts due to page padding.

Evaluation: Copying MP3 Files (FAT16)
• 90 MP3 files with an average size of 4.8 MB.
• Sequential write pattern.

Evaluation: P2P File Download, a 634-MB file (FAT16)
• A peer-to-peer program writes small parts of a file in random order, because different parts of the file are downloaded concurrently from numerous peers.
• The graph illustrates the poor performance of flash storage for random writes.
• FAB requires more RAM for better performance.
• BPLRU improves performance significantly.

Evaluation: Untar Linux Source Files
• From linux-2.6.21.tar.gz (EXT3).
• BPLRU shows 39% better throughput than FAB.

Evaluation: Kernel Compile
• With Linux-2.6.21 sources (EXT3).
• BPLRU shows 23% better performance than FAB.

Evaluation: Postmark
• Evaluates the performance of I/O subsystems.
• One of file creation, deletion, read or write is executed at random.
[Figure panels: NTFS, FAT16, EXT3.]

Evaluation: Buffer Flushing Effect
• File systems use the buffer flush command to ensure data integrity.
• Flushing reduces the effect of write buffering.
• With a 16-MB buffer, flushing reduces throughput by approximately 23%.

Conclusion
• The proposed BPLRU scheme is more effective than the two previous methods, LRU and FAB.
• Two important issues still remain:
  • When a RAM buffer is used, the integrity of the file system may be damaged by sudden power failures.
  • Frequent buffer flush commands from the host computer degrade BPLRU performance.
• Future research:
  • Hardware such as a small battery or capacitor, or non-volatile magnetoresistive RAM or ferroelectric RAM.
  • A host-side buffer cache policy similar to the one in the storage device.
  • Handling read requests, with a much bigger RAM capacity and an asymmetrically weighted buffer management policy.
