
DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings

A. Gupta, Y. Kim, B. Urgaonkar, Penn State. ASPLOS 2009. Shimin Chen, Big Data Reading Group. Introduction. Goal: improve performance of flash-based devices for workloads with random writes.


Presentation Transcript


  1. DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings • A. Gupta, Y. Kim, B. Urgaonkar, Penn State • ASPLOS 2009 • Shimin Chen, Big Data Reading Group

  2. Introduction • Goal: improve performance of flash-based devices for workloads with random writes • New proposal: DFTL (Demand-based FTL) • FTL: flash translation layer • FTL maintains a mapping table: virtual → physical address

  3. Outline • Introduction • Background on FTL • Design of DFTL • Experimental Results • Summary

  4. Basics of Flash Memory • OOB (out-of-band) area: • ECC • Logical page number • State: erased/valid/invalid

  5. Flash Translation Layer • Maintain the mapping: • Virtual address (exposed to the upper level) → physical address (on flash) • Use a small, fast SRAM to store this mapping • Hide the erase operation from the upper layers • Avoid in-place updates • Write each update to a clean page • Perform garbage collection and erasure • Note: • OOB holds the physical → virtual mapping • The FTL's virtual → physical mapping can be rebuilt (at restart)
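The out-of-place update described on slide 5 can be illustrated with a minimal Python sketch. The class `TinyFlash` and all of its field names are hypothetical, erase blocks and garbage collection are left out, and the toy simply stops when it runs out of clean pages. It is meant only to show how the SRAM mapping and the OOB reverse mapping relate, not how any real FTL is implemented.

```python
ERASED, VALID, INVALID = "erased", "valid", "invalid"


class TinyFlash:
    """Toy flash device: a flat array of pages, each with a state and an OOB entry."""

    def __init__(self, num_pages):
        self.state = [ERASED] * num_pages
        self.oob_lpn = [None] * num_pages  # physical -> virtual, kept in the OOB area
        self.data = [None] * num_pages
        self.mapping = {}                  # virtual -> physical, kept in SRAM
        self.next_free = 0                 # next clean page to program

    def write(self, lpn, payload):
        if self.next_free >= len(self.state):
            raise RuntimeError("no clean page left; garbage collection would run here")
        old = self.mapping.get(lpn)
        if old is not None:
            self.state[old] = INVALID      # the old copy is invalidated, never overwritten
        ppn = self.next_free
        self.next_free += 1
        self.state[ppn] = VALID
        self.oob_lpn[ppn] = lpn
        self.data[ppn] = payload
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.data[self.mapping[lpn]]

    def rebuild_mapping(self):
        """Recover virtual -> physical from the OOB entries of valid pages, as at restart."""
        return {self.oob_lpn[p]: p for p, s in enumerate(self.state) if s == VALID}
```

Writing the same logical page twice leaves one invalid and one valid physical page, and `rebuild_mapping()` returns the same table that the writes built up in SRAM.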

  6. Page-Level FTL • Keep a page-to-page mapping table • Pro: can map any logical page to any physical page • Efficient flash page utilization • Con: the mapping table is large • E.g., 16GB flash with 2KB flash pages requires 32MB of SRAM • As flash size increases, SRAM size must scale with it • Too expensive!
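The 32MB figure on slide 6 follows from simple arithmetic, sketched here in Python. The 4-byte entry size is an assumption (the slide does not state it); it is the value that reproduces the slide's number. The function name is hypothetical.

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3


def page_table_bytes(flash_bytes, page_bytes, entry_bytes=4):
    """SRAM needed for a page-level table: one entry per flash page.

    entry_bytes=4 is an assumed entry size, not a value given on the slide.
    """
    num_pages = flash_bytes // page_bytes
    return num_pages * entry_bytes


size = page_table_bytes(16 * GiB, 2 * KiB)
print(size // MiB, "MB")  # 32 MB
```

Doubling the flash capacity doubles the result, which is the scaling problem the slide points at.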

  7. Block-Level FTL • Keep a block-to-block mapping • Pro: the table is small • Mapping table size is reduced by a factor of (block size / page size), about 64 • Con: the page offset within a block is fixed • Garbage collection overheads grow
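A minimal sketch of block-level translation, assuming 64 pages per block (128KB block / 2KB page, the sizes given on slide 25). The function name and the `block_map` dictionary are hypothetical. The point is that only the block number is remapped, while the page offset is carried over unchanged.

```python
PAGES_PER_BLOCK = 64  # assumed: 128KB block / 2KB page


def block_level_translate(lpn, block_map):
    """Translate a logical page number using a block-to-block table.

    block_map: dict mapping logical block number -> physical block number.
    The offset inside the block is fixed, which is the slide's "con".
    """
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
    pbn = block_map[lbn]
    return pbn * PAGES_PER_BLOCK + offset


block_map = {0: 5, 1: 2}
print(block_level_translate(70, block_map))  # logical block 1, offset 6 -> 134
```

Because the offset cannot change, updating one page of a block cannot simply go to any free page, which is why garbage collection overheads grow under this scheme.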

  8. Hybrid FTLs (a generic description) • LPN: Logical Page Number • Data blocks: block-level mapping • Log/update blocks: page-level mapping

  9. Operations in Hybrid FTLs • Update on data blocks: write to log blocks • Log region is small (e.g., 3% of total flash size) • Garbage collection (GC) • When no free log blocks are available, invoke GC to merge log blocks with data blocks

  10. Full Merge Can Be Recursive and Thus Expensive • Often results from random writes

  11. Outline • Introduction • Background on FTL • Design of DFTL • Experimental Results • Summary

  12. DFTL Idea • Avoid expensive full merges entirely • Do not use log blocks at all • Idea: • Use page-level mapping • Keep the full mapping on flash to reduce SRAM use • Exploit temporal locality in workloads • Dynamically load / unload page-level mappings into SRAM

  13. DFTL Architecture • [Figure label: Global mapping table]

  14. DFTL Address Translation • Case 1: request_LPN hits in the cache mapping table (CMT) • Retrieve the mapping directly; done • [Figure label: Global mapping table]

  15. DFTL Address Translation • Case 2: a miss in the cache mapping table (CMT) • If the CMT is not full: look up the GTD (global translation directory), read the translation page, fill in the CMT entry, go to case 1 • [Figure label: Global mapping table]

  16. DFTL Address Translation • Case 3: a miss in the cache mapping table (CMT) • If the CMT is full: select a CMT entry to evict (approximately LRU), write back the entry if it is dirty, go to case 2 • [Figure label: Global mapping table]
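The three translation cases on slides 14 to 16 can be sketched in Python as follows. This is an illustrative model, not the authors' implementation: the class `DFTLTranslator`, its fields, and the constant `ENTRIES_PER_TPAGE` (512, assuming 2KB pages and 4-byte entries) are all hypothetical, exact LRU stands in for the slide's "~LRU", and a dirty write-back modifies the translation page in place instead of appending a new copy and updating the GTD.

```python
from collections import OrderedDict

ENTRIES_PER_TPAGE = 512  # assumed: 2KB translation page / 4-byte entry


class DFTLTranslator:
    def __init__(self, cmt_capacity, gtd, tpages):
        self.cmt = OrderedDict()   # LPN -> [PPN, dirty]; insertion order tracks recency
        self.capacity = cmt_capacity
        self.gtd = gtd             # translation-page number -> location on flash
        self.tpages = tpages       # location on flash -> list of PPNs
        self.tpage_reads = 0
        self.tpage_writes = 0

    def _read_tpage(self, lpn):
        self.tpage_reads += 1
        return self.tpages[self.gtd[lpn // ENTRIES_PER_TPAGE]]

    def _evict(self):
        victim_lpn, (ppn, dirty) = self.cmt.popitem(last=False)  # least recently used
        if dirty:
            page = self._read_tpage(victim_lpn)       # read-modify-write of one page
            page[victim_lpn % ENTRIES_PER_TPAGE] = ppn
            self.tpage_writes += 1                    # simplification: written in place

    def translate(self, lpn):
        if lpn in self.cmt:                           # case 1: hit
            self.cmt.move_to_end(lpn)
            return self.cmt[lpn][0]
        if len(self.cmt) >= self.capacity:            # case 3: miss and CMT full
            self._evict()
        page = self._read_tpage(lpn)                  # case 2: miss, load the mapping
        ppn = page[lpn % ENTRIES_PER_TPAGE]
        self.cmt[lpn] = [ppn, False]
        return ppn

    def update(self, lpn, new_ppn):
        """Record a new physical location after a write; the entry becomes dirty."""
        self.translate(lpn)                           # make sure the entry is cached
        self.cmt[lpn] = [new_ppn, True]
```

In this model a single `translate` call that evicts a dirty entry performs two translation page reads and one translation page write, while hits perform none.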

  17. Address Translation Cost • Worst-case cost (case 3) • 2 translation page reads • 1 translation page write • Temporal locality: • More hits, fewer misses, fewer evictions • The CMT may hold multiple mappings from a single translation page • Their updates can be batched

  18. Data Read • Address translation: LPN → PPN • Read the data page at PPN

  19. Writes • Current data block • An updated data page is appended to the current data block • Current translation block • An updated translation page is appended to the current translation block • Continue until the number of free blocks < GC_threshold

  20. Garbage Collection • Select a victim block [15] Kawaguchi et al. 1995

  21. Garbage Collection • If the selected victim block is a translation block • Copy valid pages to a free translation block • Update the GTD (global translation directory) • If the selected victim block is a data block • Copy valid pages to a free data block • Update the page-level translation for each copied data page • If the mapping is in the CMT, update the CMT entry (done) • Otherwise locate the translation page, update it, and change the GTD • Batch-update opportunities arise when multiple page-level translations are in the same translation page
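The batch-update opportunity for a victim data block can be sketched in Python. The function name, the dictionary shapes, and `ENTRIES_PER_TPAGE` (512, assuming 2KB pages and 4-byte entries) are hypothetical. The sketch covers only the bookkeeping step: it decides which mappings are fixed in the CMT and groups the rest by translation page, without modelling the flash reads and writes themselves or the dirty flag of a cached entry.

```python
from collections import defaultdict

ENTRIES_PER_TPAGE = 512  # assumed: 2KB translation page / 4-byte entry


def group_updates_by_translation_page(moved, cmt):
    """Plan mapping updates after valid pages are copied out of a victim data block.

    moved: dict LPN -> new PPN for each copied valid page.
    cmt:   dict LPN -> PPN of cached mappings (updated in place when present).
    Returns {translation page number: {offset in page: new PPN}} for uncached
    mappings, so each translation page is rewritten once per batch.
    """
    batches = defaultdict(dict)
    for lpn, new_ppn in moved.items():
        if lpn in cmt:
            cmt[lpn] = new_ppn  # cached: updating the CMT entry is enough
        else:
            batches[lpn // ENTRIES_PER_TPAGE][lpn % ENTRIES_PER_TPAGE] = new_ppn
    return dict(batches)


cmt = {3: 9}
plan = group_updates_by_translation_page({1: 50, 2: 51, 600: 52, 3: 53}, cmt)
print(len(plan))  # 2 translation pages to rewrite for 3 uncached mappings
```

When the valid pages of a victim block have scattered LPNs, each one tends to land in a different batch, so fewer updates can be combined.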

  22. Benefits • Page-level mapping: • No expensive full merge operations • Better random write performance as a result • But random writes are still worse than sequential writes • More CMT misses, more translation page writes • Data pages in a block are more scattered • GC costs are higher: fewer opportunities for batch updates

  23. Outline • Introduction • Background on FTL • Design of DFTL • Experimental Results • Summary

  24. FTL Schemes Implemented • FlashSim simulator • The authors enhanced DiskSim • Block-based FTL • A state-of-the-art hybrid FTL (FAST FTL) • DFTL • An idealized page-based FTL

  25. Experimental Setup • Model 32GB flash memory, 2KB page, 128KB block • Timing is displayed in Table 1

  26. Traces Used in Experiments

  27. Block Erases • Baseline: idealized page-level FTL

  28. Extra Read/Write Operations • 63% CMT hits for the financial trace

  29. Response Times (from tech report)

  30. CDF

  31. CDF • Address translation overhead shows up

  32. CDF • FAST has a long tail

  33. Figure 10. Microscopic analysis

  34. Summary • Demand-based page-level FTL • Two-level page table: • (Flash) Translation page: LPN to PPN entries • (SRAM) Global translation directory: translation page entries • Mapping cache in SRAM
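To make the two-level structure in the summary concrete, the following Python sketch estimates how many translation pages live on flash and how large the SRAM-resident GTD is. The function name is hypothetical, and the 4-byte size for both mapping entries and GTD entries is an assumption; the 32GB capacity and 2KB page size come from slide 25.

```python
KiB = 1024
GiB = 1024 ** 3


def dftl_directory_size(flash_bytes, page_bytes, entry_bytes=4):
    """Return (number of translation pages, GTD size in bytes).

    entry_bytes=4 is an assumed size for both LPN-to-PPN entries and GTD entries.
    """
    data_pages = flash_bytes // page_bytes
    entries_per_tpage = page_bytes // entry_bytes
    translation_pages = -(-data_pages // entries_per_tpage)  # ceiling division
    gtd_bytes = translation_pages * entry_bytes
    return translation_pages, gtd_bytes


tpages, gtd_bytes = dftl_directory_size(32 * GiB, 2 * KiB)
print(tpages, gtd_bytes // KiB)  # 32768 translation pages, 128 KB of GTD
```

Under these assumptions the directory needs about 128KB of SRAM, compared with 64MB for a full page-level table of the same device; the remaining SRAM budget goes to the mapping cache.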
