
A Case for Flash Memory SSD in Enterprise Database Applications

Authors: Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, Sang-Woo Kim. Published at SIGMOD 2008. Presented by Jin Xiong, 11/4/2008.


Presentation Transcript


  1. A Case for Flash Memory SSD in Enterprise Database Applications • Authors: Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, Sang-Woo Kim • Published at SIGMOD 2008 • Presented by Jin Xiong, 11/4/2008

  2. Outline • Flash memory SSD • DB storage and workload • Experimental settings • Transaction log • MVCC rollback segment • Temporary table spaces • Conclusions

  3. Flash memory SSD (1) • Flash memory SSD • NAND-type flash memory • SAMSUNG • Interface: IDE

  4. Flash memory SSD (2) • Characteristics • Uniform random access speed • Purely electronic device, no mechanically moving parts • Access latency is almost linearly proportional to the amount of data, irrespective of its physical location in flash memory • One of the key characteristics we can take advantage of • Erase before overwriting • Data on SSD cannot be updated in place • Erase unit is much larger than a sector, 128KB vs. 1KB • Erase is time-consuming, typically 1-2 ms • Asymmetry of read and write speed • Writes are about twice as slow as reads on SSD: 0.4 ms vs. 0.2 ms in this paper

  5. Flash memory SSD (3) • Hardware logic • Dual channel architecture, 4-way interleaving • Hide flash programming latency and increase bandwidth • 128KB SRAM for program code, data and buffer memory

  6. Flash memory SSD (4) • Firmware: Flash translation layer (FTL) • Address mapping and wear leveling • Addresses the limited number of write cycles of each sector • Based on super-blocks: 1MB, 8 erase units, 2 on each flash chip • Limits the amount of information required for mapping • Trends • Two-fold annual increase in density • Originally used in mobile computing devices • PDAs, MP3 players, mobile phones, digital cameras • Recently used more and more in portable computers and the enterprise server market • Tremendous potential as a new storage medium that can replace magnetic disk and achieve much higher performance for enterprise database servers
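
The address mapping described on this slide, together with the erase-before-overwrite constraint from slide 4, can be illustrated with a toy model. This is only a sketch under stated assumptions: it maps at page granularity rather than the 1MB super-blocks the slide mentions, it omits garbage collection and wear leveling, and the names `ToyFTL`, `write`, and `read` are hypothetical, not taken from the paper or from any real firmware.

```python
class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus a logical-to-physical map.

    Hypothetical model for illustration only; real firmware also performs
    garbage collection and wear leveling.
    """

    def __init__(self, num_pages):
        self.flash = [None] * num_pages  # physical pages; None means erased (clean)
        self.mapping = {}                # logical page number -> physical page number
        self.next_free = 0               # index of the next clean physical page
        self.stale = set()               # physical pages that hold obsolete data

    def write(self, logical, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no clean pages left; a real FTL would erase and reclaim here")
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)          # the old copy cannot be overwritten in place
        self.flash[self.next_free] = data
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical):
        return self.flash[self.mapping[logical]]


ftl = ToyFTL(num_pages=8)
ftl.write(0, "v1")
ftl.write(0, "v2")        # the update lands on a new physical page
print(ftl.read(0))        # v2
print(sorted(ftl.stale))  # [0]
```

As long as clean pages remain, an update costs one page write; the expensive erase is deferred until stale pages have to be reclaimed.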

  7. DB Storage • Data structures in DB systems • Database tables and indexes • Not within the scope of this paper • Transaction log • Whenever a transaction updates a data object, a log record is created • Must be kept on stable storage for recoverability and durability • Temporary tables • Used to store temporary data required for operations such as sorts or joins • Rollback segments • Used in multiversion concurrency control (MVCC)

  8. DB Workload • Typical transactional database workloads, e.g. TPC-C • Little locality and sequentiality • Many synchronous writes • Forced writes of log records at commit time • Must wait until data are written to disk • Prefetching and write buffering are less effective • Performance is limited by disk latency rather than disk bandwidth and capacity • The latency-bandwidth imbalance of disk is expected to become more serious in the future • Low latency of SSD • Improves performance significantly

  9. Experimental Settings • Two machines with identical hardware except for the disk • 1.86 GHz Intel Pentium dual-core processor • 2GB RAM • OS: Linux 2.6.22 • Disk • SSD: Samsung Standard Type, 32GB, PATA (IDE), SLC NAND • HDD: Seagate Barracuda, 250GB, 7200 rpm, SATA • DB • A commercial database server • Used HDD/SSD as a raw device (not through a file system) • Database tables were cached in memory

  10. Transaction log • Synchronous writes • When a transaction commits, it appends a commit-type log record to the log and force-writes the log tail to stable storage • Response time • T_response = T_cpu + T_read + T_write + T_commit • T_commit is a significant overhead: the transaction waits for disk I/O • Commit-time delay is a serious bottleneck • Append-only sequential writes • HDD: no seek delay, but an average rotational latency of 4.17 ms (7200 rpm) • SSD: no expensive merge or erase operations as long as clean blocks are available
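
The 4.17 ms figure on this slide is the time for half a revolution at 7200 rpm, and the response-time formula is a plain sum of its components. The arithmetic can be checked with a short sketch; the function names below are illustrative and do not come from the paper.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def response_time_ms(t_cpu, t_read, t_write, t_commit):
    """T_response = T_cpu + T_read + T_write + T_commit (all in milliseconds)."""
    return t_cpu + t_read + t_write + t_commit


hdd_latency = avg_rotational_latency_ms(7200)
print(round(hdd_latency, 2))  # 4.17
```

With tables cached in memory, T_read and T_write are close to zero, so T_commit dominates the response time of a short transaction.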

  11. Transaction log • Simple SQL transactions • Multi-threaded concurrent transactions • TPS on SSD is much higher (12x-4x) than on HDD • The gap shrinks as the number of concurrent transactions increases • HDD: disk access latency is the bottleneck, low CPU utilization • SSD • Limited by CPU rather than I/O • CPU utilization is saturated, so TPS stops increasing

  12. Transaction log • TPC-B benchmark performance • A stress test: transaction commit rate is higher than that of TPC-C • Suitable for testing log storage: a large number of small transactions cause significant forced-write activity • The number of concurrent users: 20 • TPS on SSD is 3.5x that on HDD • Considerably lower log write latency on SSD • CPU is the bottleneck for SSD

  13. Transaction log • I/O-bound vs CPU-bound • SSD: faster CPU improves TPS • Dual-core: saturated at about 3000 TPS • Quad-core: saturated at about 4300 TPS • HDD: almost no difference
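
One simple way to read this slide is as a bottleneck model: throughput saturates at whichever resource limit is lower. The sketch below assumes that model; the CPU caps are the saturation points quoted on the slide, while the two I/O caps are hypothetical placeholders chosen only to show the shape of the effect, not measurements from the paper.

```python
def saturated_tps(cpu_cap_tps, io_cap_tps):
    """Bottleneck model: throughput is capped by the slower of CPU and I/O."""
    return min(cpu_cap_tps, io_cap_tps)


DUAL_CORE_CAP = 3000   # saturation point from the slide
QUAD_CORE_CAP = 4300   # saturation point from the slide
HDD_IO_CAP = 800       # hypothetical placeholder, not from the paper
SSD_IO_CAP = 10_000    # hypothetical placeholder, not from the paper

for name, io_cap in (("HDD", HDD_IO_CAP), ("SSD", SSD_IO_CAP)):
    dual = saturated_tps(DUAL_CORE_CAP, io_cap)
    quad = saturated_tps(QUAD_CORE_CAP, io_cap)
    print(name, dual, quad)
```

Under these assumptions a faster CPU changes nothing while I/O is the lower cap (the HDD case), and raises throughput once I/O is no longer the limit (the SSD case).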

  14. MVCC rollback segment • MVCC: multiversion concurrency control • An alternative to the traditional lock-based concurrency control mechanism • When updating a data object, its before-image is written to a rollback segment, then the new data is applied to it • When reading a data object, the correct version is searched for in the rollback segment • Two advantages • Minimizes the performance penalty on concurrent updates, because read consistency is supported without any locks • Supports snapshot isolation and time-travel queries • Cost • Costly read operations: a long list of versions must be searched if a data object has been updated many times
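
The update and read paths described on this slide can be sketched with an in-memory version chain. This is a minimal illustration, assuming integer commit timestamps and a per-object list standing in for the rollback segment; the class and method names are hypothetical and do not reflect how the commercial database server in the paper implements MVCC.

```python
class VersionedObject:
    """One data object with its before-images kept newest-first."""

    def __init__(self, value, commit_ts):
        self.current = (commit_ts, value)
        self.rollback = []  # stands in for the rollback segment

    def update(self, value, commit_ts):
        # Write the before-image first, then apply the new data.
        self.rollback.insert(0, self.current)
        self.current = (commit_ts, value)

    def read(self, snapshot_ts):
        # Readers take no lock; they pick the newest version visible to their snapshot.
        if self.current[0] <= snapshot_ts:
            return self.current[1]
        for ts, value in self.rollback:
            if ts <= snapshot_ts:
                return value
        raise LookupError("no version visible at this snapshot")


obj = VersionedObject("a", commit_ts=10)
obj.update("b", commit_ts=20)
obj.update("c", commit_ts=30)
print(obj.read(25))  # b
print(obj.read(35))  # c
```

The loop in `read` is the cost the slide points out: the more often an object is updated, the longer the chain an old snapshot has to walk.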

  15. MVCC rollback segment • Write pattern 1 • Append-only, sequential writes • Multiple streams in parallel • 1MB extents • Write pattern 2 • In-place writes to a small logical region • HDD is expected to perform poorly • Disk arm moves every 1MB • Excessive disk seeks • SSD is expected to perform well • No additional cost when clean blocks are available • Reclamation cost can be amortized • Infrequent, once per 1MB extent • Measured performance difference is slight • SSD: avg 6.8 ms/block • HDD: avg 7.1 ms/block

  16. MVCC rollback segment • Read pattern • Clustered, randomly scattered across quite a large logical address space (1GB) • Performance • SSD: 16x faster than HDD

  17. Temporary table spaces • External sort • Typical algorithm • Partitions an input data set into smaller chunks • Sorts the chunks separately • Merges them into a single sorted file • I/O pattern • Sequential write followed by random read • Performance • Sequential write: small difference • Random read: SSD almost 10 times faster
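
The three steps of the typical algorithm on this slide can be sketched as follows. This is a minimal illustration, assuming integer input, one text file per sorted run, and a hypothetical function name `external_sort`; it is not the sort implementation of the database server used in the paper.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Sort a list of integers via sorted runs on disk followed by a k-way merge."""
    run_paths = []
    # Stage 1: partition into chunks, sort each, write each run sequentially.
    for start in range(0, len(values), chunk_size):
        chunk = sorted(values[start:start + chunk_size])
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)
    # Stage 2: merge all runs; reads alternate between the run files.
    files = [open(p) for p in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        merged = list(heapq.merge(*runs))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return merged


print(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))  # [1, 2, 3, 5, 7, 8, 9]
```

Stage 1 writes each run front to back, while stage 2 jumps between runs as the merge consumes them, which corresponds to the sequential-write-then-random-read pattern on the slide.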

  18. Temporary table spaces • External sort • Effect of cluster size on sort performance • HDD: sort performance improves with larger cluster size • SSD: sort performance deteriorates with larger cluster size • Reasons: • A larger cluster is good for the first stage, but not for merging • The second stage dominates the performance • Effect of buffer cache size on sort performance • Performance improves with larger buffer size in both cases

  19. Temporary table spaces • Hash join • Similarity with the sort algorithm • Partitions the input data set into smaller chunks and processes each chunk separately • Opposite I/O pattern • Random writes followed by sequential reads • Performance • SSD is expected to perform poorly in the first stage • Actual result is unexpected: the first stage turns out to be sequential append-only writes • SSD is 3 times faster than HDD
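
The partition-then-process structure of the hash join on this slide can be sketched as below. This is a simplified illustration, assuming in-memory lists in place of on-disk partition files and a hypothetical function name `partitioned_hash_join`; it is not the join implementation measured in the paper.

```python
from collections import defaultdict


def partitioned_hash_join(left, right, num_partitions):
    """Equi-join two lists of (key, payload) pairs by hash partitioning."""
    # Stage 1: scatter both inputs into partitions by hash of the join key.
    # Each partition is only ever appended to.
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    for key, payload in left:
        left_parts[hash(key) % num_partitions].append((key, payload))
    for key, payload in right:
        right_parts[hash(key) % num_partitions].append((key, payload))

    # Stage 2: read each partition pair front to back; build, then probe.
    result = []
    for lpart, rpart in zip(left_parts, right_parts):
        table = defaultdict(list)
        for key, payload in lpart:
            table[key].append(payload)
        for key, payload in rpart:
            for lpayload in table.get(key, []):
                result.append((key, lpayload, payload))
    return result


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(partitioned_hash_join(left, right, num_partitions=2)))
```

Because every partition is written by appending, the first stage can look like several append-only streams to the storage device, which is consistent with the slide's note about the unexpected result.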

  20. Temporary table spaces • Sort-merge join • SSD is 7 times faster • HDD: sort-merge join is two times slower than hash join • SSD: sort-merge join is as fast as hash join

  21. Conclusions • Demonstrated that processing I/O requests for transaction log, rollback and temporary data can become a serious bottleneck for transaction processing • Showed that flash memory SSD can alleviate this bottleneck drastically • Due attention should be paid to SSD in all aspects of DB system design to maximize the benefit from this new technology
