Turbocharging the DBMS Buffer Pool using an SSD Jaeyoung Do, Donghui Zhang, Jignesh M. Patel, David J. DeWitt, Jeffrey F. Naughton, Alan Halverson
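Memory Hierarchy • For over three decades: Cache, DRAM, Disk (HDD). • Now: a disruptive change… the SSD enters the hierarchy between DRAM and the HDD. • [Figure: the memory hierarchy Cache / DRAM / Disk, with the SSD (fast random I/Os, but expensive) inserted between DRAM and the HDD.] • SSD wisdom: • Store hot data. • Store data with random-I/O access.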
Take Home Message • Use an SSD to extend the Buffer Pool. • Implemented in Microsoft SQL Server 2008 R2. • Evaluated with TPC-C, TPC-E, and TPC-H. • Up to 9X speedup.
Prior Art • [Holloway09] A. L. Holloway. Chapter 4: Extending the Buffer Pool with a Solid State Disk. In Adapting Database Storage for New Hardware, Ph.D. thesis, UW-Madison, 2009. • [KV09] I. Koltsidas and S. D. Viglas. The Case for Flash-Aware Multi-Level Caching. Technical Report, University of Edinburgh, 2009. • [KVSZ10] B. M. Khessib, K. Vaid, S. Sankar, and C. Zhang. Using Solid State Drives as a Mid-Tier Cache in Enterprise Database OLTP Applications. In TPCTC'10. • [CMB+10] M. Canim, G. A. Mihaila, B. Bhattacharjee, K. A. Ross, and C. A. Lang. SSD Bufferpool Extensions for Database Systems. In VLDB'10. • State of the art: Temperature-Aware Caching (TAC) [CMB+10].
Research Issues • Page flow • SSD admission policy • SSD replacement policy • Implications for checkpointing
Implemented Designs • Temperature-Aware Caching (TAC) • Dual-Write (DW) • Lazy-Cleaning (LC)
Page Flow • TAC writes a clean page to the SSD right after reading it from the disk. • [Figure: pages moving among the Disk, the SSD, and the buffer pool (BP) for the operation sequence read, evict, read, modify, evict; C = clean page.]
Page Flow • DW and LC write a clean page to the SSD when it is evicted from the BP. • [Figure: pages moving among the Disk, the SSD, and the BP for the operation sequence read, evict, read, modify, evict; C = clean page.]
Page Flow • Reading a page from the SSD works the same way in all three designs. • [Figure: pages moving among the Disk, the SSD, and the BP for the operation sequence read, evict, read, modify, evict; C = clean page.]
Page Flow • When a page is dirtied, TAC does not reclaim its SSD frame; the SSD copy is only marked invalid. • [Figure: pages moving among the Disk, the SSD, and the BP for the operation sequence read, evict, read, modify, evict; D = dirty page, I = invalid SSD copy, C = clean page.]
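Page Flow • Upon evicting a dirty page: • TAC and DW are write-through: the evicted page goes to the disk right away; • LC is write-back: the evicted page goes to the SSD and is copied to the disk later by lazy cleaning. • [Figure: pages moving among the Disk, the SSD, and the BP for the operation sequence read, evict, read, modify, evict; D = dirty page, I = invalid SSD copy, C = clean page.]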
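The page flow on the five Page Flow slides can be sketched as a toy Python model. This is a hypothetical illustration, not SQL Server code: the class name `PageFlow` and the state strings are invented here, the rule that DW/LC reclaim the SSD frame when a page is dirtied is inferred from the contrast drawn on the TAC slide, and SSD admission and replacement are ignored (every page is assumed to fit).

```python
class PageFlow:
    """Toy model of one buffer pool (BP) and one SSD under TAC, DW, or LC.

    Hypothetical sketch: admission and replacement policies are ignored.
    """

    def __init__(self, design: str) -> None:
        if design not in ("TAC", "DW", "LC"):
            raise ValueError(f"unknown design: {design}")
        self.design = design
        self.bp: dict[int, str] = {}   # page id -> "clean" | "dirty" (vs. the copy it was read from)
        self.ssd: dict[int, str] = {}  # page id -> "clean" | "dirty" | "invalid"
        self.disk_reads = 0
        self.disk_writes = 0
        self.ssd_reads = 0

    def read(self, pid: int) -> None:
        if pid in self.bp:
            return
        if self.ssd.get(pid) in ("clean", "dirty"):
            # Reading from the SSD is the same for all three designs.
            self.ssd_reads += 1
        else:
            self.disk_reads += 1
            if self.design == "TAC":
                # TAC writes the clean page to the SSD right after the disk read.
                self.ssd[pid] = "clean"
        self.bp[pid] = "clean"

    def modify(self, pid: int) -> None:
        self.bp[pid] = "dirty"
        if pid in self.ssd:
            if self.design == "TAC":
                # TAC keeps the SSD frame; its copy is only marked invalid.
                self.ssd[pid] = "invalid"
            else:
                # DW/LC (assumed): the stale SSD copy's frame is reclaimed.
                del self.ssd[pid]

    def evict(self, pid: int) -> None:
        state = self.bp.pop(pid)
        if state == "clean":
            if self.design != "TAC" and pid not in self.ssd:
                # DW/LC write a clean page to the SSD upon eviction.
                self.ssd[pid] = "clean"
        elif self.design in ("TAC", "DW"):
            # Write-through: the dirty page goes to disk now; the SSD copy is clean.
            self.disk_writes += 1
            self.ssd[pid] = "clean"
        else:
            # LC is write-back: the page goes only to the SSD and stays dirty there.
            self.ssd[pid] = "dirty"

    def lazy_clean(self) -> None:
        """LC's background cleaner: copy dirty SSD pages to disk."""
        for pid, state in self.ssd.items():
            if state == "dirty":
                self.disk_writes += 1
                self.ssd[pid] = "clean"  # changing values (not keys) while iterating is safe


if __name__ == "__main__":
    for design in ("TAC", "DW", "LC"):
        sim = PageFlow(design)
        for op in ("read", "evict", "read", "modify", "evict"):
            getattr(sim, op)(1)
        print(design, sim.ssd, "disk writes:", sim.disk_writes)
```

For the slides' read, evict, read, modify, evict sequence, TAC and DW end with a clean SSD copy and one disk write, while LC ends with a dirty SSD copy and no disk write until `lazy_clean()` runs. That deferred write is the write-back behavior the last Page Flow slide describes.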
SSD Admission/Replacement Policies • TAC • Admission: if the page is warmer than the coldest SSD page. • Replacement: the coldest page. • DW/LC • Admission: if the page was loaded from disk using a random I/O. • Replacement: LRU-2.
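A minimal Python sketch of the DW/LC policies, plus TAC's admission test, may make the rules concrete. The names `LRU2SSDCache`, `offer`, and `tac_admit` are hypothetical. LRU-2 is written in its textbook form (evict the page whose second-most-recent reference is oldest; pages referenced only once go first), which may differ in detail from the SQL Server prototype, and the "SSD not yet full" shortcut in `tac_admit` is an assumption beyond what the slide states.

```python
from collections import deque


class LRU2SSDCache:
    """DW/LC-style SSD cache: random-I/O admission, LRU-2 replacement (sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.clock = 0
        self.history: dict[int, deque] = {}  # page id -> last two reference times
        self.cached: set[int] = set()

    def reference(self, pid: int) -> None:
        """Record a reference to a page (called on every access)."""
        self.clock += 1
        self.history.setdefault(pid, deque(maxlen=2)).append(self.clock)

    def _victim(self) -> int:
        def key(pid: int) -> tuple[float, float]:
            h = self.history.get(pid, deque())
            # Second-most-recent reference; -inf if referenced fewer than twice.
            penultimate = h[0] if len(h) == 2 else float("-inf")
            last = h[-1] if h else float("-inf")  # tie-break: least recent last use
            return (penultimate, last)

        return min(self.cached, key=key)

    def offer(self, pid: int, loaded_by_random_io: bool) -> bool:
        """Offer a page evicted from the BP; return True if it is in the SSD afterwards."""
        if not loaded_by_random_io:
            return False  # sequentially loaded pages are not admitted
        if pid not in self.cached:
            if len(self.cached) >= self.capacity:
                self.cached.discard(self._victim())
            self.cached.add(pid)
        return True


def tac_admit(page_temp: float, ssd_temps: dict[int, float], capacity: int) -> bool:
    """TAC admission: warmer than the coldest SSD page (free space case assumed)."""
    if len(ssd_temps) < capacity:
        return True
    return page_temp > min(ssd_temps.values())
```

The random-I/O test in `offer` reflects the "SSD wisdom" slide: pages fetched by sequential I/O gain little from the SSD, so DW/LC keep them out.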
Implications for Checkpointing • TAC/DW • No change is needed, because every page in the SSD is clean. • LC • Checkpointing must change to handle the dirty pages in the SSD.
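One way to picture the LC change is a checkpoint routine that must also account for dirty SSD pages. The Python sketch below assumes the simplest option, writing those pages to disk during the checkpoint; the function name, the dict-based page tables, and the handling of a page that is dirty in both places are hypothetical and not taken from SQL Server.

```python
def checkpoint(design: str, bp: dict[int, str], ssd: dict[int, str]) -> list[int]:
    """Flush dirty pages and return the ids of pages written to disk (sketch).

    bp and ssd map page id -> "clean" or "dirty"; both are updated in place.
    """
    # Every design: dirty buffer-pool pages are written to disk, as without an SSD.
    flushed = {pid for pid, state in bp.items() if state == "dirty"}
    for pid in flushed:
        bp[pid] = "clean"

    if design == "LC":
        # LC only: the SSD may hold the newest copy of a page, so the checkpoint
        # must handle dirty SSD pages too (assumed here: write them to disk).
        for pid, state in list(ssd.items()):
            if state != "dirty":
                continue
            if pid in flushed:
                # The buffer-pool copy was newer and is already on disk,
                # so this SSD copy is stale; drop it.
                del ssd[pid]
            else:
                flushed.add(pid)
                ssd[pid] = "clean"
    # TAC/DW: every SSD page is already clean, so there is nothing more to do.
    return sorted(flushed)
```

Other designs are possible (for example, recording dirty SSD pages in the checkpoint instead of flushing them); the sketch only shows why TAC/DW can skip the SSD entirely while LC cannot.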
TPC-C • [Chart: speedup relative to noSSD.] • LC is 9X better than noSSD, or 5X better than DW/TAC. • Q: Why is LC so good? A: Because TPC-C is update-intensive, and in LC the dirty pages in the SSD are frequently re-referenced: 83% of the SSD references are to dirty SSD pages.
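The two ratios on this slide also pin down the DW/TAC result. Writing $S_d$ for the speedup of design $d$ over noSSD (a symbol introduced here for illustration):

```latex
S_{\mathrm{LC}} \approx 9, \qquad
\frac{S_{\mathrm{LC}}}{S_{\mathrm{DW/TAC}}} \approx 5
\;\Longrightarrow\;
S_{\mathrm{DW/TAC}} \approx \frac{9}{5} = 1.8
```

Since 9X and 5X are rounded, the 1.8X figure for DW/TAC is approximate.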
TPC-E • [Chart: speedup relative to noSSD.] • Q: Why do the three designs have similar speedups? A: Because TPC-E is read-intensive. • Q: Why does the highest speedup occur for the 200GB database? A: For 400GB, a smaller fraction of the data is cached in the SSD; for 100GB, a larger fraction of the data is already cached in the memory BP, so fewer I/Os are left for the SSD to absorb.
TPC-H • [Chart: speedup relative to noSSD.] • Q: Why are the speedups smaller than for TPC-C or TPC-E? A: Because most I/Os are sequential. For random I/Os, the FusionIO SSD is 10X faster than the 8 disks; for sequential I/Os, the 8 disks are 1.4X faster than the SSD.
Disks are the Bottleneck • [Chart: I/O traffic to the disks and the SSD, for TPC-E 200GB; the 8 disks reach their I/O capacity, while the SSD runs at about half of its capacity.] • As long as the disks are the bottleneck, less expensive SSDs may be good enough.
Long Ramp-up Time • [Chart: ramp-up for TPC-E (200GB).] • Q: Why does ramp-up take 10 hours? A: Because the SSD is filled slowly, gated by the random-read speed of the disks. • If restarts are frequent, restarting from the SSD may reduce the ramp-up time.
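A back-of-envelope model shows why the disks' random-read rate gates ramp-up. This hypothetical Python sketch assumes every page enters the SSD only after a random disk read and that every such read brings in a page not yet cached, so the result is a lower bound; the function and its parameters are invented for illustration, and the numbers in the usage line are placeholders, not the paper's configuration.

```python
def fill_time_hours(ssd_bytes: float, page_bytes: float,
                    admitted_reads_per_sec: float) -> float:
    """Lower bound on the time to fill the SSD (hypothetical model).

    Assumes each cached page first arrives via one random disk read and that
    the disks sustain `admitted_reads_per_sec` such reads of new pages.
    """
    if page_bytes <= 0 or admitted_reads_per_sec <= 0:
        raise ValueError("page size and read rate must be positive")
    pages_to_fill = ssd_bytes / page_bytes
    seconds = pages_to_fill / admitted_reads_per_sec
    return seconds / 3600.0


if __name__ == "__main__":
    # Illustrative placeholders only: 100 GiB SSD, 8 KB pages, 1,000 random reads/s.
    print(round(fill_time_hours(100 * 2**30, 8192, 1000), 2))
```

With these placeholder inputs the bound is about 3.64 hours. Because the bound is inversely proportional to the random-read rate, and re-reads of already-cached pages only stretch it further, slow random reads on the disks translate directly into the long ramp-up the slide reports.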
Conclusions • SSD buffer pool extension is a good idea. • We observed a 9X speedup (OLTP) and a 3X speedup (DSS). • The choice of design depends on the update frequency: • For update-intensive (TPC-C) workloads, LC wins. • For read-intensive (TPC-E or TPC-H) workloads, DW, LC, and TAC perform similarly. • Mid-range SSDs may be good enough. • With 8 disks, only half of FusionIO's bandwidth is used. • Caution: ramp-up time may be long. • If restarts are frequent, the DBMS should restart from the SSD.
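Architectural Change • [Figure: before, the Buffer Manager (with its BP) sits on top of the I/O Manager, which reads and writes the Disk; after, a new SSD Manager sits between the Buffer Manager and the I/O Manager, which now reads and writes both the Disk and the SSD BP.]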
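The layering can be sketched in Python as three hypothetical classes in which the buffer manager never talks to a device directly: the new SSD manager decides whether a page comes from the SSD or the disk and what happens on eviction. The class and method names, the dict-backed devices, and the DW-style write-through eviction are assumptions made for illustration; admission and replacement policies are omitted.

```python
class IOManager:
    """Issues page I/Os to the disk or the SSD (both modeled as dicts)."""

    def __init__(self) -> None:
        self.devices: dict[str, dict[int, str]] = {"disk": {}, "ssd": {}}
        self.io_log: list[tuple[str, str, int]] = []

    def read(self, device: str, pid: int) -> str:
        self.io_log.append(("read", device, pid))
        return self.devices[device][pid]

    def write(self, device: str, pid: int, data: str) -> None:
        self.io_log.append(("write", device, pid))
        self.devices[device][pid] = data


class SSDManager:
    """New layer between the buffer manager and the I/O manager (sketch)."""

    def __init__(self, io: IOManager) -> None:
        self.io = io
        self.on_ssd: set[int] = set()

    def read_page(self, pid: int) -> str:
        # Serve from the SSD when it holds the page; otherwise go to disk.
        device = "ssd" if pid in self.on_ssd else "disk"
        return self.io.read(device, pid)

    def evict_page(self, pid: int, data: str, dirty: bool) -> None:
        if dirty:
            self.io.write("disk", pid, data)  # write-through to disk, as in DW
        if dirty or pid not in self.on_ssd:
            self.io.write("ssd", pid, data)   # keep a clean copy on the SSD
            self.on_ssd.add(pid)


class BufferManager:
    """Unchanged interface: the BP only sees the SSD manager."""

    def __init__(self, ssd_mgr: SSDManager) -> None:
        self.ssd_mgr = ssd_mgr
        self.bp: dict[int, tuple[str, bool]] = {}  # page id -> (data, dirty)

    def get(self, pid: int) -> str:
        if pid not in self.bp:
            self.bp[pid] = (self.ssd_mgr.read_page(pid), False)
        return self.bp[pid][0]

    def put(self, pid: int, data: str) -> None:
        self.get(pid)                 # make sure the page is resident
        self.bp[pid] = (data, True)   # mark it dirty

    def evict(self, pid: int) -> None:
        data, dirty = self.bp.pop(pid)
        self.ssd_mgr.evict_page(pid, data, dirty)
```

Keeping the SSD logic in its own layer means the buffer manager's interface stays the same, and swapping TAC, DW, or LC behavior would only touch the SSD manager in this sketch.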
Further Issues • Aggressive filling • SSD throttle control • Multi-page I/O requests • Asynchronous I/O handling • SSD partitioning • Gather write