
Design Patterns for Tunable and Efficient SSD-based Indexes

Presentation Transcript


  1. Design Patterns for Tunable and Efficient SSD-based Indexes Ashok Anand, Aaron Gember-Jacobson, Collin Engstrom, Aditya Akella

  2. Large hash-based indexes • ≈20K lookups and inserts per second (1 Gbps link) • ≥32 GB hash table • WAN optimizers [Anand et al. SIGCOMM ’08] • VideoProxy [Anand et al. HotNets ’12] • De-duplication systems [Quinlan et al. FAST ‘02]

  3. Use of large hash-based indexes • WAN optimizers • VideoProxy • De-duplication systems • Where to store the indexes?

  4. Where to store the indexes? • SSD • [Figure: cost/performance comparison of storage options; the SSD is 8x less and 25x less than the alternatives shown]

  5. What’s the problem? • Need domain/workload-specific optimizations for an SSD-based index with ↑performance and ↓overhead ← False assumption! • Existing designs have… • Poor flexibility – target a specific point in the cost-performance spectrum • Poor generality – only apply to specific workloads or data structures

  6. Our contributions • Design patterns that ensure: • High performance • Flexibility • Generality • Indexes based on these principles: • SliceHash • SliceBloom • SliceLSH

  7. Outline • Problem statement • Limitations of state-of-the-art • SSD architecture • Parallelism-friendly design patterns • SliceHash (streaming hash table) • Evaluation

  8. State-of-the-art SSD-based index • BufferHash [Anand et al. NSDI ’10] • Designed for high throughput • [Figure: an in-memory incarnation plus several on-SSD incarnations; each on-SSD incarnation has an in-memory Bloom filter that is checked before its page is read] • 4 bytes per K/V pair! • 16 page reads in worst case! (average: ≈1)
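
An illustrative sketch (not the authors' code) of a BufferHash-style lookup: Python sets stand in for the per-incarnation Bloom filters, and `read_page` is an assumed helper that fetches the SSD page holding a slot and returns a dict. Because a lookup may probe every incarnation whose filter matches, the worst case is one page read per incarnation.

```python
def bufferhash_lookup(key, in_memory_table, incarnations, num_slots):
    """incarnations: list of (bloom_filter, read_page) pairs, newest first."""
    if key in in_memory_table:                  # check the RAM incarnation first
        return in_memory_table[key]
    slot = hash(key) % num_slots
    for bloom, read_page in incarnations:       # then older, on-SSD incarnations
        if key in bloom:                        # Bloom filters can false-positive,
            page = read_page(slot)              # so some page reads are wasted
            if key in page:
                return page[key]
    return None
```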

  9. State-of-the-art SSD-based index • SILT [Lim et al. SOSP ‘11] • Designed for low memory + high throughput • [Figure: SILT’s multi-store design — a log store, hash store, and sorted store, each with an in-memory index] • ≈0.7 bytes per K/V pair • 33 page reads in worst case! (average: 1) • High CPU usage! • Existing designs target specific workloads and objectives → poor flexibility and generality, and do not leverage internal parallelism

  10. SSD Architecture • How does the SSD architecture inform our design patterns? • [Figure: an SSD controller drives 32 channels; each channel serves several flash memory packages (e.g., packages 1–4 on channel 1, …, packages 125–128 on channel 32); each package contains dies, each die contains planes with data registers, each plane contains blocks, and each block contains pages]
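
A rough model of that hierarchy, useful for reasoning about how much parallelism a request pattern can exploit. The channel and package counts come from the slide; the die, plane, and page/block sizes are illustrative assumptions.

```python
SSD_GEOMETRY = {
    "channels": 32,             # SSD controller drives 32 channels
    "packages_per_channel": 4,  # 128 packages / 32 channels
    "dies_per_package": 2,      # assumption
    "planes_per_die": 2,        # assumption; each plane has a data register
    "pages_per_block": 64,      # assumption
    "page_size_bytes": 2048,    # assumption (consistent with 4-page / 8KB chunks)
}

def max_concurrent_page_ops(g=SSD_GEOMETRY):
    # Independent page operations can, in principle, proceed on every plane
    # at once -- which is why small requests should be spread out rather
    # than queued behind one another on a single channel.
    return (g["channels"] * g["packages_per_channel"] *
            g["dies_per_package"] * g["planes_per_die"])
```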

  11. Four design principles • Store related entries on the same page • Write to the SSD at block granularity • Issue large reads and large writes • Spread small reads across channels • [Figure: SliceHash mapped onto the SSD hierarchy — pages within blocks, blocks within flash memory packages 1–4, packages spread across channels 1–32]

  12. I. Store related entries on the same page • Many hash table incarnations, like BufferHash • [Figure: each page holds sequential slots from a single incarnation, so probing the same slot in every incarnation touches a different page per incarnation] • Multiple page reads per lookup!

  13. I. Store related entries on the same page • Many hash table incarnations, like BufferHash • Slicing: store the same hash slot from all incarnations on the same page • [Figure: a slice = a specific slot from all incarnations, packed onto a single page] • Only 1 page read per lookup!
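
To make the slicing layout concrete, here is a minimal sketch (my own illustration, not the paper's code) of how a slot maps to the single page holding its slice. The entry size and incarnation count come from the talk's analysis parameters; the page size is an assumption.

```python
ENTRY_SIZE = 16        # bytes per K/V pair (8B key + 8B value), per the talk
INCARNATIONS = 32      # incarnations per SliceTable, per the analysis slide
PAGE_SIZE = 2048       # assumed page size (4-page chunks = 8KB)

# How many slots' slices fit on one page: each slice holds one entry from
# every incarnation.
SLOTS_PER_PAGE = PAGE_SIZE // (ENTRY_SIZE * INCARNATIONS)

def slice_location(slot):
    """Return (page index, byte offset in page) of the slice for `slot`."""
    page = slot // SLOTS_PER_PAGE
    offset = (slot % SLOTS_PER_PAGE) * ENTRY_SIZE * INCARNATIONS
    return page, offset

# A lookup hashes the key to a slot, reads that one page, and scans the
# slot's INCARNATIONS candidate entries, newest incarnation first.
```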

  14. II. Write to the SSD at block granularity • Insert into a hash table incarnation in RAM • Divide the hash table so all slices fit into one block • [Figure: a SliceTable — the slices for a contiguous range of slots — occupies exactly one SSD block; the in-memory incarnation buffers new inserts]
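
A minimal sketch of the buffering side of this pattern (illustrative names; collision handling is elided, and the flush itself is sketched after slide 17). The 80% flush threshold follows the table-utilization figure from the analysis slide.

```python
class InMemoryIncarnation:
    """Buffers inserts in RAM; the SSD is only touched when it fills."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots        # one entry per hash slot
        self.used = 0

    def insert(self, key, value, flush_to_slicetables):
        slot = hash(key) % len(self.slots)
        if self.slots[slot] is None:
            self.used += 1
        self.slots[slot] = (key, value)        # collision handling elided
        if self.used >= 0.8 * len(self.slots): # flush at ~80% utilization
            flush_to_slicetables(self.slots)   # block-granularity writes only
            self.slots = [None] * len(self.slots)
            self.used = 0
```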

  15. III. Issue large reads and large writes • [Figure: package parallelism — packages 1–4 on a channel, each with its own data register and pages — and channel parallelism across channels 1 and 2; page size shown for scale]

  16. III. Issue large reads and large writes • The SSD assigns consecutive chunks (4 pages / 8KB) of a block to different channels • [Figure: a block striped chunk-by-chunk across channels, illustrating channel parallelism; block size shown for scale]

  17. III. Issue large reads and large writes • Read the entire SliceTable into RAM • Write the entire SliceTable onto the SSD • [Figure: the in-memory incarnation is merged into the SliceTable during a block-sized read-modify-write]
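
A sketch of the read-modify-write itself (illustrative; `read_block`/`write_block` are assumed wrappers around the SSD, and each SliceTable is modeled as a list of per-slot slices):

```python
def flush_incarnation(block_id, incarnation_slots, read_block, write_block):
    # Evicting the in-memory incarnation costs exactly one large read and
    # one large write -- never small random writes.
    slicetable = read_block(block_id)          # read the whole block into RAM
    for slot, entry in enumerate(incarnation_slots):
        slices = slicetable[slot]              # this slot across all incarnations
        slices.pop()                           # drop the oldest incarnation's entry
        slices.insert(0, entry)                # newest incarnation goes first
    write_block(block_id, slicetable)          # write the whole block back
```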

  18. IV. Spread small reads across channels • Recall: the SSD writes consecutive chunks (4 pages) of a block to different channels • Use existing techniques to reverse engineer the mapping [Chen et al. HPCA ‘11] • The SSD uses write-order mapping: channel for chunk i = i modulo (# channels)
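
In code, the reverse-engineered write-order mapping is just modular arithmetic (constants per the talk):

```python
NUM_CHANNELS = 32
PAGES_PER_CHUNK = 4      # 4 pages = 8KB per chunk

def channel_for_chunk(chunk_index):
    # Write-order mapping: consecutive chunks of a block land on
    # consecutive channels.
    return chunk_index % NUM_CHANNELS
```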

  19. IV. Spread small reads across channels • Estimate the channel using the slot # and chunk size: the channel is derived from (slot # × pages per slot) modulo (# channels × pages per chunk) • Attempt to schedule 1 read per channel • [Figure: pending lookups binned by estimated channel (channels 0–3) so that one read is issued per channel]
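
A hedged sketch of how that estimate can drive scheduling; the exact arithmetic is my reading of the slide's formula, with `pages_per_slot` left as a parameter:

```python
def estimated_channel(slot, pages_per_slot=1, num_channels=32, pages_per_chunk=4):
    # Page offset of this slot's data within one channel stripe of the block,
    # then the channel serving the chunk that contains that page.
    page_in_stripe = (slot * pages_per_slot) % (num_channels * pages_per_chunk)
    return page_in_stripe // pages_per_chunk

def schedule_round(pending_slots):
    """Pick at most one pending lookup per estimated channel."""
    chosen = {}
    for slot in pending_slots:
        chosen.setdefault(estimated_channel(slot), slot)
    return list(chosen.values())
```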

  20. SliceHash summary • [Figure: the full design — slices (a specific slot from all incarnations) packed into pages, pages grouped into block-sized SliceTables on the SSD, and an in-memory incarnation that is read and written against a SliceTable when updating]
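
Putting the patterns together, a SliceHash lookup might look like the following sketch (illustrative, not the authors' code; `read_page` is an assumed helper returning the list of slices on a page, each slice ordered newest incarnation first):

```python
def slicehash_lookup(key, in_memory_slots, read_page, slots_per_page=4):
    slot = hash(key) % len(in_memory_slots)
    entry = in_memory_slots[slot]
    if entry is not None and entry[0] == key:   # hit in the RAM incarnation
        return entry[1]
    page = read_page(slot // slots_per_page)    # exactly one SSD page read
    for k, v in page[slot % slots_per_page]:    # scan this slot's slice,
        if k == key:                            # newest incarnation first
            return v
    return None
```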

  21. Evaluation: throughput vs. overhead • See paper for theoretical analysis • Setup: 128GB Crucial M4 SSD, 2.26GHz 4-core CPU, 8B keys / 8B values, 50% insert / 50% lookup workload • [Figure: throughput and overhead compared with prior systems; annotations: ↑15%, ↑2.8x, ↑6.6x, ↓12%]

  22. Evaluation: flexibility • Trade off memory for throughput (50% insert / 50% lookup workload) • Use multiple SSDs for even ↓ memory use and ↑ throughput

  23. Evaluation: generality • Workload may change • [Figure: memory overhead (bytes/entry) stays constantly low; CPU utilization (%) decreases]

  24. Summary • Present design practices for low-cost, high-performance SSD-based indexes • Introduce slicing to co-locate related entries and leverage multiple levels of SSD parallelism • SliceHash achieves 69K lookups/sec (≈12% better than prior work), with consistently low memory (0.6B/entry) and CPU (12%) overhead

  25. Evaluation: theoretical analysis • Parameters: 16B key/value pairs, 80% table utilization, 32 incarnations, 4GB of memory, 128GB SSD, 0.31ms to read a block, 0.83ms to write a block, 0.15ms to read a page • Results: memory overhead 0.6 B/entry; insert cost avg ≈5.7μs, worst 1.14ms; lookup cost avg & worst 0.15ms
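
A back-of-the-envelope check of those cost figures (my arithmetic; the erase-block size is not stated on the slide, so 128KB is an assumption chosen to be consistent with the quoted averages):

```python
BLOCK_READ_MS, BLOCK_WRITE_MS, PAGE_READ_MS = 0.31, 0.83, 0.15
ENTRY_SIZE, INCARNATIONS, UTILIZATION = 16, 32, 0.80
BLOCK_SIZE = 128 * 1024                        # assumed erase-block size

# Worst-case insert: the in-memory incarnation fills, so its SliceTable is
# read, merged, and written back as whole blocks.
worst_insert_ms = BLOCK_READ_MS + BLOCK_WRITE_MS              # = 1.14 ms

# Average insert: amortize that flush over every entry the incarnation
# contributed (1/INCARNATIONS of the block's entries, 80% full).
entries_per_flush = BLOCK_SIZE / (ENTRY_SIZE * INCARNATIONS) * UTILIZATION
avg_insert_us = worst_insert_ms * 1000 / entries_per_flush    # ≈ 5.6 µs,
                                                              # close to the
                                                              # slide's ≈5.7 µs

# Lookups read exactly one page thanks to slicing, so avg == worst == 0.15 ms.
avg_lookup_ms = worst_lookup_ms = PAGE_READ_MS
```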

  26. Evaluation: theoretical analysis (vs. BufferHash) • BufferHash: 4 B/entry memory overhead; insert cost avg ≈0.2μs, worst 0.83ms; lookup cost avg ≈0.15ms, worst 4.8ms • SliceHash: 0.6 B/entry memory overhead; insert cost avg ≈5.7μs, worst 1.14ms; lookup cost avg & worst 0.15ms
