
LightStore : Software-defined Network-attached Key-value Drives

Chanwoo Chung, Jinhyung Koo*, Junsu Im*, Arvind, and Sungjin Lee*. Massachusetts Institute of Technology (MIT); *Daegu Gyeongbuk Institute of Science & Technology (DGIST).


Presentation Transcript


  1. LightStore: Software-defined Network-attached Key-value Drives
Chanwoo Chung, Jinhyung Koo*, Junsu Im*, Arvind, and Sungjin Lee*
Massachusetts Institute of Technology (MIT); *Daegu Gyeongbuk Institute of Science & Technology (DGIST)
The 24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2019), Providence, RI

  2. Datacenter Storage Systems
• This talk presents a new storage architecture that is 2.0-5.2x more power-efficient and 2.3-8.1x more floor-space-efficient for flash-based datacenter storage.
• [Diagram: applications on application servers reach storage nodes (Xeon CPUs, large DRAM, many SSDs) across the datacenter network (e.g., 10G/40G Ethernet, InfiniBand) through storage services such as SQL, NFS, and RADOS, giving SQL, file, and KV access. These storage nodes are a significant capital and operating cost.]

  3. How do we achieve cost reduction?
• One SSD per network port
• KV interface provided by embedded-class storage nodes
• Interoperability adapters in the application servers
• [Diagram: in the current architecture, storage nodes (Xeon CPUs, large DRAM, many SSDs) sit between the datacenter network and the SSDs and become the bottleneck, since a single SSD already delivers 2~10 GB/s (more than 10GbE). LightStore instead attaches embedded KV nodes, one per SSD, directly to the network, and moves storage services (e.g., SQL, NFS, RADOS) into adapters on the application servers.]

  4. Which KVS on embedded systems?
• Hash-based KVS: simple implementation; unordered keys, so RANGE & SCAN are limited; random access as fast as sequential; unbounded tail latency. Examples of host-mounted KV-SSDs in this class: Samsung KV-SSD, KAML [Jin et al., HPCA 2017].
• LSM-tree-based KVS (the choice for LightStore): multi-level search tree; sorted keys, so RANGE & SCAN are supported; fast sequential access, which is adapter-friendly; bounded tail latency; append-only batched writes, which are flash-friendly.

  5. Performance on ARM?
• RocksDB on a 4-core ARM + Samsung 960PRO SSD runs 3.6x-4.2x slower than on x86 and utilizes only 10% of the SSD's read bandwidth.
• Excessive memcpy overhead
• High locking / context-switching overhead
• RocksDB runs on top of a filesystem, so requests traverse a deep I/O stack
• [Chart: sequential (S-) and random (R-) workload throughput on ARM vs. x86.]

  6. Our plan: software optimization
• System optimizations: (1) a specialized memory allocator to minimize data copies; (2) lock-free queues between threads instead of locks (see the sketch after this slide).
• LSM-tree-specific optimizations: (1) decouple keys from KV pairs (keytables) for faster compaction; (2) Bloom filters and caching for keytables.
• We wrote the entire software from scratch to run it on an embedded system.
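A minimal sketch of the kind of lock-free queue named in point (2), assuming a single-producer/single-consumer pairing per queue (the LightStore threads form a pipeline, so SPSC is a reasonable model). The names (spsc_queue, kv_msg, QCAP) are illustrative, not LightStore's actual code; only a pointer and a length cross the queue, which is how such queues combine with the copy-minimizing allocator in point (1). The queue struct must be zero-initialized before use.

    /* Single-producer/single-consumer lock-free ring (C11 atomics). */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define QCAP 4096                        /* must be a power of two */

    struct kv_msg { void *buf; size_t len; };    /* descriptor only: no payload copy */

    struct spsc_queue {
        struct kv_msg slots[QCAP];
        _Atomic size_t head;                 /* advanced by the consumer */
        _Atomic size_t tail;                 /* advanced by the producer */
    };

    static bool spsc_push(struct spsc_queue *q, struct kv_msg m)
    {
        size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t - h == QCAP)                   /* full */
            return false;
        q->slots[t & (QCAP - 1)] = m;        /* hand over the pointer, not the data */
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return true;
    }

    static bool spsc_pop(struct spsc_queue *q, struct kv_msg *out)
    {
        size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h == t)                          /* empty */
            return false;
        *out = q->slots[h & (QCAP - 1)];
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return true;
    }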

  7. SSD controller
• A typical Flash Translation Layer (FTL) requires an embedded-class multicore CPU plus a few GB of DRAM.
• We implemented the FTL in hardware (possible thanks to the LSM-tree's append-only writes), and
• used the freed cores and DRAM to run the key-value store and the network manager.
• [Diagram: a conventional SSD controller pairs ARM cores (~1 GHz) and DRAM (>4 GB) for flash management, plus vendor-specific accelerators, behind a host interface controller (SATA, PCIe). LightStore instead attaches a 10Gb Ethernet NIC and dedicates the ARM cores and DRAM to the KV store, with a HW FTL and NAND I/O controllers driving the NAND chips.]

  8. LightStore Overview
• Clients (datacenter applications) keep their existing interfaces: SQL (INSERT), file system (fwrite()), block (read()), or KVS (get()).
• Thin adapters on the application servers (SQL Adapter, FS Adapter, Blk Adapter) translate these into KV requests (GET, SET, DELETE, ...).
• KV requests are hashed to different nodes by the adapters using consistent hashing (a sketch follows this slide).
• The LightStore cluster (storage pool) is a set of SSD-sized LightStore nodes, each with its own NIC, KV store, and flash, optionally extended with more flash through expansion cards over an expansion network.
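A minimal sketch of the consistent-hashing step the adapters perform. The hash function, node names, and single ring point per node are illustrative assumptions (real adapters typically place many virtual points per node); the idea is simply that a key is routed to the first node clockwise from its hash on the ring, so adding or removing a node only remaps a small fraction of keys.

    /* Consistent-hashing ring: nodes and keys share one hash space. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    static uint64_t fnv1a(const void *data, size_t len)    /* simple 64-bit hash */
    {
        const unsigned char *p = data;
        uint64_t h = 1469598103934665603ULL;
        for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 1099511628211ULL; }
        return h;
    }

    struct ring_entry { uint64_t point; int node_id; };

    static int cmp_entry(const void *a, const void *b)
    {
        uint64_t x = ((const struct ring_entry *)a)->point;
        uint64_t y = ((const struct ring_entry *)b)->point;
        return (x > y) - (x < y);
    }

    /* Build the ring once, e.g. from names "lightstore-0" .. "lightstore-3". */
    static void build_ring(struct ring_entry *ring, const char **names, int n)
    {
        for (int i = 0; i < n; i++) {
            ring[i].point = fnv1a(names[i], strlen(names[i]));
            ring[i].node_id = i;
        }
        qsort(ring, n, sizeof(*ring), cmp_entry);
    }

    /* Route a key to the first ring point >= hash(key), wrapping around. */
    static int route_key(const struct ring_entry *ring, int n,
                         const char *key, size_t keylen)
    {
        uint64_t h = fnv1a(key, keylen);
        for (int i = 0; i < n; i++)
            if (ring[i].point >= h)
                return ring[i].node_id;
        return ring[0].node_id;              /* wrap around the ring */
    }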

  9. Outline: Introduction, LightStore Software, KVS Performance, LightStore HW FTL, Applications and Adapters, Conclusion

  10. LightStore software
• The LightStore-Engine runs five threads connected by lock-free queues: the KV protocol server's KV request handler and KV reply handler (Threads #1-#2) facing the datacenter network, the LSM-tree engine's memtable/LSM-tree manager and writer & compaction (Threads #3-#4), and the direct-I/O engine's poller (Thread #5).
• A zero-copy memory allocator minimizes copies; values read from flash are handed back by reference ("minimize copy!").
• The userspace direct-I/O engine issues READ/WRITE/TRIM commands to the LightStore HW FTL through a thin kernel device-control path (poll() and an interrupt handler).

  11. LightStore software (continued)
• The KV protocol server supports the following RESP commands: SET, MSET, GET, MGET, DELETE, SCAN (client sketch below).
• [Same software-stack diagram as slide 10.]
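Because the protocol server speaks RESP, an off-the-shelf Redis client library should be able to drive a LightStore node with the commands listed above. The sketch below uses hiredis; the node address and port (192.168.1.10:6379) are made-up placeholders, and only the standard SET/GET spellings are used here.

    /* Hedged client-side sketch: talk RESP to a LightStore node via hiredis. */
    #include <stdio.h>
    #include <hiredis/hiredis.h>

    int main(void)
    {
        redisContext *c = redisConnect("192.168.1.10", 6379);  /* placeholder address */
        if (c == NULL || c->err) { fprintf(stderr, "connect failed\n"); return 1; }

        /* SET then GET one KV pair (values would normally be ~8 KB). */
        redisReply *r = redisCommand(c, "SET %s %s", "user:1001", "value-bytes");
        freeReplyObject(r);

        r = redisCommand(c, "GET %s", "user:1001");
        if (r && r->type == REDIS_REPLY_STRING)
            printf("GET returned %zu bytes\n", (size_t)r->len);
        freeReplyObject(r);

        redisFree(c);
        return 0;
    }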

  12. LightStore software (continued)
• [Same software-stack diagram as slide 10: request handler and reply handler threads, lock-free queues, LSM-tree engine, and the userspace direct-I/O engine over the LightStore HW FTL.]

  13. LSM-tree Basics
• KV writes go into an in-memory memtable (L0), which is flushed to storage on a size threshold or a timer.
• On storage, data is kept in Sorted String Tables (SSTs): immutable, sorted runs of key-value pairs organized into levels (L1, L2, ...).
• Compaction merges multiple L1 SSTs and writes new L2 SSTs. Because whole key-value pairs are rewritten, compaction is a HUGE overhead (merge sketch below).
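To make the compaction cost concrete, here is a minimal sketch (not the authors' code; the kv layout is illustrative) of the merge at the heart of compaction: two sorted runs are merged into one, with the newer entry shadowing the older one on duplicate keys. Every surviving pair, including its full value, is rewritten, which is exactly the overhead the next slide attacks by separating keys from values.

    /* Compaction-style merge of two sorted runs into a new sorted run. */
    #include <stddef.h>
    #include <string.h>

    struct kv { char key[32]; char val[8192]; };   /* 8 KB values, as in the workloads */

    /* Merge 'newer' (higher priority) and 'older'; both inputs sorted by key.
     * Returns the number of entries written to 'out'. */
    static size_t compact_merge(const struct kv *newer, size_t n,
                                const struct kv *older, size_t m,
                                struct kv *out)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < n && j < m) {
            int c = strcmp(newer[i].key, older[j].key);
            if (c < 0)       out[k++] = newer[i++];
            else if (c > 0)  out[k++] = older[j++];
            else {           /* same key: the newer version shadows the older one */
                out[k++] = newer[i++];
                j++;
            }
        }
        while (i < n) out[k++] = newer[i++];
        while (j < m) out[k++] = older[j++];
        return k;                                  /* whole KV pairs were rewritten */
    }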

  14. LightStore LSM-tree Engine
• Decouple keys from values [Lu et al., WiscKey, FAST 2016]: leveled keytables (L1, L2, ...) hold only sorted keys, while values live in persistent value-tables, so compaction operates on keys only.
• Frequently used keytables are cached in memory.
• A Bloom filter per keytable avoids unnecessary keytable reads (read-path sketch below).
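A hedged sketch of the GET path this design implies. The key_entry layout, the 8192-bit Bloom filter, and the binary search are illustrative assumptions rather than the actual engine; the point is that the Bloom filter rules most keytables out cheaply, and a hit costs one keytable search plus a single value read from flash.

    /* Keytable lookup guarded by a per-keytable Bloom filter. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct key_entry { char key[32]; uint64_t value_addr; uint32_t value_len; };

    struct keytable {
        struct key_entry *entries;    /* sorted by key */
        size_t            nentries;
        uint64_t          bloom[128]; /* 8192-bit Bloom filter for this keytable */
    };

    static bool bloom_maybe_has(const struct keytable *kt, uint64_t h1, uint64_t h2)
    {
        for (int i = 0; i < 4; i++) {                 /* 4 probes */
            uint64_t bit = (h1 + (uint64_t)i * h2) % 8192;
            if (!(kt->bloom[bit / 64] & (1ULL << (bit % 64))))
                return false;                         /* definitely not in this keytable */
        }
        return true;                                  /* maybe present */
    }

    /* Fills *addr/*len with the value location if the key is in this keytable. */
    static bool keytable_lookup(const struct keytable *kt, const char *key,
                                uint64_t h1, uint64_t h2,
                                uint64_t *addr, uint32_t *len)
    {
        if (!bloom_maybe_has(kt, h1, h2))
            return false;                             /* skip without touching the keytable */
        size_t lo = 0, hi = kt->nentries;
        while (lo < hi) {                             /* binary search over sorted keys */
            size_t mid = lo + (hi - lo) / 2;
            int c = strcmp(key, kt->entries[mid].key);
            if (c == 0) {
                *addr = kt->entries[mid].value_addr;  /* one flash read fetches the value */
                *len  = kt->entries[mid].value_len;
                return true;
            }
            if (c < 0) hi = mid; else lo = mid + 1;
        }
        return false;
    }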

  15. Outline: Introduction, LightStore Software, KVS Performance, LightStore HW FTL, Applications and Adapters, Conclusion

  16. LightStore Prototype
• Each LightStore prototype node is implemented with a Xilinx ZCU102 evaluation board (Zynq UltraScale+ SoC: quad-core ARM Cortex-A53 with FPGA, 4 GB DRAM) and a custom flash card (Artix-7 FPGA, 512 GB of raw NAND flash chips) attached through expansion card connectors.

  17. KVS Experimental Setup • Clients and storage nodes are connected to the same 10GbE switch

  18. Experimental KVS Workloads
• 5 synthetic workloads to evaluate KVS performance
• YCSB for real-world workloads (details in the paper)
• A value size of 8 KB is used to match the flash page size.

  19. KVS Throughput (Local)
• Throughput measured locally, i.e., without the network.
• [Chart: sequential and random throughput with reference lines for the SSD read bandwidth (3.2 GB/s), SSD write bandwidth, LightStore flash read bandwidth, and LightStore flash write bandwidth. Annotations attribute the x86 gap to metadata fetching, tree traversing, and search overhead, and LightStore's drop in the 10%-write workload to compaction; LightStore almost saturates its device bandwidth.]

  20. KVS Throughput (Local), continued
• [Same chart as slide 19, highlighting that LightStore with flash is as fast as the x86 system.]

  21. KVS Throughput (Local), continued
• [Same chart, highlighting that the LightStore core is about 20% faster and that LightStore with flash is as fast as x86.]

  22. KVS Throughput (Network)
• Throughput seen by clients over the network.
• [Chart: reference lines for the x86 Ethernet bandwidth, LightStore Ethernet bandwidth, and LightStore flash read/write bandwidth. LightStore almost saturates both its device bandwidth and its network bandwidth, with network plus compaction overhead visible on writes.]

  23. KVS Throughput (Network), continued
• Given the same flash and NIC, LightStore outperforms the x86 system.
• [Same chart with two extra configurations: LightStore with the same NIC as x86 (its core is 20%+ faster) and LightStore with the same NIC and flash as x86; both still nearly saturate device and network bandwidth.]

  24. KVS Scalability
• Random reads (R-GET) across multiple nodes.
• [Chart: aggregate throughput vs. number of nodes. LightStore scales linearly; x86-ARDB with 1 NIC and with 2 NICs flattens at its Ethernet bandwidth, i.e., the network is the bottleneck, as expected.]

  25. KVS IOPS-per-Watt
• Assume that x86-ARDB scales to 4 SSDs, i.e., 4 times the performance seen previously.
• Peak power: x86-ARDB 400 W vs. LightStore-Prototype 25 W.

  26. KVS Latency

  27. Outline: Introduction, LightStore Software, KVS Performance, LightStore HW FTL, Applications and Adapters, Conclusion

  28. HW FTL and LSM-tree
• Because LSM-tree compaction always appends data rather than updating it in place, the flash management the FTL must do stays simple enough to move into hardware.

  29. LightStore HW FTL: data structures
• Based on Application-Managed Flash (AMF) [FAST 2016].
• Segment mapping table: coarse-grained mapping translation (sketch below).
• Block management table: wear-leveling & bad-block management.
• Each table is ~1 MB per 1 TB of flash, whereas commercial SSD FTLs require >1 GB per 1 TB of flash.
• The HW FTL adds very little latency: 4 cycles (mapped) or 140 cycles (not mapped), at most 0.7 us at 200 MHz (<1% of NAND latency).
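A software rendering of the coarse-grained translation, purely for illustration (the real lookup is an FPGA table, not C code, and the 4 MB segment size is an assumption): only the segment number is remapped, and the in-segment offset passes through unchanged. With 4 MB segments, a 1 TB device needs about 256 K four-byte entries, i.e., roughly the ~1 MB table size quoted above.

    /* Coarse-grained (segment-level) address translation. */
    #include <stdint.h>

    #define SEG_SHIFT   22                          /* assumed 4 MB segments */
    #define SEG_MASK    ((1u << SEG_SHIFT) - 1)
    #define UNMAPPED    0xFFFFFFFFu

    struct hw_ftl {
        uint32_t *seg_map;    /* logical segment -> physical segment            */
        uint32_t  nsegs;      /* ~256 K entries (~1 MB) per 1 TB of flash       */
    };

    /* Translate a logical flash byte address; returns UINT64_MAX if the segment
     * has not been mapped yet (the slow path that allocates a fresh segment). */
    static uint64_t ftl_translate(const struct hw_ftl *ftl, uint64_t laddr)
    {
        uint32_t lseg = (uint32_t)(laddr >> SEG_SHIFT);   /* segment number   */
        uint32_t off  = (uint32_t)(laddr & SEG_MASK);     /* in-segment offset */
        uint32_t pseg = ftl->seg_map[lseg];
        if (pseg == UNMAPPED)
            return UINT64_MAX;                            /* not yet mapped    */
        return ((uint64_t)pseg << SEG_SHIFT) | off;       /* offset preserved  */
    }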

  30. Effects of HW FTL
• HW FTL > lightweight SW FTL > full SW FTL (page mapping with garbage-collection copying overhead).
• With a full SW FTL, reads degrade by 7-10% and writes by 28-50%, because the compaction thread is very active and competes with the extra SW FTL work.
• Without the FPGA (HW FTL), we would need an extra set of cores: a trade-off between cost and design effort.

  31. Outline: Introduction, LightStore Software, KVS Performance, LightStore HW FTL, Applications and Adapters, Conclusion

  32. Application: Block and File Stores
• LightStore's Block Adapter is implemented in user mode (BUSE); the File Adapter is a filesystem stacked on the Block Adapter (adapter sketch below).
• Compared against Ceph's block store and filesystem; Ceph is known to work much better with large (>1 MB) objects.
• [Chart: block-store and file-store throughput with reference lines for Ceph's Ethernet bandwidth, LightStore's Ethernet bandwidth, and LightStore's flash write bandwidth.]
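A hedged sketch of what a user-mode block adapter does per request: each fixed-size block is stored as one KV pair whose key encodes the logical block address, so a block write becomes SET and a block read becomes GET. The kv_set()/kv_get() calls, the key format, and the 8 KB block size are illustrative assumptions standing in for the adapter's real KV client, which would route each key to a node via consistent hashing as on slide 8.

    /* Block-to-KV translation inside a user-mode block adapter. */
    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 8192

    /* Hypothetical KV client calls (key is routed to a LightStore node). */
    int kv_set(const char *key, const void *val, size_t len);
    int kv_get(const char *key, void *val, size_t len);

    static void lba_to_key(uint64_t lba, char key[32])
    {
        /* Encode the logical block address as a fixed-width hex key. */
        snprintf(key, 32, "blk:%016llx", (unsigned long long)lba);
    }

    int adapter_write_block(uint64_t lba, const void *buf)
    {
        char key[32];
        lba_to_key(lba, key);
        return kv_set(key, buf, BLOCK_SIZE);     /* one block write -> one SET */
    }

    int adapter_read_block(uint64_t lba, void *buf)
    {
        char key[32];
        lba_to_key(lba, key);
        return kv_get(key, buf, BLOCK_SIZE);     /* one block read -> one GET */
    }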

  33. Outline: Introduction, LightStore Software, KVS Performance, LightStore HW FTL, Applications and Adapters, Conclusion

  34. Summary
• Current storage servers are costly and prevent applications from exploiting the full performance of SSDs over the network.
• LightStore: networked KV drives instead of an x86-based storage system, with thin software KV adapters on clients; it delivers the full NAND flash bandwidth to the network.
• Benefits: 2.0x power- and 2.3x space-efficiency gains (conservative), and up to 5.2x power- and 8.1x space-efficiency; the 4-node prototype is 7.4x more power-efficient.

  35. Thank You! Chanwoo Chung (cwchung@csail.mit.edu)
