
ARGUS: Toward Scalable Replication Systems with Predictable Tails using Programmable Data Planes

This study by Sean Choi, SeoJin Park, Muhammad Shahbaz, Balaji Prabhakar, and Mendel Rosenblum explores how to build scalable replication systems with predictable tail latency using programmable data planes. Replication improves availability, fault tolerance, and data locality, and is widely used in distributed databases and consensus systems. However, it increases CPU, memory, and disk usage, and totally ordered replication needs two round trips per update, which raises latency. The Consistent Unordered Replication Protocol (CURP) reduces this to one round trip by replicating commutative operations without ordering them, and falls back to 2-round-trip replication otherwise. ARGUS implements CURP witnesses on SmartNICs (network interface cards with programmable NPUs), which reduces latency by removing the host network/OS layers, mitigates the tail-at-scale effect, and eliminates the witnesses' host resource usage. The study describes an experimental testbed and reports higher throughput, lower latency, and shorter tail latency with ARGUS, while saving host CPU and memory.


Presentation Transcript


  1. ARGUS: Toward Scalable Replication Systems with Predictable Tails using Programmable Data Planes • Sean Choi, SeoJin Park, Muhammad Shahbaz, Balaji Prabhakar and Mendel Rosenblum

  2. Replication is Crucial • Increases Availability and Fault Tolerance • Localized Data Access • Distributed Databases, Consensus Systems, … (Diagram: clients write to a master, which replicates to three backups.)

  3. Replication Adds Overheads • Increases CPU / Memory / Disk Usage • Requires 2 Round-Trips per update (Higher Latency) (Diagram: a client writes X←2 to the master, which replicates the update to three backups; each backup replies Ok before the master replies Ok to the client. Labels: Committed, Current State, Uncommitted.)

  4. Reasons for 2 RTTs • 1 RTT for serialization • 1 RTT for replication (Diagram: three clients send X←1, X←2, and X←3 to the master, which replicates to the backups; the time to complete an operation spans both round trips.)

  5. CURP Enables 1 RTT Replication • Totally ordered replication needs 2 RTTs • Idea: Replicate for durability & exploit commutativity to defer ordering • Consistent Unordered Replication Protocol (NSDI 2019) • Replicate commutative operations without ordering • Fall back to 2 RTT replication otherwise
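The difference between the two paths can be written as a rough completion-time model. This is only a sketch under simplifying assumptions that are not in the slides: processing time is ignored, the function and parameter names are hypothetical, and the 2-RTT path is taken to mean that the master waits for every backup before replying.

```python
def two_rtt_latency(client_master_rtt, master_backup_rtts):
    """Totally ordered path: client -> master, then master -> all backups, in sequence."""
    return client_master_rtt + max(master_backup_rtts)


def curp_fast_path_latency(client_master_rtt, client_witness_rtts):
    """CURP fast path: the client contacts the master and the witnesses in parallel,
    so it waits only for the slowest single round trip."""
    return max(client_master_rtt, max(client_witness_rtts))


# Hypothetical round-trip times in microseconds, chosen only for illustration.
ordered = two_rtt_latency(100, [100, 100, 100])      # two sequential round trips
fast = curp_fast_path_latency(100, [100, 100, 100])  # one parallel round trip
```

With equal round-trip times everywhere, the ordered path costs two round trips and the fast path costs one, which is the gap the slide title refers to.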

  6. CURP Enables 1 RTT Replication • Witnesses: No ordering info • Temporary until async • Witness data used for recovery (Diagram: clients send y←5 and z←7 to the master and, in parallel, to the witnesses; the master replicates to the backups asynchronously and garbage-collects the witnesses. Time to complete an operation: 1 RTT.)
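A toy model may help make the witness role concrete. The sketch below is not the authors' implementation; it assumes a key-value store in which two writes commute exactly when they touch different keys, and the class, method, and return-value names are all hypothetical.

```python
class Witness:
    """Toy CURP-style witness: keeps unordered records and rejects conflicting ones."""

    def __init__(self):
        self._records = {}  # key -> (request_id, value); no ordering info is stored

    def record(self, request_id, key, value):
        # Accept only if the write commutes with everything currently held
        # (assumption: writes to different keys commute).
        if key in self._records:
            return False
        self._records[key] = (request_id, value)
        return True

    def gc(self, key, request_id):
        # Called once the master has replicated the write to the backups.
        if self._records.get(key, (None, None))[0] == request_id:
            del self._records[key]

    def recovery_data(self):
        # Records a recovering master would replay, in any order.
        return dict(self._records)


def client_write(witnesses, request_id, key, value):
    """Return which path the write takes, judged from the witnesses' answers alone."""
    accepted = [w.record(request_id, key, value) for w in witnesses]
    return "1-RTT" if all(accepted) else "fallback-2-RTT"


witnesses = [Witness() for _ in range(3)]
first = client_write(witnesses, 1, "y", 5)     # no conflict
second = client_write(witnesses, 2, "z", 7)    # different key, commutes with y
conflict = client_write(witnesses, 3, "y", 9)  # same key as an un-collected record
```

Here `first` and `second` take the 1 RTT path, while `conflict` falls back because a record for `y` is still held. The sketch omits the master, which would also have to execute the operation before the client can complete it.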

  7. Shortcomings of CURP in User Space • CURP witness is implemented in user space • High latency due to network/OS layers • Tail-at-Scale (more witnesses -> worse tail latency) • Added host resource usage
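The tail-at-scale point can be illustrated with a short calculation. It assumes, as a simplification not stated in the slides, that each witness is independently slow with the same probability and that the client must wait for every witness; the function name is hypothetical.

```python
def prob_any_witness_slow(n_witnesses, p_slow):
    """Probability that at least one of n independent witnesses is slow,
    when each one is slow with probability p_slow."""
    return 1.0 - (1.0 - p_slow) ** n_witnesses


# With a 1% chance of a slow reply per witness, the chance that a write
# sees at least one slow reply grows with the number of witnesses.
for n in (1, 3, 5, 10):
    print(n, round(prob_any_witness_slow(n, 0.01), 4))
```

Under these assumptions, a request that waits on three witnesses hits a per-witness 99th-percentile delay roughly three times as often as a request that waits on one.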

  8. Motivations for ARGUS • ARGUS implements CURP witnesses in SmartNICs to… • Reduce latency by removing the network/OS layers • Avoid Tail-at-Scale (no resource contention, run-to-completion) • Eliminate host resource usage (Diagram: SmartNIC witnesses holding z←7 and y←5.)

  9. What are SmartNICs? • Network Interface Cards (NICs) that can run user-defined tasks originally run by a CPU • Categorized based on the type of processor

  10. Netronome SmartNICs (ASIC-based) • Programmable NPUs capable of up to 100G • Run programs directly in the data plane • Contain up to 120 cores @ 1.2 GHz and 8 GB RAM • Programmable via P4 and Micro-C

  11. Overview of ARGUS

  12. Experiment Testbed Setup • 5x Dell R640 1U servers (1 client, 1 master, 3 witnesses) • Intel Xeon 5117, 14 cores @ 2 GHz, 32 GB DDR4 RAM • Netronome CX 10Gb SmartNIC, 56 cores @ 633 MHz, 2 GB RAM • 10Gb Arista switch • Durable Redis writes to master and witnesses

  13. Evaluation: Higher Throughput, Lower Latency (Plots: throughput in Kops/s; latencies in μs.)

  14. Evaluation: Shorter Tails

  15. Evaluation: Lower Tail-at-Scale Effect

  16. Future Work • Client-side replication on SmartNICs • Test lightweight reliable data-transfer protocols • Try other domain-specific hardware accelerators

  17. Conclusion • ARGUS shows significant improvements in replication throughput, latency, and tail latency • All the while saving host CPU and memory usage!
