Slingshot: Time-Critical Multicast for Clustered Applications

Slingshot: Time-Critical Multicast for Clustered Applications Mahesh Balakrishnan Stefan Pleisch Ken Birman Cornell University

The Contemporary Datacenter • Building-wide super-clusters: 1000s of commodity blade-servers • Typically used as commercial website back-ends: Amazon, etc. • Software Paradigms: SOA, Eventing, Publish/Subscribe… • … many-to-many communication, Multicast!

Multicast in the Datacenter • IP Multicast available: adding reliability to it is a well-researched technology… • Scalability dimensions • Number of receivers • Number of senders? • Number of groups? • Metrics • Throughput • Timeliness?

Time-Critical Applications • … dealing in perishable data: stock quotes, location updates • … willing to trade complete reliability for timeliness • … requiring tunable reliability/ timeliness/ overhead tradeoffs • Probabilistic Guarantee of Timeliness? • For x% overhead, y% of lost packets are recovered in time t. • Remainder can be optionally recovered in time t’.

Design Space • Reactive vs. Proactive • Reactive: Loss Discovery • ACK • Sender-Based Sequencing • If the multicast rate in a group is constant, the inter-multicast time at any sender goes up linearly with the number of senders • Gossip – Scalable • Proactive: FEC – Tunable

Slingshot Overview Receiver-Based FEC: Senders send initially via unreliable IP Multicast Phase 1: Receivers repair losses by proactively sending each other FEC repair packets Phase 2: Remaining losses are recovered from the sender Each receiver sends an error correction (XOR) packet to c randomly selected receivers with the last r packets it received Rate-of-fire parameter (r, c): Allows tuning of overhead-timeliness tradeoff

Protocol Details 0 • Two Packet Types: Repair Packet : List of Data Packet IDs: (sender1,seqno1), (sender2,seqno2)…. Data Packet : Packet ID (Sender, SeqNo) Less than Network MTU Application MTU: 1024 XOR of Data Packets Application Payload Terminology: Data packets are included in repair packet

Protocol Details 1 • Data Structures: • Data Buffer: received data packets • Repair Bin: pointers to last <r data packets • Arrival of Data Packet dp at Receiver: • dp is added to the data buffer • &dp is added to the repair bin • If repair bin size equals r, a repair packet rp is created from its contents, and the repair bin is cleared • rp is dispatched to c random receivers

Protocol Details 2 • Arrival of Repair Packet rp at Receiver: If #(missing included data packets) == • 0: rp is discarded • 1: it is recovered by XORing rp with the other r-1 data packets • >1: rp is stored in a special buffer, in case future data packet arrivals and recoveries make it usable

Evaluation Setup • 64 node rack-style cluster at Cornell • Loss rate fixed at 1%: packets dropped at end buffers • All nodes send and receive • Inter-node latencies = 50-100 microseconds • Group Data Rate: 1000 packets per second • Each node multicasts 64 packets per second; i.e one packet every 64 milliseconds

Slingshot Tunability For 27% overhead, 93.5% Lost Packets are recovered at an avg. of 3.5 milliseconds Example Tradeoff Points between Overhead, Timeliness, and Reliability Overhead and Recovered Packets plotted on left y-axis, Recovery Time on right

Slingshot vs SRM Slingshot recovers 93% in 10 ms, 97% in 25 ms Fastest SRM packet Recovery is 2.2 seconds 93% in 4.85 seconds, 97% in 5.1 seconds 2-3 Orders of Magnitude faster

Slingshot Scalability: Group Size Simulation Results: Gossip-Style Scalability: Insensitive to scale beyond a certain size

Conclusion • Slingshot provides a tunable, probabilistic guarantee of timeliness • Outperforms SRM by 2 orders of magnitude in a 64 node system • Insensitive to number of senders • Future Work: • Achieve scalability in other dimensions (number of groups) • Build a time-critical middleware layer that uses Slingshot as a generic primitive

Slingshot: Time-Critical Multicast for Clustered Applications