R2D2 Reliable and Rapid Data Delivery for DCs

Berk Atikoglu, Mohammad Alizadeh, Tom Yue, Balaji Prabhakar, Mendel Rosenblum

Motivation • Unreliable packet delivery due to • Corruption • Dealt with via retransmission • Congestion • Particularly bad due to incast or fan-in congestion • These losses increase difficulty of reliable transmission • Loss of throughput • Increase in flow transfer times

Incast • The client sends a request to several servers. • The responses travel to the switch simultaneously. • The switch buffer overflows from the amount of data. Some packets are dropped.

Existing Approaches • High-resolution timers • Reduce retransmission timeouts (RTO) to hundreds of µs • Proposed in Vasudevan et al. (SIGCOMM 2009); see also Chen et al. (WREN 2009) • Large number of CPU cycles spent on rapid interrupts or timer programming • In virtualized environments, the high cost of processing hardware interrupts means even higher overhead • Large switch buffers • Reduce incast occurrences by caching enough packets • Increased packet latency • Complex implementation • Large caches are expensive • Increased power usage

Our Approach: R2D2 • R2D2: collapse all flows into a single “meta-flow” • Single wait queue holds packets sent by host that are not yet acked • Single retransmission timer, no per-flow state • Provides reliable packet delivery • Resides in Layer 2.5, a shim layer between Layer 2 and Layer 3 • Key observation: Exploit uniformity of Data Center environments • Path lengths between hosts are small (3 – 5 hops) • RTTs are small (100 – 400 µs) • Path bandwidths are uniformly high (1Gbps, 10Gbps) • Therefore, amount of data from a 1G/10G source “in flight” is less than 64/640 KB • Store source packets in R2D2 on-the-fly, rapidly retransmit dropped or corrupted packets
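The "in flight" bound above follows from the bandwidth-delay product. A minimal sketch of that arithmetic, assuming the worst-case RTT of 400 µs quoted on the slide and decimal kilobytes (the function name is hypothetical, introduced only for illustration):

```python
def bytes_in_flight(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps * rtt_seconds / 8


# Worst-case RTT from the slide (400 microseconds); an assumption for this sketch.
WORST_RTT_S = 400e-6

for label, bandwidth_bps in (("1G", 1e9), ("10G", 10e9)):
    kilobytes = bytes_in_flight(bandwidth_bps, WORST_RTT_S) / 1000
    print(f"{label}: about {kilobytes:.0f} KB in flight")
```

Under these assumptions a 1G source has about 50 KB outstanding and a 10G source about 500 KB, both under the 64/640 KB figures on the slide, so a small per-host buffer is enough to hold every un-ACKed packet.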

[Figure: protocol stacks. TCP: Layer 3 over Layer 2. R2D2: Layer 3 over Layer 2.5 over Layer 2.]

R2D2 • When a flow times out: • Retransmit first un-ACKed packet (fill the hole). • Back-off: double the flow’s timeout value. • When an ACK comes in: • Reset the timeout back-off. • Outbound packet is intercepted by R2D2. • A timer is started. • A copy of the packet is placed in the wait queue. • The returned TCP ACK removes all ACKed packets held in the wait queue.
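The sender-side behavior on this slide can be sketched as a single wait queue with one timer. This is an illustrative model, not the authors' kernel code: the class and field names are hypothetical, time is passed in explicitly as integer microseconds, ACKs are assumed to be cumulative per flow, and the defaults (3 ms minimum timeout, 10 retransmissions) are taken from the Algorithms slide.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HeldPacket:
    flow: str        # hypothetical flow identifier (e.g. the TCP 4-tuple)
    end_seq: int     # sequence number just past the packet's last byte
    payload: bytes
    retries: int = 0


class R2D2Sender:
    """One wait queue and one timer shared by all flows (the 'meta-flow')."""

    def __init__(self, min_timeout_us=3000, max_retx=10):
        self.min_timeout_us = min_timeout_us
        self.timeout_us = min_timeout_us
        self.max_retx = max_retx
        self.wait_queue = deque()
        self.deadline_us = None  # None means the timer is not running

    def on_outbound(self, flow, end_seq, payload, now_us):
        # Keep a copy of the intercepted packet and start the timer if idle.
        self.wait_queue.append(HeldPacket(flow, end_seq, payload))
        if self.deadline_us is None:
            self.deadline_us = now_us + self.timeout_us

    def on_ack(self, flow, ack_seq, now_us):
        # Drop every held packet of this flow covered by the cumulative ACK.
        self.wait_queue = deque(
            p for p in self.wait_queue
            if not (p.flow == flow and p.end_seq <= ack_seq)
        )
        self.timeout_us = self.min_timeout_us  # reset the back-off
        self.deadline_us = (
            now_us + self.timeout_us if self.wait_queue else None
        )

    def on_tick(self, now_us):
        """Return the packet to retransmit, or None if nothing is due."""
        if self.deadline_us is None or now_us < self.deadline_us:
            return None
        head = self.wait_queue[0]  # first un-ACKed packet (the hole)
        if head.retries >= self.max_retx:
            self.wait_queue.popleft()  # give up: reliable, not guaranteed
            self.deadline_us = (
                now_us + self.timeout_us if self.wait_queue else None
            )
            return None
        head.retries += 1
        self.timeout_us *= 2  # back-off: double the timeout
        self.deadline_us = now_us + self.timeout_us
        return head
```

A caller would invoke `on_outbound` for each intercepted packet, `on_ack` for each returned TCP ACK, and `on_tick` periodically, retransmitting whatever packet `on_tick` returns.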

Features • Reliable, but not guaranteed, delivery • Maximum number of retransmissions before giving up • State-sharing • Only one wait queue; all packets go in same queue • No change to network stack • Kernel module in Linux; driver in Windows • Hardware version is OS-independent • Incremental deployability • Possible to protect a subset of flows

Implementation • Implemented as a Linux kernel module on kernel 2.6.* • No need to modify kernel • Can be loaded/unloaded easily • Incoming/outgoing TCP/IP packets are captured using Netfilter • Captured packets are put into a queue • Only metadata is kept in the queue; the packet itself is cloned • L2.5 thread processes the packets in the queue periodically

Test Setup • 48 Dell PowerEdge 2950 Servers • Intel Core 2 Quad Q9550 × 2 • 16GB ECC DRAM • Broadcom NetXtreme II 5708 1GbE NIC • CentOS 5.3 Final; Linux 2.6.28-10 • Switches • Netgear GS748TNA (48 ports, GbE) • Cisco Catalyst 4948 (48 ports, GbE) • BNT RackSwitch G8421 (24 ports, 10GbE) [Figure: 1 rack, 48 servers, 1GbE / 10GbE]

Algorithms • R2D2 • Minimum timeout: 3ms • Max retransmissions: 10 • Delayed ack disabled • TCP: CUBIC TCP • minRTO: 200ms • Segmentation offloading: disabled • TCP timestamps: disabled

Workload – 1 GbE switches • Number of servers (N): 1, 2, 4, 8, 16, 32, 46 • File size (S): 1MB, 20MB • Client: • Requests (S/N) MB from each server • Issues new request when all servers respond • Measurements: • Goodput • Retransmission ratio: (retransmitted packets) / (total packets sent by TCP)
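The workload split and the two measurements can be sketched as follows. The slides do not define goodput; this sketch assumes the usual meaning (application bytes delivered per unit time) and takes 1MB as 10^6 bytes. The function names and the sample inputs in the usage lines are hypothetical, not measured results.

```python
def per_server_bytes(file_bytes: int, n_servers: int) -> float:
    """Each of the N servers is asked for S/N of the file."""
    return file_bytes / n_servers


def goodput_mbps(file_bytes: int, transfer_seconds: float) -> float:
    """Application-level throughput in Mbit/s (assumed definition)."""
    return file_bytes * 8 / transfer_seconds / 1e6


def retransmission_ratio(retransmitted: int, total_sent: int) -> float:
    """Retransmitted packets divided by total packets sent by TCP."""
    return retransmitted / total_sent if total_sent else 0.0


# Made-up inputs, for illustration only.
print(per_server_bytes(1_000_000, 8))      # bytes requested from each server
print(goodput_mbps(1_000_000, 0.01))       # Mbit/s for a 10 ms transfer
print(retransmission_ratio(5, 100))        # fraction of packets retransmitted
```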

Netgear Test – Goodput [Figure panels: 1MB, 20MB]

Netgear Test – Retransmission Ratio [Figure panels: 1MB, 20MB]

Netgear Test – Multiple Clients • 6 clients (instead of 1 client) • 32 servers • Each client requests a file from each of the 32 servers [Figure panels: 1MB, 20MB]

Catalyst 4948 Test – Goodput [Figure panels: 1MB, 20MB]

Catalyst 4948 Test – Retransmission Ratio [Figure panels: 1MB, 20MB]

Catalyst 4948 Test – Multiple Clients [Figure panels: 1MB, 20MB]

10GbE test – Goodput • File size: 10MB • Number of servers: 1, 5, 9, 13, 17, 21

Conclusion • R2D2 is scalable and fast, and provides reliable delivery • No need to modify kernel • Can be loaded/unloaded easily • Improves reliability in data center networks • Hardware implementation in NIC can be much faster • Can work well with TCP offload options like segmentation and checksum offloading • Developing an FPGA implementation
