Understanding TCP Incast Throughput Collapse in Datacenter Networks
Presenters: Aditya Agarwal, Tyler Maclean
Motivation/Importance
• Internet datacenters support a myriad of services and applications.
  • Google, Microsoft, Yahoo, Amazon
• The vast majority of datacenters use TCP for communication between nodes.
• The unique workload, scale, and environment of internet datacenters violate the WAN assumptions under which TCP was originally designed.
  • RTO = 200ms (default value in Linux)
  • 2-3 orders of magnitude greater than the RTT in the datacenter
What is the Problem
• Incast communication pattern: many servers send to one client through a single switch.
  • [Diagram: multiple servers → switch → client]
• Try to understand TCP incast throughput collapse.
• Show that this problem is general.
• Build an analytical model.
• Modify TCP and verify that the modifications work.
The Contributions
• Reproduce the problem in their own experimental testbeds and demonstrate the generality of Incast.
• Propose a quantitative model that accounts for some of the observed Incast behavior.
• Implement several intuitive modifications to the TCP stack in Linux, and show that some modifications are more helpful than others.
Roadmap
• Experiment setting:
  • Workload
• Experiment results:
  • Initial findings
  • Deeper analysis
• Quantitative models
• Conclusions
Workload setting
• MapReduce-like application:
  • The receiver requests k blocks of data from S storage servers.
  • Each block of data is striped across the S storage servers.
  • Each server responds with a "fixed" amount of data (fixed-fragment workload).
  • The client does not request block k+1 until all fragments of block k have been received (see the sketch below).
• Setting:
  • k = 100
  • S = 1-48
  • Fragment size: 256KB
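A minimal Python sketch of this request pattern (the request format, port, and helper names are hypothetical, not the authors' actual test harness):

```python
import socket

NUM_BLOCKS = 100            # k = 100 blocks per transfer
FRAGMENT_SIZE = 256 * 1024  # 256KB fragment from each server

def fetch_block(server_socks, block_id):
    """Request one block; each of the S servers returns one fixed fragment."""
    for sock in server_socks:
        sock.sendall(f"GET {block_id}\n".encode())  # assumed request format
    fragments = []
    for sock in server_socks:
        buf = b""
        while len(buf) < FRAGMENT_SIZE:  # wait for the complete fragment
            buf += sock.recv(FRAGMENT_SIZE - len(buf))
        fragments.append(buf)
    return b"".join(fragments)

def run_workload(server_addrs):
    # One TCP connection per storage server (S = len(server_addrs)).
    socks = [socket.create_connection(addr) for addr in server_addrs]
    for block_id in range(NUM_BLOCKS):
        # Block k+1 is not requested until all fragments of block k arrive.
        fetch_block(socks, block_id)
```

The per-block barrier in run_workload is what synchronizes the S senders: all servers transmit their fragments at once, overflowing the shared switch buffer.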
DETER Network Security Testbed
• 400 PCs, located at USC ISI and UC Berkeley
• Supported operating systems include Linux, FreeBSD, and Windows
Initial Finding
• Different senders experience long, synchronized TCP retransmission timeout (RTO) events.
• RTO = 200ms (default value, chosen for WAN environments)
Minor and intuitive modifications
• Decrease the minimum RTO timer from 200ms.
• Randomize the minimum RTO timer.
• Use a smaller multiplier for the RTO exponential backoff.
• Randomize the multiplier for the RTO exponential backoff (see the timer sketch below).
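A user-level Python sketch of these four timer variants; the actual modifications were made inside the Linux kernel TCP stack, and the parameter values and randomization ranges here are illustrative assumptions:

```python
import random

def next_rto(prev_rto, min_rto=0.2, multiplier=2.0,
             randomize_min=False, randomize_multiplier=False):
    """Compute the next retransmission timeout under the four variants.

    Standard TCP: rto = max(min_rto, prev_rto * multiplier), multiplier = 2.
    """
    if randomize_multiplier:
        # Variant 4: back off by a random factor in [1, multiplier].
        multiplier = random.uniform(1.0, multiplier)
    rto = prev_rto * multiplier
    if randomize_min:
        # Variant 2: jitter the timer floor to desynchronize senders
        # (assumed +/-50% range).
        min_rto = random.uniform(0.5 * min_rto, 1.5 * min_rto)
    return max(min_rto, rto)
```

Variants 1 and 3 correspond to simply passing a smaller min_rto or multiplier.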
Initial Results
• Smaller multiplier for the RTO exponential backoff
  • Useless
• Randomized multiplier for the RTO exponential backoff
  • Useless
• Only a small number of exponential backoffs occur during the entire transfer.
Initial Results
• Randomize the RTO timer
  • Useless, but also no penalty
  • Because the servers share the same switch, all subsequent switch buffer overflow events remain synchronized across all senders.
Analysis in depth
• Different RTO timers
• Observations:
  • The initial goodput minimum occurs at the same number of servers.
  • The larger the minimum RTO timer value, the larger the number of senders at which the maximum goodput occurs.
  • Smaller RTO timer values yield a faster goodput "recovery" rate.
  • The rate of decrease after the local maximum is the same across different minimum RTO settings.
Delayed ACKs and High-Resolution Timers
• Improvements proposed by [11]:
  • Turn off the delayed ACK function (the default delayed-ACK timeout is 40ms); see the sketch below.
  • Use high-resolution timers.
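A per-socket sketch of the first mitigation using Linux's TCP_QUICKACK socket option; the server address is hypothetical, and an in-kernel change (as in the paper) would apply to all sockets instead:

```python
import socket

# Connect to a storage server (hypothetical address and port).
sock = socket.create_connection(("storage-server", 9000))

# On Linux, TCP_QUICKACK disables delayed ACKs, but only temporarily:
# the kernel may re-enable them, so the option is re-armed around reads.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
data = sock.recv(4096)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)  # re-arm
```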
Sub-optimal behavior with regard to delayed ACKs is workload-independent.
Quantitative Models
• Net goodput = D / (L + R · r), where:
  • D: total amount of data to be sent (100 blocks of 256KB)
  • L: total transfer time of the workload without any RTO events
  • R: the number of RTO events during the transfer
  • S: the number of servers
  • r: the value of the minimum RTO timer
Equation for L
• L is expressed in terms of I, the inter-packet waiting time (see the goodput sketch below).
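A minimal sketch of the goodput model as defined on the previous slide, assuming net goodput = D / (L + R · r); the L and R values below are illustrative placeholders, not measurements:

```python
def model_goodput(D, L, R, r):
    """Net goodput: total data over total transfer time, where each of
    the R timeout events stalls the transfer for roughly r seconds."""
    return D / (L + R * r)

# Workload from the slides: 100 blocks of 256KB.
D = 100 * 256 * 1024                            # bytes
print(model_goodput(D, L=2.0, R=10, r=0.200))   # min RTO = 200ms
print(model_goodput(D, L=2.0, R=10, r=0.001))   # min RTO = 1ms
```

With L and R held fixed, shrinking r directly shrinks the stall term R · r, which is why a smaller minimum RTO timer raises goodput.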
Further analysis on R and I
• The number of RTO events is similar for different minimum RTO values (200ms and 1ms).
• The inter-packet waiting time is very different for different minimum RTO values (200ms and 1ms).
Qualitative refinement of their model
• As the number of senders increases, the number of RTO events per sender increases. Beyond a certain number of senders, the number of RTO events is constant.
• When a network resource becomes saturated, it is saturated at the same time for all senders.
• After a congestion event, the senders enter the TCP RTO state. The RTO timer expires at each sender with a uniform distribution in time, a constant delay after the congestion event.
• T increases as the number of senders increases; however, T is bounded (see the simulation sketch below).
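An illustrative Monte Carlo sketch of these assumptions, interpreting T as the width of the uniform spread of RTO expirations after a congestion event; all parameter values are hypothetical:

```python
import random

def mean_last_resume(num_senders, min_rto, T, trials=1000):
    """After each synchronized congestion event, assume every sender's
    RTO expires min_rto plus a uniform delay in [0, T] later; return
    the mean time until the last sender resumes sending."""
    total = 0.0
    for _ in range(trials):
        total += max(min_rto + random.uniform(0, T)
                     for _ in range(num_senders))
    return total / trials

# As the number of senders grows, the expected last-resume time
# approaches min_rto + T, so a bounded T bounds the stall per RTO event.
print(mean_last_resume(num_senders=40, min_rto=0.200, T=0.02))
```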
More explanations
• A smaller minimum RTO timer value means a larger goodput value at the initial minimum.
• The initial goodput minimum occurs at the same number of senders, regardless of the value of the minimum RTO timer.
• The second-order goodput peak occurs at a higher number of senders for larger RTO timer values.
• The smaller the RTO timer value, the faster the rate of recovery between the goodput minimum and the second-order goodput maximum.
• After the second-order goodput maximum, the slope of the goodput decrease is the same for different RTO timer values.
Conclusions
• Studied the dynamics of Incast.
• Proposed a simple mathematical model to explain the observed trends.
• Accounted for the differences between their observations and those in previous work.