Understanding TCP Incast Throughput Collapse in Datacenter Networks. Offense: Chang Seok Bae, Yi Yang
Offense Outline • Challenge the contributions • Challenge the methodology • Challenge the conclusions • Challenge the details
Inconsistency • RTO is defined late, after its first use • Goodput is defined late, after its first use
Topics Not Well Addressed • Larger switch buffers can delay the onset of Incast (doubling the buffer size doubles the number of servers that can be contacted). • Ethernet flow control is effective when the machines are on a single switch. • Your solution lies not in the network but at the end host. What about other approaches, such as traffic engineering? [2]
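A back-of-the-envelope sketch of the buffer claim, under a deliberately simple assumption that is ours, not the paper's: every server sends its burst at the same instant and the congested output port drains nothing while the bursts arrive, so overflow begins once the combined bursts exceed the buffer. The function name `max_senders_before_overflow` and the byte values are hypothetical.

```python
def max_senders_before_overflow(buffer_bytes: int, burst_bytes_per_sender: int) -> int:
    """Hypothetical first-order model: N synchronized senders each dump
    burst_bytes_per_sender into one output buffer of buffer_bytes.
    Ignoring drain during the burst, the buffer absorbs at most
    floor(buffer_bytes / burst_bytes_per_sender) bursts."""
    if buffer_bytes < 0 or burst_bytes_per_sender <= 0:
        raise ValueError("need buffer_bytes >= 0 and burst_bytes_per_sender > 0")
    return buffer_bytes // burst_bytes_per_sender


if __name__ == "__main__":
    burst = 2 * 1500  # assumed: two full-size Ethernet frames per sender
    for buf in (48_000, 96_000):  # assumed per-port buffer sizes in bytes
        print(buf, max_senders_before_overflow(buf, burst))
```

Because the count is linear in `buffer_bytes`, doubling the buffer doubles the number of senders (up to rounding). Real switches also drain during the burst and often share buffer memory across ports, which this sketch ignores.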
Repeated Work • Reproduces the results of prior work • Uses others' workload code • Uses others' Linux kernel modifications • Unexpected result! Why? Simply because of a different operating system version: Linux 2.6.28.1 vs. Linux 2.6.18.8
Lack of Diverse Workloads and Environments • Refuses to use the latest, more representative workload. • The understanding of Incast should be evaluated under a wide variety of settings, e.g., different applications, environments, network equipment, and network topologies.
Too small a minimum RTO can lead to spurious timeouts for wide-area network traffic. • Does not address the case where a large number of short-lived TCP bursts and non-TCP traffic share the Ethernet fabric, causing severe unfairness to TCP traffic [1]
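To illustrate the spurious-timeout concern, here is a sketch of the standard RTO estimator from RFC 6298 (SRTT/RTTVAR smoothing with α = 1/8, β = 1/4, K = 4), with the lower bound exposed as a `min_rto` parameter. RFC 6298 itself recommends a 1 s floor and Linux defaults to 200 ms; the 1 ms clock granularity, the RTT values, and the `spurious_timeout` helper are hypothetical. After a run of steady 20 ms samples, a loss-free delay spike to 50 ms outlives an RTO floored at 1 ms but not one floored at 200 ms.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # SRTT gain (RFC 6298)
BETA = 1 / 4   # RTTVAR gain (RFC 6298)
K = 4          # variance multiplier (RFC 6298)


@dataclass
class RtoEstimator:
    min_rto: float               # lower bound on RTO in seconds (the knob under debate)
    granularity: float = 0.001   # assumed clock granularity G in seconds
    max_rto: float = 60.0        # upper bound in seconds
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def sample(self, r: float) -> None:
        """Feed one RTT measurement r (seconds)."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R/2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r

    @property
    def rto(self) -> float:
        if self.srtt is None:
            return max(1.0, self.min_rto)  # RFC 6298 initial RTO of 1 s
        raw = self.srtt + max(self.granularity, K * self.rttvar)
        return min(max(raw, self.min_rto), self.max_rto)


def spurious_timeout(min_rto: float, steady_rtt: float = 0.020,
                     n_samples: int = 20, spike_rtt: float = 0.050) -> bool:
    """Hypothetical scenario: n_samples steady RTTs, then one loss-free
    delay spike. Returns True if the retransmission timer would fire
    before the delayed ACK arrives, i.e. a spurious timeout."""
    est = RtoEstimator(min_rto=min_rto)
    for _ in range(n_samples):
        est.sample(steady_rtt)
    return spike_rtt > est.rto


if __name__ == "__main__":
    for floor in (0.001, 0.200):
        print(f"min_rto={floor * 1000:.0f} ms -> spurious: {spurious_timeout(floor)}")
```

With a 1 ms floor the estimator settles at roughly SRTT + G ≈ 21 ms, so the 50 ms spike fires the timer although nothing was lost; with a 200 ms floor it does not. A real WAN path with more RTT variance would keep RTTVAR larger, which this sketch leaves out.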
Model • What is the model for the variable-fragment workload? • The model is incomplete and therefore limited. • How confident are you that your model works for other networks?
Weakness of Quantitative Models • We want a statistical comparison of the measured and predicted results, rather than just a statement that the shapes of the curves are identical.
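One common way to report agreement between measured and predicted curves quantitatively is the root-mean-square error (RMSE) together with the coefficient of determination R². The sketch below is a generic illustration; the function name `fit_stats` and the example series are hypothetical, not data from the paper.

```python
import math
from typing import Dict, Sequence


def fit_stats(measured: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return RMSE and R^2 of predicted values against measured values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Illustrative goodput values (Mbps) at increasing sender counts; not real data.
    measured = [900.0, 450.0, 300.0, 280.0]
    predicted = [880.0, 470.0, 310.0, 260.0]
    print(fit_stats(measured, predicted))
```

RMSE is in the data's own units (here, Mbps), so two curves with the same shape but a constant offset still show a nonzero error, which a visual "the shapes match" comparison hides.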
Measurement • What is the timeline reconstruction and analysis tool you built? • How do you guarantee its correctness when the tools are not yet polished enough to be released?
References [1] V. S. Rajanna et al., "XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platform." [2] T. Benson et al., "The Case for Fine-Grained Traffic Engineering in Data Centers."
Some Ethernet switches provide a per-hop flow control mechanism that operates independently of TCP's flow control algorithm. When a switch that supports Ethernet Flow Control is overloaded with data, it may send a "pause" frame to the interface sending data to the congested buffer, instructing all devices connected to that interface to stop sending or forwarding data for a period of time. During this period, the overloaded switch can relieve the pressure on its queues.
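In IEEE 802.3x flow control, the pause period is carried in a 16-bit `pause_time` field counted in quanta of 512 bit times, so the same value pauses a faster link for less wall-clock time. The sketch below builds a PAUSE frame (reserved multicast destination 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0001, zero-padded to the 60-byte minimum before the FCS) and converts quanta to seconds. The function names and the source MAC address are hypothetical, and the frame omits the FCS that NIC hardware normally appends.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
PAUSE_QUANTUM_BITS = 512       # one pause quantum = 512 bit times
MIN_FRAME_LEN_NO_FCS = 60      # 64-byte minimum frame minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame without FCS. pause_quanta = 0 means 'resume now'."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    header = PAUSE_DST_MAC + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, pause_quanta)
    # Zero-pad up to the minimum Ethernet frame length.
    return header.ljust(MIN_FRAME_LEN_NO_FCS, b"\x00")


def pause_duration_seconds(pause_quanta: int, link_bps: float) -> float:
    """Wall-clock time a sender stays paused for pause_quanta at link_bps."""
    return pause_quanta * PAUSE_QUANTUM_BITS / link_bps


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
    print(len(frame), frame[:18].hex())
    for rate in (1e9, 10e9):
        print(f"{rate / 1e9:.0f} Gb/s: max pause = "
              f"{pause_duration_seconds(0xFFFF, rate) * 1e3:.3f} ms")
```

At 1 Gb/s the largest value, 0xFFFF quanta, pauses the sender for about 33.6 ms; at 10 Gb/s, about 3.4 ms. A sender that receives a PAUSE frame with `pause_time` 0 resumes transmission immediately.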