On Horrible TCP Performance over Underwater Links Abdul Kabbani, Balaji Prabhakar, Stanford University
In Defense of TCP Abdul Kabbani, Balaji Prabhakar, Stanford University
Overview • TCP has done well for the past 20 yrs • It is continuing to do well • However, some performance problems have been identified • TCP needs long buffers to keep links utilized • It doesn't perform well in large BWxDelay networks • It is oscillatory, and it is sluggish • Short flows take too long to complete • Note: we're not addressing TCP over wireless • We revisit some of these issues and find • Either we have demanded too much • Or, there are very close relatives of TCP that satisfy our demands
Some background • I've recently become familiar with congestion control in two LAN technologies • Fibre Channel (for Storage Area Networks) • Ethernet (as part of a standardization effort) • These networks operate under much harsher conditions than the Internet, and yet they work quite well! • Let's look at them briefly
SAN: Fibre Channel • Fibre Channel: a standardized protocol for SANs; main features • Packet switching • Typical topology: 3-5 hop networks • No packet drops! Buffer-to-buffer credits are used to transport data (see the sketch below) • No end-to-end congestion control! • Very wide deployment • The dominant technology for host-to-storage-array data transfers • No congestion-related problems or congestion collapse reported to date! • Upon investigation, here are some factors helping FC networks • Small file sizes (128KB, chopped into 64 2KB packets, comparable to the buffer size at the switches) • Light loads (30-40%) • Small topologies
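A minimal sketch of how buffer-to-buffer credit flow control avoids drops on a link; the class and method names are illustrative assumptions and are not taken from the Fibre Channel specification.

```python
class CreditLink:
    """Sketch of Fibre Channel style buffer-to-buffer credit flow control.

    The receiver advertises one credit per free buffer; the sender may
    transmit a frame only while it holds a credit, so the receiver is
    never forced to drop.  All names here are illustrative assumptions.
    """

    def __init__(self, receiver_buffers):
        self.credits = receiver_buffers   # one credit per free receive buffer

    def try_send(self, frame):
        if self.credits == 0:
            return False                  # sender must wait; nothing is dropped
        self.credits -= 1                 # frame now occupies a receive buffer
        self.deliver(frame)
        return True

    def return_credit(self):
        self.credits += 1                 # receiver freed a buffer

    def deliver(self, frame):
        pass                              # placeholder for the actual link
```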
Data Center Ethernet • Involved in developing a congestion management algorithm in the IEEE 802.1 Data Center Bridging standards activity, for Data Center Ethernet • With Berk Atikoglu, Abdul Kabbani, Rong Pan and Mick Seaman • Ethernet vs. Internet (not an exhaustive list) • There is no end-to-end signaling in Ethernet a la the per-packet ACKs of the Internet • So congestion must be signaled to the source by the switches • It is not possible to know the round-trip time! • The algorithm is not automatically self-clocked (as TCP is) • Links can be paused; i.e. packets may not be dropped • No sequence numbering of L2 packets • Sources do not start transmission gently (as in TCP slow start); they can potentially come on at the full line rate of 10Gbps • Ethernet switch buffers are much smaller than router buffers (100s of KBs vs. 100s of MBs) • Most importantly, the algorithm should be simple enough to be implemented entirely in hardware
Summary • We see a "hierarchy of harsh operating environments" • Low BWxDelay Internet, high BWxDelay Internet, Ethernet, wireless, etc. • Based on experience with Ethernet and Fibre Channel, I'm convinced that • TCP is operating in an environment where sufficient information and flexibility exist to obtain good performance with minimal changes • In the next part, we will illustrate the above claim • Since we're not allowed to mention any schemes out there, we thought we'd invent a new one! • Will consider high BWxDelay networks • With short buffers (10-20% of BWxDelay) • Small number of sources (e.g. 1 source) • Assume: ECN marking • Show that utilization can be as high as 100% • Short flows need not suffer large transfer times
Single Link Topology • We consider a single long link to begin with: A to B, 400Mbps, 30msec RTT • BWxDelay = 1000 pkts • Buffer used = 200 pkts • Packets are marked with a probability that grows with queue occupancy (figure: marking/sampling probability, from 1% to 100%, vs. queue occupancy; a sketch of such a marking function follows below) • Recall • A single TCP source needs BWxDelay amount of buffering to run at the line rate • With shorter buffers TCP loses throughput dramatically • This hurts TCP in very large BWxDelay networks
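Below is a minimal sketch of a queue-occupancy-based ECN marking function of the kind the figure depicts; the exact curve is not recoverable from the slide, so the 25% knee and the 1%-to-100% ramp are assumptions.

```python
def marking_probability(queue_occupancy, buffer_size):
    """ECN marking probability as a function of queue occupancy.

    A piecewise-linear ramp: about 1% marking at 25% occupancy rising
    to 100% marking at a full buffer.  The exact shape used in the
    talk's experiments is an assumption here.
    """
    frac = queue_occupancy / buffer_size
    if frac <= 0.25:
        return 0.01          # light marking below the low watermark
    if frac >= 1.0:
        return 1.0           # mark everything when the buffer is full
    # linear interpolation between (25% occupancy, 1%) and (100% occupancy, 100%)
    return 0.01 + (frac - 0.25) / 0.75 * 0.99
```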
Improvement: Take One • Clearly, cutting the window by a factor of 2 is harmful • The source takes a long time to build its window back up • So, let's consider a "multibit TCP" which allows us to cut the window by smaller factors • cwnd <-- cwnd * (1 - ECN/2^(n+1)), where ECN is the n-bit mark value fed back to the source (from 1 up to 2^n - 1, growing with queue occupancy) • E.g. with 6-bit TCP, the smallest cut multiplies the window by 127/128 (figures: mark value vs. queue occupancy, and sampling probability, 1% to 100%, vs. queue occupancy; a sketch of the update follows below)
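A minimal sketch of the multibit window cut described above; the function and parameter names are illustrative, only the update rule comes from the slide.

```python
def multibit_window_cut(cwnd, ecn_value, n_bits=6):
    """Multibit TCP window decrease: cwnd * (1 - ECN / 2**(n+1)).

    ecn_value is the n-bit congestion mark fed back to the source,
    from 0 (no congestion) up to 2**n_bits - 1 (heavy congestion).
    With n_bits = 6, the gentlest cut multiplies cwnd by 127/128 and
    the harshest cut multiplies it by roughly 1/2, as in standard TCP.
    """
    assert 0 <= ecn_value < 2 ** n_bits
    return cwnd * (1.0 - ecn_value / 2.0 ** (n_bits + 1))
```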
Single-link: Queue Occupancy (figures: queue occupancy over time with 1 source and with 2 sources)
Multiple Link Topology • Want to see if the improvements persist as we go to larger networks • Parking lot topology (figure: nodes 0-3 connected by 400Mbps, 30msec RTT links, carrying flows R1-R5)
Parking Lot Utilization: Single-hop Flows (figures: parking lot topology with 400Mbps, 30msec RTT links; utilization of the single-hop flows R1, R2, R3)
Parking Lot Utilization: Multi-hop Flows (figures: utilization of R5, the two-hop flow, and R4, the three-hop flow)
Summary • In both the single-link and the multiple-link topologies • Ensuring that TCP doesn't always cut its window by a factor of 2 is a good idea • Can we have this happen without using multiple bits?
Adaptive 1-bit TCP Source • We came up with this over the weekend, so it really is part of this talk and not "our favorite algorithm" • Source maintains an "average congestion seen" value AVE • Updating AVE (with ECN = 0 or 1): simple exponential averaging • AVE <-- (AVE + ECN)/2 • Note: AVE is between 0 and 1 • Using AVE: • cwnd <-- cwnd / (1 + AVE) • Decrease factor is between 1 and 2 • A sketch of the source logic follows below
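A minimal sketch of the adaptive 1-bit source, assuming the window is cut only when a marked acknowledgement arrives (the slide gives the two update rules but not the exact trigger); all names are illustrative.

```python
class AdaptiveOneBitSource:
    """Sketch of the adaptive 1-bit TCP decrease rule described above.

    Only the two update equations come from the talk; the class
    structure, the names, and the choice to cut only on a marked ACK
    are assumptions made for illustration.
    """

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ave = 0.0                    # average congestion seen, in [0, 1]

    def on_ack(self, ecn_bit):
        # AVE <-- (AVE + ECN) / 2: exponential averaging of the 1-bit feedback
        self.ave = (self.ave + ecn_bit) / 2.0
        if ecn_bit:
            # cwnd <-- cwnd / (1 + AVE): a cut by a factor between 1 and 2
            self.cwnd = self.cwnd / (1.0 + self.ave)
```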
Single-link: Queue Occupancy (figures: queue occupancy with 1 source and with 2 sources, under the adaptive 1-bit scheme)
Improving the transfer time for short flows • Ran out of time for this one • But, if we just use an initial window of 10 packets instead of 1, most short flows will complete within one RTT of the slow-start phase (a worked example follows below)
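A small sketch quantifying that claim under idealized slow start (window doubles every RTT, no losses); the function name and parameters are illustrative.

```python
def rtts_to_complete(flow_size_pkts, initial_window):
    """Rounds of idealized slow start needed to deliver a short flow.

    Assumes the window doubles every RTT and there are no losses,
    which is the idealized behaviour the slide appeals to.
    """
    sent, cwnd, rtts = 0, initial_window, 0
    while sent < flow_size_pkts:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# With an initial window of 10, a 10-packet flow finishes in 1 RTT;
# with an initial window of 1, the same flow needs 4 RTTs.
print(rtts_to_complete(10, 10), rtts_to_complete(10, 1))
```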
Conclusion • The wide-area Internet is quite a friendly environment compared to Ethernet, Fibre Channel and, certainly, wireless • Simple fixes exist (and are well known) for high BWxDelay networks • The relationship with buffer size is useful to understand • But short buffers are quite adequate • A fake watermark on the router buffer (e.g. at 50% of the buffer size) enables using smaller buffers and reduces bursty packet drops drastically