Explore the challenges and solutions for achieving high TCP performance over wide area networks. Topics include TCP fairness, network buffers, MTU bias, and the impact of packet loss. Discover how to optimize TCP performance for the future of computational grids.
Performance Engineering E2E piPEs and FastTCP • Internet2 member meeting - Indianapolis • World Telecom 2003 - Geneva • October 15, 2003 • ravot@caltech.edu
Agenda • High TCP performance over wide area networks: • TCP at Gbps speed • MTU bias • RTT bias • TCP fairness • How to use 100% of the link capacity with TCP Reno • Impact of network buffers • New Internet2 Land Speed Record
Single TCP stream performance under periodic losses • TCP throughput is much more sensitive to packet loss in WANs than in LANs • TCP's congestion control algorithm (AIMD) is not suited to gigabit networks • Poor, limited feedback mechanisms • The effect of packet loss is disastrous • TCP is inefficient in high bandwidth*delay networks • The future performance of computational grids looks bad if we continue to rely on the widely-deployed TCP Reno • Example with 1 Gbps of available bandwidth and a loss rate of 0.01%: LAN BW utilization = 99%, WAN BW utilization = 1.2%
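A quick way to see where the 1.2% figure comes from is the well-known Mathis et al. steady-state estimate for TCP Reno, throughput ≈ 1.22 · MSS / (RTT · √p). The sketch below is illustrative only: the 1 ms LAN RTT is an assumption, while the 120 ms RTT and 1460-byte MSS are the Geneva-Chicago parameters used throughout these slides.

```python
# Hedged sketch: the Mathis et al. steady-state estimate for TCP Reno
# reproduces the LAN/WAN utilization gap quoted above.
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    return 1.22 * mss_bytes * 8 / (rtt_s * sqrt(loss_rate))

link = 1e9                    # 1 Gbit/s available
p = 1e-4                      # loss rate = 0.01%
for name, rtt in (("LAN", 0.001), ("WAN", 0.120)):   # 1 ms LAN RTT is assumed
    bw = min(mathis_throughput_bps(1460, rtt, p), link)
    print(f"{name}: {100 * bw / link:.1f}% utilization")
# LAN: ~100% (capped by the link), WAN: ~1.2%
```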
Responsiveness (I) • The responsiveness r measures how quickly a connection returns to full link utilization after a loss, assuming the congestion window equals the bandwidth-delay product when the packet is lost • C: capacity of the link • r = C · RTT² / (2 · MSS)
Responsiveness (II) • The Linux 2.4.x kernel implements delayed acknowledgments • With delayed ACKs the congestion window grows by one MSS every two RTTs instead of every RTT, so the responsiveness is multiplied by two • The values above therefore have to be doubled!
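As an illustration, a minimal sketch of the responsiveness formula, evaluated for the Geneva-Chicago parameters used on the next slide (1 Gbit/s, 120 ms RTT) with standard and jumbo-frame MSS values; the factor of 2 reproduces the delayed-ACK effect described above.

```python
# Sketch: TCP Reno responsiveness r = C * RTT^2 / (2 * MSS), i.e. the time
# to grow cwnd from BDP/2 back to BDP at one MSS per RTT.
def responsiveness(capacity_bps, rtt_s, mss_bytes, delayed_ack=False):
    r = capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)
    return 2 * r if delayed_ack else r   # delayed ACKs: one MSS per two RTTs

C, RTT = 1e9, 0.120                      # 1 Gbit/s, 120 ms (Geneva-Chicago)
for mss in (1460, 8960):                 # standard vs jumbo-frame MSS
    for dack in (False, True):
        print(f"MSS={mss} B, delayed ACK={dack}: "
              f"{responsiveness(C, RTT, mss, dack):.0f} s")
```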
Single TCP stream • Time to increase the throughput from 100 Mbps to 900 Mbps = 35 minutes • Loss occurs when the throughput reaches the pipe size • 75% of bandwidth utilization (assuming no buffering) • Cwnd < BDP: • Throughput < Bandwidth • RTT constant • Throughput = Cwnd / RTT • TCP connection between Geneva and Chicago: C = 1 Gbit/s; MSS = 1,460 Bytes; RTT = 120 ms
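A back-of-the-envelope check of the 35-minute figure, assuming Reno's growth of one MSS per RTT halved by delayed acknowledgments (as on the Linux 2.4.x kernel noted earlier); it is only an approximation, but it lands in the right range.

```python
# Growing from 100 Mb/s to 900 Mb/s requires cwnd to gain
# delta_bw * RTT / MSS segments, at roughly one segment per two RTTs
# once delayed ACKs are taken into account.
RTT = 0.120                       # seconds
MSS = 1460 * 8                    # bits per segment
delta_bw = 800e6                  # 900 Mb/s - 100 Mb/s
rtts_needed = delta_bw * RTT / MSS
t = rtts_needed * RTT * 2         # factor 2 for delayed acknowledgments
print(f"~{t / 60:.0f} minutes")   # ~33 minutes, close to the 35 observed
```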
Measurements with Different MTUs • In both cases: 75% of the link utilization • A large MTU accelerates the growth of the window • The time to recover from a packet loss decreases with a large MTU • A larger MTU reduces the per-frame overhead (saves CPU cycles, reduces the number of packets) • TCP connection between Geneva and Chicago: C = 1 Gbit/s; RTT = 120 ms
MTU and Fairness • Two TCP streams share a 1 Gbps bottleneck • RTT = 117 ms • MTU = 1500 Bytes; avg. throughput over a period of 4000 s = 50 Mb/s • MTU = 9000 Bytes; avg. throughput over a period of 4000 s = 698 Mb/s • Factor 14! • Connections with a large MTU increase their rate quickly and grab most of the available bandwidth • [Testbed: CERN (GVA) to Starlight (Chi); hosts #1 and #2 on 1 GE at each end, GbE switch, POS 2.5 Gbps WAN link; the 1 GE link is the bottleneck]
RTT and Fairness • Two TCP streams share a 1 Gbps bottleneck • CERN <-> Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s • CERN <-> Starlight: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s • MTU = 9000 bytes • The connection with the smaller RTT increases its rate quickly and grabs most of the available bandwidth (see the sketch below) • [Testbed: CERN (GVA), Starlight (Chi) and Sunnyvale; hosts on 1 GE, GbE switch, POS 2.5 Gb/s and POS 10 Gb/s WAN links, 10GE; the 1 GE link is the bottleneck]
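Both biases follow from the AIMD increase rule: in congestion avoidance a Reno flow adds one MSS per RTT, so its rate grows at roughly MSS / RTT², favoring large-MTU and short-RTT flows. The sketch below is a rule-of-thumb comparison using the numbers from the two slides above, not a simulation of the actual experiments.

```python
# Rough AIMD growth-rate comparison: rate gain ~ MSS / RTT^2 (bits/s per s).
def growth_rate(mss_bytes, rtt_s):
    return mss_bytes * 8 / rtt_s ** 2

# MTU bias: same RTT (117 ms), MTU 9000 vs 1500 bytes -> 6x faster ramp-up
print(growth_rate(9000, 0.117) / growth_rate(1500, 0.117))

# RTT bias: same MTU (9000 bytes), RTT 117 ms vs 181 ms -> ~2.4x faster
print(growth_rate(9000, 0.117) / growth_rate(9000, 0.181))
```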
How to use 100% of the bandwidth? • Single TCP stream GVA - CHI • MSS = 8960 Bytes; throughput = 980 Mbps • Cwnd > BDP => Throughput = Bandwidth • RTT increases • Extremely large buffer at the bottleneck • Network buffers have an important impact on performance • Do buffers have to be dimensioned to scale with the BDP? • Why not use the end-to-end delay as a congestion indication?
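For reference, the bandwidth-delay product of this path is 1 Gbit/s × 120 ms ≈ 15 MB, which is also the socket buffer size an end host needs to keep the pipe full. A minimal sketch, assuming a Linux-like host where the requested sizes are not clipped by the kernel limits (net.core.rmem_max / wmem_max):

```python
# Size the TCP socket buffers to the bandwidth-delay product of the
# Geneva-Chicago path so a single stream can fill the link.
import socket

C_bps, RTT_s = 1e9, 0.120
bdp_bytes = int(C_bps * RTT_s / 8)   # ~15 MB

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
print("requested socket buffers:", bdp_bytes, "bytes")
```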
Single stream TCP performance • New submission (Oct-11): 5.65 Gbps from Geneva to Los Angeles across the LHCnet, Starlight, Abilene and CENIC.
Early 10 Gb/s 10,000 km TCP Testing • Single TCP stream at 5.65 Gbps • Transferring a full CD in less than 1 s • Uncongested network • No packet loss during the transfer • Probably qualifies as a new Internet2 LSR • [Figure: monitoring of the Abilene traffic in LA]
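The arithmetic behind the one-second CD claim, assuming a standard ~650 MB CD (the CD size is an assumption, not stated on the slide):

```python
# A ~650 MB CD moved at the sustained 5.65 Gb/s single-stream rate above.
cd_bits = 650e6 * 8                    # ~650 MB in bits
rate_bps = 5.65e9
print(f"{cd_bits / rate_bps:.2f} s")   # ~0.92 s
```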
Conclusion • The future performance of computational grids looks bad if we continue to rely on the widely-deployed TCP Reno • How should fairness be defined? • Taking into account the MTU • Taking into account the RTT • Larger packet sizes (jumbogram: payload larger than 64 KB) • Is the standard MTU the largest bottleneck? • New Intel 10GE cards: MTU = 16 KB • J. Cain (Cisco): "It's very difficult to build switches to switch large packets such as jumbograms" • Our vision of the network: "The network, once viewed as an obstacle for virtual collaborations and distributed computing in grids, can now start to be viewed as a catalyst instead. Grid nodes distributed around the world will simply become depots for dropping off information for computation or storage, and the network will become the fundamental fabric for tomorrow's computational grids and virtual supercomputers"