Final EU DataTAG Review: High Performance Network Demonstration, March 24, 2004. Richard Hughes-Jones, The University of Manchester, UK. DataTAG is a project funded by the European Commission under contract IST-2001-32459.
It works? So what's the Problem with TCP? • TCP has 2 phases: Slow-start & Congestion Avoidance • AIMD and High Bandwidth - Long Distance networks: poor performance of TCP in high-bandwidth wide-area networks is due in part to the TCP congestion control algorithm and its congestion window cwnd • For each ACK in an RTT without loss: cwnd -> cwnd + a/cwnd (Additive Increase, a = 1) • For each window experiencing loss: cwnd -> cwnd - b*cwnd (Multiplicative Decrease, b = 1/2) • Time to recover from a single packet loss on a ~100 ms rtt path is very long
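A minimal Python sketch of the AIMD rules above, illustrating why a single loss is so costly on a long-distance path; the segment size, RTT and link rate are illustrative assumptions, not figures from the demonstration.

```python
# AIMD recovery sketch for standard TCP (a = 1, b = 1/2).
# Link rate, RTT and segment size are assumed, illustrative values.

MSS = 1460          # segment size in bytes (assumed)
RTT = 0.1           # 100 ms round-trip time (assumed)
LINK_RATE = 1e9     # 1 Gbit/s path (assumed)
A, B = 1.0, 0.5     # additive increase, multiplicative decrease

# cwnd (in segments) needed to fill the pipe: bandwidth-delay product / MSS
cwnd_full = LINK_RATE * RTT / (8 * MSS)

# One loss halves cwnd; congestion avoidance then adds ~A segments per RTT,
# so recovery takes roughly (B * cwnd_full) / A round trips.
rtts_to_recover = B * cwnd_full / A
print(f"cwnd to fill the pipe : {cwnd_full:.0f} segments")
print(f"RTTs to recover       : {rtts_to_recover:.0f}")
print(f"recovery time         : {rtts_to_recover * RTT / 60:.1f} minutes")
```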
Investigation of new TCP Stacks • HighSpeed TCP: a and b vary with the current cwnd, using a table • a increases more rapidly at larger cwnd - the flow returns to the 'optimal' cwnd for the network path sooner • b decreases less aggressively, so cwnd is cut less on loss and throughput does not drop as far • Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd • a = 1/100 - the increase per ACK is greater than TCP Reno's • b = 1/8 - the decrease on loss is smaller than TCP Reno's • Scalable over any link speed • Fast TCP: uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium throughput • HSTCP-LP: HighSpeed TCP (Low Priority) - backs off if the rtt increases • BiC-TCP: additive increase at large cwnd, binary search at small cwnd • H-TCP: behaves as standard TCP after congestion, then switches to high-performance mode • ...
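As a rough illustration of how the fixed Scalable TCP constants quoted above change recovery time compared with Reno, here is a per-RTT sketch in Python. HighSpeed TCP's table of a(cwnd), b(cwnd) values is omitted, and the 8000-segment starting window is an arbitrary assumption.

```python
# Per-RTT cwnd evolution for Reno and Scalable TCP, using the constants
# quoted above (HighSpeed TCP's a(cwnd), b(cwnd) table is not modelled).

def rtts_to_recover(w_target, loss, per_rtt_increase):
    """RTTs needed to climb back to w_target after a single loss event."""
    w = loss(w_target)
    rtts = 0
    while w < w_target:
        w = per_rtt_increase(w)
        rtts += 1
    return rtts

# Reno: +a segments per RTT, halve on loss (a = 1, b = 1/2)
reno = rtts_to_recover(8000,
                       loss=lambda w: w * 0.5,
                       per_rtt_increase=lambda w: w + 1)

# Scalable: cwnd += 1/100 per ACK => cwnd grows ~1% per RTT; b = 1/8 on loss
scalable = rtts_to_recover(8000,
                           loss=lambda w: w * (1 - 1/8),
                           per_rtt_increase=lambda w: w * 1.01)

print(f"Reno     : ~{reno} RTTs to recover")      # ~4000 RTTs
print(f"Scalable : ~{scalable} RTTs to recover")  # ~14 RTTs
```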
Comparison of TCP Stacks • TCP Response Function: throughput vs loss rate - the steeper the curve, the faster the recovery • Packets dropped in the kernel to control the loss rate • Tests on MB-NG (rtt 6 ms) and DataTAG (rtt 120 ms) paths
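For reference, the simplified steady-state response functions usually used in such comparisons can be sketched as below; the Reno and HighSpeed forms follow the standard models (e.g. RFC 3649), and the Scalable form uses a = 1/100, b = 1/8 from the previous slide. Throughput is then roughly window x MSS / RTT.

```python
# Simplified steady-state response functions: window (in segments) as a
# function of the per-packet loss rate p.
#   Reno      w ~ 1.22 / sqrt(p)           (standard simplified model)
#   HighSpeed w ~ 0.12 / p**0.835          (RFC 3649)
#   Scalable  w ~ a / (b * p) = 0.08 / p   (a = 1/100, b = 1/8)

import math

def reno_window(p):
    return 1.22 / math.sqrt(p)

def hstcp_window(p):
    return 0.12 / p ** 0.835

def scalable_window(p):
    return 0.01 / (0.125 * p)

for p in (1e-3, 1e-5, 1e-7):
    print(f"p={p:.0e}  Reno={reno_window(p):12.0f}  "
          f"HSTCP={hstcp_window(p):12.0f}  Scalable={scalable_window(p):12.0f}")
```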
10 Gigabit: Tuning PCI-X • 16080 byte packets every 200 µs • Intel PRO/10GbE LR Adapter • PCI-X bus occupancy vs mmrbc (max memory read byte count) • Measured times, and times based on PCI-X timings from the logic analyser • 5.7 Gbit/s achieved with mmrbc = 4096 bytes; expected throughput ~7 Gbit/s • [Plots: PCI-X sequence (CSR access, data transfer, interrupt & CSR update) for mmrbc = 512, 1024, 2048 and 4096 bytes]
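A back-of-envelope sketch of why larger mmrbc values help: a 16080-byte packet is split into ceil(16080 / mmrbc) PCI-X bursts, each paying a fixed setup cost. The per-burst overhead below is an assumed figure for illustration, not the value measured with the logic analyser.

```python
# Rough PCI-X bus occupancy per 16080-byte packet vs mmrbc.
# OVERHEAD_CLKS is an assumed per-burst setup cost (address/attribute/turnaround).

BUS_HZ = 133e6          # PCI-X 133 MHz
BUS_BYTES_PER_CLK = 8   # 64-bit bus
PACKET = 16080          # bytes per packet (from the measurements above)
OVERHEAD_CLKS = 20      # assumed setup cycles per burst

for mmrbc in (512, 1024, 2048, 4096):
    bursts = -(-PACKET // mmrbc)                  # ceiling division
    data_clks = PACKET / BUS_BYTES_PER_CLK
    total_clks = data_clks + bursts * OVERHEAD_CLKS
    occupancy_us = total_clks / BUS_HZ * 1e6
    ceiling_gbit = PACKET * 8 / (total_clks / BUS_HZ) / 1e9
    print(f"mmrbc {mmrbc:4d}: {bursts:2d} bursts, "
          f"{occupancy_us:5.1f} us on the bus, ~{ceiling_gbit:.1f} Gbit/s ceiling")
```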
Multi-Gigabit flows at the SC2003 BW Challenge • Three server systems with 10 GigEthernet NICs • Used the DataTAG altAIMD stack with 9000 byte MTU • Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to: • Palo Alto PAIX - rtt 17 ms, window 30 MB; shared with the Caltech booth; 4.37 Gbit/s HSTCP I=5%, then 2.87 Gbit/s I=16%; throughput fell when 10 Gbit was on the link; 3.3 Gbit/s Scalable I=8%; tested 2 flows, sum 1.9 Gbit/s I=39% • Chicago Starlight - rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz; 3.1 Gbit/s HSTCP I=1.6% • Amsterdam SARA - rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz; 4.35 Gbit/s HSTCP I=6.9%, very stable • Both used Abilene to Chicago
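The windows quoted above set a ceiling on what each flow can achieve, since TCP throughput is capped at roughly window / RTT; a quick check (taking MB as 10^6 bytes):

```python
# The TCP window caps throughput at window / RTT; a sanity check of the
# window sizes quoted above against the 10 Gbit/s NICs used at SC2003.

paths = {                        # (RTT in seconds, window in bytes)
    "Palo Alto (PAIX)":    (0.017, 30e6),
    "Chicago (Starlight)": (0.065, 60e6),
    "Amsterdam (SARA)":    (0.175, 200e6),
}

for name, (rtt, window) in paths.items():
    ceiling_gbit = window * 8 / rtt / 1e9
    print(f"{name:22s}: window/RTT ceiling ~ {ceiling_gbit:4.1f} Gbit/s")
```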
MB-NG Application Design - Throughput • 2 Gbyte file transferred between RAID0 disks • Web100 output every 10 ms • GridFTP: throughput alternates between 600-800 Mbit/s and zero • Apache web server + curl-based client: steady 720 Mbit/s
High Throughput Demo • Send data with TCP, drop packets, monitor TCP with Web100 • [Diagram: Caltech PoP (Starlight, Chicago) - Cisco 7609 - Juniper T320 - 10 Gb SDH DataTAG link (OC-192) - Juniper T320 - Cisco 7606 - CERN PoP (DataTAG, Geneva); dual Xeon 3 GHz servers v11chi and v11gva attached by 10 GEth at each end]
HS-TCP • Slow-start then congestion avoidance phases • Drop 1 in 10^6
HS-TCP with Limited Slow-start • Drop 1 in 10^6, but with Limited Slow-start
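Limited Slow-start (RFC 3742) caps the exponential window growth once cwnd passes a threshold, so very large windows are approached more gently. A sketch of the per-ACK rule follows; max_ssthresh = 100 segments is an arbitrary illustrative choice.

```python
# Sketch of the Limited Slow-Start rule (RFC 3742), working in segments.

def slowstart_ack(cwnd, max_ssthresh=100):
    """New cwnd after one ACK during (limited) slow-start."""
    if cwnd <= max_ssthresh:
        return cwnd + 1                     # standard slow-start: +1 per ACK
    k = int(cwnd / (0.5 * max_ssthresh))    # throttling factor, k >= 2
    return cwnd + 1.0 / k                   # at most max_ssthresh/2 per RTT

def one_rtt(cwnd, max_ssthresh=100):
    """Apply the per-ACK rule once per segment outstanding at the start of the RTT."""
    w = cwnd
    for _ in range(int(cwnd)):
        w = slowstart_ack(w, max_ssthresh)
    return w

w = 10.0
for _ in range(10):
    w = one_rtt(w)
print(f"cwnd after 10 RTTs of limited slow-start: {w:.0f} segments")
# Unlimited slow-start would have reached ~10 * 2**10 = 10240 segments.
```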
HS-TCP (2) • Drop 1 in 10^6
Scalable TCP • Drop 1 in 10^6
Standard Reno TCP • Drop 1 in 10^6 • Transition from HighSpeed to Standard TCP at 520 s
Standard Reno TCP • Drop 1 in 10^6
Helping Real Users: Throughput CERN - SARA • Using the GÉANT backup link • 1 GByte disk-disk transfers • Blue is the data, red is the TCP ACKs • Standard TCP: average throughput 167 Mbit/s - users typically see 5 - 50 Mbit/s! • HighSpeed TCP: average throughput 345 Mbit/s • Scalable TCP: average throughput 340 Mbit/s • Technology link to the DataGrid & GÉANT EU projects
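To put those averages in context, a quick conversion into transfer time per GByte; the "typical user rate" below is simply the midpoint of the 5 - 50 Mbit/s range quoted above, not a measured value.

```python
# What these average rates mean for the 1 GByte disk-to-disk transfers above.

FILE_BITS = 1e9 * 8   # 1 GByte file (taking GB as 10**9 bytes)

for name, mbit in (("Typical user rate", 25), ("Standard TCP", 167),
                   ("HighSpeed TCP", 345), ("Scalable TCP", 340)):
    seconds = FILE_BITS / (mbit * 1e6)
    print(f"{name:18s}: {mbit:4d} Mbit/s -> {seconds/60:5.1f} min per GByte")
```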
Summary • Multi-Gigabit transfers are possible and stable • Demonstrated that new TCP stacks help performance • DataTAG has made major contributions to the understanding of high-speed networking • There has been significant technology transfer between DataTAG and other projects • Now reaching out to real users • But there is still much research to do: • Achieving performance - protocol vs implementation issues • Stability / sharing issues • Optical transports & hybrid networks
MB-NG GridFTP Throughput + Web100 • Throughput in Mbit/s: alternates between 600-800 Mbit/s and zero • Cwnd is smooth • No duplicate ACKs / send stalls / timeouts
MB-NG HTTP Data Transfers with HighSpeed TCP • Apache web server out of the box! • Prototype client based on the curl http library • 1 Mbyte TCP buffers • 2 Gbyte file • Throughput 72 MBytes/s • Cwnd shows some variation • No duplicate ACKs / send stalls / timeouts
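The actual client was built on the curl http library in C; purely as an illustration of the kind of socket-buffer tuning involved, a Python sketch requesting ~1 Mbyte TCP buffers looks like this (the kernel may clamp the request to its configured maximum):

```python
# Illustrative socket-buffer tuning: request ~1 Mbyte TCP buffers before use.

import socket

BUF_BYTES = 1 * 1024 * 1024   # 1 Mbyte, as on the slide

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)

# The kernel may clamp the request (e.g. to net.core.rmem_max on Linux);
# read back what was actually granted.
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"receive buffer granted: {granted} bytes")
```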
BaBar Case Study: RAID BW & PCI Activity - the performance of the end host / disks • 3Ware 7500-8 RAID5 controller with parallel EIDE disks • The 3Ware card forces the PCI bus to 33 MHz • BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s • Disk-disk throughput with bbcp 40-45 Mbytes/s (320 - 360 Mbit/s) • PCI bus effectively full! • User throughput ~250 Mbit/s • User surprised!! • [Plots: read from RAID5 disks; write to RAID5 disks]
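A rough PCI budget behind the "bus effectively full" observation: once the 3Ware card drags the shared bus down to 33 MHz, disk-to-memory and memory-to-NIC traffic compete for the same capacity. The bus width and efficiency factor below are assumptions for illustration, not measured values.

```python
# Back-of-envelope PCI budget for a disk-to-network transfer on a bus
# forced to 33 MHz. Width and efficiency are assumed, illustrative figures.

BUS_HZ = 33e6
BUS_BYTES_PER_CLK = 8       # assuming a 64-bit PCI segment
EFFICIENCY = 0.6            # assumed: protocol overhead, arbitration, gaps

raw_gbit = BUS_HZ * BUS_BYTES_PER_CLK * 8 / 1e9
usable_gbit = raw_gbit * EFFICIENCY

# Every payload byte crosses the bus twice: RAID controller -> memory,
# then memory -> NIC.
per_flow_gbit = usable_gbit / 2
print(f"raw 64-bit / 33 MHz PCI : {raw_gbit:.2f} Gbit/s")
print(f"usable (assumed 60%)    : {usable_gbit:.2f} Gbit/s")
print(f"disk-to-network ceiling : ~{per_flow_gbit * 1000:.0f} Mbit/s")
```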