
Final EU DataTAG Review High Performance Network Demonstration March 24, 2004


Presentation Transcript


  1. Final EU DataTAG Review: High Performance Network Demonstration, March 24, 2004. Richard Hughes-Jones, The University of Manchester, UK. DataTAG is a project funded by the European Commission under contract IST-2001-32459.

  2. It works? What's the Problem with TCP
  • TCP has 2 phases: Slow Start & Congestion Avoidance
  • AIMD and high bandwidth, long distance networks: poor performance of TCP in high bandwidth wide area networks is due in part to the TCP congestion control algorithm and its congestion window, cwnd
  • For each ACK in an RTT without loss: cwnd -> cwnd + a / cwnd (Additive Increase, a = 1)
  • For each window experiencing loss: cwnd -> cwnd – b · cwnd (Multiplicative Decrease, b = ½)
  • Time to recover from a single packet loss on a ~100 ms rtt path is very long, since cwnd grows by only about one segment per rtt (see the sketch below)
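The last bullet can be made concrete with a short sketch. This is an illustrative model, not code from the demonstration: it applies the a = 1 / b = ½ rules above once per RTT and counts how long cwnd takes to regrow after a single loss on a 100 ms path; the 10 Gbit/s and 1500-byte-segment figures are assumed example values.

```python
# Illustrative AIMD recovery model (not DataTAG code): a = 1, b = 1/2 as on
# the slide, with the per-ACK rule cwnd += a/cwnd summing to roughly +a per RTT.

def rtts_to_recover(cwnd_before_loss: float, a: float = 1.0, b: float = 0.5) -> int:
    """RTTs for cwnd to climb back to its value before a single loss."""
    cwnd = cwnd_before_loss * (1 - b)   # multiplicative decrease on loss
    rtts = 0
    while cwnd < cwnd_before_loss:
        cwnd += a                       # additive increase, ~ +a per RTT
        rtts += 1
    return rtts

if __name__ == "__main__":
    # Assumed example: ~10 Gbit/s on a 100 ms path with 1500-byte segments
    # needs cwnd of roughly 10e9 * 0.1 / (1500 * 8) ~ 83,000 segments.
    rtts = rtts_to_recover(83_000)
    print(f"~{rtts} RTTs, i.e. ~{rtts * 0.1 / 60:.0f} minutes at 100 ms rtt")
```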

  3. Investigation of new TCP Stacks
  • High Speed TCP: a and b vary depending on the current cwnd, using a table
    • a increases more rapidly with larger cwnd, so the flow returns to the 'optimal' cwnd size sooner for the network path
    • b gives a less aggressive decrease, and as a consequence so does cwnd; the effect is that throughput does not drop as much
  • Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd
    • a = 1/100 – the increase is greater than TCP Reno
    • b = 1/8 – the decrease on loss is less than TCP Reno
    • Scalable over any link speed (compared with Reno in the sketch below)
  • Fast TCP: uses round trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput
  • HSTCP-LP: High Speed (Low Priority) – backs off if rtt increases
  • BiC-TCP: additive increase at large cwnd; binary search at small cwnd
  • H-TCP: after congestion behaves as standard TCP, then switches to high performance
  • ●●●
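As a rough comparison of the constants above (again a sketch with assumed modelling, not DataTAG code): Reno is modelled as +1 segment per RTT with a halving on loss, Scalable TCP as +1% per RTT (a = 1/100 applied per ACK) with a cut of 1/8 on loss. The contrast shows how much faster the fixed multiplicative rule recovers a large window.

```python
# Per-RTT model of window recovery after one loss (assumptions in the text).

def recovery_rtts(cwnd0: float, grow, cut) -> int:
    """RTTs for cwnd to return to its pre-loss value after one loss event."""
    cwnd, rtts = cut(cwnd0), 0
    while cwnd < cwnd0:
        cwnd = grow(cwnd)
        rtts += 1
    return rtts

reno     = recovery_rtts(83_000, lambda w: w + 1,    lambda w: w / 2)      # a = 1, b = 1/2
scalable = recovery_rtts(83_000, lambda w: w * 1.01, lambda w: w * 7 / 8)  # a = 1/100, b = 1/8
print(f"Reno: {reno} RTTs to recover, Scalable TCP: {scalable} RTTs")
```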

  4. Comparison of TCP Stacks
  • TCP response function: throughput vs loss rate – the steeper the curve, the faster the recovery (see the sketch below)
  • Packets dropped in the kernel to emulate loss
  • Testbeds: MB-NG rtt 6 ms, DataTAG rtt 120 ms
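The response function for standard TCP is usually summarised by the Mathis approximation, rate ≈ (MSS/RTT) · 1.22/√p. The sketch below evaluates it for the two testbed RTTs quoted above; the MSS and loss probabilities are assumed example values, not measurements from the demonstration.

```python
# Standard-TCP response function (Mathis et al. approximation); example values only.
from math import sqrt

def reno_rate_bps(mss_bytes: int, rtt_s: float, loss_prob: float) -> float:
    """Approximate steady-state Reno throughput in bit/s."""
    return (mss_bytes * 8 / rtt_s) * 1.22 / sqrt(loss_prob)

for name, rtt in [("MB-NG, 6 ms", 0.006), ("DataTAG, 120 ms", 0.120)]:
    for p in (1e-4, 1e-6):
        print(f"{name:16s} loss {p:.0e}: ~{reno_rate_bps(1460, rtt, p) / 1e6:7.0f} Mbit/s")
```

With these example numbers the long-RTT path needs a far lower loss rate to reach the same throughput, which is why the steepness of the response curve matters so much on the 120 ms DataTAG path.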

  5. 10 Gigabit: Tuning PCI-X
  [Plot: PCI-X bus occupancy for mmrbc = 512, 1024, 2048 and 4096 bytes, showing the CSR access, PCI-X sequence, data transfer and interrupt & CSR update phases; 5.7 Gbit/s at mmrbc 4096 bytes]
  • 16080 byte packets every 200 µs
  • Intel PRO/10GbE LR Adapter
  • PCI-X bus occupancy vs mmrbc
  • Measured times
  • Times based on PCI-X times from the logic analyser
  • Expected throughput ~7 Gbit/s (see the bus-efficiency sketch below)
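A rough model of why mmrbc matters (with my own assumed per-burst overhead, not the logic-analyser timings from the slide): on a 64-bit / 133 MHz PCI-X bus each read burst carries at most mmrbc bytes and pays a fixed overhead for arbitration and the address/attribute phases, so small mmrbc values waste a large fraction of the bus.

```python
# Rough PCI-X efficiency model; the per-burst overhead is an assumption.
BUS_HZ = 133e6                 # 64-bit PCI-X at 133 MHz
BYTES_PER_CLK = 8
OVERHEAD_CLKS = 40             # assumed arbitration + address/attribute + CSR cost per burst

for mmrbc in (512, 1024, 2048, 4096):
    data_clks = mmrbc / BYTES_PER_CLK
    efficiency = data_clks / (data_clks + OVERHEAD_CLKS)
    gbps = BUS_HZ * BYTES_PER_CLK * 8 * efficiency / 1e9
    print(f"mmrbc {mmrbc:4d} bytes: ~{gbps:3.1f} Gbit/s usable on the bus")
```

With these assumed numbers the trend matches the slide: larger mmrbc values push the usable bus bandwidth towards the ~7 Gbit/s expected throughput.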

  6. Multi-Gigabit flows at the SC2003 BW Challenge
  • Three server systems with 10 GigEthernet NICs
  • Used the DataTAG altAIMD stack, 9000 byte MTU
  • Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
    • Palo Alto PAIX – rtt 17 ms, window 30 MB; shared with the Caltech booth; 4.37 Gbit/s HSTCP (I=5%), then 2.87 Gbit/s (I=16%) – fall when 10 Gbit/s on the link; 3.3 Gbit/s Scalable (I=8%); tested 2 flows, sum 1.9 Gbit/s (I=39%)
    • Chicago Starlight – rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz; 3.1 Gbit/s HSTCP (I=1.6%)
    • Amsterdam SARA – rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz; 4.35 Gbit/s HSTCP (I=6.9%); very stable
  • Both used Abilene to Chicago
  (the configured windows track the path bandwidth-delay products – see the sketch below)
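The window sizes quoted above are essentially the bandwidth-delay products of the three paths. A quick check, assuming a ~10 Gbit/s target rate (the per-path RTTs are from the slide):

```python
# Bandwidth-delay product for the three SC2003 paths (target rate assumed).
paths = {"Palo Alto PAIX": 0.017, "Chicago Starlight": 0.065, "Amsterdam SARA": 0.175}
target_bps = 10e9

for name, rtt in paths.items():
    bdp_mbytes = target_bps * rtt / 8 / 1e6
    print(f"{name:18s} rtt {rtt * 1e3:5.0f} ms: BDP ~ {bdp_mbytes:4.0f} MBytes")
```

These land in the same range as the 30, 60 and 200 MB windows configured for the demonstration.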

  7. MB-NG Application design – Throughput
  • 2 Gbyte file transferred from RAID0 disks
  • Web100 output every 10 ms
  • GridFTP: see alternating 600/800 Mbit/s and zero
  • Apache web server + curl-based client: see steady 720 Mbit/s

  8. DataTAG Testbed

  9. High Throughput Demo
  [Network diagram: Caltech PoP – Starlight – Chicago and CERN PoP – DataTAG – Geneva. Dual Xeon 3 GHz servers (v11chi in Chicago, v11gva in Geneva) attach over 10 GEth through Cisco 7609 / 7606 and Juniper T320 routers, joined by OC-192 and the 10 Gb SDH DataTAG link. Data is sent with TCP, packets are dropped, and TCP is monitored with Web100.]

  10. HS-TCP
  • Slow start then congestion avoidance phases
  • Drop 1 in 10⁶

  11. HS-TCP with limited slowstart
  • Drop 1 in 10⁶, but with Limited Slowstart

  12. HS-TCP 2
  • Drop 1 in 10⁶

  13. Scalable TCP
  • Drop 1 in 10⁶

  14. Standard Reno TCP
  • Drop 1 in 10⁶
  • Transition from HighSpeed to Standard TCP at 520 s

  15. Standard Reno TCP
  • Drop 1 in 10⁶

  16. Helping Real Users: Throughput CERN – SARA
  • Using the GÉANT backup link
  • 1 GByte disk-disk transfers (plots: blue is the data, red is the TCP ACKs)
  • Standard TCP: average throughput 167 Mbit/s – users see 5 - 50 Mbit/s!
  • High-Speed TCP: average throughput 345 Mbit/s
  • Scalable TCP: average throughput 340 Mbit/s
  • Technology link to the DataGrid & GÉANT EU projects

  17. Summary
  • Multi-Gigabit transfers are possible and stable
  • Demonstrated that new TCP stacks help performance
  • DataTAG has made major contributions to the understanding of high-speed networking
  • There has been significant technology transfer between DataTAG and other projects
  • Now reaching out to real users
  • But still much research to do:
    • Achieving performance – protocol vs implementation issues
    • Stability / sharing issues
    • Optical transports & hybrid networks

  18.

  19. MB-NG GridFTP Throughput + Web100
  • Throughput (Mbit/s): see alternating 600/800 Mbit/s and zero
  • Cwnd smooth
  • No duplicate ACKs / send stalls / timeouts

  20. MB-NG HTTP data transfers with HighSpeed TCP
  • Apache web server out of the box!
  • Prototype client – curl http library (see the buffer-setting sketch below)
  • 1 Mbyte TCP buffers
  • 2 Gbyte file
  • Throughput 72 MBytes/s
  • Cwnd – some variation
  • No duplicate ACKs / send stalls / timeouts
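The DataTAG client itself was built on the curl http library; purely as an illustration of the "1 Mbyte TCP buffers" point, here is a minimal HTTP/1.1 download that sets a 1 MB receive buffer before connecting. The host and path are placeholders, and this is not the prototype client from the slide.

```python
# Illustrative HTTP download with a 1 MB TCP receive buffer (placeholder host/path).
import socket

HOST, PATH, BUF = "server.example.org", "/2GB.dat", 1 << 20

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)   # set before connect
sock.connect((HOST, 80))
sock.sendall(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())

total = 0
while chunk := sock.recv(65536):
    total += len(chunk)            # counts headers + body; fine for a sketch
sock.close()
print(f"received {total / 2**20:.1f} MiB")
```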

  21. The performance of the end host / disks – BaBar Case Study: RAID BW & PCI Activity
  • 3Ware 7500-8 RAID5 controller, parallel EIDE disks
  • 3Ware forces the PCI bus to 33 MHz
  • BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
  • Disk-disk throughput with bbcp: 40-45 Mbytes/s (320 - 360 Mbit/s)
  • PCI bus effectively full! (see the budget sketch below)
  • User throughput ~250 Mbit/s
  • User surprised!!
  [Plots: PCI activity when reading from and writing to the RAID5 disks]
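A back-of-envelope PCI budget for the case above (the bus width is my assumption; the slide only says the 3Ware card forces the bus to 33 MHz). Every byte of a disk-to-network transfer crosses the PCI bus at least twice, once from the RAID controller into memory and once from memory to the NIC.

```python
# Naive PCI budget; 33 MHz from the slide, bus width assumed.
disk_mbps = 45 * 8                                # ~360 Mbit/s disk rate (from the slide)

for width_bits in (32, 64):
    peak_mbps = 33e6 * width_bits / 1e6           # theoretical bus peak, Mbit/s
    leftover = peak_mbps - 2 * disk_mbps          # after two crossings of the disk data
    print(f"{width_bits}-bit @ 33 MHz: peak ~{peak_mbps:4.0f} Mbit/s, "
          f"~{leftover:4.0f} Mbit/s left after disk traffic")
```

On a 32-bit bus the leftover is of the same order as the ~250 Mbit/s the user actually saw, consistent with the "PCI bus effectively full" observation.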
