USN Testbeds Les Cottrell Site visit to SLAC by DoE program managers Thomas Ndousse & Mary Anne Scott April 27, 2005 www.slac.stanford.edu/grp/scs/net/talk05/testbeds-apr05.ppt Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM)
UltraLight • CalTech/UMich lead, NSF funded project • Hybrid circuits (IP & dedicated)
UltraScienceNet • ORNL lead, DoE funded • Dedicated circuits
UL Testbed 10Gbits/s • Sunnyvale (interim until ESnet 10Gbps circuits reach SLAC, July 2005): • Currently UltraLight • Cisco 6509 from UltraLight proposal • Four Sun v20z 1.8GHz Opterons loaned from BaBar • 10GE TOE NICs loaned from Chelsio • Four Neterion (S2io) 10GE NICs purchased • Installed with Solaris-10 and Linux 2.6 • Will get file server from Caltech • Remote management • Purchased/installed terminal server to provide console access • Purchased/installed remote power management • Connect Cisco to 10Gbps UltraLight circuit • Interim USN IP connection imminent
Sunnyvale set up • Hosts have Solaris 10, Linux 2.6, Neterion & Chelsio 10GE NICs • [Diagram: compute servers and Cisco 6509 on the 10Gbits/s UltraLight/CENIC circuit (192.84.86.x), with a 10Mbps management network (134.164.37.x) providing terminal-server console access and remote power management]
Approaching 10Gbps performance • Jumbo frames (1500 Byte std => 9000 Bytes) give a factor of 6 improvement in recovery rate • Not an IEEE standard • May break some UDP applications • Not supported on many LANs • Sender mods only: the HENP model is a few big senders and lots of smaller receivers • Simplifies deployment, only a few hosts at a few sending sites • So no Dynamic Right Sizing (DRS) at receiver • XCP/ECN need router mods, so they are hard to deploy (effectively a new Internet)
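A rough sketch (assumed numbers, not measurements from this testbed) of where the factor of 6 comes from: after a loss, standard TCP recovers at one segment per RTT, so recovery time scales with the congestion window measured in segments, and a 6x larger MTU means roughly 6x fewer segments to regain. The 20 ms RTT below is purely illustrative.

    # Illustrative only: single-loss TCP (Reno) recovery time vs MTU
    def recovery_seconds(rate_bps, rtt_s, mss_bytes):
        cwnd_segments = rate_bps * rtt_s / (mss_bytes * 8)  # window needed to fill the path
        return (cwnd_segments / 2.0) * rtt_s                # halve, then regain 1 segment per RTT

    RTT = 0.020   # assumed 20 ms round trip
    RATE = 10e9   # 10 Gbits/s target
    for mss in (1460, 8960):  # TCP payload for 1500B and 9000B frames
        print("MSS %5dB: ~%.0f s to recover from one loss" % (mss, recovery_seconds(RATE, RTT, mss)))
    # roughly 171 s vs 28 s, i.e. about a factor of 6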
Hardware Assists • For 1Gbits/s paths, CPU, bus etc. are not a problem • For 10Gbits/s they are important • NIC assistance to the CPU is becoming popular • Checksum offload • Interrupt coalescence • Large send/receive offload (LSO/LRO) • TCP Offload Engine (TOE) • Several vendors for 10Gbits/s NICs, at least one for 1Gbits/s NICs • But currently restricts one to the NIC vendor's TCP implementation • Most focus is on the LAN • Cheap alternative to Infiniband, MyriNet etc.
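To see why these assists matter at 10 Gbits/s but not at 1 Gbits/s, a small illustrative calculation of the per-packet event rate the host faces at line rate (frame sizes assumed, not taken from the measurements in this talk):

    # Packets per second at line rate (illustrative)
    def pps(rate_bps, frame_bytes):
        return rate_bps / (frame_bytes * 8)

    for rate in (1e9, 10e9):
        for frame in (1500, 9000):
            print("%2.0f Gbps, %4dB frames: ~%.0f kpkt/s" % (rate / 1e9, frame, pps(rate, frame) / 1e3))
    # ~83 kpkt/s at 1 Gbps/1500B vs ~833 kpkt/s at 10 Gbps/1500B; without interrupt
    # coalescence or large send/receive offload each packet can cost an interrupt
    # and a per-packet pass through the host stack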
10Gbps test • Sunfire vx0z, Linux & Solaris 10, Chelsio & Neterion • Back-to-back (LAN) testing at SLAC • SNV to LA • At SC2004 using two 10Gbps dedicated paths between Pittsburgh and Sunnyvale • Using Solaris 10 (build 69) and Linux 2.6 • On Sunfire Vx0z (dual & quad 2.4GHz 64 bit AMD Opterons) with PCI-X 133MHz 64 bit • Only 1500 Byte MTUs • Achievable performance limits (using iperf) • TOE (Chelsio) vs no TOE (Neterion(S2io)) • LSO vs no LSO support • Solaris 10 vs Linux • UDTv2 evaluation
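The iperf runs behind these limit measurements would look roughly like the sketch below; the hostname, window size and stream count are placeholders, and the flags are standard iperf 2 client options rather than a transcript of the commands actually used.

    import subprocess

    # Hypothetical wrapper around iperf 2: -c client mode, -w TCP window,
    # -P parallel streams, -t test duration in seconds
    def run_iperf(server, window="2M", streams=1, secs=30):
        cmd = ["iperf", "-c", server, "-w", window, "-P", str(streams), "-t", str(secs)]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # e.g. a single-stream WAN test with a 2 MB window against a placeholder host
    print(run_iperf("receiver.example.org", window="2M", streams=1, secs=60))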
CPU Utilization • Receiver needs 20% less CPU than sender for high throughput • Single stream limited by 1.8GHz CPU • For Neterion with LSO & Linux: the sender appears to use more CPU than the receiver as throughput increases
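The GHz/Gbps figures quoted in these slides are simply (CPU utilization x clock speed) / throughput; a minimal sketch with illustrative numbers (chosen to match the 9000B MTU + LSO case on the next slide):

    # CPU cost metric used here: GHz consumed per Gbps delivered
    def ghz_per_gbps(cpu_utilization, clock_ghz, throughput_gbps):
        return cpu_utilization * clock_ghz / throughput_gbps

    # e.g. a 1.8 GHz Opteron fully busy while moving 6 Gbps
    print(round(ghz_per_gbps(1.0, 1.8, 6.0), 2), "GHz/Gbps")  # 0.3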
Effect of Jumbos • Throughput SLAC-CENIC LA (1 stream, 2MB window with LSO, Neterion(S2io)/Linux): • 1500B MTU: 1.8 Gbps • 9000B MTU: 6 Gbps • Sender CPU, GHz/Gbps (single stream with LSO, Neterion/Linux): • 1500B MTU = 0.5 ± 0.13 GHz/Gbps • 9000B MTU = 0.3 ± 0.07 GHz/Gbps • Factor 1.7 improvement • For Neterion with LSO & Linux on the WAN, Jumbos have a huge effect on performance and also improve CPU utilization
Effect of LSO • v20z 1.8GHz, Linux 2.6, S2io, 2 streams SLAC to Caltech, 8MB window: • With LSO: 7.4Gbits/s • Without LSO: 5.4Gbits/s • LAN (3 streams, 164KB window): • Solaris => Linux: 6.4Gbps (no LSO support in Solaris 10 at the moment) • Linux => Solaris-10: 4.8Gbps (LSO turned off at sender) • Linux => Solaris-10: 7.54Gbps (LSO turned on) • For Neterion with Linux on the LAN, LSO improves CPU utilization by a factor of 1.4; if one is CPU limited this will also improve throughput
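On Linux, LSO corresponds to the TCP segmentation offload (tso) setting exposed by ethtool, so an on/off comparison like the one above can be scripted; a hedged sketch, with the interface name as a placeholder and no guarantee that a given NIC/driver supports toggling it:

    import subprocess

    IFACE = "eth2"  # placeholder name for the 10GE interface

    def set_lso(enabled):
        state = "on" if enabled else "off"
        # ethtool -K <if> tso on|off toggles TCP segmentation offload (LSO)
        subprocess.run(["ethtool", "-K", IFACE, "tso", state], check=True)

    def show_offloads():
        # ethtool -k <if> lists the current offload settings
        print(subprocess.run(["ethtool", "-k", IFACE], capture_output=True, text=True).stdout)

    set_lso(False)   # rerun the transfer, then compare throughput and CPU with LSO on
    show_offloads()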
Solaris vs Linux • Send from one to the other, single stream • Compare send from Linux Neterion + LSO with send from Solaris 10 without LSO • LSO support for Solaris coming soon • With one stream the Solaris sender sends faster • Solaris has slightly better GHz/Gbps: Solaris 0.287±0.001, Linux 0.303±0.001
Solaris vs Linux multi-streams • When optimized for multiple streams, the Linux + LSO sender is better: 7.5Gbps vs 6.4Gbps (LAN, MTU 9400B, S2io, 1-4MB windows) • Solaris without LSO performs poorly with multiple streams (LSO or OS related?) • Its GHz/Gbps is poorer than Linux+LSO for multiple streams
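For reference, "N streams with an M-byte window" corresponds at the socket level to each stream setting its send buffer before connecting so a large window can be used; a minimal hypothetical sketch (host, port and sizes are placeholders; the measurements above were made with iperf, not this code):

    import socket, threading

    HOST, PORT = "receiver.example.org", 5001   # placeholder sink
    WINDOW = 2 * 1024 * 1024                    # e.g. a 2 MB window
    STREAMS = 4

    def one_stream():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # set the send buffer before connect so the large window takes effect
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW)
        s.connect((HOST, PORT))
        chunk = b"\0" * 65536
        for _ in range(10000):                  # push ~650 MB per stream
            s.sendall(chunk)
        s.close()

    threads = [threading.Thread(target=one_stream) for _ in range(STREAMS)]
    for t in threads: t.start()
    for t in threads: t.join()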
Chelsio • Chelsio to Chelsio (TOE) • With 2.4GHz V20zs from Pittsburgh to SNV • 1500Byte MTUs • Reliably able to get 7.4-7.5 Gbps (16 streams) • GHz/Gbps for Chelsio (MTU=1500B) ~ Neterion (MTU=9000B)
SLAC Connection • Part of ESnet Bay Area MAN • Will be 4 * 10GE circuits, 2 in and 2 out for the ring • Qwest will connect to Stanford in the next fortnight • Then cross-connect to SLAC/Stanford fibers and thus to SLAC • Working with Stanford to identify fiber pairs
SC2004: Tenth of a Terabit/s Challenge • Joint Caltech, SLAC, FNAL, CERN, UF, SDSC, BR, KR, … • Ten 10 Gbps waves to HEP on the show floor • Bandwidth challenge: aggregate throughput of 101.13 Gbps • FAST TCP
Bandwidth Challenge • The prize! >100 Gbps aggregate • Large collaboration of academia and industry • Took a lot of “wizards” to make it work
Conclusions • UDT limit was ~4.45Gbits/s • CPU limited • TCP limit was about 7.5±0.07 Gbps, regardless of: • Whether LAN (back to back) or WAN • TCP gating factor = PCI-X 133MHz bus ≡ 7.5Gbps • One host with 4 CPUs & 2 NICs sent 11.5±0.2Gbps to two dual-CPU hosts with 1 NIC each • Two hosts to two hosts (1 NIC/host) on one 10Gbps link: 9.07Gbps goodput forward & 5.6Gbps reverse
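The PCI-X ceiling quoted above follows from simple bus arithmetic; the ~88% efficiency factor below is an assumption to account for bus protocol overhead, not a measured value.

    # PCI-X 133 MHz x 64-bit: raw bandwidth and a rough usable estimate
    raw_gbps = 133e6 * 64 / 1e9      # ~8.5 Gbits/s theoretical peak
    usable_gbps = raw_gbps * 0.88    # assumed overhead factor -> ~7.5 Gbits/s
    print(round(raw_gbps, 2), round(usable_gbps, 2))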
Conclusions • Jumbos can be a big help • LSO is helpful (Neterion) • For best throughput a Linux+LSO sender is better • Without LSO, Solaris provides more throughput • Solaris without LSO has problems with multiple streams • TOE (Chelsio) allows one to avoid 9000Byte MTUs
Conclusions • Need testing on real networks • Controlled simulation & emulation critical for understanding • BUT need to verify, and results can look different than expected • Needs honest independent broker (SLAC) • Don’t care who wins, have the contacts, reputation, testbeds etc. • Not really funded for this
Next Steps • Evaluate various offloads (TOE, LSO, LRO ...) • Evaluate OS support: Solaris 10 support of LSO, untangle Solaris vs Linux differences, Chelsio/TOE on Solaris, leverage industry contacts • New buses: PCI-X 266MHz and PCI-Express are important; need NICs/hosts that support them, then evaluate • Install IEPM-BW on 10Gbps testbed • Evaluate existing tools at 10Gbits/s • Explore new tools for 10Gbits/s • Exploit relationships with Neterion/Chelsio to work on packet-pair timing aided by the NICs • Install passive tools on 10Gbps testbeds (and work with BNL to help achieve its mission) • Evaluate Netflow measurement & analysis at 10Gbits/s • Privacy issues • Use SNMP to access MIBs for utilization etc.
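One practical detail behind using SNMP MIBs at these speeds: the 32-bit IF-MIB octet counters (ifInOctets/ifOutOctets) wrap in a few seconds at 10 Gbits/s, so the 64-bit high-capacity counters (ifHCInOctets/ifHCOutOctets) are needed. A small illustrative calculation:

    # How long an SNMP octet counter lasts at line rate before wrapping
    def wrap_seconds(counter_bits, rate_bps):
        return (2 ** counter_bits) / (rate_bps / 8.0)

    for bits in (32, 64):
        print("%d-bit counter at 10 Gbps wraps after ~%.3g s" % (bits, wrap_seconds(bits, 10e9)))
    # 32-bit: ~3.4 s (must poll faster than that); 64-bit: ~1.5e10 s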
Acknowledgements • Gary Buhrmaster (SLAC), Parakram Khandpur (SLAC), Harvey Newman (Caltech), Yang Xia (Caltech), Xun Su (Caltech), Dan Nae (Caltech), Sylvain Ravot (Caltech), Richard Hughes-Jones (Manchester University), Michael Chen (Chelsio), Larry McIntosh (Sun), Frank Leers (Sun), Leonid Grossman (Neterion(S2io)), Alex Aizman (Neterion(S2io))
Further Information • Web site with lots of plots & analysis • www.slac.stanford.edu/grp/scs/net/papers/pfld05/ruchig/Fairness/ • Inter-protocols comparison (Journal of Grid Comp, PFLD04) • www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-10402.pdf • SC2004 details • www-iepm.slac.stanford.edu/monitoring/bulk/sc2004/
When will it have an impact • ESnet traffic doubling/year since 1990 • SLAC capacity increasing by 90%/year since 1982 • SLAC Internet traffic increased by a factor of 2.5 in the last year • International throughput increased by a factor of 10 in 4 years • So traffic increases by a factor of 10 every 3.5 to 4 years, so in: • 3.5 to 5 years 622 Mbps => 10Gbps • 3-4 years 155 Mbps => 1Gbps • 3.5-5 years 45Mbps => 622Mbps • 2010-2012: • 100s of Gbits/s for high speed production net end connections • 10Gbps will be mundane for R&E and business • Home broadband: doubling ~ every year, 100Mbits/s by end of decade • Aggressive goal: 1Gbps to all Californians by 2010
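The "factor of 10 every 3.5 to 4 years" follows directly from the quoted growth rates; a quick check:

    import math

    # Years for traffic to grow tenfold at a given annual growth factor
    def years_to_10x(annual_factor):
        return math.log(10) / math.log(annual_factor)

    print(round(years_to_10x(2.0), 1))   # doubling per year -> ~3.3 years
    print(round(years_to_10x(1.9), 1))   # +90% per year     -> ~3.6 years
    # 622 Mbps -> 10 Gbps is a factor of ~16, hence roughly 4-5 years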
What was special? • End-to-end application-to-application, single and multi-streams (not just internal backbone aggregate speeds) • TCP has not run out of steam yet, it scales from modem speeds into the multi-Gbits/s region • TCP is well understood, mature, with many good features: reliability etc. • Friendly on shared networks • New TCP stacks only need to be deployed at the sender • Often just a few data sources, many destinations • No modifications to backbone routers etc. • No need for jumbo frames • Used Commercial Off The Shelf (COTS) hardware and software
What was Special 2/2 • Raise the bar on expectations for applications and users • Some applications can use Internet backbone speeds • Provide planning information • The network is looking less like a bottleneck and more like a catalyst/enabler • Reduce the need to co-locate data and CPU • No longer need to literally ship truck or plane loads of data around the world • Worldwide collaborations of people working with large amounts of data become increasingly possible
Who needs it? • HENP – current driver • Multi-hundreds of Mbits/s and multi-TByte files/day transferred across the Atlantic today • SLAC BaBar experiment already has a PByte stored • Tbits/s and ExaBytes (10^18) stored in a decade • Data intensive science: • Astrophysics, global weather, bioinformatics, fusion, seismology… • Industries such as aerospace, medicine, security … • Future: • Media distribution • 1 Gbits/s = 2 full-length DVD movies/minute • 100 Gbits/s is equivalent to: • Download the Library of Congress in < 14 minutes • Three full-length DVDs in a second • Will sharing movies be like sharing music today?
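A hedged check of those equivalences; the ~4.7 GB single-layer DVD and ~10 TB Library of Congress figures are common rough assumptions, not numbers from the talk.

    DVD_BYTES = 4.7e9     # assumed single-layer DVD capacity
    LOC_BYTES = 10e12     # assumed ~10 TB for the Library of Congress

    def transfer_seconds(size_bytes, rate_bps):
        return size_bytes * 8 / rate_bps

    print(round(transfer_seconds(LOC_BYTES, 100e9) / 60, 1), "minutes for the LoC at 100 Gbps")  # ~13.3
    print(round(60 * 1e9 / 8 / DVD_BYTES, 1), "DVD-sized movies per minute at 1 Gbps")           # ~1.6
    print(round(transfer_seconds(3 * DVD_BYTES, 100e9), 2), "s for three DVDs at 100 Gbps")      # ~1.1
The slide's "two movies per minute" assumes a movie somewhat smaller than a full DVD.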