Size Matters: Performance Benefits (and Obstacles) of Jumbo Packets
9k MTU Project • Test global path MTU on Abilene, CA*net4, CUDI and other R & E networks, plus create a useful researcher mapping tool • Internet2 ATEAM - Advanced Test Engineering and Measurement • www.ateam.info • Bill Rutherford (Rutherford Research/GAIT – Project Leader) • Kevin Walsh, Nathaniel Mendoza (San Diego Supercomputer Center/SDSC) • John Moore (Centaur Internet2 Technology Evaluation Center ITEC/NCSU North Carolina State University) • Loki Jorgenson (Apparent Networks/SFU) • Paul Schopis (Internet2 Technology Evaluation Center/ITEC-Ohio/OARnet) • Jorge Hernandez Serran (CUDI2/UNAM Mexico) • Dave Hartzell (NASA Ames Research Center) • Bill Jones (University of Texas Austin) • Woojin Seok (Supercomputing Center Korea/KISTI)
9k MTU Project • Preliminary project flow • Several Internet2 Joint Techs presentations • Participation in HEP TRIUMF to CERN test run (Corrie Kost, Steven McDonald) • Collaboration with equipment vendors • Comprehensive testing on Abilene and CA*net4 • First international 9k connection between I2 and C4 via StarLight • Academic network and mapping system
9k MTU Project • Contributions • Matt Mathis (Pittsburgh Supercomputing Center) • Theoretical considerations: the role of MTU in TCP • http://www.psc.edu/~mathis/MTU • Joe St. Sauver (University of Oregon) • Practical MTU considerations for campus and equipment issues • http://darkwing.uoregon.edu/~joe/jumbos/jumbo-frames.ppt • Phillip Dykstra (Chief Scientist, WareOnEarth Communications Inc.) • MTU-related network tuning issues • http://sd.wareonearth.com/woe/Briefings/tcptune.ppt • Bryan Caron (Network Manager Subatomic Physics, University of Alberta) • CA*net4 testing • http://www.phys.ualberta.ca/~caron/
9k MTU Project - Tools and Equipment • Spirent Communications SmartBits 6000 series network analyzer • http://www.spirentcom.com • automated testing from scripts • high level of accuracy • Apparent Networks AppareNet network intelligence system • http://www.apparentNetworks.com • NLANR Iperf • http://dast.nlanr.net/Projects/Iperf • tool to measure maximum TCP bandwidth • reports bandwidth, delay, jitter, datagram loss
Why Jumbo? • Performance • Benefits for high performance transfers • High Energy Physics – TRIUMF to CERN test run • National Light Rails/Paths • Grid Networks/Next Generation Clusters • Meteorology / Astrophysics / Bioinformatics • Collaborative/interactive/video – access grid • End-to-end path • From NIC-to-NIC MTU requirement • End station is typically the bottleneck • Advent of Gig-E to the desktop
TCP Steady State (M. Mathis et al.) • $\text{e2e throughput} < \dfrac{0.7 \times \text{Max Segment Size (MTU)}}{\text{Round Trip Time (latency)} \times \sqrt{\text{loss}}}$ • If TCP window size and network capacity are not rate limiting factors then (roughly): • Double the MSS, double the throughput • Halve the latency, double the throughput (shortest path matters) • Halve the loss rate, 40% higher throughput
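The three rules of thumb on this slide follow directly from the bound, and can be checked numerically. The sketch below is illustrative only: the function name `mathis_throughput_bps`, the 70 ms round trip time and the 0.01% loss rate are assumptions chosen for the example, not measurements from the project. The MSS values of 1460 and 8960 bytes assume a 1500 or 9000 byte MTU minus 40 bytes of IPv4 and TCP headers without options.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate, constant=0.7):
    """Upper bound on steady-state TCP throughput, in bits per second.

    Implements throughput < constant * MSS / (RTT * sqrt(loss)).
    """
    if mss_bytes <= 0 or rtt_seconds <= 0:
        raise ValueError("MSS and RTT must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss rate must be in (0, 1]")
    return constant * mss_bytes * 8 / (rtt_seconds * math.sqrt(loss_rate))


# Hypothetical path: 70 ms RTT, 0.01% packet loss.
RTT = 0.070
LOSS = 1e-4

standard = mathis_throughput_bps(1460, RTT, LOSS)  # 1500 byte MTU
jumbo = mathis_throughput_bps(8960, RTT, LOSS)     # 9000 byte MTU

print(f"1500 MTU bound: {standard / 1e6:.2f} Mbit/s")
print(f"9000 MTU bound: {jumbo / 1e6:.2f} Mbit/s")
print(f"jumbo / standard: {jumbo / standard:.2f}x")
print(f"gain from halving loss: "
      f"{mathis_throughput_bps(1460, RTT, LOSS / 2) / standard:.3f}x")
```

Because the bound is linear in MSS and inversely proportional to RTT, the jumbo-to-standard ratio depends only on the two MSS values, not on the assumed RTT or loss rate.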
About aNA • appareNet Network for Academics • Currently 16 sequencers across CA*net and Abilene • NIS in Vancouver, Canada • 10 Gig-E/Jumbo hosts • 4 nodes in Canada • BCNET • Netera Alliance • CA*net NOC • ACORN-NS
appareNet – network intelligence • Uses light, non-intrusive, adaptive active probing • ICMP or UDP packets in various configurations • Point-and-shoot to most IP addresses • Performs comprehensive network path characterization • Performs expert system diagnostics • Single-ended two-way measures (e.g. half-duplex different from full-duplex) • Samples network to generate same view as best effort application (pre-TCP)
Abilene & CA*net Testing - 2003 • [Figure: test results by MTU size; legend: 9000, 8192, 7168, 6144, 5120, 4096, 3072, 2048 and 512 MTU]
L2 Trends • Cisco ONS 15454 up to 10000 MTU • CA*net4 L2 is implemented with ONS 15454 • Cisco Catalyst 6000/3750 up to 9216/9018 MTU • Foundry BigIron MG8 up to 9000 MTU • “Jumbo frame support, up to 9 Kb, to expand data payload for network intense data transfer applications such as Storage Area Network (SAN) and Grid Computing.” • Nortel Bay Stack 380 up to 9216 MTU • “Jumbo frame support of up to 9,216 bytes is provided on each port for applications requiring large frames such as graphics and video applications.” • Intel gigE and 10 x gigE NICs up to 16128 MTU • Syskonnect gigE NICs up to 9000 MTU
L3 Trends • Cisco 12000/7300 up to 9180/9192 MTU • Juniper M & T series up to 9192 MTU • Abilene backbone mainly Juniper T640 • CA*net4 backbone routers are Juniper M20 or M40 • Extreme 10800 series up to 9126 MTU • “Jumbo Frames – Studies show server CPU utilization is reduced by as much as 50% with the use of jumbo frames in clustering applications. Extreme Networks has optimized around support for a 9K jumbo frame that delivers the most optimized performance for cluster applications.”
Scalability Issues • current code approach scalable? • strategy for minimizing memory footprint and processing overhead? • implications for protocols? • more stack tuning? (e.g. variable packet length?) • byte counters? (e.g. the IPv6 payload length field is 16 bits) • inter packet gaps? (e.g. IEEE 802.3z burst mode)
A Look Ahead • Next-generation optical network-based virtual memory (VM) • VM paging from disk scales with block transfer rate and mechanical seek latency • VM paging from network scales with packet transfer rate and round trip time • VM thrashing when OS is dominated by slow virtual memory swaps
Application Layer • e2e application layer sensitivity look ahead • Video or graphics (Nortel) • Throughput, CPU utilization, Jitter, Drops • Storage Area Network and Grid (Foundry) • Throughput, CPU utilization • Cluster applications (Extreme) • Throughput, CPU utilization
Initial CA*net4 Runs • SDSC to Halifax
Initial CA*net4 Runs • SDSC to CANARIE
Initial CUDI Runs • SDSC to UNAM
MTU handling via Fragmentation • Advantages: • commonly implemented • Disadvantages: • extreme load on router • some clients cannot reassemble packets • Applications: • ping • router advertisements
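To give a feel for the router load mentioned above, the following sketch counts the fragments a router must produce when a jumbo IPv4 packet meets a smaller next-hop MTU. It assumes a 20 byte IPv4 header with no options and applies the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes. The function name `fragment_sizes` is made up for this example.

```python
def fragment_sizes(packet_len, mtu, header_len=20):
    """Return the total length of each IPv4 fragment, in bytes.

    packet_len and mtu both include the IP header. Assumes a fixed
    header_len (20 bytes, i.e. no IP options) copied into every fragment.
    """
    if packet_len <= header_len:
        raise ValueError("packet must carry some payload")
    if packet_len <= mtu:
        return [packet_len]  # fits as-is, no fragmentation needed

    # Payload per fragment must be a multiple of 8 (fragment offset units).
    per_fragment = (mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")

    remaining = packet_len - header_len
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk + header_len)
        remaining -= chunk
    return sizes


# A 9000 byte packet crossing a 1500 byte hop.
sizes = fragment_sizes(9000, 1500)
print(len(sizes), "fragments:", sizes)
```

In this model each jumbo packet costs the router several header copies and checksum computations, and the receiver must hold all fragments until the last one arrives.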
MTU handling via RFC 1191 PMTU discovery • Advantages: • Router is not loaded • Maximum performance achieved • Disadvantages: • reliance on ICMP • easy to mis-configure • Applications: • almost all modern applications
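The RFC 1191 mechanism can be sketched as a small simulation. The sender sets the Don't Fragment bit; the first router whose next-hop MTU is too small drops the packet and returns an ICMP "fragmentation needed" message carrying that MTU; the sender then lowers its estimate and retries. This is a toy model, not a network implementation: `discover_path_mtu` and the example hop MTUs are invented for illustration, and every ICMP message is assumed to reach the sender.

```python
def discover_path_mtu(hop_mtus, initial_mtu):
    """Simulate RFC 1191 path MTU discovery over a list of hop MTUs.

    Returns (path_mtu, probes_sent). Assumes every router reports its
    next-hop MTU in the ICMP message and that no ICMP is filtered.
    """
    if not hop_mtus:
        raise ValueError("path must have at least one hop")

    estimate = initial_mtu
    probes = 0
    while True:
        probes += 1
        # First hop that cannot forward a DF packet of the current size.
        too_small = next((mtu for mtu in hop_mtus if mtu < estimate), None)
        if too_small is None:
            return estimate, probes  # packet reached the destination
        # ICMP "fragmentation needed" received: adopt the reported MTU.
        estimate = too_small


# Hypothetical path: jumbo-capable edge, one 4470 hop, one legacy 1500 hop.
path = [9000, 4470, 9000, 1500]
print(discover_path_mtu(path, initial_mtu=9000))
```

The number of probes grows with the number of successively smaller hops, not with the path length, which is why the router load stays low compared with fragmentation.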
GigE Black Hole Hop • What is happening?: • RFC 1191 and “TCP Slow Start” are interacting • Packets are lost • Retransmission happens, causing performance degradation • Client responds to some packets, keeping connection open • Overall performance appears slow to client
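A toy model of the black hole hop, under stated assumptions: one hop on the path has a smaller MTU and discards oversized packets without returning the ICMP message that RFC 1191 relies on (for example because ICMP is filtered, or because the device is a Layer 2 switch). Small packets still pass, which is why the connection stays open while full-size segments keep being lost. The function `send_df_packet` and the `silent_hops` parameter are hypothetical names for this sketch.

```python
def send_df_packet(size, hop_mtus, silent_hops=()):
    """Return what happens to a Don't Fragment packet of the given size.

    silent_hops holds the indexes of hops that drop oversized packets
    without sending any ICMP message back to the sender.
    """
    for index, mtu in enumerate(hop_mtus):
        if size > mtu:
            if index in silent_hops:
                return "silently dropped"
            return f"ICMP fragmentation needed, next-hop MTU {mtu}"
    return "delivered"


# Hypothetical jumbo path with a 1500 byte hop in the middle.
path = [9000, 1500, 9000]

# Well-behaved hop: the sender learns the MTU and can adapt.
print(send_df_packet(9000, path))

# Black hole hop: large segments vanish, small packets get through.
print(send_df_packet(9000, path, silent_hops={1}))
print(send_df_packet(64, path, silent_hops={1}))
```

From the sender's point of view the silent drop is indistinguishable from congestion loss, so TCP retransmits and backs off instead of reducing its segment size.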
Avoiding GigE MTU problems • Maintain logical Layer 3 diagrams • Assign MTUs on a per-subnet basis • Be consistent with MTU values used • Use 1500 bytes for legacy Ethernet (no registry hacks) • We recommend 9000 bytes MTU for GigE when jumbo frames are used (standard for Internet2 Abilene Network) • Remember to add 18 bytes when adjusting frame size (e.g. set NIC to 9018 bytes frame size to maintain a 9000 byte MTU) • Remember not to arbitrarily filter out ICMP messages • Careful use of VLANs • Use of Layer 3 devices at MTU boundaries
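The 18 byte adjustment is the Ethernet header (14 bytes) plus the frame check sequence (4 bytes). A minimal sketch of the arithmetic follows; the helper name `frame_size_for_mtu` is invented here, and the optional 4 byte 802.1Q tag is included as an assumption about why VLANs need care, since the slide does not spell that out.

```python
ETHERNET_HEADER_BYTES = 14  # destination MAC, source MAC, EtherType
ETHERNET_FCS_BYTES = 4      # frame check sequence (CRC-32)
VLAN_TAG_BYTES = 4          # 802.1Q tag, present only on tagged frames


def frame_size_for_mtu(mtu, vlan_tagged=False):
    """Return the Ethernet frame size needed to carry the given MTU."""
    size = mtu + ETHERNET_HEADER_BYTES + ETHERNET_FCS_BYTES
    if vlan_tagged:
        size += VLAN_TAG_BYTES
    return size


for mtu in (1500, 9000):
    print(mtu, "->", frame_size_for_mtu(mtu),
          "untagged,", frame_size_for_mtu(mtu, vlan_tagged=True), "tagged")
```

Vendors differ on whether a configured size means MTU or frame size, so it is worth checking which one a given NIC or switch setting expects.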
Path MTU Map Service • Researcher tool to troubleshoot and help optimize path MTU
Resources • Some Path MTU tools: • ANA pMTU service – from ANA sequencers across I2/CA*net • http://pathmtu.apparenet.com:8282/ana@apparenet.com:guest42 • NCNE MTU Discovery Service – uses service located at NCNE • http://www.ncne.org/jumbogram/mtu_discovery.php • pMTU Applet - Java-based client for end-user station • http://sourceforge.net/projects/pmtu/ • Jumbo MTU Performance whitepaper • http://www.apparentNetworks.com/wp/
Demo: pMTU Client • Demo pMTU applet