The Performance Bottleneck: Application, Computer, or Network • Richard Carlson, Internet2 • Part 1
Outline • Why there is a problem • What can be done to find/fix problems • Tools you can use • Ramblings on what’s next
Basic Premise • An application’s performance should meet your expectations! • If it doesn’t, you should complain!
Questions • How many times have you said: • What’s wrong with the network? • Why is the network so slow? • Do you have any way to find out? • Tools to check local host • Tools to check local network • Tools to check end-to-end path
Underlying Assumption • When problems exist, it’s the network’s fault!
Simple Network Picture • [Figure: Bob’s Host, Network Infrastructure, Carol’s Host]
Network Infrastructure • [Figure: Switch 1 through Switch 4 and routers R1 through R9]
Possible Bottlenecks • Network infrastructure • Host computer • Application design
Network Infrastructure Bottlenecks • Links too small • Using standard Ethernet instead of Fast Ethernet • Links congested • Too many hosts crossing this link • Scenic routing • End-to-end path is longer than it needs to be • Broken equipment • Bad NIC, broken wire/cable, cross-talk • Administrative restrictions • Firewalls, filters, shapers, restrictors
Host Computer Bottlenecks • CPU utilization • What else is the processor doing? • Memory limitations • Main memory and network buffers • I/O bus speed • Getting data into and out of the NIC • Disk access speed
Application Behavior Bottlenecks • Chatty protocol • Lots of short messages between peers • High reliability protocol • Send packet and wait for reply before continuing • No run-time tuning options • Use only default settings • Blaster protocol • Ignore congestion control feedback
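The cost of a "send packet and wait for reply" design can be estimated with simple arithmetic. The sketch below assumes one message in flight per round trip and ignores transmission time; the function name and the example values (1460-byte messages, 50 msec round-trip time) are illustrative assumptions, not figures from the slides.

```python
def stop_and_wait_mbps(message_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when every message waits one full RTT for its reply."""
    bits_per_round_trip = message_bytes * 8
    return bits_per_round_trip / rtt_seconds / 1e6


# Hypothetical example: 1460-byte messages over a 50 msec path.
example_mbps = stop_and_wait_mbps(1460, 0.050)
print(f"{example_mbps:.4f} Mbps")
```

Under these assumptions the link speed does not appear in the result at all: a chatty or stop-and-wait application is limited by the round-trip time, not by the size of the links.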
TCP 101 • Transmission Control Protocol (TCP) • Provides applications with a reliable in-order delivery service • The most widely used Internet transport protocol • Web, File transfers, email, P2P, Remote login • User Datagram Protocol (UDP) • Provides applications with an unreliable delivery service • RTP, DNS
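From an application's point of view, the choice between the two transports is made when the socket is created. The following is a minimal loopback sketch using Python's standard socket module; the helper names are hypothetical, and it is only meant for small payloads on a single host.

```python
import socket


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send a small payload over a loopback TCP connection; return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server.getsockname())
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of the byte stream
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:  # empty read: sender closed its side
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


def udp_oneway(payload: bytes) -> bytes:
    """Send one datagram over loopback UDP; return it if it arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so never wait forever
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data
```

The TCP helper reads until the stream ends because TCP delivers an ordered byte stream without message boundaries; the UDP helper reads exactly one datagram and needs a timeout because nothing in the protocol reports a loss.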
Summary – Part 1 • Problems can exist at multiple levels • Network infrastructure • Host computer • Application design • Multiple problems can exist at the same time • All problems must be found and fixed before things get better
Summary – Part 2 • Every problem exhibits the same symptom • The application performance doesn’t meet the user’s expectations!
Outline • Why there is a problem • What can be done to find/fix problems • Tools you can use • Ramblings on what’s next
Real Life Examples • I know what the problem is • Bulk transfer with multiple problems
Example 1 - SC’04 experience • Booth having trouble getting application to run from Amsterdam to Pittsburgh • Tests between remote SGI and local PC showed throughput limited to < 20 Mbps • Assumption is: PC buffers too small • Question: How do we set the WinXP send/receive window size?
SC’04 Determine WinXP info • http://www.dslreports.com/drtcp
SC’04 Confirm PC settings • DrTCP reported 16 MB buffers, but the test program was still slow. Q: How to confirm? • Run test to SC NDT server (PC has Fast Ethernet connection) • Client-to-Server: 90 Mbps • Server-to-Client: 95 Mbps • PC Send/Recv window size: 16 Mbytes (wscale 8) • NDT Send/Recv window size: 8 Mbytes (wscale 7) • Reported TCP RTT: 46.2 msec • approximately 600 Kbytes of data in TCP buffer • Min window size / RTT: 1.3 Gbps
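The "wscale" values reported alongside the window sizes are TCP window scale factors: the 16-bit window field in the TCP header is shifted left by that many bits, up to a maximum shift of 14. A small sketch of the largest window each factor can advertise (the function name is an assumption for illustration):

```python
def max_window_bytes(wscale: int) -> int:
    """Largest receive window that can be advertised with a given TCP window scale shift."""
    if not 0 <= wscale <= 14:
        raise ValueError("TCP window scale shift must be between 0 and 14")
    return 65535 << wscale  # 16-bit window field, shifted left by the scale factor


for shift in (8, 7):
    print(f"wscale {shift}: up to {max_window_bytes(shift):,} bytes")
```

With a shift of 8 the ceiling is just under 16 Mbytes, and with a shift of 7 just under 8 Mbytes, which is consistent with the PC and NDT server values reported on this slide.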
SC’04 Local PC Configured OK • No problem found • Able to run at line rate • Confirmed that PC’s TCP window values were set correctly
SC’04 Remote SGI • Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected). • Client-to-Server: 17 Mbps • Server-to-Client: 16 Mbps • SGI Send/Recv window size: 256 Kbytes (wscale 3) • NDT Send/Recv window Size: 8 Mbytes (wscale 7) • Reported RTT: 106.7 msec • Min window size / RTT: 19 Mbps
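The "Min window size / RTT" line is the standard window-limited throughput bound: TCP can have at most one window of data in flight per round trip, and the smaller of the two endpoints' windows governs. A minimal sketch using the values reported above, assuming 1 Kbyte = 1024 bytes (the function name is hypothetical):

```python
def window_limited_mbps(send_window_bytes: int, recv_window_bytes: int,
                        rtt_seconds: float) -> float:
    """Throughput ceiling when at most one window of data is in flight per RTT."""
    window = min(send_window_bytes, recv_window_bytes)  # the smaller window governs
    return window * 8 / rtt_seconds / 1e6


sgi_window = 256 * 1024          # SGI: 256 Kbytes
ndt_window = 8 * 1024 * 1024     # NDT server: 8 Mbytes
ceiling = window_limited_mbps(sgi_window, ndt_window, 0.1067)
print(f"{ceiling:.1f} Mbps")
```

The result is a little under 20 Mbps, in line with the 19 Mbps bound on the slide and close to the 16 to 17 Mbps that was actually measured.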
SC’04 Remote SGI Results • Needed to download and compile command line client • SGI TCP window is too small to fill transatlantic pipe (19 Mbps max) • User reluctant to make changes to SGI network interface from SC show floor • NDT client tool allows application to change buffer (setsockopt() function call)
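The per-application buffer change mentioned in the last bullet can be sketched with the standard sockets API. This is not the NDT client's code, only an illustration of the same setsockopt() idea in Python; the helper names are hypothetical. The buffers are set before connecting so that the TCP handshake can negotiate a matching window scale, and the operating system may clamp or adjust the requested size.

```python
import socket


def open_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request larger send/receive buffers before connecting."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def granted_buffers(sock: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the OS actually granted."""
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

A caller would request a size, for example open_tuned_socket(2 * 1024 * 1024), and then check granted_buffers() to see what the host allowed; system-wide limits still cap what an application can obtain this way.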
SC’04 Remote SGI (tuned) • Re-run test from remote SGI to SC show floor. • Client-to-Server: 107 Mbps • Server-to-Client: 109 Mbps • SGI Send/Recv window size: 2 Mbytes (wscale 5) • NDT Send/Recv window Size: 8 Mbytes (wscale 7) • Reported RTT: 104 msec • Min window size / RTT: 153.8 Mbps
SC’04 Debugging Results • Team spent over 1 hour looking at Win XP config, trying to verify window size • Single NDT test verified this in under 30 seconds • 10 minutes to download and install NDT client on SGI • 15 minutes to discuss options and run client test with set buffer option
SC’04 Debugging Results • 8 minutes to find SGI limits and determine maximum allowable window setting (2 MB) • Total time 34 minutes to verify problem was with the remote SGI’s TCP send/receive window size • Network path verified, but the application still performed poorly until it was also tuned
Example 2 – SCP file transfer • Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every ½ hour. Bob and Carol are 2,000 miles apart. How long should each transfer take? • 5 minutes? • 1 minute? • 5 seconds?
What should we expect? • Assumptions: • 100 Mbps Fast Ethernet is the slowest link • 50 msec round trip time • Bob & Carol calculate: • 50 MB * 8 = 400 Mbits • 400 Mb / 100 Mb/sec = 4 seconds
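Bob and Carol's estimate can be written as a one-line calculation. The sketch below follows the slide's assumptions (the slowest link sets the rate, 1 MB = 8 Mbits) and ignores protocol overhead and TCP start-up; the function name is an assumption.

```python
def ideal_transfer_seconds(size_megabytes: float, slowest_link_mbps: float) -> float:
    """Best-case transfer time if the slowest link runs at full rate the whole time."""
    size_megabits = size_megabytes * 8
    return size_megabits / slowest_link_mbps


expected_seconds = ideal_transfer_seconds(50, 100)
print(expected_seconds)  # 4.0
```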
Initial Test Results • The transfer took 18 minutes • This is unacceptable! • First look for network infrastructure problem • Use NDT tester to examine both hosts
NDT Found Duplex Mismatch • Investigation found that the switch port was configured for 100 Mbps full-duplex operation • Network administrator corrects the configuration and asks for a re-test
Intermediate Results • Time dropped from 18 minutes to 40 seconds. • But our calculations said it should take 4 seconds! • 400 Mb / 40 sec = 10 Mbps • Why are we limited to 10 Mbps? • Are you satisfied with 1/10th of the possible performance?
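The same arithmetic can be run in reverse to turn a measured transfer time into an achieved rate and a fraction of the 100 Mbps line rate. A small sketch, with hypothetical function names:

```python
def achieved_mbps(size_megabytes: float, elapsed_seconds: float) -> float:
    """Average throughput actually achieved for a transfer of the given size."""
    return size_megabytes * 8 / elapsed_seconds


def fraction_of_line_rate(rate_mbps: float, line_rate_mbps: float = 100.0) -> float:
    """Share of the slowest link's capacity that the transfer used."""
    return rate_mbps / line_rate_mbps


before_fix = achieved_mbps(50, 18 * 60)  # 18 minutes with the duplex mismatch
after_fix = achieved_mbps(50, 40)        # 40 seconds once the mismatch was fixed
print(f"before: {before_fix:.2f} Mbps, after: {after_fix:.1f} Mbps "
      f"({fraction_of_line_rate(after_fix):.0%} of line rate)")
```

Seen this way, the duplex mismatch had held the transfer to well under 1 Mbps, and fixing it still left the path at one tenth of its capacity.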
Calculating the Window Size • Remember Bob found the round-trip time was 50 msec • Calculate window size limit • 85.3KB * 8 b/B = 698777 b • 698777 b / .050 s = 13.98 Mbps • Calculate new window size • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB • Use 1MB as a minimum
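Both calculations on this slide are forms of the bandwidth-delay product. The sketch below reproduces them, assuming 1 KB = 1024 bytes as the slide's numbers imply; the function names are illustrative.

```python
def window_limit_mbps(window_kb: float, rtt_seconds: float) -> float:
    """Throughput ceiling imposed by a TCP window of the given size (1 KB = 1024 bytes)."""
    window_bits = window_kb * 1024 * 8
    return window_bits / rtt_seconds / 1e6


def required_window_kb(link_mbps: float, rtt_seconds: float) -> float:
    """Window needed to keep the link full: the bandwidth-delay product, in KB."""
    bits_in_flight = link_mbps * 1e6 * rtt_seconds
    return bits_in_flight / 8 / 1024


current_limit = window_limit_mbps(85.3, 0.050)
needed_window = required_window_kb(100, 0.050)
print(f"current limit: {current_limit:.2f} Mbps")
print(f"needed window: {needed_window:.1f} KB")
```

Rounding the roughly 610 KB result up to 1 MB, as the slide recommends, leaves headroom if the round-trip time turns out to be longer than the 50 msec measured.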
Steps so far • Found and fixed Duplex Mismatch • Network Infrastructure problem • Found and fixed TCP window values • Host configuration problem • Are we done yet?
Intermediate Results • SCP still runs slower than expected • Hint: SCP uses internal buffers • Patch available from PSC
Final Results • Fixed infrastructure problem • Fixed host configuration problem • Fixed Application configuration problem • Achieved target time of 4 seconds to transfer 50 MB file over 2000 miles
Why is it hard to Find/Fix Problems? • Network infrastructure is complex • Network infrastructure is shared • Network infrastructure consists of multiple components
Shared Infrastructure • Other applications accessing the network • Remote disk access • Automatic email checking • Heartbeat facilities • Other computers are attached to the closet switch • Uplink to campus infrastructure • Other users on and off site • Uplink from campus to gigapop/backbone
Other Network Components • DHCP (Dynamic Host Configuration Protocol) • At least 2 packets exchanged to configure your host • DNS (Domain Name System) • At least 2 packets exchanged to translate FQDN into IP address • Network Security Devices • Intrusion Detection, VPN, Firewall
Network Infrastructure • Large complex system with potentially many problem areas
Why is it hard to Find/Fix Problems? • Computers have multiple components • Each Operating System (OS) has a unique set of tools to tune the network stack • Application Appliances come with few knobs and limited options
Computer Components • Main CPU (clock speed) • Front & Back side bus • Main Memory • I/O Bus (ATA, SCSI, SATA) • Disk (access speed and size)