The Performance Bottleneck: Application, Computer, or Network

  1. The Performance Bottleneck: Application, Computer, or Network Richard Carlson <rcarlson@internet2.edu> eVLBI Workshop – Performance Tuning Tutorial September 17, 2006

  2. Outline • Why there is a problem • What can be done to find/fix problems • Tools you can use

  3. Basic Premise • An application’s performance should meet your expectations! • If it doesn’t, you should complain! • But you have to complain effectively.

  4. Questions • How many times have you said: • What’s wrong with the network? • Why is the network so slow? • Do you have any way to find out? • Tools to check local host • Tools to check local network • Tools to check end-to-end path

  5. Unfortunate Reality • Every problem, regardless of cause, exhibits the same symptom • The application’s performance doesn’t meet the user’s expectations!

  6. Possible Bottlenecks • Network infrastructure • Host computer/appliance • Application design

  7. Simple Network Picture [Figure: Bob’s Host, connected through the Network Infrastructure, to Carol’s Host]

  8. Network Infrastructure [Figure: network diagram with Switches 1–4 and routers R1–R9]

  9. Network Infrastructure Bottlenecks • Links too small • Using FastEthernet instead of Gigabit Ethernet • Links congested • Too many hosts crossing this link • Scenic routing • End-to-end path is longer than it needs to be • Broken equipment • Bad NIC, broken wire/cable, cross-talk • Administrative restrictions • Firewalls, filters, shapers, restrictors

  10. Host Computer Bottlenecks • CPU utilization • What else is the processor doing? • Memory limitations • Main memory and network buffers • I/O bus speed • Getting data into and out of the NIC • Disk access speed

  11. Application Behavior Bottlenecks • Chatty protocol • Lots of short messages between peers • High reliability protocol • Send packet and wait for reply before continuing • No run-time tuning options • Use only default settings • Blaster protocol • Ignore congestion control feedback

  12. Problems, Problems, Problems • Problems can exist at multiple levels • Network infrastructure • Host computer • Application design • Multiple problems can exist at the same time • All problems must be found and fixed before things get better

  13. Transport Protocols 101 • Transmission Control Protocol (TCP) • Provides applications with a reliable in-order delivery service • The most widely used Internet transport protocol • Web, File transfers, email, P2P, Remote login • User Datagram Protocol (UDP) • Provides applications with an unreliable delivery service • RTP, DVTS, DNS

  14. Outline • Why there is a problem • What can be done to find/fix problems • Tools you can use

  15. Remote Image Processing • Carol is analyzing astronomical images. Bob needs to send a data file containing digital images (50 MB per file) to Carol every ½ hour. Bob and Carol are 2,000 miles apart. How long should each transfer take? • 5 minutes? • 1 minute? • 5 seconds?

  16. What should we expect? • Assumptions: • 100 Mbps Fast Ethernet is the slowest link • 50 msec round trip time • Bob & Carol calculate: • 50 MB * 8 = 400 Mbits • 400 Mb / 100 Mb/sec = 4 seconds
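Bob and Carol’s back-of-the-envelope estimate can be written as a small helper. This is a minimal sketch; the function name is hypothetical and, like the slide, it ignores protocol overhead, TCP slow start, and competing traffic, so the result is a best-case lower bound on transfer time.

```python
def ideal_transfer_time(file_megabytes: float, bottleneck_mbps: float) -> float:
    """Best-case transfer time in seconds over a path limited by bottleneck_mbps.

    Like the slide, this ignores protocol overhead, TCP slow start, and
    competing traffic, and converts MB to Mbit with a factor of 8.
    """
    megabits = file_megabytes * 8
    return megabits / bottleneck_mbps


# Bob's 50 MB image file over a 100 Mbps Fast Ethernet bottleneck.
print(f"Expected transfer time: {ideal_transfer_time(50, 100):.1f} s")
```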

  17. Initial Test Results

  18. Initial Test Results • 18 Minutes!!! This is unacceptable! • First, look for a network infrastructure problem • Use the NDT tester to examine both hosts

  19. Initial NDT testing shows Duplex Mismatch at one end

  20. NDT Found Duplex Mismatch • Investigation shows that the switch port is configured for 100 Mbps Full-Duplex operation. • The network administrator corrects the configuration and asks for a re-test
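NDT inferred the mismatch from TCP behavior. On a Linux host you can also look at what the NIC itself negotiated, since the kernel exposes it under /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex. The sketch below assumes Linux sysfs; the helper names are hypothetical, and the check is only a heuristic: a host reporting half duplex on a switched link is a reason to inspect the switch port, not proof of a mismatch.

```python
from pathlib import Path


def read_link_settings(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read the negotiated speed (Mbps) and duplex of a Linux network interface.

    Reading 'speed' can fail with OSError, or report -1, when the link is down.
    """
    base = Path(sysfs_root) / iface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()  # e.g. "full", "half", "unknown"
    return {"speed_mbps": speed, "duplex": duplex}


def suspect_duplex_mismatch(settings: dict) -> bool:
    """Heuristic only: when a switch port is hard-set to full duplex and the host
    autonegotiates, the host typically falls back to half duplex. A 'half'
    reading is a reason to check the switch port configuration, not proof."""
    return settings["duplex"] == "half"
```

Usage would be something like `suspect_duplex_mismatch(read_link_settings("eth0"))`, where "eth0" is a placeholder for the host’s actual interface name.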

  21. Duplex Mismatch Corrected

  22. SCP results after Duplex Mismatch Corrected

  23. Intermediate Results • Time dropped from 18 minutes to 40 seconds. • Is this acceptable??? • Remember your calculations said it should take 4 seconds. • 400 Mb / 40 sec = 10 Mbps • Why are we limited to 10 Mbps? • Are you satisfied with 1/10th of the possible performance?
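The same arithmetic, turned around, gives the throughput each measured transfer actually achieved. A short sketch, using the transfer times reported on the slides and the assumed 100 Mbps bottleneck; the helper name is hypothetical.

```python
def achieved_mbps(file_megabytes: float, seconds: float) -> float:
    """Average throughput in Mbps for a file of file_megabytes moved in `seconds`."""
    return file_megabytes * 8 / seconds


LINK_MBPS = 100  # assumed Fast Ethernet bottleneck from slide 16

# Transfer times for Bob's 50 MB file, taken from the slides.
for label, seconds in [("initial", 18 * 60), ("duplex fixed", 40), ("target", 4)]:
    rate = achieved_mbps(50, seconds)
    print(f"{label:>12}: {rate:7.2f} Mbps ({rate / LINK_MBPS:.1%} of the link)")
```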

  24. Default TCP window size

  25. Calculating the Window Size • Remember Bob found the round-trip time was 50 msec • Calculate window size limit • 85.3 KB * 8 b/B = 698777 b • 698777 b / .050 s = 13.98 Mbps • Stated another way • 698777 b / 100 Mb/s = 6.99 msec • 50 msec - 6.99 msec ≈ 43 msec of idle time every RTT
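The slide applies the window-limited bound: TCP can have at most one window of unacknowledged data in flight per round trip, so throughput is at most window / RTT. A sketch in Python, assuming (as the 698777 b figure implies) that 1 KB = 1024 bytes and 1 Mbps = 10^6 bit/s; the function names are hypothetical.

```python
def window_limited_mbps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput with at most one window in flight per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


def idle_time_per_rtt_ms(window_bytes: float, link_mbps: float, rtt_s: float) -> float:
    """Time per RTT the sender waits after pushing a full window at link rate.

    Zero means the window is large enough to keep the link busy.
    """
    send_time_s = window_bytes * 8 / (link_mbps * 1e6)
    return max(0.0, rtt_s - send_time_s) * 1000


WINDOW = 85.3 * 1024  # default window from the slide: 85.3 KB in bytes
print(f"{window_limited_mbps(WINDOW, 0.050):.2f} Mbps maximum")
print(f"{idle_time_per_rtt_ms(WINDOW, 100, 0.050):.1f} ms idle per RTT")
```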

  26. Calculating the Window Size • Calculate new window size • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB • Use 8MB for testing purposes
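The new window size is the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep the bottleneck link full for one round trip. A minimal sketch with a hypothetical helper name, again treating 1 KB as 1024 bytes.

```python
def bdp_bytes(link_mbps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return link_mbps * 1e6 * rtt_s / 8


needed = bdp_bytes(100, 0.050)  # 100 Mbps bottleneck, 50 ms RTT
print(f"{needed:.0f} bytes = {needed / 1024:.2f} KB")
```

The slide’s 8 MB test value is far larger than this minimum; it leaves headroom so the window is clearly not the limit while other problems are investigated.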

  27. Resetting Window Buffer

  28. Intermediate Results • Use application specific options to manually reset buffer size • Fixes problem for this application • Doesn’t fix problem for other applications • Need better ‘default behavior’ for all applications
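Applications that expose a buffer option (for example, iperf’s -w flag) ultimately set the per-socket SO_SNDBUF and SO_RCVBUF options. The sketch below shows the same idea with Python’s standard socket module; the helper name is hypothetical. The kernel may cap the request (on Linux, at net.core.wmem_max and net.core.rmem_max) and on Linux reports back roughly double the granted size, so the code reads the values back rather than assuming. On Linux, explicitly setting these options also locks the buffer size, which turns off kernel autotuning for that socket; some other systems reject oversized requests with an error instead of clamping.

```python
import socket

REQUESTED = 8 * 1024 * 1024  # 8 MB, the test value used on the slides


def make_tuned_socket(bufsize: int = REQUESTED) -> socket.socket:
    """Create a TCP socket with enlarged send/receive buffers.

    Set these before connect() or listen(): the TCP window scale option is
    negotiated during the handshake, based on the receive buffer size.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    return s


sock = make_tuned_socket()
# Read back what the kernel actually granted; it may be smaller than requested.
print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```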

  29. With TCP window size tuned

  30. Steps so far • Found and fixed Duplex Mismatch • Network Infrastructure problem • Found and fixed TCP window size values • Host configuration problem • Are we done yet?

  31. SCP results with auto-tuning enabled

  32. Intermediate Results • SCP still runs slower than expected • Hint: SSH uses internal buffers • A design choice by the application developers limits performance • A patch is available from PSC

  33. SCP Results with tuned SCP

  34. Final Results • Fixed infrastructure problem • Fixed host configuration problem • Fixed application configuration problem • Achieved the target time of 4 seconds to transfer a 50 MB file over 2,000 miles

  35. Follow-up questions • What would have happened if I had tried the patched SCP version before fixing the TCP buffer problem? • I would not have seen any improvement. • Would I have discarded the patch because “it didn’t work”?

  36. Why is it hard to Find/Fix Problems? • Network infrastructure is complex • Network infrastructure is shared • Network infrastructure consists of multiple components

  37. Shared Infrastructure • Other applications accessing the network • Remote disk access • Automatic email checking • Heartbeat facilities • Other computers are attached to the closet switch • Uplink to facility infrastructure • Other users on and off site • Uplink from facility to gigapop/backbone

  38. Other Network Components • DHCP (Dynamic Host Configuration Protocol) • At least 2 packets exchanged to configure your host • DNS (Domain Name System) • At least 2 packets exchanged to translate an FQDN into an IP address • Multiple addresses require a sequential search • Network Security Devices • Intrusion Detection, VPN, Firewall
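Name resolution and address selection happen before any application data moves. One place the “sequential search” cost shows up is connection setup: the resolver may return several addresses, and a client tries them one after another. The sketch below uses the standard getaddrinfo() call and mirrors the loop that Python’s socket.create_connection performs internally; the helper names are hypothetical.

```python
import socket
import time


def resolve(host: str, port: int) -> list:
    """Return the (family, sockaddr) pairs the resolver yields for host:port."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_first(host: str, port: int, timeout: float = 2.0):
    """Try each resolved address in order; return (socket, attempts, elapsed seconds).

    Every unreachable address can cost up to `timeout` seconds before the next
    one is tried, so a long address list can noticeably delay connection setup.
    """
    start = time.monotonic()
    attempts = 0
    last_error = None
    for family, sockaddr in resolve(host, port):
        attempts += 1
        s = socket.socket(family, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            return s, attempts, time.monotonic() - start
        except OSError as exc:
            last_error = exc
            s.close()
    raise last_error or OSError(f"no addresses found for {host}")
```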

  39. Why is it hard to Find/Fix Problems? • Computers have multiple components • Each Operating System (OS) has a unique set of tools to tune the network stack • Network Interface Cards also have tuning options • Application Appliances come with few knobs and limited options

  40. Computer Components • Main CPU (clock speed) • Front & Back side bus • Main Memory • I/O Bus (ATA, SCSI, SATA) • Disk (access speed and size)

  41. Computer Issues • Lots of internal components with multi-tasking OS • Lots of tunable TCP/IP parameters that need to be ‘right’ for each possible connection

  42. Why is it hard to Find/Fix Problems? • Applications depend on default system settings • Problems scale with distance • More access to remote resources • The 80/20 rule: since the early 1990s, 80% of your traffic leaves your local network

  43. Default System Settings • For Linux 2.6.13 there are: • 11 tunable IP parameters • 45 tunable TCP parameters • 148 Web100 variables (TCP MIB) • Currently no OS ships with default settings that work well over trans-continental distances • Some applications allow run-time setting of some options • 30 settable/viewable IP parameters • 24 settable/viewable TCP parameters • There are no standard ways to set run-time option ‘flags’
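On Linux, the limits TCP autotuning works within are the sysctls net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, each holding three byte values (min, default, max). A minimal sketch, assuming a Linux /proc filesystem, that checks whether the autotuning ceiling can hold one bandwidth-delay product for a given path; the helper names are hypothetical. In practice leave headroom, because part of the receive buffer goes to kernel bookkeeping rather than to the advertised window (see tcp_adv_win_scale).

```python
from pathlib import Path


def read_tcp_mem(name: str, proc_root: str = "/proc/sys/net/ipv4") -> tuple:
    """Return (min, default, max) in bytes for a sysctl such as tcp_rmem or tcp_wmem."""
    fields = (Path(proc_root) / name).read_text().split()
    return tuple(int(f) for f in fields)


def autotune_max_ok(limits: tuple, link_mbps: float, rtt_s: float) -> bool:
    """True if the autotuning maximum is at least one bandwidth-delay product."""
    bdp = link_mbps * 1e6 * rtt_s / 8
    return limits[2] >= bdp
```

For Bob and Carol’s path, `autotune_max_ok(read_tcp_mem("tcp_rmem"), 100, 0.050)` would be checked on the receiver and the same call with "tcp_wmem" on the sender.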

  44. Application Issues • Setting tunable parameters to the ‘right’ value • Getting the protocol ‘right’

  45. Outline • Why there is a problem • What can be done to find/fix problems • Tools you can use

  46. Tools, Tools, Tools • Ping • Traceroute • Iperf • Tcpdump • Tcptrace • BWCTL • NDT • OWAMP • AMP • Advisor • Thrulay • Web100 • MonaLisa • pathchar • NPAD • Pathdiag • Surveyor • Ethereal • CoralReef • MRTG • Skitter • Cflowd • Cricket • Net100

  47. Active Measurement Tools • Tools that inject packets into the network to measure some value • Available Bandwidth • Delay/Jitter • Loss • May require bi-directional traffic or synchronized hosts • May require running test program on both hosts
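As a small illustration of an active measurement, the sketch below estimates round-trip delay by timing TCP handshakes, since each connect() costs roughly one RTT. This is a swapped-in technique, not how ping works: ping uses ICMP echo, which needs raw-socket privileges, whereas timing connect() needs only a reachable listening port on the far host. Helper names are hypothetical, and the max-min spread is only a crude stand-in for jitter.

```python
import socket
import statistics
import time


def tcp_connect_rtts(host: str, port: int, count: int = 5, timeout: float = 2.0) -> list:
    """Time `count` TCP handshakes to host:port and return the samples in ms.

    Pass an IP address to keep DNS lookup time out of the measurement.
    """
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            samples.append((time.perf_counter() - start) * 1000)
    return samples


def summarize(samples: list) -> dict:
    """Minimum and median RTT, plus the max-min spread as a rough jitter figure."""
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "spread_ms": max(samples) - min(samples),
    }
```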

  48. Passive Measurement Tools • Tools that monitor existing traffic on the network and extract some information • Bandwidth used • Jitter • Loss rate • May generate some privacy and/or security concerns
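A passive “bandwidth used” figure can be derived from interface byte counters sampled at two points in time; tools like MRTG apply the same idea to SNMP interface counters. The sketch below parses the Linux /proc/net/dev format (receive bytes are the first field after the colon, transmit bytes the ninth); helper names are hypothetical, and counter wrap-around is not handled.

```python
def parse_proc_net_dev(text: str) -> dict:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mbps(before: tuple, after: tuple, interval_s: float) -> tuple:
    """Average (rx, tx) rates in Mbps between two counter samples interval_s apart."""
    return tuple((a - b) * 8 / interval_s / 1e6 for b, a in zip(before, after))
```

To measure a real link, read /proc/net/dev twice a known interval apart and pass the two samples for the interface of interest to rate_mbps.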

  49. How do you set realistic Expectations? • Assume network bandwidth exists or find out what the limits are • Local LAN connection • Site Access link • Monitor the link utilization occasionally • Weathermap • MRTG graphs • Look at your host config/utilization • What is the CPU utilization?

  50. Distance Matters • It’s harder to go fast over a long distance • TCP congestion control requires numerous round trips to prevent flooding the network • TCP buffer limits can stop the sender from injecting new data into the network • Applications can exhibit poor behavior when used over long distances
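Why distance hurts can be seen from an idealized TCP slow start, where the congestion window doubles once per RTT with no loss. The round count to reach a given window barely changes with RTT, but each round lasts one RTT, so the ramp-up time scales with distance. A sketch under stated assumptions: the 1460-byte MSS and the 2-segment initial window are illustrative choices, delayed ACKs are ignored, and the helper name is hypothetical.

```python
import math


def slow_start_rounds(target_bytes: float, mss: int = 1460, initial_cwnd: int = 2) -> int:
    """RTTs for idealized slow start (cwnd doubles every RTT, no loss)
    to grow from initial_cwnd segments to a window of target_bytes."""
    target_segments = math.ceil(target_bytes / mss)
    rounds, cwnd = 0, initial_cwnd
    while cwnd < target_segments:
        cwnd *= 2
        rounds += 1
    return rounds


bdp = 100e6 * 0.050 / 8  # Bob and Carol's path: 100 Mbps bottleneck, 50 ms RTT
n = slow_start_rounds(bdp)
print(f"{n} RTTs: {n * 50} ms at 50 ms RTT, but only {n * 5} ms at a 5 ms RTT")
```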
