piPEs Tools in Action. Rich Carlson rcarlson@internet2.edu May 3, 2005. Outline. Brief Introduction to tools Using NDT Using BWCTL Using OWAMP PerfSonar in your future? Conclusions. Overview. Multiple tools provide Different perspectives into the problem Insight into the E2E path
piPEs Tools in Action • Rich Carlson • rcarlson@internet2.edu • May 3, 2005
Outline • Brief Introduction to tools • Using NDT • Using BWCTL • Using OWAMP • PerfSonar in your future? • Conclusions
Overview • Multiple tools provide • Different perspectives into the problem • Insight into the E2E path • What can these tools tell you • How hosts are (mis)tuned • How individual links are operating • What can’t these tools tell you • Exactly what parameters to change to fix the problem
piPEs Integration • [Architecture diagram. Stage labels: Detect, Authorize, Schedule, Test, Store. Components shown: Internet2 "Detective" applet, Network Monitoring, Discovery Module, Analysis Module, Interface Web Service, Measurement Domain Interface (MDI), Performance Measurement Domain (PMD), Performance Measurement Controller (PMC), Performance Measurement Point (PMP), test tools (BWCTL, OWAMP, TraceRoute, NDT), Database]
Basic Premise • Application performance should meet your expectations! • If it doesn’t, you should complain!
Questions • How many times have your users said: • What’s wrong with the network? • Why is the network so slow? • Do they have any way to find out? • Tools to check local host • Tools to check local network • Tools to check end-to-end path
Underlying Assumption • When problems exist, it’s the network’s fault!
Unfortunate Reality • Every problem, regardless of cause, exhibits the same symptom • The application performance doesn’t meet the user’s expectations!
Example – SCP file transfer • Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every ½ hour. Bob and Carol are 2,000 miles apart. How long should each transfer take? • 5 minutes? • 1 minute? • 5 seconds?
What should we expect? • Assumptions: • 100 Mbps Fast Ethernet is the slowest link • 50 msec round trip time • Bob & Carol calculate: • 50 MB * 8 = 400 Mbits • 400 Mb / 100 Mb/sec = 4 seconds
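The estimate on this slide can be reproduced in a few lines. This is a minimal sketch: the function name `ideal_transfer_seconds` is hypothetical, 1 MB is treated as 10^6 bytes as the slide does, and protocol overhead and TCP ramp-up are ignored.

```python
def ideal_transfer_seconds(size_mbytes: float, link_mbps: float) -> float:
    """Best-case transfer time over the slowest link, ignoring all overhead."""
    size_mbits = size_mbytes * 8  # 8 bits per byte
    return size_mbits / link_mbps


if __name__ == "__main__":
    # Bob's 50 MB file over a 100 Mbps Fast Ethernet bottleneck
    print(ideal_transfer_seconds(50, 100))
```

The round-trip time does not appear in this best-case figure; it only matters once the TCP window becomes the limit.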
Initial Test Results • This is unacceptable! • First look for a network infrastructure problem • Use the NDT tester to examine both hosts
NDT Found Duplex Mismatch • Investigation finds that the switch port is configured for 100 Mbps Full-Duplex operation • Network administrator corrects the configuration and asks for a re-test
Intermediate Results • Time dropped from 18 minutes to 40 seconds. • But our calculations said it should take 4 seconds! • 400 Mb / 40 sec = 10 Mbps • Why are we limited to 10 Mbps? • Are you satisfied with 1/10th of the possible performance?
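The same arithmetic, run in the other direction, turns a measured transfer time into an achieved rate. A small sketch, assuming the 50 MB file and the 18-minute and 40-second times quoted on this slide; the helper name `achieved_mbps` is made up for illustration.

```python
def achieved_mbps(size_mbytes: float, elapsed_seconds: float) -> float:
    """Average throughput in Mbps for a transfer of the given size and duration."""
    return size_mbytes * 8 / elapsed_seconds


if __name__ == "__main__":
    before_fix = achieved_mbps(50, 18 * 60)  # with the duplex mismatch
    after_fix = achieved_mbps(50, 40)        # after the switch port was corrected
    print(f"before: {before_fix:.2f} Mbps, after: {after_fix:.2f} Mbps")
    print(f"fraction of a 100 Mbps link now in use: {after_fix / 100:.0%}")
```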
Calculating the Window Size • Remember Bob found the round-trip time was 50 msec • Calculate window size limit • 85.3KB * 8 b/B = 698777 b • 698777 b / .050 s = 13.98 Mbps • Calculate new window size • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB • Use 8MB for testing purposes
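Both calculations on this slide follow from the fact that TCP can have at most one window of data in flight per round trip. The sketch below reproduces them; the function names are hypothetical, and KB is taken as 1024 bytes, which is what the slide's figures imply.

```python
KB = 1024  # bytes; the slide's numbers imply 1 KB = 1024 B


def window_limited_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def required_window_bytes(link_mbps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window needed to keep the link full."""
    return link_mbps * 1e6 * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.050  # 50 msec round-trip time
    print(f"{window_limited_mbps(85.3 * KB, rtt):.2f} Mbps with an 85.3 KB window")
    print(f"{required_window_bytes(100, rtt) / KB:.2f} KB needed for 100 Mbps")
```

The 8 MB test value is well above the computed requirement, so the window stops being the bottleneck during testing.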
Steps so far • Found and fixed Duplex Mismatch • Network Infrastructure problem • Found and fixed TCP window size values • Host configuration problem • Are we done yet?
Intermediate Results • SCP still runs slower than expected • Hint: SCP uses internal buffers • Patch available from PSC
Final Results • Fixed infrastructure problem • Fixed host configuration problem • Fixed Application configuration problem • Achieved target time of 4 seconds to transfer 50 MB file over 2000 miles
Using BWCTL • Scenario 2: User complains that the network is slow; FTP downloads are taking too long. • Suggestion 1: Use BWCTL to decompose the E2E path
Using BWCTL: traceroute data Step 1: run traceroute to remote site. • traceroute to www.auburn.edu (131.204.2.251), 30 hops max, 38 byte packets • 1 nonprod-rtr.internet2.edu (207.75.164.65) 0.950 ms 0.665 ms 1.251 ms • 2 e0.aadl.mich.net (198.108.90.129) 0.721 ms 0.510 ms 0.493 ms • 3 192.122.182.18 (192.122.182.18) 7.097 ms 13.766 ms 18.293 ms • 4 chin-mren-ge (198.32.11.97) 50.335 ms 7.048 ms 6.896 ms • 5 iplsng-chinng (198.32.8.77) 12.394 ms 28.040 ms 10.946 ms • 6 atlang-iplsng (198.32.8.78) 21.830 ms 21.959 ms 21.727 ms • 7 sox-rtr.abilene.sox.net (199.77.193.9) 22.301 ms 22.234 ms 22.427 ms • 8 gatech-to-56marietta-rtr.sox.gatech.edu (199.77.194.41) 22.500 ms 22.178 ms 22.370 ms • 9 131.204.254.5 (131.204.254.5) 28.560 ms 28.292 ms 28.272 ms • 10 131.204.2.251 (131.204.2.251) 28.486 ms 28.290 ms 28.409 ms
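To choose points at which to split the path, it helps to have the hop list in a structured form. The following is a rough sketch that assumes the classic `hop host (address) rtt ms rtt ms rtt ms` layout shown on this slide; `parse_hops` and `SAMPLE` are illustrative names, and hops that time out (printed as `*`) are simply skipped.

```python
import re

# Three hops copied from the traceroute output on the slide.
SAMPLE = """\
 4 chin-mren-ge (198.32.11.97) 50.335 ms 7.048 ms 6.896 ms
 5 iplsng-chinng (198.32.8.77) 12.394 ms 28.040 ms 10.946 ms
 6 atlang-iplsng (198.32.8.78) 21.830 ms 21.959 ms 21.727 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+)\s+ms")


def parse_hops(text):
    """Return a list of (hop_number, host, min_rtt_ms) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line or a hop with no parsable reply
        rtts = [float(value) for value in RTT_RE.findall(match.group(4))]
        if rtts:
            hops.append((int(match.group(1)), match.group(2), min(rtts)))
    return hops


if __name__ == "__main__":
    for number, host, rtt in parse_hops(SAMPLE):
        print(f"{number:2d}  {host:20s} {rtt:7.3f} ms")
```

The minimum of the three probes is used because a single slow probe, such as the 50.335 ms sample at hop 4, usually reflects router processing load and not path delay.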
Using BWCTL: testing strategy Step 3: Decompose path and run tests to various points • Ann Arbor to Abilene Ingress: • Ann Arbor to Abilene Egress: • Third party test between Ingress and Egress: • Look at Abilene web site for Abilene ingress to egress data.
Using BWCTL: commands • bwctl -L90 -i2 -t20 -w8388608 -A AE AESKEY rcarlson .aeskey -c nms1-ipls.abilene.ucaid.edu • bwctl -L90 -i2 -t20 -w8388608 -c nms1-sttl.abilene.ucaid.edu -A AE AESKEY rcarlson .aeskey
3rd party testing: command • bwctl -L90 -i2 -t20 -w8388608 -A AE AESKEY rcarlson .aeskey -c nms1-sttl.abilene.ucaid.edu -s nms1-ipls.abilene.ucaid.edu
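When the same test is repeated against several segments, it can be convenient to generate the command line and not retype it. This is only a sketch: `bwctl_argv` is a hypothetical helper, it uses nothing beyond the options and values that appear on these slides, and it prints the command without running it.

```python
import shlex


def bwctl_argv(receiver, sender=None, user="rcarlson", keyfile=".aeskey"):
    """Build a bwctl argument list using the option values from the slides."""
    argv = [
        "bwctl", "-L90", "-i2", "-t20", "-w8388608",
        "-A", "AE", "AESKEY", user, keyfile,
        "-c", receiver,  # host that receives the test stream
    ]
    if sender is not None:
        argv += ["-s", sender]  # third-party test: name the sending host too
    return argv


if __name__ == "__main__":
    # Third-party test between the two Abilene nodes named on the slide.
    print(shlex.join(bwctl_argv("nms1-sttl.abilene.ucaid.edu",
                                sender="nms1-ipls.abilene.ucaid.edu")))
```

Passing the resulting list to `subprocess.run` would execute the test, provided bwctl is installed and the authentication key has been configured in advance.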
Using OWAMP • Scenario 3: User complains that the network is slow • Run owping and bwctl to look for signs of congestion • Run owping command to Abilene ingress node • Run bwctl to some distant Abilene node • Run owstats -a99 command to see results
Caveats • Are you testing the network or the server? • Diagnostic and testing servers may have tuning/performance problems too • Lots of tuning guides on-line (PSC, LBNL, EU, …) • Are you using auto or manual tuning? • Some OSes autotune; are these settings correct? • User may disable autotuning • Do you have permission to run tests? • BWCTL requires pre-configuration of authentication keys
PerfSonar – Next Steps in Performance Monitoring • New Initiative involving multiple partners • ESnet (DOE labs) • GEANT (European Research and Education network) • Internet2 (Abilene and connectors)
PerfSonar – Router stats on a path • Demo ESnet tool https://performance.es.net/cgi-bin/perfsonar-trace.cgi Paste output from Traceroute into the window and view the MRTG graphs for the routers in the path Author: Joe Metzger ESnet
Conclusions • Every problem seems to be a slow network • TCP hides lower-layer problems • Each tool can help solve a part of the problem • NDT can identify configuration problems and can provide some tuning assistance • BWCTL can find poorly performing links • OWAMP can identify when links are becoming congested