The TCP-ESTATS-MIB
Matt Mathis, John Heffner, Raghu Reddy (Pittsburgh Supercomputing Center)
Rajiv Raghunarayan (Cisco Systems)
J. Saperia (JDS Consulting, Inc.)
IETF 62, March 2005
The TCP Extended Statistics MIB
• Use TCP’s ideal diagnostic vantage point
  • Observe what the path is doing to segments
  • Observe what the application is doing to TCP
• TCP already measures many path properties
  • RTT, RTT variance, MTU, window size
• Easily instrumented to measure other properties
  • Reordering, loss rate, congestion signals
• Instrument why tcp_output() stops sending (see the sketch after this slide)
  • Receiver window, congestion window or the sender?
• Per-connection controls to support workarounds
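Below is a minimal sketch of what the "why did tcp_output() stop sending" instrumentation could look like inside a stack. The names and types are hypothetical illustrations, not the draft's objects; the point is simply that elapsed time is charged to whichever limit (receiver window, congestion window, or the sender itself) was in force, so the per-connection totals can later be read out.

/*
 * Illustrative sketch only: the names and fields here are hypothetical,
 * not the draft's object names.  It shows the idea behind instrumenting
 * why tcp_output() stops sending: elapsed time is charged to whichever
 * limit (receiver window, congestion window, or the sender itself) was
 * in force, so the per-connection totals can later be read out.
 */
#include <stdint.h>

enum snd_limit { LIM_RWIN, LIM_CWND, LIM_SENDER, LIM_STATES };

struct estats_snd_lim {
    uint64_t time_usec[LIM_STATES];   /* cumulative time spent in each state */
    uint32_t transitions[LIM_STATES]; /* number of entries into each state   */
    enum snd_limit cur;               /* which limit is in force right now   */
    uint64_t since_usec;              /* when the current state began        */
};

/* Called at the point where tcp_output() decides it cannot send more,
 * with the reason it stopped. */
static void estats_snd_limit(struct estats_snd_lim *sl,
                             enum snd_limit why, uint64_t now_usec)
{
    sl->time_usec[sl->cur] += now_usec - sl->since_usec;
    if (why != sl->cur) {
        sl->transitions[why]++;
        sl->cur = why;
    }
    sl->since_usec = now_usec;
}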
Example: a hard diagnostic problem
• Most symptoms scale with RTT
  • TCP buffer space, network loss and reordering, etc.
• On a short path TCP compensates for the flaw
• Local client to server: all applications work
  • Including all standard diagnostics
• Remote client to server: all applications fail
  • Leading to other components being falsely implicated
• This is the essence of the “End-to-end Problem”
How extended TCP statistics can help
• Without TCP instrumentation
  • Symptoms are reduced on short sections of long paths
  • Nearly all diagnostics yield a false pass on short paths (see the sketch after this slide)
• With TCP ESTATS
  • Measure key properties of a short section of the path
  • Extrapolate to the full path to pass judgment
  • Tools get more sensitive as you test shorter sections
• Example uses Web100.org instrumented TCP
  • Target is a simple TCP discard service
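As a back-of-the-envelope illustration of the false-pass problem, the sketch below uses the widely cited macroscopic TCP model, rate ≈ C · MSS / (RTT · √p). The constant C ≈ 0.7, the 0.025% loss rate, and the 1 ms versus 200 ms RTTs are assumptions chosen for illustration, not measurements from the slides.

/*
 * Back-of-the-envelope illustration with assumed numbers: the same
 * background loss rate that is harmless on a 1 ms local path dominates
 * performance on a 200 ms path, so tests run over the short path pass
 * while the long path fails.
 * Model: rate ~= C * MSS / (RTT * sqrt(p)), with C taken as about 0.7.
 */
#include <math.h>
#include <stdio.h>

static double model_rate_bps(double mss_bytes, double rtt_sec, double loss)
{
    return 0.7 * mss_bytes * 8.0 / (rtt_sec * sqrt(loss));
}

int main(void)
{
    const double mss  = 1448.0;    /* bytes, as in the example output  */
    const double loss = 0.00025;   /* 0.025% background loss (assumed) */

    printf("1 ms RTT:   %6.1f Mb/s\n", model_rate_bps(mss, 0.001, loss) / 1e6);
    printf("200 ms RTT: %6.1f Mb/s\n", model_rate_bps(mss, 0.200, loss) / 1e6);
    return 0;
}

With these numbers the model gives roughly 500 Mb/s at 1 ms but under 3 Mb/s at 200 ms: a local client sails past a 4 Mb/s goal while the remote client cannot reach it.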
Example diagnostic tool output

End-to-end goal: 4 Mb/s over a 200 ms path including this section
Tester at IP address: xxx.xxx.115.170
Target at IP address: xxx.xxx.247.109
Warning: TCP connection is not using SACK
Fail: Received window scale is 0, it should be 2.
Diagnosis: TCP on the test target is not properly configured for this path.
> See TCP tuning instructions at http://www.psc.edu/networking/perf_tune.html
Pass data rate check: maximum data rate was 4.784178 Mb/s
Fail: loss event rate: 0.025248% (3960 pkts between loss events)
Diagnosis: there is too much background (non-congested) packet loss.
The events averaged 1.750000 losses each, for a total loss rate of 0.0441836%
FYI: To get 4 Mb/s with a 1448 byte MSS on a 200 ms path the total end-to-end loss budget is 0.010274% (9733 pkts between losses).
Warning: could not measure queue length due to previously reported bottlenecks
Diagnosis: there is a bottleneck in the tester itself or test target (e.g insufficient buffer space or too much CPU load)
> Correct previously identified TCP configuration problems
> Localize all path problems by testing progressively smaller sections of the full path.
FYI: This path may pass with a less strenuous application: Try rate=4 Mb/s, rtt=106 ms
Or if you can raise the MTU: Try rate=4 Mb/s, rtt=662 ms, mtu=9000
Some events in this run were not completely diagnosed.
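The FYI loss-budget line above is consistent with the same macroscopic model solved for the loss rate. The sketch below reconstructs it under the assumption that the constant is about 0.7 (the tool's internals are not shown on the slide) and compares the measured total loss against the budget.

/*
 * Reconstruction of the FYI loss-budget line above, assuming the budget
 * comes from the macroscopic model rate ~= C * MSS / (RTT * sqrt(p))
 * with C about 0.7 (an assumption).  Solving for p gives the worst
 * end-to-end loss rate that still permits the stated goal.
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double goal_bps = 4e6;       /* end-to-end goal: 4 Mb/s         */
    const double rtt      = 0.200;     /* full-path RTT: 200 ms           */
    const double mss_bits = 1448 * 8;  /* 1448 byte MSS                   */
    const double C        = 0.7;       /* assumed model constant          */

    double p_budget   = pow(C * mss_bits / (goal_bps * rtt), 2);
    double p_measured = 0.000441836;   /* total loss rate from the output */

    printf("loss budget: %.6f%% (%.0f pkts between losses)\n",
           p_budget * 100.0, 1.0 / p_budget);
    printf("measured total loss: %.7f%% -> %s\n",
           p_measured * 100.0, p_measured <= p_budget ? "Pass" : "Fail");
    return 0;
}

Under that assumption the computed budget matches the 0.010274% (9733 packets) figure in the FYI line, and the measured 0.0441836% total loss clearly exceeds it, which is why the loss check fails.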
Changes with -06 draft
• Overhauled listen table
  • Designed to instrument generic SYN flood defenses
• Restructured per-connection tables
  • Required “perf” table
    • Expose TCP state variables (no memory footprint)
    • Basic performance instrumentation
    • First-tier diagnostic instrumentation
  • 3 optional tables for more detailed diagnosis
    • Path (loss, reordering, duplication, etc.)
    • Stack (impact and state of control algorithms)
    • Application (Is the data motion timely?)
  • 1 table of writeable controls for workarounds (see the sketch after this slide)
    • LimCwnd, LimRwnd, and LimSsthresh
• Cleaned up descriptions and references
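One way the writeable controls could be used is sketched below. This is illustrative only: it assumes LimCwnd acts as a ceiling on the congestion window in bytes (the draft defines the actual object semantics and units), and simply shows an operator sizing a clamp so a connection cannot exceed a known-safe rate over a given path.

/*
 * Illustrative workaround sizing only: assumes LimCwnd is a ceiling on
 * the congestion window in bytes (see the draft for the actual object
 * definition).  A window of rate * RTT bytes caps the connection at
 * roughly that rate, for example to keep one flow from overrunning an
 * undersized bottleneck buffer.
 */
#include <stdio.h>

int main(void)
{
    const double cap_bps = 4e6;    /* keep the flow at or below ~4 Mb/s (assumed) */
    const double rtt_sec = 0.200;  /* round-trip time of the path                 */

    double lim_cwnd_bytes = cap_bps / 8.0 * rtt_sec;   /* bandwidth-delay product */

    printf("set LimCwnd to about %.0f bytes\n", lim_cwnd_bytes);
    return 0;
}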
Open Technical Issues
• Too much required stuff?
  • There may be further juggling between tables
• The proper SMI type for “Duration” (see the arithmetic after this slide); we want:
  • microsecond resolution for short flows
  • days (or months?) scale for exit stats
  • meaningful deltas at all scales
• What have we forgotten?
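For scale, here is the raw arithmetic behind the Duration concern (this is not a proposed SMI type, just the ranges involved): a 32-bit count of microseconds wraps in a bit over an hour, while 64 bits of microseconds covers the days-to-months exit-statistics case with enormous headroom.

/*
 * Raw ranges behind the Duration discussion (not a proposed SMI type):
 * how long a microsecond-resolution counter lasts before wrapping.
 */
#include <stdio.h>

int main(void)
{
    const double us32 = 4294967296.0;            /* 2^32 microseconds */
    const double us64 = 18446744073709551616.0;  /* 2^64 microseconds */

    printf("32-bit microsecond counter wraps after %.1f minutes\n",
           us32 / 1e6 / 60.0);
    printf("64-bit microsecond counter wraps after about %.0f years\n",
           us64 / 1e6 / 3600.0 / 24.0 / 365.25);
    return 0;
}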
Next steps
• Last call for input from implementers, researchers
• MIB doctor review
• WG last call sometime this summer