Internet Monitoring - Results. Les Cottrell, SLAC <cottrell@slac.stanford.edu>. Presented at the ICFA Meeting, CERN, Mar 1998. Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM).
Internet Monitoring - Results Les Cottrell SLAC <cottrell@slac.stanford.edu> Presented at the ICFA Meeting, CERN, Mar 1998 Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM) \\pcbackup\users\cottrell\icfa\icfa-mar98.ppt
Outline of Talk • What, why & how are we (ESnet/HENP community) measuring? • What PingER measurement reports are available and what do they show • (short), intermediate & long term • grouping and multi-site visualization • Traffic volume & Traceroute measurements • Summary • Deployment/development, Internet Performance, Next Steps • Collaborations • NIMI/IPWT
Why go to the effort? • Apparent quality of Internet getting worse as size and demands increase • Internet woefully under-measured & under-instrumented • Internet very diverse - no single path typical • Users need: • realistic expectations, planning information • guidelines for setting and validating SLAs • information to help in identifying problems • help to decide where to apply resources
Importance of Response Time • Time is the scarcest and most valuable commodity • Studies in the late 1970s and early 1980s showed the economic value of Rapid Response Time • 0-0.4s High productivity interactive response • 0.4-2s Fully interactive regime • 2-12s Sporadically interactive regime • 12s-600s Break in contact regime • >600s Batch regime • Around a threshold of 4-5s, complaints increase rapidly • Voice has a threshold around 100ms
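The response-time regimes on this slide amount to a small lookup table. The following is only a minimal sketch of that lookup: the function name `response_regime` is hypothetical, and assigning a value that falls exactly on a boundary to the faster regime is an assumption, since the slide does not say which side the boundaries belong to.

```python
import bisect

# Upper bounds (in seconds) of the first four regimes, as listed on the slide.
_BOUNDS_S = [0.4, 2.0, 12.0, 600.0]
_REGIMES = [
    "high productivity interactive",
    "fully interactive",
    "sporadically interactive",
    "break in contact",
    "batch",
]


def response_regime(seconds: float) -> str:
    """Return the slide's regime name for a response time in seconds.

    Assumption: a value exactly on a boundary (e.g. 0.4s) is placed in
    the faster regime.
    """
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    # bisect_left returns the index of the first bound >= seconds.
    return _REGIMES[bisect.bisect_left(_BOUNDS_S, seconds)]
```

For example, `response_regime(5)` falls in the 2-12s band, which is where the slide places the 4-5s complaint threshold.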
Perception of Poor Packet Loss • Above 4-6% packet loss, video conferencing becomes irritating, and non-native language speakers become unable to communicate. • The occurrence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows. • Above 10-12% packet loss there is an unacceptable level of back-to-back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable.
Our Main Metric is Ping • “Universally available”, easy to understand • no software for clients to install • Low network impact • Provides useful real world measures of loss, response time, reachability, unpredictability
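As a rough illustration of how loss, response time and reachability can be derived from one batch of pings, here is a minimal sketch. It is not the PingER code: the function name `summarize_pings` and the input convention (one round-trip time in ms per ping sent, with `None` for a lost packet) are assumptions made for this example. Unpredictability is left out because the slide does not define it.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of pings to a single remote host.

    rtts_ms holds one entry per ping sent: the round-trip time in ms,
    or None if no reply came back (hypothetical input format).
    """
    if not rtts_ms:
        raise ValueError("need at least one ping")
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    return {
        "sent": sent,
        "loss_pct": 100.0 * (sent - len(replies)) / sent,
        # Reachable if at least one ping was answered.
        "reachable": bool(replies),
        "min_ms": min(replies) if replies else None,
        "avg_ms": mean(replies) if replies else None,
    }
```

A batch such as `[120.0, None, 130.0, 110.0]` has one lost packet out of four, so its loss is 25%.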
Ping Response vs Web Response 1/2 [Plot: HTTP GET response (ms) against minimum ping response (ms)]
Ping Response vs Web Response 2/2
Ranked packet loss for 3 months [Plot labels: Stanford, Rome, UK, Cincinnati]
Sawtooth Effect [Plot annotations: 2 * capacity (+ 2Mbps); Added 45 Mbps (quadrupled capacity); 3 * capacity + 9 Mbps; Holidays]
RAL Last 180 Days plot • Lines are simply cubic spline fits to aid the eye • Upper green and black points are response time in ms • Red & blue are weekday loss • Cyan are weekend loss • Note weekend/weekday differences (cyan vs blue) • Note Xmas/New Year lull • Also note quick onset of saturation at end of August & September
Italian sites look similar to each other
Representative International HENP Site Loss Jan-95 thru Nov-97 • Note RL (UK) saw-tooths as UK-US bandwidth is added (Apr-96, Feb-97, Aug-97)
Aggregation • Group measurements, for example: • by area (e.g. N. America W, N. America E, W. Europe/Japan, others, by country) • trans-oceanic links, intercontinental links • separation e.g. number of hops, time zones crossed, IXPs crossed • ISP (ESnet, vBNS/I2, ...) • by monitoring site • one site seen from multiple sites • common interest/affiliation (XIWT, HENP …) • user selectable
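Grouping of this kind can be sketched as a group-by followed by a summary statistic. The example below is an assumption-laden sketch, not the PingER implementation: the record layout (dicts with a `loss_pct` field plus grouping fields) and the function name `median_loss_by_group` are hypothetical, and the median is used simply because a later slide quotes medians per group.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Mapping


def median_loss_by_group(
    measurements: Iterable[Mapping], key: str
) -> Dict[str, float]:
    """Return the median packet loss (%) of the links in each group.

    Each measurement is a mapping with a "loss_pct" value and a grouping
    field named by `key`, e.g. "area" or "isp" (hypothetical layout).
    """
    groups = defaultdict(list)
    for m in measurements:
        groups[m[key]].append(m["loss_pct"])
    return {group: median(losses) for group, losses in groups.items()}
```

Passing a different `key` regroups the same measurements by area, ISP or monitoring site without touching the data.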
Group Selection (all sites monitoring CERN) [Screenshot: select one of these groups. Site labels: CMU, CNAF, RL, FNAL, SLAC, DESY, Carleton, RMKI, CERN, KEK]
Group Response Time Jan-95 thru Nov-97 • Improved between 1 and 2.5% / month • Response & Loss show similar improvements • take care with new sites
Network Quiescence • Frequency of zero packet loss (for all time - not cut on prime time)
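Read literally, the quiescence metric is the share of measurement samples that saw no packet loss at all. A minimal sketch under that reading follows; the function name `quiescence_pct` and the input (one loss percentage per measurement sample) are assumptions for illustration.

```python
from typing import Sequence


def quiescence_pct(loss_pcts: Sequence[float]) -> float:
    """Percentage of measurement samples with zero packet loss.

    loss_pcts holds one packet-loss percentage per sample, covering all
    times of day (no prime-time cut), as on the slide.
    """
    if not loss_pcts:
        raise ValueError("need at least one sample")
    zero_loss = sum(1 for p in loss_pcts if p == 0)
    return 100.0 * zero_loss / len(loss_pcts)
```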
Ping Loss Quality • Want a quick-to-grasp indicator of link quality • Loss is the most sensitive indicator • loss of a packet requires ~ 4 sec TCP retry timeout • Studies on the economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase. • 0-1% = Good • 1-2.5% = Acceptable • 2.5%-5% = Poor • 5%-12% = Very Poor • > 12% = Bad
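The loss bands on this slide can be turned into a labelling function. This is only a sketch: the name `loss_quality` is hypothetical, and placing a value that sits exactly on a boundary (e.g. 1%) in the better category is an assumption, since the slide's bands share their end points.

```python
import bisect

# Upper bounds (percent loss) of the first four bands on the slide.
_LOSS_BOUNDS_PCT = [1.0, 2.5, 5.0, 12.0]
_LOSS_LABELS = ["Good", "Acceptable", "Poor", "Very Poor", "Bad"]


def loss_quality(loss_pct: float) -> str:
    """Map a packet-loss percentage to the slide's quality label.

    Assumption: a boundary value such as 1.0 gets the better label.
    """
    if not 0.0 <= loss_pct <= 100.0:
        raise ValueError("loss must be between 0 and 100 percent")
    return _LOSS_LABELS[bisect.bisect_left(_LOSS_BOUNDS_PCT, loss_pct)]
```

With these bands, a link losing 8% of its packets is labelled "Very Poor", and anything above 12% is "Bad".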
Quality Distributions • ESnet median good quality • All other groups poor or very poor • Critical to have good peering
Multi Collection Site Visualization [Plot axes: Collection Sites, Remote Sites]
Intercontinental Grouping (Loss) • Move mouse over ? to see # links • Looks pretty bad for intercontinental use
Top Level Domain Grouping (Loss) • Mousing over red dots gives more information on the TLD (e.g. ch=Switzerland) • Diagonals are within TLD
TLD (Response Time)
Grouping Details • Select metric • Select group • Sort • Color for quality • Also provides Excel for DIY at bottom
Recent Transoceanic trends
By Monitoring Site
CERN Monitoring TLDs
ESnet bytes accepted by site for Jan '98 [Plot labels: Exchanges, LBL/ESnet]
US HENP Traffic Growth • Exponential growth of 3-6% per month
Multi Router Traffic Grapher (MRTG) • CERN-US E1 (2Mbps) link • Added 2nd 2Mbps link
Traffic Volume for Germany (DFN) • DFN T1 Utilization for 15 Jan '98 (5 min averages): Green = to US, Blue = from US • # of 2 min periods in Dec-96 with peak utilization > y %, to US and from US [y-axis: # Samples]
Capacity/Load Ratios • Looking at the ratio of link capacity to average load • Most ESnet links show ratios of a few to several tens • The international links (CERN-Perryman (~4), DFN (~5), Italy (~4), KEK (~10), Canada (15)) show ratios of 4-15 • The worst link appears to be the MAE-W-ESnet link, with a ratio of about 1.5 • However this may not be the bottleneck link
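The ratio used on this slide is simply link capacity divided by average load, so a lower ratio means less headroom. A minimal sketch, with a hypothetical function name and made-up load figures in the usage note (the slide gives only the resulting ratios, not the loads):

```python
def capacity_load_ratio(capacity_mbps: float, avg_load_mbps: float) -> float:
    """Return link capacity divided by average load (same units for both)."""
    if capacity_mbps <= 0 or avg_load_mbps <= 0:
        raise ValueError("capacity and load must be positive")
    return capacity_mbps / avg_load_mbps
```

As an illustration only, a 2 Mbps link carrying an assumed average of 0.5 Mbps would have a ratio of 4, the same order as the international links listed above.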
Bottlenecks • Identification • Traceroute • from/to multiple sites can identify common path segments in the maps • Can see onset of losses with traceping • Pathchar can identify bottlenecks • Then need to work on: • avoiding bottlenecks (new peering) • getting bottleneck owners to improve • this is difficult, lots of potential bottlenecks, bottlenecks move, not under our control
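One way to find common path segments across traceroutes from several sites is to count how many routes traverse each hop-to-hop link. The sketch below assumes each traceroute has already been parsed into an ordered list of router names; the function name `shared_links` and that input format are hypothetical, and this is not the TRIUMF or Oxford tooling.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Link = Tuple[str, str]


def shared_links(
    routes: Sequence[Sequence[str]], min_routes: int = 2
) -> Dict[Link, int]:
    """Return the links (consecutive hop pairs) used by several routes.

    routes: one ordered list of hop names per traceroute.
    min_routes: keep only links seen in at least this many routes.
    """
    counts: Counter = Counter()
    for hops in routes:
        # set() counts a link once per route even if a loop repeats it.
        counts.update(set(zip(hops, hops[1:])))
    return {link: n for link, n in counts.items() if n >= min_routes}
```

Links that show up in many routes are candidates to examine first when several sites see loss at the same time, though a shared link is not necessarily the bottleneck.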
TracePing (Oxford) • Multiple routes seen
Traceroute From TRIUMF • Reverse traceroute servers • Traceping • TopologyMap • Ellipses show node on route • Open ellipse is measurement node • Blue ellipse is not reachable • Keeping history
GUI Traceroute (e.g. VisualRoute)
Summary • Deployment/Development • ESnet/HENP has 14 collection sites in 8 countries collecting data on > 500 links involving 22 countries • XIWT/IPWT deployed ~ 10 collection sites using PingER tools • 600MB/month/link, 6 bps/link, .25 FTE @ analysis site, 1.5-2.5 FTE on analysis • HEPNRC gathering, archiving • Long term reports being ported to HEPNRC from SLAC • Long term analysis today usually requires a tool like SAS
Summary • Deployment/Development • Internet Performance • Performance within ESnet is good • Performance between ESnet & other sites is poor to very poor on average • one of the main causes is congestion points, so peering is critical • Intercontinental performance is very poor to bad • ESnet traffic accepted from major HENP labs growing by 3-6% per month • Response time improving by 1-2% / month • Packet loss improving between SLAC & other sites by 3% / month
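A monthly growth rate is easier to judge as a doubling time. Assuming the growth compounds monthly, the doubling time in months is ln 2 / ln(1 + r), where r is the monthly rate as a fraction. A minimal sketch (the function name is hypothetical):

```python
import math


def doubling_time_months(monthly_growth_pct: float) -> float:
    """Months needed to double at a steady compound monthly growth rate."""
    if monthly_growth_pct <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + monthly_growth_pct / 100.0)
```

Under that assumption, 3% per month doubles traffic in roughly 23 months and 6% per month in roughly 12 months.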
Summary • Deployment/Development • Internet Performance (continued): • Links to sites outside N. America vary from good (KEK) to bad • Some of the bad sites are to be expected, e.g. FSU, China, Czech Republic; some are surprises, such as the UK • CERN, France, Germany acceptable to poor
Summary • Deployment/Development • Internet Performance • Next Steps • Improve tools: • Make long term reports at Analysis site available & understandable • Look into prediction (extrapolations, develop models, configure and validate with data) • Pursue IETF Surveyor & NIMI deployment
National Internet Measurement Infrastructure (NIMI) • Secure, scalable infrastructure for scheduling monitoring, gathering data • Minimal amount of human intervention • Inexpensive probe built on PC FreeBSD platform • Dynamic - can add/modify measurement suites, initially includes: • Traceroute • TReno - measures bulk transfer throughput • Poip - one-way ping
Asymmetric One-way Delays [Plots: loss (0-20%) and delay (0-300ms) for U Chicago to Advanced and for Advanced to U Chicago]
NIMI • Deployed at PSC, LBL, FNAL; platforms being configured at SLAC & CERN • As NIMI becomes more real, will start to use it as infrastructure for IPPM Surveyors • Security • allows full policy control over any box you own or delegation of all or subsets • uses ACLs with authentication for requests, and encryption to prevent sniffing
Summary • Deployment/Development • Internet Performance • Next Steps • Lots of collaboration: • SLAC & HEPNRC • 14 collection sites, ~ 400 remote sites • Collection site tools CERN & CNAF/ICFA • Oxford/TracePing • MapPing/MAPNet/NLANR • TRIUMF Traceroute topology Map • NIMI/LBNL & Surveyor/IETF • XIWT/IPWT • Talks at IETF, XIWT, ICFA, ESCC ...
More Information • ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code) • http://www.slac.stanford.edu/xorg/icfa/ntf/home.html • WAN Monitoring at SLAC has lots of links • http://www.slac.stanford.edu/comp/net/wan-mon.html • Tutorial on WAN Monitoring • http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html • MapPing Tool: • http://www.slac.stanford.edu/~warrenm/work/java/newjava/mapping.html • NIMI: • http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html