How is the Internet Performing? Les Cottrell – SLAC. Lecture #2 presented at the 26th International Nathiagali Summer College on Physics and Contemporary Needs, 25th June – 14th July, Nathiagali, Pakistan.
Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM); also supported by IUPAP.
Overview • Internet characteristics • packet sizes, protocols, hops, hosts … • complexity, flows, applications • Application requirements • How the Internet worldwide is performing as seen by various measurements and metrics • How well are requirements met? • Many sources of measurements: Matrix, Surveyor, CAIDA/Skitter, PingER/IEPM
Packet size • Primarily 3 sizes: • close to the minimum (telnet and ACKs) • 1500 bytes (max Ethernet payload, e.g. FTP, HTTP) • ~560 bytes for TCP implementations not using maximum transmission unit (MTU) discovery • Mean ~420 bytes, median ~80 bytes • Measured Feb 2000 at Ames Internet eXchange: ~84M packets, < 0.05% fragmented [Figure: cumulative probability (%) of packets and bytes vs. packet size (bytes)]
Internet protocol use • There are 3 main protocols in use on the Internet: • UDP (connectionless datagrams, best-effort delivery) • TCP (connection-oriented, “guaranteed” delivery) • ICMP (Internet Control Message Protocol) • TCP dominates today [Figure: SLAC protocol flows per 10 min, in and out, for TCP, UDP and ICMP, Feb-May 2001]
Web use characteristics • Size of web objects varies from site to site, server to server and by time of day • Typical medians vary from 1500 to 4000 bytes • Also varies by object type, e.g. medians for: • movies: a few 100 KB to MBs; postscript & audio: a few 100 KB • text, html, applets and images: a few thousand bytes • Big peaks for error messages [Figure x-axis: bytes]
Hops • Hop counts seen from 4 Skitter sites (Japan, S. Cal, N. Cal, E. Canada): 10-15 hops on average • Weak RTT dependence on hop count [Figure: 5%, 50% and 95% RTT vs. hop count]
Autonomous Systems (AS) Dispersion • Color indicates the AS responsible for the router at the hop; height is the number of probes for that route • Seen by Skitter at Palo Alto, US (F root name server) [Figure x-axis: hop number]
Country dispersion • Seen from Japan • After 3 to 4 hops most traffic goes to the US • In some cases it goes to the US and back to Japan • Some goes to the UK and on to other European countries [Figure: probes vs. hops]
Route maps • Even simple routes from TRIUMF, Canada to several sites already get quite complex [Figure: route map from TRIUMF to DESY, SLAC, UW, CERN, FNAL and KEK]
Getting more complex • PingER Beacon sites in US seen from TRIUMF, Vancouver (from Andrew Daviel, TRIUMF)
Connections by country [Figure labels: NL, Unknown, IT, RU, US, UK, JP, DE]
Richness of connectivity • Angle = longitude of AS HQ in whois records • Radius = 1 - log(outdegree(AS) + 1)/log(maxoutdegree + 1) • Outdegree = number of next-hop ASes accepting traffic • Deeper blue & red = more connections • All except 1 of the top 15 ASes are in the US; the exception is in Canada • Few links between ISPs in Europe and Asia
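The polar layout described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the radius is normalized with a logarithm in both numerator and denominator; the function name and the sample out-degrees are hypothetical and are not taken from the Skitter data.

```python
import math

def as_polar_position(outdegree, max_outdegree, longitude_deg):
    """Return (radius, angle_in_radians) for one AS.

    Assumes radius = 1 - log(outdegree + 1) / log(max_outdegree + 1),
    so the best-connected AS sits at the centre (radius 0) and an AS
    with no outgoing links sits on the rim (radius 1).
    Requires max_outdegree >= 1.
    """
    radius = 1.0 - math.log(outdegree + 1) / math.log(max_outdegree + 1)
    angle = math.radians(longitude_deg % 360)
    return radius, angle

# Hypothetical out-degrees and longitudes, for illustration only.
for name, degree, lon in [("core AS", 1000, -122.0), ("edge AS", 0, 8.5)]:
    r, theta = as_polar_position(degree, 1000, lon)
    print(f"{name}: radius={r:.2f}, angle={math.degrees(theta):.1f} deg")
```

The logarithm compresses the very wide range of out-degrees, so a handful of highly connected ASes cluster near the centre while the many poorly connected ones spread along the rim.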
Hosts by regions • Jan 2001, 109 Million hosts • Source: Internet Software Consortium (www.isc.org) • see web site also for hosts/population • Notes: • Many .com are in N. America • S. Asia = in (36K), pk (6K), lk, bd • E. Asia=jp, cn, my, sg, tw, hk, th, id, bn, mm • Mid East=il, kw, lb, ae, tr, sa • TLDs with hosts~238 • Total TLDs~258
Backbone utilization • Shows utilization of I2/Abilene backbone links • NB: backbone < 30% loaded • Most losses occur at exchange points & edges
Flow sizes • Heavy tailed, in ~ out, UDP flows shorter than TCP, packets ~ bytes • 75% of TCP-in flows < 5 kBytes, 75% of TCP-out flows < 1.5 kBytes (< 10 pkts) • 80% of UDP flows < 600 Bytes (75% < 3 pkts), ~10 times more TCP than UDP • Top UDP = AFS (> 55%), Real (~25%), SNMP (~1.4%) [Figure labels: SNMP, Real A/V, AFS file server]
Flow lengths • 60% of TCP flows last less than 1 second • Would expect TCP streams to be longer lived • But 60% of UDP flows last over 10 seconds, maybe due to heavy use of AFS at SLAC • Another (CAIDA) study indicates UDP flows are shorter than TCP flows • Measured by Netflow; flows tied off at 30 mins [Figure: TCP outbound flows vs. active time (secs)]
Typical Internet traffic by Application • CERFnet link • Dominated by WWW (http) [Figure labels: Mail, WWW, FTP, RealAudio]
SLAC Traffic profile • SLAC offsite links: OC3 to ESnet, 1 Gbps to Stanford U & thence OC12 to I2, OC48 to NTON • Profile: bulk-data transfer dominates [Figure: Mbps in and out over 2 days and over the last 6 months; labels: HTTP, iperf, SSH, FTP, bbftp]
SLAC Internet Application usage • Ames IXP: approximately 60-65% was HTTP, about 13% was NNTP • U. Wisc: 34% HTTP, 24% FTP, 13% Napster
What does performance depend on? • End-to-end Internet performance seen by applications depends on: • round trip times • packet loss • jitter • reachability • bottleneck bandwidth • implementation/configurations • application requirements • Data is transmitted in packets
Application requirements • Based on ITU-T Y.1541 • The VoIP loss of 10^-3 used to be 0.25, but that assumed random, flat loss • Actual loss is often bursty: • tail drop in routers • sync loss in circuits, bridge spanning-tree reconfiguration, route changes
RTT from ESnet to Groups of Sites • RTT ~ distance/(0.6*c) + hops * router delay • Router delay = queuing + clocking in & out + processing • ITU G.114: 300 ms RTT limit for voice [Figure annotation: 20%/year]
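The RTT model on this slide is simple enough to evaluate directly. The sketch below is illustrative only: it assumes that "distance" means the total round-trip path length and that a single average per-hop delay stands in for queuing, clocking and processing. The function name and the example figures are hypothetical.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond

def estimate_rtt_ms(path_km, hops, router_delay_ms):
    """Estimate RTT as distance/(0.6*c) + hops * router delay.

    Assumptions (not stated on the slide): `path_km` is the full
    round-trip path length, and `router_delay_ms` is an average
    per-hop delay covering queuing, clocking in and out, and processing.
    """
    propagation_ms = path_km / (0.6 * C_KM_PER_MS)
    return propagation_ms + hops * router_delay_ms

# Hypothetical path: 18,000 km round trip, 30 router traversals at 0.5 ms each.
print(f"{estimate_rtt_ms(18000, 30, 0.5):.1f} ms")
```

On long paths the propagation term dominates, which fits the weak dependence of RTT on hop count noted earlier in the lecture.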
RTT Region to Region • OK: White 0-64 ms, Green 64-128 ms, Yellow 128-256 ms • NOT OK: Pink 256-512 ms, Red > 512 ms • OK within regions; N. America OK with Europe, Japan
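The colour bands can be expressed as a small lookup. This is a sketch only: the slide does not say which band a boundary value such as 64 ms belongs to, so the upper bounds are assumed to be inclusive (consistent with Red being strictly greater than 512 ms). The function names are hypothetical.

```python
def rtt_band(rtt_ms):
    """Map an RTT in milliseconds to the colour band used on the slide.

    Assumes each band includes its upper bound (e.g. 64 ms is White).
    """
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    bands = [(64, "white"), (128, "green"), (256, "yellow"), (512, "pink")]
    for upper_ms, colour in bands:
        if rtt_ms <= upper_ms:
            return colour
    return "red"

def rtt_ok(rtt_ms):
    """True for the bands the slide marks as OK (White, Green, Yellow)."""
    return rtt_band(rtt_ms) in ("white", "green", "yellow")

for sample_ms in (30, 200, 300, 600):
    print(sample_ms, rtt_band(sample_ms), rtt_ok(sample_ms))
```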
RTT from California to world • Source = Palo Alto CA, W. Coast • Data from CAIDA Skitter project [Figures: RTT (ms) vs. longitude (degrees), and frequency vs. RTT (ms); annotations: W. Coast US, E. Coast US, Europe, Brazil, Europe & S. America, 300 ms, 0.3*0.6c]
RTT from Japan to world • Seen from Japan [Figure: RTT (ms) vs. longitude]
Cumulative RTT distributions • Gives quality measure • Seen from San Diego, US Skitter • Steeper = less jitter, i.e. better • Small values better [Figure: cumulative % vs. RTT (ms)]
Routes are not symmetric • Min, 50% & 90% RTT measured by Surveyor • Notice big differences in RTTs • May be due to different paths in the 2 directions or to different loading [Figures: RTT (ms) for Advanced to U. Chicago and for U. Chicago to Advanced]
Loss seen from US to groups of Sites • 50% improvement / year • ETSI DTR/TIPHON-05001 V1.2.5 threshold for good speech
Detailed example of improvements • Bandwidth increased by a factor of 460 in 6 years, which more than kept pace: loss improved by a factor of 50 • Note valleys when students are on vacation
Loss to world from US Using year 2000, fraction of world’s population/country from www.nua.ie/surveys/how_many_online/
How are the U.S. Nets doing? • In general performance is good (i.e. loss <= 1%) • ESnet holding steady, still better than others • Edu (vBNS/Abilene) & .com improving
Losses for 28 days in May 2001 • Measured by MIDS to 583 DNS services, 383 Web services, 1367 Internet (ping) hosts, & 1225 ISPs (routers) [Figure: % loss for DNS, WWW, Internet and ISP]
Bulk throughput • Important for long TCP flows where we want to copy large amounts of data from one site to another in a relatively short time, e.g. file transfer • Depends on RTT, loss, timeouts, window sizes
Throughput quality • TCPBW < MSS/(RTT*sqrt(loss)), where MSS is the maximum segment size • Note E. Europe catching up • Reference: The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm, Mathis, Semke, Mahdavi, Ott, Computer Communication Review 27(3), July 1997
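The throughput bound from the Mathis et al. paper can be evaluated directly from measured RTT and loss. This is a minimal sketch: it includes the maximum segment size (MSS) term from the paper, takes the paper's order-one constant as 1, and uses a hypothetical function name and example values.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Upper bound on TCP throughput in bits/s: MSS / (RTT * sqrt(loss)).

    Assumes the order-one constant from the paper is 1, and that `loss`
    is a packet-loss probability in the range (0, 1].
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss <= 1:
        raise ValueError("loss must be in (0, 1]")
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss))

# Hypothetical path: 1460-byte MSS, 100 ms RTT, 1% packet loss.
bound = mathis_throughput_bps(1460, 0.100, 0.01)
print(f"{bound / 1e6:.2f} Mbit/s")
```

For these example values the bound is roughly 1.17 Mbit/s. Halving the RTT doubles the bound, whereas loss has to fall by a factor of four to have the same effect.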
Throughput also depends on window • Optimal window size depends on: • Bandwidth end to end, i.e. min(BWlinks), AKA bottleneck bandwidth • Round Trip Time (RTT) • For TCP, keep the pipe full • Window (sometimes called pipe) ~ RTT*BW • Can increase bandwidth by orders of magnitude • If no loss, Throughput ~ Window/RTT [Figure: packets and ACKs between Src and Rcv over one RTT; t = bits in packet/link speed]
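The two window relations on this slide can be checked with a short calculation. This is a sketch under the slide's no-loss assumption; the function names, the 100 Mbit/s bottleneck and the 150 ms RTT are hypothetical values chosen for illustration.

```python
def optimal_window_bytes(bottleneck_bps, rtt_s):
    """Window needed to keep the pipe full: RTT * bottleneck bandwidth."""
    return bottleneck_bps * rtt_s / 8

def window_limited_throughput_bps(window_bytes, rtt_s):
    """Loss-free throughput for a given window: Window / RTT."""
    return window_bytes * 8 / rtt_s

# Hypothetical transatlantic path: 100 Mbit/s bottleneck, 150 ms RTT.
rtt_s = 0.150
needed = optimal_window_bytes(100e6, rtt_s)
small = window_limited_throughput_bps(64 * 1024, rtt_s)
print(f"window to fill the pipe: {needed / 1e6:.2f} MB")
print(f"throughput with a 64 KB window: {small / 1e6:.2f} Mbit/s")
```

With these example numbers a 64 KB window caps throughput near 3.5 Mbit/s regardless of link speed, while filling the pipe needs a window of about 1.9 MB.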
“Jitter” from N. America to W. Europe • “Jitter” = IQR(ipdv), where ipdv(i) = RTT(i) – RTT(i-1) • 214 pairs • ETSI: DTR/TIPHON-05001 V1.2.5 (1998-09): good speech < 75 ms jitter
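The jitter definition on this slide maps onto a few lines of Python. This is a sketch: the slide does not say how the quartiles are computed, so the default method of the standard library's `statistics.quantiles` is assumed, and the RTT samples are invented for illustration.

```python
import statistics

def ipdv(rtts_ms):
    """Inter-packet delay variation: ipdv(i) = RTT(i) - RTT(i-1)."""
    return [later - earlier for earlier, later in zip(rtts_ms, rtts_ms[1:])]

def jitter_iqr(rtts_ms):
    """Jitter as the inter-quartile range (IQR) of the ipdv values.

    Needs at least 3 RTT samples so that there are 2 ipdv values.
    The quartile interpolation method is an assumption.
    """
    deltas = ipdv(rtts_ms)
    q1, _median, q3 = statistics.quantiles(deltas, n=4)
    return q3 - q1

# Invented RTT samples in milliseconds.
samples = [152.0, 155.5, 151.0, 190.0, 153.0, 154.5, 152.5]
print(f"jitter = {jitter_iqr(samples):.1f} ms")
```

Using the IQR rather than the full range keeps a single outlying probe, such as the 190 ms sample here, from dominating the result.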
“Jitter” between regions • ETSI: DTR/TIPHON-05001 V1.2.5 (1998-09) thresholds: 75 ms = Good, 125 ms = Med, 225 ms = Poor • Jitter varies with loading
SLAC-CERN Jitter • ETSI/TIPHON delay jitter threshold (75 ms)
Reachability • Within N. America & W. Europe, loss, RTT and jitter are acceptable for VoIP • But what about reachability?
Reachability – Outage Probability • Surveyor probes randomly, 2/second • Measure the time (outage length) during which consecutive probes don’t get through • Heavy-tailed outage lengths (packet loss not Poisson) • http://www-iepm.slac.stanford.edu/monitoring/surveyor/outage.html
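Outage lengths of this kind can be extracted from a probe log by finding runs of consecutive lost probes. The sketch below is a simplification: it treats probes as evenly spaced at 0.5 s, whereas the slide says they are sent at random times averaging 2 per second. The function name and the sample sequence are hypothetical.

```python
from itertools import groupby

def outage_lengths_s(received, probe_interval_s=0.5):
    """Return the length in seconds of each run of consecutive lost probes.

    `received` holds one boolean per probe, in send order.
    Assumes evenly spaced probes (0.5 s apart for 2 probes/second).
    """
    return [
        sum(1 for _ in run) * probe_interval_s
        for got_through, run in groupby(received)
        if not got_through
    ]

# Invented sequence: True = probe got through, False = probe lost.
probes = [True, True, False, True, False, False, False, False, True]
print(outage_lengths_s(probes))  # two outages: one probe, then four probes
```

A histogram of these lengths is what reveals a heavy tail: independent losses would give mostly single-probe outages, while bursty loss produces occasional long runs.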
Europe seen from U.S. [Map legend: monitor site; beacon site (~10% of sites); HENP country; not HENP; not HENP & not monitored. Annotations: 200 ms, 650 ms; 1% loss, 7% loss, 10% loss]
Asia seen from U.S. [Map annotations: 10% loss, 3.6% loss, 0.1% loss; 250 ms, 640 ms, 450 ms]
Latin America, Africa & Australasia [Map annotations: 4% loss, 2% loss; 170 ms, 220 ms, 700 ms, 350 ms]
Animated monthly, 2000 • Big is bad [Legend: 20% loss, 200 ms RTT, 20% unreachable]
More Information • IEEE Communications, May 2000, Vol 38, No 5, pp 120-159 • IEPM/PingER home site: www-iepm.slac.stanford.edu/ • CAIDA/Skitter home site: www.caida.org/home/ • Matrix Net home site: www.matrix.net/index.html • Surveyor home site: www.advanced.org/csg-ippm/