A survey of Internet infrastructure reliability Presented by Kundan Singh PhD candidacy exam May 2, 2003
Agenda • Introduction • Routing problems • Route oscillations, slow convergence, scaling, configuration • Reliability via • DNS, transport, application • Effect on VoIP http://www.cs.columbia.edu/~kns10/research/readings/
Overview of Internet routing [Figure: autonomous systems — international providers (AT&T, MCI), regional providers, campus networks, and a cable modem provider; OSPF optimizes paths within an AS, BGP applies policy between ASes]
Border gateway protocol • Runs over TCP • Messages: OPEN, UPDATE, KEEPALIVE, NOTIFICATION • Hierarchical peering relationships (customer, peer, provider) • Export rules (sketched below) • all routes to customers • only customer and local routes to peers and providers • Path-vector: choose the best AS path that satisfies policy [Figure: ASes 0–7 linked as providers, customers, peers, and backups; AS paths such as d: 47, d: 247, d: 1247, d: 31247 propagate hop by hop] [1] A border gateway protocol (BGP-4), RFC 1771
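A minimal sketch of the export rules above, assuming each route is tagged with the relationship through which it was learned; function and tag names are illustrative, not from the RFC.

```python
def exportable(route_learned_from: str, neighbor_relation: str) -> bool:
    """Export rule from the slide above:
    - export every route to customers;
    - export only customer-learned and locally originated routes
      to peers and providers."""
    if neighbor_relation == "customer":
        return True
    # Neighbor is a peer or a provider.
    return route_learned_from in ("customer", "local")

# A peer-learned route is advertised to customers only.
assert exportable("peer", "customer") is True
assert exportable("peer", "peer") is False
assert exportable("customer", "provider") is True
```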
Route selection (in order; see the sketch below) • Highest local AS preference • Shortest AS path • Lowest multi-exit discriminator (MED) • Prefer external-BGP over internal-BGP • Lowest internal routing metric to the next hop (e.g., OSPF) • Lowest router identifier as the last tie breaker [Figure: ASes 1–4 with border routers B1–B4, internal routers R1–R2, and customers C1–C2]
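The tie-breaking order above can be expressed as a sort key, as in this sketch. Field names are illustrative; real BGP compares MED only among routes from the same neighboring AS, which this simplification omits.

```python
from dataclasses import dataclass

@dataclass
class Route:
    local_pref: int    # higher wins
    as_path_len: int   # shorter wins
    med: int           # lower wins
    is_ebgp: bool      # eBGP preferred over iBGP
    igp_metric: int    # lower internal (e.g., OSPF) cost to next hop wins
    router_id: int     # lowest wins, final tie breaker

def preference_key(r: Route):
    # Python sorts ascending, so negate the "higher wins" attributes.
    return (-r.local_pref, r.as_path_len, r.med,
            not r.is_ebgp, r.igp_metric, r.router_id)

def best_route(routes):
    return min(routes, key=preference_key)
```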
Route oscillation • Each AS sets policy independently • Oscillations may be persistent or transient • Cannot occur if route preference is purely distance-based • Solutions: • Static graph analysis • Policy guidelines • Dynamic "flap" damping [Figure: three ASes 0, 1, 2 whose conflicting preferences cycle] [2] Persistent route oscillations in inter-domain routing
Static analysis • Build an abstract model of the policies and ask: • Is the system solvable? • Is it resilient to link failure? • Are there multiple solutions? • Is it solvable only sometimes? • Impractical in general: deciding solvability is NP-complete, and it relies on accurate Internet routing registries [7] An analysis of BGP convergence properties
Policy guidelines • Guidelines (MUST rules): • Prefer customer routes over peer/provider routes • Give backup paths the lowest preference; an "avoidance level" increases as the path traverses backup links • Guarantees convergence even under failure and is consistent with current practice • But limits how policy can be used [3] Stable internet routing without global co-ordination [4] Inherently safe backup routing with BGP
Convergence in intra-domain routing • Distance vector: count to infinity • IS-IS: millisecond convergence • Detect change fast (hardware signal, keep-alive) • Improved incremental SPF • Treat link "down" immediately; delay "up" (sketch below) • Propagate the update before calculating SPF • Prioritize keep-alives over data packets • Detect duplicate updates • OSPF stability • Sub-second keep-alives • Randomization • Multiple failures • Loss resilience [5] Towards milli-second IGP convergence [6] Stability issues in OSPF routing
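The "down immediately, up delayed" rule above amounts to a small hold-down state machine, sketched here; the 2-second hold-down value is an illustrative assumption, not a figure from [5].

```python
import time

UP_HOLD_DOWN = 2.0  # seconds a link must stay up before being re-announced

class LinkState:
    def __init__(self):
        self.up = False
        self.up_since = None

    def event(self, is_up: bool, announce):
        if not is_up:
            self.up = False
            self.up_since = None
            announce("down")              # propagate failure immediately
        else:
            self.up = True
            self.up_since = time.monotonic()  # start hold-down; announce later

    def poll(self, announce):
        # Called periodically; announce "up" only after the hold-down expires,
        # so a flapping link is not re-advertised on every transition.
        if self.up and self.up_since is not None and \
                time.monotonic() - self.up_since >= UP_HOLD_DOWN:
            announce("up")
            self.up_since = None
```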
BGP convergence [Figure, worked example over four slides: ASes 0, 1, and 2 are fully meshed and each also connects directly to destination R; every node knows the direct path plus backups through its neighbors, e.g., node 0 holds (R, 1R, 2R). When R's route is withdrawn, each node falls back to ever-longer stale paths (10R, 12R, 20R, 21R, 201R, ...), exchanging announcements and withdrawals at each step, until all tables are finally empty ( -, -, - )] [7] An analysis of BGP convergence properties [8] An experimental study of delayed internet routing convergence
BGP convergence (contd.) • Plain BGP: the example above converges only after 48 steps • MinRouteAdver timer: announcements are batched per interval; converges in 13 steps • Sender-side loop detection: converges in one step
BGP convergence [2] • Latency is due to path exploration • Fail-over latency ≈ 30 s × n, where n = longest backup path length and 30 s is the MinRouteAdver timer (e.g., n = 5 gives ~150 s) • Most fail-overs converge within 3 min, but some oscillate for up to 15 min • Packets are lost and delayed during convergence • "up" events converge faster than "down" events • Verified experimentally [8] An experimental study of delayed internet routing convergence [9] The impact of internet policy and topology on delayed routing convergence
BGP convergence [3] • Path exploration => latency • Denser peering => more latency • Large providers see better convergence • Most erroneous paths are due to misconfiguration or software bugs [9] The impact of internet policy and topology on delayed routing convergence
BGP convergence [4] • Route flap damping (sketch below) • To avoid excessive flaps, penalize a route on each update • The penalty decays exponentially • Routes are suppressed above a "suppression" threshold and restored below a "reuse" threshold • But damping worsens convergence: path exploration itself looks like flapping • Selective damping: • Do not penalize updates whose path length keeps increasing • Attach a preference to each route [10] Route flap damping exacerbates Internet routing convergence
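A sketch of the damping scheme just described: each flap adds a fixed penalty, the penalty decays exponentially with a half-life, and the route is suppressed and reused at two thresholds. The constants are common defaults used here only for illustration.

```python
import math

FLAP_PENALTY = 1000
HALF_LIFE = 15 * 60   # seconds
SUPPRESS = 2000       # suppress route when penalty rises above this
REUSE = 750           # restore route when penalty decays below this

class DampedRoute:
    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        dt = now - self.last_update
        self.penalty *= math.exp(-math.log(2) * dt / HALF_LIFE)
        self.last_update = now

    def flap(self, now):
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty >= SUPPRESS:
            self.suppressed = True

    def usable(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < REUSE:
            self.suppressed = False   # penalty decayed; route reusable
        return not self.suppressed
```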
BGP convergence [5] • Consistency assertions: paths 12R and 235R disagree about how AS 2 reaches R, so one must be stale; prefer the path learnt directly from AS 2 (235R) and discard the inconsistent one • Order-of-magnitude improvement in convergence • Also helps distinguish a failure from a policy change [Figure: ASes 0, 1, 2, 3, 5 and destination R, with advertised paths 12R, 2R, and 235R] [11] Improving BGP convergence through consistency assertions
BGP scaling • iBGP requires a full mesh of logical connections within an AS, which does not scale • Solution: add hierarchy
BGP scaling [2] • Route reflectors • More popular; only the reflectors need upgrading • Confederations • Sub-divide the AS • Both reduce the number of updates and sessions (see the comparison below) [12] A comparison of scaling techniques for BGP
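A back-of-the-envelope comparison of iBGP session counts: a full mesh among n routers needs n(n-1)/2 sessions, while a single route reflector needs only n-1 client sessions (real deployments use redundant reflectors, which this toy count ignores).

```python
def full_mesh_sessions(n: int) -> int:
    return n * (n - 1) // 2

def route_reflector_sessions(n: int) -> int:
    return n - 1   # one session from each client to the lone reflector

for n in (10, 100, 500):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
# 100 routers: 4950 full-mesh sessions vs. 99 with one reflector.
```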
BGP scaling [3] • Route reflection may create loops if the signaling path is not the forwarding path • Persistent oscillations are possible • Fix: modify iBGP to pass multiple routes within the AS [Figure: two route reflectors RR with clients C1 and C2; one side chooses route P, the other route Q; logical BGP sessions do not follow the physical links] [13] On the correctness of IBGP configuration [14] Route oscillations in I-BGP with route reflections
BGP stability • Initial experiment ('96) • 99% of updates were redundant, caused by implementation or configuration bugs • After the bug fixes ('97–'98) • Instability is well distributed across ASes and prefixes [15] Internet routing instabilities [16] Experimental study of Internet stability and wide-area backbone failures
BGP stability [2] • Inter-domain experiment ('98) • 9 months of data, 9 GB, 55,000 routes, 3 ISPs, 15-minute filtering • 25–35% of routes are 99.99% available • 10% of routes are less than 95% available [16] Experimental study of Internet stability and wide-area backbone failures
BGP stability [3] • Failure • More than 50% of routes have MTTF > 15 days; 75% fail within 30 days • Most routes fail over / re-route within 2 days (rate has increased since '94) • Repair • 40% of route failures are repaired in < 10 min, 60% within 30 min • A small fraction of routes accounts for the majority of instability • Weekly/daily periodicity suggests congestion as a cause [16] Experimental study of Internet stability and wide-area backbone failures [24] End-to-end routing behavior in the Internet
BGP stability [4] • Backbone routers • Interface MTTF is 40 days • 80% of failures are resolved within 2 hours • Maintenance, power, and PSTN problems are the major causes of outages (approx. 16% each) • Overall uptime of 99% • Popular destinations • Quite robust • Average instability lasts less than 20 s, i.e., just routing convergence [16] Experimental study of Internet stability and wide-area backbone failures [17] BGP routing stability of popular destinations
BGP under stress • Congestion • Prioritize routing control messages over data • Routing table size • Driven by AS count, prefix length, multi-homing, NAT • Effects: number of updates; convergence • Configuration-dependent; no universal filter • Real routers • "malloc" failure • Cascading effect • Prefix-limiting option • Graceful restart • CodeRed/Nimda • Quite robust • Some features get activated only during stress • Cascading failures [18] Routing stability in congested networks: experimentation and analysis [19] Analyzing the Internet BGP routing table [20] An empirical study of router response to large BGP routing table load [21] Observation and analysis of BGP behavior under stress [22] Network resilience: exploring cascading failures within BGP
BGP misconfiguration • Types: failure to summarize, prefix hijack, advertising internal prefixes, policy errors • 200–1200 misconfigured prefixes each day • Up to 3/4 of new advertisements result from misconfiguration • 4% of misconfigured prefixes affect connectivity • Causes • Origin misconfiguration: initialization bug (22%), reliance on upstream filtering (14%), leaked from IGP (32%) • Export misconfiguration: bad ACL (34%), prefix-based configuration (8%) • Conclusion: better user interfaces, authentication, consistency verification, and transaction semantics for commands [23] Understanding BGP misconfiguration
Reactive routing • Resilient overlay network (RON) • Detects failures (outages, loss) and reroutes through overlay nodes (sketch below) • Application controls the metric; expressive policy • Scalability suffers • Failures are frequent and widespread • 90% of failures last under 15 min, 70% under 5 min; the median is just over 3 min • Many occur near the edge, inside an AS • Helps in the multi-homing case • Failures in the core correlate more with BGP events [26] Resilient overlay networks [27] Measuring the effect of Internet path faults on reactive routing
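A sketch of the overlay idea above: probe the direct path and every one-hop detour through an overlay node, then forward over whichever currently looks best under an application-chosen metric. The probe function is a stand-in for real active measurements; node names are illustrative.

```python
def pick_overlay_path(src, dst, overlay_nodes, probe):
    """probe(a, b) -> current cost of path a->b (inf on outage)."""
    candidates = {(dst,): probe(src, dst)}        # direct path
    for mid in overlay_nodes:                     # one-hop detours
        candidates[(mid, dst)] = probe(src, mid) + probe(mid, dst)
    return min(candidates.items(), key=lambda kv: kv[1])

# Toy measurements: the direct path is out, a detour via node B works.
cost = {("A", "D"): float("inf"),
        ("A", "B"): 10.0, ("B", "D"): 12.0}
print(pick_overlay_path("A", "D", ["B"],
                        lambda a, b: cost.get((a, b), float("inf"))))
# -> (('B', 'D'), 22.0)
```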
Reliable multicast • Reliable, sequenced, loosely synchronized delivery • Reuses existing TCP, with ACK aggregation • Local recovery possible • Good performance • Implemented on Linux 2.0.x using the BSD packet filter, IP firewall, and raw sockets [30] IRMA: A reliable multicast architecture for the Internet
Transport layer fail-over • Server fail-over alternatives: a front-end dispatcher (a bottleneck) or forging the server's IP address • TCP connection migration • Works for static data (HTTP pages) • Needs a mapping between application streams • Implemented in Apache 1.3 • High overhead for short-lived services [29] Fine grained failover using connection migration
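The migration above is transparent, kernel-level machinery; for static data, a simpler application-level fallback achieves a similar effect, sketched here: on a broken transfer, reconnect to a replica and resume with an HTTP Range request from the byte already received. Host names are hypothetical, and status/error handling is elided.

```python
import http.client

def fetch_with_failover(path, servers):
    data = b""
    for host in servers:
        try:
            conn = http.client.HTTPConnection(host, timeout=5)
            headers = {"Range": "bytes=%d-" % len(data)} if data else {}
            conn.request("GET", path, headers=headers)
            resp = conn.getresponse()      # expect 200 or 206 Partial Content
            while True:
                chunk = resp.read(16384)
                if not chunk:
                    return data            # transfer complete
                data += chunk              # progress survives a mid-read crash
        except (OSError, http.client.HTTPException):
            continue                       # resume from the next replica
    raise ConnectionError("all replicas failed")

# fetch_with_failover("/index.html", ["www1.example.com", "www2.example.com"])
```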
DNS performance • Low TTLs: lookup latency grows by two orders of magnitude • Client and local name server may be far apart • Embedded objects: 23% of lookups get no answer, 13% get a failure answer • 27% of queries sent to root servers failed • TTLs as low as 10 min preserve most of the caching benefit • Most of the benefit of a shared DNS cache is reached with only 10–20 clients [33] On the effectiveness of DNS-based server selection [32] DNS performance and effectiveness of caching
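A toy resolver cache illustrating why low TTLs hurt: every expiry forces another (possibly distant) upstream lookup. The resolve_upstream method is a placeholder for real recursive resolution; the address and TTL are illustrative.

```python
import time

class DnsCache:
    def __init__(self):
        self.entries = {}          # name -> (address, expiry time)
        self.upstream_queries = 0

    def resolve_upstream(self, name):
        self.upstream_queries += 1
        return "192.0.2.1", 10 * 60   # illustrative address, 10-minute TTL

    def lookup(self, name):
        now = time.monotonic()
        hit = self.entries.get(name)
        if hit and hit[1] > now:
            return hit[0]          # cache hit: no upstream traffic
        addr, ttl = self.resolve_upstream(name)
        self.entries[name] = (addr, now + ttl)
        return addr

cache = DnsCache()
cache.lookup("www.example.com")
cache.lookup("www.example.com")
print(cache.upstream_queries)      # 1: the second lookup is a cache hit
```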
DNS replication • Replicate the entire DNS database across distributed servers [Figure: replicated name servers (RNS) placed throughout the network, each serving many ASes] [31] A replicated architecture for the domain name system
Reliable server pooling [34] Architecture for reliable server pooling [35] Requirements for reliable server pooling [36] Comparison of protocols for reliable server pooling
PSTN failures • Switch vendors aim for 99.999% availability • Network availability varies (domestic US calls > 99.9%) • Study in '97: • Overload caused 44% of lost customer-minutes • Mostly short outages • Human error caused 50% of outages; software only 14% • No convergence problem [37] Sources of failures in PSTN
VoIP over the Internet backbone • Backbone links are underutilized • Tier-1 backbones (e.g., Sprint) have good delay and loss characteristics • Average scattered loss is 0.19% (mostly single-packet loss; FEC helps) • 99.9% of probes see < 33 ms delay • Most burst loss is due to routing problems, mostly reliability and router operation rather than traffic load • Mean opinion score: 4.34 out of 5, depending on the choice of audio codec • Customer sites have more problems • Toll-quality VoIP can be provided by some ISPs, but many paths give poor performance • Adaptive playout delay is needed for bad paths [28] Understanding traffic dynamics at a backbone POP [39] Impact of link failures on VoIP performance [38] Assessing the quality of voice communications over Internet backbones
VoIP [2] • Paths have a prevalent but not persistent route • Loss is very asymmetric and bursty • Outage = loss lasting more than 300 ms (sketch below) • More than 23% of losses are outages • Outage behavior is similar across networks • Poor quality leads to call abandonment • Net VoIP availability ≈ 98% [25] Measurement and interpretation of Internet packet loss [41] Assessment of VoIP service availability in the current Internet
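A sketch of the outage definition above: scan the arrival timestamps of a packet trace and count inter-arrival gaps longer than 300 ms as outages. The trace values are illustrative, not measured data.

```python
OUTAGE_GAP = 0.300   # seconds; the 300 ms outage threshold above

def outages(arrival_times):
    """Return the inter-arrival gaps that qualify as outages."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return [g for g in gaps if g > OUTAGE_GAP]

trace = [0.00, 0.02, 0.04, 0.45, 0.47, 0.49]   # one ~410 ms gap
print(outages(trace))                           # -> [0.41...]
```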
Future work • End-system and higher-layer protocol reliability and availability • Mechanisms to reduce the effect of outages on VoIP • Redundancy for VoIP systems during outages • Convergence and scaling of TRIP, which is similar to BGP • Combining scalability (DNS) with reliability (server pooling)