Traffic-aware Inter-Domain Routing for Improved Internet Routing Stability
Zhenhai Duan
Florida State University
Outline
• Introduction and Background
• Motivation and Intuition
• Traffic-Aware Inter-Domain Routing (TIDR)
• Performance Studies
• Summary
Introduction and Background
• The Internet consists of a large number of network domains
  • Also called Autonomous Systems (ASes)
  • Currently about 26K
  • ASes exchange network prefix reachability information using BGP
• In a system this large, things happen all the time
  • Fiber cuts, equipment outages, operator errors
• Direct consequences for the routing system
  • Large numbers of BGP updates exchanged between ASes
  • Best routes re-computed and re-propagated
  • Events may propagate through the entire Internet
• Effects on user-perceived network performance
  • Long network delay, packet loss, even loss of network connectivity
Introduction and Background
• Implicit design assumption in BGP
  • Failure events are of equal importance to all users
  • BGP has no explicit mechanism to localize failures
  • Internet global reachability == global propagation of failures
• Is this valid?
  • A user (AS) in the US may not be interested in a failure in an Asian country
• The design of BGP fails to recognize two Internet properties
  • Internet access non-uniformity
  • Prevalence of transient failures
Motivation and Intuition
• Internet access non-uniformity
  • ARPANET (1970, Kleinrock and Naylor)
    • Top 12.6% responsible for 90% of traffic
  • NSFNET (1980, Rekhter and Chinoy)
    • Top 10% responsible for 85% of traffic
  • Fang and Peterson (1999), and Rexford (2002)
    • Non-uniform distribution of Internet traffic
  • Model of network value [IEEE Spectrum 2006]
    • Zipf's law
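To see how a Zipf-like popularity distribution concentrates traffic on a few destinations, here is a minimal sketch. It assumes the k-th most popular destination receives traffic proportional to 1/k^s; the function name `zipf_top_share` and the exponent `s` are illustrative, and the output is not meant to reproduce the ARPANET or NSFNET figures above.

```python
def zipf_top_share(n: int, top_fraction: float, s: float = 1.0) -> float:
    """Share of total traffic carried by the most popular `top_fraction` of n
    destinations, assuming the k-th most popular destination receives traffic
    proportional to 1 / k**s (a Zipf distribution with exponent s)."""
    if n < 1 or not 0.0 < top_fraction <= 1.0:
        raise ValueError("need n >= 1 and 0 < top_fraction <= 1")
    # Weights are already sorted from most to least popular.
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    k_top = max(1, round(top_fraction * n))  # number of "top" destinations
    return sum(weights[:k_top]) / sum(weights)


if __name__ == "__main__":
    # Top 10% of 10,000 destinations under s = 1.
    print(f"{zipf_top_share(10_000, 0.10):.3f}")
```

With s = 1, the top 10% of 10,000 destinations carry roughly 76% of the traffic in this model; a larger exponent concentrates traffic further.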
Internet Access Non-Uniformity
• FSU study
  • Examines whether Internet access locality holds from the viewpoint of an edge network
  • Bidirectional data traffic collected at the FSU border router for 16 days
BGP Updates (RouteViews Project)
• Most updates concern the rest of the prefixes
• Only a few updates relate to FSU's top prefixes
Motivation and Intuition
• Prevalence of transient failures
  • Sprint backbone measurement (2002)
  • BGP misconfigurations
    • 50% of misconfigurations lasted less than 10 minutes
  • 50% < 1 minute
  • 80% < 10 minutes
  • 90% < 20 minutes
• The majority of network failures are transient
Motivation and Intuition
• Prevalence of transient failures
  • The majority of network failures on the Internet are transient
• Internet access non-uniformity
  • Users (networks) normally communicate with a small set of other network domains
• TIDR builds on both properties
Traffic-aware Inter-Domain Routing (TIDR)
• Each prefix is classified as either significant or insignificant
  • At AS v, with respect to neighbor n
• Propagation of BGP updates differs for significant and insignificant prefixes
  • Propagate BGP updates of significant prefixes with high priority
  • Aggressively slow down propagation of BGP updates of insignificant prefixes
• Localize the effect of transient failures on insignificant prefixes
  • Hold propagation of a transient failure if a valid alternative route exists
  • BGP withdrawals are always propagated
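The slides do not spell out how AS v decides which prefixes are significant with respect to neighbor n. The sketch below assumes a hypothetical traffic-volume rule: rank prefixes by bytes exchanged via that neighbor and mark the smallest top set covering a `coverage` fraction of the total as significant. The function name and the default threshold are illustrative, not part of TIDR.

```python
from typing import Dict


def classify_prefixes(traffic_bytes: Dict[str, int], coverage: float = 0.9) -> Dict[str, bool]:
    """Label prefixes significant (True) or insignificant (False) at one AS,
    for one neighbor.

    Hypothetical rule: the smallest set of highest-volume prefixes whose
    combined traffic reaches `coverage` of the total is significant.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    total = sum(traffic_bytes.values())
    labels = {prefix: False for prefix in traffic_bytes}
    covered = 0
    # Walk prefixes from heaviest to lightest until the coverage target is met.
    for prefix, volume in sorted(traffic_bytes.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or covered >= coverage * total:
            break
        labels[prefix] = True
        covered += volume
    return labels


if __name__ == "__main__":
    per_neighbor = {"192.0.2.0/24": 70, "198.51.100.0/24": 20,
                    "203.0.113.0/24": 6, "10.0.0.0/8": 4}
    print(classify_prefixes(per_neighbor, coverage=0.8))
```

Because the labels are computed per neighbor, the same prefix can be significant toward one neighbor and insignificant toward another, matching the "at AS v, with respect to neighbor n" qualifier.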
TIDR Timers
• TIDR timer: 10 min
• MRAI timer: 15/30 sec
(Figure: recovery AS, with the TIDR timer and the MRAI timer)
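The following is a minimal sketch of the per-prefix decision an AS might make after losing its best route, using the timer values on this slide. The function `on_best_route_lost`, the `Action` names and the `held_upstream` flag (standing in for "only one node applies the TIDR timer") are hypothetical; how a real TIDR router signals these conditions is not given here.

```python
from enum import Enum
from typing import Tuple

MRAI_SECONDS = 30.0   # slide: MRAI timer of 15/30 s
TIDR_SECONDS = 600.0  # slide: TIDR timer of 10 min


class Action(Enum):
    WITHDRAW_NOW = "withdraw, not subject to the TIDR timer"
    ANNOUNCE_AFTER_MRAI = "announce alternative route, paced by MRAI"
    HOLD_FOR_TIDR = "hold the change for the TIDR timer"


def on_best_route_lost(significant: bool, has_valid_alternative: bool,
                       held_upstream: bool = False) -> Tuple[Action, float]:
    """Decide how AS v reacts toward one neighbor after losing its best route.

    Returns the action and the timer (seconds) gating the outgoing update.
    `held_upstream` is a hypothetical flag meaning another AS has already
    applied the TIDR timer to this failure, so it is not applied again.
    """
    if not has_valid_alternative:
        # No usable route remains: withdrawals are always propagated.
        # This sketch sends them without any added delay.
        return Action.WITHDRAW_NOW, 0.0
    if significant or held_upstream:
        # Significant prefixes keep normal, high-priority BGP propagation.
        return Action.ANNOUNCE_AFTER_MRAI, MRAI_SECONDS
    # Insignificant prefix with a valid alternative: hold the change. If the
    # original route recovers before the timer fires, the held update is
    # dropped and the transient failure never leaves this AS.
    return Action.HOLD_FOR_TIDR, TIDR_SECONDS


if __name__ == "__main__":
    print(on_best_route_lost(significant=False, has_valid_alternative=True))
    print(f"A TIDR hold spans {TIDR_SECONDS / MRAI_SECONDS:.0f} MRAI intervals of 30 s.")
```

The ratio in the last line shows the scale of the slowdown: a single 10-minute hold covers 20 MRAI intervals of 30 seconds, long enough for most of the short failures cited earlier to recover.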
TIDR Design
• How to avoid traffic black holes?
  • If the alternative route held during the TIDR timer is invalid, the node becomes a black hole that drops all packets it receives for that prefix
  • Utilize Root Cause Information (RCI)
    • Similar to EPIC and RCN
    • Flush out all locally stored invalid alternative routes
    • The chosen alternative route is then guaranteed to be valid
• How to avoid slow propagation of long-term failures of insignificant prefixes?
  • If not designed carefully, every node would hold propagation of the BGP update
  • Only one node applies the TIDR timer to insignificant prefixes
    • Nodes neighboring the failure
    • The first node to have a valid alternative route
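Here is a minimal sketch of RCI-based flushing, assuming the root cause is reported as the failed inter-AS link and that each stored alternative route is kept as the AS path advertised by a neighbor. The helper names are hypothetical, and shortest-AS-path selection stands in for the full BGP decision process.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]  # an inter-AS link, treated as undirected


def uses_link(as_path: List[str], link: Link) -> bool:
    """True if consecutive ASes on the path traverse `link` in either direction."""
    a, b = link
    return any({u, w} == {a, b} for u, w in zip(as_path, as_path[1:]))


def flush_invalid(alternatives: Dict[str, List[str]], failed: Link) -> Dict[str, List[str]]:
    """Drop every stored alternative route whose AS path crosses the failed link."""
    return {nbr: path for nbr, path in alternatives.items() if not uses_link(path, failed)}


def best_valid(alternatives: Dict[str, List[str]]) -> Optional[List[str]]:
    """Pick the shortest remaining AS path (ties broken by neighbor name)."""
    if not alternatives:
        return None  # nothing valid left: the prefix must be withdrawn
    nbr = min(alternatives, key=lambda n: (len(alternatives[n]), n))
    return alternatives[nbr]


if __name__ == "__main__":
    # Paths learned from neighbors B, C and G toward origin AS O.
    alts = {"B": ["B", "D", "O"], "C": ["C", "E", "F", "O"], "G": ["G", "D", "O"]}
    failed = ("O", "D")
    print("without RCI:", best_valid(alts))
    print("with RCI:   ", best_valid(flush_invalid(alts, failed)))
```

Without flushing, the shortest path via B still crosses the failed D-O link, so holding it would black-hole traffic; after flushing with the root cause, only the longer path via C remains and it is valid.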
Performance Studies
• Used the simBGP simulator
  • With both clique and Waxman random network topologies
• Simulated both link fail-down and fail-over events
  • Only a dummy node announces prefixes
    • 20% significant, 80% insignificant
• Link failures
  • 20% long-term, 80% transient
• Settings
  • Link delay: uniformly random, 0.01 to 0.1 seconds
  • Processing delay: uniformly random, 0.001 to 0.01 seconds
  • MRAI timer: 30 seconds
  • TIDR timer: 10 minutes
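Below is a sketch of how a comparable scenario might be generated with the settings on this slide. It does not use simBGP's own input format. The Waxman parameters `alpha` and `beta`, the unit-square placement and the use of sqrt(2) as the maximum distance L are assumptions, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, Tuple

Edge = Tuple[int, int]


def clique_links(n: int, rng: random.Random) -> Dict[Edge, float]:
    """Full mesh; each link gets a delay drawn uniformly from 0.01-0.1 s."""
    return {(u, v): rng.uniform(0.01, 0.1) for u in range(n) for v in range(u + 1, n)}


def waxman_links(n: int, rng: random.Random,
                 alpha: float = 0.4, beta: float = 0.4) -> Dict[Edge, float]:
    """Waxman graph in the unit square: P(u, v) = beta * exp(-d / (alpha * L)).

    L is taken as sqrt(2), the largest possible distance; connectivity is not enforced.
    """
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    max_dist = math.sqrt(2.0)
    links = {}
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            if rng.random() < beta * math.exp(-d / (alpha * max_dist)):
                links[(u, v)] = rng.uniform(0.01, 0.1)  # link delay, seconds
    return links


def make_scenario(n: int, n_prefixes: int, topology: str = "waxman", seed: int = 0) -> dict:
    """Topology, per-node processing delays, significant prefixes and timers."""
    rng = random.Random(seed)
    if topology == "clique":
        links = clique_links(n, rng)
    elif topology == "waxman":
        links = waxman_links(n, rng)
    else:
        raise ValueError(f"unknown topology: {topology}")
    return {
        "links": links,
        "processing_delay": [rng.uniform(0.001, 0.01) for _ in range(n)],
        # 20% of prefixes significant, the remaining 80% insignificant.
        "significant": set(rng.sample(range(n_prefixes), k=round(0.2 * n_prefixes))),
        "mrai": 30.0,   # seconds
        "tidr": 600.0,  # seconds (10 minutes)
    }


def draw_failure(rng: random.Random) -> str:
    """20% of link failures are long-term, 80% transient."""
    return "long-term" if rng.random() < 0.2 else "transient"
```

Fixing the seed makes each generated scenario reproducible, so BGP and TIDR can be run on exactly the same topology and failure sequence.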
Summary and On-going Work
• TIDR: Traffic-aware Inter-Domain Routing
  • Capitalizes on two important properties
    • Internet access non-uniformity
    • Prevalence of transient failures
  • Differentiated BGP update propagation for significant and insignificant prefixes
    • Propagate updates of significant prefixes with higher priority
    • Aggressively slow down propagation of updates of insignificant prefixes
• Performed simulation studies
  • In the simulated scenarios, TIDR outperforms BGP and other existing enhancements