Improved BGP convergence via Ghost Flushing
Yehuda Afek, Anat Bremler-Barr, Shemer Schwarz
The Interdisciplinary Center Herzliya
Problem: BGP Convergence
• [Labovitz, Ahuja, Bose, Jahanian] BGP may take up to 15 minutes to converge.
• Here: Reduce the worst case from minutes to seconds, in a practical way
Problem: BGP Convergence
• [Labovitz, Ahuja, Bose, Jahanian] Time per event, in seconds, with minRouteAdver = 30:
  Event       Time (sec)   Notes
  E-Down      30·n         n 10,000, up to 15 minutes
  E-Up        30·d         d 30, d = diameter
  E-Longer    2·30·l       l = path length
  E-Shorter   30·d
• Here:
  • E-Down = l time units (unit = link delay)
  • E-Longer = 30·d
Agenda
• BGP overview
• The BGP convergence problem
• Ghost buster rule
• Ghost flushing rule
• Simulation results
BGP protocol
• Distance (path) vector protocol
• Receives AS-paths from the neighbors
• Chooses the best one (shortest)
• Eliminates routing loops using the AS-path
• Two kinds of messages: announcements and withdrawals
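The selection step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not real BGP: the function name `best_path`, the dictionary layout, and the lexicographic tie-break are assumptions made for the example, and only path length is considered (real BGP applies policy before length).

```python
from typing import Dict, List, Optional

ASPath = List[int]  # e.g. [1, 0] means "via AS 1 to destination AS 0"


def best_path(my_as: int, received: Dict[int, Optional[ASPath]]) -> Optional[ASPath]:
    """Choose the shortest loop-free AS-path among those received from neighbors.

    `received` maps a neighbor AS to the last path it announced,
    or None if that neighbor sent a withdrawal.
    """
    candidates = [
        path
        for path in received.values()
        # Loop elimination: discard any path that already contains our own AS.
        if path is not None and my_as not in path
    ]
    if not candidates:
        return None  # no route to the destination
    # Shortest path wins; the lexicographic tie-break is an assumption
    # made only to keep the example deterministic.
    return min(candidates, key=lambda p: (len(p), p))


# AS 2 hears "1 0" from AS 1, a looping path from AS 3, and a withdrawal from AS 4.
chosen = best_path(2, {1: [1, 0], 3: [3, 2, 0], 4: None})
print(chosen)  # [1, 0]
```

When the router announces the chosen path to its own neighbors it would prepend its own AS number; that step is left out of the sketch.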
Problem: Ghost information
• One ghost (old information) makes many, and in the network it continues recursively
• minRouteAdver: wait 30 sec before sending the next announcement (BGP)
[Figure, animation over ASes 1 to 4, each attached to AS 0, which originates dst]
• t=0: every AS holds "dst: 0"; AS 0 sends a withdrawal
• t=1: the ASes switch to ghost paths (dst: 1 0, 1 0, 2 0, 1 0) and announce them (annc: 1 0, 1 0, 2 0, 1 0); AS 0 holds dst: {}
• t=2: paths grow longer (dst: 3 1 0, 1 2 0, 1 2 0); one AS holds dst: {} and sends a withdrawal
• t=3 to t=31: announcements of longer ghost paths (annc: 3 1 0, 2 1 0, 2 1 0) are released as the minRouteAdver timers expire
E_Down convergence
• In the clique (size 4) example the scenario ends after 62 sec, roughly 30(n-2) = 60 sec for n = 4
Without MinRouteAdver
• Avalanche of messages: O(n!)
• Explores all possible paths of length 1, 2, 3 …
[Figure, animation over the same clique; each AS keeps a table "neighbor : path learned from that neighbor"]
• t=0: every AS holds "dst: 0" and tables such as 1 : 1 0, 2 : 2 0, 3 : 3 0; AS 0 sends a withdrawal
• t=0.1: ASes fall back to length-1 ghost paths (dst: 1 0, 1 0, 2 0, 1 0) and announce them
• t=0.2: next candidates (dst: 3 0, 2 0, 2 0, 2 0); tables start to hold length-2 paths such as 1 : 1 2 0
• t=0.3: dst: 3 0, 3 0, 3 0, 4 0; tables hold entries such as 2 : 2 1 0 and 2 : 2 3 0
• t=0.4: dst: 4 0, 1 2 0, 4 0, 4 0; tables hold entries such as 3 : 3 1 0; the first length-2 announcement (annc: 1 2 0) is sent
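To get a feel for the O(n!) avalanche, the sketch below counts the loop-free AS-paths that a single AS in an n-AS clique could explore toward AS 0. The setup is an assumption made for illustration (AS 1 is the source, ASes 2 to n are the other clique members, and every path ends at AS 0); the function names are hypothetical.

```python
from itertools import permutations
from math import factorial


def count_loop_free_paths(n: int) -> int:
    """Closed form: k intermediate ASes can be chosen and ordered in
    (n-1)! / (n-1-k)! ways, for k = 0 .. n-1."""
    m = n - 1  # number of other ASes in the clique
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))


def enumerate_loop_free_paths(n: int):
    """Brute-force listing of the same paths, as a cross-check."""
    others = range(2, n + 1)  # AS 1 is the source, AS 0 the destination
    paths = []
    for k in range(n):  # k = number of intermediate ASes, 0 .. n-1
        for middle in permutations(others, k):
            paths.append((1, *middle, 0))
    return paths


for n in (4, 5, 8):
    print(n, count_loop_free_paths(n))
# 4 16
# 5 65
# 8 13700
```

The count grows roughly like e·(n-1)!, which is why exploring every ghost path without any rate limit produces a factorial number of messages.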
Related Work
• Introducing the problem [Labovitz, Ahuja, Bose, Jahanian], [Labovitz, Wattenhofer, Venkatachary, Ahuja]
  • Real-life evidence
  • Theoretical analysis
• Experimental analysis [Griffin, Premore]
• Solutions
  • Work on counting to infinity:
    • Adding states [Garcia-Luna-Aceves]: EIGRP-like
    • Route poisoning with hold-down [Cisco:Rutgers]: IGRP-like
  • Route consistency [Pei, Zhao, Wang, Massey, Mankin, Wu, Zhang]
Ghost flushing rule
• If the ASpath to dst becomes longer and the announcement cannot be sent yet (due to the minRouteAdver rule), then send a withdrawal
• Motivation: flush the ghost information as soon as possible
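The rule above can be sketched as a per-peer decision function. This is a simplified illustration under stated assumptions, not the authors' implementation: `PeerState`, `on_best_path_change`, and the message tuples are hypothetical names, the timer is kept per peer, withdrawals are assumed never to be delayed by minRouteAdver, and the deferred announcement that would go out when the timer expires is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_ROUTE_ADVER = 30.0  # seconds, as on the slides


@dataclass
class PeerState:
    last_annc_time: float = float("-inf")   # when we last announced to this peer
    advertised: Optional[List[int]] = None  # path this peer currently holds from us


def on_best_path_change(
    peer: PeerState, new_path: Optional[List[int]], now: float
) -> Tuple[str, Optional[List[int]]]:
    """Decide what to send to one peer after our best path to dst changed."""
    if new_path is None:
        # No route left: withdraw immediately (assumed not rate-limited).
        peer.advertised = None
        return ("withdraw", None)

    timer_running = now - peer.last_annc_time < MIN_ROUTE_ADVER
    if not timer_running:
        peer.last_annc_time = now
        peer.advertised = new_path
        return ("announce", new_path)

    # The timer blocks the announcement. Ghost flushing: if the new path is
    # longer than what the peer holds, that stored path is a ghost, so flush it.
    if peer.advertised is not None and len(new_path) > len(peer.advertised):
        peer.advertised = None
        return ("withdraw", None)

    return ("none", None)  # nothing to flush; wait for the timer


peer = PeerState()
print(on_best_path_change(peer, [1, 0], now=0.0))     # ('announce', [1, 0])
print(on_best_path_change(peer, [3, 1, 0], now=2.0))  # ('withdraw', None)
```

Without the flushing branch, the second call would return `("none", None)` and the peer would keep using the stale path "1 0" until the timer expired.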
Ghost Flushing example
[Figure, animation over the same clique of ASes 1 to 4 attached to AS 0]
• t=0: every AS holds "dst: 0"; AS 0 sends a withdrawal
• t=1: the ASes switch to ghost paths (dst: 1 0, 1 0, 2 0, 1 0) and announce them; AS 0 holds dst: {}
• t=2: paths grow longer (dst: 3 1 0, 1 2 0, 1 2 0) while the minRouteAdver timer is running, so the ASes send "flushing" withdrawals
• t=3: every AS holds dst: {}; the remaining withdrawals propagate
Analysis: Time convergence of ghost flushing rule, E_down
• In each time unit (= h, the maximum link delay), the ghost information is erased one hop farther
• After k time units, every ghost ASpath of length < k has disappeared
• Longest ghost ASpath = n (in theory)
• Hence the worst-case convergence time is n·h
Ghost Buster Rule
• The convergence time is better than expected!
• Explanation: minRouteAdver blocks the propagation of ghost information, while the flushing withdrawal "eats" the ghost information
• Bad (wrong) news propagates slowly
Analysis: Ghost buster rule
• Add to the ghost flushing rule:
  • A router sends an announcement only after delta time
• MinRouteAdver is similar to delta:
  • Common implementation: MinRouteAdver per peer
  • And the timer is almost always on (lots of BGP announcements!)
Analysis: Time convergence of ghost buster rule
• The ghost information disappears at the time t for which d + t/(delta+h) = t/h
• Every delta+h time, the length of the maximum ghost ASpath increases by one
• Every h time, the length of the minimum ghost ASpath increases by one
• After the failure, the length of the maximum ghost ASpath is d (diameter)
• Hence: t = k·d·h/(k-1), where k = (delta+h)/h is the rate of the algorithm; t grows linearly with d
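As a quick numeric check of the formula above, the sketch below evaluates t and confirms that the two sides of the balance equation meet. The function name and the input values (d = 10, delta = 30 sec, h = 2 sec) are illustrative assumptions, not figures from the talk.

```python
def ghost_buster_time(d: float, delta: float, h: float) -> float:
    """Time at which the shortest ghost path (growing by 1 every h)
    catches up with the longest one (starting at d, growing by 1 every delta+h)."""
    if delta <= 0 or h <= 0:
        raise ValueError("delta and h must be positive")
    k = (delta + h) / h          # rate of the algorithm
    return k * d * h / (k - 1)   # t = k*d*h / (k-1)


d, delta, h = 10, 30.0, 2.0      # illustrative values only
t = ghost_buster_time(d, delta, h)

longest = d + t / (delta + h)    # maximum ghost ASpath length at time t
shortest = t / h                 # minimum ghost ASpath length at time t
print(round(t, 2), round(longest, 4), round(shortest, 4))
# 21.33 10.6667 10.6667
```

With these inputs k = 16, so t is only slightly above d·h = 20 sec: the larger delta is relative to h, the closer the bound gets to d·h.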
The effect on E_longer
[Figure: example topology with ASes 0 to 7 and destination dst]
• BGP: convergence time is dominated by:
  • Time until the ghost information vanishes
  • Time until the backup path propagates in
• Ghost flushing: helps the first factor
The effect on E_longer
• Original BGP may err:
  • While MinRouteAdver holds back the update, the peer stores a wrong ASpath
  • BGP may therefore send packets in the wrong direction
• Ghost flushing: sends a withdrawal to the peer, which may by chance have an alternative path
Simulation: BGP code
• Shortest-path metric
• Link delay between 0.2 and 2 sec
• MinRouteAdver chosen at random between 0 and 30 sec
Simulation: ISP topology
[Figure: ISP topology with nodes labeled 1, 3, 4, 5, 7, 8, 9 and destination dst]
Example: Core Internet (ASes)
  #    Out-degree   In-degree   BGP    Ghost Flushing
  1    45           10          963    22
  2    52           17          898    51
  3    3            4           1031   36
  4    112          27          1017   50
  5    61           11          1034   36
  6    20           24          920    33
  7    1            6           2      2.5
  8    18           2           1111   54
  9    1            1111        981    62
  10   1            98          4      5.1
E_longer: Convergence Time
[Figure: example topology with ASes 0 to 7 and destination dst]
Conclusion
• Reduced convergence time from minutes to seconds
• Does not hurt in other cases
• Ghost flushing: no change to BGP messages
• Ghost buster: a new solution to counting to infinity
• BGP is very sensitive to minor modifications