Avoiding Instability during Graceful Shutdown of OSPF
Aman Shaikh, UCSC
Joint work with Rohit Dube, Xebeo Communications Inc., and Anujan Varma, UCSC
INFOCOM, June 2002

Software Upgrade is a Pain
• Upgrade of routing software on routers is a fact of life
  • Extensions to routing protocols, new functionality, version upgrades, bug fixes
  • Critical need for seamless upgrades
• Current practice
  • During an upgrade, network operators withdraw the “router-under-upgrade” from forwarding service
  • Route flaps, traffic disruption, instability
• Operators have to schedule upgrades carefully
  • Schedule them at night, when load is moderate
  • Stagger upgrades of different routers
  • A painful job

We Can Do Better
• A router can continue forwarding even while its routing process is inactive, at least for a while
  • Current routers have separate routing and forwarding paths
  • Routing in software (CPU), forwarding in hardware (switching)
• Routing protocols need to be extended, since they always try to route around an inactive router
• Our proposal: IBB (I’ll Be Back) extension to OSPF
• Other proposals
  • OSPF: hitless restart proposal by John Moy
    • Internet draft: draft-ietf-ospf-hitless-restart-02.txt
  • BGP: graceful restart proposal by Sangli et al.
    • Internet draft: draft-ietf-idr-restart-05.txt

Router Model
[Figure: router block diagram. Labels: Route Processor (CPU), OSPF Process, Topology view, Shortest Path Tree (SPT), Forwarding Info. Base (FIB), Forwarding, Switching Fabric, Interface cards, LSAs, Data packets.]

IBB Proposal in a Nutshell
• The OSPF process on router R needs to be shut down
• Before shutdown, R informs other routers that it is going to be inactive for a while
  • R specifies a time period (IBB Timeout) by which it expects to become operational again
• Other routers continue using R for forwarding during the IBB Timeout period
• If R comes back within the IBB Timeout period, there is no routing instability and there are no flaps
• Else, other routers start forwarding packets around R

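The bookkeeping this implies on the other routers can be sketched as follows. It is a minimal illustration under stated assumptions, not the paper's implementation: the class name `IbbTable`, its method names, and the use of plain numeric timestamps are all hypothetical.

```python
class IbbTable:
    """Hypothetical record of routers that announced an IBB shutdown."""

    def __init__(self):
        self._deadline = {}  # router id -> time by which it expects to be back

    def announce(self, router, now, ibb_timeout):
        """Record that `router` promised to return within `ibb_timeout`."""
        self._deadline[router] = now + ibb_timeout

    def came_back(self, router):
        """The router's OSPF process is operational again; drop its deadline."""
        self._deadline.pop(router, None)

    def keep_forwarding(self, router, now):
        """True if the router is active or still inside its IBB Timeout."""
        deadline = self._deadline.get(router)
        return deadline is None or now <= deadline


table = IbbTable()
table.announce("R", now=0, ibb_timeout=300)
print(table.keep_forwarding("R", now=120))  # inside the timeout: keep using R
print(table.keep_forwarding("R", now=301))  # timeout expired: route around R
```

Passing `now` explicitly keeps the sketch deterministic; a real OSPF process would drive this from its own timers.
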
What if Topology Changes
[Figure: example with routers A, B, and R and weighted links. (a) Topology when R went down. (b) Topology changes while R is inactive.]
• R cannot update its forwarding table to reflect the change
• Can lead to loops or black holes

Handling Changes: Options
• Don’t do anything
• Stop using R: Moy’s proposal
  • Inadvertent changes during upgrade are likely
    • Flapping due to a bad interface somewhere
  • But not all changes are bad
    • They do not always lead to loops or black holes
• Our approach: stop using R only when a loop or black hole forms
  • And only for those destinations for which there is a problem
  • This needs algorithms, which is what the bulk of the paper is about

Roadmap of Algorithm
• Single area, single inactive router case
  • Loop formation
  • Black hole formation
• Single area, multiple inactive routers case
• Multiple areas

Single Area, Single Inactive Router
• Problem formulation
  • Inactive router = R
  • All routers other than R have the same image of the topology graph
  • R’s image is that of the past: the time at which it went down
  • Source = S, Destination = D
  • Next hop(R, D) = Y
  • Actual path a packet takes from S to D = P(S->D)

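One way to make P(S->D) concrete is to forward a packet hop by hop, letting R decide from its stale image while every other router decides from the current one. The sketch below does that. The topology and link costs are invented for illustration (they are not the ones in the slides' figures), the function names are hypothetical, and treating "R's stale next hop is no longer a neighbor" as a black hole is an assumption of this example.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map {node: {neighbor: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def next_hop(graph, source, dest):
    """Dijkstra from source; return the first hop toward dest, or None."""
    best = {source: 0}
    heap, done = [(0, source, None)], set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        if node == dest:
            return hop
        done.add(node)
        for nbr, w in graph[node].items():
            if nbr not in done and cost + w < best.get(nbr, float("inf")):
                best[nbr] = cost + w
                first = nbr if node == source else hop
                heapq.heappush(heap, (cost + w, nbr, first))
    return None

def trace(current, stale, inactive, src, dst):
    """Follow P(src->dst); the inactive router forwards from its stale image."""
    path, node = [src], src
    while node != dst:
        view = stale if node == inactive else current
        hop = next_hop(view, node, dst)
        if hop is None or hop not in current.get(node, {}):
            return path, "black hole"
        if hop in path:
            return path + [hop], "loop"
        path.append(hop)
        node = hop
    return path, "ok"

# Hypothetical topology: image when R went down, then after Y-D gets costlier.
old = undirected([("S", "R", 1), ("S", "Y", 10), ("R", "Y", 2),
                  ("R", "D", 20), ("Y", "D", 6)])
new = undirected([("S", "R", 1), ("S", "Y", 10), ("R", "Y", 2),
                  ("R", "D", 20), ("Y", "D", 30)])
print(trace(new, old, "R", "S", "D"))  # (['S', 'R', 'Y', 'R'], 'loop')
```

In this example R still hands D-bound packets to Y, while Y's up-to-date shortest path to D now goes back through R, so the packet bounces between the two.
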
Loop Detection
[Figure: three panels over routers S, R, Y, and D with weighted links. Topology when R went down; topology changes while R is inactive; S and Y have R on their paths to D in their SPTs.]
• P(S->D) has a loop iff S and Y have R on their paths to D in their SPTs (Shortest Path Trees)
• If there is a loop, a neighbor of R can always detect it

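The check that neighbor Y performs can be sketched in a few lines. This is a minimal illustration rather than the paper's code: it assumes Y already knows R's next hop for D, uses an invented topology, and ignores equal-cost paths. The names `spt_path` and `neighbor_detects_loop` are hypothetical.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map {node: {neighbor: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def spt_path(graph, source, dest):
    """Shortest path source -> dest as a list of nodes, or [] if unreachable."""
    best, pred = {source: 0}, {source: None}
    heap, done = [(0, source)], set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, w in graph[node].items():
            if nbr not in done and cost + w < best.get(nbr, float("inf")):
                best[nbr], pred[nbr] = cost + w, node
                heapq.heappush(heap, (cost + w, nbr))
    path = []
    node = dest if dest in pred else None
    while node is not None:  # walk predecessors back to the source
        path.append(node)
        node = pred[node]
    return path[::-1]

def neighbor_detects_loop(graph, y, r, dest, r_next_hop):
    """True if R hands dest-bound packets to Y and Y's own path uses R."""
    return r_next_hop.get(dest) == y and r in spt_path(graph, y, dest)

# Hypothetical current topology (after a change) and R's stale next hop for D.
new = undirected([("S", "R", 1), ("S", "Y", 10), ("R", "Y", 2),
                  ("R", "D", 20), ("Y", "D", 30)])
r_next_hop = {"D": "Y"}
print(spt_path(new, "Y", "D"))                               # ['Y', 'R', 'D']
print(neighbor_detects_loop(new, "Y", "R", "D", r_next_hop))  # True
```
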
Loop Prevention
[Figure: two panels over routers S, R, Y, and D with weighted links. Changed topology while R is inactive; S and Y calculate paths to D without R on them.]
• Every router needs to calculate a path to D such that R does not appear on it

Loop Avoidance Procedure
• R sends its forwarding table to its neighbors before shutdown
  - Thus, Y knows that next hop(R, D) is Y
• Detection: during the SPF (Shortest Path First) calculation, neighbors detect loops
  - Y checks whether R exists on its path to D
• Upon detection, neighbors send avoid messages to the other routers in the domain
  - avoid(R, D) = avoid using R for reaching D
• Prevention: upon receiving the avoid(R, D) message, other routers calculate a new path to D such that R does not appear on it

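The prevention step can be sketched as a shortest-path computation that skips the routers named in avoid messages for a given destination. The code below is an illustration under assumptions: the `Router` class, its method names, and the example topology are hypothetical, and message flooding is reduced to a direct method call.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map {node: {neighbor: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def next_hop_avoiding(graph, source, dest, avoided=frozenset()):
    """First hop on the shortest source -> dest path that skips `avoided`."""
    best = {source: 0}
    heap, done = [(0, source, None)], set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        if node == dest:
            return hop
        done.add(node)
        for nbr, w in graph[node].items():
            if nbr in avoided or nbr in done:
                continue
            if cost + w < best.get(nbr, float("inf")):
                best[nbr] = cost + w
                first = nbr if node == source else hop
                heapq.heappush(heap, (cost + w, nbr, first))
    return None

class Router:
    """Hypothetical router that honors avoid(R, D) messages per destination."""

    def __init__(self, name, graph):
        self.name, self.graph = name, graph
        self.avoid = {}  # dest -> inactive routers to keep off the path

    def receive_avoid(self, inactive, dest):
        self.avoid.setdefault(dest, set()).add(inactive)

    def next_hop(self, dest):
        skipped = self.avoid.get(dest, set())
        return next_hop_avoiding(self.graph, self.name, dest, skipped)

new = undirected([("S", "R", 1), ("S", "Y", 10), ("R", "Y", 2),
                  ("R", "D", 20), ("Y", "D", 30)])
s = Router("S", new)
before = s.next_hop("D")    # 'R': shortest path still runs through R
s.receive_avoid("R", "D")
after = s.next_hop("D")     # 'Y': R is kept off the path to D only
print(before, after, s.next_hop("Y"))
```

Because the avoid set is keyed by destination, S keeps using R for destinations that have no reported problem (here, Y).
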
Multiple Inactive Routers
• Set of inactive routers: R1, R2, …, Rn
• The loop avoidance procedure applies for each inactive router
• Detection
  • A router detects loops for all its inactive neighbors
• Prevention
  • A router can get avoid(Ri, D) messages for j inactive routers (j <= n)
  • The router avoids these j forbidden routers on its path to D
• Problem: the set of forbidden routers can be different for different destinations
  • O(n) shortest path calculations
  • n = number of vertices

Simplification
• A router avoids all inactive routers if it has some forbidden routers on its path to D
• Calculate two SPTs:
  • SPT with all inactive routers on it
  • SPT without any inactive router on it
• If the path to D does not contain any forbidden routers,
  • pick the next hop for D from the first SPT
• Else,
  • pick the next hop for D from the second SPT

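The two-SPT rule can be sketched as below. This is an illustrative reading of the slide, not the GateD code: the function names, the `forbidden` map (destination to the inactive routers named in avoid messages for it), and the topology are assumptions.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map {node: {neighbor: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def spt(graph, source, excluded=frozenset()):
    """Dijkstra; return the predecessor map of the SPT rooted at source."""
    best, pred = {source: 0}, {source: None}
    heap, done = [(0, source)], set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, w in graph[node].items():
            if nbr in excluded or nbr in done:
                continue
            if cost + w < best.get(nbr, float("inf")):
                best[nbr], pred[nbr] = cost + w, node
                heapq.heappush(heap, (cost + w, nbr))
    return pred

def path(pred, dest):
    """Root -> dest path from a predecessor map, or [] if unreachable."""
    nodes = []
    node = dest if dest in pred else None
    while node is not None:
        nodes.append(node)
        node = pred[node]
    return nodes[::-1]

def build_next_hops(graph, me, inactive, forbidden):
    """Pick each next hop from the first SPT unless its path is forbidden."""
    first_spt = spt(graph, me)             # inactive routers included
    second_spt = spt(graph, me, inactive)  # no inactive router on it
    table = {}
    for dest in first_spt:
        if dest == me:
            continue
        first = path(first_spt, dest)
        blocked = forbidden.get(dest, set()) & set(first)
        chosen = path(second_spt, dest) if blocked else first
        table[dest] = chosen[1] if len(chosen) > 1 else None
    return table

new = undirected([("S", "R", 1), ("S", "Y", 10), ("R", "Y", 2),
                  ("R", "D", 20), ("Y", "D", 30)])
table = build_next_hops(new, "S", inactive={"R"}, forbidden={"D": {"R"}})
print(table)  # {'R': 'R', 'Y': 'R', 'D': 'Y'}
```

Only two SPF runs are needed regardless of how many destinations have forbidden routers, which is the point of the simplification.
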
Performance
• Maximum effect is on the SPF calculation
  • Quantify overhead
  • Impact of
    • Topology size
    • Number of inactive routers
• Prototype implementation
  • IBB extension incorporated into GateD 4.0.7

Testbed Setup
[Figure: two panels, “Physical Topology” and “SUT’s view of the Topology”. Labels: SUT, LAN, TopTracker (TT), LSAs, emulated topology, routers under upgrade (R1, R2, …, Rm and R’1, R’2, …, R’m), complete graph with n nodes (M1, …), link weights.]

Experiment Sequence
[Figure: timeline in minutes for two runs, GateD on SUT and IBB-GateD on SUT.]
• T = 0: bring m rtrs down (GateD on SUT); bring m rtrs down in IBB mode (IBB-GateD on SUT)
• T = 4: send avoid(Ri, Mj) messages to SUT (1 <= i <= m, 1 <= j <= n)
• T = 8: bring m inactive rtrs up (both runs)
• Case A: m inactive rtrs
• Case B: m inactive rtrs, avoid them
• Overhead = (mean SPF time in Case B) / (mean SPF time in Case A)

Result
• Sources of overhead:
  • Second SPF calculation
  • Graph in Case B is larger than in Case A
• Gets larger as m increases

Conclusions
• IBB proposal: extend OSPF so that a router can be used for forwarding even while its OSPF process is inactive
• Main contribution: an algorithm that gracefully handles topological changes
  • Stops using the inactive router for a destination when using the router can lead to loops or black holes
• Overhead of the algorithm is modest
  • Shows good scaling behavior in terms of topology size and number of inactive routers

Future Directions
• Incremental deployment
  • Can the algorithm be modified so that only a subset of routers need to support it?
• Measuring other aspects of overhead
  • Messaging
• Reducing the overhead
  • SPF calculation: incremental algorithm for second pass
  • Better data structures in prototype
• Other protocols …

Backup

OSPF Background
• Link-state routing protocol
  • All routers in the domain come to a consistent view of the topology by exchanging Link State Advertisements (LSAs)
  • Set of LSAs (self-originated + received) at a router = topology
• SPF calculation
  • Each router calculates a single-source shortest path tree
• Forwarding Information Base (FIB)
  • Each router uses the tree to build its FIB, which governs packet forwarding

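The LSA set, SPF calculation, and FIB fit together roughly as in the sketch below. It is a simplified illustration: the link-state database is modeled as a plain adjacency map, the function name `build_fib` and the three-router topology are hypothetical, and real OSPF details (areas, LSA types, equal-cost multipath) are left out.

```python
import heapq

def build_fib(lsdb, me):
    """Run SPF from `me` over the topology; return {dest: (cost, next hop)}."""
    best = {me: 0}
    fib = {}
    heap, done = [(0, me, None)], set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)          # node joins the shortest path tree
        if node != me:
            fib[node] = (cost, hop)
        for nbr, w in lsdb[node].items():
            if nbr not in done and cost + w < best.get(nbr, float("inf")):
                best[nbr] = cost + w
                # The first hop is inherited from the parent in the tree.
                first = nbr if node == me else hop
                heapq.heappush(heap, (cost + w, nbr, first))
    return fib

# Hypothetical topology assembled from LSAs: {router: {neighbor: link cost}}.
lsdb = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
fib = build_fib(lsdb, "A")
print(fib)  # {'B': (1, 'B'), 'C': (2, 'B')}
```
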
OSPF Overview: Example
[Figure: two panels over routers A through J with link weights. OSPF domain (single area); SPT at G.]