Multi-topology protection: promises and problems

G. Apostolopoulos, Institute of Computer Science, Foundation Of Research and Technology Hellas (FORTH)

Basic concept of MT protection • Based on IETF-proposed MT extensions to IGPs • Routers have multiple routing tables • Need to pick a routing table for each incoming packet • Different addresses • Various types of packet marking • Use MT to repair failures • When a link/node fails, affected traffic is locally switched to a pre-computed “backup” topology • Each destination in the FIB has a backup next-hop that is activated when a local link fails • Traffic reaches the destination over the backup topology without loops
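The local repair step can be sketched in a few lines of Python. This is a minimal illustration, not a router implementation: the names `FibEntry`, `forward`, and the topology numbering (0 for primary) are assumptions made for the example, and only the repairing router's handling of unmarked packets is modelled.

```python
from dataclasses import dataclass

PRIMARY = 0  # hypothetical id for the normal (no-failure) topology


@dataclass
class FibEntry:
    primary_next_hop: str  # neighbour used while the local link is up
    backup_next_hop: str   # pre-computed neighbour in the backup topology
    backup_topology: int   # topology id used to mark repaired packets


def forward(fib, failed_neighbours, dest):
    """Return (next_hop, marking) for an unmarked packet addressed to dest."""
    entry = fib[dest]
    if entry.primary_next_hop in failed_neighbours:
        # Local repair: use the backup next-hop and mark the packet so that
        # downstream routers look it up in the same backup topology.
        return entry.backup_next_hop, entry.backup_topology
    return entry.primary_next_hop, PRIMARY


fib = {"d": FibEntry(primary_next_hop="r1", backup_next_hop="r2", backup_topology=1)}
print(forward(fib, set(), "d"))   # link to r1 is up
print(forward(fib, {"r1"}, "d"))  # link to r1 has failed
```

The repair decision uses only state that is already in the FIB, which is what makes the switch local and fast.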

[Figure: MT protection. Source s marks traffic to send it to the backup topology; the traffic reaches destination d over the backup topology instead of the primary.]

Advantages of MT • Fast local repair of failure • Can repair all possible failures (single link or node) • Multiple failures can be detected and addressed • No need to distinguish between link and node failures • ECMP, SRLG, LAN failures, and multi-homed prefixes can be handled easily • No need for tunneling • But must mark packets instead • Can optimize how traffic is routed after the failure by manipulating link weights on the backup topologies • Failure may not last long, but even so traffic impact is undesirable

But there are issues • Basic operation and optimization have been worked out • Rough overview of some remaining issues: • How to differentiate traffic • Premium versus regular and BE • How to use MT in a real network: • Multiple areas, Inter-AS, Hot-potato routing with BGP transit traffic • How to return to normal after failure is repaired • Operational issues • How complex is it to configure? • How expensive is it to monitor/troubleshoot? • Incremental deployment? • How to optimize link weights • What to optimize for? • Need to know traffic matrix

Traffic differentiation • Premium and regular/BE traffic • I may be willing to preempt non-premium traffic to make sure “premium” traffic is still OK • Standard practice with existing CSPF-FRR architectures • Different topologies for each traffic class • One of the envisioned uses of MT anyway • Scalability? • Traffic optimization goals may be different now • Have to consider the interaction between the traffic types • Minimize effect on premium traffic • Do not starve BE traffic • …

How to return to normal after repair? • Failure is repaired and IGP is re-converging • How to switch traffic back to the initial (no-failure) topology without micro-loops • How to avoid micro-loops in general • After each IGP convergence event • Solution • Use a “fixed” topology – same as the primary topology • Continue routing traffic over the fixed/backup topology • Let IGP converge in the primary topology • After “convergence is complete”, switch all traffic to the primary topology

[Figure: Converge in a separate topology. Traffic from s to d stays on the backup and fixed topologies and is switched to the primary topology after IGP has converged.]

How to tell when IGP has converged • Use a “convergence” timer in IGP • Start it when a change that will require IGP re-convergence is detected • All traffic is forwarded over fixed/backup topologies • Must move traffic from primary to fixed • During convergence • New routes are installed in primary topology • After timer expires (after IGP has converged) • Switch all fixed and backup traffic to primary topology • Since no topology is in flux no micro-loops will occur from switching topologies
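The timer logic above can be sketched as a small state holder. This is an illustration under stated assumptions: the class name `ConvergenceSwitch`, the `hold_time` parameter, and restarting the timer on every further change are choices made for the example, not part of any IGP specification. Time is passed in explicitly so the behaviour is easy to follow.

```python
class ConvergenceSwitch:
    """Chooses the topology for non-repaired traffic around an IGP change."""

    def __init__(self, hold_time):
        # Assumed upper bound (seconds) on how long IGP convergence takes.
        self.hold_time = hold_time
        self.deadline = None  # None means no convergence is in progress

    def on_igp_change(self, now):
        # A change that requires re-convergence starts (or restarts) the timer.
        self.deadline = now + self.hold_time

    def forwarding_topology(self, now):
        if self.deadline is not None and now < self.deadline:
            # Primary topology is in flux: keep traffic on the fixed topology.
            return "fixed"
        # Timer expired: the primary topology is assumed stable again.
        self.deadline = None
        return "primary"


sw = ConvergenceSwitch(hold_time=5.0)
print(sw.forwarding_topology(now=0.0))   # no change pending
sw.on_igp_change(now=10.0)
print(sw.forwarding_topology(now=12.0))  # during convergence
print(sw.forwarding_topology(now=15.0))  # timer has expired
```

The sketch only works if `hold_time` really bounds convergence everywhere in the network; a timer that expires too early would bring back the micro-loops it is meant to avoid.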

MT with multiple areas • Some of the destinations may be summary routes coming from outside areas • Need to map these summary routes to backup topologies in order to compute the backup next-hop for them • IGP can do this mapping • Link and non-ABR node failures • Backup topologies for each area cover failures inside the area • No need to coordinate with other areas • Need to unmark the repaired packets when they leave the area • Remote area does not know about local backup topologies

MT with areas • ABR failures • Failure affects two areas • Need a primary and a backup ABR for each summary route • Simple case: primary and backup ABR connect to the same area • Can handle with local backup topology • Unmark the packet when it leaves the area • Hard case: primary and backup ABR connect to different areas • Need to coordinate backup topologies among these areas, else the packet may loop

[Figure: Multi-area MT example, Route-1. The packet is unmarked when it leaves the area and reaches the destination without issue.]

[Figure: Multi-area MT example, Route-2. The packet is unmarked when it leaves the area but will not reach the destination; this case needs coordination.]

Other reasons for looking at all areas together • SRLGs may be different in each area • E.g. area 1 cannot use ABR 2 as backup due to SRLG constraints in area 2 • May be necessary if I want to optimize routing • Backup topologies for different areas will have to coordinate their link weights for most effective routing after a failure • But it may be too expensive to optimize such a large topology

Inter-AS traffic • Cover failures of border routers and peering links • Peering links do not belong to IGP • Need extensions to let IGP know about these links • Stub links • Stub (potentially multi-homed) ISP and outgoing traffic • Similar to the area problem • IGP can compute the backup topologies • Can compute a few, independent of the number of BGP prefixes • Need to map BGP prefixes to these topologies to compute their backup next-hops • Should not have to import all BGP routes into IGP • Repaired packets need to be unmarked as they leave the AS

[Figure: Inter-AS operation, showing a special node and a prefix.]

BGP-IGP interactions • How to map the BGP routes to the backup topologies and compute the backup next-hops for the BGP routes • Backup topologies are computed by IGP • Prefix reachability is controlled by BGP policy decisions • One approach • BGP will have to tell RIB which two border routers can be used for reaching a prefix • BGP must have a concept of a “backup” border router for each prefix • IGP will tell RIB about the backup topologies • RIB will compute the backup next-hop for BGP routes on their way to the FIB
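One way to picture the RIB's role in this approach is the sketch below. All names (`resolve_bgp_route`, `bgp_exits`, `igp_tables`, `backup_topology_of`) and the data layout are hypothetical. It also makes a simplifying assumption: the backup next-hop always points toward the backup border router, inside the backup topology that avoids the primary next-hop.

```python
PRIMARY = 0  # hypothetical id of the primary topology


def resolve_bgp_route(prefix, bgp_exits, igp_tables, backup_topology_of):
    """Combine BGP exit choices with IGP per-topology routes for one prefix.

    bgp_exits:          prefix -> (primary border router, backup border router)
    igp_tables:         topology id -> {router: next hop toward that router}
    backup_topology_of: neighbour -> topology id that avoids that neighbour
    """
    primary_br, backup_br = bgp_exits[prefix]
    primary_nh = igp_tables[PRIMARY][primary_br]
    topo = backup_topology_of[primary_nh]
    backup_nh = igp_tables[topo][backup_br]
    return {
        "prefix": prefix,
        "primary_next_hop": primary_nh,
        "backup_next_hop": backup_nh,
        "backup_topology": topo,
    }


bgp_exits = {"203.0.113.0/24": ("br1", "br2")}
igp_tables = {
    0: {"br1": "r1", "br2": "r1"},  # primary topology
    1: {"br1": "r2", "br2": "r2"},  # backup topology that avoids r1
}
backup_topology_of = {"r1": 1}

print(resolve_bgp_route("203.0.113.0/24", bgp_exits, igp_tables, backup_topology_of))
```

Note that the IGP tables are keyed by router, not by prefix, so their size stays independent of the number of BGP prefixes and no BGP routes have to be imported into the IGP.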

MT with hot-potato BGP traffic • Problem: • Changes in the IGP weights/topology can cause massive shifts in transit BGP traffic • MT can help • By avoiding micro-loops during IGP convergence • By creating a BGP forwarding topology that is engineered and protected with MT and insensitive to some of the changes in the IGP layer • This topology can be applied to only selected transit BGP prefixes • Optimization of traffic routing after failures becomes quite useful now

Other concerns • What is the administrative overhead of MT? • What is the performance overhead of MT? • Storage? • IGP signaling?

Administrative overhead • Need to manage multiple IGP topologies • OSS tools will need to be extended • If backup topologies are optimized then I need to manage multiple sets of IGP link weights • Quite a bit of effort • But done by automated offline tools anyway • Troubleshooting and monitoring • Over which topology is this prefix routed? What is the connectivity status of topology T? • All tools (ping, traceroute) may have to be upgraded depending on the topology de-multiplexing method • Does not look too good but compare: • With full mesh of statically configured and optimized LSPs for TE • With statically configured FRR tunnels • Incremental deployment is tricky! • May not be able to guarantee protection from all failures if not supported by all nodes

So how bad is scalability? • Simulations show that • Can repair failures with 3-4 backup topologies • Can optimize routing after failure with 6-8 topologies • How much does each topology cost? • SPF computation • One SPF per topology, few topologies so not an issue • IGP signaling • No extra cost, single adjacency for all topologies • IGP RIB space • Separate routing tables for each topology • Can share next-hops • System RIB space • Separate routing table for each topology • Only for IGP routes • BGP routes will not have to be replicated • Can share next-hops

[Figure: Example FIB structure. Labels: prefix lookup, topology hash, primary table, backup table, shared next-hop structures; the example has 3 topologies and 4 next-hops for ECMP, with example address 23.45.6.2.]
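A FIB that keeps one small index list per topology while sharing the next-hop structures could look roughly like the sketch below. The layout, the names `next_hops`, `fib`, and `lookup`, and the addresses are assumptions for illustration; a real FIB would use longest-prefix match rather than the exact-match dictionary used here.

```python
# Shared next-hop structures: stored once, referenced by index from every topology.
next_hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# prefix -> one list of next-hop indexes per topology.
# More than one index in a list means ECMP within that topology.
fib = {
    "192.0.2.0/24": [
        [0, 1],  # topology 0 (primary): ECMP over two next-hops
        [2],     # topology 1 (backup)
        [1],     # topology 2 (backup)
    ],
}


def lookup(prefix, topology, flow_hash):
    """Return the next-hop address for a packet in the given topology."""
    candidates = fib[prefix][topology]
    # The flow hash spreads flows over the ECMP set of this topology.
    return next_hops[candidates[flow_hash % len(candidates)]]


print(lookup("192.0.2.0/24", 0, 0))
print(lookup("192.0.2.0/24", 0, 1))
print(lookup("192.0.2.0/24", 1, 7))
```

With this layout each extra topology costs a few small integers per prefix, while the next-hop entries themselves are not replicated.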

MT with MPLS • Use MPLS labels for de-multiplexors • Build an MPLS forwarding plane for each topology using LDP • For VPNs/BGP free cores • Simple LDP extensions • Essentially MT-LDP • No need to encapsulate traffic on a failure • Simpler than RSVP-TE/FRR • Less signaling overhead • Configuration overhead is not clear though • Depends a lot on the OSS tools used

MT with multicast • Has become interesting with the advent of IPTV, etc. • IETF discusses methods to extend • LDP for P2MP LSPs and • MPLS-FRR for P2MP LSPs • MT protection can be easily extended to be used there using P2MP MPLS labels and P2MP extensions to LDP • And we can still optimize traffic after failure

Dynamic TE • Traffic matrices can change significantly • DDoS attacks, diurnal patterns, failures • Adjust routing • Does not have to happen extremely fast • Ideally this should happen automatically • CSPF-MPLS has ways to cope with this • Traffic flows inside LSPs • May need a full mesh though • If a link gets overloaded, can try to shift some LSPs away from it • Doing this automatically can lead to oscillations

MT dynamic TE • When a link is overloaded, need to shift traffic away from it • Option A: • Create topology T1, re-optimize weights in T1 and shift all traffic to T1 • Needs a coordinated switch, large impact on the network • Better when change in traffic patterns is permanent • Option B: • Shift only some (S,D) pairs to T1 • No need for coordinated switch and smaller impact on the network • Better for temporary changes in the traffic patterns • Optimization problem • What traffic do I send into T1? • What link weights do I use for T1? • Can I do something that is adaptive/feedback based?
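To make Option B concrete, the sketch below picks which (S,D) pairs to move off an overloaded link. It is a simple greedy heuristic chosen for illustration, not a solution to the optimization problem posed above. The function name `pairs_to_shift` and the inputs are hypothetical, and it assumes that a pair moved to T1 no longer crosses the overloaded link.

```python
def pairs_to_shift(demands_on_link, capacity):
    """Greedily choose (S,D) pairs to move to T1 until the link load fits.

    demands_on_link: {(source, dest): traffic volume currently on the link}
    capacity:        target load for the link, in the same units
    Returns (pairs moved to T1, remaining load on the link).
    """
    load = sum(demands_on_link.values())
    shifted = []
    # Largest demands first, so that as few pairs as possible are disturbed.
    for pair, demand in sorted(
        demands_on_link.items(), key=lambda kv: kv[1], reverse=True
    ):
        if load <= capacity:
            break
        shifted.append(pair)
        load -= demand
    return shifted, load


demands = {("a", "d"): 6, ("b", "d"): 3, ("c", "d"): 2}
print(pairs_to_shift(demands, capacity=8))
```

A feedback-based variant would rerun this selection on measured link loads, which is where the risk of oscillation noted for automatic LSP shifting would have to be considered as well.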

Immediate work items • MT with multiple traffic classes • New optimization constraints (interaction among traffic types) • P2MP protection • Optimization issues • What are the right optimization goals? • Combine P2MP and P2P backup for reaching the optimal solution • Dynamic TE with MT • Optimization for both the (S,D) and the link weights • Adaptive based on congestion feedback?

Other interesting items • MT in Ethernet networks • There have been some related proposals already • Use MT routing to handle rapid changes in links like those in a wireless network • Link fading could be considered a partial link failure • Deal with inaccurate traffic matrices • Solutions that adapt to changing traffic matrices • Algorithms that find good routings even when given inaccurate traffic matrices • There has been significant work on the traffic matrix estimation and inaccuracy problem
