
Multi-topology protection: promises and problems



  1. Multi-topology protection: promises and problems G. Apostolopoulos Institute of Computer Science Foundation for Research and Technology - Hellas (FORTH)

  2. Basic concept of MT protection • Based on IETF proposed MT extensions to IGPs • Routers have multiple routing tables • Need to pick a routing table for each incoming packet • Different addresses • Various types of packet marking • Use MT to repair failures • When a link/node fails, affected traffic is locally switched to a pre-computed “backup” topology • Each destination in the FIB has a backup next-hop that is activated when a local link fails (see the sketch below) • Traffic reaches the destination over the backup topology without loops
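
As a concrete illustration of the backup next-hop idea above, here is a minimal Python sketch. All names (FibEntry, Router, forward) are hypothetical, not the data structures of the IETF MT drafts.

```python
# Minimal sketch of MT-protection forwarding state (hypothetical names,
# not the actual IETF MT data structures).

from dataclasses import dataclass

@dataclass
class FibEntry:
    primary_nh: str        # next-hop in the primary topology
    backup_nh: str         # pre-computed next-hop in the backup topology
    backup_topology: int   # topology ID to mark repaired packets with

class Router:
    def __init__(self, fib):
        self.fib = fib                 # prefix -> FibEntry
        self.failed_links = set()      # locally detected failures

    def forward(self, prefix):
        entry = self.fib[prefix]
        if entry.primary_nh in self.failed_links:
            # Local repair: mark the packet so downstream routers
            # keep forwarding it over the backup topology.
            return entry.backup_nh, entry.backup_topology
        return entry.primary_nh, None  # normal (unmarked) forwarding

# Example: the link to next-hop "B" fails; traffic to 10.0.0.0/8 is
# locally switched to backup next-hop "C" and marked with topology 1.
r = Router({"10.0.0.0/8": FibEntry("B", "C", 1)})
r.failed_links.add("B")
print(r.forward("10.0.0.0/8"))   # ('C', 1)
```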

  3. MT protection [Figure: source s marks the affected traffic to send it onto the backup topology; the marked traffic reaches destination d over the backup topology, while unaffected traffic stays on the primary topology]

  4. Advantages of MT • Fast local repair of failures • Can repair all possible failures (single link or node) • Multiple failures can be detected and addressed • No need to distinguish between link and node failures • ECMP, SRLG, LAN failures, and multi-homed prefixes can be handled easily • No need for tunneling • But must mark packets instead • Can optimize how traffic is routed after the failure by manipulating link weights on the backup topologies • A failure may not last long, but even so the traffic impact is undesirable

  5. But there are issues • Basic operation and optimization have been worked out • Rough overview of some remaining issues: • How to differentiate traffic • Premium versus regular/BE • How to use MT in a real network: • Multiple areas, inter-AS, hot-potato routing with BGP transit traffic • How to return to normal after the failure is repaired • Operational issues • How complex is it to configure? • How expensive is it to monitor/troubleshoot? • Incremental deployment? • How to optimize link weights • What to optimize for? • Need to know the traffic matrix

  6. Traffic differentiation • Premium and regular/BE traffic • I may be willing to preempt non-premium traffic to make sure “premium” traffic is still OK • Standard practice with existing CSPF-FRR architectures • Different topologies for each traffic class (see the sketch below) • One of the envisioned uses of MT anyway • Scalability? • Traffic optimization goals may be different now • Have to consider the interaction between the traffic types • Minimize the effect on premium traffic • Do not starve BE traffic • …
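
A toy sketch of picking a topology per traffic class. The DSCP-based classification and the specific code points are assumptions for illustration; the talk does not prescribe a marking method.

```python
# Toy sketch of class-to-topology selection (DSCP-based classification
# is assumed here for illustration only).

PREMIUM_DSCP = {46, 34}   # e.g. EF and AF41, chosen for this example

def select_topology(dscp: int) -> int:
    """Route premium traffic over its own topology (1), BE over 0."""
    return 1 if dscp in PREMIUM_DSCP else 0

assert select_topology(46) == 1   # premium traffic
assert select_topology(0) == 0    # best-effort traffic
```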

  7. How to return to normal after repair? • Failure is repaired and IGP is re-converging • How to switch traffic back to the initial (no-failure) topology without micro-loops • How to avoid micro-loops in general • After each IGP convergence event • Solution • Use a “fixed” topology – same as the primary topology • Continue routing traffic over the fixed/backup topology • Let IGP converge in the primary topology • After “convergence is complete”, switch all traffic to the primary topology

  8. Converge in a separate topology [Figure: traffic from s to d is routed over the fixed/backup topologies while the primary topology re-converges; traffic is switched back to the primary only after IGP has converged]

  9. How to tell when IGP has converged • Use a “convergence” timer in the IGP (sketched below) • Start it when a change that will require IGP re-convergence is detected • All traffic is forwarded over the fixed/backup topologies • Must move traffic from primary to fixed • During convergence • New routes are installed in the primary topology • After the timer expires (after IGP has converged) • Switch all fixed and backup traffic to the primary topology • Since no topology is in flux, no micro-loops will occur from switching topologies
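
A minimal sketch of this timer logic, with hypothetical method names and an assumed timeout value; a real IGP implementation would tie this to its actual convergence events.

```python
# Sketch of the convergence-timer logic from this slide (hypothetical
# method names; real IGP implementations differ).

import threading

CONVERGENCE_TIMEOUT = 5.0   # assumed value; would be tuned per network

class MtIgp:
    def __init__(self):
        self.active_topology = "primary"
        self.timer = None

    def on_topology_change(self):
        # A change needing re-convergence: move all traffic off the
        # primary topology so new routes can be installed without loops.
        self.active_topology = "fixed"
        if self.timer:
            self.timer.cancel()        # restart on back-to-back events
        self.timer = threading.Timer(CONVERGENCE_TIMEOUT, self.on_converged)
        self.timer.start()

    def on_converged(self):
        # Primary topology is stable again: switch all fixed/backup
        # traffic back in one step, so no topology is in flux.
        self.active_topology = "primary"
```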

  10. MT with multiple areas • Some of the destinations may be summary routes coming from outside areas • Need to map these summary routes to backup topologies in order to compute the backup next-hop for them • IGP can do this mapping • Link and non-ABR node failures • Backup topologies for each area cover failures inside the area • No need to coordinate with other areas • Need to unmark the repaired packets when they leave the area • Remote area does not know about local backup topologies
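
A small sketch of the unmarking step at the area boundary, assuming the topology is carried as a mark on the packet; the real mechanism depends on the de-multiplexing method used.

```python
# Sketch of unmarking repaired packets at the area boundary (hypothetical
# packet representation; depends on how topologies are de-multiplexed).

def egress_area(packet: dict, local_backup_topologies: set) -> dict:
    """Clear the topology mark when a packet leaves the local area,
    since remote areas know nothing about our backup topologies."""
    if packet.get("topology") in local_backup_topologies:
        packet["topology"] = None   # continue on the default topology
    return packet

pkt = {"dst": "192.0.2.1", "topology": 2}
print(egress_area(pkt, {1, 2}))   # {'dst': '192.0.2.1', 'topology': None}
```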

  11. MT with areas • ABR failures • The failure affects two areas • Need a primary and a backup ABR for each summary route • Simple case: primary and backup ABR connect to the same area • Can handle with a local backup topology • Unmark the packet when it leaves the area • Hard case: primary and backup ABR connect to different areas • Need to coordinate backup topologies among these areas or the packet may loop

  12. Multi-area MT example [Figure, Route-1: the packet is unmarked when it leaves the area and will reach the destination without issue]

  13. Multi-area MT example [Figure, Route-2: the packet is unmarked when it leaves the area but will not reach the destination; the areas need to coordinate their backup topologies]

  14. Other reasons for looking at all areas together • SRLGs may be different in each area • E.g. area 1 cannot use ABR 2 as a backup due to SRLG constraints in area 2 • Coordination may also be necessary if I want to optimize routing • Backup topologies for different areas will have to coordinate their link weights for the most effective routing after a failure • But it may be too expensive to optimize such a large topology

  15. Inter-AS traffic • Cover failures of border routers and peering links • Peering links do not belong to the IGP • Need extensions to let the IGP know about these links • Stub links • Stub (potentially multi-homed) ISPs and outgoing traffic • Similar to the area problem • The IGP can compute the backup topologies • Can compute a few, independent of the number of BGP prefixes • Need to map BGP prefixes to these topologies to compute their backup next-hops • Should not have to import all BGP routes into the IGP • Repaired packets need to be unmarked as they leave the AS

  16. Inter-AS operation [Figure: an external prefix is attached to the topology through a special node so the IGP can compute backup topologies for it]

  17. BGP-IGP interactions • How to map the BGP routes to the backup topologies and compute the backup next-hops for the BGP routes • Backup topologies are computed by the IGP • Prefix reachability is controlled by BGP policy decisions • One approach (sketched below) • BGP will have to tell the RIB which two border routers can be used for reaching a prefix • BGP must have a concept of a “backup” border router for each prefix • The IGP will tell the RIB about the backup topologies • The RIB will compute the backup next-hop for BGP routes on their way to the FIB
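
One way the RIB step could look, as a hedged sketch: BGP supplies a primary and backup border router per prefix, the IGP supplies per-topology next-hops toward each border router, and the RIB joins the two. All names and table shapes are hypothetical.

```python
# Sketch of the RIB step described above: combine BGP's primary/backup
# border routers with the IGP's per-topology next-hops (names hypothetical).

def compute_fib_entry(prefix, bgp_routes, igp_nexthops):
    """bgp_routes: prefix -> (primary_border, backup_border)
    igp_nexthops: (topology, border_router) -> local next-hop/interface"""
    primary_br, backup_br = bgp_routes[prefix]
    return {
        "primary_nh": igp_nexthops[("primary", primary_br)],
        # The backup next-hop reaches the *backup* border router over the
        # backup topology, so repair works even if the primary BR failed.
        "backup_nh": igp_nexthops[("backup", backup_br)],
    }

bgp = {"203.0.113.0/24": ("BR1", "BR2")}
igp = {("primary", "BR1"): "if0", ("backup", "BR2"): "if1"}
print(compute_fib_entry("203.0.113.0/24", bgp, igp))
```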

  18. MT with hot-potato BGP traffic • Problem: • Changes in the IGP weights/topology can cause massive shifts to transit BGP traffic • MT can help • By avoiding micro-loops during IGP convergence • By creating a BGP forwarding topology that is engineered and protected with MT and insensitive to some of the changes in the IGP layer • This topology can be applied to only selected transit BGP prefixes • Optimization of traffic routing after failures becomes quite useful now

  19. Other concerns • What is the administrative overhead of MT? • What is the performance overhead of MT? • Storage? • IGP signaling?

  20. Administrative overhead • Need to manage multiple IGP topologies • OSS tools will need to be extended • If backup topologies are optimized then I need to manage multiple sets of IGP link weights • Quite a bit of effort • But done by automated offline tools anyway • Troubleshooting and monitoring • Over which topology is this prefix routed? What is the connectivity status of topology T? • All tools (ping, traceroute) may have to be upgraded, depending on the topology de-multiplexing method • Does not look too good, but compare: • With a full mesh of statically configured and optimized LSPs for TE • With statically configured FRR tunnels • Incremental deployment is tricky! • May not be able to guarantee protection from all failures if not supported by all nodes

  21. So how bad is scalability? • Simulations show that • Failures can be repaired with 3-4 backup topologies • Routing after a failure can be optimized with 6-8 topologies • How much does each topology cost? • SPF computation • One SPF per topology, few topologies, so not an issue • IGP signaling • No extra cost, single adjacency for all topologies • IGP RIB space • Separate routing tables for each topology • Can share next-hops • System RIB space • Separate routing table for each topology • Only for IGP routes • BGP routes will not have to be replicated • Can share next-hops

  22. Example FIB structure [Figure: a prefix lookup returns a per-topology table of next-hop indices (a primary table plus backup tables, 3 topologies in the example, with up to 4 next-hops per row for ECMP, selected by a hash); the indices point into next-hop structures such as 23.45.6.2 that are shared across topologies]
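
A sketch of this FIB layout in Python, with illustrative prefixes, indices, and addresses; the slide's exact encoding is not specified.

```python
# Sketch of the FIB layout in the figure above: per-prefix, per-topology
# rows of next-hop indices pointing into a shared next-hop array
# (illustrative values only).

NEXT_HOPS = ["23.45.6.2", "23.45.6.3", "23.45.6.4", "23.45.6.5"]  # shared

FIB = {
    # prefix -> one row per topology, each row up to 4 indices (ECMP)
    "10.1.0.0/16": [
        [0, 1],        # topology 0 (primary): ECMP over two next-hops
        [2],           # topology 1 (backup)
        [3],           # topology 2 (backup)
    ],
}

def lookup(prefix: str, topology: int, flow_hash: int) -> str:
    row = FIB[prefix][topology]
    return NEXT_HOPS[row[flow_hash % len(row)]]  # hash picks the ECMP member

print(lookup("10.1.0.0/16", 0, 7))   # one of the two primary next-hops
```

Because the next-hop structures are shared, adding topologies only grows the small per-prefix index rows, not the next-hop table itself.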

  23. MT with MPLS • Use MPLS labels as de-multiplexers • Build an MPLS forwarding plane for each topology using LDP • For VPNs/BGP-free cores • Simple LDP extensions • Essentially MT-LDP (see the sketch below) • No need to encapsulate traffic on a failure • Simpler than RSVP-TE/FRR • Less signaling overhead • Configuration overhead is not clear though • Depends a lot on the OSS tools used
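
A toy sketch in the spirit of the MT-LDP idea: one label binding per (FEC, topology), so the label itself identifies the topology and local repair is just a label swap. Labels and FECs here are made up, and LDP signaling is not modeled.

```python
# Toy sketch of per-topology label forwarding in the spirit of MT-LDP
# (labels and FECs are invented; LDP signaling is not modeled).

# Each (FEC, topology) pair gets its own label, so the label itself
# de-multiplexes the topology: no extra encapsulation on failure.
LFIB = {
    # incoming label -> (outgoing label, next-hop)
    100: (200, "B"),   # FEC 10.0.0.0/8, primary topology
    101: (201, "C"),   # FEC 10.0.0.0/8, backup topology
}

BACKUP_LABEL = {100: 101}   # primary label -> its backup-topology label

def switch(label: int, link_up: bool):
    if not link_up:
        label = BACKUP_LABEL[label]   # local repair: swap label spaces
    return LFIB[label]

print(switch(100, link_up=False))   # (201, 'C')
```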

  24. MT with multicast • Has become interesting with the advent of IPTV etc. • IETF discusses methods to extend • LDP for P2MP LSPs and • MPLS-FRR for P2MP LSPs • MT protection can easily be extended to this setting using P2MP MPLS labels and P2MP extensions to LDP • And we can still optimize traffic after a failure

  25. Dynamic TE • Traffic matrices can change significantly • DDoS attacks, diurnal patterns, failures • Adjust routing • Does not have to happen extremely fast • Ideally this should happen automatically • CSPF-MPLS has ways to cope with this • Traffic flows inside LSPs • May need a full mesh though • If a link gets overloaded, can try to shift some LSPs away from it • Doing this automatically can lead to oscillations

  26. MT dynamic TE • When a link is overloaded, need to shift traffic away from it • Option A: • Create topology T1, re-optimize weights in T1 and shift all traffic to T1 • Needs a coordinated switch, large impact on the network • Better when the change in traffic patterns is permanent • Option B (see the sketch below): • Shift only some (S,D) pairs to T1 • No need for a coordinated switch and smaller impact on the network • Better for temporary changes in the traffic patterns • Optimization problem • What traffic do I send into T1? • What link weights do I use for T1? • Can I do something that is adaptive/feedback-based?
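
A minimal greedy sketch of Option B, assuming the per-(S,D) load crossing the overloaded link is known; the talk leaves the actual selection policy open.

```python
# Minimal greedy sketch of Option B (assumes the per-(S,D) loads on the
# overloaded link are known; the selection policy is left open in the talk).

def pick_pairs_to_shift(pair_load, capacity, target_util=0.8):
    """pair_load: (S, D) -> traffic crossing the overloaded link.
    Shift the largest pairs to T1 until utilization drops below target."""
    load = sum(pair_load.values())
    shifted = []
    for pair, demand in sorted(pair_load.items(), key=lambda kv: -kv[1]):
        if load <= target_util * capacity:
            break
        shifted.append(pair)        # this (S,D) pair is re-marked into T1
        load -= demand
    return shifted

demands = {("A", "X"): 40, ("B", "X"): 30, ("C", "Y"): 10}
print(pick_pairs_to_shift(demands, capacity=80))   # [('A', 'X')]
```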

  27. Immediate work items • MT with multiple traffic classes • New optimization constraints (interaction among traffic types) • P2MP protection • Optimization issues • What are the right optimization goals? • Combine P2MP and P2P backup to reach the optimal solution • Dynamic TE with MT • Optimization of both the (S,D) pairs and the link weights • Adaptive, based on congestion feedback?

  28. Other interesting items • MT in Ethernet networks • There have been some related proposals already • Use MT routing to handle rapid link changes like those in wireless networks • Link fading could be considered a partial link failure • Deal with inaccurate traffic matrices • Solutions that adapt to changing traffic matrices • Algorithms that find good routings even when given inaccurate traffic matrices • There has been significant work on the traffic matrix estimation and inaccuracy problem
