
Deploying Tight-SLA services on an IP Backbone

This presentation will discuss design and deployment best practices to enable tight SLAs on an IP backbone, including validation results, operational guidelines, and deployment experience.


Presentation Transcript


  1. Deploying Tight-SLA services on an IP Backbone Clarence Filsfils – cf@cisco.com

  2. Objective • To present design & deployment best practices to enable tight SLAs to be offered • when to use what, and how • validation results • operational guidelines • deployment experience • Focus on the backbone design

  3. An overview of the Analysis • Loss/Latency/Jitter (LLJ): DiffServ, TE, DS-TE • Convergence: ISIS sub-second, FRR sub-100 ms

  4. Further information • “Engineering a Multiservice IP backbone to support tight SLAs”, Computer Networks Special Edition on the New Internet Architecture • Full-Day Tutorial • RIPE41, APRICOT 2002: www.ibb.net/~filsfils • Low-Level Design Guides, Validation Results

  5. Agenda • Introduction and SLA • Sub-Second IGP Convergence • Backbone Diffserv Design • Conclusion

  6. Typical Core Per-Class SLA Characteristics • Typically more classes at the edge than in the core

  7. One-Way Jitter • Delay variation, generally computed as the variation of the delay for two consecutive packets • Due to variation of • Propagation delay • Switching / processing delay • Queuing / scheduling delay • Jitter buffers remove variation but contribute to delay

  8. Backbone VoIP Jitter Budget • Typical jitter budget: • Mouth-to-ear budget: 100 ms • Backbone propagation: - 30 ms • Codec delay: - ~35 ms • Jitter budget = 100 - 30 - 35 = 35 ms • 30 ms for the access • 5 ms for the core • 5 ms over 10 hops => 500 µs/hop

  9. Per-flow sequence preservation • Best-practice IP design: per-flow load-balancing! (see the sketch below) • Re-ordering impact on service perception • Long-lived TCP: degraded goodput • Real-time video: loss rate += OOS rate • VoIP: jitter
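Per-flow (per-destination) load sharing is CEF's default behaviour; a minimal sketch, assuming an interface named POS1/0, of making that choice explicit so all packets of a flow follow one path and stay in order:

  ip cef
  interface POS1/0
   ! per-destination = per-flow hashing on the packet header; the CEF default
   ! per-packet load sharing would spread one flow across paths and re-order it
   ip load-sharing per-destination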

  10. Re-ordering Impact on Service • [LAOR01]: “Results show that packet reordering, by at least three packet locations, of only a small percentage of packets in the backbone link can cause a significant degradation of applications throughput. Long flows are affected the most. Due to the potential effect, minimizing packet reordering, as well as mitigating its effect algorithmically, should be considered.”

  11. Loss of Connectivity / Convergence • Incentive to reduce the loss of connectivity (LoC) • Availability • 99.999% per day => 0.9 sec of downtime (86,400 s × 10^-5 ≈ 0.86 s) • VoIP • 40 ms LoC: glitch • 1-2 sec LoC: call drop

  12. How to specify the target for the metric • SLA statistical definitions do matter • min/avg/max versus percentile • Measured time interval… • SLA definitions today tend to be loose • averaged over a month • averaged over many POP-to-POP pairs (temptation to add short pairs to reduce the average…) • IP Performance Metrics IETF WG

  13. Optimizing the IP Infrastructure • Loss, Latency, Jitter: met iff Demand < Offer • OverProvisioned Backbone • Differentiated Services • Capacity Planning • TE and DS-TE • Loss of connectivity due to link/node failure • IGP Convergence • MPLS FRR Protection

  14. Agenda • Introduction and SLA • Sub-Second IGP Convergence • Backbone Diffserv Design • Conclusion

  15. Loss of Connectivity • IGP Backbone Convergence: • the time it takes for connectivity to be restored upon link/node failure/addition for an IP flow starting on an edge access router and ending on another edge access router, excluding any variation of BGP routes. • For this session, IGP = ISIS

  16. Historical ISIS Convergence • 10 to 30 seconds • Not excellent • In the past, focus has been more on stability than on fast convergence • typical trade-off

  17. What this presentation will explain • ISIS convergence in 1 or 2 seconds is a conservative target

  18. Link-State protocol overview

  19. An example network • [Figure: routers A-H connected by links with costs from 2 to 12; router A attaches via interfaces S0-S3]

  20. The Final SPT rooted at A • A: oif null, cost 0 • B: oif S0, cost 3 • D: oif S3, cost 3 • C: oif S0 & S3, cost 6 • F: oif S0 & S3, cost 8 • E: oif S0, cost 11 • G: oif S0 & S3, cost 13

  21. [Figure: the SPT computation rooted at A over the example network, annotating each node with its oif set and cost at successive steps]

  22. The RIB construction • Node C advertises two leaves: Lo0: 1.1.1.1/32 (cost 0) and Pos1: 2.0.0.1/30 (cost 2) • C sits in the SPT at cost 6, reachable via S0 or S3, so ISIS adds the following paths to the RIB: • 1.1.1.1/32: OIF = S0 or S3 with metric 6 (6+0) • 2.0.0.1/30: OIF = S0 or S3 with metric 8 (6+2)

  23. LSDB, RIB and FIB • Control plane: the ISIS LSDB (sh isis data), static routes and the BGP table feed their best paths into the RIB (sh ip route) • Data plane: the RIB is compiled into the FIB & dFIB (sh ip cef) • see the commands spelled out below
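Spelled out, the abbreviated show commands on the slide are:

  ! Control plane
  show isis database   ! the ISIS LSDB
  show ip route        ! the RIB: best paths from ISIS, static routes and BGP
  ! Data plane
  show ip cef          ! the FIB, as programmed from the RIB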

  24. SPF optimisations

  25. SPF Optimizations • Most basic implementation • Any change (link, node, leaf) => recompute the whole SPT and the whole RIB • Optimization 1: decouple SPT and RIB • Any topology change (node, link) => recompute the SPT and the RIB (called “SPF”) • A leaf-only change (IP prefix) => keep the SPT, just update the RIB for the nodes whose leaves have changed (called “PRC”)

  26. PRC • A new leaf appears: int lo0: 65.1.1.1/32 on a node already in the SPT • PRC here consists in just adding 65.1.1.1/32 to the RIB; the SPT is not affected • [Figure: the SPT rooted at A with each node's cost and next hop, unchanged by the new leaf]

  27. Incremental-SPF • Optimization 2 • When the topology has changed, instead of building the whole SPT from scratch, just fix the part of the SPT that is affected • Only the leaves of the nodes re-analyzed during that process are updated in the RIB

  28. Incremental-SPF • The C-G link goes down • The C-G link was not used in the SPT anyway, so there is no need to run SPF • [Figure: the SPT rooted at A, with each node's cost and next hop, unchanged]

  29. Incremental-SPF • F reports a new neighbor (H) • The SPT only needs to be extended behind F; there is no need for router A to recompute the whole SPT • Router A will compute SPF from node F • [Figure: the SPT rooted at A, extended behind F to the new node]

  30. Incremental-SPF • More information is kept in the SPT • Parents list • Neighbors list • Based on the changed information, the SPT is “modified” in order to reflect the changes

  31. Incremental-SPF • The further the change is from the root, the higher the gain

  32. SPF, PRC, I-SPF: summary • Only a leaf change • PRC • Graph impacted • normal SPF: recomputes the full SPT and hence reinserts all the ISIS routes in the RIB • I-SPF: only recomputes the part of the SPT that is affected; only the leaves from that part are updated in the RIB • see the verification note below
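To check which of these computations actually ran on a router, IOS keeps a per-run log; a quick verification sketch (the exact fields and trigger names vary by release):

  show isis spf-log   ! one line per SPF/PRC run: when it ran, how long it took, what triggered it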

  33. Topology and Leaf Optimizations

  34. Parallel point-to-point adjacencies • Only the best parallel adjacency is reported • [Figure: the LSPs of routers A and B, which are joined by several parallel links; each LSP lists only the lowest-cost adjacency to the other router]

  35. P2P mode for back-to-back GE • No DIS election • No CSNP transmission • No pseudo-node and extra link • interface fastethernet1/0 • isis network point-to-point • [Figure: the Rtr-A/Rtr-B link modelled via a pseudonode in broadcast mode versus a direct adjacency in p2p mode]

  36. Speeding up route installation • Limit the # of leaves in the IGP • only the BGP speakers are needed • rest: I-BGP • router isis • advertise passive-only • see the sketch below
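A minimal sketch of that configuration, assuming the BGP speaker's address lives on Loopback0: the loopback is marked passive, and advertise passive-only then keeps every non-passive connected prefix out of the LSP.

  router isis
   passive-interface Loopback0   ! assumed name: the BGP speaker's loopback
   advertise passive-only        ! advertise only passive-interface prefixes as leaves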

  37. SPF, PRC and LSP-gen Exponential Backoff Timers

  38. Backoff timer algorithm • IS-IS throttles its main events • SPF computation • PRC computation • LSP generation • Throttling slows down convergence • Not throttling can cause melt-downs • The goal is to react fast to the first events but, under constant churn, to slow down and avoid collapse

  39. Backoff timer algorithm • spf-interval <Max> [<Init> <Inc>] • Maximum interval: the maximum amount of time the router will wait between consecutive executions • Initial delay: the time the router will wait before the first execution • Incremental interval: the time the router will wait between consecutive executions; this timer grows until it reaches the maximum interval

  40. Example: spf-interval 10 100 1000 • [Figure: timeline of events E1-E7 and the resulting SPF runs] • First SPF 100 ms after the first event, then waits of 1000 ms, 2000 ms and 4000 ms between runs • Then 8000 ms • Then maxed at 10 sec • 20 s without a trigger is required before resetting the SPF wait to 100 ms • see the annotated configuration below
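Read back as configuration, the example on the slide is:

  router isis
   ! max-wait 10 s, initial-wait 100 ms, exponential increment 1000 ms
   spf-interval 10 100 1000
   ! under constant churn the wait doubles: 100 ms, 1 s, 2 s, 4 s, 8 s, then capped at 10 s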

  41. Default Values • Incremental-interval: • SPF: 5.5 seconds • PRC: 5 seconds • LSP-Generation: 5 seconds • Maximum-interval: • SPF: 10 seconds • PRC: 5 seconds • LSP-Generation: 5 seconds • Initial-wait: • SPF: 5.5 seconds • PRC: 2 seconds • LSP-Generation: 50 milliseconds

  42. Two-Way Connectivity Check • [Figure: the E-F link fails; each end floods its own LSP] • For propagating bad news, one LSP is enough: SPF only uses links reported by both ends, so the first LSP to arrive already removes the failed link

  43. Timers for Fast Convergence • router isis • spf-interval 1 1 50 • prc-interval 1 1 50 • Init wait: 1 ms • 5.5 sec faster than the default reaction! • Optimized for the going-down mode • Exp increment ~ S ms • Max wait ~ n * S ms • CPU utilization < 1/n

  44. Timers for Fast Convergence • router isis • lsp-gen-interval 5 1 50 • These timers are designed to optimize the propagation of the information to other nodes • Init wait = 1 ms, 49 ms faster than the default • Exp increment = S, e.g. 50 ms • the two slides combine into the sketch below
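Putting slides 43 and 44 together, the full fast-convergence timer block looks like this (values as given on the slides):

  router isis
   spf-interval 1 1 50       ! SPF: max-wait 1 s, initial-wait 1 ms, increment 50 ms
   prc-interval 1 1 50       ! PRC: same tuning
   lsp-gen-interval 5 1 50   ! LSP generation: max-wait 5 s, initial-wait 1 ms, increment 50 ms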

  45. LSP Pacing and Flooding

  46. LSP Pacing and Flooding • int pos x/x • isis lsp-interval <msec> • Pacing: • default: 33 ms inter-LSP gap • backoff protection • full database download • suggest to keep the default (spelled out below) • Flooding • flood/SPF trade-off
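For completeness, the pacing knob spelled out (interface name assumed; 33 ms matches the default, so the line only documents it explicitly):

  interface POS1/0
   isis lsp-interval 33   ! minimum gap between consecutive LSP transmissions, in ms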

  47. Link Protocol Properties

  48. Link Protocol Properties • Link failure detection • the faster and more reliable, the better • Dampening flapping links • Fast signalling of Down information • Stable signalling of Up information • Freeze a flapping link in the Down state

  49. POS – Detection of a link failure • pos delay triggers line: • hold time before reacting to a line alarm • default: immediate reaction • pos delay triggers path: • hold time before reacting to a path alarm • default: no reaction • carrier-delay: • hold time between the end of the POS delay hold time and the bringing down of the IOS interface • default: 2000 ms • see the sketch below
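A minimal sketch of those knobs (interface name assumed; the value is illustrative, not a recommendation):

  interface POS1/0
   ! line alarms already trigger immediately by default; path alarms do not:
   pos delay triggers path 100   ! react to path alarms after a 100 ms hold time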

  50. POS – Detection of a link failure • int pos 1/0 • carrier-delay msec 8 • Largely redundant for POS interfaces, where the POS delay triggers already govern the reaction
