Increasing IP Network Survivability: An Introduction to Protection Mechanisms 20 October 22, 2000 Jonathan Sadler Lead Engineer - ONG SE
Motivation • There is increasing demand to carry mission critical traffic, real-time traffic, and other high priority traffic over the public internet • Any network that carries critical, high-priority traffic needs to be resilient to faults • As network technologies continue to improve and converge, protection and restoration schemes have become available at multiple layers
Protection • What is it? • Automated mechanism for recovering traffic path • Invoked when the current working path fails • Requirements • Fast restoration time • Voice / video / data can tolerate small outages (on the order of 50 ms) • Predictable • Protection path is pre-determined • Can be dedicated (1+1) or shared (M:N) • Can be preemptive
Protection • How is protection different from dynamic rerouting? • Dynamic rerouting develops a new path utilizing current network state information • Delay incurred as state updates are flooded through network • Time to re-converge on new end-to-end path is long • Therefore time until destinations become re-reachable is long • Side Effect: State information will be received by nodes that are not involved in restoration causing unnecessary CPU usage • While best effort services may tolerate this behavior, new services will not • VoIP • Virtual leased line
Protection Domains • Method of dividing up a network into separate sub-networks in which a protection mechanism will operate • Cross domain coordination is required
Protection Topologies • Within a protection domain, a number of protection topologies may be used • Linear • Ring • Mesh • For any topology the following terminology applies: • Working: The path or span being used to carry live traffic • Protect: The path or span that will be used to recover live traffic
Protection Topologies - Linear • Two nodes connected to each other with two or more sets of links [Figure: linear protection in 1+1 and 1:n arrangements, with Working and Protect links labeled]
Protection Topologies - Ring • Two or more nodes connected to each other with a ring of links • Line vs. Drop interfaces • East vs. West interfaces [Figure: ring nodes with Line (L) and Drop (D) interfaces, East (E) and West (W) sides, and Working and Protect rings labeled]
Protection Topologies - Mesh • Three or more nodes connected to each other • Can be sparse or complete meshes • Spans may be individually protected with linear protection • Overall edge-to-edge connectivity is protected through multiple paths [Figure: mesh with Working and Protect paths labeled]
Protection Mechanisms • Protection mechanisms are the algorithms which will restore services carried by a specific network topology • Typically take advantage of topology characteristics • Two different approaches exist • Link oriented • Multiple links that support end-to-end connectivity can be individually switched to restore service • Path oriented • Two paths exist which can be “globally” switched to restore service
Protection Mechanisms - Linear APS • Two nodes connected to each other with two or more sets of transmission facilities • Receiving node will signal source node to change from working to protect facility via out-of-band communication [Figure: “Switchover” / “OK” message exchange between the two nodes over Working and Protect facilities]
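The switchover handshake described on this slide can be sketched in a few lines. This is a minimal illustration only: the class names `SourceNode` and `ReceivingNode`, the method names, and the `"IGNORED"` reply are assumptions made for the example, and the out-of-band channel is modeled as a direct method call rather than as real APS signaling.

```python
from dataclasses import dataclass, field


@dataclass
class SourceNode:
    """Head end: transmits on whichever facility is currently selected."""
    active: str = "working"

    def handle_request(self, request: str) -> str:
        # A "Switchover" request moves transmission to the protect facility.
        if request == "Switchover":
            self.active = "protect"
            return "OK"
        return "IGNORED"  # hypothetical reply for unknown requests


@dataclass
class ReceivingNode:
    """Tail end: monitors the working facility and asks for a switch on failure."""
    source: SourceNode
    listening_on: str = "working"
    log: list = field(default_factory=list)

    def on_signal_fail(self, facility: str) -> None:
        # Only a failure of the working facility, while it is in use,
        # triggers a protection switch in this sketch.
        if facility != "working" or self.listening_on != "working":
            return
        reply = self.source.handle_request("Switchover")
        self.log.append(("Switchover", reply))
        if reply == "OK":
            self.listening_on = "protect"


src = SourceNode()
rx = ReceivingNode(source=src)
rx.on_signal_fail("working")
print(src.active, rx.listening_on, rx.log)
```

The receiving node drives the exchange because it is the one that observes the failure; the source only changes facility when asked.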
Protection Mechanisms - BLSR • Bi-directional = both directions are handled as one unit • Line Switched = multiple nodes reconfigure line behavior • Ring • Node that determines need for change will signal out-of-band to the other node. All intermediate nodes on the protect path then “reconfigure”. • Pros: Efficient • Cons: Not as fast as other protection mechanisms [Figure: ring between nodes A and Z showing A-Z and Z-A Working and Protect channels, with a “Switchover” / “OK” exchange]
Protection Mechanisms - BLSR cont’d • How is this efficient? • Each node is involved in reconfiguring when a protection switch is necessary. Consequently, each node knows if the bandwidth reserved for a service is actually in use. • If a specific route is declared the “primary route” for the service, then the protect path will only be used when trying to restore a failure on the primary route. • As a result, it is possible to insert a second signal on the protect path. • When a protection switch is necessary to handle the higher priority traffic, then the “Extra Traffic” will be removed by the nodes as part of the switchover activity.
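The Extra Traffic idea above can be illustrated with a small model of a single protect channel. This is a sketch under stated assumptions: `ProtectChannel` and its methods are hypothetical names, and a real ring coordinates this state across every node rather than in one object.

```python
class ProtectChannel:
    """One protect channel that may carry low-priority Extra Traffic."""

    def __init__(self):
        # None, ("extra", name), or ("protected", name)
        self.occupant = None

    def insert_extra(self, name: str) -> bool:
        # Extra Traffic is only admitted while the channel is idle.
        if self.occupant is None:
            self.occupant = ("extra", name)
            return True
        return False

    def protection_switch(self, name: str):
        """Move a protected service onto the channel.

        Returns the name of any Extra Traffic that was removed, or None.
        """
        preempted = None
        if self.occupant is not None:
            kind, who = self.occupant
            if kind == "protected":
                raise RuntimeError("protect channel already carries protected traffic")
            preempted = who  # Extra Traffic is dropped as part of the switchover
        self.occupant = ("protected", name)
        return preempted


channel = ProtectChannel()
channel.insert_extra("best-effort-1")
print(channel.protection_switch("gold-7"))
```

Because every node takes part in the reconfiguration, each one can tell whether the protect bandwidth is idle, which is what makes the second signal safe to insert.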
Protection Mechanisms - BLSR cont’d • Why is more time needed for a protection switch? • Signaling latency • Traffic cross connect activation / deactivation in intermediate nodes • Definitely needed when Extra Traffic is in use
Protection Mechanisms - UPSR • Unidirectional = Each traffic direction is independent • Path Switched = Not handled “node-by-node” • Ring • Source generates two copies of signal • Destination evaluates both copies and chooses “best path” signal • Pros: Low switch time • Cons: Not efficient [Figure: ring between nodes A and Z showing A-Z and Z-A Working and Protect paths]
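The destination-side choice can be sketched as a pure function. The example assumes, purely for illustration, that signal quality is summarized as a bit error rate and that a lost copy is represented by `None`; the function name `select_copy` is hypothetical, and real selectors typically add hysteresis so they do not flap between copies.

```python
from typing import Optional


def select_copy(working_ber: Optional[float],
                protect_ber: Optional[float]) -> Optional[str]:
    """Pick which of the two received copies to use.

    A BER of None means that copy is not being received at all.
    Returns "working", "protect", or None when both copies are lost.
    """
    if working_ber is None and protect_ber is None:
        return None
    if working_ber is None:
        return "protect"
    if protect_ber is None:
        return "working"
    # Prefer the working copy on a tie to avoid needless switching.
    return "working" if working_ber <= protect_ber else "protect"


print(select_copy(1e-9, 1e-9))
print(select_copy(None, 1e-9))
print(select_copy(1e-4, 1e-9))
```

No signaling is needed because both copies are always present at the destination, which is why the switch time is low and why half of the ring capacity is spent on the duplicate.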
Protection Mechanisms - Mesh • End-to-End Path Oriented • Requires: • Topology Discovery • Constrained Route Selection (x2) • Primary route • Protection route • Resource affinity (diversity) • Signaling Protocol • Service setup • Protection switchover • No standard solutions (yet)
Protection Mech. - Revertive Switching • Once the failed path has been restored, should the traffic be moved back? • Non-revertive Switching • Done when failed path is no longer going to be used with service (i.e. service rolls) • Revertive Switching • Automatic • System determines primary path is acceptable • Wait to Restore Time • Manual • Technician determines primary path is acceptable • Good in cases where the fault is experienced only under load
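Automatic revertive switching with a Wait to Restore timer can be sketched as a small state machine. The class name, method names, and the 300-second value are assumptions for the example; time is passed in explicitly so the behavior is easy to follow.

```python
class RevertiveSelector:
    """Switches to protect on failure and reverts after a fault-free wait."""

    def __init__(self, wtr_seconds: float):
        self.wtr = wtr_seconds
        self.active = "working"
        self.clear_since = None  # time at which the working path became fault-free

    def working_failed(self) -> None:
        # Any new failure cancels a Wait to Restore period in progress.
        self.active = "protect"
        self.clear_since = None

    def working_cleared(self, now: float) -> None:
        # Start the Wait to Restore period, unless it is already running.
        if self.active == "protect" and self.clear_since is None:
            self.clear_since = now

    def tick(self, now: float) -> str:
        # Revert only after the working path has stayed clean for the full period.
        if (self.active == "protect"
                and self.clear_since is not None
                and now - self.clear_since >= self.wtr):
            self.active = "working"
            self.clear_since = None
        return self.active


sel = RevertiveSelector(wtr_seconds=300)
sel.working_failed()
sel.working_cleared(now=10)
print(sel.tick(now=100), sel.tick(now=310))
```

The timer keeps an intermittent fault from bouncing traffic back and forth; the manual variant replaces the timer with a technician's decision.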
Protection Domain Consideration • What should be the scope of repair? • Global Repair • Traffic is restored using facilities within the global network • Local Repair • Traffic is restored using the minimum amount of facilities • Lacks network view, leading to potentially inefficient resource utilization
Protection Hierarchy • Protection functionality is defined for: • Optical Layer • SONET • ATM / Frame Relay • MPLS / IP • How should all these layers interact? • They shouldn’t
Two Layer Recovery Model • Most providers are adopting a two-layer model, where: • Very-fast bulk restoration is done as close to the transport media as possible • Optical Switching • SONET where Optical Switching is not available • Service level restoration is done at the specific service layer • SONET -- VT1.5, STS-1, STS-3c, STS-12c, STS-48c services • ATM / FR -- Switched Data Services • MPLS -- IP Services • Layers in between are not used for restoration • Service level restoration timers are set so that transport restoration can be attempted first
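The last bullet, letting transport restoration go first, amounts to a hold-off timer at the service layer. A minimal sketch follows; the function name, the return strings, and the 100 ms hold-off are illustrative assumptions, not values from any standard.

```python
def service_layer_decision(elapsed_ms: float,
                           fault_present: bool,
                           hold_off_ms: float) -> str:
    """Decide what the service layer should do about a detected fault.

    elapsed_ms is the time since the service layer first saw the fault.
    """
    if not fault_present:
        # The transport layer restored the path; nothing to do.
        return "no-action"
    if elapsed_ms < hold_off_ms:
        # Still inside the window reserved for transport-layer restoration.
        return "hold-off"
    # The transport layer did not recover in time; restore at the service layer.
    return "switch"


HOLD_OFF_MS = 100  # hypothetical value, chosen longer than the transport budget
print(service_layer_decision(30, True, HOLD_OFF_MS))
print(service_layer_decision(60, False, HOLD_OFF_MS))
print(service_layer_decision(100, True, HOLD_OFF_MS))
```

Setting the hold-off longer than the transport layer's restoration time keeps the two layers from reacting to the same failure at once.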
Two Layer Recovery Model - Why? • Why have two layers instead of one? • Optical switching allows for the greatest number of services to be restored with the least amount of overhead • Optical switching will find out about physical failures first • Loss of light • Optical AIS • Optical protection domains are typically smaller than service-level protection domains, reducing signaling time • Service layers understand service specific performance requirements best, but may have a large number of services to restore
Protection in SONET/SDH • Topologies / Mechanisms Available • 1+1 Linear APS • UPSR • BLSR • 2-fiber • Restoration channels must be reserved, reducing protected capacity • 4-fiber -- two sets of Tx/Rx fibers for each line interface • Span Switch: Can restore by utilizing alternate Tx/Rx fibers • Ring Switch: Utilizes restoration channels located on a separate ring • Extra Traffic possible • APS, BLSR signaling done in K1 / K2 bytes of overhead
Protection in SONET/SDH (cont’d) • Failure Criteria • Loss of Signal (LOS) • Loss of Frame (LOF) • Threshold Crossing • Bit Error Rate (BER) • Coding Violations (CV) • Excessive SONET Pointer Justifications • Alarm Indication Signal (AIS)
Applying Protection to MPLS • What does this do for me? • Provides fast restoration of MPLS services • Can be done on a service-by-service basis. For example: • Best effort could be biased to use Extra Traffic links • Bronze could be put on unprotected, but avoid Extra Traffic • Silver could be protected 1:n • Gold could be protected 1+1
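The per-service mapping in the example above can be written down as a small policy table. The enum and function names are hypothetical, and the class labels are simply the ones used on the slide.

```python
from enum import Enum


class Protection(Enum):
    EXTRA_TRAFFIC = "biased toward Extra Traffic links; preempted on switchover"
    UNPROTECTED = "no protect path, but avoids Extra Traffic links"
    SHARED_1_N = "1:n shared protect path"
    DEDICATED_1_PLUS_1 = "1+1 dedicated protect path"


# Service class to protection treatment, following the slide's example.
SERVICE_CLASS_POLICY = {
    "best effort": Protection.EXTRA_TRAFFIC,
    "bronze": Protection.UNPROTECTED,
    "silver": Protection.SHARED_1_N,
    "gold": Protection.DEDICATED_1_PLUS_1,
}


def protection_for(service_class: str) -> Protection:
    """Look up the protection treatment for a service class (case-insensitive)."""
    return SERVICE_CLASS_POLICY[service_class.strip().lower()]


print(protection_for("Gold").value)
```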
Applying Protection to MPLS - How? • Perform constraint-based route selection for primary path • Signal creation of working path LSP • Perform constraint-based route selection for secondary path, adding a constraint which removes links that do not meet diversity requirements • Signal “reservation” of protect path LSP [Figure: network with Working and Protect LSPs labeled]
Applying Protection to MPLS - How? • Extensions to IS-IS / OSPF • Utilizes the same Constraint Routing extensions as TE • New constraint: Shared Risk Link Group (SRLG) • Used for diversity determination • Extensions to CR-LDP / RSVP-TE • Add Protection LSP declaration to ERO • Add Reverse Notification Tree & Fault Notification Messages
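The two route selections and the SRLG diversity constraint can be sketched as follows. This is an illustrative sketch, not the method of any particular router: the topology format, function names, link costs, and SRLG numbers are all assumptions. It uses a plain Dijkstra search for the working path, then prunes every link that is on the working path or shares an SRLG with it before searching again.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

Link = Tuple[str, str]
# Each undirected link maps to (cost, set of SRLG ids it belongs to).
Topology = Dict[Link, Tuple[int, FrozenSet[int]]]


def shortest_path(topo: Topology, src: str, dst: str) -> Optional[List[Link]]:
    """Dijkstra over undirected links; returns the links used, or None."""
    adj: Dict[str, list] = {}
    for (a, b), (cost, _) in topo.items():
        adj.setdefault(a, []).append((b, cost, (a, b)))
        adj.setdefault(b, []).append((a, cost, (a, b)))
    best = {src: 0}
    prev: Dict[str, Tuple[str, Link]] = {}
    heap = [(0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost, link in adj.get(node, []):
            nd = dist + cost
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                prev[nxt] = (node, link)
                heapq.heappush(heap, (nd, nxt))
    if dst not in best:
        return None
    path: List[Link] = []
    node = dst
    while node != src:
        node, link = prev[node]
        path.append(link)
    path.reverse()
    return path


def working_and_protect(topo: Topology, src: str, dst: str):
    """Select a working path, then a protect path that is SRLG-diverse from it."""
    working = shortest_path(topo, src, dst)
    if working is None:
        return None, None
    used_srlgs = frozenset().union(*(topo[link][1] for link in working))
    on_working = set(working)
    # Diversity constraint: drop working links and anything sharing their SRLGs.
    pruned = {link: attrs for link, attrs in topo.items()
              if link not in on_working and not (attrs[1] & used_srlgs)}
    return working, shortest_path(pruned, src, dst)


topo: Topology = {
    ("A", "B"): (1, frozenset({1})),
    ("B", "D"): (1, frozenset({2})),
    ("A", "C"): (2, frozenset({3})),
    ("C", "D"): (2, frozenset({4})),
    ("A", "D"): (5, frozenset({1})),  # assumed to share a conduit with A-B
}
print(working_and_protect(topo, "A", "D"))
```

Choosing the working path first and pruning afterwards is simple, but it can fail to find a diverse pair in topologies where one exists; algorithms that search for both paths jointly (Suurballe's algorithm is one known example) avoid that trap at the cost of more complexity.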
MPLS Protection - General Mesh Mech? • End-to-End Path Oriented • Requires: • Topology Discovery • Constrained Route Selection (x2) • Primary route • Protection route • Resource affinity (diversity) • Signaling Protocol • Service setup • Protection switchover [Figure labels: OSPF w/ TE, IS-IS w/ TE, RSVP-TE, CR-LDP]
Benefits of a Generalized Control Plane • Extension of MPLS to non-IP technologies allows for: • Rapid provisioning of lower layer connections • Optical trails • SONET / SDH trails • Cut-through connections • Reduces traffic load on core routers • Extension of IP semantics (i.e. diff-serv) • Validates services that paid for protection are protected
Cut-through connection (simplified example) • Four IP Routers operating over Optical Network • Initial overlay network connects routers in a hub / spoke topology • High traffic load exists between Router A and D • Router A realizes need for direct path (based on link load threshold crossing), and signals request for path into network • New direct path is now used for A-D traffic [Figure: Routers A, B, C, and D over the optical network]
Summary • New services require mechanisms to recover working traffic as fast as possible • Optical Layer protection tools provide restoration with the least amount of overhead • Service Layer protection is also necessary • MPLS-TE with extensions can provide protection support for IP Networks • Can be extended to support any mesh network • Use of MPLS to integrate Optical and IP control planes allows IP service semantics to control protection mechanisms used at lower layers
Sample Deployment - LATA • SONET Protection in Local Loop Network • IP Mesh Protection in Distribution Network for IP services • SONET Protection in Distribution Network for Private Line services
Sample Deployment - Long-Haul • Private Line and IP services are clients of Optical Core Network • Optical Core Network is a sparse mesh protected by MPLS mechanisms
References • GR-253-CORE, “Synchronous Optical NETwork (SONET) Transport Systems: Common Generic Criteria,” Issue 2 rev 2, (Bellcore, January 1999) • GR-1230-CORE, “SONET Bi-directional Line Switched Ring (BLSR) Equipment Generic Criteria,” Issue 4, (Bellcore, December 1998) • GR-1400-CORE, “SONET Dual-Fed Unidirectional Path Switched Ring (UPSR) Equipment Generic Criteria,” Issue 2, (Bellcore, January 1999) • draft-owens-te-network-survivability-00.txt, “Network Survivability Considerations for Traffic Engineered IP Networks,” (IETF, March 2000) • draft-ietf-mpls-recovery-frmwrk-00.txt, “Framework for MPLS-based Recovery,” (IETF, September 2000) • draft-chang-mpls-path-protection-01.txt, “A Path Protection / Restoration Mechanism for MPLS Networks,” (IETF, July 2000) • draft-chang-mpls-rsvpte-path-protection-ext-00.txt, “Extensions to RSVP-TE for MPLS Path Protection,” (IETF, June 2000)