Network Survivability


Network Survivability
• Want high availability for connections
  • A common requirement: 99.999% availability ("five nines"), i.e., about 5 minutes of downtime per year
• Network links, nodes, and individual channels can fail
• Need a survivable network that can continue providing service in the presence of failures
• Survivability can be provided by protection switching
  • Provide some redundant capacity in the network
  • Automatically reroute traffic around the failure using the redundant capacity
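To make the five-nines target concrete, the following Python sketch converts an availability fraction into expected downtime per year. It assumes a 365-day year and treats downtime simply as (1 - availability) times the length of the year; the function name `downtime_minutes_per_year` is illustrative, not taken from any standard.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year, leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for an availability given as a fraction in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.3%} available -> {downtime_minutes_per_year(a):8.1f} min/year")
```

At 99.999% this gives roughly 5.3 minutes per year, versus about 526 minutes (almost 9 hours) at 99.9%.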

Protection
• Usually implemented in a distributed manner to ensure fast restoration of service after a failure
• In most cases, protection schemes are engineered to protect against a single failure event
• How to deal with more than one concurrent failure?
  • Divide the network into smaller subnetworks and confine each protection scheme to a single subnetwork, so that concurrent failures in different subnetworks can be handled independently
  • Ensure that the mean time to repair (MTTR) a failure is much smaller than the mean time between failures (MTBF), so that a second failure is unlikely to occur before the first one is repaired
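The MTTR-versus-MTBF condition can be quantified with a simple model. The Python sketch below assumes that failures elsewhere in the (sub)network arrive as a Poisson process, so the chance that a second failure lands inside one repair window is 1 - exp(-MTTR/MTBF). The function names and the example figures are hypothetical.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable element: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def p_overlapping_failure(mttr_hours: float, mtbf_others_hours: float) -> float:
    """Probability that another failure occurs while the first one is being repaired.

    Assumes (hypothetically) that other failures arrive as a Poisson process
    with mean time between failures mtbf_others_hours.
    """
    return 1.0 - math.exp(-mttr_hours / mtbf_others_hours)


if __name__ == "__main__":
    # Hypothetical figures: 12 h to repair, 10,000 h between failures elsewhere.
    print(f"availability: {availability(10_000, 12):.6f}")
    print(f"P(second failure during repair): {p_overlapping_failure(12, 10_000):.4%}")
```

With a 12-hour repair time and a 10,000-hour MTBF for the rest of a subnetwork, the chance of an overlapping second failure is about 0.12%; splitting the network into smaller subnetworks raises the MTBF seen within each one and lowers this probability further.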

Basic Concepts
• Working path: carries traffic under normal operation
• Protection path: an alternate path that carries the traffic in case of a failure
• Working and protection paths are diversely routed so that a single failure cannot take down both paths
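A minimal way to check the diversity requirement is to verify that the two paths share no link (and, more strictly, no intermediate node). The Python sketch below represents a path as a list of node names; this representation and the function names are assumptions for illustration. Real planning would also account for shared risks such as fibers laid in a common conduit, which this sketch ignores.

```python
from typing import List, Set, Tuple

NodePath = List[str]  # hypothetical representation, e.g. ["A", "B", "D"]


def links(path: NodePath) -> Set[Tuple[str, ...]]:
    """Undirected links used by a path, each stored as a sorted node pair."""
    return {tuple(sorted(pair)) for pair in zip(path, path[1:])}


def is_link_disjoint(working: NodePath, protection: NodePath) -> bool:
    """True if no single link failure can cut both paths."""
    return links(working).isdisjoint(links(protection))


def is_node_disjoint(working: NodePath, protection: NodePath) -> bool:
    """True if the paths share no intermediate node (endpoints excluded)."""
    return set(working[1:-1]).isdisjoint(protection[1:-1])


if __name__ == "__main__":
    print(is_link_disjoint(["A", "B", "D"], ["A", "C", "D"]))       # diverse
    print(is_link_disjoint(["A", "B", "D"], ["A", "B", "C", "D"]))  # shares link A-B
```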

Dedicated vs. Shared Protection
• Dedicated protection: each working connection is assigned its own dedicated protection bandwidth
• Shared protection: if a set of working connections will not fail simultaneously, they can share protection bandwidth
  • Reduces the bandwidth needed for protection
  • Protection bandwidth can be used to carry low-priority traffic under normal conditions (this traffic is preempted when a failure occurs)
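The bandwidth saving from sharing can be sketched with a simplified model: assume all the connections' protection paths cross one common protection link, and that the set of possible failure scenarios (which connections fail together) is known. Dedicated protection must reserve the sum of all demands, while shared protection only needs enough for the worst scenario. The demand units, connection names, and function names below are hypothetical.

```python
from typing import Dict, List


def dedicated_protection_bw(demands: Dict[str, int]) -> int:
    """Dedicated protection: every working connection reserves its own backup capacity."""
    return sum(demands.values())


def shared_protection_bw(demands: Dict[str, int], failure_scenarios: List[List[str]]) -> int:
    """Shared protection: capacity only needs to cover the worst failure scenario.

    Each scenario lists the connections that fail together; connections that
    never appear in the same scenario can reuse the same protection bandwidth.
    """
    return max((sum(demands[c] for c in s) for s in failure_scenarios), default=0)


if __name__ == "__main__":
    demands = {"c1": 10, "c2": 10, "c3": 5}
    scenarios = [["c1"], ["c2", "c3"]]  # c2 and c3 share a risk; c1 never fails with them
    print(dedicated_protection_bw(demands), shared_protection_bw(demands, scenarios))
```

In this example c2 and c3 can fail together but never with c1, so dedicated protection reserves 25 units while shared protection needs only 15.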

Nonrevertive vs. Revertive Protection
• Traffic is switched from the working path to the protection path when a failure occurs
• Nonrevertive protection: traffic remains on the protection path until it is manually switched back onto the working path
• Revertive protection: once the working path is repaired, traffic is automatically switched back from the protection path onto the working path
• Shared protection schemes are usually revertive, so that the shared protection bandwidth is freed up to protect other connections
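The difference between the two modes can be captured as a small state machine. This Python sketch is a toy model with hypothetical class and method names; it omits details found in real systems, such as the wait-to-restore timer that revertive SONET/SDH schemes typically apply before switching back.

```python
from enum import Enum


class ActivePath(Enum):
    WORKING = "working"
    PROTECTION = "protection"


class ProtectedConnection:
    """Toy model of which path carries traffic under revertive or nonrevertive protection."""

    def __init__(self, revertive: bool) -> None:
        self.revertive = revertive
        self.active = ActivePath.WORKING

    def on_working_failure(self) -> None:
        # Both modes: traffic moves to the protection path when the working path fails.
        self.active = ActivePath.PROTECTION

    def on_working_repaired(self) -> None:
        # Revertive: switch back automatically once the working path is repaired.
        # Nonrevertive: do nothing; traffic stays on the protection path.
        if self.revertive:
            self.active = ActivePath.WORKING

    def manual_switch_to_working(self) -> None:
        # Operator command; assumed to be issued only when the working path is healthy.
        self.active = ActivePath.WORKING
```

After a failure and repair, a revertive connection is back on the working path, while a nonrevertive one stays on the protection path until `manual_switch_to_working` is called.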

Unidirectional vs. Bidirectional Protection Switching
• Unidirectional protection switching
  • When a fiber is cut, only the affected direction of traffic is switched over to the protection fiber
  • Used in conjunction with dedicated protection schemes
  • Traffic is transmitted simultaneously on the working and protection paths
  • The receiver at the end of the paths simply selects the better of the two arriving signals
  • Does not require a signaling protocol between the receiver and the transmitter
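The receiver side of this scheme (commonly called 1+1 protection) reduces to a selector. The sketch below assumes the receiver has a loss-of-signal flag and a bit-error-rate estimate for each copy of the traffic; the `Signal` type and the selection rule are illustrative assumptions rather than a standardized algorithm.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Hypothetical per-path signal quality report at the receiver."""
    loss_of_signal: bool
    bit_error_rate: float


def select_path(working: Signal, protection: Signal) -> str:
    """Pick the better of the two copies of the same traffic.

    No message is sent back to the transmitter: it keeps sending on both paths.
    """
    if working.loss_of_signal and not protection.loss_of_signal:
        return "protection"
    if protection.loss_of_signal and not working.loss_of_signal:
        return "working"
    # Otherwise prefer the working copy unless the protection copy has a strictly lower BER.
    return "protection" if protection.bit_error_rate < working.bit_error_rate else "working"
```

Because the decision uses only locally observed signals, a cut in one direction changes the selector at that direction's receiver only, which is exactly why no signaling protocol is needed.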

Unidirectional vs. Bidirectional Protection Switching
• Bidirectional protection switching
  • When a fiber is cut, both directions of traffic are switched over to the protection fibers
  • The receiver needs to inform the transmitter of the cut, which requires a signaling protocol called the automatic protection switching (APS) protocol
• How does an APS protocol work?
  • If a receiver in a node detects a fiber cut, the node turns off its transmitter on the working fiber and then switches over to the protection fiber to transmit traffic
  • The receiver at the other node detects the loss of signal on the working fiber and then switches its traffic over to the protection fiber
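The two-step loss-of-signal exchange above can be simulated for a single span between two nodes A and B. This Python sketch models only the simplified scheme described on the slide, with hypothetical names; standardized APS protocols such as the SONET/SDH one exchange explicit request messages in the K1/K2 overhead bytes instead of relying only on a darkened transmitter.

```python
from typing import List, Tuple


class Node:
    """One end of a bidirectional working/protection fiber pair."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tx_fiber = "working"   # fiber this node currently transmits on
        self.tx_on_working = True   # whether its working-fiber transmitter is lit

    def on_loss_of_signal(self) -> None:
        # Incoming working fiber went dark: turn off the working transmitter
        # (which signals the far end) and transmit on the protection fiber instead.
        self.tx_on_working = False
        self.tx_fiber = "protection"


def simulate_cut(cut_direction: str) -> Tuple[str, str, List[str]]:
    """Simulate a cut of the working fiber carrying traffic "A->B" or "B->A"."""
    a, b = Node("A"), Node("B")
    upstream, downstream = (a, b) if cut_direction == "A->B" else (b, a)
    events: List[str] = []
    # Step 1: the receiver downstream of the cut detects loss of signal.
    downstream.on_loss_of_signal()
    events.append(f"{downstream.name} detects LOS, switches to protection")
    # Step 2: the other node sees LOS because the downstream node darkened its working transmitter.
    if not downstream.tx_on_working:
        upstream.on_loss_of_signal()
        events.append(f"{upstream.name} detects LOS, switches to protection")
    return a.tx_fiber, b.tx_fiber, events


if __name__ == "__main__":
    print(simulate_cut("A->B"))
```

Even though only the A-to-B fiber is cut, both nodes end up transmitting on the protection fibers, which is the defining property of bidirectional switching.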

Path/Span/Ring Switching
• Path switching: the connection is rerouted end to end (between its source and destination) on an alternate path
• Span switching: the connection is rerouted on a spare link between the nodes adjacent to the failure
• Ring switching: the connection is rerouted around the ring between the nodes adjacent to the failure
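The three options can be compared on a simple ring. This Python sketch assumes a ring of n nodes numbered 0 to n-1, a working connection routed clockwise, and a single failed span; span switching further assumes that only the working fiber on that span failed and a spare fiber is available on the same span (as in four-fiber rings). All function names are hypothetical.

```python
from typing import List, Tuple


def clockwise(src: int, dst: int, n: int) -> List[int]:
    """Nodes visited going clockwise from src to dst on an n-node ring."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + 1) % n)
    return path


def counterclockwise(src: int, dst: int, n: int) -> List[int]:
    """Nodes visited going counterclockwise from src to dst on an n-node ring."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] - 1) % n)
    return path


def path_switch(working: List[int], n: int) -> List[int]:
    """Reroute end to end, source to destination, the other way around the ring."""
    return counterclockwise(working[0], working[-1], n)


def span_switch(working: List[int], failed: Tuple[int, int]) -> List[Tuple[int, int, str]]:
    """Same route; only the failed span moves onto its spare (protection) fiber."""
    return [(a, b, "protection" if (a, b) == failed else "working")
            for a, b in zip(working, working[1:])]


def ring_switch(working: List[int], failed: Tuple[int, int], n: int) -> List[int]:
    """Replace the failed span (a, b) by the long way around the ring from a to b."""
    a, b = failed
    i = working.index(a)
    if i + 1 >= len(working) or working[i + 1] != b:
        raise ValueError("failed span must lie on the working path")
    return working[:i] + counterclockwise(a, b, n) + working[i + 2:]


if __name__ == "__main__":
    w = clockwise(0, 3, 6)              # [0, 1, 2, 3]
    print(path_switch(w, 6))            # end-to-end detour
    print(span_switch(w, (1, 2)))       # spare fiber on span 1-2
    print(ring_switch(w, (1, 2), 6))    # loop back around the ring at nodes 1 and 2
```

With the span 1-2 failed, the ring-switched route is [0, 1, 0, 5, 4, 3, 2, 3]: nodes 0 and 3 are visited twice because traffic loops back at the nodes adjacent to the failure, whereas path switching simply uses [0, 5, 4, 3].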
