Ethernet Automatic Protection Switching (EAPS): A small comparison with Ethernet Ring Protection Switching (ERPS)
Introduction • EAPS is a protocol invented to increase the availability of Ethernet rings • Developed by Extreme Networks (RFC3619 – 2003) • Objective: • Provide a resilience level comparable to SONET rings • Current version (v1.3 - 2011) has some enhancements over version 1 (RFC3619 – 2003)
Motivation • Ethernet is widely used in Local Area Networks (LANs) and Metropolitan Area Networks (MANs) • MANs typically use a ring topology • MAN operators want to reduce recovery time • Spanning Tree Protocol (STP) can take 30 to 60 seconds to recover • Rapid Spanning Tree Protocol (RSTP) is faster... • Convergence time depends on the number of nodes • Both STP and RSTP limit the number of nodes • EAPS recovers in less than 1 second (100 ms) • Does not limit the number of nodes!!!
Basic Considerations (I) • A ring is made up of two or more switches • Each switch has two ports connected to the ring • An EAPS domain exists on a single Ethernet ring • A domain protects a group of VLANs • A domain has a unique control VLAN • Multiple EAPS domains could coexist on the same ring • Multiple control VLANs
Basic Considerations (II) • For each EAPS domain: • One of the nodes is the Master (S1) • One port is designated as the Primary port (P) • The other is the Secondary Port (S) • All other nodes (S2-S6) are known as Transit nodes
Normal Operation • The Master node blocks its secondary port -> avoids loops • Non-control traffic is blocked (Control VLAN is NOT blocked) • Master is in COMPLETE state • Transit nodes are in LINKS-UP state • The Master sends health-check frames (HEALTH-CHECK-PDU) periodically (Hello timer) • From primary port to secondary port • Control frames consumed by the Master -> NOT forwarded
Fault Operation • When a fault is detected: • The Master changes to FAILED state • Unblocks secondary port • Flushes its bridging table • The Master orders the other nodes to flush their tables • Sends a RING-DOWN-FLUSH-FDB-PDU frame • Transit nodes learn the new topology
Fault Detection (I) • 2 ways of detecting a failure • Link Down Alert • Ring Polling • Link Down Alert • Transit nodes detect a link-down • The Transit node detecting the failure changes to LINKS-DOWN state • The Transit node sends a LINK-DOWN-PDU frame to the Master • Master changes to FAILED state • Master unblocks secondary port • ...
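The Master's reaction to a fault can be sketched as a tiny state machine. This is only an illustration of the steps on the slide, not Extreme Networks' implementation: the class `MasterNode`, its attributes and the returned frame list are hypothetical names, and only the state names and the PDU name come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    """Hypothetical model of an EAPS Master; names are illustrative only."""
    state: str = "COMPLETE"
    secondary_blocked: bool = True           # blocks protected (non-control) traffic
    fdb: dict = field(default_factory=dict)  # MAC address -> port

    def on_fault(self) -> list:
        """Apply the fault steps and return the control frames to send."""
        self.state = "FAILED"
        self.secondary_blocked = False  # open the secondary port
        self.fdb.clear()                # flush the local bridging table
        # Transit nodes flush their own tables when they receive this frame.
        return ["RING-DOWN-FLUSH-FDB-PDU"]


master = MasterNode(fdb={"00:11:22:33:44:55": "primary"})
frames = master.on_fault()
print(master.state, master.secondary_blocked, frames)
```

After `on_fault()` the secondary port carries protected traffic and every node re-learns addresses over the new path.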
Fault Detection (II) • Ring Polling (version 1 – RFC3619) • Master sends HEALTH-CHECK-PDU frames periodically • From primary to secondary port • Master has a Fail-period timer • If health check frame received before timer expires -> reset timer • If health check frame NOT received before timer expires • Master changes to FAILED state • Master unblocks secondary port • ...
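The Fail-period logic above can be sketched as a small simulation. It assumes a simplified model in which the timer starts at t = 0 and every returning health-check frame restarts it; the function name `first_failure_time` and the timer values are illustrative, not taken from RFC3619.

```python
def first_failure_time(arrivals, fail_period):
    """Return the time at which the Master would enter FAILED, or None.

    arrivals: sorted times (seconds) at which health-check frames come back
              on the secondary port.
    fail_period: length of the Fail-period timer in seconds.
    """
    deadline = fail_period
    for t in arrivals:
        if t >= deadline:           # timer expired before this frame arrived
            return deadline
        deadline = t + fail_period  # frame arrived in time: reset the timer
    return None  # no expiry up to the last arrival supplied


# Illustrative values: frames return every 1 s, Fail-period of 3 s.
healthy = [1, 2, 3, 4, 5]
broken = [1, 2, 7, 8]  # the frames expected at t = 3..6 never returned
print(first_failure_time(healthy, 3))
print(first_failure_time(broken, 3))
```

In the `broken` trace the last frame arrives at t = 2, so the timer expires at t = 5 and the Master would change to FAILED at that point.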
Fault Detection (III) • Ring Polling (version 1.3) • 2 options if the Fail-period timer expires (configurable) • «Open Secondary Port» -> as on the previous slide • «Send-Alert» • Master does NOT unblock its secondary port yet • Master sends a QUERY-LINK-STATUS-PDU frame out of both ports • Transit nodes with a link failure reply with a LINK-DOWN-PDU frame • Master changes to FAILED state • ... • «Send-Alert» prevents false failures. Why? • Health frames may fail to return to the Master even if the ring is complete • Control VLAN misconfigurations • Too much traffic • Master node’s CPU busy
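The difference between the two options can be sketched as follows. The class `PollingMaster`, its method names and the option strings are hypothetical; the sketch covers only the port and state changes, and leaves out the table flush.

```python
class PollingMaster:
    """Hypothetical Master model covering both Fail-period options."""

    def __init__(self, mode):
        self.mode = mode  # "open-secondary-port" or "send-alert" (assumed names)
        self.state = "COMPLETE"
        self.secondary_blocked = True

    def _declare_failed(self):
        self.state = "FAILED"
        self.secondary_blocked = False

    def on_fail_period_expiry(self):
        """Return the (frame, port) pairs the Master transmits."""
        if self.mode == "open-secondary-port":
            self._declare_failed()  # version 1 behaviour: fail immediately
            return []
        # send-alert: keep the port blocked and ask the ring first
        return [("QUERY-LINK-STATUS-PDU", "primary"),
                ("QUERY-LINK-STATUS-PDU", "secondary")]

    def on_link_down_pdu(self):
        """A Transit node confirmed a real link failure."""
        self._declare_failed()


m = PollingMaster("send-alert")
queries = m.on_fail_period_expiry()
state_after_expiry = m.state  # still COMPLETE: no loop risk yet
m.on_link_down_pdu()
state_after_reply = m.state
print(state_after_expiry, state_after_reply)
```

If the health frames were lost only because of congestion or a busy CPU, no Transit node replies and the ring stays blocked, so no loop is created.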
Fault Restoration (I) • Master in FAILED state -> continues sending HEALTH-CHECK-PDU frames • Ring restored -> Master’s secondary port receives health frame • Master changes to COMPLETE state • Blocks non-control frames on secondary port • Flushes its bridge table • Orders the other nodes to flush their tables • Sends a RING-UP-FLUSH-FDB-PDU frame • Transit nodes re-learn the topology
Fault Restoration (II) – PREFORWARDING State • Time between • The Transit node detecting its link is restored • The Master detecting the ring is restored • Master’s secondary port is unblocked • Possible temporary loop !!!! • When Transit node detects its link is restored • Changes to PREFORWARDING state and starts Preforwarding timer • Protected VLANs on that port are temporarily blocked • Waits till a RING-UP-FLUSH-FDB-PDU is received • Changes to LINKS-UP state • Unblocks previously blocked VLANs • Flushes its bridge table and stops Preforwarding timer • Re-learns topology
Fault Restoration (III) – PREFORWARDING State • Preforwarding timer deals with: • Lost RING-UP-FLUSH-FDB-PDU from the Master • Another break in the ring • If the Transit node remains in PREFORWARDING state indefinitely -> disconnected network • Preforwarding timer is derived from the Hello-timer for HEALTH-CHECK-PDU frames
Enhancements of version 1.3 • «Send-alert» configuration for Ring Polling fault detection method • INIT state • Master comes up for the first time and its ports are up • Master does not know if the ring is up • Master starts in INIT state -> blocks secondary port • When the first health frame is received -> changes to COMPLETE state • Helps spot misconfigurations in control VLAN • LINK-UP-PDU • Transit node detects a link comes up -> sends LINK-UP-PDU to Master • Timestamp used for troubleshooting • If the Master never changes to COMPLETE state • Allows use of EAPS Shared-Ports
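The Transit node behaviour described above can be sketched as a state machine. The class `TransitNode` and its attribute names are hypothetical, and the Preforwarding timer is modelled only as a flag (its expiry is not handled here).

```python
class TransitNode:
    """Hypothetical EAPS Transit node; only the state names come from the slides."""

    def __init__(self):
        self.state = "LINKS-UP"
        self.blocked_ports = set()  # ports whose protected VLANs are blocked
        self.preforwarding_timer_running = False
        self.fdb = {}

    def on_link_down(self, port):
        self.state = "LINKS-DOWN"

    def on_link_restored(self, port):
        # Keep protected VLANs blocked until the Master has re-blocked its
        # secondary port; otherwise the ring would briefly form a loop.
        self.state = "PREFORWARDING"
        self.blocked_ports.add(port)
        self.preforwarding_timer_running = True

    def on_ring_up_flush_fdb(self):
        self.fdb.clear()  # re-learn the topology
        if self.state == "PREFORWARDING":
            self.blocked_ports.clear()
            self.preforwarding_timer_running = False
            self.state = "LINKS-UP"


node = TransitNode()
node.on_link_down("port1")
node.on_link_restored("port1")
state_while_waiting = node.state
node.on_ring_up_flush_fdb()
print(state_while_waiting, node.state, node.blocked_ports)
```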
VLANs in Multiple EAPS domains (Multiple Rings) (I) • EAPS can handle a simple configuration • Each ring has an EAPS domain, a Master node and a Control VLAN • A VLAN spanning both rings is added as protected in both EAPS domains
VLANs in Multiple EAPS domains (Multiple Rings) (II) • Topologies with a common link could be problematic • If the common link fails • Both Masters open secondary ports • Protected VLANs spanning both rings will have a loop • S1-S2-S3-S4-S5-S6-S7-S8-S9-S10-S1 • EAPS Shared-Ports deals with it • Out of scope for this presentation
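The loop can be checked with a small graph sketch. The slide's figure is not available here, so the topology is an assumption: the common link is taken to be S3-S8, and the two Masters are assumed to block links S10-S1 and S5-S6. Only the resulting loop S1-...-S10-S1 comes from the slide.

```python
def has_loop(links):
    """Return True if the undirected links contain a cycle (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Outer path S1-S2-...-S10-S1 (the loop named on the slide).
outer = [(f"S{i}", f"S{i % 10 + 1}") for i in range(1, 11)]
common = ("S3", "S8")                          # assumed position of the common link
blocked = {("S10", "S1"), ("S5", "S6")}        # assumed secondary-port links

normal = [link for link in outer if link not in blocked] + [common]
common_link_failed = list(outer)  # common link down, both secondary ports open

print(has_loop(normal), has_loop(common_link_failed))
```

In normal operation the forwarding topology is a tree; once the common link fails and both Masters unblock, all ten outer links forward and the protected VLAN loops.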
States and Control Frames Version 1 – RFC3619 Version 1.3
Ethernet Ring Protection Switching (I) • Ethernet Ring Protection Switching (ERPS) is defined by ITU-T G.8032 -> achieve sub-50 ms recovery times in rings • Basic considerations: • One link is designated as the Ring Protection Link (RPL) -> blocked to prevent loops • The node setting the block is the RPL Owner (Master in EAPS) • Nodes monitor link failure using Ethernet Continuity Check (ETH-CC) messages • Four defined local events: • Local Signal Failure (local SF) -> detection of link failure • Local clear Signal Failure (local clear SF) -> detection of link restoration • Wait-To-Restore Expire (WTR-Expire) -> timer expiration • Wait-To-Restore Running (WTR-Running) -> timer running
Ethernet Ring Protection Switching (II) • Basic considerations (cont.): • The protocol uses Ring Automatic Protection Switching (R-APS) messages: • R-APS(SF): sent by the node detecting link failure (gets local SF) • R-APS(NR): sent by the node detecting link restoration (gets local clear SF) • R-APS(NR,RB): sent by RPL Owner indicating the RPL is blocked • Two important timers • Wait-To-Restore (WTR) Timer: used by the RPL Owner to verify that the ring has stabilized before blocking the RPL after failure • Guard Timer: used by nodes detecting link restoration to avoid receiving outdated R-APS messages • Three states for nodes • Initialization: when the node is first defined • Idle: normal state, RPL blocked, all nodes/ports working • Protecting: protection switching is in effect
Ethernet Ring Protection Switching (III) • Basic considerations (cont.): • An R-APS channel is configured using a VLAN -> transmitting R-APS messages
ERPS Principle of Operation (I) • In normal operation (nodes in state Idle): RPL is blocked • Link failure (local SF): nodes detecting it block failed port, send R-APS(SF) and flush filtering database (FDB) • Nodes receiving R-APS(SF) flush FDBs • RPL Owner receives R-APS(SF): flushes FDB, unblocks RPL • Link Restoration (local clear SF): detecting nodes send R-APS(NR) periodically and start Guard Timer • RPL Owner receives R-APS(NR): starts WTR Timer • WTR Timer expires: RPL Owner blocks RPL, sends R-APS(NR,RB) and flushes FDB • Nodes receiving R-APS(NR,RB) flush FDBs • Nodes detecting link restoration unblock recovered ports, stop sending R-APS(NR) and flush FDBs
EAPS vs. ERPS • Same basic idea: break the loop in the ring by blocking one port • In case of failure, unblock the blocked port and keep connectivity • EAPS: • Both the Master and Transit nodes can detect a failure • Only the Master detects the failed link is restored • ERPS: • Only the nodes adjacent to a failed link detect failures and restoration
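The RPL Owner's part of this sequence can be sketched as a state machine. The class `RplOwner` and its method names are hypothetical, the WTR timer is modelled as a flag, and the return to Idle on WTR expiry is an assumption based on the state descriptions in the slides rather than on the text of G.8032.

```python
class RplOwner:
    """Hypothetical ERPS RPL Owner; message and state names follow the slides."""

    def __init__(self):
        self.state = "Idle"
        self.rpl_blocked = True
        self.wtr_running = False
        self.fdb = {}

    def on_raps_sf(self):
        """A node reported a link failure: open the RPL to keep connectivity."""
        self.fdb.clear()
        self.rpl_blocked = False
        self.state = "Protecting"

    def on_raps_nr(self):
        """A node reported link restoration: wait before reverting."""
        if self.state == "Protecting":
            self.wtr_running = True

    def on_wtr_expire(self):
        """Ring considered stable: block the RPL again and announce it."""
        self.wtr_running = False
        self.rpl_blocked = True
        self.fdb.clear()
        self.state = "Idle"  # assumption: back to the normal state here
        return "R-APS(NR,RB)"


owner = RplOwner()
owner.on_raps_sf()
state_during_failure = owner.state
owner.on_raps_nr()
wtr_started = owner.wtr_running
announcement = owner.on_wtr_expire()
print(state_during_failure, wtr_started, announcement, owner.state)
```

Unlike the EAPS Master, the RPL Owner here never detects restoration itself; it reacts only to R-APS messages from the nodes adjacent to the link.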
References • S. Shah, M. Yip, «RFC3619: Extreme Networks’ Ethernet Automatic Protection Switching (EAPS), Version 1», Network Working Group, October 2003. • A. Lim, S. Blake, S. Shah, «Extreme Networks’ Ethernet Protection Switching (EAPS), Version 1.3», Internet-Draft, July 2011. • Extreme Networks Whitepaper «Ethernet Automatic Protection Switching (EAPS)», Extreme Networks, Inc., 2006. • J. D. Ryoo, H. Long, Y. Yang, M. Holness, Z. Ahmad, J. K. Rhee, «Ethernet Ring Protection for Carrier Ethernet Networks», IEEE Communications Magazine, September 2008.