Protection and Reliability BXR-48000 Switch Router
Objectives • Identify redundancy and protection for: • System Control Processor (SCP) card • Fabric module • Port card • Timing Control Module (TCM) • Power Conditioning Module (PCM) • Cooling • Environmental Resource Card (ERC)
BXR-48000 Recovery • Redundancy features apply to the following: • Recovery from hardware or software failure • Recovery of SCPs, fabrics, and port cards from various conditions: • Cold start reboot • Warm start reboot • SCP switchover/failover
BXR-48000 Protection • The BXR-48000 supports redundant: • Power Conditioning Modules (PCMs) • Cooling (fan trays) • Timing Control Modules (TCMs) • Control processors • Fabrics • Port cards (via APS/MSP) • Internal signaling (100BASE-T full mesh) • Control and management links between all system communications channels • 99.999 percent available • Unavailable no more than about 5.26 minutes per year
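The yearly downtime budget follows directly from the availability percentage. The short sketch below shows the arithmetic, assuming a 365-day year; the function name `max_downtime_minutes` is illustrative and not part of any BXR-48000 tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


downtime = max_downtime_minutes(99.999)
print(f"{downtime:.2f} minutes per year")  # prints "5.26 minutes per year"
```

Each additional "nine" divides the budget by ten, so 99.99 percent would allow roughly 52.6 minutes per year.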
SCP Redundancy • SCP can work standalone or in 1+1 redundancy • Redundant SCPs can be configured to synchronize with each other • Heartbeat acts as “Keep Alive” between SCPs • Communicates with MCPs, TCMs, etc. • Lower SCP is the default working SCP [Figure labels: CDB and Flash on each SCP; MCPs]
Fabric-level Data Protection • The BXR-48000 uses an N+1 redundancy system for the switch router fabric boards • Results in hitless operation during fabric failure • Requires a fabric board in the Fabric Protection (FP) slot • Uses the same hardware as all the other fabric boards • No special sparing needed • Uses data striping with parity
Fabric Protection: Data Striping • Used industry-wide in RAID applications • Only affects information as it travels through the system • Striping spreads data across all fabrics • Serial data stream converted to parallel “stripes” [Figure: ingress port card, Fabric 1, Fabric 2, Fabric 3, Fabric p, egress port card]
Fabric Protection: Parity • Can be done on data units of any size • Based on a logical exclusive OR (XOR) of the data • Parity value effectively indicates an even or odd number of logical 1 values in the transmission • Comparing the received parity value with a calculated value detects flipped bits [Figure: sample data with its parity bit, annotated “Odd number of logical 1’s”, and an XOR truth table]
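The parity check described on this slide can be sketched in a few lines. This is a minimal illustration, not the BXR-48000 implementation: the function names and the sample bit pattern are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """XOR all bits together: 1 for an odd number of 1s, 0 for an even number."""
    return reduce(xor, bits, 0)


def is_consistent(bits, received_parity):
    """Compare the parity calculated from the data with the parity received."""
    return parity_bit(bits) == received_parity


sent = [1, 0, 1, 1, 0, 1, 1]   # hypothetical sample data with five 1s
parity = parity_bit(sent)      # odd number of 1s, so parity is 1

corrupted = sent.copy()
corrupted[2] ^= 1              # flip one bit in transit

print(is_consistent(sent, parity))       # prints True
print(is_consistent(corrupted, parity))  # prints False
```

A single parity bit flags any odd number of flipped bits; an even number of flips cancels out and goes unnoticed.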
Striping and Parity Example • Assume a redundant 120G configuration (3+1 fabrics) • The parity bit (XOR) traverses the protection fabric • Receiver sees the mismatch between received data and the parity indication • Receiver reconstructs the pattern according to the parity value [Figure: the pattern 101 sent from the ingress port card across Fabric 1, Fabric 2, and Fabric 3, with the parity bit on Fabric p, to the egress port card]
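A minimal sketch of the 3+1 example, under stated assumptions: one bit is striped per data fabric, the protection fabric carries the XOR of those bits, and the receiver already knows which fabric failed (parity alone reveals a mismatch but not its location). The function names are hypothetical.

```python
from functools import reduce
from operator import xor


def stripe_with_parity(data_bits):
    """Return one bit per data fabric, followed by the XOR parity bit for Fabric p."""
    return list(data_bits) + [reduce(xor, data_bits, 0)]


def reconstruct(stripes, failed_index):
    """Rebuild the bit carried by a known-failed fabric from the surviving stripes."""
    survivors = [bit for i, bit in enumerate(stripes) if i != failed_index]
    return reduce(xor, survivors, 0)


stripes = stripe_with_parity([1, 0, 1])   # [1, 0, 1, 0]: three data bits plus parity
recovered = reconstruct(stripes, failed_index=1)

print(stripes)    # prints [1, 0, 1, 0]
print(recovered)  # prints 0, the bit that the failed fabric carried
```

Because XOR is its own inverse, the same routine recovers any single stripe, including the parity stripe itself.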
Port Card Redundancy • Independent of fabric redundancy • Unidirectional and bidirectional APS (Bellcore)/MSP (ITU) • Initial working/protect determined/configured by operator • Both port cards potentially working as APS (port fails, not the card) • Inter-card APS/MSP • APS state machines run on MCPs • Configurable to be revertive • Adjacent ‘odd/even’ slots (1A, 1C, etc. and 1B, 1D, etc.) • Direct link (100 Mbps) between adjacent port cards [Figure labels: MCP and OC-48c/STM-16 on each port card]
TCM Redundancy • TCMs operate in standalone or 1+1 mode • TCMs are hot-swappable • Clock source redundancy • Line Derived - primary and secondary • BITS - one input per TCM • Internal - Stratum 3E oscillator on each TCM • System support • Failover from one TCM to the standby is hitless [Figure labels: TCM 0, TCM 1, SCP X, SCP Y, switch fabrics, port cards]
PCM Redundancy • Operates on 2 PCMs; 4 are required for redundancy • Redundant A0/A1 and B0/B1 feeds • Feed based on highest voltage • Load sharing only if A = B • Can operate on single “A” and “B” feeds indefinitely • Supports two split or four separate feeds • Typical load requires ~81 A per feed [Figure: chassis layout, top to bottom: upper fan tray; top power distribution panel; top card cage (12 port cards, 3 fabrics, 1 TCM-IM, 1 SCP); TCM-IM, TCM X, TCM Y, AIM; -48 V bus bars; bottom power distribution panel; bottom card cage (12 port cards, 3+1 fabrics, 1 SCP); 48 VDC power entry panel with feeds A0, B0, A1, B1; lower fan tray]
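The feed-selection rule on this slide (highest voltage wins, load sharing only when A equals B) can be illustrated as follows. This is a sketch only: it assumes voltages are compared by magnitude, and the function name and the `tolerance` parameter are hypothetical rather than documented PCM behavior.

```python
def select_feeds(volts_a: float, volts_b: float, tolerance: float = 0.0):
    """Return the feeds carrying load: the higher-voltage feed, or both when equal."""
    if abs(volts_a - volts_b) <= tolerance:
        return ["A", "B"]          # equal voltages: load is shared
    return ["A"] if volts_a > volts_b else ["B"]


print(select_feeds(48.0, 48.0))  # prints ['A', 'B']
print(select_feeds(50.0, 48.0))  # prints ['A']
print(select_feeds(47.5, 48.0))  # prints ['B']
```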
Cooling Redundancy • Two fan trays, each with four fans • Both fan trays are required for proper cooling • Fan trays are hot-swappable and interchangeable • System operates indefinitely with any one fan failed • Remaining fans speed up upon failure of one fan
ERC Bus Architecture [Figure: ERC host control on SCP X and SCP Y; ERCs on fabric cards, port cards, the backplane (backplane ID), the upper and lower fan trays, AIM, TCM 0, TCM 1, TIM, and PCMs A0, B0, A1, and B1; the legend distinguishes backplane connections from serial interfaces]
Summary • Identified redundancy and protection for: • System Control Processor (SCP) card • Fabric module • Port card • Timing Control Module (TCM) • Power Conditioning Module (PCM) • Cooling • Environmental Resource Card (ERC)