Matthieu Clouqueur, Wayne D. Grover (presenter) clouqueur@trlabs, grover@trlabs

Mesh Restorable Networks with Complete Dual Failure Restorability and with Selectively Enhanced Dual-Failure Restorability Properties
Matthieu Clouqueur, Wayne D. Grover (presenter)
clouqueur@trlabs.ca, grover@trlabs.ca
TRLabs and University of Alberta, Edmonton, AB, Canada
Web site for other related papers: www.ee.ualberta.ca/~grover
OptiComm 2002, Boston, MA, USA, 30 July 2002

Outline
• Background on Dual Failure Restorability
• Ideas and Motivations
• Research Methods
• Experimental Results
• Conclusions and Impacts

Dual Failures - Really?
Not as “academic” a consideration as we first thought:
• Sheer fiber route miles
  • Hermes RailTel estimate of one cable cut every 4 days
• Span maintenance and upgrade effects
  • can have network effects much like those of a first failure
• Span SRLG and nodal bypass effects
  • cause logical dual failures
• Availability of paths through single-failure restorable networks:
  • unavailability doesn’t just vanish; it becomes limited by dual failures

Background re: Dual Failure Restorability
• Prior work on dual failure restorability analysis of span-restorable mesh networks (refs: DRCN 01, JSAC 02):
  • concept of “first failure protection (pre-planned reaction), second failure restoration (adaptive reaction)”
  • method for dual failure restorability analysis
• Some key findings:
  • 1) Span-restorable (or “link-protected”) mesh networks designed for R1 = 100% give very high average R2 values as a side effect!
  • 2) Service path availability has far more to do with restorability to dual failures than with the speed of response to a single failure
  • 3) Explicit design for R2 = 100% is very capacity-expensive

Background: Determination of “R2”
• Case 1: Two failures but no spatial interactions -> no outage
• Case 2: Two failures and spatial interactions (competition for spare capacity) -> may be outage
• Case 3: Two failures with second failure hitting the first restoration pathset -> may be outage
• Case 4: Two failures isolating a degree-2 node -> certain outage
-> Use computer emulation of all dual failure pairs to analyze R2
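The emulation over all dual failure pairs can be sketched as follows. This is a minimal illustration, not the authors' emulator: the function names, the toy five-span topology, and the `toy_restore` callback (which stands in for a real restoration emulation) are all assumptions made for this example. It enumerates every span pair, flags the Case 4 pairs that isolate a degree-2 node, and computes a network-wide R2 as restored channels over failed working channels.

```python
from itertools import combinations


def isolating_pairs(spans):
    """Return the span pairs whose joint failure isolates a degree-2 node (Case 4)."""
    incident = {}
    for name, (a, b) in spans.items():
        incident.setdefault(a, set()).add(name)
        incident.setdefault(b, set()).add(name)
    return {frozenset(names) for names in incident.values() if len(names) == 2}


def network_r2(working, restore):
    """Channel-weighted R2 over all dual failure pairs.

    working[s] is the number of working channels on span s.
    restore(i, j) is a hypothetical emulator callback returning how many of
    the failed channels are restored when spans i and j are both down.
    """
    failed_total = 0
    restored_total = 0
    for i, j in combinations(sorted(working), 2):
        failed = working[i] + working[j]
        restored_total += min(restore(i, j), failed)
        failed_total += failed
    return restored_total / failed_total if failed_total else 1.0


# Toy topology (assumed): a 4-node ring A-B-C-D with one chord A-C.
spans = {
    "AB": ("A", "B"),
    "BC": ("B", "C"),
    "CD": ("C", "D"),
    "DA": ("D", "A"),
    "AC": ("A", "C"),
}
working = {name: 2 for name in spans}
certain_outage = isolating_pairs(spans)


def toy_restore(i, j):
    # Placeholder emulator: total loss for Case 4 pairs, full restoration otherwise.
    return 0 if frozenset((i, j)) in certain_outage else working[i] + working[j]


r2 = network_r2(working, toy_restore)
```

In this toy case two of the ten span pairs isolate a degree-2 node (B or D), so the placeholder emulator yields R2 = 0.8. A real study would replace `toy_restore` with an emulation of Cases 1 to 3 against the actual spare capacity.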

Prior Finding of High Dual-failure Restorability in Networks Designed for Single Failure Protection / Restoration
[Figure: R1 (single failure restorability) = 100%; R2 (dual failure restorability): R2(i j) on individual scenarios between 50% and 99%; network average R2 of 70% to 90%]
R2 results for 5 test networks:
Environment               Non-modular     Modular
Static behavior           0.53 to 0.75    0.69 to 0.83
First-failure adaptive    0.55 to 0.79    0.87 to 0.91
Fully-adaptive            0.55 to 0.80    0.91 to 0.99

Research Questions
• Is it possible to enhance the dual span-failure restorability of an R1=1 network design:
  • purely by a redistribution of the spare capacity?
  • to maximize R2 subject to a given budget limit?
• Can we structure or allocate the finite R2 levels that are obtained to support a super-high availability service class?
  • “Platinum service class” = assured dual-failure restorability
[Figure: existing QoP paradigm of gold, silver and bronze (economy) service classes, plus a new dual-failure restorable service class]

Methods to Investigate these Questions
Three Design Models:
• Dual Failure Minimum Capacity (DFMC): Finds the minimum capacity assignment for full restorability to dual failures (R2=100%)
• Dual Failure Max Restorability (DFMR): Finds the spare capacity placement that maximizes the average restorability to dual failures for a given spare capacity budget
• Multi-service Restorability Capacity Placement (MRCP): Finds the minimum capacity assignment and routing that serves demands of multiple service classes including R0 (best-effort), R1 and R2-assured restorability service classes

Complete Dual Failure Restorability at Minimum Capacity (DFMC)
• Minimize:
  • Total Cost of Capacity
• Subject to:
  • (1) All demands are routed
  • (2) Working capacity supports (1)
  • (3) Restoration flows for 100% span restoration in the presence of each other span failure
  • (4) Spare capacity to support (3)
• Note: Spare capacity reuse / sharing across non-simultaneous failure scenarios is implicit in all cases
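One way to write constraints (1) to (4) as an arc-path integer program is sketched below. The notation is assumed for this illustration and is not taken from the slides: S is the set of spans, c_k the unit capacity cost on span k, w_k and s_k the working and spare capacity, d^r the size of demand r, g^{r,q} the working flow of demand r on its eligible route q, \zeta_i^{r,q} = 1 if that route crosses span i, P_i the eligible restoration routes for span i, f_{i,j}^p the restoration flow for span i on route p when span j has also failed, and \delta_{i,k}^p = 1 if route p for span i crosses span k. The sketch also assumes a topology in which no span pair isolates a node.

```latex
\begin{align}
\min\ & \sum_{k \in S} c_k \,(w_k + s_k) \\
\text{s.t.}\quad
& \sum_{q} g^{r,q} = d^{r} && \forall r \tag{1} \\
& w_i = \sum_{r} \sum_{q} \zeta_i^{r,q}\, g^{r,q} && \forall i \in S \tag{2} \\
& \sum_{p \in P_i} f_{i,j}^{p} = w_i,\qquad
  f_{i,j}^{p} = 0 \ \text{if}\ \delta_{i,j}^{p} = 1 && \forall i \neq j \tag{3} \\
& s_k \ \ge\ \sum_{p \in P_i} \delta_{i,k}^{p}\, f_{i,j}^{p}
  + \sum_{p \in P_j} \delta_{j,k}^{p}\, f_{j,i}^{p}
  && \forall i, j, k \ \text{distinct} \tag{4}
\end{align}
```

Because (4) is imposed per scenario (i, j) on a single s_k, the same spare channels are reused across non-simultaneous scenarios, which is the sharing mentioned in the note above.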

Dual Failure Maximum Restorability at Given Capacity (DFMR)
• Minimize:
  • Total No. of Un-restorable Working Channels over all dual failure scenarios
• Subject to:
  • (1) All demands are routed
  • (2) Working capacity supports (1)
  • (3) Spare capacity less than an allowed Budget
  • (4) Restoration flows as feasible under (3) for all span failures in the presence of each other span failure
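The budget-limited variant can be sketched in the same arc-path style. Again the symbols are assumptions for illustration only: N_{i,j} is the number of working channels of span i left un-restored when span j has also failed, B is the spare capacity budget, w_i and s_k are working and spare capacities, c_k is the unit cost on span k, f_{i,j}^p is the restoration flow for span i on eligible route p in P_i under the co-failure of span j, and \delta_{i,k}^p = 1 if that route crosses span k. Demand routing and working capacity, constraints (1) and (2), are taken as given here.

```latex
\begin{align}
\min\ & \sum_{i \neq j} N_{i,j} \\
\text{s.t.}\quad
& \sum_{k \in S} c_k\, s_k \ \le\ B \tag{3} \\
& \sum_{p \in P_i} f_{i,j}^{p} + N_{i,j} = w_i,\qquad
  f_{i,j}^{p} = 0 \ \text{if}\ \delta_{i,j}^{p} = 1 && \forall i \neq j \tag{4a} \\
& \sum_{p \in P_i} \delta_{i,k}^{p}\, f_{i,j}^{p}
  + \sum_{p \in P_j} \delta_{j,k}^{p}\, f_{j,i}^{p} \ \le\ s_k
  && \forall i, j, k \ \text{distinct} \tag{4b}
\end{align}
```

Setting B to the spare capacity of an existing R1 design corresponds to the pure-redistribution question; raising B traces out the achievable R2 versus cost.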

Multi-service Restorability Design at Minimum Capacity (MRCP)
• Define:
  • “R1”, “R2” (and also “R0”)-restorable service demand matrices
• Minimize:
  • Total Capacity
• Subject to:
  • (1) All demands are routed
  • (2) Working capacity supports (1)
  • (3) Restoration flows for all dual span failure scenarios for “R2” demands
  • (4) Restoration flows for all single span failures for all “R1” demands
  • (5) Spare capacity to support (3) and (4)

Results with DFMC (Cost for R2=1 by design)
Large capacity increases are required to provide strictly 100% R2
• BENCHMARK: Cost of designing for full dual-failure restorability
• Interpretation: Although average dual-failure restorability levels are quite high with an R1 design, the capacity cost of making the network restorable to all dual failures is extremely high (~ 3 x in spare capacity relative to the R1=1 design)

Results with DFMR (Achievable R2 vs. Cost)
• Trade-off between capacity and best achievable dual-failure restorability: high capacity requirement as R2 = 1 is approached (confirms DFMC results)
[Figure labels: “Pure redistribution of capacity”, “Budget amount”]

Results with MRCP (Multi-Restorability Service Class Design)
• Results of MRCP confirm that R2 restorability can be guaranteed end to end for selected service paths: up to about 20% of demands can be guaranteed R2 = 1 restorability for a small or negligible capacity increase

Concluding Insights and Comments
• Designing for 100% dual-failure restorability is feasible but very expensive
• DFMR design method can maximize the network average dual failure restorability (R2) given any total budget for capacity
• MRCP design can structure and enhance the R2 ability of an R1-designed network onto specific priority paths:
  • 20 to 40% of all demands per O-D pair could be in this “platinum” service class at very little or no extra capacity cost
• And note! Such R2-restorable service paths will have availability that exceeds that of 1+1 APS...

A Key Insight: why priority services in a “mesh-restorable” network will get better than 1+1 APS availability
[Figure, two panels:
 1+1 APS: Normal; First failure -> protection; Second failure -> outage, R2(ij) = 0
 “1F-P 2F-R” mesh (for a priority path): Normal; First failure -> protection; Second failure -> restoration! (adaptive), no outage yet, R2(ij) > 0]
“Takes a licking and keeps on ticking” :-)
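A rough first-order calculation makes the comparison concrete. This is a simplified sketch under stated assumptions, not the availability model used in the referenced work: every span is given the same independent unavailability `u_span`, a 1+1 APS pair fails whenever one span of the working path and one span of the protection path are down together, and the mesh priority path sees the same dual-failure exposure but survives a fraction `r2` of those scenarios. All function names and numbers are illustrative.

```python
def aps_unavailability(u_span, n_working, n_protect):
    """First-order unavailability of a 1+1 APS pair: an outage needs a
    failure on the working path and a failure on the protection path."""
    return (n_working * u_span) * (n_protect * u_span)


def mesh_unavailability(u_span, n_working, n_exposed, r2):
    """Same dual-failure exposure as above, but a fraction r2 of those
    scenarios is survived through adaptive second-failure restoration."""
    return (n_working * u_span) * (n_exposed * u_span) * (1.0 - r2)


u = 1e-4  # assumed per-span unavailability, for illustration only
aps = aps_unavailability(u, n_working=3, n_protect=4)
mesh = mesh_unavailability(u, n_working=3, n_exposed=4, r2=0.9)
platinum = mesh_unavailability(u, n_working=3, n_exposed=4, r2=1.0)
```

Under these assumptions any R2(ij) > 0 already puts the mesh path ahead of 1+1 APS, and an assured R2 = 1 path removes the dual-failure term entirely, leaving only higher-order failure combinations.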
