Optimal redundancy allocation for information technology disaster recovery in the network economy. Benjamin B.M. Shao, IEEE Transactions on Dependable and Secure Computing, Vol. 2, No. 3, July-September 2005. Presented by: Derek KD Jiang 江坤道
Agenda • Introduction • Redundancy for IT disaster recovery • Redundancy allocation model • Solution procedure • Examples • Conclusion
Introduction • Modern organizations have become increasingly reliant on IT to facilitate business operations. • How to strengthen IT capability so that a company can prevent disasters or quickly recover from them has become a serious concern.
Introduction • Perform an impact analysis to: • Identify the disasters likely to occur in the environment. • Evaluate the degree to which IT functions are vulnerable to those disasters. • Take the necessary measures to protect those IT functions according to their importance. • This paper incorporates redundancy into critical IT functions and aims to maximize their survivability against potential disasters.
Introduction • Adopting a cluster-centric approach, this paper concentrates on managing resources around independent clusters of IT functions, where each cluster is assigned its own dedicated solutions. • An optimization model is proposed, taking into account the significance of IT functions, the cost of IT solutions, and the availability of resources subject to a budget limitation.
Agenda • Introduction • Redundancy for IT disaster recovery • Redundancy allocation model • Solution procedure • Examples • Conclusion
Redundancy for IT disaster recovery • Redundancy is a design principle of having one or more backup systems in case the main system fails. • Using redundancy to prepare for disasters offers two potential advantages: • Proactive prevention • Reactive recovery
Redundancy for IT disaster recovery • The objective is to select among competing alternatives for redundancy level and reap the best returns from a limited budget. • A quantitative model can provide the guidelines for allocating optimal redundancy levels to critical IT functions needing to be protected.
Agenda • Introduction • Redundancy for IT disaster recovery • Redundancy allocation model • Solution procedure • Examples • Conclusion
Redundancy allocation model • Suppose an organization is planning redundancy measures under a limited budget. • Several possible disasters have been identified that could affect IT functions and cause business discontinuity. • How should redundancy be allocated to IT functions so that survivability is maximized while the cost remains under budget?
Redundancy allocation model • The redundancy allocation problem (RAP) is formulated below
Redundancy allocation model • Survivability Smid is defined here as the likelihood that IT asset i withstands disaster d and keeps IT function m operational. IT function m fails against disaster d only when all of its selected solutions fail at the same time. In other words, as long as one of the selected solutions survives the disaster, the IT function remains in operation.
Redundancy allocation model • The selection constraint ensures that at least one solution is selected and allocated to each IT function. Notably, an IT function without redundancy (a single selected solution) is allowed. • The budget constraint indicates that the total cost cannot exceed the budget limit B.
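A sketch of the formulation, reconstructed from the definitions on these slides and checked against the numbers in the worked example rather than copied from the paper. The binary variable x_{mi} (1 if solution i is selected for IT function m), the weights w_m, the disaster probabilities p_d, the costs C_{mi}, and the counts n_m follow the notation of the example slides; D, the number of disaster scenarios, is an assumed symbol.

```latex
\begin{aligned}
\max\quad & S = \sum_{m=1}^{M} w_m \sum_{d=1}^{D} p_d
            \Bigl[\, 1 - \prod_{i=1}^{n_m} \bigl(1 - S_{mid}\bigr)^{x_{mi}} \Bigr] \\
\text{s.t.}\quad & \sum_{i=1}^{n_m} x_{mi} \ge 1, \qquad m = 1, \dots, M, \\
& \sum_{m=1}^{M} \sum_{i=1}^{n_m} C_{mi}\, x_{mi} \le B, \\
& x_{mi} \in \{0, 1\}.
\end{aligned}
```

The product term is the probability that every selected solution for function m fails under disaster d, which matches the statement that a function fails only when all of its selected solutions fail together.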
Agenda • Introduction • Redundancy for IT disaster recovery • Redundancy allocation model • Solution procedure • Examples • Conclusion
Solution procedure • The proposed model is a 0-1 integer programming problem with a nonlinear objective function. • Due to the nonlinearity of the objective function, LR cannot be employed to tackle this problem. • A partial enumeration procedure based on probabilistic dynamic programming is presented.
Solution procedure • The quantity to minimize is the sum of the failure probabilities of each IT function due to any disaster.
Solution procedure • We define the state T as the available budget and the stage m as an IT function. • Let Fm(T) be the failure rate of the system composed of IT functions m, m+1, …, M. • The recursive formula applies for m < M.
Solution procedure • For stage (IT function) m, the state (budget) T cannot exceed the total available budget B minus the minimum costs to be allocated for stages 1, …, m-1. • T must be at least the cost of the least expensive solution in the current stage, which ensures at least one solution for IT function m. For T outside this range, Fm(T) is defined as 1 so that it will not be chosen.
Solution procedure • Fm(T) of (4) deals with the risk of disaster occurrence and involves calculating the expected failure rate of IT function m given the remaining budget T. • The recursion starts at the initial stage m = M.
Solution procedure • The optimal objective function value F* is obtained as F1(B), representing the minimum overall failure rate of the whole system composed of all M IT functions with a budget of B. • The original maximum overall survivability S* of RAP is then equal to 1 - F1(B).
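A hedged reconstruction of the recursion described on these slides, written with v_{mid} = 1 - S_{mid} as in the example. It is inferred from the stage and state definitions and agrees with the example's arithmetic; the paper's own equation numbering and exact notation may differ.

```latex
\begin{aligned}
f_m(x_m) &= w_m \sum_{d} p_d \prod_{i=1}^{n_m} v_{mid}^{\,x_{mi}},
  \qquad v_{mid} = 1 - S_{mid}, \\[4pt]
F_M(T) &= \min_{x_M}\Bigl\{ f_M(x_M) \;:\; \textstyle\sum_i x_{Mi} \ge 1,\;
  \sum_i C_{Mi}\, x_{Mi} \le T \Bigr\}, \\[4pt]
F_m(T) &= \min_{x_m}\Bigl\{ f_m(x_m) + F_{m+1}\bigl(T - \textstyle\sum_i C_{mi}\, x_{mi}\bigr)
  \;:\; \textstyle\sum_i x_{mi} \ge 1 \Bigr\}, \qquad m < M, \\[4pt]
F^{*} &= F_1(B), \qquad S^{*} = 1 - F_1(B).
\end{aligned}
```

The last identity relies on the weights w_m and the probabilities p_d each summing to 1, as they do in the example.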
Agenda • Introduction • Redundancy for IT disaster recovery • Redundancy allocation model • Solution procedure • Examples • Conclusion
Example • Two LANs (M=2) with weights w1=0.3 and w2=0.7, respectively. • A flooding disaster occurs with a likelihood of 0.05 (i.e., p1=0.05, and p2=0.95 for no disaster). • The organization considers incorporating redundant bridges into LAN1 and redundant switches into LAN2 with a budget B=14.
Example • For LAN1 • Four types of bridges are available (n1=4), with C11=8, C12=2, C13=4, and C14=6. • The survival rates are S111=0.1, S121=0.09, S131=0.15, and S141=0.21 (i.e., v111=0.9, v121=0.91, v131=0.85, v141=0.79). • Their availabilities when no disaster occurs are S112=0.9999, S122=0.9993, S132=0.9997, and S142=0.9995 (i.e., v112=0.0001, v122=0.0007, v132=0.0004, v142=0.0005).
Example • For LAN2 • Three types of switches are available (n2=3), with C21=4, C22=6, and C23=5. • The survival rates are S211=0.06, S221=0.1, S231=0.2 (i.e., v211=0.94, v221=0.9, v231=0.8). • Their availabilities when no disaster occurs are S212=0.9994, S222=0.9990, S232=0.9996 (i.e., v212=0.0006, v222=0.0010, v232=0.0004).
Example • Start with stage m=2. • Since the least expensive switch for LAN2 costs C21=4 and the least expensive bridge for LAN1 costs C12=2, the valid range for T is 4 ≤ T ≤ 12. • Equation (6) then calculates F2(T) for T=4,…, 12. Take F2(6) for example: the feasible selections are (X21, X22, X23)=(0, 0, 1), (0, 1, 0), and (1, 0, 0). The minimum, F2(6) = 0.02827, is associated with (0, 0, 1).
Example • Next, we proceed to find the optimal solution F1(14) in the final stage m=1. The minimum F1(14) is associated with (X11, X12, X13, X14) = (0, 1, 0, 1), with F* = 0.03905 using F2(6) = 0.02827. Namely, the maximum survivability S* against flooding equals 1 - F* = 1 - 0.03905 = 0.96095.
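The F2(6) comparison can be checked with a few lines of Python. This is a minimal sketch, not the paper's code: the helper name `weighted_failure` is hypothetical, and it assumes the weighted failure rate of one IT function is w_m times the sum over scenarios of p_d times the product of the failure rates v of the selected solutions.

```python
from math import prod

def weighted_failure(weight, p, v, x):
    """Weighted failure rate of one IT function for a 0/1 selection x.

    p[d]    : probability of scenario d (flood, no disaster)
    v[i][d] : failure rate of solution i under scenario d
    x[i]    : 1 if solution i is selected, else 0
    """
    return weight * sum(
        p_d * prod(v_i[d] ** x_i for v_i, x_i in zip(v, x))
        for d, p_d in enumerate(p)
    )

p = [0.05, 0.95]                                    # p1 (flood), p2 (no disaster)
v_lan2 = [(0.94, 0.0006), (0.90, 0.0010), (0.80, 0.0004)]  # v2i1, v2i2 per switch

# With T = 6 only single switches are affordable (costs 4, 6, 5).
candidates = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
f2_at_6 = {x: weighted_failure(0.7, p, v_lan2, x) for x in candidates}
best = min(f2_at_6, key=f2_at_6.get)

print(best, round(f2_at_6[best], 5))
```

Under these assumptions the third switch gives the smallest value, in line with the 0.02827 quoted on the slide.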
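The whole two-stage example can be reproduced with a small memoized recursion. The sketch below is an illustration under stated assumptions, not the author's implementation: it assumes integer costs, uses the failure rates v exactly as listed on the slides, enumerates every 0/1 selection per stage, and uses 0-based stage indices. The names `FUNCTIONS`, `stage_failure`, `min_cost_from`, and `F` are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import prod

P = (0.05, 0.95)   # scenario probabilities: flood, no disaster
B = 14             # total budget
FUNCTIONS = (      # stage 0 = LAN1 (bridges), stage 1 = LAN2 (switches)
    {"w": 0.3, "cost": (8, 2, 4, 6),
     "v": ((0.90, 0.0001), (0.91, 0.0007), (0.85, 0.0004), (0.79, 0.0005))},
    {"w": 0.7, "cost": (4, 6, 5),
     "v": ((0.94, 0.0006), (0.90, 0.0010), (0.80, 0.0004))},
)

def stage_failure(fn, x):
    """Weighted failure rate of one IT function for a 0/1 selection x."""
    return fn["w"] * sum(
        p_d * prod(v_i[d] ** x_i for v_i, x_i in zip(fn["v"], x))
        for d, p_d in enumerate(P)
    )

def min_cost_from(m):
    """Cheapest way to give every function from stage m onward one solution."""
    return sum(min(fn["cost"]) for fn in FUNCTIONS[m:])

@lru_cache(maxsize=None)
def F(m, T):
    """Return (minimum failure rate, selections) for stages m.. with budget T."""
    if T < min_cost_from(m):
        return 1.0, ()            # out of range: defined as 1, never chosen
    fn = FUNCTIONS[m]
    last = len(FUNCTIONS) - 1
    best = (1.0, ())
    for x in product((0, 1), repeat=len(fn["cost"])):
        cost = sum(c * x_i for c, x_i in zip(fn["cost"], x))
        if sum(x) == 0 or cost > T:
            continue              # need at least one solution, within budget
        if m == last:
            tail_value, tail_choice = 0.0, ()
        else:
            tail_value, tail_choice = F(m + 1, T - cost)
            if not tail_choice:
                continue          # remaining budget cannot cover later stages
        total = stage_failure(fn, x) + tail_value
        if total < best[0]:
            best = (total, (x,) + tail_choice)
    return best

f_star, choice = F(0, B)
print(choice, round(f_star, 5), round(1 - f_star, 5))
```

With the slide data this selects bridges 2 and 4 for LAN1 and switch 3 for LAN2, matching the F* = 0.03905 and S* = 0.96095 reported above. Full enumeration per stage is only practical for small n_m; the slides describe the paper's method as a partial enumeration, which this sketch does not attempt to replicate.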
Agenda • Introduction • Redundancy for IT disaster recovery • Redundancy allocation model • Solution procedure • Examples • Conclusion
Conclusion • Contributions • It presents one of the earliest quantitative studies to allocate redundancy for recovery planning. • An exact solution method based on probabilistic dynamic programming is presented to help obtain the optimal solution for redundancy allocation. • Through sensitivity analysis, the model can further help IT managers make better decisions.
Conclusion • IT plays an extremely important role in modern business operations; nevertheless, it has potential vulnerabilities against disasters. • The RAP model proposed in this paper can fulfill the need for a structured decision analysis of recovery planning.
Conclusion • For future research, we can further categorize assets into hardware, software, and other types to examine the impacts of each asset type on the redundancy allocation decisions. • Specific assumptions of dependent IT functions or shared solutions can be made to address a different set of IT disaster recovery problems.