Dependability & Maintainability Theory and Methods Part 2: Repairable systems: Availability

Andrea Bobbio
Dipartimento di Informatica, Università del Piemonte Orientale “A. Avogadro”, 15100 Alessandria (Italy)
bobbio@unipmn.it - http://www.mfn.unipmn.it/~bobbio/IFOA/
IFOA, Reggio Emilia, June 17-18, 2003

Repairable systems
[Figure: timeline along the time axis t alternating between the UP and DOWN levels, with up intervals X_1, X_2, X_3 and down intervals Y_1, Y_2.]
• $X_1, X_2, \ldots, X_n$: successive UP times
• $Y_1, Y_2, \ldots, Y_n$: successive DOWN times

Repairable systems
• The usual hypothesis in modeling repairable systems is that:
  • The successive UP times $X_1, X_2, \ldots, X_n$ are i.i.d. random variables, i.e. samples from a common cdf $F(t)$.
  • The successive DOWN times $Y_1, Y_2, \ldots, Y_n$ are i.i.d. random variables, i.e. samples from a common cdf $G(t)$.

Repairable systems
[Figure: UP/DOWN timeline along the time axis t, with up intervals X_1, X_2, X_3 and down intervals Y_1, Y_2.]
• The dynamic behaviour of a repairable system is characterized by:
  • the r.v. $X$ of the successive up times
  • the r.v. $Y$ of the successive down times
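The alternating UP/DOWN behaviour can be illustrated with a small simulation. The sketch below is only an illustration: the function name `simulate_path`, the exponential distributions, and the mean values (100 hours up, 5 hours down) are assumptions chosen for the example, not values from the slides.

```python
import random

def simulate_path(sample_up, sample_down, horizon):
    """Return a list of (state, start, end) intervals covering [0, horizon].

    The system is assumed to start in the UP state at time 0.
    """
    t, state, path = 0.0, "UP", []
    while t < horizon:
        # Draw the next up time X_i or down time Y_i from its distribution.
        duration = sample_up() if state == "UP" else sample_down()
        end = min(t + duration, horizon)  # truncate the last interval
        path.append((state, t, end))
        t = end
        state = "DOWN" if state == "UP" else "UP"
    return path

rng = random.Random(42)
HORIZON = 1000.0  # hypothetical observation window, hours
path = simulate_path(lambda: rng.expovariate(1 / 100.0),  # mean up time 100 h
                     lambda: rng.expovariate(1 / 5.0),    # mean down time 5 h
                     HORIZON)
up_time = sum(end - start for state, start, end in path if state == "UP")
print(f"{len(path)} intervals, fraction of time UP = {up_time / HORIZON:.3f}")
```

Any other sampler can be passed for the up and down times, since the function only requires i.i.d. draws from the two distributions.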

Maintainability
Let $Y$ be the r.v. of the successive down times:
• $G(t) = \Pr\{Y \le t\}$ (maintainability)
• $g(t) = \dfrac{dG(t)}{dt}$ (density)
• $h_g(t) = \dfrac{g(t)}{1 - G(t)}$ (repair rate)
• $\mathrm{MTTR} = \int_0^\infty t\, g(t)\, dt$ (Mean Time To Repair)
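As a worked illustration of these four quantities, the sketch below assumes that the down time follows an Erlang-2 distribution with a hypothetical stage rate of 0.5 per hour. The distribution and the rate are assumptions made for the example. For this choice the exact MTTR is 2 / RATE = 4 hours, which the numerical integral should reproduce.

```python
import math

RATE = 0.5  # hypothetical Erlang-2 stage rate, 1/hour

def G(t):
    """Maintainability: probability that the repair is completed within t."""
    return 1.0 - math.exp(-RATE * t) * (1.0 + RATE * t)

def g(t):
    """Density of the down time, dG/dt."""
    return RATE ** 2 * t * math.exp(-RATE * t)

def repair_rate(t):
    """Repair rate h_g(t) = g(t) / (1 - G(t))."""
    return g(t) / (1.0 - G(t))

def mttr(upper=200.0, steps=200_000):
    """Trapezoidal approximation of the integral of t * g(t) over [0, upper]."""
    dt = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the end points
        total += weight * t * g(t) * dt
    return total

print(f"G(4)   = {G(4.0):.4f}")
print(f"h_g(2) = {repair_rate(2.0):.4f} per hour")
print(f"MTTR   = {mttr():.4f} hours")
```

The integration is truncated at 200 hours, where the remaining tail of this density is negligible. A heavier-tailed repair distribution would need a larger upper limit.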

Availability
The measure used to characterize a repairable system is the availability (or, complementarily, the unavailability).
The availability A(t) of an item at time t is the probability that the item is working correctly at time t.

Availability
The measure used to characterize a repairable system is the availability (unavailability):
• $A(t) = \Pr\{\text{system is UP at time } t\}$
• $U(t) = \Pr\{\text{system is DOWN at time } t\}$
• $A(t) + U(t) = 1$

Definition of Availability
• An important difference between reliability and availability is:
  • reliability refers to failure-free operation during an interval $(0, t)$;
  • availability refers to failure-free operation at a given instant of time $t$ (the time when a device or system is accessed to provide a required function), independently of the number of failure/repair cycles.

Definition of Availability
[Figure: system failure and restoration process. The indicator function I(t) equals 1 (working) while the system is operating and providing a required function, and 0 (failed) while it is failed and being restored.]

Availability evaluation
In the special case when times to failure and times to restoration are both exponentially distributed, the alternating process can be viewed as a two-state homogeneous Continuous Time Markov Chain.
• Time-independent failure rate $\lambda$
• Time-independent repair rate $\mu$

2-State Markov Availability Model
[Figure: two-state Markov chain with states UP (1) and DN (0).]
• Transient availability analysis:
  • for each state, we apply a flow balance equation:
  • rate of buildup = rate of flow IN - rate of flow OUT

2-State Markov Availability Model
[Figure: two-state Markov chain with states UP (1) and DN (0).]
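The flow balance rule (rate of buildup = rate of flow IN - rate of flow OUT) can be written out for the two states as sketched below. The notation is an assumption made for this sketch: $\lambda$ is the constant failure rate, $\mu$ the constant repair rate, $P_{UP}(t)$ and $P_{DN}(t)$ are the state probabilities, and the system is taken to be UP at $t = 0$.

```latex
\begin{aligned}
\frac{dP_{UP}(t)}{dt} &= \mu\,P_{DN}(t) - \lambda\,P_{UP}(t),\\
\frac{dP_{DN}(t)}{dt} &= \lambda\,P_{UP}(t) - \mu\,P_{DN}(t),\\
P_{UP}(t) + P_{DN}(t) &= 1, \qquad P_{UP}(0) = 1 .
\end{aligned}
```

Substituting $P_{DN}(t) = 1 - P_{UP}(t)$ into the first equation leaves a single linear first-order equation, $\dfrac{dP_{UP}(t)}{dt} + (\lambda + \mu)\,P_{UP}(t) = \mu$, whose solution is the pointwise availability $A(t) = P_{UP}(t)$.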

2-State Markov Availability Model
[Figure: plot of the availability A(t), with the value 1 and the steady-state level A_ss marked on the vertical axis.]

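2-State Markov Model
With failure rate $\lambda$, repair rate $\mu$, and the system UP at $t = 0$:
1) Pointwise availability $A(t)$: $A(t) = \dfrac{\mu}{\lambda + \mu} + \dfrac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t}$
2) Steady-state availability, the limiting value as $t \to \infty$: $A_{ss} = \dfrac{\mu}{\lambda + \mu}$
• If there is no restoration ($\mu = 0$), the availability becomes the reliability: $A(t) = R(t) = e^{-\lambda t}$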
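The closed form for the two-state model is easy to evaluate numerically. The following is a minimal sketch, assuming a constant failure rate `lam`, a constant repair rate `mu`, and a system that is UP at time 0. The function names and the numeric rates are hypothetical values chosen for illustration.

```python
import math

def availability(t, lam, mu):
    """Pointwise availability A(t) of the two-state Markov model.

    Assumes constant failure rate lam, constant repair rate mu,
    and the system UP at t = 0.
    """
    total = lam + mu
    if total == 0.0:
        return 1.0  # no failures and no repairs: the system stays UP
    return mu / total + (lam / total) * math.exp(-total * t)

def steady_state_availability(lam, mu):
    """Limit of A(t) as t grows: mu / (lam + mu)."""
    return mu / (lam + mu)

LAM = 0.001  # hypothetical failure rate, 1/hour
MU = 0.1     # hypothetical repair rate, 1/hour
for t in (0.0, 10.0, 100.0):
    print(f"A({t:g}) = {availability(t, LAM, MU):.6f}")
print(f"A_ss = {steady_state_availability(LAM, MU):.6f}")
```

Calling `availability(t, lam, 0.0)` reproduces the reliability of a non-repairable item with exponential lifetime, `exp(-lam * t)`.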

Steady-state Availability
• In many system models, the limit $A_{ss} = \lim_{t \to \infty} A(t)$ exists and is called the steady-state availability.
• The steady-state availability represents the probability of finding a system operational after many fail-and-restore cycles.

Steady-state Availability
[Figure: timeline along t alternating between UP (1) and DOWN (0).]
• Expected UP time: E[U(t)] = MUT = MTTF
• Expected DOWN time: E[D(t)] = MDT = MTTR

Availability: Example (I)
• Let a system have a steady-state availability $A_{ss} = 0.95$.
• This means that, given a mission time $T$, the system is expected to work correctly for a total time of $0.95 \cdot T$.
• Alternatively, the system is expected to be out of service for a total time of $U_{ss} \cdot T = (1 - A_{ss}) \cdot T$.
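The arithmetic of this example can be sketched as follows. The mission time of one non-leap year (8760 hours) and the function name `expected_up_down` are assumptions made for illustration.

```python
HOURS_PER_YEAR = 8760.0  # assumed mission time T: one non-leap year

def expected_up_down(a_ss, mission_time):
    """Split a mission time into expected up time and expected down time."""
    up = a_ss * mission_time            # A_ss * T
    down = (1.0 - a_ss) * mission_time  # U_ss * T = (1 - A_ss) * T
    return up, down

up_hours, down_hours = expected_up_down(0.95, HOURS_PER_YEAR)
print(f"expected up time:   {up_hours:.1f} h")
print(f"expected down time: {down_hours:.1f} h")
```

With $A_{ss} = 0.95$ over one year this gives about 8322 hours of operation and about 438 hours out of service.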

Availability: Example (II)
• Let a system have a rated productivity of $W$ \$/year.
• The loss due to the system being out of service can be estimated as $U_{ss} \cdot W = (1 - A_{ss}) \cdot W$.
• The availability (unavailability) is an index for estimating the real productivity, given the rated productivity.
• Alternatively, if the goal is a net productivity of $W$ \$/year, the plant must be designed so that its rated productivity $W'$ satisfies $A_{ss} \cdot W' = W$.
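Both directions of this reasoning can be sketched in a few lines. The net productivity is the available fraction of the rated productivity, so a target net value W requires a rated value W' = W / A_ss. The function names and the figure of 1,000,000 $/year are hypothetical.

```python
def productivity_loss(a_ss, rated):
    """Expected yearly loss caused by down time: U_ss * W."""
    return (1.0 - a_ss) * rated

def required_rated_productivity(a_ss, net_target):
    """Rated productivity W' whose available share equals the target: A_ss * W' = W."""
    return net_target / a_ss

A_SS = 0.95
W = 1_000_000.0  # hypothetical productivity, $/year
loss = productivity_loss(A_SS, W)
rated_needed = required_rated_productivity(A_SS, W)
print(f"loss at rated W:    {loss:,.0f} $/year")
print(f"rated W' for net W: {rated_needed:,.0f} $/year")
```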

Availability
• We can show that: $A_{ss} = \dfrac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$
• This result is valid without making any assumptions on the form of the distributions of times to failure and times to repair.
• Also: [equation not recoverable from the source]
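The distribution-free nature of the ratio MTTF / (MTTF + MTTR) can be checked empirically. The sketch below is an illustration under assumed, deliberately non-exponential distributions: Weibull up times and lognormal down times, with hypothetical parameters. It compares the long-run fraction of up time over many simulated cycles with the value predicted from the two means.

```python
import math
import random

def simulated_availability(sample_up, sample_down, cycles, rng):
    """Long-run fraction of time UP over a number of up/down cycles."""
    up_total = down_total = 0.0
    for _ in range(cycles):
        up_total += sample_up(rng)
        down_total += sample_down(rng)
    return up_total / (up_total + down_total)

SCALE, SHAPE = 100.0, 2.0       # hypothetical Weibull up-time parameters (hours)
MU_LOG, SIGMA_LOG = 1.0, 0.5    # hypothetical lognormal down-time parameters

mttf = SCALE * math.gamma(1.0 + 1.0 / SHAPE)    # mean of the Weibull
mttr = math.exp(MU_LOG + SIGMA_LOG ** 2 / 2.0)  # mean of the lognormal
predicted = mttf / (mttf + mttr)

rng = random.Random(2003)
estimate = simulated_availability(
    lambda r: r.weibullvariate(SCALE, SHAPE),
    lambda r: r.lognormvariate(MU_LOG, SIGMA_LOG),
    cycles=50_000,
    rng=rng,
)
print(f"predicted A_ss = {predicted:.4f}")
print(f"simulated A_ss = {estimate:.4f}")
```

The two printed values should agree to within sampling noise, which shrinks as the number of cycles grows.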

Motivation – High Availability

Maintainability
• MDT (Mean Down Time), also called MTTR (mean time to restoration).
• The total down time ($Y$) consists of:
  • Failure detection time
  • Alarm notification time
  • Dispatch and travel time of the repair person(s)
  • Repair or replacement time
  • Reboot time

Maintainability
• The total down time ($Y$) consists of:
  • Logistic (passive) time
    • Administrative times
    • Dispatch and travel time of the repair person(s)
    • Waiting time for spares, tools …
  • Effective restoration (active) time
    • Access and diagnosis time
    • Repair or replacement time
    • Test and reboot time
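Because the total down time is a sum of components, its mean (the MDT) is the sum of the component means, whatever their distributions. The sketch below illustrates this bookkeeping. All durations are hypothetical values in hours, chosen only to show how the logistic and active shares can be compared.

```python
# Hypothetical mean durations, in hours, for each component of the down time.
LOGISTIC = {
    "administrative": 0.5,
    "dispatch and travel": 1.5,
    "waiting for spares and tools": 2.0,
}
ACTIVE = {
    "access and diagnosis": 0.75,
    "repair or replacement": 1.0,
    "test and reboot": 0.25,
}

logistic_time = sum(LOGISTIC.values())
active_time = sum(ACTIVE.values())
mdt = logistic_time + active_time  # mean of a sum = sum of the means

print(f"logistic (passive) time: {logistic_time:.2f} h")
print(f"active restoration time: {active_time:.2f} h")
print(f"MDT = {mdt:.2f} h, logistic share = {logistic_time / mdt:.0%}")
```

A breakdown of this kind shows whether shortening the repair itself or improving the logistics would reduce the MDT more.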

Logistics
• Logistic times depend on the organization of the assistance service:
  • Number of crews;
  • Location of tools and storehouses;
  • Number of spare parts.

The number of spares
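One common way to size a spares stock, shown here only as a hedged sketch, is to assume that failures arrive as a Poisson process with a constant rate and to pick the smallest stock that covers the demand over a resupply period with a chosen probability. The Poisson assumption, the function names, and all numeric values are assumptions of this example, not content taken from the slide.

```python
import math

def poisson_cdf(k, mean):
    """Pr{N <= k} for a Poisson random variable N with the given mean."""
    term = math.exp(-mean)  # Pr{N = 0}
    total = term
    for n in range(1, k + 1):
        term *= mean / n    # Pr{N = n} from Pr{N = n - 1}
        total += term
    return total

def spares_needed(failure_rate, units, period, confidence):
    """Smallest stock s with Pr{failures in the period <= s} >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    mean = failure_rate * units * period  # expected number of failures
    s = 0
    while poisson_cdf(s, mean) < confidence:
        s += 1
    return s

# Hypothetical case: 10 units, 0.001 failures/hour each, 200-hour resupply period.
stock = spares_needed(failure_rate=0.001, units=10, period=200.0, confidence=0.95)
print(f"spares needed for 95% coverage: {stock}")
```

In this hypothetical case the expected demand is 2 failures per period, and a stock of 5 is the smallest that reaches 95% coverage.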

Maintenance Costs
• The total cost of a maintenance action consists of:
  • Cost of spares and replaced parts
  • Cost of person-hours for repair
  • Down-time cost (loss of productivity)
• The down-time cost (due to a loss of productivity) can be the most relevant cost factor.
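The three cost items can be combined as in the sketch below. The function name, the parameter names, and every figure are hypothetical. They are chosen only to illustrate a case where the down-time cost dominates.

```python
def maintenance_action_cost(spares_cost, person_hours, hourly_labor_rate,
                            down_time_hours, hourly_productivity_loss):
    """Break the cost of one maintenance action into its three components."""
    labor = person_hours * hourly_labor_rate
    down_time = down_time_hours * hourly_productivity_loss
    return {
        "spares": spares_cost,
        "labor": labor,
        "down_time": down_time,
        "total": spares_cost + labor + down_time,
    }

# Hypothetical figures, in dollars and hours.
cost = maintenance_action_cost(spares_cost=400.0, person_hours=3.0,
                               hourly_labor_rate=60.0, down_time_hours=6.0,
                               hourly_productivity_loss=500.0)
for item, value in cost.items():
    print(f"{item:>9}: {value:8.2f} $")
```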

Maintenance Policy
• A maintenance policy is the sequence of actions that minimizes the total cost related to down time:
  • Reactive maintenance: the maintenance action is triggered by a failure.
  • Proactive maintenance: preventive maintenance policy.

Life Cycle Cost
