Basic Concepts: Reliability, MTTF, Availability, etc.
CprE 545: Fault Tolerant Systems (G. Manimaran)
Definitions
• Reliability of a system is the probability that the system will perform its required function under specified conditions for a specified period of time.
• MTBF (Mean Time Between Failures): the average time a system will run between failures. The MTBF is usually expressed in hours. This metric is more useful to the user than the reliability measure.
Approaches to increase the reliability of a system
• Worst-case design
• Using high-quality components
• Strict quality control procedures
• Redundancy
  • Typically employed
  • Less expensive
Reliability expressions
• Exponential Failure Law: the reliability of a system is often modeled as
  $R(t) = e^{-\lambda t}$
  where $\lambda$ is the failure rate, expressed as percentage failures per 1000 hours or as failures per hour.
• When the product $\lambda t$ is small, the first-order approximation is
  $R(t) \approx 1 - \lambda t$
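To see how quickly the small-λt approximation degrades, here is a minimal Python sketch that evaluates both expressions side by side. The function names and the failure rate of 1e-4 failures per hour are illustrative assumptions, not values from the slides.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Exponential failure law: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * hours)


def reliability_linear(failure_rate_per_hour: float, hours: float) -> float:
    """First-order approximation R(t) ~ 1 - lambda * t (small lambda * t only)."""
    return 1.0 - failure_rate_per_hour * hours


# Assumed, illustrative failure rate in failures per hour.
lam = 1e-4

for t in (10, 100, 1000, 10000):
    exact = reliability(lam, t)
    approx = reliability_linear(lam, t)
    print(f"t={t:>6} h  exact={exact:.6f}  approx={approx:.6f}  "
          f"error={exact - approx:.2e}")
```

The error grows roughly as (λt)²/2, so the linear form is only reasonable while λt stays well below 1.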
Relation between MTBF and the failure rate
• MTBF is the average time a system will run between failures and is given by:
  $\mathrm{MTBF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}$
• In other words, the MTBF of a system is the reciprocal of the failure rate.
• If $\lambda$ is the number of failures per hour, the MTBF is expressed in hours.
A simple example
• A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate $\lambda$ and MTBF.
• $\lambda = (0.02 / 100) \times (1 / 1000) \times 4000 = 8 \times 10^{-4}$ failures/hour
• $\mathrm{MTBF} = 1 / (8 \times 10^{-4}) = 1250$ hours
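The arithmetic in this example can be checked with a short Python sketch. The helper names are hypothetical, and the calculation assumes, as the slide's sum of rates implies, that every component has the same constant failure rate and that any single component failure fails the system.

```python
def system_failure_rate(n_components: int, percent_per_1000_hours: float) -> float:
    """Convert a per-component rate in % per 1000 hours to a system rate per hour.

    Assumes identical components whose failure rates simply add up.
    """
    per_component_per_hour = (percent_per_1000_hours / 100.0) / 1000.0
    return n_components * per_component_per_hour


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """MTBF is the reciprocal of the failure rate."""
    return 1.0 / failure_rate_per_hour


lam = system_failure_rate(4000, 0.02)
mtbf = mtbf_hours(lam)
print(f"lambda = {lam:.1e} failures/hour, MTBF = {mtbf:.0f} hours")
```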
Relation between Reliability and MTBF
• For small $\lambda t$: $R(t) \approx 1 - \lambda t = 1 - t / \mathrm{MTBF}$
• Therefore, $\mathrm{MTBF} \approx t / (1 - R(t))$
[Figure: reliability R(t) versus time t, with the time axis marked at 1 MTBF and 2 MTBF and the value 0.36 highlighted on the reliability axis.]
An example
• A first-generation computer contains 10000 components, each with $\lambda = 0.5\%$ per 1000 hours. What is the period of 99% reliability?
• $\mathrm{MTBF} = t / (1 - R(t)) = t / (1 - 0.99)$
• $t = \mathrm{MTBF} \times 0.01 = 0.01 / \lambda_{av}$, where $\lambda_{av}$ is the average failure rate
• $N$ = number of components = 10000
• $\lambda$ = failure rate of a component = 0.5% / (1000 hours) = 0.005 / 1000 = $5 \times 10^{-6}$ per hour
• Therefore, $\lambda_{av} = N \lambda = 10000 \times 5 \times 10^{-6} = 5 \times 10^{-2}$ per hour
• Therefore, $t = 0.01 / (5 \times 10^{-2}) = 0.2$ hours = 12 minutes
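As a cross-check on the 12-minute figure, the sketch below computes the period of 99% reliability twice: once with the linear approximation used on the slide, and once by inverting the exponential law exactly, t = -ln(R) / λ. The function names are illustrative assumptions.

```python
import math


def time_to_reliability_linear(failure_rate_per_hour: float, target: float) -> float:
    """Solve R = 1 - lambda * t for t (the approximation used on the slide)."""
    return (1.0 - target) / failure_rate_per_hour


def time_to_reliability_exact(failure_rate_per_hour: float, target: float) -> float:
    """Solve R = exp(-lambda * t) for t."""
    return -math.log(target) / failure_rate_per_hour


# 10000 components, each at 0.5% per 1000 hours, with rates assumed to add.
lam_av = 10000 * (0.5 / 100.0) / 1000.0

t_linear_min = time_to_reliability_linear(lam_av, 0.99) * 60.0
t_exact_min = time_to_reliability_exact(lam_av, 0.99) * 60.0
print(f"linear: {t_linear_min:.2f} min, exact: {t_exact_min:.2f} min")
```

At 99% reliability the two answers differ by only a few seconds, which is why the linear form is adequate here.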
Reliability for different configurations
1. Series Configuration
[Figure: N components, numbered 1 to N, connected in series, each with reliability R.]
• Overall reliability: $R_o = R \times R \times \cdots \times R = R^N$
2. Parallel Configuration
[Figure: N components, numbered 1 to N, connected in parallel, each with reliability R.]
• $R_o = 1 - (\text{probability that all of the components fail})$
• $R_o = 1 - (1 - R)^N$
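The two formulas translate directly into code. This is a minimal sketch assuming N identical, independent components; the function names and the sample values R = 0.9 and N = 3 are illustrative, not from the slides.

```python
def series_reliability(r: float, n: int) -> float:
    """All n components must work: R_o = R^n."""
    return r ** n


def parallel_reliability(r: float, n: int) -> float:
    """At least one of n components must work: R_o = 1 - (1 - R)^n."""
    return 1.0 - (1.0 - r) ** n


r, n = 0.9, 3  # assumed sample values
print(f"series:   {series_reliability(r, n):.3f}")
print(f"parallel: {parallel_reliability(r, n):.3f}")
```

With these values the series arrangement drops below the reliability of a single component, while the parallel arrangement rises above it.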
Reliability for different configurations
3. Hybrid Configuration
[Figure: hybrid series-parallel arrangement with component labels 1, 2, ..., N and 1, 2, ..., M, each component with reliability R.]
• Overall reliability: $R_o = ?$
Reliability for different configurations
4. Triple Modular Redundancy (TMR)
[Figure: three redundant modules, each with reliability R, feeding a voting element.]
• Overall reliability: $R_o = \binom{3}{2} R^2 (1 - R) + R^3$
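The TMR expression can be evaluated for a few module reliabilities to see when triplication pays off. This sketch assumes independent modules and a perfect voter, since the slide's formula has no voter term; the function name is hypothetical.

```python
from math import comb


def tmr_reliability(r: float) -> float:
    """Probability that at least two of three modules work.

    Assumes independent modules and a perfect (never failing) voter.
    """
    exactly_two = comb(3, 2) * r ** 2 * (1.0 - r)
    all_three = r ** 3
    return exactly_two + all_three


for r in (0.4, 0.5, 0.9):
    print(f"module R = {r:.1f} -> TMR R_o = {tmr_reliability(r):.3f}")
```

Under these assumptions TMR improves on a single module only when R is above 0.5; at R = 0.5 the two are equal, and below that the triplicated system is worse.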
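Reliability calculation: a more complicated example
[Figure: network of components A, B, C, D, E, F, together with the two reduced networks S1 and S2 obtained by conditioning on component C.]
• $R_{System} = R_C R_{S2} + (1 - R_C) R_{S1}$
• S1: the system assuming C is faulty. $R_{S1}$ can be calculated using the parallel and series formulae.
• S2: the system assuming C is fault free. S2 needs further reduction.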
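[Figure: network S2, together with the two reduced networks S3 and S4 obtained by conditioning on component E.]
• $R_{S2} = R_E R_{S3} + (1 - R_E) R_{S4}$
• S3: S2 assuming E is fault free.
• S4: S2 assuming E is faulty.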
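The conditioning step used above (pick a component, assume it fault free, then assume it faulty, and weight the two results) can be applied recursively until nothing is left to condition on. The sketch below does that for a hypothetical five-component bridge network, not the A to F network of the slides, whose exact wiring is not recoverable here. The function names, the bridge layout, and the value R = 0.9 are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

State = Dict[str, bool]


def system_reliability(works: Callable[[State], bool],
                       rel: Dict[str, float],
                       fixed: Optional[State] = None) -> float:
    """R = R_c * R(system | c fault free) + (1 - R_c) * R(system | c faulty),
    applied recursively to every component not yet fixed."""
    fixed = dict(fixed or {})
    pending = [name for name in rel if name not in fixed]
    if not pending:
        # Every component state is known: the system either works or not.
        return 1.0 if works(fixed) else 0.0
    c = pending[0]
    r_up = system_reliability(works, rel, {**fixed, c: True})
    r_down = system_reliability(works, rel, {**fixed, c: False})
    return rel[c] * r_up + (1.0 - rel[c]) * r_down


def bridge_works(s: State) -> bool:
    """Hypothetical bridge: paths A-B, C-D, A-E-D and C-E-B."""
    return ((s["A"] and s["B"]) or (s["C"] and s["D"])
            or (s["A"] and s["E"] and s["D"])
            or (s["C"] and s["E"] and s["B"]))


rel = {name: 0.9 for name in "ABCDE"}  # assumed component reliabilities

# One explicit conditioning step on the bridging component E.
r_e_up = system_reliability(bridge_works, rel, {"E": True})
r_e_down = system_reliability(bridge_works, rel, {"E": False})
bridge_r = system_reliability(bridge_works, rel)
print(f"E fault free: {r_e_up:.4f}, E faulty: {r_e_down:.4f}, "
      f"overall: {bridge_r:.5f}")
```

Conditioning on every component costs 2^n evaluations, so in practice one conditions only on the components that block a series-parallel reduction, as the slides do with C and then E.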
Maintainability
• Maintainability of a system is the probability of isolating and repairing a fault in the system within a given time.
• Maintainability is given by: $M(t) = 1 - e^{-\mu t}$
  where $\mu$ is the repair rate and $t$ is the permissible time constraint for the maintenance action.
• $\mu = 1 / \mathrm{MTTR}$, where MTTR is the Mean Time To Repair
• $M(t) = 1 - e^{-t / \mathrm{MTTR}}$
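A short sketch of the maintainability formula follows. The function name and the MTTR of 2 hours are assumed for illustration only.

```python
import math


def maintainability(mttr_hours: float, allowed_hours: float) -> float:
    """M(t) = 1 - exp(-t / MTTR): probability a repair finishes within t."""
    return 1.0 - math.exp(-allowed_hours / mttr_hours)


mttr = 2.0  # assumed mean time to repair, in hours
for t in (1.0, 2.0, 4.0, 8.0):
    print(f"t = {t:.0f} h -> M(t) = {maintainability(mttr, t):.3f}")
```

Note that allowing exactly one MTTR gives only about a 63% chance of completing the repair in time under this model.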
Availability
• Availability of a system is the probability that the system will be functioning according to expectations at any time during its scheduled working period.
• Availability = System up-time / (System up-time + System down-time)
• System down-time = No. of failures $\times$ MTTR
• System down-time = System up-time $\times \lambda \times$ MTTR
• Therefore, Availability = System up-time / (System up-time + (System up-time $\times \lambda \times$ MTTR)) = $1 / (1 + \lambda \times \mathrm{MTTR})$
• Since $\lambda = 1 / \mathrm{MTBF}$, Availability = $\mathrm{MTBF} / (\mathrm{MTBF} + \mathrm{MTTR})$
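The two forms of the availability expression should agree whenever λ = 1 / MTBF. The sketch below checks this numerically, reusing the 1250-hour MTBF from the earlier example and an assumed MTTR of 5 hours; the function names are hypothetical.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability_from_rate(failure_rate_per_hour: float, mttr_hours: float) -> float:
    """Availability = 1 / (1 + lambda * MTTR)."""
    return 1.0 / (1.0 + failure_rate_per_hour * mttr_hours)


mtbf = 1250.0  # hours, from the 4000-component example
mttr = 5.0     # hours, assumed for illustration

a1 = availability_from_mtbf(mtbf, mttr)
a2 = availability_from_rate(1.0 / mtbf, mttr)
print(f"availability = {a1:.5f} (MTBF form), {a2:.5f} (rate form)")
```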