
Basic Concepts Reliability, MTTF, Availability, etc.



Presentation Transcript


  1. Basic Concepts: Reliability, MTTF, Availability, etc.
     CprE 545: Fault Tolerant Systems (G. Manimaran)

  2. Definitions
     • Reliability of a system is the probability that the system will perform its required function under specified conditions for a specified period of time.
     • MTBF (Mean Time Between Failures) is the average time a system will run between failures, usually expressed in hours. This metric is more useful to the user than the reliability measure.

  3. Approaches to increase the reliability of a system
     • Worst-case design
     • Using high-quality components
     • Strict quality control procedures
     • Redundancy
       • Typically employed
       • Less expensive

  4. Reliability expressions
     • Exponential Failure Law: the reliability of a system is often modeled as $R(t) = e^{-\lambda t}$, where $\lambda$ is the failure rate, expressed as percentage failures per 1000 hours or as failures per hour.
     • When the product $\lambda t$ is small, $R(t) \approx 1 - \lambda t$.
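As a rough illustration of how far the small-$\lambda t$ approximation can be trusted, the sketch below compares the exponential law with its linear form. The function names and the failure rate of $10^{-4}$ per hour are assumptions chosen for illustration, not values from the slides.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Exponential failure law: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


def reliability_linear(failure_rate: float, t: float) -> float:
    """Linear approximation R(t) ~ 1 - lambda * t, meant for small lambda * t."""
    return 1.0 - failure_rate * t


# Hypothetical failure rate, in failures per hour (not from the slides).
lam = 1e-4

for hours in (10, 100, 1000, 10000):
    exact = reliability(lam, hours)
    approx = reliability_linear(lam, hours)
    print(f"t = {hours:>6} h  exact = {exact:.6f}  "
          f"linear = {approx:.6f}  gap = {exact - approx:.2e}")
```

The gap is negligible while $\lambda t$ is well below 1 and becomes large as $\lambda t$ approaches 1, where the linear form drops to zero while the exponential law still gives $e^{-1}$.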

  5. Relation between MTBF and the failure rate
     • MTBF is the average time a system will run between failures and is given by:
       $\mathrm{MTBF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = 1/\lambda$
     • In other words, the MTBF of a system is the reciprocal of the failure rate.
     • If $\lambda$ is the number of failures per hour, the MTBF is expressed in hours.
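The integral can be checked numerically. The sketch below is a minimal trapezoid-rule approximation; the function name `mtbf_numeric`, the step count, and the finite upper limit of 30 MTBFs (standing in for infinity) are all assumptions made for illustration.

```python
import math


def mtbf_numeric(failure_rate: float, steps: int = 100_000,
                 horizon_in_mtbf: float = 30.0) -> float:
    """Trapezoid-rule estimate of the integral of exp(-lambda * t) from 0
    to a finite upper limit of `horizon_in_mtbf` / lambda hours."""
    upper = horizon_in_mtbf / failure_rate
    dt = upper / steps
    # The two end points carry half weight in the trapezoid rule.
    total = 0.5 * (1.0 + math.exp(-failure_rate * upper))
    for i in range(1, steps):
        total += math.exp(-failure_rate * i * dt)
    return total * dt


lam = 8e-4  # failures per hour
print(mtbf_numeric(lam), 1.0 / lam)
```

The two printed values should agree closely, since the tail beyond 30 MTBFs contributes almost nothing to the integral.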

  6. A simple example
     • A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate $\lambda$ and MTBF.
     • $\lambda = (0.02 / 100) \times (1 / 1000) \times 4000 = 8 \times 10^{-4}$ failures/hour
     • $\mathrm{MTBF} = 1 / (8 \times 10^{-4}) = 1250$ hours
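The same arithmetic can be written as a short script. It assumes, as the example does implicitly, that the component failure rates simply add up to the system failure rate; the function name is a hypothetical helper.

```python
def system_failure_rate(n_components: int, percent_per_1000_hours: float) -> float:
    """Convert a per-component rate in % per 1000 hours into a system rate
    in failures per hour, assuming the component failure rates add."""
    per_component_per_hour = (percent_per_1000_hours / 100.0) / 1000.0
    return n_components * per_component_per_hour


lam = system_failure_rate(4000, 0.02)
mtbf = 1.0 / lam
print(f"lambda = {lam:.1e} failures/hour, MTBF = {mtbf:.0f} hours")
```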

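  7. Relation between reliability and MTBF
     • For small $\lambda t$: $R(t) \approx 1 - \lambda t = 1 - t / \mathrm{MTBF}$
     • Therefore, $\mathrm{MTBF} \approx t / (1 - R(t))$
     [Figure: reliability $R(t)$ plotted against time $t$. The reliability axis runs from 0 to 1.0 with the value 0.36 marked; the time axis is marked at 1 MTBF and 2 MTBF.]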

  8. An example
     • A first-generation computer contains 10000 components, each with $\lambda = 0.5\%$ per 1000 hours. What is the period of 99% reliability?
     • $\mathrm{MTBF} = t / (1 - R(t)) = t / (1 - 0.99)$
     • $t = \mathrm{MTBF} \times 0.01 = 0.01 / \lambda_{av}$, where $\lambda_{av}$ is the average failure rate
     • $N$ = number of components = 10000
     • $\lambda$ = failure rate of a component = 0.5% / (1000 hours) = 0.005 / 1000 = $5 \times 10^{-6}$ per hour
     • Therefore, $\lambda_{av} = N \lambda = 10000 \times 5 \times 10^{-6} = 5 \times 10^{-2}$ per hour
     • Therefore, $t = 0.01 / (5 \times 10^{-2}) = 0.2$ hours = 12 minutes
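The example relies on the linear approximation $R(t) \approx 1 - \lambda t$. As a cross-check, the sketch below also solves the exponential law $R(t) = e^{-\lambda t}$ for $t$ directly. Both function names are hypothetical helpers introduced here.

```python
import math


def time_for_reliability_linear(failure_rate: float, target: float) -> float:
    """Solve 1 - lambda * t = target for t (small lambda * t approximation)."""
    return (1.0 - target) / failure_rate


def time_for_reliability_exact(failure_rate: float, target: float) -> float:
    """Solve exp(-lambda * t) = target for t."""
    return -math.log(target) / failure_rate


# System failure rate: 10000 components at 0.5% per 1000 hours each.
lam_system = 10000 * (0.5 / 100.0) / 1000.0  # failures per hour

t_linear = time_for_reliability_linear(lam_system, 0.99)
t_exact = time_for_reliability_exact(lam_system, 0.99)
print(f"linear: {t_linear * 60:.2f} min, exact: {t_exact * 60:.2f} min")
```

The two answers differ by well under a minute, which is expected because $\lambda t = 0.01$ is small here.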

  9. Reliability for different configurations
     1. Series configuration: components 1, 2, 3, 4, ..., N, each with reliability $R$, connected in series.
        Overall reliability: $R_o = R \times R \times \cdots \times R = R^N$
     2. Parallel configuration: components 1, 2, ..., N, each with reliability $R$, connected in parallel.
        $R_o = 1 - (\text{probability that all of the components fail}) = 1 - (1 - R)^N$
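The two formulas can be sketched as small helper functions. The version below accepts a list of possibly different component reliabilities, which is a mild generalization of the equal-$R$ case on the slide. It assumes independent component failures; the function names and the sample value $R = 0.9$ are illustrative assumptions.

```python
import math


def series(reliabilities):
    """A series system works only if every component works."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """A parallel system fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical component reliability, three identical components.
R, N = 0.9, 3
print(f"series:   {series([R] * N):.4f}")    # equals R**N
print(f"parallel: {parallel([R] * N):.4f}")  # equals 1 - (1 - R)**N
```

Series connection lowers the overall reliability below that of any single component, while parallel connection raises it above.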

  10. Reliability for different configurations
     3. Hybrid configuration
        [Figure: block diagram combining series and parallel connections; blocks are labeled 1, 2, ..., N and 1, 2, ..., M, each with reliability $R$.]
        Overall reliability = $R_o$ = ?

  11. Reliability for different configurations
     4. Triple Modular Redundancy (TMR)
        [Figure: redundant modules, each with reliability $R$, feeding a voting element.]
        Overall reliability: $R_o = \binom{3}{2} R^2 (1 - R) + R^3$
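The TMR expression is the probability that at least 2 of 3 modules work. The sketch below writes it as a general k-of-n sum. It assumes independent module failures and a voter that never fails, which the formula on the slide also appears to assume. The function names are hypothetical.

```python
from math import comb


def k_of_n(r: float, k: int, n: int) -> float:
    """Probability that at least k of n independent modules, each with
    reliability r, are working."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def tmr(r: float) -> float:
    """Triple modular redundancy with a perfect voter: 2-of-3 majority."""
    return k_of_n(r, 2, 3)


for r in (0.4, 0.5, 0.9, 0.99):
    print(f"R = {r:.2f}  TMR = {tmr(r):.6f}")
```

Under these assumptions, TMR improves on a single module only when $R > 0.5$; at $R = 0.5$ the two are equal, and below that majority voting makes things worse.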

  12. Reliability calculation: a more complicated example
     [Figure: a network of components A, B, C, D, E, F (System), reduced to two smaller networks: S1, assuming C is faulty, and S2, assuming C is fault free.]
     • $R_{System} = R_C R_{S2} + (1 - R_C) R_{S1}$
     • $R_{S1}$ can be calculated using the parallel and series formulae.
     • S2 needs further reduction.

  13. [Figure: S2 reduced to two smaller networks: S3, assuming E is fault free, and S4, assuming E is faulty.]
     • $R_{S2} = R_E R_{S3} + (1 - R_E) R_{S4}$
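The reduction technique (condition on one component being fault free or faulty, then apply the series and parallel formulae to each reduced network) can be tried on a small case. The network in the slides cannot be recovered from this transcript, so the sketch below uses a different, hypothetical network: a five-component bridge in which A and B form one path, C and D form the other, and E links their midpoints. It assumes independent failures, and it checks the result against brute-force enumeration of all component states.

```python
from itertools import product


def series(*rs):
    """All components must work."""
    out = 1.0
    for r in rs:
        out *= r
    return out


def parallel(*rs):
    """At least one component must work."""
    q = 1.0
    for r in rs:
        q *= 1.0 - r
    return 1.0 - q


def bridge_by_conditioning(ra, rb, rc, rd, re):
    """Condition on E, the component that blocks series/parallel reduction."""
    # E fault free: it acts as a wire, giving (A || C) in series with (B || D).
    e_ok = series(parallel(ra, rc), parallel(rb, rd))
    # E faulty: it is an open link, giving (A - B) in parallel with (C - D).
    e_bad = parallel(series(ra, rb), series(rc, rd))
    return re * e_ok + (1.0 - re) * e_bad


def bridge_by_enumeration(ra, rb, rc, rd, re):
    """Sum the probability of every component state in which a path exists."""
    rel = (ra, rb, rc, rd, re)
    total = 0.0
    for state in product((True, False), repeat=5):
        a, b, c, d, e = state
        connected = (a and b) or (c and d) or (a and e and d) or (c and e and b)
        if connected:
            p = 1.0
            for up, r in zip(state, rel):
                p *= r if up else 1.0 - r
            total += p
    return total


print(bridge_by_conditioning(0.9, 0.9, 0.9, 0.9, 0.9))
print(bridge_by_enumeration(0.9, 0.9, 0.9, 0.9, 0.9))
```

Both functions should print the same value, which suggests that the conditioning step loses nothing: the two cases (component fault free, component faulty) are exclusive and cover every possibility.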

  14. Maintainability
     • Maintainability of a system is the probability of isolating and repairing a fault in the system within a given time.
     • Maintainability is given by $M(t) = 1 - e^{-\mu t}$, where $\mu$ is the repair rate and $t$ is the permissible time constraint for the maintenance action.
     • $\mu = 1 / (\text{Mean Time To Repair}) = 1 / \mathrm{MTTR}$
     • $M(t) = 1 - e^{-t / \mathrm{MTTR}}$
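A brief sketch of the maintainability formula follows, together with its inverse (the time needed to reach a given probability of completed repair). The MTTR of 2 hours and the function names are assumptions chosen for illustration.

```python
import math


def maintainability(mttr: float, t: float) -> float:
    """M(t) = 1 - exp(-t / MTTR): probability the repair is done within t."""
    return 1.0 - math.exp(-t / mttr)


def time_to_reach(mttr: float, target: float) -> float:
    """Invert M(t): time allowance needed for a given repair probability."""
    return -mttr * math.log(1.0 - target)


mttr = 2.0  # hours, hypothetical
for allowed in (1.0, 2.0, 4.0, 8.0):
    print(f"t = {allowed:.0f} h  M(t) = {maintainability(mttr, allowed):.3f}")
print(f"time for M = 0.95: {time_to_reach(mttr, 0.95):.2f} h")
```

Note that allowing exactly one MTTR gives only about a 63% chance of completing the repair, since $M(\mathrm{MTTR}) = 1 - e^{-1}$.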

  15. Availability
     • Availability of a system is the probability that the system will be functioning according to expectations at any time during its scheduled working period.
     • Availability = System up-time / (System up-time + System down-time)
     • System down-time = No. of failures × MTTR
     • No. of failures = System up-time × $\lambda$, so System down-time = System up-time × $\lambda$ × MTTR
     • Therefore, Availability = System up-time / (System up-time + System up-time × $\lambda$ × MTTR) = $1 / (1 + \lambda \times \mathrm{MTTR})$
     • Substituting $\lambda = 1 / \mathrm{MTBF}$: Availability = $\mathrm{MTBF} / (\mathrm{MTBF} + \mathrm{MTTR})$
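The two forms of the availability expression can be checked against each other numerically. The sketch below uses an MTBF of 1250 hours and a hypothetical MTTR of 2 hours; the MTTR value and the function names are assumptions for illustration.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def availability_from_rate(failure_rate: float, mttr: float) -> float:
    """Equivalent form: 1 / (1 + lambda * MTTR), with lambda = 1 / MTBF."""
    return 1.0 / (1.0 + failure_rate * mttr)


mtbf = 1250.0  # hours
mttr = 2.0     # hours, hypothetical repair time

a = availability(mtbf, mttr)
print(f"availability = {a:.6f}")
print(f"same via lambda = {availability_from_rate(1.0 / mtbf, mttr):.6f}")
print(f"expected down-time per 8760 h = {(1.0 - a) * 8760:.1f} h")
```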
