Fault-Tolerant Computing Systems #4 Reliability and Availability

Pattara Leelaprute, Computer Engineering Department, Kasetsart University, pattara.l@ku.ac.th. Reliability is the probability that a system survives until time t (it has not failed up to t).


Presentation Transcript


  1. Fault-Tolerant Computing Systems #4: Reliability and Availability
     Pattara Leelaprute, Computer Engineering Department, Kasetsart University, pattara.l@ku.ac.th

  2. Reliability and Availability
     • Reliability
       • The probability that a system survives until time t (it has not failed up to t)
     • Availability
       • The probability that a system works properly at time t

  3. Preliminaries of Probability
     • Discrete sample space
       • Example: tossing a coin; the sample space is {head, tail}
     • Continuous sample space
       • Example: how long a PC stays up after a reboot; the sample space is {t | t > 0}
     • Random variable
       • A function mapping each element of the sample space to a real number
       • Example: heads = 1, tails = 0

  4. Preliminaries
     • Random variable
       • A function mapping each element of the sample space to a real number
     • CDF (cumulative distribution function)
       • $F_X(t) = \Pr[X \le t]$; when X is the time to failure, this is the probability that the system has gone down by time t
     • pdf (probability density function)
       • $f(t) = dF(t)/dt$
     • Expected value (mean)
       • $E[X] = \int_0^\infty t\, f(t)\, dt$ for $X \ge 0$
       • The average outcome of the random experiment, also called the mean of the random variable

  5. Exponential Distribution
     The most commonly used distribution function in reliability modeling.
     • CDF
       • $F(t) = 1 - e^{-\lambda t}$
     • pdf
       • $f(t) = \lambda e^{-\lambda t}$
     • Mean
       • $1/\lambda$
     • Memoryless property
       • Let $Y = X - t$ be the remaining life at time t
       • $G_t(y) = \Pr[Y \le y \mid X > t] = 1 - e^{-\lambda y}$
       • The distribution of the remaining life of a component does not depend on how long it has been working.
       • The component does not age: the remaining life of X does not depend on the time that has passed.
     [Figure: plots of $f(t) = 2e^{-2t}$ and $F(t) = 1 - e^{-2t}$, the case $\lambda = 2$]
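
The memoryless property can be checked numerically. The sketch below is illustrative only: it assumes the rate 2 from the slide's example, and the function names are hypothetical. It computes the remaining-life CDF directly from the definition of conditional probability and compares it with the unconditional CDF.

```python
import math

LAM = 2.0  # rate parameter; 2 matches the slide's example f(t) = 2e^{-2t}


def exp_cdf(t, lam=LAM):
    """F(t) = 1 - e^{-lam*t}: probability that the component has failed by time t."""
    return 1.0 - math.exp(-lam * t)


def exp_pdf(t, lam=LAM):
    """f(t) = lam * e^{-lam*t}: derivative of the CDF."""
    return lam * math.exp(-lam * t)


def remaining_life_cdf(y, t, lam=LAM):
    """G_t(y) = Pr[X - t <= y | X > t], from the definition of conditional probability."""
    prob_survive_t = 1.0 - exp_cdf(t, lam)
    prob_fail_in_window = exp_cdf(t + y, lam) - exp_cdf(t, lam)
    return prob_fail_in_window / prob_survive_t


# For every age t, the remaining-life CDF matches the CDF of a new component.
for age in (0.0, 0.5, 3.0):
    print(age, remaining_life_cdf(0.25, age), exp_cdf(0.25))
```

Whatever age is chosen, the two printed probabilities agree up to floating-point rounding, which is the statement that the component does not age.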

  6. Reliability
     • Reliability
       • The probability that a system survives until time t
       • $R(t) = \Pr[X > t] = 1 - F(t)$
       • X: the random variable representing the time to failure of the system (the life of the system)
       • R(t): the probability that the system survives until time t
     [Figure: exponential example with $F(t) = 1 - e^{-2t}$ and $R(t) = e^{-2t}$ plotted against t; a timeline from time 0 to time t showing the lifetime X]

  7. Reliability
     • Reliability
       • $R(t) = \Pr[X > t] = 1 - F(t)$
       • $R(0) = 1$: the system is initially working
       • $R(\infty) = 0$: no system has an infinite lifetime
     [Figure: exponential example with $F(t) = 1 - e^{-2t}$ and reliability $R(t) = e^{-2t}$ plotted against t; a timeline from time 0 to time t showing the lifetime X]
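
One way to make R(t) = Pr[X > t] concrete is to estimate it as the fraction of simulated lifetimes that exceed t. The following is a minimal sketch, not part of the original slides: it assumes exponentially distributed lifetimes with rate 2 (the slide's example), and the sample size and seed are arbitrary choices.

```python
import math
import random


def empirical_reliability(t, lam, n=100_000, seed=0):
    """Estimate R(t) = Pr[X > t] as the fraction of n simulated lifetimes longer than t."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    survivors = sum(1 for _ in range(n) if rng.expovariate(lam) > t)
    return survivors / n


lam = 2.0  # assumed rate, matching R(t) = e^{-2t} in the slide's example
for t in (0.0, 0.5, 1.0, 5.0):
    # Columns: t, simulated estimate, analytic value e^{-lam*t}
    print(t, empirical_reliability(t, lam), math.exp(-lam * t))
```

The estimate starts at (essentially) 1 for t = 0 and decays toward 0 as t grows, mirroring R(0) = 1 and R(∞) = 0.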

  8. Failure Rate
     • $f(t)\,\Delta t$
       • The (approximate, for small $\Delta t$) probability that a fault occurs in the interval $[t, t+\Delta t]$
     • $f(t)\,\Delta t / R(t)$
       • The probability that a fault occurs in $[t, t+\Delta t]$, given that the system is working properly at time t
     • Failure rate
       • $f(t) / R(t)$
     [Figure: exponential example with fault density $f(t) = 2e^{-2t}$, distribution $F(t) = 1 - e^{-2t}$, and reliability $R(t) = e^{-2t}$; the interval $[t, t+\Delta t]$ is marked on the time axis]
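
The relationship between the failure rate and the conditional failure probability can be illustrated with the exponential example. This is a sketch under the assumption of an exponential lifetime with rate 2; the helper names are hypothetical.

```python
import math


def pdf(t, lam):
    """f(t) = lam * e^{-lam*t}."""
    return lam * math.exp(-lam * t)


def reliability(t, lam):
    """R(t) = e^{-lam*t}."""
    return math.exp(-lam * t)


def failure_rate(t, lam):
    """Failure rate f(t) / R(t)."""
    return pdf(t, lam) / reliability(t, lam)


def conditional_failure_prob(t, dt, lam):
    """Exact Pr[t < X <= t + dt | X > t] = (R(t) - R(t + dt)) / R(t)."""
    return (reliability(t, lam) - reliability(t + dt, lam)) / reliability(t, lam)


lam, dt = 2.0, 1e-4
for t in (0.0, 1.0, 3.0):
    # failure_rate * dt approximates the exact conditional probability for small dt
    print(t, failure_rate(t, lam), failure_rate(t, lam) * dt,
          conditional_failure_prob(t, dt, lam))
```

For the exponential distribution the ratio f(t)/R(t) reduces to the rate parameter itself, so the first printed column after t stays at 2 regardless of t.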

  9. Bathtub Curve
     • Failure rate
       • $f(t) / R(t)$
     • Bathtub curve
       • The general failure rate observed in empirical data collected from mechanical and electronic components
       • When the lifetime distribution F(t) of a system is exponential, the failure rate is constant (see the previous slide)
     [Figure: bathtub curve with three stages. 1. Initial stage: inherent defects, faulty design. 2. Constant failure rate. 3. Last stage: faults caused by age.]

  10. MTTF (Mean Time To Failure)
     • MTTF
       • $E[X] = \int_0^\infty t\, f(t)\, dt = \int_0^\infty R(t)\, dt$
       • E[X]: the expected value of the random variable X, which represents the time until a fault occurs in the system
     • When $R(t) = e^{-\lambda t}$ (X is exponentially distributed)
       • Failure rate = $\lambda$
       • MTTF = $1/\lambda$
     [Figure: timeline starting at time 0 with the expected value marked]
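
Both integral forms of the MTTF can be evaluated numerically and compared with the closed form. The sketch below assumes an exponential lifetime with rate 2 and uses a simple hand-written trapezoidal rule; the truncation point of the integral and the step count are arbitrary choices made for illustration.

```python
import math


def trapezoid(func, a, b, steps=200_000):
    """Composite trapezoidal rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


lam = 2.0          # assumed failure rate
upper = 40.0 / lam  # truncation point; the tail beyond it is on the order of e^{-40}

# MTTF as the expected value of X: integral of t * f(t)
mttf_from_pdf = trapezoid(lambda t: t * lam * math.exp(-lam * t), 0.0, upper)
# MTTF as the area under the reliability curve: integral of R(t)
mttf_from_reliability = trapezoid(lambda t: math.exp(-lam * t), 0.0, upper)

print(mttf_from_pdf, mttf_from_reliability, 1.0 / lam)
```

The two integrals agree with each other and with the closed form 1/λ to within the discretization error of the rule.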

  11. Availability
     • The probability that a system works properly at time t
     • Availability is a measure frequently used to describe the behavior of a system
     • If the system has no repair or replacement, availability is equal to the reliability R(t)
       • R(t): the probability that no failures have occurred during the whole period (0, t)
     [Figure: timeline along t alternating between operational periods $X_i, X_{i+1}, X_{i+2}$ and under-repair periods $U_i, U_{i+1}$, with "fails" and "repairs" events marked]

  12. Availability
     • Instantaneous availability
       • $A(t) = \Pr[\text{the component is functioning correctly at time } t]$
     • Steady-state availability (the general meaning of "availability")
       • $A = \lim_{t \to \infty} A(t)$
     [Figure: fail/repair timeline along t with operational periods $X_i, X_{i+1}, X_{i+2}$ and repair periods $U_i, U_{i+1}$]

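  13. Availability
     • When $X_i$ and $U_i$ are exponentially distributed
       • $F_{X_i}(t) = 1 - e^{-\lambda t}$, $F_{U_i}(t) = 1 - e^{-\mu t}$
       • Instantaneous availability: $A(t) = \dfrac{\mu + \lambda e^{-(\lambda+\mu)t}}{\mu + \lambda}$
       • Steady-state availability: $A = \lim_{t \to \infty} A(t) = \dfrac{\mu}{\mu + \lambda}$
     [Figure: fail/repair timeline along t with operational periods $X_i, X_{i+1}, X_{i+2}$ and repair periods $U_i, U_{i+1}$]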
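
The instantaneous availability formula can be tabulated to see how quickly it settles to its steady-state value. In this sketch the failure rate and repair rate are hypothetical values chosen only for illustration (one failure per 100 hours, one repair per 2 hours); they do not come from the slides.

```python
import math


def instantaneous_availability(t, lam, mu):
    """A(t) = (mu + lam * e^{-(lam + mu) * t}) / (mu + lam)."""
    return (mu + lam * math.exp(-(lam + mu) * t)) / (mu + lam)


def steady_state_availability(lam, mu):
    """A = mu / (mu + lam), the limit of A(t) as t goes to infinity."""
    return mu / (mu + lam)


lam = 0.01  # hypothetical failure rate per hour
mu = 0.5    # hypothetical repair rate per hour

for hours in (0.0, 1.0, 5.0, 20.0, 100.0):
    print(hours, instantaneous_availability(hours, lam, mu))
print("steady state:", steady_state_availability(lam, mu))
```

A(0) is 1 because the system starts in the operational state, and the exponential term decays at rate λ + μ, so A(t) approaches μ/(μ + λ) within a few multiples of 1/(λ + μ).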

  14. MTTR (Mean Time To Repair)
     • MTTR (mean time to repair)
       • MTTR = $E[U_i]$
       • $U_i$: the random variable representing the downtime for the i-th repair or replacement
       • $E[U_i]$: the expected value of $U_i$
     • MTTF (mean time to failure)
       • MTTF = $E[X_i]$
       • $X_i$: the random variable representing the duration of the i-th functioning period
       • $E[X_i]$: the expected value of $X_i$
     • Steady-state availability
       • $A = \dfrac{MTTF}{MTTF + MTTR} = \dfrac{\mu}{\mu + \lambda}$, where $X_i$ and $U_i$ are exponentially distributed with parameters $\lambda$ and $\mu$ respectively, so that MTTF = $1/\lambda$ and MTTR = $1/\mu$
     [Figure: fail/repair timeline along t with operational periods $X_i, X_{i+1}, X_{i+2}$ and repair periods $U_i, U_{i+1}$]
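
The steady-state formula can be compared against a simple simulation of the alternating fail/repair process. This is a rough sketch, not the author's method: the MTTF and MTTR values (1000 hours and 10 hours) are hypothetical, and the number of cycles and the seed are arbitrary.

```python
import random


def simulate_availability(mttf, mttr, cycles=100_000, seed=1):
    """Fraction of time spent operational over many up/down cycles.

    Up times X_i are exponential with rate 1/mttf and
    down times U_i are exponential with rate 1/mttr.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    up_time = 0.0
    down_time = 0.0
    for _ in range(cycles):
        up_time += rng.expovariate(1.0 / mttf)    # operational period X_i
        down_time += rng.expovariate(1.0 / mttr)  # repair period U_i
    return up_time / (up_time + down_time)


mttf, mttr = 1000.0, 10.0          # hypothetical values, in hours
analytic = mttf / (mttf + mttr)    # A = MTTF / (MTTF + MTTR)
estimate = simulate_availability(mttf, mttr)
print(analytic, estimate)
```

With enough cycles the simulated fraction of uptime lands close to MTTF / (MTTF + MTTR); the residual gap is sampling noise and shrinks as the number of cycles grows.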
