Fault-Tolerant Computing Systems #4: Reliability and Availability. Pattara Leelaprute, Computer Engineering Department, Kasetsart University, pattara.l@ku.ac.th
Reliability and Availability • Reliability • The probability that a system survives until time t (it has not failed by time t) • Availability • The probability that a system is working properly at time t
Preliminaries of Probability • Discrete sample space: • Tossing a coin • Sample space {head, tail} • Continuous sample space: • How long a PC stays up after a reboot • Sample space {t | t > 0} • Random variable • A function mapping each element of the sample space to a real number • Ex. head = 1, tail = 0
Preliminaries • Random variable • A function mapping each element of the sample space to a real number • CDF (cumulative distribution function) • $F_X(t) = \Pr[X \le t]$: when X is the time to failure, the probability that the system has gone down by time t • pdf (probability density function) • $f(t) = dF(t)/dt$ • Expected value (mean) • $E[X] = \int_0^\infty t\,f(t)\,dt$ for $X \ge 0$ • The average outcome of the random experiment, also called the mean of the random variable
Exponential Distribution • The most commonly used distribution function in reliability modeling • CDF • $F(t) = 1 - e^{-\lambda t}$ • pdf • $f(t) = \lambda e^{-\lambda t}$ • Mean • $1/\lambda$ • Memoryless property • Let $Y = X - t$ be the remaining life at time t • $G_t(y) = \Pr[Y \le y \mid X > t] = 1 - e^{-\lambda y}$ • The distribution of the remaining life of a component does not depend on how long it has been working • The component does not AGE! • [Plot: $f(t) = 2e^{-2t}$ and $F(t) = 1 - e^{-2t}$, i.e. $\lambda = 2$]
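The memoryless property stated on this slide can be checked in a few lines. The derivation below is a sketch that uses only the slide's own definitions ($F(t) = 1 - e^{-\lambda t}$, $Y = X - t$); the intermediate steps are filled in here and are not part of the original slide.

```latex
\begin{align*}
G_t(y) &= \Pr[\,Y \le y \mid X > t\,]
        = \frac{\Pr[\,t < X \le t + y\,]}{\Pr[\,X > t\,]} \\
       &= \frac{F(t+y) - F(t)}{1 - F(t)}
        = \frac{e^{-\lambda t} - e^{-\lambda (t+y)}}{e^{-\lambda t}} \\
       &= 1 - e^{-\lambda y}
\end{align*}
```

The result does not contain t, so the remaining life has the same exponential distribution as a brand-new component.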
Reliability • Reliability • The probability that a system survives until time t • $R(t) = \Pr[X > t] = 1 - F(t)$ • X: the random variable representing the time to failure of the system (the life of the system) • R(t): the probability that the system survives until time t • Example (exponential distribution, $\lambda = 2$): $F(t) = 1 - e^{-2t}$, $R(t) = e^{-2t}$ • [Figure: timeline marking time 0, time t, and the lifetime X]
Reliability • Reliability • $R(t) = \Pr[X > t] = 1 - F(t)$ • $R(0) = 1$: the system is initially working • $R(\infty) = 0$: no system has an infinite lifetime • [Plot: exponential distribution $F(t) = 1 - e^{-2t}$ and reliability $R(t) = e^{-2t}$]
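A minimal sketch of the reliability function for the exponential case, to make the two boundary conditions concrete. The function name `reliability` and the sample times are illustrative choices, not part of the slides; the rate 2 is the value used in the slide's plot.

```python
import math


def reliability(t: float, lam: float) -> float:
    """R(t) = exp(-lam * t): probability that an exponentially
    distributed lifetime with failure rate lam exceeds t."""
    if t < 0 or lam <= 0:
        raise ValueError("t must be >= 0 and lam must be > 0")
    return math.exp(-lam * t)


if __name__ == "__main__":
    lam = 2.0  # rate used in the slide's plot
    for t in (0.0, 0.5, 1.0, 5.0):
        # R starts at 1 for t = 0 and decays toward 0 as t grows
        print(f"R({t}) = {reliability(t, lam):.6f}")
```

Under this model R(t) never reaches 0 for finite t; it only approaches 0 in the limit.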
Failure Rate • $f(t)\Delta t$ • For small $\Delta t$, the probability that a fault occurs in the interval $[t, t+\Delta t]$ • $f(t)\Delta t / R(t)$ • The probability that a fault occurs in $[t, t+\Delta t]$, given that the system is working properly at time t • Failure rate $= f(t)/R(t)$ • Example (exponential distribution, $\lambda = 2$): $f(t) = 2e^{-2t}$, $F(t) = 1 - e^{-2t}$, $R(t) = e^{-2t}$, so $f(t)/R(t) = 2$ for every t
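The definition above can be tried numerically. The sketch below assumes an exponential lifetime; the helper names (`pdf`, `reliability`, `failure_rate`, `conditional_failure_prob`) are hypothetical and chosen only for this illustration.

```python
import math


def pdf(t: float, lam: float) -> float:
    """f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def reliability(t: float, lam: float) -> float:
    """R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def failure_rate(t: float, lam: float) -> float:
    """Failure rate f(t) / R(t)."""
    return pdf(t, lam) / reliability(t, lam)


def conditional_failure_prob(t: float, dt: float, lam: float) -> float:
    """Exact Pr[t < X <= t + dt | X > t] = (R(t) - R(t + dt)) / R(t)."""
    return (reliability(t, lam) - reliability(t + dt, lam)) / reliability(t, lam)


if __name__ == "__main__":
    lam, dt = 2.0, 1e-4
    for t in (0.0, 1.0, 3.0):
        # Dividing the conditional probability by dt approximates f(t)/R(t)
        approx = conditional_failure_prob(t, dt, lam) / dt
        print(f"t={t}: f/R = {failure_rate(t, lam):.6f}, approx = {approx:.6f}")
```

For the exponential distribution both columns stay close to lam at every t, which is the constant-failure-rate case.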
Bathtub Curve • Failure rate • $f(t)/R(t)$ • Bathtub curve • The general failure-rate pattern observed in empirical data collected from mechanical and electronic components • 1. Initial stage: faults caused by inherent defects and faulty design • 2. Constant failure rate • 3. Last stage: faults caused by age • When the lifetime distribution F(t) of a system is exponential, the failure rate is constant (see previous slide)
MTTF (Mean Time To Failure) • MTTF • $E[X] = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt$ • The expected value of the random variable X, which represents the time until a fault occurs in the system • When $R(t) = e^{-\lambda t}$ (X is exponentially distributed) • Failure rate $= \lambda$ • MTTF $= 1/\lambda$ • [Figure: timeline starting at time 0 with the expected value marked]
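As a sanity check on MTTF = 1/λ, the integral of R(t) can be approximated numerically. This is a rough sketch: the trapezoidal rule, the truncation point `40 / lam`, and the step count are all assumptions made for illustration, and `mttf_numeric` is a hypothetical name.

```python
import math


def mttf_numeric(lam: float, steps: int = 200_000) -> float:
    """Approximate the integral of R(t) = exp(-lam * t) over [0, inf)
    with the trapezoidal rule, truncated at t = 40 / lam (the tail
    beyond that point is on the order of exp(-40) and is ignored)."""
    upper = 40.0 / lam
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-lam * upper))  # endpoints R(0) and R(upper)
    for i in range(1, steps):
        total += math.exp(-lam * i * h)
    return total * h


if __name__ == "__main__":
    lam = 2.0
    print(f"numeric MTTF = {mttf_numeric(lam):.6f}, 1/lam = {1.0 / lam:.6f}")
```

The two printed values should agree to several decimal places; the small difference comes from the discretization, not from the formula.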
Availability • The probability that a system works properly at time t • Availability is a measure frequently used to describe the behavior of a system • If the system has no repair or replacement, availability is equal to reliability R(t) • R(t): the probability that no failure has occurred during the whole period (0, t) • [Figure: timeline over t alternating between operational periods $X_i, X_{i+1}, X_{i+2}$, each ending in a failure, and repair periods $U_i, U_{i+1}$, each ending when the repair completes]
Availability • Instantaneous availability • $A(t) = \Pr[\text{the component is functioning correctly at time } t]$ • Steady-state availability (what "availability" usually means) • $A = \lim_{t \to \infty} A(t)$
Availability • When $X_i$ and $U_i$ are exponentially distributed • $F_{X_i}(t) = 1 - e^{-\lambda t}$, $F_{U_i}(t) = 1 - e^{-\mu t}$ • Instantaneous availability: $A(t) = \dfrac{\mu + \lambda e^{-(\lambda+\mu)t}}{\lambda + \mu}$ • Steady-state availability: $A = \lim_{t \to \infty} A(t) = \dfrac{\mu}{\lambda + \mu}$
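The sketch below evaluates the instantaneous availability formula from this slide and shows it settling to the steady-state value. The rates `lam = 0.001` and `mu = 0.1` (per hour) are hypothetical example values, and the function names are illustrative. Note that the formula gives A(0) = 1, i.e. it assumes the system is up at time 0.

```python
import math


def instantaneous_availability(t: float, lam: float, mu: float) -> float:
    """A(t) = (mu + lam * exp(-(lam + mu) * t)) / (lam + mu),
    with failure rate lam and repair rate mu."""
    return (mu + lam * math.exp(-(lam + mu) * t)) / (lam + mu)


def steady_state_availability(lam: float, mu: float) -> float:
    """A = mu / (lam + mu), the limit of A(t) as t grows."""
    return mu / (lam + mu)


if __name__ == "__main__":
    lam, mu = 0.001, 0.1  # hypothetical rates per hour
    for t in (0.0, 10.0, 50.0, 500.0):
        print(f"A({t}) = {instantaneous_availability(t, lam, mu):.6f}")
    print(f"steady state = {steady_state_availability(lam, mu):.6f}")
```

The transient term decays at rate λ + μ, so with a repair rate much larger than the failure rate, A(t) reaches its steady-state value quickly.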
MTTR (Mean Time To Repair) • MTTR (mean time to repair) • MTTR $= E[U_i]$ • $U_i$: the random variable representing the downtime for the i-th repair or replacement • $E[U_i]$: the expected value of $U_i$ • MTTF (mean time to failure) • MTTF $= E[X_i]$ • $X_i$: the random variable representing the duration of the i-th functioning period • $E[X_i]$: the expected value of $X_i$ • Steady-state availability: $A = \dfrac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \dfrac{\mu}{\lambda + \mu}$ (when $X_i$ and $U_i$ are exponentially distributed with parameters $\lambda$ and $\mu$, so that MTTF $= 1/\lambda$ and MTTR $= 1/\mu$)
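A small worked example of the MTTF/MTTR form of steady-state availability. The figures (MTTF = 1000 hours, MTTR = 10 hours), the 8760-hour year, and the function names are assumptions chosen for illustration; they do not come from the slides.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def steady_state_availability(mttf: float, mttr: float) -> float:
    """A = MTTF / (MTTF + MTTR)."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("mttf must be > 0 and mttr must be >= 0")
    return mttf / (mttf + mttr)


def expected_downtime_hours_per_year(mttf: float, mttr: float) -> float:
    """Long-run fraction of time under repair, scaled to one year."""
    return (1.0 - steady_state_availability(mttf, mttr)) * HOURS_PER_YEAR


if __name__ == "__main__":
    mttf, mttr = 1000.0, 10.0  # hypothetical values, in hours
    a = steady_state_availability(mttf, mttr)

    # Same quantity written with rates: lam = 1 / MTTF, mu = 1 / MTTR
    lam, mu = 1.0 / mttf, 1.0 / mttr
    assert abs(a - mu / (lam + mu)) < 1e-12

    print(f"A = {a:.6f}")
    print(f"downtime = {expected_downtime_hours_per_year(mttf, mttr):.2f} h/year")
```

With these numbers the availability is 1000/1010, roughly 0.99, which corresponds to a little under 87 hours of expected downtime per year.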