Continuous Random Variables

Probability and Statistics with Reliability, Queuing and Computer Science Applications: Chapter 3 Continuous Random Variables

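Definitions • Distribution function: $F_X(x) = P(X \le x)$, $-\infty < x < \infty$ • If $F_X(x)$ is a continuous function of x, then X is a continuous random variable. • $F_X(x)$ a step (staircase) function of x → discrete rv’s • $F_X(x)$ piecewise continuous (continuous except for jumps) → mixed rv’s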

Definitions (Continued) • Equivalence: • CDF (cumulative distribution function) • PDF (probability distribution function; distinct from the lowercase pdf, which denotes the density) • Distribution function • $F_X(x)$ or $F_X(t)$ or $F(t)$

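Probability Density Function (pdf) • X : continuous rv, then, $f(x) = \frac{dF(x)}{dx}$ • pdf properties: (1) $f(x) \ge 0$ for all x; (2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$; consequently $P(a < X \le b) = \int_a^b f(x)\,dx$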

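Definitions (Continued) • Equivalence: pdf • probability density function • density function • density • $f(t) = \frac{dF(t)}{dt}$, so that $F(t) = \int_{-\infty}^{t} f(x)\,dx$, which reduces to $\int_0^t f(x)\,dx$ for a non-negative random variable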

Exponential Distribution • Arises commonly in reliability & queuing theory. • A non-negative random variable • It exhibits memoryless (Markov) property. • Related to (the discrete) Poisson distribution • Interarrival time between two IP packets (or voice calls) • Time to failure, time to repair etc. • Mathematically (CDF and pdf, respectively): $F(t) = 1 - e^{-\lambda t}$ and $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$ (both are 0 for $t < 0$)

[Figure: CDF F(t) of an exponentially distributed random variable with λ = 0.0001, plotted for t up to 50000]

[Figure: exponential density function (pdf), f(t) versus t]

Memoryless property • Assume X > t. We have observed that the component has not failed until time t. • Let Y = X - t , the remaining (residual) lifetime • The distribution of the remaining life, Y, does not depend on how long the component has been operating. Distribution of Y is identical to that of X.

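Memoryless property (Continued) • Let $G_t(y)$ be the conditional distribution of the residual lifetime Y = X - t, given X > t. • For X distributed as EXP(λ): $G_t(y) = P(Y \le y \mid X > t) = \frac{P(t < X \le t + y)}{P(X > t)} = \frac{F(t+y) - F(t)}{1 - F(t)} = \frac{e^{-\lambda t} - e^{-\lambda (t+y)}}{e^{-\lambda t}} = 1 - e^{-\lambda y}$, $y \ge 0$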

Memoryless property (Continued) • Thus $G_t(y)$, the conditional distribution of Y given X > t, is independent of t and is identical to the original exponential distribution of X. • The distribution of the remaining life does not depend on how long the component has been operating. • Its eventual breakdown is the result of some suddenly appearing failure, not of gradual deterioration.
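The memoryless property can be checked numerically. The sketch below assumes the exponential survival function R(t) = exp(-λt); the function names and the values of λ, t and y are illustrative choices, not taken from the slides.

```python
import math

def exp_survival(t, lam):
    """R(t) = P(X > t) for X ~ EXP(lam)."""
    return math.exp(-lam * t)

def residual_survival(y, t, lam):
    """P(X > t + y | X > t): survival function of the residual life Y = X - t."""
    return exp_survival(t + y, lam) / exp_survival(t, lam)

lam = 0.0001   # failure rate (illustrative value)
y = 5000.0     # additional operating time of interest
for t in (0.0, 10000.0, 40000.0):
    # The conditional survival probability is the same for every age t
    # and equals the unconditional P(X > y).
    print(t, residual_survival(y, t, lam), exp_survival(y, lam))
```

Whatever age t is chosen, the two printed probabilities agree up to floating-point rounding, which is the statement that Y has the same distribution as X.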

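Reliability as a Function of Time • Reliability R(t): probability that failure occurs after time t. Let X be the lifetime of a component subject to failures; then $R(t) = P(X > t) = 1 - F(t)$. • Let N0: total no. of components (fixed); Ns(t): surviving ones; Nf(t): failed ones by time t, so that N0 = Ns(t) + Nf(t) and R(t) can be estimated by the surviving fraction Ns(t)/N0.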
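As a small illustration of the counting view of reliability, the following sketch estimates R(t) as the surviving fraction Ns(t)/N0 from a list of observed lifetimes. The lifetimes and the helper name `empirical_reliability` are hypothetical.

```python
def empirical_reliability(lifetimes, t):
    """Estimate R(t) = P(X > t) as Ns(t) / N0 from observed lifetimes."""
    n0 = len(lifetimes)                          # N0: total number of components
    ns = sum(1 for x in lifetimes if x > t)      # Ns(t): components still working at t
    return ns / n0

# Hypothetical observed times to failure (hours) for N0 = 5 components.
lifetimes = [5.0, 12.0, 7.0, 20.0, 3.0]
for t in (0.0, 6.0, 15.0, 25.0):
    print(t, empirical_reliability(lifetimes, t))
```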

Definitions (Continued) • Equivalence: • Reliability • Complementary distribution function • Survivor function • R(t) = 1 - F(t)

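Failure Rate or Hazard Rate • Instantaneous failure rate: h(t) (#failures/10k hrs), defined as $h(t) = \frac{f(t)}{R(t)}$ • Let the rv X be EXP(λ). Then, $h(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$ (constant failure rate) • Using simple calculus the following applies to any (non-negative) rv, $R(t) = \exp\left(-\int_0^t h(x)\,dx\right)$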

Hazard Rate and the pdf • h(t) Δt = conditional probability that the system will fail in (t, t + Δt) given that it has survived until time t • f(t) Δt = unconditional probability that the system will fail in (t, t + Δt) • Difference between: • probability that someone will die between 90 and 91, given that he lives to 90 • probability that someone will die between 90 and 91
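The distinction between the two probabilities can be made concrete with a short calculation. The sketch below computes both interval probabilities exactly from a CDF and compares them with h(t)Δt and f(t)Δt; the exponential lifetime with λ = 0.0001 and the helper name are assumptions chosen for illustration.

```python
import math

def interval_failure_probs(cdf, t, dt):
    """Return (conditional, unconditional) probability of failure in (t, t + dt]."""
    uncond = cdf(t + dt) - cdf(t)        # P(t < X <= t + dt)
    cond = uncond / (1.0 - cdf(t))       # P(t < X <= t + dt | X > t)
    return cond, uncond

lam = 0.0001                              # illustrative failure rate

def exp_cdf(t):
    return 1.0 - math.exp(-lam * t)

t, dt = 20000.0, 1.0
cond, uncond = interval_failure_probs(exp_cdf, t, dt)
print(cond, lam * dt)                         # compare with h(t) * dt
print(uncond, lam * math.exp(-lam * t) * dt)  # compare with f(t) * dt
```

For small Δt the conditional probability is close to h(t)Δt and the unconditional one is close to f(t)Δt; the latter is smaller because the component must first survive to time t.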

Weibull Distribution • Frequently used to model fatigue failure, ball bearing failure etc. (very long tails) • Reliability: $R(t) = e^{-\lambda t^{\alpha}}$, $t \ge 0$, with hazard rate $h(t) = \lambda \alpha t^{\alpha - 1}$ • Weibull distribution is capable of modeling DFR (α < 1), CFR (α = 1) and IFR (α > 1) behavior. • α is called the shape parameter and λ is the scale parameter
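The three failure-rate regimes can be seen by evaluating the hazard rate for different shape parameters. This is a minimal sketch assuming the parameterization R(t) = exp(-λ t^α), for which h(t) = λ α t^(α-1); the function name and sample values are illustrative.

```python
def weibull_hazard(t, lam, alpha):
    """h(t) = lam * alpha * t**(alpha - 1), assuming R(t) = exp(-lam * t**alpha)."""
    return lam * alpha * t ** (alpha - 1.0)

times = (0.5, 1.0, 2.0)
for alpha in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, 1.0, alpha) for t in times]
    # alpha < 1: decreasing (DFR); alpha = 1: constant (CFR); alpha > 1: increasing (IFR)
    print(alpha, rates)
```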

[Figure: failure rate of the Weibull distribution for various values of α with λ = 1]

Infant Mortality Effects in System Modeling • Bathtub curves • Early-life period • Steady-state period • Wear out period • Failure rate models

Bathtub Curve • Until now we assumed that the failure rate of equipment is time (age) independent. • In real life, variation following the bathtub shape has been observed. [Figure: bathtub curve, failure rate λ(t) versus operating time, with regions labeled infant mortality (early-life failures), steady state, and wear out]

Early-life Period • Also called infant mortality phase or reliability growth phase • Caused by undetected hardware/software defects that are being fixed resulting in reliability growth • Can cause significant prediction errors if steady-state failure rates are used • Availability models can be constructed and solved to include this effect • Weibull Model can be used

Steady-state Period • Failure rate much lower than in early-life period • Either constant (age independent) or slowly varying failure rate • Failures caused by environmental shocks • Arrival process of environmental shocks can be assumed to be a Poisson process • Hence time between two shocks has the exponential distribution

Wear out Period • Failure rate increases rapidly with age • Properly qualified electronic hardware does not exhibit wear out failure during its intended service life (Motorola) • Applicable for mechanical and other systems • Weibull Failure Model can be used

Bathtub curve • DFR phase: Initial design, constant bug fixes • CFR phase: Normal operational phase • IFR phase: Aging behavior [Figure: h(t) versus t, with a decreasing failure rate (DFR) burn-in period, a CFR useful-life period, and an increasing failure rate (IFR) wear-out phase]

Failure Rate Models • We use a truncated Weibull Model • Infant mortality phase modeled by DFR Weibull and the steady-state phase by the exponential [Figure: failure-rate multiplier (0 to 7) versus operating time in hours (0 to 17,520)]

Failure Rate Models (cont.) • This model has the form: [equation not preserved] • where the parameters are the steady-state failure rate and the Weibull shape parameter • Failure rate multiplier = [expression not preserved]

Failure Rate Models (cont.) • There are several ways to incorporate time-dependent failure rates in availability models • The easiest way is to approximate a continuous function by a decreasing step function [Figure: failure-rate multiplier (0 to 7) versus operating time in hours (0 to 17,520)]

Failure Rate Models (cont.) • Here the discrete failure-rate model is defined by:

Uniform Random Variable • All (pseudo) random number generators generate random deviates of the U(0,1) distribution; that is, if you generate a large number of random deviates and plot their empirical distribution function, it will approach this distribution in the limit. • U(a,b): the pdf is constant over the (a,b) interval, $f(x) = \frac{1}{b-a}$ for $a < x < b$ (0 elsewhere), and the CDF is the ramp function

Uniform density

Uniform distribution • The distribution function is given by: $F(x) = \begin{cases} 0, & x < a, \\ \frac{x-a}{b-a}, & a \le x < b, \\ 1, & x \ge b. \end{cases}$
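The ramp-shaped CDF of U(a,b), F(x) = (x - a)/(b - a) on [a, b), and the claim that generator output approaches U(0,1) can both be sketched in a few lines. The helper names, the seed and the sample size below are illustrative assumptions; Python's `random.random()` stands in for a generic pseudo-random generator.

```python
import random

def uniform_cdf(x, a, b):
    """CDF of U(a, b): 0 below a, a linear ramp on [a, b), 1 from b onward."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

random.seed(1)                                   # fixed seed for repeatability
samples = [random.random() for _ in range(100_000)]
for x in (0.1, 0.5, 0.9):
    # The empirical distribution should be close to the U(0,1) ramp.
    print(x, empirical_cdf(samples, x), uniform_cdf(x, 0.0, 1.0))
```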

Uniform distribution (Continued)

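HypoExponential • HypoExp: multiple Exp stages in series. • 2-stage HypoExp denoted as HYPO(λ1, λ2). The density, distribution and hazard rate function are (for λ1 ≠ λ2, t ≥ 0): $f(t) = \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)$, $F(t) = 1 - \frac{\lambda_2}{\lambda_2 - \lambda_1} e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_2 - \lambda_1} e^{-\lambda_2 t}$, $h(t) = \frac{\lambda_1 \lambda_2 \left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)}{\lambda_2 e^{-\lambda_1 t} - \lambda_1 e^{-\lambda_2 t}}$ • HypoExp results in IFR: h(t) increases from 0 to min(λ1, λ2) • Disk service time may be modeled as a 3-stage Hypoexponential as the overall time is the sum of the seek, the latency and the transfer time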
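The increasing failure rate of the 2-stage hypoexponential can be checked numerically. The sketch below uses h(t) = f(t)/R(t), which for HYPO(λ1, λ2) with λ1 ≠ λ2 works out to λ1 λ2 (e^(-λ1 t) - e^(-λ2 t)) / (λ2 e^(-λ1 t) - λ1 e^(-λ2 t)); the rates 1 and 5 are illustrative values.

```python
import math

def hypo2_hazard(t, l1, l2):
    """Hazard rate of HYPO(l1, l2) for l1 != l2, computed as f(t) / R(t)."""
    e1 = math.exp(-l1 * t)
    e2 = math.exp(-l2 * t)
    return l1 * l2 * (e1 - e2) / (l2 * e1 - l1 * e2)

l1, l2 = 1.0, 5.0                     # illustrative stage rates
for t in (0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 50.0):
    # h(t) starts at 0 and rises toward min(l1, l2)
    print(t, hypo2_hazard(t, l1, l2))
```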

A simple and useful model of increasing failure rate • [Figure: state diagram with a robust state, a failure probable state and a failed state] • Time to failure: Hypo-exponential distribution • Increasing failure rate → aging • HypoExponential used in software rejuvenation models • Preventive maintenance is useful only if failure rate is increasing

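Erlang Distribution • Special case of HypoExp: All stages have same rate. • [X > t] = [Nt < r] (Nt: no. of stresses applied in (0,t]) and Nt is Poisson (param λt). This interpretation gives, $F(t) = 1 - \sum_{k=0}^{r-1} \frac{(\lambda t)^k}{k!} e^{-\lambda t}$, $t \ge 0$, with density $f(t) = \frac{\lambda^r t^{r-1} e^{-\lambda t}}{(r-1)!}$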

Erlang Distribution • Is used to approximate the deterministic one since if you keep the same mean but increase the number of stages, the pdf approaches the delta function in the limit • Can also be used to approximate the uniform distribution

probability density functions (pdf) • If we vary r keeping r/λ constant, the pdf of the r-stage Erlang approaches an impulse function at r/λ.

cumulative distribution functions (cdf) • And the cdf approaches a step function at r/λ. In other words, the r-stage Erlang can approximate a deterministic variable.
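The approach to a step function can be observed directly from the Erlang CDF. The sketch below uses the Poisson interpretation, F(t) = 1 - Σ_{k=0}^{r-1} e^(-λt)(λt)^k/k!, and keeps the mean r/λ fixed at 1 by setting λ = r; the function name and the evaluation points 0.8 and 1.2 are illustrative.

```python
import math

def erlang_cdf(t, r, lam):
    """CDF of the r-stage Erlang: 1 - P(N_t < r), with N_t ~ Poisson(lam * t)."""
    x = lam * t
    term = math.exp(-x)          # Poisson probability of 0 events
    total = term
    for k in range(1, r):
        term *= x / k            # Poisson probability of k events
        total += term
    return 1.0 - total

for r in (1, 10, 100):
    lam = float(r)               # keeps the mean r / lam equal to 1
    # As r grows, F just below the mean drops toward 0 and just above it rises toward 1.
    print(r, erlang_cdf(0.8, r, lam), erlang_cdf(1.2, r, lam))
```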

Comparison of probability density functions (pdf) T = 1

Comparison of cumulative distribution functions (cdf) T = 1

Gamma Random Variable • Gamma density function is, $f(t) = \frac{\lambda^{\alpha} t^{\alpha-1} e^{-\lambda t}}{\Gamma(\alpha)}$, $t > 0$ • Gamma distribution can capture all three failure modes, viz. DFR, CFR and IFR. • α = 1: CFR • α < 1: DFR • α > 1: IFR • Gamma with λ = ½ and α = n/2 is known as the chi-square random variable with n degrees of freedom

HyperExponential Distribution • Hypo or Erlang → sequential Exp( ) stages. • Alternative (parallel) Exp( ) stages → HyperExponential. • CPU service time may be modeled as HyperExp • In the workload-based software rejuvenation model we found that the sojourn times in many workload states have this distribution

Log-logistic Distribution • Log-logistic can model DFR, CFR and IFR failure behavior within a single curve, unlike previous ones. • For κ > 1, the failure rate first increases with t (IFR); after momentarily leveling off (CFR), it decreases (DFR) with time. This is known as the inverse bathtub shape curve • Used in modeling software reliability growth

Hazard rate comparison

Defective Distribution • If $\lim_{t \to \infty} F(t) = c < 1$, the distribution is said to be defective. • Example: • This defect (also known as the mass at infinity) could represent the probability that the program will not terminate (1-c). Continuous part can model completion time of program. • There can also be a mass at origin.

Pareto Random Variable • Also known as the power law or long-tailed distribution • Found to be useful in modeling • CPU time consumed by a request • Web file sizes • Number of data bytes in FTP bursts • Thinking time of a Web browser

Gaussian (Normal) Distribution • Bell shaped pdf: intuitively pleasing! • Central Limit Theorem: the mean of a large number n of mutually independent rv’s (having arbitrary distributions with finite variance) approaches a Normal distribution as n → ∞ • μ: mean, σ: std. deviation, σ²: variance (N(μ, σ²)) • μ and σ completely describe the statistics. This is significant in statistical estimation/signal processing/communication theory etc.

Normal Distribution (contd.) • N(0,1) is called the normalized (standard) Gaussian. • N(0,1) is symmetric, i.e. • f(x) = f(-x) • F(-z) = 1 - F(z). • Failure rate h(t) follows IFR behavior. • Hence, N(μ, σ²) is suitable for modeling long-term wear or aging related failure phenomena.
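Because the N(0,1) density is symmetric about 0, its CDF satisfies F(-z) = 1 - F(z). The sketch below checks this using the identity F(z) = ½(1 + erf(z/√2)) and Python's `math.erf`; the helper names and sample values are illustrative.

```python
import math

def std_normal_cdf(z):
    """CDF of N(0, 1) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), obtained by standardizing to z = (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)

for z in (0.5, 1.3, 2.0):
    # Symmetry: the lower tail at -z equals the upper tail at z.
    print(z, std_normal_cdf(-z), 1.0 - std_normal_cdf(z))
```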

Functions of Random Variables • Often, rv’s need to be transformed/operated upon. • Y = Φ(X): so, what is the density of Y? • Example: Y = X² • If X is N(0,1), then, $f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}$, $y > 0$ • Above Y is also known as the χ² distribution (with 1 degree of freedom).
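For the example Y = X² with X distributed as N(0,1), the CDF of Y is P(-√y ≤ X ≤ √y) = erf(√(y/2)), and differentiating gives the density e^(-y/2)/√(2πy) for y > 0. The sketch below checks that derivative numerically with a central difference; the helper names and step size are illustrative choices.

```python
import math

def chi2_1_cdf(y):
    """P(X^2 <= y) for X ~ N(0, 1), y > 0."""
    return math.erf(math.sqrt(y / 2.0))

def chi2_1_pdf(y):
    """Density of Y = X^2 (chi-square with 1 degree of freedom), y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

def numeric_derivative(f, y, h=1e-5):
    """Central-difference approximation of f'(y)."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

for y in (0.5, 1.5, 4.0):
    # The numerical derivative of the CDF should match the closed-form density.
    print(y, numeric_derivative(chi2_1_cdf, y), chi2_1_pdf(y))
```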

Functions of RV’s (contd.) • If X is uniformly distributed over (0,1), then $Y = -\lambda^{-1} \ln(1 - X)$ follows the Exp(λ) distribution • Such transformations may be used to generate random variates (or deviates) with desired distributions.

Functions of RV’s (contd.) • Given a monotone differentiable function Φ, the density of Y = Φ(X) is $f_Y(y) = f_X\left(\Phi^{-1}(y)\right) \left| \frac{d\Phi^{-1}(y)}{dy} \right|$ • Above method suggests a way to get the random variates with desired distribution. • Choose Φ to be F. • Since Y = F(X), $F_Y(y) = y$ for 0 ≤ y ≤ 1 and Y is U(0,1). • To generate a random variate X having the desired distribution, generate a U(0,1) random variate y, then transform y to $x = F^{-1}(y)$. • This inversion can be done in closed form, graphically or using a table.
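The inversion recipe can be sketched for the exponential case, where F(x) = 1 - e^(-λx) inverts in closed form to x = -ln(1 - y)/λ. The function names, seed, sample size and λ = 2 below are illustrative assumptions; Python's `random.Random` supplies the U(0,1) variates.

```python
import math
import random

def exp_inverse_cdf(y, lam):
    """x = F^{-1}(y) for F(x) = 1 - exp(-lam * x), with 0 <= y < 1."""
    return -math.log(1.0 - y) / lam

def sample_exponential(n, lam, rng):
    """Draw n Exp(lam) variates by transforming U(0,1) variates."""
    return [exp_inverse_cdf(rng.random(), lam) for _ in range(n)]

rng = random.Random(42)              # fixed seed for repeatability
lam = 2.0
xs = sample_exponential(100_000, lam, rng)

sample_mean = sum(xs) / len(xs)
frac_below = sum(1 for x in xs if x <= 0.5) / len(xs)
print(sample_mean, 1.0 / lam)                       # sample mean vs. theoretical mean
print(frac_below, 1.0 - math.exp(-lam * 0.5))       # empirical vs. theoretical F(0.5)
```

The same pattern applies to any distribution whose CDF can be inverted in closed form; otherwise the inversion has to be done numerically or from a table.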
