Probability Distributions Continued Module 2b

Hypergeometric Distribution
Sampling Without Replacement: Consider a batch of 20 microwave modules, of which 3 are defective. We sample and test one module first, and find it is defective.
• on this first draw, there was a 3/20 chance of obtaining a defect
We sample a second time, without replacing the module:
• this time, there is a 2/19 chance of obtaining a defect
• the outcome of the second trial is no longer independent of the first trial
• the probability of success/failure changes with each trial
K. McAuley

Hypergeometric Distribution
Let’s figure out how to solve problems where trials aren’t independent. We’ll use a counting approach instead of looking at the probability of a success on each trial.
• Suppose we have a total of 20 objects in the batch, of which 3 are defective (so 17 are good)
• We take out 8 objects, all at once, and we want to know the probability that 2 are defective (and 6 are not)
• How many ways could we select 8 objects from the batch of 20?
• How many possible ways could we select 2 defective items when there are 3 in the batch?
• How many ways could we select 6 good items when there are 17 good items in the batch?

Hypergeometric Distribution
Let’s do it in general:
• Suppose we have a total of N objects in the batch, of which d are defective
• We randomly take out n objects, and we want to know the probability of x of them being defective
• There are $\binom{N}{n}$ ways of taking the sample of n objects
• There are $\binom{d}{x}$ ways of selecting the x defective objects
• There are $\binom{N-d}{n-x}$ ways of selecting the n-x good objects
• Every sample of n objects is equally likely, so $P(X = x) = \dfrac{\binom{d}{x}\binom{N-d}{n-x}}{\binom{N}{n}}$
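
The counting argument above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name hypergeom_pmf is a hypothetical helper, and the numbers are those of the 20-module batch (3 defective, sample of 8, 2 defective in the sample).

```python
from math import comb

def hypergeom_pmf(x, N, d, n):
    """P(exactly x defectives in a sample of n drawn without replacement
    from a batch of N objects that contains d defectives).
    Hypothetical helper name, used here for illustration only."""
    # Impossible outcomes: more defectives or more good items than exist.
    if x < 0 or x > d or n - x < 0 or n - x > N - d:
        return 0.0
    ways_defective = comb(d, x)          # choose x of the d defectives
    ways_good = comb(N - d, n - x)       # choose n-x of the N-d good items
    ways_total = comb(N, n)              # choose any n of the N objects
    return ways_defective * ways_good / ways_total

# Batch of 20 with 3 defective; sample of 8; probability that 2 are defective.
p = hypergeom_pmf(2, N=20, d=3, n=8)
print(f"P(2 defective in a sample of 8) = {p:.4f}")   # 0.2947
```

The three comb calls correspond one-to-one to the three counting questions: ways to pick the defectives, ways to pick the good items, and ways to pick the whole sample.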

Hypergeometric Distribution Example
• Given a batch of 200 dashboard components, of which 10% are typically defective (take N = 200 and d = 20)
• We take a sample of n = 10 components and test without replacement
• What is the probability of 3 defective components?
• What is the probability of finding 0 defective components?
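
One way to work this example is sketched below. It assumes the batch contains exactly 20 defective components (10% of 200), which the slide states only as a typical figure; the helper name p_defective is hypothetical.

```python
from math import comb

# Assumed values: N = batch size, d = defectives (exactly 10% of the batch),
# n = sample size.
N, d, n = 200, 20, 10

def p_defective(x):
    """Hypergeometric probability of x defectives in the sample."""
    return comb(d, x) * comb(N - d, n - x) / comb(N, n)

p3 = p_defective(3)
p0 = p_defective(0)
print(f"P(3 defective) = {p3:.4f}")
print(f"P(0 defective) = {p0:.4f}")
```

Because the sample (10) is small compared with the batch (200), these values are close to what a Binomial model with p = 0.1 would give, but the hypergeometric form is the one that respects sampling without replacement.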

Poisson Distribution
Example: Consider a 100 km section of Highway 401, in which car accidents occur randomly and independently. The average number of accidents in the 100 km section (per month) is 15. Let’s make some predictions about what might happen next month.
What is the probability of
a) 0 accidents occurring
b) 10 accidents occurring

Poisson Distribution
• Used when considering discrete occurrences in a continuous interval
  • e.g., number of breakages in 500 m of yarn
  • e.g., number of times a photocopier will jam during 1 year
• Derived from a Binomial distribution in which the number of trials is very large
• To use the Poisson Distribution, we must be willing to assume that occurrences in different segments of the interval are independent. Why?

Poisson Distribution
• Consider the time or space interval of interest (the overall interval) and divide it into small sub-intervals
• Assume that what happens in each sub-interval is a Bernoulli trial, with a probability p of success.
• To get the Poisson Distribution, make the size of the sub-intervals very small. Why?
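
Poisson Distribution
• Remember the Binomial distribution: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
• If we take the limit as the size of the sub-intervals $\to 0$ and the number of sub-intervals $n \to \infty$, but keep the average number of successes $\lambda = np$ constant for the total interval, we get the Poisson Distribution: $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, for $x = 0, 1, 2, \ldots$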

Poisson Distribution
• We can also define $\nu$, the average number of occurrences per unit time (or length), so that $\lambda = \nu t$, where $t$ is the length of the interval of interest

Poisson Distribution
Example: Consider a 100 km section of Highway 401, in which car accidents occur randomly and independently. The average number of accidents in the 100 km section (per month) is 15. Let’s make some predictions about what might happen next month.
What is the probability of
a) 0 accidents occurring
b) 10 accidents occurring
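
Poisson Distribution - Example
$\lambda = 15$ occurrences on average, over the interval of interest (or $\nu = 15$ occurrences per month and $t = 1$ month)
a) $P(X = 0) = \dfrac{e^{-15}\,15^{0}}{0!} = e^{-15} \approx 3.1 \times 10^{-7}$
b) $P(X = 10) = \dfrac{e^{-15}\,15^{10}}{10!} \approx 0.049$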
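
The highway example can be evaluated directly from the Poisson probability function. This is a minimal sketch using only the standard library; poisson_pmf is a hypothetical helper name and lam stands for the average number of occurrences in the interval (15 accidents per month).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam.
    Hypothetical helper name, for illustration only."""
    return exp(-lam) * lam**x / factorial(x)

lam = 15  # average number of accidents per month on the 100 km section

p0 = poisson_pmf(0, lam)
p10 = poisson_pmf(10, lam)
print(f"P(0 accidents)  = {p0:.3e}")
print(f"P(10 accidents) = {p10:.4f}")
```

A month with no accidents is extremely unlikely when the average is 15, while 10 accidents in a month has a probability of roughly 5%.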

Poisson Distribution
Mean: $E[X] = \lambda$
• we identified $\lambda$ as the average number of occurrences in the interval, so this makes sense.
Variance: $\mathrm{Var}[X] = \lambda$
• What do we think about this?

Poisson Distribution
Additional Notes:
• The Poisson distribution can be used to approximate the Binomial distribution when the number of independent trials is very large and p is small
  • use $\lambda = np$
• Why is the approximation helpful?
  • if n = 1000 trials, we must calculate 1000!, which is very, very large.
• e.g., for n > 20, p < 0.05, the approximation is good
• for n > 100, p < 0.01, the approximation is even better
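
To see how close the approximation is, the sketch below compares Binomial and Poisson probabilities side by side. The values n = 100 and p = 0.01 are assumed purely for illustration (they sit at the edge of the "even better" rule of thumb); they do not come from the slides.

```python
from math import comb, exp, factorial

# Illustrative values only (assumptions, not from the slides).
n, p = 100, 0.01
lam = n * p  # Poisson parameter matched to the Binomial mean

def binomial_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

max_diff = 0.0
for x in range(6):
    b, q = binomial_pmf(x), poisson_pmf(x)
    max_diff = max(max_diff, abs(b - q))
    print(f"x = {x}: Binomial = {b:.4f}, Poisson = {q:.4f}")

print(f"largest absolute difference = {max_diff:.4f}")
```

Python integers have arbitrary precision, so comb(1000, x) poses no difficulty here; the factorial concern on the slide applies to hand calculation and to fixed-precision calculators.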

Continuous Random Variables
Outcomes are values along the real number line.
Examples: Temperature or pressure measurements reported to a large number of decimals
Problem: There are infinitely many possible values of X, so the probability of obtaining any particular value is vanishingly small.
We need to think about the probability that X will be in a particular interval when defining probability distributions for continuous variables.

Probability Density Functions
Consider a probability density function $f_X(x)$
We get probabilities using areas under this curve
• $f_X(x)$ is like a “continuous histogram” with the total area under the curve equal to 1.
• Integrate $f_X(x)$ between particular values of x to get the probability that X will be in the range of interest: $P(a < X < b) = \int_a^b f_X(x)\,dx$

Probability Density Function Example
• Normal probability density function: the familiar “bell-shaped” curve
What is the probability that 1.0 < X < 2.5?
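
The "area under the curve" idea can be sketched numerically. The slide's figure is not available, so the parameters below (mean 1, standard deviation 1) are an assumption chosen for illustration; the trapezoid rule stands in for reading the area off the plot, and statistics.NormalDist provides a cross-check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

# Assumed parameters for illustration; the slide's own curve is not recoverable.
mu, sigma = 1.0, 1.0

def normal_pdf(x):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def area(f, a, b, steps=1000):
    """Trapezoid-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

p_numeric = area(normal_pdf, 1.0, 2.5)

# Cross-check using the cumulative distribution function.
dist = NormalDist(mu, sigma)
p_cdf = dist.cdf(2.5) - dist.cdf(1.0)

print(f"P(1.0 < X < 2.5) by integration = {p_numeric:.4f}")
print(f"P(1.0 < X < 2.5) from the CDF   = {p_cdf:.4f}")
```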

Cumulative Distribution Function
What is P(X < x)?
• e.g., P(Temperature < 350 °C)
Cumulative Distribution Function: $F_X(x) = P(X < x) = \int_{-\infty}^{x} f_X(u)\,du$

Expected Value (or Mean)
We can also define the expected value operation in a manner analogous to the discrete case
• Use an integral instead of a summation: $\mu = E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$
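
Variance
… is the expected squared deviation from the mean: $\sigma^2 = \mathrm{Var}[X] = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2 f_X(x)\,dx$
Standard deviation is the square root of variance: $\sigma = \sqrt{\mathrm{Var}[X]}$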

Expected Values
Just like the discrete case, we can find the expected value for any function of the random variable, if we know the probability density function: $E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$

Important continuous distributions
We will learn about these now
• Uniform distribution
• Exponential distribution
• Normal distribution
and these ones later when we need them
• Student’s t-distribution
• Chi-squared distribution
• F-distribution

Uniform Distribution
• We have values that occur in an interval
  • e.g., composition is between 2.5 and 3.5 g/L
• and the probability is equal (uniform) across the interval, but zero elsewhere
[Figure: rectangular density $f_X(x)$ plotted against x, non-zero between a and b]

Uniform Distribution
What is the probability density function?
• Area of the rectangle must equal 1. Why?
• height = 1/(b-a), so $f_X(x) = \dfrac{1}{b-a}$ for $a \le x \le b$ and $f_X(x) = 0$ elsewhere

Uniform Distribution
Mean: $E[X] = \dfrac{a+b}{2}$
• Does this match our intuition?
Variance: $\mathrm{Var}[X] = \dfrac{(b-a)^2}{12}$
• How could we prove this?
• What happens to the variance as b and a get further apart?
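
The mean and variance of a uniform distribution can be checked by evaluating the defining integrals numerically. This sketch reuses the composition example (a = 2.5, b = 3.5 g/L); the midpoint-rule helper integrate is a hypothetical name, and the proof the slide asks about would carry out the same integrals analytically.

```python
# Limits taken from the composition example (g/L).
a, b = 2.5, 3.5

def pdf(x):
    """Uniform density: constant height 1/(b - a) on [a, b]."""
    return 1.0 / (b - a)

def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule estimate of the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h

mean = integrate(lambda x: x * pdf(x), a, b)
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), a, b)

print(f"mean     = {mean:.4f}   formula (a+b)/2     = {(a + b) / 2:.4f}")
print(f"variance = {variance:.4f}   formula (b-a)^2/12 = {(b - a) ** 2 / 12:.4f}")
```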

Uniform Distribution
When would we use a uniform distribution?
Examples
• readout from a pressure gauge
  • if we are provided only with the pressure in Pa to the nearest integer, the true pressure could be anywhere from 0.5 Pa below the reading to 0.5 Pa above
  • in the absence of any additional information, we assume that values are distributed uniformly between these two limits
• another example: numerical truncation and round-off in computations

Normal Distribution
• One of the most important distributions. Why?
[Figure: Normal probability density function with $\mu = 1$ and $\sigma^2 = 1$]

Normal Distribution
• is symmetric
• centre is at the mean
• variance and standard deviation are measures of the width of the distribution
Cumulative distribution function: $F_X(x) = \int_{-\infty}^{x} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right) du$
• Unfortunately, this integral has no analytical solution, so we rely on numerical integration results in tables
• values in tables for $\mu = 0$ and $\sigma = 1$ are in the Appendix of the text.

Standard Normal Distribution
Problem
• We don’t have a table for each possible mean and standard deviation
Solution
• Apply a transformation and use standard normal distribution tables
If X is the original normally distributed random variable with mean $\mu_X$ and standard deviation $\sigma_X$, then $Z = \dfrac{X - \mu_X}{\sigma_X}$ has a mean of zero and a standard deviation of 1.
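
Standard Normal Distribution
• mean of Z: $E[Z] = E\!\left[\dfrac{X - \mu_X}{\sigma_X}\right] = \dfrac{E[X] - \mu_X}{\sigma_X} = 0$
• variance of Z: $\mathrm{Var}[Z] = \mathrm{Var}\!\left[\dfrac{X - \mu_X}{\sigma_X}\right] = \dfrac{\mathrm{Var}[X]}{\sigma_X^2} = 1$
What rules about expectations were used to show this? Which things are random and which aren’t?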

Using the Standard Normal Tables
• What is P(Z < 1.96)?
• What is P(Z < -1.96)?
• What is P(-1.96 < Z < 1.96)?
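
The table lookups above can be reproduced with Python's standard library. This is a sketch that uses statistics.NormalDist (Python 3.8 or later) in place of the printed tables in the text's Appendix.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard Normal distribution

p_below = Z.cdf(1.96)                    # P(Z < 1.96)
p_below_neg = Z.cdf(-1.96)               # P(Z < -1.96)
p_between = Z.cdf(1.96) - Z.cdf(-1.96)   # P(-1.96 < Z < 1.96)

print(f"P(Z < 1.96)         = {p_below:.4f}")      # 0.9750
print(f"P(Z < -1.96)        = {p_below_neg:.4f}")  # 0.0250
print(f"P(-1.96 < Z < 1.96) = {p_between:.4f}")    # 0.9500
```

By symmetry, P(Z < -1.96) = 1 - P(Z < 1.96), which is how tables that list only positive z values are used for negative ones.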

Central Limit Theorem
• Why is the Normal distribution so important?
• Because the sum or average of N random variables approximately follows a Normal distribution if N is a large number
• Imagine N independent random variables, each having the same distribution (any type of distribution at all) with mean $\mu$ and variance $\sigma^2$; then the average $\bar{X}$ becomes normally distributed as N becomes large: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$, where Z follows the standard Normal distribution
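
A small simulation can illustrate the theorem. The sketch below averages N uniform random numbers, standardizes each average with the formula for Z, and checks that the results behave like a standard Normal variable. The choices of N = 30, 5000 repetitions, the uniform(0, 1) parent distribution, and the random seed are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(1)   # fixed seed so the run is repeatable (arbitrary choice)

N = 30           # number of variables averaged (assumed value)
trials = 5000    # number of simulated averages (assumed value)

# Parent distribution: uniform on (0, 1), with mean 1/2 and variance 1/12.
mu = 0.5
sigma = (1.0 / 12.0) ** 0.5

z_values = []
for _ in range(trials):
    xbar = mean(random.random() for _ in range(N))
    z_values.append((xbar - mu) / (sigma / N**0.5))

# For a standard Normal Z: mean 0, standard deviation 1, P(|Z| < 1.96) = 0.95.
frac_inside = sum(abs(z) < 1.96 for z in z_values) / trials

print(f"mean of Z              = {mean(z_values):.3f}")
print(f"standard deviation     = {pstdev(z_values):.3f}")
print(f"fraction with |Z|<1.96 = {frac_inside:.3f}")
```

The parent distribution here is flat, not bell-shaped, yet the standardized averages already sit close to the standard Normal values.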

Central Limit Theorem
• In many instances, the Normal distribution provides a reasonable approximation for quantities that are a sum or average of independent random variables
• Course marks are sometimes normally distributed. Why? Why not?
• Repeated measurements of the same variable often tend to a Normal distribution. Why?

New Topic: Failures in Time
Example problem:
• We have an important pump on a recycle line
• The packing fails on average 0.6 times/year
• What is the probability of the pump packing failing before 1 year?
• We could also say, what is the probability that the “time to failure” is less than 1 year?

Exponential Distribution
Assume that events occur in time at an average rate of $\nu$ occurrences per unit time
What is the probability that the first occurrence of the event happens before time “t”?
Approach:
• Similar to a Poisson problem, but what is different?
• P(event occurs before a given time) = 1 - P(event doesn’t occur during the entire time interval)
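
Exponential Distribution
• Event doesn’t occur in a given time means 0 occurrences
• Poisson, with $\lambda = \nu t$ expected occurrences in a time interval of length t: $P(0 \text{ occurrences}) = \dfrac{e^{-\nu t}(\nu t)^0}{0!} = e^{-\nu t}$
• P(event occurs before this time) = $1 - e^{-\nu t}$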

Exponential Distribution
Denote continuous random variable X as the time to occurrence.
Cumulative distribution function: $F_X(x) = P(X < x) = 1 - e^{-\nu x}$ for $x \ge 0$
• Probability density function is $f_X(x) = \nu e^{-\nu x}$ for $x \ge 0$. Why?
• Sometimes we know $\nu$, the average number of failures per unit time, and sometimes we know the mean time to failure: $E[X] = 1/\nu$

Pump Failure Problem
• If the packing fails on average 0.6 times / year, what is the chance of failure within the first year?
• P(pump fails within year) = $1 - e^{-0.6 \times 1} = 1 - 0.549 = 0.451$
• 45% chance of failure within year
• We derived the Exponential distribution from the Poisson Distribution, which came from the Binomial Distribution. What troubling assumptions are we making?
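
The pump calculation is a one-line evaluation of the exponential cumulative distribution function. In this sketch the names rate, t, and exponential_cdf are hypothetical; rate is the average number of failures per year given in the problem.

```python
from math import exp

def exponential_cdf(t, rate):
    """P(time to first occurrence < t) when events occur at a constant
    average rate (occurrences per unit time). Hypothetical helper name."""
    return 1.0 - exp(-rate * t)

rate = 0.6   # packing failures per year (from the problem statement)
t = 1.0      # years

p_fail = exponential_cdf(t, rate)
print(f"P(pump packing fails within 1 year) = {p_fail:.4f}")   # 0.4512
```

Note that a rate of 0.6 failures per year does not mean a 60% chance of failure in the first year; the chance is about 45% because some years would see more than one failure.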

Exponential Distribution - Notes
• The time to failure is a continuous random variable
• We assume that the expected failure rate is constant, and that it doesn’t increase as equipment wears out
• We assume that failures are independent, and that each time increment is an independent trial
• mean and variance: $E[X] = \dfrac{1}{\nu}$, $\mathrm{Var}[X] = \dfrac{1}{\nu^2}$, where $\nu$ is the average number of failures per unit time

Exponential Distribution
Problem Variations:
• given mean time to failure, determine the probability that time to failure is less than a given value
• given fraction of components failing in a specified time, what is the probability that time to failure is less than a given value?
• what is the probability that a component lasts at least a given time?
Let’s make one up and do it.
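
One made-up variation is sketched below. All numbers are hypothetical and chosen only for illustration: suppose 10% of components fail within the first 1000 hours. The rate is recovered by inverting the cumulative distribution function, and is then used to answer the other two kinds of question.

```python
from math import exp, log

# Hypothetical data for illustration: 10% of components fail within 1000 h.
frac_failed = 0.10
t_known = 1000.0   # hours

# Invert F(t) = 1 - exp(-rate * t) to recover the failure rate.
rate = -log(1.0 - frac_failed) / t_known   # failures per hour
mean_time_to_failure = 1.0 / rate          # hours

# Probability that the time to failure is less than 2000 h.
p_fail_before_2000 = 1.0 - exp(-rate * 2000.0)

# Probability that a component lasts at least 500 h.
p_lasts_500 = exp(-rate * 500.0)

print(f"rate                 = {rate:.3e} failures per hour")
print(f"mean time to failure = {mean_time_to_failure:.0f} h")
print(f"P(X < 2000 h)        = {p_fail_before_2000:.4f}")
print(f"P(X >= 500 h)        = {p_lasts_500:.4f}")
```

If the mean time to failure is given instead, the same code applies with rate set to its reciprocal.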

Exponential Distribution
Example: If a component has operated for 100 hours already, what is the probability that it will operate for at least 200 hours altogether before failing, i.e., P(X > 200 | X > 100) = ?
• Let A = {X > 100}, B = {X > 200}
• Remember conditional probability: $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
• for our events, we have $A \cap B = B$. Why?
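
Exponential Distribution
With $\nu$ the average number of failures per hour: $P(X > 200 \mid X > 100) = \dfrac{P(X > 200)}{P(X > 100)} = \dfrac{e^{-200\nu}}{e^{-100\nu}} = e^{-100\nu} = P(X > 100)$
What does this tell us?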

Exponential Distribution
Memoryless property of Exponential Distribution
• The probability of the component lasting for another 100 hours, given that it has functioned for 100 hours, is simply the probability of it lasting 100 hours
• Prior history has no influence on the probability of failure when the exponential distribution is used
• Is this how life works?
• When are we justified in using the exponential distribution and when should we avoid it?
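
The memoryless property can be confirmed numerically for any failure rate. The rate used below (0.005 failures per hour) is an arbitrary assumption for illustration; the conclusion does not depend on it.

```python
from math import exp, isclose

rate = 0.005   # failures per hour (arbitrary illustrative value)

def survival(t):
    """P(X > t): probability the component is still working at time t."""
    return exp(-rate * t)

# Conditional probability P(X > 200 | X > 100) = P(X > 200) / P(X > 100),
# because the event {X > 200} is contained in {X > 100}.
p_conditional = survival(200.0) / survival(100.0)
p_fresh = survival(100.0)   # a new component lasting 100 hours

print(f"P(X > 200 | X > 100) = {p_conditional:.4f}")
print(f"P(X > 100)           = {p_fresh:.4f}")
print("memoryless:", isclose(p_conditional, p_fresh))
```

A component subject to wear would show a conditional probability lower than the fresh-component value, which is one sign that the exponential model is not appropriate for it.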