STAT 111 Introductory Statistics Lecture 8: More on the Binomial Distribution and Sampling Distributions June 1, 2004
Today’s Topics • More on the binomial distribution • Mean and variance • Sample proportion • Normal approximation of the binomial • Continuity correction • Sampling distribution of sample means • Central Limit Theorem
Recall: The Binomial Setting • There are a fixed number n of trials. • The n trials are all independent. • Each trial has one of two possible outcomes, labeled “success” and “failure.” • The probability of success, p, remains the same for each trial.
Recall: The Binomial Distribution • The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n and p, where • n is the number of trials • p is the probability of a success on any trial • The count X is a discrete random variable; its distribution is typically abbreviated as X ~ B(n, p). • Possible values of X are the whole numbers from 0 to n.
The Binomial Distribution • If X ~ B(n, p), then $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for k = 0, 1, …, n. • Example: Let n = 3.
Developing Binomial Probabilities for n = 3 • [Tree diagram: trials 1, 2, and 3 each branch into S (success, probability p) or F (failure, probability 1 – p), giving eight possible outcomes.] • P(SSS) = $p^3$ • P(SSF) = P(SFS) = P(FSS) = $p^2(1 - p)$ • P(SFF) = P(FSF) = P(FFS) = $p(1 - p)^2$ • P(FFF) = $(1 - p)^3$
Binomial Probabilities for n = 3 • Let X be the number of successes in three trials. • X = 0: P(FFF) = $(1 - p)^3$, so P(X = 0) = $(1 - p)^3$. • X = 1: P(SFF) = P(FSF) = P(FFS) = $p(1 - p)^2$, so P(X = 1) = $3p(1 - p)^2$. • X = 2: P(SSF) = P(SFS) = P(FSS) = $p^2(1 - p)$, so P(X = 2) = $3p^2(1 - p)$. • X = 3: P(SSS) = $p^3$, so P(X = 3) = $p^3$.
Example: Rolling a Die • Roll a die 4 times, let X be the number of times the number 5 appears. • “Success” = get a roll of 5, so P(Success) = 1/6.
Example: Rolling a Die • Find the probability that we get at least 2 rolls of 5.
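The die question can be checked numerically. The following is a minimal sketch, assuming a fair die and independent rolls as in the example; the helper name `binomial_pmf` is illustrative and not part of the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 1 / 6  # four rolls; "success" = rolling a 5

# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_2 = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
print(f"P(X >= 2) = {p_at_least_2:.4f}")  # 171/1296, about 0.1319
```

Working with the complement keeps the sum to two terms instead of three.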
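Expected Value and Variance of a Binomial Random Variable • If X ~ B(n, p), then $E(X) = \mu_X = np$ and $\mathrm{Var}(X) = \sigma_X^2 = np(1 - p)$, so the standard deviation is $\sigma_X = \sqrt{np(1 - p)}$.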
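Set-up for Derivation • Let $X_i$ indicate whether the $i$th trial is a success or a failure: $X_i = 1$ if the $i$th trial is a success and $X_i = 0$ if the $i$th trial is a failure, for $i = 1, 2, \ldots, n$. • $X_1, \ldots, X_n$ are independent and identically distributed with probability distribution $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$. • The count of successes is then $X = X_1 + X_2 + \cdots + X_n$.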
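With this set-up, the mean and variance of a binomial count follow from the rules for sums of independent random variables. The steps below are a sketch of the standard argument, not a transcription of the original slide:

```latex
\begin{align*}
E(X_i) &= 1 \cdot p + 0 \cdot (1 - p) = p, \\
\mathrm{Var}(X_i) &= E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1 - p), \\
X &= X_1 + X_2 + \cdots + X_n, \\
E(X) &= \sum_{i=1}^{n} E(X_i) = np, \\
\mathrm{Var}(X) &= \sum_{i=1}^{n} \mathrm{Var}(X_i) = np(1 - p)
  \quad \text{(variances add because the trials are independent).}
\end{align*}
```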
Binomial Example: Checkout Lanes • A grocery store has 10 checkout lanes. During a busy hour the probability that any given lane is occupied (has at least one customer) is 0.75. Assume that the lanes are occupied or not occupied independently of each other. • What is the probability that a customer will find at least one lane unoccupied? • What is the expected number of occupied lanes? • What is the standard deviation of the number of occupied lanes?
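The three checkout-lane questions reduce to short calculations once the number of occupied lanes is modeled as X ~ B(10, 0.75). A minimal sketch, with variable names chosen here for illustration:

```python
from math import sqrt

n, p = 10, 0.75  # number of lanes; probability a given lane is occupied

# "At least one lane unoccupied" is the complement of "all 10 lanes occupied".
p_some_lane_free = 1 - p**n

expected_occupied = n * p            # E(X) = np
sd_occupied = sqrt(n * p * (1 - p))  # sd(X) = sqrt(np(1 - p))

print(f"P(at least one lane unoccupied) = {p_some_lane_free:.4f}")
print(f"Expected occupied lanes = {expected_occupied}")
print(f"Standard deviation = {sd_occupied:.4f}")
```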
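Sample Proportions • In statistical sampling we often want to estimate the proportion p of “successes” in a population. • The sample proportion is defined as $\hat{p} = X / n$, the count of successes in the sample divided by the sample size. • If the count X is B(n, p), then the mean and standard deviation of the sample proportion are $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$.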
Sample Proportions • Our sample proportion is an unbiased estimator of the population proportion p. • The variability of our estimator decreases as sample size increases. • In particular, because the standard deviation is proportional to $1/\sqrt{n}$, we must multiply the sample size by 4 if we want to cut the standard deviation in half.
Sample Proportions • The histogram of the distribution of the sample proportion when n = 1000, p = 0.6
Normal Approximation for Counts, Proportions • Let X be the number of successes in an SRS of size n from a large population having proportion p of successes, and let the sample proportion of successes be denoted by $\hat{p} = X / n$. • Then for large n, • X is approximately normal with mean np and variance np(1 – p). • $\hat{p}$ is approximately normal with mean p and variance p(1 – p) / n.
Normal Approximation: Rule of Thumb • The accuracy of the approximation generally improves as the sample size n increases. • For any fixed sample size, the approximation is most accurate when p is close to 0.5, and least accurate when p is near 0 or 1. • As a general rule of thumb, then, we use the normal approximation for values of n and p such that np ≥ 10 and n(1 – p) ≥ 10.
Example • The Laurier Company’s brand has a market share of 30%. Suppose that in a survey, 1,000 consumers of the product are asked which brand they prefer. What is the probability that more than 32% of the respondents will say they prefer the Laurier brand?
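The market-share question can be worked with the normal approximation for the sample proportion. This sketch assumes the 1,000 respondents behave like an SRS and applies no continuity correction; it uses `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.30  # sample size; true market share

# Rule of thumb: np = 300 and n(1 - p) = 700 are both at least 10.
sd_p_hat = sqrt(p * (1 - p) / n)
p_hat_dist = NormalDist(mu=p, sigma=sd_p_hat)

# P(p_hat > 0.32) under the normal approximation
prob_more_than_32 = 1 - p_hat_dist.cdf(0.32)
print(f"sd of p_hat = {sd_p_hat:.4f}")
print(f"P(p_hat > 0.32) is approximately {prob_more_than_32:.4f}")
```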
Another Example • A quality engineer selects an SRS of size 100 switches from a large shipment for detailed inspection. Unknown to the engineer, 10% of the switches in the shipment fail to meet the specifications. The actual binomial probability that no more than 9 of the switches in the sample fail inspection is P(X ≤ 9) = .4513. • How accurate is the normal approximation for this probability?
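Another Example (cont.) • Let X be the number of bad switches; then X ~ B(100, 0.1), with mean np = 10 and standard deviation $\sqrt{np(1 - p)} = 3$. • The normal approximation gives $P(X \le 9) \approx P(Z \le (9 - 10)/3) = P(Z \le -0.33) \approx 0.37$, compared with the exact value 0.4513. • It’s not that accurate. Note that np = 10, so n and p are on the border of values for which we are willing to use the approximation.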
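The gap between the exact binomial probability and the uncorrected normal approximation can be computed directly. A minimal sketch using only the standard library:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.1  # switches inspected; proportion failing specifications

# Exact binomial probability P(X <= 9): sum the pmf over k = 0, ..., 9.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10))

# Normal approximation without continuity correction.
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(9)

print(f"exact  P(X <= 9) = {exact:.4f}")
print(f"normal P(X <= 9) = {approx:.4f}")
```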
Continuity Correction • While the binomial distribution places probability exactly on X = 9 and X = 10, the normal distribution spreads probability continuously in that interval. • The bar for X = 9 in a probability histogram goes from 8.5 to 9.5, but calculating P(X ≤ 9) using the normal approximation only includes the area to the left of the center of this bar. • To improve the accuracy of our approximation, we should let X = 9 extend from 8.5 to 9.5, etc.
Continuity Correction • Use the continuity correction to approximate the binomial probability P(X = 10) when n = 100, p = 0.1. • Using the normal approximation to the binomial distribution, X is approximately distributed as N(10, 3), that is, normal with mean np = 10 and standard deviation $\sqrt{np(1 - p)} = 3$.
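Continuity Correction • The exact binomial probability is P(X_binomial = 10) = 0.13187. • The normal approximation with continuity correction gives P(9.5 < X_normal < 10.5) = 0.13237. • [Figure: the histogram bar for X = 10 runs from 9.5 to 10.5, with the normal curve overlaid.]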
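Both numbers on this slide can be reproduced in a few lines. The sketch below treats the second parameter of N(10, 3) as a standard deviation, which is the reading consistent with the quoted value 0.13237.

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.1
x_normal = NormalDist(mu=10, sigma=3)  # mean np, sd sqrt(np(1 - p))

# Exact binomial probability P(X = 10)
exact_eq_10 = comb(n, 10) * p**10 * (1 - p) ** 90

# Continuity correction: X = 10 corresponds to the interval (9.5, 10.5).
approx_eq_10 = x_normal.cdf(10.5) - x_normal.cdf(9.5)

print(f"exact  P(X = 10) = {exact_eq_10:.5f}")
print(f"normal P(9.5 < X < 10.5) = {approx_eq_10:.5f}")
```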
Continuity Correction • Q: What about the continuity correction for P(X < 8)? • [Figure: probability histogram with 8 and 8.5 marked on the horizontal axis.]
Continuity Correction • Q: What about the continuity correction for P(X > 14)? • [Figure: probability histogram with 13.5 and 14 marked on the horizontal axis.]
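Example Re-visited • Using the continuity correction, the probability that no more than 9 of the switches in the sample fail inspection is $P(X \le 9) \approx P(X_{\text{normal}} \le 9.5) = P(Z \le (9.5 - 10)/3) = P(Z \le -0.17) \approx 0.43$, which is much closer to the exact value 0.4513.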
Example: Inspection of Switches • Find the probability that at least 5 but at most 15 switches fail the inspection.
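For a range such as "at least 5 but at most 15", the continuity correction widens the interval by 0.5 on each side. The following sketch compares the corrected approximation with the exact binomial sum; the function names are illustrative only.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_range(lo, hi, n, p):
    """Exact P(lo <= X <= hi) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def approx_range(lo, hi, n, p):
    """Normal approximation to P(lo <= X <= hi) with continuity correction."""
    dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)

n, p = 100, 0.1
exact = exact_range(5, 15, n, p)
approx = approx_range(5, 15, n, p)
print(f"exact  P(5 <= X <= 15) = {exact:.4f}")
print(f"normal P(4.5 < X < 15.5) = {approx:.4f}")
```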
Sampling Distributions • Counts and proportions are discrete random variables; they are used to describe categorical data. • Statistics used to describe quantitative data are most often continuous random variables. • Examples: sample mean, percentiles, sample standard deviation • Sample means are among the most common statistics.
Sampling Distributions • Regarding sample means, • They tend to be less variable than individual observations. • Their distribution tends to be more normal than that of individual observations. • We’ll see why later.
Sampling Distributions of Sample Means • Let $\bar{x}$ be the mean of an SRS of size n from a population having mean µ and standard deviation σ. • The mean and standard deviation of $\bar{x}$ are $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. • Why?
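One way to answer the "Why?" is to write the sample mean as a sum and apply the rules for means and variances. This is a sketch of the usual argument, assuming the observations $X_1, \ldots, X_n$ can be treated as independent, which is reasonable for an SRS from a large population:

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\,(X_1 + X_2 + \cdots + X_n), \\
E(\bar{x}) &= \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu, \\
\mathrm{Var}(\bar{x}) &= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i)
  = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}, \\
\sigma_{\bar{x}} &= \sqrt{\mathrm{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```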
Sampling Distributions of Sample Means • The shape of the distribution of the sample mean depends on the shape of the population distribution itself. • One special case: if the population distribution is normal, N(µ, σ), then the sample mean $\bar{x}$ is exactly normal, N(µ, σ/√n). • Because: any linear combination of independent normal random variables is normally distributed.
Example • The foreman of a bottling plant has observed that the amount of soda pop in each “32-ounce” bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce. • If a customer buys one bottle, what is the probability that that bottle contains more than 32 ounces? • If that same customer instead buys a carton of 4 bottles, what is the probability that the mean of those 4 bottles is greater than 32 ounces?
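Both bottling questions use the same normal model; only the standard deviation changes when moving from one bottle to the mean of four. A minimal sketch, assuming the four bottles in a carton are independent draws from the stated distribution:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces, from the example

# One bottle: X ~ N(32.2, 0.3)
one_bottle = NormalDist(mu, sigma)
p_one_over_32 = 1 - one_bottle.cdf(32)

# Mean of 4 bottles: x_bar ~ N(32.2, 0.3 / sqrt(4))
carton_mean = NormalDist(mu, sigma / sqrt(4))
p_mean_over_32 = 1 - carton_mean.cdf(32)

print(f"P(one bottle > 32 oz) = {p_one_over_32:.4f}")
print(f"P(mean of 4 > 32 oz)  = {p_mean_over_32:.4f}")
```

The second probability is larger because the sample mean varies less than an individual bottle.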
Example • The starting salaries of M.B.A.s at Wilfrid Laurier Univ. (WLU) are normally distributed with a mean of $62,000 and a standard deviation of $14,500. The starting salaries of M.B.A.s at the University of Western Ontario (UWO) are normally distributed with a mean of $60,000 and a standard deviation of $18,300. • A random sample of 50 WLU M.B.A.s and a random sample of 60 UWO M.B.A.s are selected. • What is the probability that the sample mean of WLU graduates will exceed that of the UWO graduates?
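The salary comparison can be treated as a question about the difference of two sample means. The sketch below assumes the two samples are independent of each other, which the example does not state explicitly; under that assumption the difference is normal, with variances that add.

```python
from math import sqrt
from statistics import NormalDist

# Population parameters and sample sizes from the example
mu_wlu, sigma_wlu, n_wlu = 62_000, 14_500, 50
mu_uwo, sigma_uwo, n_uwo = 60_000, 18_300, 60

# Difference of sample means: D = x_bar_WLU - x_bar_UWO
diff_mean = mu_wlu - mu_uwo
diff_sd = sqrt(sigma_wlu**2 / n_wlu + sigma_uwo**2 / n_uwo)

# P(D > 0), the probability that the WLU sample mean exceeds the UWO sample mean
p_wlu_exceeds = 1 - NormalDist(diff_mean, diff_sd).cdf(0)
print(f"sd of difference = {diff_sd:.1f}")
print(f"P(WLU mean > UWO mean) is approximately {p_wlu_exceeds:.4f}")
```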
Central Limit Theorem • When the population distribution is normal, so is the sampling distribution of $\bar{x}$. • What about when the population distribution is non-normal? • For large sample sizes, it turns out that the distribution of $\bar{x}$ gets closer to a normal distribution. • As long as the population has finite standard deviation, this will be true regardless of the actual shape of the population distribution.
Central Limit Theorem • Formally, draw an SRS of size n from any population with mean µ and finite standard deviation σ. • As n approaches infinity (gets very large), the sampling distribution of $\bar{x}$ is approximately N(µ, σ/√n). • Versions of this result can hold even if the observations are not independent or not identically distributed, under additional conditions. • This is why normal distributions are common models for observed data.
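A small simulation can illustrate the theorem. The sketch below is not from the lecture: it assumes an exponential population with mean 1 and standard deviation 1 (strongly right-skewed), and the sample size, number of replications, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)       # fixed seed so the run is repeatable
n, reps = 30, 5000   # sample size; number of simulated samples
mu, sigma = 1.0, 1.0 # mean and sd of the Exponential(1) population

# Draw many samples and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

# Share of sample means within 1.96 theoretical sds of mu (about 0.95 if normal)
half_width = 1.96 * sigma / sqrt(n)
share_within = sum(abs(m - mu) <= half_width for m in sample_means) / reps

print(f"mean of sample means = {mean_of_means:.3f} (theory: {mu})")
print(f"sd of sample means   = {sd_of_means:.3f} (theory: {sigma / sqrt(n):.3f})")
print(f"share within 1.96 sd = {share_within:.3f}")
```

Increasing `n` should bring the histogram of `sample_means` closer to a normal shape, while the spread shrinks in proportion to 1/√n.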