Sampling Distribution of a Sample Proportion

Sampling Distribution of a Sample Proportion Lecture 26 Sections 8.1 – 8.2 Wed, Mar 8, 2006

Preview of the Central Limit Theorem • We looked at the distribution of the sum of 1, 2, and 3 uniform random variables U(0, 1). • We saw that the shapes of their distributions were moving toward the shape of the normal distribution. • If we replace “sum” with “average,” we will obtain the same phenomenon, but on the scale from 0 to 1 each time.

Preview of the Central Limit Theorem • [Figure: density plot over the interval from 0 to 1]

Preview of the Central Limit Theorem • Some observations: • Each distribution is centered at the same place, ½. • The distributions are being “drawn in” towards the center. • That means that their standard deviation is decreasing. • Can we quantify this?

Preview of the Central Limit Theorem • μ = ½ • σ² = 1/12 • [Figure: density plot over the interval from 0 to 1]

Preview of the Central Limit Theorem • This tells us that a mean based on three observations is much more likely to be close to the population mean than is a mean based on only one or two observations.
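One way to quantify the "drawing in" is by simulation. The sketch below assumes the standard result that the average of n independent U(0, 1) observations has variance (1/12)/n; the function name sd_of_uniform_average and the trial count are illustrative choices, not part of the lecture.

```python
import random
import statistics

def sd_of_uniform_average(n, trials=100_000, seed=0):
    """Estimate the standard deviation of the average of n U(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(averages)

for n in (1, 2, 3):
    simulated = sd_of_uniform_average(n)
    theoretical = (1 / (12 * n)) ** 0.5  # sqrt(sigma^2 / n), with sigma^2 = 1/12
    print(f"n = {n}: simulated SD = {simulated:.4f}, theoretical SD = {theoretical:.4f}")
```

The simulated standard deviations should shrink as n grows, which is why an average of three observations tends to land closer to ½ than a single observation does.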

Parameters and Statistics • THE PURPOSE OF A STATISTIC IS TO ESTIMATE A POPULATION PARAMETER. • A sample mean is used to estimate the population mean. • A sample proportion is used to estimate the population proportion. • Sample statistics, by their very nature, are variable. • Population parameters are fixed.

Some Questions • We hope that the sample proportion is close to the population proportion. • How close can we expect it to be? • Would it be worth it to collect a larger sample? • If the sample were larger, would we expect the sample proportion to be closer to the population proportion? • How much closer?

The Sampling Distribution of a Statistic • Sampling Distribution of a Statistic – The distribution of values of the statistic over all possible samples of size n from that population.

The Sample Proportion • Let p be the population proportion. • Then p is a fixed value (for a given population). • Let p^ (“p-hat”) be the sample proportion. • Then p^ is a random variable; it takes on a new value every time a sample is collected. • The sampling distribution of p^ is the probability distribution of all the possible values of p^.

Example • Suppose that this class is 3/4 freshmen. • Suppose that we take a sample of 2 students, selected with replacement. • Find the sampling distribution of p^.

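Example • Tree diagram (first student, then second student; F = freshman, N = non-freshman): • F (3/4), then F (3/4): P(FF) = 9/16 • F (3/4), then N (1/4): P(FN) = 3/16 • N (1/4), then F (3/4): P(NF) = 3/16 • N (1/4), then N (1/4): P(NN) = 1/16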

Example • Let X be the number of freshmen in the sample. • The probability distribution of X is P(X = 0) = 1/16, P(X = 1) = 6/16, P(X = 2) = 9/16.

Example • Let p^ be the proportion of freshmen in the sample. (p^ = X/n.) • The sampling distribution of p^ is P(p^ = 0) = 1/16, P(p^ = 1/2) = 6/16, P(p^ = 1) = 9/16.

Samples of Size n = 3 • If we sample 3 people (with replacement) from a population that is 3/4 freshmen, then the proportion of freshmen in the sample has the following distribution: P(p^ = 0) = 1/64, P(p^ = 1/3) = 9/64, P(p^ = 2/3) = 27/64, P(p^ = 1) = 27/64.

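Samples of Size n = 4 • If we sample 4 people (with replacement) from a population that is 3/4 freshmen, then the proportion of freshmen in the sample has the following distribution: P(p^ = 0) = 1/256, P(p^ = 1/4) = 12/256, P(p^ = 1/2) = 54/256, P(p^ = 3/4) = 108/256, P(p^ = 1) = 81/256.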
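The small-sample distributions for n = 2, 3, and 4 can be computed exactly rather than by drawing a tree each time. This is a minimal sketch that assumes sampling with replacement, so the number of freshmen X is binomial with parameters n and p; the function name sampling_distribution is a hypothetical helper, not something from the lecture.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, p):
    """Exact distribution of p^ = X/n when X ~ Binomial(n, p)."""
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }

p = Fraction(3, 4)  # the class is 3/4 freshmen
for n in (2, 3, 4):
    print(f"n = {n}")
    for value, prob in sampling_distribution(n, p).items():
        print(f"  P(p^ = {value}) = {prob}")
```

Fractions are printed in lowest terms, so 6/16 appears as 3/8.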

The Parameters of the Sampling Distributions • When n = 1, the sampling distribution is P(p^ = 0) = 1/4, P(p^ = 1) = 3/4. • The mean and variance are • μ = 3/4 = 0.75 • σ² = 3/16 = 0.1875 (so σ ≈ 0.433)
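The same calculation can be repeated for n = 2, 3, and 4 to see how the parameters change with the sample size. The sketch below again assumes a binomial count of freshmen (sampling with replacement); mean_and_variance is an illustrative name.

```python
from fractions import Fraction
from math import comb, sqrt

def mean_and_variance(n, p):
    """Exact mean and variance of p^ = X/n for X ~ Binomial(n, p)."""
    dist = {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }
    mean = sum(value * prob for value, prob in dist.items())
    variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())
    return mean, variance

p = Fraction(3, 4)
for n in (1, 2, 3, 4):
    mean, variance = mean_and_variance(n, p)
    print(f"n = {n}: mean = {mean}, variance = {variance}, SD = {sqrt(variance):.4f}")
```

In each case the mean stays at 3/4 while the variance equals (3/16)/n, so the standard deviation shrinks as n grows.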

Sampling Distributions • Run the program Central Limit Theorem for Proportions.exe. • Use n = 30 and p = 0.75; generate 100 samples.

100 Samples of Size n = 30 • μ = 0.75 • σ = 0.079

Observations and Conclusions • Observation #1: The values of p^ are clustered around p. • Conclusion #1: p^ is probably close to p.

Larger Sample Size • Now we will select 100 samples of size 120 instead of size 30. • Run the program Central Limit Theorem for Proportions.exe. • Pay attention to the spread (standard deviation) of the distribution.

100 Samples of Size n = 120 • μ = 0.75 • σ = 0.0395

Observations and Conclusions • Observation #2: As the sample size increases, the clustering is tighter. • Conclusion #2A: Larger samples give more reliable estimates. • Conclusion #2B: For sample sizes that are large enough, we can make very good estimates of the value of p.
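The comparison between n = 30 and n = 120 can be imitated without the classroom program. The following is only a rough stand-in that assumes each sampled person is a freshman independently with probability p; simulate_p_hats is a hypothetical helper, and with only 100 samples the results will vary somewhat from run to run unless the seed is fixed.

```python
import random
import statistics

def simulate_p_hats(n, p, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n  # proportion of "successes"
        for _ in range(num_samples)
    ]

for n in (30, 120):
    p_hats = simulate_p_hats(n, p=0.75, num_samples=100)
    print(
        f"n = {n}: mean of p^ = {statistics.mean(p_hats):.3f}, "
        f"SD of p^ = {statistics.pstdev(p_hats):.4f}"
    )
```

Quadrupling the sample size from 30 to 120 should roughly halve the spread, in line with the values 0.079 and 0.0395 quoted on the slides.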

Larger Sample Size • Now we will select 10000 samples of size 120 instead of only 100 samples. • Run the program Central Limit Theorem for Proportions.exe. • Pay attention to the shape of the distribution.

10,000 Samples of Size n = 120 • μ = 0.75 • σ = 0.0395

10,000 Samples of Size n = 126

More Observations and Conclusions • Observation #3: The distribution of p^ appears to be approximately normal.

One More Conclusion • Conclusion #3: We can use the normal distribution to calculate just how close to p we can expect p^ to be. • However, we must know the values of μ and σ for the distribution of p^. • That is, we have to quantify the sampling distribution of p^.

The Sampling Distribution of p^ • It turns out that the sampling distribution of p^ is approximately normal with the following parameters. • Mean: μ = p • Standard deviation: σ = √(p(1 – p)/n) • This is the Central Limit Theorem for Proportions, summarized on page 519.

The Sampling Distribution of p^ • The approximation to the normal distribution is excellent if np ≥ 5 and n(1 – p) ≥ 5.
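As a small sketch of how these parameters might be computed, the function below assumes the usual formulas, mean p and standard deviation √(p(1 – p)/n), which agree with the values 0.079 and 0.0395 quoted earlier, and it uses the rule of thumb np ≥ 5 and n(1 – p) ≥ 5 that the survey example checks. The name p_hat_normal_approximation is illustrative.

```python
from math import sqrt

def p_hat_normal_approximation(n, p):
    """Return (mean, sd, ok) for the normal approximation to p^.

    ok is True when the rule of thumb np >= 5 and n(1 - p) >= 5 holds.
    """
    mean = p
    sd = sqrt(p * (1 - p) / n)
    ok = n * p >= 5 and n * (1 - p) >= 5
    return mean, sd, ok

for n in (10, 30, 120):
    mean, sd, ok = p_hat_normal_approximation(n, 0.75)
    print(f"n = {n}: mean = {mean}, SD = {sd:.4f}, rule of thumb satisfied: {ok}")
```

With p = 0.75, a sample of size 10 fails the check because n(1 – p) = 2.5, while sizes 30 and 120 pass.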

Why Surveys Work • Suppose 51% of the population plan to vote for candidate X, i.e., p = 0.51. • What is the probability that an exit survey of 1000 people would show candidate X with less than 45% support, i.e., p^ < 0.45?

Why Surveys Work • First, describe the sampling distribution of p^ if the sample size is n = 1000 and p = 0.51. • Check: np = 510 ≥ 5 and n(1 – p) = 490 ≥ 5. • p^ is approximately normal with μ = 0.51 and σ = √((0.51)(0.49)/1000) = 0.01581.

Why Surveys Work • The z-score of 0.45 is z = (0.45 – 0.51)/0.01581 = -3.795. • P(p^ < 0.45) = P(Z < -3.795) = 0.00007385 (not likely!) • Or use normalcdf(-E99, 0.45, 0.51, 0.01581).
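For readers without a calculator that offers normalcdf, the same number can be reproduced with Python's standard library. This is a sketch under the normal approximation, taking the standard deviation to be √(p(1 – p)/n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.51
sd = sqrt(p * (1 - p) / n)   # standard deviation of p^ under the approximation
z = (0.45 - p) / sd          # z-score of the observed cutoff 0.45
prob = NormalDist(mu=p, sigma=sd).cdf(0.45)  # P(p^ < 0.45)

print(f"SD = {sd:.5f}, z = {z:.3f}, P(p^ < 0.45) = {prob:.8f}")
```

The result should land very close to the 0.00007385 on the slide; tiny differences come from rounding the standard deviation to 0.01581.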

Why Surveys Work • Perform the same calculation, but with a smaller sample size, say n = 50. • The probability turns out to be 0.1980, nearly a 20% chance. • By symmetry, there is also about a 20% chance that the sample proportion is greater than 57%. • Thus, there is about a 40% chance that the sample proportion is off by at least 6 percentage points.
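The effect of the sample size on both tails can be checked with a short sketch. It assumes the normal approximation with standard deviation √(p(1 – p)/n); tail_probabilities is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def tail_probabilities(margin, n, p):
    """Return (P(p^ < p - margin), P(p^ > p + margin)) under the normal approximation."""
    dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
    below = dist.cdf(p - margin)
    above = 1 - dist.cdf(p + margin)
    return below, above

for n in (50, 1000):
    below, above = tail_probabilities(0.06, n, 0.51)
    print(
        f"n = {n}: P(p^ < 0.45) = {below:.4f}, P(p^ > 0.57) = {above:.4f}, "
        f"P(off by at least 0.06) = {below + above:.4f}"
    )
```

For n = 50 each tail is close to 0.198, while for n = 1000 both tails are negligible, which is the contrast the slides are drawing.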
