Sampling Distributions

Sampling Distributions Chapter 7

The Concept of a Sampling Distribution • Repeated samples of the same size are selected from the same population. • The same sample statistic is calculated from the data in EACH sample. • The distribution of the sample statistics is the SAMPLING DISTRIBUTION of that sample statistic.

The Sampling Process SAMPLE POPULATION μ

The Sampling Distribution Repeated Sampling Sampling Distribution POPULATION μ

What is Standard Error? Standard Error has been identified as a quantity that is not understood. Is it a Standard Deviation? Standard Error of what? What does it tell us? The purpose of this presentation is to make the concept of Standard Error clearer and more understandable.

The Sampling Process 30, 42, 48, 49, 61, 54, 41, 38, 59, 57 Calculate Mean = 47.9 This sample mean is an ESTIMATE of the population mean. SAMPLE We should not be surprised that the estimate does not equal the true mean for the population! POPULATION Mean = 50

The Sampling Process 30, 42, 48, 49, 61, 54, 41, 38, 59, 57 Calculate Mean = 47.9 Plot the Sample Mean SAMPLE POPULATION Mean = 50

Repeated Sampling Sampling Distribution of the Sample Means The Sampling Distribution Calculate means for each sample m1, m2, … Plot All Sample Means

What about this sampling distribution? Each dot represents a mean from one of the samples. Each sample mean is an ESTIMATE of the population mean. Notice that center of this graph is around 50 and the spread ranges from 45 to 55.

What about this sampling distribution? The mean of sampling distribution (that is, the mean of the sample means) is the MEAN of the population! AND… We call the standard deviation of the distribution of sample means the STANDARD ERROR OF THE ESTIMATE OF THE POPULATION MEAN.

In Summary • STANDARD DEVIATION is a measure of the spread of data in a population or in a sample. • STANDARD ERROR is a measure of the spread of the ESTIMATES of a measure of a population calculated from repeated sampling.

In short…

Point Estimators • When inferences are made from the sample to the population, the sample mean is viewed as an estimator of the mean of the population from which the sample was selected. • Similarly, the proportion of successes in a sample is an estimator of the proportion of successes in the population.

Properties of Point Estimators • The summary statistic should be UNBIASED, that is the mean of the sampling distribution is equal to the value you would get if you computed the summary statistic using the entire population. More formally, an estimator is unbiased if its expected value equals the parameter being estimated. • The summary statistic should have as little variability as possible (be more precise than other estimates) and should have a standard error that decreases as the sample size increases.

Properties of the Sampling Distribution of the Sample Mean • If a random sample of size n is selected from a population with mean µ and standard deviation σ, then • The mean of the sampling distribution of equals the mean of the population µ = µ

Properties of the Sampling Distribution of the Sample Mean • If a random sample of size n is selected from a population with mean µ and standard deviation σ, then • The standard deviation, , of the sampling distribution of , sometimes called the standard error of the mean, equals the standard deviation of the population σ, divided by the square root of the sample size n: = σ/√n *Only used when N>10n

Properties of the Sampling Distribution of the Sample Mean • If a random sample of size n is selected from a population with mean µ and standard deviation σ, then • The shape of the sampling distribution will be approximately normal if the population is approximately normal; for other populations, the sampling distribution becomes more normal as n increases • This property is called the CENTRAL LIMIT THEOREM (CLT)

Reasonably Likely Averages Mean ± 1.96(SE) • 1.96 is the z-score and comes from the cut off point of the middle 95% of a normal distribution

If the Sampling Distribution is known… Probability questions about sample statistics can be answered. For example, A simple random sample of 50 is selected from a normal population with a mean of 50 and a standard deviation of 10. What is the probability that the sample mean will be greater than 53?

The Answer… A simple random sample of 50 is selected from a normal population with a mean of 50 and a standard deviation of 10. What is the probability that the sample mean will be greater than 53?

Properties of the Sampling Distribution of the sum of a Sample • If a random sample of size n is selected from a distribution with mean µ and standard deviation σ, then • The mean of the sampling distribution of the sum is µsum = nµ • The standard error of the sampling distribution of the sum is σsum=√n · σ • CLT applies

Sampling Distribution of the Sample Proportion • We will now move from studying the behavior of the sample mean to studying the behavior of the sample proportions (the proportion of “successes” in the sample)

Properties of the Sampling Distribution of the Number of Successes • If a random sample of size n is selected from a population with proportion of successes, p, then the sampling distribution of the number of successes X • Has mean µx = np • Has standard error σx = √np(1-p) • Will be approximately normal as long as n is large enough • As a guideline both np and n(1-p) are at least 10 np≥10 and n(1-p) ≥10

Example • The use of seat belts continues to rise in the United States, with overall seat belt usage of 82%. Mississippi lags behind the rest of the nation—only about 60% wear seat belts. Suppose you take a random sample of 40 Mississippians. How many do you expect will wear seat belts? What is the probability that 30 or more of the people in your sample wear seat belts?

Solution

Sampling Distributions of the Sample Proportion • True proportion of successes is represented by “p” • Sample proportion of successes is represented by “p hat” p hat = (# of successes)/(sample size)

Sampling Distribution of p-hat • How does p-hat behave? To study the behavior, imagine taking many random samples of size n, and computing a p-hat for each of the samples. • Then we plot this set of p-hats with a histogram.

Sampling Distribution of p-hat

Properties of p-hat • When sample sizes are fairly large, the shape of the p-hat distribution will be normal. • The mean of the distribution is the value of the population parameter p. • The standard deviation of this distribution is the square root of p(1-p)/n. As a guideline use np ≥ 10 and n(1-p) ≥ 10

Example • About 60% of Mississippians use seat belts. Suppose your class conducts a survey of 40 randomly selected Mississippians. • What is the chance that 75% or more of those selected wear seat belts? • Would it be quite unusual to find that fewer than 25% of Mississippians selected wear seat belts?

Solution

Sampling Distributions