Chapter 18

Chapter 18 Sampling Distribution Models

Demonstration • Observe the in-class SPSS demonstration related to sampling distribution models.

Demonstration Summary First, we examined the distribution of state appropriations for education for the entire population of U.S. states. The distribution of state spending on education was skewed to the right, with a mean (μ) of $1,272,969,120.00 and a standard deviation (σ) of $1,567,930,688.096.

Demonstration Summary Next, we randomly selected 30 states for our sample. The sample's distribution of spending on education was again skewed to the right; however, the sample mean (ȳ) was $1,410,710,766.67, with a standard deviation (s) of $1,941,673,134.577.

Demonstration Summary We then repeated the random sampling process to get a new sample of 30 states. This new sample also had a distribution that was skewed to the right; however, its mean and standard deviation differed from those of the first sample: $1,126,093,266.67 and $1,781,298,838.439, respectively. Did we do something wrong?

Demonstration Summary We then examined 100 different random samples of size 30 and found that each sample had a slightly different mean and standard deviation due to sampling variability (i.e., different combinations of states were included in each sample). When we created a histogram of our collection of sample means, we discovered something pretty amazing: that distribution looked very much like a Normal model, even though the distribution of state appropriations in the original population was skewed to the right.
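The repeated-sampling idea from the demonstration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is a synthetic right-skewed (exponential) population of 5,000 values, not the state appropriations data, and the `skewness` helper is a hypothetical convenience function written for this sketch.

```python
import random
import statistics

random.seed(18)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed population (NOT the state appropriations data).
population = [random.expovariate(1.0) for _ in range(5000)]
mu = statistics.mean(population)

# Draw 100 random samples of size 30 and record each sample mean.
sample_means = [statistics.mean(random.sample(population, 30)) for _ in range(100)]

def skewness(values):
    """Moment-based skewness: near 0 for symmetric data, positive for a right skew."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

print("population mean:", round(mu, 3))
print("mean of the 100 sample means:", round(statistics.mean(sample_means), 3))
print("skewness of the population:", round(skewness(population), 2))
print("skewness of the sample means:", round(skewness(sample_means), 2))
```

The sample means should center near the population mean and show much less skew than the population itself, which is the pattern the histogram of sample means revealed.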

Sampling Distribution A listing of all the values that a sample mean can take on, and how often those values occur, is called the sampling distribution of the sample mean. The histogram of sample means depicts this sampling distribution. Like any other distribution, the sampling distribution of the sample mean has a shape, a center, and a measure of variability (i.e., spread). It can be interpreted as the probability distribution of sample means. Under certain conditions this sampling distribution will approximate the Normal model regardless of the shape of the distribution of the original variable in the population.

Simulating the Sampling Distribution of a Mean We can use simulation to get a sense of what the sampling distribution of the sample mean might look like. Let’s start with a simulation of 10,000 tosses of a die. (Figure: histogram of the results of the 10,000 tosses.)

Means – Averaging More Dice (Figures: histogram of the average of two dice after a simulation of 10,000 tosses; histogram of the average of 5 dice after a simulation of 10,000 tosses.)

Means – What the Simulations Show As the sample size (number of dice) gets larger, each sample average is more likely to be close to the population mean. • So, we see the shape continuing to tighten around 3.5. • And it probably does not shock you that the sampling distribution of this mean approaches a Normal model.
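The dice simulations can be reproduced with a short sketch like the one below. It assumes fair six-sided dice and uses only Python's standard library; the function names and the extra case of 20 dice are choices made for this illustration, not part of the original slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def average_of_dice(num_dice):
    """Average face value from one toss of `num_dice` fair six-sided dice."""
    return statistics.mean([random.randint(1, 6) for _ in range(num_dice)])

def simulate(num_dice, tosses=10_000):
    """Repeat the toss many times and collect the averages."""
    return [average_of_dice(num_dice) for _ in range(tosses)]

# 1, 2, and 5 dice as on the slides; 20 dice added to make the trend clearer.
results = {k: simulate(k) for k in (1, 2, 5, 20)}

for k, averages in results.items():
    center = statistics.mean(averages)
    spread = statistics.pstdev(averages)
    print(f"{k:>2} dice: mean of averages = {center:.3f}, SD of averages = {spread:.3f}")
```

In each case the averages should center near 3.5, while their standard deviation shrinks as more dice are averaged.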

The Central Limit Theorem (CLT) The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be. The CLT is surprising and a bit weird: • Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution. All we need is for the observations to be independent and collected with randomization.

Conditions Required for the CLT • Random Sampling Condition: The data values must be sampled randomly, or the concept of a sampling distribution makes no sense. • Independence Assumption: The sampled values must be independent of each other. This is impossible to know for sure, so check the 10% Condition instead: the sample size, n, is no more than 10% of the population.

But Which Normal? Recall that Normal models are described by their means and standard deviations. The mean of all sample means is the population mean μ. That is to say, the sampling distribution of the mean has mean μ. The standard deviation of all sample means is σ/√n. That is to say, the sampling distribution of the mean has standard deviation σ/√n.

The Sampling Distribution Model for a Mean When a random sample is drawn from any population with mean μ and standard deviation σ, its sample mean ȳ has a sampling distribution with the same mean μ but whose standard deviation is σ/√n (we write SD(ȳ) = σ/√n).
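A quick numerical sketch of the formula SD(ȳ) = σ/√n may help. The numbers here are hypothetical (a population standard deviation of 10) and chosen only to keep the arithmetic simple; the function name is likewise an assumption of this sketch.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

for n in (25, 100, 400):
    print(f"n = {n:>3}: SD of the sample mean = {sd_of_sample_mean(sigma, n)}")
# n =  25: SD of the sample mean = 2.0
# n = 100: SD of the sample mean = 1.0
# n = 400: SD of the sample mean = 0.5
```

Because n sits under a square root, the sample size must be quadrupled to cut the spread of the sample means in half.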

The Sampling Distribution Model for a Mean (continued) No matter what population (whether it has a distribution that is symmetric, uniform, or skewed to the right or left) the random sample comes from, the shape of the sampling distribution is approximately Normal as long as the sample size is large enough. The larger the sample used, the more closely the Normal approximates the sampling distribution for the mean.

Sampling Distributions for Proportions • The Central Limit Theorem applies to more than sample means. • We can draw the same conclusions about the shape, center, and variability of the distribution of sample proportions. • The sample proportion is denoted by p̂ and is equal to the number of individuals in the sample in the category of interest, divided by the total sample size (n).

What About the Sampling Distribution Model for a Proportion? Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of p̂ (the sample proportion) is modeled by a Normal model with • Mean: μ(p̂) = p • Standard deviation: SD(p̂) = √(pq/n) • Here p is the probability of success (i.e., the observation falls into the category of the categorical variable that you are interested in) and q = 1 − p is the probability of failure.

Necessary Conditions When Working with Proportions Two assumptions: • The sampled values must be independent of each other. • The sample size, n, must be large enough. Check the following corresponding conditions: • 10% Condition – the sample size must be no larger than 10 percent of the population. • Success/Failure Condition – the sample size must be large enough that n*p and n*q are both at least 10. In other words, we need to expect at least 10 successes and 10 failures to have enough data for a sound conclusion.
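The two conditions above are simple arithmetic checks, so they can be sketched as a small helper. The function name, its return format, and the example inputs are assumptions made for this illustration; the Random Sampling Condition cannot be checked by arithmetic and still has to be judged from how the data were collected.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the Success/Failure Condition and, if a population size is given,
    the 10% Condition for a sample of size n with success probability p."""
    q = 1 - p
    checks = {"success_failure": n * p >= 10 and n * q >= 10}
    if population_size is not None:
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

# Hypothetical inputs: 200 sampled from a population of 50,000 with p = 0.44.
print(check_proportion_conditions(200, 0.44, population_size=50_000))
# A sample that is too small to expect 10 successes (20 * 0.1 = 2).
print(check_proportion_conditions(20, 0.10))
```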

Standard Error The standard deviations of our Normal models are as follows: • For proportions: SD(p̂) = √(pq/n) • For means: SD(ȳ) = σ/√n When we don’t know p or σ, we’re stuck, right?

Standard Error (cont.) Nope. We will use sample statistics to estimate these population parameters. • For a sample proportion, the standard error is SE(p̂) = √(p̂q̂/n), where q̂ = 1 − p̂. • For the sample mean, the standard error is SE(ȳ) = s/√n, where s is the sample standard deviation. When we estimate the standard deviation of a sampling distribution using statistics found from the data, the estimate is called a standard error.
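As a rough sketch, the two standard errors can be computed from sample data as follows. The function names and the small example inputs are hypothetical; `statistics.stdev` is used because it computes the sample standard deviation s (with n − 1 in the denominator).

```python
import math
import statistics

def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)

def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical data for illustration only.
print(se_proportion(50, 100))      # 50 successes out of 100 observations
print(se_mean([1, 2, 3, 4, 5]))    # five made-up measurements
```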

Watch out for small samples from skewed populations • If the original population is not itself normally distributed, here is a common guideline: For samples of size n greater than 30, the distribution of the sample means can be approximated reasonably well by a normal model. The approximation gets better as the sample size, n, becomes larger. • If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size n (not just values of n larger than 30).

Applications of the Central Limit Theorem - #1 In the 2001 ACT, students had a mean score of 21.3 with a standard deviation of 6.0. Assume that the scores are normally distributed. If 60 students are randomly selected, find the probability that they have a mean score greater than 23.5.
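One way to set up this calculation is sketched below, using the model SD(ȳ) = σ/√n and Python's `statistics.NormalDist`. The variable names are choices made for this sketch, and it takes the stated assumptions (random selection, Normally distributed scores) at face value.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 21.3, 6.0, 60        # values given in the problem
sd_mean = sigma / sqrt(n)           # SD of the sample mean, about 0.775
z = (23.5 - mu) / sd_mean           # z-score of 23.5, about 2.84

# P(sample mean > 23.5) under the Normal model N(mu, sd_mean)
prob = 1 - NormalDist(mu, sd_mean).cdf(23.5)

print(f"SD of the sample mean: {sd_mean:.3f}")
print(f"z = {z:.2f}")
print(f"P(mean > 23.5) = {prob:.4f}")   # roughly 0.002
```

A sample mean that high would be very unusual: it sits almost three standard deviations above the population mean.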

Applications of the Central Limit Theorem - #2 A national study found that 44% of college students engage in binge drinking (5 drinks at a sitting for men, 4 for women). Use the 68-95-99.7 Rule to describe the sampling distribution model for the proportion of students in a randomly selected group of 200 college students who engage in binge drinking. Do you think the appropriate conditions are met?
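A possible setup for this problem is sketched below. It computes SD(p̂) = √(pq/n) and the 68-95-99.7 intervals around p; the variable names are assumptions of this sketch. On the conditions: the group is stated to be randomly selected, 200 students is far less than 10% of all college students, and the expected counts are both well above 10.

```python
from math import sqrt

p, n = 0.44, 200
q = 1 - p

# Success/Failure Condition: expected successes and failures
expected_successes = n * p          # about 88
expected_failures = n * q           # about 112

sd_p_hat = sqrt(p * q / n)          # about 0.035

# 68-95-99.7 Rule: p plus or minus 1, 2, and 3 standard deviations
intervals = {
    pct: (p - k * sd_p_hat, p + k * sd_p_hat)
    for pct, k in (("68%", 1), ("95%", 2), ("99.7%", 3))
}

print(f"SD of p-hat: {sd_p_hat:.4f}")
for pct, (low, high) in intervals.items():
    print(f"about {pct} of samples: p-hat between {low:.3f} and {high:.3f}")
```

So about 68% of such samples should show between roughly 40.5% and 47.5% binge drinkers, about 95% between roughly 37.0% and 51.0%, and about 99.7% between roughly 33.5% and 54.5%.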

Example #3 Carbon monoxide emissions for a certain kind of car vary with mean 2.9 g/m and standard deviation 0.4 g/m. A company has 80 of these cars in its fleet. • Estimate the probability that the mean CO level for the company’s fleet is between 3.0 and 3.1 g/m. • There is only a 5 percent chance that the fleet’s mean CO level is greater than what value?
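Both parts can be sketched with the Normal model for the sample mean. This assumes the 80 cars can be treated as a random sample of independent cars of this kind, which the problem does not state explicitly; units are kept as written on the slide, and the variable names are choices made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.9, 0.4, 80          # values given in the problem
sd_mean = sigma / sqrt(n)            # about 0.0447
model = NormalDist(mu, sd_mean)      # Normal model for the fleet's mean CO level

# Part 1: P(3.0 < mean < 3.1)
prob_between = model.cdf(3.1) - model.cdf(3.0)

# Part 2: the value exceeded by only 5% of fleet means (95th percentile)
cutoff = model.inv_cdf(0.95)

print(f"SD of the sample mean: {sd_mean:.4f}")
print(f"P(3.0 < mean < 3.1) = {prob_between:.4f}")   # roughly 0.013
print(f"95th percentile of the mean = {cutoff:.3f}")  # roughly 2.974
```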

Example #4 Just before a referendum on a school budget, a local newspaper polls 400 voters in an attempt to predict whether the budget will pass. Suppose that the budget actually has the support of 52% of the voters. What’s the probability the newspaper’s sample will lead them to predict defeat? Be sure to verify that the assumptions and conditions necessary for your analysis are met.
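A sketch of one way to approach this: the newspaper predicts defeat if its sample proportion falls below 0.50, so we want P(p̂ < 0.50) under the Normal model with mean p and SD √(pq/n). The sketch assumes the 400 voters are a random sample and fewer than 10% of all voters in the district; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 400
q = 1 - p

# Success/Failure Condition: both expected counts should be at least 10
expected_yes = n * p                 # about 208
expected_no = n * q                  # about 192

sd_p_hat = sqrt(p * q / n)           # about 0.025
z = (0.50 - p) / sd_p_hat            # about -0.80

# Probability the sample shows less than 50% support
prob_predict_defeat = NormalDist(p, sd_p_hat).cdf(0.50)

print(f"SD of p-hat: {sd_p_hat:.4f}")
print(f"z = {z:.2f}")
print(f"P(p-hat < 0.50) = {prob_predict_defeat:.3f}")   # roughly 0.21
```

Even though the budget truly has majority support, a poll of this size would wrongly point to defeat about one time in five.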

Assignment • Read Chapter 18 Again! • Try the following exercises from Ch. 18 • #1, 3, 7, 9, 17, 21, 23, 25, 27, 33, 37 • Work through the ActivStats assignments for Chapter 18 for additional practice.
