The Normal Curve and Sampling A sample will always differ from the true population. This difference is called “sampling error”: the gap between a sample and the true population that remains regardless of how well the survey was designed or implemented. It is distinct from measurement error and sample bias.
Sampling Distribution of Means • The existence of sampling error means that if you take 1,000 random samples from a population, calculate the 1,000 sample means, and plot the distribution of those means, you will get a consistent distribution that has the following characteristics:
Characteristics of a Sampling Distribution • 1. The distribution approximates a normal curve. • 2. The mean of a sampling distribution of means is equal to the true population mean. • 3. The standard deviation of a sampling distribution is smaller than the standard deviation of the population. There is less variation in the distribution because we are dealing not with raw scores but with measures of central tendency (sample means).
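The three characteristics can be checked with a small simulation. This is only a sketch: the population below is made up (a skewed set of 100,000 scores with a mean near 50,000), and the sample size of 100 and the fixed seed are assumptions chosen for illustration, not values from the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 skewed "raw scores" (not from the slides).
population = [random.expovariate(1 / 50_000) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Draw 1,000 random samples of n = 100 and keep each sample's mean.
n = 100
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(1_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Rough normality check: share of sample means within 1.96 sd of their mean.
share_within = sum(
    abs(m - mean_of_means) <= 1.96 * sd_of_means for m in sample_means
) / len(sample_means)

print(f"population mean         : {pop_mean:,.0f}")
print(f"mean of sample means    : {mean_of_means:,.0f}")
print(f"population sd           : {pop_sd:,.0f}")
print(f"sd of sample means      : {sd_of_means:,.0f}")
print(f"share within +/- 1.96 sd: {share_within:.3f}")
```

Even though the raw scores are strongly skewed, the sample means should cluster around the population mean, spread out far less than the raw scores, and put roughly 95% of their values within 1.96 standard deviations of the center.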
Probability and the Normal Curve In chapter 6 we are interested not in the distribution of raw scores but in the distribution of sample means, and in making probability statements about those sample means.
Probability and the Sampling Distribution Why is making probabilistic statements about a central tendency important? • 1. it will allow us to engage in inferential statistics (later in ch. 7) • 2. it allows us to produce confidence intervals
Example of number 1: • The President of UNLV states that the average salary of a new UNLV graduate is $60,000. We are skeptical and test this by taking a random sample of 100 UNLV students. We find that the sample average is only $55,000. Do we declare the President a liar?
Not Yet!!!! We need to make a probabilistic statement about how likely President Harter’s claim is to be true. How do we do that? With the aid of the standard error of the mean we can calculate confidence intervals: the range of mean values within which the true population mean is likely to fall.
How do we do that? • First, we need the sample mean • Second, we need the standard deviation of the sampling distribution of means (what’s another name for this?) • a.k.a. the standard error of the mean
What’s the Problem? • The problem is… • We don’t have the standard deviation of the sampling distribution of means. • What do we do?
First – let’s pretend • Let’s pretend that I know the standard deviation of the sampling distribution of means (a.k.a. the standard error of the mean): it’s $3,000. • For a 95% confidence interval we multiply the standard error of the mean by 1.96, then add that product to and subtract it from our sample mean. • Why 1.96?
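One way to see where 1.96 comes from is to ask the standard normal curve directly. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, sd 1

# Cut-off that leaves 2.5% in the upper tail (and, by symmetry, 2.5% in the lower).
z_95 = z.inv_cdf(0.975)

# Share of the curve that lies between -1.96 and +1.96.
middle_area = z.cdf(1.96) - z.cdf(-1.96)

print(round(z_95, 2))
print(round(middle_area, 3))
```

In other words, 1.96 is the z-score that marks off the middle 95% of a normal curve, leaving 2.5% in each tail.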
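So is the President Lying? CI = Mean +/- 1.96 (SE) = 55,000 +/- 1.96 (3,000) = 55,000 +/- 5,880 = $49,120 to $60,880. The claimed $60,000 falls inside this interval, so on this evidence we cannot call the President a liar.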
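The same arithmetic can be written as a small helper. This is a minimal sketch: the function name `confidence_interval` is made up for illustration, and the standard error of 3,000 is the pretend value from the slides.

```python
def confidence_interval(sample_mean, standard_error, multiplier=1.96):
    """Return (lower, upper) = sample_mean -/+ multiplier * standard_error."""
    margin = multiplier * standard_error
    return sample_mean - margin, sample_mean + margin


# Sample mean of $55,000 and a pretend standard error of $3,000.
lower, upper = confidence_interval(55_000, 3_000)

# The President's claimed population mean.
claim = 60_000
claim_is_plausible = lower <= claim <= upper

print(f"95% CI: ${lower:,.0f} to ${upper:,.0f}")
print("claim inside interval:", claim_is_plausible)
```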
Estimating the SE • We can estimate the standard error of the mean. • Divide the standard deviation of the sample by √(n-1). • For example, a sample standard deviation of 29,849 would produce an estimate of the SE of around 3,000 [29,849 divided by √(n-1), i.e. √99 ≈ 9.95; remember n = 100]. • Then multiply this estimate by t rather than 1.96, and add this product to and subtract it from our sample mean. Why t?
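As a quick check of that estimate, here is a sketch that follows the slide's convention of dividing the sample standard deviation by √(n-1). The function name is hypothetical.

```python
import math


def estimated_standard_error(sample_sd, n):
    """Estimate the SE of the mean as sample_sd / sqrt(n - 1), as on the slide."""
    return sample_sd / math.sqrt(n - 1)


# Sample standard deviation of 29,849 from a sample of n = 100.
se = estimated_standard_error(29_849, 100)
print(round(se))
```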
The t Distribution • Empirical testing and models show that the standard deviation of a sample tends to underestimate the standard deviation of the true population. • This is why we use N-1, not N, when calculating the standard deviation and the standard error. • So in reality we are calculating t-scores, not z-scores, since we are not using the true sd.
So when we are using a sample and calculating a 95% confidence interval (CI), we need to multiply the standard error by t, not 1.96. • How do we know what t is? • Table in back of book (Appendix C; Table C) • df = N – 1 • 100 - 1 = 99; use the df of 60 and level of significance of .05 (why?) • t = 2
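The table lookup can be sketched as follows. The dictionary holds a few standard two-tailed .05 critical values of t; whether the book's Table C prints exactly these rows is an assumption. When the exact df is not in the table, the sketch drops down to the largest tabled df that does not exceed it, which gives a slightly larger t and so a slightly wider, more cautious interval. The sample mean of 55,000 and standard error of 3,000 are the values from the running example.

```python
# Two-tailed .05 critical values of t for a few df rows (excerpt; assumed layout).
T_TABLE_05 = {30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980}


def table_t(df, table=T_TABLE_05):
    """Use the largest tabled df that does not exceed the actual df."""
    usable = [row for row in table if row <= df]
    if not usable:
        raise ValueError("df is smaller than every row in this excerpt")
    return table[max(usable)]


n = 100
df = n - 1              # 99 is not a row in the excerpt
t = table_t(df)         # falls back to the df = 60 row

margin = t * 3_000      # t times the estimated standard error
lower, upper = 55_000 - margin, 55_000 + margin

print(f"t = {t}")
print(f"95% CI: ${lower:,.0f} to ${upper:,.0f}")
```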
Confidence Intervals for Proportions Calculate the standard error of the proportion: Sp = √(P(1 - P) / N). 95% conf. interval = P +/- (1.96)Sp
Example • National sample of 531 Democrats and Democratic-leaning independents, aged 18 and older, conducted Sept. 14-16, 2007 • Clinton 47%; Obama 25%; Edwards 11% • For Clinton: P(1-P) = .47(1-.47) = .47(.53) = .2491 • Divide by N = .2491/531 = .000469 • Take square root = .0217 • 95% CI = .47 +/- 1.96 (.0217) • .47 +/- .0425, or about .428 to .512
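The steps for the proportion can be bundled into one function. This is a minimal sketch: the name `proportion_ci` is made up, and the inputs are Clinton's 47% and the sample size of 531 from the example.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Return (Sp, lower, upper) for a sample proportion p with sample size n."""
    sp = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
    margin = z * sp
    return sp, p - margin, p + margin


sp, lower, upper = proportion_ci(0.47, 531)

print(f"Sp = {sp:.4f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The same function can be reused for the other candidates by passing .25 or .11 in place of .47.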