Confidence Interval for Population Mean. The case when the population standard deviation is unknown (the more common case).
Let’s review.
When we do not know the population mean, we use a sample to estimate it. From the sample we calculate a sample mean. Since different samples would, in theory, give potentially different sample means, we take our one sample mean and build a margin of error around it. Then we have a level of confidence that the unknown population mean is in the interval we calculated from the sample. Up to now we have looked at the case where the population standard deviation was known.
More review
The margin of error described on the previous screen is calculated using a value of Z and the standard deviation of the sampling distribution. The values of Z most commonly used are:

Z        Confidence level
1.645    90%
1.96     95%
2.576    99%

The standard deviation of the sampling distribution is the population standard deviation divided by the square root of the sample size.
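The Z values listed above can be checked with a short sketch. This is only an illustration using Python's standard library; the helper name `z_for_confidence` is my own and does not come from the slides.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Return the Z value that leaves (1 - level) / 2 in each tail."""
    upper_tail = (1 - level) / 2
    # inv_cdf takes a cumulative (left-side) area, so convert the upper tail.
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z = {z_for_confidence(level):.3f}")
```

The margin of error is then this Z value times the population standard deviation divided by the square root of the sample size.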
So, the margin of error is Z times the standard deviation of the sampling distribution. The confidence interval is then (sample mean minus margin of error, sample mean plus margin of error).
New information
When the population standard deviation is not known, we have to modify our work just a little. The standard deviation of the sampling distribution will be called the standard error and is calculated the same way as above, except that the sample standard deviation takes the place of the population standard deviation. And we will not use a Z value in the margin of error. We will use a t value.
It turns out that when the population standard deviation is not known and the sample standard deviation is used in its place, the standardized sample mean follows a t distribution (this assumes the population is at least approximately normal). The t distribution is a lot like the normal distribution, but when we use the t distribution we have to be aware of something called degrees of freedom (df). The main point for us here is that degrees of freedom = sample size minus 1, or df = n - 1. So, if n = 19, df = 18; if n = 11, df = 10; and so on.
Now if we want a 95% confidence interval in this case we
1) calculate the sample mean,
2) calculate the sample standard deviation,
3) calculate the standard error as the sample standard deviation divided by the square root of the sample size,
4) find our t value in the t table under the .025 column in the row for df = n - 1,
5) calculate our margin of error as t times the standard error,
6) calculate the interval as the sample mean minus and plus the margin of error.
Note we look in the .025 column of the t table for a 95% confidence interval because we would have .025 or 2.5% in each tail of the distribution. If we want a 99% confidence interval we look in the .005 column, and if we want a 90% confidence interval we look in the .05 column, for similar reasons.
Let’s do an example. We have the following sample values from a population: 10, 8, 12, 15, 13, 11, 6, and 5. On the following slide I have a basic Excel printout to calculate the sample mean and sample standard deviation.
a) The point estimate of the population mean is the sample mean = 10.
b) The point estimate of the population standard deviation is the sample standard deviation, and when you round to two decimal places you get 3.46.
c) To get the 95% confidence interval we need the standard error and the t value with an upper tail area of .025 and df = 7. The t value is 2.365. The standard error is 3.46/sqrt(8) = 1.22. Thus the margin of error is (2.365)(1.22) = 2.89. The interval is thus (10 - 2.89, 10 + 2.89) = (7.11, 12.89), and so we are 95% confident the population mean is in the interval 7.11 to 12.89.
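The six steps can be sketched in a few lines of Python. This is only an illustration, not the Excel printout from the slides: the variable names are my own, and the t value 2.365 is typed in from the t table rather than computed. Because the sketch does not round the standard error to 1.22 along the way, its margin of error comes out near 2.90 instead of 2.89, so the interval ends differ from the slide's by about .01.

```python
from math import sqrt
from statistics import mean, stdev

sample = [10, 8, 12, 15, 13, 11, 6, 5]
t_critical = 2.365  # from the t table: upper tail .025, df = 7 (typed in by hand)

n = len(sample)
x_bar = mean(sample)             # step 1: sample mean
s = stdev(sample)                # step 2: sample standard deviation (n - 1 divisor)
std_error = s / sqrt(n)          # step 3: standard error
margin = t_critical * std_error  # step 5: margin of error
interval = (x_bar - margin, x_bar + margin)  # step 6: the interval

print(f"mean = {x_bar}, s = {s:.2f}, standard error = {std_error:.2f}")
print(f"95% interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```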
Z table and t table
If you look in the Z table at Z = 1.96 you see the value .9750. This means .9750 of the possible Z values are 1.96 or less. .9750 is a cumulative value, so .025 is in the upper tail. There is only one standard normal distribution. But the t distribution is really a family of distributions, where each value of the degrees of freedom defines a new distribution. When you go to the t table in the book you see the upper tail areas across the top. Under the upper tail area .025, the df = infinity row shows a t value of 1.96. This means that when the df is really big the t and the Z distributions are essentially the same. The same holds for the other columns: the df = infinity row shows 1.645 under .05 and 2.576 under .005, matching the Z values for 90% and 99% confidence.
Another idea
In a confidence interval we want to focus on the middle of the distribution.
[Figure: a number line split into four regions labeled a, b, c, d from left to right. The line between b and c marks the sample mean, and arrows point to the low and high ends of the interval.]
If I want a 95% confidence interval (and I had the distribution drawn in), region b would have area .95/2 and region c would also have .95/2. Region a would have area .05/2, and the same goes for d. So, if I have a t distribution (or Z), why do I look at .025 when I have a 95% confidence interval? The answer is that the table works with the upper tail, and since the upper tail is just .025 we look there, knowing that the other .025 is in the lower tail.
Problem (not in your book)
Note that when working with a t you have to pick the right column and the right row. The row is the df and equals n - 1. The column to look at is related to the story I had on the previous slide. In a problem we may get 1 - alpha = some decimal. From this, alpha = 1 minus that decimal. On the previous slide alpha was split in half between areas a and d. We focus on d.
a. alpha = .05, so alpha/2 = .05/2 = .025. With df = 9, the critical t = 2.262.
b. alpha = .01, so alpha/2 = .01/2 = .005. With df = 9, the critical t = 3.250.
c. alpha = .10, so alpha/2 = .10/2 = .05. With df = 15, the critical t = 1.753.
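The row-and-column lookup can be written out as a small sketch. The dictionary below holds only the three table entries used in this problem, typed in by hand; `T_TABLE` and `critical_t` are names I made up for illustration, not part of any library.

```python
# (df, upper-tail area) -> critical t, copied from the t table for this problem only.
T_TABLE = {
    (9, 0.025): 2.262,
    (9, 0.005): 3.250,
    (15, 0.05): 1.753,
}


def critical_t(confidence: float, df: int) -> float:
    """Look up the critical t for a two-sided interval at the given confidence."""
    alpha = 1 - confidence
    # Half of alpha goes in the upper tail; round to clean up floating-point noise.
    upper_tail = round(alpha / 2, 4)
    return T_TABLE[(df, upper_tail)]


print(critical_t(0.95, 9))   # part a
print(critical_t(0.99, 9))   # part b
print(critical_t(0.90, 15))  # part c
```

A lookup for any row or column not typed into the dictionary raises a KeyError, which is the code's way of saying "go back to the table."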
T-test for the Mean of a Population: Unknown population standard deviation
Here we will focus on two methods of hypothesis testing: the critical value approach and the p-value approach.
We saw in the case where the population standard deviation is known that, when we do not know the true value of the population mean for a quantitative variable, a hypothesis test can be carried out using the Z calculation: Z = (x bar minus mu under Ho) / (standard deviation of the mean). When the population standard deviation, sigma, is unknown we have to rely on the t distribution, and in the calculation of the standard error we use the sample standard deviation. The t statistic = (x bar minus mu under Ho) / (standard error of the mean), where the standard error is s/sqrt(n). Let’s work a few problems.
Example
Looking at past records, a company has seen that the average dollar amount on an invoice is $120. Over time this will be monitored to see if it changes. The question now is whether or not the population mean is still 120. We will make this the null hypothesis. So we have Ho: μ = 120 and Ha: μ ≠ 120.
a) With the critical value approach the value of alpha has to be chosen first; say we have alpha = .05. When the alternative hypothesis has a not-equal-to sign we have a two-tail test. This means we have .025 in each tail. Since we do not know the population standard deviation we have to use the t distribution. With a sample size of 12 we look in the df = n - 1 = 12 - 1 = 11 row. The critical t’s are thus -2.201 and 2.201. Let’s see what this looks like in a graph on the next slide.
[Figure: t distribution with alpha/2 = .025 shaded in each tail; lower critical t = -2.201, upper critical t = 2.201.]
Let’s review what we have done. We have a null and an alternative hypothesis. We have an alpha value and a sample size we will use. The critical values of t break up the t distribution into a rejection region (the two tails) and a non-rejection region (the middle). Our decision rule will be this: we take a sample and calculate both a sample mean and the associated t value, called the t test statistic (I will write tstat). If the tstat is less than the lower critical value or greater than the upper critical value, we reject the null. If the tstat is between the critical values, we do not reject the null.
Now say we get an actual sample of 12 invoices and we see the sample mean is 112.85 and the sample standard deviation is 20.80 (20.804 to three decimal places). The tstat from the sample is (112.85 - 120)/(20.804/sqrt(12)) = -1.19. Since the tstat of -1.19 is between the critical values of -2.201 and 2.201, we do not reject the null.
b) To proceed with the p-value approach to hypothesis testing I would like us to explore the df = 11 row of the t table. Let’s see this on the next slide.
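The critical value approach for this example can be sketched as follows. The function name `t_statistic` is my own, and the critical value 2.201 is typed in from the t table (upper tail .025, df = 11) rather than computed.

```python
from math import sqrt


def t_statistic(x_bar: float, mu_0: float, s: float, n: int) -> float:
    """(sample mean - hypothesized mean) / (standard error of the mean)."""
    return (x_bar - mu_0) / (s / sqrt(n))


t_stat = t_statistic(x_bar=112.85, mu_0=120, s=20.804, n=12)
t_crit = 2.201  # from the t table: upper tail .025, df = 11

# Two-tail rule: reject when tstat is below -t_crit or above +t_crit.
reject_null = abs(t_stat) > t_crit

print(f"tstat = {t_stat:.2f}, reject Ho: {reject_null}")
```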
T distribution with DF = 11
[Figure: t distribution curve with the upper-tail areas below marked off. By symmetry, the same areas lie in the lower tail beyond the matching negative t values.]

Tail area    t value
.25          0.697
.10          1.363
.05          1.796
.025         2.201
.01          2.718
.005         3.106

Here I have marked off on the t distribution the positive and negative values. On the next slide I reproduce this with the tails colored in for when alpha is picked to be .05.
T distribution with DF = 11
[Figure: the same t distribution, with the two tails beyond -2.201 and 2.201 colored in for alpha = .05.]
So, with alpha = .05 the critical t’s are -2.201 and 2.201. Next we take the sample mean and calculate the tstat. Again, in our example we had -1.19. On the number line, -1.19 falls between -1.363 and -0.697. The tail areas for these two values are 0.10 and 0.25, respectively.
The tail area for -1.19 is thus between 0.10 and 0.25. This is the basis for the p-value. Because the t table in our book lists only a few tail areas, the best we can say is that the tail area for the tstat is between 0.10 and 0.25. Since our alternative hypothesis Ha has a not-equal-to sign, we have to double the tail area for -1.19, and so we say the p-value is between 0.20 and 0.50. (A computer or a more detailed table would show that the doubled tail area is .259; we do not need that here.)
Here is how we use the p-value approach: if the p-value is less than or equal to alpha, reject the null; otherwise do not reject the null. In our example the p-value is at least 0.20, which is greater than .05, so we do not reject the null.
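Bracketing a p-value from one row of the t table can be sketched like this. The row below is the df = 11 row from the earlier slide, typed in by hand; `T_ROW_DF11` and `p_value_bounds` are names I made up for illustration.

```python
# (upper-tail area, t value) pairs for df = 11, in order of increasing t.
T_ROW_DF11 = [
    (0.25, 0.697), (0.10, 1.363), (0.05, 1.796),
    (0.025, 2.201), (0.01, 2.718), (0.005, 3.106),
]


def p_value_bounds(t_stat, row, two_tailed=True):
    """Return (low, high) bounds on the p-value using only the table row."""
    t_abs = abs(t_stat)
    high = 0.5  # tail area beyond t = 0
    low = 0.0   # stays 0 if tstat is beyond the last table entry
    for area, t_value in row:
        if t_abs >= t_value:
            high = area  # tstat is at least this far out, so its tail is no bigger
        else:
            low = area   # first table value beyond tstat bounds the tail from below
            break
    factor = 2 if two_tailed else 1  # double the tail area for a not-equal-to Ha
    return factor * low, factor * high


low, high = p_value_bounds(-1.19, T_ROW_DF11)
alpha = 0.05
print(f"p-value is between {low} and {high}")
print("reject Ho" if high <= alpha else "do not reject Ho" if low > alpha else "table cannot decide")
```

When alpha falls inside the bracket, the table row alone cannot settle the decision and a finer table or software is needed.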
Say from a problem we see Ho: μ = 50 and thus Ha: μ ≠ 50. Also say x bar = 56, s = 12, and the sample size is 16. The tstat = (56 - 50)/(12/sqrt(16)) = 2.00. On the next slide I show what the critical t’s would be in this problem if we wanted alpha = .10. Note that with n = 16 the df = 15.
[Figure: t distribution with alpha/2 = .10/2 = .05 shaded in each tail; critical t’s = -1.753 and 1.753.]
The critical values of t are -1.753 and 1.753. If our tstat is outside these two values, then the sample information is placing us in a low probability area. This makes us suspicious of the null hypothesis and thus we reject it. Our tstat = 2.00 places us in the rejection region. Note that if alpha were .05 we would not reject the null (critical values of -2.131 and +2.131). Since we reject at alpha = .10 but not at alpha = .05, the p-value is between .05 and .10.
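A short sketch shows how the decision in this problem changes with alpha. The critical values are typed in from the df = 15 row of the t table, and the variable names are my own.

```python
from math import sqrt

# tstat for Ho: mu = 50 with x bar = 56, s = 12, n = 16
t_stat = (56 - 50) / (12 / sqrt(16))

# alpha -> two-tail critical t for df = 15 (upper tail alpha/2), from the t table
critical_values = {0.10: 1.753, 0.05: 2.131}

# For each alpha, reject when tstat falls outside (-critical t, +critical t).
decisions = {alpha: abs(t_stat) > t_crit for alpha, t_crit in critical_values.items()}

for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'do not reject'} Ho")
```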