Statistics 303 Chapter 7 Inference for Means
Inference for Means • To this point, when examining the mean of a population we have always assumed that the population standard deviation (σ) was known. • In practice this is seldom the case. • We usually must estimate the population standard deviation σ with the sample standard deviation s (for a review of s, see pp. 49-50 of the book). • When we do this, the standardized sample mean no longer follows the standard normal distribution, because of the extra variability introduced by estimating σ with s. • Thus, instead of using Z, the standard normal distribution, we must use the appropriate t-distribution.
Inference for Means • The t-distribution • Although there is only one Z-distribution, there are many, many t-distributions. • In fact, there is a different t-distribution for each sample size used. • The shape of each t-distribution is very similar to the Z-distribution, but is slightly flatter. • The larger the sample size, the closer the t-distribution is to the Z-distribution.
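The "slightly flatter" shape can be checked numerically. The sketch below is an illustration only, not part of the course material: it evaluates the standard t density formula with Python's standard library and compares it with the Z density at the center and in the tail. The function names `t_pdf` and `z_pdf` are hypothetical helpers introduced here.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom at x."""
    # Work with log-gamma so that large df values do not overflow.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def z_pdf(x):
    """Density of the standard normal (Z) distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Larger samples give a taller peak and thinner tails, approaching Z.
for n in (2, 5, 30, 1000):
    df = n - 1
    print(f"n = {n:4d}, df = {df:3d}: "
          f"peak = {t_pdf(0, df):.4f}, density at 3 = {t_pdf(3, df):.4f}")
print(f"Z distribution:      peak = {z_pdf(0):.4f}, density at 3 = {z_pdf(3):.4f}")
```

The peak of every t density is lower than the Z peak, and the density far from the center is higher, which is what "flatter" means here.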
Inference for Means • The t-distribution • The way we distinguish between various t-distributions is by finding the degrees of freedom (df) that correspond to the sample size. • When we are looking at only one sample, the degrees of freedom are the sample size minus one: df = n – 1. • We say that the one-sample t-statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ has the t distribution with n – 1 degrees of freedom.
Inference for Means • The t-distribution • A table of t distribution critical values can be found in Table D (the last page of the book). • Note that these values are areas to the right, not areas to the left as in the Z-table. • In Table D, the degrees of freedom are listed in the left column. • The probabilities are on top (these probabilities are inside for the Z-table) • The individual t-values are inside the table. • Make sure to get acquainted with this table and how it differs from the Z-table.
Inference for Means • The t-distribution • In the book, p.452, we see an example of how the distributions compare:
Inference for Means • The t-distribution • With the change from σ to s, and the change from z* to t*, the steps in producing confidence intervals and hypothesis tests are the same as we have seen previously. • In Chapter 1, p. 50, we find that s is calculated from the data using the formula: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ • This formula is cumbersome to evaluate by hand. Ideally, a computer is used to calculate s, particularly for large data sets.
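As a small illustration of letting a computer calculate s, the sketch below computes the sample standard deviation with the n – 1 divisor and checks it against Python's `statistics.stdev`. The data values and the helper name `sample_sd` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation s, using the n - 1 divisor."""
    n = len(data)
    xbar = sum(data) / n
    squared_deviations = sum((x - xbar) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data, not taken from the course examples.
values = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_sd(values)
print(f"s = {s:.4f}")

# The standard library computes the same quantity.
assert math.isclose(s, statistics.stdev(values))
```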
Confidence Interval for μ with Unknown σ • The formula for a confidence interval for μ with unknown σ is $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$ • The sample mean $\bar{x}$ and the sample standard deviation s are calculated from the data, and n is the sample size. • t* is found in Table D at the back of the book. It must correspond to the appropriate df = n – 1. It is easiest to find the confidence level at the bottom of the table and go up to the correct df.
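The interval "sample mean plus or minus t* times s over the square root of n" can be sketched as a short function. This is a minimal sketch under stated assumptions: the critical value t* is looked up in Table D by the reader and passed in, and the summary statistics in the usage lines are hypothetical. The name `t_interval` is introduced here for illustration.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Confidence interval for a mean when sigma is unknown.

    t_star must be the Table D critical value for df = n - 1
    (rounded down to a tabled df) at the desired confidence level.
    """
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical summary statistics: n = 85 and a 99% level,
# so t* = 2.639 is read from the 80 df row of Table D.
low, high = t_interval(xbar=6500.0, s=1200.0, n=85, t_star=2.639)
print(f"99% CI: ({low:.2f}, {high:.2f})")
```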
Confidence Interval for μ with Unknown σ • Confidence Interval Example • An economist wants to determine the average amount a family of four in the United States spends on housing annually. He randomly selects 85 families of size four and finds the amount they spent on housing the previous year. • The economist wishes to estimate the mean with 99% confidence.
Confidence Interval for μ with Unknown σ • Confidence Interval Example • Information given: Sample size: n = 85. Data: $6,789, $8,233, $4,784, …, $5,974 (85 numbers). The sample mean and the sample standard deviation are calculated from the data. df = n – 1 = 85 – 1 = 84
Confidence Interval for μ with Unknown σ • Confidence Interval Example • t* is found in Table D. We first go to the 99% confidence level at the bottom. Then we go up to 80 df, the largest tabled value that does not exceed 84 (always round down). Thus, t* = 2.639. • Substituting into the formula gives a 99% confidence interval for the true average amount a family of four in the United States spends on housing annually.
Hypothesis Test for μ with Unknown σ • The steps for a hypothesis test are the same as those seen previously, namely, • 1. State the null hypothesis. • 2. State the alternative hypothesis. • 3. State the level of significance (e.g., α = 0.05). • 4. Calculate the test statistic (note change): $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
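Hypothesis Test for μ with Unknown σ • 5. Find the P-value, where T has the t distribution with n – 1 degrees of freedom: • For a two-sided test ($H_a: \mu \neq \mu_0$): $2P(T \geq |t|)$ • For a one-sided test ($H_a: \mu > \mu_0$): $P(T \geq t)$ • For a one-sided test ($H_a: \mu < \mu_0$): $P(T \leq t)$ • Because of the limited number of t-values given in Table D, it is more common to find a range for the P-value, rather than the exact value (as will be seen in the example). Computers can be used to obtain exact values.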
Hypothesis Test for μ with Unknown σ • 6. Reject or fail to reject H0 based on the P-value. • If the P-value is less than or equal to α, reject H0. • If the P-value is greater than α, fail to reject H0. • 7. State your conclusion. • If H0 is rejected, “There is significant statistical evidence that the population mean is different from μ0.” • If H0 is not rejected, “There is not significant statistical evidence that the population mean is different from μ0.” • For a one-sided alternative, replace “different from” with “greater than” or “less than.” • Notice that these last two steps are exactly the same as for the case where σ is known.
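Steps 4 through 6 can be sketched in code. The block below is an illustrative sketch, not the method used in the course: instead of bracketing the P-value with Table D, it obtains an approximate exact value by integrating the t density numerically (Simpson's rule) with the standard library. All function names and the summary statistics in the usage lines are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T >= t) for T ~ t(df), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density from 0 to |t|
    tail = 0.5 - area                  # by symmetry, area to the right of |t|
    return tail if t >= 0 else 1 - tail

def one_sample_t_test(xbar, s, n, mu0, alternative, alpha=0.05):
    """Return (t, df, p_value, reject_h0) for a one-sample t test."""
    t = (xbar - mu0) / (s / math.sqrt(n))          # step 4
    df = n - 1
    if alternative == "greater":                   # step 5
        p = upper_tail(t, df)
    elif alternative == "less":
        p = 1 - upper_tail(t, df)
    elif alternative == "two-sided":
        p = 2 * upper_tail(abs(t), df)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t, df, p, p <= alpha                    # step 6

# Hypothetical summary statistics, not the class survey data.
t, df, p, reject = one_sample_t_test(xbar=10.0, s=2.0, n=4, mu0=9.0,
                                     alternative="greater")
print(f"t = {t:.3f}, df = {df}, P-value = {p:.4f}, reject H0: {reject}")
```

Passing in summary statistics rather than raw data keeps the sketch close to the way the steps are written on the slides.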
Hypothesis Test for μ with Unknown σ • T.V. Example • Suppose that the data collected from our class survey is a random sample from the entire university (which it obviously is not). We wish to see if there is evidence that the average amount of television watched for students here is more than 7 hours per week.
Hypothesis Test for μ with Unknown σ • T.V. Example • Information given: Sample size: n = 38.
Hypothesis Test for μ with Unknown σ • T.V. Example • 1. State the null hypothesis: $H_0: \mu = 7$ • 2. State the alternative hypothesis: $H_a: \mu > 7$ (from “is more than”) • 3. State the level of significance: assume α = 0.05.
Hypothesis Test for μ with Unknown σ • T.V. Example • 4. Calculate the test statistic. • 5. Find the P-value. Use df = 30, because n – 1 = 37 is not listed in Table D and we round down to the nearest tabled value. Remember that the table gives probabilities to the right, so we do not use the technique of subtracting from 1.
Hypothesis Test for μ with Unknown σ • T.V. Example • 6. Do we reject or fail to reject H0 based on the P-value? The P-value is between 0.15 and 0.20, which is greater than α = 0.05. Therefore, we fail to reject H0. • 7. State the conclusion. “There is not significant statistical evidence that the average amount of television watched is more than 7 hours per week at the 0.05 level of significance.”
Matched Pairs t-test • To this point we have only looked at tests for single samples. • Soon we will look at confidence intervals and hypothesis tests for comparing two groups. • When each individual can be given both treatments, we can reduce the two samples to a single sample using a matched pairs design. • Examples: • Students are each given a pre-test and a post-test to determine the amount of material learned in a given time interval. • To examine the effect of a new drug, a large group of identical twins is identified. One twin is given a treatment and the other a placebo. • An ophthalmologist is examining the importance of the dominant eye in reading. A large group of subjects is asked to read a passage with the dominant eye covered and again with the non-dominant eye covered. • It can be seen in each of these examples that something pairs the two responses.
Matched Pairs t-test • To analyze matched pairs data, we first reduce the data from two samples to one sample and then analyze the data using one-sample techniques. • The data is reduced from two samples to one by subtracting one of the responses from the other. • We could subtract each pre-test score from each post-test score. • We could subtract each placebo response from each treatment response. • We could subtract the time taken to read the passage with the non-dominant eye from the time taken to read the passage with the dominant eye.
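The reduction from two samples to one can be sketched as follows. This is an illustration with hypothetical pre-test and post-test scores. The helper name `paired_t_statistic` is introduced here and is not from the course material.

```python
import math
import statistics

def paired_t_statistic(first, second, mu0=0.0):
    """Reduce paired data to differences, then apply the one-sample t statistic.

    Returns (differences, t, df) where each difference is first - second.
    """
    if len(first) != len(second):
        raise ValueError("matched pairs require equal-length samples")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    dbar = statistics.mean(diffs)          # mean of the differences
    sd = statistics.stdev(diffs)           # standard deviation of the differences
    t = (dbar - mu0) / (sd / math.sqrt(n))
    return diffs, t, n - 1

# Hypothetical scores for five students (post-test minus pre-test).
pre = [70, 65, 80, 75, 60]
post = [75, 68, 84, 74, 66]
diffs, t, df = paired_t_statistic(post, pre)
print(f"differences = {diffs}, t = {t:.3f}, df = {df}")
```

The order of subtraction determines the sign of t, so it should match the direction stated in the alternative hypothesis.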
Matched Pairs t-test • Example: Keyboards • “Suppose we want to compare two brands of computer keyboards, which we will denote as keyboard 1 and keyboard 2. Keyboard 1 is a standard keyboard, while keyboard 2 is specially designed so that the keys need very little pressure to make them respond. The manufacturer of keyboard 2 would like to claim that typing can be done faster using keyboard 2…A simple random sample of n = 30 teachers was selected from a population of high-school teachers attending a national conference. Each teacher typed the same page of text once using keyboard 1 and once using keyboard 2. For each teacher the order in which the keyboards were used was determined by the toss of a coin. For each teacher the variable measured was the time (in seconds) to correctly type the page of text…” (from Graybill, Iyer and Burdick, Applied Statistics, 1998).
Matched Pairs t-test Reduction to one sample • Example: Keyboards • Information given: Sample size: n = 30.
Matched Pairs t-test • Example: Keyboards • 1. State the null hypothesis: • 2. State the alternative hypothesis (from carefully reading the problem): • 3. State the level of significance: assume α = 0.05.
Matched Pairs t-test • Example: Keyboards • 4. Calculate the test statistic. • 5. Find the P-value. Use df = 29. Remember the table gives probabilities to the right.
Matched Pairs t-test • Example: Keyboards • 6. Do we reject or fail to reject H0 based on the P-value? The P-value is between 0.01 and 0.02, which is less than α = 0.05. Therefore, we reject H0. • 7. State the conclusion. “There is significant statistical evidence that the average amount of time needed to type the passage is lower for keyboard 2 than keyboard 1 at the 0.05 level of significance.”
Matched Pairs Confidence Interval • After reducing the data to a single sample, we use the same formula as for a confidence interval for μ with unknown σ, namely, $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$, where $\bar{x}$ and s are the mean and standard deviation of the differences and n is the number of pairs.
Matched Pairs Confidence Interval • Example: Golf Balls • “In the manufacture of golf balls two procedures are used. Method I utilizes a liquid center and method II, a solid center. To compare the distance obtained using both types of balls, 12 golfers are allowed to drive a ball of each type, and the length of the drive (in yards) is measured.” (from Milton, McTeer, and Corbet, Introduction to Statistics, 1997) • The manufacturer wants to estimate the mean difference with 90% confidence.
Matched Pairs Confidence Interval • Example: Golf Balls • Information given: Sample size: n = 12. df = n – 1 = 12 – 1 = 11
Matched Pairs Confidence Interval • Example: Golf Balls • t* is found in Table D. We first go to the 90% confidence level at the bottom. Then we go up to 11 df. Thus, t* = 1.796. • Substituting into the formula gives a 90% confidence interval for the true average difference in the distance traveled for the two types of golf balls.
Comparing Two Means • We use the same basic principles for comparing two population means as those used for examining one population mean. • If the standard deviations σ1 and σ2 for each of the two populations are known, the two-sample z-statistic is then $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ • But it is very rare that both population standard deviations are known. We will examine the situation in which they are not known.
Comparing Two Means • When we are interested in comparing two population means and we are estimating the population standard deviations σ1 and σ2 with the sample standard deviations s1 and s2, the two-sample t-statistic is then $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ with degrees of freedom equal to the smaller of n1 – 1 and n2 – 1 (or an appropriate estimate using computer software).
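The two-sample t-statistic with the conservative degrees of freedom (the smaller of n1 – 1 and n2 – 1) can be sketched as below. The data are hypothetical, and `two_sample_t` is a name introduced here for illustration. Statistical software typically reports a different, estimated df, which this sketch does not attempt.

```python
import math
import statistics

def two_sample_t(sample1, sample2, delta0=0.0):
    """Two-sample t statistic (unequal variances) with conservative df.

    delta0 is the hypothesized value of mu1 - mu2 (0 when H0 is mu1 = mu2).
    """
    n1, n2 = len(sample1), len(sample2)
    xbar1, xbar2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    t = ((xbar1 - xbar2) - delta0) / standard_error
    df = min(n1 - 1, n2 - 1)               # smaller of n1 - 1 and n2 - 1
    return t, df

# Hypothetical measurements for two independent groups.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8]
t, df = two_sample_t(group1, group2)
print(f"t = {t:.3f}, df = {df}")
```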
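Comparing Two Means • The null hypothesis can be written in either of the following equivalent forms: $H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ • The alternative hypothesis can be any of the following (depending on the question being asked): $H_a: \mu_1 \neq \mu_2$, $H_a: \mu_1 > \mu_2$, or $H_a: \mu_1 < \mu_2$ • The other steps are the same as those used for the tests we have looked at previously.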
Comparing Two Means • Example: Tomatoes • “There has been some discussion among amateur gardeners about the virtues of black plastic versus newspapers as weed inhibitors for growing tomatoes. To compare the two, several rows of tomatoes are planted. Black plastic is used around nine randomly selected plants and newspaper around the remaining ten. All plants start at virtually the same height and receive the same care. The response of interest is the height in feet after a month’s growth.” (from Milton, McTeer, and Corbet, Introduction to Statistics, 1997). • Perform a test to see if there is any difference between the average heights with significance level 0.10.
Comparing Two Means • Example: Tomatoes • Information given: Sample sizes: n1 = 9, n2 = 10.
Comparing Two Means • Example: Tomatoes • 1. State the null hypothesis: $H_0: \mu_1 = \mu_2$ • 2. State the alternative hypothesis: $H_a: \mu_1 \neq \mu_2$ (from “any difference between”) • 3. State the level of significance: α = 0.10.
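Comparing Two Means • Example: Tomatoes • 4. Calculate the test statistic. • 5. Find the P-value. Use df = 8, the smaller of n1 – 1 = 8 and n2 – 1 = 9. Remember the table gives probabilities to the right, and because this test is two-sided the tail probability is doubled.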
Comparing Two Means • Example: Tomatoes • 6. Do we reject or fail to reject H0 based on the P-value? The P-value is between 0.10 and 0.20, which is greater than α = 0.10. Therefore, we fail to reject H0. • 7. State the conclusion. “There is not significant statistical evidence that the average tomato plant heights are different for the two types of weed inhibitors at the 0.10 level of significance.”
Comparing Two Means • The confidence interval for the difference of two population means (μ1 – μ2) is $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ • where t* comes from Table D and corresponds to the confidence level desired and df = smaller of n1 – 1 and n2 – 1.
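A minimal sketch of this interval follows, assuming the reader supplies t* from Table D. The summary statistics are hypothetical and are not the commercial-break data. Only the sample sizes (16 and 16) and the resulting t* = 2.131 for 15 df at 95% match the example that follows. The name `two_sample_t_interval` is introduced here.

```python
import math

def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 with unknown, unequal sigmas.

    t_star is the Table D value for df = min(n1 - 1, n2 - 1).
    """
    difference = xbar1 - xbar2
    margin = t_star * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return difference - margin, difference + margin

# Hypothetical summary statistics with n1 = n2 = 16, so df = 15
# and t* = 2.131 at the 95% confidence level.
low, high = two_sample_t_interval(2.0, 0.4, 16, 2.5, 0.3, 16, t_star=2.131)
print(f"95% CI for mu1 - mu2: ({low:.4f}, {high:.4f})")
```

If the interval excludes 0, the data give evidence of a difference between the two population means at the matching significance level.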
Comparing Two Means • Example: Commercials • “There is some concern that TV commercial breaks are becoming longer. The observations on the following slide are obtained on the length in minutes of commercial breaks for the 1984 viewing season and the current season.” (from Milton, McTeer, and Corbet, Introduction to Statistics, 1997) • Find a 95% confidence interval for the difference between the true averages of the two seasons.
Comparing Two Means • Example: Commercials • Information given: Sample sizes: n1 = 16, n2 = 16.
Comparing Two Means • Example: Commercials • t* is found in Table D. We first go to the 95% confidence level at the bottom. Then we go up to 15 df. Thus, t* = 2.131. • Substituting into the formula gives a 95% confidence interval for the true difference of average length in minutes for commercials between 1984 and the present.
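Pooled t test: Comparing Two Means • The null hypothesis can be written in either of the following equivalent forms: $H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ • The alternative hypothesis can be any of the following (depending on the question being asked): $H_a: \mu_1 \neq \mu_2$, $H_a: \mu_1 > \mu_2$, or $H_a: \mu_1 < \mu_2$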
Pooled Estimator • Previously, we discussed two-sample t procedures from two populations with two unknown standard deviations. We then used the sample standard deviations to estimate the population standard deviations. • But what about when the two populations have the same standard deviation? In that case both sample variances estimate the same σ², and we combine them, weighting each by its degrees of freedom: $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ • This estimate is called the pooled estimator of σ² because it combines the information in both samples.
Test Statistic • Suppose that an SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. Suppose also that the two populations have the SAME standard deviation. Then the pooled two-sample t statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p$ is the square root of the pooled estimator of σ², • with degrees of freedom equal to n1 + n2 – 2.
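Confidence Interval • A level C confidence interval for μ1 – μ2 is $(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where $s_p$ is the square root of the pooled estimator of σ², • and t* comes from Table D and corresponds to the confidence level desired and df = n1 + n2 – 2.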
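The pooled procedures can be sketched together: the pooled variance weights each sample variance by its degrees of freedom, and the result feeds both the test statistic and the interval. This is an illustration under the equal-standard-deviation assumption stated above. The data and the function names are hypothetical, and t* = 2.365 is the Table D value for 7 df at 95% confidence.

```python
import math
import statistics

def pooled_variance(sample1, sample2):
    """Pooled estimator of the common variance sigma^2."""
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic for H0: mu1 = mu2, with its df."""
    n1, n2 = len(sample1), len(sample2)
    sp = math.sqrt(pooled_variance(sample1, sample2))
    standard_error = sp * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / standard_error
    return t, n1 + n2 - 2

def pooled_interval(sample1, sample2, t_star):
    """Level C interval for mu1 - mu2; t_star uses df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    sp = math.sqrt(pooled_variance(sample1, sample2))
    margin = t_star * sp * math.sqrt(1 / n1 + 1 / n2)
    difference = statistics.mean(sample1) - statistics.mean(sample2)
    return difference - margin, difference + margin

# Hypothetical data for two independent groups (df = 5 + 4 - 2 = 7).
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8]
t, df = pooled_t(group1, group2)
low, high = pooled_interval(group1, group2, t_star=2.365)
print(f"t = {t:.3f}, df = {df}, 95% CI: ({low:.3f}, {high:.3f})")
```

Compared with the conservative df = smaller of n1 – 1 and n2 – 1, pooling gives more degrees of freedom, but only when the equal-standard-deviation assumption is reasonable.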