CHAPTER 24. INFERENCE: Comparing Means. Two-Sample Problems
CHAPTER 24 INFERENCE: Comparing Means
Two-Sample Problems • The goal of this inference is to compare the means of two different groups; we may wish to compare the responses to two treatments or to compare the characteristics of two populations. • For these problems, we use a two-sample t-test or a two-sample t-interval. • It is important to note that there must be a separate sample from each treatment or each population. Comparing Two Means
Assumptions and Conditions for Comparing Two Means • Independence: Randomization Condition: Two random samples from two distinct populations. 10% Condition: Each sample is less than 10% of its population. • Normality: Nearly Normal Condition: Both populations are approximately normally distributed; check that each sample is unimodal and roughly symmetric with no outliers. • Independent Groups: Distinct Groups: The two samples are independent of one another; that is, nothing (and no one) appears in both groups, and one sample has no influence on the other.
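Two-Sample t Procedures • In order to calculate the confidence interval or the test statistic, we need the standard error of the difference in the means: $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $s_1$, $s_2$ are the sample standard deviations and $n_1$, $n_2$ the sample sizes. Don’t forget: VARIANCES ADD!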
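As a small illustration of "variances add", the sketch below computes the standard error of a difference in sample means from summary statistics. The function name `standard_error_diff` is made up for this example, and the inputs (standard deviations 5 and 6, sample sizes 26 and 32) are the summary values from the pulse-rate example in this chapter.

```python
import math

def standard_error_diff(s1, n1, s2, n2):
    """Standard error of (mean1 - mean2) for two independent samples.

    Hypothetical helper for illustration; s1, s2 are sample standard
    deviations and n1, n2 are the sample sizes.
    """
    # Variances add: Var(xbar1 - xbar2) is estimated by s1^2/n1 + s2^2/n2.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Summary statistics from the pulse-rate example (smokers, nonsmokers).
se = standard_error_diff(5, 26, 6, 32)
print(round(se, 4))
```

Note that the standard deviations themselves are never added; only the variances of the two sample means are.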
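Two-Sample t Procedures • Draw an SRS of size n1 from a normal population with unknown mean µ1, and draw an independent SRS of size n2 from another normal population with unknown mean µ2. • The confidence interval (CI) for µ1 - µ2 given by $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $\bar{x}_1$, $\bar{x}_2$ and $s_1$, $s_2$ are the sample means and standard deviations, has confidence level at least C no matter what the population standard deviations are for either population.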
Two-Sample t Procedures • For the confidence interval, we let t* be the upper (1 – C) / 2 critical value for the t(k) distribution with df = k. • To test the hypothesis H0: µ1 – µ2 = 0, compute the two-sample t statistic $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ and use P-values or critical values for the t(k) distribution.
Two-Sample t Procedures • k is the degrees of freedom for the two-sample t procedures. Here is the actual formula: $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$ • Most people either use the conservative value k = the smaller of (n1 – 1) and (n2 – 1), or let the calculator deal with this formula.
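To make the two choices of degrees of freedom concrete, here is a minimal sketch that compares them. It assumes the "actual formula" is the usual Welch-Satterthwaite approximation used by calculators and software; the function names `welch_df` and `conservative_df` are invented for this example.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df for the two-sample t procedures.

    Assumption: this is the Welch-Satterthwaite approximation.
    """
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Conservative choice: the smaller of (n1 - 1) and (n2 - 1)."""
    return min(n1 - 1, n2 - 1)

# Summary statistics from the pulse-rate example (smokers, nonsmokers).
df_formula = welch_df(5, 26, 6, 32)
df_simple = conservative_df(26, 32)
print(round(df_formula, 2), df_simple)
```

The formula value is generally not a whole number and is never smaller than the conservative value, so the conservative choice gives a slightly wider interval and a slightly larger P-value.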
Harder Working Hearts • Resting pulse rates for a random sample of 26 smokers had a mean of 80 beats per minute (bpm) and a standard deviation of 5 bpm. Among 32 randomly selected nonsmokers, the mean was 74 bpm and the standard deviation was 6 bpm. Both sets of data were roughly symmetric and had no outliers. Is there evidence of a difference in the mean pulse rate between smokers and nonsmokers? If so, how big?
Harder Working Hearts • Step 1: Identify the population Parameter, state the null and alternative Hypotheses, and determine what you are trying to do (and what the question is asking). • We wish to determine if there is evidence of a difference in mean pulse rate between smokers and nonsmokers. Let s represent smokers and n represent nonsmokers. • Null Hypothesis: H0: μs - μn = 0 • There is no difference in mean pulse rates. • Alternative Hypothesis: HA: μs - μn ≠ 0 • There is a difference in mean pulse rates.
Harder Working Hearts • Step 2: Verify the Assumptions by checking the conditions • Independence: • Randomization Condition: We are told that both samples were random samples. • 10% Condition: We have less than 10% of all smokers and less than 10% of all nonsmokers. • There is no reason to doubt independence.
Harder Working Hearts • Step 2: Verify the Assumptions by checking the conditions • Normality: • We are told that both sets of data are roughly symmetric with no outliers, so it is safe to assume that the sampling distributions of both sample means are approximately normal. • Independent Groups: • The data come from two distinct populations, smokers and nonsmokers.
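Harder Working Hearts • Step 3: If conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value in carrying out the inference: Name the test: We will use a Two-Sample T-test. $n_s = 26$, $n_n = 32$, $\bar{x}_s = 80$, $\bar{x}_n = 74$, $s_s = 5$, $s_n = 6$ Test Statistic: $t = \frac{\bar{x}_s - \bar{x}_n}{\sqrt{\frac{s_s^2}{n_s} + \frac{s_n^2}{n_n}}} = \frac{80 - 74}{\sqrt{\frac{5^2}{26} + \frac{6^2}{32}}} \approx 4.15$ Obtain the p-value: $P = 2P(t > 4.15) \approx 0.0001$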
Harder Working Hearts • Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context of the problem using the p-value, and make sure you relate your solution to the population means! • Such a small p-value, .0001, makes it unlikely that a difference in means this large came from sampling error alone, so we reject the null hypothesis. There is strong evidence that there is a difference in mean pulse rates between smokers and nonsmokers.
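The pulse-rate test can be checked from the summary statistics alone. The sketch below uses only Python's standard library, so the t distribution's tail area is approximated by numerically integrating its density (Simpson's rule) rather than by a statistics package. The helper names (`t_pdf`, `t_cdf`, `two_sample_t`) are invented for this example, and the degrees of freedom are assumed to follow the Welch-Satterthwaite approximation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic, approximate df, and two-sided P-value."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)  # variances add
    t = (mean1 - mean2) / se
    # Assumption: Welch-Satterthwaite degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p_two_sided

# Smokers: mean 80, sd 5, n 26. Nonsmokers: mean 74, sd 6, n 32.
t_stat, df, p_value = two_sample_t(80, 5, 26, 74, 6, 32)
print(round(t_stat, 2), round(df, 1), p_value)
```

The printed P-value should be on the order of .0001, in line with the value used in Step 4; small differences from a calculator are expected because of the numerical integration.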
How Much More? • Determine the true difference in mean pulse rate between smokers and nonsmokers with 99% confidence. • Step 1: State what you want to know in terms of the Parameter and determine what the question is asking • We want to find an interval that is likely, with 99% confidence, to contain the true difference in mean pulse rates, μs – μn, of smokers and non-smokers. Let s represent smokers and n represent non-smokers.
How Much More? • Determine the true difference in mean pulse rate between smokers and nonsmokers with 99% confidence. • Step 2: Verify the Assumptions by checking the conditions All assumptions and conditions were satisfied in the previous problem.
How Much More? • Determine the true difference in mean pulse rate between smokers and nonsmokers with 99% confidence. • Step 3: Name the inference, do the work, and state the Interval: Name the test: This is a Two-Sample T-Interval Interval: (2.148, 9.852)
How Much More? • Determine the true difference in mean pulse rate between smokers and nonsmokers with 99% confidence. • Step 4: State your Conclusion in context of the problem • We are 99% confident that the true difference in mean pulse rates between smokers and nonsmokers is between 2.148 and 9.852 bpm. In other words, we are 99% confident that the mean pulse rate of smokers is between 2.148 and 9.852 bpm higher than that of nonsmokers.
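The 99% interval can be reproduced approximately from the summary statistics. This is a sketch using only the standard library: the critical value t* is found by bisection on a numerically integrated t distribution, and the degrees of freedom are assumed to come from the Welch-Satterthwaite approximation. All function names are invented for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(upper_tail, df):
    """Find t* with P(T > t*) = upper_tail (upper_tail < 0.5) by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail still too large: t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, conf_level):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    # Assumption: Welch-Satterthwaite degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t_star = t_critical((1 - conf_level) / 2, df)
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se

low, high = two_sample_t_interval(80, 5, 26, 74, 6, 32, 0.99)
print(round(low, 3), round(high, 3))
```

The result should be close to the interval (2.148, 9.852) stated in Step 3. Using the conservative df = 25 instead would give a somewhat wider interval.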
Pizza, Pizza!!! • Nutritional information from two different national chains, Papa John’s and Domino’s, was examined to determine the amount of saturated fat (in grams) in one slice of various pizzas. Use the data below to determine if there is a difference between the two chains in the mean amount of saturated fat that a slice of pizza contains. The following table represents saturated fat (in grams) per slice of pizza:
Pizza, Pizza!!! • Step 1: Identify the population Parameter, state the null and alternative Hypotheses, and determine what you are trying to do (and what the question is asking). • We want to know if the two pizza chains have significantly different mean saturated fat contents. Let P represent Papa John’s and D represent Domino’s. • H0: μP - μD = 0 • There is no difference in mean saturated fat content. • HA: μP - μD ≠ 0 • There is a difference in mean saturated fat content.
Pizza, Pizza!!! • Step 2: Verify the Assumptions by checking the conditions • Independence: • Randomization Condition: We are not told if the samples were randomly selected. We will assume that the pizzas were representative of each chain's pizzas. If they are not representative, our results may not be valid. • 10% Condition: It is safe to assume that we have less than 10% of all pizza slices from each chain.
Pizza, Pizza!!! • Step 2: Verify the Assumptions by checking the conditions • Normality: • Both samples are relatively small, so we look at the sample distributions: [Figure: sample distributions for Domino’s and Papa John’s] It is safe to assume normality, since both samples are unimodal and symmetric. • Independent Groups: • The data come from two distinct populations, Papa John’s and Domino’s.
Pizza, Pizza!!! • Step 3: If conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value in carrying out the inference: Name the test: We will use a Two-Sample T-test. $n_P = 14$, $n_D = 20$, $s_P = 1.389$, $s_D = 3.193$ Test Statistic: $t = \frac{\bar{x}_P - \bar{x}_D}{\sqrt{\frac{s_P^2}{n_P} + \frac{s_D^2}{n_D}}}$ Obtain the p-value: $P \approx 0.000001$
Pizza, Pizza!!! • Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context of the problem using the p-value, and make sure you relate your solution to the population means! • The p-value is extremely small, .000001, so we reject the null hypothesis. There is very strong evidence that there is a difference in mean saturated fat content between Papa John’s and Domino’s.
To T or Not to T, That is the Question • Sometimes, you may wonder if you should use t or z. If you know σ, use z (this almost never happens in the real world). Whenever you use s to estimate σ, use t. • What about pooling? • If we know that the population variances are equal (or are willing to assume this), we pool the two groups to estimate a common variance; otherwise, don’t pool when comparing means.
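To see what pooling changes, the sketch below computes the standard error and degrees of freedom both ways from summary statistics. The function names are invented for this example; the pooled version uses the standard pooled-variance estimate with n1 + n2 - 2 degrees of freedom, and the unpooled df is assumed to follow the Welch-Satterthwaite approximation.

```python
import math

def unpooled_se_df(s1, n1, s2, n2):
    """Unpooled standard error with approximate (Welch-Satterthwaite) df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

def pooled_se_df(s1, n1, s2, n2):
    """Pooled standard error; only appropriate if the variances are equal."""
    # Weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return se, n1 + n2 - 2

# Pulse-rate summary statistics: sd 5 with n 26, sd 6 with n 32.
se_u, df_u = unpooled_se_df(5, 26, 6, 32)
se_p, df_p = pooled_se_df(5, 26, 6, 32)
print(round(se_u, 4), round(df_u, 2))
print(round(se_p, 4), df_p)
```

When the sample standard deviations and sample sizes are similar, as here, the two approaches give nearly the same answer, which is one reason the unpooled procedure is a safe default.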