140 likes | 319 Views
Difference Between the Means of Two Populations . We have be studying inference methods for a single variable. When the variable was quantitative we had inference for the population mean. When the variable was qualitative we had inference for the population proportion.
E N D
We have be studying inference methods for a single variable. When the variable was quantitative we had inference for the population mean. When the variable was qualitative we had inference for the population proportion. Now we want to study inference methods for two variables. Both variables could be quantitative, both qualitative or one of each. Depending on the which we have, we will look to certain techniques. At this stage of the game we will begin to look at these different methods. I want to start with 1 quantitative variable and one qualitative variable. In fact the qualitative variable is special here: the variable identifies membership in one of only two groups. Then on the quantitative variable we segment each observation into the appropriate group and think about the mean of the quantitative variable of each group.
Our context here is that we really want to know about the population of the two groups, but we will only take a sample from each group. We will look at both confidence intervals and hypothesis tests in this context. Some notation: μi = the population mean of group i for i = 1, 2. σi = the population standard deviation for group i for i = 1, 2. xi = the sample mean of group i for i = 1, 2. si = the sample standard deviation for group i for i = 1, 2. Now for ease of typing I will call the population means mu1 or mu2, population standard deviations sigma1 or sigma2, sample means xbar1 or xbar2, and sample standard deviations s1 or s2. n1 is the sample size from population 1 and n2 has similar meaning.
Our context for inference is really the difference in means: mu1 minus mu2. So we are checking to see what difference there is in the means from the two groups. Our point estimator will be xbar1 minus xbar2. In a repeated sampling context the point estimator would vary from sample to sample. As an example say I want to check the average age of students in the economics program and the finance program. One sample from each group would yield one estimate and the estimate would likely be different when I get a different sample (from each major). Also note the sample obtained from group 1 is independent of the sample obtained from group 2. The sampling distribution of xbar1 minus xbar2 will be studied next.
The sampling distribution of xbar1 minus xbar2 Case 1 – we can use the normal distribution for the sampling distribution when sigma1 and sigma2 are known. This means we will use Z in our confidence intervals and hypothesis tests The center of the sampling distribution is mu1 minus mu2 and the standard error is the (note or digress x^2 means x squared ) square root[((sigma1^2)/n1)+((sigma2^2)/n2)]. Case 2 – we can use the t distribution for the sampling distribution when sigma1 and sigma2 are unknown. This means we will use t in our confidence intervals and hypothesis tests. The center of the sampling distribution is mu1 minus mu2 and the standard error is seen as the denominator of the equation on page 389. This is not pretty, but we must use it. Note that when using a t distribution one needs to have a degrees of freedom value. In our current context the value is n1 + n2 – 2.
Inference for case 1 Confidence interval We are C% confident the unknown population difference mu1 minus mu2 is in the interval (xbar1 minus xbar2) ± MOE, Where MOE = margin of error and this equals the appropriate Z times the standard error of the sampling distribution. Remember if C = 95 the Z = 1.96, and if C = 90 the Z = 1.645, and if C = 99 the Z = 2.58.
Hypothesis Test Recall from our past work that in an hypothesis test context we have a null and an alternative hypothesis. Plus the form of the alternative hypothesis will determine if we have a one or a two tailed test. Two tailed test When we study the difference in the means from two populations if we feel there is a difference of Do, but not concerned about the difference being positive or negative, then the null and alternative hypotheses are Ho: mu1 minus mu2 = Do, Ha: mu1 minus mu2 ≠ Do, and we have a two tailed test. Based on an alpha value (the probability of a type I error), we pick critical values of Z and if the calculated Z is more extreme than either critical value we reject the null and go with the alternative.
Alpha/2 alpha/2 lower Critical Z Upper Critical Z The calculated value of Z from the sample information = [(xbar1 minus xbar2) minus Do] divided by the standard error listed on slide 5 with case 1. Another way to think of the hypothesis test is with the use of the p-value for the calculated Z. If the p-value < alpha, reject the null. Otherwise you have to stick with the null. In practice with a two tailed test you will find the p-value as the area on one side of the distribution but you must double it to be on both sides.
One tailed test When the researcher has the feeling that the difference in mu1 and mu2 should be positive, then the alternative will reflect this feeling and we will have Ho: mu1 minus mu2 ≤ Do Ha: mu1 minus mu2 > Do. The signs are reversed when the researcher feels the difference should be negative. The test procedure proceeds in the same fashion as with the two-tailed test except the focus is just on one side of the distribution as directed by the alternative hypothesis. Note that Do is often zero. In that case we just want to see if the group means are different.
Common critical Z’s Two tailed test One tailed test (negative if on left) Alpha = .05 1.96 1.645 Alpha = .01 2.58 2.326 Alpha = .10 1.645 1.282
Inference for case 2 Inference for case 2 is similar to case 1, except in how the standard error is calculated and that is shown on slide 5 and in using t. Let’s do problem 10.12 pages 396/397 Here is an Excel printout To assist us. Note means Are incorrect in the book – Check it out!
a) Let’s say mu1= mean score for fixed-rate and mu2= mean score for variable-rate. Ho: mu1 = mu2 H1: mu1 ≠ mu2. (I have a not equal sign because it says “differ.”) b) With a two tail test the critical t’s (use t because population standard devs are Unknown – with df = 8 + 5 -2 = 11 ) with alpha = .10 we have-1.796 and 1.796 with alpha = .05 we have -2.201 and 2.201 with alpha = .01 we have -3.106 and 3.106 with alpha = .001 we have -4.437 and 4.437 The t stat from Excel = 3.743, so we reject the null and go with the alternative hypothesis at all the levels listed except when alpha = .001. This is very strong evidence! c) The two-tail p-value is .0032. This means p-value < alpha in all cases except when alpha = .001 and conclusion is the same as in part b).
d) The 95% confidence interval has us use the t value 2.201. The interval is (8.08375 – 6.966) – and + 2.201(square root(0.2744(1/8 + 1/5))) = 1.11775 – and + 0.657285 gives the interval [.460465, 1.775035]. We can be 95% confident the difference exceeds .4 because the interval is everywhere above .4. e) Ho: mu1 - mu2 < or = .4 H1: mu1 - mu2 > .4. The t stat is (1.11775 - .4)/(square root(0.2744(1/8 + 1/5)) = .71775/.2986302 = 2.4034742 With a one-tail test the critical t is 1.796. Since the the t stat is > the critical t we reject the null and go with the alternative.
Note that in this problem parts b and c are similar ways of concluding the same thing. Parts d and e also are similar ways of concluding the same thing.