Economics 173 Business Statistics Lectures 5 & 6 Summer, 2001 Professor J. Petry
Chapter 12 Inference about the Comparison of Two Populations
12.1 Introduction • A variety of techniques are presented whose objective is to compare two populations. • We are interested in: • The difference between two means. • The ratio of two variances. • The difference between two proportions.
12.2 Inference about the Difference between Two Means: Independent Samples • Two random samples are drawn from the two populations of interest. • Because we are interested in the difference between the two means, we shall build the statistic $\bar{x}$ for each sample (and support the analysis by the statistic $s^2$ as well).
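The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$ • $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal. • $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large. • The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$. • The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.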
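If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal we can write: $Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ • Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.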
In practice, the Z statistic is rarely used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known. • Instead, we construct a t statistic using the sample variances ($s_1^2$ and $s_2^2$).
Two cases are considered when producing the t-statistic. • The two unknown population variances are equal. • The two unknown population variances are not equal.
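Case I: The two variances are equal • Calculate the pooled variance estimate by: $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ • Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then, $s_p^2 = \dfrac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} \approx 28.04$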
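As a minimal sketch, the pooled variance estimate for the example values (sample variances 25 and 30, sample sizes 10 and 15) can be checked in Python. The function name `pooled_variance` is illustrative and not part of the lecture.

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate of a common variance.

    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Values from the example: s1^2 = 25, s2^2 = 30, n1 = 10, n2 = 15
sp_sq = pooled_variance(25, 30, 10, 15)
print(round(sp_sq, 2))  # 28.04
```

Because the larger sample carries more weight, the pooled estimate lies closer to 30 than to 25.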
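Construct the t-statistic as follows: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, with d.f. $= n_1 + n_2 - 2$, where $s_p^2$ is the pooled variance estimate. • Perform a hypothesis test: $H_0: \mu_1 - \mu_2 = 0$; $H_1: \mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$ • Or build an interval estimate: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$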
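Case II: The two variances are not equal • Construct the t-statistic with the separate sample variances: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ • Run a hypothesis test as needed, or build an interval estimate.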
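The unequal-variances case can be sketched as follows. The degrees-of-freedom expression is the standard Welch-Satterthwaite approximation, which is assumed here because the slide's own formula did not survive in the text. The function name and the summary statistics are hypothetical.

```python
import math


def unequal_variance_t(mean1, mean2, s1_sq, s2_sq, n1, n2, hypothesized_diff=0.0):
    """t statistic and approximate d.f. when the variances are not pooled."""
    v1 = s1_sq / n1  # estimated variance of the first sample mean
    v2 = s2_sq / n2  # estimated variance of the second sample mean
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed, see lead-in)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics, for illustration only
t_stat, df = unequal_variance_t(10.0, 8.0, 4.0, 9.0, 16, 25)
print(f"t = {t_stat:.3f}, d.f. = {df:.1f}")
```

The approximate d.f. is generally not an integer; when reading a t table it is usually rounded to a whole number.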
Example 12.1 • Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast? • A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. • For each person the number of calories consumed at lunch was recorded.
Solution (data: calories consumed at lunch) • The data are quantitative. • The parameter to be tested is the difference between two means. • The claim to be tested is that the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).
Identifying the technique • The hypotheses are: • $H_0: (\mu_1 - \mu_2) = 0$ • $H_1: (\mu_1 - \mu_2) < 0$ (that is, $\mu_1 < \mu_2$) • To check the relationship between the variances, we use computer output to find the sample standard deviations. We have $s_1 = 64.05$ and $s_2 = 103.29$. It appears that the variances are unequal. • We run the t-test for unequal variances.
Calories consumed at lunch • At the 5% significance level there is sufficient evidence to reject the null hypothesis.
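Solving by hand • The interval estimator for the difference between two means (unequal variances) is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$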
Example 12.2 • Does job design (referring to worker movements) affect workers' productivity? • Two job designs are being considered for the production of a new computer desk. • Two samples are randomly and independently selected. • A sample of 25 workers assembled a desk using design A. • A sample of 25 workers assembled the desk using design B. • The assembly times were recorded. • Do the assembly times of the two designs differ?
Solution (data: assembly times in minutes) • The data are quantitative. • The parameter of interest is the difference between two population means. • The claim to be tested is whether a difference between the two designs exists.
Solving by hand • The hypothesis test is: $H_0: (\mu_1 - \mu_2) = 0$; $H_1: (\mu_1 - \mu_2) \neq 0$ • To check the relationship between the two variances, calculate the values of $s_1$ and $s_2$. We have $s_1 = 0.92$ and $s_2 = 1.14$. • It is reasonable to treat the two variances as equal, so the t-statistic is calculated with the pooled variance estimate. • Let us determine the rejection region.
The rejection region is $|t| > t_{\alpha/2,\,n_1 + n_2 - 2}$ (notice the absolute value). • For $\alpha = 0.05$: $t_{.025,48} \approx 2.009$, with an area of .025 in each tail. • The test: Since $t = 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.
Conclusion: At the 5% significance level, this experiment does not provide sufficient evidence that the two job designs differ in terms of worker productivity.
The Excel printout reports the degrees of freedom, the t-statistic, the P-value of the one-tail test, and the P-value of the two-tail test.
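A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$. Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$. Notice: zero is included in the interval.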
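The figures quoted for Example 12.2 can be roughly reproduced with a short Python sketch. One assumption is needed: the difference between the sample means is not quoted in the text, so it is taken here as the midpoint of the reported interval. The critical value 2.009 is the one used in the lecture.

```python
import math

# Summary values stated in Example 12.2
n1 = n2 = 25
s1, s2 = 0.92, 1.14
t_crit = 2.009  # critical value used in the lecture for alpha = 0.05

# Assumption: point estimate recovered as the midpoint of (-0.3176, 0.8616)
mean_diff = (-0.3176 + 0.8616) / 2

# Pooled variance and standard error of the difference between means
sp_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
std_err = math.sqrt(sp_sq * (1 / n1 + 1 / n2))

t_stat = mean_diff / std_err
lcl = mean_diff - t_crit * std_err
ucl = mean_diff + t_crit * std_err

print(f"t = {t_stat:.2f}")
print(f"95% interval = ({lcl:.3f}, {ucl:.3f})")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```

The limits differ from the lecture's in the third decimal place because the standard deviations are entered here rounded to two decimals.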
Checking the required conditions for the equal variances case (Example 12.2) • Histograms of Design A and Design B: the distributions are not perfectly bell shaped, but they seem to be approximately normal. • Since the technique is robust, we can be confident about the results.
12.4 Matched Pairs Experiment • What is a matched pairs experiment? • Why are matched pairs experiments needed? • How do we deal with data produced in this way? The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.
Example 12.3 • To determine whether a new steel-belted radial tire lasts longer than a current model, the manufacturer designs the following experiment. • A pair of newly designed tires is installed on the rear wheels of 20 randomly selected cars. • A pair of currently used tires is installed on the rear wheels of another 20 cars. • Drivers drive in their usual way until the tires wear out. • The number of miles driven by each driver was recorded. See data next.
Solution • Compare two populations of quantitative data. • The parameter is $\mu_1 - \mu_2$, where $\mu_1$ is the mean distance driven before the new design tires wear out and $\mu_2$ is the mean distance driven before the existing design tires wear out. • The hypotheses are: $H_0: (\mu_1 - \mu_2) = 0$; $H_1: (\mu_1 - \mu_2) > 0$
The hypotheses are $H_0: \mu_1 - \mu_2 = 0$; $H_1: \mu_1 - \mu_2 > 0$ • The test statistic is the t statistic for the difference between two means. • We run the t test and obtain the Excel results. • We conclude that there is insufficient evidence to reject $H_0$ in favor of $H_1$.
Histograms of the new design and existing design samples (bins 45, 60, 75, 90, 105, More). While the sample mean of the new design is larger than the sample mean of the existing design, the variability within each sample is large enough for the sample distributions to overlap and cover about the same range. It is therefore difficult to argue that one expected value is different from the other.
Example 12.4 • To eliminate variability among observations within each sample, the experiment was redone. • One tire of each type was installed on the rear wheels of each of 20 randomly selected cars (each car was sampled twice, thus creating a pair of observations). • The number of miles until wear-out was recorded.
So what really happened here? The values each sample consists of might vary markedly (figure: the range of observations in sample A and the range of observations in sample B)...
...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences (figure: the range of the differences).
Observe the statistic t shown below and notice how a small variability of the differences (small $s_D$) helps in rejecting the null hypothesis.
Solving by hand • Calculate the difference for each pair of observations. • Calculate the average of the differences, $\bar{x}_D$, and the standard deviation of the differences, $s_D$. • Build the statistic as follows: $t = \dfrac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$ • Run the hypothesis test using the t distribution with $n_D - 1$ degrees of freedom.
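The steps above can be sketched in Python using only the standard library. The function name `paired_t` and the five pairs of observations are hypothetical, chosen for illustration; they are not the tire data from the lecture.

```python
import math
from statistics import mean, stdev


def paired_t(sample1, sample2, hypothesized_mean_diff=0.0):
    """t statistic and degrees of freedom for a matched pairs experiment."""
    if len(sample1) != len(sample2):
        raise ValueError("matched pairs require samples of equal length")
    # Step 1: the difference within each pair
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(diffs)
    # Step 2: mean and sample standard deviation (n - 1 divisor) of the differences
    x_bar_d = mean(diffs)
    s_d = stdev(diffs)
    # Step 3: the t statistic, with n_D - 1 degrees of freedom
    t = (x_bar_d - hypothesized_mean_diff) / (s_d / math.sqrt(n_d))
    return t, n_d - 1


# Hypothetical paired observations, for illustration only
first = [75, 88, 64, 92, 81]
second = [72, 85, 63, 88, 80]
t_stat, df = paired_t(first, second)
print(f"t = {t_stat:.3f}, d.f. = {df}")
```

The observations themselves range from 63 to 92, but the differences range only from 1 to 4, which is what makes the statistic large.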
The hypothesis test for this problem is $H_0: \mu_D = 0$; $H_1: \mu_D > 0$ • The statistic is $t = \dfrac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}} = 2.817$ • The rejection region is: $t > t_{\alpha}$ with d.f. $= 20 - 1 = 19$. If $\alpha = .05$, $t_{.05,19} = 1.729$. • Since $2.817 > 1.729$, there is sufficient evidence in the data to reject the null hypothesis in favor of the alternative hypothesis. • Conclusion: At the 5% significance level there is sufficient evidence that the new type of tire lasts longer than the current type.
Checking the required conditions for the paired observations case • The validity of the results depends on the normality of the differences.
12.5 Inferences about the ratio of two variances • In this section we discuss how to compare the variability of two populations. • In particular, we draw inference about the ratio of two population variances. • This question is interesting because: • Variances can be used to evaluate the consistency of processes. • The relationships between variances determine the technique used to test relationships between mean values
Point estimator of $\sigma_1^2/\sigma_2^2$ • Recall that $s^2$ is an unbiased estimator of $\sigma^2$. • Therefore, it is not surprising that we estimate $\sigma_1^2/\sigma_2^2$ by $s_1^2/s_2^2$. • Sampling distribution for $s_1^2/s_2^2$ • The statistic $[s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom. • The test statistic for $\sigma_1^2/\sigma_2^2$ is derived from this statistic.
Testing $\sigma_1^2/\sigma_2^2$ • Our null hypothesis is always $H_0: \sigma_1^2/\sigma_2^2 = 1$ • Under this null hypothesis the F statistic $F = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ becomes $F = \dfrac{s_1^2}{s_2^2}$
Example 12.5 (see Example 12.1) • In order to perform a test regarding the average consumption of calories at lunch in relation to the inclusion of high-fiber cereal in breakfast, the ratio of the two variances has to be tested first. • The hypotheses are: $H_0: \sigma_1^2/\sigma_2^2 = 1$; $H_1: \sigma_1^2/\sigma_2^2 \neq 1$
Solving by hand • The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$, which becomes (for $\alpha = 0.05$)... • The F statistic value is $F = s_1^2/s_2^2 = .3845$ • Conclusion: Because $.3845 < .63$ we can reject the null hypothesis in favor of the alternative hypothesis. • There is sufficient evidence in the data to argue at the 5% significance level that the variances of the two groups differ.
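A short Python sketch of this two-tail F test follows. The sample variances are the ones quoted in Example 12.6 for the same data, the lower critical value .63 is the one quoted above, and the upper critical value 1.61 is the approximate table value quoted in Example 12.6. The variable names are illustrative.

```python
# Sample variances for the Example 12.1 data, as quoted in Example 12.6
s1_sq = 4102.98
s2_sq = 10669.77

# Critical values quoted in the lecture (approximate table values)
lower_crit = 0.63
upper_crit = 1.61

# Under H0 the variance ratio equals 1, so the statistic is s1^2 / s2^2
f_stat = s1_sq / s2_sq
reject_h0 = f_stat > upper_crit or f_stat < lower_crit

print(round(f_stat, 4))  # 0.3845
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic falls in the lower tail, which agrees with the earlier choice of the unequal-variances t-test for Example 12.1.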
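Estimating the Ratio of Two Population Variances • From the statistic $F = [s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following interval estimator: $\left(\dfrac{s_1^2}{s_2^2}\right)\dfrac{1}{F_{\alpha/2,\nu_1,\nu_2}} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \left(\dfrac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$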
Example 12.6 • Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 12.1. • Solution • We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately) and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately). • LCL $= (s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}] = (4102.98/10{,}669.770)[1/1.61] = .2388$ • UCL $= (s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}] = (4102.98/10{,}669.770)[1.72] = .6614$
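The two confidence limits can be checked with a few lines of Python, using the sample variances and the approximate F values given in the solution. The variable names are illustrative.

```python
# Sample variances and approximate critical values from Example 12.6
s1_sq, s2_sq = 4102.98, 10669.77
f_v1_v2 = 1.61  # F_{.025,40,120}
f_v2_v1 = 1.72  # F_{.025,120,40}

ratio = s1_sq / s2_sq
lcl = ratio / f_v1_v2  # lower confidence limit
ucl = ratio * f_v2_v1  # upper confidence limit

print(round(lcl, 4), round(ucl, 4))  # 0.2388 0.6614
```

The whole interval lies below 1, which matches the two-tail test's rejection of equal variances.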
12.6 Inference about the difference between two population proportions • In this section we deal with two populations whose data are qualitative. • When data are qualitative we can (only) ask questions regarding the proportions of occurrence of certain outcomes. • Thus, we hypothesize on the difference p1-p2, and draw an inference from the hypothesis test.
Sampling Distribution of the Difference Between Two Sample Proportions • Two random samples are drawn from two populations. • The number of successes in each sample is recorded. • The sample proportions are computed. • Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$ • Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$
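The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, $n_2(1 - p_2)$ are all equal to or greater than 5. • Because $p_1$ and $p_2$ are unknown, we use their estimates instead. Thus, $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ must all be equal to or greater than 5. • The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$. • The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.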
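Testing the Difference between Two Population Proportions • We hypothesize on the difference between the two proportions, $p_1 - p_2$. • There are two cases to consider: • Case 1: $H_0: p_1 - p_2 = 0$. Calculate the pooled proportion $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$. Then $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ • Case 2: $H_0: p_1 - p_2 = D$ ($D$ is not equal to 0). Do not pool the data. Then $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$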
Example 12.7 • A research project employing 22,000 American physicians was conducted to discover whether aspirin can prevent heart attacks. • Half of the participants in the research took aspirin, and half took a placebo. • Over a three-year period, 104 of those who took aspirin and 189 of those who took the placebo had heart attacks. • Is aspirin effective in preventing heart attacks?
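One way to approach this example is the Case 1 (pooled) test of $H_0: p_1 - p_2 = 0$ against $H_1: p_1 - p_2 < 0$, with sample 1 as the aspirin group. The sketch below uses only the counts stated in the example. The one-tail critical value $-1.645$ corresponds to $\alpha = 0.05$, a significance level assumed here because the example does not state one.

```python
import math

# Counts stated in Example 12.7
n1 = n2 = 22000 // 2  # half took aspirin, half took the placebo
x1, x2 = 104, 189     # heart attacks: aspirin group, placebo group

p1_hat = x1 / n1
p2_hat = x2 / n2

# Normal approximation condition: successes and failures all at least 5
conditions_ok = all(count >= 5 for count in (x1, n1 - x1, x2, n2 - x2))

# Case 1: pool the data because H0 specifies p1 - p2 = 0
p_pooled = (x1 + x2) / (n1 + n2)
std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / std_err

z_crit = -1.645  # assumed alpha = 0.05, lower-tail test
print(f"z = {z:.2f}")
print("reject H0" if z < z_crit else "do not reject H0")
```

With these counts the statistic comes out near $-5$, far below the critical value, so the data support the claim that aspirin reduces the proportion of heart attacks.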