770 likes | 928 Views
Chapter 13. Inference About Comparing Two Populations. Comparing Two Populations…. Previously we looked at techniques to estimate and test parameters for one population: Population Mean , Population Variance , and Population Proportion p
E N D
Chapter 13 Inference About Comparing Two Populations
Comparing Two Populations… Previously we looked at techniques to estimate and test parameters for one population: Population Mean , Population Variance , and Population Proportion p We will still consider these parameters when we are looking at two populations, however our interest will now be: The difference between two means. The ratio of two variances. The difference between two proportions.
Population 1 Sample, size: n1 Statistics: Parameters: Difference of Two Means… In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another. (Likewise, we consider for Population 2)
Difference of Two Means In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another. Because we are comparing two population means, we use the statistic:
Sampling Distribution of 1. is normally distributed if the original populations are normal –or– approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30) 2. The expected value of is 3. The variance of is and the standard error is:
Making Inferences About Since is normally distributed if the original populations are normal –or– approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30), then: is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for …
Making Inferences About …except that, in practice, the z statistic is rarely used since the population variances are unknown. Instead we use a t-statistic. We consider two cases for the unknown population variances: when we believe they are equal and conversely when they are not equal. ??
When are variances equal? How do we know when the population variances are equal? Since the population variances are unknown, we can’t know for certain whether they’re equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.
Test Statistic for (equal variances) • Calculate – the pooled variance estimator as… • …and use it here: degrees of freedom
CI Estimator for (equal variances) The confidence interval estimator for when the population variances are equal is given by: pooled variance estimator degrees of freedom
Test Statistic for (unequal variances) The test statistic for when the population variances are unequal is given by: Likewise, the confidence interval estimator is: degrees of freedom
≥ Which case to use? Which case to use? Equal variance or unequal variance? Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal variances t-test. This is so, because for any two given samples: The number of degrees of freedom for the equal variances case The number of degrees of freedom for the unequal variances case Larger numbers of degrees of freedom have the same effect as having larger sample sizes ≥
Example: Making an inference about m1– m2 • Example13.1 • Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast? • A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. • For each person the number of calories consumed at lunch was recorded.
Example: Making an inference about m1– m2 • Solution: • The data are interval. • The parameter to be tested is • the difference between two means. • The claim to be tested is: • The mean caloric intake of consumers (m1) • is less than that of non-consumers (m2).
Example: Making an inference about m1– m2 • The hypotheses are: • H0: (m1 - m2) = 0 • H1: (m1 - m2) < 0 • To check the whether the population variances are equal, we use (Xm13-01) computer output to find the sample variances We have s12= 4103, and s22 = 10,670. • It appears that the variances are unequal.
Example 13.1… A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. For each person the number of calories consumed at lunch was recorded. The data: Independent Pop’ns; Either you eat high fiber cereal or you don’t n1+n2=150 Recall H1: There is reason to believe the population variances are unequal…
COMPUTE Example 13.1… Thus, our test statistic is: The number of degrees of freedom is: Hence the rejection region is…
COMPUTE Example 13.1… Our rejection region: Our test statistic: Since our test statistic (-2.09) is less than our critical value of t (-1.658), we reject H0 in favor of H1 — that is, there is sufficient evidence to support the claim that high fiber cereal eaters consume less calories at lunch. Compare INTERPRET
Confidence Interval… Suppose we wanted to compute a 95% confidence interval estimate of the difference between mean caloric intake for consumers and non-consumers of high-fiber cereals… That is, we estimate that non-consumers of high fiber cereal eat between 1.56 and 56.86 more calories than consumers.
Example: Making an inference about m1– m2 • Example 13.2 • An ergonomic chair can be assembled using two different sets of operations (Method A and Method B) • The operations manager would like to know whether the assembly time under the two methods differ.
Example: Making an inference about m1– m2 • Example 13.2 • Two samples are randomly and independently selected • A sample of 25 workers assembled the chair using method A. • A sample of 25 workers assembled the chair using method B. • The assembly times were recorded • Do the assembly times of the two methods differs?
Example 13.2 : Making an inference about m1– m2 Assembly times in Minutes • Solution • The data are interval. • The parameter of interest is the difference • between two population means. • The claim to be tested is whether a difference • between the two methods exists.
Example 13.2 : Making an inference about m1– m2 • Compute: Manually • The hypotheses test is: • H0: (m1 - m2) = 0 H1: (m1 - m2) ¹ 0 • To check whether the two unknown population variances are equal we calculate S12 and S22 (Xm13-02).
Example: Making an inference about m1– m2 COMPUTE • Manually • To calculate the t-statistic we have:
COMPUTE Example 13.2… The assembly times for each of the two methods are recordedand preliminary data is prepared… The sample variances are similar, hence we will assume that the population variances are equal…
COMPUTE Example 13.2… Recall, we are doing a two-tailed test, hence the rejection region will be: The number of degrees of freedom is: Hence our critical values of t (and our rejection region) becomes:
COMPUTE Example 13.2… In order to calculate our t-statistic, we need to first calculate the pooled variance estimator, followed by the t-statistic…
INTERPRET Example 13.2… Since our calculated t-statistic does not fall into the rejection region, we cannot reject H0 in favor of H1, that is, there is not sufficient evidence to infer that the mean assembly times differ.
Confidence Interval… We can compute a 95% confidence interval estimate for the difference in mean assembly times as: That is, we estimate the mean difference between the two assembly methods between –.36 and .96 minutes. Note: zero is included in this confidence interval…
Design A Design B Checking the required Conditions for the equal variances case (Example 13.2) The data appear to be approximately normal
Identifying Factors I… Factors that identify the equal-variances t-test and estimator of :
Identifying Factors II… Factors that identify the unequal-variances t-test and estimator of :
13.4 Matched Pairs Experiment • What is a matched pair experiment? • Why matched pairs experiments are needed? • How do we deal with data produced in this way? The following example demonstrates a situation where a matched pair experiment is the correct approach to testing the difference between two population means.
13.4 Matched Pairs Experiment Example 13.3 • To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted. • Particularly, the salaries offered to finance majors were compared to those offered to marketing majors. • Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one. The data are stored in file Xm13-03. • Can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?.
Example 13.3 • Solution • Compare two populations of interval data. The parameter tested is m1 - m2 m1 The mean of the highest salaryoffered to Finance MBAs • H0: (m1 - m2) = 0 H1: (m1 - m2) > 0 m2 The mean of the highest salaryoffered to Marketing MBAs
Example 13.3 From the data we have: • Solution – continued There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than marketing MBAs. • Let us assume equal variances
The effect of a large sample variability • Question • The difference between the sample means is 65624 – 60423 = 5,201. • So, why could we not reject H0 and favor H1 where(m1 – m2 > 0)?
The effect of a large sample variability • Answer: • Sp2 is large (because the sample variances are large) Sp2 = 311,330,926. • A large variance reduces the value of the t statistic and it becomes more difficult to reject H0.
Reducing the variability The range of observations sample A The values each sample consists of might markedly vary... The range of observations sample B
Reducing the variability Differences ...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences. The range of the differences 0
Group 1Group 2 Difference 10 12 - 2 15 11 +4 Mean1 =12.5 Mean2 =11.5 Mean1 – Mean2 = 1 Mean Differences = 1 The matched pairs experiment • Since the difference of the means is equal to the mean of the differences we can rewrite the hypotheses in terms of mD (the mean of the differences) rather than in terms of m1 – m2. • This formulation has the benefit of a smaller variability.
The matched pairs experiment • Example 13.4 • It was suspected that salary offers were affected by students’ GPA, (which caused S12 and S22 to increase). • To reduce this variability, the following procedure was used: • 25 ranges of GPAs were predetermined. • Students from each major were randomly selected, one from each GPA range. • The highest salary offer for each student was recorded. • From the data presented can we conclude that Finance majors are offered higher salaries?
Example 13.4… The numbers in black are the original starting salary data; the number in blue were calculated. although a student is either in Finance OR in Marketing (i.e. independent), that the data is grouped in this fashion makes it a matched pairs experiment (i.e. the two students in group #1 are ‘matched’ by their GPA range the difference of the means is equal to the mean of the differences, hence we will consider the “mean of the paired differences” as our parameter of interest:
IDENTIFY Example 13.4… DoFinancemajors have higher salary offers than Marketing majors? Since: We want to research this hypothesis: H1: (and our null hypothesis becomes H0: )
Test Statistic for The test statistic for the mean of the population of differences ( ) is: which is Student t distributed with nD–1 degrees of freedom, provided that the differences are normally distributed. Thus our rejection region becomes:
COMPUTE Example 13.4… From the data, we calculate… …which in turn we use for our t-statistic… …which we compare to our critical value of t:
INTERPRET Example 13.4… • Since our calculated value of t (3.81) is greater than our critical value of t (1.711), it falls in the rejection region, hence we reject H0 in favor of H1; that is, there is overwhelming evidence (since the p-value = .0004) that Finance majors do obtain higher starting salary offers than their peers in Marketing. Compare…
Confidence Interval Estimator for We can derive the confidence interval estimator for algebraically as: In the previous example, what is the 95% confidence interval estimate of the mean difference in salary offers between the two business majors? That is, the mean of the population differences is between LCL=2,321 and UCL=7,809 dollars. Example 13.5
Identifying Factors… Factors that identify the t-test and estimator of :
Inference about the ratio of two variances So far we’ve looked at comparing measures of central location, namely the mean of two populations. When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is: The sampling statistic: is F distributed with degrees of freedom.