Inference About 2 or More Normal Populations, Part 2

Inference About 2 or More Normal Populations, Part 2 BMTRY 726 2/11/2014

Two Independent Samples: Collect two independent random samples, one from each population. Note that we assume (i) normality, (ii) equal covariance matrices, and (iii) independent random samples.

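Two Independent Samples: Compute summary statistics for each sample: the sample mean vectors $\bar{x}_1$, $\bar{x}_2$ and the sample covariance matrices $S_1$, $S_2$. Since we assume $\Sigma_1 = \Sigma_2 = \Sigma$, we can compute a pooled estimate of the common covariance matrix: $S_{pooled} = \frac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2}$.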

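Two Independent Samples: Test $H_0: \mu_1 - \mu_2 = \delta_0$ against $H_1: \mu_1 - \mu_2 \neq \delta_0$. Use $\bar{x}_1 - \bar{x}_2$ to estimate $\mu_1 - \mu_2$. Note that we estimate $\Sigma$ with the pooled sample covariance matrix $S_{pooled} = \frac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2}$. Then reject $H_0$ if $T^2 = (\bar{x}_1 - \bar{x}_2 - \delta_0)' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} (\bar{x}_1 - \bar{x}_2 - \delta_0) > \frac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,\, n_1+n_2-p-1}(\alpha)$.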
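The test can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration for the common case $\delta_0 = 0$; the function name `two_sample_t2` and the simulated data are assumptions made for the example, not part of the lecture.

```python
import numpy as np
from scipy import stats


def two_sample_t2(x1, x2, alpha=0.05):
    """Two-sample Hotelling T^2 test of H0: mu1 = mu2, assuming equal covariances."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # atleast_2d keeps the code working when p = 1
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # T^2 = d' [(1/n1 + 1/n2) S_pooled]^{-1} d
    t2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * s_pooled, diff)
    df2 = n1 + n2 - p - 1
    # Rescale T^2 so that it follows F(p, n1 + n2 - p - 1) under H0
    f_stat = t2 * df2 / ((n1 + n2 - 2) * p)
    crit = (n1 + n2 - 2) * p / df2 * stats.f.ppf(1 - alpha, p, df2)
    return {
        "T2": t2,
        "F": f_stat,
        "p_value": stats.f.sf(f_stat, p, df2),
        "critical_T2": crit,
        "reject": bool(t2 > crit),
    }


# Hypothetical data: two bivariate normal samples with shifted means
rng = np.random.default_rng(0)
x1 = rng.normal(size=(12, 2))
x2 = rng.normal(size=(15, 2)) + [1.0, 0.5]
result = two_sample_t2(x1, x2)
print(result)
```

Comparing `T2` with `critical_T2` is equivalent to comparing `p_value` with `alpha`; both are returned so that either form of the decision rule can be read off.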

Two Independent Samples Proof:

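Two Independent Samples: Properties of the two-sample $T^2$ test: (i) It is equivalent to the LRT. (ii) $T^2$ is the square of the largest one-dimensional t-statistic, that is, the maximum over all nonzero vectors $a$ of the squared pooled two-sample t-statistic computed from the projections $a'x$. (iii) $T^2$ is the most powerful test in the class of tests that are invariant under affine transformations of full rank.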
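Property (ii) can be checked numerically. The sketch below, on simulated data (an assumption for illustration), compares $T^2$ with the squared t-statistic along the direction $a^* = S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2)$, which attains the maximum, and along many random directions, which should never exceed it.

```python
import numpy as np

# Hypothetical data: two trivariate normal samples
rng = np.random.default_rng(1)
x1 = rng.normal(size=(10, 3))
x2 = rng.normal(size=(14, 3)) + 0.8
n1, n2 = len(x1), len(x2)

diff = x1.mean(axis=0) - x2.mean(axis=0)
s_pooled = (
    (n1 - 1) * np.cov(x1, rowvar=False) + (n2 - 1) * np.cov(x2, rowvar=False)
) / (n1 + n2 - 2)
scale = 1 / n1 + 1 / n2


def t_squared(a):
    """Squared pooled two-sample t statistic for the projected data a'x."""
    return (a @ diff) ** 2 / (scale * (a @ s_pooled @ a))


t2 = diff @ np.linalg.solve(scale * s_pooled, diff)

# Direction that attains the maximum: a* proportional to S_pooled^{-1} (xbar1 - xbar2)
a_star = np.linalg.solve(s_pooled, diff)

# Largest squared t statistic over 1000 random directions
random_max = max(t_squared(a) for a in rng.normal(size=(1000, 3)))

print(t2, t_squared(a_star), random_max)
```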

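Two Independent Samples: We can use this fact to construct a confidence region for the difference between the two population means. An exact $(1-\alpha)100\%$ confidence region for $\mu_1 - \mu_2$ consists of all vectors $\delta$ that satisfy $(\bar{x}_1 - \bar{x}_2 - \delta)' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} (\bar{x}_1 - \bar{x}_2 - \delta) \le \frac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,\, n_1+n_2-p-1}(\alpha)$, where $S_{pooled}$ is the pooled sample covariance matrix.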

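Two Independent Samples: We can also construct simultaneous confidence intervals using what we already know. For all linear functions $a'(\mu_1 - \mu_2)$: $a'(\bar{x}_1 - \bar{x}_2) \pm c \sqrt{a'\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\, a}$, where $c^2 = \frac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,\, n_1+n_2-p-1}(\alpha)$ and $S_{pooled}$ is the pooled sample covariance matrix. Or alternatively, Bonferroni confidence intervals for the $p$ component differences: $(\bar{x}_{1i} - \bar{x}_{2i}) \pm t_{n_1+n_2-2}\left(\frac{\alpha}{2p}\right) \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,pooled}}$.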
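A minimal sketch of these intervals for the $p$ component differences follows. The function name `mean_difference_intervals` and the simulated data are assumptions for illustration; unadjusted one-at-a-time t intervals are included only for comparison of widths.

```python
import numpy as np
from scipy import stats


def mean_difference_intervals(x1, x2, alpha=0.05):
    """Intervals for each component of mu1 - mu2 under a pooled covariance (p >= 2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s_pooled = (
        (n1 - 1) * np.cov(x1, rowvar=False) + (n2 - 1) * np.cov(x2, rowvar=False)
    ) / (n1 + n2 - 2)
    # Standard error of each component difference
    se = np.sqrt((1 / n1 + 1 / n2) * np.diag(s_pooled))
    df2 = n1 + n2 - p - 1
    multipliers = {
        # T^2-based multiplier: valid for all linear combinations at once
        "simultaneous": np.sqrt((n1 + n2 - 2) * p / df2 * stats.f.ppf(1 - alpha, p, df2)),
        # Bonferroni: split alpha across the p intervals
        "bonferroni": stats.t.ppf(1 - alpha / (2 * p), n1 + n2 - 2),
        # No multiplicity adjustment
        "one_at_a_time": stats.t.ppf(1 - alpha / 2, n1 + n2 - 2),
    }
    return {
        name: np.column_stack([diff - c * se, diff + c * se])
        for name, c in multipliers.items()
    }


# Hypothetical data
rng = np.random.default_rng(2)
x1 = rng.normal(size=(12, 2))
x2 = rng.normal(size=(15, 2)) + [1.0, 0.5]
intervals = mean_difference_intervals(x1, x2)
for name, ci in intervals.items():
    print(name, ci.round(3))
```

Each returned array has one row per variable, with the lower and upper limits in the two columns.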

Example: Consider measures of steel, (1) yield point and (2) ultimate strength, captured for randomly selected samples of steel made at two rolling temperatures. Measurements are reported on a scale of 1000 pounds per square inch.

Example Temperature 1: what are the mean and variance?

Example Temperature 2: what are the mean and variance?

Example Find the pooled variance

Example Find T2 and the F-value: Which attributes are different at different temperatures?

95% Confidence Intervals Simultaneous:

95% Confidence Intervals Bonferroni Method:

95% Confidence Intervals One at a Time Method:

Two Independent Samples: What if the model assumptions are incorrect? Unequal Covariance Matrices:
1. If $n_1 = n_2$ (large samples), unequal covariance matrices have little effect on the Type I error level and power of the 2-sample $T^2$ test.
2. If $n_1 > n_2$ and the eigenvalues of $\Sigma_1 \Sigma_2^{-1}$ are all less than one, the Type I error level is inflated. Our pooled estimate gives more weight to the smaller matrix $S_1$, which makes $S_{pooled}$ too small, making $T^2$ too large when $H_0$ is true.
3. If $n_1 > n_2$ and the eigenvalues of $\Sigma_1 \Sigma_2^{-1}$ are all larger than one, the Type I error level is too small and the power of $T^2$ is reduced.
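These three cases can be illustrated with a small Monte Carlo sketch. The sample sizes, the variances, and the function name `type1_rate` are assumptions chosen for illustration; both populations have mean zero and covariance equal to a multiple of the identity, so $H_0$ is true and every rejection is a Type I error.

```python
import numpy as np
from scipy import stats


def type1_rate(n1, n2, var1, var2, p=2, reps=2000, alpha=0.05, seed=0):
    """Estimated rejection rate of the pooled T^2 test when H0: mu1 = mu2 is true.

    Population j is N(0, var_j * I), so Sigma1 Sigma2^{-1} = (var1 / var2) * I.
    """
    rng = np.random.default_rng(seed)
    df2 = n1 + n2 - p - 1
    crit = (n1 + n2 - 2) * p / df2 * stats.f.ppf(1 - alpha, p, df2)
    rejections = 0
    for _ in range(reps):
        x1 = rng.normal(scale=np.sqrt(var1), size=(n1, p))
        x2 = rng.normal(scale=np.sqrt(var2), size=(n2, p))
        diff = x1.mean(axis=0) - x2.mean(axis=0)
        s_pooled = (
            (n1 - 1) * np.cov(x1, rowvar=False) + (n2 - 1) * np.cov(x2, rowvar=False)
        ) / (n1 + n2 - 2)
        t2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * s_pooled, diff)
        rejections += t2 > crit
    return rejections / reps


# n1 = n2: unequal covariances should matter little
balanced = type1_rate(25, 25, 1.0, 4.0)
# n1 > n2, eigenvalues of Sigma1 Sigma2^{-1} equal 0.25 < 1: expect inflation
inflated = type1_rate(40, 10, 1.0, 4.0)
# n1 > n2, eigenvalues of Sigma1 Sigma2^{-1} equal 4 > 1: expect a conservative test
conservative = type1_rate(40, 10, 4.0, 1.0)
print(balanced, inflated, conservative)
```

With a nominal level of 0.05, the second rate should come out well above 0.05 and the third well below it, while the balanced design should stay close to nominal.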

Testing Equality of $\Sigma_1$ and $\Sigma_2$ • Bartlett’s test is a very popular method • It compares $\Sigma_1$ and $\Sigma_2$ in terms of generalized variances • It is also a test of non-normality • The problem is that if the normality assumption fails, results can be seriously misleading • Box’s M test provides a better alternative as long as sample sizes are large enough

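Box’s Test: We can test whether the population covariance matrices are equal, $H_0: \Sigma_1 = \dots = \Sigma_g$. Box’s test: $M = \left[\sum_{j=1}^{g}(n_j-1)\right] \ln|S_{pooled}| - \sum_{j=1}^{g}(n_j-1)\ln|S_j|$, with $S_{pooled} = \frac{\sum_j (n_j-1) S_j}{\sum_j (n_j-1)}$ and $u = \left[\sum_{j=1}^{g}\frac{1}{n_j-1} - \frac{1}{\sum_j (n_j-1)}\right] \frac{2p^2+3p-1}{6(p+1)(g-1)}$. Then $C = (1-u)M$ is approximately $\chi^2_\nu$ with $\nu = \frac{1}{2}p(p+1)(g-1)$, and we reject $H_0$ if $C > \chi^2_\nu(\alpha)$. Note that $g$ is the number of groups, so this test works for more than 2 groups. However, the approximation works best if each $n_j > 20$ and $p$ and $g$ are not larger than 5.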
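A minimal sketch of Box's M test with the chi-square approximation is given below. The function name `box_m_test` and the simulated groups are assumptions for illustration; `slogdet` is used for the log-determinants to avoid overflow or underflow.

```python
import numpy as np
from scipy import stats


def box_m_test(groups):
    """Box's M test of H0: Sigma_1 = ... = Sigma_g (chi-square approximation)."""
    groups = [np.asarray(x, float) for x in groups]
    g = len(groups)
    p = groups[0].shape[1]
    dfs = np.array([x.shape[0] - 1 for x in groups])  # n_j - 1
    covs = [np.cov(x, rowvar=False) for x in groups]
    s_pooled = sum(df * s for df, s in zip(dfs, covs)) / dfs.sum()

    def logdet(s):
        return np.linalg.slogdet(s)[1]

    m = dfs.sum() * logdet(s_pooled) - sum(df * logdet(s) for df, s in zip(dfs, covs))
    u = (np.sum(1 / dfs) - 1 / dfs.sum()) * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
    c = (1 - u) * m
    nu = p * (p + 1) * (g - 1) / 2
    return {"M": m, "u": u, "C": c, "df": nu, "p_value": stats.chi2.sf(c, nu)}


# Hypothetical data: the second group has a covariance matrix 9 times larger
rng = np.random.default_rng(3)
x1 = rng.normal(size=(50, 2))
x2 = 3.0 * rng.normal(size=(50, 2))
box = box_m_test([x1, x2])
print(box)
```

Because the function accepts a list of groups, the same call covers $g > 2$.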

Example: Back to our two measures of steel, (1) yield point and (2) ultimate strength, at two rolling temperatures. First calculate u

Example Now calculate Box’s M

Remedies: • Data transformations • Large-sample chi-square test, which gives approximately the correct Type I error level (when n1 and n2 are both large) even with (i) populations that are not MVN, (ii) unequal sample sizes, (iii)

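Large Sample Approximation: If the assumptions fail, we no longer have a normal distribution for $\bar{x}_1 - \bar{x}_2$, nor do we have a Wishart distribution for the sample covariance matrices. If $n_1 - p$ and $n_2 - p$ are large, the CLT holds: under $H_0: \mu_1 - \mu_2 = \delta_0$, $T^2 = (\bar{x}_1 - \bar{x}_2 - \delta_0)' \left[\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right]^{-1} (\bar{x}_1 - \bar{x}_2 - \delta_0)$ is approximately $\chi^2_p$, so we reject $H_0$ if $T^2 > \chi^2_p(\alpha)$.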
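The large-sample version differs from the pooled test only in using $S_1/n_1 + S_2/n_2$ and a chi-square reference distribution. The sketch below assumes $\delta_0 = 0$; the function name `large_sample_test` and the simulated non-normal data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def large_sample_test(x1, x2, alpha=0.05):
    """Large-sample chi-square test of H0: mu1 = mu2 with unpooled covariances (p >= 2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Estimated covariance of xbar1 - xbar2 without assuming Sigma1 = Sigma2
    v = np.cov(x1, rowvar=False) / n1 + np.cov(x2, rowvar=False) / n2
    t2 = diff @ np.linalg.solve(v, diff)
    crit = stats.chi2.ppf(1 - alpha, p)
    return {
        "T2": t2,
        "critical": crit,
        "p_value": stats.chi2.sf(t2, p),
        "reject": bool(t2 > crit),
    }


# Hypothetical data: skewed (exponential) populations with unequal spread and a mean shift
rng = np.random.default_rng(4)
x1 = rng.exponential(scale=1.0, size=(200, 3))
x2 = rng.exponential(scale=2.0, size=(150, 3))
large = large_sample_test(x1, x2)
print(large)
```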

When Samples are Small • When n1 and n2 are small, we cannot rely on the CLT, so what do we do? • The Behrens-Fisher procedure was developed to handle this scenario • It requires that n1 and n2 are both > p • It estimates an approximate distribution for T2

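When Samples are Small • Behrens-Fisher procedure: one standard form of the approximation is that, under $H_0: \mu_1 - \mu_2 = \delta_0$, $T^2 = (\bar{x}_1 - \bar{x}_2 - \delta_0)' V^{-1} (\bar{x}_1 - \bar{x}_2 - \delta_0)$ is approximately distributed as $\frac{\nu p}{\nu - p + 1} F_{p,\, \nu-p+1}$ • Where $V = \frac{1}{n_1}S_1 + \frac{1}{n_2}S_2$ and $\nu = \frac{p + p^2}{\sum_{i=1}^{2} \frac{1}{n_i}\left\{ \mathrm{tr}\left[\left(\frac{1}{n_i} S_i V^{-1}\right)^2\right] + \left(\mathrm{tr}\left[\frac{1}{n_i} S_i V^{-1}\right]\right)^2 \right\}}$, with $\min(n_1, n_2) \le \nu \le n_1 + n_2$.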
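A sketch of one commonly used approximate solution to the multivariate Behrens-Fisher problem follows: the unpooled $T^2$ is referred to a scaled F distribution with estimated degrees of freedom $\nu$. Whether this is exactly the variant shown on the original slide is an assumption; the function name `behrens_fisher_t2` and the simulated data are also illustrative.

```python
import numpy as np
from scipy import stats


def behrens_fisher_t2(x1, x2, alpha=0.05):
    """Approximate small-sample test of H0: mu1 = mu2 with unequal covariances (p >= 2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)
    v_inv = np.linalg.inv(s1 / n1 + s2 / n2)
    t2 = diff @ v_inv @ diff

    # Estimated degrees of freedom nu
    denom = 0.0
    for s, n in ((s1, n1), (s2, n2)):
        a = (s / n) @ v_inv
        denom += (np.trace(a @ a) + np.trace(a) ** 2) / n
    nu = (p + p**2) / denom

    df2 = nu - p + 1
    crit = nu * p / df2 * stats.f.ppf(1 - alpha, p, df2)
    p_value = stats.f.sf(t2 * df2 / (nu * p), p, df2)
    return {"T2": t2, "nu": nu, "critical_T2": crit, "p_value": p_value,
            "reject": bool(t2 > crit)}


# Hypothetical data: small samples with clearly unequal covariance matrices
rng = np.random.default_rng(5)
x1 = rng.normal(size=(8, 2))
x2 = 3.0 * rng.normal(size=(12, 2)) + [1.0, 0.0]
bf = behrens_fisher_t2(x1, x2)
print(bf)
```

The estimated $\nu$ falls between $\min(n_1, n_2)$ and $n_1 + n_2$, so the reference distribution adapts to how unequal the two covariance contributions are.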

Samples not taken from MVN population: Type I error level for the 2-sample T2 test is not particularly affected by moderate departures from normality if the two populations have similar distributions. The one-sample test is much more sensitive to lack of normality, particularly skewed distributions
