Comparing Populations

Comparing Populations: Proportions and means

Most studies will have more than one population.

Example: The Salk vaccine trial, 1954
A large study to determine if the Salk vaccine was effective in reducing the incidence of polio.
• Two populations:
  • Individuals vaccinated with the Salk vaccine
  • Individuals injected with a placebo
• A double-blind study: neither the individuals vaccinated nor the MDs treating the cases knew who received the vaccine and who received the placebo.

When there is more than one population, one will be interested in making comparisons. Comparisons are sometimes made through differences, sometimes through ratios.

The sampling distribution of differences of Normal Random Variables
An important fact: If X and Y denote two independent normal random variables, then D = X - Y is normal with
$$\mu_D = \mu_X - \mu_Y, \qquad \sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}.$$
This fact allows us to determine the sampling distribution of differences.
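The fact about differences of independent normal random variables can be checked numerically. The following is a minimal simulation sketch; the function name `simulate_difference` and the parameter values are illustrative assumptions, not part of the original material.

```python
import math
import random
import statistics

def simulate_difference(mu_x, sigma_x, mu_y, sigma_y, n_draws=50_000, seed=0):
    """Draw independent X ~ N(mu_x, sigma_x) and Y ~ N(mu_y, sigma_y),
    then return the sample mean and sample standard deviation of D = X - Y."""
    rng = random.Random(seed)
    d = [rng.gauss(mu_x, sigma_x) - rng.gauss(mu_y, sigma_y) for _ in range(n_draws)]
    return statistics.fmean(d), statistics.stdev(d)

# Illustrative values only (assumed, not from the text)
sim_mean, sim_sd = simulate_difference(10.0, 3.0, 4.0, 4.0)

# Theoretical values: the means subtract, the variances add
theory_mean = 10.0 - 4.0
theory_sd = math.sqrt(3.0**2 + 4.0**2)

print(f"simulated mean {sim_mean:.3f}, theory {theory_mean:.3f}")
print(f"simulated sd   {sim_sd:.3f}, theory {theory_sd:.3f}")
```

Note that the standard deviations do not subtract: the variability of a difference is larger than that of either variable alone.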

Comparing proportions

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of “success” in population 1.
• Let $p_2$ denote the probability (proportion) of “success” in population 2.
• The objective is to compare the two population proportions.

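Consider the statistic:
$$\hat{p}_1 - \hat{p}_2, \qquad \hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2},$$
where $x_i$ is the number of successes in a sample of size $n_i$ from population $i$. For large samples this statistic has (approximately) a normal distribution with
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2, \qquad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},$$
using the important fact about differences of independent normal random variables.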
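The intermediate step can be written out as follows. This is a sketch that relies on the usual large-sample normal approximation to the binomial, which is assumed here rather than stated in the original; $N(\cdot,\cdot)$ is written as N(mean, standard deviation).

```latex
\begin{aligned}
\hat{p}_i = \frac{x_i}{n_i} &\;\approx\; N\!\left(p_i,\ \sqrt{\frac{p_i(1-p_i)}{n_i}}\right), \qquad i = 1, 2 \\[4pt]
\hat{p}_1 - \hat{p}_2 &\;\approx\; N\!\left(p_1 - p_2,\ \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)
\end{aligned}
```

The second line applies the fact about differences of independent normal random variables: the means subtract and the variances add.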

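Thus
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$
has (approximately) a standard normal distribution.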

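We want to test either:
$$H_0: p_1 = p_2 \quad \text{vs} \quad H_A: p_1 \ne p_2$$
or
$$H_0: p_1 \le p_2 \quad \text{vs} \quad H_A: p_1 > p_2$$
or
$$H_0: p_1 \ge p_2 \quad \text{vs} \quad H_A: p_1 < p_2$$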

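If $p_1 = p_2$ ($= p$, say) then the test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
has (approximately) a standard normal distribution, where
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
is an estimate of the common value of $p_1$ and $p_2$.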

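Thus for comparing two binomial probabilities $p_1$ and $p_2$, the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$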
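The pooled test statistic for two binomial proportions can be sketched in a few lines. The function name `two_proportion_z` and the counts in the usage line are hypothetical, chosen only for illustration.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for comparing two proportions.

    x1, x2: numbers of successes; n1, n2: sample sizes.
    The standard error uses the pooled estimate, which is appropriate
    when the null hypothesis sets p1 = p2.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # estimate of the common value of p1 and p2
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical counts, for illustration only
z_demo = two_proportion_z(50, 100, 40, 100)
print(f"z = {z_demo:.3f}")
```

The sign of the result follows the order of the arguments: a positive value means the first sample proportion is the larger one.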

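The Critical Region (reject $H_0$ when):
• $H_A: p_1 \ne p_2$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: p_1 > p_2$: $z > z_{\alpha}$
• $H_A: p_1 < p_2$: $z < -z_{\alpha}$
Here $z_{\alpha}$ denotes the point exceeded with probability $\alpha$ under the standard normal distribution.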
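A decision rule for the critical region can be sketched as below, assuming the standard two-sided and one-sided rejection rules for a z statistic. The function name `reject_h0` and the labels `"two-sided"`, `"greater"` and `"less"` are hypothetical conventions introduced for this example.

```python
from statistics import NormalDist

def reject_h0(z, alternative, alpha=0.05):
    """Return True if z falls in the critical region at level alpha.

    alternative: "two-sided" (p1 != p2), "greater" (p1 > p2) or "less" (p1 < p2).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")

# The same z value can lead to different decisions for different alternatives
print(reject_h0(1.7, "two-sided"))  # compared with z_{alpha/2}
print(reject_h0(1.7, "greater"))    # compared with z_{alpha}
```

The one-sided critical value is smaller than the two-sided one because the whole of $\alpha$ is placed in a single tail.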

Example
• In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period.
• In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died, while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for non-smokers?

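We want to test:
$$H_0: p_1 \ge p_2 \quad \text{vs} \quad H_A: p_1 < p_2$$
The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$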

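Note:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{117}{1067} \approx 0.1097 \quad \text{(Non-smokers)}$$
$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{54}{402} \approx 0.1343 \quad \text{(Pipe smokers)}$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{171}{1469} \approx 0.1164 \quad \text{(Combined)}$$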

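The test statistic:
$$z = \frac{\dfrac{117}{1067} - \dfrac{54}{402}}{\sqrt{\dfrac{171}{1469}\left(1 - \dfrac{171}{1469}\right)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} \approx -1.31$$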

We reject $H_0$ if $z < -z_{\alpha} = -z_{0.05} = -1.645$. Not true, hence we do not reject $H_0$.
Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.
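The pipe-smoking calculation can be reproduced from the counts given in the example. This is a sketch using only the Python standard library; the variable names are illustrative, and the one-sided rule "reject if z is below the negative 5% critical point" is assumed from the stated significance level of 0.05.

```python
import math
from statistics import NormalDist

# Counts from the example
n1, x1 = 1067, 117  # non-smokers: sample size, deaths
n2, x2 = 402, 54    # pipe smokers: sample size, deaths

p1_hat = x1 / n1
p2_hat = x2 / n2
p_hat = (x1 + x2) / (n1 + n2)  # combined (pooled) estimate

se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# One-sided test at alpha = 0.05: reject H0 when z < -z_alpha
z_crit = NormalDist().inv_cdf(0.95)
reject = z < -z_crit

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}, p_hat = {p_hat:.4f}")
print(f"z = {z:.3f}, critical value = {-z_crit:.3f}, reject H0: {reject}")
```

The observed mortality rate is higher among pipe smokers, but the difference is not large enough relative to its standard error to fall in the critical region.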

Estimating a difference in proportions using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of “success” in population 1.
• Let $p_2$ denote the probability (proportion) of “success” in population 2.
• The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.

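Confidence Interval for $\delta = p_1 - p_2$, with confidence level $100P\% = 100(1 - \alpha)\%$:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$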

Example
• Estimating the increase in the mortality rate for pipe smokers over that for non-smokers, $\delta = p_2 - p_1$.
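The interval for the pipe-smoking data can be sketched as follows. The 95% confidence level is an assumption (the level for this example is not stated in the surviving text), and the function name `diff_proportion_ci` is hypothetical. Unlike the test statistic, the interval uses the unpooled standard error, since the two proportions are no longer assumed to be equal.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x_a, n_a, x_b, n_b, confidence=0.95):
    """Large-sample confidence interval for p_a - p_b (unpooled standard error)."""
    pa_hat = x_a / n_a
    pb_hat = x_b / n_b
    se = math.sqrt(pa_hat * (1 - pa_hat) / n_a + pb_hat * (1 - pb_hat) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    diff = pa_hat - pb_hat
    return diff - z_crit * se, diff + z_crit * se

# delta = p2 - p1: pipe smokers (54 of 402) minus non-smokers (117 of 1067)
lower, upper = diff_proportion_ci(54, 402, 117, 1067)
print(f"95% CI for p2 - p1: ({lower:.4f}, {upper:.4f})")
```

The interval contains zero, which agrees with the test's failure to find a significant increase at the 0.05 level.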

Comparing Proportions

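Summary
The test for a difference in proportions (the test statistic):
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Estimating the difference in proportions by a confidence interval:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$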

Comparing Means

Comparing Means
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

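We want to test either:
$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_A: \mu_1 \ne \mu_2$$
or
$$H_0: \mu_1 \le \mu_2 \quad \text{vs} \quad H_A: \mu_1 > \mu_2$$
or
$$H_0: \mu_1 \ge \mu_2 \quad \text{vs} \quad H_A: \mu_1 < \mu_2$$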

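Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$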

If $\mu_1 = \mu_2$:
• $z$ will have a standard normal distribution.
• This will also be true, approximately, for the approximation (obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes $n$ and $m$ are large (greater than 30).
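The large-sample statistic can be computed directly from summary statistics. A minimal sketch follows; the function name `two_sample_z` and the numbers in the usage line are hypothetical and are not data from the text.

```python
import math

def two_sample_z(xbar, s_x, n, ybar, s_y, m):
    """Large-sample z statistic for comparing two means.

    xbar, s_x, n: mean, standard deviation and size of the sample from population 1.
    ybar, s_y, m: the same quantities for the sample from population 2.
    The sample standard deviations stand in for the unknown population values,
    which is reasonable only when n and m are both large.
    """
    se = math.sqrt(s_x**2 / n + s_y**2 / m)
    return (xbar - ybar) / se

# Hypothetical summary statistics, for illustration only
z_demo = two_sample_z(10.0, 3.0, 100, 8.0, 4.0, 100)
print(f"z = {z_demo:.3f}")
```

The resulting value is compared with the standard normal critical points in the same way as for proportions.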

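Note:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}, \qquad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar{y})^2}{m-1}}$$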

Example
• A study was interested in determining if an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of $n = 500$ patients with abnormally high blood pressure were required to adhere to the exercise regime.
• A second sample of $m = 400$ patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.

We want to test:
$H_0: \mu_1 \le \mu_2$ (The exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (The exercise group did have a higher average reduction in blood pressure)

Suppose the data has been collected and:

We reject $H_0$ if $z > z_{\alpha} = z_{0.05} = 1.645$. True, hence we reject $H_0$.
Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• The objective is to estimate the difference in the two population means, $\delta = \mu_1 - \mu_2$.

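Confidence Interval for $\delta = \mu_1 - \mu_2$ (large samples), with confidence level $100(1 - \alpha)\%$:
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$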
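A large-sample confidence interval for the difference in means can be sketched as below, assuming the usual normal-based form (estimate plus or minus a critical value times the standard error). The function name `diff_means_ci` and the summary statistics in the usage line are hypothetical; they are not the blood-pressure data.

```python
import math
from statistics import NormalDist

def diff_means_ci(xbar, s_x, n, ybar, s_y, m, confidence=0.95):
    """Large-sample confidence interval for mu_1 - mu_2 from summary statistics."""
    se = math.sqrt(s_x**2 / n + s_y**2 / m)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    diff = xbar - ybar
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical summary statistics, for illustration only
lower, upper = diff_means_ci(10.0, 3.0, 100, 8.0, 4.0, 100)
print(f"95% CI for mu_1 - mu_2: ({lower:.3f}, {upper:.3f})")
```

An interval that lies entirely above zero corresponds to a significant two-sided test at the matching level.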

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$.

Comparing Means – small samples: The t test

Comparing Means – small samples
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

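We want to test either:
$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_A: \mu_1 \ne \mu_2$$
or
$$H_0: \mu_1 \le \mu_2 \quad \text{vs} \quad H_A: \mu_1 > \mu_2$$
or
$$H_0: \mu_1 \ge \mu_2 \quad \text{vs} \quad H_A: \mu_1 < \mu_2$$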

Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

If the sample sizes ($m$ and $n$) are large, the statistic will have approximately a standard normal distribution. This will not be the case if the sample sizes ($m$ and $n$) are small.
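The effect of small samples can be illustrated by simulation. The sketch below draws both samples from the same normal population (so the population means are equal) and records how often the statistic exceeds 1.96 in absolute value; under a standard normal distribution this would happen about 5% of the time. The function name `tail_fraction`, the sample sizes and the number of repetitions are assumptions made for illustration.

```python
import math
import random
import statistics

def tail_fraction(n, m, reps, cutoff=1.96, seed=1):
    """Fraction of simulated statistics with |z| > cutoff when mu_1 = mu_2."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Standard error built from the sample variances s_x^2 and s_y^2
        se = math.sqrt(statistics.variance(x) / n + statistics.variance(y) / m)
        z = (statistics.fmean(x) - statistics.fmean(y)) / se
        if abs(z) > cutoff:
            exceed += 1
    return exceed / reps

small = tail_fraction(3, 6, reps=10_000)   # small samples
large = tail_fraction(50, 50, reps=2_000)  # large samples
print(f"small samples: {small:.3f}, large samples: {large:.3f}, nominal: 0.050")
```

With small samples the estimated standard deviations are themselves quite variable, so extreme values of the statistic occur more often than the normal distribution predicts.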

The t test – for comparing means – small samples (equal variances)
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same: $\sigma_1 = \sigma_2 = \sigma$.

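Let
$$s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}$$
denote the pooled estimate of $\sigma$. Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$.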
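The pooled estimate can be sketched as a weighted combination of the two sample variances, with weights given by their degrees of freedom. This assumes the standard pooled-variance formula; the function name `pooled_sd` and the input values are hypothetical.

```python
import math

def pooled_sd(s_x, n, s_y, m):
    """Pooled estimate of the common standard deviation.

    Each sample variance is weighted by its degrees of freedom (n - 1 and m - 1),
    and the total is divided by the combined degrees of freedom n + m - 2.
    """
    pooled_variance = ((n - 1) * s_x**2 + (m - 1) * s_y**2) / (n + m - 2)
    return math.sqrt(pooled_variance)

# Hypothetical standard deviations, for illustration only
s_demo = pooled_sd(1.0, 3, 2.0, 6)
print(f"pooled sd = {s_demo:.4f}")
```

The pooled value always lies between the two sample standard deviations and sits closer to the one from the larger sample.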

The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
If $\mu_1 = \mu_2$ this statistic has a t distribution with $n + m - 2$ degrees of freedom.
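The equal-variance t statistic can be computed from raw samples as sketched below, assuming the standard pooled two-sample form. The function name `pooled_t` and the two small data sets are hypothetical; they are not the measurements from any example in the text.

```python
import math
import statistics

def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n, m = len(x), len(y)
    df = n + m - 2
    # Pooled estimate of the common standard deviation
    pooled_variance = ((n - 1) * statistics.variance(x)
                       + (m - 1) * statistics.variance(y)) / df
    s_pooled = math.sqrt(pooled_variance)
    t = (statistics.fmean(x) - statistics.fmean(y)) / (s_pooled * math.sqrt(1 / n + 1 / m))
    return t, df

# Hypothetical measurements, for illustration only
t_demo, df_demo = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
print(f"t = {t_demo:.3f} with {df_demo} degrees of freedom")
```

The value of t is then compared with critical points of the t distribution with the returned degrees of freedom, taken from a t table or a statistics library.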

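The Critical Region (reject $H_0$ when):
• $H_A: \mu_1 \ne \mu_2$: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: $t > t_{\alpha}$
• $H_A: \mu_1 < \mu_2$: $t < -t_{\alpha}$
$t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with $n + m - 2$ degrees of freedom.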

Example
• A study was interested in determining if administration of a drug reduces cancerous tumor size.
• For this purpose $n + m = 9$ test animals are implanted with a cancerous tumor.
• $n = 3$ are selected at random and administered the drug.
• The remaining $m = 6$ are left untreated.
• Final tumor sizes are measured at the end of the test period.
