Comparison of Two Means

11 Comparison of Two Means

Tests involving two samples – comparing variances, F distribution
• Test of hypothesis (TOH): is $\mu_A = \mu_B$?
• Step 1: F-test, is $\sigma_A^2 = \sigma_B^2$?
• Step 2: t-test, using a different formula for (i) $\sigma_A^2 = \sigma_B^2$ and (ii) $\sigma_A^2 \neq \sigma_B^2$
• Goal: decide whether a given gene is expressed differently between patients and healthy subjects
• This involves comparing the means of the two samples
• To answer this question, one must first know whether the two samples have the same variance
• The variances of two samples are compared using the F distribution
• Then a t-test is used to decide whether the mean expression level of the gene differs between patients and healthy subjects
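The two-step procedure above can be sketched as a small decision rule. This is a minimal Python sketch, not a method prescribed by the slides: the function name `choose_t_test` is hypothetical, and it assumes the two F critical values have already been looked up from a table or Excel, as the slides do. "Welch" is the usual name for the unequal-variance t formula.

```python
def choose_t_test(var_a: float, var_b: float, f_lower: float, f_upper: float) -> str:
    """Step 1: two-tailed F-test on the ratio of sample variances.

    f_lower and f_upper are assumed to be the two-tailed critical values of
    F(n_A - 1, n_B - 1), looked up beforehand from a table or spreadsheet.
    Returns which t-test formula to use in Step 2.
    """
    f_stat = var_a / var_b
    if f_lower < f_stat < f_upper:
        # Do not reject H0: sigma_A^2 = sigma_B^2, so the variances are pooled
        return "pooled"
    # Reject H0: use the unequal-variance (Welch) formula instead
    return "welch"


# Example with the control/patient variances used in the slides
print(choose_t_test(2.66, 5.74, 0.1864, 3.7283))  # pooled
```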

Tests involving two samples – comparing variances, F distribution
• The values measured in controls are: 10, 11, 11, 12, 15, 13, 12
• The values measured in patients are: 12, 13, 13, 15, 12, 18, 17, 16, 16, 12, 15, 10, 12. Is the variance different between the controls and the patients at the 5% significance level?
• $H_0: \sigma_A^2 = \sigma_B^2$, $H_1: \sigma_A^2 \neq \sigma_B^2$
• We need a new test statistic: the ratio of the two sample variances
• This is a two-tailed test
• Notation: A = controls, B = patients in the following calculation
• Controls (sample A): d.o.f. = 6, variance $s_A^2$ = 2.66
• Patients (sample B): d.o.f. = 12, variance $s_B^2$ = 5.74
• Consider the ratio $F = s_A^2 / s_B^2$ = 2.66/5.74 = 0.4634
• Significance level for each tail of the two-tailed test = 5%/2 = 2.5%
• F distribution (right tail), where $F_{\alpha}(\nu_1,\nu_2)$ leaves a right-tail area $\alpha$: $F_{0.025}(6,12) = 3.7283$ (from Excel)
• $F_{0.975}(6,12) = 0.1864$ (from Excel)
• F distribution (right tail) table: http://mips.stanford.edu/public/classes/stats_data_analysis/234_99.html
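The degrees of freedom, variances and F ratio above can be reproduced from the raw data with Python's standard `statistics` module. This is a minimal sketch; the critical values are copied from the text rather than computed. Without intermediate rounding the ratio comes out as 0.4643 rather than 0.4634, because the control variance is exactly 16/6 ≈ 2.667; the conclusion is unchanged.

```python
from statistics import variance

# Data from the example above: A = controls, B = patients
controls = [10, 11, 11, 12, 15, 13, 12]
patients = [12, 13, 13, 15, 12, 18, 17, 16, 16, 12, 15, 10, 12]


def describe(sample):
    """Return (degrees of freedom, unbiased sample variance)."""
    return len(sample) - 1, variance(sample)  # variance() divides by n - 1


df_a, var_a = describe(controls)
df_b, var_b = describe(patients)
f_stat = var_a / var_b

# Two-tailed 5% critical values for F(6, 12), as quoted in the text
f_lower, f_upper = 0.1864, 3.7283
reject_h0 = not (f_lower < f_stat < f_upper)

print(df_a, round(var_a, 2))        # 6 2.67
print(df_b, round(var_b, 2))        # 12 5.74
print(round(f_stat, 4), reject_h0)  # 0.4643 False
```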

[Table: F distribution critical values, right-tail area 0.025]


Tests involving two samples – comparing variances, F-distribution
• F tables usually list right-tail critical values for 0.01, 0.025 and 0.05, but not for 0.975
• Given $F_{0.025}(6,12) = 3.7283$, how do we find $F_{0.975}(6,12)$?
• The F distribution has a useful property: the critical value with right-tail area $\alpha$ for an F with $(\nu_1, \nu_2)$ d.o.f. is the reciprocal of the critical value with right-tail area $1-\alpha$ for an F with the d.o.f. reversed:
• $F_{\alpha}(\nu_1,\nu_2) = \dfrac{1}{F_{1-\alpha}(\nu_2,\nu_1)}$
• $F_{0.975}(6,12) = 1/F_{1-0.975}(12,6) = 1/F_{0.025}(12,6) = 1/5.3662 = 0.18635$
• Back to our null hypothesis test: $0.18635 < 0.4634 < 3.7283$
• Since the F statistic lies between 0.18635 and 3.7283, we do not reject the null hypothesis: at the 5% level there is no evidence that the variances of the controls and the patients differ
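Why the reciprocal property holds can be sketched from one standard fact not stated on the slides: if $F \sim F(\nu_1,\nu_2)$ then $1/F \sim F(\nu_2,\nu_1)$, because an F variable is a ratio of two independent chi-square variables, each divided by its d.o.f. As above, $F_{\alpha}(\nu_1,\nu_2)$ is the value with right-tail area $\alpha$.

```latex
\begin{align*}
P\bigl(F > F_{1-\alpha}(\nu_1,\nu_2)\bigr) &= 1-\alpha \\
\iff P\!\left(\frac{1}{F} < \frac{1}{F_{1-\alpha}(\nu_1,\nu_2)}\right) &= 1-\alpha \\
\iff P\!\left(\frac{1}{F} > \frac{1}{F_{1-\alpha}(\nu_1,\nu_2)}\right) &= \alpha
  \qquad \text{(continuous distribution)} \\
\implies F_{\alpha}(\nu_2,\nu_1) &= \frac{1}{F_{1-\alpha}(\nu_1,\nu_2)}
\end{align*}
\[
\alpha = 0.975,\ \nu_1 = 12,\ \nu_2 = 6:\qquad
F_{0.975}(6,12) = \frac{1}{F_{0.025}(12,6)} = \frac{1}{5.3662} \approx 0.18635
\]
```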

Tests involving two samples – comparing variances, F-distribution
• Now consider the inverted ratio $F = s_B^2 / s_A^2$
• The two choices should lead to the same conclusion, since the conclusion should not depend on which variance we put in the numerator and which in the denominator
• Controls (sample A): d.o.f. = 6, variance = 2.66
• Patients (sample B): d.o.f. = 12, variance = 5.74
• F = 5.74/2.66 = 2.1579
• F distribution (right tail): $F_{0.025}(12,6) = 5.3662$ (from Excel)
• $F_{0.975}(12,6) = 0.2682$ (from Excel)
• Since 0.2682 < 2.1579 < 5.3662, the F statistic lies between the two critical values, and again we do not reject the null hypothesis: there is no evidence that the variances of controls and patients differ
REMARK
• The two F-tests are reciprocals of each other
• That is, 0.18635 < 0.4634 < 3.7283
• Taking reciprocals reverses the inequalities: 1/0.18635 > 1/0.4634 > 1/3.7283
• i.e. 5.3662 > 2.1579 > 0.2682
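A quick numerical check that the two orderings of the ratio agree: if $L < F < U$ for $F(6,12)$, then $1/U < 1/F < 1/L$, and $1/U$, $1/L$ are exactly the $F(12,6)$ bounds. This is a minimal Python sketch using the rounded values from the slides; the helper name `inside` is hypothetical.

```python
def inside(f_stat, lower, upper):
    """True if f_stat lies strictly inside the two-tailed acceptance region."""
    return lower < f_stat < upper


var_a, var_b = 2.66, 5.74  # controls, patients (values from the slides)

# Ordering 1: F = s_A^2 / s_B^2, compared with the F(6, 12) bounds
f1 = var_a / var_b
ok1 = inside(f1, 0.18635, 3.7283)

# Ordering 2: F = s_B^2 / s_A^2, compared with the F(12, 6) bounds,
# which are the reciprocals of the F(6, 12) bounds with their roles swapped
f2 = var_b / var_a
ok2 = inside(f2, 1 / 3.7283, 1 / 0.18635)

print(round(f1, 4), round(f2, 4), ok1, ok2)  # 0.4634 2.1579 True True
```

Because taking reciprocals reverses every inequality, `ok1` and `ok2` can never disagree, whichever variance is placed in the numerator.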

Tests involving two samples – comparing means
The expression levels of the gene AC002378 measured in the patients (P) and the controls (C) are given below:

geneID     P1     P2     P3     P4     P5     P6
AC002378   0.66   0.51   1.12   0.83   0.91   0.50

geneID     C1     C2     C3     C4     C5     C6
AC002378   0.41   0.57   -0.17  0.50   0.22   0.71

• F-test: $H_0: \sigma_P^2 = \sigma_C^2$, $H_1: \sigma_P^2 \neq \sigma_C^2$
• t-test: $H_0: \mu_P = \mu_C$, $H_1: \mu_P \neq \mu_C$
• Mean gene expression level of patients, $\bar{x}_P = 0.755$
• Mean gene expression level of controls, $\bar{x}_C = 0.373$
• $s_P^2 = 0.0594$, $s_C^2 = 0.0976$
• To test whether the two samples have the same variance, we perform the F-test at the 5% level
• $F = s_P^2 / s_C^2 = 0.0594/0.0976 = 0.61$, d.o.f. = (5, 5)
• $F_{0.025}(5,5) = 7.146$, $F_{0.975}(5,5) = 1/7.146 = 0.1399$
• 0.61 lies between 0.1399 and 7.146, so we do not reject the null hypothesis: the data are consistent with patients and controls having the same variance
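The summary statistics and F-test for AC002378 can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the F(5, 5) critical values are copied from the text rather than computed. Note that the F statistic carries two d.o.f. parameters, one per sample.

```python
from statistics import mean, variance

# Expression levels of gene AC002378 from the table above
patients = [0.66, 0.51, 1.12, 0.83, 0.91, 0.50]
controls = [0.41, 0.57, -0.17, 0.50, 0.22, 0.71]

mean_p, mean_c = mean(patients), mean(controls)
var_p, var_c = variance(patients), variance(controls)  # n - 1 denominator

f_stat = var_p / var_c
df = (len(patients) - 1, len(controls) - 1)  # (numerator, denominator) d.o.f.

# Two-tailed 5% bounds for F(5, 5) as quoted in the text; 0.1399 = 1/7.146
f_lower, f_upper = 0.1399, 7.146
equal_variances = f_lower < f_stat < f_upper

print(round(mean_p, 3), round(mean_c, 3))     # 0.755 0.373
print(round(var_p, 4), round(var_c, 4))       # 0.0594 0.0976
print(df, round(f_stat, 2), equal_variances)  # (5, 5) 0.61 True
```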

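Tests involving two samples – comparing means
• t statistic for two independent samples with equal variances
• The t score is
$t = \dfrac{\bar{x}_P - \bar{x}_C}{s_{\text{pool}} \sqrt{\dfrac{1}{n_P} + \dfrac{1}{n_C}}}$, with $\nu = n_P + n_C - 2$ d.o.f.,
where $s_{\text{pool}}^2 = \dfrac{(n_P - 1)s_P^2 + (n_C - 1)s_C^2}{n_P + n_C - 2}$ is the pooled variance
• For AC002378: $n_P = n_C = 6$, $s_{\text{pool}}^2 \approx 0.0785$, $t \approx 2.36$ with $\nu = 10$
• The two-tailed p-value, i.e. the probability of obtaining a t score at least this extreme by chance if $H_0$ is true, is 0.0400. This is smaller than the significance level 0.05, so we reject the null hypothesis: the gene AC002378 is expressed differently between cancer patients and healthy subjects.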
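The pooled t score can be computed directly from the formula. This is a minimal standard-library sketch; `pooled_t` is a hypothetical helper, and instead of computing a p-value it compares |t| with the two-tailed 5% critical value $t_{0.025}(10) \approx 2.228$ taken from a standard t table (a value not given in the slides).

```python
import math
from statistics import mean, variance

patients = [0.66, 0.51, 1.12, 0.83, 0.91, 0.50]
controls = [0.41, 0.57, -0.17, 0.50, 0.22, 0.71]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance: d.o.f.-weighted average of the two sample variances
    sp2 = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, df


t_stat, df = pooled_t(patients, controls)

# Assumed two-tailed 5% critical value of t with 10 d.o.f. (standard t table)
t_crit = 2.228
print(round(t_stat, 2), df, abs(t_stat) > t_crit)  # 2.36 10 True
```

If SciPy is installed, `scipy.stats.ttest_ind(patients, controls)` (whose default is `equal_var=True`) should report the same t score together with a two-tailed p-value of about 0.040.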

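Tests involving two samples – comparing means
• t statistic for two independent samples with unequal variances
• The modified t score is
$t = \dfrac{\bar{x}_P - \bar{x}_C}{\sqrt{\dfrac{s_P^2}{n_P} + \dfrac{s_C^2}{n_C}}}$
• The degrees of freedom $\nu$ need to be adjusted as
$\nu = \dfrac{\left(\dfrac{s_P^2}{n_P} + \dfrac{s_C^2}{n_C}\right)^2}{\dfrac{(s_P^2/n_P)^2}{n_P - 1} + \dfrac{(s_C^2/n_C)^2}{n_C - 1}}$
• This value is generally not an integer; round it down before looking up the t table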
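The unequal-variance (Welch) t score and its adjusted d.o.f. can be sketched the same way. This is a minimal standard-library sketch with a hypothetical helper `welch_t`, applied to the AC002378 data for illustration only (the F-test above did not reject equal variances). Because $n_P = n_C$ here, the Welch t score coincides with the pooled one; only the d.o.f. changes, from 10 to about 9.44, rounded down to 9.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic, Welch-Satterthwaite d.o.f., and d.o.f. rounded down."""
    va = variance(a) / len(a)  # s_a^2 / n_a
    vb = variance(b) / len(b)  # s_b^2 / n_b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    nu = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, nu, math.floor(nu)  # floor() rounds the d.o.f. down for a table lookup


patients = [0.66, 0.51, 1.12, 0.83, 0.91, 0.50]
controls = [0.41, 0.57, -0.17, 0.50, 0.22, 0.71]

t, nu, nu_table = welch_t(patients, controls)
print(round(t, 2), round(nu, 2), nu_table)  # 2.36 9.44 9
```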

Chapter 11, p. 259

Chapter 11, p. 264

Chapter 11, p. 2268
