CHAPTER 6 Statistical Inference & Hypothesis Testing

CHAPTER 6Statistical Inference & Hypothesis Testing 6.1 - One Sample Mean μ, Variance σ2, Proportion π 6.2 - Two Samples Means, Variances, Proportions μ1vs.μ2σ12vs.σ22π1vs.π2 6.3 - Multiple Samples Means, Variances, Proportions μ1, …, μkσ12, …,σk2π1, …, πk

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α σ1 σ2 μ0 1 2 Classic Example: “Randomized Clinical Trial”… Pop 1 = Treatment, Pop 2 = Control Random Sample, size n1 Random Sample, size n2 Sampling Distribution =?

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α σ1 σ2 μ0 1 2 Classic Example: “Randomized Clinical Trial”… Pop 1 = Treatment, Pop 2 = Control Random Sample, size n1 Random Sample, size n2 Sampling Distribution =? Recall from section 4.1 (Discrete Models): Mean(X – Y) = Mean(X) – Mean(Y) and if X and Y are independent… Var(X – Y) = Var(X) + Var(Y)

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α σ1 σ2 μ0 1 2 Classic Example: “Randomized Clinical Trial”… Pop 1 = Treatment, Pop 2 = Control Random Sample, size n1 Random Sample, size n2 Sampling Distribution =? Recall from section 4.1 (Discrete Models): Mean(X – Y) = Mean(X) – Mean(Y) = 0 under H0 and if X and Y are independent… Var(X – Y) = Var(X) + Var(Y)

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α σ1 σ2 1 2 Null Distribution But what if σ12andσ22are unknown? Then use sample estimates s12 and s22 with Z- or t-test, if n1 and n2 are large. s.e. 0

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α σ1 σ2 1 2 Null Distribution But what if σ12andσ22are unknown? Then use sample estimates s12 and s22 with Z- or t-test, if n1 and n2 are large. Later… s.e. (But what if n1andn2are small?) 0

Example:X = “$ Cost of a certain medical service” Assume X is known to be normally distributed at each of k = 2 health care facilities (“groups”). Hospital: X1 ~ N(μ1, σ1) Clinic: X2 ~ N(μ2, σ2) • Null Hypothesis H0: μ1 = μ2, • i.e., μ1 – μ2= 0 • (“No difference exists.") • 2-sided test at significance level α = .05 • DataSample 1: n1 = 137 Sample 2: n2 = 140 NOTE: > 0 Null Distribution 4.2 95% Margin of Error = (1.96)(4.2) = 8.232 95% Confidence Interval for μ1 – μ2: (84 – 8.232, 84+ 8.232) = (75.768, 92.232) does not contain 0 Z-score = = 20 >> 1.96  p << .05 Reject H0; extremely strong significant difference 0

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α 1 2 Samplesizen1 Sample size n2 largen1andn2 Null Distribution

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α 1 2 Samplesizen1 Sample size n2 largen1andn2 smalln1andn2 Null Distribution IF the two populations are equivariant, i.e., then conduct a t-test on the “pooled”samples.

Test Statistic Sampling Distribution =? Working Rule of Thumb Acceptance Region for H0 ¼ < F < 4

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) 1 2 smalln1andn2 Null Distribution IFequal variances is accepted, then estimate their common value with a “pooled”sample variance. The pooled variance is a weighted average of s12 and s22, using the degrees of freedom as the weights.

Consider two independent populations… and a random variable X, normally distributed in each. POPULATION 1 POPULATION 2 Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2= 0 (“No mean difference") Test at signif level α X1 ~ N(μ1, σ1) X2 ~ N(μ2, σ2) 1 2 smalln1andn2 Null Distribution IFequal variances is accepted, then estimate their common value with a “pooled”sample variance. IFequal variances is rejected, The pooled variance is a weighted average of s12 and s22, using the degrees of freedom as the weights. then use Satterwaithe Test, Welch Test, etc. SEE LECTURE NOTES AND TEXTBOOK.

Example:Y = “$ Cost of a certain medical service” Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”). Hospital: Y1 ~ N(μ1, σ1) Clinic: Y2 ~ N(μ2, σ2) • Null Hypothesis H0: μ1 = μ2, • i.e., μ1 – μ2= 0 • (“No difference exists.") • 2-sided test at significance level α = .05 • Data: Sample 1 ={667, 653, 614, 612, 604}; n1 = 5 Sample 2 ={593, 525, 520}; n2 = 3 NOTE: > 0 • Analysis via T-test(if equivariance holds): Point estimates “Group Means”  “Group Variances” SS1 SS2 s2 = SS/df Pooled Variance The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights. df1 df2

Example:Y = “$ Cost of a certain medical service” Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”). Hospital: Y1 ~ N(μ1, σ1) Clinic: Y2 ~ N(μ2, σ2) • Null Hypothesis H0: μ1 = μ2, • i.e., μ1 – μ2= 0 • (“No difference exists.") • 2-sided test at significance level α = .05 • Data: Sample 1 ={667, 653, 614, 612, 604}; n1 = 5 Sample 2 ={593, 525, 520}; n2 = 3  NOTE: > 0 • Analysis via T-test(if equivariance holds): Point estimates “Group Means” “Group Variances” s2 = SS/df SS = 6480 Pooled Variance The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights. df = 6 p-value = Reject H0 at α = .05 stat signif, Hosp > Clinic > 2 * (1 - pt(3.5, 6)) Standard Error [1] 0.01282634

R code: > y1 = c(667, 653, 614, 612, 604) > y2 = c(593, 525, 520) > > t.test(y1, y2, var.equal = T) Two Sample t-test data: y1 and y2 t = 3.5, df = 6, p-value = 0.01283 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 25.27412 142.72588 sample estimates: mean of x mean of y 630 546 Formal Conclusion p-value < α = .05 Reject H0at this level. Interpretation The samples provide evidence that the difference between mean costs is (moderately) statistically significant, at the 5% level, with the hospital being higher than the clinic (by an average of $84).

NEXT UP… PAIRED MEANS page 6.2-7, etc.

CHAPTER 6 Statistical Inference & Hypothesis Testing