Hypothesis Testing: Type II Error and Power

Type I and Type II Error Revisited

                              NULL HYPOTHESIS
DECISION            Actually True          Actually False
Fail to Reject      Correct decision       Type II error (β)
Reject              Type I error (α)       Correct decision (power, 1 - β)

Either type of error is undesirable, and we would like both α and β to be small. How do we control these?

A Type I error, or α-error, is made when a true hypothesis is rejected.
• The letter “α” (alpha) denotes the probability of a Type I error
• α also represents the level of significance of the decision rule or test
• You, as the investigator, select this level

A Type II error, or β-error, is made when a false hypothesis is NOT rejected.
• The letter “β” (beta) denotes the probability of a Type II error
• 1 - β represents the POWER of a test:
  • The probability of rejecting a false null hypothesis
• The value of β depends on a specific alternative hypothesis
• β can be decreased (power increased) by
  • increasing sample size

Computing Power of a Test
• Example: Suppose we have a test of a mean with
  • Ho: μ0 = 100 vs. Ha: μ0 ≠ 100
  • σ = 10
  • n = 25
  • α = .05
If the true mean is in fact μ = 105,
• what is β, the probability of failing to reject Ho when we should reject it?
• what is the power (1 - β) of our test to reject Ho when we should reject it?

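In this example, the standard error is σ/√n = 10/5 = 2, so the critical values, with α/2 = .025 in each tail, are:
• 100 - 1.96(2) = 96.08
• 100 + 1.96(2) = 103.92
We will reject Ho if (x̄ < 96.08) or if (x̄ > 103.92).
[Figure: sampling distribution of x̄ under Ho, centered at μ0 = 100, with rejection regions of area α/2 = .025 below 96.08 and above 103.92]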
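The decision rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `rejection_bounds` and its parameter names are choices made here, not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist


def rejection_bounds(mu0, sigma, n, alpha=0.05):
    """Return (lower, upper) critical values of x-bar for a two-sided z test."""
    se = sigma / sqrt(n)                          # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return mu0 - z_crit * se, mu0 + z_crit * se


# Example 1 from the slides: mu0 = 100, sigma = 10, n = 25, alpha = .05
lower, upper = rejection_bounds(mu0=100, sigma=10, n=25)
print(f"Reject Ho if x-bar < {lower:.2f} or x-bar > {upper:.2f}")
```

The sketch uses the exact normal quantile rather than the rounded 1.96, so the bounds agree with the slide values to two decimal places.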

We will reject Ho
• if x̄ is greater than 103.92
• or x̄ is less than 96.08
Let’s look at these decision points relative to our specific alternative.
• Suppose, in fact, that μa = 105.
[Figure: distribution based on Ha, centered at μa = 105, with the decision points 96.08 and 103.92 marked]

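Standardizing the decision points under Ha (μa = 105, se = 2):
• z = (96.08 - 105)/2 = -4.46
• z = (103.92 - 105)/2 = -0.54
• β = Pr(-4.46 < z < -0.54) = .2946
[Figure: distribution under Ha on the z scale, with -4.46 and -0.54 marked to the left of 0]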

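Note:
• α is fixed in advance by the investigator
• β depends on
  • the sample size, through se = σ/√n
  • the specific alternative, μa
• we assume that the variance σ² holds for both the null and alternative distributions
[Figure: null distribution centered at μ0 = 100 and alternative distribution centered at μa = 105; α/2 lies in each tail of the null distribution beyond 100 - 1.96(se) = 96.08 and 100 + 1.96(se) = 103.92; β is the area under the alternative distribution between these two values]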

Again, looking at our specific alternative, μa = 105:
• β: area where we fail to reject Ho even though Ha is correct
• α/2: area where we reject Ho for Ha. Good!
[Figure: the same two distributions (μ0 = 100, μa = 105) with decision points 100 - 1.96(se) = 96.08 and 100 + 1.96(se) = 103.92]
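One way to make these areas concrete is a small Monte Carlo check: draw many samples of size n = 25, apply the decision rule, and count how often Ho is rejected. The sketch below is only illustrative; the function name `rejection_rate`, the number of trials, and the seed are arbitrary choices made here, not part of the original material.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_mu, mu0=100.0, sigma=10.0, n=25, alpha=0.05,
                   trials=5000, seed=1):
    """Estimate Pr(reject Ho) by simulation when the true mean is true_mu."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if xbar < lower or xbar > upper:
            rejections += 1
    return rejections / trials


type1_rate = rejection_rate(true_mu=100.0)  # Ho true: estimates alpha
power_est = rejection_rate(true_mu=105.0)   # Ha true: estimates 1 - beta
print(f"estimated alpha = {type1_rate:.3f}, estimated power = {power_est:.3f}")
```

With Ho true, the rejection rate should land near α = .05; with the true mean at 105, it approximates the power 1 - β, up to simulation noise.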

We define power as 1 - β
• power = Pr(rejecting Ho | Ha is true)
• In our example,
  • power = 1 - β = 1 - .2946 = .7054
• That is,
  • with α = .05
  • a sample size of n = 25
  • a true mean of μa = 105,
  • the power to reject the null hypothesis (μ0 = 100) is 70.54%.
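The same calculation can be done directly from the normal distribution of x̄ under Ha. The following is a minimal sketch using Python's standard library; `beta_and_power` is a name chosen here for illustration. For μ0 = 100, μa = 105, σ = 10, n = 25 it should agree with the value Minitab reports for a difference of 5 and a sample size of 25 (0.705418).

```python
from math import sqrt
from statistics import NormalDist


def beta_and_power(mu0, mu_a, sigma, n, alpha=0.05):
    """Return (beta, power) for a two-sided z test against the alternative mu_a."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    # Sampling distribution of x-bar when the true mean is mu_a.
    under_ha = NormalDist(mu=mu_a, sigma=se)
    # beta is the probability that x-bar falls in the fail-to-reject region.
    beta = under_ha.cdf(upper) - under_ha.cdf(lower)
    return beta, 1 - beta


beta, power = beta_and_power(mu0=100, mu_a=105, sigma=10, n=25)
print(f"beta = {beta:.4f}, power = {power:.4f}")
```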

Example 2:
• Suppose we want to test, at the α = .05 level, the following hypothesis:
  • Ho: μ = 67 vs. Ha: μ ≠ 67
• We have n = 25 and we know σ = 3.
To test this hypothesis we establish our critical region.
[Figure: null distribution centered at 67 with α/2 in each tail; the two critical values are still to be determined]

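Here, we reject Ho at the α = .05 level when:
  x̄ < 67 - 1.96(3/√25) = 65.82
or
  x̄ > 67 + 1.96(3/√25) = 68.18
[Figure: null distribution centered at 67 with rejection regions of area α/2 below 65.82 and above 68.18]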

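Now, select a specific alternative to compute β:
• Let Ha1: μa = 67.5
• Standardize the “fail-to-reject” region based on Ho (65.82 to 68.18) under this alternative, with se = 3/√25 = 0.6:
  • z = (65.82 - 67.5)/0.6 = -2.80
  • z = (68.18 - 67.5)/0.6 = 1.13
• β = Pr(-2.80 < z < 1.13) = .8682, or Power = 1 - β = 13%
[Figure: distribution under Ha1 centered at 67.5, with 65.82 and 68.18 marked; z scale showing -2.80, 0, and 1.13]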

Now look at the same thing for different values of μa:
Type II Error (β) and Power of Test for α = .05, n = 25, μ0 = 67, σ = 3
[Table not recovered: β and power (1 - β) for a range of alternative means μa]
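A table of this kind can be regenerated with a short script. The sketch below is an assumption-laden stand-in: the grid of alternative means is a hypothetical choice, and the printed values are computed here rather than recovered from the original table. It uses the standardized shift δ = (μa - μ0)/se, for which power = Φ(-z - δ) + 1 - Φ(z - δ), where z is the upper α/2 critical value.

```python
from math import sqrt
from statistics import NormalDist


def power_table(mu0, sigma, n, alternatives, alpha=0.05):
    """Return a list of (mu_a, beta, power) rows for a two-sided z test."""
    std = NormalDist()                     # standard normal
    se = sigma / sqrt(n)
    z_crit = std.inv_cdf(1 - alpha / 2)
    rows = []
    for mu_a in alternatives:
        delta = (mu_a - mu0) / se          # shift of the test statistic under Ha
        power = std.cdf(-z_crit - delta) + 1 - std.cdf(z_crit - delta)
        rows.append((mu_a, 1 - power, power))
    return rows


# Hypothetical grid of alternatives around mu0 = 67 (not from the slides).
alternatives = [65.0, 65.5, 66.0, 66.5, 67.0, 67.5, 68.0, 68.5, 69.0]
rows = power_table(mu0=67, sigma=3, n=25, alternatives=alternatives)

print(" mu_a    beta   power")
for mu_a, beta, power in rows:
    print(f"{mu_a:5.1f}  {beta:6.4f}  {power:6.4f}")
```

Because the test is two-sided, alternatives at equal distances on either side of μ0 give the same power.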

Let us plot power (1 - β) vs. alternative mean (μa). This plot will be called the power curve.
• Note: at μa = μ0, 1 - β = α
• The farther the alternative is from μ0, the greater the power.
[Figure: power curve for n = 25; vertical axis 1 - β from 0.00 to 1.00, horizontal axis μa from 65 to 69, with μ0 = 67 marked]

Suppose we want to test the same hypothesis, still at the α = .05 level, with σ = 3:
• Ho: μ = 67 vs. Ha: μ ≠ 67
• But we will now use n = 100.
We establish our critical region, now with σx̄ = σ/√n = 3/10 = .3
[Figure: null distribution centered at 67 with α/2 in each tail; the two critical values are still to be determined]

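With n = 100, we reject Ho at the α = .05 level when:
  x̄ < 67 - 1.96(.3) = 66.41
or
  x̄ > 67 + 1.96(.3) = 67.59
[Figure: null distribution centered at 67 with rejection regions of area α/2 below 66.41 and above 67.59]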

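Again, select a specific alternative to compute β:
• Let Ha: μa = 67.5
• Standardize the “fail-to-reject” region based on Ho (66.41 to 67.59) under this alternative, with se = .3:
  • z = (66.41 - 67.5)/.3 = -3.63
  • z = (67.59 - 67.5)/.3 = 0.30
• β = Pr(-3.63 < z < 0.30) = .6178, or Power = 1 - β = 38%
[Figure: distribution under Ha centered at 67.5, with 66.41 and 67.59 marked; z scale showing -3.63, 0, and 0.30]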

Now look at the same thing for different values of μa:
Type II Error (β) and Power of Test for α = .05, n = 100, μ0 = 67, σ = 3
[Table not recovered: β and power (1 - β) for a range of alternative means μa]

Power Curves: Power (1 - β) vs. μa for n = 25, 100 (α = .05, μ0 = 67)
[Figure: power curves for n = 100 and n = 25; vertical axis 1 - β from 0.00 to 1.00, horizontal axis μa from 65 to 69]
For the same alternative μa, greater n gives greater power.

Clearly, the larger sample size has resulted in
• a more powerful test.
• However, the increase in power required an additional 75 observations.
• In all cases α = .05.
• Greater power means:
  • we have a greater chance of rejecting Ho in favor of Ha
  • even for alternatives that are close to the value of μ0.
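To see the effect of sample size in isolation, the alternative can be held fixed at μa = 67.5 while n varies. The sketch below is illustrative only: `power_at` is a name chosen here, and the intermediate and larger sample sizes (50, 200) are hypothetical additions to the two values used in the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_at(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of a two-sided z test against mu_a with sample size n."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    under_ha = NormalDist(mu=mu_a, sigma=se)   # distribution of x-bar under Ha
    # Power is the probability of landing in either rejection region.
    return under_ha.cdf(mu0 - z_crit * se) + 1 - under_ha.cdf(mu0 + z_crit * se)


# n = 25 and n = 100 come from the slides; 50 and 200 are added for comparison.
sample_sizes = [25, 50, 100, 200]
powers = {n: power_at(mu0=67, mu_a=67.5, sigma=3, n=n) for n in sample_sizes}

for n, p in powers.items():
    print(f"n = {n:3d}  power = {p:.3f}")
```

Each increase in n shrinks the standard error, which pulls the critical values toward μ0 and moves more of the alternative distribution into the rejection region.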

We will revisit our discussion of power when we discuss sample size in the context of hypothesis testing.
• Minitab allows you to compute the power of a test for a specific alternative:
• You must supply:
  • The difference between the null and a specific alternative mean: μ0 - μa
  • The sample size, n
  • The standard deviation, σ

Using Minitab to estimate Sample Size: Stat → Power and Sample Size → 1-Sample Z
In the dialog, supply:
• Sample size (to specify several, separate with a space)
• Difference between μ0 and μa (to specify several, separate with a space)
• 2-sided test
• σ

Power and Sample Size

1-Sample Z Test

Testing mean = null (versus not = null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 10

            Sample
Difference    Size     Power
         2      25  0.170075
         2     100  0.516005
         5      25  0.705418
         5     100  0.998817
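The Minitab figures above can be cross-checked outside Minitab. The sketch below is an independent calculation in Python's standard library, not a description of how Minitab computes power internally; `power_for_difference` is a name chosen here. It takes the same three inputs (difference, sample size, standard deviation) and applies the two-sided z-test power formula.

```python
from math import sqrt
from statistics import NormalDist


def power_for_difference(difference, n, sigma, alpha=0.05):
    """Two-sided 1-sample z test power for a given mean difference."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    delta = difference / (sigma / sqrt(n))   # difference in standard-error units
    return std.cdf(-z_crit - delta) + 1 - std.cdf(z_crit - delta)


# Same inputs as the Minitab run: differences 2 and 5, n = 25 and 100, sigma = 10.
results = {
    (diff, n): power_for_difference(diff, n, sigma=10)
    for diff in (2, 5)
    for n in (25, 100)
}

print("Difference  Size     Power")
for (diff, n), p in results.items():
    print(f"{diff:10d}  {n:4d}  {p:.6f}")
```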
