Medical Biometry I

Medical Biometry I (Biostatistics 511) Discussion Section Week 8 C. Jason Liang Biostat 511

Discussion Outline
• Calculating a confidence interval for the population mean (μ)
  • When the population standard deviation (σ) is known
  • When the population standard deviation (σ) is not known
  • I have a confidence interval. What is it really telling me?
• Two-sided hypothesis testing
  • z-test (σ known) and t-test (σ not known)
  • Three different ways, all equivalent
  • A little Stata
• Putting it all together
  • Connections, more interpretations

Confidence intervals for population mean (population σ known)
What we want to know: what is the population mean cholesterol for hypertensive men?
What we have: a random sample of 25 hypertensive men and their cholesterol, plus knowledge that the population cholesterol standard deviation for hypertensive men is 45 mg/ml.
The data:
233.47 203.76 204.66 279.39 189.35
227.17 187.55 234.37 234.37 274.89
241.58 160.53 189.35 167.74 205.56
231.67 160.53 266.79 163.23 222.67
202.86 272.19 229.87 219.06 297.40
What would be an estimate of the population mean cholesterol for hypertensive men?

Confidence intervals for population mean (population σ known)
What would be an estimate of the population mean cholesterol for hypertensive men? The sample mean: $\bar{x} = 220$.
But we would like some measure of uncertainty for this estimate. This is often expressed by a confidence interval; 95% confidence intervals are most common.
Here is the 95% confidence interval calculated from our data: $(202.36, 237.64)$.
How did we get this?

Confidence intervals for population mean (population σ known)
General formula for a confidence interval of the mean:
$\bar{x} \pm z_{1-\alpha/2} \, \sigma / \sqrt{n}$
General formula for a 95% confidence interval of the mean ($\alpha = 1 - 0.95 = 0.05$):
$\bar{x} \pm z_{0.975} \, \sigma / \sqrt{n}$
Plug in the value calculated from the sample ($\bar{x}$), the value from a look-up table ($z_{0.975} = 1.96$), and the values already known to us ($\sigma$, $n$). Use a calculator or Stata:
$220 \pm 1.96 \times 45 / \sqrt{25} = (202.36, 237.64)$
• What would happen if our sample size was larger? It would mean plugging in a larger n, which would make for a tighter CI, i.e. the endpoints would be closer to the sample mean.
• What if we wanted a 99% CI? 90% CI? A 99% CI means a larger value for z and thus a wider interval. A 90% CI means a smaller value for z and a tighter interval. It would also affect the interpretation.
• What if σ was larger/smaller? Larger σ means a wider interval. Smaller σ means a tighter interval. Makes sense: sampling from less diffuse data should mean less uncertainty.
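The three what-if questions can be checked numerically. The following is a minimal sketch in Python (not part of the original slides, which use a calculator or Stata); the function name `z_interval` and the what-if sample size of 100 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean

# Cholesterol values for the 25 hypertensive men, copied from the slides.
data = [
    233.47, 203.76, 204.66, 279.39, 189.35,
    227.17, 187.55, 234.37, 234.37, 274.89,
    241.58, 160.53, 189.35, 167.74, 205.56,
    231.67, 160.53, 266.79, 163.23, 222.67,
    202.86, 272.19, 229.87, 219.06, 297.40,
]

def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when the population sd is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{0.975} is about 1.96 for level=0.95
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

xbar = mean(data)
sigma = 45          # known population sd, as stated on the slides
n = len(data)

for level in (0.90, 0.95, 0.99):
    low, high = z_interval(xbar, sigma, n, level)
    print(f"{level:.0%} CI with n={n}: ({low:.2f}, {high:.2f})")

# Hypothetical what-if: same mean and sigma, but four times the sample size.
low_100, high_100 = z_interval(xbar, sigma, 100)
print(f"95% CI with n=100: ({low_100:.2f}, {high_100:.2f})")
```

The 95% line should reproduce the interval from the slides, and the other lines show the interval widening with the confidence level and tightening with n.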

Confidence intervals for population mean (population σ NOT known)
What we want to know: what is the population mean cholesterol for hypertensive men?
What we have: a random sample of 25 hypertensive men and their cholesterol. This time we do not know the population cholesterol standard deviation for hypertensive men, so we estimate it with the sample standard deviation s.
The data (same sample as before):
233.47 203.76 204.66 279.39 189.35
227.17 187.55 234.37 234.37 274.89
241.58 160.53 189.35 167.74 205.56
231.67 160.53 266.79 163.23 222.67
202.86 272.19 229.87 219.06 297.40
What would be an estimate of the population mean cholesterol for hypertensive men?

Confidence intervals for population mean (population σ NOT known)
General formula for a confidence interval of the mean:
$\bar{x} \pm t_{n-1,\,1-\alpha/2} \, s / \sqrt{n}$
General formula for a 95% confidence interval of the mean:
$\bar{x} \pm t_{n-1,\,0.975} \, s / \sqrt{n}$
Plug in the values calculated from the sample ($\bar{x}$, $s$), the value from a look-up table ($t_{24,\,0.975} = 2.064$), and the value already known to us ($n$). Use a calculator or Stata:
$220 \pm 2.064 \times 38.6 / \sqrt{25} = (204.07, 235.93)$
• If we drew another sample of the same size, which values would be the same and which would likely change? We'd likely get different values for $\bar{x}$ and $s$. Our n would remain fixed. Our t would remain the same, assuming we still want a 95% confidence interval.
• What would happen if our sample size was larger? It would mean plugging in a larger n, which would make for a tighter CI, i.e. the endpoints would be closer to the sample mean. (The t value also shrinks slightly as the degrees of freedom grow.)
• What if we wanted a 99% CI? 90% CI? A 99% CI means a larger value for t and thus a wider interval. A 90% CI means a smaller value for t and a tighter interval.
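As a rough check of the σ-unknown interval, here is a small Python sketch (an illustration, not the slides' own method). The critical value 2.064 is the look-up value used on the slides; 1.711 and 2.797 are the usual table values for $t_{24,\,0.95}$ and $t_{24,\,0.995}$, and the names `T_CRIT_24` and `t_interval` are made up for this example.

```python
from math import sqrt
from statistics import mean, stdev

# Cholesterol values for the 25 hypertensive men, copied from the slides.
data = [
    233.47, 203.76, 204.66, 279.39, 189.35,
    227.17, 187.55, 234.37, 234.37, 274.89,
    241.58, 160.53, 189.35, 167.74, 205.56,
    231.67, 160.53, 266.79, 163.23, 222.67,
    202.86, 272.19, 229.87, 219.06, 297.40,
]

n = len(data)
xbar = mean(data)
s = stdev(data)  # sample standard deviation (divides by n - 1)

# Table values of t_{24, 1 - alpha/2}; only valid for n = 25 (24 degrees of freedom).
T_CRIT_24 = {0.90: 1.711, 0.95: 2.064, 0.99: 2.797}

def t_interval(xbar, s, n, level=0.95):
    """Confidence interval for the mean when sigma is estimated by s (n = 25 only)."""
    half_width = T_CRIT_24[level] * s / sqrt(n)
    return xbar - half_width, xbar + half_width

print(f"sample mean = {xbar:.2f}, sample sd = {s:.2f}")
for level in (0.90, 0.95, 0.99):
    low, high = t_interval(xbar, s, n, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Only $\bar{x}$ and $s$ come from the data; a new sample of the same size would change those two numbers but not the table values.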

Confidence interval for the population mean: interpretation
A scientific collaborator asks a statistician some questions:
Q: What is your best estimate of the population mean?
A: The sample mean! For our sample, it is 220.
Q: But how sure are you that it is the population mean?
A: I don't know if it is or not, but I can tell you the 95% confidence interval calculated from our data is (204.07, 235.93).
Q: OK, so there's a 95% chance that the population mean is in that interval, right?
A: Not quite! The true mean either is or isn't in that particular interval, so we can't put a probability on it. However, I can tell you that if I were to repeat this experiment over and over again, about 95% of the confidence intervals produced would contain the true mean.
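The "repeat this experiment over and over" statement can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: the data are drawn from a normal distribution, the "true" mean is set to 211 purely for illustration, σ = 45 is treated as known, and the seed and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(511)

# Hypothetical "truth" for the simulation (assumed, not estimated from the data).
TRUE_MEAN, SIGMA, N = 211, 45, 25
half_width = NormalDist().inv_cdf(0.975) * SIGMA / sqrt(N)

def interval_covers_truth():
    """Draw one sample of size N and report whether its 95% CI contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    return xbar - half_width <= TRUE_MEAN <= xbar + half_width

REPS = 10_000
coverage = sum(interval_covers_truth() for _ in range(REPS)) / REPS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either contains 211 or it does not; the 95% describes the long-run share of intervals that do.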

Hypothesis testing for population mean (population σ known: z-test)
• Known facts:
  • In the general population, men have mean cholesterol of 211 mg/ml with standard deviation 45 mg/ml.
• What we want to know:
  • Do men in the hypertensive population have different mean cholesterol than men in the general population?
• What we have:
  • A random sample of 25 hypertensive men and their cholesterol.
  • Knowledge that the population std. dev. for hypertensive men is the same as that of the general population (45 mg/ml).

Hypothesis testing for population mean (population σ known: z-test)
Set up our hypotheses. We focus on two-sided testing:
H0: μ = 211
Ha: μ > 211 or μ < 211
Decide on our α-level, or Type I error rate. In other words: in situations where the null hypothesis is true, how often are we willing to reject it incorrectly? Typically 5%, i.e. α = 0.05.
If the null hypothesis really were true, what observed values of $\bar{x}$ would make us say, "that can't be right"? The α value affects how quickly we would jump to this conclusion. More formally: $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$.
These "extreme" potential observed values of $\bar{x}$ are the rejection regions.

Hypothesis testing for population mean (population σ known: z-test)
• Now let's do our two-sided hypothesis test.
• The data from our sample:
  233.47 203.76 204.66 279.39 189.35
  227.17 187.55 234.37 234.37 274.89
  241.58 160.53 189.35 167.74 205.56
  231.67 160.53 266.79 163.23 222.67
  202.86 272.19 229.87 219.06 297.40
• In the following slides we will go through three different, but mathematically equivalent, ways of performing our test:
  • On the $\bar{x}$ scale
  • On the Z-score scale
  • On the p-value scale

Hypothesis testing for population mean (population σ known: z-test)
For a two-sided test with α = 0.05, we reject if
$\bar{x} < \mu_0 - z_{0.975} \times \sigma/\sqrt{n}$ or $\bar{x} > \mu_0 + z_{0.975} \times \sigma/\sqrt{n}$
where $\mu_0$ is the mean under the null hypothesis (in our example, 211). Here we construct the critical values on the $\bar{x}$ scale.
So for our sample, the critical values are
$211 - 1.96 \times 45/\sqrt{25} = 193.36$ and $211 + 1.96 \times 45/\sqrt{25} = 228.64$
The $\bar{x}$ in our sample is 220. This is not less than 193.36, and it is not greater than 228.64. Thus we have insufficient evidence to reject the null hypothesis.

Hypothesis testing for population mean (population σ known: z-test)
Some may be more comfortable operating on the Z-score scale. Calculate:
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
And reject the null if $Z < -z_{0.975}$ or $Z > z_{0.975}$.
So in our sample, the Z-score is
$Z = \dfrac{220 - 211}{45/\sqrt{25}} = 1$
The Z-score of 1 is not less than -1.96 ($-z_{0.975}$) and not greater than 1.96 ($z_{0.975}$). Thus we have insufficient evidence to reject the null hypothesis.

Hypothesis testing for population mean (population σ known: z-test)
Finally, some prefer the p-value scale.
If our observed $\bar{x}$ is greater than $\mu_0$, then: $p = 2 \times P\left(Z > \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$
If our observed $\bar{x}$ is less than $\mu_0$, then: $p = 2 \times P\left(Z < \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$
where in this case Z denotes a standard normal random variable. If our calculated p is less than our α (typically 0.05), we reject the null.
In our sample, $\bar{x}$ (220) is greater than $\mu_0$ (211). So our calculation is:
$p = 2 \times P(Z > 1) = 2 \times 0.1587 \approx 0.32$
Our p-value of 0.32 is greater than our α of 0.05. So we have insufficient evidence to reject the null.

Hypothesis testing for population mean (population σ known: z-test)
It is not a coincidence that all three methods produced the same conclusion. They are mathematically equivalent! When doing an analysis yourself, just pick the one you feel most comfortable with. When reading research papers though, it is good to be familiar with all three.
Let's use some pictures to help illustrate why the three methods are equivalent.
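To see the equivalence concretely, the sketch below runs the same z-test three ways in Python, using the summary numbers from the slides ($\bar{x} = 220$, $\mu_0 = 211$, σ = 45, n = 25). The variable names are illustrative, and this is only one way to arrange the calculation.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 220, 211, 45, 25, 0.05

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{0.975}, about 1.96
se = sigma / sqrt(n)                         # standard error of the mean

# 1) x-bar scale: compare the sample mean with the critical values.
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
reject_mean_scale = xbar < lower or xbar > upper

# 2) Z-score scale: compare the standardized mean with +/- z_crit.
z = (xbar - mu0) / se
reject_z_scale = abs(z) > z_crit

# 3) p-value scale: compare the two-sided p-value with alpha.
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_p_scale = p_value < alpha

print(f"critical values: ({lower:.2f}, {upper:.2f}), reject: {reject_mean_scale}")
print(f"Z = {z:.2f}, reject: {reject_z_scale}")
print(f"p = {p_value:.3f}, reject: {reject_p_scale}")
```

All three flags agree because each comparison is an algebraic rearrangement of the same inequality.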

Hypothesis testing for population mean (population σ known: z-test): $\bar{x}$ scale
Suppose we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true). Say we took MANY samples of 25 hypertensive male cholesterols and found the sample mean for each of these samples.
[Figure: histogram of these millions of sample means, with the extreme tails shaded red]
If H0 is true, the probability of observing an $\bar{x}$ in the extreme red area is 5% (recall α = 0.05).

Hypothesis testing for population mean (population σ known: z-test): Z-score scale
Suppose we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true). Say we took MANY samples of 25 hypertensive male cholesterols and found the Z-score for each of these samples.
[Figure: histogram of these millions of Z-scores, with the extreme tails shaded red]
If H0 is true, the probability of observing a Z-score in the extreme red area is 5% (recall α = 0.05).

Hypothesis testing for population mean (population σ known: z-test): p-value scale
Suppose we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true). Say we took MANY samples of 25 hypertensive male cholesterols and found the sample mean for each of these samples.
[Figure: histogram of these millions of sample means, with the area more extreme than our observed sample mean shaded blue]
If H0 is true, the probability of observing another $\bar{x}$ at least as far from 211 as the one from our sample (220) is 0.32 (the blue area).
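The pictures can be mimicked with a simulation. The sketch below is an illustration under assumptions not stated on the slides: cholesterol is drawn from a normal distribution with mean 211 and sd 45, and 20,000 repetitions stand in for the "millions" of samples. The names `red_share` and `blue_share` simply echo the shaded areas.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(8)

MU0, SIGMA, N, ALPHA = 211, 45, 25, 0.05
OBSERVED_XBAR = 220

se = SIGMA / sqrt(N)
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
lower, upper = MU0 - z_crit * se, MU0 + z_crit * se  # critical values on the x-bar scale

# Draw many samples of size N in a world where H0 is true, keeping each sample mean.
REPS = 20_000
sample_means = [
    sum(random.gauss(MU0, SIGMA) for _ in range(N)) / N
    for _ in range(REPS)
]

# Red area: share of sample means that land in the rejection region.
red_share = sum(m < lower or m > upper for m in sample_means) / REPS

# Blue area: share of sample means at least as far from MU0 as the observed one.
distance = abs(OBSERVED_XBAR - MU0)
blue_share = sum(abs(m - MU0) >= distance for m in sample_means) / REPS

print(f"share in rejection region: {red_share:.3f} (alpha = {ALPHA})")
print(f"share at least as extreme as 220: {blue_share:.3f}")
```

Up to simulation noise, the first share should sit near α and the second near the p-value from the z-test.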

Hypothesis testing for population mean (population σ NOT known: t-test)
• In the previous example, we knew the population sd. What if we don't?
• Known facts:
  • In the general population, men have mean cholesterol of 211 mg/ml.
• What we want to know:
  • Do men in the hypertensive population have different mean cholesterol than men in the general population?
• What we have:
  • A random sample of 25 hypertensive men and their cholesterol.
  • No knowledge of the population sd for hypertensive men, so we estimate it with the sample sd (s = 38.6 mg/ml).

Hypothesis testing for population mean (population σ NOT known: t-test)
Set up our hypotheses. We focus on two-sided testing:
H0: μ = 211
Ha: μ > 211 or μ < 211
Decide on our α-level, or Type I error rate. In other words: in situations where the null hypothesis is true, how often are we willing to reject it incorrectly? Typically 5%, i.e. α = 0.05.
If the null hypothesis really were true, what observed values of $\bar{x}$ would make us say, "that can't be right"? The α value affects how quickly we would jump to this conclusion. More formally: $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$.
These "extreme" potential observed values of $\bar{x}$ are the rejection regions.

Hypothesis testing for population mean (population σ NOT known: t-test)
• Same as the z-test, but we replace the z's with t's, and σ with the sample standard deviation s. For a two-sided test with α = 0.05, reject the null when
  $\left| \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \right| > t_{n-1,\,0.975}$
• Note:
  • When looking up $t_{n-1,\,1-\alpha/2}$ we need both the degrees of freedom (in our example, n-1 = 24) and α.
  • $s$ is the sample standard deviation.
  • With large samples, t-tests and z-tests are essentially equivalent.

Hypothesis testing for population mean (population σ NOT known: t-test)
For a two-sided test with α = 0.05, we reject if
$\bar{x} < \mu_0 - t_{n-1,\,0.975} \times s/\sqrt{n}$ or $\bar{x} > \mu_0 + t_{n-1,\,0.975} \times s/\sqrt{n}$
where $\mu_0$ is the mean under the null hypothesis (in our example, 211). Here we construct the critical values on the $\bar{x}$ scale.
So for our sample, the critical values are
$211 - 2.064 \times 38.6/\sqrt{25} = 195.07$ and $211 + 2.064 \times 38.6/\sqrt{25} = 226.93$
The $\bar{x}$ in our sample is 220. This is not less than 195.07, and it is not greater than 226.93. Thus we have insufficient evidence to reject the null hypothesis.

Hypothesis testing for population mean (population σ NOT known: t-test)
Some may be more comfortable operating on the T-score scale. Calculate:
$T = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
And reject the null if $T < -t_{n-1,\,0.975}$ or $T > t_{n-1,\,0.975}$.
So in our sample, the T-score is
$T = \dfrac{220 - 211}{38.6/\sqrt{25}} = 1.166$
The T-score of 1.166 is not less than -2.064 ($-t_{n-1,\,0.975}$) and not greater than 2.064 ($t_{n-1,\,0.975}$). Thus we have insufficient evidence to reject the null hypothesis.

Hypothesis testing for population mean (population σ NOT known: t-test)
Finally, some prefer the p-value scale.
If our observed $\bar{x}$ is greater than $\mu_0$, then: $p = 2 \times P\left(T_{n-1} > \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\right)$
If our observed $\bar{x}$ is less than $\mu_0$, then: $p = 2 \times P\left(T_{n-1} < \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\right)$
where here $T_{n-1}$ is a random variable with a t-distribution with n-1 degrees of freedom. If our calculated p is less than our α (typically 0.05), we reject the null.
In our sample, $\bar{x}$ (220) is greater than $\mu_0$ (211). So our calculation is:
$p = 2 \times P(T_{24} > 1.166) = 0.255$
Our p-value of 0.255 is greater than our α of 0.05. So we have insufficient evidence to reject the null.
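The three t-test routes can also be sketched in Python. The standard library has no t-distribution, so this example integrates the t density numerically with Simpson's rule; that is an implementation choice made for this sketch, not how the slides (or Stata) compute the p-value. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and 2.064 is the look-up value from the slides.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the slides.
n, xbar, s, mu0, alpha = 25, 220, 38.6, 211, 0.05
t_crit = 2.064            # look-up value t_{24, 0.975} from the slides
se = s / sqrt(n)

# 1) x-bar scale
lower, upper = mu0 - t_crit * se, mu0 + t_crit * se
reject_mean_scale = xbar < lower or xbar > upper

# 2) T-score scale
t_score = (xbar - mu0) / se
reject_t_scale = abs(t_score) > t_crit

# 3) p-value scale (two-sided)
p_value = 2 * t_upper_tail(abs(t_score), n - 1)
reject_p_scale = p_value < alpha

print(f"critical values: ({lower:.2f}, {upper:.2f}), reject: {reject_mean_scale}")
print(f"T = {t_score:.4f}, reject: {reject_t_scale}")
print(f"p = {p_value:.4f}, reject: {reject_p_scale}")
```

As with the z-test, the three decisions agree; only the scale on which the comparison is made differs.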

One sample t-test example in Stata
We can do all of this in Stata using the ttesti command. The arguments are, in order: sample size (25), sample mean (220), sample std. dev. (38.6), null mean (211).

    . ttesti 25 220 38.6 211

    One-sample t test
    ------------------------------------------------------------------------------
             |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |      25         220        7.72        38.6    204.0667    235.9333
    ------------------------------------------------------------------------------
        mean = mean(x)                                                t =   1.1658
    Ho: mean = 211                                   degrees of freedom =       24

        Ha: mean < 211               Ha: mean != 211               Ha: mean > 211
     Pr(T < t) = 0.8724         Pr(|T| > |t|) = 0.2551          Pr(T > t) = 0.1276

Reading the output:
• t = 1.1658 is the T-score.
• Pr(T < t) = 0.8724 is the p-value for H0: μ = 211 vs. Ha: μ < 211. In a world where H0 is true, the probability of seeing a sample mean even smaller than the one we observed (<220) is 87.24%.
• Pr(T > t) = 0.1276 is the p-value for H0: μ = 211 vs. Ha: μ > 211. In a world where H0 is true, the probability of seeing a sample mean even greater than the one we observed (>220) is 12.76%.
• Pr(|T| > |t|) = 0.2551 is the p-value for H0: μ = 211 vs. Ha: μ ≠ 211. In a world where H0 is true, the probability of seeing a sample mean more extreme than the one we observed (>220 or <202) is 25.51%.

One sample t-test example in Stata: writing up the result
In our sample of cholesterol measurements from 25 hypertensive males, we observed a mean cholesterol of 220 mg/ml (95% CI: 204.07, 235.93). We conducted a two-sided t-test of the null hypothesis that the mean cholesterol of hypertensive males is the same as the mean cholesterol of the general male population (211 mg/ml). The test resulted in a T-score of 1.17 (p = 0.255, from the ttesti 25 220 38.6 211 output). This does not fall in the two-sided α = 0.05 rejection region, so the result is not statistically significant. We thus conclude that we do not have sufficient evidence to reject the null hypothesis. Note that this does not mean the null hypothesis is true, just that we do not have sufficient evidence to rule it out.
In practice, conclusions/interpretations may not be this wordy. We do so here for thoroughness.

Summary
• Some takeaways
  • Hypothesis testing can be done on the mean scale, the z-scale (t-scale if we don't know σ), or the p-value scale. Another way, if we are doing two-sided testing: just calculate the 100(1-α)% confidence interval (e.g. 95% CI for α = 0.05). If the null mean is not in the interval, reject it. These are all mathematically equivalent.
  • If we do not reject the null, it does not imply the null is true! It simply means we don't have sufficient evidence to reject it.
  • What does α = 0.05 mean? One overly simplified example: in clinical trials it means we are willing to let through 5% of drugs that have no effect. We don't know how many drugs have no effect. We just know we are willing to let through 5% of them.
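The confidence-interval shortcut from the first takeaway can be checked directly. The sketch below uses the t-test summary numbers from the slides and compares, for a handful of hypothetical null means, the decision from the T-score with the decision from "is the null mean outside the 95% CI?". The function names and the list of null means are made up for illustration.

```python
from math import sqrt

# Summary statistics and look-up value from the slides.
n, xbar, s = 25, 220, 38.6
t_crit = 2.064                      # t_{24, 0.975}
se = s / sqrt(n)

ci_low, ci_high = xbar - t_crit * se, xbar + t_crit * se

def reject_by_t_score(mu0):
    """Two-sided t-test at alpha = 0.05 on the T-score scale."""
    return abs((xbar - mu0) / se) > t_crit

def reject_by_interval(mu0):
    """Reject when the null mean falls outside the 95% confidence interval."""
    return not (ci_low <= mu0 <= ci_high)

print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
for mu0 in (190, 200, 211, 230, 240):   # hypothetical null means
    print(mu0, reject_by_t_score(mu0), reject_by_interval(mu0))
```

For the null mean of 211 used throughout, both routes fail to reject; a null mean such as 200, which lies outside the interval, would be rejected by both.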
