Introduction to Biostatistics for Clinical Researchers

University of Kansas Department of Biostatistics & University of Kansas Medical Center Department of Internal Medicine

Schedule • Friday, December 10 in 1023 Orr-Major • Friday, December 17 in B018 School of Nursing • Possibility of a 5th lecture, TBD • All lectures will be held from 8:30a - 10:30a

Materials • PowerPoint files can be downloaded from the Department of Biostatistics website at http://biostatistics.kumc.edu • A link to the recorded lectures will be posted in the same location

An Introduction to Hypothesis Testing: The Paired t-Test

Topics • Comparing two groups: the paired-data situation • Hypothesis testing: the Null and Alternative hypotheses • Relationships between confidence intervals and hypothesis testing when comparing means • P-values: definitions, calculations, and more

The Paired t-test: the Confidence Interval Component

Two Group Designs • For continuous endpoint: Are the population means different? • Subjects could be randomized to one of two treatments (randomized parallel-group design) • Compare the mean responses from each treatment • Also referred to as independent groups • Subjects could each be given both treatments with the ordering of treatments randomized (paired design) • Compare the mean difference to zero (or some other interesting value) • Pre-post data • Matched case-control

Example: Pre- versus Post- Data • Why pair the observations? • Decrease variability in response • Each subject acts as its own control (reduced sample sizes) • Good way to get preliminary data/estimates to develop further research

Example: Pre- versus Post- Data • Ten non-pregnant, pre-menopausal women 16-49 years old who were beginning a regimen of oral contraceptive (OC) use had their blood pressures measured prior to starting OC use and three months after consistent OC use • The goal of this small study was to see what, if any, changes in average blood pressure were associated with OC use in such women • The data show the resulting pre- and post-OC use systolic BP measurements for the 10 women in the study

Blood Pressure and OC Use

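Blood Pressure and OC Use • The sample average of the differences is $\bar{x}_{\text{diff}} = 4.8$ mmHg • Also note: $\bar{x}_{\text{diff}} = \bar{x}_{\text{after}} - \bar{x}_{\text{before}}$ • The sample standard deviation (s) of the differences is $s_{\text{diff}} = 4.6$ mmHg • Standard deviation of differences follows the formula: $s_{\text{diff}} = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x}_{\text{diff}})^2 / (n - 1)}$, where • each $x_i$ represents an individual difference and • $\bar{x}_{\text{diff}}$ is the mean difference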
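To make the standard deviation formula concrete, here is a minimal Python sketch. The five before/after readings are made-up values chosen for illustration; they are not the study data, which are not reproduced in these notes. The sketch reduces the two columns to one column of within-subject differences and then applies the formula with its n - 1 divisor.

```python
import math
import statistics

# Hypothetical systolic BP readings (mmHg) for five subjects.
# Illustrative values only, NOT the data from the OC study.
before = [120.0, 118.0, 125.0, 110.0, 130.0]
after = [124.0, 121.0, 131.0, 109.0, 138.0]

# Reduce the two paired samples to one sample of within-subject differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

# Mean of the differences.
mean_diff = sum(diffs) / n

# Sample SD: squared deviations from the mean difference, divided by n - 1.
sd_diff = math.sqrt(sum((x - mean_diff) ** 2 for x in diffs) / (n - 1))

# The mean of the differences equals the difference of the means.
mean_gap = statistics.mean(after) - statistics.mean(before)

print(f"differences: {diffs}")
print(f"mean difference: {mean_diff:.2f} mmHg (difference of means: {mean_gap:.2f})")
print(f"SD of differences: {sd_diff:.2f} mmHg")
```

The hand-written formula should agree with `statistics.stdev`, which uses the same n - 1 divisor.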

Note on Paired Data Designs • The BP information is essentially reduced from two samples (prior to and after OC use) into one piece of information--difference in BP between the two samples • Response is “within-subject” • This is standard protocol for comparing paired samples with a continuous outcome measure

The Confidence Interval Approach • Suppose we want to draw a conclusion about a population parameter: • In a population of women who use OC, is the average change in blood pressure (after - before) zero? • The CI approach allows us to create a range of plausible values for the average change (μΔ) in blood pressure using data from a single, imperfect, paired sample

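The Confidence Interval Approach • A 95% CI for μΔ in BP in the population of women taking OC is $\bar{x}_{\text{diff}} \pm t_{0.975,\,9} \cdot s_{\text{diff}}/\sqrt{n} = 4.8 \pm 2.26 \times 4.6/\sqrt{10} = 4.8 \pm 3.3$, i.e. (1.5, 8.1) mmHg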
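The interval of 1.5 to 8.1 mmHg quoted in these slides can be reproduced from the summary statistics alone. The sketch below assumes the usual t-based interval for a mean of paired differences and hard-codes the critical value 2.262 (the 97.5th percentile of the t distribution with 9 degrees of freedom, as read from a standard t table) instead of computing it.

```python
import math

# Summary statistics quoted in the slides.
n = 10            # number of women, i.e. number of paired differences
mean_diff = 4.8   # sample mean of the differences, mmHg
sd_diff = 4.6     # sample SD of the differences, mmHg

# 97.5th percentile of t with n - 1 = 9 degrees of freedom,
# taken from a t table (hard-coded, not computed here).
T_CRIT_DF9 = 2.262

std_error = sd_diff / math.sqrt(n)   # standard error of the mean difference
margin = T_CRIT_DF9 * std_error      # half-width of the interval
ci_low = mean_diff - margin
ci_high = mean_diff + margin

print(f"SE = {std_error:.2f} mmHg, margin = {margin:.2f} mmHg")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f}) mmHg")  # 95% CI: (1.5, 8.1) mmHg
```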

Note • The number 0 is NOT in the confidence interval (1.5-8.1) • This suggests there is a non-zero change in BP over time • The phrase “statistically significant” change is used to indicate a non-zero mean change

Note • The BP change could be due to factors other than OC • Change in weather over pre- and post- period • Changes in personal stress • A control group of comparable women who were not taking OC would strengthen this study • This is an example of a pilot study: a small study done just to generate some evidence of a possible association • This can be followed up with a larger, more scientifically rigorous study

The Paired t-test: the Hypothesis Testing Component

The Hypothesis Testing Approach • Suppose we want to draw a conclusion about a population parameter: • In a population of women who use OC, is the average change in blood pressure (after - before) zero? • The hypothesis testing approach allows us to choose between two competing possibilities for the average change (μΔ) in blood pressure using data from a single, imperfect, paired sample

Hypothesis Testing • Two mutually exclusive, collectively exhaustive possibilities for “truth” about mean change, μΔ • Null hypothesis: HO: μΔ = 0 (what we wish to ‘nullify’) • Alternative hypothesis: HA : μΔ ≠ 0 (what we wish to show evidence in favor of) • We use our data as ‘evidence’ in favor of or against the null hypothesis (and alternative hypothesis, as a result)

Hypothesis Testing • Null: Typically represents the hypothesis that there is no association or difference • HO: μΔ = 0 → There is no association between OC use and blood pressure • Alternative: The very general complement to the null • HA: μΔ ≠ 0 → There is an association between OC use and blood pressure

Hypothesis Testing • Our result will allow us to either reject or fail to reject HO • We start by assuming HO is true, and ask: • How likely, given HO is true, is the result we got from our sample? • In other words, what are the chances of obtaining the sample data we actually observed (“evidence”) if the truth is that there is no association between blood pressure and OC use?

Hypothesis Testing • HO, in combination with other information about our population and the size of our sample, sets up (via the CLT) a theoretical probability distribution of sample means computed from all possible samples of size n = 10 where µ∆ = 0 • [Figure: population distribution of BP change, centered at µ∆ = 0 with standard deviation σ∆]

Hypothesis Testing • HO, in combination with other information about our population and the size of our sample, sets up (via the CLT) a theoretical probability distribution of sample means computed from all possible samples of size n = 10 where µ∆ = 0 • [Figure: sampling distribution of the sample mean, centered at µ∆ = 0]

Hypothesis Testing • Theoretically, if the null hypothesis were true, we would be more likely to observe values of the sample mean “close” to zero • [Figure: sampling distribution centered at µ∆ = 0]

Hypothesis Testing • Theoretically, if the null hypothesis were true, it would be unlikely that we should observe values of the sample mean “far” from zero • [Figure: sampling distribution centered at µ∆ = 0]

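Hypothesis Testing • We observed a sample mean of $\bar{x}_{\text{diff}} = 4.8$ mmHg. Is it far enough from zero for us to conclude in favor of HA? • [Figure: sampling distribution centered at µ∆ = 0, with a region labeled "In favor of HO" and two regions labeled "In favor of HA"]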

Hypothesis Testing • We need some measure of how probable the result from our sample is given the null hypothesis • The sampling distribution of the sample mean allows us to evaluate how unusual our sample statistic is by computing a probability corresponding to the observed results: the p-value • If p is small, a result like the one we observed would rarely occur by chance under the hypothesized distribution • In other words, either • The null hypothesis is actually true and, just by chance, we got a sample that gave us an unlikely result; or • The null hypothesis is actually false, and we got a sample with evidence of such

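Hypothesis Testing • The null hypothesis is actually false, and we got a sample with evidence of such • If we are using a random sample and p is small, we treat this as the more plausible explanation and reject HO; with the usual 0.05 cutoff, a true null would be wrongly rejected only 5% of the time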

Hypothesis Testing • To compute a p-value, we need to find our value of $\bar{x}_{\text{diff}}$ on the sampling distribution and figure out how “unusual” it is • Recall: $\bar{x}_{\text{diff}} = 4.8$ mmHg • [Figure: sampling distribution centered at µ∆ = 0]

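Hypothesis Testing • Problem: What is σ∆? The spread of the sampling distribution depends on it, but the population standard deviation is unknown • [Figure: sampling distribution centered at µ∆ = 0]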

Hypothesis Testing • Solution: estimate σ∆ with the sample standard deviation and use the Student’s t distribution • [Figure: t distribution centered at µ∆ = 0]

Hypothesis Testing • Where is $\bar{x}_{\text{diff}} = 4.8$ mmHg located on the sampling distribution (t9)? • [Figure: sampling distribution centered at µ∆ = 0]

Hypothesis Testing • The p-value is the probability of getting a sample result as extreme as (or more extreme than) what we observed, given the null hypothesis is true • [Figure: sampling distribution centered at µ∆ = 0]

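Hypothesis Testing • The p-value is the area under the curve corresponding to values of the sample mean more extreme than 4.8 • $P(|\bar{x}_{\text{diff}}| \ge 4.8)$ • [Figure: sampling distribution centered at µ∆ = 0, with both tails beyond 4.8 shaded]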

Hypothesis Testing • Strictly for convenience, we standardize our distribution • We center it at zero by subtracting the mean • We adjust the variability to correspond to s = 1 by dividing by the standard error, $s_{\text{diff}}/\sqrt{n}$ • [Figure: standardized distribution centered at µ∆ = 0, with -t and +t marked]

Hypothesis Testing • The p-value is the area under the curve corresponding to values of the sample mean more extreme than 4.8 • $P(|\bar{x}_{\text{diff}}| \ge 4.8) = P(|t| \ge 3.3)$ → easily found in any t table • [Figure: standardized distribution centered at µ∆ = 0, with -3.3 and +3.3 marked]

Hypothesis Testing • Note: this t is called a test statistic (and is analogous to a z-score) • It represents the distance of the observation from the hypothesized mean in standard errors • In this case, our mean (4.8) is 3.3 SE away from the hypothesized mean (0) • Based on this, what do you think the p-value will look like? Is a result 3.3 SE above its mean unusual?
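The figure of 3.3 standard errors follows directly from the summary statistics. A short sketch of the arithmetic, assuming the standard error is estimated as the sample SD of the differences divided by the square root of n:

```python
import math

# Summary statistics quoted in the slides.
n = 10            # number of paired differences
mean_diff = 4.8   # observed mean difference, mmHg
sd_diff = 4.6     # sample SD of the differences, mmHg
null_mean = 0.0   # hypothesized mean difference under the null

# Estimated standard error of the mean difference.
std_error = sd_diff / math.sqrt(n)

# Test statistic: distance from the hypothesized mean, in standard errors.
t_stat = (mean_diff - null_mean) / std_error

print(f"t = {t_stat:.1f} on {n - 1} degrees of freedom")  # t = 3.3 on 9 degrees of freedom
```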

Hypothesis Testing • [Figure: t = 3.3 marked on the sampling distribution (t9)]

The p-value • The p-value is the probability of getting a sample result as extreme as (or more extreme than) what we observed, given the null hypothesis is true
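One way to see what this definition means is to simulate it. The sketch below is an illustration, not part of the lecture's method: it assumes the differences are normally distributed, generates many studies of 10 women in a world where the null is true, and counts how often the t statistic is at least as extreme as the observed 3.3. The population SD, the seed, and the number of repetitions are arbitrary choices.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 10            # sample size, as in the BP/OC example
T_OBSERVED = 3.3  # observed test statistic from the slides
REPS = 50_000     # number of simulated studies (arbitrary choice)
SIGMA = 4.6       # assumed population SD; the t statistic does not depend on it


def simulated_t():
    """Draw one sample of N differences under the null and return its t statistic."""
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (N - 1))
    return mean / (sd / math.sqrt(N))


# Count simulated studies at least as extreme as ours, in either direction.
extreme = sum(abs(simulated_t()) >= T_OBSERVED for _ in range(REPS))
p_sim = extreme / REPS

print(f"simulated two-sided p-value: {p_sim:.4f}")
```

The simulated proportion should land close to the exact p-value from the t distribution with 9 degrees of freedom, with some run-to-run noise.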

The p-value • We can look this up in a t-table • . . . or we can let Excel or another statistical package do it for us • =TDIST(3.3,9,2)
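For readers without Excel, the same two-sided tail area can be approximated in Python. This is a minimal sketch that uses only the standard library: it integrates the t density numerically with Simpson's rule, an implementation choice made here for self-containment and not the lecture's method. The function names `t_pdf` and `two_sided_p` are hypothetical. A statistical package would normally supply this calculation directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) using Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # By symmetry, the two tails together are 1 minus twice the area from 0 to |t|.
    return max(0.0, 1.0 - 2.0 * area_zero_to_t)


p_value = two_sided_p(3.3, 9)  # same inputs as =TDIST(3.3,9,2)
print(f"two-sided p-value: {p_value:.4f}")
```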

Interpreting the p-value • P = 0.0092: if the true before OC/after OC blood pressure difference is zero among all women taking OCs, then the chance of seeing a mean difference as extreme or more extreme than 4.8 in a sample of 10 women is 0.0092 • We now need to use the p-value to make a decision: either reject or fail to reject HO • We need to decide if our sample result is too unlikely to have occurred by chance if the null were true

Using the p-value to Make a Decision • Establishing a cutoff • In general, to make a decision about what p-value constitutes “unusual” results, there needs to be a cutoff such that all p-values less than the cutoff result in rejection of the null • The standard (but arbitrary) cutoff is p = 0.05 • This cutoff is referred to as the significance level of the test and is usually represented by α • For example, α = 0.05

Using the p-value to Make a Decision • Establishing a cutoff • Frequently, the result of a hypothesis test with p < 0.05 is called statistically significant • At the α = 0.05 level, we have a statistically significant blood pressure difference in the BP/OC example
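The decision rule is mechanical once α is fixed, and for a two-sided test of a mean it agrees with checking whether the hypothesized value falls inside the matching confidence interval. A small sketch using the numbers quoted in these slides (the variable names are illustrative):

```python
ALPHA = 0.05                 # significance level (the standard but arbitrary cutoff)
p_value = 0.0092             # two-sided p-value from the slides
ci_low, ci_high = 1.5, 8.1   # 95% CI for the mean change, from the slides
null_value = 0.0             # hypothesized mean change

# Rule 1: reject the null when the p-value is below the cutoff.
reject_by_p = p_value < ALPHA

# Rule 2: reject the null when the hypothesized value lies outside the 95% CI.
reject_by_ci = not (ci_low <= null_value <= ci_high)

decision = "reject the null" if reject_by_p else "fail to reject the null"
print(f"p = {p_value} vs alpha = {ALPHA}: {decision}")
print(f"hypothesized value inside 95% CI: {not reject_by_ci}")
```

Both rules point the same way here: p is below 0.05 and zero lies outside the interval, so the change is called statistically significant at the 0.05 level.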

Example: BP/OC • Statistical method • The changes in blood pressures after oral contraceptive use were calculated for 10 women • A paired t-test was used to determine if there was a statistically significant change in blood pressure, and a 95% confidence interval was calculated for the mean blood pressure change (after-before) • Result • Blood pressure measurements increased on average 4.8 mmHg with standard deviation 4.6 mmHg • The 95% confidence interval for the mean change was 1.5-8.1 mmHg • The blood pressure measurements after OC use were statistically significantly higher than before OC use (p = 0.009)

Example: BP/OC • Discussion • A limitation of this study is that there was no comparison group of women who did not use oral contraceptives • We do not know if blood pressures may have risen without oral contraceptive usage

Example: Clinical Agreement • Two different physicians assessed the number of palpable lymph nodes in 65 randomly selected male sexual contacts of men with AIDS or AIDS-related conditions [1] • [1] Example based on data taken from Rosner, B. (2005). Fundamentals of Biostatistics (6th ed.), Duxbury Press

95% Confidence Interval • A 95% CI for difference in mean number of lymph nodes (Doctor 2 compared to Doctor 1):

Getting a p-value • Hypotheses: • HO: µdiff = 0 • HA: µdiff ≠ 0 • Assume the null is true • Compute the distance in SEs between $\bar{x}_{\text{diff}}$ and the hypothesized value (zero) • The sample result is 7.8 SEs below 0. Is this unusual?

Getting a p-value • Sample result is 7.8 SEs below 0. Is this unusual? • See where this result falls on the sampling distribution (t64) • The p-value corresponds to P(|t| > 7.8). Without looking it up, we know p < 0.001
