Statistical considerations for grants

Brian Healy

Comments from previous class • Change time of course • Available on-line power calculators • http://www.cs.uiowa.edu/~rlenth/Power/ • Two-sided vs. one-sided • Comparison of statistical packages

Review • Type I error • Type II error • Ways to increase power • Power/sample size calculation with continuous outcome

Type I error • We could plot the distribution of the sample means under the null hypothesis before collecting data • Type I error is the probability that you reject the null hypothesis given that the null is true • α = P(reject H0 | H0 is true) • Notice that the shaded area (α) is still part of the null curve, but it lies in the tail of the distribution

Type II error • Definition: failing to reject the null hypothesis when the alternative is in fact true • This type of error depends on a specific alternative • β = P(fail to reject H0 | HA is true)

Power • Definition: the probability that you reject the null hypothesis given that the alternative hypothesis is true. This is what we want to happen. • Power = P(reject H0 | HA is true) = 1 - β • Since this is a good thing, we want this to be high
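To make the link between α, β and power concrete, here is a minimal sketch using the normal (z) approximation for comparing two means rather than the t-test itself. The effect of 5 units and SD of 7 units are made-up numbers chosen only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_test_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes a known common SD `sigma`, equal group sizes, and a true mean
    difference `delta` (the specific alternative that beta depends on).
    """
    se = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in sample means
    z_crit = Z.inv_cdf(1 - alpha / 2)      # rejection cutoff under H0
    shift = abs(delta) / se                # centre of the HA curve in SE units
    # P(reject H0 | HA): area of the HA curve beyond either cutoff
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


alpha = 0.05
for n in (10, 20, 40):
    power = z_test_power(delta=5.0, sigma=7.0, n_per_group=n, alpha=alpha)
    print(f"n = {n}: alpha = {alpha}, beta = {1 - power:.3f}, power = {power:.3f}")
```

Setting `delta=0` returns α itself: when H0 is true, the "power" is just the type I error rate. Increasing n shrinks the standard error, moves the HA curve further from the cutoff, and lowers β.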

Outline • Aspects of the statistical considerations section of a grant • Example statistical analysis section • Worked example using a dataset from students in the class • Management of data collection/spreadsheets

Aspects of statistical considerations • Overarching statistical issues: • Data management • Methodological issues, especially related to data collection (e.g., image processing) • Handling missing data • Clustering/correlation of observations • Specific aims: • Identify outcomes/explanatory variables • Type of analysis • Power calculation

Research study • Study design • Experimental question: What are you trying to learn? How will you prove this? • Sample selection: Who are you going to study? • Data collection • What should be collected? • Analysis of data • Results: Was there any effect? • Conclusions: What does this all mean? To whom do the results apply?

Experimental question: What? How? → Sample selection: Who? How many? → Collect data → Analysis: Is there an effect? → Conclusion: To whom?

How is statistics related to each stage? • Study design • Experimental question: define the outcome, sources of variability, unit of analysis and analysis plan • Sample selection: sample size, type of sample

Experimental question • In a grant, the experimental question is written as the specific aims • Generally, specific aims can be easily translated into a null hypothesis • If the specific aims are more general, the specific null hypotheses are listed in the grant after the aims • This is the critical step in the grant because everything else is based on the aims • It is usually easiest if the hypothesis can be set up as a yes/no question

Example • Dr. Janet Hall kindly provided a grant to use as an example • One goal of the grant was to investigate whether age modifies the effect of estrogen treatment in post-menopausal women • Is there an interaction between estrogen and age? • The treatment is given to increase resting metabolic activity in the brain as measured by PET and other neuroimaging modalities • In addition, the effect of age on resting metabolic activity at baseline (untreated) was of interest

Specific aim 1 • SPECIFIC AIM #1: To determine the effect of aging on changes in baseline (resting state) cortical function and their responses to estrogen using FDG-PET. • Hypotheses: • Resting state metabolic activity, as measured by FDG-PET at baseline, is decreased in the dorsolateral prefrontal cortex (DLPFC) and increased in the hippocampus as a function of age. • Estrogen exposure results in progressive increases in resting metabolic activity in the DLPFC over time in young postmenopausal women that is not seen in their older counterparts.

Hypothesis 1 • Resting state metabolic activity, as measured by FDG-PET at baseline, is decreased in the dorsolateral prefrontal cortex (DLPFC) as a function of age. • What is the experimental question? • Is the FDG-PET level different in the hippocampus or DLPFC for women of different ages? • What is the outcome? • What is the explanatory variable?

Types of variables • The outcome is FDG-PET level and this is a continuous variable • The explanatory variable, age, could be considered continuous, but for this grant it was decided to group patients into young post-menopausal women (age 45-55) vs. old post-menopausal women (age 70-80) • What type of analysis would we use in this case? • Are the data approximately normal?

Sample selection • Our sample selection is based on the definition of the groups • What is the effect of this definition? Does it affect the generalizability of the findings? • For this study, we plan to sample small groups from a single site • Could another approach have been used? • What is the advantage of a single site? Disadvantage?

Sample size calculation • We have defined our null hypothesis, outcome and sample selection • What sample size do we need? • In this case, previous data showed a mean (SD) FDG-PET at the DLPFC of 83.0 (7.3) in the young group and 76.2 (7.3) in the old group • What else do we need for our sample size calculation? • Power = 0.8, α = 0.05 • Assuming equal group sizes, we need 20 patients per group
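The 20-per-group figure can be reproduced approximately by hand. This sketch uses the normal-approximation sample size formula for two means plus a commonly used small-sample correction term (z²/4) that approximates the t-test; it is not an exact noncentral-t calculation, which a power package would do.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_sample(mean1, mean2, sd, power=0.8, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal-approximation formula plus the z_{1-alpha/2}^2 / 4 small-sample
    correction; assumes equal group sizes and a common SD.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)           # 0.84 for power = 0.8
    d = abs(mean1 - mean2) / sd      # standardized effect size
    n = 2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4
    return ceil(n)                   # round up to whole patients


# Preliminary DLPFC values: young 83.0 (7.3), old 76.2 (7.3)
print(n_per_group_two_sample(83.0, 76.2, 7.3))  # 20
```

Without the correction term the formula gives about 18.1, i.e. 19 per group; the extra term accounts for estimating the SD from the data.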

Additional considerations • Multiple comparisons • We have two outcomes (DLPFC and hippocampus), so should we adjust the significance level for the two comparisons? • Bonferroni correction for the significance level • Do any specifics regarding the measurement need to be discussed? • Confounders/adjustment
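A Bonferroni correction has a direct cost in sample size. This sketch reuses the approximate two-sample t-test formula (normal approximation plus a z²/4 correction) and applies it to the DLPFC effect size only; the hippocampus comparison, whose preliminary data are not shown here, is assumed not to be the limiting one.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.8, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample t-test
    (normal approximation + z^2/4 small-sample correction)."""
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4)


n_outcomes = 2                     # DLPFC and hippocampus
alpha_bonf = 0.05 / n_outcomes     # Bonferroni: test each outcome at 0.025
d = 6.8 / 7.3                      # DLPFC standardized effect from the preliminary data

print(n_per_group(d, alpha=0.05), n_per_group(d, alpha=alpha_bonf))  # 20 24
```

Testing each outcome at 0.025 raises the requirement from about 20 to about 24 patients per group to keep 80% power.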

Abbreviated grant section • Analysis plan: The two groups will be compared using a two-sample t-test. • Power calculation: Previous data have estimated the mean (SD) FDG-PET in the young group as 83.0 (7.3) and in the old group as 76.2 (7.3). Group sample sizes of 20 and 20 achieve 82% power to detect a difference of 6.8 between the two groups, assuming a standard deviation of 7.3 in each group and a significance level of 0.05, using a two-sided two-sample t-test.
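The quoted power can be sanity-checked by simulation: generate many datasets under the assumed means and SD, run the pooled two-sample t-test on each, and count rejections. This is a sketch under the stated assumptions (normal outcomes, equal SDs); because the Python standard library has no t distribution, the critical value is found by numerically integrating the t density, which a statistics package would normally supply directly.

```python
import math
import random


def t_pdf(x, df):
    """Student t density, computed on the log scale for stability."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus the integral of the density on [0, x] (Simpson's rule)."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3


def t_quantile(p, df):
    """Upper quantile (p > 0.5) by bisection on the CDF."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulated_power(mean1, mean2, sd, n, alpha=0.05, reps=10_000, seed=1):
    """Fraction of simulated studies in which the pooled two-sample t-test rejects H0."""
    rng = random.Random(seed)
    df = 2 * n - 2
    crit = t_quantile(1 - alpha / 2, df)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(mean1, sd) for _ in range(n)]
        b = [rng.gauss(mean2, sd) for _ in range(n)]
        ma, mb = sum(a) / n, sum(b) / n
        ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        pooled_sd = math.sqrt(ss / df)
        t = (ma - mb) / (pooled_sd * math.sqrt(2 / n))
        rejections += abs(t) > crit
    return rejections / reps


print(simulated_power(83.0, 76.2, 7.3, n=20))
```

The estimate should land close to the 82% in the grant text, up to Monte Carlo error of roughly ±0.01 with 10,000 replicates.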

Hypothesis 2 • Estrogen exposure results in progressive increases in resting metabolic activity in the DLPFC over time in young postmenopausal women that is not seen in their older counterparts. • What is the experimental question? • Is the effect of estrogen on the FDG-PET level in the DLPFC different for women of different ages?

Types of variables • One potential outcome is change in FDG-PET level and this is a continuous variable • Age group and treatment are the explanatory variables • How many FDG-PET levels are measured and how many observations contribute to the analysis? • What type of analysis could we use in this case?

Data set-up • We measure the change in four groups of patients (young/treated, young/placebo, old/treated, old/placebo) • We can estimate the mean change in all groups, but what is truly of interest for our hypothesis? • Interaction between the two factors • Linear regression/two-way ANOVA

[Figure: means in treated young, untreated young, treated old and untreated old patients]
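The interaction of interest is the difference between the age-specific treatment effects, (treated young − placebo young) − (treated old − placebo old). This sketch computes that contrast and its t-statistic from hypothetical change scores with equal cell sizes; in a balanced 2×2 design, the squared t equals the two-way ANOVA interaction F. The data are invented for illustration only.

```python
import math

# Hypothetical change scores (after - before FDG-PET) in each cell; illustration only
cells = {
    ("young", "estrogen"): [4.1, 5.3, 3.8, 6.0, 4.7],
    ("young", "placebo"): [0.9, -0.4, 1.2, 0.3, 0.5],
    ("old", "estrogen"): [1.0, 0.2, 1.5, -0.3, 0.8],
    ("old", "placebo"): [0.4, -0.6, 0.7, 0.1, -0.2],
}


def mean(xs):
    return sum(xs) / len(xs)


m = {cell: mean(values) for cell, values in cells.items()}

# Interaction contrast: does the estrogen effect differ between age groups?
young_effect = m[("young", "estrogen")] - m[("young", "placebo")]
old_effect = m[("old", "estrogen")] - m[("old", "placebo")]
interaction = young_effect - old_effect

# Pooled within-cell variance: the error mean square of the two-way ANOVA
n_total = sum(len(v) for v in cells.values())
df_error = n_total - len(cells)
ss_within = sum(sum((x - m[cell]) ** 2 for x in v) for cell, v in cells.items())
mse = ss_within / df_error

# Contrast coefficients are +1/-1, so Var(contrast) = MSE * sum(1 / n_cell)
se = math.sqrt(mse * sum(1 / len(v) for v in cells.values()))
t_stat = interaction / se
print(f"interaction = {interaction:.2f}, t = {t_stat:.2f} on {df_error} df, F = {t_stat ** 2:.2f}")
```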

Sample selection • Our outcome and explanatory variables are now clearly defined • Our sample selection in this case is a little more complex • Age group is defined at enrollment • Patients in each age group are randomized to treatment or placebo • What does randomization get us?

Sample size calculation • We have defined our null hypothesis, outcome and sample selection • What sample size do we need? • What preliminary data would we need, or what would we need to hypothesize, to calculate the sample size? • Some online resources exist for this complex design, but you should likely consider speaking to a statistician
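The inputs needed for the interaction are a hypothesized interaction size and a within-cell SD of the change score. This sketch uses a normal-approximation formula: with n patients per cell, the contrast (+1, −1, −1, +1) has variance 4σ²/n. The values of 5 and 6 units are hypothetical placeholders, not preliminary data from the grant.

```python
from math import ceil
from statistics import NormalDist


def n_per_cell_interaction(delta, sd, power=0.8, alpha=0.05):
    """Approximate n per cell to detect a 2x2 interaction contrast.

    delta: hypothesized (young treated - young placebo) - (old treated - old placebo)
    sd:    assumed common within-cell SD of the change score
    Uses the normal approximation; Var(contrast) = 4 * sd^2 / n.
    """
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil(4 * sd ** 2 * (z_a + z_b) ** 2 / delta ** 2)


# Hypothetical inputs: interaction of 5 units, within-cell SD of 6 units
print(n_per_cell_interaction(delta=5.0, sd=6.0))  # 46
```

For the same difference and SD, a simple two-group comparison would need about a quarter as many patients in total, which is one reason interaction hypotheses are expensive to power.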

Abbreviated grant section • Analysis plan: The effect of age on the treatment effect of estrogen in post-menopausal women will be investigated using a two-way ANOVA. The outcome for the analysis will be the change in the FDG-PET level before and after the treatment and the two factors will be age and treatment group. The focus of the analysis will be the interaction between the two factors. • Power calculation: Given our preliminary data and available sample size, we will have 80% power to detect a hypothesized difference of x using a two-way ANOVA.

Alternative analysis strategy • Rather than focusing on the difference between the before and after treatment measurements, we could have included all of the measurements in a single model • Each patient contributes a before and after treatment measurement rather than a difference • The analysis of this approach requires accounting for the repeated measures within a subject • Repeated measures ANOVA or mixed effects model

Advantages of this approach • Handles missing data more easily • Generalizes to more than two measurements easily • Power calculations with mixed effects models can be completed as well
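The difference between the two strategies shows up in how the data are laid out. This sketch uses hypothetical records in which one patient misses the follow-up scan: the change-score analysis must drop that patient, while the long format (one row per patient per visit, as a repeated measures or mixed effects model expects) keeps the baseline scan. Fitting the model itself is left to a statistics package; mixed models use all available observations and give valid inference when data are missing at random.

```python
# Hypothetical wide-format records: one row per patient, FDG-PET before and after
# treatment (None marks a missed follow-up scan). Values are illustrative only.
wide = [
    {"id": 1, "age": "young", "trt": "estrogen", "before": 82.1, "after": 86.4},
    {"id": 2, "age": "young", "trt": "placebo", "before": 84.0, "after": 83.2},
    {"id": 3, "age": "old", "trt": "estrogen", "before": 75.9, "after": None},
    {"id": 4, "age": "old", "trt": "placebo", "before": 77.3, "after": 76.8},
]

# Change-score analysis: a patient with a missing scan has no difference and drops out
changes = [r["after"] - r["before"] for r in wide if r["after"] is not None]

# Long format: one row per available measurement; "id" lets the model account
# for the correlation between repeated measurements on the same patient
long_rows = [
    {"id": r["id"], "age": r["age"], "trt": r["trt"], "visit": visit, "fdg": r[visit]}
    for r in wide
    for visit in ("before", "after")
    if r[visit] is not None
]

print(len(changes), "patients in the change-score analysis")
print(len(long_rows), "observations from", len({row["id"] for row in long_rows}), "patients in long format")
```

Adding a third or fourth scan simply adds rows to the long format, which is why this layout generalizes to more than two measurements.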

Conclusions • Each hypothesis needs an analysis plan that describes the type of data and statistical approach used to analyze the data • Each hypothesis also requires a sample size or power calculation • Additional issues (missing data, confounding) must be included in the statistical analysis section

Worked example

Kidney transplant research • Students in the class are investigating the effect of the genetics of the donor/recipient pair on various outcomes • Creatinine level measured at the time of transplant and 3, 6, 12 and 36 months after transplant • Time to rejection of the transplant • Type of rejection (acute/chronic) • Genetic factor of interest: large deletion polymorphisms at 20 sites

Study design • Patients have been followed at 4 different sites since 1995 • Korea • Finland • BWH • MGH • Only HLA genetic data are available at the moment, but we would like to genotype enough patients to determine whether there is an effect

Experimental question • Specific aim: To explore the potential contribution of a new class of large deletion polymorphisms on the development of acute and chronic renal allograft rejection following renal transplantation. • Hypotheses: • Donor/recipient pairs with matching deletion polymorphisms will have lower creatinine levels at all time points compared to non-matched pairs • Donor/recipient pairs with matching deletion polymorphisms will have fewer acute/chronic rejection events compared to non-matched pairs

Definition of groups • Both the donor and recipient for each transplant will be genotyped and classified as either having or not having the deletion • Initially, we decided to treat each donor/recipient combination as a separate group. What type of variable is the explanatory variable?

Creatinine levels • Here are the initial creatinine values for one of the populations • Note the outliers in the tail of the distribution; these would have a large influence on any model • They turned out to be incorrect data

Analysis plan • Initially, we will compare each creatinine measurement separately • Since we have 4 groups (a categorical explanatory variable) and a continuous outcome, we will compare the groups using ANOVA • The corrected data look sufficiently normal to make this analysis plan reasonable • An alternative option would be a Kruskal-Wallis test, a rank-based test that is much less sensitive to outliers
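The robustness claim can be seen directly. This sketch computes the one-way ANOVA F statistic and the Kruskal-Wallis H statistic (without the tie correction) on hypothetical creatinine values, then repeats the calculation after a single decimal-point error. P-values are omitted; F would be referred to an F(3, 16) distribution and H to a chi-square with 3 df.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def anova_f(groups):
    """One-way ANOVA F = between-group mean square / within-group mean square."""
    all_x = [x for g in groups for x in g]
    n, k = len(all_x), len(groups)
    grand = sum(all_x) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H computed from pooled ranks (no tie correction)."""
    all_x = [x for g in groups for x in g]
    n = len(all_x)
    r = ranks(all_x)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(r[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical month-12 creatinine values (mg/dL) for four donor/recipient groups
groups = [
    [1.05, 1.12, 1.20, 1.31, 1.18],
    [1.10, 1.25, 1.33, 1.41, 1.22],
    [1.15, 1.28, 1.36, 1.47, 1.39],
    [1.30, 1.44, 1.52, 1.62, 1.49],
]
# Same data with one decimal-point error (1.62 recorded as 16.2)
bad = [g[:] for g in groups]
bad[3][3] = 16.2

for label, data in (("clean", groups), ("with error", bad)):
    print(f"{label}: F = {anova_f(data):.2f}, H = {kruskal_wallis_h(data):.2f}")
```

Because the erroneous value is still the largest observation, every rank is unchanged and H is identical, while F shifts with the inflated mean and variance of the last group.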

Abbreviated analysis plan • Analysis plan: The four groups of donor/recipient pairs will be compared using ANOVA. If a significant difference between the groups is observed, the pairwise comparisons will be completed with the appropriate correction for multiple comparisons. Although we could investigate the main effect of the donor’s and recipient’s deletion status in a two-way ANOVA model, our interest is in the four group comparison given the relationships seen in previous work.
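Four groups give six pairwise comparisons. This sketch contrasts two standard corrections, Bonferroni and Holm's step-down method (Holm is shown as one option; the slide does not specify which correction was chosen). The group labels and p-values are hypothetical.

```python
from itertools import combinations

# Hypothetical labels: donor (D) / recipient (R) deletion status, + = carries the deletion
groups = ["D+/R+", "D+/R-", "D-/R+", "D-/R-"]
pairs = list(combinations(groups, 2))  # 4 groups give 6 pairwise comparisons

# Hypothetical unadjusted p-values, one per pair (illustration only)
raw_p = dict(zip(pairs, [0.004, 0.030, 0.200, 0.009, 0.450, 0.060]))


def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons (capped at 1)."""
    m = len(pvals)
    return {pair: min(1.0, p * m) for pair, p in pvals.items()}


def holm(pvals):
    """Holm step-down: the i-th smallest p (0-based) is multiplied by m - i,
    and adjusted values are forced to be non-decreasing."""
    m = len(pvals)
    adjusted, running_max = {}, 0.0
    for i, (pair, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1])):
        running_max = max(running_max, min(1.0, (m - i) * p))
        adjusted[pair] = running_max
    return adjusted


for name, adj in (("Bonferroni", bonferroni(raw_p)), ("Holm", holm(raw_p))):
    significant = [f"{a} vs {b}" for (a, b), p in adj.items() if p < 0.05]
    print(f"{name}: {significant}")
```

Both methods control the family-wise error rate at 0.05, but Holm is never less powerful: here it declares two pairs significant where Bonferroni declares one.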

Additional considerations • Rather than modeling each creatinine measurement separately, should we model them together? • Trend with time? • Multiple comparisons if we treat them separately? • Confounders: • Age • Gender • HLA status • Should we treat all 20 deletions separately?

Power calculation • Unlike the previous example, we have no preliminary data regarding the effect of these deletions • How can we complete a power calculation? • Option 1: Propose a sample size from each group and determine the difference between groups you could detect • Option 2: Estimate the effect using an available measurement/literature value

Available measurement • In the dataset, we have HLA status and can calculate the mean (SD) in each of the four groups • Using these preliminary data, we can perform a power calculation, assuming the effect size for the deletions will be similar to that for HLA • How good a surrogate is HLA status for deletion status?

Abbreviated power calculation • Power calculation: Our preliminary data have shown that the mean (SD) month 12 creatinine levels of the recipients were 1.21 (0.29) in HLA-identical donor/recipient pairs and 1.28 (0.33) in HLA non-identical donor/recipient pairs. We anticipate that recipients who are deletion matches will behave like the HLA-identical recipients and recipients who are not deletion matches will behave like the HLA non-identical recipients. A sample size of 202 per group is required to have 80% power to detect the proposed difference between the groups at the 0.05 level using one-way ANOVA.

Additional considerations • Pairwise tests • Is there a better approximation for the group means? • Clustering by country

Proportion with acute rejection • Another outcome for the study is the proportion of patients who experience acute rejection • The table at the end of the study would look like this:

Abbreviated analysis plan • Analysis plan: The proportion of patients who have acute rejection will be compared across the groups using a chi-square test for each deletion separately. In order to investigate the combined effect of deletions, multiple logistic regression models will also be fit.
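For a single deletion with two groups (matched vs. non-matched), the chi-square test reduces to a 2×2 table. This sketch computes Pearson's statistic without a continuity correction and its 1-df p-value using the fact that a chi-square with 1 df is the square of a standard normal. The counts are hypothetical; the logistic regression models would be fit in a statistics package.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table:

                      acute rejection   no acute rejection
        matched              a                  b
        non-matched          c                  d

    Returns (statistic, p-value) on 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square(1): P(Z^2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical end-of-study counts: 30/200 matched vs 90/400 non-matched with acute rejection
stat, p_value = chi_square_2x2(30, 170, 90, 310)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```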

Power analysis • As before, there are no preliminary data, so let's try the set-sample-size approach now • Assume that we have two groups, matched and non-matched, with 200 matched patients and 400 non-matched patients • What type of power analysis could we complete?

Abbreviated power analysis • Power analysis: Given our sample size (200 matched patients and 400 non-matched patients) and the assumption that matching would decrease the proportion with acute rejection, we will have at least 80% power to detect the differences presented in Table xx.
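With the sample sizes fixed, the power analysis runs in reverse: for each assumed rejection rate among non-matched pairs, find how low the matched rate must be to reach 80% power. This sketch uses the standard normal-approximation power formula for two proportions (no continuity correction) and a simple 0.001-step search; the non-matched rates of 10%, 20% and 30% are hypothetical, and the output is the kind of grid a table of detectable differences would report.

```python
from statistics import NormalDist

Z = NormalDist()


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided test comparing two proportions
    (no continuity correction; the negligible opposite tail is ignored)."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5  # SE under H0
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5   # SE under HA
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf((abs(p1 - p2) - z_a * se_null) / se_alt)


def detectable_matched_rate(p_nonmatched, n_matched=200, n_nonmatched=400, target=0.8):
    """Step the matched-group rate down from p_nonmatched (matching is assumed to
    lower the rejection rate) until the target power is reached."""
    p = p_nonmatched
    while p > 0.001 and power_two_proportions(p, p_nonmatched, n_matched, n_nonmatched) < target:
        p -= 0.001
    return p


# Hypothetical acute rejection rates among non-matched pairs
for p_non in (0.10, 0.20, 0.30):
    p_match = detectable_matched_rate(p_non)
    print(f"non-matched {p_non:.2f}: >= 80% power if matched rate is {p_match:.3f} or lower")
```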

Additional considerations • Clustering by region • Stratified analysis
