
Sample Size Analysis for Preparation of Grant Applications

Sample Size Analysis for Preparation of Grant Applications. Alla Sikorskii Department of Statistics and Probability Michigan State University. Outline. Hierarchy of evidence in clinical research Distributional and anchor-based measures of treatment effect


Presentation Transcript


  1. Sample Size Analysis for Preparation of Grant Applications Alla Sikorskii, Department of Statistics and Probability, Michigan State University

  2. Outline • Hierarchy of evidence in clinical research • Distributional and anchor-based measures of treatment effect • Clinical and statistical significance • Sample size determination for superiority clinical trials • Sample size determination for equivalence and non-inferiority clinical trials • Sample size determination for superiority trials with repeated measures • Sample size considerations for cluster randomized trials

  3. Software Many software packages are available: • G*Power (free), available at http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register (Google “gpower”) • PROC POWER in SAS • Power Analysis and Sample Size (PASS) • nQuery Advisor • Other

  4. Successful collaboration with a statistician • We will be discussing a priori sample size determination when planning a study, not so-called “post-hoc power analysis” (various forms have been shown to be flawed, see Hoenig and Heisey 2001) • Question 1: Outcomes? Primary and secondary outcomes to enter in power and sample size calculations • Question 2: Study design? Randomization, number of groups, repeated measures • Question 3: Preliminary data?

  5. Statistician’s first question: outcomes? • Biological: tumor shrinkage, survival time, time free of disease progression, biomarkers (from blood pressure to cytokines and genetic markers) • Patient-reported: symptoms, functioning, general health perception, knowledge, behavior • Type of outcome: mean, median, proportion (rate), or time-to-event. These lead to different sample size formulas and resulting sample size requirements • Primary and secondary outcomes

  6. Statistician’s second question: design? • Causal inference cannot be achieved by statistical analysis alone. Study design plus analysis allow one to get close to a conclusion of causality • “Counterfactual view of causality” (Rubin, 1974): the effect of treatment is the difference between what happened to the patient as a result of giving the treatment and what would have happened had treatment been denied

  7. Hierarchy of evidence (AHRQ, 1994) • 1A. Meta-analysis of multiple, well-designed controlled studies (strongest) • 1B. Randomized clinical trial (RCT) • 2. Well-designed non-randomized controlled study - a prospective (pre-planned) study, with predetermined eligibility criteria and outcome measures, case-control studies, cohort studies with controls • 3. Well-designed non-experimental studies - subjects are neither randomly assigned nor randomly selected (case studies, focus groups) • 4. Expert committee reports, expert opinions, consensus statements, expert judgment

  8. Level 2 designs • Randomization is not always possible • Useful as pilot studies before RCTs • Issue with pre- to post- designs: attribution of the effect to treatment. Example: patients with brain tumors are given a treatment, and 6 months later their cognitive function is the same as at intake. Pre- to post- effect is zero, but this is not the effect of treatment

  9. Attribution of Treatment Effect • Compare two groups of patients: those who received treatment 1 and those who received treatment 2 • Find that the outcome is better in group 1 compared to group 2 • Does this mean that it is treatment 1 that caused the improvement in outcome?

  10. Attribution of Treatment Effect Cont. • Many other variables (other than treatment) could have influenced the outcome • Some of these variables were measured, and some were not • RCT argument: randomization creates groups with the same characteristics (even those that are not measured) at intake • If differences between groups are found post-treatment, then since everything else was the same, it has to be the treatment

  11. Sample size for level 2 designs • When 2 independent groups (different subjects) are compared, sample size approaches for RCTs can be used (discussed later). Analysis is similar, but without the randomization, attribution of the effect is different • For pre- to post- designs, or one group, appropriate confidence intervals and hypothesis tests can be used for sample size considerations • Few studies with experimental pre- to post- designs (one group) are funded

  12. Example sample size calculation: descriptive aim • Descriptive aim: to determine (describe) the levels of hemoglobin A1C in a particular population. • Goal: to estimate the average hemoglobin A1C level. What sample size is needed? • Approach: confidence interval (no hypothesis stated) • Statistical answer: $n = (z_{\alpha/2}\,\sigma / M)^2$ • Need to input: desired margin of error, M, and sigma.

  13. Example sample size calculation: descriptive aim continued • Additional information: normal hemoglobin A1C level for adult males is 4.68%-6.80%. • Pre-specified margin of error M can be chosen to be (6.80-4.68)/2=1.06 (half of the range of normal values). • Generally, estimating the mean to within ±M should have meaningful precision

  14. Example sample size calculation (continued) • Assumption: normal distribution or large sample • Use the t-distribution instead of the standard normal (start with a confidence factor of 2 instead of 1.96) • Need an estimate of sigma, which can be derived from previous studies or published articles on studies done in this population (adult males in this case) • Suppose a previous study reported an estimate of 2.4%. With a confidence factor of 2 for planning purposes, n=((2*2.4)/1.06)^2=20.51, so the sample size needed is n=21 (round up to the next integer).

  15. Example sample size calculation (continued) • If n=21, the confidence factor (t distribution with 20 degrees of freedom) is 2.086. • Repeat the calculation using 2.086 instead of 2 to get n=((2.086*2.4)/1.06)^2=22.3, or n=23. • One more iteration: for n=23, the confidence factor is 2.074. Repeat: n=((2.074*2.4)/1.06)^2=22.05, or n=23, so keep n=23. • If a smaller value of M is chosen, for example 0.53 (a quarter of the length of the range of normal values), then n=82.
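
The margin-of-error calculation in slides 12 to 15 can be sketched in a few lines of Python. This is only an illustration, assuming the formula n = (factor × sigma / M)^2 with the confidence factor supplied by the caller; the function name `n_for_margin` is made up for this sketch and is not part of any package mentioned in the slides.

```python
import math

def n_for_margin(confidence_factor: float, sigma: float, margin: float) -> int:
    """Smallest n for which confidence_factor * sigma / sqrt(n) <= margin."""
    return math.ceil((confidence_factor * sigma / margin) ** 2)

# Planning values from the slides: sigma = 2.4, M = 1.06.
n_planning = n_for_margin(2.0, 2.4, 1.06)   # planning factor of 2
n_normal = n_for_margin(1.96, 2.4, 1.06)    # standard normal factor

print("planning factor 2:", n_planning)
print("z = 1.96:", n_normal)
```

Passing a t-based factor looked up for the current n (for example 2.086 for n=21) and repeating the call reproduces the iteration described on the slide.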

  16. Example: pre- to post- • Another hypothesis: in a population of people with diabetes, a treatment will reduce the hemoglobin A1C level from the average of 8% to 7%. Introduce D=8%-7%=1% • Introduce H0: d=0 versus H1: d ≠ 0, where d is the population mean pre- to post- difference • Level of significance alpha (usually set at 0.05) is the probability of rejecting a correct null hypothesis • Power is the probability of rejecting an incorrect null hypothesis. In this case the mean difference is D=1%, not equal to 0. • In this example, power has to do with detecting differences from pre- to post- when the differences exist (mean difference is not zero)

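  17. Sample size calculation: confidence interval • In addition to the test of hypotheses, consider the corresponding 95% confidence interval for the mean difference. • When the null hypothesis is false, the confidence interval should not contain zero • Large-sample confidence interval for the difference: $\bar{d} \pm z_{\alpha/2}\,\sigma_d/\sqrt{n}$ • Note: the standard deviation $\sigma_d$ is for the differences

  18. Sample size calculation: CI • We have (for D>0) $\bar{d} - z_{\alpha/2}\,\sigma_d/\sqrt{n} > 0$ • Or $\bar{d} > z_{\alpha/2}\,\sigma_d/\sqrt{n}$ • This should hold with probability .80=1-beta; since $\bar{d}$ is approximately normal with mean D and standard deviation $\sigma_d/\sqrt{n}$, set $D - z_{\beta}\,\sigma_d/\sqrt{n} = z_{\alpha/2}\,\sigma_d/\sqrt{n}$ • Finally $n = (z_{\alpha/2}+z_{\beta})^2\,\sigma_d^2/D^2$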

  19. Example • In the example, D=1. Use the standard deviation of 2.4 for planning. • For a 5% level of significance and power of .80, $z_{\alpha/2}=1.96$ and $z_{\beta}=0.84$. Calculation yields n=48 (G*Power demonstration). • The ratio $D/\sigma_d$ (1/2.4=.42 in the example) is referred to as the effect size based on the standard deviation of the difference. Note that technically only the ratio is needed for the calculation. • If 2.4 is the standard deviation of the scores (not of the differences), then the correlation coefficient r between pre and post scores is needed, because $\sigma_d = \sigma\sqrt{2(1-r)}$ when both time points have standard deviation $\sigma$. Assuming r=0.4 and a difference between the 2 means of 1 unit (it does not matter what the means are!), we get ES=0.38 and n=57 (G*Power example). • The effect size is often estimated in pilot work for use in sample size determination. • Use example data in SAS and G*Power.
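
As a rough check on the pre- to post- example, the large-sample formula can be evaluated directly. The sketch below assumes the normal-approximation formula n = (z_{α/2} + z_β)^2 σ_d^2 / D^2 and equal standard deviations at both time points; the function names are illustrative. Because it uses z rather than t critical values, it returns values slightly below the software results quoted on the slide (48 and 57).

```python
import math
from statistics import NormalDist

def paired_n(mean_diff: float, sd_diff: float,
             alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size for a one-group pre/post comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    return math.ceil(((z_alpha + z_beta) * sd_diff / mean_diff) ** 2)

def sd_of_differences(sd_scores: float, r: float) -> float:
    """SD of pre-post differences when both time points share sd_scores
    and the pre/post correlation is r (an assumption of this sketch)."""
    return sd_scores * math.sqrt(2 * (1 - r))

# Case 1: 2.4 is already the SD of the differences.
print(paired_n(1.0, 2.4))

# Case 2: 2.4 is the SD of the scores and r = 0.4.
sd_d = sd_of_differences(2.4, 0.4)
print(round(1.0 / sd_d, 2), paired_n(1.0, sd_d))
```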

  20. Remarks • The sample size formulas provided are based on the standard normal distribution • In practice, the sample standard deviation is used instead of sigma, and appropriate confidence factors are based on the t distribution • When n is large, normal approximations work well • When n may be small, sample size is determined based on the non-central t distribution instead of the standard normal. The equations cannot be solved in closed form. Tables, graphs and software are available (see Julious 2004)

  21. Remarks Continued • The approximate formulas provided work very well (often give an answer 1-2 fewer than exact) • Results from approximate formulas are used in numerical algorithms as starting values in iterations to solve equations for sample size • A correction for the normal approximation can be added to sample size formulas; it is usually small (Julious 2004)

  22. Level 1B designs • Many choices for the design • We will consider parallel group designs • Other possibility: cross-over design (e.g. AB/BA) • Factor to consider: variability between subjects versus variability within subject. • Factor to consider: carry-over problem (results for a given period of treatment reflect not only the current treatment, but also the effect of previous treatments) • Statistical tests for carry-over are available, but are not very powerful. Wash out periods based on pharmacokinetic and pharmaceutical theories may be useful

  23. Parallel group designs • Research question: treatment versus control or treatment versus another treatment (active control) • Treatment versus control (placebo): superiority trial • Treatment versus another treatment: may be superiority trial, or equivalence trial, or non-inferiority trial. Acronym ACES (active control equivalence study) • Hypotheses are needed

  24. Superiority trial comparing 2 means • Testing of H0: μ1 = μ2 versus H1: μ1 ≠ μ2 • Comparing the means of 2 groups using a 2-tailed test. Level of significance α=.05. • Suppose the 2 samples are independent and have the same size n. • Assuming normal distributions and known equal variances for the two populations represented by the 2 groups (e.g. treated and untreated), the test statistic is calculated as $z = (\bar{x}_1 - \bar{x}_2)/(\sigma\sqrt{2/n})$ • Then find the p-value=P(|Z|>|z|), where Z is the standard normal random variable. • If p-value<.05, reject the null hypothesis and conclude that the 2 groups differ.

  25. Sample size considerations for superiority trial • Testing of H0: μ1 = μ2 versus H1: μ1 ≠ μ2 • Power is the probability of rejecting the null hypothesis when it is false, i.e. finding the difference between the 2 groups when it exists. • Usually the desired power is set to 0.80 (or .90). The probability of type II error, that is, retaining an incorrect null hypothesis, is β=.20 (or .10). • For power considerations the statistician needs Δ=|μ1 - μ2| and an estimate of the common standard deviation σ. • Then the required sample size per group is $n = 2(z_{\alpha/2}+z_{\beta})^2\,\sigma^2/\Delta^2$
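
The two-sample z-test on this slide can be illustrated with a short Python sketch. It assumes known, equal standard deviations and equal group sizes, as the slide does; the input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1: float, mean2: float, sigma: float, n: int):
    """Return (z, two-sided p-value) for two independent groups of size n
    with a known common standard deviation sigma."""
    z = (mean1 - mean2) / (sigma * math.sqrt(2 / n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| > |z|)
    return z, p_value

# Hypothetical data: group means 10 and 11, sigma = 2, n = 50 per group.
z, p = two_sample_z_test(10.0, 11.0, 2.0, 50)
print(f"z = {z:.2f}, p = {p:.4f}")
```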

  26. Sample size for superiority trial • Δ must be clinically meaningful, the difference that one would regret missing due to inadequate sample size • Δ must be attainable under the treatment • Standardized effect size such as Cohen’s d= Δ/σ can be used with care; d=.2 is small, d=.5 is medium, and d=.8 is large in Cohen’s classification • When considering effect size, need to look at both numerator Δ and denominator σ, the latter may be affected by design or analysis approach (inclusion of covariates).

  27. Effect size • When sample preliminary data are used, a pooled standard deviation at baseline is used as an estimate for sigma • The standard deviation of the control group may be used instead; the effect size is then referred to as the standardized mean difference (SMD). The term SMD is preferred by the Cochrane Collaboration for meta-analyses • Using a larger effect size in power calculations yields a smaller required sample size; however, the attainability of the large effect needs to be justified. How can one be sure that the treatment can produce such a large effect? • Using a smaller effect size yields larger sample size requirements. Caution: small effect sizes may not be clinically significant

  28. Example sample size calculation: R01 • Find the sample size needed to detect an effect size of .3 with power of .80 and level of significance .05 for a two-tailed test • Sample size formula via effect size: $n = 2(z_{\beta}+z_{\alpha/2})^2/d^2$ • For d=.3, n=2(0.84+1.96)^2/(0.3)^2=174.2, so n=175 per group.

  29. Software Demonstration • G*Power • SAS PROC POWER: open the example SAS code • Example: d=.5, total n=128 (64 per group) • Example: d=.5, allocation ratio n2/n1=2: total n=144 (48+96)
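
The effect-size formula from slide 28 is easy to evaluate without dedicated software. The sketch below is a normal-approximation version only; the `ratio` argument (n2/n1) is an extension assumed here, using the fact that the variance of the difference in means is proportional to 1/n1 + 1/n2. Because it uses z rather than t critical values, equal allocation with d=.5 gives 63 per group, one fewer than the 64 reported by the software.

```python
import math
from statistics import NormalDist

def two_group_n(d: float, alpha: float = 0.05, power: float = 0.80,
                ratio: float = 1.0):
    """Return (n1, n2) for a two-tailed comparison of two means,
    where d is the standardized effect size and n2 = ratio * n1."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n1 = math.ceil((1 + 1 / ratio) * z_sum ** 2 / d ** 2)
    n2 = math.ceil(ratio * n1)
    return n1, n2

print(two_group_n(0.3))             # R01 example, equal allocation
print(two_group_n(0.5))             # medium effect, equal allocation
print(two_group_n(0.5, ratio=2.0))  # allocation ratio n2/n1 = 2
```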

  30. Quantifying treatment effect • Effect size or SMD is a distributional measure of treatment effect. It is based on means and standard deviations of the distributions of treated and untreated • Another approach: based on absolute change in outcome (the numerator of the effect size). Example: reduce severity of pain, measured on a rating scale from 0=not present to 10=worst it can be, by 2 points. Drawback: a change from 7 to 5 is equated with a change from 3 to 1. Effect size has the same drawback • Another approach: based on percent change in outcome (different denominator). Example drawback: a 33% change from 9 to 6 versus from 3 to 2

  31. Anchor-based measures • The values of the means are linked to another measure (an anchor). For example, severity of pain may be linked to ability to walk a block, climb a flight of stairs, or perform normal work (physical function, Ware et al. 1996) • Another example: a group of patients with a mean score of 40 has a mortality of 20%, and those with a mean score of 50 have a mortality of 10% (Guyatt et al. 2002) • Another example: relation of quality of life scores to global ratings of change (with a certain effect size, improvements corresponded to a “sizably better” rating)

  32. Anchor based measures (continued) • Minimum important difference (MID) or minimal clinically important difference (MCID) is the smallest difference in score that patients perceive as important (either beneficial or harmful), and which would lead the clinician to consider a change in treatment (Guyatt et al. 2002) • Traditionally, a 33% change has been considered clinically important (success in reducing tumor size) • Effect sizes around .3 or .33 or larger have been deemed clinically significant (empirical evidence for MID in the range of .3 standard deviation) • These guidelines may vary depending on the field

  33. Another Example: R21 • In planning a pilot or an exploratory study, sample size is often determined based on feasibility considerations and is relatively small • It may be unethical or fiscally irresponsible to conduct a large-scale study when little or no preliminary data are available • Example: based on 12 months of recruitment in a clinic where 4-5 patients a month are available, a feasible sample size is 54 total, or 27 per group • Given the sample size, find the minimum detectable effect size (MDES) • Recall the sample size formula: for d=.3, n=2(0.84+1.96)^2/(0.3)^2=174.2, so n=175 per group.

  34. Example (continued) • Solving the sample size formula for effect size d: $d = (z_{\alpha/2}+z_{\beta})\sqrt{2/n}$ • Calculation with n=27 yields d=.76 • Demonstration in software: G*Power and SAS give slightly larger values (d=.772-.776) because they use t-distribution critical values instead of z.
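
Inverting the two-group formula for a fixed sample size takes one line. The sketch below assumes the normal approximation, so it reproduces the hand calculation (d=.76) rather than the slightly larger t-based software values; the function name `mdes` is illustrative.

```python
import math
from statistics import NormalDist

def mdes(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect size for two equal groups of n_per_group,
    using the normal approximation d = (z_alpha/2 + z_beta) * sqrt(2 / n)."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z_sum * math.sqrt(2 / n_per_group)

print(round(mdes(27), 2))   # feasibility-driven R21 example
print(round(mdes(175), 3))  # the R01 sample size detects just under d = .3
```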

  35. Example Write-up (R21) • If treatment produces differences between experimental and control groups that correspond to effect size d=.76, then these differences can be detected as statistically significant • If differences between the means of the 2 populations represented by the two groups are less than .76 of the standard deviation, then statistical significance will not be reached, but this may be appropriate for an exploratory study • The exploratory study will provide an estimate of d to appropriately power a larger study (R01)

  36. Reporting of R21-type studies • The study is complete but statistical significance may not be reached. Do not focus on p-values! • Report the estimated effect sizes and confidence intervals. Comment on whether the confidence interval includes or is above the MCID • Discuss how this pilot study informs a future confirmatory trial (R01-type) • Use of alpha of 0.2 (0.25 for one-sided tests) has been proposed for pilot studies (see Lee et al. 2014) • Post-hoc power calculations are not advisable

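  37. Superiority trial: comparing two proportions (R01) • Testing of H0: p1 = p2 versus H1: p1 ≠ p2 • The sample size formula in this case is $n = \left(z_{\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{\beta}\sqrt{p_1(1-p_1)+p_2(1-p_2)}\right)^2/(p_1-p_2)^2$ per group, where $\bar{p}=(p_1+p_2)/2$ • Need values of both proportions under the alternative hypothesis, or the value of one of them and the relative risk (or odds ratio)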

  38. Example Calculation • A treatment is hypothesized to increase survival by 80%. This means that p1=p2+.8p2=1.8p2 and RR=1.8. • Suppose that p2=.03 (control or standard care group). Then p1=.054 is the hypothesized value for the treatment group • The sample size calculation can be carried out using the formula or using software, resulting in n=1096 per group • The sample size formula is based on a normal approximation, so np should be at least 5 (or 15); this is worth checking. Exact methods are available but not frequently used

  39. More Examples • Consider the same RR=1.8, but p2=0.5. Then p1=0.9, and the required sample size is n=20 per group • Consider the same p2=0.5, but a 20% increase to p1=0.6. Then the required sample size is 388 per group • Odds ratio OR=(p2/(1-p2)) : (p1/(1-p1)) • Example: p2=0.5, p1=0.9, then OR=1 : 9=.11 for group 2 relative to group 1. For group 1 relative to group 2, OR=9 • Example: p1=.054, p2=.03, then OR=.0309 : .0571=.54 for group 2 relative to group 1. For group 1 relative to group 2, OR=1.845. • More examples are in the example SAS code
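
The two-proportion examples on slides 38 and 39 can be checked with a short script. This sketch assumes the common normal-approximation formula that uses the pooled proportion under the null hypothesis and the separate proportions under the alternative; function names are illustrative, and the slides' own calculations were done in SAS or G*Power.

```python
import math
from statistics import NormalDist

def two_proportion_n(p1: float, p2: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)

def odds_ratio(p_a: float, p_b: float) -> float:
    """Odds of the event in group a relative to group b."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

for p1, p2 in [(0.054, 0.03), (0.9, 0.5), (0.6, 0.5)]:
    print(p1, p2, two_proportion_n(p1, p2))

print(odds_ratio(0.9, 0.5), odds_ratio(0.054, 0.03))
```

The same relative risk of 1.8 needs very different sample sizes depending on the control-group rate, which is why both proportions (not just their ratio) have to be specified.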

  40. Arguing Clinical Significance • The absolute difference between two rates may be important • Relative risk=p1/p2 or OR may be important • Relative risk may be misleading when the value of p2 is not brought to attention. Example: a new treatment for an advanced cancer increases the chance of survival by 50% compared to the existing treatment. The relative risk of survival is 1.5 because p1=p2+.5p2=1.5p2. However, if p2=.002, then p1=.003 and RR=.003/.002=1.5 (same as above), while the absolute gain is only .001, so side effects of the new treatment need to be considered

  41. Adjusting for attrition • The sample size calculation yields the sample size for analysis • Attrition: loss to follow-up between intake and trial endpoint • Attrition rates differ depending on population, study length and other factors

  42. Adjusting for Attrition • Assume that the attrition rate is 20%, and the sample size calculation resulted in n=175 per group. Then 175/0.8=218.75, so 219 patients need to be randomized into each group. Total recruitment target: 219*2=438 • Additional attrition may occur between consent and randomization; it also needs to be taken into account, especially if randomization is done in blocks (“freeze and thaw” approach, see Senn 2007)

  43. Adjusting for Attrition Cont. • Suppose the required sample size for analysis is 54 (27 per group) • Attrition between consent and randomization is 10%. Another 25% of patients drop out between randomization and post-intervention assessment • Sample size adjustment: 54/0.75=72 patients to be randomized; 72/0.9=80 consented patients are needed • Another consideration: 70% of the approached patients consent. Then the number needed to be approached is 80/0.7=114.3, or 115 patients
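
The attrition adjustments above are a chain of divisions followed by rounding up. A minimal sketch, using integer percentages so the rounding is exact (the helper name `inflate` is made up for this illustration):

```python
def inflate(n_needed: int, retained_pct: int) -> int:
    """Number to start with so that n_needed remain when only
    retained_pct percent are retained (ceiling division on integers)."""
    return -(-n_needed * 100 // retained_pct)

# Slide 42: 20% attrition, 175 per group needed for analysis.
per_group = inflate(175, 80)
print(per_group, 2 * per_group)

# Slide 43: work backwards from 54 analyzed patients.
randomized = inflate(54, 75)         # 25% drop out after randomization
consented = inflate(randomized, 90)  # 10% lost between consent and randomization
approached = inflate(consented, 70)  # 70% of approached patients consent
print(randomized, consented, approached)
```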

  44. Superiority trials: adjustment for baseline • Power calculations need to be consistent with the proposed analysis strategy • Trial analysis may not be limited to a t-test (or z-test) comparing means or proportions in two groups • Useful analysis strategy: analysis of covariance, where the trial outcome measured at the endpoint (e.g. post-intervention) is related to the study group variable and the outcome at baseline is included as a covariate (Porter and Raudenbush 1987) • Reason for adjustment for baseline: explain a portion of the variation post-intervention using baseline

  45. Superiority trials: adjustment for baseline • An approach to power calculation: instead of sigma, use “adjusted sigma”. When the outcome at post-intervention and the outcome at baseline are correlated, the adjusted standard deviation is smaller than the unadjusted one • If the adjusted effect size is used in the sample size calculation, then the resulting sample size requirement will be smaller • If preliminary data are available, the square root of the mean square error from the analysis of covariance can be used as an estimate of adjusted sigma

  46. Superiority trials: adjustment for baseline • If no preliminary data are available, come up with an estimate of the correlation coefficient r between baseline and post-intervention • Frison and Pocock (Stat Med 1992): with the adjustment for baseline, the standard deviation is reduced by a factor of $\sqrt{1-r^2}$

  47. Example: adjustment for baseline • Suppose the trial is to be powered to detect the effect size of d=.3; earlier we computed n=2(0.84+1.96)^2/(0.3)^2=174.2, so n=175 per group. • Suppose that this effect size corresponds to a difference in means of 1.2 (e.g. pain severity) with a standard deviation of 4, so d=1.2/4=0.3 • Suppose that the correlation between pain severity at baseline and post-intervention is r=0.6. This correlation would be different for different outcomes, lengths of the intervention, etc.

  48. Example: adjustment for baseline • If the analysis of covariance is used to test for treatment effect, then the adjusted standard deviation is $4\sqrt{1-0.6^2}=3.2$ • The adjusted effect size is 1.2/3.2=.375, and n=2(0.84+1.96)^2/(0.375)^2=111.5, so n=112 per group.
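
The baseline-adjusted calculation can be reproduced in a few lines. The sketch assumes the adjustment factor sqrt(1 - r^2) attributed to Frison and Pocock in the slides and the normal-approximation sample size formula; function names are illustrative.

```python
import math
from statistics import NormalDist

def adjusted_sd(sd: float, r: float) -> float:
    """SD after adjusting for a baseline covariate with correlation r."""
    return sd * math.sqrt(1 - r ** 2)

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation per-group n for effect size d."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 / d ** 2)

delta, sd, r = 1.2, 4.0, 0.6      # values from the slide
sd_adj = adjusted_sd(sd, r)       # 4 * sqrt(1 - 0.36)
d_unadjusted = delta / sd
d_adjusted = delta / sd_adj

print(sd_adj, d_adjusted)
print(n_per_group(d_unadjusted), n_per_group(d_adjusted))
```

With r=0.6 the requirement drops from 175 to 112 per group, which is why the proposed analysis (t-test versus analysis of covariance) has to be fixed before the power calculation.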

  49. Repeated Measures • Patients are followed up over time • Different models can be used to investigate different questions • Simplest question: additive group effect across time, or comparison of average post-intervention values of the outcome between groups • One solution is an appropriate adjustment to the variance to account for repeated measures (variation between subjects and variation within subject)

  50. Repeated Measures • Assume that there are k post-intervention data collection points • Assume that the correlation between repeated measures is r, and that it is the same between all time points • Then the adjusted variance is $\sigma^2\,[1+(k-1)r]/k$ • Here $\sigma^2$ is the variance at one time point (assumed to be the same at each separate time point)
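
To see how repeated post-intervention measures change the requirement, the sketch below assumes the compound-symmetry result that the mean of k equally correlated measures has variance sigma^2 [1 + (k-1) r] / k, and plugs that into the two-group normal-approximation formula. The values k=3 and r=0.6 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def variance_factor(k: int, r: float) -> float:
    """Multiplier applied to sigma^2 when averaging k equally
    correlated post-intervention measures (correlation r)."""
    return (1 + (k - 1) * r) / k

def n_per_group_repeated(d: float, k: int, r: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n when the outcome is the mean of k repeated measures;
    d is the effect size defined with the single-time-point SD."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 * variance_factor(k, r) / d ** 2)

print(variance_factor(3, 0.6))
print(n_per_group_repeated(0.3, k=1, r=0.6))  # single measure
print(n_per_group_repeated(0.3, k=3, r=0.6))  # three repeated measures
```

The gain from extra time points shrinks as r approaches 1, because highly correlated measures add little new information.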
