1. Interim Analyses of Clinical Trials
2. Outline Background and how DSMBs function
Group sequential methods
Examples
3. Suggested Readings Ellenberg S, Fleming TR, DeMets DL.
Data Monitoring Committees in Clinical Trials. A Practical Perspective
John Wiley & Sons, LTD, 2002
Jennison C and Turnbull BW.
Group Sequential Methods with Applications to Clinical Trials
Chapman & Hall/CRC 2000
4. DSMB Decision Making Can Be Complex Internal consistency
Benefit/Risk
External consistency
Current versus future patients
Clinical and public health impact
Statistical issues – monitoring guidelines
5. Overall Probability of Achieving a Result with Given Nominal Significance of 0.05 After N Repeated Tests Under H0
N    Overall probability
1    .05
2    .083
3    .107
4    .126
5    .142
10   .193
25   .266
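The inflation shown in this table can be checked with a small Monte Carlo sketch. It assumes (the slide does not say so) equally spaced looks at accumulating standard normal data and a two-sided test at the nominal 0.05 level at every look; the function name and simulation settings are illustrative only.

```python
import random
from statistics import NormalDist


def overall_type1_error(n_looks, alpha=0.05, n_sims=20000, seed=1):
    """Estimate P(at least one nominally significant test) under H0.

    Assumption: looks are equally spaced, so the statistic at look k is
    the running sum of k independent N(0, 1) increments divided by sqrt(k).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            if abs(running_sum) / k ** 0.5 >= z_crit:
                rejections += 1
                break  # the trial "stops" at the first significant look
    return rejections / n_sims


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10, 25):
        print(n, round(overall_type1_error(n), 3))
```

With enough replicates the estimates should fall close to the tabulated values, up to simulation error.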
6. Value of Nominal Significance Level Necessary to Achieve a True Level of 0.05 After N Repeated Tests
N    Nominal level
1    .05
2    .0296
3    .0221
4    .0183
5    .0159
10   .0107
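One way to see where such nominal levels come from is to simulate the largest |Z| over the N looks under H0 and take its upper 5% point. The sketch below does this under the assumption of equally spaced looks at normally distributed data; the function name and settings are hypothetical, and the result carries Monte Carlo error.

```python
import math
import random
from statistics import NormalDist


def nominal_level(n_looks, alpha=0.05, n_sims=40000, seed=2):
    """Estimate the constant two-sided nominal p-value that keeps the
    overall type I error at `alpha` across `n_looks` equally spaced tests."""
    rng = random.Random(seed)
    max_abs_z = []
    for _ in range(n_sims):
        running_sum, largest = 0.0, 0.0
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            largest = max(largest, abs(running_sum) / math.sqrt(k))
        max_abs_z.append(largest)
    max_abs_z.sort()
    # Common critical value: the (1 - alpha) quantile of max |Z_k| under H0.
    index = min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)
    c = max_abs_z[index]
    return 2 * (1 - NormalDist().cdf(c))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10):
        print(n, round(nominal_level(n), 4))
```

A constant nominal level of this kind corresponds to a flat (Pocock-shaped) boundary.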
7. Simulated Trial (T. Fleming Example) Patients enter trial over a 3-year period
1 year minimum follow-up
60 on A; 60 on B
Survival distributions are equal
Log-rank test used for analysis
5 situations
1 Log-rank test - at 4 years
2 Log-rank tests - every 2 years
4 Log-rank tests - every year
8 Log-rank tests - every 6 months
16 Log-rank tests - every 3 months
100 simulations of the study
8. Results The log-rank p-value was less than 0.05 at
the final test (i.e., at 4 years) in 5 of 100 studies
either the 2- or 4-year test in 10 of 100 studies
at least 1 of 4 yearly tests in 17 of 100 studies
at least 1 of 8 semi-annual tests in 21 of 100 studies
at least 1 of 16 3-month tests in 25 of 100 studies
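A simulation along these lines can be sketched as follows. Several details are assumptions, since the slides do not give them: survival is exponential with a 1-year median in both arms, accrual is uniform over 3 years, there is no loss to follow-up, and each look uses a two-sided log-rank test against 1.96. All function names are illustrative.

```python
import math
import random


def logrank_z(times, events, groups):
    """Two-group log-rank z statistic; groups are coded 0/1, events 0/1."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = [groups.count(0), groups.count(1)]
    o_minus_e, var = 0.0, 0.0
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths, leaving = [0, 0], [0, 0]
        j = i
        while j < len(order) and times[order[j]] == t:
            g = groups[order[j]]
            leaving[g] += 1
            deaths[g] += events[order[j]]
            j += 1
        n, d = at_risk[0] + at_risk[1], deaths[0] + deaths[1]
        if d > 0 and n > 1:
            # Observed minus expected deaths in group 1, and its variance.
            o_minus_e += deaths[1] - d * at_risk[1] / n
            var += d * at_risk[0] * at_risk[1] * (n - d) / (n * n * (n - 1))
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
        i = j
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0


def simulate(n_trials=100, schemes=(1, 2, 4, 8, 16), median=1.0, seed=7):
    """Count trials with any |z| >= 1.96 under each monitoring scheme."""
    rng = random.Random(seed)
    rate = math.log(2) / median  # assumed exponential hazard, same in both arms
    groups = [0] * 60 + [1] * 60
    hits = {k: 0 for k in schemes}
    for _ in range(n_trials):
        entry = [rng.uniform(0, 3) for _ in groups]
        death = [e + rng.expovariate(rate) for e in entry]
        significant = {}
        for q in range(1, 17):  # 3-month grid up to 4 years
            t = q / 4
            enrolled = [i for i in range(120) if entry[i] < t]
            times = [min(death[i], t) - entry[i] for i in enrolled]
            events = [1 if death[i] <= t else 0 for i in enrolled]
            arm = [groups[i] for i in enrolled]
            significant[q] = abs(logrank_z(times, events, arm)) >= 1.96
        for k in schemes:
            step = 16 // k  # k looks, equally spaced, ending at 4 years
            if any(significant[q] for q in range(step, 17, step)):
                hits[k] += 1
    return hits


if __name__ == "__main__":
    print(simulate())
```

With only 100 simulated trials the counts vary noticeably from run to run, so a different seed will not reproduce the slide's figures exactly; the pattern of more looks giving more false positives is the point.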
9. Conclusions 1. If one monitors one’s data closely and considers a p = .05 result obtained at any time to be reflective of a true difference in survival distributions, one will obtain a grossly excessive number of false positives.
This excessiveness will be further magnified
– if one follows multiple endpoints
– if one uses multiple test statistics
2. In 25 of 100 studies the log-rank p-value was < 0.05 at one or more of the 16 3-month tests, whereas only 5 of 100 studies were significant at the final test.
Thus, if there are no differences between true survival curves, very many of the closely monitored studies will have “statistically significant” early differences which will disappear later in time.
10. Early Work Acceptance sampling
Wald (1947) sequential probability ratio test
Manufacturing problems, continuous monitoring of the data, no upper bound on sample size
11. Useful References Ellenberg SS, Fleming TR, DeMets DL, Data Monitoring Committees in Clinical Trials, Wiley, 2002.
DeMets DL, Furberg CD, Friedman LM. Data Monitoring in Clinical Trials. A Case Studies Approach, Springer, 2006.
Jennison C and Turnbull BW, Group Sequential Methods with Applications to Clinical Trials, Chapman and Hall, 2000.
Reboussin DM et al, Cont Clin Trials, 2000.
http://www.biostat.wisc.edu/landemets
12. Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks
[Table columns: Interim Analysis | Pocock | O’Brien-Fleming | Haybittle-Peto]
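Critical values of the O’Brien-Fleming shape can be approximated by simulation. The sketch below is one rough way to do so, not the method used to build the slide's table: it assumes equally spaced looks and normally distributed data, and the function name is hypothetical. The boundary at look k of K has the form c·sqrt(K/k), and c is calibrated so that the overall two-sided error is 0.05.

```python
import math
import random


def obf_boundaries(n_looks, alpha=0.05, n_sims=40000, seed=3):
    """Monte Carlo approximation of two-sided O'Brien-Fleming-shaped
    critical values for `n_looks` equally spaced analyses."""
    rng = random.Random(seed)
    scaled_max = []
    for _ in range(n_sims):
        running_sum, largest = 0.0, 0.0
        for _k in range(n_looks):
            running_sum += rng.gauss(0.0, 1.0)
            # |Z_k| * sqrt(k / K) equals |S_k| / sqrt(K) for running sum S_k.
            largest = max(largest, abs(running_sum) / math.sqrt(n_looks))
        scaled_max.append(largest)
    scaled_max.sort()
    index = min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)
    c = scaled_max[index]  # critical value at the final look
    return [c * math.sqrt(n_looks / k) for k in range(1, n_looks + 1)]


if __name__ == "__main__":
    for look, z in enumerate(obf_boundaries(7), start=1):
        print(look, round(z, 2))
```

The resulting boundary is very strict at early looks and close to the fixed-sample critical value at the last one. A flat (Pocock-shaped) boundary could be calibrated the same way by dropping the sqrt(K/k) scaling.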
14. Critical Values
15. Choosing Critical Values
19. General Approach Compute sample size as if a single look (fixed sample approach)
Specify number of interim analyses and stopping boundary.
Inflate sample size to preserve assumed power (not always done as adjustment is minor).
Compute the standardized statistic Zk at each analysis and compare with critical values corresponding to monitoring boundary.
At the end or upon early termination determine P-values and confidence intervals possessing the usual frequentist interpretations.
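The comparison of Zk with the monitoring boundary can be sketched in a few lines. The function name and the numbers in the example are hypothetical and serve only to show the mechanics for a symmetric two-sided boundary.

```python
def first_crossing(z_stats, critical_values):
    """Return the 1-based look at which |Z_k| first reaches its critical
    value, or None if the boundary is never crossed."""
    if len(z_stats) > len(critical_values):
        raise ValueError("need a critical value for every analysis")
    for look, (z, c) in enumerate(zip(z_stats, critical_values), start=1):
        if abs(z) >= c:
            return look
    return None


if __name__ == "__main__":
    # Hypothetical interim statistics and boundary, for illustration only.
    z_observed = [0.8, 1.9, 2.6, 3.1]
    boundary = [4.0, 2.8, 2.3, 2.0]
    print(first_crossing(z_observed, boundary))
```

In practice a crossing is a trigger for DSMB deliberation rather than an automatic stopping decision.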
20. Problems with Initial Approach Difficult to specify number of analyses in advance
Logistically difficult to organize reviews after equal increments of information.
Solutions: Slud and Wei and Lan-DeMets
21. Flexible Approaches Slud and Wei (JASA, 1982) – specify exit probabilities for each look (stage) such that they sum to α, e.g., the prob of exiting the kth stage is the joint prob of not exiting the first k-1 stages and exiting the kth one.
Lan-DeMets (Biometrika, 1983) – specify a use function or type I error spending function, e.g., at time zero, α used = 0 and with full information α used = 0.05 (or nominal level)
22. Spending Function α(t), where t = (number of events observed at monitoring) / (total number of anticipated events)
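Two spending functions commonly associated with the Lan-DeMets approach are sketched below in their two-sided forms: an O’Brien-Fleming-type function, α(t) = 2 − 2Φ(z_{α/2}/√t), and a Pocock-type function, α(t) = α·ln(1 + (e − 1)t). The slide itself does not state which form it uses, so treat the formulas and the function names as assumptions.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def information_fraction(events_observed, events_planned):
    """t = observed events / total anticipated events, capped at 1."""
    return min(events_observed / events_planned, 1.0)


def obf_type_spending(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t
    (O'Brien-Fleming-type spending function)."""
    if t <= 0:
        return 0.0
    t = min(t, 1.0)
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _NORMAL.cdf(z / math.sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t
    (Pocock-type spending function)."""
    t = min(max(t, 0.0), 1.0)
    return alpha * math.log(1 + (math.e - 1) * t)


if __name__ == "__main__":
    for t in (0.25, 0.50, 0.75, 1.00):
        print(t, round(obf_type_spending(t), 5), round(pocock_type_spending(t), 5))
```

Both functions start at 0 and reach the full α at t = 1; the O’Brien-Fleming-type function spends very little early, while the Pocock-type function spends more evenly.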
23. Plots of Pocock-type and O’Brien Fleming-type spending functions for a one-sided 0.025 significance level, for four analyses at 25%, 50%, 75% and 100% of the expected information.
24. Approximate O’Brien Fleming Boundaries Using Lan-DeMets Spending Function Approach: Overall Significance = 0.05 and 4 Looks
[Table columns: Interim Analysis | O’Brien-Fleming | OBF-type Lan-DeMets]
25. Usual Choices for Information Planned number of events in event-driven trial
Follow-up time
Calendar time
26. Beta-Blocker Heart Attack Trial (BHAT) Placebo-controlled trial of propranolol in patients with a recent MI
Recruitment began in June 1978; planned termination June 1982; average of 3 years of follow-up and maximum of 4
Primary endpoint – all-cause mortality
Event target - 629 deaths
Stopped early in October 1981
27. Interim Monitoring of BHAT Study
Look   Date       Months (fraction of 48)   Deaths (fraction of 629)   Z
1      May 1979   11 (.23)                  56 (.09)                   1.68
2      Oct 1979   16 (.33)                  77 (.12)                   2.24
3      Mar 1980   21 (.44)                  126 (.20)                  2.37
4      Oct 1980   28 (.58)                  177 (.28)                  2.30
5      Apr 1981   34 (.71)                  247 (.39)                  2.34
6      Oct 1981   40 (.83)                  318 (.51)                  2.82
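The two information scales in this table can be reproduced with simple arithmetic. The sketch below reads the first numeric column as months since recruitment began in June 1978, out of 48 planned months, and the second as cumulative deaths out of the 629 targeted; that reading is an inference from the numbers, not something the slide states. It also evaluates an assumed O’Brien-Fleming-type spending function on each scale.

```python
import math
from statistics import NormalDist

# (look, months since June 1978, cumulative deaths, z) as read from the table.
BHAT_LOOKS = [
    (1, 11, 56, 1.68), (2, 16, 77, 2.24), (3, 21, 126, 2.37),
    (4, 28, 177, 2.30), (5, 34, 247, 2.34), (6, 40, 318, 2.82),
]
PLANNED_MONTHS = 48   # June 1978 to June 1982
PLANNED_DEATHS = 629  # event target


def obf_type_alpha_spent(t, alpha=0.05):
    """Assumed spending function: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - normal.cdf(z / math.sqrt(t)))


def bhat_summary():
    """Information fraction and cumulative alpha spent on both scales."""
    rows = []
    for look, months, deaths, _z in BHAT_LOOKS:
        t_calendar = months / PLANNED_MONTHS
        t_events = deaths / PLANNED_DEATHS
        rows.append((look, round(t_calendar, 2), round(t_events, 2),
                     obf_type_alpha_spent(t_calendar),
                     obf_type_alpha_spent(t_events)))
    return rows


if __name__ == "__main__":
    for row in bhat_summary():
        print(row)
```

Because calendar time ran ahead of the accumulating deaths, the calendar scale treats each look as carrying more information and therefore spends more α by that look than the event scale does.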
28. Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)
[Table columns: Interim Analysis | OBF | Lan-DeMets (OBF), events | Lan-DeMets (OBF), calendar time]
29. Flexible Number of Looks Another advantage of the Lan-DeMets spending function approach is the flexibility with the number of looks.
Suppose BHAT was not stopped and there were 3 more looks before the end (10 total).
Looks 7-10 correspond to information fractions considering the number of events of 0.65, 0.75, 0.85 and 1.0.
Stopping boundaries can be calculated conditioned upon the previous tests
30. Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)
[Table columns: Interim Analysis | Lan-DeMets (OBF), 7 looks | Lan-DeMets (OBF), 10 looks]
31. Suppose We Get To the 6th Analysis by A Different Route Information fractions are .05, .20, .30, .40, .45
Instead of .09, .12, .20, .28, and .39
32. Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)
[Table columns: Interim Analysis | Lan-DeMets (OBF), 7 looks | Lan-DeMets (OBF), 7 looks]
33. Variations of the Theme Asymmetric boundaries (e.g., non-significant harmful effect of new treatment)
Use upper boundary for superiority and less conservative boundary for harm (Z = -1.5 or -2.0).
Use OBF for efficacy and Pocock for harm
Multiple outcomes, e.g., efficacy and safety, and composites
Multiple trials (CHARM heart failure, Cox-2 chemo-prevention)
Futility and curtailed sampling procedures (conditional and unconditional power)
Repeated confidence intervals (e.g., use OBF critical values to compute interim CIs)
34. SMART Study Design
35. SMART Guideline “…it is recommended that the DSMB consider early termination or protocol modification only when the O’Brien-Fleming boundary is crossed for the primary endpoint and the findings for the primary and the composite cardiovascular, metabolic endpoint are consistent...”
39. Futility Usual definition - convincing evidence exists that the new treatment is not beneficial.
If this is the case, minimizing exposure to an ineffective treatment with potential toxicities and saving resources should lead to a consideration to stop the trial.
What is convincing?
Futility, more generally, can also be affected by a low event rate or slow enrollment (e.g., CVD mortality outcome in the Physicians’ Health Study).
40. Conditional Power (or Stochastic Curtailment) What is the probability of rejecting the null hypothesis (i.e., getting a significant result) given the data to date and a best guess about the future? For example, the future data
will look like the past (the current trend continues)
will show no difference (the null hypothesis holds)
will follow the treatment effect assumed in the design
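Conditional power under these three scenarios can be sketched with the Brownian-motion (B-value) approximation: with B(t) = Z(t)·√t and drift θ = E[Z(1)], the probability that the final statistic exceeds z_crit is 1 − Φ((z_crit − B(t) − θ(1 − t)) / √(1 − t)). The sketch assumes a one-sided test at 0.025 with a single fixed critical value at the end (interim boundaries are ignored) and a design power of 90%; these settings and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def conditional_power(z_now, t, drift, alpha=0.025):
    """P(Z(1) >= z_crit | Z(t) = z_now) for a given drift = E[Z(1)]."""
    if not 0 < t < 1:
        raise ValueError("information fraction t must be strictly between 0 and 1")
    z_crit = _NORMAL.inv_cdf(1 - alpha)
    b_value = z_now * math.sqrt(t)
    shortfall = z_crit - b_value - drift * (1 - t)
    return 1 - _NORMAL.cdf(shortfall / math.sqrt(1 - t))


def scenario_drifts(z_now, t, alpha=0.025, power=0.90):
    """Drift under each 'best guess about the future' (power is assumed)."""
    return {
        "current trend": z_now / math.sqrt(t),  # estimate B(t) / t
        "no difference": 0.0,
        "design": _NORMAL.inv_cdf(1 - alpha) + _NORMAL.inv_cdf(power),
    }


if __name__ == "__main__":
    z_now, t = 1.0, 0.5  # hypothetical interim result
    for name, drift in scenario_drifts(z_now, t).items():
        print(name, round(conditional_power(z_now, t, drift), 3))
```

A low value under the design assumption suggests that even the hoped-for effect is unlikely to rescue the trial, which is the situation where conditional power is most informative for futility.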
41. Conditional Power: Caution and Applications The future may be hard to predict, e.g., non-proportional hazards
Conditional power can be useful to assess the likelihood of a “positive” result becoming neutral/negative, e.g. BHAT, SMART
Most useful for futility, e.g., likelihood of a negative or neutral trend reversing.
42. Unconditional Power What is the probability of rejecting the null hypothesis (i.e., getting a significant result) based on the original design assumptions for the treatment effect, but considering:
revised estimate of control group event rate
duration of follow-up accounting for recruitment period and minimum follow-up originally planned for each participant
Is a null result still meaningful?
43. Guideline for HIV Early Treatment Trial (START) First consider unconditional power. If < 70%, consider conditional power.
If conditional power is < 20%, consider stopping for futility.
Rationale: Unconditional power could be low in the presence of a large treatment effect.
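The two-step guideline can be written as a small decision helper. This is only a sketch of the logic as stated on the slide; the function name and the returned labels are hypothetical, and the outcome is meant to prompt DSMB discussion, not to replace it.

```python
def start_futility_check(unconditional_power, conditional_power,
                         up_threshold=0.70, cp_threshold=0.20):
    """Apply the two-step rule: look at conditional power only when
    unconditional power has fallen below its threshold."""
    if unconditional_power >= up_threshold:
        return "continue"  # unconditional power is adequate
    if conditional_power < cp_threshold:
        return "consider stopping for futility"
    return "continue"      # low unconditional power, but the trend may still succeed


if __name__ == "__main__":
    print(start_futility_check(0.85, 0.10))
    print(start_futility_check(0.60, 0.10))
    print(start_futility_check(0.60, 0.45))
```

The last case reflects the stated rationale: unconditional power can be low even though a large observed treatment effect keeps conditional power high.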
44. Summary (1) Many studies require a DSMB
Trials with morbidity and mortality outcomes
Trials of treatments that may be associated with serious toxicities (need to have a group look at controlled comparisons)
Trials of novel, high risk treatments (e.g., gene therapy)
Trials involving frail populations (elderly, infants)
45. Summary (2) A DSMB can be most effective in its role of protecting the interests of patients if it is independent of the sponsor and trial investigators – peer review works!
Operating procedures should be agreed upon in advance
An informed statistician who performs interim analyses is important
To carry out interim analyses data must be collected in a timely way
Reports should focus on comparisons of clinical outcomes and their validity
46. Summary (3) Monitoring guidelines should be pre-specified
Guidelines need to be accompanied by common sense, a careful assessment of risks and benefits, and opinions from experts from different backgrounds.
This is a fruitful area for research (e.g., multiple outcomes: safety and efficacy, multiple efficacy endpoints)
47. Recommendation from Paul Canner based on his experiences in the Coronary Drug Project “…no single statistical decision rule or procedure can take the place of the well-reasoned consideration of all aspects of the data by a group of concerned, competent, and experienced persons with a wide range of scientific backgrounds and points of view.”