
Interim Analyses of Clinical Trials


kevyn

Presentation Transcript


    1. Interim Analyses of Clinical Trials

    2. Outline: Background and how DSMBs function; group sequential methods; examples

    3. Suggested Readings: Ellenberg S, Fleming TR, DeMets DL. Data Monitoring Committees in Clinical Trials: A Practical Perspective. John Wiley & Sons, Ltd, 2002. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, 2000.

    4. DSMB Decision Making Can Be Complex: Internal consistency; benefit/risk; external consistency; current versus future patients; clinical and public health impact; statistical issues (monitoring guidelines)

    5. Overall Probability of Achieving a Result with Given Nominal Significance of 0.05 After N Repeated Tests Under H0
        N     Overall probability
        1     0.05
        2     0.083
        3     0.107
        4     0.126
        5     0.142
        10    0.193
        25    0.266
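The inflation tabulated above can be checked with a short simulation. This is a minimal sketch, assuming equally spaced looks, a normally distributed test statistic, and a two-sided nominal 0.05 test (critical value 1.96) at every look; the function name and simulation settings are illustrative and do not come from the slides.

```python
import numpy as np

def overall_type1_error(n_looks, z_crit=1.96, n_sims=100_000, seed=0):
    """Estimate P(at least one |Z_k| > z_crit) under H0 with equally spaced looks."""
    rng = np.random.default_rng(seed)
    # Each look adds one independent, equally sized block of data (assumption).
    increments = rng.standard_normal((n_sims, n_looks))
    cum_sums = np.cumsum(increments, axis=1)
    looks = np.arange(1, n_looks + 1)
    z = cum_sums / np.sqrt(looks)  # standardized statistic at each look
    return float(np.mean(np.any(np.abs(z) > z_crit, axis=1)))

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10, 25):
        print(n, round(overall_type1_error(n), 3))
```

With enough simulated trials the estimates should land close to the tabulated probabilities, up to Monte Carlo error.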

    6. Value of Nominal Significance Level Necessary to Achieve a True Level of 0.05 After N Repeated Tests
        N     Nominal significance level
        1     0.05
        2     0.0296
        3     0.0221
        4     0.0183
        5     0.0159
        10    0.0107
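One way to see where such nominal levels come from is to find the single critical value that, applied at every look, keeps the overall two-sided error at 0.05. The sketch below estimates it by Monte Carlo as a quantile of the maximum |Z| across looks, assuming equally spaced looks and a normal test statistic; `nominal_level` is a hypothetical helper, not something named in the slides.

```python
import math
import numpy as np

def nominal_level(n_looks, alpha=0.05, n_sims=200_000, seed=1):
    """Return (critical value, nominal two-sided p) giving overall level alpha."""
    rng = np.random.default_rng(seed)
    cum_sums = np.cumsum(rng.standard_normal((n_sims, n_looks)), axis=1)
    z = cum_sums / np.sqrt(np.arange(1, n_looks + 1))
    max_abs_z = np.max(np.abs(z), axis=1)
    # Constant boundary c with P(max_k |Z_k| > c) = alpha under H0.
    c = float(np.quantile(max_abs_z, 1.0 - alpha))
    # Two-sided tail area beyond c for a single standard normal test.
    p_nominal = math.erfc(c / math.sqrt(2.0))
    return c, p_nominal

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10):
        c, p = nominal_level(n)
        print(n, round(c, 3), round(p, 4))
```

The estimates are approximate; exact values are normally obtained by numerical integration rather than simulation.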

    7. Simulated Trial (T. Fleming Example)
        Patients enter the trial over a 3-year period, with 1 year of minimum follow-up
        60 patients on A; 60 on B
        Survival distributions are equal
        Log-rank test used for analysis
        5 situations:
            1 log-rank test: at 4 years
            2 log-rank tests: every 2 years
            4 log-rank tests: every year
            8 log-rank tests: every 6 months
            16 log-rank tests: every 3 months
        100 simulations of the study

    8. Results: The log-rank p-value was less than 0.05
        at the final test (i.e., at 4 years) in 5 of 100 studies
        at either the 2- or 4-year test in 10 of 100 studies
        on at least 1 of 4 yearly tests in 17 of 100 studies
        on at least 1 of 8 semi-annual tests in 21 of 100 studies
        on at least 1 of 16 3-month tests in 25 of 100 studies

    9. Conclusions: (1) If one monitors one’s data closely and treats a p = .05 result obtained at any time as reflecting a true difference in survival distributions, one will obtain a grossly excessive number of false positives. This excess is magnified further if one follows multiple endpoints or uses multiple test statistics. (2) 25 of 100 studies have a log-rank p-value < 0.05 on at least one of the 16 3-month tests, whereas only 5 of 100 do at the final test. Thus, if the true survival curves do not differ, many closely monitored studies will show “statistically significant” early differences that later disappear.

    10. Early Work: Acceptance sampling; Wald (1947) sequential probability ratio test. Setting: manufacturing problems, continuous monitoring of the data, no upper bound on sample size

    11. Useful References: Ellenberg SS, Fleming TR, DeMets DL. Data Monitoring Committees in Clinical Trials. Wiley, 2002. DeMets DL, Furberg CD, Friedman LM. Data Monitoring in Clinical Trials: A Case Studies Approach. Springer, 2006. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall, 2000. Reboussin DM et al. Cont Clin Trials, 2000. http://www.biostat.wisc.edu/landemets

    12. Critical Values (z) for 2-sided Group Sequential Design with 0.05 Overall Significance and 7 Looks (table columns: interim analysis; Pocock; O’Brien-Fleming; Haybittle-Peto)

    14. Critical Values

    15. Choosing Critical Values

    19. General Approach
        1. Compute the sample size as if there were a single look (fixed-sample approach).
        2. Specify the number of interim analyses and the stopping boundary.
        3. Inflate the sample size to preserve the assumed power (not always done, as the adjustment is minor).
        4. Compute the standardized statistic Z_k at each analysis and compare it with the critical values corresponding to the monitoring boundary.
        5. At the end, or upon early termination, determine P-values and confidence intervals possessing the usual frequentist interpretations.
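The comparison of Z_k with a monitoring boundary can be sketched as follows. This is an illustration only, assuming equally spaced looks and a two-sided O'Brien-Fleming boundary of the form C * sqrt(K / k), with the constant C estimated by Monte Carlo rather than taken from published tables; both function names are hypothetical.

```python
import numpy as np

def obf_critical_values(n_looks, alpha=0.05, n_sims=200_000, seed=2):
    """Monte Carlo estimate of two-sided O'Brien-Fleming critical values."""
    rng = np.random.default_rng(seed)
    cum_sums = np.cumsum(rng.standard_normal((n_sims, n_looks)), axis=1)
    # OBF rejects at look k when |Z_k| >= C * sqrt(K / k),
    # which is the same event as |S_k| / sqrt(K) >= C.
    scaled = np.abs(cum_sums) / np.sqrt(n_looks)
    c_const = float(np.quantile(scaled.max(axis=1), 1.0 - alpha))
    looks = np.arange(1, n_looks + 1)
    return c_const * np.sqrt(n_looks / looks)

def first_crossing(z_values, critical_values):
    """Return the 1-based look at which |Z_k| first reaches its boundary, else None."""
    for k, (z, c) in enumerate(zip(z_values, critical_values), start=1):
        if abs(z) >= c:
            return k
    return None

if __name__ == "__main__":
    bounds = obf_critical_values(7)
    print(np.round(bounds, 3))
    # Hypothetical interim Z statistics, not data from any trial.
    print(first_crossing([0.8, 1.5, 2.1, 2.9], bounds))
```

The boundary is very demanding at early looks and ends close to the fixed-sample critical value, which is why the sample-size inflation in step 3 is usually small for this design.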

    20. Problems with Initial Approach: It is difficult to specify the number of analyses in advance, and logistically difficult to organize reviews after equal increments of information. Solutions: Slud and Wei; Lan-DeMets

    21. Flexible Approaches: Slud and Wei (JASA, 1982): specify exit probabilities for each look (stage) such that they sum to $\alpha$; i.e., the probability of exiting at the kth stage is the joint probability of not exiting at the first k-1 stages and exiting at the kth one. Lan-DeMets (Biometrika, 1983): specify a use function, or type I error spending function; e.g., at time zero, $\alpha$ used = 0, and with full information, $\alpha$ used = 0.05 (or the nominal level)

    22. Spending Function $\alpha(t)$, where $t = \dfrac{\text{number of events observed at monitoring}}{\text{total number of anticipated events}}$

    23. Plots of Pocock-type and O’Brien-Fleming-type spending functions for a one-sided 0.025 significance level, for four analyses at 25%, 50%, 75% and 100% of the expected information.
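The two spending functions can be evaluated directly. The sketch below uses the forms commonly cited for the Lan-DeMets approach, which are an assumption here because the slides show only plots: an O'Brien-Fleming-type function 2(1 - Phi(z_{1-alpha/2} / sqrt(t))) and a Pocock-type function alpha * ln(1 + (e - 1)t). The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def obf_type_spending(t, alpha=0.025):
    """O'Brien-Fleming-type spending: 2 * (1 - Phi(z_{1-alpha/2} / sqrt(t)))."""
    if t <= 0.0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / math.sqrt(min(t, 1.0))))

def pocock_type_spending(t, alpha=0.025):
    """Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    t = min(max(t, 0.0), 1.0)  # information fraction is clipped to [0, 1]
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

if __name__ == "__main__":
    # Cumulative type I error spent at the four analyses named on the slide.
    for t in (0.25, 0.50, 0.75, 1.00):
        print(t, f"{obf_type_spending(t):.6f}", f"{pocock_type_spending(t):.6f}")
```

Both functions spend nothing at t = 0 and the full alpha at t = 1; the O'Brien-Fleming-type function spends almost nothing early, while the Pocock-type function spends alpha at a more even rate.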

    24. Approximate O’Brien-Fleming Boundaries Using Lan-DeMets Spending Function Approach: Overall Significance = 0.05 and 4 Looks (table columns: interim analysis; O’Brien-Fleming; OBF-type Lan-DeMets)

    25. Usual Choices for Information: Planned number of events in an event-driven trial; follow-up time; calendar time

    26. Beta-Blocker Heart Attack Trial (BHAT): Placebo-controlled trial of propranolol in patients with a recent MI. Recruitment began in June 1978; planned termination was June 1982; average follow-up of 3 years and maximum of 4. Primary endpoint: all-cause mortality. Event target: 629 deaths. Stopped early in October 1981

    27. Interim Monitoring of BHAT Study
        Look   Date       Months of study (fraction of 48)   Deaths (fraction of 629)   Z
        1      May 1979   11 (.23)                           56 (.09)                   1.68
        2      Oct 1979   16 (.33)                           77 (.12)                   2.24
        3      Mar 1980   21 (.44)                           126 (.20)                  2.37
        4      Oct 1980   28 (.58)                           177 (.28)                  2.30
        5      Apr 1981   34 (.71)                           247 (.39)                  2.34
        6      Oct 1981   40 (.83)                           318 (.51)                  2.82
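The bracketed fractions in the BHAT monitoring data can be reproduced with a few lines of arithmetic. This sketch assumes the first bracketed value is months elapsed out of the 48 planned (June 1978 to June 1982) and the second is deaths out of the 629 targeted; that reading is inferred from the numbers and is not stated explicitly on the slide.

```python
PLANNED_MONTHS = 48   # June 1978 to June 1982 (inferred from slide 26)
TARGET_DEATHS = 629   # event target from slide 26

# (look, date, months elapsed, cumulative deaths, reported Z)
BHAT_LOOKS = [
    (1, "May 1979", 11, 56, 1.68),
    (2, "Oct 1979", 16, 77, 2.24),
    (3, "Mar 1980", 21, 126, 2.37),
    (4, "Oct 1980", 28, 177, 2.30),
    (5, "Apr 1981", 34, 247, 2.34),
    (6, "Oct 1981", 40, 318, 2.82),
]

def information_fractions(months, deaths):
    """Return (calendar-time fraction, event fraction) for one interim look."""
    return months / PLANNED_MONTHS, deaths / TARGET_DEATHS

if __name__ == "__main__":
    for look, date, months, deaths, z in BHAT_LOOKS:
        t_cal, t_evt = information_fractions(months, deaths)
        print(f"{look} {date}: calendar {t_cal:.2f}, events {t_evt:.2f}, Z {z}")
```

The two scales diverge noticeably: at the sixth look roughly five sixths of the calendar time had elapsed but only about half of the targeted deaths had occurred, so the choice of information scale changes how much type I error has been spent.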

    28. Critical Values (z) for 2-sided Group Sequential Design with 0.05 Overall Significance and 7 Looks (BHAT) (table columns: interim analysis; OBF; Lan-DeMets (OBF) by events; Lan-DeMets (OBF) by calendar time)

    29. Flexible Number of Looks: Another advantage of the Lan-DeMets spending function approach is flexibility in the number of looks. Suppose BHAT had not been stopped and there were 3 more looks before the end (10 in total). Looks 7-10 correspond to event-based information fractions of 0.65, 0.75, 0.85 and 1.0. Stopping boundaries can be calculated conditional on the previous tests

    30. Critical Values (z) for 2-sided Group Sequential Design with 0.05 Overall Significance and 7 Looks (BHAT) (table columns: interim analysis; Lan-DeMets (OBF) with 7 looks; Lan-DeMets (OBF) with 10 looks)

    31. Suppose We Get to the 6th Analysis by a Different Route: Information fractions are .05, .20, .30, .40, and .45 instead of .09, .12, .20, .28, and .39

    32. Critical Values (z) for 2-sided Group Sequential Design with 0.05 Overall Significance and 7 Looks (BHAT) (table columns: interim analysis; Lan-DeMets (OBF) with 7 looks; Lan-DeMets (OBF) with 7 looks)

    33. Variations of the Theme
        Asymmetric boundaries (e.g., for a non-significant harmful effect of the new treatment): use the upper boundary for superiority and a less conservative boundary for harm (Z = -1.5 or -2.0), or use OBF for efficacy and Pocock for harm
        Multiple outcomes, e.g., efficacy and safety, and composites
        Multiple trials (CHARM heart failure, Cox-2 chemo-prevention)
        Futility and curtailed sampling procedures (conditional and unconditional power)
        Repeated confidence intervals (e.g., use OBF critical values to compute interim CIs)

    34. SMART Study Design

    35. SMART Guideline “…it is recommended that the DSMB consider early termination or protocol modification only when the O’Brien-Fleming boundary is crossed for the primary endpoint and the findings for the primary and the composite cardiovascular, metabolic endpoint are consistent...”

    39. Futility: The usual definition is that convincing evidence exists that the new treatment is not beneficial. If so, minimizing exposure to an ineffective treatment with potential toxicities, and saving resources, should prompt consideration of stopping the trial. What is convincing? More generally, futility can also result from a low event rate or slow enrollment (e.g., the CVD mortality outcome in the Physicians’ Health Study).

    40. Conditional Power (or Stochastic Curtailment): What is the probability of rejecting the null hypothesis (i.e., getting a significant result) given the data to date and my best guess about the future? For example, the future will look like the past; the future will show no difference; or the future will be as assumed in the design

    41. Conditional Power: Caution and Applications: The future may be hard to predict, e.g., non-proportional hazards. Conditional power can be useful to assess the likelihood of a “positive” result becoming neutral/negative, e.g., BHAT, SMART. It is most useful for futility, e.g., the likelihood of a negative or neutral trend reversing.

    42. Unconditional Power: What is the probability of rejecting the null hypothesis (i.e., getting a significant result) based on the original design assumptions for the treatment effect, but considering (a) a revised estimate of the control group event rate and (b) the duration of follow-up, accounting for the recruitment period and the minimum follow-up originally planned for each participant? Is a null result still meaningful?

    43. Guideline for HIV Early Treatment Trial (START): First consider unconditional power. If it is < 70%, consider conditional power. If conditional power is < 20%, consider stopping for futility. Rationale: Unconditional power could be low in the presence of a large treatment effect.
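The two-step guideline reads naturally as a small decision function. This is only a sketch of the stated thresholds; the function name and the returned labels are hypothetical, and a DSMB would weigh the result alongside other evidence rather than apply it mechanically.

```python
def start_futility_guideline(unconditional_power, conditional_power,
                             up_threshold=0.70, cp_threshold=0.20):
    """Apply the two-step rule: unconditional power first, then conditional power."""
    if unconditional_power >= up_threshold:
        # Unconditional power is adequate, so the futility question is not raised.
        return "continue"
    if conditional_power < cp_threshold:
        return "consider stopping for futility"
    # Low unconditional power, but the observed trend may still yield a result.
    return "continue (conditional power not low)"

if __name__ == "__main__":
    print(start_futility_guideline(0.85, 0.10))
    print(start_futility_guideline(0.60, 0.10))
    print(start_futility_guideline(0.60, 0.45))
```

Checking conditional power second reflects the stated rationale: low unconditional power alone does not indicate futility when the observed treatment effect is large.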

    44. Summary (1): Many studies require a DSMB: trials with morbidity and mortality outcomes; trials of treatments that may be associated with serious toxicities (need to have a group look at controlled comparisons); trials of novel, high-risk treatments (e.g., gene therapy); trials involving frail populations (elderly, infants)

    45. Summary (2): A DSMB can be most effective in its role of protecting the interests of patients if it is independent of the sponsor and trial investigators (peer review works!). Operating procedures should be agreed upon in advance. An informed statistician who performs interim analyses is important. To carry out interim analyses, data must be collected in a timely way. Reports should focus on comparisons of clinical outcomes and their validity

    46. Summary (3): Monitoring guidelines should be pre-specified. Guidelines need to be accompanied by common sense, a careful assessment of risks and benefits, and opinions from experts with different backgrounds. This is a fruitful area for research (e.g., multiple outcomes: safety and efficacy, multiple efficacy outcomes)

    47. Recommendation from Paul Canner, based on his experiences in the Coronary Drug Project: “…no single statistical decision rule or procedure can take the place of the well-reasoned consideration of all aspects of the data by a group of concerned, competent, and experienced persons with a wide range of scientific backgrounds and points of view.”
