Estimation of Effect Size in Trials Stopped Early

Janet Wittes, Statistics Collaborative. University of Pennsylvania Annual Conference on Statistical Issues in Clinical Trials, April 13, 2011.

Presentation Transcript


  1. Estimation of Effect Size in Trials Stopped Early • Janet Wittes • Statistics Collaborative • University of Pennsylvania Annual Conference on Statistical Issues in Clinical Trials • April 13, 2011

  2. The problem • We know how to stop trials early for benefit • Most common boundary: O’Brien-Fleming • We do not generally operate algorithmically • So “boundary” is a “guideline” • We know the observed effect overestimates truth • So…how do we estimate effect size?

  3. The problem is very hard… • So, my recommendations will not be “wrong”

  4. The problem is very hard… • So, my recommendations will not be “wrong” • But, unfortunately, they won’t be “right”

  5. The solution • Frequentist – we have many choices • Bayesian/likelihood – we have no problem • It is what it is….

  6. Some examples of early stopping • MERIT-HF • Study stopped at 2nd interim analysis • RR 0.66; 95% CI = (0.53, 0.81) • Sunitinib in pancreatic islet cell tumors • Study stopped with ½ of the patients • PFS 5.5 mo vs 11.1 mo; HR=0.4; p<0.001 • RALES – will discuss later

  7. The Bassler paper • Objective: to compare the treatment effect from truncated RCTs with that from meta-analyses of RCTs addressing the same question but not stopped early (nontruncated RCTs) and to explore factors associated with overestimates of effect

  8. The Bassler paper • Objective: to compare the treatment effect from truncated RCTs with that from meta-analyses of RCTs addressing the same question but not stopped early (nontruncated RCTs) and to explore factors associated with overestimates of effect • Conclusions: Truncated RCTs were associated with greater effect sizes than RCTs not stopped early. • This difference was independent of the presence of statistical stopping rules and was greatest in smaller studies.

  9. The responses: paper is off base • JAMA allows at most three letters!!!! • Scott Berry, Carlin, Connor • Ellenberg, DeMets, Fleming • Goodman, Don Berry, Wittes • Korn, Freidlin, Mooney

  10. Reasons: math faulty • Berry-Carlin-Connor • “paper incorporated an important scientific and logical error that led to invalid conclusions” • Goodman-Berry-Wittes • “unfortunately, their conclusions are based on faulty mathematical reasoning.”

  11. Reasons: don’t prevent early stopping • Korn-Freidlin-Mooney • “stopping a trial and releasing the information early allows current and future patients to benefit from new therapies as soon as possible.” • Ellenberg-DeMets-Fleming • “they seem to be warning against early trial termination. This is a much more complex issue on which the problem of modest upward bias of the effect estimate, readily remediable by existing methodology, should have little bearing”

  12. Frequentist approach: the p-value • No monitoring – two definitions • Smallest α for which results would be statistically significant • Probability under H0 that the test statistic is ≥ observed

  13. Definition #1: p is smallest α needed for statistical significance • Imagine we observe a z=2.94 • What is the smallest α for significance?

  15. Definition #1: p is smallest α needed for statistical significance • Imagine we observe a z=2.94 • Smallest α for significance = 0.0016

  16. Definition #2: p is the probability under H0 that the test statistic is ≥ observed • Imagine we observe a z=2.94 • Under H0, Prob(Z ≥ 2.94) = 0.0016
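
For a fixed-sample design, the two definitions can be checked numerically. The sketch below is an illustration and not part of the original slides. It assumes a one-sided test with a standard normal test statistic, and the function names are made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def p_by_tail_probability(z_obs):
    """Definition #2: probability under H0 that Z >= the observed value."""
    return 1.0 - STD_NORMAL.cdf(z_obs)


def p_by_smallest_alpha(z_obs, tol=1e-10):
    """Definition #1: smallest one-sided alpha at which z_obs is significant.

    z_obs is significant at level alpha when z_obs >= inv_cdf(1 - alpha).
    The critical value falls as alpha grows, so bisection on alpha works.
    Assumes z_obs > 0, so the answer lies below 0.5.
    """
    lo, hi = 1e-12, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if z_obs >= STD_NORMAL.inv_cdf(1.0 - mid):
            hi = mid   # significant at mid, so try a smaller alpha
        else:
            lo = mid   # not significant, so alpha must be larger
    return hi


z = 2.94
print(round(p_by_tail_probability(z), 4))  # 0.0016
print(round(p_by_smallest_alpha(z), 4))    # 0.0016
```

Both routes give the same number because, without monitoring, "significant at level α" is just a threshold on the single observed z.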

  17. So the definitions are equivalent!

  18. So the definitions are equivalent! • But this is not true in group sequential designs

  19. Group-sequential p-values • Smallest α for which results would be statistically significant • Think of a class of similar boundaries with different α • E.g. O-F boundaries with k equally spaced looks • What is the smallest α giving statistical significance? • Example: 5 planned looks
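
One way to picture a "class of similar boundaries with different α" is through an α-spending function. The sketch below assumes the Lan-DeMets O'Brien-Fleming-type spending function with a one-sided α; the slides do not say which construction was used, so this choice and the function name are assumptions. It prints the cumulative α spent at 5 equally spaced looks for two members of the family.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def obf_type_alpha_spent(t, alpha):
    """Cumulative one-sided alpha spent by information fraction t (0 < t <= 1).

    Assumed form (Lan-DeMets O'Brien-Fleming-type spending):
        alpha*(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t))
    """
    z_half = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - STD_NORMAL.cdf(z_half / sqrt(t)))


looks = [k / 5 for k in range(1, 6)]   # 5 equally spaced looks
for alpha in (0.025, 0.01):            # two members of the family
    spent = [obf_type_alpha_spent(t, alpha) for t in looks]
    print(alpha, [f"{a:.6f}" for a in spent])
```

With α = 0.025, the value at t = 2/5 comes out close to the .000395 that appears on the stagewise p slide later in the deck, which suggests (but does not prove) that a boundary of this type was used there.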

  20. Imagine we observe Z=2.94

  21. Z=2.94 → “p” = 0.002

  22. Z=2.94 → B = Z·t^(1/2) = 2.28

  23. Z=2.94 → B=2.28

  24. Z=2.94 → B=2.28 → p=0.013
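
The conversion from Z to the B-value on these slides can be reproduced in a couple of lines. The information fraction t = 3/5 is an assumption inferred from the numbers (it is the value that turns Z = 2.94 into B = 2.28 with 5 equally spaced looks); the slides do not state it.

```python
from math import sqrt


def b_value(z, t):
    """B-value at information fraction t: B(t) = Z(t) * sqrt(t)."""
    return z * sqrt(t)


z_obs = 2.94
t_obs = 3 / 5   # assumed: third of five equally spaced looks
print(round(b_value(z_obs, t_obs), 2))  # 2.28
```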

  25. But this is not exactly right… • What is the probability of all paths with B ≥ 2.28? • Answer: 0.010 • (Why? Some paths are not possible: if we had observed B(1/5) = 1.8, B(2/5) = 2.2, we would have stopped at look 2.)

  26. And something is odd… • Our exit probabilities depend on the future • But the future hasn’t happened yet • “Prediction is hard – especially about the future” • Niels Bohr (not Yogi Berra)

  27. Definition #2: p is the probability under H0 that the test statistic is ≥ observed • We have observed (t, Z(t)). • What is more extreme?

  28. Which point is more extreme?

  33. Orderings • Unadjusted: 0.0016 • B-value ordering: 0.010 • Z-value ordering: 0.003 • MLE ordering: 0.002

  34. Stagewise ordering • Earlier stopping: more compelling evidence • Two trials stopping at same time • one with larger z is more extreme • If our last observation is (t_j, z_j), the p-value is: • P_H0{stop before t_j} + • P_H0{don’t stop before t_j and Z(t_j) ≥ z_j} • Note: this does not need to consider future looks

  35. Stagewise p • P_H0{stop before t_j} + • P_H0{don’t stop before t_j and Z(t_j) ≥ z_j} = • 0.000395 + (0.999605)(0.001568) = 0.0020
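
The arithmetic on the stagewise p slide can be checked directly. In this sketch the second input is read as the probability of Z(t_j) ≥ z_j given that the trial did not stop earlier, which is how the slide combines the two numbers; the function and argument names are made up for illustration.

```python
def stagewise_p(p_stop_before, p_exceed_if_continue):
    """P{stop before t_j} + P{no earlier stop} * P{Z(t_j) >= z_j | no earlier stop}."""
    return p_stop_before + (1.0 - p_stop_before) * p_exceed_if_continue


p = stagewise_p(0.000395, 0.001568)
print(f"{p:.4f}")  # 0.0020
```

Nothing in the calculation refers to looks after t_j, which is the point of the note on the stagewise ordering slide.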

  36. Orderings • Unadjusted 0.0016 • B-value ordering 0.010 • Z-value ordering 0.003 • MLE ordering 0.002 • Stagewise 0.0020

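  37. Confidence intervals • Can use stagewise ordering for confidence limits • Find lower and upper bounds L and U such that • P_L{[T, Z(T)] ≥ [t, z(t)]} = α/2 • P_U{[T, Z(T)] ≤ [t, z(t)]} = α/2 • where [T, Z(T)] is the random stopping point, [t, z(t)] is the observed one, and ≥ is the stagewise ordering • Confidence limit may exclude the MLE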

  38. Estimates • Use Emerson & Fleming, Biometrika 1990 • See also Liu & Hall, Biometrika 1999 • These are not easy to calculate

  39. Estimates • These are not easy to calculate • Instead of x

  40. Estimates

  41. Proposal • For p-value, use stagewise ordering • For confidence intervals, back-calculate from p • Use numerical integration • Or grid search on landem • For estimate, back-calculate from CI • Note: the CI may not include naïve estimator
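
A minimal sketch of the back-calculation, under simplifying assumptions that are not in the slides: a two-look design that stopped at look 2, a drift θ defined so that B(t) = Z(t)·√t is N(θt, t) with independent increments, and made-up design values. The stagewise p-value is computed as a function of θ by numerical integration (trapezoid rule), and the limits are found by bisection. Taking the point estimate as the θ where the p-value equals 0.5 (a median-unbiased estimate) is one reading of "back-calculate", not something the slides specify.

```python
from math import erfc, exp, pi, sqrt


def norm_sf(x):
    """Upper tail of the standard normal."""
    return 0.5 * erfc(x / sqrt(2.0))


def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def stagewise_p_two_looks(theta, t1, c1, t2, z2, n_grid=2000):
    """P_theta{stop at look 1} + P_theta{continue past look 1 and Z(t2) >= z2}.

    c1 is the Z-scale boundary at look 1; z2 is the observed Z at look 2.
    """
    b1, b2 = c1 * sqrt(t1), z2 * sqrt(t2)      # B-value scale
    s1, s2 = sqrt(t1), sqrt(t2 - t1)
    p_stop = norm_sf((b1 - theta * t1) / s1)
    lo = theta * t1 - 8.0 * s1                 # truncate the far lower tail
    if lo >= b1:                               # continuation region negligible
        return p_stop
    h = (b1 - lo) / n_grid
    total = 0.0
    for i in range(n_grid + 1):                # trapezoid rule over B(t1) = x
        x = lo + i * h
        weight = 0.5 if i in (0, n_grid) else 1.0
        density = norm_pdf((x - theta * t1) / s1) / s1
        tail = norm_sf((b2 - x - theta * (t2 - t1)) / s2)
        total += weight * density * tail
    return p_stop + h * total


def solve_theta(target, t1, c1, t2, z2, lo=-10.0, hi=20.0, tol=1e-6):
    """Bisection: the stagewise p-value increases with theta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if stagewise_p_two_looks(mid, t1, c1, t2, z2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical design: looks at t = 0.5 and 1.0, look-1 boundary Z = 2.80,
# observed Z = 2.20 at the final look.
DESIGN = (0.5, 2.80, 1.0, 2.20)
alpha = 0.05
lower = solve_theta(alpha / 2.0, *DESIGN)
estimate = solve_theta(0.5, *DESIGN)
upper = solve_theta(1.0 - alpha / 2.0, *DESIGN)
print(f"p at theta=0: {stagewise_p_two_looks(0.0, *DESIGN):.4f}")
print(f"theta: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

A grid search over θ would serve the same purpose as the bisection; with more than two looks the single integral becomes a recursive one, which is where dedicated group-sequential software is normally used.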

  42. Example: RALES • Spironolactone • Class III and IV heart failure • Primary outcome: total mortality (α=0.04) • O-F boundaries

  43. Results

  48. Summary • For protocols that include monitoring rules • I suggest that we add how we will calculate • p-values • Confidence intervals • Estimates • My recommendation: use stagewise ordering
