
Biostatistics in Practice

Biostatistics in Practice. Session 6: Data and Analyses: Too Little or Too Much. Peter D. Christenson, Biostatistician. http://research.LABioMed.org/Biostat

Presentation Transcript


  1. Biostatistics in Practice Session 6: Data and Analyses: Too Little or Too Much Peter D. Christenson Biostatistician http://research.LABioMed.org/Biostat

  2. Too Little or Too Much: Data • Too Little • Too few subjects: study not sufficiently powered (Session 4) • A biasing characteristic not measured: attributability of effects questionable (Session 5) • Subjects do not complete study, or do not comply, e.g., take all doses (This session) • “Too Much” • All subjects, not a sample (This session) • Irrelevant detectability (This session)

  3. Too Little or Too Much: Analyses • Too Few: Miss an Effect • Too Many: Spurious Results • Numerous analyses due to: • Multiple possible outcomes. • Ongoing analyses as more subjects accrue. • Many potential subgroups.

  4. Non-Completing or Non-Complying Subjects

  5. All Study Subjects or “Appropriate” Subset What is the most relevant group of studied subjects: all randomized, or mostly compliant, or completed study, or …?

  6. Criteria for Appropriate Subset • Study Goal: Scientific effect? Societal impact? (Primarily Compliance) • Potential Biased Conclusions: Why not completed? Study arms equivalent? (Primarily Dropout)

  7. Possible Study Populations • Per-Protocol Subjects: Had all measurements, visits, doses, etc. “Modified”: relaxations, e.g., 85% of doses. Emphasis on scientific effect. • Intention-to-Treat Subjects: Everyone who was randomized. “Modified”: slight relaxations, e.g., ≥ 1 dose. Emphasis on non-biased policy conclusion.

  8. Possible Bias Using Only Completers Comparison: % cured, placebo vs. treated. Many more placebo subjects are not curing and go elsewhere; do not complete study. Cure rate is biased upward in placebo completers. Conclude treatment not as good as it really is. Other scenarios?

  9. Intention-to-Treat (ITT) ITT specifies the population; it includes non-completers. Still need to define outcomes for non-completers, i.e., “impute” values. Example from last slide: Typical to define non-completers as not cured.
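The direction of the completers-only bias on slides 8 and 9 can be seen with a small arithmetic sketch. The counts below are hypothetical, chosen only for illustration, and the helper `cure_rates` is not from the slides. It assumes every cured subject completed the study, so non-completers are imputed as not cured under ITT.

```python
# Hypothetical counts, chosen only to illustrate the direction of the bias.
def cure_rates(randomized, completed, cured):
    """Return (completers-only cure rate, ITT cure rate).

    ITT here counts every randomized subject and treats
    non-completers as not cured.
    """
    completers_rate = cured / completed
    itt_rate = cured / randomized
    return completers_rate, itt_rate

# Many uncured placebo subjects leave the study; few treated subjects do.
placebo = cure_rates(randomized=100, completed=60, cured=30)
treated = cure_rates(randomized=100, completed=90, cured=50)

print(f"Completers only: placebo {placebo[0]:.1%}, treated {treated[0]:.1%}")
print(f"ITT:             placebo {placebo[1]:.1%}, treated {treated[1]:.1%}")
```

With these made-up counts, the treatment advantage among completers is much smaller than the advantage among all randomized subjects, which is the scenario slide 8 describes.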

  10. ITT: Two Ways to Impute Unknown Values • LOCF (observations): Ignore Presumed Progression • LRCF (ranks): Maintain Expected Relative Progression [Figure: two panels, one per method, showing change from baseline (0 reference line) for individual subjects at baseline, intermediate visit, and final visit.]
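As a minimal sketch of the LOCF idea on slide 10, the function below fills each missing visit with the subject's most recent observed value. Representing a missed visit as `None` and the name `locf` are assumptions made for this illustration. The rank-based LRCF variant is not shown.

```python
from typing import List, Optional

def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over missing (None) visits.

    Visits before the first observation stay None, since there is
    nothing to carry forward yet.
    """
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# One subject's change from baseline at baseline, intermediate, final visit;
# the final visit was missed.
print(locf([0.0, -2.0, None]))  # [0.0, -2.0, -2.0]
```

Because the intermediate value is simply repeated, any progression that would have occurred between the intermediate and final visits is ignored, which is the limitation the slide points out.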

  11. “Too Much” Data

  12. All Possible Data, No Sample “Too much” data to need probabilistic statements; already have the whole truth. Not always as obvious as it sounds. Examples: EMT records, some chart reviews; site-specific, not samples. Confidence intervals usually irrelevant. Reference ranges, some non-generalizable comparisons may be valid.

  13. Irrelevant (?) Detectability with Large Study Significant differences (p<0.05) in %s between placebo and treatment groups:

      N/Group    Difference        #Treated* to Cure 1
      100        50% vs. 63.7%     7
      1000       50% vs. 54.4%     23
      5000       50% vs. 52.0%     50
      10000      50% vs. 51.4%     71
      50000      50% vs. 50.6%     167

      *NNT = Number Needed to Treat = 100/Δ
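The NNT column on slide 13 follows directly from NNT = 100/Δ, with Δ taken as the difference in cure percentages. The sketch below recomputes it from the slide's percentages. The names `rows` and `nnt` are introduced here for illustration, and rounding to the nearest whole subject is an assumption that happens to reproduce the slide's values.

```python
# (N per group, treated cure %) pairs from the slide; placebo is 50% throughout.
rows = [(100, 63.7), (1000, 54.4), (5000, 52.0), (10000, 51.4), (50000, 50.6)]

def nnt(control_pct, treated_pct):
    """Number needed to treat = 100 / (difference in percentage points)."""
    return 100.0 / (treated_pct - control_pct)

nnts = [round(nnt(50.0, treated_pct)) for _, treated_pct in rows]

for (n_per_group, treated_pct), value in zip(rows, nnts):
    print(f"N/group {n_per_group:>6}: 50% vs. {treated_pct}%  NNT = {value}")
```

The larger the study, the smaller the difference it can flag as significant, and the more subjects must be treated to cure one additional person.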

  14. Too Little or Too Much: Analyses

  15. Too Little or Too Much: Analyses Multiple: Outcomes Subgroups Ongoing effects Exploring vs. Proving

  16. Multiple Outcomes • Balance Between Missing an Effect and Spurious Results • Food Additives and Hyperactivity Study: • Uses composite score. • Many other indicators of hyperactivity.

  17. Multiple Outcomes GHA: Global Hyperactivity Aggregate, built from: • Parent ADHD: 10 Items • Teacher ADHD: 10 Items • Class ADHD: 12 Items • Conner: 4 Items. Could perform: 10 + 10 + 12 + 4 = 36 item analyses.

  18. Multiple Subgroup Analyses: Example Editorial: pp. 1667-69

  19. Multiple Subgroup Analyses: Example Comparing Two Treatments in 25 Subgroups + Overall

  20. Multiple Subgroup Analyses Lagakos, NEJM 354(16):1667-1669. False Positive Conclusions: 72% chance of claiming at least one false effect with 25 comparisons (see next slide).

  21. A Correction for Multiple Analyses No Correction: If using p<0.05, then P[correct neg conclusion] = 0.95. If 25 comparisons are independent, P[no false pos] = P[all correct neg] = $(1-0.05)^{25} = (0.95)^{25} = 0.28$. So, P[at least 1 false pos] = 1 - 0.28 = 0.72. Bonferroni Correction: To maintain P[no false pos in k tests] = 0.95 = $(1-p^*)^k$, need to use $p^* = 1 - (0.95)^{1/k} \approx 0.05/k$. So, use p<0.05/k to maintain <5% overall false positive rate.
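The arithmetic on slide 21 can be checked numerically. The sketch below assumes independent comparisons, as the slide does. The function names are chosen here for illustration and are not from any particular statistics library.

```python
def familywise_error(k, alpha=0.05):
    """P[at least one false positive] in k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k

def exact_threshold(k, overall=0.05):
    """Per-test p* solving (1 - p*)^k = 1 - overall."""
    return 1.0 - (1.0 - overall) ** (1.0 / k)

def bonferroni_threshold(k, overall=0.05):
    """Bonferroni approximation to the per-test threshold: overall / k."""
    return overall / k

k = 25
print(f"Uncorrected chance of >=1 false positive: {familywise_error(k):.2f}")
print(f"Exact per-test threshold:      {exact_threshold(k):.5f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(k):.5f}")
```

For k = 25 the two thresholds agree to about four decimal places, and the Bonferroni value is the slightly smaller (more conservative) of the two.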

  22. Accounting for Multiple Analyses • Some formal corrections “built-in” to p-values: • Bonferroni: general purpose • Tukey: for pairs of group means, >2 groups • Dunnett: for means of 1 control group vs. each of ≥2 treatment groups • Formal corrections not necessary: • Transparency of what was done is most important. • Should be aware yourself of number of analyses and report it with any conclusions.

  23. Reporting Multiple Analyses Clopidogrel paper 4 slides back: No p-values or probabilistic conclusions for 25 subgroups, and: Another paper’s transparency: Cohan, Crit Care Med 33(10):2358-2366.

  24. Multiple Mid-Study Analyses Should effects be monitored as more and more subjects complete? • Some mid-study analyses: • Interim analyses • Study size re-evaluation • Feasibility analyses

  25. Mid-Study Analyses [Figure: estimated effect (0 reference line) plotted against time / number of subjects enrolled, annotated “Too many analyses” and “Wrong early conclusion”.] Need to monitor, but also account for many analyses.

  26. Mid-Study Analyses • Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable, and can invalidate final comparisons. • Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early due to very dramatic effects, and final comparisons, if study continues, are adjusted to validly account for “peeking”. Continued …

  27. Mid-Study Analyses • Mid-study reassessment of study size is advised for long studies. Only standard deviations to date, not effects themselves, are used to assess original design assumptions. • Feasibility analysis: • may use the assessment noted above to decide whether to continue the study. • may measure effects, like interim analyses, by unmasked advisors, to project ahead on the likelihood of finding effects at the planned end of study. Continued …

  28. Mid-Study Analyses Examples: Studies at Harbor Randomized; not masked; data available to PI. Compared treatment groups repeatedly, as more subjects were enrolled. Study 1: Groups do not differ; plan to add more subjects. Consequence → final p-value not valid; probability requires no prior knowledge of effect. Study 2: Groups differ significantly; plan to stop study. Consequence → use of this p-value not valid; the probability requires incorporating later comparison.
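Why repeated unplanned comparisons invalidate the p-value can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the Harbor studies. It assumes no true treatment effect, normally distributed outcomes with known SD of 1, a z-test at each look, and hypothetical settings of 200 subjects per group with a comparison after every 20.

```python
import math
import random

def simulate(n_trials=1000, max_n=200, look_every=20, z_crit=1.96, seed=0):
    """Return (rate of p<0.05 at ANY look, rate of p<0.05 at the final look only)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_hits = 0
    for _ in range(n_trials):
        diff_sum = 0.0          # running sum of (treated - control) outcomes
        hit_at_any_look = False
        z = 0.0
        for n in range(1, max_n + 1):
            # No true effect: both groups are drawn from the same distribution.
            diff_sum += rng.gauss(0, 1) - rng.gauss(0, 1)
            if n % look_every == 0:
                # z statistic for a difference in means, known SD = 1 per group.
                z = (diff_sum / n) / math.sqrt(2.0 / n)
                if abs(z) > z_crit:
                    hit_at_any_look = True
        any_look_hits += hit_at_any_look
        final_hits += abs(z) > z_crit   # z now holds the final-look statistic
    return any_look_hits / n_trials, final_hits / n_trials

peek_rate, final_rate = simulate()
print(f"False positive rate, stopping at first significant look: {peek_rate:.3f}")
print(f"False positive rate, single final analysis:              {final_rate:.3f}")
```

With ten looks, the chance of seeing at least one "significant" difference when none exists is well above the nominal 5%, while the single planned final analysis stays near 5%. Planned interim analyses adjust their thresholds to account for this.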

  29. Conclusions: Bad Science That Seems So Good • Re-examining data, or using many outcomes, which seems to be due diligence. • Adding subjects to a study that is showing marginal effects; stopping early due to strong results. • Looking for effects in many subgroups. • Actually bad? Could be negligent NOT to do these, but need to account for doing them.

  30. Course Over? Already? Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research
