Why some/many (all?) published clinical trials are false
John P.A. Ioannidis, Professor and Chairman, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece; Professor of Medicine (adjunct), Tufts University School of Medicine, Boston, USA
Why might research findings not be credible? • There is bias • There is random error • Usually there is plenty of both
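The distinction matters because the two behave differently as evidence accumulates: random error shrinks when many studies are averaged, whereas bias does not. A minimal simulation sketch (illustrative numbers assumed, not taken from the slides):

```python
import random

# Illustrative sketch (assumed numbers): a true treatment effect of 0.20,
# estimated by many studies that each add random error (sampling noise)
# plus a constant bias (e.g. from design flaws or selective reporting).
random.seed(1)
true_effect, bias, n_studies = 0.20, 0.10, 1000

estimates = [true_effect + bias + random.gauss(0, 0.15) for _ in range(n_studies)]
mean_estimate = sum(estimates) / n_studies

# Averaging many studies shrinks the random error but leaves the bias intact.
print(f"true effect = {true_effect}, average of {n_studies} estimates = {mean_estimate:.3f}")
```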
Discrepancies over time occur in randomized trials Ioannidis and Lau, PNAS 2001
Diminishing effects are common in clinical medicine • Across 100 meta-analyses of mental health-related interventions, effect sizes for pharmacotherapies were far more likely to diminish than to increase as newer trials appeared Trikalinos et al. J Clin Epidemiol 2004
Highly-cited contradicted findings in early randomized trials • Vitamin E and cardiovascular mortality (two large prospective cohorts, but also one large trial of 2,002 subjects claimed large decreases in mortality) • Hormone replacement therapy and coronary artery disease (major benefits claimed by the Nurses’ Health Study, but also by small trials) • A well-conducted randomized trial suggested that the monoclonal antibody HA-1A halved mortality from gram-negative sepsis; no effect was seen in a 10-times larger RCT
Overall credibility • Depends on the pre-evidence odds • Depends on the data (the study at hand) • Depends on bias • Depends on the field • All of these may depend on each other
Illustrative PPV for clinical research designs Ioannidis. Why most published research findings are false. PLoS Medicine 2005
Post-study odds of a true finding are small • When effect sizes are small • When studies are small • When fields are “hot” (many teams compete fiercely in them) • When there is strong interest in the results • When databases are large • When analyses are more flexible Ioannidis JP. PLoS Medicine 2005
A research finding cannot reach credibility over 50% unless u < R, i.e., bias must be less than the pre-study odds
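These statements follow from the PPV framework of Ioannidis, PLoS Medicine 2005, where credibility with bias u, pre-study odds R, type I error α and type II error β is PPV = ((1−β)R + uβR) / (R + α − βR + u − uα + uβR). A short sketch of that formula (parameter values chosen for illustration only, not the paper's published table):

```python
# Sketch of the PPV-with-bias formula (Ioannidis, PLoS Medicine 2005).
# R: pre-study odds, alpha: type I error, beta: type II error, u: bias.
def ppv(R, alpha, beta, u):
    true_positives = (1 - beta) * R + u * beta * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)

# Parameter values chosen for illustration only:
print(ppv(R=0.5, alpha=0.05, beta=0.2, u=0.10))  # ~0.74: little bias
print(ppv(R=0.5, alpha=0.05, beta=0.2, u=0.60))  # ~0.43: heavy bias
print(ppv(R=0.2, alpha=1e-9, beta=0.0, u=0.20))  # ~0.50: perfect power cannot rescue u >= R
```

In the best case (perfect power, vanishing α) the PPV tends to R / (R + u), which exceeds 50% only when the bias u stays below the pre-study odds R.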
“Quality” of studies • Early empirical evaluations suggested that effect sizes may depend on aggregate quality scores; this has been dismissed, since there are so many quality scores that inferences differ widely depending on which score is used • Other empirical evaluations suggested that specific quality items such as lack of blinding and lack of allocation concealment in RCTs may inflate treatment effects (e.g. Schulz et al. JAMA 1995) • Now it seems more likely that such quality deficits may be associated either with inflated or with deflated treatment effects (e.g. Balk et al. JAMA 2002)
“Averaging” quality is wrong • A randomized trial with one major flaw may get the wrong answer • A randomized trial with two major flaws may get an even more wrong answer or may paradoxically get a somewhat more correct answer • Flaws do not cancel out, of course, and they may even have multiplicative detrimental effects
The two kinds of bad quality • Quality is bad on (evil) purpose = the effect sizes are almost always inflated • Quality is bad because of stupidity = the effect sizes may be anything; usually, but not always, they are deflated
Potential conflicts Patsopoulos et al. BMJ 2006
Ioannidis PLoS Clinical Trials 2006 and Clinical Trials 2007
Spurious claims of subgroups Rothwell P. Lancet 2005
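One reason subgroup claims are so often spurious is simple multiplicity: each additional subgroup tested at the conventional α = 0.05 adds another chance of a false-positive "effect". A back-of-the-envelope sketch (illustrative, not Rothwell's data):

```python
# Back-of-the-envelope sketch (illustrative): with no true subgroup differences,
# testing k subgroups independently at alpha = 0.05 gives a high chance of at
# least one spuriously "significant" subgroup.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} subgroups -> P(at least one false positive) = {p_any_false_positive:.2f}")
```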
Time lag: bad news takes longer to appear Ioannidis JP. JAMA 1998
Trial registration • Upfront study registration has been adopted for randomized clinical trials as a means of minimizing publication and reporting biases and maximizing transparency • This is an extremely important step forward • Still, many trials are not registered, and among those that are registered there is room for eventual selective reporting of outcomes and analyses • Even with transparent and complete reporting, there is room for biases that act before the level of study design
Biases that precede the study design • Setting the wider research agenda • Poor scientific relevance • Poor clinical utility • Poor consideration of prior evidence • Non-consideration of prior evidence • Biased consideration of prior evidence • Consideration of biased prior evidence • Setting the specific research agenda • Straw man effects • Avoidance of head-to-head comparisons • Head-to-head comparisons bypassing demonstration of effectiveness • Overpowered studies • Unilateral aims • Benefits versus harms • Research as bulk advertisement • Ghost management of the literature
Inflated effects with early stopping Pocock et al. Clinical Trials 1989
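The mechanism is selection on noise: a trial is more likely to cross a stopping boundary at an interim look when random error happens to push the estimate upward, so effects reported at early stopping are inflated on average. A Monte Carlo sketch with hypothetical numbers (not a reanalysis of Pocock et al.):

```python
import random
import statistics

# Monte Carlo sketch (hypothetical numbers): trials with a modest true effect
# are stopped at an interim look whenever the interim estimate already looks
# "significant" (z > 1.96). Effects reported at early stopping are inflated.
random.seed(2)
true_effect, interim_se, n_trials = 0.10, 0.10, 20000

early_stopped = []
for _ in range(n_trials):
    interim_estimate = random.gauss(true_effect, interim_se)
    if interim_estimate / interim_se > 1.96:  # naive, non-adjusted stopping rule
        early_stopped.append(interim_estimate)

print(f"true effect: {true_effect}")
print(f"mean reported effect when stopped early: {statistics.mean(early_stopped):.3f}")
print(f"proportion of trials stopped early: {len(early_stopped) / n_trials:.2%}")
```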
Biases after study completion • Interpretation biases for the single study • Bias related to metric selection • OR vs. RR • Absolute versus relative effects • P-values versus effect sizes • Selective discussion of results • Selective invocation of external evidence • Silencing of limitations • Inappropriate generalization • Interpretation biases in the wider scientific field • Publication bias • Time lag bias • Selective outcome and analysis reporting bias • Bias related to metrics of effect • Ghost management of the literature • Scientific citation bias • Skewed public dissemination • Resistance to independent replication
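To see how much metric selection alone can change the message of the same data, consider a worked example with invented numbers:

```python
# Worked example with invented numbers: the same trial result reported with
# different metrics leaves very different impressions.
def metrics(risk_control, risk_treatment):
    rr = risk_treatment / risk_control                      # relative risk
    odds = lambda p: p / (1 - p)
    odds_ratio = odds(risk_treatment) / odds(risk_control)  # odds ratio
    arr = risk_control - risk_treatment                     # absolute risk reduction
    return rr, odds_ratio, arr, 1 / arr                     # last item: number needed to treat

# Common outcome (40% -> 30%): RR 0.75, OR ~0.64 (looks stronger), NNT 10.
print(metrics(0.40, 0.30))
# Rare outcome (0.4% -> 0.3%): the same RR 0.75, but NNT 1000.
print(metrics(0.004, 0.003))
```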
Correct, but unilateral = false evidence: neglecting harms • Among 375,143 entries in the Cochrane Central Register of Controlled Trials, the search terms harm OR harms yielded 337 references • Compare: 55,374 references retrieved using efficacy and 23,415 retrieved using safety • Of the 337, excluding several articles on self-harm or harm reduction (an efficacy-equivalent term), only 3 trial reports and 2 abstracts had these words in their titles • Of the 3 trial reports, one started with the clause “more good than harm” • The other two actually focused on the harms of the placebo arm
Harms • An intervention is usually considered safe unless proven otherwise • It may be more appropriate to consider an intervention potentially harmful until proven otherwise
Reporting of harms in RCTs is neglected • The space allocated to harms in the Results section is typically the same as or smaller than the space allocated to the author names and affiliations • Ioannidis and Lau, JAMA 2001
Emphasis on harms is often further limited… • When no dose comparison is involved • When a trial appears in a high-impact factor journal • When there is a prior indication for the intervention • When the trial shows significant results for efficacy
Reporting of harms is worse for non-pharmacological (NP) than for pharmacological interventions
Concluding comments • Randomized controlled trials are a brilliant, simple design with a solid history of successful use in clinical research • They can offer extremely useful evidence and they are a must for documenting the effectiveness of proposed interventions • This does not mean that they cannot suffer from important biases • Caveat lector