Why some/many (all?) published clinical trials are false
John P.A. Ioannidis, Professor and Chairman, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece; Professor of Medicine (adjunct), Tufts University School of Medicine, Boston, USA
Why might research findings not be credible? • There is bias • There is random error • Usually there is plenty of both
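The distinction matters because the two behave differently as evidence accumulates: random error shrinks when many studies are averaged, whereas bias does not. A minimal simulation sketch (illustrative numbers assumed, not taken from the slides):

```python
import random

# Illustrative sketch (assumed numbers): a true treatment effect of 0.20,
# estimated by many studies that each add random error (sampling noise)
# plus a constant bias (e.g. from design flaws or selective reporting).
random.seed(1)
true_effect, bias, n_studies = 0.20, 0.10, 1000

estimates = [true_effect + bias + random.gauss(0, 0.15) for _ in range(n_studies)]
mean_estimate = sum(estimates) / n_studies

# Averaging many studies shrinks the random error but leaves the bias intact.
print(f"true effect = {true_effect}, average of {n_studies} estimates = {mean_estimate:.3f}")
```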
Discrepancies over time occur in randomized trials Ioannidis and Lau, PNAS 2001
Diminishing effects are common in clinical medicine • Across 100 meta-analyses of mental health-related interventions, effect sizes for pharmacotherapies were far more likely to diminish than to increase as newer trials appeared Trikalinos et al. J Clin Epidemiol 2004
Highly-cited contradicted findings in early randomized trials • Vitamin E and cardiovascular mortality (two large prospective cohorts, but also one large trial of 2,002 subjects claimed large decreases in mortality) • Hormone replacement therapy and coronary artery disease (major benefits claimed by the Nurses’ Health Study, but also by small trials) • A well-conducted randomized trial suggested that the monoclonal antibody HA-1A halved mortality from gram-negative sepsis; no effect was seen in a 10-times larger RCT
Overall credibility • Depends on the pre-evidence odds • Depends on the data (the study at hand) • Depends on bias • Depends on the field • All of these may depend on each other
Illustrative PPV for clinical research designs Ioannidis. Why most published research findings are false. PLoS Medicine 2005
Post-study odds of a true finding are small • When effect sizes are small • When studies are small • When fields are “hot” (many teams compete fiercely in them) • When there is strong interest in the results • When databases are large • When analyses are more flexible Ioannidis JP. PLoS Medicine 2005
A research finding cannot reach credibility over 50% unless u < R, i.e., bias must be less than the pre-study odds
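These statements follow from the PPV framework of Ioannidis, PLoS Medicine 2005, where credibility with bias u, pre-study odds R, type I error α and type II error β is PPV = ((1−β)R + uβR) / (R + α − βR + u − uα + uβR). A short sketch of that formula (parameter values chosen for illustration only, not the paper's published table):

```python
# Sketch of the PPV-with-bias formula (Ioannidis, PLoS Medicine 2005).
# R: pre-study odds, alpha: type I error, beta: type II error, u: bias.
def ppv(R, alpha, beta, u):
    true_positives = (1 - beta) * R + u * beta * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)

# Parameter values chosen for illustration only:
print(ppv(R=0.5, alpha=0.05, beta=0.2, u=0.10))  # ~0.74: little bias
print(ppv(R=0.5, alpha=0.05, beta=0.2, u=0.60))  # ~0.43: heavy bias
print(ppv(R=0.2, alpha=1e-9, beta=0.0, u=0.20))  # ~0.50: perfect power cannot rescue u >= R
```

In the best case (perfect power, vanishing α) the PPV tends to R / (R + u), which exceeds 50% only when the bias u stays below the pre-study odds R.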
“Quality” of studies • Early empirical evaluations suggested that effect sizes may depend on aggregate quality scores; this has been dismissed, since there are so many quality scores that inferences differ widely depending on which score is used • Other empirical evaluations suggested that specific quality items such as lack of blinding and lack of allocation concealment in RCTs may inflate treatment effects (e.g. Schulz et al. JAMA 1995) • Now it seems more likely that such quality deficits may be associated either with inflated or with deflated treatment effects (e.g. Balk et al. JAMA 2002)
“Averaging” quality is wrong • A randomized trial with one major flaw may get the wrong answer • A randomized trial with two major flaws may get an even more wrong answer or may paradoxically get a somewhat more correct answer • Flaws do not cancel out, of course, and they may even have multiplicative detrimental effects
The two kinds of bad quality • Quality is bad on (evil) purpose = the effect sizes are almost always inflated • Quality is bad because of stupidity = the effect sizes may be anything; usually, but not always, they are deflated
Potential conflicts Patsopoulos et al. BMJ 2006
Ioannidis PLoS Clinical Trials 2006 and Clinical Trials 2007
Spurious claims of subgroups Rothwell P. Lancet 2005
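One reason subgroup claims are so often spurious is simple multiplicity: each additional subgroup tested at the conventional α = 0.05 adds another chance of a false-positive "effect". A back-of-the-envelope sketch (illustrative, not Rothwell's data):

```python
# Back-of-the-envelope sketch (illustrative): with no true subgroup differences,
# testing k subgroups independently at alpha = 0.05 gives a high chance of at
# least one spuriously "significant" subgroup.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} subgroups -> P(at least one false positive) = {p_any_false_positive:.2f}")
```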
Time lag: bad news takes longer to appear Ioannidis JP. JAMA 1998
Trial registration • Upfront study registration has been adopted for randomized clinical trials as a means of minimizing publication and reporting biases and maximizing transparency • This is an extremely important step forward • Still, many trials are not registered, and among those that are registered there is room for eventual selective reporting of outcomes and analyses • Even with transparent and complete reporting, there is room for biases that act before the level of study design
Biases that precede the study design • Setting the wider research agenda • Poor scientific relevance • Poor clinical utility • Poor consideration of prior evidence • Non-consideration of prior evidence • Biased consideration of prior evidence • Consideration of biased prior evidence • Setting the specific research agenda • Straw man effects • Avoidance of head-to-head comparisons • Head-to-head comparisons bypassing demonstration of effectiveness • Overpowered studies • Unilateral aims • Benefits versus harms • Research as bulk advertisement • Ghost management of the literature
Inflated effects with early stopping Pocock et al. Clinical Trials 1989
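The mechanism is selection on noise: a trial is more likely to cross a stopping boundary at an interim look when random error happens to push the estimate upward, so effects reported at early stopping are inflated on average. A Monte Carlo sketch with hypothetical numbers (not a reanalysis of Pocock et al.):

```python
import random
import statistics

# Monte Carlo sketch (hypothetical numbers): trials with a modest true effect
# are stopped at an interim look whenever the interim estimate already looks
# "significant" (z > 1.96). Effects reported at early stopping are inflated.
random.seed(2)
true_effect, interim_se, n_trials = 0.10, 0.10, 20000

early_stopped = []
for _ in range(n_trials):
    interim_estimate = random.gauss(true_effect, interim_se)
    if interim_estimate / interim_se > 1.96:  # naive, non-adjusted stopping rule
        early_stopped.append(interim_estimate)

print(f"true effect: {true_effect}")
print(f"mean reported effect when stopped early: {statistics.mean(early_stopped):.3f}")
print(f"proportion of trials stopped early: {len(early_stopped) / n_trials:.2%}")
```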
Biases after study completion • Interpretation biases for the single study • Bias related to metric selection • OR vs. RR • Absolute versus relative effects • P-values versus effect sizes • Selective discussion of results • Selective invocation of external evidence • Silencing of limitations • Inappropriate generalization • Interpretation biases in the wider scientific field • Publication bias • Time lag bias • Selective outcome and analysis reporting bias • Bias related to metrics of effect • Ghost management of the literature • Scientific citation bias • Skewed public dissemination • Resistance to independent replication
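To see how much metric selection alone can change the message of the same data, consider a worked example with invented numbers:

```python
# Worked example with invented numbers: the same trial result reported with
# different metrics leaves very different impressions.
def metrics(risk_control, risk_treatment):
    rr = risk_treatment / risk_control                      # relative risk
    odds = lambda p: p / (1 - p)
    odds_ratio = odds(risk_treatment) / odds(risk_control)  # odds ratio
    arr = risk_control - risk_treatment                     # absolute risk reduction
    return rr, odds_ratio, arr, 1 / arr                     # last item: number needed to treat

# Common outcome (40% -> 30%): RR 0.75, OR ~0.64 (looks stronger), NNT 10.
print(metrics(0.40, 0.30))
# Rare outcome (0.4% -> 0.3%): the same RR 0.75, but NNT 1000.
print(metrics(0.004, 0.003))
```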
Correct, but unilateral = false evidence: neglecting harms • Among 375,143 entries in the Cochrane Central Register of Controlled Trials, the search terms harm OR harms yielded 337 references • Compare: 55,374 references retrieved using efficacy and 23,415 retrieved using safety • Of the 337, excluding several articles on self-harm or harm reduction (an efficacy-equivalent term), only 3 trial reports and 2 abstracts had these words in their titles • Of the 3 trial reports, one started with the clause “more good than harm” • The other two actually focused on the harms of the placebo arm
Harms • An intervention is usually considered safe unless proven otherwise • It may be more appropriate to consider an intervention potentially harmful until proven otherwise
Reporting of harms in RCTs is neglected • The space allocated to harms in the Results section is typically the same as or smaller than the space allocated to the author names and affiliations • Ioannidis and Lau, JAMA 2001
Emphasis on harms is often further limited… • When no dose comparison is involved • When a trial appears in a high-impact factor journal • When there is a prior indication for the intervention • When the trial shows significant results for efficacy
Reporting of harms is worse for non-pharmacological (NP) than for pharmacological interventions
Concluding comments • Randomized controlled trials are a brilliant, simple design with a solid history of successful use in clinical research • They can offer extremely useful evidence and they are a must for documenting the effectiveness of proposed interventions • This does not mean that they cannot suffer from important biases • Caveat lector