Randomisation: necessary but not sufficient Doug Altman Centre for Statistics in Medicine University of Oxford
Randomisation is not enough
• The aim of an RCT is to compare groups equivalent in all respects other than the treatment itself
• Randomisation can only produce groups that are comparable at the start of a study
• Other aspects of good trial design are required to retain comparability to the end
• Randomised trials are conceptually simple but
  • easy to do badly
  • hard to do well
“Clinical trials are only as strong as the weakest elements of design, execution and analysis” [Hart HG. NEJM 1992]
Randomised trials
An incomplete compendium of errors:
• Design
• Analysis
• Interpretation
• Selective publication
• Reporting
• Implications
Trial was not really randomised Pediatrics 2009;123:e661-7
The study population comprised children attending the second and third grades of elementary schools in deprived neighborhoods of 2 neighboring cities, namely, Dortmund and Essen, Germany … Schools in Dortmund represented the intervention group (IG) and schools in Essen the control group (CG). For each city, 20 schools were selected randomly (Fig 1).
Improper randomisation “Randomization was alternated every 10 patients, such that the first 10 patients were assigned to early atropine and the next 10 to the regular protocol, etc. To avoid possible bias, the last 10 were also assigned to early atropine.” [Lessick et al, Eur J Echocardiography 2000]
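For contrast, here is a minimal sketch of permuted-block randomisation, a standard scheme that keeps the arms balanced without the predictability of strict alternation. The function and parameters are illustrative only, not taken from the trial quoted above.

```python
# Minimal sketch of permuted-block randomisation (illustrative, not from
# the trial above): balanced like alternation, but unpredictable.
import random

def permuted_block_allocation(n_patients, block_size=4, arms=("A", "B"), seed=None):
    """Allocate patients in random permuted blocks so group sizes stay
    balanced, but the next assignment cannot be deduced from the last."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]

print(permuted_block_allocation(10, seed=42))
```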
Inadequate blinding “… the patients were randomly assigned to prophylaxis or nonprophylaxis groups according to hospital number. Both the physician and the nurse technician were blind as to which assignment the patient received. Patients in group A received nitrofurantoin 50 mg four times and phenazopyridine hydrochloride 200 mg three times for 1 day. Patients in group B received phenazopyridine hydrochloride only. The code was broken at the completion of the study.”
Sources of bias
• Pre-randomisation
• Post-randomisation
Sample size
• The aim should be a sample size large enough to give a high probability (power) of detecting a clinically worthwhile treatment effect if it exists
• Larger trials have greater power to detect beneficial (or detrimental) effects
• Many clinical trials are far too small
  • Median 40 patients per arm in 616 trials on PubMed in 2006
• Most trials have very low power to detect clinically meaningful treatment effects (a sketch of the standard calculation follows)
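As a rough illustration, the usual normal-approximation formula for comparing two means; `delta` and `sd` below are illustrative values, not figures from any trial cited here.

```python
# Standard two-sample sample-size calculation (normal approximation).
from scipy.stats import norm

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Patients needed per arm to detect a difference `delta` between
    group means, assuming a common standard deviation `sd`."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_beta = norm.ppf(power)            # required power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

# e.g. to detect a half-standard-deviation effect with 80% power:
print(round(n_per_arm(delta=0.5, sd=1.0)))   # ~63 per arm, well above
                                             # the median of 40 noted above
```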
Analysis does not match design: switched from crossover to parallel
Analysis does not match design Higgins et al. J Am Coll Cardiol 2003
Analysis does not match design Primary end point: Progression of heart failure, defined as a composite of all-cause mortality, hospitalization for worsening HF, or ventricular tachyarrhythmias requiring device therapy
Analysis does not match design
TARGET trial, Lancet 2004
In fact this was two separate 1:1 comparisons:
• Lumiracoxib vs naproxen
• Lumiracoxib vs ibuprofen
Stender et al, Lancet 2000
What is an intention to treat analysis?
• Which patients are included in an intention to treat analysis?
• Should be all randomised patients, retained in the original groups as randomised
• Most RCTs with ‘intention to treat’ analyses have some missing data on the primary outcome variable
  • 75% of 119 RCTs – Hollis & Campbell, BMJ 1999
  • 58% of 100 RCTs – Kruse et al, J Fam Pract 2002
  • 77% of 249 RCTs – Gravel et al, Clin Trials 2007
• In reality these are ‘available case’ analyses (contrasted in the sketch below)
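A minimal sketch of the distinction, using a hypothetical data frame: a true intention-to-treat analysis keeps every randomised patient in the denominator, whereas an available-case analysis quietly drops those with missing outcomes.

```python
# Hypothetical data: ITT vs 'available case' analysis.
import pandas as pd

trial = pd.DataFrame({
    "arm":     ["A", "A", "A", "B", "B", "B"],      # as randomised
    "outcome": [1.2, None, 0.8, 0.5, 0.9, None],    # None = missing outcome
})

# Available case: silently drops patients with missing outcomes.
available_case = trial.dropna(subset=["outcome"]).groupby("arm")["outcome"].mean()

# Intention to treat keeps every randomised patient in the denominator,
# forcing an explicit decision about missing data (imputation, sensitivity
# analyses, ...) rather than ignoring them.
n_randomised = trial.groupby("arm").size()
n_analysed = trial.dropna(subset=["outcome"]).groupby("arm").size()
print(available_case)
print(n_analysed / n_randomised)   # proportion actually analysed per arm
```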
Improper comparison Labrie et al, Prostate 2004;59:311-318.
Post hoc data and analysis decisions
• Huge scope for post hoc selection from multiple analyses (simulated below)
  • omitting data
  • adjustment
  • categorisation/cutpoints
  • log transformation
  • etc
“The ‘art’ part of science is focussed in large part on dealing with these matters in a way that is most likely to preserve fundamental truths, but the way is open for deliberate skewing of results to reach a predetermined conclusion.”
Bailar JC. How to distort the scientific record without actually lying: truth, and the arts of science. Eur J Oncol 2006;11:217-24.
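A small simulation of why this matters: if ten analysis variants are tried when there is no true effect and only the most favourable is reported, “significant” findings appear far more often than 5% of the time. The numbers of variants and patients are illustrative.

```python
# Simulate post hoc selection from multiple analyses under a true null.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_trials, n_variants, hits = 1000, 10, 0
for _ in range(n_trials):
    # Two arms drawn from identical distributions: no real effect exists.
    a, b = rng.normal(size=(2, 50, n_variants))
    # Treat each column as a different analysis choice (cutpoint,
    # transformation, adjustment, ...) and keep the smallest P value.
    best_p = min(ttest_ind(a[:, j], b[:, j]).pvalue for j in range(n_variants))
    hits += best_p < 0.05
print(hits / n_trials)   # roughly 0.4, not the nominal 0.05
```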
Spin in a representative sample of 72 trials [Boutron et al, JAMA 2010]
• Title: 18%
• Abstract: 38% in results section, 58% in conclusions section
• Main text: 29% in results, 41% in discussion, 50% in conclusions
• >40% had spin in at least 2 sections of the main text
“Spin”
• Review of breast cancer trials: “… spin was used frequently to influence, positively, the interpretation of negative trials, by emphasizing the apparent benefit of a secondary end point. We found bias in reporting efficacy and toxicity in 32.9% and 67.1% of trials, respectively, with spin and bias used to suggest efficacy in 59% of the trials that had no significant difference in their primary endpoint.” [Vera-Badillo et al, Ann Oncol 2013]
SELECTIVE PUBLICATION
Consistent evidence of study publication bias
• Studies with significant results are more likely to be published than those with non-significant results
• Statistically significant results are about 20% more likely to be published [Song et al, HTA 2000]
• Studies reported at conferences are less likely to be fully published if not significant [Scherer et al, CDMR 2004]
• Even when published, non-significant studies take longer to reach publication than those with significant findings [Hopewell et al, CDMR 2001]
Of 635 clinical trials completed by Dec 2008, 294 (46%) were published in a peer-reviewed biomedical journal, indexed by Medline, within 30 months of trial completion.
[Figure: publication rates by country, size, phase, and funder]
Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med 2009.
Consequences of failure to publish
• Non-publication of research findings always leads to a reduced evidence base
• The main concern is that inadequate publication distorts the evidence base, if publication choices are driven by results (simulated in the sketch below)
• Even if there is no bias, the evidence base is diminished, adding extra (and avoidable) imprecision and clinical uncertainty
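A toy simulation of that distortion: every simulated trial estimates the same modest true effect, but averaging only the “significant” (published) trials inflates the apparent benefit. All numbers are illustrative.

```python
# Simulate publication bias: publish only trials with P < 0.05.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
true_effect, n = 0.2, 40          # a modest effect, small trials
estimates, published = [], []
for _ in range(2000):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_effect, 1.0, n)
    diff = treatment.mean() - control.mean()
    estimates.append(diff)
    if ttest_ind(treatment, control).pvalue < 0.05:
        published.append(diff)    # only significant trials get published

print(f"true effect {true_effect}, all trials {np.mean(estimates):.2f}, "
      f"published only {np.mean(published):.2f}")
```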
Clustering of P values just below 0.05
[Figure: distribution of reported P values, with reference lines at P=0.05 and P=0.01; Pocock et al, BMJ 2004]
PLoS One 2013 “There is strong evidence of an association between significant results and publication; studies that report positive or significant results are more likely to be published and outcomes that are statistically significant have higher odds of being fully reported. Publications have been found to be inconsistent with their protocols.”
Evidence of poor reporting
• Poor reporting: key information is missing or ambiguous
• There is considerable evidence that many published articles do not contain the necessary information
• We cannot tell exactly how the research was done
Poor description of non-pharmacological interventions in RCTs [Hoffmann et al, BMJ 2013]
• Only 53/137 (39%) interventions were adequately described
• This increased to 59% by using responses from contacted authors
Reporting of harms in randomized controlled trials of psychological interventions for mental and behavioral disorders: a review of current practice [Jonsson et al, CCT 2014]
• 104 (79%) reports did not indicate that adverse events, side effects, or deterioration had been monitored
“None of the psychological intervention trials mentioned the occurrence of an adverse event in their final report. Trials of drug treatments were more likely to mention adverse events in their protocols compared with those using psychological treatments. When adverse events were mentioned, the protocols of psychological interventions relied heavily on severe adverse events guidelines from the National Research Ethics Service (NRES), which were developed for drug rather than psychological interventions and so may not be appropriate for the latter.”
CONSORT – reporting RCTs
• Structured advice, checklist and flow diagram
• Based on evidence and consensus of relevant stakeholders
• Explanation and elaboration paper
Review of 87 RCTs
• Primary outcome specification never matched precisely between registry and publication
  • 21% failed to register or publish primary outcomes (PO)
  • discrepancies in 79% of the registry–publication pairs
• Percentages did not differ significantly between industry- and non-industry-sponsored trials
• 30% of trials contained unambiguous PO discrepancies
  • e.g., omitting a registered PO from the publication, ‘demoting’ a registered PO to a published secondary outcome
  • 48% non-industry-sponsored vs 21% industry-sponsored (P=0.01)
State of play
• Not all trials are published
• Methodological errors are common
• Research reports are seriously inadequate
• Improvement over time is very slow
• Reporting guidelines exist
• It’s much easier to continue to document the problems than to change behaviour
Some (partial) solutions for improving published randomised trials
• Prevention of outcome reporting bias requires changing views about P<0.05
• All primary and secondary outcomes should be specified a priori and then fully reported
• Monitoring/regulation
  • Ethics committees, data monitoring committees, funders
• Trial registration
• Journal restrictions
• Publication of protocols
• Availability of raw data (data sharing or publication)
Publication of protocols
• Publication is strongly desirable
• A copy of the protocol is required by some journals
  • Some publish this as a web appendix
• The practice is likely to increase