Title - Critical Evaluation of Clinical Trial Data

Erick Turner, M.D. • Oregon Health & Science University • Dept of Psychiatry; Dept of Pharmacology • Portland VA Medical Center • Mood Disorders Center

Disclosure • No trade names, advertising, or product-group messages • Recovering promotional speaker • Last “slip” was in fall of 2005

Objectives • Things to watch for in evaluating medical information • Heighten your level of skepticism and paranoia • May or may not apply to today’s talks • More about clinical trials in general, esp. industry-sponsored

Studies Presented Today • CATIE • STAR*D • STEP-BD • BOLDER • The A*C*R*O*N*Y*M Study

Effect of Acronym Name • Doubled the citation rate • Independent of study size, quality, outcome • Source • Poster: What's in a NAME? • Peer Review Congress 2005 (AMA)

Standard Clinical Trials vs. Large Simple Clinical Trials • Signal-to-noise • Small & clean N (standard clinical trials) • Big & dirty N • “Dirt” “comes out in the wash”

Efficacy vs. Effectiveness • Patients: “squeaky clean” vs. “real world” • Comorbidities • EtOH, other drugs • Depression + anxiety

“The clinical evidence” • Whose evidence? • Intellectual COI • “I was right! I’ve been vindicated!” • Attracting grant money - “the Midas touch” • Which evidence? • Available evidence-based medicine

Selective Publication • Nonsignificant studies tend not to get published • Some studies never see the light of day • Among studies that are published • Selective presentation of endpoints within those studies • “Outcome reporting bias”

Why the Need for Selective Publication? • Unimpressive effect sizes in psychiatry • Many NS antidepressant trials • 47/92 (51%) active tx arms NS • Khan 2003 Neuropsychopharm • Later-approved drugs and dosages

“The Emperor’s New Drugs” • 80% of drug effect duplicated by placebo • 2-point difference between drug and placebo • HAMD-17-item max = 50 points • 21-item max = 62 points • Kirsch I. Prevention & Treatment, Volume 5, Article 23, posted July 15, 2002

There Must Be 50 Ways . . . to put lipstick on a pig

Splice the Y-Axis • Depakote and Lithium • *p < 0.05 • (Bowden et al, JAMA, 271:12, March 1994)

Show Change from Baseline (not Absolute) Scores • (Keck et al, Am J. Psychiatry, 160:4, April 2003)

Non-Psychiatric Example • Graph in PDR • Same numbers shown as absolute scores and as change scores

Don’t Show Variability in Data • Noise in data • random variability • Interindividual differences • Perhaps your patient isn’t “Mr. Mean” • Showing just means can be misleading • Liquid N2 • Prefer error bars (or even raw data points)
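A minimal sketch of why a mean alone can mislead. The two sets of change scores below are invented for illustration and do not come from any trial: both groups have the same mean, but very different spread, which is what error bars or raw data points would reveal.

```python
from statistics import mean, stdev

# Hypothetical change-from-baseline scores; invented for illustration only.
tight = [9, 10, 10, 10, 11]
spread = [0, 2, 10, 18, 20]

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return mean(scores), stdev(scores)

for label, scores in (("tight", tight), ("spread", spread)):
    m, sd = summarize(scores)
    print(f"{label}: mean = {m:.1f}, SD = {sd:.1f}, "
          f"range = {min(scores)} to {max(scores)}")
```

A bar chart of the two means would look identical, even though no patient in the second group is close to "Mr. Mean" except one.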

Have it Your Way • But how much/little overlap do you want the error bars to show?

Overpower Your Study • Unnecessarily large N • Clinically insignificant result -> statistically significant

Candidate A vs. Candidate B: Effect of the Number of Voters • The split: • Disclaimer: Assumes that popular vote matters

Limitation of P Values • P values confounded by sample size • Clinically insignificant difference can be statistically very significant • P values tell about precision: • how likely the difference observed could have occurred by chance • Clinicians and pts also interested in magnitude of effect • Effect size • Confidence intervals • Reading: Jacob Cohen, “The Earth Is Round (p < .05)”
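The election example and the sample-size point can be sketched numerically. This is only an illustration: the 51/49 split is an assumed figure (the slide's actual split is not in the text), and the test is a simple normal approximation against a 50/50 null. The effect size stays fixed while the p value shrinks as the number of voters grows.

```python
import math

def two_sided_p(share_a, n_voters):
    """Normal-approximation test of H0: true share for A is 50%."""
    se_null = math.sqrt(0.25 / n_voters)       # standard error under H0
    z = (share_a - 0.5) / se_null
    return math.erfc(abs(z) / math.sqrt(2))    # two-sided tail probability

def ci_95(share_a, n_voters):
    """Approximate (Wald) 95% confidence interval for A's share."""
    se = math.sqrt(share_a * (1 - share_a) / n_voters)
    return share_a - 1.96 * se, share_a + 1.96 * se

SPLIT = 0.51  # assumed 51/49 split; hypothetical, not taken from the slide

for n in (100, 1_000, 10_000, 100_000):
    p = two_sided_p(SPLIT, n)
    lo, hi = ci_95(SPLIT, n)
    print(f"N = {n:>7,}: p = {p:.4f}, 95% CI for A = {lo:.3f} to {hi:.3f}")
```

The one-point margin is the same in every row. Only the precision changes, which is why the confidence interval is more informative than the p value alone.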

Underpowered Studies • Could have clinically significant difference • N too small to reach statistical significance

Michael Jordan free-throw shootout • MJ vs. ET -- 7 free throws each • MJ makes 7, I make 3 • P = .07 (NS, Fisher Exact test) • Conclusions • There was “no difference” between us. • I’m as good as Michael Jordan! • Vickers A, Medscape 2006. Michael Jordan Won't Accept the Null Hypothesis: Notes on Interpreting High P Values
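The P = .07 on this slide can be reproduced with a short, dependency-free sketch of a two-sided Fisher exact test. The function name and the tie tolerance are choices made for this example; a statistics package would normally be used instead.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )

# Rows: MJ, ET. Columns: made, missed.
p = fisher_exact_two_sided(7, 0, 3, 4)
print(f"p = {p:.3f}")  # prints "p = 0.070"
```

With only 7 throws each, even a 100% vs. 43% difference fails to reach p < .05, which is the underpowering problem in miniature.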

Lack of a significant difference does not mean equality! • If it’s not black, it’s not necessarily white, either… could be gray • Study could be underpowered • Beware claims of equivalence • But what if Ns are adequate?

Claims of Equivalence • Example: Two drugs performed “the same”. • Were both medications really equally effective? • Or were they equally ineffective?

St. John’s Wort vs. Sertraline • Mean decrease = 47% for Zoloft (vs. 38%) • p = .06 • JAMA Apr 10, 2002 -- Vol 287, No. 14, 1807-1814

. . . and with Placebo in the Picture • Comparison: p • Hyp vs. Pbo: .59 • Ser vs. Pbo: .18 • Ser vs. Hyp: .06

St. John’s Wort vs. Sertraline • Analysis of other primary efficacy endpoint • Chi-squared test, Yates corrected

. . . with Placebo in the Picture

Comparative Claims • FDA leery • …of equivalence claims • …of superiority claims • FDA does not allow them in labeling (package insert, advertising) • Efficacy advantage • Underdose competing drug • Safety advantage • Dose competing drug too high and/or too fast

Transitivity • Am J Psychiatry 163:185-194, February 2006

Consider the Source • RESULTS: Of the 42 reports identified by the authors, 33 were sponsored by a pharmaceutical company. In 90.0% of the studies, the reported overall outcome was in favor of the sponsor’s drug. This pattern resulted in contradictory conclusions across studies when the findings of studies of the same drugs but with different sponsors were compared.

Beware the Comparison to Nothing! • Open-label study - pts know what they are getting • Voice alteration in VNS trials • Often single-arm w/ no placebo control • Anyone ever seen an open-label study in which pts did not get better compared to baseline? • (How do they get published?)

Single-Blind Studies • A step above open-label in rigor • Investigators know what tx the study pt is getting • Examples: • Acupuncture studies • Many device studies (e.g. rTMS)

The Problem with Single-Blind Studies: Clever Hans

Use Lots of Scales: Don’t Put All Your Eggs in One Basket • Observer-based • MADRS • CGI • CGI-I (improvement) • CGI-S (severity) • HAMD in all its flavors • 17-item • 21-item • 28-item • 33-item • Self-report • BDI (Beck) • QIDS-SR (STAR*D) • Quality of life scales

Pros and Cons of Many Scales • The upside of multiple endpoints: • Internal replication • Robustness (vs. fragile finding) • The downside • Increased probability of chance finding • Multiplicity, aka multiple comparisons
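The multiplicity problem can be put in numbers with a small sketch. It assumes the endpoints are statistically independent and each tested at alpha = .05. Real rating scales are correlated, so the true inflation is smaller than shown here, but the direction is the same. The Bonferroni threshold is included as one common correction, not as something the slides prescribe.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

for k in (1, 5, 10, 20):
    print(f"{k:>2} endpoints: P(at least one chance finding) = "
          f"{familywise_error(k):.2f}, Bonferroni threshold = "
          f"{bonferroni_threshold(k):.4f}")
```

Under these assumptions, ten scales give roughly a 40% chance that at least one comes up "significant" with no real drug effect at all.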

Put Enough Monkeys at Enough Typewriters . . . and sooner or later you’ll have the complete works of William Shakespeare

Multiple Subscales • With the 33-item HAMD, you also get . . . • 28-item • 21-item • 17-item • 6-item (“core items”) • Anxiety subscale of the HAMD • Depression subscale of the PANSS • But was it in the original protocol?

What Can You Do With All These Scales? • Continuous measure • Use each score as-is (absolute score) • Change from baseline • Transform into categorical measure • Cutoffs -> patients either above or below • Remitters • Responders

Responders • Just “responders” • >= 50% decrease from baseline • Ex. Baseline score = 40 -> endpoint score = 20 • < 50% decrease ==> “nonresponder” • Ex. Baseline score = 40 -> endpoint score = 21 • Gradations of responders • Partial responders (25% to < 50% decrease from baseline) • Full responders (>= 50% decrease)
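A minimal sketch of how a continuous score gets turned into response categories. The function names are made up for this example. Exactly 50% is treated as a full response, consistent with the baseline 40 -> endpoint 20 example, and partial response is taken as 25% up to but not including 50%; that boundary handling is an assumption.

```python
def percent_decrease(baseline, endpoint):
    """Percent decrease from baseline (positive means improvement)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline

def is_responder(baseline, endpoint):
    """Binary definition: at least a 50% decrease from baseline."""
    return percent_decrease(baseline, endpoint) >= 50

def response_grade(baseline, endpoint):
    """Graded definition using the 25% and 50% cutoffs."""
    drop = percent_decrease(baseline, endpoint)
    if drop >= 50:
        return "full responder"
    if drop >= 25:
        return "partial responder"
    return "nonresponder"

for endpoint in (20, 21, 35):
    print(f"40 -> {endpoint}: {percent_decrease(40, endpoint):.1f}% decrease, "
          f"responder = {is_responder(40, endpoint)}, "
          f"grade = {response_grade(40, endpoint)}")
```

A one-point difference at endpoint (20 vs. 21) flips the binary label, which is the cost of dichotomizing a continuous measure.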

Remitters • “Remission” usually = absolute score (HAMD < 8) • STAR*D defines remission as 75% decrease from baseline • Advantage - set threshold deemed clinically significant • But % remitters may still differ between groups to an extent that is statistically but not clinically significant (remember the “election” slide)

Handling Dropouts • LOCF • last observation carried forward • OC • Observed cases • aka. completers • MMRM • Mixed model repeated measures
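The difference between LOCF and observed cases can be illustrated with a toy dataset. All scores below are hypothetical, and the helper names are invented for this sketch. MMRM is model-based and is not attempted here.

```python
# Hypothetical HAMD scores by visit; None marks a visit missed after dropout.
patients = {
    "pt1": [24, 18, 12, 8],
    "pt2": [26, 22, None, None],   # dropped out after visit 2
    "pt3": [22, 20, 15, None],     # dropped out after visit 3
}

def locf_endpoint(scores):
    """Last observation carried forward: the most recent non-missing score."""
    observed = [s for s in scores if s is not None]
    return observed[-1]

def observed_case_endpoint(scores):
    """Observed cases: the final-visit score, or None if the patient dropped out."""
    return scores[-1]

def mean_change(endpoints):
    """Mean change from baseline over patients that have an endpoint."""
    changes = [end - patients[pid][0]
               for pid, end in endpoints.items() if end is not None]
    return sum(changes) / len(changes)

locf = {pid: locf_endpoint(s) for pid, s in patients.items()}
oc = {pid: observed_case_endpoint(s) for pid, s in patients.items()}

print(f"LOCF mean change: {mean_change(locf):.1f} (n = 3)")
print(f"OC mean change:   {mean_change(oc):.1f} (n = 1 completer)")
```

In this toy example the completers-only analysis looks much better than LOCF, because the patients who dropped out were the ones improving least. Which method is used can change the headline result.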

HARKing • Hypothesizing • After the • Results are • Known • A priori vs. post hoc

How the FDA Guards Against This • FDA gets protocol before study begins • Sponsors can’t “censor” studies that don’t go well • Drugs approved based on all studies

It’s the Protocol, Stupid! • “If the Devil is in the Details, Salvation is in the Protocol” • Talk by Paul Andreason, FDA • Primary endpoints • a priori hypothesis • Where you’re placing your bet • Secondary endpoints • Exploratory • If you make it, fine, but don’t make a big deal about it. • Repeat study, designate it as primary, see if it replicates

Off-Label Use • Drug used for something FDA has not approved it for • (FDA does not regulate prescribing) • Often appropriate to prescribe off-label • No approved drugs for condition (but why not?) • You’ve exhausted approved drugs • Ask why isn’t drug approved for this condition? • Could they have submitted and gotten it rejected? • If they haven’t submitted an application, why not?

How do you Know Whether a Drug is FDA-Approved for the Condition You’re Treating? • Beware of sources that talk about “uses” • AHFS Drug Information (“The Red Book”) • Fluoxetine uses: obesity, bipolar d/o, myoclonus, cataplexy, EtOH dependence • Gabapentin has never been approved for any psych indication • Just look in the package insert or PDR • Indications & Usage section • More details in Clinical Trials section

The End
