Title - Critical Evaluation of Clinical Trial Data Erick Turner, M.D. Oregon Health & Science University Dept of Psychiatry; Dept of Pharmacology Portland VA Medical Center Mood Disorders Center
Disclosure • No trade names, advertising, or product-group messages • Recovering promotional speaker • Last “slip” was in fall of 2005
Objectives • Things to watch for in evaluating medical information • Heighten your level of skepticism and paranoia • May or may not apply to today’s talks • More about clinical trials in general, esp. industry-sponsored
Studies Presented Today • CATIE • STAR*D • STEP-BD • BOLDER • The A*C*R*O*N*Y*M Study
Effect of Acronym Name • Doubled the citation rate • Independent of study size, quality, outcome • Source • Poster: What's in a NAME? • Peer Review Congress 2005 (AMA)
Standard Clinical Trials vs. Large Simple Clinical Trials • Signal-to-noise • Small & clean N (standard clinical trials) • Big & dirty N (large simple clinical trials) • “Dirt” “comes out in the wash”
Efficacy vs. Effectiveness • Patients: “squeaky clean” vs. “real world” • Comorbidities • EtOH, other drugs • Depression + anxiety
“The clinical evidence” • Whose evidence? • Intellectual COI • “I was right! I’ve been vindicated!” • Attracting grant money - “the Midas touch” • Which evidence? • Available evidence-based medicine
Selective Publication • Nonsignificant studies tend not to get published • Some studies never see the light of day • Among studies that are published • Selective presentation of endpoints within those studies • “Outcome reporting bias”
Why the Need for Selective Publication? • Unimpressive effect sizes in psychiatry • Many NS antidepressant trials • 47/92 (51%) active tx arms NS • Khan 2003 Neuropsychopharm • Later-approved drugs and dosages
“The Emperor’s New Drugs” • 80% of drug effect duplicated by placebo • 2-point difference between drug and placebo • HAMD 17-item max = 50 points • 21-item max = 62 points • Kirsch I. Prevention & Treatment, Volume 5, Article 23, posted July 15, 2002
Splice the Y-Axis • Depakote and Lithium • *p < 0.05 (Bowden et al, JAMA, 271:12, March 1994)
Show Change from Baseline (not Absolute) Scores (Keck et al, Am J. Psychiatry, 160:4, April 2003)
Non-Psychiatric Example • Same numbers • Graph in PDR • Absolute scores • Change scores
Don’t Show Variability in Data • Noise in data • random variability • Interindividual differences • Perhaps your patient isn’t “Mr. Mean” • Showing just means can be misleading • Liquid N2 • Prefer error bars (or even raw data points)
But how much/little overlap do you want the error bars to show? • Have it Your Way
Overpower Your Study • Unnecessarily large N • Clinically insignificant result becomes statistically significant
Candidate A vs. Candidate B • Effect of the Number of Voters • The split: • Disclaimer: Assumes that popular vote matters
Limitation of P Values • P values confounded by sample size • Clinically insignificant difference can be statistically very significant • P values tell about precision: how likely it is that the observed difference could have occurred by chance • Clinicians and pts also interested in magnitude of effect • Effect size • Confidence intervals • Reading: Jacob Cohen, “The Earth Is Round (p < .05)”
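To make the sample-size point concrete, here is a minimal sketch that holds a vote split fixed and varies only the number of voters. The 51/49 split, the voter counts, and the function name `two_sided_p_for_split` are illustrative assumptions, not figures from the talk; the p-value uses a normal approximation to the binomial.

```python
import math


def two_sided_p_for_split(share_a: float, n_voters: int) -> float:
    """Two-sided p-value for H0 'true share is 50%', normal approximation."""
    se = math.sqrt(0.25 / n_voters)       # standard error of a proportion under H0
    z = (share_a - 0.5) / se              # how many standard errors from a tie
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical 51% vs. 49% split, evaluated at increasing numbers of voters.
for n in (100, 1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: p = {two_sided_p_for_split(0.51, n):.4g}")
```

The effect size (a 2-point margin) never changes; only N does, and the p-value falls from clearly nonsignificant to very small.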
Underpowered Studies • Could have clinically significant difference • N too small to reach statistical significance
Michael Jordan free-throw shootout • MJ vs. ET -- 7 free throws each • MJ makes 7, I make 3 • P = .07 (NS, Fisher Exact test) • Conclusions • There was “no difference” between us. • I’m as good as Michael Jordan! • Vickers A, Medscape 2006. Michael Jordan Won't Accept the Null Hypothesis: Notes on Interpreting High P Values
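The P = .07 figure can be checked with a small hand-rolled two-sided Fisher exact test. This is a sketch using only the standard library; the function name and the table layout (made vs. missed, one row per shooter) are choices made here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are
    no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: MJ (7 made, 0 missed), ET (3 made, 4 missed).
p = fisher_exact_two_sided(7, 0, 3, 4)
print(round(p, 2))
```

With only 7 shots each, even a 7-to-3 result does not reach p < .05, which is the underpowering the slide is poking fun at.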
Lack of a significant difference does not mean equality! • If it’s not black, it’s not necessarily white, either… could be gray • Study could be underpowered • Beware claims of equivalence • But what if Ns are adequate?
Claims of Equivalence • Example: Two drugs performed “the same”. • Were both medications really equally effective? • Or were they equally ineffective?
St. John’s Wort vs. Sertraline • Mean decrease = 47% for Zoloft (vs. 38%) • p = .06 • JAMA Apr 10, 2002 -- Vol 287, No. 14, 1807-1814
. . . and with Placebo in the Picture • Comparison: p • Hyp vs. Pbo: .59 • Ser vs. Pbo: .18 • Ser vs. Hyp: .06
St. John’s Wort vs. Sertraline • Analysis of other primary efficacy endpoint • Chi-squared test, Yates corrected
Comparative Claims • FDA leery • …of equivalence claims • …of superiority claims • FDA does not allow them in labeling (package insert, advertising) • Efficacy advantage • Underdose competing drug • Safety advantage • Dose competing drug too high and/or too fast
Transitivity • Am J Psychiatry 163:185-194, February 2006
Consider the Source • RESULTS: Of the 42 reports identified by the authors, 33 were sponsored by a pharmaceutical company. In 90.0% of the studies, the reported overall outcome was in favor of the sponsor’s drug. This pattern resulted in contradictory conclusions across studies when the findings of studies of the same drugs but with different sponsors were compared.
Beware the Comparison to Nothing! • Open-label study - pts know what they are getting • Voice alteration in VNS trials • Often single-arm w/ no placebo control • Anyone ever seen an open-label study in which pts did not get better compared to baseline? • (How do they get published?)
Single-Blind Studies • A step above open-label in rigor • Investigators know what tx the study pt is getting • Examples: • Acupuncture studies • Many device studies (e.g. rTMS)
Use Lots of Scales • Don’t Put All Your Eggs in One Basket • Observer-based • MADRS • CGI • CGI-I (improvement) • CGI-S (severity) • HAMD in all its flavors • 17-item • 21-item • 28-item • 33-item • Self-report • BDI (Beck) • QIDS-SR (STAR*D) • Quality of life scales
Pros and Cons of Many Scales • The upside of multiple endpoints: • Internal replication • Robustness (vs. fragile finding) • The downside • Increased probability of chance finding • Multiplicity, aka multiple comparisons
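The multiplicity downside can be put in numbers with a short sketch. It assumes each endpoint is tested at alpha = 0.05 and that the endpoints are statistically independent; real rating scales are correlated, so this is an illustration of the trend rather than an exact figure for any trial. The function name is hypothetical.

```python
def chance_of_false_positive(n_endpoints: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests hits p < alpha."""
    return 1 - (1 - alpha) ** n_endpoints


# Hypothetical endpoint counts; no true drug effect is assumed on any of them.
for k in (1, 5, 10, 20):
    print(f"{k:>2} endpoints: {chance_of_false_positive(k):.1%} chance of a spurious 'win'")
```

Under these assumptions, ten endpoints give roughly a 40% chance that at least one comes out "significant" by luck alone.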
Put Enough Monkeys at Enough Typewriters . . . …and sooner or later you’ll have the complete works of William Shakespeare
Multiple Subscales • HAMD-33 item, you also get . . . • 28-item • 21-item • 17- • 6- (“core items”) • Anxiety subscale of the HAMD • Depression subscale of the PANSS • But was it in the original protocol?
What Can You Do With All These Scales? • Continuous measure • Use each score as-is (absolute score) • Change from baseline • Transform into categorical measure • Cutoffs: patients fall either above or below • Remitters • Responders
Responders • Just “responders” • >= 50% decrease from baseline • Ex. Baseline score 40 -> endpoint score = 20 • < 50% ==> “nonresponder” • Baseline = 40, endpoint score = 21 • Gradations of responders • Partial responders (25% to < 50% decrease from baseline) • Full responders (>= 50% decrease)
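The slide's two worked cases (40 -> 20 and 40 -> 21) can be reproduced with a small sketch. Function names are made up for illustration, and placing exactly 50% in the responder group and exactly 25% in the partial group is an assumption about the boundaries.

```python
def percent_decrease(baseline: float, endpoint: float) -> float:
    """Percent drop from baseline (positive means improvement)."""
    return 100.0 * (baseline - endpoint) / baseline


def is_responder(baseline: float, endpoint: float) -> bool:
    """Binary scheme: responder if the score fell by at least 50%."""
    return percent_decrease(baseline, endpoint) >= 50.0


def graded_response(baseline: float, endpoint: float) -> str:
    """Graded scheme: full (>= 50%), partial (25% to < 50%), otherwise none."""
    drop = percent_decrease(baseline, endpoint)
    if drop >= 50.0:
        return "full responder"
    if drop >= 25.0:
        return "partial responder"
    return "nonresponder"


print(is_responder(40, 20), is_responder(40, 21))
print(graded_response(40, 21))
```

A single point on the scale (20 vs. 21) flips the patient's category, which is what dichotomizing a continuous measure costs.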
Remitters • “Remission” usually = absolute score (HAMD < 8) • STAR*D defines remission as 75% decrease from baseline • Advantage - set threshold deemed clinically significant • But % remitters may still differ between groups to extent that is just statistically significant (remember the “election” slide)
Handling Dropouts • LOCF • Last observation carried forward • OC • Observed cases • aka completers • MMRM • Mixed model repeated measures
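A toy sketch of how LOCF and OC can give different endpoint means from the same data. The patient IDs and scores are invented for illustration, `None` marks a missed visit, and MMRM is left out because it needs a fitted statistical model rather than a few lines of arithmetic.

```python
from statistics import mean

# Hypothetical depression scores at four visits; None = patient had dropped out.
scores = {
    "pt1": [30, 24, 18, 12],      # completer
    "pt2": [28, 27, None, None],  # dropped out early, little improvement
    "pt3": [32, 25, 20, None],    # dropped out before the final visit
}


def locf_endpoint(visits):
    """LOCF: use the last score that was actually observed."""
    observed = [v for v in visits if v is not None]
    return observed[-1]


def observed_case_endpoints(data):
    """OC: keep only patients with a score at the final visit (completers)."""
    return [visits[-1] for visits in data.values() if visits[-1] is not None]


locf_mean = mean(locf_endpoint(v) for v in scores.values())
oc_mean = mean(observed_case_endpoints(scores))
print(f"LOCF endpoint mean: {locf_mean:.1f}")
print(f"OC endpoint mean:   {oc_mean:.1f}")
```

In this made-up data set the completers-only mean looks better than the LOCF mean because the patients who dropped out were the ones doing worse.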
HARKing • Hypothesizing • After the • Results are • Known • A priori vs. post hoc
How the FDA Guards Against This • FDA gets protocol before study begins • Sponsors can’t “censor” studies that don’t go well • Drugs approved based on all studies
It’s the Protocol, Stupid! • “If the Devil is in the Details, Salvation is in the Protocol” • Talk by Paul Andreason, FDA • Primary endpoints • a priori hypothesis • Where you’re placing your bet • Secondary endpoints • Exploratory • If you make it, fine, but don’t make a big deal about it. • Repeat study, designate it as primary, see if it replicates
Off-Label Use • Drug used for something FDA has not approved it for • (FDA does not regulate prescribing) • Often appropriate to prescribe off-label • No approved drugs for condition (but why not?) • You’ve exhausted approved drugs • Ask: why isn’t the drug approved for this condition? • Could they have submitted and gotten it rejected? • If they haven’t submitted an application, why not?
How do you Know Whether a Drug is FDA-Approved for the Condition You’re Treating? • Beware of sources that talk about “uses” • AHFS Drug Information (“The Red Book”) • Fluoxetine uses: obesity, bipolar d/o, myoclonus, cataplexy, EtOH dependence • Gabapentin has never been approved for any psych indication • Just look in the package insert or PDR • Indications & Usage section • More details in Clinical Trials section