
Title - Critical Evaluation of Clinical Trial Data

Title - Critical Evaluation of Clinical Trial Data. Erick Turner, M.D. Oregon Health & Science University Dept of Psychiatry; Dept of Pharmacology Portland VA Medical Center Mood Disorders Center. Disclosure. No trade names, advertising, or product-group messages


Presentation Transcript


  1. Title - Critical Evaluation of Clinical Trial Data Erick Turner, M.D. Oregon Health & Science University Dept of Psychiatry; Dept of Pharmacology Portland VA Medical Center Mood Disorders Center

  2. Disclosure • No trade names, advertising, or product-group messages • Recovering promotional speaker • Last “slip” was in fall of 2005

  3. Objectives • Things to watch for in evaluating medical information • Heighten your level of skepticism and paranoia • May or may not apply to today’s talks • More about clinical trials in general, esp. industry-sponsored

  4. Studies Presented Today • CATIE • STAR*D • STEP-BD • BOLDER • The A*C*R*O*N*Y*M Study

  5. Effect of Acronym Name • Doubled the citation rate • Independent of study size, quality, outcome • Source • Poster: What's in a NAME? • Peer Review Congress 2005 (AMA)

  6. Standard Clinical Trials vs. Large Simple Clinical Trials • Signal-to-noise • Small & clean N (standard clinical trials) • Big & dirty N (large simple clinical trials) • “Dirt” “comes out in the wash”

  7. Efficacy vs. Effectiveness • Patients: “squeaky clean” vs. “real world” • Comorbidities • EtOH, other drugs • Depression + anxiety

  8. “The clinical evidence” • Whose evidence? • Intellectual COI • “I was right! I’ve been vindicated!” • Attracting grant money - “the Midas touch” • Which evidence? • Available evidence-based medicine

  9. Selective Publication • Nonsignificant studies tend not to get published • Some studies never see the light of day • Among studies that are published • Selective presentation of endpoints within those studies • “Outcome reporting bias”

  10. Why the Need for Selective Publication? • Unimpressive effect sizes in psychiatry • Many NS antidepressant trials • 47/92 (51%) active tx arms NS • Khan 2003 Neuropsychopharm • Later-approved drugs and dosages

  11. “The Emperor’s New Drugs” • 80% of drug effect duplicated by placebo • 2-point difference between drug and placebo • HAMD 17-item max = 50 points • 21-item max = 62 points (Kirsch I. Prevention & Treatment, Volume 5, Article 23, posted July 15, 2002)

  12. There Must Be 50 Ways . . . to put lipstick on a pig

  13. Splice the Y-Axis • Depakote and Lithium • *p < 0.05 (Bowden et al, JAMA, 271:12, March 1994)

  14. Show Change from Baseline (not Absolute) Scores (Keck et al, Am J. Psychiatry, 160:4, April 2003)

  15. Non-Psychiatric Example • Graph in PDR • Same numbers: absolute scores vs. change scores

  16. Don’t Show Variability in Data • Noise in data • random variability • Interindividual differences • Perhaps your patient isn’t “Mr. Mean” • Showing just means can be misleading • Liquid N2 • Prefer error bars (or even raw data points)

  17. Have it Your Way • But how much/little overlap do you want the error bars to show?

  18. Overpower Your Study • Unnecessarily large N • Clinically insignificant result → statistically significant

  19. Candidate A vs. Candidate B: Effect of the Number of Voters • The split: • Disclaimer: Assumes that popular vote matters

  20. Limitation of P Values • P values confounded by sample size • Clinically insignificant difference can be statistically very significant • P values tell about precision: how likely it is that the observed difference could have occurred by chance • Clinicians and pts also interested in magnitude of effect • Effect size • Confidence intervals • Reading: Jacob Cohen, “The Earth Is Round (p < .05)”
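
The point that P values are confounded by sample size can be illustrated with a minimal sketch. The 51% vs. 49% split, the voter counts, and the function name `two_sided_p_for_split` are assumptions chosen for illustration (the split shown on the election slide is not given in the transcript), and the test is a simple normal approximation against a 50/50 null, not necessarily the one behind the slide.

```python
import math


def two_sided_p_for_split(share_a: float, n_voters: int) -> float:
    """Two-sided p-value for H0: true share = 0.5 (normal approximation)."""
    # Standard error of a proportion under the null hypothesis p = 0.5.
    se = math.sqrt(0.25 / n_voters)
    z = (share_a - 0.5) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical split: the same 51% vs. 49% result at increasing N.
for n in (100, 1_000, 10_000, 100_000):
    p = two_sided_p_for_split(0.51, n)
    print(f"N = {n:>7,}: 51% vs. 49%, p = {p:.4g}")
```

Under these assumptions the effect (a 2-point margin) never changes; only N does, and the P value moves from clearly nonsignificant at N = 100 to below .05 at N = 10,000.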

  21. Underpowered Studies • Could have clinically significant difference • N too small to reach statistical significance

  22. Michael Jordan free-throw shootout • MJ vs. ET -- 7 free throws each • MJ makes 7, I make 3 • P = .07 (NS, Fisher Exact test) • Conclusions • There was “no difference” between us. • I’m as good as Michael Jordan! (Vickers A, Medscape 2006. Michael Jordan Won't Accept the Null Hypothesis: Notes on Interpreting High P Values)
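
The P = .07 on the free-throw slide can be checked with a short sketch of a two-sided Fisher exact test, written here with only the standard library. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed rather than taken from the slide.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: shooter (MJ, ET); columns: made, missed.
p = fisher_exact_two_sided(7, 0, 3, 4)
print(f"p = {p:.3f}")
```

With 7 of 7 versus 3 of 7, this gives 70/1001, which rounds to the .07 quoted on the slide: a large observed difference that is still "NS" because 7 shots per shooter is far too few.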

  23. Lack of a significant difference does not mean equality! • If it’s not black, it’s not necessarily white, either… could be gray • Study could be underpowered • Beware claims of equivalence • But what if Ns are adequate?

  24. Claims of Equivalence • Example: Two drugs performed “the same”. • Were both medications really equally effective? • Or were they equally ineffective?

  25. St. John’s Wort vs. Sertraline Mean decrease = 47% for Zoloft (vs. 38%) p = .06 JAMA Apr 10, 2002 -- Vol 287, No. 14, 1807-1814

  26. . . . and with Placebo in the Picture • Comparison: p • Hyp vs. Pbo: .59 • Ser vs. Pbo: .18 • Ser vs. Hyp: .06

  27. St. John’s Wort vs. Sertraline • Analysis of other primary efficacy endpoint • Chi-squared test, Yates corrected

  28. . . . with Placebo in the Picture

  29. Comparative Claims • FDA leery • …of equivalence claims • …of superiority claims • FDA does not allow them in labeling (package insert, advertising) • Efficacy advantage • Underdose competing drug • Safety advantage • Dose competing drug too high and/or too fast

  30. Transitivity (Am J Psychiatry 163:185-194, February 2006)

  31. Consider the Source • RESULTS: Of the 42 reports identified by the authors, 33 were sponsored by a pharmaceutical company. In 90.0% of the studies, the reported overall outcome was in favor of the sponsor’s drug. This pattern resulted in contradictory conclusions across studies when the findings of studies of the same drugs but with different sponsors were compared.

  32. Beware the Comparison to Nothing! • Open-label study - pts know what they are getting • Voice alteration in VNS trials • Often single-arm w/ no placebo control • Anyone ever seen an open-label study in which pts did not get better compared to baseline? • (How do they get published?)

  33. Single-Blind Studies • A step above open-label in rigor • Investigators know what tx the study pt is getting • Examples: • Acupuncture studies • Many device studies (e.g. rTMS)

  34. The Problem with Single-Blind Studies: Clever Hans

  35. Use Lots of Scales: Don’t Put All Your Eggs in One Basket • Observer-based • MADRS • CGI • CGI-I (improvement) • CGI-S (severity) • HAMD in all its flavors • 17-item • 21-item • 28-item • 33-item • Self-report • BDI (Beck) • QIDS-SR (STAR*D) • Quality of life scales

  36. Pros and Cons of Many Scales • The upside of multiple endpoints: • Internal replication • Robustness (vs. fragile finding) • The downside • Increased probability of chance finding • Multiplicity, aka multiple comparisons
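
The "increased probability of chance finding" can be put in numbers with a small sketch. It assumes every comparison is tested at alpha = .05 and that the tests are independent; rating scales within one trial are correlated, so the real inflation would be smaller than shown, and the function name `familywise_error_rate` is illustrative only.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n independent tests."""
    # P(no false positive in any test) = (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    rate = familywise_error_rate(k)
    print(f"{k:>2} endpoints: P(at least one chance finding) = {rate:.2f}")
```

Under these assumptions, ten independent endpoints give roughly a 40% chance of at least one "significant" result when the drug does nothing at all.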

  37. Put Enough Monkeys at Enough Typewriters . . . …and sooner or later you’ll have the complete works of William Shakespeare

  38. Multiple Subscales • HAMD-33 item, you also get . . . • 28-item • 21-item • 17-item • 6-item (“core items”) • Anxiety subscale of the HAMD • Depression subscale of the PANSS • But was it in the original protocol?

  39. What Can You Do With All These Scales? • Continuous measure • Use each score as-is (absolute score) • Change from baseline • Transform into categorical measure • Cutoffs → patients either above or below • Remitters • Responders

  40. Responders • Just “responders” • >= 50% decrease from baseline • Ex. Baseline score 40 -> endpoint score = 20 • < 50% ==> “nonresponder” • Baseline = 40, endpoint score = 21 • Gradations of responders • Partial responders (25% to < 50% decrease from baseline) • Full responders (>= 50% decrease)
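
The responder cutoff can be sketched as a small function to show how one scale point flips the category. The names `percent_decrease` and `is_responder` are hypothetical, and the sketch assumes the ">= 50% decrease from baseline" rule given in the first half of the slide.

```python
def percent_decrease(baseline: float, endpoint: float) -> float:
    """Percent decrease from baseline (positive means improvement)."""
    return 100.0 * (baseline - endpoint) / baseline


def is_responder(baseline: float, endpoint: float, cutoff: float = 50.0) -> bool:
    """True when the decrease from baseline meets or exceeds the cutoff."""
    return percent_decrease(baseline, endpoint) >= cutoff


# The two examples from the slide: endpoint scores of 20 and 21.
for endpoint in (20, 21):
    label = "responder" if is_responder(40, endpoint) else "nonresponder"
    print(f"40 -> {endpoint}: {percent_decrease(40, endpoint):.1f}% decrease, {label}")
```

A patient at 47.5% improvement and one at 50% are clinically almost identical, yet the dichotomy counts them in opposite groups.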

  41. Remitters • “Remission” usually = absolute score (HAMD < 8) • STAR*D defines remission as 75% decrease from baseline • Advantage - set threshold deemed clinically significant • But % remitters may still differ between groups to an extent that is only statistically significant (remember the “election” slide)

  42. Handling Dropouts • LOCF • last observation carried forward • OC • Observed cases • aka. completers • MMRM • Mixed model repeated measures
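
How the choice between LOCF and observed cases changes the endpoint can be seen in a minimal sketch. The three patients and their visit scores are invented for illustration, the helper names are hypothetical, and MMRM is left out because it requires fitting a mixed model rather than a one-line rule.

```python
from statistics import mean
from typing import Optional, Sequence

Scores = Sequence[Optional[float]]


def locf_endpoint(scores: Scores) -> Optional[float]:
    """Last observation carried forward: the final non-missing score."""
    return next((s for s in reversed(scores) if s is not None), None)


def observed_case_endpoint(scores: Scores) -> Optional[float]:
    """Observed cases: the final-visit score, or None for a dropout."""
    return scores[-1]


# Hypothetical depression scores by visit; None marks a missed visit.
visits = {
    "pt1": [30, 24, 18, 12],      # completer
    "pt2": [32, 30, None, None],  # dropped out early, little improvement
    "pt3": [28, 20, 15, None],    # dropped out before the last visit
}

locf = [locf_endpoint(v) for v in visits.values()]
oc = [s for s in (observed_case_endpoint(v) for v in visits.values()) if s is not None]

print(f"LOCF mean endpoint: {mean(locf):.1f} (n = {len(locf)})")
print(f"OC mean endpoint:   {mean(oc):.1f} (n = {len(oc)})")
```

In this made-up data the completer-only analysis looks much better than LOCF, because the patients who dropped out were doing worse when last seen.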

  43. HARKing • Hypothesizing • After the • Results are • Known • A priori vs. post hoc

  44. How the FDA Guards Against This • FDA gets protocol before study begins • Sponsors can’t “censor” studies that don’t go well • Drugs approved based on all studies

  45. It’s the Protocol, Stupid! • “If the Devil is in the Details, Salvation is in the Protocol” • Talk by Paul Andreason, FDA • Primary endpoints • a priori hypothesis • Where you’re placing your bet • Secondary endpoints • Exploratory • If you make it, fine, but don’t make a big deal about it. • Repeat study, designate it as primary, see if it replicates

  46. Off-Label Use • Drug used for something FDA has not approved it for • (FDA does not regulate prescribing) • Often appropriate to prescribe off-label • No approved drugs for condition (but why not?) • You’ve exhausted approved drugs • Ask why isn’t drug approved for this condition? • Could they have submitted and gotten it rejected? • If they haven’t submitted an application, why not?

  47. How do you Know Whether a Drug is FDA-Approved for the Condition You’re Treating? • Beware of sources that talk about “uses” • AHFS Drug Information (“The Red Book”) • Fluoxetine uses: obesity, bipolar d/o, myoclonus, cataplexy, EtOH dependence • Gabapentin has never been approved for any psych indication • Just look in the package insert or PDR • Indications & Usage section • More details in Clinical Trials section

  48. The End
