Evaluating Evidence in Medicine: What Can Go Wrong? Skeptic’s Toolbox 2012 Harriet Hall, MD The SkepDoc
Overview • What constitutes evidence in medicine? • What can go wrong in clinical studies? • Why even “evidence-based medicine” is flawed.
Is This Evidence? MRI Study of Salmon • A salmon was shown photographs of humans in social situations. It was asked to think about what emotion the individual in the photo must have been experiencing. • The salmon couldn’t talk, but: • On the fMRI scan, areas in the salmon’s brain lit up, indicating increased blood flow and, supposedly, that the salmon was thinking.
Is This Evidence That: • Salmon can see pictures? • Salmon know what human emotions are? • Salmon can identify emotions from pictures? • Salmon can respond to requests of what to think about?
What’s Wrong With This Picture? The Salmon Was Dead and Gutted!
Statistical Artifact • Each fMRI scan measures 50,000 voxels (3-D pixels), and each study involves thousands of scans. • If you mine the data, you can find practically anything you want. • Brain scans are the new phrenology • A blunt instrument • Scans are pooled to establish a “normal” average • Often don’t mean what people think they mean
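Why uncorrected voxel-wise testing finds “activity” even in a dead fish: with tens of thousands of statistical tests, some cross the threshold by chance. A minimal sketch (my illustration, not from the talk); the 50,000-voxel figure comes from the slide, while the p < 0.001 threshold and the use of NumPy are assumptions.

```python
# Minimal sketch: pure noise at 50,000 voxels with an uncorrected
# p < 0.001 threshold still produces dozens of "active" voxels.
import numpy as np

rng = np.random.default_rng(seed=0)
n_voxels = 50_000       # voxels per scan, as cited on the slide
alpha = 0.001           # a common uncorrected per-voxel threshold

# Null data: no real signal anywhere, so every p-value is uniform.
p_values = rng.uniform(0.0, 1.0, n_voxels)
false_hits = int(np.sum(p_values < alpha))

print(f"Expected false positives by chance: {n_voxels * alpha:.0f}")
print(f"Observed in this simulated scan:    {false_hits}")
# A corrected threshold (e.g., Bonferroni: alpha / n_voxels) eliminates
# essentially all of these chance hits.
```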
Would You Accept This Evidence? • I tried it. I got better. It worked for me. • Lots of people tried it and got better. • We gave it to a lot of people in a study and they improved. • We compared it to a no-treatment group or a usual-treatment group and it worked better. • We compared it to a placebo and it worked better. • The weight of evidence from a large body of studies shows that it works better than placebo
Is This Evidence? • I tried it. I got better. It worked for me. • Anecdote. The plural of anecdote is not data. • Post hoc ergo propter hoc fallacy • Does Echinacea prevent colds? • Removing glucosamine didn’t remove effects • We gave it to a lot of people in a study and they improved. • Uncontrolled study. Maybe they would have improved without treatment. • With treatment, the cold got better in a week; without treatment, it would have lasted seven days.
Is This Evidence? • Our study compared it to a no-treatment group or a usual-treatment group and it worked better. • Hawthorne effect: Doing something is better than doing nothing. • Our study compared it to a placebo and it worked better. • Was the study blinded? • Double blind, placebo-controlled randomized study is the Gold Standard. • BUT: What if we do a Gold Standard study on something totally implausible and it works better than a placebo?
There’s A Lot of Evidence:A Fire Hose of Information • 21 million papers are listed in PubMed: • 700,000 more each year • One a minute • PubMed lists 23,000 journals, and there are many more not listed • You can find a study to support any belief.
Never Believe One Study • Early positive studies are often superseded by better, negative studies (e.g., hormone replacement therapy). • Ioannidis: most published research findings are false.
Ioannidis • The smaller the study, the less likely the research findings are to be true. • The smaller the effect, the less likely the research findings are to be true. • The greater the financial and other interests, the less likely the research findings are to be true. • The hotter a scientific field (the more research teams involved), the less likely the research findings are to be true.
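These claims fall out of simple positive-predictive-value arithmetic of the kind used in Ioannidis’s 2005 paper. A minimal sketch, ignoring bias; the power, alpha, and prior-odds values are illustrative assumptions, not figures from the talk.

```python
# Positive predictive value of a "significant" finding, ignoring bias:
# PPV = power * R / (power * R + alpha), where R is the prior odds
# that the tested relationship is real. All numbers are illustrative.
def ppv(power: float, alpha: float, prior_odds: float) -> float:
    return power * prior_odds / (power * prior_odds + alpha)

# Large, well-powered trial of a plausible hypothesis:
print(f"{ppv(power=0.80, alpha=0.05, prior_odds=0.5):.2f}")   # ~0.89
# Small, underpowered study of a long-shot hypothesis in a hot field:
print(f"{ppv(power=0.20, alpha=0.05, prior_odds=0.05):.2f}")  # ~0.17
```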
Evaluating a Study • Ask a lot of questions • I’ll cover some
What Kind of Study? • Case report • Case series • Case-control • Cohort • Epidemiologic • RCT • Placebo-controlled • Blinded (single or double)
Who’s Paying? • Studies sponsored by pharmaceutical companies more likely to be positive • Subtle bias • Unpublished negative information • Studies by researchers with financial conflicts of interest (consulting fees, honoraria from pharmaceutical company) more likely to be positive (91% vs. 67%)
Big Pharma Distortion • Turner looked at all antidepressant studies registered with the FDA • Published studies: 94% positive • All registered studies combined: 51% positive • Evidence that antidepressants don’t work? No.
Turner vs. Kirsch • Kirsch said an effect size < 0.5 means ineffective • Effect size from published journals: 0.41 • True effect size (all registered trials): 0.31 • Therefore, he argued, antidepressants are not effective • Turner said the glass is not empty, but 1/3 full • Patients’ responses are not all-or-none; partial responses can be meaningful • Antidepressants DO work, just not as well as originally thought. • Kirsch supports psychotherapy, but its effect size is much less than 0.5.
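For intuition about what these standardized effect sizes mean, here is a minimal sketch (my illustration, not from the talk) converting Cohen’s d into the probability that a randomly chosen treated patient improves more than a randomly chosen placebo patient, assuming normally distributed outcomes with equal variance.

```python
# Converting Cohen's d to a "probability of superiority":
# P(treated patient beats placebo patient) = Phi(d / sqrt(2)),
# assuming normal outcomes with equal variance.
from statistics import NormalDist

def superiority(d: float) -> float:
    return NormalDist().cdf(d / 2 ** 0.5)

for d in (0.31, 0.41, 0.50):
    print(f"d = {d:.2f}: {superiority(d):.0%} chance the treated patient does better")
# 0.31 -> ~59%, 0.50 -> ~64%: smaller than hoped, but not nothing,
# which is roughly Turner's "glass one-third full" point.
```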
Scam Product Testing • In-house: by non-academics on the company’s payroll • Worthless. Tweaked to get desired results • Independent testing companies: guns for hire • Minuscule effects touted as significant • Effects found, but not specific to the product • Amino acids may improve muscle strength • Effects may not apply to average people (e.g., taping injuries)
Are the Researchers Biased? • Homeopathy studies done by homeopaths • Chiropractic studies done by chiropractors • Surgical studies done by surgeons • Studies published in specialty journals for a biased audience
Who Are the Subjects? • Self selection bias: who volunteers? • Believers? • Professional subjects? • Select group not typical of the general population. • Men only? No children? Limited age group? • Subjects with concurrent diseases not accepted • Subjects taking other medications not accepted.
Were Negative Studies Suppressed? • File drawer effect • Negative studies not submitted for publication. • What if 4/5 studies were negative but only the positive one published? • Publication bias • Journals don’t like to publish negative studies. • Journals don’t like to publish replications that debunk original results. (Bem, Wiseman)
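A back-of-the-envelope illustration of the file drawer effect (my numbers, not the talk’s): even a completely ineffective treatment will eventually produce a publishable “positive” trial if enough trials are run and only the winners are submitted.

```python
# If an ineffective treatment is tested in several independent trials
# at alpha = 0.05, how often does at least one come out "positive"?
# The trial counts are illustrative assumptions.
alpha = 0.05
for n_trials in (1, 5, 20):
    p_any_positive = 1 - (1 - alpha) ** n_trials
    print(f"{n_trials:2d} trials: {p_any_positive:.0%} chance of a publishable false positive")
# 5 trials: ~23%; 20 trials: ~64%. If only the positive trial reaches a
# journal, the literature looks far stronger than the evidence is.
```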
Did Workers Mislead Author? • Technicians and subordinates know what the researcher hopes to find. • May try to please the boss, consciously or unconsciously • May circumvent blinding procedures • An ambiguous reading of 4.5 can be recorded as 4 or 5, whichever favors the hypothesis. • Faking to make the job easier (homeopathy prep)
Did Workers Mislead Author? • Benveniste homeopathy study • Counting basophil degranulation under the microscope is somewhat subjective • Only one technician got positive results
What Are the Odds? • 9 out of 10 drugs in Phase I clinical trials fail. • 50% of drugs that reach Phase III trials fail. • A far higher percentage of promising compounds never even make it to clinical trials; they fail in animal and in vitro testing.
Do the Data Justify the Conclusion? • Teaching exercise: • Read the data section first • Draw your own conclusions • Read the paper’s conclusions • Scratch your head
Do the Data Justify the Conclusion? Conclusion: low cholesterol kills children. The higher the cholesterol, the better for health.
Do the Data Justify the Conclusion? • Sample of opportunity: data not collected systematically • Too few points to show correlation • Correlation doesn’t prove causation • Other explanations: • Hygiene • Poverty • Disease • Starvation • Genetic factors • Less access to medical care • Better explanation: undernourished children have abnormally low cholesterol levels
Do the Data Justify the Conclusion? Conclusion: by the year 2038 100% of children will be autistic
What Aren’t They Telling Us? • Selection methods • Randomization methods • Identity of placebo • Whether people were fooled by placebo • Proper blinding procedures? • Other factors • Glassware not thoroughly washed? • Contaminants in lab? • Mouse XMRV virus contaminated cell cultures in CFS study • Did they really do what they said they did?
How Many Dropouts? • 10 patients enrolled: 7 negative, 3 positive = 30% positive • 6 of the non-responders drop out because it isn’t working • Among the 4 who finish, the 30% success rate now looks like 75%
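The slide’s dropout arithmetic, spelled out in a short sketch (the completers-only vs. all-enrolled framing is mine):

```python
# The slide's dropout arithmetic, spelled out.
enrolled = 10
responders = 3                      # 3/10 = 30% true success rate
dropouts = 6                        # non-responders who quit mid-study
completers = enrolled - dropouts    # 4 remain: 3 responders, 1 non-responder

print(f"All enrolled patients: {responders / enrolled:.0%}")    # 30%
print(f"Completers only:       {responders / completers:.0%}")  # 75%
# Analyzing only completers (and ignoring why people dropped out)
# inflates the apparent success rate from 30% to 75%.
```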
Where Was the Study Done? Percent of Acupuncture Trials with Positive Results • Canada, Australia, New Zealand 30% • US 53% • Scandinavia 55% • UK 60% • Rest of Europe 78% • Asia 98% • Brazil, Israel, Nigeria 100%
What was the sample size? • 1/3 of the chickens got better • 1/3 of the chickens stayed the same • What about the other third?
Were There Errors in Statistics? • Wrong statistical test used • Errors in calculation
What About Noncompliance? • Did all subjects take their pills? • Did they take them on time?
Noncompliance • HIV prophylaxis study in Africa • 95% said they usually or always took their meds on time • Pill-count data: 88% • Blood tests showed adequate plasma drug levels in only 15-26%
Tooth Fairy Science • Are they trying to study something that doesn’t exist?
Inaccurate Measuring Methods? • Questionnaires rely on unreliable memories and patient honesty. • “30% less pain” • “I eat like a bird” • “Only one drink”
Using a Bogus Test? Measuring the Components of ASEA • Claimed to be a mixture of 16 chemically recombined products of salt and water with completely new chemical properties. • They used a fluorescent indicator as a probe for unspecified “highly reactive oxygen species.”
How Many Endpoints Were There? • Multiple endpoints: some will show false correlations just by chance • Statistical corrections applied? • Inappropriate data mining? • The heart prayer study • 6 positive out of 26 factors studied • Inconsistent pattern
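Why multiple endpoints invite false positives: with 26 factors tested at the conventional 0.05 level (the slide’s numbers), some “hits” are expected by chance alone. A minimal sketch; the Bonferroni correction shown is one standard fix, not necessarily what the study used.

```python
# Many endpoints, no correction -> chance "hits" are expected.
# 26 factors and alpha = 0.05 come from the slide; the rest is illustrative.
n_endpoints = 26
alpha = 0.05

print(f"Expected chance hits: {n_endpoints * alpha:.1f}")                        # ~1.3
print(f"P(at least one false positive): {1 - (1 - alpha) ** n_endpoints:.0%}")   # ~74%

# A Bonferroni-style correction tests each endpoint at alpha / n:
print(f"Corrected per-endpoint threshold: {alpha / n_endpoints:.4f}")            # 0.0019
```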
Were Goalposts Moved? • AIDS prayer study: the endpoint was death • Not enough subjects died: AIDS drugs kept them alive • They went back and looked at many other factors and found some apparent successes (e.g., fewer doctor visits) but no change in objective tests like CD4 count. • Only 40 patients. The study wasn’t designed to test non-death outcomes.
Statistical Significance ≠ Clinical Significance • Did the drug lower the BP by 1% or 30%? • Was the endpoint a lab value or a clinical benefit? • B vitamin supplements lower homocysteine but don’t lower risk of heart disease • PSA screening finds cancers; doesn’t improve survival • Are the results POEMs – Patient-Oriented Evidence that Matters?
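One reason a lab-value “win” can be clinically meaningless: with a big enough sample, even a trivial effect reaches statistical significance. A minimal sketch with made-up numbers (a 1 mmHg blood-pressure difference, SD 10 mmHg); both values are assumptions for illustration.

```python
# A clinically trivial 1 mmHg difference becomes "statistically
# significant" once the sample is large enough. Numbers are illustrative.
from statistics import NormalDist

def two_sample_p(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided z-test p-value for a mean difference (equal n, known SD)."""
    se = sd * (2 / n_per_group) ** 0.5
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (50, 500, 5000):
    print(f"n = {n:4d} per group: p = {two_sample_p(diff=1.0, sd=10.0, n_per_group=n):.4f}")
# n=50: p ~ 0.62; n=500: p ~ 0.11; n=5000: p < 0.001.
# The effect never changed; only the sample size did.
```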
Was There Fraud? • Dipak Das, resveratrol researcher • Review board found him guilty of 145 counts of fabrication or falsification of data • 12 of his papers retracted so far
“I was blinded by work and my drive for achievement” • Hwang Woo-suk, stem cell researcher in South Korea, claimed to have cloned human embryonic stem cells • Fabricated crucial data • Embezzlement and bioethics law violations • Prison sentence (suspended) • 2 papers in Science retracted. • Fired from his job
Columbia Prayer Study • Claimed that prayer doubled the success of in vitro fertilization • Seriously flawed study • Convoluted design with 3 levels of overlapping prayer groups • No controls for prayers outside the study • Investigated for lack of informed consent • Authors • Lobo, the lead author, only learned of the study 6-12 months after it was completed. Denied any involvement other than editorial help. • Cha severed his relationship with Columbia, refused to comment • Wirth: • Paranormal researcher with no medical degree • Con man who went to federal prison for fraud and conspiracy • Bruce Flamm debunked it in Skeptical Inquirer • Retracted by the journal, but only years later • Still being cited as a valid study
How Were the Data Reported? • NNT and NNH • Lipitor for primary prevention of heart attacks: • 19% relative risk reduction • NNT 75-250, NNH 200 • Absolute risk vs. relative risk • Cellphones increase the risk of acoustic neuroma. Relative risk 200%. • Baseline risk is 1 in 100,000 • 200% of 1 is 2 • Absolute increase: 1 more case in 100,000, or 0.001%
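The risk arithmetic from this slide, made explicit in a short sketch. The cell-phone numbers come from the slide; the 3% baseline heart-attack risk used for the NNT example is my assumption, chosen only to illustrate how NNT follows from absolute risk reduction.

```python
# The absolute-vs-relative-risk and NNT arithmetic, made explicit.
# Cell-phone numbers are from the slide; the 3% baseline heart-attack
# risk for the NNT example is an illustrative assumption.

# Cell phones and acoustic neuroma: relative risk 2.0 ("200%").
baseline = 1 / 100_000                 # 1 case per 100,000
with_exposure = 2 * baseline           # doubled: 2 per 100,000
absolute_increase = with_exposure - baseline
print(f"Absolute increase: {absolute_increase:.3%}")   # 0.001%, i.e., 1 extra case per 100,000

# NNT is the reciprocal of the absolute risk reduction (ARR).
baseline_mi_risk = 0.03                # assumed untreated event rate
arr = baseline_mi_risk * 0.19          # 19% relative reduction, per the slide
print(f"ARR = {arr:.4f}, NNT = {1 / arr:.0f}")   # NNT ~ 175
```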