Problems with Science
“The first principle is that you must not fool yourself.” – Richard Feynman
Final Exam
• 17 December 2013 (Tuesday)
• 18:30-20:30
• In the Gymnasium
• 20 Questions
• All short answer
• 5 marks each
• Worth 20% of the course grade
Replication In science, we require that our results be reproducible. In a scientific article about an experiment, scientists are forced to describe every detail of what they did, so that someone else can do the same experiment and get the same result.
Replication This is called replication. It is a basic principle of the scientific method. If a finding cannot be replicated, then we must reject it.
Does Replication Happen?
• Biotech firm Amgen tried to reproduce 53 “landmark” cancer studies, but only reproduced 6.
• Drug company Bayer tried to reproduce 67 different studies, but only reproduced 17.
• In the decade 2000-2010, there were 80,000 people involved in studies that later could not be replicated.
Classic Article In a classic article titled “Why Most Published Research Findings Are False,” John Ioannidis found: “Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.”
Video Time! http://www.economist.com/multimedia?bclid=1294626183001&bctid=2719450974001
False Positives Why do we get 5% false positives? In science we require p < .05: if the null hypothesis is true, we would obtain results this extreme only 1 time in 20, or 5% of the time. So about 5% of our tests of true null hypotheses will still turn up accidental correlations. Repeating the experiment is unlikely to reproduce the same accident.
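To see where the 1-in-20 figure comes from, here is a minimal simulation sketch (assuming Python with numpy and scipy): every “experiment” below compares two groups drawn from the same distribution, so any significant result is a false positive.

```python
# A minimal sketch: simulate many experiments in which the null hypothesis
# is TRUE, and count how often we still get p < .05 purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME distribution: there is no real effect.
    control = rng.normal(0.0, 1.0, size=30)
    treatment = rng.normal(0.0, 1.0, size=30)
    _, p = stats.ttest_ind(control, treatment)
    if p < 0.05:
        false_positives += 1

print(false_positives / n_experiments)  # ~0.05, i.e. about 1 in 20
```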
False Negatives And why can there be more than 5% false negatives? There’s no cap on false negatives: you can’t punish people for failing to find the truth; finding it is difficult! But notice the asymmetry this creates: journals mostly publish positive results, and a fixed share of positive results are accidents. When genuine effects are rare, those accidents can outnumber the real discoveries, and that is how most published findings can turn out to be false.
Solving the Problem Remember that p = .05 is the maximum p-value that scientific journals will conventionally accept. There’s no reason you can’t aim for p = .01 (one percent false positives) or p = .001 (one in 1,000 false positives). How do you get there? Recruit more people for your experiment: a larger sample averages out random noise, so a real effect can clear a stricter threshold.
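Here is a minimal sketch of why bigger samples help (hypothetical effect size and numbers): the same modest real effect, tested with more subjects, produces smaller p-values.

```python
# A minimal sketch (hypothetical effect size): the same real effect,
# tested with larger samples, produces smaller p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.3  # the drug really shifts the outcome by 0.3 standard deviations

for n in (20, 100, 500):
    control = rng.normal(0.0, 1.0, size=n)
    treatment = rng.normal(true_effect, 1.0, size=n)
    _, p = stats.ttest_ind(control, treatment)
    print(f"n = {n:3d} per group -> p = {p:.4f}")

# Typically, p shrinks as n grows, so the same real effect can satisfy
# p < .01 or even p < .001 once the sample is large enough.
```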
Finding the Hypothesis in the Data There’s a subtle, easy-to-miss fallacy in science that goes by different names: “finding hypotheses in the data,” “the problem of multiple comparisons,” “hypothesis fishing,” and so on.
Random Correlations Everywhere Suppose I decide to test a new drug I made. I don’t have any idea what it does or doesn’t do. I’m just going to give the drug to the experimental group and a placebo to the control group, then see what happens.
Questionnaire
• How much have you slept in the past two weeks?
• How much sex did you have?
• Have you had any headaches? How many?
• Did you find yourself getting angry for no reason?
• How easy or hard did you find it to concentrate?
• Do you have more or fewer pimples?
• What is your blood pressure?
• What’s 14723 plus 9843?
Finding the Hypothesis in the Data By pure random chance, if I ask enough questions, there will be an accidental correlation between the experimental group and some answers that is not found in the control group. Look! My drug gives you increased mathematical ability!
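Here is a minimal sketch of that fishing expedition (again assuming numpy and scipy): the “drug” below does nothing at all, yet testing twenty outcomes at p < .05 turns up at least one “significant” difference most of the time.

```python
# A minimal sketch: a drug with NO real effect, tested against 20 outcomes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_outcomes = 20   # sleep, headaches, pimples, math ability, ...
n_subjects = 30   # per group

significant = []
for outcome in range(n_outcomes):
    placebo = rng.normal(size=n_subjects)
    drug = rng.normal(size=n_subjects)  # same distribution: the drug does nothing
    _, p = stats.ttest_ind(placebo, drug)
    if p < 0.05:
        significant.append(outcome)

print(significant)  # usually non-empty: "Look! My drug does something!"
# With 20 independent tests at alpha = .05, the chance of at least one
# false positive is 1 - 0.95**20, roughly 64%.
```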
Use New Data! It’s not unusual to find accidental correlations in data. So in science we require that hypotheses be tested by new data. After I decide that my drug gives people better math skills, I then need to do a new experiment. This is to avoid the problem of multiple comparisons.
Why New Data Is Important
It would be really unlikely if
• I propose a correlation
• I test it against some new data
• The new data confirm the correlation
• All of that was just an accident
Compare this to the fact that it is really likely to find random correlations within a single set of data.
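A minimal sketch of that logic: fish the most “significant” outcome out of one trial’s data, then test only that outcome on brand-new subjects; the accident almost never repeats.

```python
# A minimal sketch: hypotheses fished from old data rarely survive new data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def run_trial(n_outcomes=20, n_subjects=30):
    """One fake drug trial (no real effect): returns one p-value per outcome."""
    return [
        stats.ttest_ind(rng.normal(size=n_subjects), rng.normal(size=n_subjects))[1]
        for _ in range(n_outcomes)
    ]

# Step 1: fish a hypothesis out of the first trial's data.
first = run_trial()
best = int(np.argmin(first))
print("fished hypothesis: outcome", best, "p =", round(first[best], 4))

# Step 2: test ONLY that outcome on brand-new data.
new_p = run_trial()[best]
print("replication p =", round(new_p, 4))  # usually > .05: the accident doesn't repeat
```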
fMRI fMRI = functional Magnetic Resonance Imaging. It’s a way of measuring changes in blood flow in the brain, which lets us infer changes in brain activity.
fMRI Neuroscience A common neuroscience study involving fMRI might go something like this: I put a bunch of people in fMRI machines and have them look at various pictures.
Information Processing When they look at pictures of happy things, like smiling babies, double rainbows, cute puppies, or whatever, I might notice that certain parts of their brains are active (and not active when I’m not showing them these pictures). I might then conclude that these parts of the brain (the active ones) are responsible for processing information about happy things.
Multiple Comparisons But this methodology is ripe for the problem of multiple comparisons. There are lots of areas of the brain and lots of different aspects of any picture. If I look at all the areas of the brain and all the aspects of the pictures, I will find many correlations totally by random chance.
Dead Fish Craig Bennett is a neuroscience graduate student. He wanted to test out his fMRI machine, so he bought a whole dead salmon. He put the dead salmon in the machine and showed it “a series of photographs depicting human individuals in social situations.”
Experimental Design The salmon “was asked to determine what emotion the individual in the photo must have been experiencing.” Then Bennett looked to see whether there were correlations between changes in the blood flow in the salmon’s brain, and the pictures.
Correlations! Unsurprisingly, there were. 16 out of 8,064 voxels (volumetric pixels) were correlated with picture-viewing. The important thing is that lots of neuroscientists use these same methods for humans. The risk of error is great.
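To put those numbers in perspective, here is a back-of-the-envelope sketch (assuming, purely for illustration, that each voxel is tested independently at an uncorrected threshold): chance alone produces “active” voxels, even in a dead fish.

```python
# A back-of-the-envelope sketch: how many voxels pass by chance alone?
n_voxels = 8064

# At an uncorrected p < .05 threshold, about 5% of null voxels pass:
print(n_voxels * 0.05)    # ~403 voxels "active" by accident

# Even at a much stricter (but still uncorrected) p < .001:
print(n_voxels * 0.001)   # ~8 voxels, the same order as the 16 Bennett found
```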
Solving the Problem This problem is easily solved: don’t find your hypotheses in your data. Well… it’s not that easy. You have to convince neuroscientists to behave!
Cheating in Science There are lots of ways to cheat in science. If you want your study to show that antidepressants do better than placebos, you can skip double-blinding, or use improper randomization techniques (though this is obvious to real scientists).
You can also:
• Only correct the baseline when it suits you.
• Ignore dropouts.
• Remove outliers when it suits you.
• Choose a statistical test that gets the best results.
• Publish only positive findings.
The Baseline Often, studies don’t have the power we would ideally desire. Remember that for a 95% confidence interval 6 percentage points wide (±3%), we estimated that we’d need about 1,000 subjects in our study. But if you’re studying a new drug, how do you find 1,000 people in your area who need it and are willing to sign up for your trial?
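A quick sketch of where that 1,000 comes from, assuming the slide’s “6%” means a 6-point-wide interval (±3%) and using the standard margin-of-error formula for a proportion at its worst case, p = 0.5:

```python
# A quick sketch: sample size for a 95% CI roughly 6 points wide (±3%).
import math

z = 1.96        # z-score for 95% confidence
p = 0.5         # worst-case proportion (maximizes the required sample)
margin = 0.03   # ±3 percentage points

n = (z ** 2) * p * (1 - p) / margin ** 2
print(math.ceil(n))  # ~1068, which rounds to "about 1,000 subjects"
```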
The Baseline Scientists often test much smaller groups, and then aggregate (pool) all the data later. This is called meta-analysis, and we’ll be discussing it later. When you have a small group of people (for example, 20 or 30), there is a high probability that, by random chance, either the control group or the experimental group will start out doing better.
The Baseline This starting difference is called “the baseline.” If you’re testing a pain medication, for example, the control group might, merely as a matter of chance, have a higher average degree of pain than the experimental group. They have a higher “baseline” degree of pain.
Controlling for the Baseline You can “control for the baseline” by testing how much people’s pain improved over the course of the trial, instead of just testing how much pain they’re in at the end of the trial.
Not Controlling The average pain score in the control group was 65 when the experiment started, and 52 for the experimental group. Nobody improved, so it was also 65 and 52 at the end. But if you report just the end scores, it looks like your treatment worked: the experimental group had 13 fewer average pain points!
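A minimal sketch of that example in code (using the slide’s numbers): comparing end scores alone makes a useless treatment look effective, while comparing change from baseline gives the honest answer.

```python
# A minimal sketch with the slide's numbers: nobody improves at all.
control_start, control_end = 65, 65
treatment_start, treatment_end = 52, 52

# Cheating: compare only the end scores.
print(control_end - treatment_end)  # 13 -> "the treatment reduced pain!"

# Controlling for the baseline: compare improvement instead.
control_change = control_end - control_start        # 0
treatment_change = treatment_end - treatment_start  # 0
print(treatment_change - control_change)  # 0 -> no effect: the honest answer
```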
Controlling for the Baseline It’s best to control for the baseline, but it’s OK if you don’t. What’s bad is when you control for the baseline when the control group is doing better, but don’t control for it when the experimental group is doing better. That’s cheating.
Ignoring Dropouts Sometimes a treatment won’t work, or will cause harmful side-effects. The people experiencing the worst of these side-effects might drop out of the trial. If you collect data only on people who finished the trial, it will seem like your treatment has fewer side-effects than it actually does.
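A minimal sketch with hypothetical numbers: analyzing only the people who finished (sometimes called a per-protocol analysis) hides the side-effects, while analyzing everyone who started (an intention-to-treat analysis) does not.

```python
# A minimal sketch (hypothetical numbers): 100 people start the trial,
# and the 20 with the worst side-effects drop out before the end.
side_effects = [5] * 20 + [1] * 80   # 5 = severe, 1 = mild
completers = side_effects[20:]       # the dropouts' data are discarded

print(sum(completers) / len(completers))      # 1.0 -> "very safe drug!"
print(sum(side_effects) / len(side_effects))  # 1.8 -> the honest figure
```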
Outliers An outlier is a data point that is far away from all of your other data points: it doesn’t fit the pattern the rest of the data clearly follow. For example, in a trial for a pain medication, you might have some people get a little better, some people get a little worse, and one person who dies. Dying is an outlier in this situation.
Controlling for Outliers Outliers are often due to just random chance. Through no fault of your treatment, sometimes people die. It can’t be helped. It’s accepted practice to control for outliers (which have specific definitions in statistics) by removing them from your data. You can also choose to leave all your data intact.
Controlling for Outliers Nothing is wrong with removing outliers, except when you do it only when it suits you. If you keep negative outliers in the control group and positive outliers in the experimental group, but eliminate positive outliers from the control group and negative outliers from the experimental group, you’re cheating!
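A minimal sketch using one common statistical definition of an outlier (the 1.5 × IQR rule, chosen here purely for illustration): whatever rule you pick must be applied to every group, before you see which way the results lean.

```python
# A minimal sketch: one common outlier rule (1.5 * IQR), applied uniformly.
import numpy as np

def remove_outliers(scores):
    """Drop points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; same rule for every group."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in scores if lo <= s <= hi]

control = [48, 50, 52, 51, 49, 95]   # hypothetical pain scores; 95 is an outlier
treatment = [40, 42, 41, 43, 39, 2]  # 2 is an outlier in the other direction

# Honest: the same rule removes both 95 and 2. Cheating would be removing
# only the outlier that hurts your preferred conclusion.
print(remove_outliers(control), remove_outliers(treatment))
```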
Publication Bias Suppose I conduct a rigorous, scientific test of the claim that reading causes foot cancer. I show, with high statistical significance and a large effect size, that it is true! That’s big news, and not only will I get published in the best science journals, like Science and Nature, I’ll probably get in the newspapers too.
Publication Bias Instead, suppose I go out and conduct a rigorous, double-blind, placebo-controlled, randomized trial of the claim that reading does NOT cause foot cancer. I use a large, representative sample of the population, and discover, with a high degree of statistical significance, that I’m right.
Publication Bias Well, who cares? Not Science or Nature! We all knew that reading didn’t cause foot cancer. That’s silly. Negative results are inherently boring and uninteresting. Positive results are exciting and informative.