Why so many Statistical Findings are False Richard E. Neapolitan RE-Neapolitan@neiu.edu
The Journal of Personality and Social Psychology recently published: “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect” The term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms.
Subjects were asked to identify which door had a picture behind it. • The hit rate on erotic pictures was significantly greater than the 50% hit rate expected by chance: 53.1% (p = 0.01). • The hit rate on the nonerotic pictures did not differ significantly from chance: 49.8% (p = 0.56).
Dogmatic Cookbook Statistics A RESULT IS SIGNIFICANT IF AND ONLY IF ITS p-value IS 0.05 OR BETTER.
p = 0.0499: significant • p = 0.0501: insignificant
Pragmatic Economics is Prominent. • A psychologist (Daniel Kahneman) won the economics Nobel prize in 2002; he showed humans do not reason normatively. • The efficient market hypothesis is no longer the central tenet of economics. • The Myth of the Rational Market: A History of Risk, Reward, and Delusion on Wall Street by Justin Fox, 2009.
Pragmatic Philosophy is in Vogue. The pragmatism of John Dewey, Richard Rorty, etc. seems more current than logical positivism.
So why is cookbook statistics stuck in this paradigm that we can get an absolute objective answer from data? • How did such statistics become popular in the first place? • Let’s back up and look at the two approaches to statistics.
Predominant approaches to statistics
Bayesian approach • Karl Pearson (1857 - 1936) • Frank Ramsey (1903 - 1930) • W.E. Johnson (1858 - 1931) • Bruno de Finetti (1906 - 1985) • Jimmy Savage (1917 - 1971) • Dennis Lindley (1923 - ) • Harold Jeffreys (1891 - 1989) • I.J. Good (1916 - 2009)
Frequentist approach • Richard von Mises (1883 - 1953) • Egon Pearson (1895 - 1980) • Ronald A. Fisher (1890 - 1962) • Jerzy Neyman (1894 - 1981)
Frequentist Approach Take a coin from your pocket and toss it 10 times. Suppose it lands heads every time. Should we reject the hypothesis that it is fair?
The probability of getting 10 heads in 10 tosses if the coin is fair is $(0.5)^{10} \approx 0.00098$. This probability is called the significance of the result.
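The coin-toss figure can be checked with a short calculation. The sketch below is illustrative only; the function name `prob_at_least` is a hypothetical helper, not something from the slides. It computes the one-sided binomial tail probability under the fair-coin hypothesis.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): a one-sided tail probability."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Ten heads in ten tosses of a coin assumed to be fair.
p_value = prob_at_least(10, 10)
print(f"P(10 heads in 10 tosses | fair coin) = {p_value:.5f}")
```

The same helper gives the probability of "this result or something more extreme" for less lopsided outcomes, for example `prob_at_least(9, 10)`.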
Perform a Randomized Control Experiment to test a new drug. Using a chi-square test the probability of getting this result or something more extreme if there is no difference in the two groups is 0.006.
Bayesian Approach • Bayesian statistics involves updating prior belief based on Bayes’ Theorem. • Suppose Joe is about to get married, and he takes a routine blood test for HIV. • The test comes back positive.
[Equation: Bayes' theorem applied to Joe's positive HIV test, with its terms labeled significance, power, and prior probability.]
Suppose Mary suspects she is pregnant. • She takes a pregnancy test and it comes back positive.
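[Equation: Bayes' theorem applied to Mary's positive pregnancy test, with its terms labeled significance, power, and prior probability.]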
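To see how the three labeled quantities interact, here is a minimal sketch of Bayes' theorem for a positive test result. The numbers are hypothetical, chosen only to contrast a very low prior (a routine screening like Joe's) with a substantial prior (someone who, like Mary, already suspects the condition); they are not the values used on the slides.

```python
def posterior(prior: float, power: float, significance: float) -> float:
    """P(condition | positive test) by Bayes' theorem.

    prior        = P(condition) before the test
    power        = P(positive | condition present)
    significance = P(positive | condition absent), the false-positive rate
    """
    true_positive = power * prior
    false_positive = significance * (1 - prior)
    return true_positive / (true_positive + false_positive)

# Hypothetical inputs: the same test accuracy, two very different priors.
joe = posterior(prior=0.0001, power=0.99, significance=0.01)
mary = posterior(prior=0.5, power=0.99, significance=0.01)
print(f"low prior:  P(condition | positive) = {joe:.4f}")
print(f"high prior: P(condition | positive) = {mary:.4f}")
```

With these assumed inputs the low-prior posterior stays below 1%, while the high-prior posterior is 99%, even though the test itself is identical.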
In the case of the psi experiment, the prior probability of psi phenomena was very small. • In the case of tossing the coin 10 times the prior probability the coin was fair was very large. • In the case of the randomized control experiment concerning blood pressure medicine, the prior probability the medicine would work was about 0.5. • So only in this last case can we ignore the prior.
How did we end up with a statistical doctrine that insists we cannot use prior probabilities?
In the early 20th century, the forefathers of current statistical methodology were largely Bayesians (e.g. Karl Pearson and R.A. Fisher). • The Bayesian approach “depends upon an arbitrary assumption, so the whole method has been widely discredited.” – R.A. Fisher, 1921 • “Inverse probability, which like an impenetrable jungle arrests progress towards precision of statistical concepts.” – R.A. Fisher, 1922
“it is convenient to draw the line at about the level at which we can say: Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials.” – R.A. Fisher, 1926
Cookbook statisticians sometimes give the impression to their students that cookbooks are enough for practical purposes. Anyone who has been concerned with complex data analysis knows that they are wrong, that subjective judgment of probabilities cannot usually be avoided, even if this judgment can later be used for constructing apparently non-Bayesian procedures. – I.J. Good, 1976
Suppose now we toss each of 1,000,000 coins 20 times and one of them lands heads all 20 times. • H0 is the hypothesis that all the coins are fair. • P(at least one lands heads 20 times | H0) is $1 - (1 - 0.5^{20})^{1{,}000{,}000} \approx 0.61$. • This is called the Šidák correction. • The related Bonferroni correction is $1{,}000{,}000 \times 0.5^{20} \approx 0.95$.
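Both corrections are easy to reproduce numerically. The sketch below assumes independent tests; the function names `sidak` and `bonferroni` are illustrative, not taken from any library.

```python
def sidak(p_single: float, n_tests: int) -> float:
    """P(at least one of n independent tests shows the result | all nulls true)."""
    return 1 - (1 - p_single) ** n_tests

def bonferroni(p_single: float, n_tests: int) -> float:
    """Union-bound approximation to the same probability, capped at 1."""
    return min(1.0, n_tests * p_single)

p_20_heads = 0.5 ** 20   # one fair coin landing heads 20 times in a row
n_coins = 1_000_000

print(f"Sidak:      {sidak(p_20_heads, n_coins):.2f}")
print(f"Bonferroni: {bonferroni(p_20_heads, n_coins):.2f}")
```

The Bonferroni figure is always at least as large as the Šidák figure, since it is an upper bound on the same probability.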
Thought Experiment • Suppose every one of the 1,000,000 households in Chicago agrees to simultaneously toss a coin 20 times. • I sit alone in my house and toss my coin 20 times. • My coin lands heads every time. • Do I dismiss the result as insignificant because I know 1,000,000 coins were tossed? • If I find out the other households tricked me and did not toss their coins, do I retract my conclusion and now decide it was a significant result?
The mistake is that the following two judgments are confused: • Whether an outcome is surprising. • Whether an outcome is noteworthy. • The number of experiments performed affects the 1st judgment. • The prior probability affects the 2nd judgment.
Corrections like Bonferroni are ordinarily applied when we simultaneously perform many experiments, and each has a low prior probability. • The corrections serve as a surrogate for these low priors. • However, they are very poor surrogates. • The noteworthiness we attach to a particular experimental result cannot depend on how many other experiments we performed along with the given experiment.
Forensic DNA Suppose a woman is murdered and her husband is the primary suspect.
Suppose further that a very accurate blood test can be run on 100,000 individuals, including the husband (although there are many more individuals in the city). • The husband’s lawyer argues that there is no reason to ignore this possible evidence. • So the blood of all 100,000 individuals is compared to that found at the scene of the crime. P(match | source) = 0.999999 and P(match | not source) = 0.00001, where source is the event that the blood tested has the same source as the blood found at the scene of the crime, and match is the event that the blood tested matches the blood found at the scene.
P(match | source) = 0.999999, P(match | not source) = 0.00001 • The blood samples of the 100,000 individuals are checked and it is found that only the husband’s blood matches. • Joe the statistician is called in to inform the court as to how much this incriminates the husband. • Joe remembers the problem with multiple hypotheses and the use of the Šidák correction. • So Joe formulates the following hypotheses: H0: the blood at the crime scene is not from any of them. HA: the blood at the crime scene is from one of them. $P(\text{any\_match} \mid H_0) = 1 - (1 - 0.00001)^{100{,}000} \approx 0.63212$. • Not significant at all.
One might argue that we should not have checked the other individuals since only the husband was the suspect. • But the lawyer said we should not ignore this evidence. • More importantly, isn’t this argument implicitly assuming an unquantified prior probability as to who was the murderer? • The point is that we can reach different conclusions depending on how we obtain the information about the husband’s blood.
The Bayesian Method Based on the evidence so far against the husband, we believe P(source) = 0.1.
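A minimal sketch of the Bayesian calculation this slide sets up, using the match probabilities given earlier and the stated prior of 0.1 for the suspect. The variable names are illustrative, and the calculation treats the match as the only new evidence.

```python
# Values stated on the slides.
p_match_given_source = 0.999999
p_match_given_not_source = 0.00001
prior_source = 0.1

# Bayes' theorem: P(source | match).
numerator = p_match_given_source * prior_source
evidence = numerator + p_match_given_not_source * (1 - prior_source)
posterior_source = numerator / evidence

print(f"P(source | match) = {posterior_source:.5f}")
```

Note that the number of other people tested does not enter this calculation; only the prior and the two match probabilities do.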
You might ask how we could obtain this prior probability of 0.1. • Alternatively, we could determine the posterior probability needed to believe the husband is guilty beyond a reasonable doubt. • Let’s say that this probability is 0.999. • We solve
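One way to set up that calculation, sketched under the assumption that we want the smallest prior P(source) for which a match yields a posterior of 0.999. The odds form of Bayes' theorem is used: posterior odds = likelihood ratio × prior odds. The function name `prior_needed` is hypothetical.

```python
def prior_needed(target_posterior: float,
                 p_match_given_source: float,
                 p_match_given_not_source: float) -> float:
    """Prior P(source) that gives the target posterior after observing a match."""
    likelihood_ratio = p_match_given_source / p_match_given_not_source
    posterior_odds = target_posterior / (1 - target_posterior)
    prior_odds = posterior_odds / likelihood_ratio
    return prior_odds / (1 + prior_odds)

needed = prior_needed(0.999, 0.999999, 0.00001)
print(f"prior needed for a posterior of 0.999: {needed:.4f}")
```

Under these assumptions the required prior comes out a little below 0.01, so the question becomes whether the other evidence supports a prior of at least that size.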
The National Research Council has several reports on forensic DNA giving guidelines on how to handle this problem.
Mathematicians used the Bayesian method to handle multiple hypotheses about 60 years ago. • They broke the German code.
Alan Turing, I.J. Good, Hugh Alexander and others wanted to break the code for the German systems “Enigma” and “Tunny”.
The Tunny Code Breaking Problem • They had 1271 possible streams of 0s and 1s. • 1270 of them were unbiased, and 1 was biased 55% to 45% in favor of 1. • Each stream was about 1000 to 10,000 bits long. • Their mission was to identify the one stream that was biased. • The Allies’ ability to read dispatches from German Army headquarters hung in the balance.
If a message was short (say 1000 bits), they would not be able to detect the bias. • If it was long (say 10,000 bits), then it would be easy. • But a lot of messages fell in between these two extremes. • Even given two messages of the same length, one may be more informative than the other. • Both machine and cryptanalyst time were expensive. • So they needed to prioritize which messages were worth working on. • They prioritized the messages by computing a Bayesian score, taking into account prior information such as the originating source and likely subject matter of the message.
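The slides do not say how the Bayesian score was computed. As an illustration only, the sketch below assumes a log-odds score: the log prior odds that a stream is the biased one, plus the log likelihood ratio of "biased 55/45 toward 1" against "unbiased", expressed in decibans (ten times the base-10 logarithm). The function names and the default uniform prior of 1/1271 are assumptions made for this example.

```python
from math import log10

def weight_of_evidence_db(ones: int, zeros: int, p_one_biased: float = 0.55) -> float:
    """Log likelihood ratio, in decibans, of 'biased' vs. 'unbiased' for a bit stream."""
    per_one = log10(p_one_biased / 0.5)          # evidence contributed by each 1
    per_zero = log10((1 - p_one_biased) / 0.5)   # evidence contributed by each 0
    return 10 * (ones * per_one + zeros * per_zero)

def bayesian_score_db(ones: int, zeros: int, prior: float = 1 / 1271) -> float:
    """Posterior log-odds in decibans: prior log-odds plus weight of evidence."""
    return 10 * log10(prior / (1 - prior)) + weight_of_evidence_db(ones, zeros)

# Hypothetical streams showing the same 55% proportion of 1s at two lengths.
for ones, zeros in [(550, 450), (5500, 4500)]:
    print(ones + zeros, "bits:", round(bayesian_score_db(ones, zeros), 1), "decibans")
```

Under these assumptions the 1000-bit stream does not overcome the low prior (its score stays negative), while the 10,000-bit stream does, which matches the slide's point about short and long messages. Replacing the uniform prior with one informed by a message's source or subject would shift the score accordingly.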