Explore Max Albert's extension of Critical Rationalism to statistical inference, emphasizing the importance of severe testing, error probabilities, and blocking ad-hoc explanations. The text critically discusses Fisher's theory and introduces the concept of maximizing empirical content in statistical analysis.
Max Albert, Justus Liebig University Giessen
A Falsificationist Theory of Statistical Inference
1. The Problem
1.1 Critical Rationalism (CR)
1.2 Extending CR to Statistics
2. Maximizing Empirical Content
3. Conclusions
1.1 Critical Rationalism (CR)
• Ingredients
• Law-like hypothesis H = ∀x (Cx → Fx)
• Background knowledge B: corroborated assumptions, uncontentious observation statements, neither B ⊢ H nor B ⊢ ¬H
• Severe test: requires a test situation a ("trial") where Ca holds, and where B does not already imply Fa.
• Basic Methodological Rules
• Make severe tests.
• Falsification: Reject H if you observe Ca ∧ ¬Fa.
• Corroboration: Accept H if you observe Ca ∧ Fa.
• Note: These are decision rules, or rules concerning rational beliefs. The complete argument is deductive (with premises concerning rational beliefs), not inductive (Musgrave 1993).
• Possible Errors: standard fare in CR
• First-kind error (FKE): erroneous falsification (false observation statements, due to observational errors)
• Second-kind error (SKE): erroneous corroboration (false observation statements, or factors other than Ca leading to Fa)
• Safeguards: Improve the basic methodological rules.
• Accept observation statements only if they come from certain trusted sources (qualified personnel, standard procedures, incentives to be thorough and truthful).
• Make several trials. Vary conditions that should be irrelevant.
• Error Probabilities
• Error probabilities of safeguarding procedures are unknown.
• Error probabilities in statistical testing come from an additional source ("sampling error").
• Total error probabilities are therefore always unknown.
Blocking Ad-hoc Explanations
Consider H = ∀x (Cx → Fx) and relevant data Ca ∧ Fa, Cb ∧ Fb.
Afterwards, one can often find some different initial condition T that was also fulfilled in these trials (i.e., find that Ta, Tb). Therefore, one might claim that, for instance, H* = ∀x (Tx → Fx) is also corroborated because it also explains the data.
This is usually not accepted: proponents of H* will have to provide new data like, ideally, ¬Ch ∧ Th ∧ Fh and Ck ∧ ¬Tk ∧ ¬Fk, i.e., trials in which C and T come apart.
Use Novelty Criterion (UNC, Worrall 2010): Old data that have been used to construct H* do not support H*.
In other words: New data are required in order to show that a SKE occurred.
UNC regulates scientific competition: it provides incentives for new empirical work and prevents purely parasitic theories.
1.2 Extending CR to Statistics
• Ingredients
• Law-like hypothesis H = (C, X, P): if initial condition C holds, then X is distributed according to probability distribution P.
• Propensity interpretation, causal hypothesis: C is a (generalized) cause of the X values (Albert 2007).
• H implies that X is i.i.d. in different trials.
• Add methodological rules for checking the then-part.
Fisher's Theory of Significance Testing
Let H0 = (C, X, P).
1. Choose a level of significance α.
2. Choose a rejection region R with P(X ∈ R) = α where densities are lowest ("cut off the tails").
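As a concrete illustration (mine, not the slides'), here is a minimal Python sketch of Fisher's two-step recipe, assuming for the sake of the example that H0 specifies a standard normal distribution for X and that α = 0.05; for a symmetric unimodal density, the region of total probability α where the density is lowest is exactly the two tails.

```python
# Minimal sketch of Fisher-style significance testing for a continuous H0.
# Assumptions (illustrative, not from the slides): H0 says X ~ N(0, 1), alpha = 0.05.
from scipy.stats import norm

alpha = 0.05

# "Cut off the tails": for a symmetric unimodal density, the lowest-density
# region with total probability alpha is the union of the two tails.
lower, upper = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)

def reject(x):
    """Reject H0 iff the observation falls in the rejection region R."""
    return x < lower or x > upper

print(f"R = (-inf, {lower:.3f}) U ({upper:.3f}, +inf)")
print(reject(2.5), reject(0.3))  # True, False
```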
• Critical Discussion of Fisher's Theory
• Neyman's Paradox: Transformations of X can map low-density areas into high-density areas and vice versa.
• Neyman-Pearson Theory (NPT): Choose H1 = (C, X, P1). Maximize the power P1(X ∈ R) = 1 − β given P(X ∈ R) ≤ α.
• But: Unless the statistical model H0 ∨ H1 is corroborated, this ignores third-kind errors (misspecification: H0 ∨ H1 false).
• Mayo & Spanos (2006): Fisher's theory supplies misspecification tests for statistical models.
• However: If Fisher's theory works, H0 can be tested in isolation.
• My position:
• Neyman's Paradox poses no problems (Albert 2001).
• Falsificationism supports Fisher's basic idea (Albert 1992).
• The NPT applies if a statistical model follows from background knowledge. If not, hypotheses should be tested in isolation.
2. Maximizing Empirical Content
• Fact: All observable random variables are discrete (and finite).
• Consequence
• Neyman's Paradox cannot occur.
• "Low probability" replaces "low density": choose a rejection region R with P(X ∈ R) = α where probabilities are lowest.
Example: For α = 0.1, choose R = {1, 6}, not R = {2}. But why?
Falsificationist Reason: Given α, the rule maximizes the "empirical content" (R's share of the sample space).
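The example can be made concrete with a short sketch (mine, not the slides'), assuming the six-sided die with probabilities (1, 2, 5, 7, 4, 1)/20 shown on the next slide: filling the rejection region with the lowest-probability outcomes while P(X ∈ R) stays within α = 0.1 yields R = {1, 6}, which has the same probability as R = {2} but twice its share of the sample space.

```python
# Sketch of the "lowest probability" rejection rule for a discrete X.
# Assumption: the die probabilities (1,2,5,7,4,1)/20 from the next slide, alpha = 0.1.
probs = {1: 1/20, 2: 2/20, 3: 5/20, 4: 7/20, 5: 4/20, 6: 1/20}
alpha = 0.1

# Add outcomes in order of increasing probability while the total stays <= alpha.
region, total = [], 0.0
for outcome, p in sorted(probs.items(), key=lambda kv: kv[1]):
    if total + p <= alpha + 1e-12:   # small tolerance for float rounding
        region.append(outcome)
        total += p

empirical_content = len(region) / len(probs)
print(region, total, empirical_content)   # [1, 6] 0.1 0.333...
```

Taking the lowest-probability outcomes first maximizes the number of elements in R for a given probability budget α, which is the empirical-content rationale stated above.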
Empirical Content (EC) for Sample Size n
Consider H = (C, X, P) with X ∈ {1, ..., k}, P(X = j) = pj, and sample space Sn = {(h1, ..., hk): Σj hj = n}, with hj as the frequency of X = j. Let the rejection region be R ⊆ Sn. Then EC(R) = |R| / |Sn|.
Maximizing EC(R) subject to P(X ∈ R) ≤ α yields the multinomial goodness-of-fit (mgof) test (approximated by the χ² gof test for n·pj ≥ 20).
For each n, the mgof test yields a trade-off between 1 − α and EC. The trade-off is concave to the origin and shifts outward with increasing n.
Interpretation of EC
• Non-probabilistic measure of the vulnerability of the hypothesis.
• Substitute for power. Severe test: 1 − α and EC large.
[Figure: trade-off between 1 − α and EC for (p1, ..., p6) = (1, 2, 5, 7, 4, 1)/20 and n = 1, 5, 10, 20]
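To make the construction above concrete, here is a small Python sketch (my own, under the slide's setup): it enumerates the sample space Sn of frequency vectors for the die probabilities in the figure, fills the rejection region with the lowest-probability points while keeping P(X ∈ R) ≤ α, and reports EC(R) = |R|/|Sn|. The choices n = 10 and α = 0.05 are illustrative only.

```python
# Sketch of the mgof construction: build R from the lowest-probability
# frequency vectors in S_n while P(X in R) <= alpha, then compute EC(R).
# Assumptions (illustrative only): die probabilities (1,2,5,7,4,1)/20, n = 10, alpha = 0.05.
from math import factorial, prod

p = [1/20, 2/20, 5/20, 7/20, 4/20, 1/20]
n, k, alpha = 10, len(p), 0.05

def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for h in range(total + 1):
        for rest in compositions(total - h, parts - 1):
            yield (h,) + rest

def multinomial_pmf(h):
    """Probability of the frequency vector h under H0 = (C, X, P)."""
    coeff = factorial(n)
    for hj in h:
        coeff //= factorial(hj)
    return coeff * prod(pj ** hj for pj, hj in zip(p, h))

sample_space = list(compositions(n, k))        # S_n, |S_n| = C(n+k-1, k-1)
by_prob = sorted(sample_space, key=multinomial_pmf)

rejection, size = [], 0.0
for point in by_prob:
    q = multinomial_pmf(point)
    if size + q > alpha:                        # any further point would push P(X in R) above alpha
        break
    rejection.append(point)
    size += q

EC = len(rejection) / len(sample_space)
print(f"|S_n| = {len(sample_space)}, |R| = {len(rejection)}, "
      f"P(X in R) = {size:.4f}, EC(R) = {EC:.3f}")
```

Sorting the sample points by probability and taking a prefix maximizes |R| for the given α, i.e., it implements the maximize-EC rule directly; the χ² gof test mentioned above is the usual large-n approximation to this exact construction.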
• Problem: Why do samples differing w.r.t. order, like {Head, Tail} and {Tail, Head} in coin tossing, count as just one element?
• Hacking (1965): Neglecting order can only be justified by appeal to alternative hypotheses.
• This is not quite right:
• H0 = (C, X, P) implies that order is irrelevant.
• Ex post, one can always find "suspicious patterns" in the data. But the data do not support new hypotheses Hj = (Cj, X, Pj) explaining these patterns (UNC).
• Thus, tests can neglect order. Hypotheses inspired by patterns in the data must be tested on their own.
• But: Deciding how to count requires a further argument. The best argument takes the alternatives Hj = (C, X, Pj) into account:
• When the sample size n increases, the power of the test w.r.t. these alternatives goes up, approaching 1 as n → ∞.
3. Conclusions
1. The multinomial goodness-of-fit test fulfills all the requirements of a falsificationist theory of statistical inference.
2. The NPT can be used by critical rationalists when the statistical model follows from background knowledge.
3. If there is no such model, each hypothesis of a disjunction (compound hypothesis) should be evaluated on its own ("disjunctive test criterion", defended by Bowley against Fisher; see Baird, BJPS 1983, on the χ² controversy).
4. Any kind of test can be used for heuristic purposes (diagnostic tests, search for suspicious patterns). This should not be confused with testing. Hypotheses found in this way have to be tested with new data (UNC).