Evidence for Probabilistic Hypotheses: With Applications to Causal Modeling
Vals, Switzerland, August 7, 2013
Malcolm R. Forster, Department of Philosophy, University of Wisconsin-Madison
How to discover causes… TWO THESES
Thesis (a): Probabilistic independencies provide a way to discover causal relations.
Thesis (b): Probabilistic independencies provide the only way to discover causal relations.
The simplest way to argue against (b) is to show how data can favor X→Y against Y→X. (With only two variables, the two models entail the same independencies, so any such test refutes (b).)
Back to first principles… Hypothesis testing in general...
Modus Tollens: Hypothesis H entails observation O; O is false; therefore H is false.
Probabilistic Modus Tollens: H entails that observation O is highly probable; O is false; therefore H is (probably) false.
THE PROBLEM: In most situations, all rival hypotheses give the total evidence E very low probability. Put O = not-E: not-E is highly probable under every rival hypothesis, yet not-E is false. Run probabilistic modus tollens and we end up rejecting EVERY hypothesis!
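To make the PROBLEM concrete, here is a minimal numeric sketch (my illustration, not from the talk): any specific sequence of 1000 coin flips has log-likelihood around -700, i.e., probability around 10^-300, under every rival Bernoulli hypothesis, so a rule that rejects whatever makes the data highly improbable rejects them all. The biases 0.3, 0.5, 0.7 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.binomial(1, 0.5, size=1000)   # total evidence: 1000 coin flips
heads = E.sum()

for p in (0.3, 0.5, 0.7):             # three rival Bernoulli hypotheses
    # log P(E | H) for a Bernoulli(p) hypothesis
    log_lik = heads * np.log(p) + (1000 - heads) * np.log(1 - p)
    print(f"p = {p}: log P(E | H) = {log_lik:.1f}")   # all around -700 or lower
```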
A response to the PROBLEM
(1) We should not focus exclusively on the total evidence E. (2) We should focus on those aspects O of the data that are central to what the hypothesis says.
Example 1: The agreement of independent measurements of the parameters postulated by the model, e.g., independent estimates of the bias in a Bernoulli model, or the agreement of independent measurements of the Earth's mass.
Example 2: The independencies entailed by d-separation in causal models.
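A hedged sketch of Example 1 (the split-in-half design and the bias 0.6 are my illustrative choices, not the talk's): the two halves of the data give independent measurements of the one parameter the Bernoulli model postulates, and their agreement, within sampling error of roughly 0.016 here, is the aspect of the data the model is tested against.

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.binomial(1, 0.6, size=2000)    # data from a Bernoulli(0.6) source

half1, half2 = flips[:1000], flips[1000:]
# Two independent measurements of the same postulated parameter (the bias):
print(half1.mean(), half2.mean())
```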
A response to the PROBLEM …continued.
(3) We should look at what is entailed by the models by themselves, without the help of other data. Examples 1 and 2 meet this desideratum.
This also justifies a faithfulness principle: favor a model that entails an independency over one that is merely able to accommodate it (even if the likelihoods go the other way). (I don't see this as appealing to non-empirical biases, such as simplicity.)
Now apply the agreement-of-measurements idea to the testing of causal models…
• What does the forward model, X→Y, entail? The independencies entailed by a DAG are part of what a causal model entails. But it often says something more…
• It says something about the forward probabilities (or densities) p(y|x), and nothing (directly) about p(x) or p(x,y) or p(x|y).
• X→Y says: If p1(x), then p1(x,y) = p1(x) p(y|x); if p2(x), then p2(x,y) = p2(x) p(y|x); and so on.
The key idea…
• We can use data generated by p1(x,y) to estimate the parameters in p(y|x).
• We can use data generated by p2(x,y) to estimate the same parameters in p(y|x).
• The two data clusters provide independent estimates of the parameters. If the estimates agree, then we have an agreement of independent measurements.
• The hypothesis "stuck its neck out": it risked falsification, it survived the test, and it is thereby confirmed. (A sketch of this test follows below.)
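A minimal simulation of the key idea, under assumptions taken from the later slides (Y = X + U with U ~ N(0,1), and two clusters whose input distributions are Gaussians centered at -10 and +10; the cluster sizes are my choice): fit the forward model separately on each cluster and compare the two estimates of the p(y|x) parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_cluster(x_mean, n=200):
    x = rng.normal(x_mean, 1.0, n)     # p_i(x) differs across clusters
    y = x + rng.normal(0.0, 1.0, n)    # p(y|x) is the SAME in both
    return x, y

def forward_fit(x, y):
    # Least-squares fit of y = a + b*x (maximum likelihood under Gaussian noise)
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return b, y.mean() - b * x.mean()

x1, y1 = make_cluster(-10.0)           # Cluster 1, generated by p1(x,y)
x2, y2 = make_cluster(+10.0)           # Cluster 2, generated by p2(x,y)

print("forward fit, cluster 1:", forward_fit(x1, y1))
print("forward fit, cluster 2:", forward_fit(x2, y2))
# Both (slope, intercept) estimates land near (1, 0): the independent
# measurements of the p(y|x) parameters agree.
```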
Prediction versus Accommodation
• Both X→Y and Y→X are able to accommodate (that is, fit) the total evidence well. So a maximum likelihood comparison is not going to discriminate well.
• But suppose we fit a model to Cluster 1, and then to Cluster 2, to see whether the independent measurements of the parameters agree.
[Figure: scatter plot of the data; Cluster 1 (lower left) generated by p1(x,y), Cluster 2 (upper right) generated by p2(x,y).]
The content of X→Y
• X→Y says: If p1(x), then p1(x,y) = p1(x) p(y|x); if p2(x), then p2(x,y) = p2(x) p(y|x); and so on.
• X→Y also says: If p1(x), then p1(x|y) = p1(x,y)/p1(y); if p2(x), then p2(x|y) = p2(x,y)/p2(y); and so on.
• In general, p1(x|y) ≠ p2(x|y). That is, X→Y says that the backwards probabilities vary with the input distribution.
• If X→Y is right, then Y→X is wrong.
• It's metaphysically possible that the forward probabilities also depend on the input distribution. But we need to search for uniformities of nature…
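A small worked check of the third bullet, assuming the Gaussian setup used elsewhere in the talk (X ~ N(m, 1), Y = X + U, U ~ N(0, 1)): standard Gaussian conditioning gives E[X | Y = y] = m + (y - m)/2, so the backwards conditional shifts with the input distribution even though p(y|x) is held fixed.

```python
def backward_mean(prior_mean, y, prior_var=1.0, noise_var=1.0):
    # For X ~ N(prior_mean, prior_var) and Y | X ~ N(X, noise_var), Gaussian
    # conditioning gives E[X | Y = y] = prior_mean + k*(y - prior_mean),
    # where k = prior_var / (prior_var + noise_var).
    k = prior_var / (prior_var + noise_var)
    return prior_mean + k * (y - prior_mean)

print(backward_mean(-10.0, 0.0))   # under p1(x): -5.0
print(backward_mean(+10.0, 0.0))   # under p2(x): +5.0
# Same y, same p(y|x), yet p1(x|y) != p2(x|y).
```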
The Asymmetry of Regression…
• The data are generated from Y = X + U, where X is N(-10, 1), U is N(0, 1), and U is independent of X.
• The y-on-x regression is different from the x-on-y regression.
[Figure: scatter plot of data centered near (-10, -10), with the two regression lines diverging.]
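A short numeric check of the asymmetry under exactly this generating process (sample size is my choice): the y-on-x slope estimates Cov(X,Y)/Var(X) ≈ 1, while the x-on-y slope estimates Cov(X,Y)/Var(Y) ≈ 1/2, so the x-on-y line, redrawn in the (x, y) plane, has slope about 2.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(-10.0, 1.0, 10_000)
y = x + rng.normal(0.0, 1.0, 10_000)

c = np.cov(x, y, bias=True)[0, 1]
print("y-on-x slope:", c / np.var(x))   # ~1.0
print("x-on-y slope:", c / np.var(y))   # ~0.5, i.e. slope ~2 in the (x, y) plane
```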
Forward Model: X→Y
X→Y passes the test because… independent measurements agree!
[Figure: Cluster 1 and Cluster 2 with the forward regression fits to each cluster.]
Backward Model: Y→X
Y→X fails the test because… not all independent measurements agree.
[Figure: Cluster 1 and Cluster 2 with the backward regression fits to each cluster.]
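The same sketch for the backward model, under the same assumed generating process as before: fit x = a + b·y separately on each cluster. In this setup the slope estimates happen to agree, but the intercept estimates do not, which is the sense in which not all independent measurements agree.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_cluster(x_mean, n=200):
    x = rng.normal(x_mean, 1.0, n)
    y = x + rng.normal(0.0, 1.0, n)
    return x, y

def backward_fit(x, y):
    # Least-squares fit of x = a + b*y
    b = np.cov(x, y, bias=True)[0, 1] / np.var(y)
    return b, x.mean() - b * y.mean()

x1, y1 = make_cluster(-10.0)
x2, y2 = make_cluster(+10.0)
print("backward fit, cluster 1:", backward_fit(x1, y1))   # roughly (0.5, -5)
print("backward fit, cluster 2:", backward_fit(x2, y2))   # roughly (0.5, +5)
# The intercept measurements disagree (-5 vs +5): Y->X fails the test.
```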
Another way of seeing the same thing...
The forwards model fits Cluster 2 (top right) better than the backwards model.
[Figure: the data with both fitted lines drawn through Cluster 2.]
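One way to cash out "fits Cluster 2 better" (my reading; the slide does not spell out the procedure): fit both models on Cluster 1 alone, then measure how well each fitted line predicts Cluster 2, using the same assumed generating process as the earlier sketches.

```python
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.normal(-10.0, 1.0, 200); y1 = x1 + rng.normal(0.0, 1.0, 200)
x2 = rng.normal(+10.0, 1.0, 200); y2 = x2 + rng.normal(0.0, 1.0, 200)

c = np.cov(x1, y1, bias=True)[0, 1]
b_f = c / np.var(x1); a_f = y1.mean() - b_f * x1.mean()   # forward: y = a_f + b_f*x
b_b = c / np.var(y1); a_b = x1.mean() - b_b * y1.mean()   # backward: x = a_b + b_b*y

# Prediction error on Cluster 2 (in the y direction) for each fitted line:
err_f = np.sqrt(np.mean((y2 - (a_f + b_f * x2)) ** 2))
err_b = np.sqrt(np.mean((y2 - (x2 - a_b) / b_b) ** 2))
print(err_f, err_b)   # forward error is small (~1); backward error is large (~20)
```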
Summary Bullets
• The phenomenon is completely general: it does not depend on any special features of the distribution, only on a judicious splitting of the data into clusters.
• Bayesians (and likelihoodists) do not split data; they consider only the likelihoods relative to the total evidence.
• If you don't split the data, then it is more difficult to show that X→Y is right and Y→X is wrong.
FORWARD CAUSAL MODEL: Independent measurements agree!
BACKWARD CAUSAL MODEL: Independent measurements do NOT agree.
Robustness of the Phenomenon
In 15 runs, the forwards regression is closer to the generating curve, y = x, than the backwards regression.
[Figure: the fitted regression lines from repeated runs plotted against the generating curve y = x.]
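A hedged replication sketch of the robustness claim (my setup: same linear-Gaussian process as above, 100 points per run, and closeness to y = x measured by slope alone, which is one simple proxy for the slide's visual comparison):

```python
import numpy as np

rng = np.random.default_rng(4)
runs, forward_wins = 15, 0
for _ in range(runs):
    x = rng.normal(-10.0, 1.0, 100)
    y = x + rng.normal(0.0, 1.0, 100)
    c = np.cov(x, y, bias=True)[0, 1]
    slope_f = c / np.var(x)        # forward line's slope, estimates 1
    slope_b = np.var(y) / c        # backward line redrawn in the (x, y) plane, ~2
    if abs(slope_f - 1.0) < abs(slope_b - 1.0):
        forward_wins += 1
print(f"forward line closer to y = x in {forward_wins} of {runs} runs")
```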