Explore the world of causal inference and graphical models through an in-depth overview, focusing on manipulations, hidden common causes, and the effects of interventions. From efficiently setting insurance rates to analyzing the repercussions of banning smoking, delve into the intricacies of causation, causal DAGs, conditioning, and manipulation notations.
Causal Inference and Graphical Models Peter Spirtes Carnegie Mellon University
Overview • Manipulations • Assuming no Hidden Common Causes • From DAGs to Effects of Manipulation • From Data to Sets of DAGs • From Sets of DAGs to Effects of Manipulation • May be Hidden Common Causes • From Data to Sets of DAGs • From Sets of DAGs to Effects of Manipulation
If I were to force a group of people to smoke one pack a day, what percentage would develop lung cancer? The Evidence
Conditioning on Teeth white = yes P(Lung Cancer = yes|Teeth white = yes) = 1/4
Manipulating Teeth white = yes - After Waiting P(Lung Cancer = yes ||White teeth = yes) = 1/2 P(Lung Cancer = yes|White teeth = yes) = 1/4
Smoking Decision • Setting insurance rates for smokers - conditioning • Suppose the Surgeon General is considering banning smoking? • Will this decrease smoking? • Will decreasing smoking decrease cancer? • Will it have negative side-effects – e.g. more obesity? • How is greater life expectancy valued against decrease in pleasure from smoking?
Manipulations and Distributions • Since Smoking determines Teeth white, P(T,L,R,W) = P(S,L,R,W) • But the manipulation of Teeth white leads to different results than the manipulation of Smoking • Hence the distribution does not always uniquely determine the results of a manipulation
Causation • We will infer average causal effects. • We will not consider quantities such as probability of necessity, probability of sufficiency, or the counterfactual probability that I would get a headache conditional on taking an aspirin, given that I did not take an aspirin • The causal relations are between properties of a unit at a time, not between events. • Each unit is assumed to be causally isolated. • The causal relations may be genuinely indeterministic, or only apparently indeterministic.
Causal DAGs • Probabilistic Interpretation of DAGs • A DAG represents a distribution P when each variable is independent of its non-descendants conditional on its parents in the DAG • Causal Interpretation of DAGs • There is a directed edge from A to B (relative to V) when A is a direct cause of B. • An acyclic graph is not a representation of reversible or feedback processes
Conditioning • Conditioning maps a probability distribution and an event into a new probability distribution: • f(P(V),e) → P’(V), where P’(V=v) = P(V=v)/P(e) if v is consistent with e, and P’(V=v) = 0 otherwise
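The conditioning map can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `condition` and the joint distribution over (teeth white, lung cancer) are hypothetical, with numbers chosen only so that the result reproduces the 1/4 quoted earlier.

```python
from fractions import Fraction

def condition(joint, event):
    """Map a joint distribution and an event to the conditional distribution.

    joint: dict mapping outcome tuples to probabilities.
    event: predicate on outcomes; it must have positive probability.
    """
    p_event = sum(p for v, p in joint.items() if event(v))
    if p_event == 0:
        raise ValueError("cannot condition on an event of probability zero")
    # Outcomes inside the event are rescaled by 1/P(e); all others get 0.
    return {v: (p / p_event if event(v) else Fraction(0)) for v, p in joint.items()}

# Hypothetical joint over (teeth_white, lung_cancer); illustrative numbers only.
joint = {
    ("yes", "yes"): Fraction(1, 8),
    ("yes", "no"): Fraction(3, 8),
    ("no", "yes"): Fraction(1, 4),
    ("no", "no"): Fraction(1, 4),
}

posterior = condition(joint, lambda v: v[0] == "yes")
print(posterior[("yes", "yes")])  # P(Lung Cancer = yes | Teeth white = yes)
```

Exact fractions are used so that the rescaled distribution sums to exactly 1.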
Manipulating • A manipulation maps a population joint probability distribution, a causal DAG, and a set of new probability distributions for a set of variables, into a new joint distribution • Manipulating: for {X1,…,Xn} ⊆ V • f: (P(V), G, {P’(X1|Non-Descendants(G,X1)),…,P’(Xn|Non-Descendants(G,Xn))}) → P’(V) • P(V) is the population distribution • G is the causal DAG • P’(Xi|Non-Descendants(G,Xi)) are the new distributions for the manipulated variables • P’(V) is the manipulated distribution • (assumption that manipulations are independent)
Manipulation Notation - Adapting Lauritzen • The distribution of Lung Cancer given the manipulated distribution of Smoking • P(Lung Cancer||P’(Smoking)) • The distribution of Lung Cancer conditional on Radon given the manipulated distribution of Smoking • P(Lung Cancer|Radon||P’(Smoking)) = P(Lung Cancer,Radon||P’(Smoking)) / P(Radon||P’(Smoking)) • First manipulate, then condition
Ideal Manipulations • No fat hand • Effectiveness • Whether or not any actual action is an ideal manipulation of a variable Z is not part of the theory - it is input to the theory. • With respect to a system of variables containing murder rates, outlawing cocaine is not an ideal manipulation of cocaine usage • It is not entirely effective - people still use cocaine • It affects murder rates directly, not via its effect on cocaine usage, because of increased gang warfare
3 Representations of Manipulations • Structural Equation • Policy Variable • Potential Outcomes
College Plans • Sewell and Shah (1968) studied five variables from a sample of 10,318 Wisconsin high school seniors. • SEX [male = 0, female = 1] • IQ = Intelligence Quotient, [lowest = 0, highest = 3] • CP = college plans [yes = 0, no = 1] • PE = parental encouragement [low = 0, high = 1] • SES = socioeconomic status [lowest = 0, highest = 3]
College Plans - A Hypothesis [Figure: hypothesized causal DAG over SES, SEX, PE, CP, and IQ]
Equational Representation • xi = f(pai(G), ei) • If the ei are causes of two or more variables, they must be included in the analysis • There is a distribution over the ei • The equations and the distribution over the ei determine a distribution over the xi • When manipulating a variable xi to a value c, replace its equation with xi = c
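As a rough sketch of the equational representation, the Python below enumerates the error terms of a small structural equation model and compares conditioning with manipulating. The equations, the error distributions, and all function names are assumptions made for illustration (they are not given in the slides); they were chosen so that the output matches the 1/4 and 1/2 quoted in the teeth-whitening example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical structural equations (illustrative, not taken from the slides):
#   S = e_S                           smoking
#   T = 1 - S                         teeth white (determined by smoking)
#   C = 1 if e_C < 1 + 2 * S else 0   lung cancer
# e_S is uniform on {0, 1}, e_C is uniform on {0, 1, 2, 3}, and they are independent.

def solve(e_s, e_c, set_t=None):
    """Solve the equations for one value of the error terms.

    If set_t is given, the equation for T is replaced by T = set_t.
    """
    s = e_s
    t = (1 - s) if set_t is None else set_t
    c = 1 if e_c < 1 + 2 * s else 0
    return s, t, c

def distribution(set_t=None):
    """Joint distribution over (S, T, C) induced by the error distribution."""
    dist = {}
    for e_s, e_c in product(range(2), range(4)):
        outcome = solve(e_s, e_c, set_t)
        dist[outcome] = dist.get(outcome, Fraction(0)) + Fraction(1, 8)
    return dist

def prob_cancer_given_white(dist):
    """P(C = 1 | T = 1) computed within the given distribution."""
    white = sum(p for (s, t, c), p in dist.items() if t == 1)
    both = sum(p for (s, t, c), p in dist.items() if t == 1 and c == 1)
    return both / white

conditioned = prob_cancer_given_white(distribution())         # observe T = 1
manipulated = prob_cancer_given_white(distribution(set_t=1))  # force T = 1
print(conditioned, manipulated)
```

Replacing the equation for T leaves the equations for S and C untouched, which is why forcing white teeth does not change the cancer rate in this toy model, while observing white teeth does.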
Policy Variable Representation • Pre-manipulation: P(PE,SES,SEX,IQ,CP), equivalently P(PE,SES,SEX,IQ,CP|policy = off) • Suppose P’(PE=1) = 1, equivalently P(PE=1|policy = on) = 1 • Post-manipulation: P(SES,SEX,IQ,CP,PE=1||P’(PE)), equivalently P(SES,SEX,IQ,CP,PE=1|policy = on) • P(CP|PE||P’(PE)), equivalently P(CP|PE,policy = on) • [Figure: pre-manipulation and post-manipulation DAGs over SES, SEX, PE, CP, IQ]
From DAG to Effects of Manipulation [Figure: flowchart relating Background Knowledge (Causal Axioms, Prior), Sample, Sampling and Distributional Assumptions (Prior), Population Distribution, Causal DAGs, and Effect of Manipulation]
Causal Sufficiency • A set of variables is causally sufficient if every cause of two variables in the set is also in the set. • {PE,CP,SES} is causally sufficient • {IQ,CP,SES} is not causally sufficient.
Causal Markov Assumption • For a causally sufficient set of variables, the joint distribution is the product of each variable conditional on its parents in the causal DAG. • P(SES,SEX,PE,CP,IQ) = P(SES)P(SEX)P(IQ|SES)P(PE|SES,SEX,IQ)P(CP|PE)
Equivalent Forms of Causal Markov Assumption • In the population distribution, each variable is independent of its non-descendants in the causal DAG (non-effects) conditional on its parents (immediate causes). • If X is d-separated from Y conditional on Z (written as <X,Y|Z>) in the causal graph, then X is independent of Y conditional on Z in the population distribution (denoted I(X,Y|Z)).
Causal Markov Assumption • Causal Markov implies that if X is d-separated from Y conditional on Z in the causal DAG, then X is independent of Y conditional on Z. • Causal Markov is equivalent to assuming that the causal DAG represents the population distribution. • What would a failure of Causal Markov look like? If X and Y are dependent, but X does not cause Y, Y does not cause X, and no variable Z causes both X and Y.
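For readers who want to check d-separation claims mechanically, here is a minimal Python sketch. It uses the standard moralization criterion rather than any procedure described in the slides, and the edge set is an assumption read off the factorization P(SES)P(SEX)P(IQ|SES)P(PE|SES,SEX,IQ)P(CP|PE); the function names are hypothetical.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen

def d_separated(parents, x, y, z):
    """True if x and y are d-separated given the set z in the DAG `parents`.

    Moralization criterion: restrict the DAG to the ancestors of {x, y} ∪ z,
    connect ("marry") parents of a common child, drop edge directions,
    delete the nodes in z, and test whether y is still reachable from x.
    """
    keep = ancestors(parents, {x, y} | set(z))
    adj = {n: set() for n in keep}
    for child in keep:
        pas = list(parents.get(child, ()))
        for p in pas:                      # undirected version of p -> child
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(pas):        # marry co-parents
            for q in pas[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    blocked, seen, stack = set(z), set(), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n in seen or n in blocked:
            continue
        seen.add(n)
        stack.extend(adj[n] - seen)
    return True

# Assumed DAG: each variable is mapped to its parents.
dag = {
    "SES": (), "SEX": (), "IQ": ("SES",),
    "PE": ("SES", "SEX", "IQ"), "CP": ("PE",),
}

print(d_separated(dag, "SEX", "CP", {"PE"}))   # PE blocks the only path
print(d_separated(dag, "SES", "SEX", set()))   # collider PE is not conditioned on
print(d_separated(dag, "SES", "SEX", {"PE"}))  # conditioning on the collider
```

Under the Causal Markov Assumption, each `True` result corresponds to a conditional independence that must hold in the population distribution.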
Causal Markov Assumption • Assumes that no unit in the population affects other units in the population • If the “natural” units do affect each other, the units should be re-defined to be aggregations of units that don’t affect each other • For example, individual people might be aggregated into families • Assumes variables are not logically related, e.g. x and x² • Assumes no feedback
Manipulation Theorem - No Hidden Variables • P(PE,SES,SEX,CP,IQ||P’(PE)) = P(SES)P(SEX)P(CP|PE,SES,IQ)P(IQ|SES)P(PE|policy=on) = P(SES)P(SEX)P(CP|PE,SES,IQ)P(IQ|SES)P’(PE) • [Figure: causal DAG over SES, SEX, PE, CP, IQ with a Policy variable pointing into PE]
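The manipulation theorem can be illustrated numerically: keep every factor of the pre-manipulation factorization except the one for PE, and substitute P’(PE). The sketch below assumes binary versions of all five variables and made-up conditional probabilities (none of these numbers come from the Sewell and Shah data), with CP depending on PE, SES, and IQ as in the factorization above.

```python
from itertools import product

# Hypothetical conditional probabilities for binary variables (made-up numbers).
p_ses = {0: 0.6, 1: 0.4}
p_sex = {0: 0.5, 1: 0.5}

def p_iq(iq, ses):
    p1 = 0.3 + 0.4 * ses
    return p1 if iq == 1 else 1 - p1

def p_pe(pe, ses, sex, iq):
    p1 = 0.2 + 0.3 * ses + 0.1 * sex + 0.2 * iq
    return p1 if pe == 1 else 1 - p1

def p_cp(cp, pe, ses, iq):
    p1 = 0.1 + 0.4 * pe + 0.2 * ses + 0.2 * iq
    return p1 if cp == 1 else 1 - p1

NAMES = ("ses", "sex", "iq", "pe", "cp")

def joint(new_pe=None):
    """Joint over (SES, SEX, IQ, PE, CP).

    If new_pe is given, it replaces the factor P(PE | SES, SEX, IQ);
    every other factor is left unchanged.
    """
    dist = {}
    for ses, sex, iq, pe, cp in product(range(2), repeat=5):
        pe_term = p_pe(pe, ses, sex, iq) if new_pe is None else new_pe[pe]
        dist[(ses, sex, iq, pe, cp)] = (
            p_ses[ses] * p_sex[sex] * p_iq(iq, ses) * pe_term * p_cp(cp, pe, ses, iq)
        )
    return dist

def prob(dist, **fixed):
    """Probability that the named variables take the given values."""
    return sum(
        p for v, p in dist.items()
        if all(v[NAMES.index(k)] == val for k, val in fixed.items())
    )

before = joint()
after = joint(new_pe={0: 0.0, 1: 1.0})  # P'(PE = 1) = 1

cond = prob(before, cp=1, pe=1) / prob(before, pe=1)  # P(CP = 1 | PE = 1)
manip = prob(after, cp=1)                             # P(CP = 1 || P'(PE))
print(round(cond, 4), round(manip, 4))
```

In this toy model the two numbers differ because SES and IQ influence both PE and CP, so observing PE = 1 carries information about them while forcing PE = 1 does not.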
Invariance • Note that P(CP|PE,SES,IQ,policy = on) = P(CP|PE,SES,IQ,policy = off) because the policy variable is d-separated from CP conditional on PE,SES,IQ • We say that P(CP|PE,SES,IQ) is invariant • An invariant quantity can be estimated from the pre-manipulation distribution • This is equivalent to one of the rules of the Do Calculus and can also be applied to latent variable models • [Figure: causal DAG over SES, SEX, PE, CP, IQ with a Policy variable]
Calculating Effects [Figure: causal DAG over SES, SEX, PE, CP, IQ with a Policy variable]
From Sample to Sets of DAGs [Figure: flowchart from Sample through Population Distribution and Causal DAGs to Effect of Manipulation]
From Sample to Population to DAGs • Constraint - Based • Uses tests of conditional independence • Goal: Find the set of DAGs whose d-separation relations match most closely the results of conditional independence tests • Score - Based • Uses scores such as the Bayesian Information Criterion or Bayesian posterior • Goal: Maximize score
Bayesian Information Criterion • D is the sample data • G is a DAG • θ̂ is the vector of maximum likelihood estimates of the parameters for DAG G • N is the sample size • d is the dimensionality of the model, which in DAGs without latent variables is simply the number of free parameters in the model
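The formula itself did not survive in the slide text. A commonly used form, assumed here, is BIC(G, D) = log P(D | θ̂, G) − (d/2) log N, to be maximized. The Python below is a minimal sketch of that score for discrete data with unrestricted conditional probability tables; the function name `bic`, its arguments, and the toy data set are all hypothetical.

```python
import math
from collections import Counter

def bic(data, parents, arity):
    """BIC = log-likelihood at the ML estimates - (d / 2) * log(N).

    data: list of dicts mapping variable name -> observed value.
    parents: dict mapping variable name -> tuple of parent names (the DAG G).
    arity: dict mapping variable name -> number of values it can take.
    """
    n = len(data)
    loglik, dim = 0.0, 0
    for var, pas in parents.items():
        joint = Counter((tuple(row[p] for p in pas), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pas) for row in data)
        # The ML estimate of P(var | parents) is a ratio of counts.
        loglik += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (arity - 1) for each parent configuration.
        dim += (arity[var] - 1) * math.prod(arity[p] for p in pas)
    return loglik - 0.5 * dim * math.log(n)

# Toy data in which X and Y are clearly dependent (made-up counts).
rows = (
    [{"X": 0, "Y": 0}] * 40 + [{"X": 1, "Y": 1}] * 40
    + [{"X": 0, "Y": 1}] * 10 + [{"X": 1, "Y": 0}] * 10
)
arity = {"X": 2, "Y": 2}

bic_empty = bic(rows, {"X": (), "Y": ()}, arity)   # no edge
bic_xy = bic(rows, {"X": (), "Y": ("X",)}, arity)  # X -> Y
bic_yx = bic(rows, {"Y": (), "X": ("Y",)}, arity)  # Y -> X
print(bic_empty, bic_xy, bic_yx)
```

On this toy data the two single-edge models receive the same score and both beat the empty graph, which is the behaviour one expects from a score that cannot distinguish d-separation equivalent DAGs over unrestricted discrete variables.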
3 Kinds of Alternative Causal Models [Figure: four DAGs over SES, SEX, PE, CP, IQ, labeled True Model, Alternative 1, Alternative 2, and Alternative 3]
Alternative Causal Models • Constraint - Based: Alternative 1 violates Causal Markov Assumption by entailing that SES and IQ are independent • Score - Based: Use a score that prefers a model that contains the true distribution over one that does not. • [Figure: True Model and Alternative 1]
Alternative Causal Models • Constraint - Based: Assume that if Sex and CP are independent (conditional on some subset of the variables, such as PE, SES, and IQ) then Sex and CP are not adjacent - Causal Adjacency Faithfulness Assumption. • Score - Based: Use a score that, when two models both contain the true distribution, prefers the one with fewer parameters. The True Model has fewer parameters. • [Figure: True Model and Alternative 2]
Both Assumptions Can Be False • Alternative 2: Independence holds only for parameters on a lower dimensional surface - Lebesgue measure 0 • True Model: Independence holds for all values of parameters
When Not to Assume Faithfulness • Deterministic relationships between variables entail “extra” conditional independence relations, in addition to those entailed by the global directed Markov condition. • If A → B → C, and B = A, and C = B, then not only I(A,C|B), which is entailed by the global directed Markov condition, but also I(B,C|A), which is not. • The deterministic relations are theoretically detectable, and when present, faithfulness should not be assumed. • Do not assume faithfulness in feedback systems in equilibrium.
Alternative Causal Models • Constraint - Based: Alternative 3 entails the same set of conditional independence relations as the True Model - there is no principled way to choose. • [Figure: True Model and Alternative 3]
Alternative Causal Models • Score - Based: Whether or not one can choose depends upon the parametric family. • For unrestricted discrete, or linear Gaussian, there is no way to choose - the BIC scores will be the same. • For linear non-Gaussian, the True Model will be preferred (because while the two models entail the same second order moments, they entail different fourth order moments). • [Figure: True Model and Alternative 3]
Patterns • A pattern (or p-dag) represents a set of DAGs that all have the same d-separation relations, i.e. a d-separation equivalence class of DAGs. • The adjacencies in a pattern are the same as the adjacencies in each DAG in the d-separation equivalence class. • An edge is oriented as A → B in the pattern if it is oriented as A → B in every DAG in the equivalence class. • An edge is left unoriented as A - B in the pattern if the edge is oriented as A → B in some DAGs in the equivalence class, and as A ← B in other DAGs in the equivalence class.
Patterns to Graphs • All of the DAGs in a d-separation equivalence class can be derived from the pattern that represents the d-separation equivalence class by orienting the unoriented edges in the pattern. • Every orientation of the unoriented edges is acceptable as long as it creates no new unshielded colliders. • That is, A - B - C can be oriented as A → B → C, A ← B ← C, or A ← B → C, but not as A → B ← C.
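The rule for recovering DAGs from a pattern can be sketched by brute force: try every orientation of the unoriented edges and keep those that are acyclic and introduce no new unshielded collider. The Python below is an illustrative sketch only; the function names and the edge representation (pairs `(a, b)` meaning a → b) are assumptions, and the acyclicity check is added here because the result must be a DAG.

```python
from itertools import product

def unshielded_colliders(directed, adjacent):
    """Set of (a, b, c) with a -> b <- c where a and c are not adjacent."""
    found = set()
    for a, b in directed:
        for c, b2 in directed:
            if b2 == b and a < c and frozenset((a, c)) not in adjacent:
                found.add((a, b, c))
    return found

def is_acyclic(nodes, directed):
    """Repeatedly remove nodes with no incoming edge; fail if none exists."""
    remaining, edges = set(nodes), set(directed)
    while remaining:
        roots = {n for n in remaining if not any(v == n for _, v in edges)}
        if not roots:
            return False
        remaining -= roots
        edges = {(u, v) for u, v in edges if u not in roots}
    return True

def dags_from_pattern(nodes, directed, undirected):
    """Enumerate the DAGs represented by a pattern.

    directed: list of (a, b) pairs for edges already oriented a -> b.
    undirected: list of (a, b) pairs for unoriented edges a - b.
    """
    adjacent = {frozenset(e) for e in list(directed) + list(undirected)}
    base = unshielded_colliders(set(directed), adjacent)
    result = []
    for flips in product((False, True), repeat=len(undirected)):
        oriented = set(directed)
        for (a, b), flip in zip(undirected, flips):
            oriented.add((b, a) if flip else (a, b))
        same_colliders = unshielded_colliders(oriented, adjacent) == base
        if same_colliders and is_acyclic(nodes, oriented):
            result.append(sorted(oriented))
    return result

# The pattern A - B - C from the slide.
dags = dags_from_pattern(["A", "B", "C"], directed=[],
                         undirected=[("A", "B"), ("B", "C")])
for dag in dags:
    print(dag)
```

Brute-force enumeration is exponential in the number of unoriented edges, so this is only suitable for small patterns such as the one shown.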
Patterns [Figure: the DAGs in a d-separation equivalence class over SES, SEX, PE, CP, IQ, and the pattern that represents them]
Search Methods • Constraint Based: • PC (correct in limit) • Variants of PC (correct in limit, better on small sample sizes) • Score - Based: • Greedy hill climbing • Simulated annealing • Genetic algorithms • Greedy Equivalence Search (correct in limit)
From Sets of DAGs to Effects of Manipulation [Figure: flowchart from Sample through Population Distribution and Causal DAGs to Effect of Manipulation]
Causal Inference in Patterns • Is P(IQ) invariant when SES is manipulated to a constant? Can’t tell. • If SES → IQ, then policy is d-connected to IQ given empty set - no invariance. • If SES ← IQ, then policy is not d-connected to IQ given empty set - invariance. • [Figure: pattern over SES, SEX, PE, CP, IQ with a policy variable pointing into SES; the orientation of the SES - IQ edge is marked “?”]
Causal Inference in Patterns • Different DAGs represented by the pattern give different answers as to the effect of manipulating SES on IQ - not identifiable. • In these cases, should output “can’t tell”. • Note the difference from using Bayesian networks for classification - we can use either DAG equally well for correct classification, but we have to know which one is true for correct inference about the effect of a manipulation. • [Figure: pattern over SES, SEX, PE, CP, IQ with a policy variable pointing into SES; the orientation of the SES - IQ edge is marked “?”]
Causal Inference in Patterns • Is P(CP|PE,SES,IQ) invariant when PE is manipulated to a constant? Can tell. • The policy variable is d-separated from CP given PE, SES, IQ regardless of which way the edge points - invariance in every DAG represented by the pattern. • [Figure: pattern over SES, SEX, PE, CP, IQ with a policy variable pointing into PE; the orientation of the SES - IQ edge is marked “?”]
College Plans [Figure: pattern over SES, SEX, PE, CP, IQ, with quantities annotated as “invariant” or “not invariant, but is identifiable”]
Good News • In the large sample limit, there are algorithms (PC, Greedy Equivalence Search) that are arbitrarily close to correct (or output “can’t tell”) with probability 1 (pointwise consistency). • [Figure: flowchart from Sample through Population Distribution and Causal DAGs to Effect of Manipulation]