Learning Causal Structure from Observational and Experimental Data. Richard Scheines, Carnegie Mellon University. Causation, Statistics, and Experiments. Graphical Causal Models. Galileo Galilei. Francis Bacon. Udny Yule. Charles Spearman. Sewall Wright. Sir Ronald A. Fisher.
Learning Causal Structure from Observational and Experimental Data. Richard Scheines, Carnegie Mellon University
Causation, Statistics, and Experiments. [Timeline figure, 1500 to 1990 (ticks at 1500, 1600, 1900, 1930, 1960, 1990): Galileo Galilei, Francis Bacon, Udny Yule, Charles Spearman, Sewall Wright, Sir Ronald A. Fisher, Trygve Haavelmo, Jerzy Neyman. Labeled traditions: Graphical Causal Models; Potential Outcomes.]
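Causal Graphs. Causal graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V. Example: Years of Education → Income relative to V = {Years of Education, Income}; Years of Education → Skills and Knowledge → Income once Skills and Knowledge is added to V.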
Bridge Principles: Causal Graph over V ⇒ Constraints on P(V)
• Causal Markov Axiom
• Acyclicity
• d-separation criterion
Causal graph over Z, X, Y1, Y2 ⇒ Independence oracle:
• Z _||_ Y1 | X; Z _||_ Y2 | X
• Z _||_ Y1 | X,Y2; Z _||_ Y2 | X,Y1
• Y1 _||_ Y2 | X; Y1 _||_ Y2 | X,Z
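As a rough illustration of how a d-separation "independence oracle" can be computed, here is a minimal sketch using the standard moralized-ancestral-graph test. The edge set Z → X, X → Y1, X → Y2 is an assumption chosen because it is consistent with the independencies listed on the slide; the function names are hypothetical.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Moralize the ancestral subgraph: connect each node to its parents
    # and "marry" every pair of parents that share a child.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # d-separated iff every undirected path from x to y passes through `given`.
    stack, seen = [x], {x}
    while stack:
        n = stack.pop()
        if n == y:
            return False
        for m in adj[n]:
            if m not in seen and m not in given:
                seen.add(m)
                stack.append(m)
    return True

# Assumed graph (not stated in the text): Z -> X, X -> Y1, X -> Y2.
graph = {"X": ["Z"], "Y1": ["X"], "Y2": ["X"]}
print(d_separated(graph, "Z", "Y1", {"X"}))        # Z _||_ Y1 | X
print(d_separated(graph, "Y1", "Y2", {"X", "Z"}))  # Y1 _||_ Y2 | X,Z
print(d_separated(graph, "Y1", "Y2"))              # unconditional case
```

Under the Causal Markov Axiom, each d-separation found this way is an independence constraint on P(V).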
Faithfulness
• Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.
• Example: Revenues = a·Rate + c·Economy + e_Rev; Economy = b·Rate + e_Econ.
• Faithfulness: a ≠ -bc
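To see why a = -bc is the unfaithful case, substitute the second equation into the first: Revenues = (a + bc)·Rate + c·e_Econ + e_Rev, so the association between Rate and Revenues vanishes exactly when a + bc = 0, even though Rate is a cause of Revenues. The sketch below checks this by simulation. The coefficient values, the unit-variance Gaussian noise, and the function name are illustrative assumptions.

```python
import random

def rate_revenue_covariance(a, b, c, n=20000, seed=0):
    """Sample covariance of Rate and Revenues in the linear model from the slide."""
    rng = random.Random(seed)
    rates, revenues = [], []
    for _ in range(n):
        rate = rng.gauss(0, 1)
        economy = b * rate + rng.gauss(0, 1)             # Economy = b*Rate + e_Econ
        revenue = a * rate + c * economy + rng.gauss(0, 1)  # Revenues = a*Rate + c*Economy + e_Rev
        rates.append(rate)
        revenues.append(revenue)
    mean_rate = sum(rates) / n
    mean_rev = sum(revenues) / n
    return sum((r - mean_rate) * (v - mean_rev) for r, v in zip(rates, revenues)) / n

# Unfaithful parameterization: a = -b*c, so the two pathways cancel.
cancelled = rate_revenue_covariance(a=6, b=-2, c=3)
# Faithful parameterization: a + b*c = 7, so the association is visible.
faithful = rate_revenue_covariance(a=1, b=2, c=3)
print(cancelled, faithful)
```

The first covariance should be close to zero and the second close to a + bc = 7, since Rate has unit variance here.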
Faithfulness: violations by design
• By evolutionary design: Gene A _||_ Protein 24. [Figure: graph over Gene A, Gene B, and Protein 24 with one negative (-) and two positive (+) edges.]
• By evolutionary design: Air Temp _||_ Core Body Temp. [Figure: graph over Air Temp, Homeostatic Regulator, and Core Body Temp.]
• Sampling rate vs. equilibration rate
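Causal Structure and Association. Three candidate structures over TV and Obesity: TV → Obesity, Obesity → TV, and TV ← C → Obesity. In each of them TV and Obesity are associated, i.e., TV _||_ Obesity does not hold.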
Modeling Ideal Interventions: Interventions on the Effect. Pre-experimental system: Room Temperature → Sweaters On. [Figure: the post-intervention system when Sweaters On is set from outside.]
Modeling Ideal Interventions: Interventions on the Cause. Pre-experimental system: Room Temperature → Sweaters On. [Figure: the post-intervention system when Room Temperature is set from outside.]
Interventions & Causal Graphs. Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target. Example: intervene on Income. [Figure panels: pre-intervention graph; “hard” intervention; “soft” intervention.]
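The graph surgery can be sketched in a few lines. This is a minimal illustration, assuming the usual reading that a “hard” intervention makes the intervention variable the only cause of its target, while a “soft” intervention adds it alongside the existing causes. The pre-intervention graph Education → Income, the variable name I, and the function names are assumptions for illustration.

```python
def hard_intervention(parents, target, name="I"):
    """Return a new graph in which `name` is the only parent of `target`."""
    g = {node: list(ps) for node, ps in parents.items()}
    g[target] = [name]  # all original causes of the target are cut
    g[name] = []        # the intervention variable sits outside the original system
    return g

def soft_intervention(parents, target, name="I"):
    """Return a new graph in which `name` is an additional parent of `target`."""
    g = {node: list(ps) for node, ps in parents.items()}
    g[target] = g.get(target, []) + [name]  # original causes are kept
    g[name] = []
    return g

# Assumed pre-intervention graph: Education -> Income.
pre = {"Education": [], "Income": ["Education"]}
hard = hard_intervention(pre, "Income")
soft = soft_intervention(pre, "Income")
print(hard["Income"])  # ['I']
print(soft["Income"])  # ['Education', 'I']
```

Both functions copy the graph rather than mutating it, so the pre-intervention model stays available for comparison.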
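Association Underdetermines Causal Structure. An association between TV and Obesity (TV _||_ Obesity does not hold) is produced by each of three structures: TV → Obesity, Obesity → TV, and TV ← C → Obesity. In the third structure the association is spurious.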
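Randomization: Association = Causation. Add a Randomizer as the only cause of TV in each of the three candidate structures. If TV → Obesity, TV and Obesity remain associated. If Obesity → TV or TV ← C → Obesity, randomization cuts the edge into TV, so TV _||_ Obesity.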
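A small simulation can make the common-cause case concrete. This is a sketch under assumed linear Gaussian mechanisms with unit coefficients, in which TV has no effect on Obesity at all; the function names are hypothetical.

```python
import random

def covariance(xs, ys):
    """Plain sample covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def tv_obesity_covariance(randomize_tv, n=20000, seed=1):
    """Simulate TV <- C -> Obesity, optionally with TV set by a randomizer."""
    rng = random.Random(seed)
    tv, obesity = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        if randomize_tv:
            t = rng.gauss(0, 1)       # TV depends only on the randomizer
        else:
            t = c + rng.gauss(0, 1)   # TV depends on the common cause C
        o = c + rng.gauss(0, 1)       # Obesity depends on C only, never on TV
        tv.append(t)
        obesity.append(o)
    return covariance(tv, obesity)

observational = tv_obesity_covariance(randomize_tv=False)
experimental = tv_obesity_covariance(randomize_tv=True)
print(observational, experimental)
```

The observational covariance should be near 1 (the variance of C), while the randomized one should be near 0, matching the claim that under randomization association tracks causation.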
Randomization: Association = Causation. [Figure 1: graph over Randomizer, Treatment, Response, Dropout, and U, annotated with Treatment _||_ Response and Treatment _||_ Response | Dropout = no.] [Figure 2: graph over Randomizer, Treatment Assignment, Treatment, Response, and U.]
Randomization: Association = Causation. [Figure: graph over Randomizer, Treatment Assignment, Treatment, Belief, and Response, annotated with Treatment _||_ Response.]
Experimental Control & Statistical Control. [Figures: graphs over Randomizer, X1, and X3, one with C and one with M.]
• Statistically control for C: X3 _||_ X1 | C
• Experimentally control for C: X3 _||_ X1 | C(set)
• Statistically control for M: X3 _||_ X1 | M
• Experimentally control for M: X3 _||_ X1 | M(set)
Experimental Control ≠ Statistical Control. [Figure 1: graph over Randomizer, X1, M, X3, and U. Figure 2: graph over Randomizer, X1, M, X3, U1, and U2.]
• Experimentally control for M: X3 _||_ X1 | M(set)
• Statistically control for M: X3 _||_ X1 | M
P(V) = f(Causal Model(V), Experimental Setup(V))
• Causal Model(V): graph over X, Y, Z; Structural Eqs.(V) or CPT(V)
• Experimental Setup(V): V = {O, M}; P(M)
• Manipulated Causal Model_M(V): graph over X, Y, Z; Structural Eqs._M(V) or CPT_M(V)
• P_M(V) → Sampling → Data
Causal Discovery
• General assumptions: Markov, Faithfulness, Linearity, Gaussianity, Acyclicity, etc.
• Experimental Setup(V): V = {O, M}; P(M)
• Data → Statistical Inference → P_M(V)
• Discovery Algorithm → Equivalence Class of Causal Structures
Causal Discovery from Passive Observation
• PC, GES → Patterns (Markov equivalence class, no latent confounding)
• FCI → PAGs (Markov equivalence, including confounders and selection bias)
• CCD → Linear cyclic models (no confounding)
• BPC → Linear latent variable models
• LiNGAM → unique DAG (no confounding; linear non-Gaussian; faithfulness not needed)
• LvLiNGAM → set of DAGs (confounders allowed)
• Cyclic LiNGAM → set of DGs (cyclic models, no confounding)
• Non-linear additive noise models → unique DAG
Causal Discovery from Manipulations/Interventions. What sorts of manipulations/interventions have been studied?
• Do(X=x): replace P(X | parents(X)) with P(X=x) = 1.0
• Randomize(X): replace P(X | parents(X)) with P_M(X), e.g., uniform
• Soft interventions: replace P(X | parents(X)) with P_M(X | parents(X), I), P_M(I)
• Simultaneous interventions
• Sequential interventions
• Sequential, conditional interventions
• Time-sensitive interventions
• Shock and run: set X at time t, and then let the system run
• Clamp: set X at time t, and hold it fixed until time t + Δ
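The first two manipulation types amount to swapping one conditional probability table for another and leaving the rest of the model alone. The sketch below does this for a hypothetical binary model X → Y in which Y, the variable with a parent, is the intervention target. All probabilities and function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical binary model X -> Y given as conditional probability tables.
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

def joint(p_x, p_y_given_x):
    """Joint distribution P(X, Y) from the two tables."""
    return {(x, y): p_x[x] * p_y_given_x[x][y] for x, y in product((0, 1), repeat=2)}

def do_y(value):
    """Do(Y=value): replace P(Y | X) with a point mass that ignores X."""
    return {x: {y: 1.0 if y == value else 0.0 for y in (0, 1)} for x in (0, 1)}

def randomize_y():
    """Randomize(Y): replace P(Y | X) with a uniform P_M(Y) that ignores X."""
    return {x: {0: 0.5, 1: 0.5} for x in (0, 1)}

def independent(j, tol=1e-12):
    """Check whether X _||_ Y holds in the joint distribution j."""
    px = {x: j[(x, 0)] + j[(x, 1)] for x in (0, 1)}
    py = {y: j[(0, y)] + j[(1, y)] for y in (0, 1)}
    return all(abs(j[(x, y)] - px[x] * py[y]) < tol for x, y in j)

observed = joint(p_x, p_y_given_x)
randomized = joint(p_x, randomize_y())
clamped = joint(p_x, do_y(1))
print(independent(observed), independent(randomized), independent(clamped))
```

Only the table for the target changes; P(X) is untouched. Randomizing the effect Y removes its dependence on X, so the manipulated distribution satisfies X _||_ Y even though X → Y holds in the unmanipulated system.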
Causal Discovery from Manipulations/Interventions: Simultaneous Interventions Destroy Information. Experimental setup: Randomize(X, Y) independently. P_M(V): X _||_ Y. [Figure: the equivalence class, a set of candidate graphs over X and Y that are all compatible with this result.]
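One way to see the loss of information is to apply graph surgery to every candidate structure and compare the results. This sketch assumes three candidates without latent variables (X → Y, Y → X, no edge) and treats randomization as removing all original parents of each target; the names are hypothetical.

```python
def randomize(parents, targets):
    """Graph surgery: every randomized target loses all of its original parents."""
    return {node: ([] if node in targets else sorted(ps)) for node, ps in parents.items()}

# Assumed candidate structures over X and Y (no latent confounders).
candidates = {
    "X -> Y": {"X": [], "Y": ["X"]},
    "Y -> X": {"X": ["Y"], "Y": []},
    "no edge": {"X": [], "Y": []},
}

both = {name: randomize(g, {"X", "Y"}) for name, g in candidates.items()}
only_x = {name: randomize(g, {"X"}) for name, g in candidates.items()}

# After randomizing X and Y together, all candidates collapse to the same empty graph.
all_same = len({str(g) for g in both.values()}) == 1
print(all_same)
# After randomizing X alone, only the X -> Y candidate keeps an edge.
print({name: g["Y"] for name, g in only_x.items()})
```

Because every candidate yields the same manipulated graph under Randomize(X, Y), the resulting independence X _||_ Y cannot tell them apart, whereas randomizing X alone separates X → Y from the other two.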
Causal Discovery from Manipulations/Interventions: Simultaneous Interventions Destroy Information, but:
• Sequence of single interventions over N variables: N - 1 experiments are needed to guarantee causal identification
• Sequence of simultaneous interventions: log₂(N) + 1 experiments
Causal Discovery from Manipulations/Interventions: Equivalence class oddities. Experimental setup: Randomize(Y). P_M(V): X _||_ Y. [Figure: the true model and the manipulated model over X and Y, with intervention variable I.]
Causal Discovery from Manipulations/Interventions: Equivalence class oddities. Experimental setup: Randomize(Y). P_M(V): X _||_ Y. [Figure: the equivalence class, a set of candidate graphs over X and Y compatible with this result.]
Causal Discovery from Manipulations/Interventions: Equivalence class oddities. Experimental setup: Randomize(X, Y) independently. P_M(V): X and Z are associated (X _||_ Z does not hold). Equivalence class:
• X is an ancestor of Z
• X has a path to Z not through Y
Issues
• Efficiently representing a wider array of information relevant to causal structure discovery, and then efficiently combining it to maximally constrain the possible explanations of data
• Rate of reaching equilibrium vs. rate of sampling
• Transportability
• Constructing appropriate variables from raw measurements
• High dimensionality