890 likes | 1.1k Views
THE MATHEMATICS OF CAUSAL MODELING. Judea Pearl Department of Computer Science UCLA. OUTLINE. Modeling: Statistical vs. Causal Causal Models and Identifiability Inference to three types of claims: Effects of potential interventions Claims about attribution (responsibility)
E N D
THE MATHEMATICS OF CAUSAL MODELING Judea Pearl Department of Computer Science UCLA
OUTLINE • Modeling: Statistical vs. Causal • Causal Models and Identifiability • Inference to three types of claims: • Effects of potential interventions • Claims about attribution (responsibility) • Claims about direct and indirect effects • Robustness of Causal Claims
TRADITIONAL STATISTICAL INFERENCE PARADIGM P Joint Distribution Q(P) (Aspects of P) Data Inference e.g., Infer whether customers who bought product A would also buy product B. Q = P(B|A)
THE CAUSAL INFERENCE PARADIGM Joint Distribution Data Generating Model Q(M) (Aspects of M) Data Inference Some Q(M) cannot be inferred from P. e.g., Infer whether customers who bought product A would still buy A if we double the price.
Probability and statistics deal with static relations Statistics Probability inferences from passive observations joint distribution Data FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES Probability and statistics deal with static relations Statistics Probability inferences from passive observations joint distribution Data • Causal analysis deals with changes (dynamics) • i.e. What remains invariant when P changes. • P does not tell us how it ought to change • e.g. Curing symptoms vs. curing diseases • e.g. Analogy: mechanical deformation
Probability and statistics deal with static relations Statistics Probability inferences from passive observations joint distribution Data Causal analysis deals with changes (dynamics) • Effects of • interventions Data Causal Model • Causes of • effects Causal assumptions • Explanations Experiments FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
Causal and statistical concepts do not mix. CAUSAL Spurious correlation Randomization Confounding / Effect Instrument Holding constant Explanatory variables STATISTICAL Regression Association / Independence “Controlling for” / Conditioning Odd and risk ratios Collapsibility FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)
Causal and statistical concepts do not mix. CAUSAL Spurious correlation Randomization Confounding / Effect Instrument Holding constant Explanatory variables STATISTICAL Regression Association / Independence “Controlling for” / Conditioning Odd and risk ratios Collapsibility • No causes in – no causes out (Cartwright, 1989) } statistical assumptions + data causal assumptions causal conclusions FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT) • Causal assumptions cannot be expressed in the mathematical language of standard statistics.
Causal and statistical concepts do not mix. CAUSAL Spurious correlation Randomization Confounding / Effect Instrument Holding constant Explanatory variables STATISTICAL Regression Association / Independence “Controlling for” / Conditioning Odd and risk ratios Collapsibility • No causes in – no causes out (Cartwright, 1989) } statistical assumptions + data causal assumptions causal conclusions FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT) • Causal assumptions cannot be expressed in the mathematical language of standard statistics. • Non-standard mathematics: • Structural equation models (Wright, 1920; Simon, 1960) • Counterfactuals (Neyman-Rubin, Lewis)
WHAT'SIN A CAUSAL MODEL? Oracle that assigns truth value to causal sentences: Action sentences:B if wedoA. Counterfactuals:B would be different if Awere true. Explanation:B occurredbecauseof A. Optional:with whatprobability?
FAMILIAR CAUSAL MODEL ORACLE FOR MANIPILATION X Y Z INPUT OUTPUT
Definition: A causal model is a 3-tuple M = V,U,F with a mutilation operator do(x): MMx where: (i) V = {V1…,Vn} endogenous variables, (ii) U = {U1,…,Um} background variables (iii) F = set of n functions, fi : V \ ViU Vi vi = fi(pai,ui)PAi V \ ViUi U CAUSAL MODELS AND CAUSAL DIAGRAMS
Definition: A causal model is a 3-tuple M = V,U,F with a mutilation operator do(x): MMx where: (i) V = {V1…,Vn} endogenous variables, (ii) U = {U1,…,Um} background variables (iii) F = set of n functions, fi : V \ ViU Vi vi = fi(pai,ui)PAi V \ ViUi U I W Q P CAUSAL MODELS AND CAUSAL DIAGRAMS U1 U2 PAQ
Definition: A causal model is a 3-tuple M = V,U,F with a mutilation operator do(x): MMx where: (i) V = {V1…,Vn} endogenous variables, (ii) U = {U1,…,Um} background variables (iii) F = set of n functions, fi : V \ ViU Vi vi = fi(pai,ui)PAi V \ ViUi U CAUSAL MODELS AND MUTILATION (iv) Mx= U,V,Fx, X V, x X where Fx = {fi: Vi X } {X = x} (Replace all functions ficorresponding to X with the constant functions X=x)
Definition: A causal model is a 3-tuple M = V,U,F with a mutilation operator do(x): MMx where: (i) V = {V1…,Vn} endogenous variables, (ii) U = {U1,…,Um} background variables (iii) F = set of n functions, fi : V \ ViU Vi vi = fi(pai,ui)PAi V \ ViUi U I W Q CAUSAL MODELS AND MUTILATION (iv) U1 U2 P
Definition: A causal model is a 3-tuple M = V,U,F with a mutilation operator do(x): MMx where: (i) V = {V1…,Vn} endogenous variables, (ii) U = {U1,…,Um} background variables (iii) F = set of n functions, fi : V \ ViU Vi vi = fi(pai,ui)PAi V \ ViUi U I W Q CAUSAL MODELS AND MUTILATION (iv) Mp U1 U2 P P = p0
Definition: A causal model is a 3-tuple M = V,U,F with a mutilation operator do(x): MMx where: (i) V = {V1…,Vn} endogenous variables, (ii) U = {U1,…,Um} background variables (iii) F = set of n functions, fi : V \ ViU Vi vi = fi(pai,ui)PAi V \ ViUi U PROBABILISTIC CAUSAL MODELS (iv) Mx= U,V,Fx, X V, x X where Fx = {fi: Vi X } {X = x} (Replace all functions ficorresponding to X with the constant functions X=x) Definition (Probabilistic Causal Model): M, P(u) P(u) is a probability assignment to the variables in U.
CAUSAL MODELS AND COUNTERFACTUALS Definition: Potential Response The sentence: “Y would be y (in unit u), had X been x,” denoted Yx(u) = y, is the solution for Y in a mutilated model Mx, with the equations for X replaced by X = x. (“unit-based potential outcome”)
CAUSAL MODELS AND COUNTERFACTUALS Joint probabilities of counterfactuals: Definition: Potential Response The sentence: “Y would be y (in unit u), had X been x,” denoted Yx(u) = y, is the solution for Y in a mutilated model Mx, with the equations for X replaced by X = x. (“unit-based potential outcome”)
CAUSAL MODELS AND COUNTERFACTUALS In particular: Definition: Potential Response The sentence: “Y would be y (in unit u), had X been x,” denoted Yx(u) = y, is the solution for Y in a mutilated model Mx, with the equations for X replaced by X = x. (“unit-based potential outcome”) Joint probabilities of counterfactuals:
3-STEPS TO COMPUTING COUNTERFACTUALS Abduction TRUE S5. If the prisoner is dead, he would still be dead if A were not to have shot. DDA (Court order) U TRUE (Captain) C (Riflemen) A B (Prisoner) D
3-STEPS TO COMPUTING COUNTERFACTUALS Abduction Action Prediction U U TRUE TRUE C C FALSE FALSE A B A B D D TRUE TRUE S5. If the prisoner is dead, he would still be dead if A were not to have shot. DDA U TRUE C A B D
COMPUTING PROBABILITIES OF COUNTERFACTUALS Abduction Action Prediction U U P(u|D) P(u) P(u|D) P(u|D) C C FALSE FALSE A B A B D D TRUE P(DA|D) P(S5). The prisoner is dead. How likely is it that he would be dead if A were not to have shot. P(DA|D) = ? U C A B D
CAUSAL INFERENCE MADE EASY (1985-2000) • Inference with Nonparametric Structural Equations • made possible through Graphical Analysis. • Mathematical underpinning of counterfactuals • through nonparametric structural equations • Graphical-Counterfactuals symbiosis
IDENTIFIABILITY Definition: Let Q(M) be any quantity defined on a causal model M, andlet A be a set of assumption. Q is identifiable relative to A iff P(M1) = P(M2) ÞQ(M1) = Q(M2) for all M1, M2, that satisfy A.
IDENTIFIABILITY Definition: Let Q(M) be any quantity defined on a causal model M, andlet A be a set of assumption. Q is identifiable relative to A iff P(M1) = P(M2) ÞQ(M1) = Q(M2) for all M1, M2, that satisfy A. In other words, Q can be determined uniquely from the probability distribution P(v) of the endogenous variables, V, and assumptions A.
IDENTIFIABILITY Definition: Let Q(M) be any quantity defined on a causal model M, andlet A be a set of assumption. Q is identifiable relative to A iff P(M1) = P(M2)ÞQ(M1) = Q(M2) for all M1, M2, that satisfy A. In this talk: A: Assumptions encoded in the diagram Q1: P(y|do(x)) Causal Effect (= P(Yx=y)) Q2: P(Yx =y | x, y) Probability of necessity Q3: Direct Effect
THE FUNDAMENTAL THEOREM OF CAUSAL INFERENCE Causal Markov Theorem: Any distribution generated by Markovian structural model M (recursive, with independent disturbances) can be factorized as Where pai are the (values of) the parents of Viin the causal diagram associated with M.
Corollary: (Truncated factorization, Manipulation Theorem) The distribution generated by an intervention do(X=x) (in a Markovian model M) is given by the truncated factorization THE FUNDAMENTAL THEOREM OF CAUSAL INFERENCE Causal Markov Theorem: Any distribution generated by Markovian structural model M (recursive, with independent disturbances) can be factorized as Where pai are the (values of) the parents of Viin the causal diagram associated with M.
Given P(x,y,z),should we ban smoking? U (unobserved) U (unobserved) X = x Y Z X Y Z Smoking Tar in Lungs Cancer Smoking Tar in Lungs Cancer RAMIFICATIONS OF THE FUNDAMENTAL THEOREM Pre-intervention Post-intervention
Given P(x,y,z),should we ban smoking? U (unobserved) U (unobserved) X = x Y Z X Y Z Smoking Tar in Lungs Cancer Smoking Tar in Lungs Cancer RAMIFICATIONS OF THE FUNDAMENTAL THEOREM Pre-intervention Post-intervention
Given P(x,y,z),should we ban smoking? U (unobserved) U (unobserved) X = x Y Z X Y Z Smoking Tar in Lungs Cancer Smoking Tar in Lungs Cancer RAMIFICATIONS OF THE FUNDAMENTAL THEOREM Pre-intervention Post-intervention To compute P(y,z|do(x)), wemust eliminate u. (Graphical problem.)
G Gx THE BACK-DOOR CRITERION Graphical test of identification P(y | do(x)) is identifiable in G if there is a set Z of variables such that Zd-separates X from Y in Gx. Z1 Z1 Z2 Z2 Z Z3 Z3 Z4 Z5 Z5 Z4 X X Z6 Y Y Z6
G Gx Moreover, P(y | do(x)) = åP(y | x,z) P(z) (“adjusting” for Z) z THE BACK-DOOR CRITERION Graphical test of identification P(y | do(x)) is identifiable in G if there is a set Z of variables such that Zd-separates X from Y in Gx. Z1 Z1 Z2 Z2 Z Z3 Z3 Z4 Z5 Z5 Z4 X X Z6 Y Y Z6
RULES OF CAUSAL CALCULUS • Rule 1:Ignoring observations • P(y |do{x},z, w) = P(y | do{x},w) • Rule 2:Action/observation exchange • P(y |do{x}, do{z}, w) = P(y|do{x},z,w) • Rule 3: Ignoring actions • P(y |do{x},do{z},w) = P(y|do{x},w)
DERIVATION IN CAUSAL CALCULUS Genotype (Unobserved) Smoking Tar Cancer Probability Axioms P (c |do{s})=tP (c |do{s},t) P (t |do{s}) Rule 2 = tP (c |do{s},do{t})P (t |do{s}) Rule 2 = tP (c |do{s},do{t})P (t | s) Rule 3 = tP (c |do{t})P (t | s) Probability Axioms = stP (c |do{t},s) P (s|do{t})P(t |s) Rule 2 = stP (c | t, s) P (s|do{t})P(t |s) Rule 3 = stP (c | t, s) P (s) P(t |s)
X Z 1 Z Z k RECENT RESULTS ON IDENTIFICATION • Theorem (Tian 2002): we can identifyP(v |do{x})(x a singleton) if and only if there is no child Z of X connected to X by a bi-directed path.
RECENT RESULTS ON IDENTIFICATION (Cont.) • Do-calculus is complete • A complete graphical criterion available for identifying causal effects of any set on any set • References: Shpitser and Pearl 2006 (AAAI, UAI)
OUTLINE • Modeling: Statistical vs. Causal • Causal models and identifiability • Inference to three types of claims: • Effects of potential interventions, • Claims about attribution (responsibility)
DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem) • Your Honor! My client (Mr. A) died BECAUSE • he used that drug.
DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem) • Your Honor! My client (Mr. A) died BECAUSE • he used that drug. • Court to decide if it is MORE PROBABLE THAN • NOT that A would be alive BUT FOR the drug! • P(? | A is dead, took the drug) > 0.50
THE PROBLEM • Theoretical Problems: • What is the meaning of PN(x,y): • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”
THE PROBLEM • Theoretical Problems: • What is the meaning of PN(x,y): • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.” • Answer:
THE PROBLEM • Theoretical Problems: • What is the meaning of PN(x,y): • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.” • Under what condition can PN(x,y) be learned from statistical data, i.e., observational, experimental and combined.
WHAT IS INFERABLE FROM EXPERIMENTS? Simple Experiment: Q = P(Yx= y | z) Z nondescendants of X. Compound Experiment: Q = P(YX(z) = y | z) Multi-Stage Experiment: etc…
CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY? ExperimentalNonexperimental do(x) do(x) xx Deaths (y) 16 14 2 28 Survivals (y) 984 986 998 972 1,000 1,000 1,000 1,000 • Nonexperimental data: drug usage predicts longer life • Experimental data: drug has negligible effect on survival • Plaintiff: Mr. A is special. • He actually died • He used the drug by choice • Court to decide (given both data): • Is it more probable than not that A would be alive • but for the drug?
TYPICAL THEOREMS (Tian and Pearl, 2000) • Identifiability under monotonicity (Combined data) • corrected Excess-Risk-Ratio • Bounds given combined nonexperimental and experimental data
WITH PROBABILITY ONE P(yx | x,y) =1 SOLUTION TO THE ATTRIBUTION PROBLEM (Cont) • From population data to individual case • Combined data tell more that each study alone
OUTLINE • Modeling: Statistical vs. Causal • Causal models and identifiability • Inference to three types of claims: • Effects of potential interventions, • Claims about attribution (responsibility) • Claims about direct and indirect effects