
The Mathematics of Cause and Effect - Causal Modeling, Inference, and Effects of Interventions

This website provides tutorials, lectures, slides, publications, and a blog on the mathematics of causality and causal inference. It covers topics such as causal models, identifiability, and the effects of potential interventions.


Presentation Transcript


  1. Judea Pearl, University of California, Los Angeles (www.cs.ucla.edu/~judea). THE MATHEMATICS OF CAUSE AND EFFECT

  2. REFERENCES ON CAUSALITY • Home page (tutorials, lectures, slides, publications, and blog): www.cs.ucla.edu/~judea/ • Background information and comprehensive treatment: Causality (Cambridge University Press, 2000) • General introduction: http://bayes.cs.ucla.edu/IJCAI99/ • Gentle introductions for empirical scientists: ftp://ftp.cs.ucla.edu/pub/stat_ser/r338.pdf and ftp://ftp.cs.ucla.edu/pub/stat_ser/Test_pea-final.pdf • Direct and indirect effects: ftp://ftp.cs.ucla.edu/pub/stat_ser/R271.pdf

  3. OUTLINE • Causality: Antiquity to robotics • Modeling: Statistical vs. Causal • Causal Models and Identifiability • Inference to three types of claims: • Effects of potential interventions • Claims about attribution (responsibility) • Claims about direct and indirect effects

  4. ANTIQUITY TO ROBOTICS. "I would rather discover one causal relation than be King of Persia." Democritus (c. 460-370 BC). "Development of Western science is based on two great achievements: the invention of the formal logical system (in Euclidean geometry) by the Greek philosophers, and the discovery of the possibility to find out causal relationships by systematic experiment (during the Renaissance)." A. Einstein, April 23, 1953

  5. THE BASIC PRINCIPLES. Causation = encoding of behavior under interventions. Interventions = surgeries on mechanisms. Mechanisms = stable functional relationships = equations + graphs.

  6. TRADITIONAL STATISTICAL INFERENCE PARADIGM. [Diagram: Data → Inference → P (Joint Distribution) → Q(P) (Aspects of P).] e.g., Infer whether customers who bought product A would also buy product B: Q = P(B | A).

  7. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES. Probability and statistics deal with static relations. [Diagram: Data → Inference → P (Joint Distribution), which then changes to P′ (Joint Distribution) → Q(P′) (Aspects of P′).] What happens when P changes? e.g., Infer whether customers who bought product A would still buy A if we were to double the price.

  8. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES. What remains invariant when P changes, say, to satisfy P(price = 2) = 1? [Diagram as on slide 7.] Note: P(v) ≠ P(v | price = 2). P does not tell us how it ought to change. e.g., curing symptoms vs. curing diseases; analogy: mechanical deformation.

  9. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT). Causal and statistical concepts do not mix. CAUSAL: spurious correlation, randomization, confounding / effect, instrument, holding constant, explanatory variables. STATISTICAL: regression, association / independence, "controlling for" / conditioning, odds and risk ratios, collapsibility.

  10. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT). Causal and statistical concepts do not mix. [Same CAUSAL vs. STATISTICAL contrast as slide 9.] • No causes in, no causes out (Cartwright, 1989): causal assumptions + statistical assumptions + data ⇒ causal conclusions. • Causal assumptions cannot be expressed in the mathematical language of standard statistics.

  11. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT). Causal and statistical concepts do not mix. [Same CAUSAL vs. STATISTICAL contrast as slide 9.] • No causes in, no causes out (Cartwright, 1989): causal assumptions + statistical assumptions + data ⇒ causal conclusions. • Causal assumptions cannot be expressed in the mathematical language of standard statistics. • Non-standard mathematics: structural equation models (Wright, 1920; Simon, 1960); counterfactuals (Neyman-Rubin: Yx; Lewis: x ◻→ Y).

  12. FROM STATISTICAL TO CAUSAL ANALYSIS: 2. THE MENTAL BARRIERS • Every exercise of causal analysis must rest on untested, judgmental causal assumptions. • Every exercise of causal analysis must invoke non-standard mathematical notation.

  13. TWO PARADIGMS FOR CAUSAL INFERENCE. Observed: P(X, Y, Z, ...). Conclusions needed: P(Yx = y), P(Xy = x | Z = z), ... How do we connect the observables X, Y, Z, ... to the counterfactuals Yx, Xz, Zy, ...? N-R model: counterfactuals are primitives, new variables; a super-distribution P*(X, Y, ..., Yx, Xz, ...); X, Y, Z constrain Yx, Zy, ... Structural model: counterfactuals are derived quantities; subscripts modify a data-generating model.

  14. THE STRUCTURAL MODEL PARADIGM. [Diagram: Data Generating Model M → Joint Distribution → Data; Inference yields Q(M) (Aspects of M).] M is an oracle for computing answers to Q's. e.g., Infer whether customers who bought product A would still buy A if we were to double the price.

  15. FAMILIAR CAUSAL MODEL: ORACLE FOR MANIPULATION. [Diagram: a circuit with INPUT variables X and Y, an internal variable Z, and an OUTPUT.]

  16. STRUCTURAL CAUSAL MODELS. Definition: A structural causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where • V = {V1, ..., Vn} are observable variables • U = {U1, ..., Um} are background variables • F = {f1, ..., fn} are functions determining V: vi = fi(v, u) • P(u) is a distribution over U. P(u) and F induce a distribution P(v) over the observable variables.
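The 4-tuple ⟨V, U, F, P(u)⟩ maps directly onto code. Below is a minimal sketch in Python; the three-variable model, its functions, and all names are invented for illustration, not taken from the slides. It shows how P(u) and F jointly induce the observational distribution P(v).

```python
import random

# A toy structural causal model <V, U, F, P(u)> (invented example):
# U = {u1, u2} background variables, V = {X, Z, Y} observables.

def sample_u():
    """P(u): a distribution over the background variables."""
    return {"u1": random.gauss(0, 1), "u2": random.gauss(0, 1)}

F = {
    "X": lambda v, u: u["u1"] > 0,              # x = f_X(u)
    "Z": lambda v, u: v["X"] and u["u2"] > -1,  # z = f_Z(x, u)
    "Y": lambda v, u: v["Z"] or u["u2"] > 2,    # y = f_Y(z, u)
}
ORDER = ["X", "Z", "Y"]  # a causal (topological) order for solving F

def solve(u, f=F):
    """Solve the equations f for all of V, given background state u."""
    v = {}
    for name in ORDER:
        v[name] = f[name](v, u)
    return v

# P(u) and F induce a distribution P(v) over the observables:
samples = [solve(sample_u()) for _ in range(10_000)]
print("P(Y=1) ~", sum(s["Y"] for s in samples) / len(samples))
```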

  17. CAUSAL MODELS AND COUNTERFACTUALS. Definition: The sentence "Y would be y (in situation u), had X been x," denoted Yx(u) = y, means: the solution for Y in a mutilated model Mx (i.e., with the equations for X replaced by X = x), given input U = u, is equal to y. Joint probabilities of counterfactuals: the super-distribution P* is derived from M; parsimonious, consistent, and transparent.
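The mutilation in this definition is mechanical: replace the equation for X by a constant and re-solve. A sketch continuing the invented toy model above (it reuses sample_u, F, and solve from the previous block):

```python
def mutilate(f, **assignments):
    """Build M_x: replace each assigned variable's equation by a constant."""
    fx = dict(f)
    for name, value in assignments.items():
        fx[name] = lambda v, u, value=value: value
    return fx

u = sample_u()                              # a particular situation u
y_factual = solve(u)["Y"]                   # Y(u) in the original model M
y_x1 = solve(u, mutilate(F, X=True))["Y"]   # Y_{X=1}(u): solution in M_{X=1}
y_x0 = solve(u, mutilate(F, X=False))["Y"]  # Y_{X=0}(u): solution in M_{X=0}
print(y_factual, y_x1, y_x0)
```

Because the same background state u drives both the original and the mutilated models, factual and counterfactual values are jointly defined, which is what makes the derived P* consistent.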

  18. APPLICATIONS • Predicting effects of actions and policies • Learning causal relationships from assumptions and data • Troubleshooting physical systems and plans • Finding explanations for reported events • Generating verbal explanations • Understanding causal talk • Formulating theories of causal thinking

  19. AXIOMS OF CAUSAL COUNTERFACTUALS. Yx(u) = y: "Y would be y, had X been x (in state U = u)." • Definiteness • Uniqueness • Effectiveness • Composition • Reversibility

  20. RULES OF CAUSAL CALCULUS • Rule 1 (ignoring observations): P(y | do{x}, z, w) = P(y | do{x}, w) • Rule 2 (action/observation exchange): P(y | do{x}, do{z}, w) = P(y | do{x}, z, w) • Rule 3 (ignoring actions): P(y | do{x}, do{z}, w) = P(y | do{x}, w). Each rule is licensed by a d-separation condition in a mutilated graph: Rule 1 requires (Y ⫫ Z | X, W) in the graph with arrows into X deleted; Rule 2 requires it with, in addition, arrows out of Z deleted; Rule 3 requires it with arrows into X and into Z(W) deleted, where Z(W) is the set of Z-nodes that are not ancestors of any W-node in that graph.

  21. DERIVATION IN CAUSAL CALCULUS. [Diagram: Smoking → Tar → Cancer, with an unobserved Genotype influencing both Smoking and Cancer.]
  P(c | do{s}) = Σt P(c | do{s}, t) P(t | do{s})                (Probability Axioms)
               = Σt P(c | do{s}, do{t}) P(t | do{s})            (Rule 2)
               = Σt P(c | do{s}, do{t}) P(t | s)                (Rule 2)
               = Σt P(c | do{t}) P(t | s)                       (Rule 3)
               = Σs′ Σt P(c | do{t}, s′) P(s′ | do{t}) P(t | s) (Probability Axioms)
               = Σs′ Σt P(c | t, s′) P(s′ | do{t}) P(t | s)     (Rule 2)
               = Σs′ Σt P(c | t, s′) P(s′) P(t | s)             (Rule 3)
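The last line is the front-door formula, which involves observational quantities only, so the derivation can be checked numerically. A sketch under invented assumptions: only the graph (Smoking → Tar → Cancer, with a hidden Genotype behind Smoking and Cancer) comes from the slide; the numeric parameters below are made up.

```python
import random
random.seed(0)

def simulate(do_s=None):
    """One draw from an invented model with hidden genotype u."""
    u = random.random() < 0.3
    s = do_s if do_s is not None else random.random() < (0.8 if u else 0.2)
    t = random.random() < (0.9 if s else 0.1)                        # tar
    c = random.random() < (0.6 if t else 0.1) + (0.2 if u else 0.0)  # cancer
    return s, t, c

obs = [simulate() for _ in range(200_000)]  # observational data only

def p(event, given=lambda r: True):
    sel = [r for r in obs if given(r)]
    return sum(event(r) for r in sel) / len(sel)

def front_door(s):
    """Sum over t, s' of P(c | t, s') P(s') P(t | s)."""
    total = 0.0
    for t in (False, True):
        for s2 in (False, True):
            total += (p(lambda r: r[2], lambda r: r[1] == t and r[0] == s2)
                      * p(lambda r: r[0] == s2)
                      * p(lambda r: r[1] == t, lambda r: r[0] == s))
    return total

truth = sum(simulate(do_s=True)[2] for _ in range(200_000)) / 200_000
print(f"front-door: {front_door(True):.3f}  true P(c|do(s)): {truth:.3f}")
```

The two printed numbers agree up to sampling noise, even though Genotype is never observed.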

  22. THE BACK-DOOR CRITERION. Graphical test of identification: P(y | do(x)) is identifiable in G if there is a set Z of variables such that Z d-separates X from Y in Gx, the subgraph obtained by deleting from G all arrows emanating from X. Moreover, P(y | do(x)) = Σz P(y | x, z) P(z) ("adjusting" for Z). [Figure: a graph G over Z1, ..., Z6, X, Y, shown beside the mutilated graph Gx, with a back-door-blocking set Z highlighted.]
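The adjustment formula is just as easy to apply once a valid back-door set is in hand. A minimal sketch, with one invented binary confounder Z standing in for the Z1, ..., Z6 of the figure:

```python
import random
random.seed(1)

def draw(do_x=None):
    """Invented model: Z confounds X and Y, so {Z} is a back-door set."""
    z = random.random() < 0.5
    x = do_x if do_x is not None else random.random() < (0.7 if z else 0.2)
    y = random.random() < 0.3 + 0.3 * x + 0.3 * z
    return x, y, z

data = [draw() for _ in range(200_000)]

def mean_y(x, z=None):
    sel = [yi for (xi, yi, zi) in data if xi == x and (z is None or zi == z)]
    return sum(sel) / len(sel)

p_z = sum(zi for (_, _, zi) in data) / len(data)

# Adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z)
adjusted = mean_y(True, True) * p_z + mean_y(True, False) * (1 - p_z)
naive = mean_y(True)  # plain P(y | x): confounded by Z
truth = sum(draw(do_x=True)[1] for _ in range(200_000)) / 200_000
print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  truth: {truth:.3f}")
```

The naive conditional estimate is biased upward by the confounder; the adjusted estimate recovers the interventional truth.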

  23. RECENT RESULTS ON IDENTIFICATION • do-calculus is complete • Complete graphical criterion for identifying causal effects (Shpitser and Pearl, 2006). • Complete graphical criterion for empirical testability of counterfactuals (Shpitser and Pearl, 2007).

  24. DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem) • Your Honor! My client (Mr. A) died BECAUSE he used that drug.

  25. DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem) • Your Honor! My client (Mr. A) died BECAUSE he used that drug. • Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug! • PN = P(? | A is dead, took the drug) > 0.50

  26. THE PROBLEM • Semantical Problem: • What is the meaning of PN(x,y): • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”

  27. THE PROBLEM • Semantical Problem: • What is the meaning of PN(x,y): • "Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur." • Answer: • Computable from M

  28. THE PROBLEM • Semantical Problem: What is the meaning of PN(x,y): "Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur." • Analytical Problem: Under what conditions can PN(x,y) be learned from statistical data, i.e., observational, experimental, and combined?
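In counterfactual notation, writing x′ and y′ for the complements of x and y, this quantity is standardly rendered as follows (the formula is from Pearl's published work, not verbatim from the slide):

```latex
\mathrm{PN}(x,y) \;\triangleq\; P\bigl(Y_{x'} = y' \mid X = x,\; Y = y\bigr)
```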

  29. TYPICAL THEOREMS (Tian and Pearl, 2000) • Bounds given combined nonexperimental and experimental data • Identifiability under monotonicity (combined data): a corrected Excess Risk Ratio
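For reference, the bounds alluded to here take the following form (stated from the published Tian-Pearl result; see the paper for the exact conditions):

```latex
\max\left\{0,\; \frac{P(y) - P(y \mid do(x'))}{P(x, y)}\right\}
\;\le\; \mathrm{PN} \;\le\;
\min\left\{1,\; \frac{P(y' \mid do(x')) - P(x', y')}{P(x, y)}\right\}
```

Under monotonicity the lower bound holds with equality.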

  30. CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?

                      Experimental            Nonexperimental
                    do(x)     do(x′)           x          x′
  Deaths (y)           16        14            2          28
  Survivals (y′)      984       986          998         972
                    1,000     1,000        1,000       1,000

  • Nonexperimental data: drug usage predicts longer life. • Experimental data: drug has negligible effect on survival. • Plaintiff: Mr. A is special: he actually died, and he used the drug by choice. • Court to decide (given both data): is it more probable than not that A would be alive but for the drug?

  31. SOLUTION TO THE ATTRIBUTION PROBLEM • WITH PROBABILITY ONE: 1 ≤ P(y′x′ | x, y) ≤ 1, i.e., PN = 1 • Combined data tell more than each study alone.
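Plugging the slide-30 table into those bounds reproduces the "probability one" conclusion. A quick check in Python; the only added assumption is P(x) = 1/2, as the equal 1,000-subject groups suggest:

```python
# Slide-30 data: y = death, x = took the drug (1,000 subjects per arm).
p_y_do_x, p_y_do_xp = 16 / 1000, 14 / 1000  # experimental: do(x), do(x')
p_y_x, p_y_xp = 2 / 1000, 28 / 1000         # nonexperimental: x, x'
p_x = 0.5                                   # assumed: equal group sizes

p_xy = p_x * p_y_x                          # P(x, y)
p_y = p_x * p_y_x + (1 - p_x) * p_y_xp      # P(y)
p_xpyp = (1 - p_x) * (1 - p_y_xp)           # P(x', y')

lower = max(0.0, (p_y - p_y_do_xp) / p_xy)
upper = min(1.0, ((1 - p_y_do_xp) - p_xpyp) / p_xy)
print(f"{lower:.3f} <= PN <= {upper:.3f}")  # prints 1.000 <= PN <= 1.000
```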

  32. EFFECT DECOMPOSITION • What is the semantics of direct and indirect effects? • What are their policy-making implications? • Can we estimate them from data? Experimental data?

  33. WHY DECOMPOSE EFFECTS? • Direct (or indirect) effect may be more transportable. • Indirect effects may be prevented or controlled. • Direct (or indirect) effect may be forbidden. [Diagrams: Pill → Pregnancy → Thrombosis, with a direct Pill → Thrombosis path; Gender → Qualification → Hiring.]

  34. SEMANTICS BECOMES NONTRIVIAL IN NONLINEAR MODELS (even when the model is completely specified). X → Z → Y, with z = f(x, ε1) and y = g(x, z, ε2). Is the direct effect dependent on z? Void of operational meaning?

  35. THE OPERATIONAL MEANING OF DIRECT EFFECTS. X → Z → Y, with z = f(x, ε1) and y = g(x, z, ε2). "Natural" Direct Effect of X on Y: the expected change in Y per unit change of X, when we keep Z constant at whatever value it attains before the change. In linear models, NDE = Controlled Direct Effect.

  36. THE OPERATIONAL MEANING OF INDIRECT EFFECTS. X → Z → Y, with z = f(x, ε1) and y = g(x, z, ε2). "Natural" Indirect Effect of X on Y: the expected change in Y when we keep X constant, say at x0, and let Z change to whatever value it would have attained under a unit change in X. In linear models, NIE = TE − DE.
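Both "natural" effects are nested counterfactual expectations, NDE = E[Y(x1, Z(x0))] − E[Y(x0)] and NIE = E[Y(x0, Z(x1))] − E[Y(x0)], so once a model is fully specified they can be evaluated by simulation. A sketch with invented f, g, and noise terms (not an example from the slides):

```python
import random
random.seed(2)

def f(x, e1):     # mediator equation z = f(x, eps1) (invented)
    return x + e1

def g(x, z, e2):  # nonlinear outcome y = g(x, z, eps2) (invented)
    return x + x * z + z ** 2 + e2

x0, x1, N = 0.0, 1.0, 100_000
nde = nie = te = 0.0
for _ in range(N):
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)  # one situation u
    z_x0, z_x1 = f(x0, e1), f(x1, e1)                # Z_{x0}(u), Z_{x1}(u)
    y_x0 = g(x0, z_x0, e2)                           # Y_{x0}(u)
    nde += g(x1, z_x0, e2) - y_x0                    # Y(x1, Z_{x0}) - Y(x0)
    nie += g(x0, z_x1, e2) - y_x0                    # Y(x0, Z_{x1}) - Y(x0)
    te += g(x1, z_x1, e2) - y_x0                     # Y(x1) - Y(x0)
print(f"NDE={nde/N:.2f}  NIE={nie/N:.2f}  TE={te/N:.2f}")  # ~1, ~1, ~3
```

Here TE ≠ NDE + NIE, which is exactly why the linear-model identities quoted on these slides do not carry over to nonlinear models.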

  37. POLICY IMPLICATIONS OF INDIRECT EFFECTS. What is the indirect effect of X on Y? The effect of Gender on Hiring if sex discrimination is eliminated. [Diagram: X (Gender) → Z (Qualification) → Y (Hiring), with the direct mechanism f from X to Y marked IGNORE.]

  38. SEMANTICS AND IDENTIFICATION OF NESTED COUNTERFACTUALS. Consider the quantity Q = E[Yx,Zx*(u)], the expectation of Y under x with Z held at the value it would take under x*. Given M and P(u), Q is well defined: given u, Zx*(u) is the solution for Z in Mx*, call it z; then Yxz(u) is the solution for Y in Mxz. Can Q be estimated from data?

  39. GENERAL PATH-SPECIFIC EFFECTS (Def.). [Figure: two graphs over X, W, Z, Y; in the second, only the edges of the active subgraph g are retained, with the remaining mechanisms held at the reference value x*, so that z* = Zx*(u).] Form a new model, M*g, specific to the active subgraph g. Definition: the g-specific effect is the effect of x on Y transmitted only along the paths retained in g. Nonidentifiable even in Markovian models.

  40. EFFECT DECOMPOSITION SUMMARY • Graphical conditions for estimability from experimental / nonexperimental data • Graphical conditions hold in Markovian models • Useful in answering new types of policy questions involving mechanism blocking instead of variable fixing

  41. CONCLUSIONS • Structural-model semantics, enriched with logic and graphs, provides: a complete formal basis for causal reasoning, and a powerful and friendly causal calculus • Lays the foundations for asking more difficult questions: What is an action? What is free will? Should robots be programmed to have this illusion?
