THE MATHEMATICS OF CAUSE AND EFFECT: Thinking Nature and Talking Counterfactuals Judea Pearl Departments of Computer Science and Statistics UCLA
OUTLINE • From Turing test to Bayes networks • From Bayes networks to do-calculus • From messy science to counterfactuals • From counterfactuals to practical victories • policy evaluation • attribution • mediation • generalizability – extend validity • new thrills – missing data
CAN MACHINES THINK? Alan M. Turing (1912 – 1954) • The Turing Test • “Computing Machinery and Intelligence” (1950) • Turing: Yes, if it acts like it thinks. • Acts = It answers non-trivial questions about a story, a topic or a situation.
HOW TURING ENVISIONED THE TEST CONVERSATION • Please write me a sonnet on the subject of the Forth Bridge. • Count me out on this one. I never could write poetry. • Add 34,957 and 70,764. • (Pause about 30 seconds and then give an answer) 105,721. • Do you play chess? • Yes. • I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play? • (After a pause of 15 seconds) R-R8: mate!
A “MINI” TURING TEST IN CAUSAL CONVERSATION The Story Input: Story Question: What is? What if? Why? Answers: I believe that... Image adapted from Saygin, 2000. Q1: If the season is dry, and the pavement is slippery, did it rain? A1: Unlikely, it is more likely the sprinkler was ON, with a very slight possibility that it is not even wet.
A “MINI” TURING TEST IN CAUSAL CONVERSATION The Story Image adapted from Saygin, 2000. Q2: But what if we SEE that the sprinkler is OFF? A2: Then it is more likely that it rained.
A “MINI” TURING TEST IN CAUSAL CONVERSATION The Story Image adapted from Saygin, 2000. Q3: Do you mean that if we actually turn the sprinkler ON, the rain will be less likely? A3: No, the likelihood of rain would remain the same but the pavement will surely get wet.
A “MINI” TURING TEST IN CAUSAL CONVERSATION The Story Image adapted from Saygin, 2000. Q4: Suppose we SEE that the sprinkler is ON and the pavement wet. What if the sprinkler were OFF? A4: The pavement would be dry, because the season is likely dry.
WHAT’S IN SEARLE’S RULE BOOK? Searle's oversight: there are not enough molecules in the universe to make the book. Even for the sprinkler example. Why causal conversation.
IS PARSIMONY NECESSARY (SUFFICIENT) FOR UNDERSTANDING? Understanding requires translating world constraints into a grammar (constraints over symbol strings) and harnessing it to answer queries swiftly and reliably. Parsimony can only be achieved by exploiting the constraints in the world to beat the combinatorial explosion.
THE PLURALITY OF MINI TURING TESTS Poetry Turing Test Medicine Chess Stock market Causal Reasoning . Data-intensive Scientific applications . . . Human Cognition and Ethics Thousands of Hungry and aimless customers Robotics Scientific thinking
THE PLURALITY OF MINI TURING TESTS Poetry Turing Test Medicine Chess Stock market Causal Reasoning . . . . Human Cognition and Ethics
Causal Explanation “She handed me the fruit and I ate” “The serpent deceived me, and I ate”
COUNTERFACTUALS AND OUR SENSE OF JUSTICE Abraham: Are you about to smite the righteous with the wicked? What if there were fifty righteous men in the city? And the Lord said, “If I find in the city of Sodom fifty good men, I will pardon the whole place for their sake.” Genesis 18:26
THE PLURALITY OF MINI TURING TESTS Poetry Turing Test Medicine Chess Stock market Causal Reasoning . . . . Human Cognition and Ethics Scientific thinking
WHY PHYSICS IS COUNTERFACTUAL Scientific equations (e.g., Hooke’s Law) are non-algebraic. e.g., Length (Y) equals a constant (2) times the weight (X). Correct notation: Y := 2X. Process information: Y := 2X, X = 1. The solution: X = 1, Y = 2. Alternative: X = ½ Y, Y = X + 1. Had X been 3, Y would be 6. If we raise X to 3, Y would be 6. Must “wipe out” X = 1.
WHY PHYSICS IS COUNTERFACTUAL Scientific equations (e.g., Hooke’s Law) are non-algebraic. e.g., Length (Y) equals a constant (2) times the weight (X). Correct notation: Y := 2X (or) Y ← 2X. Process information: Y ← 2X, X = 1. The solution: X = 1, Y = 2. Alternative: X = ½ Y, Y = X + 1. Had X been 3, Y would be 6. If we raise X to 3, Y would be 6. Must “wipe out” X = 1.
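A minimal sketch of why the assignment reading matters, using the two systems named on the slide. The function names are illustrative and not part of the talk. Both systems share the solution X = 1, Y = 2, yet they disagree once the equation for X is wiped out and replaced by X = 3.

```python
def structural(x):
    """Process reading: nature sets Y := 2X, so Y is recomputed from any X."""
    return 2 * x


def alternative(x):
    """Algebraically equivalent system {X = Y/2, Y = X + 1}: keep Y = X + 1."""
    return x + 1


# Both descriptions agree on the observed solution X = 1, Y = 2.
assert structural(1) == alternative(1) == 2

# Intervention: wipe out the equation for X and set X = 3 instead.
y_structural = structural(3)    # follows the mechanism Y := 2X
y_alternative = alternative(3)  # follows the surviving equation Y = X + 1
print(y_structural, y_alternative)
```

Only the first system reproduces the slide's claim that Y would be 6 had X been 3, which is why the direction of assignment has to be part of the notation.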
THE PLURALITY OF MINI TURING TESTS Poetry Turing Test Medicine Chess Stock market Causal Reasoning . . . . Human Cognition and Ethics Robotics Scientific thinking
CAUSATION AS A PROGRAMMER'S NIGHTMARE • Input: • “If the grass is wet, then it rained” • “If we break this bottle, the grass will get wet” • Output: • “If we break this bottle, then it rained”
WHAT KIND OF QUESTIONS SHOULD THE ROBOT ANSWER? • Observational Questions: “What if we see A?” (What is?) P(y | A) • Action Questions: “What if we do A?” (What if?) P(y | do(A)) • Counterfactual Questions: “What if we did things differently?” (Why?) P(y_{A'} | A) • Options: “With what probability?” THE CAUSAL HIERARCHY - SYNTACTIC DISTINCTION
THE PLURALITY OF MINI TURING TESTS Poetry Turing Test Medicine Chess Stock market Causal Reasoning . Data-intensive Scientific applications . . . Human Cognition and Ethics Thousands of Hungry and aimless customers Robotics Scientific thinking
STRUCTURAL CAUSAL MODELS: THE WORLD AS A COLLECTION OF SPRINGS • Definition: A structural causal model is a 4-tuple <V, U, F, P(u)>, where • V = {V1,...,Vn} are endogenous variables • U = {U1,...,Um} are background variables • F = {f1,...,fn} are functions determining V, vi = fi(v, u) • P(u) is a distribution over U • P(u) and F induce a distribution P(v) over observable variables
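The 4-tuple above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the talk's own model: the mechanisms in `F`, the thresholds, and the choice of independent uniform background variables are all invented for the sprinkler story.

```python
import random

# Hypothetical mechanisms f_i for the sprinkler story; all thresholds are invented.
F = {
    "C": lambda v, u: u["U1"] < 0.5,                          # True = dry season
    "S": lambda v, u: u["U2"] < (0.8 if v["C"] else 0.1),     # sprinkler on
    "R": lambda v, u: u["U3"] < (0.1 if v["C"] else 0.7),     # rain
    "W": lambda v, u: (v["S"] or v["R"]) and u["U4"] < 0.95,  # pavement wet
}
ORDER = ["C", "S", "R", "W"]  # a causal order, so each f_i sees its parents


def sample_u(rng):
    """Draw background variables U from P(u): independent uniforms (assumed)."""
    return {f"U{i}": rng.random() for i in range(1, 5)}


def solve(u):
    """Solve the equations v_i = f_i(v, u) for a given u."""
    v = {}
    for name in ORDER:
        v[name] = F[name](v, u)
    return v


def estimate(event, n=20000, seed=0):
    """Monte Carlo estimate of P(event) under the induced distribution P(v)."""
    rng = random.Random(seed)
    return sum(event(solve(sample_u(rng))) for _ in range(n)) / n


p_wet = estimate(lambda v: v["W"])
print(round(p_wet, 3))
```

The point of the sketch is the last bullet of the definition: nothing is specified about P(v) directly; it emerges from P(u) pushed through F.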
TRADITIONAL STATISTICAL INFERENCE PARADIGM Joint Distribution P → Data → Inference → Q(P) (Aspects of P) e.g., Infer whether customers who bought product A would also buy product B. Q = P(B | A)
THE STRUCTURAL MODEL PARADIGM Data Generating Model M → Joint Distribution → Data → Inference → Q(M) (Aspects of M) • M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis. “A painful de-crowning of a beloved oracle!”
COUNTERFACTUALS ARE EMBARRASSINGLY SIMPLE Definition: The sentence: “Y would be y (in situation u), had X been x,” denoted Y_x(u) = y, means: The solution for Y in a mutilated model M_x (i.e., the equations for X replaced by X = x) with input U = u, is equal to y. The Fundamental Equation of Counterfactuals: Y_x(u) = Y_{M_x}(u)
READING COUNTERFACTUALS FROM SEM Data shows: A student named Joe, measured X = 0.5, Z = 1.0, Y = 1.9 Q1: What would Joe’s score be had he doubled his study time? Answer: 2.30
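A hedged sketch of how such a counterfactual can be read off a linear structural equation in three steps: recover Joe's background factor from his data, replace the equation for Z, and re-solve. The slide's diagram did not survive extraction, so the equation form and the coefficients `B_X` and `B_Z` below are assumptions, as is the reading of Z as study time and Y as the score.

```python
# Assumed structural equation for the score: Y = B_X * X + B_Z * Z + u_y.
B_X, B_Z = 0.7, 0.4  # hypothetical coefficients, not given in the text


def score(x, z, u_y):
    """Evaluate the assumed mechanism for Y."""
    return B_X * x + B_Z * z + u_y


# Joe's measurements from the slide.
x, z, y = 0.5, 1.0, 1.9

# Step 1: recover Joe's background factor u_y from the evidence.
u_y = y - score(x, z, 0.0)

# Step 2: replace the equation for Z by Z = 2 * z (doubled study time).
z_new = 2 * z

# Step 3: solve the mutilated model with the same u_y.
y_cf = score(x, z_new, u_y)
print(round(y_cf, 2))
```

With these assumed coefficients the result agrees with the 2.30 that appears on the slide, which suggests, but does not confirm, that the original diagram used similar values.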
THE TWO FUNDAMENTAL LAWS OF CAUSAL INFERENCE The Law of Counterfactuals (M generates and evaluates all counterfactuals.) The Law of Conditional Independence (d-separation) (Separation in the model ⇒ independence in the distribution.)
THE LAW OF CONDITIONAL INDEPENDENCE [Graph nodes: C (Climate), S (Sprinkler), R (Rain), W (Wetness), with background variables U1, U2, U3, U4] Each function summarizes millions of micro processes.
THE LAW OF CONDITIONAL INDEPENDENCE [Graph nodes: C (Climate), S (Sprinkler), R (Rain), W (Wetness), with background variables U1, U2, U3, U4] Each function summarizes millions of micro processes. Still, if the U's are independent, the observed distribution P(C,R,S,W) must satisfy certain constraints that are: (1) independent of the f's and of P(U) and (2) can be read from the structure of the graph.
D-SEPARATION: NATURE’S LANGUAGE FOR COMMUNICATING ITS STRUCTURE C (Climate) S (Sprinkler) R (Rain) W (Wetness) Every missing arrow advertises an independency, conditional on a separating set. e.g., C ⊥ W | (S,R), S ⊥ R | C • Applications • Structure learning • Model testing • Reducing "what if I do" questions to symbolic calculus • Reducing scientific questions to symbolic calculus
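The two independencies named on the slide can be checked exactly on a small discrete model. The sketch below assumes the graph C → S, C → R, (S, R) → W and uses invented conditional probabilities; any other numbers with the same graph should give the same verdicts for the first two checks.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical conditional probabilities for C -> S, C -> R, (S, R) -> W.
P_C = {1: Fr(1, 2), 0: Fr(1, 2)}                 # 1 = dry season
P_S1 = {1: Fr(4, 5), 0: Fr(1, 10)}               # P(S=1 | C)
P_R1 = {1: Fr(1, 10), 0: Fr(7, 10)}              # P(R=1 | C)
P_W1 = {(0, 0): Fr(0), (0, 1): Fr(9, 10),
        (1, 0): Fr(9, 10), (1, 1): Fr(99, 100)}  # P(W=1 | S, R)
NAMES = ("C", "S", "R", "W")


def bern(p1, value):
    """Probability of a binary value given P(value = 1)."""
    return p1 if value == 1 else 1 - p1


def joint(c, s, r, w):
    """P(c, s, r, w) as the product of each variable given its parents."""
    return P_C[c] * bern(P_S1[c], s) * bern(P_R1[c], r) * bern(P_W1[(s, r)], w)


def prob(**fixed):
    """Marginal probability of a partial assignment, by summing the joint."""
    total = Fr(0)
    for vals in product((0, 1), repeat=4):
        world = dict(zip(NAMES, vals))
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(*vals)
    return total


def independent(a, b, given):
    """Check P(a, b, z) P(z) == P(a, z) P(b, z) for every value combination."""
    for vals in product((0, 1), repeat=2 + len(given)):
        va, vb, vz = vals[0], vals[1], dict(zip(given, vals[2:]))
        lhs = prob(**{a: va, b: vb}, **vz) * prob(**vz)
        rhs = prob(**{a: va}, **vz) * prob(**{b: vb}, **vz)
        if lhs != rhs:
            return False
    return True


print(independent("S", "R", ["C"]))       # separation read from the graph
print(independent("C", "W", ["S", "R"]))  # separation read from the graph
print(independent("S", "R", []))          # not implied by the graph
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.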
SEEING VS. DOING Effect of turning the sprinkler ON
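The slide's formulas did not survive extraction, so the following is a sketch of the usual contrast rather than a reproduction. Seeing the sprinkler ON conditions on S and therefore tells us something about the season; turning it ON deletes the factor P(S | C) from the product of mechanisms, leaving P(c) P(r | c) P(w | r, S = ON). All numbers below are invented.

```python
from fractions import Fraction as Fr

# Hypothetical numbers for the sprinkler story (1 = dry season / on / rain).
P_C = {1: Fr(1, 2), 0: Fr(1, 2)}
P_S1 = {1: Fr(4, 5), 0: Fr(1, 10)}   # P(S=1 | C)
P_R1 = {1: Fr(1, 10), 0: Fr(7, 10)}  # P(R=1 | C)


def p_rain_seeing_sprinkler_on():
    """P(R=1 | S=1): conditioning keeps C -> S, so S is evidence about C."""
    num = sum(P_C[c] * P_S1[c] * P_R1[c] for c in (0, 1))
    den = sum(P_C[c] * P_S1[c] for c in (0, 1))
    return num / den


def p_rain_doing_sprinkler_on():
    """P(R=1 | do(S=1)): the factor P(S | C) is deleted from the product."""
    return sum(P_C[c] * P_R1[c] for c in (0, 1))


p_rain = sum(P_C[c] * P_R1[c] for c in (0, 1))  # prior P(R=1)
seeing = p_rain_seeing_sprinkler_on()
doing = p_rain_doing_sprinkler_on()
print(seeing, doing, p_rain)
```

Under these assumptions, observing the sprinkler ON lowers the probability of rain, while switching it ON leaves that probability at its prior value, in line with answer A3 of the mini Turing test.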
THE FIVE NECESSARY STEPS FOR CAUSAL INFERENCE • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FOR EFFECT ESTIMATION • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FOR AVERAGE TREATMENT EFFECT • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FOR DYNAMIC POLICY ANALYSIS • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FOR TIME VARYING POLICY ANALYSIS • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FOR TREATMENT ON TREATED • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FOR INDIRECT EFFECTS • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE FIVE NECESSARY STEPS FROM DEFINITION TO ASSUMPTIONS • Define: Express the target quantity Q as a property of the model M. • Assume: Express causal assumptions in structural or graphical form. • Identify: Determine if Q is identifiable. • Estimate: Estimate Q if it is identifiable; approximate it, if it is not. • Test: If M has testable implications
THE LOGIC OF CAUSAL ANALYSIS • CAUSAL MODEL (M_A): A - Causal assumptions; A* - Logical implications of A; Q - Queries of interest • Causal inference: Q(P) - Identified estimands; T(M_A) - Testable implications • Statistical inference: Data (D); Q̂ - Estimates of Q(P); Goodness of fit • Results: Provisional claims; Model testing
THE MACHINERY OF CAUSAL CALCULUS • Rule 1: Ignoring observations: P(y | do{x}, z, w) = P(y | do{x}, w) • Rule 2: Action/observation exchange: P(y | do{x}, do{z}, w) = P(y | do{x}, z, w) • Rule 3: Ignoring actions: P(y | do{x}, do{z}, w) = P(y | do{x}, w) • Completeness Theorem (Shpitser, 2006)
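Each rule holds only under a graphical condition that the extracted slide text does not show. The block below restates the three rules with those conditions as they are usually given for the do-calculus; it is a reconstruction from the standard formulation, not a copy of the slide.

```latex
\begin{align*}
\text{Rule 1:}\quad & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

Here G with an overline on X is the graph with all arrows into X removed, G with an underline on Z is the graph with all arrows out of Z removed, and Z(W) is the set of Z-nodes that are not ancestors of any W-node in the graph with arrows into X removed.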
DERIVATION IN CAUSAL CALCULUS Variables: Genotype (Unobserved), Smoking, Tar, Cancer. Rules applied, in order: Probability Axioms, Rule 2, Rule 2, Rule 3, Probability Axioms, Rule 2, Rule 3
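The equations of this derivation were lost in extraction. A reconstruction that follows the listed sequence of justifications is sketched below, assuming the usual graph for this example: Smoking (s) → Tar (t) → Cancer (c), with the unobserved Genotype affecting both Smoking and Cancer.

```latex
\begin{align*}
P(c \mid do(s))
 &= \sum_t P(c \mid do(s), t)\, P(t \mid do(s)) && \text{Probability Axioms} \\
 &= \sum_t P(c \mid do(s), do(t))\, P(t \mid do(s)) && \text{Rule 2} \\
 &= \sum_t P(c \mid do(s), do(t))\, P(t \mid s) && \text{Rule 2} \\
 &= \sum_t P(c \mid do(t))\, P(t \mid s) && \text{Rule 3} \\
 &= \sum_{s'} \sum_t P(c \mid do(t), s')\, P(s' \mid do(t))\, P(t \mid s) && \text{Probability Axioms} \\
 &= \sum_{s'} \sum_t P(c \mid t, s')\, P(s' \mid do(t))\, P(t \mid s) && \text{Rule 2} \\
 &= \sum_{s'} \sum_t P(c \mid t, s')\, P(s')\, P(t \mid s) && \text{Rule 3}
\end{align*}
```

The final line contains no do-operator, so the effect of smoking on cancer is expressed entirely in terms of observable quantities even though Genotype is never measured.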
No, no! EFFECT OF WARM-UP ON INJURY (After Shrier & Platt, 2008)
DETERMINING CAUSES OF EFFECTS: A COUNTERFACTUAL VICTORY • Your Honor! My client (Mr. A) died BECAUSE he used that drug. • Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug! • PN = P(? | A is dead, took the drug) > 0.50
THE ATTRIBUTION PROBLEM • Definition: • What is the meaning of PN(x,y): • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.” • Answer: • Computable from M
THE ATTRIBUTION PROBLEM • Definition: • What is the meaning of PN(x,y): • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.” • Identification: • Under what conditions can PN(x,y) be learned from statistical data, i.e., observational, experimental and combined?
ATTRIBUTION MATHEMATIZED (Tian and Pearl, 2000) • Bounds given combined nonexperimental and experimental data (P(y,x), P(y_x), for all y and x) • Identifiability under monotonicity (Combined data)
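The formulas for these two bullets did not survive extraction. The block below gives the commonly cited form of the Tian and Pearl results, reconstructed from the standard statement rather than from the slide, with x' and y' denoting the complements of x and y.

```latex
\begin{align*}
&\max\left\{0,\ \frac{P(y) - P(y_{x'})}{P(x, y)}\right\}
 \ \le\ PN\ \le\
 \min\left\{1,\ \frac{P(y'_{x'}) - P(x', y')}{P(x, y)}\right\}
 && \text{(bounds, combined data)} \\[4pt]
&PN = \frac{P(y) - P(y_{x'})}{P(x, y)}
 && \text{(under monotonicity)}
\end{align*}
```

P(y), P(x, y) and P(x', y') come from nonexperimental data, while P(y_{x'}) and P(y'_{x'}) come from experimental data, which is why both sources are needed.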
CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?
                  Experimental         Nonexperimental
                  do(x)     do(x’)     x         x’
Deaths (y)        16        14         2         28
Survivals (y’)    984       986        998       972
Total             1,000     1,000      1,000     1,000
• Nonexperimental data: drug usage predicts longer life • Experimental data: drug has negligible effect on survival • Plaintiff: Mr. A is special. • He actually died • He used the drug by choice • Court to decide (given both data): • Is it more probable than not that A would be alive but for the drug?
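A hedged sketch of how the court's question could be evaluated from the table, using the bounds of Tian and Pearl (2000) in their commonly stated form: max{0, [P(y) − P(y_x')] / P(x,y)} ≤ PN ≤ min{1, [P(y'_x') − P(x',y')] / P(x,y)}. The variable names are illustrative, and pooling the two nonexperimental columns into one sample of 2,000 (so that P(x) = 1/2) is an assumption made here, not something the slide states.

```python
from fractions import Fraction as Fr

# Counts from the table (y = death, x = took the drug), 1,000 subjects per column.
N = 1000
exp_deaths = {"do(x)": 16, "do(x')": 14}
obs_deaths = {"x": 2, "x'": 28}

# Experimental quantities.
p_y_do_xp = Fr(exp_deaths["do(x')"], N)   # P(y_{x'})
p_yp_do_xp = 1 - p_y_do_xp                # P(y'_{x'})

# Nonexperimental joint, ASSUMING the two columns are pooled into a single
# sample of 2,000, so that P(x) = 1/2.
total = 2 * N
p_x_y = Fr(obs_deaths["x"], total)                    # P(x, y)
p_xp_yp = Fr(N - obs_deaths["x'"], total)             # P(x', y')
p_y = Fr(obs_deaths["x"] + obs_deaths["x'"], total)   # P(y)

# Bounds on the probability of necessity PN.
lower = max(Fr(0), (p_y - p_y_do_xp) / p_x_y)
upper = min(Fr(1), (p_yp_do_xp - p_xp_yp) / p_x_y)
print(lower, upper)
```

Under these assumptions the lower and upper bounds coincide, so the combined data pin PN to a single value even though neither data set alone settles the plaintiff's claim. Exact fractions are used because the lower bound sits right at the edge of the interval, where floating-point rounding could push it past 1.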