CAUSAL REASONING FOR DECISION AIDING SYSTEMS COGNITIVE SYSTEMS LABORATORY UCLA

Judea Pearl, Mark Hopkins, Blai Bonet, Chen Avin, Ilya Shpitser

PRESENTATIONS
• Judea Pearl: Robustness of Causal Claims
• Ilya Shpitser and Chen Avin: Experimental Testability of Counterfactuals
• Blai Bonet: Logic-based Inference on Bayes Networks
• Mark Hopkins: Inference using Instantiations
• Chen Avin: Inference in Sensor Networks
• Blai Bonet: Report from Probabilistic Planning Competition

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
Probability and statistics deal with static relations:
Data → (statistics) → joint distribution → (probability) → inferences from passive observations
Causal analysis deals with changes (dynamics):
Data + causal assumptions + experiments → causal model →
• Effects of interventions
• Causes of effects
• Explanations

TYPICAL CAUSAL MODEL
[Figure: causal diagram over X, Y, Z, with INPUT and OUTPUT labels.]

TYPICAL CLAIMS
• Effects of potential interventions
• Claims about attribution (responsibility)
• Claims about direct and indirect effects
• Claims about explanations

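ROBUSTNESS: MOTIVATION
[Figure: Smoking (x) → Cancer (y) with coefficient a; unobserved Genetic Factors (u) influence both x and y.]
In linear systems, $y = ax + e$, and $a$ is non-identifiable when the error term $e$ is correlated with $x$ through $u$.
The effect of smoking on cancer is, in general, non-identifiable from observational studies.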
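The confounding problem can be seen in a small simulation. This is a sketch under assumed conditions: a hypothetical linear model in which the unobserved u enters both x and the error term e with coefficient 1, with illustrative helper names (`cov`, `simulate_confounded`) and sample size. Under these assumptions the observational regression slope $R_{yx}$ converges to $a + 1/2$ rather than to $a$.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def simulate_confounded(a, n=50_000, seed=0):
    """Hypothetical model: unobserved u enters both x and the error term e of y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)        # unobserved genetic factors
        x = u + rng.gauss(0, 1)    # smoking
        e = u + rng.gauss(0, 1)    # error term, correlated with x through u
        xs.append(x)
        ys.append(a * x + e)       # cancer: y = a*x + e
    return xs, ys


a_true = 0.5
xs, ys = simulate_confounded(a_true)
r_yx = cov(xs, ys) / cov(xs, xs)   # observational regression slope R_yx
print(f"true a = {a_true}, R_yx = {r_yx:.3f}")
```

Here the slope comes out near 1.0 while the true a is 0.5, and nothing in the (x, y) data alone reveals the gap.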

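ROBUSTNESS: MOTIVATION
[Figure: Price of Cigarettes (z) → Smoking (x) with coefficient b; x → Cancer (y) with coefficient a; unobserved Genetic Factors (u) influence x and y.]
z is an instrumental variable: $\mathrm{cov}(z, u) = 0$, and z affects y only through x.
With the instrument, $a$ is identifiable: $a = \mathrm{cov}(z, y) / \mathrm{cov}(z, x)$.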
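A sketch of the instrumental-variable estimand $a = \mathrm{cov}(z, y)/\mathrm{cov}(z, x)$ on simulated data, assuming a hypothetical linear model in which z is independent of u and reaches y only through x. The coefficient values (b = -1, a = 0.5), the function `simulate_iv`, and the sample size are illustrative.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def simulate_iv(a, b, n=50_000, seed=0):
    """Hypothetical model: z -> x (coefficient b) -> y (coefficient a);
    unobserved u affects x and y; z is independent of u."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                # unobserved genetic factors
        z = rng.gauss(0, 1)                # price of cigarettes, cov(z, u) = 0
        x = b * z + u + rng.gauss(0, 1)    # smoking
        y = a * x + u + rng.gauss(0, 1)    # cancer; z reaches y only through x
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


zs, xs, ys = simulate_iv(a=0.5, b=-1.0)
a_ols = cov(xs, ys) / cov(xs, xs)          # biased by u
a_iv = cov(zs, ys) / cov(zs, xs)           # instrumental-variable estimand
print(f"OLS slope: {a_ols:.3f}   IV estimate: {a_iv:.3f}")
```

The OLS slope stays biased (near $a + 1/3$ in this setup), while the IV ratio lands near 0.5. The estimate is only as good as the assumption $\mathrm{cov}(z, u) = 0$, which the data cannot check on their own.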

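ROBUSTNESS: MOTIVATION
[Figure: the same instrumental-variable model: Price of Cigarettes (z) → Smoking (x) → Cancer (y), coefficients b and a; unobserved Genetic Factors (u).]
Problem with instrumental variables: the model may be wrong! (For example, z may be correlated with u, or may affect y directly.)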

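ROBUSTNESS: MOTIVATION
[Figure: two instruments, Price of Cigarettes (Z1, coefficient b) and Peer Pressure (Z2, coefficient g), both → Smoking (x) → Cancer (y) with coefficient a; unobserved Genetic Factors (u).]
Solution: invoke several instruments.
Surprise: if $a_1 = a_2$, where $a_1$ and $a_2$ are the estimates of $a$ obtained from Z1 and Z2, the model is likely correct.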
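A sketch of the two-instrument check, assuming a hypothetical linear model with coefficients b (price) and g (peer pressure). The parameter `z2_direct` is an illustrative device, not from the slides: setting it nonzero plants a model error in which peer pressure also affects cancer directly. Each instrument yields its own estimate, $a_1$ from Z1 and $a_2$ from Z2.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def simulate(a, b=1.0, g=0.8, z2_direct=0.0, n=50_000, seed=0):
    """Hypothetical model with two instruments for x -> y.
    A nonzero z2_direct plants a model error: z2 then also affects y directly."""
    rng = random.Random(seed)
    z1s, z2s, xs, ys = [], [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                                # unobserved genetic factors
        z1 = rng.gauss(0, 1)                               # price of cigarettes
        z2 = rng.gauss(0, 1)                               # peer pressure
        x = b * z1 + g * z2 + u + rng.gauss(0, 1)          # smoking
        y = a * x + z2_direct * z2 + u + rng.gauss(0, 1)   # cancer
        z1s.append(z1)
        z2s.append(z2)
        xs.append(x)
        ys.append(y)
    return z1s, z2s, xs, ys


def iv_estimates(a, **kwargs):
    """Return (a1, a2): the IV estimates of a from z1 and from z2."""
    z1s, z2s, xs, ys = simulate(a, **kwargs)
    return cov(z1s, ys) / cov(z1s, xs), cov(z2s, ys) / cov(z2s, xs)


a1, a2 = iv_estimates(0.5)                   # correct model
w1, w2 = iv_estimates(0.5, z2_direct=0.4)    # wrong model
print(f"correct model: a1 = {a1:.3f}, a2 = {a2:.3f}")
print(f"wrong model:   a1 = {w1:.3f}, a2 = {w2:.3f}")
```

With the correct model both estimates land near 0.5; with the planted error, the Z2 estimate drifts to about 1.0 and the disagreement exposes the problem. Agreement is evidence for the model, not proof.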

ROBUSTNESS: MOTIVATION
[Figure: instruments Z1 (Price of Cigarettes, coefficient b), Z2 (Peer Pressure, coefficient g), Z3 (Anti-smoking Legislation), …, Zn, each → Smoking (x) → Cancer (y) with coefficient a; unobserved Genetic Factors (u).]
Greater surprise: $a_1 = a_2 = a_3 = \dots = a_n = q$.
The claim $a = q$ is highly likely to be correct.

ROBUSTNESS: MOTIVATION
[Figure: Smoking (x) → Cancer (y) with coefficient a; unobserved Genetic Factors (u); Symptom (s) caused by y.]
• Symptoms do not act as instruments.
• $a$ remains non-identifiable.
Why? Taking a noisy measurement (s) of an observed variable (y) cannot add new information.
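A sketch of why a symptom adds no information, assuming a hypothetical linear model $x = e_x$, $y = ax + e_y$, $s = ky + e_s$, where $\mathrm{cov}(e_x, e_y) = \rho$ stands in for the unobserved u and $e_s$ is independent noise. The two parameter settings below are illustrative choices: one has $a = 0.5$, the other $a = 0.8$, yet both produce exactly the same covariances over (x, y, s), so no estimator built on those covariances can tell them apart.

```python
def implied_cov(a, rho, sigma2, k, tau2):
    """Covariances of (x, y, s) implied by the hypothetical linear model
    x = e_x, y = a*x + e_y, s = k*y + e_s, with var(e_x) = 1,
    var(e_y) = sigma2, cov(e_x, e_y) = rho (standing in for u),
    var(e_s) = tau2, and e_s independent of e_x and e_y."""
    var_x = 1.0
    cov_xy = a * var_x + rho
    var_y = a * a * var_x + 2 * a * rho + sigma2
    return {
        "var_x": var_x,
        "cov_xy": cov_xy,
        "var_y": var_y,
        "cov_xs": k * cov_xy,
        "cov_ys": k * var_y,
        "var_s": k * k * var_y + tau2,
    }


# Two illustrative parameter settings with different causal effects a
m1 = implied_cov(a=0.5, rho=0.3, sigma2=1.0, k=2.0, tau2=0.5)
m2 = implied_cov(a=0.8, rho=0.0, sigma2=0.91, k=2.0, tau2=0.5)
same = all(abs(m1[key] - m2[key]) < 1e-9 for key in m1)
print(same)  # True: the covariances cannot distinguish a = 0.5 from a = 0.8
```

Every covariance involving s depends on a and $\rho$ only through $\mathrm{cov}(x, y)$ and $\mathrm{var}(y)$, so adding more symptoms $s_i = k_i y + e_i$ leaves the argument unchanged.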

ROBUSTNESS: MOTIVATION
[Figure: Smoking (x) → Cancer (y) with coefficient a; unobserved Genetic Factors (u); Symptoms S1, S2, …, Sn caused by y.]
• Adding many symptoms does not help.
• $a$ remains non-identifiable.

ROBUSTNESS: MOTIVATION
Given a parameter $a$ in a general graph (e.g., the edge x → y with coefficient a):
Determine whether $a$ can evoke an equality surprise, $a_1 = a_2 = \dots = a_n$, associated with several independent estimands of $a$.
Formulate: surprise, over-identification, independence.
Robustness: the degree to which $a$ is robust to violations of model assumptions.

ROBUSTNESS: FORMULATION
Bad attempt: parameter $a$ is robust (over-identified) if $a = f_1 = f_2$, where $f_1, f_2$ are two distinct functions of the observed data.

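ROBUSTNESS: FORMULATION
[Figure: chain x → y → z with coefficients b and c, and error terms e_x, e_y, e_z; panels (b) and (c).]
Model: $x = e_x$, $y = bx + e_y$, $z = cy + e_z$, with independent errors $e_x, e_y, e_z$.
Implied regression coefficients: $R_{yx} = b$, $R_{zx} = bc$, $R_{zy} = c$.
Constraint: $R_{zx} = R_{yx} R_{zy}$, so b has two distinct estimands, $b = R_{yx} = R_{zx}/R_{zy}$.
Yet y → z is irrelevant to the derivation of b: the second estimand exists only because of a constraint involving z, which is why this attempt fails.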
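A simulation sketch of the chain model x → y → z, assuming illustrative values b = 0.7 and c = 1.5 and mutually independent standard-normal errors (both choices are assumptions of the sketch). It checks the implied regression coefficients and shows that $R_{yx}$ and $R_{zx}/R_{zy}$ both recover b.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def reg(dep, ind):
    """Simple regression coefficient R_{dep,ind} = cov(dep, ind) / var(ind)."""
    return cov(dep, ind) / cov(ind, ind)


def simulate_chain(b, c, n=50_000, seed=0):
    """x = e_x, y = b*x + e_y, z = c*y + e_z with independent N(0, 1) errors."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = b * x + rng.gauss(0, 1)
        z = c * y + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


xs, ys, zs = simulate_chain(b=0.7, c=1.5)
r_yx, r_zx, r_zy = reg(ys, xs), reg(zs, xs), reg(zs, ys)
print(f"R_yx = {r_yx:.3f} (b = 0.7)")
print(f"R_zx = {r_zx:.3f} (bc = 1.05)")
print(f"R_zy = {r_zy:.3f} (c = 1.5)")
print(f"R_zx / R_zy = {r_zx / r_zy:.3f} (second estimand of b)")
```

The two estimates of b agree, but the second also leans on the assumptions about z, which play no role in identifying b; their agreement therefore says little about b itself.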

RELEVANCE: FORMULATION
Definition 8. Let A be an assumption embodied in model M, and p a parameter in M. A is said to be relevant to p if and only if there exists a set of assumptions S in M such that S and A sustain the identification of p but S alone does not sustain such identification.
Theorem 2. An assumption A is relevant to p if and only if A is a member of a minimal set of assumptions sufficient for identifying p.
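Definition 8 and Theorem 2 can be checked by brute force on a toy case. This sketch assumes a hypothetical encoding: the assumptions of the chain x → y → z are its four absent edges (x → z, x ↔ y, x ↔ z, y ↔ z), and identification of b is decided by a hand-derived table of linear estimands (`RULES_B`) rather than by a real graphical identification test. Relevance is computed once by the definition and once via minimal sets, and the two should agree.

```python
from itertools import combinations

# Hypothetical encoding: each assumption is an edge absent from the chain x -> y -> z.
ASSUMPTIONS = ("no x->z", "no x<->y", "no x<->z", "no y<->z")

# Hypothetical, hand-derived table for parameter b: each estimand of b
# is valid whenever all of its required assumptions hold.
RULES_B = {
    "R_yx": frozenset({"no x<->y"}),
    "R_zx / R_zy": frozenset(ASSUMPTIONS),
}


def identifies(rules, s):
    """Toy oracle: s identifies the parameter if some estimand needs only s."""
    return any(req <= s for req in rules.values())


def subsets(items):
    """All subsets of items, as frozensets."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def relevant_by_definition(rules, a):
    """Definition 8: some S makes S + {a} identify p while S alone does not."""
    others = [x for x in ASSUMPTIONS if x != a]
    return any(identifies(rules, s | {a}) and not identifies(rules, s)
               for s in subsets(others))


def minimal_sets(rules):
    """Sufficient sets from which no single assumption can be dropped
    (enough here because the oracle is monotone)."""
    return [s for s in subsets(ASSUMPTIONS)
            if identifies(rules, s) and not any(identifies(rules, s - {a}) for a in s)]


def relevant_by_theorem(rules, a):
    """Theorem 2: a is relevant iff it belongs to some minimal sufficient set."""
    return any(a in m for m in minimal_sets(rules))


by_def = {a: relevant_by_definition(RULES_B, a) for a in ASSUMPTIONS}
by_thm = {a: relevant_by_theorem(RULES_B, a) for a in ASSUMPTIONS}
print(by_def)
print(by_def == by_thm)
```

Under this table only "no x<->y" comes out relevant to b; the assumptions that concern z do not matter for b, in line with the chain example.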

ROBUSTNESS: FORMULATION
Definition 5 (Degree of over-identification). A parameter p (of model M) is identified to degree k (read: k-identified) if there are k minimal sets of assumptions, each yielding a distinct estimand of p.
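A brute-force sketch of Definition 5 for the parameters b and c of the chain x → y → z. It assumes a hypothetical encoding in which the assumptions are the chain's four absent edges and identification is decided by a hand-derived table of linear estimands; the table, its labels, and the `degree` helper are assumptions of this sketch, not taken from the slides. For each minimal assumption set the sketch records the estimand it licenses and counts the distinct ones.

```python
from itertools import combinations

# Hypothetical encoding: the assumptions of the chain x -> y -> z are its absent edges.
ASSUMPTIONS = ("no x->z", "no x<->y", "no x<->z", "no y<->z")

# Hypothetical, hand-derived table: each linear estimand and the assumptions it needs.
RULES = {
    "b": {
        "R_yx": frozenset({"no x<->y"}),
        "R_zx / R_zy": frozenset(ASSUMPTIONS),
    },
    "c": {
        "R_zy": frozenset({"no x->z", "no x<->z", "no y<->z"}),
        "R_zy.x": frozenset({"no x<->z", "no y<->z"}),
        "R_zx / R_yx": frozenset({"no x->z", "no x<->z"}),
        "cov(z,r)/var(r), r = y - R_yx*x": frozenset({"no x<->y", "no y<->z"}),
    },
}


def identifies(rules, s):
    """Toy oracle: s identifies the parameter if some estimand needs only s."""
    return any(req <= s for req in rules.values())


def subsets(items):
    """All subsets of items, as frozensets."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def minimal_sets(rules):
    """Sufficient sets from which no single assumption can be dropped
    (enough here because the oracle is monotone)."""
    return [s for s in subsets(ASSUMPTIONS)
            if identifies(rules, s) and not any(identifies(rules, s - {a}) for a in s)]


def degree(rules):
    """Pair each minimal set with the estimand whose requirement equals it,
    then count the distinct estimands."""
    licensed = {}
    for m in minimal_sets(rules):
        licensed[m] = sorted(name for name, req in rules.items() if req == m)[0]
    return len(set(licensed.values())), licensed


for param, rules in RULES.items():
    k, licensed = degree(rules)
    print(f"{param}: {k}-identified via {sorted(licensed.values())}")
```

Under this table b is 1-identified, while c is 3-identified through $R_{zy \cdot x}$, $R_{zx}/R_{yx}$, and the residual-instrument estimand; $R_{zy}$ drops out because its assumption set is not minimal.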

ROBUSTNESS: FORMULATION
[Figure: the chain x → y → z with coefficients b and c. Panels G1, G2, G3: minimal assumption sets for c. Lower panel: minimal assumption sets for b.]
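A simulation sketch, assuming a hypothetical linear model, of how estimands of c behave when the chain is enlarged. Each scenario adds edges to x → y → z (an x → z edge with coefficient d, or latent confounders producing x ↔ y, x ↔ z, y ↔ z) while keeping exactly the absent edges that one estimand needs. The scenarios, coefficient values, and helper names are illustrative and are not claimed to reproduce panels G1, G2, G3 of the figure.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def simulate(b, c, d=0.0, xy=0.0, xz=0.0, yz=0.0, n=40_000, seed=0):
    """Hypothetical supergraph of x -> y -> z: d is an added x -> z edge;
    xy, xz, yz are loadings on latent confounders creating x<->y, x<->z, y<->z."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        lxy, lxz, lyz = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = xy * lxy + xz * lxz + rng.gauss(0, 1)
        y = b * x + xy * lxy + yz * lyz + rng.gauss(0, 1)
        z = c * y + d * x + xz * lxz + yz * lyz + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def estimands_of_c(xs, ys, zs):
    """Four linear estimands of c computed from sample covariances."""
    vx, vy = cov(xs, xs), cov(ys, ys)
    cxy, cxz, cyz = cov(xs, ys), cov(xs, zs), cov(ys, zs)
    r_yx = cxy / vx
    return {
        "R_zy": cyz / vy,
        "R_zy.x": (cyz * vx - cxz * cxy) / (vy * vx - cxy ** 2),
        "R_zx / R_yx": cxz / cxy,
        "residual IV": (cyz - r_yx * cxz) / (vy - r_yx * cxy),
    }


# Each scenario adds edges but keeps the absent edges its estimand needs.
SCENARIOS = {
    "R_zy.x": dict(d=0.5, xy=1.0),        # keeps: no x<->z, no y<->z
    "R_zx / R_yx": dict(xy=1.0, yz=1.0),  # keeps: no x->z, no x<->z
    "residual IV": dict(d=0.5, xz=1.0),   # keeps: no x<->y, no y<->z
}

c_true = 1.5
results = {name: estimands_of_c(*simulate(b=1.0, c=c_true, **kw))
           for name, kw in SCENARIOS.items()}
for name, est in results.items():
    print(f"{name:12s} -> {est[name]:.3f}   naive R_zy -> {est['R_zy']:.3f}")
```

In each enlarged model, the estimand matched to it still recovers c = 1.5, while the naive $R_{zy}$ does not.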

FROM MINIMAL ASSUMPTION SETS TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS
e.g., Claim (total effect): $TE(x,z) = q$ in the chain x → y → z.
[Figure: two edge supergraphs of x → y → z, one per estimand.]
$TE(x,z) = R_{zx}$ and $TE(x,z) = R_{yx} R_{zy \cdot x}$
Definition: A claim C is identified to degree k in model M (graph G) if there are k maximal edge supergraphs of G that permit the identification of C, each yielding a distinct estimand.
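A simulation sketch of the two total-effect estimands, assuming a hypothetical linear model. Supergraph 1 adds an x → z edge (coefficient d) and a y ↔ z confounder; supergraph 2 adds an x ↔ z confounder. The coefficient values and helper names are illustrative. The true total effect is $bc + d$ in the first supergraph and $bc$ in the second.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def simulate(b, c, d=0.0, xz=0.0, yz=0.0, n=50_000, seed=0):
    """Hypothetical supergraph of x -> y -> z: d adds an x -> z edge;
    xz and yz are loadings on latent confounders creating x<->z and y<->z."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        lxz, lyz = rng.gauss(0, 1), rng.gauss(0, 1)
        x = xz * lxz + rng.gauss(0, 1)
        y = b * x + yz * lyz + rng.gauss(0, 1)
        z = c * y + d * x + xz * lxz + yz * lyz + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def te_estimands(xs, ys, zs):
    """The two total-effect estimands R_zx and R_yx * R_zy.x."""
    vx, vy = cov(xs, xs), cov(ys, ys)
    cxy, cxz, cyz = cov(xs, ys), cov(xs, zs), cov(ys, zs)
    r_zy_x = (cyz * vx - cxz * cxy) / (vy * vx - cxy ** 2)  # partial coefficient
    return {"R_zx": cxz / vx, "R_yx * R_zy.x": (cxy / vx) * r_zy_x}


b, c = 1.0, 1.5
est1 = te_estimands(*simulate(b, c, d=1.0, yz=1.0))  # supergraph 1: true TE = b*c + d = 2.5
est2 = te_estimands(*simulate(b, c, xz=1.0))         # supergraph 2: true TE = b*c = 1.5
print("supergraph 1:", {k: round(v, 3) for k, v in est1.items()})
print("supergraph 2:", {k: round(v, 3) for k, v in est2.items()})
```

In this sketch each estimand survives in one supergraph and fails in the other, so the claim $TE(x,z) = q$ has two distinct routes to identification.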

FROM MINIMAL ASSUMPTION SETS TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS (NONPARAMETRIC)
e.g., Claim (total effect): $TE(x,z) = q$ in the chain x → y → z.
[Figure: two edge supergraphs of x → y → z.]
Nonparametric definition: A claim C is identified to degree k in model M (graph G) if there are k maximal edge supergraphs of G that permit the identification of C, each yielding a distinct estimand.

CONCLUSIONS
• A formal definition of the ROBUSTNESS of causal claims.
• Graphical criteria and algorithms for computing the degree of robustness of a given causal claim.
