Theory-based causal induction

Tom Griffiths (Brown University) and Josh Tenenbaum (MIT)

Three kinds of causal induction

Three kinds of causal induction: contingency data

                  C present (c+)   C absent (c-)
E present (e+)          a                c
E absent (e-)           b                d
“To what extent does C cause E?” (rate on a scale from 0 to 100)

Three kinds of causal induction: physical systems, contingency data

The stick-ball machine [figure: machine with parts labeled A and B] (Kushnir, Schulz, Gopnik, & Danks, 2003)

Three kinds of causal induction: perceived causality, physical systems, contingency data

Michotte (1963)

Three kinds of causal induction: perceived causality, physical systems, contingency data
[Figure labels: top-down mechanism knowledge; bottom-up covariation information; object physics module]

Three kinds of causal induction: perceived causality, physical systems, contingency data
[Figure labels: less data to more data; less constrained to more constrained]
prior knowledge + statistical inference

prior knowledge + statistical inference

Theory-based causal induction
• Theory generates Hypothesis space (candidate causal graphs over X, Y, Z)
• Hypothesis space generates Data
• Bayesian inference
Data:
Case  X  Y  Z
1     1  0  1
2     0  1  1
3     1  1  1
4     0  0  0
...

An analogy to language
• Grammar generates Parse trees; Parse trees generate a Sentence (“The quick brown fox …”)
• Theory generates Hypothesis space (candidate causal graphs over X, Y, Z); Hypothesis space generates Data
Data:
Case  X  Y  Z
1     1  0  1
2     0  1  1
3     1  1  1
4     0  0  0
...

Outline: perceived causality, physical systems, contingency data

                  C present (c+)   C absent (c-)
E present (e+)          a                c
E absent (e-)           b                d
“To what extent does C cause E?” (rate on a scale from 0 to 100)

Buehner & Cheng (1997)
Cause: Chemical; effect: Gene (expression)
                  C present (c+)   C absent (c-)
E present (e+)          6                4
E absent (e-)           2                4
“To what extent does the chemical cause gene expression?” (rate on a scale from 0 to 100)

Buehner & Cheng (1997)
[Figure: Humans]
• Showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25
• Curious phenomenon, the “frequency illusion”:
  • why do people’s judgments change when the cause does not change the probability of the effect?

Causal graphical models
• Framework for representing, reasoning, and learning about causality (also called Bayes nets) (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)
• Becoming widespread in psychology (Glymour, 2001; Gopnik et al., 2004; Lagnado & Sloman, 2002; Tenenbaum & Griffiths, 2001; Steyvers et al., 2003; Waldmann & Martignon, 1998)

Causal graphical models
• Variables: X, Y, Z
• Structure: X → Z ← Y
• Conditional probabilities: P(X), P(Y), P(Z|X,Y)
Defines a probability distribution over the variables (for both observation and intervention)
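The three ingredients above can be sketched in a few lines of Python. This is only an illustration: the probability values are hypothetical, and the function names are not from the talk. It shows how the same model answers an observational query and an interventional one differently.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration.
P_X = 0.3  # P(X = 1)
P_Y = 0.6  # P(Y = 1)
P_Z = {(0, 0): 0.05, (1, 0): 0.7, (0, 1): 0.4, (1, 1): 0.9}  # P(Z = 1 | X, Y)


def bernoulli(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1.0 - p


def joint(x, y, z):
    """P(X = x, Y = y, Z = z), factorized along the graph X -> Z <- Y."""
    return bernoulli(P_X, x) * bernoulli(P_Y, y) * bernoulli(P_Z[(x, y)], z)


def p_x_given_z_observed(z):
    """P(X = 1 | Z = z): observing Z is evidence about its cause X."""
    numerator = sum(joint(1, y, z) for y in (0, 1))
    evidence = sum(joint(x, y, z) for x, y in product((0, 1), repeat=2))
    return numerator / evidence


def p_x_given_z_intervened(z):
    """P(X = 1 | do(Z = z)): graph surgery drops the factor P(Z | X, Y),
    so setting Z by hand carries no information about X."""
    return sum(bernoulli(P_X, 1) * bernoulli(P_Y, y) for y in (0, 1))


print(p_x_given_z_observed(1))    # above the prior P_X
print(p_x_given_z_intervened(1))  # equal to the prior P_X
```

Observing Z = 1 raises the probability of X, while forcing Z = 1 leaves it at its prior, which is the distinction between observation and intervention that the slide points to.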

Causal graphical models
• Provide a basic framework for representing causal systems
• But… where is the prior knowledge?

[Figure: chemicals (Gemfibrozil, Phenobarbital, Clofibrate, Wyeth 14,643) linked to genes (Carnitine Palmitoyl Transferase 1, p450 2B1). Later builds add a new Chemical X and mark three chemicals “+” as peroxisome proliferators.]
Hamadeh et al. (2002), Toxicological Sciences.

Beyond causal graphical models
• Prior knowledge produces expectations about:
  • types of entities
  • plausible relations
  • functional form
• This cannot be captured by graphical models
“A theory consists of three interrelated components: a set of phenomena that are in its domain, the causal laws and other explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena and explanatory apparatus are expressed.” (Carey, 1985)

Theory-based causal induction
A causal theory is a hypothesis space generator
Component of theory     Generates
Ontology                Variables
Plausible relations     Structure
Functional form         Conditional probabilities
Hypotheses are evaluated by Bayesian inference: P(h|data) ∝ P(data|h) P(h)

Theory
• Ontology
  • Types: Chemical, Gene, Mouse
  • Predicates: Injected(Chemical, Mouse), Expressed(Gene, Mouse)
[Figure: nodes B, C, E]
E = 1 if effect occurs (mouse expresses gene), else 0
C = 1 if cause occurs (mouse is injected), else 0

Theory
• Plausible relations
  • For any Chemical c and Gene g, with prior probability p: for all Mice m, Injected(c,m) → Expressed(g,m)
Graph 0: B → E, with P(Graph 0) = 1 - p
Graph 1: B → E ← C, with P(Graph 1) = p
No hypotheses with E → C, B → C, C → B, …

Theory
• Ontology
  • Types: Chemical, Gene, Mouse
  • Predicates: Injected(Chemical, Mouse), Expressed(Gene, Mouse)
• Plausible relations
  • For any Chemical c and Gene g, with prior probability p: for all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations

Functional form: Generic
• Structures: Graph 1 = B → E ← C; Graph 0 = B → E
• Parameterization:
C  B   Graph 1: P(E = 1 | C, B)   Graph 0: P(E = 1 | C, B)
0  0   p00                        p0
1  0   p10                        p0
0  1   p01                        p1
1  1   p11                        p1

Functional form: “Noisy-OR”
w0, w1: strength parameters for B, C
• Structures: Graph 1 = B → E ← C; Graph 0 = B → E
• Parameterization:
C  B   Graph 1: P(E = 1 | C, B)   Graph 0: P(E = 1 | C, B)
0  0   0                          0
1  0   w1                         0
0  1   w0                         w0
1  1   w1 + w0 - w1 w0            w0
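The Noisy-OR table can be reproduced by one closed-form expression: the effect fails to occur only if every present cause independently fails to produce it. The sketch below is illustrative; the function name and the `graph` argument are assumptions made for this example, not notation from the slides.

```python
def noisy_or(c, b, w0, w1, graph=1):
    """P(E = 1 | C = c, B = b) under Noisy-OR.

    w0 is the strength of the background cause B, w1 the strength of C.
    Under Graph 0 there is no C -> E edge, so C is ignored.
    """
    background_fails = (1 - w0) ** b
    cause_fails = (1 - w1) ** c if graph == 1 else 1.0
    return 1 - background_fails * cause_fails


# Reproduce the four rows of the table for example strengths.
w0, w1 = 0.2, 0.5
for c, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(c, b, noisy_or(c, b, w0, w1, graph=1), noisy_or(c, b, w0, w1, graph=0))
```

With both causes present, the expression expands to w1 + w0 - w1 w0, matching the last row of the Graph 1 column.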

Theory
• Ontology
  • Types: Chemical, Gene, Mouse
  • Predicates: Injected(Chemical, Mouse), Expressed(Gene, Mouse)
• Constraints on causal relations
  • For any Chemical c and Gene g, with prior probability p: for all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations
  • Causes of Expressed(g,m) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.
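One way to read "a theory generates a hypothesis space" is as a small program: given the entities named by the ontology, the plausible-relations rule determines which graphs exist and their prior probabilities. The sketch below assumes each Chemical-to-Gene edge is included independently with probability p; the function name and data layout are hypothetical.

```python
from itertools import product


def hypothesis_space(chemicals, genes, p):
    """Enumerate candidate graphs and their priors.

    Only Chemical -> Gene edges are allowed, and each is present
    independently with prior probability p (an assumption of this sketch).
    Returns a list of (edges, prior) pairs.
    """
    candidate_edges = [(c, g) for c in chemicals for g in genes]
    space = []
    for included in product((False, True), repeat=len(candidate_edges)):
        edges = tuple(e for e, keep in zip(candidate_edges, included) if keep)
        k = len(edges)
        prior = p ** k * (1 - p) ** (len(candidate_edges) - k)
        space.append((edges, prior))
    return space


# One chemical and one gene give exactly Graph 0 and Graph 1.
for edges, prior in hypothesis_space(["C"], ["E"], p=0.3):
    print(edges, prior)
```

With a single chemical and a single gene, the space contains the empty graph with prior 1 - p and the graph with the C → E edge with prior p, which are Graph 0 and Graph 1 with the always-present background cause left implicit.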

Evaluating a causal relationship
Graph 0: B → E, with P(Graph 0) = 1 - p
Graph 1: B → E ← C, with P(Graph 1) = p
P(Graph 1|D) = P(D|Graph 1) P(Graph 1) / Σ_i P(D|Graph i) P(Graph i)
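A minimal numerical sketch of this posterior follows. It makes several assumptions that the slide does not state: the background cause is always present, the functional form is Noisy-OR, the strengths w0 and w1 have uniform priors on [0, 1], and the integrals over strengths are approximated on a midpoint grid. The cell counts are named a = N(c+, e+), b = N(c+, e-), c = N(c-, e+), d = N(c-, e-).

```python
def likelihood_graph0(a, b, c, d, n=200):
    """P(D | Graph 0): the effect depends on the background only, P(e+) = w0."""
    grid = [(i + 0.5) / n for i in range(n)]
    return sum(w0 ** (a + c) * (1 - w0) ** (b + d) for w0 in grid) / n


def likelihood_graph1(a, b, c, d, n=200):
    """P(D | Graph 1): Noisy-OR of background (w0) and cause (w1)."""
    grid = [(i + 0.5) / n for i in range(n)]
    total = 0.0
    for w0 in grid:
        p_absent = w0                       # P(e+ | c-)
        for w1 in grid:
            p_present = w0 + w1 - w0 * w1   # P(e+ | c+)
            total += (p_present ** a * (1 - p_present) ** b
                      * p_absent ** c * (1 - p_absent) ** d)
    return total / (n * n)


def posterior_graph1(a, b, c, d, p=0.5):
    """P(Graph 1 | D) by Bayes' rule over the two graphs."""
    joint1 = likelihood_graph1(a, b, c, d) * p
    joint0 = likelihood_graph0(a, b, c, d) * (1 - p)
    return joint1 / (joint1 + joint0)


print(posterior_graph1(8, 0, 0, 8))  # effect occurs only when the cause is present
print(posterior_graph1(4, 4, 4, 4))  # cause makes no difference
```

Averaging over strengths rather than fitting them means Graph 1 is penalized for its extra parameter when the data show no contingency, so the second posterior falls below the prior p.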

[Figure panels: Humans, ΔP, causal power (Cheng, 1997), χ², Bayesian]
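For reference, the two rule-based measures named in the figure can be computed directly from a contingency table. The sketch uses the standard definitions, ΔP = P(e+|c+) - P(e+|c-) and, for a generative cause, causal power = ΔP / (1 - P(e+|c-)). The cell naming (a = N(c+, e+), b = N(c+, e-), c = N(c-, e+), d = N(c-, e-)) and the function names are choices made for this example.

```python
def delta_p(a, b, c, d):
    """Delta-P: difference in the probability of the effect with and without the cause."""
    p_e_given_c = a / (a + b)
    p_e_given_not_c = c / (c + d)
    return p_e_given_c - p_e_given_not_c


def causal_power(a, b, c, d):
    """Generative causal power (Cheng, 1997): Delta-P rescaled by the room left to increase."""
    p_e_given_not_c = c / (c + d)
    if p_e_given_not_c == 1.0:
        return float("nan")  # undefined when the effect always occurs without the cause
    return delta_p(a, b, c, d) / (1 - p_e_given_not_c)


# Example counts: effect in 6 of 8 cases with the cause, 4 of 8 without.
print(delta_p(6, 2, 4, 4))
print(causal_power(6, 2, 4, 4))
```

Both measures are zero whenever P(e+|c+) = P(e+|c-), regardless of the base rate.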

Generativity is essential
[Figure: Bayesian model predictions (scale 0 to 100) for conditions with P(e+|c+) = P(e+|c-) = 8/8, 6/8, 4/8, 2/8, 0/8]
• Predictions result from “ceiling effect”
  • ceiling effects only matter if you believe a cause increases the probability of an effect
  • follows from use of Noisy-OR (after Cheng, 1997)

Generativity is essential
• Generic: probability differs across conditions
• Noisy-OR: causes increase probability of their effects
• Noisy-AND-NOT: causes decrease probability of their effects

Generativity is essential
[Figure panels: Humans, Noisy-OR, Generic, Noisy-AND-NOT]

Manipulating functional form
• Generic
  • probability differs across conditions
  • appropriate for assessing differences
• Noisy-OR
  • causes increase probability of their effects
  • appropriate for generative causes
• Noisy-AND-NOT
  • causes decrease probability of their effects
  • appropriate for preventive causes
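The three functional forms can be placed side by side as functions of whether the cause is present, with the background cause assumed always present. The Noisy-AND-NOT expression used here, in which the background produces the effect with strength w0 and the cause blocks it with strength w1, is an assumed parameterization, since the slides do not spell out a formula. All function names are illustrative.

```python
def generic(c, p_absent, p_present):
    """Generic: an unconstrained probability for each condition."""
    return p_present if c == 1 else p_absent


def noisy_or(c, w0, w1):
    """Noisy-OR: the cause can only raise P(E = 1) above the background rate w0."""
    return 1 - (1 - w0) * (1 - w1) ** c


def noisy_and_not(c, w0, w1):
    """Noisy-AND-NOT (assumed form): the cause can only lower P(E = 1) below w0."""
    return w0 * (1 - w1) ** c


w0, w1 = 0.5, 0.4
print(noisy_or(0, w0, w1), noisy_or(1, w0, w1))            # rises when C is present
print(noisy_and_not(0, w0, w1), noisy_and_not(1, w0, w1))  # falls when C is present
print(generic(0, 0.5, 0.1), generic(1, 0.5, 0.9))          # free to move either way
```

Only the generic form can represent a change in either direction, which is why it suits questions about differences rather than about generative or preventive causes.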

Manipulating functional form
[Figure: conditions Generative, Difference, Preventive; models Noisy-OR, Generic, Noisy-AND-NOT]

Causal induction from contingency data
• The simplest case of causal learning: a single cause-effect relationship and plentiful data
• Nonetheless, exhibits complex effects of prior knowledge (in the assumed functional form)
• These effects reflect appropriate causal theories

Outline: perceived causality, physical systems, contingency data

The stick-ball machine [figure: machine with parts labeled A and B] (Kushnir, Schulz, Gopnik, & Danks, 2003)

Inferring hidden causal structure
• Can people accurately infer hidden causal structure from small amounts of data?
• Kushnir et al. (2003): four kinds of structure
  • separate causes
  • common cause
  • A causes B
  • B causes A

Inferring hidden causal structure (Kushnir, Schulz, Gopnik, & Danks, 2003)
Candidate structures: separate causes, common cause, A causes B, B causes A
Conditions [figure: event types shown with their frequencies]:
• Common unobserved cause: 4 x, 2 x, 2 x
• Independent unobserved causes: 1 x, 2 x, 2 x, 2 x, 2 x
• One observed cause: 2 x, 4 x

Theory-based causal induction