The causal matrix: Learning the background knowledge that makes causal learning possible. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences, Computer Science and AI Lab (CSAIL). Collaborators: Tom Griffiths, Noah Goodman, Vikash Mansinghka, Charles Kemp.
Learning causal relations Goal: Computational models that explain how people learn causal relations from data. [Figure: Structure and Data]
A Bayesian approach Data d, causal hypotheses h [Figure: candidate networks over X1, X2, X3, X4] 1. What is the most likely network h given observed data d? 2. How likely is there to be a link X4 → X2? (e.g., Griffiths & Tenenbaum, 2005; Steyvers et al., 2003)
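The two questions on this slide correspond to standard Bayesian computations. As a sketch, writing P(h) for a prior over candidate networks (this notation is an assumption, not taken from the slides), question 1 asks for the h that maximizes the first quantity and question 2 asks for the second:

```latex
P(h \mid d) = \frac{P(d \mid h)\, P(h)}{\sum_{h'} P(d \mid h')\, P(h')},
\qquad
P(X_4 \to X_2 \mid d) = \sum_{h \,:\, (X_4 \to X_2) \in h} P(h \mid d)
```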
What’s missing from this account? • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these classes • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention
The approach • What we want to understand: How are these different aspects of background knowledge represented, used to support causal learning, and themselves acquired? • Abstract domain-specific frameworks or causal schemas • Causal variables grounded in sensorimotor experience • Domain-general causal understanding • What we need to answer these questions: • Bayesian inference in probabilistic generative models. • Probabilities defined over structured representations: graphs, grammars, predicate logic. • Hierarchical probabilistic models, with inference at multiple levels of abstraction. • Flexible representations, growing in response to observed data.
Outline • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these concepts • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention
Causal machines (Gopnik, Sobel, Schulz et al.) “See this? It’s a blicket machine. Blickets make it go. Let’s put this one on the machine. Oooh, it’s a blicket!”
“Backward blocking” (Sobel, Tenenbaum & Gopnik, 2004) • Initially: Nothing on detector – detector silent (A=0, B=0, E=0) • Trial 1: A and B on detector – detector active (A=1, B=1, E=1) • Trial 2: A alone on detector – detector active (A=1, B=0, E=1) • 4-year-olds judge whether each object is a blicket. A: a blicket (100% say yes). B: probably not a blicket (34% say yes). [Figure: panels “AB Trial” and “A Trial”; graph with uncertain links (“?”) from A and B to E]
Possible hypotheses? [Figure: grid of 24 candidate causal graphs over A, B, and E]
Bayesian causal learning With a uniform prior on hypotheses, generic parameterization: Probability of being a blicket: A B 0.32 0.32 0.34 0.34
A stronger hypothesis space generated by abstract domain knowledge • Links can only exist from blocks to detectors. • Blocks are blickets with prior probability q. • Blickets always activate detectors, detectors never activate on their own (i.e., deterministic OR parameterization, no hidden causes).
                      h00         h01         h10         h11
Links                 none        B → E       A → E       A → E, B → E
Prior P(h)            (1 – q)^2   (1 – q) q   q (1 – q)   q^2
P(E=1 | A=0, B=0)     0           0           0           0
P(E=1 | A=1, B=0)     0           0           1           1
P(E=1 | A=0, B=1)     0           1           0           1
P(E=1 | A=1, B=1)     0           1           1           1
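The table of priors and likelihoods above is enough to compute the backward-blocking inference by hand. The following is a minimal sketch of that calculation, assuming the four-hypothesis space and deterministic-OR likelihood from the slide; the function name and the way trials are encoded are illustrative choices, not part of the original material.

```python
def backward_blocking_posterior(q):
    """Posterior that A and B are blickets after the AB trial and the A trial.

    q is the assumed prior probability that any block is a blicket.
    """
    # Priors over the four hypotheses, keyed by (A is a blicket, B is a blicket).
    prior = {
        (0, 0): (1 - q) ** 2,
        (0, 1): (1 - q) * q,
        (1, 0): q * (1 - q),
        (1, 1): q ** 2,
    }

    def p_active(h, a_on, b_on):
        # Deterministic OR: the detector fires iff some blicket is on it.
        return 1.0 if (h[0] and a_on) or (h[1] and b_on) else 0.0

    # Observed data: A and B together activate the detector, then A alone does.
    trials = [(1, 1), (1, 0)]

    weights = {}
    for h, p in prior.items():
        likelihood = 1.0
        for a_on, b_on in trials:
            likelihood *= p_active(h, a_on, b_on)
        weights[h] = p * likelihood

    z = sum(weights.values())
    p_a = sum(w for h, w in weights.items() if h[0]) / z
    p_b = sum(w for h, w in weights.items() if h[1]) / z
    return p_a, p_b


p_a, p_b = backward_blocking_posterior(0.25)
print(p_a, p_b)
```

Under these assumptions only h10 and h11 survive both trials, so A is a blicket with probability 1 and the posterior for B falls back to its prior q. That dependence on q is what makes the base rate of blickets a natural quantity to manipulate.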
Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik) [Figure: panels “Initial”, “AB Trial”, “A Trial”]
Inferences from ambiguous data I. Pre-training phase: Blickets are rare. II. Two trials: A B → detector, B C → detector. After each trial, adults judge the probability that each object is a blicket. [Figure: objects A, B, C on the detector in Trial 1 and Trial 2]
Same domain theory generates hypothesis space for 3 objects: • Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111 [Figure: each hypothesis drawn as a graph from A, B, C to E] • Likelihoods: P(E=1 | A, B, C; h) = 1 if A = 1 and A → E exists, or B = 1 and B → E exists, or C = 1 and C → E exists; else 0.
“Rare” condition: First observe 12 objects on detector, of which 2 set it off.
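The three-object case can be worked through by enumerating all eight hypotheses. The sketch below assumes the deterministic-OR likelihood from the preceding slide and, purely for illustration, treats the pre-training rate of 2 out of 12 as a point estimate of the prior q; the slides do not say how q is set, and the function name and trial encoding are hypothetical.

```python
from itertools import product


def blicket_posteriors(q, trials, n_objects):
    """Posterior probability that each object is a blicket.

    q         -- assumed prior probability that any object is a blicket
    trials    -- list of (objects_on_detector, detector_active) pairs, where
                 objects_on_detector is a tuple of 0/1 flags, one per object
    n_objects -- number of objects under consideration
    """
    posterior = [0.0] * n_objects
    total = 0.0
    # Each hypothesis is a tuple of 0/1 flags: h[i] = 1 means object i -> E.
    for h in product([0, 1], repeat=n_objects):
        prior = 1.0
        for is_blicket in h:
            prior *= q if is_blicket else (1 - q)
        # Deterministic OR: the detector fires iff some blicket is on it.
        consistent = all(
            any(on and link for on, link in zip(objects, h)) == bool(active)
            for objects, active in trials
        )
        if not consistent:
            continue
        total += prior
        for i, is_blicket in enumerate(h):
            if is_blicket:
                posterior[i] += prior
    return [p / total for p in posterior]


# Assumption: point estimate of q from the "rare" pre-training (2 of 12).
q_rare = 2 / 12
# A and B activate the detector; B and C activate the detector.
trials = [((1, 1, 0), 1), ((0, 1, 1), 1)]
print(blicket_posteriors(q_rare, trials, 3))
```

With these inputs the object that appears in both trials (B) ends up much more probable than A or C, because a single rare blicket explains both activations more cheaply than two.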
4-year-olds (w/ Dave Sobel) I. “Backward blocking” (AB trial, then A trial): “Is this a blicket?” • Rare condition: A 100%, B 25% • Common condition: A 100%, B 81% II. Two trials (A B → detector, B C → detector): “Is this a blicket?” 87%, 56%, 56% across the three objects
Formalizing framework theories [Figure: three levels: Framework theory, Causal structure, Event data]
Formalizing framework theories [Figure: three levels: Framework theory, Causal structure, Event data; shown alongside the linguistic analogy Grammar, Phrase structure, Utterance (“You shot the wumpus.”)]
A framework theory for detectors: probabilistic first-order logic
Alternative framework theories • Classes = {C}, Laws = {C → C} • Classes = {R, D, S}, Laws = {R → D, D → S} • Classes = {R, D, S}, Laws = {S → D}
The abstract theory constrains possible hypotheses and rules out others. • Allows strong inferences about causal structure from very limited data. • Very different from conventional Bayes net learning.
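One way to see how a framework theory constrains the hypothesis space is to treat its laws as a filter on candidate edges. The sketch below assumes the laws are read as directed class-to-class relations (R → D and D → S); the variable names, the class assignment, and the function are hypothetical and only meant to illustrate the idea.

```python
from itertools import permutations


def allowed_by_theory(edges, class_of, laws):
    """Return True if every edge in a causal graph instantiates some law.

    edges    -- iterable of (cause, effect) variable names
    class_of -- dict mapping each variable to its class label
    laws     -- set of (cause_class, effect_class) pairs
    """
    return all((class_of[c], class_of[e]) in laws for c, e in edges)


# Hypothetical variables and class assignment, for illustration only.
class_of = {"r1": "R", "r2": "R", "d1": "D", "s1": "S"}
laws = {("R", "D"), ("D", "S")}  # assumed reading: R -> D and D -> S

print(allowed_by_theory([("r1", "d1"), ("d1", "s1")], class_of, laws))  # True
print(allowed_by_theory([("s1", "d1")], class_of, laws))                # False

# Count how many single directed edges the theory permits.
possible_edges = list(permutations(class_of, 2))
allowed_edges = [e for e in possible_edges
                 if allowed_by_theory([e], class_of, laws)]
print(len(possible_edges), len(allowed_edges))
```

In this toy setting only 3 of the 12 possible directed edges are permitted, so the learner considers 2^3 edge sets rather than 2^12, which is one sense in which a theory supports strong inferences from little data.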
Learning with a uniform prior on network structures [Figure: true network; 75 sampled observations shown as a patients × attributes (1–12) data matrix]
Learning a block-structured prior on network structures (Mansinghka et al. 2006) [Figure: class assignments z over attributes 1–12; class-level edge probabilities η with entries 0.8, 0.0, 0.01; 0.0, 0.0, 0.75; 0.0, 0.0, 0.0; true network; 75 sampled observations shown as a patients × attributes (1–12) data matrix]
(Mansinghka, Kemp, Tenenbaum, Griffiths UAI 06) Abstract theory (classes Z with class assignments z, class-level edge probabilities η), Graph G (edges), Data D. [Figure: true structure of graphical model G over variables 1–16; structures learned from 20, 80, and 1000 samples; η entries 0.4, 0.0, 0.0, 0.0 over classes c1 and c2]
Human learning of abstract causal frameworks • Lien & Cheng (2000) • Shanks & Darby (1998) • Tenenbaum & Niyogi (2003) • Schulz, Goodman, Tenenbaum & Jenkins (submitted) • Kemp, Goodman & Tenenbaum (in progress)
The causal blocks world (Tenenbaum and Niyogi, 2003)
[Figure: learning curves and model predictions]
Animal learning of abstract causal frameworks? [Figure: three levels: Framework theory, Causal structure, Event data]
Outline • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these concepts • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention
The problem: A child learns that petting the cat leads to purring, while pounding leads to growling. But what are the origins of these symbolic event concepts (“variables”) over which causal links are defined? • Option 1: Variables are innate. • Option 2 (“clusters then causes”): Variables are learned first, independent of causal relations, through a kind of bottom-up perceptual clustering. • Option 3: Variables are learned together with causal relations.
A hierarchical Bayesian framework for learning grounded causal models (Goodman, Mansinghka & Tenenbaum, CogSci 07) [Figure: hypotheses, and data observed at time t and time t’]
“Alien control panel” experiment Condition A Condition B Condition C
Mean responses vs. model Blue bars: human proportion of responses Red bars: model posterior probability
Outline • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these concepts • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention
Domain-general causal understanding World: [Figure: causal system over variables x, y, z, a, b, c] Possible alternative models: • Causal Bayesian networks (BNs + interventions) • Bayesian networks: minimal structure fitting conditional dependencies • Correlations • Temporally directed associative strengths
Domain-general causal understanding An abstract schema for causal learning in any domain. Essentially equivalent to Pearl-style learning for CBNs. [Figure: schema over W and A applied to System 1, System 2, System 3, and System X]
Some alternatives [Figure: alternative schemas over W, A, and V]
Some alternatives [Figure: alternative schemas over A and W]
Can a Bayesian learner infer the correct domain-general properties of causality, using data from multiple systems, while simultaneously learning how each system works? (Goodman & Tenenbaum) [Figure: candidate schemas over W, A, and V; System 1, System 2, ..., System N, each with its own samples]
Summary • What we want to understand: How are different aspects of background knowledge represented, used to support causal learning, and themselves acquired? • Abstract domain-specific frameworks or causal schemas • Causal variables grounded in sensorimotor experience • Domain-general causal understanding • What we need to answer these questions: • Bayesian inference in probabilistic generative models. • Probabilities defined over structured representations: graphs, grammars, predicate logic. • Hierarchical probabilistic models, with inference at multiple levels of abstraction. • Flexible representations, growing in response to observed data.
Insights • Aspects of background knowledge which have been either taken for granted or presumed to be innate could in fact be learned from data by rational inferential means, together with specific causal relations. • Domain-specific frameworks or schemas and domain-general properties of causality could be learned by similar means. • Abstract causal knowledge can in some cases be learned more quickly and more easily than specific concrete causal relations (the “blessing of abstraction”).
Bayesian Occam’s Razor (MacKay, 2003; Ghahramani tutorials) For any model M, \sum_d p(D = d | M) = 1. • Law of “conservation of belief”: A model that can predict many possible data sets must assign each of them low probability. [Figure: p(D = d | M) for models M1 and M2, plotted over all possible data sets d]
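The conservation-of-belief argument can be checked numerically on a toy example. The sketch below is an illustration under assumptions not found in the slides: the data sets are head counts in n coin flips, M1 is a fair coin, and M2 is a coin of unknown bias with a uniform prior integrated out.

```python
from math import comb


def p_counts_fair(n):
    """M1: fair coin. Probability of each head count k = 0..n."""
    return [comb(n, k) * 0.5 ** n for k in range(n + 1)]


def p_counts_flexible(n):
    """M2: unknown bias with a uniform prior, integrated out.

    The marginal probability of k heads is 1 / (n + 1) for every k.
    """
    return [1.0 / (n + 1) for _ in range(n + 1)]


n = 10
m1, m2 = p_counts_fair(n), p_counts_flexible(n)

# Conservation of belief: each model's predictions sum to 1.
print(sum(m1), sum(m2))

# Compare the two models on balanced data (k = 5) and extreme data (k = 10).
for k in (5, 10):
    print(k, m1[k], m2[k])
```

The flexible model spreads its probability evenly over all head counts, so it loses to the fair-coin model on balanced data and wins only when the data are extreme.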
Learning causation from contingencies, e.g., “Does injecting this chemical cause mice to express a certain gene?”
                   C present (c+)   C absent (c-)
E present (e+)     a                c
E absent (e-)      b                d
Subjects judge the extent to which C causes E (rate on a scale from 0 to 100).
Learning more complex structures • Tenenbaum et al., Griffiths & Sobel: detectors with more than two objects and noisy mechanisms • Steyvers et al., Sobel & Kushnir: active learning with interventions (cf. Tong & Koller, Murphy) • Lagnado & Sloman: learning from interventions on continuous dynamical systems
Inferring hidden causes: the “stick ball” machine (Kushnir, Schulz, Gopnik, & Danks, 2003) • Common unobserved cause: 4 x, 2 x, 2 x • Independent unobserved causes: 1 x, 2 x, 2 x, 2 x, 2 x • One observed cause: 2 x, 4 x