Causal Models in the Cognitive Sciences
Tetrad project • http://www.phil.cmu.edu/projects/tetrad/current.html
Two uses • Causal graphical models used in: • Practice/methodology of cognitive science • Focus on neuroimaging, but lots of other uses • Framework for expressing human causal knowledge • Are human causal representations “just” these causal graphical models? • Also (but not today): Are other cognitive representations “just” graphical models (perhaps causal, perhaps not)?
Learning from neuroimaging • Given neuroimaging data, what is the causal structure inside the brain? • Ignoring differences in timescale, challenges in inverting the hemodynamic response function, etc.
Learning from neuroimaging • Big challenge: people likely have (slightly) different causal structures in their brains • ⇒ Full dataset is really from a mixed population! • ⇒ “Normal” causal search falls apart • Idea: perhaps the differences are mostly in parameters, not graphs • Note that “no edge” ≡ “parameter = 0”
IMaGES algorithm • Given data from individuals D1, …, Dn, the score for graph G is computed by: • Compute the ML estimate of G's parameters for each Di • Use that ML estimate to get the BIC of G on Di • Score for G is the average BIC over all datasets: score(G) = (1/n) Σi BIC(G, Di) • Do GES-style search over graphs (i.e., greedy edge addition, then greedy edge removal)
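The scoring step above can be sketched as follows. This is a minimal illustration, not the Tetrad implementation: `fit_ml` and `bic` are hypothetical placeholders for a real model-fitting routine and BIC computation.

```python
def images_score(graph, datasets, fit_ml, bic):
    """IMaGES-style score: average BIC of one graph across individual datasets.

    fit_ml(graph, data) -> ML parameter estimates of `graph` on one dataset
    bic(graph, params, data) -> BIC of `graph` with those parameters
    (Both are hypothetical stand-ins for a real fitting routine.)
    """
    scores = []
    for data in datasets:
        params = fit_ml(graph, data)             # per-individual ML fit
        scores.append(bic(graph, params, data))  # per-individual BIC
    return sum(scores) / len(scores)             # average over individuals
```

A GES-style search would then greedily add the edge that most improves this averaged score, then greedily remove edges, so all individuals share one graph while keeping their own parameters.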
IMaGES application • Standard causal search vs. IMaGES (result graphs omitted)
Causal cognition • Causal inference: learning causal structure from a sequence of cases (observations or interventions) • Causal perception: learning causal connections through “direct” perception • Causal reasoning: using prior causal knowledge to predict, explain, control your world
Descriptive theories (in 2000) • Paradigmatic causal inference situation: • A set of binary potential causes: C1, …, Cn • A known binary effect: E • Minimal role for prior beliefs • Observational data about variable values • Possible formats include: sequential, list, or summary
Descriptive theories (in 2000) • Goal of theories: model (mean) “strength ratings” as a function of the observed cases • Or a series of (mean) ratings • Two theory-types: Dynamical vs. Long-run • Dynamical theories predict belief change after single cases • Long-run theories predict stable beliefs after “enough time” • Similar to the algorithmic vs. computational distinction
Dynamical theories (in 2000) • Rescorla-Wagner (and variants) • Associative strength for each cue (to the effect) • Causal version: associative strengths are causal • Schematic form of R-W: ΔVi = RateParams × (Outcome − Prediction) • That is, use error-correction to update the associative strengths after each observed case • Variant R-W models explain phenomena such as backward blocking by changing the prediction function
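The schematic R-W equation above can be made concrete in a short sketch. Parameter names (`alpha`, `beta`, `lam`) follow the standard Rescorla-Wagner notation for the rate parameters and asymptote; the trial format is an assumption for illustration.

```python
def rescorla_wagner(trials, alpha=0.3, beta=0.3, lam=1.0):
    """Schematic Rescorla-Wagner: after each case, nudge each present cue's
    associative strength by RateParams * (Outcome - Prediction).

    trials: list of (present_cues, outcome) pairs, outcome in {0, 1}.
    Returns a dict mapping cue -> associative strength V.
    """
    V = {}
    for cues, outcome in trials:
        # Prediction is the summed strength of all cues present on this trial
        prediction = sum(V.get(c, 0.0) for c in cues)
        error = (lam * outcome) - prediction  # Outcome - Prediction
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * beta * error  # error-correction
    return V
```

With a single cue repeatedly paired with the effect, V converges toward the asymptote, matching the error-correction story on the slide.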
Long-run theories (in 2000) • In the long-run, causal strength judgments should be proportional to the: • Conditional contrast (Conditional ΔP theory): ΔPC.{X} = P(E | C & X) − P(E | ~C & X) • Causal strength estimate (Power PC): pC = ΔPC.F / [1 − P(E | ~C & F)] • where F is a “focal set” of relevant events
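The two long-run quantities are direct arithmetic on conditional probabilities; a minimal sketch (for a generative cause, with the focal-set conditioning folded into the input probabilities):

```python
def delta_p(p_e_given_c, p_e_given_not_c):
    """Conditional contrast: dP = P(E|C) - P(E|~C),
    with any focal-set/background conditioning already applied."""
    return p_e_given_c - p_e_given_not_c

def power_pc(p_e_given_c, p_e_given_not_c):
    """Power PC estimate for a generative cause:
    p_C = dP / (1 - P(E|~C))."""
    return delta_p(p_e_given_c, p_e_given_not_c) / (1.0 - p_e_given_not_c)
```

For example, if P(E|C) = 0.75 and P(E|~C) = 0.5, ΔP = 0.25 but causal power is 0.5: power PC corrects for the effect's base rate, which is the main point of contrast between the two theories.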
Dynamical & long-run theories • In the long-run, Rescorla-Wagner (and variants) “converges” to conditional ΔP • I.e., R-W is a dynamical version of conditional ΔP • Simple modification of the error-correction equation converges to power PC • Primary debate (in 2000): which family of theories correctly describes causal learning?
Parameter estimation • Connect causal models and descriptive theories: [common-effect graph: C1, …, Cn and a background cause B each pointing to E, with edge weights wC1, …, wCn, wB] • B is a constant background cause • Limited correlations allowed between C1, …, Cn • Additional restriction: Assume we have P(E) = f(wC1, …, wCn, wB), or more precisely: P(E | C1, …, Cn) = f(wC1, …, wCn, wB, C1, …, Cn)
Parameter estimation • Essentially every descriptive theory estimates the w-parameters in this causal Bayes net • Different descriptive theories result from different functional forms for P(E) • And all of the research on the descriptive theories implies that people can estimate parameters in this “simple” causal structure
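One standard choice of the functional form f is the noisy-OR parameterization (the form under which ML parameter estimation recovers the power PC estimate); this is a common example, not something the slide commits to:

```python
def p_effect_noisy_or(weights, w_background, cause_values):
    """Noisy-OR functional form for P(E | C1, ..., Cn):
    E fails to occur only if the background cause and every *present*
    cause all independently fail to produce it.

    weights: causal strengths wC1..wCn; cause_values: 0/1 presence flags.
    """
    p_fail = 1.0 - w_background
    for w, present in zip(weights, cause_values):
        if present:
            p_fail *= (1.0 - w)
    return 1.0 - p_fail
```

Different descriptive theories then correspond to swapping in a different f (e.g., a linear/additive form yields conditional ΔP as the ML estimate).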
Learning causal structure? • Additional queries: • From a “rational analysis” point-of-view: • Can people learn structure from interventions? • Or from patterns of correlations? • From a “process model” point-of-view: • Is there a psychologically plausible process model of causal graphical model structure learning?
Stick-Ball machine Kushnir, T., Gopnik, A., Schulz, L., & Danks, D. 2003. Inferring hidden causes. Proceedings of the 25th Annual Meeting of the Cognitive Science Society.
Experimental conditions • Two conditions with “identical” statistics • Intervention case • A & B move together four times • Intervene on A twice, B doesn’t move • Intervene on B twice, A doesn’t move • Pointing control • A & B move together four times • A moves twice (point at it after), B doesn’t move • B moves twice (point at it after), A doesn’t move
Experimental logic • For causal models (& close-to-determinism), the two conditions license different structures: • Intervention case • Pointing control • (predicted graphs omitted)
Experimental logic • Non-CGM causal inference theories make no prediction for this case, as there is no cause-effect division • And on plausible variants that do predict, they predict no difference between the conditions
Inference from interventions • Response percentages in each condition (chart omitted) • p < .001: each condition differs from chance • p < .01: the conditions differ from each other
Other learning from interventions • Learning from interventions • Gopnik, et al. (2004); Griffiths, et al. (2004); Sobel, et al. (2004); Steyvers, et al. (2003) • And many more since 2005 • Planning/predicting your own interventions • Gopnik, et al. (2004); Steyvers, et al. (2003); Waldmann & Hagmayer (2005) • And many more since 2005
Learning from correlations • Lots of evidence that people (and even rats!) can extract causal structure from observed correlations • And those structures are well-modeled as causal graphical models • ⇒ Lots of empirical evidence that we act “as if” we are learning (approx. rationally) causal DAGs
Developing a process model • Process of causal inference is under-studied • To date, very few systematic studies • Ex: Shanks (1995)
Developing a process model • Features of observed data • Slow convergence • Pre-asymptotic “bump” • General considerations • People have memory/computation bounds • Error-correction models (e.g., Rescorla-Wagner; dynamic power PC) work well for simple cases
Bayesian structure learning • Three possible causal structures (hypotheses h+, h0, h−): B → E in all three; C → E generative in h+ (OR), absent in h0, preventive in h− (AND) • Asymptotic prediction: strength rating (wC) ∝ posterior-expected causal strength — computed using Bayesian updating!
Bayesian dynamic learning • When presented with a sequence of data, • After each datapoint, update the structure and parameter probability distributions (in the standard Bayesian manner) • Then use those posteriors as the prior distribution for the next datapoint • Repeat ad infinitum Danks, D., Griffiths, T. L., & Tenenbaum, J. B. 2003. Dynamical causal learning. In Advances in Neural Information Processing Systems 15.
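The sequential scheme above reduces to a simple loop over a discrete hypothesis space; a minimal sketch (the hypothesis space and `likelihood` function are placeholders for the three causal structures and their parameterized likelihoods):

```python
def sequential_bayes(prior, likelihood, data):
    """Sequential Bayesian updating over a discrete hypothesis space.

    prior: dict mapping hypothesis -> probability
    likelihood(h, x): P(x | h) for one datapoint
    After each datapoint the posterior becomes the prior for the next
    datapoint; for i.i.d. data this matches batch updating.
    """
    posterior = dict(prior)
    for x in data:
        for h in posterior:
            posterior[h] *= likelihood(h, x)  # unnormalized Bayes update
        z = sum(posterior.values())
        posterior = {h: p / z for h, p in posterior.items()}  # renormalize
    return posterior
```

In the causal-learning application, each hypothesis is a structure (h+, h0, h−) with its parameters, and the strength rating is read off the resulting posterior after each case.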
Bayesian dynamic learning • Bayesian learning on the Shanks (1995) data (plot omitted: ratings from −50 to 50 over ~40 trials) • Assume effects rarely occur without the occurrence of an observed cause
Side-by-side comparison • Shanks (1995) vs. Bayesian model (plots omitted: ratings from −50 to 50 over ~40 trials)
Bayesian learning as process model • Challenges: • All of the terms in the Bayesian updating equation are quite computationally intensive • Number of hypotheses under consideration, and information needs, grow exponentially with the number of potential causes • No clear way to incorporate inference to unobserved causes
An alternate possibility • Constraint-based structure learning: given a set of independencies, determine the causal Bayes nets that predict exactly those statistical relationships • Range of algorithms for a range of assumptions • Idea: Use associationist models to make the necessary independence judgments Danks, D. 2004. Constraint-based human causal learning. In Proceedings of the 6th International Conference on Cognitive Modeling (ICCM-2004).
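The core idea can be sketched very simply: treat a near-zero (conditional) associative strength as an independence judgment, and hand those judgments to a constraint-based search. The threshold and data format here are illustrative assumptions, not part of the proposal itself.

```python
def independence_judgments(strengths, threshold=0.1):
    """Use learned associative strengths as proxy independence tests:
    a (conditional) strength near zero is read as 'independent'.

    strengths: dict mapping variable pairs to associative strength
               (e.g., from an error-correction learner).
    Returns the set of pairs judged independent, which a constraint-based
    algorithm would then use to prune and orient edges.
    """
    return {pair for pair, v in strengths.items() if abs(v) < threshold}
```

This keeps the psychologically plausible error-correction machinery as the front end, while the structural conclusions come from the constraint-based step.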
An alternate possibility Wellen, S., & Danks, D. (2012). Learning causal structure through local prediction-error learning. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th annual conference of the cognitive science society (pp. 2529-2534). Austin, TX: Cognitive Science Society.
Lingering problem • Positive connection for the first 20 cases, then negative connection
Causal inference summary • Very large literature over past 15 years showing that our causal knowledge (from causal inference) is structured like a causal DAG • And we learn (approx.) the right ones from data • But we aren’t quite sure how we do it • And we do appropriate causal reasoning given that causal knowledge • As long as we’re clear about what the knowledge is!
Causal perception • Paradigmatic case: “launching effect” • Similar perceptions/experiences for other causal events (e.g., “exploding”, “dragging”, etc.) • Including social causal events (e.g., “fleeing”)
Causal perception • Driven by fine-grained spatiotemporal details, including broader context
Causal perception vs. inference • Behavioral evidence that they are different • Both in responses & phenomenology • Neuroimaging evidence that they are different • Different brain regions “light up” in the different types of experiments • Theoretical evidence that they are different • “Best models” of the output representations differ