Bayesian Models of Human Learning and Inference Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences
Shiffrin Says “Progress in science is driven by new tools, not great insights.”
Outline • Part I. Brief survey of Bayesian modeling in cognitive science. • Part II. Bayesian models of everyday inductive leaps.
Collaborators Tom Griffiths Neville Sanjana Charles Kemp Mark Steyvers Tevye Krynski Sean Stromsten Sourabh Niyogi Fei Xu Dave Sobel Wheeler Ruml Alison Gopnik
Outline • Part I. Brief survey of Bayesian modeling in cognitive science. • Rational benchmark for descriptive models of probability judgment. • Rational analysis of cognition • Rational tools for fitting cognitive models
Normative benchmark for descriptive models • How does human probability judgment compare to the Bayesian ideal? • Peterson & Beach, Edwards, Tversky & Kahneman, . . . . • Explicit probability judgment tasks • Drawing balls from an urn, rolling dice, medical diagnosis, . . . . • Alternative descriptive models • Heuristics and Biases, Support Theory, . . . .
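For concreteness, a worked example of the Bayesian ideal in a medical-diagnosis task (the numbers are hypothetical, chosen only to illustrate the computation): suppose a disease has a base rate of 1%, the test detects it with probability 0.8, and it gives false positives with probability 0.1. Bayes' rule then gives

$$ P(\text{disease} \mid \text{positive}) = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.1 \times 0.99} \approx 0.075 $$

People's judgments in tasks like this are typically far higher (base-rate neglect), which is the kind of discrepancy the descriptive models above aim to capture.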
Rational analysis of cognition • Develop Bayesian models for core aspects of cognition not traditionally thought of in terms of statistical inference. • Examples: • Memory retrieval: Anderson; Shiffrin et al, . . . . • Reasoning with rules: Oaksford & Chater, . . . .
Rational analysis of cognition • Often can explain a wider range of phenomena than previous models, with fewer free parameters. (Figures: spacing effects on retention; power laws of practice and retention.)
Rational analysis of cognition • Often can explain a wider range of phenomena than previous models, with fewer free parameters. • Anderson’s rational analysis of memory: • For each item in memory, estimate the probability that it will be useful in the present context. • Model of need probability inspired by library book access. Corresponds to statistics of natural information sources:
(Figure: log need odds as a function of log days since last occurrence, for short-lag and long-lag items.)
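A minimal sketch of the idea behind that figure, assuming (as in Anderson & Schooler's analyses) that need odds fall off as a power function of time since last use; the decay exponents below are made up for illustration:

```python
import numpy as np

# Hypothetical power-law "need" function: the odds that a memory will be
# needed today, as a function of days since it was last used.  The
# qualitative claim is that need odds follow a power law, so log need
# odds is linear in log time.
def log_need_odds(days_since_last_use, decay):
    return -decay * np.log(days_since_last_use)

days = np.arange(1, 101)
short_lag = log_need_odds(days, decay=1.0)   # massed presentations: faster decay
long_lag = log_need_odds(days, decay=0.5)    # spaced presentations: slower decay

# Both are straight lines in log-log coordinates (slope = -decay).
print(np.polyfit(np.log(days), short_lag, 1))
print(np.polyfit(np.log(days), long_lag, 1))
```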
Rational analysis of cognition • Often can show that apparently irrational behavior is actually rational. Which cards do you have to turn over to test this rule? “If there is an A on one side, then there is a 2 on the other side”
Rational analysis of cognition • Often can show that apparently irrational behavior is actually rational. • Oaksford & Chater’s rational analysis: • Optimal data selection based on maximizing expected information gain. • Test the rule “If p, then q” against the null hypothesis that p and q are independent. • Assuming p and q are rare predicts people’s choices:
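A minimal sketch of the core of this analysis (the rarity values P(p) = 0.1 and P(q) = 0.2, and the exact form of the dependence model, are assumptions for illustration, not Oaksford & Chater's reported parameters):

```python
import numpy as np

def entropy(probs):
    probs = np.asarray(probs)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

# Rarity assumption: p and q are both rare.
P_p, P_q = 0.1, 0.2

# Two models of the rule "if p then q":
#   MD (dependence): P(q | p) = 1, with P(q | not-p) chosen to preserve the marginal P(q).
#   MI (independence): q is independent of p.
P_q_given_p    = {"MD": 1.0,                         "MI": P_q}
P_q_given_notp = {"MD": (P_q - P_p) / (1 - P_p),     "MI": P_q}
P_p_given_q    = {"MD": P_p / P_q,                   "MI": P_p}
P_p_given_notq = {"MD": 0.0,                         "MI": P_p}

prior = {"MD": 0.5, "MI": 0.5}

def expected_information_gain(likelihoods):
    """likelihoods[m] = P(hidden side is q, or p | visible side, model m)."""
    h_prior = entropy(list(prior.values()))
    eig = 0.0
    for outcome in (True, False):   # hidden side "matches" vs. does not
        p_outcome = sum(prior[m] * (likelihoods[m] if outcome else 1 - likelihoods[m])
                        for m in prior)
        if p_outcome == 0:
            continue
        posterior = [prior[m] * (likelihoods[m] if outcome else 1 - likelihoods[m]) / p_outcome
                     for m in prior]
        eig += p_outcome * (h_prior - entropy(posterior))
    return eig

for card, lik in [("p (the A card)", P_q_given_p),
                  ("not-p (another letter)", P_q_given_notp),
                  ("q (the 2 card)", P_p_given_q),
                  ("not-q (another number)", P_p_given_notq)]:
    print(card, round(expected_information_gain(lik), 4))
# Under rarity, the p and q cards carry the most expected information gain,
# matching the cards people most often choose to turn over.
```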
Rational tools for fitting cognitive models • Use Bayesian Occam’s Razor to solve the problem of model selection: trade off fit to the data with model complexity. • Examples: • Comparing alternative cognitive models: Myung, Pitt, . . . . • Fitting nested families of models of mental representation: Lee, Navarro, . . . .
Rational tools for fitting cognitive models • Comparing alternative cognitive models via an MDL approximation to the Bayesian Occam’s Razor takes into account the functional form of a model as well as the number of free parameters.
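One common form of this approximation (the Fisher-information MDL criterion associated with Rissanen and used in this literature; stated here as background rather than taken from the talk) scores a model with k free parameters fit to n data points by

$$ -\ln p(D \mid \hat{\theta}) \;+\; \frac{k}{2}\ln\frac{n}{2\pi} \;+\; \ln \int \sqrt{\det I(\theta)}\, d\theta $$

where I(θ) is the Fisher information matrix; the final integral is what makes the criterion sensitive to a model's functional form and not just its parameter count.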
Rational tools for fitting cognitive models • Fit models of mental representation to similarity data, e.g. additive clustering, additive trees, common and distinctive feature models. • Want to choose the complexity of the model (number of features, depth of tree) in a principled way, and search efficiently through the space of nested models. Using Bayesian Occam’s Razor:
Outline • Part I. Brief survey of Bayesian modeling in cognitive science. • Part II. Bayesian models of everyday inductive leaps: rational models of cognition in which Bayesian model selection and the Bayesian Occam's Razor play a central explanatory role.
Everyday inductive leaps How can we learn so much about . . . • Properties of natural kinds • Meanings of words • Future outcomes of a dynamic process • Hidden causal properties of an object • Causes of a person’s action (beliefs, goals) • Causal laws governing a domain . . . from such limited data?
Learning concepts and words (Figure: example objects, three of them labeled “tufa”.) Can you pick out the tufas?
Inductive reasoning Input (premises): Cows can get Hick’s disease. Gorillas can get Hick’s disease. (conclusion): All mammals can get Hick’s disease. Task: Judge how likely the conclusion is to be true, given that the premises are true.
Inferring causal relations
Input:
          Took vitamin B23   Headache
  Day 1        yes             no
  Day 2        yes             yes
  Day 3        no              yes
  Day 4        yes             no
  . . .        . . .           . . .
“Does vitamin B23 cause headaches?”
Task: Judge the probability of a causal link given several joint observations.
The Challenge • How do we generalize successfully from very limited data? • Just one or a few examples • Often only positive examples • Philosophy: • Induction is a “problem”, a “riddle”, a “paradox”, a “scandal”, or a “myth”. • Machine learning and statistics: • Focus on generalization from many examples, both positive and negative.
Rational statistical inference (Bayes, Laplace):

$$ p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')} $$

Posterior probability = likelihood × prior probability, normalized over the hypothesis space H.
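A minimal sketch of this computation over a discrete hypothesis space (the hypotheses and numbers are illustrative, not from the talk):

```python
import numpy as np

# Hypothetical hypothesis space: three candidate hypotheses, with priors and
# likelihoods for a single observed datum d.
hypotheses = ["h1", "h2", "h3"]
prior = np.array([0.5, 0.3, 0.2])        # p(h)
likelihood = np.array([0.1, 0.4, 0.05])  # p(d | h)

# Bayes' rule: posterior is proportional to likelihood times prior,
# normalized over the whole hypothesis space.
posterior = likelihood * prior / np.sum(likelihood * prior)
print(dict(zip(hypotheses, posterior.round(3))))
```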
History of Bayesian Approaches to Human Inductive Learning • Hunt
History of Bayesian Approaches to Human Inductive Learning • Hunt • Suppes • “Observable changes of hypotheses under positive reinforcement”, Science (1965), w/ M. Schlag-Rey. “A tentative interpretation is that, when the set of hypotheses is large, the subject ‘samples’ or attends to several hypotheses simultaneously. . . . It is also conceivable that a subject might sample spontaneously, at any time, or under stimulations other than those planned by the experimenter. A more detailed exploration of these ideas, including a test of Bayesian approaches to information processing, is now being made.”
History of Bayesian Approaches to Human Inductive Learning • Hunt • Suppes • Shepard • Analysis of one-shot stimulus generalization, to explain the universal exponential law. • Anderson • Rational analysis of categorization.
Theory-Based Bayesian Models • Explain the success of everyday inductive leaps based on rational statistical inference mechanisms constrained by domain theories well-matched to the structure of the world. • Rational statistical inference (Bayes): • Domain theories generate the necessary ingredients: hypothesis space H, priors p(h).
Questions about theories • What is a theory? • Working definition: an ontology and a system of abstract (causal) principles that generates a hypothesis space of candidate world structures (e.g., Newton’s laws). • How is a theory used to learn about the structure of the world? • How is a theory acquired? • Probabilistic generative model → statistical learning.
Alternative approaches to inductive generalization • Associative learning • Connectionist networks • Similarity to examples • Toolkit of simple heuristics • Constraint satisfaction
Marr’s Three Levels of Analysis • Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” • Representation and algorithm: Cognitive psychology • Implementation: Neurobiology
Descriptive Goals • Principled mathematical models, with a minimum of arbitrary assumptions. • Close quantitative fits to behavioral data. • Unified models of cognition across domains.
Explanatory Goals • How do we reliably acquire knowledge about the structure of the world, from such limited experience? • Which processing models work, and why? • New views on classic questions in cognitive science: • Symbols (rules, logic, hierarchies, relations) versus Statistics. • Theory-based inference versus Similarity-based inference. • Domain-specific knowledge versus Domain-general mechanisms. • Provides a route to studying people’s hidden (implicit or unconscious) knowledge about the world.
The plan • Basic causal learning • Inferring number concepts • Reasoning with biological properties • Acquisition of domain theories • Intuitive biology: Taxonomic structure • Intuitive physics: Causal law
Learning a single causal relation
Given a random sample of mice:
                      Injected with X   Not injected with X
  Expressed Y               30                  45
  Did not express Y         30                  15
• “To what extent does chemical X cause gene Y to be expressed?”
• Or, “What is the probability that X causes Y?”
Associative models of causal strength judgment
Contingency counts:
                          c+ (injected with X)   c− (not injected with X)
  e+ (expressed Y)                 a                        c
  e− (did not express Y)           b                        d
• Delta-P (or asymptotic Rescorla-Wagner): ΔP = P(e+ | c+) − P(e+ | c−) = a/(a+b) − c/(c+d)
• Power PC (Cheng, 1997): causal power = ΔP / (1 − P(e+ | c−))
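A minimal sketch of these two estimators, written as a function of the contingency counts (the variable names and example counts are mine):

```python
def delta_p_and_power(a, b, c, d):
    """a, b: effect present/absent with the cause; c, d: effect present/absent without it."""
    p_e_given_c = a / (a + b)        # P(e+ | c+)
    p_e_given_notc = c / (c + d)     # P(e+ | c-)
    delta_p = p_e_given_c - p_e_given_notc
    # Generative causal power (Cheng, 1997); undefined when P(e+ | c-) = 1.
    power = delta_p / (1 - p_e_given_notc)
    return delta_p, power

print(delta_p_and_power(6, 2, 2, 6))  # hypothetical counts -> (0.5, 0.667)
```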
Some behavioral data (Buehner & Cheng, 1997)
(Figure: people’s judgments compared with the predictions of ΔP and Power PC, across conditions with ΔP = 0, 0.25, 0.5, 0.75, 1.)
• Independent effects of both causal power and ΔP.
• Neither theory explains the trend for ΔP = 0.
Bayesian causal inference
• Hypotheses:
  h1: B and C are both parents of E, with strength parameters w0, w1.
  h0: only B is a parent of E, with strength parameter w0 (C has no link to E).
• The background cause B is unobserved and always present (B = 1).
• Probabilistic model: “noisy-OR”. P(e+ | C, B):
    C   B     h1                   h0
    0   0     0                    0
    1   0     w1                   0
    0   1     w0                   w0
    1   1     w1 + w0 − w1·w0      w0
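Equivalently, with B fixed at 1, the table above can be written compactly (a standard form of the noisy-OR parameterization, supplied here for reference):

$$ h_1:\; P(e^+ \mid c) = 1 - (1 - w_0)(1 - w_1)^{c}, \qquad h_0:\; P(e^+ \mid c) = w_0, \qquad c \in \{0, 1\} $$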
Inferring structure versus estimating strength • Hypotheses: h1 (B and C both cause E) vs. h0 (only B causes E). • Both causal power and ΔP correspond to maximum likelihood estimates of the strength parameter w1, under different parameterizations of P(E | B, C): linear for ΔP, noisy-OR for causal power. • Causal support model: people are judging the probability that a causal link exists, rather than assuming it exists and estimating its strength.
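Judging whether a link exists then amounts to Bayesian model comparison between h1 and h0. A standard way to write the quantity being computed, stated here from the causal support model of Griffiths & Tenenbaum rather than directly from the slide:

$$ \text{support} = \log \frac{P(D \mid h_1)}{P(D \mid h_0)}, \quad P(D \mid h_1) = \int_0^1\!\!\int_0^1 P(D \mid w_0, w_1)\, p(w_0, w_1 \mid h_1)\, dw_0\, dw_1, \quad P(D \mid h_0) = \int_0^1 P(D \mid w_0)\, p(w_0 \mid h_0)\, dw_0 $$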
Role of domain theory (c.f. PRMs, ILP, Knowledge-based model construction) Generates hypothesis space of causal graphical models: • Causally relevant attributes of objects: • Constrains random variables (nodes). • Causally relevant relations between attributes: • Constrains dependence structure of variables (arcs). • Causal mechanisms – how effects depend functionally on their causes: • Constrains local probability distribution for each variable conditioned on its direct causes (parents).
Role of domain theory • Injections may or may not cause gene expression, but gene expression does not cause injections. → No hypotheses with E → C. • Other naturally occurring processes may also cause gene expression. → All hypotheses include an always-present background cause B. • Causes are probabilistically sufficient and independent (Cheng): each cause independently produces the effect in some proportion of cases. → “Noisy-OR” causal mechanism.
Bayesian causal inference • Hypotheses: h1 (B and C both cause E) vs. h0 (only B causes E). • Likelihood: noisy-OR. • Assume all priors uniform. . . .
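A minimal numerical sketch of the resulting computation under these assumptions (uniform priors, noisy-OR likelihood, grid integration over the strength parameters); the grid resolution and example counts are mine:

```python
import numpy as np

def noisy_or_loglik(counts, w0, w1):
    """Log-likelihood of contingency counts under a noisy-OR parameterization.
    counts = (a, b, c, d): effect present/absent with C, effect present/absent without C."""
    a, b, c, d = counts
    p_e_c1 = w0 + w1 - w0 * w1   # P(e+ | c+), background B always present
    p_e_c0 = w0                  # P(e+ | c-)
    eps = 1e-12
    return (a * np.log(p_e_c1 + eps) + b * np.log(1 - p_e_c1 + eps)
            + c * np.log(p_e_c0 + eps) + d * np.log(1 - p_e_c0 + eps))

def causal_support(counts, grid_size=101):
    """log P(D | h1) - log P(D | h0), with uniform priors on w0, w1 in [0, 1]."""
    w = np.linspace(0.0, 1.0, grid_size)
    w0, w1 = np.meshgrid(w, w)
    # Marginal likelihood of h1: average over both strength parameters.
    lik_h1 = np.exp(noisy_or_loglik(counts, w0, w1)).mean()
    # Marginal likelihood of h0: w1 fixed at 0, average over w0 only.
    lik_h0 = np.exp(noisy_or_loglik(counts, w, 0.0)).mean()
    return np.log(lik_h1) - np.log(lik_h0)

# Hypothetical data: 6/8 mice express Y when injected, 2/8 when not injected.
print(causal_support((6, 2, 2, 6)))   # positive here: evidence favors a C -> E link
```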
Bayesian Occam’s Razor
(Figure: P(data | model) for h0 and for h1 with low vs. high w1, plotted over all possible data sets, ordered by increasing ΔP.)
(Figure: Buehner & Cheng, 1997 data again: people’s judgments compared with the predictions of ΔP, Power PC, and the Bayesian causal support model, across conditions with ΔP = 0, 0.25, 0.5, 0.75, 1.)