
Nonparametric Bayes and human cognition

Tom Griffiths, Department of Psychology, Program in Cognitive Science, University of California, Berkeley


Presentation Transcript


  1. Nonparametric Bayes and human cognition. Tom Griffiths, Department of Psychology, Program in Cognitive Science, University of California, Berkeley

  2. Statistics about the mind [diagram labels: hypothesis, data]

  3. Analyzing psychological data
     • Dirichlet process mixture models for capturing individual differences (Navarro, Griffiths, Steyvers, & Lee, 2006)
     • Infinite latent feature models…
       • …for features influencing similarity (Navarro & Griffiths, 2007; 2008)
       • …for features influencing decisions ()

  4. Statistics about the mind [diagram labels: hypothesis, data]; statistics in the mind [diagram labels: hypothesis, data]

  5. Flexible mental representations • Dirichlet

  6. Categorization: how do people represent categories? [figure labels: dog, cat, ?]

  7. Prototypes (Posner & Keele, 1968; Reed, 1972) [figure labels: prototype, cat ×5]

  8. Exemplars: store every instance (exemplar) in memory (Medin & Schaffer, 1978; Nosofsky, 1986) [figure labels: cat ×5]

  9. Something in between (Love et al., 2004; Vanpaemel et al., 2005) [figure labels: cat ×5]

  10. A computational problem
      • Categorization is a classic inductive problem
        • data: stimulus x
        • hypotheses: category c
      • We can apply Bayes’ rule, $P(c \mid x) = \frac{p(x \mid c)\,P(c)}{\sum_{c'} p(x \mid c')\,P(c')}$, and choose c such that P(c|x) is maximized
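The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the two categories, their equal priors, and the Gaussian form of p(x|c) are assumptions made here, not values from the talk.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x, priors, likelihoods):
    """Return P(c | x) for every category c using Bayes' rule."""
    joint = {c: priors[c] * likelihoods[c](x) for c in priors}
    evidence = sum(joint.values())  # the sum over c' in the denominator
    return {c: joint[c] / evidence for c in joint}

# Hypothetical one-dimensional categories (assumed for illustration only).
priors = {"cat": 0.5, "dog": 0.5}
likelihoods = {
    "cat": lambda x: gaussian_pdf(x, mean=0.0, sd=1.0),
    "dog": lambda x: gaussian_pdf(x, mean=3.0, sd=1.0),
}

post = posterior(1.0, priors, likelihoods)
best = max(post, key=post.get)  # category c that maximizes P(c | x)
```

With equal priors, the chosen category is simply the one whose density is highest at the stimulus, which is why the remaining slides focus on how p(x|c) is estimated.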

  11. Density estimation
      • We need to estimate some probability distributions
        • what is P(c)?
        • what is p(x|c)?
      • Two approaches:
        • parametric
        • nonparametric
      • These approaches correspond to prototype and exemplar models respectively (Ashby & Alfonso-Reese, 1995)

  12. Parametric density estimation: assume that p(x|c) has a simple form, characterized by parameters θ (indicating the prototype). [Figure: probability density plotted against x]
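As a minimal sketch of the parametric approach, assume the "simple form" is a one-dimensional Gaussian, so that the parameters are a mean (the prototype) and a standard deviation. The Gaussian choice and the sample values are assumptions for illustration.

```python
import math

def fit_gaussian(data):
    """Maximum-likelihood mean and standard deviation of a 1-D sample."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(variance)

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical stimulus values for one category.
cats = [1.0, 2.0, 3.0, 2.0, 2.0]
prototype, spread = fit_gaussian(cats)

# p(x | c) for a new stimulus is read off the fitted density.
density_near = gaussian_pdf(2.0, prototype, spread)
density_far = gaussian_pdf(4.0, prototype, spread)
```

Only two numbers are stored per category, however many instances were observed, which is the sense in which the fitted mean acts as a prototype.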

  13. Nonparametric density estimation: approximate a probability distribution as a sum of many “kernels” (one per data point). [Figure: estimated function, individual kernels, and true function for n = 10; probability density plotted against x]
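A kernel density estimate of this kind might look as follows. The Gaussian kernel, the bandwidth value, and the data are assumptions for illustration; the slide does not specify them.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def kde(x, data, bandwidth):
    """Average of one Gaussian kernel per stored data point, evaluated at x."""
    return sum(gaussian_pdf(x, xi, bandwidth) for xi in data) / len(data)

# Hypothetical stored exemplars of one category.
exemplars = [1.0, 2.0, 3.0, 2.0, 2.0]
density_near = kde(2.0, exemplars, bandwidth=0.5)
density_far = kde(10.0, exemplars, bandwidth=0.5)
```

Every observation is kept and contributes its own kernel, which mirrors the exemplar view of storing each instance in memory.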

  14. Something in between: use a “mixture” distribution, with more than one data point per component (Rosseel, 2002). [Figure: mixture distribution and mixture components; probability plotted against x]
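One way to see the mixture as an interpolation is to build the density from a partition of the data into clusters. In the sketch below, each cluster contributes a Gaussian centred on its mean and weighted by its size; the shared standard deviation and the example partition are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, clusters, sd):
    """Mixture with one Gaussian component per cluster of data points."""
    n = sum(len(cluster) for cluster in clusters)
    total = 0.0
    for cluster in clusters:
        weight = len(cluster) / n            # mixing proportion
        center = sum(cluster) / len(cluster)  # component mean
        total += weight * gaussian_pdf(x, center, sd)
    return total

data = [0.0, 0.5, 1.0, 4.0, 4.5]
one_cluster = [data]                          # prototype-like: one component
singletons = [[x] for x in data]              # exemplar-like: one per point
two_clusters = [[0.0, 0.5, 1.0], [4.0, 4.5]]  # something in between

p_prototype = mixture_density(2.0, one_cluster, sd=1.0)
p_exemplar = mixture_density(2.0, singletons, sd=1.0)
p_between = mixture_density(2.0, two_clusters, sd=1.0)
```

Putting all points in one cluster recovers the parametric estimate, and giving every point its own cluster recovers the kernel estimate, so the partition alone determines where the representation falls between the two.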

  15. Anderson’s rational model (Anderson, 1990, 1991)
      • Treat category labels like any other feature
      • Define a joint distribution p(x,c) on features using a mixture model, breaking objects into clusters
      • Allow the number of clusters to vary… → a Dirichlet process mixture model (Neal, 1998; Sanborn et al., 2006)
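Letting the number of clusters vary is usually expressed through the Chinese restaurant process, the prior over partitions that a Dirichlet process mixture induces. The sketch below samples from that prior only; it omits the likelihood term that a full model would combine with it, and the name `alpha` for the concentration parameter is an assumption here.

```python
import random

def crp_probabilities(cluster_sizes, alpha):
    """Prior probability of joining each existing cluster or a new one."""
    n = sum(cluster_sizes)
    probs = [size / (n + alpha) for size in cluster_sizes]
    probs.append(alpha / (n + alpha))  # last entry: start a new cluster
    return probs

def sample_crp_partition(n_objects, alpha, rng):
    """Assign objects to clusters one at a time under the CRP prior."""
    sizes = []
    assignments = []
    for _ in range(n_objects):
        probs = crp_probabilities(sizes, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(sizes):
            sizes.append(1)   # a previously unused cluster
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments, sizes

rng = random.Random(0)
assignments, sizes = sample_crp_partition(20, alpha=1.0, rng=rng)
example_probs = crp_probabilities([3, 1], alpha=1.0)
```

Larger clusters attract new objects in proportion to their size, while `alpha` controls how readily a new cluster is opened, so the number of clusters grows with the data instead of being fixed in advance.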

  16. A unifying rational model
      • Density estimation is a unifying framework
        • a way of viewing models of categorization
      • We can go beyond this to define a unifying model
        • one model, of which all others are special cases
      • Learners can adopt different representations by adaptively selecting between these cases
      • Basic tool: two interacting levels of clusters
        • results from the hierarchical Dirichlet process (Teh, Jordan, Beal, & Blei, 2004)

  17. The hierarchical Dirichlet process
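The slide gives no construction, so what follows is a hedged sketch of one standard way to describe the hierarchical Dirichlet process prior, the "Chinese restaurant franchise": each category groups its objects into local tables, and each new table selects a global cluster that may already be in use by other categories. The parameter names `alpha` (local level) and `gamma` (global level), the function name, and the category sizes are assumptions for illustration, and no likelihood is included.

```python
import random

def sample_franchise(category_sizes, alpha, gamma, rng):
    """Sample global cluster labels for the objects of each category."""
    tables_per_cluster = []  # number of tables, over all categories, per global cluster
    labels_by_category = {}
    for category, n_objects in category_sizes.items():
        table_sizes = []     # objects seated at each local table
        table_cluster = []   # global cluster attached to each local table
        labels = []
        for _ in range(n_objects):
            # Local level: join an existing table or open a new one.
            weights = table_sizes + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # Global level: the new table reuses or creates a cluster.
                g_weights = tables_per_cluster + [gamma]
                k = rng.choices(range(len(g_weights)), weights=g_weights)[0]
                if k == len(tables_per_cluster):
                    tables_per_cluster.append(0)
                tables_per_cluster[k] += 1
                table_sizes.append(1)
                table_cluster.append(k)
            else:
                table_sizes[t] += 1
            labels.append(table_cluster[t])
        labels_by_category[category] = labels
    return labels_by_category, tables_per_cluster

rng = random.Random(1)
labels_by_category, tables_per_cluster = sample_franchise(
    {"A": 7, "B": 7}, alpha=1.0, gamma=1.0, rng=rng
)
```

The two choices made for every object are the "two interacting levels of clusters": the local level decides how finely a category is divided, and the global level decides whether clusters are reused across categories.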

  18. A unifying rational model [figure labels: prototype, Anderson, exemplar, cluster, exemplar, category]

  19. HDP_{+,∞} and Smith & Minda (1998)
      • HDP_{+,∞} will automatically infer a representation using exemplars, prototypes, or something in between (with α being learned from the data)
      • Test on Smith & Minda (1998, Experiment 2)
        Category A: 000000 100000 010000 001000 000010 000001 111101
        Category B: 111111 011111 101111 110111 111011 111110 000100
        [figure label: exceptions]
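The "exceptions" label on this slide can be made concrete with a short check. The sketch assumes the fourteen stimuli split into the first seven for Category A and the last seven for Category B, in the order listed, and it treats 000000 and 111111 as the category prototypes; both are readings of the slide, not statements from it.

```python
category_a = ["000000", "100000", "010000", "001000", "000010", "000001", "111101"]
category_b = ["111111", "011111", "101111", "110111", "111011", "111110", "000100"]

def hamming(s, t):
    """Number of feature positions on which two stimuli differ."""
    return sum(a != b for a, b in zip(s, t))

def exceptions(members, own_prototype, other_prototype):
    """Members that lie closer to the other category's prototype."""
    return [
        s for s in members
        if hamming(s, other_prototype) < hamming(s, own_prototype)
    ]

exceptions_a = exceptions(category_a, "000000", "111111")
exceptions_b = exceptions(category_b, "111111", "000000")
```

Each category contains exactly one such item, which a pure prototype representation would place in the wrong category, so these stimuli help separate prototype-like from exemplar-like learning.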

  20. HDP_{+,∞} and Smith & Minda (1998) [Figure: probability of A and log-likelihood for the exemplar, prototype, and HDP models]

  21. The promise of HDP_{+,+}
      • In HDP_{+,+}, clusters are shared between categories
        • a property of hierarchical Bayesian models
      • Learning one category has a direct effect on the prior on probability densities for the next category

  22. Learning the features of objects (Austerweil & Griffiths, submitted)
      • Most models of human cognition assume objects are represented in terms of abstract features
      • What are the features of this object?
      • What determines what features we identify?

  23. Binary matrix factorization

  24. Binary matrix factorization: how should we infer the number of features?

  25. The nonparametric approach: assume that the total number of features is unbounded, but only a finite number will be expressed in any finite dataset → use the Indian buffet process as a prior on Z (Griffiths & Ghahramani, 2006)
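The Indian buffet process can be sampled with its usual sequential scheme: object i takes each existing feature k with probability m_k / i, where m_k counts earlier objects that have it, and then adds a Poisson(alpha / i) number of new features. The sketch below assumes Z is an objects-by-features binary matrix; the parameter name `alpha` and the helper functions are illustrative, not taken from the talk.

```python
import math
import random

def sample_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_ibp(n_objects, alpha, rng):
    """Sample a binary object-by-feature matrix Z from the IBP prior."""
    feature_counts = []  # m_k: how many objects so far possess feature k
    rows = []
    for i in range(1, n_objects + 1):
        # Existing features are taken in proportion to their popularity.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        # New features, possessed so far only by object i.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        feature_counts.extend([1] * n_new)
        rows.append(row)
    width = len(feature_counts)
    # Pad earlier rows with zeros for features introduced later.
    return [row + [0] * (width - len(row)) for row in rows]

rng = random.Random(0)
Z = sample_ibp(10, alpha=2.0, rng=rng)
```

The number of columns of Z is not fixed in advance: it is finite for any finite number of objects but can keep growing as objects are added, which is the property the slide describes.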

  26. (Austerweil & Griffiths, submitted)

  27. An experiment… [figure labels: Training, Testing, Seen, Correlated, Unseen, Factorial, Shuffled] (Austerweil & Griffiths, submitted)

  28. Results (Austerweil & Griffiths, submitted)

  29. Conclusions
      • Approaching cognitive problems as computational problems allows cognitive science and machine learning to be mutually informative
      • Machine

  30. Credits. Categorization: Adam Sanborn, Kevin Canini, Dan Navarro. Learning features: Joe Austerweil. MCMC with people: Adam Sanborn. Computational Cognitive Science Lab, http://cocosci.berkeley.edu/
