Bayesian models of inductive learning and reasoning Josh Tenenbaum MIT

Explore how people learn about the world with limited evidence through everyday inductive leaps. This includes learning concepts, object properties, word meanings, cause-effect relations, beliefs of others, and social structures. This research aims to answer questions about how background knowledge guides learning, what form background knowledge takes, and how it is acquired.


Presentation Transcript


  1. Bayesian models of inductive learning and reasoning Josh Tenenbaum MIT Department of Brain and Cognitive Sciences Computer Science and AI Lab (CSAIL)

  2. Collaborators Charles Kemp Noah Goodman Chris Baker Tom Griffiths Amy Perfors Vikash Mansinghka Lauren Schmidt Pat Shafto

  3. Everyday inductive leaps How can people learn so much about the world from such limited evidence? • Learning concepts from examples “horse” “horse” “horse”

  4. “tufa” “tufa” “tufa” Learning concepts from examples

  5. Everyday inductive leaps How can people learn so much about the world from such limited evidence? • Kinds of objects and their properties • The meanings of words, phrases, and sentences • Cause-effect relations • The beliefs, goals and plans of other people • Social structures, conventions, and rules

  6. The solution Prior knowledge (inductive bias).

  7. The solution Prior knowledge (inductive bias). • How does background knowledge guide learning from sparsely observed data? • What form does background knowledge take, across different domains and tasks? • How is background knowledge itself acquired? The challenge: Can we answer these questions in precise computational terms?

  8. Modeling goals • Principled quantitative models of human inductive inferences, with broad coverage and a minimum of free parameters and ad hoc assumptions. • An understanding of how and why human learning and reasoning works, as a species of rational (approximately optimal) statistical inference given the structure of natural environments. • A two-way bridge to artificial intelligence and machine learning.

  9. Bayesian inference • Bayes’ rule: P(h|d) = P(d|h) P(h) / Σ_h′ P(d|h′) P(h′) • An example • Data d: John is coughing • Some hypotheses h: (1) John has a cold; (2) John has lung cancer; (3) John has a stomach flu • Likelihood P(d|h) favors 1 and 2 over 3 • Prior probability P(h) favors 1 and 3 over 2 • Posterior probability P(h|d) favors 1 over 2 and 3
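
The coughing example can be worked through numerically. The sketch below is illustrative only: the slide gives no numbers, so every prior and likelihood value here is an assumption, chosen so that the likelihood favors hypotheses 1 and 2 and the prior favors hypotheses 1 and 3.

```python
# Hypothetical numbers for illustration only; the slide gives no values.
priors = {"cold": 0.60, "lung cancer": 0.01, "stomach flu": 0.39}        # P(h)
likelihoods = {"cold": 0.80, "lung cancer": 0.90, "stomach flu": 0.05}   # P(coughing | h)


def posterior(priors, likelihoods):
    """Return P(h | d) for every hypothesis h via Bayes' rule."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(d), the normalizing constant
    return {h: score / evidence for h, score in unnormalized.items()}


post = posterior(priors, likelihoods)
# Print hypotheses from most to least probable after seeing the data.
for h, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.3f}")
```

With these assumed values only "cold" scores well on both the prior and the likelihood, so it dominates the posterior, which is the qualitative pattern the slide describes.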

  10. The Bayesian modeling toolkit 1. How does background knowledge guide learning from sparsely observed data? Bayesian inference. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. 3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data.

  11. A case study: learning about objects and their properties • “Property induction”, “category-based induction” (Rips, 1975; Osherson, Smith et al., 1990) • “Similarity”, “Typicality”, “Diversity” • Argument: Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore: Flies have T9 hormones. • Argument: Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore: Horses have T9 hormones. • Argument: Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones. Therefore: Horses have T9 hormones.

  12. Experiments on property induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990) • 20 subjects rated the strength of 45 arguments: X1 have property P. (e.g., Cows have T4 hormones.) X2 have property P. X3 have property P. All mammals have property P. [General argument] • 20 subjects rated the strength of 36 arguments: X1 have property P. X2 have property P. Horses have property P. [Specific argument]

  13. Property induction as a computational problem • [Figure: species-by-feature matrix for Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, with a “New property” column containing unknown entries (?)] • Features: 85 features for 50 animals (Osherson & Wilkie feature rating task), e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…

  14. Similarity-based models • Arguments of the form: X1 have property P. X2 have property P. X3 have property P. All mammals have property P. • [Figure: data plotted against model; each plotted marker represents one argument]

  15. Beyond similarity in induction • Reasoning based on dimensional thresholds (Smith et al., 1993): Poodles can bite through wire. Therefore: German shepherds can bite through wire. Compare: Dobermans can bite through wire. Therefore: German shepherds can bite through wire. • Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003): Salmon carry E. Spirus bacteria. Therefore: Grizzly bears carry E. Spirus bacteria. Compare: Grizzly bears carry E. Spirus bacteria. Therefore: Salmon carry E. Spirus bacteria.

  16. The Bayesian modeling toolkit 1. How does background knowledge guide learning from sparsely observed data? Bayesian inference. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. 3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data.

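  17. Model overview • F: form (tree with species at leaf nodes), with prior P(form) • S: structure (a tree over mouse, squirrel, chimp, gorilla), with P(structure | form) • D: data (features F1, F2, F3, F4, … and the new property “Has T9 hormones” for mouse, squirrel, chimp, gorilla, with unknown entries marked ?), with P(data | structure)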
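
Read as a generative hierarchy, the three terms named on this slide combine into a single posterior over structures and forms. The expression below is a sketch of that reading, assuming the data depend on the form only through the structure, as the layered diagram suggests:

```latex
P(S, F \mid D) \;\propto\; P(D \mid S)\, P(S \mid F)\, P(F)
```

Here F, S and D are the slide's form, structure and data; the proportionality hides the normalizing constant P(D).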

  18. Model overview • F: form (tree with species at leaf nodes) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data (features F1, F2, F3, F4, … and the new property “Has T9 hormones” for mouse, squirrel, chimp, gorilla, with unknown entries marked ?)

  19. Premises X: Horses have T9 hormones. Rhinos have T9 hormones. • Conclusion Y: Cows have T9 hormones. • Hypotheses h: candidate extensions of the property over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, ... • Prior P(h)

  20. Premises X: Horses have T9 hormones. Rhinos have T9 hormones. • Conclusion Y: Cows have T9 hormones. • Hypotheses h: candidate extensions of the property over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, ... • Prior P(h) • Prediction P(Y | X)
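
One common way to compute a prediction P(Y | X) from a prior over hypotheses is hypothesis averaging: sum the prior mass of hypotheses that contain both the premises and the conclusion, and divide by the mass of those that contain the premises. The sketch below assumes exactly that, with a deterministic likelihood (a hypothesis is consistent with X only if it includes every premise species). The four-species domain and the `toy_prior` weights are made up for illustration; the talk derives its prior from a structured model instead.

```python
from itertools import combinations

species = ["horse", "cow", "rhino", "chimp"]


def toy_prior(h):
    """Hypothetical unnormalized weight for a candidate extension h."""
    ungulates = {"horse", "cow", "rhino"}
    # Assumption: extensions confined to the ungulates get extra weight.
    return 3.0 if h <= ungulates else 1.0


# Hypothesis space: every non-empty subset of species.
hypotheses = [frozenset(c) for r in range(1, len(species) + 1)
              for c in combinations(species, r)]
total = sum(toy_prior(h) for h in hypotheses)
prior = {h: toy_prior(h) / total for h in hypotheses}


def predict(premises, conclusion):
    """P(Y | X) by averaging over hypotheses consistent with the premises."""
    x = frozenset(premises)
    consistent = [h for h in hypotheses if x <= h]
    numerator = sum(prior[h] for h in consistent if conclusion in h)
    denominator = sum(prior[h] for h in consistent)
    return numerator / denominator


p_cow = predict(["horse", "rhino"], "cow")
p_chimp = predict(["horse", "rhino"], "chimp")
print(p_cow, p_chimp)
```

Under these assumed weights the argument from horses and rhinos to cows comes out stronger than the argument to chimps, because more prior mass sits on extensions that stay within the ungulates.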

  21. Where does the prior come from? Horse Cow Chimp Gorilla Mouse Squirrel Dolphin Seal Rhino Elephant ... ... Prior P(h) Why not just enumerate all logically possible hypotheses along with their relative prior probabilities?

  22. Knowledge-based priors • Taxonomic similarity: Chimps have T9 hormones. Gorillas have T9 hormones. • Jaw strength: Poodles can bite through wire. Dobermans can bite through wire. • Food web relations: Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria.

  23. Model overview • F: form (tree with species at leaf nodes) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data (features F1, F2, F3, F4, … and the new property “Has T9 hormones” for mouse, squirrel, chimp, gorilla, with unknown entries marked ?)

  24. P(D|S): How the structure constrains the data of experience • Define a stochastic process over structure S that generates candidate property extensions h. • Intuition: properties should vary smoothly over structure. Smooth: P(h) high Not smooth: P(h) low

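  25. P(D|S): How the structure constrains the data of experience • d_ij = length of the edge between i and j in structure S (= ∞ if i and j are not connected) • A Gaussian prior y ~ N(0, Σ), with Σ defined from the edge lengths d_ij (Zhu, Lafferty & Ghahramani, 2003) • [Figure: structure S, continuous function y, hypothesis h]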
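
To make the smoothness idea concrete, here is a minimal NumPy sketch of a Gaussian prior whose covariance comes from a graph. The transcript does not preserve the slide's formula, so the specific construction is an assumption in the spirit of graph-Laplacian priors: edge weights w_ij = 1/d_ij, precision matrix equal to the Laplacian plus I/sigma^2. The tree, the edge lengths, `sigma`, and the final thresholding step are all hypothetical.

```python
import numpy as np

# Hypothetical tree: leaves mouse(0), squirrel(1) hang off internal node 4;
# leaves chimp(2), gorilla(3) hang off internal node 5; nodes 4 and 5 are linked.
names = ["mouse", "squirrel", "chimp", "gorilla"]
edges = {(0, 4): 1.0, (1, 4): 1.0, (2, 5): 1.0, (3, 5): 1.0, (4, 5): 2.0}  # d_ij
n = 6
sigma = 5.0  # assumed regularizer; not given in the transcript

# Edge weight w_ij = 1 / d_ij; unconnected pairs (d_ij = infinity) get weight 0.
W = np.zeros((n, n))
for (i, j), d in edges.items():
    W[i, j] = W[j, i] = 1.0 / d

laplacian = np.diag(W.sum(axis=1)) - W
# Assumed form: precision = Laplacian + I / sigma^2; covariance is its inverse.
cov = np.linalg.inv(laplacian + np.eye(n) / sigma**2)

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(n), cov)  # one function that varies smoothly over the tree
h = y[:4] > 0                                  # assumed thresholding into a binary property

# Covariance among the four leaf species only.
print(np.round(cov[:4, :4], 2))
```

In this construction mouse and squirrel, which share a parent, covary more strongly than mouse and chimp, so sampled properties tend to be shared by nearby species. That is the "smooth: P(h) high" intuition from the preceding slide.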

  26. Structure S Data D Species 1 Species 2 Species 3 Species 4 Species 5 Species 6 Species 7 Species 8 Species 9 Species 10 Features 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…

  27. Modeling feature covariance based on distance in graph (Zhu et al., 2003; cf. Sattath & Tversky, 1977)

  28. Modeling feature covariance based on distance in two-dimensional space (Lawrence, 2004; Smola & Kondor, 2003; cf. Shepard, 1987)

  29. Structure S Data D Species 1 Species 2 Species 3 Species 4 Species 5 Species 6 Species 7 Species 8 Species 9 Species 10 ? ? ? ? ? ? ? ? Features New property 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…

  30. Cows have property P. Elephants have property P. Horses have property P. Tree 2D Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.

  31. Testing different priors • Inductive bias: correct bias, wrong bias, too weak bias, too strong bias

  32. Spatially varying properties • Geographic inference task: “Given that a certain kind of Native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?” • Models compared: 2D, Tree

  33. Property type and theory (structure + process) • “has T9 hormones”: taxonomic tree + diffusion process • “can bite through wire”: directed chain + drift process • “carry E. Spirus bacteria”: directed network + noisy transmission • [Figure: example structures and hypotheses over Classes A to G for each theory]

  34. Biological property vs. disease property • Models: Tree, Web • Species: Sand shark, Mako shark, Human, Herring, Tuna, Kelp, Dolphin • “Given that A has property P, how likely is it that B does?” • Biological property: e.g., P = “has X cells” • Disease property: e.g., P = “has X disease”

  35. Summary so far • A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations • Qualitatively different priors are appropriate for different domains of property induction. • In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors. • A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph. • Remaining question: How can we learn appropriate structures for different domains?

  36. chimp mouse gorilla squirrel squirrel chimp gorilla mouse Model overview F: form Chain Tree Space mouse squirrel S: structure gorilla chimp F1 F2 F3 F4 D: data mouse squirrel chimp gorilla

  37. Snake Turtle Crocodile Robin Ostrich Bat Orangutan Discovering structural forms Snake Turtle Bat Crocodile Robin Orangutan Ostrich Ostrich Robin Crocodile Snake Turtle Bat Orangutan

  38. Snake Turtle Crocodile Robin Ostrich Bat Orangutan Discovering structural forms “Great chain of being” Snake Turtle Bat Crocodile Robin Plant Rock Angel Orangutan Ostrich God Linnaeus Ostrich Robin Crocodile Snake Turtle Bat Orangutan

  39. People can discover structural forms • Scientific discoveries: tree structure for biological species; periodic structure for chemical elements • Children’s cognitive development: hierarchical structure of category labels; clique structure of social groups; cyclical structure of seasons or days of the week; transitive structure for value • Illustrations, dated (1837), (1735), (1579), include the “great chain of being” and Systema Naturae: Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens

  40. Typical structure learning algorithms assume a fixed structural form • Flat clusters: K-Means, mixture models, competitive learning • Line: Guttman scaling, ideal point models • Circle: circumplex models • Tree: hierarchical clustering, Bayesian phylogenetics • Grid: Self-Organizing Map, generative topographic mapping • Euclidean space: MDS, PCA, factor analysis

  41. The ultimate goal “Universal Structure Learner” K-Means Hierarchical clustering Factor Analysis Guttman scaling Circumplex models Self-Organizing maps ··· Data Representation

  42. A “universal grammar” for structural forms • [Table: each structural form paired with the generative process that produces it]

  43. Node-replacement graph grammars Production (Line) Derivation

  44. Node-replacement graph grammars Production (Line) Derivation

  45. Node-replacement graph grammars Production (Line) Derivation

  46. chimp mouse gorilla squirrel squirrel chimp gorilla mouse Linear Tree Grid F: form x Favors simplicity squirrel mouse S: structure chimp gorilla Favors smoothness [Zhu et al., 2003] F1 F2 F3 F4 D: data mouse squirrel chimp gorilla

  47. Learning algorithm • Evaluate each form in parallel • For each form, heuristic search over structures based on greedy growth from a one-node seed:

  48. features animals cases judges

  49. objects similarities objects

  50. Structural forms from relational data • Dominance hierarchy: primate troop, “x beats y” • Tree: Bush administration, “x told y” • Cliques: prison inmates, “x likes y” • Ring: Kula islands, “x trades with y”
