
Logistics

Join us for an informal and formal introduction to learning as probabilistic inference, mathematical psychology, and Bayes nets. Discover the virtues of the Bayesian framework in explaining cognition across various domains and tasks. Explore how theories are acquired and how beliefs are updated using rational statistical inference. Understand the meaning and cognitive viability of subjective probability.



Presentation Transcript


  1. Logistics • Class size? Who is new? Who is listening? • Everyone on the Athena mailing list “concepts-and-theories”? If not, write to me. • Everyone on Stellar yet? If not, write to Melissa Yeh (mjyeh@mit.edu). • Interest in having a printed course pack, even if a few readings get changed?

  2. Plan for tonight • Why be Bayesian? • Informal introduction to learning as probabilistic inference • Formal introduction to probabilistic inference • A little bit of mathematical psychology • An introduction to Bayes nets

  4.–8. Virtues of Bayesian framework • Generates principled models with strong explanatory and descriptive power. • Unifies models of cognition across tasks and domains: categorization, concept learning, word learning, inductive reasoning, causal inference, conceptual change; in biology, physics, psychology, language, . . . • Explains which processing models work, and why: associative learning, connectionist networks, similarity to examples, toolkit of simple heuristics. • Allows us to move beyond classic dichotomies: symbols (rules, logic, hierarchies, relations) versus statistics; domain-general versus domain-specific; nature versus nurture. • A framework for understanding theory-based cognition: How are theories used to learn about the structure of the world? How are theories acquired?

  9. Rational statistical inference (Bayes, Laplace) • Fundamental question: How do we update beliefs in light of data? • Fundamental (and only) assumption: Represent degrees of belief as probabilities. • The answer: Mathematics of probability theory.

  10. What does probability mean? Frequentists: Probability as expected frequency • P(A) = 1: A will always occur. • P(A) = 0: A will never occur. • 0.5 < P(A) < 1: A will occur more often than not. Subjectivists: Probability as degree of belief • P(A) = 1: believe A is true. • P(A) = 0: believe A is false. • 0.5 < P(A) < 1: believe A is more likely to be true than false.

  11. What does probability mean? Frequentists: Probability as expected frequency • P(“heads”) = 0.5 ~ “If we flip 100 times, we expect to see about 50 heads.” Subjectivists: Probability as degree of belief • P(“heads”) = 0.5 ~ “On the next flip, it’s an even bet whether it comes up heads or tails.” • P(“rain tomorrow”) = 0.8 • P(“Saddam Hussein is dead”) = 0.1 • . . .
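
A quick simulation illustrates the frequentist reading; this is a minimal sketch (the 100-flip run and the seed are illustrative choices, not from the slides):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

# Frequentist reading of P("heads") = 0.5: in a long run of flips,
# about half should come up heads.
flips = [random.random() < 0.5 for _ in range(100)]
print(sum(flips), "heads out of 100 flips")  # typically close to 50
```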

  12. Is subjective probability cognitively viable? • Evolutionary psychologists (Gigerenzer, Cosmides, Tooby, Pinker) argue it is not.

  13. “To understand the design of statistical inference mechanisms, then, one needs to examine what form inductive-reasoning problems -- and the information relevant to solving them -- regularly took in ancestral environments. […] Asking for the probability of a single event seems unexceptionable in the modern world, where we are bombarded with numerically expressed statistical information, such as weather forecasts telling us there is a 60% chance of rain today. […] In ancestral environments, the only external database available from which to reason inductively was one's own observations and, possibly, those communicated by the handful of other individuals with whom one lived. The ‘probability’ of a single event cannot be observed by an individual, however. Single events either happen or they don’t -- either it will rain today or it will not. Natural selection cannot build cognitive mechanisms designed to reason about, or receive as input, information in a format that did not regularly exist.” (Brase, Cosmides and Tooby, 1998)

  14. Is subjective probability cognitively viable? • Evolutionary psychologists (Gigerenzer, Cosmides, Tooby, Pinker) argue it is not. • Reasons to think it is: • Intuitions are old and potentially universal (Aristotle, the Talmud). • Represented in semantics (and syntax?) of natural language. • Extremely useful ….

  15. Why be subjectivist? • Often need to make inferences about singular events • e.g., How likely is it to rain tomorrow? • Cox Axioms • A formal model of common sense • “Dutch Book” + Survival of the Fittest • If your beliefs do not accord with the laws of probability, then you can always be out-gambled by someone whose beliefs do so accord. • Provides a theory of learning • A common currency for combining prior knowledge and the lessons of experience.
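
The “Dutch Book” argument can be made concrete with a small worked example. A minimal sketch, assuming an agent whose incoherent beliefs assign P(A) = 0.6 and P(¬A) = 0.6 (the numbers and stakes are my own illustration):

```python
# The agent regards a ticket paying 1.00 if A as fairly priced at 0.60,
# and a ticket paying 1.00 if not-A as also fairly priced at 0.60.
# Since 0.6 + 0.6 > 1, a bookie can sell both tickets and profit for sure.
belief_A, belief_not_A = 0.6, 0.6

for A_is_true in (True, False):
    cost = belief_A + belief_not_A  # agent buys both "fair" tickets
    payoff = 1.0                    # exactly one ticket pays out
    print(f"A = {A_is_true}: agent's net = {payoff - cost:+.2f}")
# Net is -0.20 either way: a guaranteed loss, the "Dutch book".
```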

  16. Cox Axioms (via Jaynes) • Degrees of belief (Bel) are represented by real numbers. • Qualitative correspondence with common sense, e.g.: • Consistency: • If a conclusion can be reasoned in more than one way, then every possible way must lead to the same result. • All available evidence should be taken into account when inferring a degree of belief. • Equivalent states of knowledge should be represented with equivalent degrees of belief. • Accepting these axioms implies that Bel can be represented as a probability measure.

  17. Plan for tonight • Why be Bayesian? • Informal introduction to learning as probabilistic inference • Formal introduction to probabilistic inference • A little bit of mathematical psychology • An introduction to Bayes nets

  18. Example: flipping coins • Flip a coin 10 times and see 5 heads, 5 tails. • P(heads) on next flip? 50% • Why? 50% = 5 / (5+5) = 5/10. • “Future will be like the past.” • Suppose we had seen 4 heads and 6 tails. • P(heads) on next flip? Closer to 50% than to 40%. • Why? Prior knowledge.

  19. Example: flipping coins • Represent prior knowledge as fictional observations F. • E.g., F ={1000 heads, 1000 tails} ~ strong expectation that any new coin will be fair. • After seeing 4 heads, 6 tails, P(heads) on next flip = 1004 / (1004+1006) = 49.95% • E.g., F ={3 heads, 3 tails} ~ weak expectation that any new coin will be fair. • After seeing 4 heads, 6 tails, P(heads) on next flip = 7 / (7+9) = 43.75%. Prior knowledge too weak.

  20. Example: flipping thumbtacks • Represent prior knowledge as fictional observations F. • E.g., F ={4 heads, 3 tails} ~ weak expectation that tacks are slightly biased towards heads. • After seeing 2 heads, 0 tails, P(heads) on next flip = 6 / (6+3) = 67%. • Some prior knowledge is always necessary to avoid jumping to hasty conclusions. • Suppose F = { }: After seeing 2 heads, 0 tails, P(heads) on next flip = 2 / (2+0) = 100%.
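
Both slides apply one rule: add the fictional observations F to the real counts and take the proportion of heads. A minimal sketch (the helper name p_heads_next is mine):

```python
def p_heads_next(heads, tails, f_heads, f_tails):
    """P(heads on next flip), with prior knowledge represented as
    f_heads and f_tails fictional observations."""
    return (heads + f_heads) / (heads + tails + f_heads + f_tails)

# Coins, strong fair prior F = {1000 heads, 1000 tails}:
print(p_heads_next(4, 6, 1000, 1000))  # 0.4995  (49.95%)
# Coins, weak fair prior F = {3 heads, 3 tails}:
print(p_heads_next(4, 6, 3, 3))        # 0.4375  (43.75%)
# Thumbtacks, weak heads bias F = {4 heads, 3 tails}:
print(p_heads_next(2, 0, 4, 3))        # 0.667   (67%)
# No prior knowledge, F = {}: jumps to a hasty conclusion.
print(p_heads_next(2, 0, 0, 0))        # 1.0     (100%)
```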

  21. Origin of prior knowledge • Tempting answer: prior experience • Suppose you have previously seen 2000 coin flips: 1000 heads, 1000 tails. • By assuming all coins (and flips) are alike, these observations of other coins are as good as actual observations of the present coin.

  22. Problems with simple empiricism • Haven’t really seen 2000 coin flips, or any thumbtack flips. • Prior knowledge is stronger than raw experience justifies. • Haven’t seen an exactly equal number of heads and tails. • Prior knowledge is smoother than raw experience justifies. • There should be a difference between observing 2000 flips of a single coin and observing 10 flips each for 200 coins, or 1 flip each for 2000 coins. • Prior knowledge is more structured than raw experience.

  23. A simple theory • “Coins are manufactured by a standardized procedure that is effective but not perfect.” • Justifies generalizing from previous coins to the present coin. • Justifies smoother and stronger prior than raw experience alone. • Explains why seeing 10 flips each for 200 coins is more valuable than seeing 2000 flips of one coin. • “Tacks are asymmetric, and manufactured to less exacting standards.”

  24. Limitations • Can all domain knowledge be represented so simply, in terms of an equivalent number of fictional observations? • Suppose you flip a coin 25 times and get all heads. Something funny is going on …. • But with F ={1000 heads, 1000 tails}, P(heads) on next flip = 1025 / (1025+1000) = 50.6%. Looks like nothing unusual.
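
The failure is easy to verify, and comparing hypotheses (previewing Bayes’ rule from the next section) shows what the fictional-observation model misses. The trick-coin alternative and all numbers below are my own illustration, not from the slides:

```python
# Fictional observations barely react to 25 straight heads:
print((25 + 1000) / (25 + 1000 + 1000))  # 0.506 -> "nothing unusual"

# Comparing hypotheses does notice: H1 = fair coin, H2 = two-headed coin,
# with a skeptical prior on the trick coin.
prior = {"fair": 0.999, "trick": 0.001}
likelihood = {"fair": 0.5 ** 25, "trick": 1.0}  # P(25 heads | H)

evidence = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)  # trick coin dominates (posterior ~0.99997)
```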

  25. Plan for tonight • Why be Bayesian? • Informal introduction to learning as probabilistic inference • Formal introduction to probabilistic inference • A little bit of mathematical psychology • An introduction to Bayes nets

  26. Basics • Propositions: A, B, C, . . . • Negation: ¬A • Logical operators “and”, “or”: A ∧ B, A ∨ B • Obey classical logic, e.g., ¬(A ∧ B) = ¬A ∨ ¬B

  27. Basics • Conservation of belief: P(A) + P(¬A) = 1 • “Joint probability”: P(A ∧ B) • For independent propositions: P(A ∧ B) = P(A) P(B) • More generally: P(A ∧ B) = P(A) P(B|A), where P(B|A) is the “conditional probability” of B given A

  28. Basics • Example: • A = “Heads on flip 2” • B = “Tails on flip 2” • A and B are mutually exclusive: P(A ∧ B) = 0, so P(A ∨ B) = P(A) + P(B) = 1

  29. Basics • All probabilities should be conditioned on background knowledge K: e.g., P(A|K), P(A ∧ B|K) • All the same rules hold conditioned on any K: e.g., P(A ∧ B|K) = P(A|K) P(B|A, K) • Often background knowledge will be implicit, brought in as needed.

  30. Bayesian inference • Definition of conditional probability: P(A|B) = P(A ∧ B) / P(B) • Bayes’ theorem: P(H|D) = P(D|H) P(H) / P(D)

  31. Bayesian inference • Definition of conditional probability: P(A|B) = P(A ∧ B) / P(B) • Bayes’ rule: P(H|D) = P(D|H) P(H) / P(D) • “Posterior probability”: P(H|D) • “Prior probability”: P(H) • “Likelihood”: P(D|H)

  32. Bayesian inference • Bayes’ rule: P(H|D) = P(D|H) P(H) / P(D) • What makes a good scientific argument? P(H|D) is high if: • Hypothesis is plausible: P(H) is high • Hypothesis strongly predicts the observed data: P(D|H) is high • Data are surprising: P(D) is low
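
Plugging illustrative numbers (mine, not the slides’) into Bayes’ rule shows how the three factors combine:

```python
p_H = 0.3          # prior: hypothesis is reasonably plausible
p_D_given_H = 0.9  # hypothesis strongly predicts the observed data
p_D = 0.35         # data are fairly surprising overall

p_H_given_D = p_D_given_H * p_H / p_D  # Bayes' rule
print(p_H_given_D)  # ~0.77: all three factors push the posterior up
```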

  33.–39. Bayesian inference • Deriving a more useful version: • Start from Bayes’ rule: P(H|D) = P(D|H) P(H) / P(D) • “Marginalization”: P(D) = P(D ∧ H) + P(D ∧ ¬H) • “Conditionalization”: P(D) = P(D|H) P(H) + P(D|¬H) P(¬H) • Putting these together: P(H|D) = P(D|H) P(H) / [P(D|H) P(H) + P(D|¬H) P(¬H)]
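
A minimal sketch of the derived form, with illustrative numbers (mine, not from the slides):

```python
# P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|~H)P(~H)]
p_H = 0.01             # prior probability of H
p_D_given_H = 0.8      # likelihood of the data under H
p_D_given_not_H = 0.1  # likelihood of the data under not-H

evidence = p_D_given_H * p_H + p_D_given_not_H * (1 - p_H)  # P(D)
print(p_D_given_H * p_H / evidence)  # ~0.075: data favor H, prior dominates
```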

  40.–41. Random variables • Random variable X denotes a set of mutually exclusive, exhaustive propositions (states of the world): X = x₁ ∨ x₂ ∨ . . . ∨ xₙ, with Σᵢ P(X = xᵢ) = 1 • Bayes’ theorem for random variables: P(X = xᵢ | D) = P(D | X = xᵢ) P(X = xᵢ) / Σⱼ P(D | X = xⱼ) P(X = xⱼ) • Bayes’ rule for more than two hypotheses: P(Hᵢ|D) = P(D|Hᵢ) P(Hᵢ) / Σⱼ P(D|Hⱼ) P(Hⱼ)

  42.–45. Sherlock Holmes • “How often have I said to you that when you have eliminated the impossible whatever remains, however improbable, must be the truth?” (The Sign of the Four) • Suppose the data D eliminate every hypothesis but Hⱼ: P(D|Hᵢ) = 0 for all i ≠ j • Then, however improbable the prior P(Hⱼ): P(Hⱼ|D) = P(D|Hⱼ) P(Hⱼ) / Σᵢ P(D|Hᵢ) P(Hᵢ) = P(D|Hⱼ) P(Hⱼ) / P(D|Hⱼ) P(Hⱼ) = 1
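
Holmes’s maxim is Bayes’ rule with all but one likelihood driven to zero; a sketch with illustrative hypotheses and numbers (mine, not from the slides):

```python
# When the data rule out every hypothesis but one, the survivor's
# posterior is 1, however small its prior.
priors      = {"butler": 0.60, "maid": 0.39, "one-legged man": 0.01}
likelihoods = {"butler": 0.0,  "maid": 0.0,  "one-legged man": 0.2}

evidence = sum(likelihoods[h] * priors[h] for h in priors)
posterior = {h: likelihoods[h] * priors[h] / evidence for h in priors}
print(posterior)  # {'butler': 0.0, 'maid': 0.0, 'one-legged man': 1.0}
```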

  46. Plan for tonight • Why be Bayesian? • Informal introduction to learning as probabilistic inference • Formal introduction to probabilistic inference • A little bit of mathematical psychology • An introduction to Bayes nets

  47. Representativeness in reasoning Which sequence is more likely to be produced by flipping a fair coin? HHTHT HHHHH
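
Under a fair coin, every particular five-flip sequence is equally probable, which is the point of the question; a one-line check:

```python
# Each specific length-5 sequence has probability (1/2)**5 = 0.03125.
for seq in ("HHTHT", "HHHHH"):
    print(seq, 0.5 ** len(seq))  # identical for both sequences
```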

  48. A reasoning fallacy • Kahneman & Tversky: people judge the probability of an outcome based on the extent to which it is representative of the generating process. • But how does “representativeness” work?

  49. Predictive versus inductive reasoning • [Diagram: Hypothesis H → Data D]

  50. Predictive versus inductive reasoning • Prediction: given hypothesis H, what data D should we expect? • Likelihood: P(D|H)
