This article explores how we grasp subset boundaries and make causal inferences from limited examples, discussing the role of abstract knowledge and Bayesian inference in learning and inference. It also delves into the structure of data space and the acquisition of intuitive theories.
Science, March 2011

we grasp subset boundaries from few examples
  e.g. horses and hairbrushes (recite cloud story)
we make causal inferences from tiny samples
  statistically invalid OR reliant on IK/UP?
  based on constraints, inductive bias, priors
  different names for prior knowledge
we build rich causal models
we make strong generalizations
we construct powerful abstractions
but: inputs are noisy, ambiguous, limited
Highlights: Bayesian Inference (BI) vs. nets vs. nativists
- 3 central questions on Abstract Knowledge (AK)
    How does AK guide learning and inference?
    What forms does AK take? (across domains and tasks)
    Where does the AK come from?
- connectionists vs. Bayesian models vs. nativists
    nativists: original knowledge, constructivism, theory of theory
    connectionists: Wij matrices w/ powerful statistics (e.g. spin glasses)
    3rd possibility: statistical learning via abstract structured knowledge
- despite Bayesian successes, BI operates only in domains where “biology has had time and cause to engineer” solutions (p. 2).
- Abstract Knowledge emerges from generative models in that:
    it generalizes from instances to a broader category
    it distills in parsimonious form the structure of the world
How to Use Bayes’ Rule: given a conditional likelihood P(d|h) and the prior frequency of each hypothesis P(h), we can calculate how probable each candidate cause of the signal is, i.e. its “posterior probability” P(h|d), which is proportional to P(h) x P(d|h). The signal (coughing) might be caused by a cold, lung cancer or heart disease. Since the likelihood of getting a cough from heart disease, P(d|h), is VERY low, the posterior for heart disease causing the cough, P(h|d), is also very low. There is, however, a good chance of coughing given a cold OR lung cancer, so those likelihoods are high. But how many of you have lung cancer? Near zero (hopefully zero for our class). Because the prevalence P(h) of lung cancer is << that of colds, the product P(h) x P(d|h) is largest for a cold, and so this might be how we know that the coughing all around is not spreading lung cancer!
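The coughing example can be worked through numerically. The sketch below is only an illustration: the priors and likelihoods are made-up numbers chosen to match the qualitative story (colds common, lung cancer rare, heart disease unlikely to cause a cough), and the function name `posterior` is hypothetical. It also assumes these three hypotheses are the only candidate causes.

```python
# Illustrative numbers only: these priors and likelihoods are assumptions,
# not values from the slides or from any medical source.
priors = {"cold": 0.20, "lung cancer": 0.001, "heart disease": 0.05}      # P(h)
likelihoods = {"cold": 0.80, "lung cancer": 0.80, "heart disease": 0.05}  # P(d|h), d = coughing


def posterior(priors, likelihoods):
    """Return P(h|d) for each hypothesis h via Bayes' rule."""
    # Numerator of Bayes' rule: P(h) * P(d|h) for each hypothesis.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # P(d), summed over the listed hypotheses only (a simplifying assumption).
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


post = posterior(priors, likelihoods)
best = max(post, key=post.get)  # hypothesis with the largest posterior

for h, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"P({h} | cough) = {p:.3f}")
print("Most probable cause:", best)
```

With these assumed inputs the cold dominates even though lung cancer has the same likelihood of producing a cough, because its prior is 200 times smaller.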
TUFAS Supervised learning: red boxes contain “tufas”. Which others are tufas?
Hierarchical Bayesian models (HBMs) can learn the structure of the data space. Seen before?
Figure 2 (A, B, C): S = structure, F = form, D = features/data
Figure 2 (D, E) Architectures: Tree, Linear, Circular, “Globe”, 2D array
Fig. 3. HBMs defined over graph schemas can explain how intuitive theories are acquired and used to learn about specific causal relations from limited data (38).
Fig. 3.
(A) shows the ground-truth relation between 6 diseases and 10 symptoms
(B) 2-level HBM w/ uninformative prior; good after n=1000 patients (malpractice?)
(C) 3-level non-parametric HBM that assigns variables and a causal linkage
(D) HBM learning abstract theory of causality.
To what extent is the solution-space made very narrow and thus easy to solve in the foregoing examples? To what extent do animals solve problems that extend far beyond the challenges posed by tufa-scale problems? Artificial Intelligence (machine learning) is constantly pushing the boundaries of this frontier, but how do the best learning machines actually do this? How much is based upon evolutionary learning?
AI vs. Human IQ vs. Animal Intelligence

Q: What is the best universal problem solver that has been programmed to date? How does this compare with crows, rats and chimps?
Q: Is there a game-learning program that can generalize and extend its knowledge, going e.g. from tic-tac-toe to checkers to Hearts?
Q: Can mice or chimps learn to play Hearts? They could perhaps learn to follow rules via supervised learning (reward learning and/or instruction).
Q: Could mice or chimps ever get good at Hearts, i.e. develop basic or perhaps advanced strategies?

JBT’s pattern-learning problems are solvable on statistical grounds, given enough trial and error. Hearts might be statistically learnable at some level of performance (rules, basic strategies), but it might be that the solution space is outside the dimensions of the statistical program. What is it about the human brain that allows us to create and learn such things while other animals cannot?

Can other animals learn by analogy? Analogy is important in language and in varied domains of problem solving. It is also conducive to genetic solutions, since duplication (copying) of circuits from one domain into another might enable faster learning and exploitation within that new domain. Does the 6-layer neocortex constitute an analogical circuit that has been massively replicated and adapted to different tasks?

Hearts
Basic strategy: always play lower cards.
Advanced: play high card on suit opening.
More advanced: determine others’ suits and ranks; play strings of high cards to create desired scenarios.
Learning to play Hearts • rules • basic strategy • advanced strategy
Learning to play Hearts
1. Rules
   - statistical learning
   - instruction
   - hypothesis based
2. Basic Strategy
   - statistical learning
   - instruction
   - hypothesis based
3. Advanced Strategies
   - may come in Aha moments
   - Aha’s rely on SCIP
   - deliberate strategizing
     modeling play, scenarios
     uniquely human?
     goes beyond Theory of Mind
Einstein’s work on gravity was hailed as a triumph at the Solvay Conference, but the relationship between quantum states and relativity was still a matter of controversy.

A sentence can mean something different to a 19-year-old physics major vs. a 12-year-old, and something different still to a 6-year-old, a 2-year-old, and a 30-year-old graduate student who has read tomes on the topics broached in said sentence. When you read an article packed full of sentences, many of which trigger introspective queries about what is intended, how reasonable the sentence is and what its ramifications are, an article that resonates with your brain can yield many intricate computations that ramify profusely and lead to new domains. The more that is stored IN your brain and organized (in part by your deliberate, reflective efforts), the more “new” stuff you might encounter within your internal frontiers.

“Developmentalists have argued that not everything can be learned, that learning can only get off the ground with some innate stock of abstract concepts such as “agent”, “object” and “cause” to provide the basic ontology for carving up experience (7, 61).”

“Following the “blessing of abstraction,” these constraints can be induced from only small samples of each network’s behavior and in turn enable more efficient causal learning for new systems (62).”

[Guts of the HBM model]
Post-Stimulus Time Histogram w/ underlying Raster Plot
The response of the neuron is most evident when averaging many trials together. Individual lines in the raster plot are noisy. The stimulus does not perfectly predict the resulting spike train in this neuron, and the spike train does not perfectly predict the stimulus. “Spikes” emphasizes the use of probability in deciphering the neural code, and the need for it is evident here.
Figure from splot.org
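To make the trial-averaging idea concrete, here is a minimal sketch of how a post-stimulus time histogram can be computed from raster data. The spike trains are simulated, not taken from the figure: the background rate, the response window (0.4 to 0.6 s), the trial count and the helper names `psth` and `simulate_trial` are all assumptions made for illustration.

```python
import numpy as np


def psth(spike_times_per_trial, t_start, t_stop, bin_width):
    """Trial-averaged firing rate (spikes/s) in each time bin."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = np.linspace(t_start, t_stop, n_bins + 1)
    counts = np.zeros(n_bins)
    for spikes in spike_times_per_trial:
        # Each trial is one row of the raster plot.
        counts += np.histogram(spikes, bins=edges)[0]
    # Divide by number of trials and bin width to get a rate.
    rate = counts / (len(spike_times_per_trial) * bin_width)
    return edges, rate


# Hypothetical data: noisy background firing plus extra spikes in a
# response window. All parameters below are illustrative assumptions.
rng = np.random.default_rng(0)


def simulate_trial():
    background = rng.uniform(0.0, 1.0, rng.poisson(5))  # about 5 spikes over 1 s
    evoked = rng.uniform(0.4, 0.6, rng.poisson(8))      # about 8 extra spikes in the window
    return np.sort(np.concatenate([background, evoked]))


trials = [simulate_trial() for _ in range(200)]
edges, rate = psth(trials, 0.0, 1.0, 0.05)

for left, r in zip(edges[:-1], rate):
    print(f"{left:4.2f} s  {r:6.1f} spikes/s")
```

Any single simulated trial looks irregular, but the averaged rate rises clearly inside the response window, which is the point the slide makes about single raster lines versus the histogram.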
spike to spike in complete analogy to the fluctuations of the spike train from presentation to presentation in Fig. 2.1. Nonetheless, the stimulus surrounding a spike has a nonzero average, and we shall see in section 2.1.3 that this average stimulus waveform provides a useful description of the cell's response properties. Figures 2.1 and 2.3 are just the beginning of a quantitative analysis, but we hope that they provide some intuition for the problem of translation and the importance of Bayes’ rule. The bilingual dictionary of spikes and sensory signals must be written in a probabilistic format, and Bayes’ rule tells us how the two halves of the dictionary are related.
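One way to write down the relation between the "two halves of the dictionary" is shown here. The symbols are this note's own shorthand, not necessarily the book's notation: s stands for a stimulus waveform and r for an observed spike train.

```latex
% Assumed notation: s = stimulus, r = spike train (response).
% P(r | s): encoding half (stimulus -> spikes)
% P(s | r): decoding half (spikes -> stimulus)
\[
  P(s \mid r) \;=\; \frac{P(r \mid s)\, P(s)}{P(r)},
  \qquad
  P(r) \;=\; \sum_{s'} P(r \mid s')\, P(s')
\]
```

Read this way, the encoding distribution P(r|s) and the prior over stimuli P(s) together determine the decoding distribution P(s|r), in the same way that P(d|h) and P(h) determine P(h|d) in the coughing example.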
Applications of Bayes’ Theorem to Larval Locomotor Sequencing
work in progress
see Westphal and O’Malley, Gordon Conference, 2014