Learning: A modern review of anticipatory systems in brains and machines Thomas Trappenberg
Universal learning machines
Eduardo Renato Caianiello (1921-1993)
1961: Outline of a Theory of Thought-Processes and Thinking Machines
• Neuronic & mnemonic equation
• Reverberation
• Oscillations
• Reward learning
But: NOT STOCHASTIC (only small noise in the weights)
Stochastic networks: the Boltzmann machine (Hinton & Sejnowski 1983)
Multilayer perceptron (MLP)
Universal approximator (learner), but:
• Overfitting
• Meaningful input
• Unstructured learning
• Only deterministic (just use the chain rule)
Linear large-margin classifiers: support vector machines (SVM)
MLP: minimize training error (here a threshold perceptron)
SVM: minimize generalization error (empirical risk)
Linear-in-parameter learning
• Linear hypothesis
• Non-linear hypothesis, linear in the parameters
• SVM in dual form → kernel function
• Liquid/echo state machines
• Extreme learning machines
Thanks to Doug Tweet (UoT) for pointing out LIP.
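A minimal sketch of the linear-in-parameters idea, using made-up data and an assumed Gaussian feature map: the hypothesis is non-linear in the input but linear in the weights, so it can be fit in closed form, and the same predictions can be written in dual (kernel) form.

```python
import numpy as np

# Illustrative sketch (not from the slides): a model that is non-linear in the
# input x but linear in the parameters w, fit by regularized least squares.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(x).ravel() + 0.1 * rng.standard_normal(50)

# Non-linear feature map phi(x); the hypothesis y_hat = phi(x) @ w is linear in w.
def phi(x, centers, width=0.5):
    return np.exp(-(x - centers.T) ** 2 / (2 * width ** 2))

centers = np.linspace(-3, 3, 10).reshape(-1, 1)
Phi = phi(x, centers)
lam = 1e-3

# Primal solution: w = (Phi^T Phi + lam I)^-1 Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Dual (kernel) form: predictions depend on x only through k(x, x') = phi(x)·phi(x')
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
y_dual = K @ alpha          # identical predictions to Phi @ w (representer theorem)
```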
Goal of learning: make predictions! (learning vs. memory)
Sources of fluctuations:
• Fundamental stochasticity / irreducible indeterminacy
• Epistemological limitations
→ Probabilistic framework
Goal of learning: the plant equation for the robot, e.g. the distance traveled when both motors are running at power 50.
Hypothesis: the hard problem is how to come up with a useful hypothesis.
Learning: choose the parameters that make the training data most likely (maximum likelihood estimation).
Assume independence of the training examples and consider the likelihood as a function of the parameters (log likelihood).
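As a hedged illustration of the maximum likelihood recipe above (synthetic data, a Gaussian model assumed for concreteness): independence turns the likelihood into a product, the log turns it into a sum, and the estimate maximizes that sum over the parameters.

```python
import numpy as np

# Sketch of maximum likelihood estimation (assumptions: i.i.d. Gaussian samples,
# illustrative data only). The log likelihood turns the product over independent
# examples into a sum, which is maximized over the parameters (mu, sigma).
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=0.5, size=1000)

def log_likelihood(mu, sigma, x):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (x - mu) ** 2 / (2 * sigma ** 2))

# For a Gaussian the maximum of the log likelihood has a familiar closed form:
mu_ml = data.mean()
sigma_ml = data.std()   # ML estimate divides by N, as np.std does by default
print(mu_ml, sigma_ml, log_likelihood(mu_ml, sigma_ml, data))
```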
How about building more elaborate multivariate models and arguing with causal (graphical) models (Judea Pearl)? The factorized graphical model needs only 10 parameters instead of the 31 of the full joint distribution, and the parameters of the conditional probability tables (CPTs) are usually learned from data.
Hidden Markov model (HMM) for localization
• Integrating sensor information becomes trivial
• Breakdown of point estimates in global localization (particle filters)
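A small hedged sketch of why sensor integration becomes trivial in the HMM view: localization is just the forward (Bayes-filter) recursion, alternating a motion prediction with a sensor update. The corridor, transition matrix, and sensor model below are invented for illustration.

```python
import numpy as np

# Hypothetical toy example: Bayes-filter (HMM forward) localization on a 1-D
# corridor with 5 cells. T[i, j] is the motion model P(x_t = j | x_{t-1} = i),
# and sensor_model gives P(z = "door" | x). All numbers are made up.
T = np.array([[0.1, 0.9, 0.0, 0.0, 0.0],
              [0.0, 0.1, 0.9, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.9, 0.0],
              [0.0, 0.0, 0.0, 0.1, 0.9],
              [0.0, 0.0, 0.0, 0.0, 1.0]])
sensor_model = np.array([0.9, 0.1, 0.1, 0.9, 0.1])

belief = np.ones(5) / 5                       # uniform prior over cells

def hmm_step(belief, observed_door):
    predicted = T.T @ belief                  # prediction (motion update)
    likelihood = sensor_model if observed_door else 1 - sensor_model
    posterior = likelihood * predicted        # sensor update (Bayes rule)
    return posterior / posterior.sum()

belief = hmm_step(belief, observed_door=True)
print(belief)
```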
Synaptic plasticity
Gradient descent rule for the LMS loss function … with a linear hypothesis:
• Perceptron learning rule
• Hebb rule
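A minimal sketch (invented data) contrasting the two learning rules named above: the LMS/delta rule that follows from gradient descent on the squared error with a linear hypothesis, and the plain Hebb rule driven by input-output correlation.

```python
import numpy as np

# Delta rule vs. Hebb rule on a toy linear regression problem (synthetic data).
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_lms = np.zeros(3)
w_hebb = np.zeros(3)
eta = 0.01

for x_i, y_i in zip(X, y):
    y_hat = w_lms @ x_i
    w_lms += eta * (y_i - y_hat) * x_i   # delta rule: error-driven update
    w_hebb += eta * y_i * x_i            # Hebb rule: correlation of pre- and post-activity

print(w_lms)    # approaches w_true
print(w_hebb)   # grows with the input/output correlation; no error correction
```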
The Organization of Behavior (1949): Donald O. Hebb (1904-1985); see also Sigmund Freud, law of association by simultaneity, 1888.
Data from G.Q. Bi and M.M. Poo, J Neurosci 18 (1998); D. Standage, S. Jalil and T. Trappenberg, Biological Cybernetics 96 (2007)
Population argument of 'weight dependence'
Is Bi and Poo's weight-dependent STDP data an experimental artifact?
- Three sets of assumptions (B, C, D)
- Their data may reflect population effects
… with Dominic Standage (Queen's University)
Horace Barlow: "Possible principles underlying the transformations of sensory messages" (1961)
"… reduction of redundancy is an important principle guiding the organization of sensory messages …"
Sparseness & overcompleteness
The Ratio Club
PCA minimizing reconstruction error and sparsity
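A brief sketch of the "minimizing reconstruction error" reading of PCA (synthetic data): project onto the top-k principal directions and measure how well the code reconstructs the input. The sparsity penalty of sparse-coding variants is not included here.

```python
import numpy as np

# PCA as minimum-reconstruction-error coding (illustrative data only).
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 10))
X = X - X.mean(axis=0)

# Principal directions from the SVD of the centered data.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 3
W = Vt[:k]                       # top-k principal directions
codes = X @ W.T                  # low-dimensional codes
X_hat = codes @ W                # reconstruction from k components
error = np.mean((X - X_hat) ** 2)
print(error)                     # shrinks as k grows; only discarded variance is lost
```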
Deep belief networks: the stacked restricted Boltzmann machine (Geoffrey E. Hinton)
Sparse convolutional RBM … with Paul Hollensen & Warren Connors
Sonar images (truncated cone; side-scan sonar; synthetic aperture sonar) and scRBM reconstructions
scRBM/SVM mine sensitivity: .983 ± .024, specificity: .954 ± .012
SIFT/SVM mine sensitivity: .970 ± .025, specificity: .944 ± .008
Sparse and topographic RBM (rtRBM) … with Paul Hollensen
Map Initialized Perceptron (MIP) … with Pitoyo Hartono
Free-energy-based supervised learning: TD learning generalized to Boltzmann machines (Sallans & Hinton 2004)
Paul Hollensen: the sparse, topographic RBM successfully learns to drive the e-puck and avoid obstacles, given training data (proximity sensors, motor speeds)
2. Reinforcement learning
[Grid-world example with a reward of -0.1 for every non-terminal state; from Russell and Norvig]
Markov decision process (MDP): if we know all of these factors, the problem is said to be fully observable, and we can simply sit down and contemplate the problem before moving.
Two important quantities: the policy and the value function.
Goal: maximize the total expected payoff (optimal control).
Calculate the value function (dynamic programming); deterministic policies are used to simplify notation.
Bellman equation for policy π. Solution: analytic or incremental.
Richard Bellman (1920-1984)
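The equation itself did not survive the slide export; for reference, the standard textbook form of the Bellman equation for a fixed deterministic policy π (notation may differ from the original slide) is:

```latex
V^{\pi}(s) = r\bigl(s, \pi(s)\bigr) + \gamma \sum_{s'} P\bigl(s' \mid s, \pi(s)\bigr)\, V^{\pi}(s')
```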
Policy iteration: choose a policy, calculate the corresponding value function, then choose a better policy based on this value function.
Value iteration: for each state, evaluate all possible actions (Bellman equation for the optimal policy).
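A hedged sketch of value iteration on a small tabular MDP; the transition tensor and rewards are random placeholders, not a model from the talk.

```python
import numpy as np

# Value iteration on a random toy MDP: P[a, s, s'] are transitions, R[s, a] rewards.
rng = np.random.default_rng(4)
n_states, n_actions, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, :] sums to 1
R = rng.standard_normal((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum('asn,n->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)     # greedy policy from the converged values
print(V, policy)
```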
Solution, but:
• Environment not known a priori
• Observability of states
• Curse of dimensionality
→ Online (TD) learning, POMDPs, model-based RL
What if the environment is not completely known? Online value function estimation (TD learning): use a Monte Carlo method with bootstrapping. The temporal difference is the expected payoff after taking the step (the actual reward plus the discounted expected payoff of the next state) minus the expected payoff before taking the step. This leads to the exploration-exploitation dilemma.
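A minimal sketch of tabular TD(0) on an assumed random-walk task (the environment is invented for illustration): the value estimate is nudged toward the actual reward plus the discounted estimate of the next state.

```python
import numpy as np

# TD(0) value estimation on a 5-state random walk with terminal states at both
# ends and reward 1 for exiting on the right.
rng = np.random.default_rng(5)
n_states, alpha, gamma = 5, 0.1, 1.0
V = np.zeros(n_states + 2)             # include the two terminal states

for episode in range(500):
    s = 3                              # start in the middle state
    while s not in (0, n_states + 1):
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == n_states + 1 else 0.0
        # TD error: reward plus discounted estimate of the next state,
        # minus the current estimate ("expected payoff before the step").
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V[1:-1])                         # approaches 1/6, 2/6, ..., 5/6 for this walk
```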
Online optimal control: exploitation versus exploration
• On-policy TD learning: Sarsa
• Off-policy TD learning: Q-learning
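A short hedged sketch contrasting the two update rules; Q is assumed to be a table indexed by state and action, and ε-greedy action selection stands in for the exploration-exploitation trade-off.

```python
import numpy as np

# Sarsa (on-policy) vs. Q-learning (off-policy) updates on a tabular Q[state, action].
def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))      # explore
    return int(Q[s].argmax())                     # exploit

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # Target uses the action actually taken next (on-policy).
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Target uses the greedy action, independent of the behaviour policy (off-policy).
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```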
Model-based RL: TD(λ). Instead of the tabular methods mainly discussed before, use a function approximator with parameters θ and gradient descent, with an exponential eligibility trace e that weights the updates with λ at each step (Sutton 1988). Free-energy-based reinforcement learning (Sallans & Hinton 2004) … Paul Hollensen
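One step of linear TD(λ) as a hedged sketch of the idea just described: V(s) = φ(s)·θ, an accumulating eligibility trace e decays with γλ, and every weight is updated in proportion to its trace. The feature vectors and environment interface are assumed to come from elsewhere.

```python
import numpy as np

# Single TD(lambda) step with a linear function approximator (illustrative only).
def td_lambda_step(theta, e, phi_s, phi_next, r, alpha=0.05, gamma=0.95, lam=0.9):
    delta = r + gamma * phi_next @ theta - phi_s @ theta   # TD error
    e = gamma * lam * e + phi_s                            # decay trace, add current features
    theta = theta + alpha * delta * e                      # every weight updated via its trace
    return theta, e
```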
Basal Ganglia … work with Patrick Connor
Our questions • How do humans learn values that guide behaviour? (human behaviour) • How is this implemented in the brain? (anatomy and physiology) • How can we apply this knowledge? (medical interventions and robotics)
Classical conditioning: Ivan Pavlov (1849-1936), Nobel Prize 1904. Rescorla-Wagner model (1972).
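A small sketch of the Rescorla-Wagner update; the parameterization here (learning rates α and β, reward level λ) is the common textbook form rather than anything taken from the slide. The example also illustrates the blocking effect the model is known to explain.

```python
# Rescorla-Wagner (1972): shared prediction error updates each present cue's strength.
def rescorla_wagner_update(V, present, lambda_r, alpha=0.1, beta=1.0):
    prediction = sum(V[s] for s in present)      # summed prediction of all present cues
    error = lambda_r - prediction                # prediction error
    for s in present:
        V[s] += alpha * beta * error             # shared error drives each cue's update
    return V

# Stimulus 'A' paired with reward quickly gains value; adding 'B' later produces
# blocking because 'A' already predicts the reward.
V = {'A': 0.0, 'B': 0.0}
for _ in range(50):
    V = rescorla_wagner_update(V, present=['A'], lambda_r=1.0)
for _ in range(50):
    V = rescorla_wagner_update(V, present=['A', 'B'], lambda_r=1.0)
print(V)   # V['A'] near 1, V['B'] near 0 (blocked)
```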
Reward signals in the brain (Wolfram Schultz)
[Dopamine recordings under three conditions: stimulus A with no reward, stimulus B, and stimulus A with reward]
Disorders with effects on the dopamine system: Parkinson's disease, Tourette's syndrome, ADHD, drug addiction, schizophrenia (Maia & Frank 2011)
Adding biological qualities to the model
[Diagram: input → Rescorla-Wagner model (Rescorla and Wagner, 1972) → striatum; dopamine and reward prediction error (Schultz, 1998)]