Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009)
   Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007)
   Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008)
   Task 6.4 Learning hierarchical world models for planning
Reinforcement learning: the input (state) s is mapped through weights to an action a.
Reinforcement learning actor over the input (state space): with simple input the action is unambiguous ("go right!"); with complex input it is not ("go right? go left?").
Complex input scenario: bars are controlled by the actions 'up', 'down', 'left', 'right'; a reward is given if the horizontal bar reaches a specific position. [Figure: sensory input, action, reward.]
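For illustration, a minimal sketch of such a bars environment in Python. The 12x12 grid size is taken from the 'short bars' slide below; which bar each action moves and the target row are assumptions, not the exact setup of the paper.

```python
import numpy as np

class BarsEnv:
    """Toy 'bars' world: one horizontal and one vertical bar on a grid.
    Reward is given when the horizontal bar reaches a target row.
    (Which bar each action moves and the target row are illustrative assumptions.)"""

    def __init__(self, size=12, target_row=0, rng=None):
        self.size = size
        self.target_row = target_row
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        self.h_row = self.rng.integers(self.size)   # row of the horizontal bar (task relevant)
        self.v_col = self.rng.integers(self.size)   # column of the vertical bar (distractor)
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.h_row, :] = 1.0                    # horizontal bar
        img[:, self.v_col] = 1.0                    # vertical bar
        return img.ravel()                          # flat sensory input I

    def step(self, action):
        # actions: 0 = up, 1 = down (horizontal bar); 2 = left, 3 = right (vertical bar)
        if action == 0:
            self.h_row = max(self.h_row - 1, 0)
        elif action == 1:
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 2:
            self.v_col = max(self.v_col - 1, 0)
        elif action == 3:
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.target_row else 0.0
        return self.observe(), reward
```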
Need another layer(s) to pre-process complex data.
action: P(a=1) = softmax(Q s); the weight matrix Q encodes the value v = a Q s
state: s = softmax(W I) encodes the position of the relevant bar; the weight matrix W acts as a feature detector on the input I
minimize the error: E = (0.9 v(s',a') - v(s,a))^2 = δ^2
learning rules: dQ ≈ dE/dQ = δ a s,   dW ≈ dE/dW = δ Q s I + ε
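As a concrete illustration, here is a minimal Python sketch of these two layers and the δ-driven updates. The learning rates, initialisation, and the handling of the reward target are my assumptions; the ε term is omitted, and the W update uses one plausible reading of the slide's "δ Q s I".

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class GoalDirectedFeatureLearner:
    """Two-layer network: W maps the input I to a state code s,
    Q maps s to action values; both layers are trained from the same error δ."""

    def __init__(self, n_input, n_state, n_action, lr_w=0.01, lr_q=0.1):
        self.W = 0.01 * np.random.randn(n_state, n_input)    # feature detectors
        self.Q = 0.01 * np.random.randn(n_action, n_state)   # action / value weights
        self.lr_w, self.lr_q = lr_w, lr_q

    def state(self, I):
        return softmax(self.W @ I)                 # s = softmax(W I)

    def act(self, s):
        p = softmax(self.Q @ s)                    # P(a=1) = softmax(Q s)
        a = np.random.choice(len(p), p=p)
        return a, float(self.Q[a] @ s)             # chosen action and its value v = a Q s

    def update(self, I, s, a, v, target):
        # target is the reward at the goal, otherwise 0.9 * v(s', a')
        delta = target - v                         # δ as in E = δ²
        a_vec = np.zeros(self.Q.shape[0]); a_vec[a] = 1.0
        self.Q += self.lr_q * delta * np.outer(a_vec, s)                  # dQ ≈ δ a s
        # one plausible reading of the slide's dW ≈ δ Q s I (ε omitted)
        self.W += self.lr_w * delta * np.outer((self.Q.T @ a_vec) * s, I)
```

The learner can be driven by the BarsEnv sketch above: observe I, compute s and pick an action, step the environment, then call update with the reward (at the goal) or the discounted value of the next state-action pair.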
Memory extension: the model uses the previous state and action to estimate the current state.
Learning the 'short bars' data. [Figure: feature weights, RL action weights, data, action, reward.]
Short bars on a 12x12 grid: average number of steps to the goal: 11.
Learning the 'long bars' data. [Figure: RL action weights, feature weights, data, input, reward; 2 actions not shown.]
[Figure: comparison of three variants: WTA with non-negative weights; SoftMax with no weight constraints; SoftMax with non-negative weights.]
Models' background:
- Gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- Reward-modulated Hebb: Triesch, Neural Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- Reward-modulated activity leads to input selection: Nakahara, Neural Comp 14, 819-44 (2002)
- Reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neural Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn partitioning of input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
Unsupervised learning in the cortex; reinforcement learning in the basal ganglia (Doya, 1999). [Figure: actor and state space.]
Discussion - may help reinforcement learning work with real-world data ... real visual processing!
Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009)
   Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007)
   Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008)
   Task 6.4 Learning hierarchical world models for planning
Representation of depth • How to learn disparity-tuned neurons in V1?
Reinforcement learning in a neural network • after vergence: input at a new disparity • if the disparity is zero, a reward is given
Attention-Gated Reinforcement Learning (Roelfsema & van Ooyen, 2005). Hebbian-like weight learning:
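The slide's own formula is not reproduced here; as a rough sketch, the AGREL weight change is the product of presynaptic activity, postsynaptic activity, a global reward-prediction error, and an attentional feedback factor. The symbols below are illustrative, not taken from the slide:

```latex
% Sketch of an AGREL-style 'Hebbian-like' update (notation illustrative):
%   x_i : presynaptic activity,   y_j : postsynaptic activity,
%   \delta : global reward-prediction error,   f_j : attentional feedback to unit j
\Delta w_{ij} \;\propto\; \delta \,\cdot\, x_i \,\cdot\, y_j \,\cdot\, f_j
```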
Measured disparity tuning curves • Six types of tuning curves (Poggio, Gonzalez, Krause, 1988)
Development of disparity tuning All six types of tuning curves emerge in the hidden layer!
Discussion - requires application ... use 2D images from 3D space ... open question as to the implementation of the reward ... learning of attention?
Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009)
   Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007)
   Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008)
   Task 6.4 Learning hierarchical world models for planning
Reinforcement learning leads to a fixed reactive system that always strives for the same goal. [Figure: actor, units, value.] Task: in an exploration phase, learn a general model that allows the agent to plan a route to any goal.
Learning: randomly move around the state space and learn world models: ● associative model ● inverse model ● forward model. [Figure: actor, state space.]
Learning: Associative Model. Weights associate neighbouring states; use these to find any possible routes between agent and goal.
Learning: Inverse Model. Weights "postdict" the action given a state pair; use these to identify the action that leads to a desired state (Sigma-Pi neuron model).
Learning: Forward Model. Weights predict the state given a state-action pair; use these to predict the next state given the chosen action. A sketch of all three models follows below.
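A minimal sketch of how the three models above could be stored as weight matrices and trained during random exploration. Tabular one-hot states and actions, the learning rate, and the simple Hebbian increments are simplifying assumptions, not the paper's exact implementation.

```python
import numpy as np

class WorldModels:
    """Associative, inverse and forward models learned from observed transitions.
    States and actions are indices into one-hot codes (a simplifying assumption)."""

    def __init__(self, n_states, n_actions, lr=0.1):
        self.assoc = np.zeros((n_states, n_states))               # state -> neighbouring state
        self.inverse = np.zeros((n_actions, n_states, n_states))  # (s, s') -> action, Sigma-Pi style pairing
        self.forward = np.zeros((n_states, n_actions, n_states))  # (s, a) -> next state
        self.lr = lr

    def learn(self, s, a, s_next):
        # Hebbian-style updates from one observed transition (s, a, s_next)
        self.assoc[s_next, s] += self.lr        # associate neighbouring states
        self.inverse[a, s, s_next] += self.lr   # "postdict" the action from the state pair
        self.forward[s_next, s, a] += self.lr   # predict the next state from the state-action pair

    def predict_next(self, s, a):
        return int(np.argmax(self.forward[:, s, a]))          # forward model

    def infer_action(self, s, s_desired):
        return int(np.argmax(self.inverse[:, s, s_desired]))  # inverse model
```

During the exploration phase the agent repeatedly picks a random action and calls learn(s, a, s_next) on the observed transition.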
Planning. [Figure: actor, units, goal, agent.]
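Planning can then be sketched as spreading activation from the goal through the associative weights and reading out actions with the inverse model, as the previous slides suggest. The propagation scheme below (binary adjacency, 0.9 discount, greedy walk) is an assumption made to keep the example short; it reuses the WorldModels sketch above.

```python
import numpy as np

def plan_route(models, start, goal, max_steps=50):
    """Spread activation from the goal over the associative model,
    then walk towards it, reading out actions with the inverse model."""
    adj = (models.assoc > 0).astype(float)   # adj[s_next, s] = 1 if s -> s_next was observed
    n_states = adj.shape[0]

    # goal activation decays with distance, like a value field over the state space
    activation = np.zeros(n_states)
    activation[goal] = 1.0
    for _ in range(n_states):
        spread = 0.9 * (adj.T * activation).max(axis=1)   # discounted best reachable activation
        activation = np.maximum(activation, spread)

    # follow the activation gradient from the agent's state to the goal
    s, actions = start, []
    for _ in range(max_steps):
        if s == goal:
            break
        reachable = np.nonzero(adj[:, s])[0]
        if reachable.size == 0:
            break                                           # no known route
        s_next = reachable[np.argmax(activation[reachable])]
        actions.append(models.infer_action(s, s_next))      # inverse model: which action gets there?
        s = s_next
    return actions
```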
Discussion - requires embedding ... learn state space from sensor input ... only random exploration implemented ... hand-designed planning phases ... hierarchical models?