Studies on Goal-Directed Feature Learning
Cornelius Weber, FIAS
presented at: “Machine Learning Approaches to Representational Learning and Recognition in Vision” Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008
models’ background & overview:
- unsupervised feature learning models are enslaved by the bottom-up input
- reward-modulated activity leads to input selection: Nakahara, Neural Comput 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neural Comput 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
- reward-modulated Hebbian learning: Triesch, Neural Comput 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comput 17, 2176-214 (2005); Franz & Triesch, ICDL (2007) (model 3 presented here extends this to delayed reward)
- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)
purely sensory data, in which one feature type is linked to reward; the action is not controlled by the network
[figure: sensory input → action → reward]
model 1: obtaining the relevant features
1) build a feature detecting model
2) learn associations between the features
3) register each feature’s average reward
4) spread value along the associative connections
5) check whether actions in-/decrease the value
6) remove features for which the action doesn’t matter (a sketch of steps 3-6 follows)
[figure: irrelevant vs. relevant features]
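A minimal sketch of steps 3-6 in Python, assuming the feature layer is already trained and the associative weights `A` and per-feature average rewards `r` are given; the function names, the discount `gamma`, and the threshold `eps` are illustrative assumptions, not the paper’s notation:

```python
import numpy as np

def spread_value(r, A, gamma=0.9, n_iter=50):
    """Steps 3-4: propagate each feature's average reward r along the
    associative weights A, so that value also reaches features that
    merely co-occur with rewarded ones."""
    v = r.copy()
    for _ in range(n_iter):
        v = np.maximum(r, gamma * A @ v)  # value decays with associative distance
    return v

def relevance_mask(v_with_action, v_without_action, eps=1e-3):
    """Steps 5-6: keep a feature only if acting changes its value;
    features for which the action doesn't matter are pruned."""
    return np.abs(v_with_action - v_without_action) > eps
```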
Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adapt Behav 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001); Földiák, Biol Cybern 64, 165-70 (1990)
[figure: selected features; lateral weights (decorrelation); associative weights; thresholds; action effect → homogeneous activity distribution → relevant features identified]
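For reference, a sketch of a Földiák (1990)-style layer as used in steps 1-2: Hebbian feedforward weights, anti-Hebbian lateral weights for decorrelation, and thresholds adapted toward a homogeneous activity distribution. The fixed-point settling, learning rate, and target firing probability `p` are assumptions for illustration:

```python
import numpy as np

def foldiak_step(x, W, L, t, p=0.1, lr=0.02, n_settle=20):
    """One update of a Foldiak-style sparse layer: feedforward weights W,
    lateral weights L (decorrelation), per-unit thresholds t."""
    y = np.zeros(W.shape[0])
    for _ in range(n_settle):                  # settle the recurrent dynamics
        y = 1.0 / (1.0 + np.exp(-(W @ x + L @ y - t)))
    z = (y > 0.5).astype(float)                # binarized activities
    W += lr * z[:, None] * (x - W)             # Hebbian learning with decay
    L -= lr * (np.outer(z, z) - p * p)         # anti-Hebbian decorrelation
    np.fill_diagonal(L, 0.0)
    L[L > 0] = 0.0                             # lateral weights stay inhibitory
    t += lr * (z - p)                          # homeostasis -> homogeneous activity
    return W, L, t
```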
motor-sensory data (again, one feature type is linked to reward); the network selects the action (to get the reward)
[figure: sensory input, reward; irrelevant subspace vs. relevant subspace]
model 2: removing the irrelevant inputs
1) initialize the feature detecting model (but continue learning)
2) perform actor-critic RL, taking the features’ outputs as the state representation
   - works despite the irrelevant features
   - challenge: relevant features will occur at different frequencies
   - nevertheless, the features may remain stable
3) observe the critic: it puts negative value on the irrelevant features after long training
4) modulate (multiply) feature learning by the critic’s value (see the sketch below)
[figure: frequency vs. value]
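A hedged sketch of step 4, assuming a linear critic with weights `v` over the feature activations; the critic’s per-feature value multiplies the Hebbian update, so features valued negatively stop being learned. The names and the rectification are illustrative choices, not taken from the slides:

```python
import numpy as np

def modulated_feature_update(W, s, v, lr=0.01):
    """Step 4: gate Hebbian feature learning by the critic's value, so
    that features the critic values negatively are no longer updated."""
    phi = W @ s                              # feature activations for input s
    gate = np.maximum(v * phi, 0.0)          # per-feature value, rectified
    W += lr * gate[:, None] * (s - W)        # value-modulated Hebbian step
    return W
```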
Lücke & Bouecke, Proc ICANN, 31-7 (2005)
[figure: features; critic value; action weights → relevant subspace discovered]
model 3: learning only the relevant inputs
1) top level: reinforcement learning model (SARSA)
2) lower level: feature learning model (SOM / K-means)
3) modulate learning by δ in both layers (a sketch follows)
[figure: action → RL weights → feature weights → input]
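A minimal sketch of model 3, with the SOM reduced to K-means (no neighborhood function) and ε-greedy SARSA; gating the feature layer by |δ| rather than δ is an implementation choice here, not taken from the slides:

```python
import numpy as np

def model3_step(W, Q, s, a, r, s_next, eps=0.1, gamma=0.9, lr=0.05):
    """One joint update: SARSA on top of a SOM/K-means feature layer,
    with the TD error delta modulating learning in both layers."""
    k = ((W - s) ** 2).sum(axis=1).argmin()            # winner unit = discrete state
    k_next = ((W - s_next) ** 2).sum(axis=1).argmin()
    if np.random.rand() < eps:                         # epsilon-greedy policy
        a_next = np.random.randint(Q.shape[0])
    else:
        a_next = int(Q[:, k_next].argmax())
    delta = r + gamma * Q[a_next, k_next] - Q[a, k]    # SARSA TD error
    Q[a, k] += lr * delta                              # top level: RL weights
    W[k] += lr * abs(delta) * (s - W[k])               # lower level: delta-gated K-means
    return a_next, delta
```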
[figure: feature weights cover the relevant subspace; RL action weights; subspace coverage]
learning the ‘long bars’ data
[figure: RL action weights; feature weights; data/input; reward; 2 actions (not shown)]
learning the ‘short bars’ data
data: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’
[figure: feature weights; RL action weights; action; input; reward]
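The slides do not spell out the environment’s rules; the following is a hypothetical generator for the ‘short bars’ task. The 12x12 grid size comes from the next slide, while the bar length and the goal condition (reaching the top row) are labeled assumptions:

```python
import numpy as np

class ShortBars:
    """Hypothetical 'short bars' world: a short horizontal bar on a
    12x12 grid, moved by the actions up/down/left/right; the goal
    condition (reaching the top row) is an assumption."""
    MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

    def __init__(self, size=12):
        self.size = size
        self.pos = np.random.randint(size, size=2)   # (row, col) of bar centre

    def observe(self):
        img = np.zeros((self.size, self.size))
        c0 = max(0, self.pos[1] - 1)
        img[self.pos[0], c0:self.pos[1] + 2] = 1.0   # 3-pixel horizontal bar
        return img.ravel()                           # flattened sensory input

    def act(self, action):
        self.pos = np.clip(self.pos + np.array(self.MOVES[action]),
                           0, self.size - 1)
        reward = 1.0 if self.pos[0] == 0 else 0.0    # assumed goal: top row
        return self.observe(), reward
```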
short bars in a 12x12 grid; average number of steps to goal: 11
biological interpretation
- no direct feedback from the striatum to the cortex
- convergent mapping → little receptive field overlap, consistent with subspace discovery
[figure: cortex (feature/subspace detection) → striatum → GPi (output of the basal ganglia) → action selection]
Discussion
- models 1 and 2 learn all features and then identify the relevant ones
- model 1 requires a homogeneous feature distribution
- model 2 can do only subspace detection (no real feature detection)
- model 3 is very simple: SARSA on a SOM with δ-feedback
- model 3 learns only the relevant subspace or features in the first place
- link between unsupervised and reinforcement learning

Sponsors: Frankfurt Institute for Advanced Studies (FIAS); Bernstein Focus Neurotechnology; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3
relevant features change during learning
T-maze decision task (rat): Jog et al., Science 286, 1158-61 (1999)
units in the basal ganglia are active at the junction during early task acquisition, but not at a later stage
[figure: early learning vs. late learning]
evidence for reward/action-modulated learning in the visual system
- Shuler & Bear, “Reward timing in the primary visual cortex”, Science 311, 1606-9 (2006)
- Schoups et al., “Practising orientation identification improves orientation coding in V1 neurons”, Nature 412, 549-53 (2001)