Studies on Goal-Directed Feature Learning
Cornelius Weber, FIAS
presented at: “Machine Learning Approaches to Representational Learning and Recognition in Vision” Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008
models’ background & overview:
- unsupervised feature learning models are enslaved by the bottom-up input
- reward-modulated activity leads to input selection: Nakahara, Neural Comput 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neural Comput 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
- reward-modulated Hebbian learning: Triesch, Neural Comput 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comput 17, 2176-214 (2005); Franz & Triesch, ICDL (2007) (model 3 presented here extends this to delayed reward)
- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)
purely sensory data, in which one feature type is linked to reward; the action is not controlled by the network
[figure: sensory input → action → reward]
model 1: obtaining the relevant features
1) build a feature detecting model
2) learn associations between the features
3) register each feature’s average reward
4) spread value along the associative connections
5) check whether actions in-/decrease the value
6) remove features for which the action doesn’t matter (a sketch of steps 3-6 follows)
[figure: irrelevant vs. relevant features]
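A minimal sketch of steps 3-6 in Python, assuming the feature layer is already trained and the associative weights `A` and per-feature average rewards `r` are given; the function names, the discount `gamma`, and the threshold `eps` are illustrative assumptions, not the paper’s notation:

```python
import numpy as np

def spread_value(r, A, gamma=0.9, n_iter=50):
    """Steps 3-4: propagate each feature's average reward r along the
    associative weights A, so that value also reaches features that
    merely co-occur with rewarded ones."""
    v = r.copy()
    for _ in range(n_iter):
        v = np.maximum(r, gamma * A @ v)  # value decays with associative distance
    return v

def relevance_mask(v_with_action, v_without_action, eps=1e-3):
    """Steps 5-6: keep a feature only if acting changes its value;
    features for which the action doesn't matter are pruned."""
    return np.abs(v_with_action - v_without_action) > eps
```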
Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adapt Behav 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001); Földiák, Biol Cybern 64, 165-70 (1990)
[figure: selected features; lateral weights (decorrelation); associative weights; thresholds; action effect → homogeneous activity distribution → relevant features identified]
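For reference, a sketch of a Földiák (1990)-style layer as used in steps 1-2: Hebbian feedforward weights, anti-Hebbian lateral weights for decorrelation, and thresholds adapted toward a homogeneous activity distribution. The fixed-point settling, learning rate, and target firing probability `p` are assumptions for illustration:

```python
import numpy as np

def foldiak_step(x, W, L, t, p=0.1, lr=0.02, n_settle=20):
    """One update of a Foldiak-style sparse layer: feedforward weights W,
    lateral weights L (decorrelation), per-unit thresholds t."""
    y = np.zeros(W.shape[0])
    for _ in range(n_settle):                  # settle the recurrent dynamics
        y = 1.0 / (1.0 + np.exp(-(W @ x + L @ y - t)))
    z = (y > 0.5).astype(float)                # binarized activities
    W += lr * z[:, None] * (x - W)             # Hebbian learning with decay
    L -= lr * (np.outer(z, z) - p * p)         # anti-Hebbian decorrelation
    np.fill_diagonal(L, 0.0)
    L[L > 0] = 0.0                             # lateral weights stay inhibitory
    t += lr * (z - p)                          # homeostasis -> homogeneous activity
    return W, L, t
```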
motor-sensory data (again, one feature type is linked to reward); the network selects the action (to get the reward)
[figure: sensory input, reward; irrelevant subspace vs. relevant subspace]
model 2: removing the irrelevant inputs
1) initialize the feature detecting model (but continue learning)
2) perform actor-critic RL, taking the features’ outputs as the state representation
   - works despite the irrelevant features
   - challenge: relevant features will occur at different frequencies
   - nevertheless, the features may remain stable
3) observe the critic: it puts negative value on the irrelevant features after long training
4) modulate (multiply) feature learning by the critic’s value (see the sketch below)
[figure: frequency vs. value]
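A hedged sketch of step 4, assuming a linear critic with weights `v` over the feature activations; the critic’s per-feature value multiplies the Hebbian update, so features valued negatively stop being learned. The names and the rectification are illustrative choices, not taken from the slides:

```python
import numpy as np

def modulated_feature_update(W, s, v, lr=0.01):
    """Step 4: gate Hebbian feature learning by the critic's value, so
    that features the critic values negatively are no longer updated."""
    phi = W @ s                              # feature activations for input s
    gate = np.maximum(v * phi, 0.0)          # per-feature value, rectified
    W += lr * gate[:, None] * (s - W)        # value-modulated Hebbian step
    return W
```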
Lücke & Bouecke, Proc ICANN, 31-7 (2005)
[figure: features; critic value; action weights → relevant subspace discovered]
model 3: learning only the relevant inputs
1) top level: reinforcement learning model (SARSA)
2) lower level: feature learning model (SOM / K-means)
3) modulate learning by δ in both layers (a sketch follows)
[figure: action → RL weights → feature weights → input]
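A minimal sketch of model 3, with the SOM reduced to K-means (no neighborhood function) and ε-greedy SARSA; gating the feature layer by |δ| rather than δ is an implementation choice here, not taken from the slides:

```python
import numpy as np

def model3_step(W, Q, s, a, r, s_next, eps=0.1, gamma=0.9, lr=0.05):
    """One joint update: SARSA on top of a SOM/K-means feature layer,
    with the TD error delta modulating learning in both layers."""
    k = ((W - s) ** 2).sum(axis=1).argmin()            # winner unit = discrete state
    k_next = ((W - s_next) ** 2).sum(axis=1).argmin()
    if np.random.rand() < eps:                         # epsilon-greedy policy
        a_next = np.random.randint(Q.shape[0])
    else:
        a_next = int(Q[:, k_next].argmax())
    delta = r + gamma * Q[a_next, k_next] - Q[a, k]    # SARSA TD error
    Q[a, k] += lr * delta                              # top level: RL weights
    W[k] += lr * abs(delta) * (s - W[k])               # lower level: delta-gated K-means
    return a_next, delta
```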
[figure: feature weights cover the relevant subspace; RL action weights; subspace coverage]
learning the ‘long bars’ data
[figure: RL action weights; feature weights; data/input; reward; 2 actions (not shown)]
learning the ‘short bars’ data
data: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’
[figure: feature weights; RL action weights; action; input; reward]
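The slides do not spell out the environment’s rules; the following is a hypothetical generator for the ‘short bars’ task. The 12x12 grid size comes from the next slide, while the bar length and the goal condition (reaching the top row) are labeled assumptions:

```python
import numpy as np

class ShortBars:
    """Hypothetical 'short bars' world: a short horizontal bar on a
    12x12 grid, moved by the actions up/down/left/right; the goal
    condition (reaching the top row) is an assumption."""
    MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

    def __init__(self, size=12):
        self.size = size
        self.pos = np.random.randint(size, size=2)   # (row, col) of bar centre

    def observe(self):
        img = np.zeros((self.size, self.size))
        c0 = max(0, self.pos[1] - 1)
        img[self.pos[0], c0:self.pos[1] + 2] = 1.0   # 3-pixel horizontal bar
        return img.ravel()                           # flattened sensory input

    def act(self, action):
        self.pos = np.clip(self.pos + np.array(self.MOVES[action]),
                           0, self.size - 1)
        reward = 1.0 if self.pos[0] == 0 else 0.0    # assumed goal: top row
        return self.observe(), reward
```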
short bars in a 12x12 grid; average number of steps to goal: 11
biological interpretation
- no direct feedback from the striatum to the cortex
- convergent mapping → little receptive field overlap, consistent with subspace discovery
[figure: cortex (feature/subspace detection) → striatum → GPi (output of the basal ganglia) → action selection]
Discussion
- models 1 and 2 learn all features and then identify the relevant ones
- model 1 requires a homogeneous feature distribution
- model 2 can do only subspace detection (no real feature detection)
- model 3 is very simple: SARSA on a SOM with δ-feedback
- model 3 learns only the relevant subspace or features in the first place
- link between unsupervised and reinforcement learning

Sponsors: Frankfurt Institute for Advanced Studies (FIAS); Bernstein Focus Neurotechnology; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3
relevant features change during learning
T-maze decision task (rat): Jog et al., Science 286, 1158-61 (1999)
units in the basal ganglia are active at the junction during early task acquisition, but not at a later stage
[figure: early learning vs. late learning]
evidence for reward/action-modulated learning in the visual system
- Shuler & Bear, “Reward timing in the primary visual cortex”, Science 311, 1606-9 (2006)
- Schoups et al., “Practising orientation identification improves orientation coding in V1 neurons”, Nature 412, 549-53 (2001)