Dopamine, Uncertainty and TD Learning

Yael Niv, Michael Duff, Peter Dayan
Gatsby Computational Neuroscience Unit, UCL
CNS 2004

What is the function of Dopamine?
[Figure labels: Dorsal Striatum (Caudate, Putamen); Prefrontal Cortex; Nucleus Accumbens (Ventral Striatum); Amygdala; Substantia Nigra; Ventral Tegmental Area]
• Parkinson’s Disease
• -> Movement control?
• Intracranial self-stimulation
• Drug addiction
• -> Reward pathway?
• -> Learning?
• Also involved in:
• - Working memory
• - Novel situations
• ADHD
• Schizophrenia
• …

What does phasic Dopamine encode? (Schultz et al.)
[Figure panels: Unpredicted reward (neutral/no stimulus); Predicted reward (learned task); Omitted reward (probe trial)]

The TD Hypothesis of Dopamine
[Figure label: Temporal difference error]
• Phasic DA encodes a reward prediction error
• Precise theory for generation of DA firing patterns
• Compelling account for the role of DA in classical conditioning
(Sutton & Barto 1987; Schultz, Dayan & Montague 1997)
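
A minimal sketch of the prediction error behind this hypothesis, assuming the common form δ(t) = r(t) + γV(t) - V(t-1), a four-step trial (baseline, stimulus, delay, reward time) and unit reward. The function name `td_errors` and the value vectors are illustrative assumptions, not taken from the slides. It reproduces the three phasic patterns qualitatively: a burst to unpredicted reward, a burst transferred to the stimulus after learning, and a dip when a predicted reward is omitted.

```python
def td_errors(rewards, values, gamma=1.0):
    """Return delta(t) = r(t) + gamma * V(t) - V(t-1) for each time step.

    values[t] is the predicted future reward after step t.
    delta(0) is set to 0 because there is no earlier step to compare with.
    """
    deltas = [0.0]
    for t in range(1, len(rewards)):
        deltas.append(rewards[t] + gamma * values[t] - values[t - 1])
    return deltas


# Time steps: 0 = baseline, 1 = stimulus, 2 = delay, 3 = reward time.
reward_trial = [0.0, 0.0, 0.0, 1.0]
omission_trial = [0.0, 0.0, 0.0, 0.0]

naive_values = [0.0, 0.0, 0.0, 0.0]    # nothing learned yet
learned_values = [0.0, 1.0, 1.0, 0.0]  # stimulus fully predicts the reward

unpredicted = td_errors(reward_trial, naive_values)   # error at reward time
predicted = td_errors(reward_trial, learned_values)   # error at stimulus time
omitted = td_errors(omission_trial, learned_values)   # negative error at reward time

print(unpredicted, predicted, omitted)
```

The baseline value is held at 0 on the assumption that stimulus onset cannot be predicted, which is why the learned response appears at the stimulus rather than earlier.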

But: Fiorillo, Tobler & Schultz 2003
[Figure labels: Stimulus = 2 sec visual stimulus; Reward (probabilistic) = drops of juice]
• Introduce inherent uncertainty into the classical conditioning paradigm
• Five visual stimuli indicating different reward probabilities: P = 100%, 75%, 50%, 25%, 0%

Fiorillo, Tobler & Schultz 2003
• At stimulus time - DA represents mean expected reward
• Delay activity - A ramp in activity up to reward
Hypothesis: DA ramp encodes uncertainty in reward

“Uncertainty Ramping” and TD error?
• The uncertainty is predictable from the stimulus
• TD predicts away predictable quantities
• If it represents uncertainty, the ramping activity should disappear with learning according to TD.
• Uncertainty ramping is not easily compatible with the TD hypothesis
Are the ramps really coding uncertainty?

A closer look at FTS’s results
[Figure panels: p = 50%; p = 75%]
At time of reward:
• Prediction errors result from probabilistic reward delivery
• Crucially: Positive and negative errors cancel out

A TD Resolution:
[Figure labels: DA; δ(t); 270%; 55%]
• TD prediction error δ(t) can be positive or negative
• Neuronal firing rate is only positive (negative values can be encoded relative to base firing rate)
But: DA base firing rate is low -> asymmetric encoding of δ(t)

Simulating TD with asymmetric errors
• Negative δ(t) scaled by d = 1/6 prior to PSTH summation
• Learning proceeds normally (without scaling)
• - Necessary to produce the right predictions
• - Can be biologically plausible
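
A rough sketch of such a simulation, not the authors' code. Assumptions: a tapped delay line with one weight per time step, TD(0) with γ = 1, unit reward delivered at the last step with probability p, and illustrative values for the learning rate, trial length and trial count. Only d = 1/6 and the rule "scale negative errors when averaging, but learn from the unscaled error" come from the slide; the name `simulate_td_psth` is hypothetical.

```python
import random


def simulate_td_psth(p=0.5, n_steps=10, alpha=0.1, d=1 / 6,
                     n_trials=5000, burn_in=1000, seed=0):
    """Average asymmetrically scaled TD errors over trials (a model 'PSTH').

    w[t] is the predicted value at step t after stimulus onset (tapped delay
    line, one weight per step); the value after the reward step is fixed at 0.
    Returns a list of n_steps + 1 averages: index 0 is stimulus onset and
    index n_steps is the time of reward.
    """
    rng = random.Random(seed)
    w = [0.0] * n_steps
    psth = [0.0] * (n_steps + 1)
    counted = 0
    for trial in range(n_trials):
        rewarded = rng.random() < p
        # Stimulus onset: prediction jumps from 0 (assumed baseline) to w[0].
        deltas = [w[0]]
        for t in range(1, n_steps + 1):
            r = 1.0 if (t == n_steps and rewarded) else 0.0
            v_now = w[t] if t < n_steps else 0.0
            delta = r + v_now - w[t - 1]
            w[t - 1] += alpha * delta  # learning uses the unscaled error
            deltas.append(delta)
        if trial >= burn_in:  # average only after initial learning
            counted += 1
            for t, delta in enumerate(deltas):
                # Negative errors are scaled by d before summation.
                psth[t] += delta if delta >= 0 else d * delta
    return [total / counted for total in psth]


psth_asym = simulate_td_psth(d=1 / 6)  # asymmetric coding
psth_sym = simulate_td_psth(d=1.0)     # symmetric coding, for comparison

print([round(x, 3) for x in psth_asym])
print([round(x, 3) for x in psth_sym])
```

In this sketch the averaged signal should grow toward the time of reward under asymmetric scaling and stay near zero after stimulus onset under symmetric scaling. `alpha` is exposed so the effect of the learning rate on the delay-period values can be explored.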

DA - Uncertainty or Temporal Difference?
[Figure panels: Experiment; Model]
With asymmetric coding of errors, the mean TD error at the time of reward ∝ p(1-p) => Maximal at p = 50%
However:
• No need to assume explicit coding of uncertainty - ramping is explained by neural constraints.
• Explanation for puzzling absence of ramp in trace conditioning results.
• Experimental test: Ramp as within or between trial phenomenon?
Challenges: TD and noise; Conditioned inhibition, additivity
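
The p(1-p) dependence can be worked out in a few lines. This is a sketch under two assumptions that the slides do not state explicitly: unit reward magnitude, and a converged prediction V = p just before the reward. The factor d is the scaling applied to negative errors.

```latex
\begin{align*}
\delta &=
  \begin{cases}
    1 - p & \text{with probability } p \quad \text{(reward delivered)} \\
    -p    & \text{with probability } 1 - p \quad \text{(reward omitted)}
  \end{cases} \\[4pt]
\text{symmetric coding:}\quad
  \mathbb{E}[\delta] &= p(1-p) - (1-p)\,p = 0 \\[4pt]
\text{asymmetric coding:}\quad
  \mathbb{E}[\delta_{\text{scaled}}] &= p(1-p) - d\,(1-p)\,p = (1-d)\,p(1-p) \\[4pt]
\frac{\partial}{\partial p}\,(1-d)\,p(1-p) &= (1-d)(1-2p) = 0
  \;\Rightarrow\; p = \tfrac{1}{2}
\end{align*}
```

For d = 1/6 and p = 1/2 this gives (5/6)(1/4) = 5/24, roughly 0.21 in units of the reward.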

Trace conditioning: A puzzle and its resolution
[Figure labels: CS = short visual stimulus; Trace period; US (probabilistic) = drops of juice]
• Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
• Resolution: lower learning rate in trace conditioning eliminates ramp

Other sources of uncertainty: Representational Noise (1)
[Figure panels: σ = 0.0577; σ = 0.0866; σ = 0.1155; labels: prediction error, weights; Mirenowicz and Schultz (1996)]
• Rate coding is inherently stochastic
• Add noise to tapped delay line representation
=> TD learning is robust to this type of noise

Other sources of uncertainty: Representational Noise (2)
[Figure panels: ε = 0.05; ε = 0.10]
• Neural timing of events is necessarily inaccurate
• Add temporal noise to tapped delay line representation
=> Devastating effects of even small amounts of temporal noise on TD predictions
