Computational Neuromodulation. Peter Dayan, Gatsby Computational Neuroscience Unit, University College London. Nathaniel Daw, Sham Kakade, Read Montague, John O’Doherty, Wolfram Schultz, Ben Seymour, Terry Sejnowski, Angela Yu.
5. Diseases of the Will • Contemplators • Bibliophiles and Polyglots • Megalomaniacs • Instrument addicts • Misfits • Theorists
Theorists • There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature. • As might be expected, disappointments plague the theorist…
Computation and the Brain • statistical computations • representation from density estimation (Terry) • combining uncertain information over space, time, modalities for sensory/memory inference • learning as a hierarchical Bayesian problem • learning as a filtering problem • control theoretic computations • optimising rewards, punishments • homeostasis/allostasis
Conditioning • prediction: of important events (policy evaluation) • control: in the light of those predictions (policy improvement) • Ethology • Psychology: classical/operant conditioning • Computation: dynamic programming; Kalman filtering • Algorithm: TD/delta rules • Neurobiology: neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
Dopamine • drug addiction, self-stimulation • effect of antagonists • effect on vigour • link to action • ‘scalar’ signal [Figure (Schultz et al). Panels: no prediction; prediction, reward; prediction, no reward]
Prediction, but What Sort? • Sutton: predict sum future reward; TD error
Rewards rather than Punishments [Figure: TD error; V(t); dopamine cells in VTA/SNc (Schultz et al). Panels: no prediction; prediction, reward; prediction, no reward]
Prediction, but What Sort? • Sutton: predict sum future reward; TD error • Watkins: policy evaluation
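The TD error named on this slide can be illustrated with a minimal sketch of the three conditions in the Schultz et al panels (no prediction; prediction, reward; prediction, no reward). The slide's equations did not survive extraction, so the standard undiscounted form delta(t) = r(t) + V(t+1) - V(t) is assumed here, and the trial length, learning rate, and one-value-per-time-step representation are illustrative choices rather than the talk's own.

```python
import numpy as np

N_STEPS = 4   # time steps between cue onset and reward (assumed)
ALPHA = 0.3   # learning rate (assumed)

def run_trial(V, reward, learn=True):
    """Return TD errors for one trial: index 0 is cue onset, index t+1 is step t.

    delta(t) = r(t) + V(t+1) - V(t), undiscounted, with V = 0 before the cue
    and after the trial ends.
    """
    deltas = np.zeros(N_STEPS + 1)
    deltas[0] = V[0]  # the cue arrives unpredictably: a jump from value 0 to V[0]
    for t in range(N_STEPS):
        r = reward if t == N_STEPS - 1 else 0.0       # reward only on the last step
        v_next = V[t + 1] if t + 1 < N_STEPS else 0.0
        deltas[t + 1] = r + v_next - V[t]
        if learn:
            V[t] += ALPHA * deltas[t + 1]             # delta-rule update of the prediction
    return deltas

V = np.zeros(N_STEPS)
no_prediction = run_trial(V, reward=1.0, learn=False)     # naive: error at the reward
for _ in range(500):
    run_trial(V, reward=1.0)                              # learn the cue-reward relation
predicted_reward = run_trial(V, reward=1.0, learn=False)  # error moves to the cue
omitted_reward = run_trial(V, reward=0.0, learn=False)    # dip at the expected reward time
```

In this sketch the error sits at the reward before learning, moves to cue onset after learning, and turns negative at the expected reward time when the reward is omitted.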
Policy Improvement • Sutton: define p(x;M) do R-M on: uses the same TD error • Watkins: value iteration with
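One way to read "uses the same TD error" is as an actor-critic scheme, sketched below on a two-action bandit. This is an assumption about the intended algorithm, not a reconstruction of the slide: the softmax policy over propensities `M`, the reward probabilities, and the learning rates are all hypothetical, and Watkins' alternative is not shown.

```python
import numpy as np

def actor_critic_bandit(p_reward=(0.2, 0.8), n_trials=2000,
                        alpha=0.1, beta=0.1, seed=0):
    """Actor-critic on a one-state, two-action task (all parameters assumed)."""
    rng = np.random.default_rng(seed)
    p_reward = np.asarray(p_reward)
    M = np.zeros(len(p_reward))   # action propensities (the actor)
    V = 0.0                       # predicted reward (the critic)
    for _ in range(n_trials):
        policy = np.exp(M - M.max())
        policy /= policy.sum()                 # softmax policy p(a; M)
        a = rng.choice(len(M), p=policy)
        r = float(rng.random() < p_reward[a])
        delta = r - V                          # TD error (no successor state here)
        V += alpha * delta                     # critic: policy evaluation
        grad = -policy
        grad[a] += 1.0                         # d log p(a; M) / dM for a softmax
        M += beta * delta * grad               # actor: policy improvement, same delta
    policy = np.exp(M - M.max())
    return policy / policy.sum(), V

policy, V = actor_critic_bandit()
```

The point of the design is that a single scalar error trains both the prediction and the policy, which is what makes a broadcast signal such as dopamine a plausible carrier.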
Active Issues • exploration/exploitation • model-based (PFC)/cached (striatal) methods • motivational influences • vigour • hierarchical control (PFC) • hyperbolic discounting, Pavlovian misbehavior and ‘the will’ • representational learning • appetitive/aversive opponency • links with behavioural economics
Computation and the Brain • statistical computations • representation from density estimation (Terry) • combining uncertain information over space, time, modalities for sensory/memory inference • learning as a hierarchical Bayesian problem • learning as a filtering problem • control theoretic computations • optimising rewards, punishments • homeostasis/allostasis • exploration/exploitation trade-offs
Uncertainty. We focus on two different kinds of uncertainty: • expected uncertainty, from known variability or ignorance (ACh) • unexpected uncertainty, due to gross mismatch between prediction and observation (NE). Computational functions of uncertainty: • weaken top-down influence over sensory processing • promote learning about the relevant representations
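The distinction between the two uncertainties can be made concrete with a toy change detector. It is a sketch under stated assumptions, not the model from the talk: expected uncertainty is taken to be one minus a running estimate of cue validity, unexpected uncertainty is the posterior probability that the relevant cue has changed, and the hazard rate, the Beta-count validity estimate, and the chance-level likelihood after a change are all hypothetical.

```python
def uncertainty_signals(correct, hazard=0.01):
    """Track two signals from a sequence of booleans.

    correct[i] says whether the currently assumed cue predicted the target
    on trial i. Returns (ach, ne): expected and unexpected uncertainty.
    """
    a, b = 1.0, 1.0      # Beta counts for cue validity (assumed prior)
    p_change = 0.0       # posterior probability that the relevant cue changed
    ach, ne = [], []
    for c in correct:
        validity = a / (a + b)
        ach.append(1.0 - validity)                 # expected uncertainty
        # Before each trial, the context may have changed with a small hazard.
        p_prior = p_change + (1.0 - p_change) * hazard
        like_same = validity if c else 1.0 - validity
        like_changed = 0.5                         # cue is at chance after a change
        p_change = (p_prior * like_changed /
                    (p_prior * like_changed + (1.0 - p_prior) * like_same))
        ne.append(p_change)                        # unexpected uncertainty
        if c:
            a += 1.0
        else:
            b += 1.0
    return ach, ne

# 50 trials in which the cue works, then 10 in which it fails.
ach, ne = uncertainty_signals([True] * 50 + [False] * 10)
```

With a reliable cue both signals stay low; a run of failures that the learned validity cannot explain drives the change probability up within a few trials.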
Norepinephrine • vigilance • reversals • modulates plasticity? exploration? • scalar
Aston-Jones: Target Detection • detect and react to a rare target amongst common distractors • elevated tonic activity for reversal • activated by rare target (and reverses) • not reward/stimulus related? more response related?
Vigilance Task • variable time in start • η controls confusability • one single run • cumulative is clearer • exact inference • effect of 80% prior
Phasic NE • onset response from timing uncertainty (SET) • growth as P(target)/0.2 rises • act when P(target)=0.95 • stop if P(target)=0.01 • arbitrarily set NE=0 after 5 timesteps (small prob of reflexive action)
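The thresholds above can be tried out in a minimal sketch of sequential Bayesian inference. It takes the 0.2 target prior, the 0.95 and 0.01 thresholds, and η=0.675 from the slides, but simplifies heavily: the start state and its variable duration are left out, and observations are assumed to be binary samples that favour the true stimulus with probability η. The function names are hypothetical.

```python
PRIOR_TARGET = 0.2   # 80% prior on the distractor, as on the slides

def posterior_trace(observations, eta, prior=PRIOR_TARGET):
    """P(target) after each binary observation (1 favours target, 0 distractor)."""
    p = prior
    trace = []
    for x in observations:
        like_target = eta if x == 1 else 1.0 - eta
        like_distractor = 1.0 - eta if x == 1 else eta
        p = like_target * p / (like_target * p + like_distractor * (1.0 - p))
        trace.append(p)
    return trace

def run_trial(observations, eta, act_at=0.95, stop_at=0.01):
    """Return (decision, step, ne), with ne the signal P(target) / prior."""
    ne = []
    for step, p in enumerate(posterior_trace(observations, eta), start=1):
        ne.append(p / PRIOR_TARGET)
        if p >= act_at:
            return "act", step, ne
        if p <= stop_at:
            return "stop", step, ne
    return "undecided", len(observations), ne

decision_easy, steps_easy, ne_easy = run_trial([1] * 20, eta=0.675)
decision_hard, steps_hard, ne_hard = run_trial([1] * 20, eta=0.65)
decision_dist, steps_dist, ne_dist = run_trial([0] * 20, eta=0.675)
```

Lowering η makes each sample less informative, so more samples are needed before the action threshold is crossed.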
Four Types of Trial • 19%; 1.5%; 1%; 77% • fall is rather arbitrary
Response Locking • slightly flatters the model, since there is no further response variability
Interrupts/Resets (SB) [Figure: PFC/ACC; LC]
Active Issues • approximate inference strategy • interaction with expected uncertainty (ACh) • other representations of uncertainty • finer gradations of ignorance
Computational Neuromodulation • general: excitability, signal/noise ratios • specific: prediction errors, uncertainty signals
Learning and Inference • Learning: predict; control • ∆ weight ∝ (learning rate) × (error) × (stimulus) • dopamine: phasic prediction error for future reward • serotonin: phasic prediction error for future punishment • acetylcholine: expected uncertainty boosts learning • norepinephrine: unexpected uncertainty boosts learning
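The delta rule on this slide can be written out directly. The sketch below also shows one hedged reading of "uncertainty boosts learning": a Kalman-filter gain for a single weight with a unit stimulus, where greater uncertainty about the weight gives a larger learning rate. The Kalman form and all parameter values are assumptions for illustration; the slide gives only the proportionality.

```python
import numpy as np

def kalman_learning_rate(weight_var, noise_var):
    """Kalman gain for one weight and a unit stimulus (assumed setting)."""
    return weight_var / (weight_var + noise_var)

def delta_rule(w, stimulus, outcome, learning_rate):
    """One step of: change in weight = (learning rate) x (error) x (stimulus)."""
    error = outcome - float(w @ stimulus)
    return w + learning_rate * error * stimulus, error

w = np.zeros(2)
stimulus = np.array([1.0, 0.0])   # only the first stimulus is present
for _ in range(50):
    w, error = delta_rule(w, stimulus, outcome=1.0, learning_rate=0.2)

rate_certain = kalman_learning_rate(weight_var=1.0, noise_var=1.0)
rate_uncertain = kalman_learning_rate(weight_var=4.0, noise_var=1.0)
```

Only the weight of the presented stimulus changes, and the learning rate rises with the variance assigned to the weight.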
Learning and Inference [Figure: context (top-down processing) and sensory inputs (bottom-up processing) meet in cortical processing (prediction, learning, ...); ACh: expected uncertainty; NE: unexpected uncertainty]
Temporal Difference Prediction Error [Figure: cue transitions to High Pain and Low Pain, with probabilities 0.8, 1.0, 0.2, 0.2, 0.8, 1.0] • predict sum future pain • TD error • ∆ weight ∝ (learning rate) × (error) × (stimulus)
Temporal Difference Prediction Error [Figure: cue transitions to High Pain and Low Pain, with probabilities 0.8, 1.0, 0.2, 0.2, 0.8, 1.0. Panels: TD error; value; prediction error]
Temporal Difference Prediction Error • experimental sequence…: A – B – HIGH; C – D – LOW; C – B – HIGH; A – B – HIGH; A – D – LOW; C – D – LOW; A – B – HIGH; A – B – HIGH; C – D – LOW; C – B – HIGH [Figure: MR scanner: brain responses; TD model: prediction error; ?] • Ben Seymour; John O’Doherty
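The kind of regressor a TD model would generate for this design can be sketched by running TD learning over the ten trials listed on the slide. The pain magnitudes (HIGH = 1, LOW = 0), the learning rate, and the zero baseline before the first cue are assumptions; the actual analysis parameters are not given here.

```python
SEQUENCE = [
    ("A", "B", "HIGH"), ("C", "D", "LOW"), ("C", "B", "HIGH"),
    ("A", "B", "HIGH"), ("A", "D", "LOW"), ("C", "D", "LOW"),
    ("A", "B", "HIGH"), ("A", "B", "HIGH"), ("C", "D", "LOW"),
    ("C", "B", "HIGH"),
]
PAIN = {"HIGH": 1.0, "LOW": 0.0}   # assumed magnitudes
ALPHA = 0.3                        # assumed learning rate

def td_over_sequence(sequence, alpha=ALPHA):
    """Return learned cue values and per-trial TD errors.

    Each trial yields three errors: at the first cue, at the second cue,
    and at the outcome.
    """
    V = {cue: 0.0 for cue in "ABCD"}
    errors = []
    for cue1, cue2, outcome in sequence:
        d_cue1 = V[cue1]                      # relative to an assumed zero baseline
        d_cue2 = V[cue2] - V[cue1]            # second cue better or worse than expected
        V[cue1] += alpha * d_cue2
        d_outcome = PAIN[outcome] - V[cue2]   # pain relative to the prediction
        V[cue2] += alpha * d_outcome
        errors.append((d_cue1, d_cue2, d_outcome))
    return V, errors

V, errors = td_over_sequence(SEQUENCE)
```

On the fifth trial (A – D – LOW) the second cue produces a negative error, since D signals less pain than A had come to predict; on the third trial (C – B – HIGH) the error at B is positive.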
TD prediction error: ventral striatum [Figure: brain image, Z=-4, R]
Temporal Difference Values • dorsal raphe? • right anterior insula
Rewards rather than Punishments [Figure: TD error; V(t); dopamine cells in VTA/SNc (Schultz et al). Panels: no prediction; prediction, reward; prediction, no reward]
TD Prediction Errors • computation: dynamic programming and optimal control • algorithm: ongoing error in predictions of the future • implementation: • dopamine: phasic prediction error for reward; tonic punishment • serotonin: phasic prediction error for punishment; tonic reward • evident in VTA; striatum; raphe? • next: action; motivation; addiction; misbehavior
Task Difficulty • set η=0.65 rather than 0.675 • information accumulates over a longer period • hits more affected than correct rejections (CRs) • timing not quite right
Intra-trial Uncertainty • phasic NE as unexpected state change within a model • relative to prior probability; against default • interrupts (resets) ongoing processing • tie to ADHD? • close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise) • close to behavioural switching (PR) – but not DA • farther from optimal inference (EB) • phasic ACh: aspects of known variability within a state?
Where Next • dopamine: tonic release and vigour; appetitive misbehaviour and hyperbolic discounting; actions and habits; psychosis • serotonin: aversive misbehaviour and psychiatry • norepinephrine: stress, depression and beyond
Experimental Data • ACh & NE have similar physiological effects: • suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000) • enhance thalamocortical transmission (e.g. Gil et al, 1997) • boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998) • ACh & NE have distinct behavioral effects: • ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998) • NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
Model Schematics [Figure: context (top-down processing) and sensory inputs (bottom-up processing) meet in cortical processing (prediction, learning, ...); ACh: expected uncertainty; NE: unexpected uncertainty]
Attention Example 1: Posner’s Task (Phillips, McAlonan, Robb, & Brown, 2000) [Figure: trial timeline of cue (high validity or low validity), target (stimulus location), sensory input, and response; timings 0.1s, 0.1s, 0.2-0.5s, 0.15s] • attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint • generalize to the case that cue identity changes with no notice
Formal Framework • ACh: variability in quality of relevant cue • NE: variability in identity of relevant cue • cues: vestibular, visual, ... • target: stimulus location, exit direction... • sensory information • avoid representing full uncertainty
Simulation Results: Posner’s Task • vary cue validity: vary ACh • fix relevant cue: low NE [Figure: experimental validity effect vs concentration for nicotine (increase ACh) and scopolamine (decrease ACh) (Phillips, McAlonan, Robb, & Brown, 2000); simulated validity effect vs ACh at 100, 120, 140 and 100, 80, 60 % normal level]
Maze Task (Devauges & Sara, 1990) • example 2: attentional shift • no issue of validity [Figure: cue 1 relevant and cue 2 irrelevant to reward, then cue 1 irrelevant and cue 2 relevant]
Simulation Results: Maze Navigation • fix cue validity: no explicit manipulation of ACh • change relevant cue: NE [Figure: experimental data (Devauges & Sara, 1990) and model data; % rats reaching criterion vs no. days after shift from spatial to visual task]
Simulation Results: Full Model [Figure panels: true & estimated relevant stimuli; neuromodulation in action; validity effect (VE); x-axis: trials]
Simulated Psychopharmacology [Figure panels: 50% NE; 50% ACh/NE] • ACh compensation • NE can nearly catch up
Summary • single framework for understanding ACh, NE and some aspects of attention • ACh/NE as expected/unexpected uncertainty signals • experimental psychopharmacological data replicated by model simulations • implications from complex interactions between ACh & NE • predictions at the cellular, systems, and behavioral levels • activity vs weight vs neuromodulatory vs population representations of uncertainty