Explore the anatomy and functions of ACCs and ACCg in learning from past actions and social interactions. Learn about reward processing, connection differences, and integration length determination. Discover the impact on learning rates and behavioral decisions.
Learning, Volatility and the ACC • Tim Behrens • FMRIB + Psychology, University of Oxford • FIL - UCL.
[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); group shown: CON. Kennerley et al., Nature Neuroscience, 2006]
[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); groups shown: CON, ACCs. Kennerley et al., Nature Neuroscience, 2006]
ACCg: Monkeys will sacrifice food opportunities to look at other monkeys. Rudebeck et al., Science, 2005
ACCg: Interest in other individuals is reduced after ACC gyrus lesion. Rudebeck et al., Science, 2005
Anatomy - Differences in connections between ACCs and ACCg.
• Connections unique to the sulcus are mainly with motor regions:
  • Primary motor cortex
  • Premotor cortex
  • Parietal motor areas
  • Spinal cord
• ACCs has information about our own actions.
Anatomy - Differences in connections between ACCs and ACCg.
• Connections unique to the gyrus are mainly with regions that process emotional and biological stimuli:
  • Periaqueductal grey
  • Hypothalamus
  • STS/STG
• Insula/temporal pole connections are stronger to the gyrus.
• ACCg has access to information about other agents.
Anatomy - Shared connections between ACCs and ACCg.
• Some shared connections:
  • Orbitofrontal cortex
  • Amygdala
  • Ventral striatum
• ACCg and ACCs are strongly interconnected.
• Both regions have access to and influence over reward and value processing.
[Figure, repeated: reward history weight (β) against trials into the past (i-1 to i-8); groups shown: CON, ACCs. Kennerley et al., Nature Neuroscience, 2006]
What determines the integration length? [Figure: reward history weight (β) against trials into the past (i-1 to i-8); group shown: CON] Kennerley et al., Nat Neurosci, 2006; Sugrue et al., Science, 2005
[Figure: reward history weight (β) against trials into the past (i-1 to i-8); group shown: CON]
• VOLATILE: reward probabilities change approximately every 25 trials.
• STABLE: reward probabilities change only after hundreds of trials.
Kennerley et al., Nat Neurosci, 2006; Sugrue et al., Science, 2005
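The contrast between the two environments can be explored with a small simulation. This is only a sketch under stated assumptions, not the analysis from the cited studies: the two reward probabilities (0.8 and 0.2), the moving-average estimator, the window lengths (5 and 100 trials), and the names `make_schedule` and `tracking_error` are all hypothetical choices made for illustration. "Integration length" is modelled here simply as the number of past outcomes averaged.

```python
import random


def make_schedule(n_trials, block_length, p_high=0.8, p_low=0.2):
    """Reward probability on each trial, alternating between two levels every block."""
    return [p_high if (t // block_length) % 2 == 0 else p_low
            for t in range(n_trials)]


def tracking_error(schedule, window, seed=0):
    """Mean squared gap between a moving-average estimate and the true probability.

    The estimate on each trial is the mean of the last `window` outcomes
    (0.5 before any outcome has been seen, an assumed neutral starting point).
    """
    rng = random.Random(seed)
    history, total = [], 0.0
    for p in schedule:
        recent = history[-window:]
        estimate = sum(recent) / len(recent) if recent else 0.5
        total += (estimate - p) ** 2
        history.append(1.0 if rng.random() < p else 0.0)
    return total / len(schedule)


volatile = make_schedule(1000, block_length=25)    # changes every 25 trials
stable = make_schedule(1000, block_length=1000)    # no change within the run
for name, schedule in (("volatile", volatile), ("stable", stable)):
    for window in (5, 100):
        print(name, window, round(tracking_error(schedule, window), 4))
```

Under these assumptions, the short window should track the volatile schedule more closely, while the long window should do better on the stable one, which is one way to read the question of what determines the integration length.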
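Reinforcement learning • We need to continually re-appraise the value of an action based on each new experience. [Diagram: the prediction ($V_t$) is compared with the outcome to give the prediction error δ; the new prediction ($V_{t+1}$) is the old prediction shifted by α × δ]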
Updating beliefs on the basis of new information
$V_{t+1} = V_t + (\alpha \times \delta)$
• The learning rate (α) is the weight given to the current information.
• The prediction error (δ) is the information available from this event.
The learning rate and the value of information
$V_{t+1} = V_t + (\alpha \times \delta)$
The learning rate should represent the value of the current information for guiding future beliefs.
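The update rule on these slides can be written out in a few lines. This is a minimal sketch: the prediction error is taken as outcome minus current prediction, and the function name `update_value`, the starting prediction of 0.5, and the outcome sequence are illustrative assumptions rather than values from the talk.

```python
def update_value(value, outcome, learning_rate):
    """One delta-rule step: V_{t+1} = V_t + alpha * delta."""
    prediction_error = outcome - value        # delta: outcome minus prediction
    return value + learning_rate * prediction_error


# Hypothetical outcome sequence (1 = rewarded, 0 = unrewarded).
outcomes = [1, 1, 0, 1]
value = 0.5                                   # assumed starting prediction
for outcome in outcomes:
    value = update_value(value, outcome, learning_rate=0.1)
    print(round(value, 4))
```

A larger `learning_rate` moves the prediction further toward each new outcome, so the same surprise changes beliefs more.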
Relationship with integration length [Figure panels: α = 0.01, α = 0.1, α = 0.4]
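One way to see the link between the learning rate and integration length is to unroll the update $V_{t+1} = V_t + \alpha \times \delta$: the outcome from $k$ trials back ends up with weight $\alpha (1-\alpha)^{k-1}$. The sketch below computes these weights for the three learning rates named on the slide; the function names and the test sequence are hypothetical, and a starting prediction of 0 is assumed so that the weighted sum matches the iterated update exactly.

```python
def history_weights(learning_rate, n_back):
    """Weight the delta rule places on the outcome k trials back, for k = 1..n_back."""
    return [learning_rate * (1 - learning_rate) ** (k - 1)
            for k in range(1, n_back + 1)]


def run_delta_rule(outcomes, learning_rate, initial_value=0.0):
    """Apply V <- V + alpha * (outcome - V) across a sequence of outcomes."""
    value = initial_value
    for outcome in outcomes:
        value += learning_rate * (outcome - value)
    return value


for alpha in (0.01, 0.1, 0.4):
    weights = history_weights(alpha, n_back=8)
    print(alpha, [round(w, 3) for w in weights])
```

With a small α the weights are low but nearly flat across many past trials (long integration length); with a large α they start high and fall off quickly (short integration length).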
[Figure: task display, labelled "stable"; the numbers 37 and 63 appear in the display] Behrens et al., Nature Neuroscience, 2007
$V_{t+1} = V_t + (\alpha \times \delta)$ Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
Changes in reward estimates occur throughout the task, as do changes in volatility estimates. Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
[Figure labels: Decide, Monitor, Monitor × Volatility] Behrens et al., Nature Neuroscience, 2007
ACC effect size predicts learning rate across subjects. Behrens, Woolrich, Walton & Rushworth, Nat Neurosci, 2007
ACCg: Interest in other individuals is reduced after ACC gyrus lesion. Rudebeck et al., Science, 2005
Learning about other agents [Figure: task display; the numbers 37 and 63 appear in the display] Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
Sources of information
• Value of action information: probability that the correct colour is blue
• Value of social information: probability that confederate advice is good
Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
Reward prediction error: $\delta = \text{reward} - \text{expectation}$, used in $V_{t+1} = V_t + (\alpha \times \delta)$ [Figure: effect size against time, aligned to outcome] Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
Prediction error on a social partner: $\delta = \text{lie event} - \text{lie prediction}$, used in $V_{t+1} = V_t + (\alpha \times \delta)$ [Figure: effect size against time, aligned to outcome] Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
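The two prediction errors above share the same update form, so the two sources of information can be tracked side by side. The following is a hedged delta-rule sketch of that idea, not necessarily the model fitted in the cited study: the names `delta_update` and `learn_in_parallel`, the 0.5 starting estimates, and the separate learning rates `alpha_reward` and `alpha_social` are assumptions made for illustration.

```python
def delta_update(estimate, observed, learning_rate):
    """Shared update form: estimate + alpha * (observed - estimate)."""
    return estimate + learning_rate * (observed - estimate)


def learn_in_parallel(trials, alpha_reward=0.1, alpha_social=0.1):
    """Track two beliefs from the same trials.

    trials: iterable of (blue_correct, lie_event) pairs, each 0 or 1.
    Returns (p_blue, p_lie): the estimated probability that blue is correct
    and the estimated probability that the confederate's advice is a lie.
    """
    p_blue, p_lie = 0.5, 0.5                      # assumed neutral starting beliefs
    for blue_correct, lie_event in trials:
        # Reward prediction error: reward outcome minus expectation.
        p_blue = delta_update(p_blue, blue_correct, alpha_reward)
        # Social prediction error: lie event minus lie prediction.
        p_lie = delta_update(p_lie, lie_event, alpha_social)
    return p_blue, p_lie


# Hypothetical trial sequence: blue mostly correct, advice mostly honest.
example_trials = [(1, 0), (1, 0), (0, 1), (1, 0)]
p_blue, p_lie = learn_in_parallel(example_trials)
print(round(p_blue, 4), round(1 - p_lie, 4))      # second value: P(advice is good)
```

Keeping two learning rates lets each belief respond to its own information at its own speed, which is the point of treating the value of reward information and the value of social information separately.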
The value of information and the ACC
$V_{t+1} = V_t + (\alpha \times \delta)$
• Value of reward information
• Value of social information
Combining information to drive behaviour
$V_{t+1} = V_t + (\alpha \times \delta)$
Conclusions
• ACC codes a learning signal when information is observed.
• This signal predicts the speed of learning.
• Learning from our own actions and learning from others' actions are processed in parallel, in ACCs and ACCg respectively.
• The outputs of these parallel learning processes are combined in the reward system.
Acknowledgments • Matthew Rushworth • Mark Woolrich • Laurence Hunt • Mark Walton