Reinforcement learning and human behavior (Hanan Shteingart and Yonatan Loewenstein) • MTAT.03.292 Seminar in Computational Neuroscience • Zurab Bzhalava
Introduction • Operant learning • The dominant computational approach to modeling operant learning is model-free RL • Human behavior is far more complex • Remaining challenges
Reinforcement Learning • RL: a class of learning problems in which an agent interacts with an unfamiliar, dynamic and stochastic environment • Goal: learn a policy that maximizes some measure of long-term reward
Markov Decision Process • A (finite) set of states S • A (finite) set of actions A • Transition model: T(s, a, s') = P(s' | s, a) • Reward function: R(s) • Discount factor γ ∈ [0, 1] • Policy π: a mapping from states to actions • Optimal policy π*: the policy that maximizes the expected discounted sum of rewards
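Markov Decision Process • Bellman equation: $V^*(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^*(s')$ • Optimal policy: $\pi^*(s) = \arg\max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^*(s')$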
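The Bellman equation can be solved for a small, fully known MDP by value iteration: apply its right-hand side repeatedly until V stops changing. This is a minimal sketch assuming γ < 1 (needed for convergence); the function name `value_iteration` and the two-state "stay/switch" world, in which only state 1 is rewarded, are hypothetical illustrations and not part of the presented paper.

```python
def value_iteration(T, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V(s) <- R(s) + gamma * max_a sum_s' T[s][a][s'] * V(s').

    T[s][a][s'] = P(s' | s, a); R[s] is the state reward R(s).
    Returns the value estimates and the greedy policy (one action index per state).
    """
    n_states = len(R)
    n_actions = len(T[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Q[s][a] = R(s) + gamma * expected value of the next state
        Q = [[R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        delta = max(abs(v1 - v0) for v1, v0 in zip(V_new, V))
        V = V_new
        if delta < tol:  # values have (numerically) stopped changing
            break
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy

# Hypothetical MDP: action 0 = stay, action 1 = switch; only state 1 is rewarded.
T = [[[1.0, 0.0], [0.0, 1.0]],   # from state 0: stay -> 0, switch -> 1
     [[0.0, 1.0], [1.0, 0.0]]]   # from state 1: stay -> 1, switch -> 0
R = [0.0, 1.0]
V, policy = value_iteration(T, R, gamma=0.9)
```

With γ = 0.9 the values converge to about V(0) = 9 and V(1) = 10, and the greedy policy switches out of state 0 and stays in state 1.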
Biological Algorithms • Behavioral control • Evaluate the world quickly • Choose appropriate behavior based on those valuations
Midbrain dopamine neurons • Central role in guiding our behavior and thoughts • Valuation of our world • Value of money • Value of other human beings • Major role in decision-making • Reward-dependent learning • Malfunction implicated in neurological and mental illness • Parkinson's disease • Schizophrenia
Reinforcement signals define an agent's goals • The organism is in state X and receives reward information • The organism queries the stored value of state X • The organism updates the stored value of state X based on the current reward information • The organism selects an action based on its stored policy • The organism transitions to state Y and receives reward information
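The steps above resemble temporal-difference (TD) learning. The following is a minimal TD(0) sketch on a hypothetical three-state chain with a fixed "step right" policy; the chain, the reward placement and the parameter values are assumptions for illustration. Note that TD(0) performs the update of X only once Y is known, because the update uses the stored value of Y.

```python
def td0_chain(n_episodes=500, alpha=0.1, gamma=0.9):
    """TD(0) value learning on a hypothetical chain 0 -> 1 -> 2, where 2 is terminal.

    The only reward (1.0) arrives on entering state 2.
    """
    V = [0.0, 0.0, 0.0]                  # stored value of each state (terminal stays 0)
    policy = {0: +1, 1: +1}              # stored policy: always step right
    for _ in range(n_episodes):
        state = 0                        # the organism starts in state X = 0
        while state != 2:
            action = policy[state]       # select an action from the stored policy
            next_state = state + action  # transition to state Y
            reward = 1.0 if next_state == 2 else 0.0  # reward information
            # prediction error: reward + discounted stored value of Y - stored value of X
            td_error = reward + gamma * V[next_state] - V[state]
            V[state] += alpha * td_error  # update the stored value of X
            state = next_state
    return V

V = td0_chain()
```

After enough episodes V(1) approaches 1 and V(0) approaches γ · 1 = 0.9: the value of the reward propagates backwards to the state that predicts it.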
The reward-prediction error hypothesis • Reward-prediction error: the difference between the experienced and the predicted “reward” of an event • Neurons of the ventral tegmental area (VTA) • Their phasic activity changes encode a 'prediction error about summed future reward'
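In its simplest single-cue form, the prediction error is δ = r − V. The sketch below uses a Rescorla-Wagner style update (an assumed simplification of the full TD model, with a hypothetical learning rate) to show the qualitative pattern the hypothesis predicts: a large error for an unexpected reward, an error near zero once the reward is predicted, and a negative error when a predicted reward is omitted.

```python
def prediction_errors(rewards, alpha=0.2):
    """Per-trial prediction error delta = r - V for a single cue, updating V after each trial."""
    V = 0.0                      # predicted reward for the cue
    deltas = []
    for r in rewards:
        delta = r - V            # experienced minus predicted reward
        deltas.append(delta)
        V += alpha * delta       # the prediction moves toward the experienced reward
    return deltas

# 50 rewarded trials, then one trial on which the expected reward is omitted
deltas = prediction_errors([1.0] * 50 + [0.0])
```

Here deltas[0] is 1.0, deltas[49] is close to 0, and deltas[50] is close to −1.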
Human reward responses • Orbitofrontal Cortex (OFC) • Amygdala (Amyg) • Nucleus Accumbens • Sublenticular extended amygdala • Hypothalamus (Hyp) • Ventral Tegmental Area (VTA)
Model-based RL vs Model-free RL • Model-based RL is associated with goal-directed behavior, model-free RL with habitual behavior • The two may be implemented by anatomically distinct systems (a subject of debate) • Some findings suggest: • The medial striatum is more engaged during planning • The lateral striatum is more engaged during choices in extensively trained tasks
Model-based RL vs Model-free RL • [Figure panels: (b) Model-free RL; (c) Model-based RL] • Human subjects exhibited a mixture of both effects
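One common way to model a mixture of the two systems is to compute both sets of action values and combine them with a weight w. This sketch is hypothetical: two actions each lead to one of two states with 70/30 transition probabilities, loosely in the style of the two-step tasks used in this literature, and the linear weighting with w is an assumed modeling choice rather than something stated on the slides.

```python
def model_free_update(q_mf, action, reward, alpha=0.5):
    """Habitual/model-free: move the cached value of the chosen action toward the reward."""
    q_mf[action] += alpha * (reward - q_mf[action])

def model_based_values(transition_probs, state_values):
    """Goal-directed/model-based: Q_MB(a) = sum over s' of P(s' | a) * V(s')."""
    return [sum(p * v for p, v in zip(probs, state_values)) for probs in transition_probs]

def hybrid_values(q_mf, q_mb, w):
    """Mixture of both systems: w = 1 is purely model-based, w = 0 purely model-free."""
    return [w * mb + (1 - w) * mf for mf, mb in zip(q_mf, q_mb)]

# Action 0 usually leads to state A, action 1 usually leads to state B (70/30).
transitions = [[0.7, 0.3],   # P(A | a=0), P(B | a=0)
               [0.3, 0.7]]   # P(A | a=1), P(B | a=1)
state_values = [0.0, 1.0]    # suppose only state B is currently rewarding

# The agent chose action 0, took the rare transition to B and was rewarded.
q_mf = [0.0, 0.0]
model_free_update(q_mf, action=0, reward=1.0)
q_mb = model_based_values(transitions, state_values)
q_net = hybrid_values(q_mf, q_mb, w=0.5)
```

The model-free values credit the action that was just rewarded (about [0.5, 0.0]), whereas the model-based values favor the action that usually leads to the rewarding state (about [0.3, 0.7]); with w = 0.5 the mixture is about [0.4, 0.35].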
Challenges in relating human behavior to RL algorithms • Humans tend to alternate rather than repeat an action after receiving a positively surprising payoff • Tremendous heterogeneity in reports on human operant learning • Do humans probability-match or maximize?
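The probability-matching question can be made concrete with a little arithmetic. Assuming two hypothetical options that pay off with probabilities 0.7 and 0.3, matching (choosing A on 70% of trials) earns 0.7 · 0.7 + 0.3 · 0.3 = 0.58 per trial on average, while always choosing A earns 0.7. A softmax choice rule with a large inverse temperature β (an assumed choice rule, not one specified on the slides) comes close to maximizing.

```python
import math

def expected_reward(p_choose_a, p_reward_a, p_reward_b):
    """Average payoff per trial when option A is chosen with probability p_choose_a."""
    return p_choose_a * p_reward_a + (1 - p_choose_a) * p_reward_b

def softmax_choice_prob(values, beta):
    """Softmax probability of choosing each option; beta is the inverse temperature."""
    m = max(values)                              # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

p_a, p_b = 0.7, 0.3                              # hypothetical reward probabilities
matching = expected_reward(p_a, p_a, p_b)        # choose A with probability p_a
maximizing = expected_reward(1.0, p_a, p_b)      # always choose A
p_softmax = softmax_choice_prob([p_a, p_b], beta=20.0)[0]
softmax_reward = expected_reward(p_softmax, p_a, p_b)
```

Lowering β moves the softmax policy away from maximizing toward more exploratory choice, which is one way such models can produce behavior that looks intermediate between matching and maximizing.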
Heterogeneity in the world model
Learning the world model
Questions?
Reference List: • Hanan Shteingart and Yonatan Loewenstein. Reinforcement learning and human behavior. • Bradley B. Doll, Dylan A. Simon and Nathaniel D. Daw. The ubiquity of model-based reinforcement learning. • P. Read Montague, Steven E. Hyman and Jonathan D. Cohen. Computational roles for dopamine in behavioral control.