Explore how the brain learns values to make decisions and maximize rewards in a complex world. Learn about dopamine, the TD learning algorithm, and the effects of dopamine-related disorders. Discover the science behind reward processing and dopamine release.
The computational problem: the goal is to maximize the (expected) sum of rewards, $\sum_t r_t$.
The computational problem: the value of the state $S_1$ depends on the policy. If the animal chooses 'right' at $S_1$, the value of $S_1$ is the sum of the rewards collected along the 'right' branch (see the sketch below).
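As a toy illustration of this policy dependence, here is a short Python sketch of a two-step maze. All reward numbers are hypothetical assumptions for illustration, not values from the lecture:

```python
# Hypothetical two-step maze: at S1 the animal chooses 'left' or 'right',
# then collects the rewards along that branch. The reward numbers below
# are illustrative assumptions only.
branch_rewards = {
    "left":  [0.0, 0.5],   # rewards encountered along the 'left' branch
    "right": [0.0, 1.0],   # rewards encountered along the 'right' branch
}

def V_S1(policy):
    # The value of S1 under a policy = the sum of rewards along the chosen branch.
    return sum(branch_rewards[policy])

print(V_S1("right"), V_S1("left"))  # 1.0 0.5 -> here 'right' is the better choice
```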
How to find the optimal policy in a complicated world? • If the values of the different states are known, then this task is easy. • But how can the values of the different states be learned?
• $V(S_t)$ = the value of the state at time $t$ • $r_t$ = the (average) reward delivered at time $t$ • $V(S_{t+1})$ = the value of the state at time $t+1$
The TD (temporal difference) learning algorithm: $V(S_t) \leftarrow V(S_t) + \alpha \delta_t$, where $\delta_t = r_t + V(S_{t+1}) - V(S_t)$ is the TD error and $\alpha$ is the learning rate.
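A minimal Python sketch of this update rule (the function name and the default $\alpha$ are my choices; the rule is undiscounted, matching the formula above):

```python
def td_update(V, s, r, s_next, alpha=0.1):
    """One TD step: V(S_t) <- V(S_t) + alpha * delta_t,
    with delta_t = r_t + V(S_{t+1}) - V(S_t) (undiscounted, as above)."""
    delta = r + V[s_next] - V[s]   # the TD error
    V[s] += alpha * delta          # move V(S_t) toward r_t + V(S_{t+1})
    return delta
```

Calling `td_update` repeatedly along a trajectory drives each $V(S_t)$ toward $r_t + V(S_{t+1})$, which is how the values of the states can be learned from experience.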
Dopamine is good
• Dopamine is released by rewarding experiences, e.g., sex and food
• Cocaine, nicotine, and amphetamine directly or indirectly increase dopamine release
• Neutral stimuli that are associated with rewarding experiences also trigger dopamine release
• Drugs that reduce dopamine activity reduce motivation and cause anhedonia (the inability to experience pleasure)
• Long-term use of such drugs may result in dyskinesia (diminished voluntary movements and the presence of involuntary movements)
No dopamine is bad (Parkinson's disease)
• Bradykinesia – slowness in voluntary movements such as standing up, walking, and sitting down. This may lead to difficulty initiating walking and, when more severe, can cause "freezing episodes" once walking has begun.
• Tremors – often occur in the hands, fingers, forearms, feet, mouth, or chin. Typically, tremors occur when the limbs are at rest rather than during movement.
• Rigidity – stiff muscles that often produce muscle pain, which increases during movement.
• Poor balance – results from the loss of the reflexes that maintain posture, causing unsteadiness that often leads to falls.
[Figure: a chain of states 1-9, with the CS presented at the start of the chain and a reward delivered in state 8.]
Before trial 1: $V(S_i) = 0$ for all states $i$.
In trial 1:
• no reward in states 1-7, so $\delta_t = r_t + V(S_{t+1}) - V(S_t) = 0$ and no value changes
• reward of size 1 in state 8, so $\delta_t = 1$ and $V(S_8) \leftarrow \alpha$
Before trial 2: $V(S_8) = \alpha$; all other values are still 0.
In trial 2, for states 1-6: $\delta_t = 0$, so nothing changes.
For state 7: $\delta_t = 0 + V(S_8) - V(S_7) = \alpha$, so $V(S_7) \leftarrow \alpha^2$.
For state 8: $\delta_t = 1 + V(S_9) - V(S_8) = 1 - \alpha$, so $V(S_8) \leftarrow \alpha + \alpha(1-\alpha) = \alpha(2-\alpha)$.
Before trial 3: $V(S_7) = \alpha^2$ and $V(S_8) = \alpha(2-\alpha)$.
In trial 3, for states 1-5: $\delta_t = 0$.
For state 6: $\delta_t = V(S_7) - V(S_6) = \alpha^2$, so $V(S_6) \leftarrow \alpha^3$.
For state 7: $\delta_t = V(S_8) - V(S_7) = \alpha(2-\alpha) - \alpha^2$, so $V(S_7)$ grows further.
For state 8: $\delta_t = 1 - V(S_8) = (1-\alpha)^2$, so $V(S_8)$ moves closer to 1.
After many trials the values converge and the TD error vanishes in every state, except at the CS, whose time of arrival is unknown: because the CS cannot be predicted, a positive TD error (and the dopamine response) persists there.
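The trial-by-trial propagation above can be reproduced in a few lines of Python (a sketch; $\alpha = 0.5$ is an arbitrary choice, since the slides leave the learning rate unspecified):

```python
import numpy as np

alpha = 0.5
V = np.zeros(11)          # V[1..9] are used; V[10] stays 0 (end of trial)
r = np.zeros(11)
r[8] = 1.0                # reward of size 1 in state 8, nothing elsewhere

for trial in range(1, 6):
    deltas = []
    for s in range(1, 10):               # pass through states 1..9 in order
        delta = r[s] + V[s + 1] - V[s]   # TD error in state s
        V[s] += alpha * delta            # TD update
        deltas.append(round(delta, 3))
    print(f"trial {trial}: TD errors = {deltas}")

# Trial 1: the only nonzero TD error is in state 8 (the unexpected reward).
# Trial 2: an error of size alpha appears in state 7; trial 3 reaches state 6,
# and so on: the error travels backward along the chain. In this fully
# deterministic sketch every error eventually vanishes; on the slide a
# response remains at the CS because its time of arrival is unpredictable.
```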
“We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected” (Bayer and Glimcher, 2005).