Neural activity of the basal ganglia and its correlation with behavior Kazuyuki Samejima, ATR Human Information Science Laboratories & `Creating the Brain' CREST, JST
Collaborators: Yasumasa Ueda (Research Associate) and Minoru Kimura (Professor), Second Department of Physiology, Kyoto Prefectural University of Medicine; Kenji Doya, ATR Human Information Science Laboratories
Basal Ganglia • Involved in movement execution; disorders include Parkinson's disease and Huntington's disease • Learning of sequential movements (Hikosaka et al. 1999) • Reward prediction (Shidara et al. 1998; Kawagoe et al. 1998) • Reward prediction error (Schultz et al. 1997)
Overview • Data analysis of recorded data • Striatum activity in remembered sequential movements • Task design based on the reinforcement learning paradigm • Does the striatum represent the ``value function'' used for decision making? • Model of dopamine neuron activity • How does the basal ganglia compute the temporal difference of reward expectation?
Data analysis of striatum neurons When this collaboration began, the basal ganglia were thought to be involved in learning movement sequences, and possibly in the short-term memory of movement sequences. • Hypothesis: visually guided movements -> PM - M1; memory-guided movements -> SMA - BG
Instructed – remembered motor sequence task (Ueda and Kimura, 1999)
Information analysis • Former analysis methods of neural activity • Ask which types of neurons are found, and how many. • The neuron categories set by the experimenter are arbitrary. • Neurons that are task-related but do not fall into any category are left 'unclassified', and the information they carry is lost. • Information analysis • Total information carried by the recorded neurons as a whole • Time course of information (Sugase et al. 1999; Kitazawa et al. 1998)
Information analysis: definition Information about the movement sequence; it is calculated for two types of sequence category: 1. First movement direction (Left or Right) 2. Second movement type (Stick or Button)
Information analysis: example Conditional distribution H(S|R), total distribution H(S); mutual information I(S;R) = H(S) - H(S|R)
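The mutual information defined above can be estimated directly from discretized trial data. Below is a minimal Python sketch, assuming spike counts binned into a few discrete response categories; the variable names, bin edges, and toy data are illustrative assumptions, not the analysis code used in the study.

```python
import numpy as np

def mutual_information(stimuli, responses):
    """Estimate I(S;R) = H(S) - H(S|R) from paired discrete samples.

    stimuli   : 1-D array of sequence labels (e.g. 'L' or 'R')
    responses : 1-D array of discretized responses (e.g. binned spike counts)
    """
    stimuli = np.asarray(stimuli)
    responses = np.asarray(responses)

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    h_s = entropy(stimuli)                       # H(S): total distribution
    h_s_given_r = 0.0
    for r in np.unique(responses):               # H(S|R): average over responses
        mask = responses == r
        h_s_given_r += mask.mean() * entropy(stimuli[mask])
    return h_s - h_s_given_r                     # I(S;R)

# Toy example: spike counts weakly informative about first-movement direction
rng = np.random.default_rng(0)
stim = rng.choice(['L', 'R'], size=1000)
spikes = np.where(stim == 'L', rng.poisson(8, 1000), rng.poisson(12, 1000))
print(mutual_information(stim, np.digitize(spikes, bins=[5, 10, 15])))
```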
Result: first movement direction • About 200 ms after the first stimulus, the direction (L/R) information rose rapidly. • After the first movement, a high level of L/R information was maintained. • The L/R information also rose slowly, starting about 800 ms before the first stimulus.
Result: second movement type Instructed task • About 200 ms after the 2nd stimulus, the S/B information rose rapidly and was maintained after reward delivery. Remembered task • The S/B information was suppressed during the holding period. • After the first movement, the S/B information rose.
Result: number of significant cells • Cells with significant L/R information increased before the first stimulus (B1) in the remembered task (orange bar, upper left). • Cells with significant S/B information decreased before the first stimulus and increased before the second stimulus in the remembered task (upper right).
Conclusion • The activity of striatal neurons contains information about the direction and type of movement not only during execution of the movement but also during the preparatory delay period of the REMEMBERED task. The neuronal activity carries information about the next movement, but little information about the movement two steps later.
What this analysis showed • The striatum shows predictive responses. • However, the prediction is mostly about the immediately following movement element. • Whether it reflects prediction of reward, or prediction of the stimulus or action, cannot be determined without varying the reward conditions. -> This motivates changing the task conditions.
Overview • Data analysis of recorded data • Striatum activity in remembered sequential movements • Task design based on the reinforcement learning paradigm • Does the striatum represent the ``value function'' (reward expectation) used for decision making? • Model of the basal ganglia as a reinforcement learner • How does the basal ganglia compute the temporal difference of reward expectation?
Ventral striatum activity in approaching stages to reward (Shidara et al. 1998) The approaching stage is essential for ventral striatal activity.
Reward expectation (Kawagoe et al. 1998)
Reinforcement learning The agent maximizes the reward it obtains by acting on the environment. • A model of how animal behavior is shaped toward obtaining reward. [Diagram: agent/animal sends actions to the environment; the environment returns state and reward]
Value function • Expected future reward: V(s) = E[ r(t) + γ·r(t+1) + γ²·r(t+2) + … | s(t) = s ] • Both the value function and the policy are learned. • The value function indicates the goodness of a state or action. • The policy selects actions according to the value differences of the candidate actions or predicted states. • Temporal difference error: δ(t) = r(t) + γ·V(s(t+1)) - V(s(t))
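As a concrete illustration of the definitions above, here is a minimal TD(0) sketch on a toy three-state chain; the chain, learning rate, and discount factor are illustrative assumptions.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: delta = r + gamma*V(s') - V(s), then V(s) += alpha*delta."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Toy chain: state 0 -> 1 -> 2 (terminal); reward 1.0 on reaching state 2
V = np.zeros(3)
for episode in range(200):
    for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
        td0_update(V, s, r, s_next)
print(V)   # V[1] -> ~1.0, V[0] -> ~gamma * V[1] = 0.9
```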
Substantia nigra and temporal difference error Dopamine neuron activity resembles the TD error. • This suggests that TD learning is implemented in the BG network. (Schultz et al. 1997)
Computational model of the striatum • Action value function Q = matrix compartment; state value function V = striosome compartment • GP performs stochastic action selection. • SNc dopamine neurons carry the evaluation error (temporal difference error). (Doya 2000)
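The slides do not specify how GP implements stochastic action selection; a common modeling choice, assumed here, is a softmax over the action values Q, with the single-step TD error standing in for the SNc signal.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax_action(q_row, beta=3.0):
    """Stochastic action selection over the action values Q(s, .)."""
    p = np.exp(beta * (q_row - q_row.max()))   # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(q_row), p=p)

# One state, two actions (e.g. left / right turn); hypothetical reward probabilities
Q = np.zeros(2)                  # action values (matrix compartment)
reward_prob = [0.9, 0.1]
for trial in range(500):
    a = softmax_action(Q)
    r = float(rng.random() < reward_prob[a])
    delta = r - Q[a]             # single-step TD error (putative SNc signal)
    Q[a] += 0.1 * delta
print(Q)                         # tends toward the reward probabilities
```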
Designing a new task inspired by the reinforcement learning model • Does the striatum represent an "action value function"? • If the neural activity represents a value function: • it correlates with reward expectation; • it does not merely represent a simple action; • it does not merely represent a simple stimulus response. • To show that the activity represents a value function, we need conditions with the same stimulus and the same action but different reward expectation. • Action selection based on reward expectation.
Related work • Reward prediction (Kawagoe et al. 1998) • The ADR/1DR tasks use the same stimulus and the same action but different reward expectation; however, stimulus and action are coupled (a delayed saccade to the cued location) -> dissociate stimulus and action • Progress through multiple stages toward reward (Shidara et al. 1998) • That task has only one action (a single lever release) -> multiple actions and decision making
Stochastic reward and target task • Decision making: the monkey turns the lever left or right. • Stochastic feedback (an LED indicates whether the target was reached; reward) • Reward is delivered with probability P(x), where x is the target position.
State transition diagram of the task • States: N: goal position is hidden; R: right is the goal position; L: left is the goal position • Actions: l: selecting a left turn; r: selecting a right turn • Reward probabilities: P(R) when the right target is reached; P(L) when the left target is reached
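A small simulation sketch of this task under one possible reading of the diagram: in N the hidden goal is L or R with equal probability, a missed turn ends the trial unrewarded, and the values of P(L) and P(R) are arbitrary illustration values. These assumptions are mine, not taken from the slides; sampling the task reproduces the short-term prediction quoted on a later slide.

```python
import numpy as np

rng = np.random.default_rng(2)
P = {'L': 0.7, 'R': 0.3}                 # hypothetical reward probabilities

def trial(state, action):
    """One trial: in N the goal is hidden (L/R equally likely), in L or R it is known.
    Reaching the goal target yields reward with probability P(goal); a miss gives nothing."""
    goal = rng.choice(['L', 'R']) if state == 'N' else state
    if action.upper() != goal:
        return 0.0
    return float(rng.random() < P[goal])

# Monte-Carlo estimate of the action values Q(s, a)
Q = {s: {'l': 0.0, 'r': 0.0} for s in 'NLR'}
n = {s: {'l': 0, 'r': 0} for s in 'NLR'}
for t in range(30000):
    s = rng.choice(list('NLR'))
    a = rng.choice(['l', 'r'])
    reward = trial(s, a)
    n[s][a] += 1
    Q[s][a] += (reward - Q[s][a]) / n[s][a]   # running average of reward

print(round(Q['N']['l'], 3), 0.5 * P['L'])    # Q(N,l) close to 1/2 * P(L) = 0.35
print(round(Q['R']['r'], 3), P['R'])          # Q(R,r) close to P(R) = 0.30
```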
Does the monkey notice the hidden target and act accordingly? Yes
Does the monkey change its action selection according to the reward probability? Yes
Model: 3-state RL model • Reinforcement learning • Action selection
Prediction: short term • Compare N-state activity with R- or L-state activity • Reward expectation: Q(N,l) ≈ ½·P(L), Q(R,r) ≈ P(R) • That is, compare Q between the internal state in which the target is unknown (N) and the states in which the target position is known (R, L)
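Written out under the same assumption used in the task sketch above (a missed turn ends the trial unrewarded), the predicted action values follow directly from the task structure:

```latex
\begin{align}
Q(N,l) &= \Pr(\mathrm{goal}=L)\,P(L) + \Pr(\mathrm{goal}=R)\cdot 0 = \tfrac{1}{2}\,P(L) \\
Q(R,r) &= \Pr(\text{right target reached} \mid R, r)\,P(R) = P(R)
\end{align}
```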
Prediction: long-term change of reward prediction • Prediction model with fixed α and β (the learning rate and the parameter governing stochastic choice)
Estimating parameters and hyper-parameters from behavioral data • The previous prediction assumed a reinforcement learner with a fixed learning rate and a fixed parameter for stochastic behavior, but these may change over the course of the experiment. • We therefore have to estimate the parameters and hyper-parameters of the learning system. • A sequential Monte Carlo method is used to estimate the Q values, the learning rate, and the parameter of stochastic behavior.
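One minimal way to realize this is a particle filter; the sketch below illustrates the idea only. The priors, the random-walk drift on the learning rate, and the softmax choice model are all assumptions, not the estimator actually used in the study.

```python
import numpy as np

rng = np.random.default_rng(3)

def smc_fit(choices, rewards, n_particles=2000):
    """Particle filter over Q-values and hyper-parameters (alpha, beta).

    choices : array of 0/1 actions; rewards : array of 0/1 outcomes.
    Returns posterior-mean alpha, beta, and Q after the last trial.
    """
    alpha = rng.uniform(0.01, 0.5, n_particles)     # learning-rate particles
    beta = rng.uniform(0.5, 10.0, n_particles)      # softmax inverse-temperature particles
    Q = np.zeros((n_particles, 2))
    for a, r in zip(choices, rewards):
        p_right = 1.0 / (1.0 + np.exp(-beta * (Q[:, 1] - Q[:, 0])))
        lik = p_right if a == 1 else 1.0 - p_right  # likelihood of the observed choice
        w = lik / lik.sum()
        idx = rng.choice(n_particles, n_particles, p=w)          # resample
        alpha, beta, Q = alpha[idx], beta[idx], Q[idx]
        alpha = np.clip(alpha * np.exp(0.02 * rng.standard_normal(n_particles)),
                        0.001, 1.0)                 # slow random-walk drift on alpha
        Q[:, a] = Q[:, a] + alpha * (r - Q[:, a])   # each particle's RL update
    return alpha.mean(), beta.mean(), Q.mean(axis=0)

# Tiny synthetic demo: simulated learner with alpha = 0.2, beta = 4 on a two-armed bandit
true_alpha, true_beta, Qt = 0.2, 4.0, np.zeros(2)
choices, rewards = [], []
for t in range(400):
    p1 = 1.0 / (1.0 + np.exp(-true_beta * (Qt[1] - Qt[0])))
    a = int(rng.random() < p1)
    r = float(rng.random() < (0.8 if a == 1 else 0.2))
    Qt[a] += true_alpha * (r - Qt[a])
    choices.append(a)
    rewards.append(r)
print(smc_fit(np.array(choices), np.array(rewards)))   # rough posterior means of (alpha, beta, Q)
```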
Overview • Data analysis of recorded data • Striatum activity in remembered sequential movements • Task design based on the reinforcement learning paradigm • Does the striatum represent the ``value function'' used for decision making? • Model of dopamine neuron activity • How does the basal ganglia compute the temporal difference of reward expectation?
Substantia nigra and temporal difference error Dopamine neuron activity resembles the TD error. • This suggests that TD learning is implemented in the BG network. (Schultz et al. 1997)
Classical conditioning • Reward expectation (value function) V(t) • Reward r(t) • Temporal difference error δ(t) = r(t) + γ·V(t+1) - V(t) [Figure: traces of reward r(t), reward prediction V(t), and TD error δ(t), aligned to the conditional stimulus (CS) and reward (R), shown before learning, after learning, and with the reward omitted]
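A compact TD simulation of this conditioning setup, using a tapped-delay-line (complete serial compound) stimulus representation; the timing, learning rate, and representation are standard modeling assumptions rather than details taken from the slides. It reproduces the response shift and the omitted-reward dip shown on the later result slides.

```python
import numpy as np

T, cs_t, rew_t = 25, 5, 15          # time steps per trial, CS onset, reward time
gamma, alpha = 1.0, 0.2
w = np.zeros(T)                     # weights on the tapped delay line of the CS

x = np.zeros((T, T))                # complete serial compound: x[t, t - cs_t] = 1
for t in range(cs_t, T):
    x[t, t - cs_t] = 1.0

def run_trial(rewarded=True, learn=True):
    """One conditioning trial; returns the TD error trace delta(t)."""
    delta = np.zeros(T)
    for t in range(T):
        V_t = w @ x[t]
        V_next = w @ x[t + 1] if t + 1 < T else 0.0
        r = 1.0 if (rewarded and t == rew_t) else 0.0
        delta[t] = r + gamma * V_next - V_t
        if learn:
            w[:] += alpha * delta[t] * x[t]
    return delta

before = run_trial(learn=False)                    # phasic delta at the reward only
for _ in range(200):
    run_trial()                                    # conditioning trials
after = run_trial(learn=False)                     # delta moves to the CS
omitted = run_trial(rewarded=False, learn=False)   # negative dip at the omitted reward
print(np.argmax(before), np.argmax(after), np.argmin(omitted))
# -> 15, 4, 15: the positive response shifts from the reward to (one step before) CS onset,
#    and omitting the reward produces a dip at the expected reward time.
```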
Computational model of the BG • TD models (Barto 1995; Suri 2000) • Delay!! What is the biological implementation of the delay needed to take the temporal difference of the reward expectation? (figure after Barto 1995)
Direct-indirect delay model • Houk et al. 1995 • Direct inhibitory pathway: -V(t-1) • Indirect disinhibitory pathway: +V(t)
How does the BG compute the temporal difference of reward expectation? • Network model of the BG • The value function V(s) is learned by cortico-striatal synaptic plasticity • Dopamine-dependent plasticity at cortico-striatal synapses • The temporal difference is computed from the difference in receptor delay or dynamics between the direct and indirect pathways • Integrate-and-fire type neuron model
Basal ganglia network [Diagram: Cortex -> Striatum (matrix and striosome compartments) -> GPe, STN, GPi -> Thalamus; SNc and SNr; excitatory (+) and inhibitory (-) connections; reward input to SNc]
Idea • 1. The cortico-striatal projection computes V(s(t)). • 2. Two kinds of synaptic current with different temporal properties compute the temporal difference of the value V: slow GABA-B inhibition carries V(t-Δt), fast GABA-A disinhibition carries V(t). • 3. Dopamine-mediated LTP and LTD at the cortico-striatal synapses. [Model diagram: cortex -> striatum (matrix, striosome) -> SNc/SNr, with reward r(t) input to SNc]
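A toy numerical sketch of point 2, illustrating the principle only: low-pass filtering V with a fast (GABA-A-like) and a slow (GABA-B-like) time constant and subtracting the two approximates V(t) - V(t - Δt), and adding the reward gives a TD-like signal. The time constants and signal shapes are illustrative assumptions.

```python
import numpy as np

dt = 1.0                                   # ms
tau_fast, tau_slow = 10.0, 100.0           # GABA-A-like and GABA-B-like time constants
T = 600
V = np.zeros(T)
V[200:400] = 1.0                           # value steps up at 200 ms and back down at 400 ms
r = np.zeros(T)
r[400] = 1.0                               # reward delivered when the value steps down

fast = np.zeros(T)                         # fast low-pass filtered copy of V
slow = np.zeros(T)                         # slow low-pass filtered copy of V
for t in range(1, T):
    fast[t] = fast[t - 1] + dt / tau_fast * (V[t] - fast[t - 1])
    slow[t] = slow[t - 1] + dt / tau_slow * (V[t] - slow[t - 1])

td_like = r + (fast - slow)                # ~ r(t) + V(t) - V(t - Δt)
print(td_like[200:220].round(2))           # positive transient when V steps up
print(td_like[395:430].round(2))           # reward spike, then a negative transient as V steps down
```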
Integrate-and-fire (IF) neuron model • A single-neuron model • Synaptic currents modeled as exponential functions of the input pulses (GABA-A, GABA-B) • Spontaneous activity via random current injection • Threshold dynamics and reset of the membrane potential
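A minimal leaky integrate-and-fire sketch with exponentially decaying synaptic conductances, spontaneous membrane fluctuations from random current injection, and threshold-and-reset dynamics; all parameter values are illustrative assumptions, not those of the original model.

```python
import numpy as np

rng = np.random.default_rng(4)

dt, T = 1.0, 500                                      # ms
tau_m, v_rest, v_thresh, v_reset = 20.0, -70.0, -54.0, -70.0
tau_exc, tau_gabaa, tau_gabab = 5.0, 10.0, 100.0      # synaptic decay time constants

v = v_rest
g_exc = g_gabaa = g_gabab = 0.0
spikes = []
for t in range(T):
    if 100 <= t < 200 and (t - 100) % 20 == 0:        # excitatory pulses at 50 Hz for 100 ms
        g_exc += 1.0
    if t == 300:                                      # one inhibitory volley: fast vs. slow decay
        g_gabaa += 1.0
        g_gabab += 1.0
    g_exc -= dt / tau_exc * g_exc                     # exponential decay of each current
    g_gabaa -= dt / tau_gabaa * g_gabaa
    g_gabab -= dt / tau_gabab * g_gabab
    i_syn = 5.0 * g_exc - 1.0 * g_gabaa - 1.0 * g_gabab
    i_noise = 0.5 * rng.standard_normal()             # random current injection
    v += dt / tau_m * (v_rest - v) + i_syn + i_noise
    if v >= v_thresh:                                 # threshold crossing: spike and reset
        spikes.append(t)
        v = v_reset
print(spikes)    # spike times fall during (or just after) the 100-200 ms input burst
```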
Dopamine-dependent plasticity of the cortico-striatal connection • SNc dopamine regulates synaptic plasticity in the rat neostriatum (Reynolds & Wickens 2001) • Dopamine-sensitive plasticity
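Reynolds & Wickens (2001) report dopamine-dependent plasticity at cortico-striatal synapses; a common abstraction of that finding, assumed here rather than taken from the paper, is a three-factor rule in which the dopamine signal (the TD error) gates Hebbian change.

```python
import numpy as np

def three_factor_update(w, pre, post, dopamine, eta=0.01):
    """Cortico-striatal weight change gated by dopamine:
    positive dopamine -> LTP of co-active synapses, negative -> LTD."""
    return w + eta * dopamine * np.outer(post, pre)

# Toy usage: two cortical inputs onto one striatal neuron
w = np.array([[0.2, 0.2]])
pre = np.array([1.0, 0.0])        # only the first cortical input is active
post = np.array([1.0])            # the striatal neuron fired
w = three_factor_update(w, pre, post, dopamine=+0.5)   # dopamine burst -> LTP
w = three_factor_update(w, pre, post, dopamine=-0.5)   # dopamine dip   -> LTD
print(w)   # back to [[0.2, 0.2]]; only the active synapse was ever changed
```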
Simulation • 8 neurons as cortical input (100 ms at 50 Hz; background 10 Hz) • Leaky integrate-and-fire neurons with glutamate, GABA-A, and GABA-B receptors • Dopamine-mediated LTP and LTD • Orthogonal traveling activity as the frontal cortical input [Diagram: cortex -> matrix and striosome -> SNc/SNr; slow synapse τ = 100 ms, fast synapse τ = 10 ms; CS and reward (R) inputs]
Result: shifting phasic activity [Figure: model vs. experiment (Schultz et al. 1997); before conditioning the phasic response occurs at the reward (R) only, after conditioning it occurs at the CS]
Result: omitted reward [Figure: model vs. experiment (Schultz et al. 1997); after conditioning, the CS is presented but the reward is omitted (no R)]
Prediction • T_delay > 100 ms • Activity shifts through learning [Figure: responses to CS and R across trial blocks 1-20, 21-40, 41-60, 61-80, 81-100]