
Goal Directed Reaching with the Motor Cortex Model



Presentation Transcript


  1. Goal Directed Reaching with the Motor Cortex Model Cheol Han Feb 20, 2007

2. Introduction • Goal: a computational model of goal-directed reaching built on a biologically plausible motor cortex model, which can explain • 1. neural coding in the motor cortex • 2. the relationship between skill learning and map formation • 3. reorganization of the motor cortex after a lesion, with improvement of movement

  3. Overview • Dual Map • Motor output map • Motor input map • Models • Arm model with Hill-type muscles • Cortex model • Reinforcement learning framework • Results • Discussion

4. Directional coding (Georgopoulos, 1986)

5. Dual Map • Two views of neural coding in the motor cortex • Low-level muscle coding (Evarts, …) • High-level kinematic coding (Georgopoulos, 1986) • Both, or intermediate (joint) coding • We hypothesize • Motor output map: mainly encodes low-level muscle coding • Motor input map: high-level kinematic coding

6. Learning goal-directed movements with Actor-Critic • Learning a feed-forward controller using temporal difference learning and the actor-critic architecture (Sutton, 1984; Barto et al., 1983) • Biologically plausible (dopamine and/or acetylcholine modulation of LTP in the motor cortex) • Continuous time and space (Doya, 2000) • Similar approaches: Bissmarck et al., 2005; Izawa et al., 2004. [Diagram: Trajectory Planner (kinematic coding) → Motor Cortex Model → Motoneurons (Spinal Cord) → Arm Model with muscles; the Critic (Basal Ganglia) sends a temporal-difference error (dopamine neurons) to the motor cortex, which is also shaped by competitive Hebbian learning.]

7. Motor output map • ICMS may reveal characteristics of corticospinal projections • Monosynaptic projections from some M1 neurons to motoneurons • Fetz and Cheney, 1980; Lemon et al., 1986 • Todorov (2003); the Donoghue group [Diagram: Motor Cortex Model → Motoneurons (Spinal Cord)]

8. Motor input map • Motor cortex neural recordings during voluntary movements (e.g., Georgopoulos) • Activation during voluntary movement tends to resemble the high-level, kinematic coding [Diagram: Kinematic Coding → Motor Cortex Model]

9. Models • Motor output map • Competitive Hebbian learning with a motor cortex model • Reversed feature extraction • Motor input map • Temporal-difference reinforcement learning

10. Arm model • Arm model: 2 links on the horizontal plane • 6 muscles with a Hill-type muscle model • Shoulder Extensor (E), Shoulder Flexor (F) • Elbow Extensor (O), Elbow Flexor (C) • Biarticular Extensor (B) and Flexor (T) • An accurate arm model is important • Todorov (2002) noted that characteristics may propagate from the bottom up. • Ning Lan (2002), Zajac (1989), Katayama (1993), Cheng et al. (2000), Spoelstra et al. (2000) (figure from Spoelstra et al., 2000) • A minimal dynamics sketch follows below.
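To make the arm-model structure concrete, here is a minimal sketch of a 2-link planar arm driven by six muscle activations through a moment-arm matrix, with a toy Hill-type tension. The link parameters, moment arms, and force-length/force-velocity shapes are illustrative assumptions, not the values used in this work.

```python
import numpy as np

# Illustrative parameters (assumed, not the model's actual values)
L1, L2 = 0.30, 0.33                      # link lengths [m]
M1, M2 = 1.4, 1.0                        # link masses [kg]
I1, I2 = M1 * L1**2 / 3, M2 * L2**2 / 3  # inertias about proximal joints
# Moment arms [m] mapping 6 muscle tensions (E, F, O, C, B, T)
# to 2 joint torques (shoulder, elbow); biarticular muscles span both.
A = 0.01 * np.array([[-2.0, 2.0,  0.0, 0.0, -1.5, 1.5],
                     [ 0.0, 0.0, -2.0, 2.0, -2.0, 2.0]])

def muscle_tension(act, length, velocity, f_max=300.0):
    """Toy Hill-type tension: activation x force-length x force-velocity."""
    fl = np.exp(-((length - 1.0) / 0.5) ** 2)      # Gaussian force-length
    fv = np.clip(1.0 - velocity / 10.0, 0.0, 1.5)  # linearized force-velocity
    return f_max * act * fl * fv

def arm_step(q, dq, act, dt=0.001):
    """One Euler step of the arm under 6 muscle activations in [0, 1]."""
    lengths = 1.0 - A.T @ q                        # normalized lengths (linearized)
    vels = -A.T @ dq
    tau = A @ muscle_tension(act, lengths, vels)   # joint torques
    ddq = tau / np.array([I1 + I2, I2])            # diagonal inertia (simplified)
    dq = dq + dt * ddq
    return q + dt * dq, dq
```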

11. Motor Cortex model • Chernjavsky and Moody, 1990 • 2 layers, with GABA interneurons • Shunting inhibitory GABA neurons • Mexican-hat activation • Shunting inhibition (Douglas et al., 1995; Prescott et al., 2003) [Diagram: PYR and GABA populations] • A small sketch of the shunting dynamics follows below.
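A minimal sketch of the shunting-inhibition idea, assuming rate units: a pyramidal layer with Mexican-hat lateral weights whose drive is divided (not subtracted) by pooled GABA activity. Kernel widths, gains, and time constants are assumptions for illustration.

```python
import numpy as np

def mexican_hat(n, sigma_e=2.0, sigma_i=6.0, we=1.0, wi=0.6):
    """Lateral weights: narrow excitation minus broad inhibition."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    return (we * np.exp(-d**2 / (2 * sigma_e**2))
            - wi * np.exp(-d**2 / (2 * sigma_i**2)))

def cortex_step(pyr, gaba, inp, W_lat, dt=0.01, tau=0.05):
    """One step of a pyramidal layer under shunting GABA inhibition.

    Shunting inhibition divides the pyramidal drive rather than
    subtracting from it; all parameters are illustrative assumptions.
    """
    drive = inp + W_lat @ pyr
    pyr_dot = (-pyr + np.maximum(drive, 0.0) / (1.0 + gaba)) / tau
    gaba_dot = (-gaba + 0.5 * pyr.mean()) / tau   # pooled inhibition
    return pyr + dt * pyr_dot, gaba + dt * gaba_dot
```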

12. Model Diagram [Diagram: Trajectory Generator (joint static-level planning) → ACTOR: inverse dynamics (joint "force"-level planning) → inverse muscle model (muscle-level planning) → Motoneurons → Arm; the CRITIC (evaluator of movement) sends the TD error to the actor.] • Our motor cortex model includes the inverse dynamics and the inverse muscle model. • How do we learn it in a biologically plausible manner? • Using reinforcement learning • Provides an evaluation of the movement • Implemented with temporal-difference learning on the actor-critic architecture • Similar approaches: Bissmarck et al., 2005; Izawa et al., 2004.

13. Actor-Critic Model (Sutton, 1984) • The "Actor" produces a motor command • The motor command feeds into the plant. • The "Critic" evaluates how good the movement was, compared with prior expectations (TD error) • Update the "Actor" based on the Critic's evaluation. • Update the "Critic": if the actor has improved, the critic can expect better movements. • A movement worse than the critic expected is discouraged (negative TD error). [Diagram: Trajectory Generator → ACTOR (Motor Cortex) → Arm; CRITIC (evaluator of movement) sends the TD error to the actor.] • A minimal training-loop sketch follows below.
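A minimal discrete-time actor-critic sketch with linear function approximation and a softmax policy. The `env` interface (`reset`/`step` returning a feature vector, reward, and done flag) and all learning rates are assumptions, not the talk's implementation (which works in continuous time, after Doya, 2000).

```python
import numpy as np

def actor_critic(env, n_features, n_actions, episodes=500,
                 gamma=0.99, alpha_v=0.1, alpha_pi=0.01):
    """Actor-critic with linear value function and softmax policy (sketch)."""
    w = np.zeros(n_features)                     # critic: value weights
    theta = np.zeros((n_actions, n_features))    # actor: policy weights
    for _ in range(episodes):
        x = env.reset()                          # feature vector of start state
        done = False
        while not done:
            prefs = theta @ x
            probs = np.exp(prefs - prefs.max())
            probs /= probs.sum()
            a = np.random.choice(n_actions, p=probs)
            x_next, r, done = env.step(a)
            # TD error: how much better/worse the outcome was than expected
            delta = r + (0.0 if done else gamma * w @ x_next) - w @ x
            w += alpha_v * delta * x             # critic update
            grad = -probs[:, None] * x           # softmax log-prob gradient
            grad[a] += x
            theta += alpha_pi * delta * grad     # actor update
            x = x_next
    return w, theta
```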

14. Actor: computing the motor commands • Example of an actor: Bissmarck et al., 2005. • Coding of kinematic variables • Distributed coding • Action pool: preferred torques • The layer contains action units, each tuned to a "preferred torque" • Competition between these preferred torques via softmax • p_i is the probability that unit i is chosen (shown as bars in the diagram) • Modifiable weights w connect the kinematic planning signal to the preferred torques • Exploration via action perturbation [Diagram: kinematic planning → weights w → preferred-torque layer (p_i) → torque (joint force); the TD error modulates w.] • A sketch of the softmax competition follows below.
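A sketch of the softmax competition described above, assuming a pool of 16 preferred-torque units and Gaussian action perturbation for exploration; the unit count, temperature, and noise scale are illustrative, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative action pool: 16 units with preferred (shoulder, elbow) torques
preferred_torques = rng.uniform(-5.0, 5.0, size=(16, 2))   # [N·m]

def select_torque(kin_signal, W, temperature=1.0, noise=0.1):
    """Softmax competition among preferred-torque units.

    kin_signal: kinematic planning input (feature vector)
    W: modifiable weights, shape (n_units, n_features)
    Returns the chosen torque (with exploratory perturbation) and p_i.
    """
    scores = W @ kin_signal / temperature
    p = np.exp(scores - scores.max())
    p /= p.sum()                       # p_i: probability of each unit
    i = rng.choice(len(p), p=p)
    torque = preferred_torques[i] + noise * rng.standard_normal(2)
    return torque, p
```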

15. Critic: providing the reward prediction error for actor learning • Temporal-difference learning • The critic learns a reward prediction; the TD error is the reward prediction error • The reward is generally delayed • Predicting the reward helps generate the correct action choice before the reward is received (the temporal credit assignment problem) • Doya (2000): in continuous time and space • Critic: the basal ganglia and dopamine neurons • Dopamine neurons carry the TD error (Schultz, 1998) • Reward prediction error is represented in the basal ganglia (O'Doherty et al., Science, 2004)

16. Critic: immediate reward • A large reward is given at the goal. • The reward function over space does not have to be continuous; a continuous function, however, helps the learner find a good movement. • The reward function shown on the slide is from Bissmarck et al. (2005); a hedged sketch of this kind of reward follows below.
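The slide's formula is not reproduced in this transcript; below is only a sketch of a reward of this general shape, assuming a large terminal reward inside a goal radius plus a smooth, continuous shaping term. The exact function and constants in Bissmarck et al. (2005) may differ.

```python
import numpy as np

def immediate_reward(hand_pos, goal_pos, r_goal=1.0, eps=0.01, sigma=0.1):
    """Large reward at the goal plus a smooth shaping term (assumed form).

    hand_pos, goal_pos: 2-D endpoint positions [m]
    eps: goal radius [m]; sigma: width of the shaping Gaussian [m]
    """
    d = np.linalg.norm(np.asarray(hand_pos) - np.asarray(goal_pos))
    shaping = 0.1 * np.exp(-d**2 / (2 * sigma**2))   # continuous part
    return r_goal + shaping if d < eps else shaping
```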

17. Critic: Reward prediction error • The total predicted reward at the current state includes discounted future rewards • The critic learns this prediction at the current state • Delta (the TD error) measures how much the action changed the balance between expected and actual reward • If delta is positive, the action was better than expected. • The standard equations are given below.
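The slide's equations did not survive the transcript. The standard forms, in discrete time and in the continuous-time formulation of Doya (2000), are:

```latex
% Discrete time: predicted total reward (value) and TD error
V(s_t) = E\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k}\right],
\qquad
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

% Continuous time and space (Doya, 2000), with time constant \tau:
V(t) = E\left[\int_{t}^{\infty} e^{-(s-t)/\tau} r(s)\, ds\right],
\qquad
\delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)
```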

18. Critic: Reward prediction error • Example: dopamine neurons (conditioned stimulus CS, unconditioned stimulus US/reward) • A well-trained critic raises its prediction just before the reward is expected to be given • Reward given: delta stays near zero • If there is no reward, the well-trained critic still expected one, so delta becomes negative at the expected reward time [Figure: dopamine-neuron responses for CS→reward and CS→no-reward trials] • A toy numerical check follows below.
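A toy numerical check of the no-reward case, assuming a well-trained critic that predicts a reward of 1 just before the expected delivery time; the numbers are illustrative only.

```python
gamma = 0.9
v_pre_reward = 1.0   # well-trained prediction just before the expected reward
v_terminal = 0.0     # episode ends after the (expected) reward time

def td_error(r, v_now, v_next):
    return r + gamma * v_next - v_now

# Reward delivered as predicted: delta is zero (no surprise)
print(td_error(1.0, v_pre_reward, v_terminal))   # 0.0
# Reward omitted: delta dips negative at the expected reward time
print(td_error(0.0, v_pre_reward, v_terminal))   # -1.0
```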

19. Results (1): Motor output map • Motor output map of the cortex model • The map representation is muscle coding

20. Results (2): Motor output map • 50 ms random stimulation of the motor cortex • The motoneuron pattern shows a well-determined preferred direction. • Strictly, motoneurons are tuned to a preferred "torque"; at a fixed starting posture, however, a preferred torque implies a preferred direction

21. Results (3): Motor input map • NOT FINISHED; the reinforcement learning still needs tuning • Movement is not fully learned • Motor input map • Activation of the motor cortex during a voluntary movement • Broad activation (over the first 20% of movement time) • Similar directions produce similar activation patterns

22. Results (4): Motor input map • Population code • During the first 20% of movement time • Excluded insignificantly tuned neurons (about half of the 400 neurons)

  23. Short Discussion • Neural coding and regression • Tuning curve over directions • Cosine • Sharper than cosine • Truncated cosine • Advantage of population coding • Two ways of neural coding

24. Neural coding and regression • The cricket detects wind direction with four neurons: c_i is the pre-tuned (preferred) wind direction of the i-th neuron, and r_i is its firing rate. • The regression error is smallest where a preferred direction exists (the tuning curve is a truncated cosine function). • "Inference and computation with population codes" (Pouget, Dayan & Zemel, 2003) • A decoding sketch follows below.
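A sketch of this four-neuron code, assuming truncated-cosine tuning and a population-vector readout (the r_i-weighted sum of preferred directions c_i); the preferred angles, 90° apart, follow the standard textbook description of the cricket cercal system.

```python
import numpy as np

# Four preferred wind directions, 90 degrees apart (cercal interneurons)
pref = np.deg2rad([45.0, 135.0, 225.0, 315.0])
c = np.stack([np.cos(pref), np.sin(pref)], axis=1)   # unit vectors c_i

def firing_rates(theta):
    """Truncated-cosine tuning: r_i = max(0, cos(theta - theta_i))."""
    return np.maximum(0.0, np.cos(theta - pref))

def decode(rates):
    """Population vector: r_i-weighted sum of preferred directions."""
    v = rates @ c
    return np.arctan2(v[1], v[0])

theta = np.deg2rad(60.0)
print(np.rad2deg(decode(firing_rates(theta))))   # 60.0 for this ideal,
# noise-free tuning; noise or sharper-than-cosine tuning introduces the
# direction-dependent regression error discussed above
```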

25. Tuning Curve • If the tuning curve is a cosine function, as in Georgopoulos (1986) • Perfect reconstruction using the cosine basis • If the tuning curve is sharper than a cosine • Sharper tuning curves have recently been reported (Paninski et al., 2004; Scott et al., 2001) • Distortion (regression error) appears. • The equations below make the cosine case explicit.
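A restatement in equations, since the slide's formulas are not in the transcript: cosine tuning makes each rate linear in (cos θ, sin θ), so a linear readout reconstructs the direction perfectly; a sharper-than-cosine curve contains higher harmonics that this two-dimensional basis cannot capture, leaving a residual regression error.

```latex
% Cosine tuning (Georgopoulos, 1986): baseline b_i, depth k_i,
% preferred direction \theta_i
r_i(\theta) = b_i + k_i \cos(\theta - \theta_i)
            = b_i + k_i (\cos\theta_i \cos\theta + \sin\theta_i \sin\theta)

% With preferred directions spread uniformly over the circle, the
% population vector recovers \theta exactly:
\hat{\theta} = \operatorname{atan2}\left( \sum_i (r_i - b_i) \sin\theta_i,\;
                                          \sum_i (r_i - b_i) \cos\theta_i \right)
```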

26. Advantages of population coding • Low regression error • Ideally zero, if preferred directions exist for all directions (Pouget et al., 2003) • Robust to noisy input • Pouget et al., 2003 • Less variability in motor control • Assuming signal-dependent noise (SDN) • Using more muscles yields less variability (Todorov, 2002)

27. Future work • Fine tuning of the reinforcement learning • Cerebellum • Concurrent learning of the motor input map and motor output map • Sensory cortex, which may be related to feedback control • "Premotor cortex" for inverse kinematic coding (action-sensory coding, currently implemented with a SOM)
