Biological Arm Motion through Reinforcement Learning by Jun Izawa, Toshiyuki Kondo, Koji Ito Presented by Helmut Hauser
Overview
• biological motivation and basic idea
• biological muscle force model
• mathematical formulations
• reaching task
• results and conclusions
Biological Motivation
(1) Reinforcement learning occurs in biology (dopamine, …), but in this framework we face a large state and action space (curse of dimensionality).
(2) Multiple muscles produce the joint torques:
• high redundancy
• enables the system to maintain robustness and flexibility
• but further increases the action space
Humans can deal with this, but how?
Basic Idea
How do humans learn a new motion?
• We co-activate muscles and stiffen our joints
• Stiffness decreases while learning (we feel "safer")
• Our motions get smoother
Maybe there exists a preferred domain in the action space with higher priority in the learning process.
Idea: restrict the learning domain of the action space while learning, then soften the restriction as performance improves.
Muscle force model [figure]: muscle force determined by elasticity ("stiffness"), viscosity, and the equilibrium (rest) length l_r.
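A hedged sketch of the spring–damper form this figure suggests (the exact parameterization of k, b and l_r in terms of the activation u is an assumption based on the labels above):

```latex
% Spring-damper muscle model (sketch, parameterization assumed):
% k_i ... elasticity ("stiffness"), b_i ... viscosity, l_{r,i} ... equilibrium length
\[
  f_i \;=\; k_i(u_i)\,\bigl(l_{r,i}(u_i) - l_i\bigr)\;-\;b_i(u_i)\,\dot{l}_i
\]
```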
Biological Model [figure]: two-link arm (upper arm, lower arm) with joint angles θ1 and θ2, actuated by six muscles (labeled 1–6).
Merging two worlds
Muscle force model + dynamic 2-link model, connected by some transformations:
R = GᵀKG … elasticity
θ_v = R⁻¹GᵀKλ
D = GᵀBG … viscosity
Mathematical Formulation
Remember: G is constant.
K = diag(k₀ + kᵢuᵢ)
R = GᵀKG
θ_v = R⁻¹GᵀKλ
D = GᵀBG … constant
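Combining these quantities, the joint torque of the arm can be written in the usual equilibrium-point form (a hedged reconstruction; the sign convention and the definition of λ relative to the muscle rest lengths are assumptions):

```latex
% Joint-space dynamics implied by the transformations above (sketch):
\[
  \tau \;=\; -R\,(\theta - \theta_v)\;-\;D\,\dot{\theta},
  \qquad
  R = G^{\mathsf{T}} K G,\quad
  \theta_v = R^{-1} G^{\mathsf{T}} K \lambda,\quad
  D = G^{\mathsf{T}} B G
\]
```

Here R acts as joint stiffness, D as joint viscosity, and θ_v as the equilibrium posture set by the muscle commands.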
Mathematical Formulation (pseudoinverse)
Orthogonal decomposition:
u = u₁' + u₂'
n = n₁' + n₂'
Note: ň = n₁' + c·n₂' with 0 ≤ c ≤ 1
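A minimal numerical sketch of this decomposition (assuming, as the R(J)/N(J) picture on the next slides suggests, that the primed-1 component lies in the row space of a Jacobian J and the primed-2 component in its null space; the choice of J and the projector form are assumptions):

```python
import numpy as np

def restrict(J, n, c):
    """Split n into a row-space and a null-space component w.r.t. J and
    scale the null-space part by c (0 <= c <= 1). Sketch only: the choice
    of J is an assumption based on the R(J)/N(J) figures."""
    P = np.linalg.pinv(J) @ J      # projector onto the row space of J
    n1 = P @ n                     # component that changes the output (e.g. theta_v)
    n2 = n - n1                    # null-space component (pure co-activation)
    return n1 + c * n2

# Example: exploration noise over 6 muscle commands, an assumed 2x6 mapping J
rng = np.random.default_rng(0)
J = rng.normal(size=(2, 6))
n = rng.normal(size=6)
n_restricted = restrict(J, n, c=0.2)   # exploration mostly confined to R(J)
```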
[figure] Action space u with its subspaces R(J) and N(J), exploration noise ρ, and the resulting θ_v.
[figure] The same action space, with the noise component in N(J) scaled by the factor c.
Architecture [figure]: actor–critic setup with a critic network (inputs: reward and state q_{t−1}; output: TD error), an actor network, and a noise generator producing the motor command u_t.
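For concreteness, a toy sketch of such an actor–critic loop with restricted exploration noise (everything here — the linear function approximators, dimensions, learning rates, and helper names — is an illustrative assumption, not the network used in the paper):

```python
import numpy as np

class ToyActorCritic:
    """Toy TD(0) actor-critic whose exploration noise is biased toward R(J).

    Illustrative sketch only; the paper uses a neural-network approximator."""

    def __init__(self, n_state=4, n_action=6, seed=1):
        self.rng = np.random.default_rng(seed)
        self.w_critic = np.zeros(n_state)              # linear critic V(q)
        self.w_actor = np.zeros((n_action, n_state))   # linear actor u(q)
        self.alpha_c, self.alpha_a, self.gamma = 0.05, 0.01, 0.95

    def restricted_noise(self, J, sigma, c):
        """Gaussian exploration noise with its N(J) component shrunk by c."""
        n = self.rng.normal(scale=sigma, size=J.shape[1])
        n1 = np.linalg.pinv(J) @ J @ n                 # row-space component of the noise
        return n1 + c * (n - n1)

    def step(self, q, r, q_next, J, c=0.2):
        """One update from the transition (q, r, q_next); returns the motor command."""
        u_mean = self.w_actor @ q
        u = u_mean + self.restricted_noise(J, sigma=0.1, c=c)
        td = r + self.gamma * (self.w_critic @ q_next) - self.w_critic @ q
        self.w_critic = self.w_critic + self.alpha_c * td * q
        self.w_actor = self.w_actor + self.alpha_a * td * np.outer(u - u_mean, q)
        return u
```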
Reaching Task [figure: start position, goal area G_A, region boundary S]
Reward model (with r the hand position and r_E = Σ uᵢ² over all 6 muscles):
1 − c_E·r_E for r in the goal area G_A
−c_E·r_E for r elsewhere
−1 for r reaching S
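A hedged sketch of this reward model as a function (the value of c_E and the exact region tests are illustrative assumptions):

```python
import numpy as np

def reward(u, in_goal, out_of_region, c_E=0.01):
    """Sketch of the reward model: energy penalty r_E = sum(u_i^2) over the
    6 muscle commands; c_E and the region tests (goal area G_A, boundary S)
    are assumptions for illustration."""
    r_E = float(np.sum(np.square(u)))
    if out_of_region:        # hand reached the boundary / forbidden region S
        return -1.0
    if in_goal:              # hand inside the goal area G_A
        return 1.0 - c_E * r_E
    return -c_E * r_E
```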
Some implementation facts
• the input q is extended to include u, since the reward model depends on u as well
• the stiffness R is set to rather "high" values
• a neural network (as proposed by Shibata) is used as function approximator (trained with backpropagation)
• in a second experiment, a load with arbitrary orientation (kept constant within a trial) is applied within a certain region
• parameters (noise parameters, c_E of the reward model, …) have to be tuned
Results
Compared to a standard approach, the proposed architecture:
• receives more reward
• its cumulative reward does not tend to zero
• the energy hardly changes in the early stage and decreases once the target has been hit
• with the extra force applied, the peak of the stiffness moves toward that area
Conclusions
• the approach can deal with redundant systems (the typical case in nature)
• the search noise is restricted to a subspace
• a robust controller has been achieved
• some extra tuning was needed (done by evolution?)
Future outlook:
• apply the approach to hierarchical systems (more stages)
• how can the extra tuning be avoided?
Literature
• "Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito, Proceedings of the 2002 IEEE International Conference on Robotics & Automation
• "Motor Learning Model using Reinforcement Learning with Neural Internal Model", Jun Izawa, Toshiyuki Kondo, Koji Ito, Department of Computational Intelligence and Systems
• "Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito, Biol. Cybern. 91, 10–22 (2004), Springer-Verlag 2004