Reinforcement Learning • Presented by: Kyle Feuz
Outline • Motivation • MDPs • RL • Model-Based • Model-Free • Q-Learning • SARSA • Challenges
Examples • Pac-Man • Spider
MDPs • A 4-tuple (States, Actions, Transitions, Rewards)
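One way to make the 4-tuple concrete is to write it down as plain data. A minimal sketch in Python; the states, actions, and reward values are made up for illustration, not taken from the talk:

```python
# Toy MDP as plain Python data (all values illustrative).
# transitions maps (state, action) -> list of (probability, next_state);
# rewards maps (state, action, next_state) -> float, defaulting to 0.

states = ["s0", "s1"]
actions = ["left", "right"]

transitions = {
    ("s0", "right"): [(0.9, "s1"), (0.1, "s0")],
    ("s0", "left"):  [(1.0, "s0")],
    ("s1", "left"):  [(0.8, "s0"), (0.2, "s1")],
    ("s1", "right"): [(1.0, "s1")],
}

rewards = {
    ("s0", "right", "s1"): 1.0,   # reaching s1 is rewarded
}

def reward(s, a, s_next):
    return rewards.get((s, a, s_next), 0.0)
```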
Important Terms • Policy • Reward Function • Value Function • Model
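For reference, the standard definitions behind these terms (Sutton & Barto notation; not spelled out on the slide). A policy maps states to actions, the reward function scores transitions, the value function is the expected discounted return under the policy, and the model is the transition function together with the reward function:

```latex
% Math snippet; assumes amsmath/amssymb. gamma is the discount factor.
\begin{align*}
  \pi &: S \to A \\
  R &: S \times A \times S \to \mathbb{R} \\
  V^{\pi}(s) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0} = s\right]
\end{align*}
```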
Model-Based RL • Learn transition function • Learn expected rewards • Compute the optimal policy
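Once the transition and reward estimates are in hand, the optimal policy can be computed offline, for example by value iteration. A minimal sketch over the toy MDP above; `gamma` and the convergence threshold `theta` are assumed hyperparameters:

```python
def value_iteration(states, actions, transitions, reward, gamma=0.9, theta=1e-6):
    """Compute state values for the learned model by repeated Bellman backups."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # One-step lookahead: value of the best action under the model.
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for p, s2 in transitions.get((s, a), []))
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

V = value_iteration(states, actions, transitions, reward)
```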
Model-Free RL • Learn expected rewards/values • Skip learning the transition function • Trade-offs?
Examples • Pac-Man • Spider • Mario
Q-Learning • Q(s, a) ← (1 − α) Q(s, a) + α [R(s, s′) + γ max_a′ Q(s′, a′)]
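In code, the update is one line per step. A tabular sketch; `alpha` and `gamma` are assumed hyperparameters, and the Q-table defaults every entry to zero:

```python
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)], defaults to 0
alpha, gamma = 0.1, 0.99          # illustrative hyperparameters

def q_update(s, a, r, s_next, actions):
    """Off-policy Q-learning backup: bootstrap from the best next action."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
```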
Q-Learning • Demo Video
SARSA • Q(s, a) ← (1 − α) Q(s, a) + α [R(s, s′) + γ Q(s′, a′)]
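The only difference from Q-learning is the bootstrap target: SARSA uses the action actually taken in the next state rather than the max. Reusing `Q`, `alpha`, and `gamma` from the sketch above:

```python
def sarsa_update(s, a, r, s_next, a_next):
    """On-policy SARSA backup: bootstrap from the action actually taken."""
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * Q[(s_next, a_next)])
```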
Challenges • Explore vs. Exploit • State Space Representation • Training Time • Multiagent Learning • Moving Target • Competitive or Cooperative
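The explore-vs.-exploit trade-off is commonly handled with an ε-greedy rule on top of the Q-table; a sketch, with `epsilon` as an assumed hyperparameter:

```python
import random

def epsilon_greedy(s, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit current Q estimates."""
    if random.random() < epsilon:
        return random.choice(actions)             # explore
    return max(actions, key=lambda a: Q[(s, a)])  # exploit
```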
Transfer Learning for Reinforcement Learning on a Physical Robot • Applied TL and RL on a Nao robot • TL uses the q-value reuse approach • RL uses a SARSA variant • State space is represented via a CMAC • A neural network inspired by the cerebellum • Acts as an associative memory • Allows agents to generalize across the state space
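To give a flavor of how a CMAC generalizes, here is a minimal tile-coding sketch for a single continuous input: several offset tilings each activate one tile, and the value estimate is the sum of the active tiles' weights, so nearby inputs share features. The tiling count and tile width are illustrative, not taken from the paper:

```python
import math
from collections import defaultdict

NUM_TILINGS, TILE_WIDTH = 4, 0.5   # illustrative, not from the paper

def active_tiles(x):
    """One (tiling, tile-index) pair per offset tiling for scalar input x."""
    return [(t, math.floor((x + t * TILE_WIDTH / NUM_TILINGS) / TILE_WIDTH))
            for t in range(NUM_TILINGS)]

weights = defaultdict(float)       # one learnable weight per tile

def cmac_value(x):
    """Value estimate: sum of the weights of the active tiles."""
    return sum(weights[tile] for tile in active_tiles(x))
```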
SARSA Update Rule • Q(s, a) ← (1 − α) Q(s, a) + α [R(s, s′) + γ e(s, a) Q(s′, a′)]
Q-Value Reuse • Q(s, a) = Q_source(χ_X(s), χ_A(a)) + Q_target(s, a)
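In code, q-value reuse sums a frozen source-task Q-function, queried through the inter-task mappings, with a learned target-task correction; only the target term is updated during learning in the target task. `chi_X` and `chi_A` below are stand-ins for the paper's state and action mappings:

```python
def q_reuse(s, a, Q_source, Q_target, chi_X, chi_A):
    """Target-task value = frozen source Q (via mappings) + learned correction.

    Q_source is fixed; only Q_target is updated by the target-task learner."""
    return Q_source[(chi_X(s), chi_A(a))] + Q_target[(s, a)]
```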
Experimental Setup • Seated Nao robot • Hit the ball at a 45° angle • 5 Actions in Source, 9 Actions in Target
Examples • Pac-Man • Spider • Mario • Q-Learning • Penalty Kick • Others
References and Resources • RL Repository • RL-Community • RL on PBWorks • RL Warehouse • Reinforcement Learning: An Introduction (Sutton & Barto) • Artificial Intelligence: A Modern Approach (Russell & Norvig) • How to Make Software Agents Do the Right Thing