Reinforcement Learning [Intro] Marco Loog
Introduction • How can an agent learn if there is no teacher around to tell it, with every action, what is right and what is wrong? • E.g., an agent can learn to play chess by supervised learning, provided that examples of states and their correct actions are available • But what if these examples are not available?
Introduction • But what if these examples are not available? • Through random moves, i.e., exploratory behavior, the agent may be able to infer knowledge about the environment it is in • But what is good and what is bad? = the knowledge necessary to decide what to do in order to reach its goal
Introduction • But what is good and what is bad? = the knowledge necessary to decide what to do in order to reach its goal • ‘Rewarding’ the agent when it does something good and ‘punishing’ it when it does something bad is called reinforcement • The task of reinforcement learning is to use observed rewards to learn a [best] policy for the environment
Reinforcement Learning • Use observed rewards to learn an [almost?] optimal policy for an environment • The reward R(s) assigns a number to every state s • The utility of an environment history is [as an example] the sum of the rewards received • A policy describes the agent’s action from any state s in order to reach the goal • The optimal policy is the policy with the highest expected utility
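The notions above can be made concrete in a minimal sketch. The five-state corridor, the reward values, and the always-move-right policy below are illustrative assumptions, not from the slides; they only show how R(s), a policy, and the utility of an environment history fit together.

```python
# Hypothetical 5-state corridor; state 4 is the goal.
R = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}  # reward R(s) for every state s

def policy(s):
    """A fixed policy: from any state, move right toward the goal."""
    return +1

def utility(history):
    """Utility of an environment history = sum of the rewards received."""
    return sum(R[s] for s in history)

# Follow the policy from state 0 and record the history of states visited.
s, history = 0, [0]
while s != 4:
    s += policy(s)
    history.append(s)

print(history)           # [0, 1, 2, 3, 4]
print(utility(history))  # 1.0
```

The optimal policy would then be the one whose histories have the highest expected utility; here any policy that eventually reaches state 4 collects the same reward, so a more interesting environment would be needed to distinguish them.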
Rewards, Utilities, &c. • [Figure: example gridworld with terminal states rewarded +1 and -1]
Reinforcement Learning • How do we learn a policy like the previous one? • Complicating factors: normally, both the environment and the reward function are unknown • In many complex domains, reinforcement learning is the only feasible route to success
Reinforcement Learning • Might be considered to encompass all of AI: an agent is dropped off somewhere and has to figure everything out for itself • We will concentrate on simple settings and agent designs to keep things manageable • E.g., a fully observable environment
3 Agent Designs • Utility-based agent: learns a utility function on the basis of which it chooses its actions • Q-learning agent: learns an action-value function giving the expected utility of taking a given action in a given state • Reflex agent: learns a policy that maps directly from states to actions
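The Q-learning design can be sketched as follows. The corridor environment, the learning rate, discount, and exploration parameters are all illustrative assumptions; the slides only introduce the agent types, not a specific implementation.

```python
import random

# Sketch of a Q-learning agent: it learns Q(s, a), the expected utility of
# taking action a in state s, purely from observed rewards.
# Hypothetical environment: a 5-state corridor with reward 1 at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                     # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed parameter values

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment: clamp moves to the corridor; reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy exploration: mostly exploit, sometimes act randomly.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: Q[(s, a2)])
        s2, r = step(s, a)
        # Q-learning update: move toward reward plus discounted best next value.
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The greedy policy read off from Q should move right in every non-goal state.
greedy = {s: max(ACTIONS, key=lambda a2: Q[(s, a2)]) for s in range(GOAL)}
print(greedy)
```

Note that the agent needs no model of the environment: the update uses only the observed transition (s, a, r, s2), which is what distinguishes this design from the utility-based agent.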
More • Next week...