Reinforcement Learning [Intro] Marco Loog
Introduction • How can an agent learn if there is no teacher around to tell it, with every action, what is right and what is wrong? • E.g., an agent can learn to play chess by supervised learning, provided that examples of states and their correct actions are available • But what if these examples are not available?
Introduction • But what if these examples are not available? • Through random moves, i.e., exploratory behavior, the agent may be able to infer knowledge about the environment it is in • But what is good and what is bad? = the knowledge necessary to decide what to do in order to reach its goal
Introduction • But what is good and what is bad? = the knowledge necessary to decide what to do in order to reach its goal • ‘Rewarding’ the agent when it does something good and ‘punishing’ it when it does something bad is called reinforcement • The task of reinforcement learning is to use observed rewards to learn a [best] policy for the environment
Reinforcement Learning • Use observed rewards to learn an [almost?] optimal policy for an environment • The reward R(s) assigns a number to every state s • The utility of an environment history is [as an example] the sum of the rewards received • A policy describes the agent’s action from any state s in order to reach the goal • The optimal policy is the policy with the highest expected utility
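The notions above can be made concrete in a minimal sketch. The five-state corridor, the reward values, and the always-move-right policy below are illustrative assumptions, not from the slides; they only show how R(s), a policy, and the utility of an environment history fit together.

```python
# Hypothetical 5-state corridor; state 4 is the goal.
R = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}  # reward R(s) for every state s

def policy(s):
    """A fixed policy: from any state, move right toward the goal."""
    return +1

def utility(history):
    """Utility of an environment history = sum of the rewards received."""
    return sum(R[s] for s in history)

# Follow the policy from state 0 and record the history of states visited.
s, history = 0, [0]
while s != 4:
    s += policy(s)
    history.append(s)

print(history)           # [0, 1, 2, 3, 4]
print(utility(history))  # 1.0
```

The optimal policy would then be the one whose histories have the highest expected utility; here any policy that eventually reaches state 4 collects the same reward, so a more interesting environment would be needed to distinguish them.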
Rewards, Utilities, &c. • [Figure: example gridworld with terminal states rewarded +1 and -1]
Reinforcement Learning • How do we learn a policy like the previous one? • Complicating factors: normally, both the environment and the reward function are unknown • In many complex domains, reinforcement learning is the only feasible route to success
Reinforcement Learning • Might be considered to encompass all of AI: an agent is dropped off somewhere and has to figure everything out for itself • We will concentrate on simple settings and agent designs to keep things manageable • E.g., a fully observable environment
3 Agent Designs • Utility-based agent: learns a utility function on the basis of which it chooses its actions • Q-learning agent: learns an action-value function giving the expected utility of taking a given action in a given state • Reflex agent: learns a policy that maps directly from states to actions
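The Q-learning design can be sketched as follows. The corridor environment, the learning rate, discount, and exploration parameters are all illustrative assumptions; the slides only introduce the agent types, not a specific implementation.

```python
import random

# Sketch of a Q-learning agent: it learns Q(s, a), the expected utility of
# taking action a in state s, purely from observed rewards.
# Hypothetical environment: a 5-state corridor with reward 1 at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                     # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed parameter values

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment: clamp moves to the corridor; reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy exploration: mostly exploit, sometimes act randomly.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: Q[(s, a2)])
        s2, r = step(s, a)
        # Q-learning update: move toward reward plus discounted best next value.
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The greedy policy read off from Q should move right in every non-goal state.
greedy = {s: max(ACTIONS, key=lambda a2: Q[(s, a2)]) for s in range(GOAL)}
print(greedy)
```

Note that the agent needs no model of the environment: the update uses only the observed transition (s, a, r, s2), which is what distinguishes this design from the utility-based agent.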
More • Next week...