Chapter 8: Generalization and Function Approximation
Objectives of this chapter:
• Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
• Overview of function approximation (FA) methods and how they can be adapted to RL.
Value Prediction with FA
As usual, Policy Evaluation (the prediction problem): for a given policy π, compute the state-value function V^π. In earlier chapters, value functions were stored in lookup tables.
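As a reminder, the quantity being approximated is the state-value function defined in earlier chapters (restated here because the slide's own formula did not survive extraction):

V^\pi(s) = E_\pi\Bigl\{ \textstyle\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \;\Big|\; s_t = s \Bigr\}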
Adapt Supervised Learning Algorithms
[Diagram: a supervised learning system maps inputs to outputs; the training info consists of the desired (target) outputs.]
Training example = {input, target output}
Error = (target output - actual output)
Backups as Training Examples
Each backup can be treated as a conventional training example: the input is (a description of) the state backed up, and the target output is the backup value.
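For instance, for the TD(0) backup (the slide's own example is not reproduced in the extracted text, so this is the standard case from the book):

\underbrace{\text{description of } s_t}_{\text{input}} \;\longmapsto\; \underbrace{r_{t+1} + \gamma V(s_{t+1})}_{\text{target output}}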
Any Function Approximation Method?
• In principle, yes:
  • artificial neural networks
  • decision trees
  • multivariate regression methods
  • etc.
• But RL has some special requirements:
  • usually want to learn while interacting
  • ability to handle nonstationarity
  • other?
Gradient Descent Methods
[Equation not reproduced in the extracted text: presumably the definition of the parameter (weight) vector θ_t = (θ_t(1), ..., θ_t(n))^T, whose transpose symbol is the source of the stray word "transpose".]
Performance Measures
• Many are applicable, but a common and simple one is the mean-squared error (MSE) over a distribution P that weights the errors in different states.
• Why P? Why minimize MSE?
• Let us assume that P is always the distribution of states at which backups are done.
• The on-policy distribution: the distribution created while following the policy being evaluated. Stronger results are available for this distribution.
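The MSE referred to above, restated here because the slide's equation did not survive extraction (this is the standard form from the book):

\mathrm{MSE}(\vec\theta_t) = \sum_{s \in S} P(s)\,\bigl[ V^\pi(s) - V_t(s) \bigr]^2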
Gradient Descent
Iteratively move down the gradient:
Gradient Descent Cont.
For the MSE given above and using the chain rule:
Gradient Descent Cont.
Assume that states appear with distribution P, and use just the sample gradient instead. Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if the step size α decreases appropriately with t.
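The update equations referred to on these three slides were lost in extraction; in the book's notation they are the full-gradient step, its chain-rule expansion, and the per-sample version:

\vec\theta_{t+1} = \vec\theta_t - \tfrac{1}{2}\alpha\,\nabla_{\vec\theta_t}\mathrm{MSE}(\vec\theta_t)
                 = \vec\theta_t + \alpha \sum_{s \in S} P(s)\,\bigl[V^\pi(s) - V_t(s)\bigr]\,\nabla_{\vec\theta_t} V_t(s)

\vec\theta_{t+1} = \vec\theta_t + \alpha\,\bigl[V^\pi(s_t) - V_t(s_t)\bigr]\,\nabla_{\vec\theta_t} V_t(s_t)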
But We Don't Have These Targets
In practice the true value V^π(s_t) is unknown, so some estimate v_t of it is used as the target in its place.
What about TD(λ) Targets?
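The target definitions on this slide did not survive extraction; the standard candidates from the book are the n-step return and the λ-return, used as the target v_t:

R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V_t(s_{t+n})

R_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} R_t^{(n)}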
On-Line Gradient-Descent TD(λ)
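The algorithm box on this slide is not reproduced in the extracted text. Below is a minimal sketch of on-line linear gradient-descent TD(λ) with accumulating eligibility traces; the function and argument names (features, policy, env_step, etc.) are illustrative assumptions, not part of the slide:

```python
import numpy as np

def gradient_td_lambda(features, policy, env_reset, env_step, is_terminal,
                       n_features, alpha=0.1, gamma=1.0, lam=0.9, n_episodes=100):
    """On-line gradient-descent TD(lambda) with a linear value function.

    features(s)    -> np.ndarray of length n_features   (assumed, user-supplied)
    policy(s)      -> action                             (assumed, user-supplied)
    env_step(s, a) -> (reward, next_state)               (assumed, user-supplied)
    """
    theta = np.zeros(n_features)                # parameter vector
    for _ in range(n_episodes):
        s = env_reset()
        e = np.zeros(n_features)                # eligibility-trace vector
        while not is_terminal(s):
            r, s_next = env_step(s, policy(s))
            phi = features(s)
            v = theta @ phi                                        # V_t(s)
            v_next = 0.0 if is_terminal(s_next) else theta @ features(s_next)
            delta = r + gamma * v_next - v                         # TD error
            e = gamma * lam * e + phi   # accumulate traces; grad of linear V is phi
            theta += alpha * delta * e                             # weight update
            s = s_next
    return theta
```

With linear FA the gradient of V_t(s) is just the feature vector, which is why the trace update simply adds phi.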
Linear Methods
Nice Properties of Linear FA Methods
• The gradient is very simple (see below).
• For MSE, the error surface is simple: a quadratic surface with a single minimum.
• Linear gradient-descent TD(λ) converges if:
  • the step size decreases appropriately, and
  • sampling is on-line (states sampled from the on-policy distribution).
• It converges to a parameter vector whose error is bounded in terms of the best parameter vector (Tsitsiklis & Van Roy, 1997); see the bound below.
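The equations referred to on the last two slides were lost in extraction; in the book's notation, the linear form, its gradient, and the convergence bound are:

V_t(s) = \vec\theta_t^{\,\top} \vec\phi_s = \sum_{i=1}^{n} \theta_t(i)\,\phi_s(i),
\qquad \nabla_{\vec\theta_t} V_t(s) = \vec\phi_s

\mathrm{MSE}(\vec\theta_\infty) \le \frac{1-\gamma\lambda}{1-\gamma}\,\min_{\vec\theta}\mathrm{MSE}(\vec\theta)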
Coarse Coding
Learning and Coarse Coding
Tile Coding
• Binary feature for each tile
• Number of features present at any one time is constant
• Binary features mean the weighted sum is easy to compute
• Easy to compute the indices of the features present
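As an illustration of the last two bullets, here is a minimal sketch of how tile-coding features might be computed for a one-dimensional state, assuming a few uniformly offset tilings (an illustrative choice, not the book's implementation):

```python
def tile_indices(x, x_min, x_max, n_tilings=8, tiles_per_tiling=10):
    """Indices of the active binary features for a scalar state x.

    Each tiling partitions [x_min, x_max] into tiles_per_tiling tiles and is
    offset by a fraction of a tile width, so exactly one tile is active per
    tiling: the number of active features is always n_tilings.
    """
    tile_width = (x_max - x_min) / tiles_per_tiling
    active = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings           # each tiling shifted slightly
        tile = int((x - x_min + offset) // tile_width)
        tile = min(max(tile, 0), tiles_per_tiling)    # offsets add one extra tile
        active.append(t * (tiles_per_tiling + 1) + tile)
    return active

def linear_value(x, theta, x_min, x_max):
    """With binary features, the weighted sum is just a sum of the active weights."""
    return sum(theta[i] for i in tile_indices(x, x_min, x_max))
```

The weight vector theta would have n_tilings * (tiles_per_tiling + 1) entries, and exactly n_tilings of them contribute to any one value estimate.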
Tile Coding Cont.
• Irregular tilings
• Hashing
• CMAC: "Cerebellar Model Arithmetic Computer" (Albus, 1971)
Radial Basis Functions (RBFs)
e.g., Gaussians
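The feature definition on this slide was lost in extraction; the usual Gaussian RBF form (as in the book) is:

\phi_s(i) = \exp\!\left(-\frac{\lVert s - c_i\rVert^2}{2\sigma_i^2}\right)

where c_i is the feature's center and σ_i its width.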
Can you beat the "curse of dimensionality"?
• Can you keep the number of features from going up exponentially with the dimension?
• Function complexity, not dimensionality, is the problem.
• Kanerva coding (see the sketch below):
  • select a bunch of binary prototypes
  • use Hamming distance as the distance measure
  • dimensionality is no longer a problem, only complexity
• "Lazy learning" schemes:
  • remember all the data
  • to get a new value, find nearest neighbors and interpolate
  • e.g., locally weighted regression
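A minimal sketch of Kanerva-style coding as described in the bullets above; the binary prototypes, the activation radius, and all names are illustrative assumptions rather than anything specified on the slide:

```python
import numpy as np

def kanerva_features(s, prototypes, radius=2):
    """Binary features from Kanerva-style coding.

    s          : binary state vector, shape (d,)
    prototypes : randomly chosen binary prototype vectors, shape (k, d)
    A feature is active when the state is within `radius` Hamming distance of
    its prototype; the number of features k is chosen independently of d.
    """
    hamming = np.sum(prototypes != s, axis=1)      # distance to each prototype
    return (hamming <= radius).astype(float)       # 1 if close enough, else 0

# Example: 50 prototypes over 20-dimensional binary states.
rng = np.random.default_rng(0)
prototypes = rng.integers(0, 2, size=(50, 20))
phi = kanerva_features(rng.integers(0, 2, size=20), prototypes)
```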
Control with Function Approximation
Learning state-action values:
• Training examples are now of the form: (description of (s_t, a_t), target v_t)
• The general gradient-descent rule (see below)
• Gradient-descent Sarsa(λ), backward view (see below)
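The update equations on this slide were lost in extraction; in the book's notation they are:

\vec\theta_{t+1} = \vec\theta_t + \alpha\,\bigl[v_t - Q_t(s_t, a_t)\bigr]\,\nabla_{\vec\theta_t} Q_t(s_t, a_t)

\vec\theta_{t+1} = \vec\theta_t + \alpha\,\delta_t\,\vec{e}_t, \quad
\delta_t = r_{t+1} + \gamma\,Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t), \quad
\vec{e}_t = \gamma\lambda\,\vec{e}_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t, a_t)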
GPI with Linear Gradient-Descent Sarsa(λ)
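The Sarsa(λ) algorithm box is likewise not reproduced in the text. A minimal sketch of episodic linear gradient-descent Sarsa(λ) with ε-greedy action selection, assuming a user-supplied feature function features(s, a), an action list, and an environment interface (all names are illustrative assumptions):

```python
import numpy as np

def linear_sarsa_lambda(features, actions, env_reset, env_step, is_terminal,
                        n_features, alpha=0.05, gamma=1.0, lam=0.9,
                        epsilon=0.1, n_episodes=200, seed=0):
    """Linear gradient-descent Sarsa(lambda) with epsilon-greedy action selection.

    features(s, a) -> np.ndarray of length n_features   (assumed, user-supplied)
    env_step(s, a) -> (reward, next_state)               (assumed, user-supplied)
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)

    def q(s, a):
        return theta @ features(s, a)

    def choose(s):
        if rng.random() < epsilon:
            return actions[rng.integers(len(actions))]
        return max(actions, key=lambda a: q(s, a))

    for _ in range(n_episodes):
        s = env_reset()
        a = choose(s)
        e = np.zeros(n_features)                  # eligibility traces
        while not is_terminal(s):
            r, s_next = env_step(s, a)
            delta = r - q(s, a)                   # start of TD error
            e = gamma * lam * e + features(s, a)  # accumulate traces
            if is_terminal(s_next):
                theta += alpha * delta * e
                break
            a_next = choose(s_next)
            delta += gamma * q(s_next, a_next)    # complete the Sarsa TD error
            theta += alpha * delta * e
            s, a = s_next, a_next
    return theta
```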
GPI Linear Gradient-Descent Watkins' Q(λ)
Mountain-Car Task
Drive an underpowered car up a steep hill: the car must first back away from the goal to build up enough momentum, and the reward is -1 per time step until the goal is reached.
Mountain-Car Results
Baird's Counterexample
• Reward is always zero, so the true value is zero for all s.
• The form of the approximate state-value function is shown in each state.
• Updating is unstable from some initial conditions.
Baird's Counterexample Cont.
Should We Bootstrap?
Summary
• Generalization
• Adapting supervised-learning function approximation methods
• Gradient-descent methods
• Linear gradient-descent methods
  • Radial basis functions
  • Tile coding
  • Kanerva coding
• Nonlinear gradient-descent methods? Backpropagation?
• Subtleties involving function approximation, bootstrapping, and the on-policy/off-policy distinction