Model-based RL (+ action sequences): maybe it can explain everything
Niv lab meeting, 6/11/2012
Stephanie Chan
Goal-directed vs. habitual instrumental actions
• Habitual
  • After extensive training
  • Choose action based on previous actions/stimuli
  • Sensorimotor cortices + DLS (putamen)
  • Not sensitive to: reinforcer devaluation; changes in action-outcome contingency
  • Usually modeled as: model-free RL
• Goal-directed
  • After moderate training
  • Choose action based on expected outcome
  • PFC & DMS (caudate)
  • Usually modeled as: model-based RL
Goal-directed vs. habitual instrumental actions
• What do real animals do?
Model-free RL
• Explains resistance to devaluation:
  • Devaluation occurs in "extinction": no feedback, so no TD error (sketched below)
• Does NOT explain resistance to changes in action-outcome contingency
  • In fact, habitual behavior should be MORE sensitive to changes in contingency
  • Maybe: learning rates become very small after extended training
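A minimal illustration of the extinction point above (not from the talk; the state/action names and learning rate are assumptions): with tabular model-free Q-learning, devaluation in extinction delivers no feedback, so no TD update ever runs and the cached value of the habitual response never drops.

alpha, gamma = 0.1, 0.95
Q = {("lever", "press"): 0.0, ("lever", "withhold"): 0.0}

def td_update(state, action, reward, v_next):
    # Standard TD(0) update on the cached (model-free) action value.
    key = (state, action)
    Q[key] += alpha * (reward + gamma * v_next - Q[key])

# Extensive training: lever pressing is rewarded, so its cached value grows.
for _ in range(1000):
    td_update("lever", "press", reward=1.0, v_next=0.0)

# Devaluation in extinction: the outcome is never delivered, so there is no
# feedback, hence no TD error and no update call at all; the cached value is
# never revised downward and the "habit" persists.
print(Q[("lever", "press")])  # still close to its trained value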
Alternative explanation
• We don't need model-free RL
• Habit formation = association of individual actions into "action sequences"
  • More parsimonious
• Requires a means of modeling action sequences
Over the course of training
• Exploration -> exploitation
• Variability -> stereotypy
• Errors and reaction times (RT) decrease
• Individual actions -> "chunked" sequences
• PFC + associative striatum -> sensorimotor striatum
• "Closed loop" -> "open loop"
When should actions get chunked?
• Q-learning with dwell time:
  • Q(s,a) = R(s) + E[V(s')] – D(s)·<R>
• Chunk when the costs (possible mistakes) are outweighed by the benefits (decreased decision time) (see the sketch below)
  • Cost: C(s,a,a') = E[Q(s',a') – V(s')] = E[A(s',a')]
  • Efficient way to compute this: TD_t = [r_t – d_t·<R> + V(s_{t+1})] – V(s_t), a sample of A(s_t, a_t)
  • Benefit: (# timesteps saved) × <R>
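A minimal sketch of this cost/benefit rule, assuming an average-reward ("dwell time") Q-learning setting; the function names, the use of a running <R> estimate, and the comparison itself are illustrative assumptions, not the talk's actual implementation.

import numpy as np

def td_sample(reward, dwell, v_next, v_now, avg_reward):
    # One TD error in the average-reward formulation:
    #   TD_t = r_t - d_t*<R> + V(s_{t+1}) - V(s_t)
    # Each sample can be used to estimate the advantage A(s_t, a_t).
    return reward - dwell * avg_reward + v_next - v_now

def should_chunk(advantage_samples, timesteps_saved, avg_reward):
    # Cost of appending a' to a: the value expected to be lost by committing
    # to a' without re-evaluating, i.e. -E[A(s', a')] >= 0, estimated here
    # from TD samples collected while a' was still being deliberated.
    cost = -np.mean(advantage_samples)
    # Benefit: the reward earned during the decision time that is saved.
    benefit = timesteps_saved * avg_reward
    return benefit > cost

# Example: committing early loses ~0.3 in value but saves one timestep's
# worth of reward (0.5), so the pair gets chunked.
samples = [td_sample(1.0, dwell=1.0, v_next=5.0, v_now=5.8, avg_reward=0.5)
           for _ in range(10)]
print(should_chunk(samples, timesteps_saved=1, avg_reward=0.5))  # True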
When do they get unchunked?
• C(s,a,a') is insensitive to changes in the environment
  • Primitive actions are no longer evaluated: no TD error, no samples for C
• But <R> is sensitive to changes…
  • Action sequences get unchunked when the environment changes so as to decrease <R> (see the sketch below)
  • No unchunking if the environment changes to present a better alternative that would increase <R>
• Ostlund et al. 2009: rats are immediately sensitive to devaluation of the state that the macro action lands on, but not to devaluation of the intermediate states
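A hedged sketch of the unchunking rule just described: once a sequence is chunked, C is no longer updated, so the only available signal is the running average reward <R>; the drop threshold and variable names below are assumptions for illustration only.

def maybe_unchunk(macro_actions, r_at_chunking, r_current, drop_fraction=0.2):
    # Unchunk when the environment changes so that <R> falls (e.g. omission,
    # or devaluation of the state the macro action lands on). A newly
    # available better alternative leaves <R> unchanged under the open-loop
    # policy, so the chunk is kept and behavior looks "habitual".
    if r_current < (1.0 - drop_fraction) * r_at_chunking:
        return []            # break sequences back into primitive actions
    return macro_actions     # keep executing the chunked sequences open-loop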
Simulations II: Instrumental conditioning
• Reinforcer devaluation
• Non-contingent
• Omission