Trials and Tribulations: Architectural Constraints on Modeling a Visuomotor Task within the Reinforcement Learning Paradigm
Subject of Investigation • How humans integrate visual object properties into their action policy when learning a novel visuomotor task. • BubblePop! • Problem: Too many possible questions… • Solution: Motivate behavioral research by looking at modeling difficulties. • Non-obvious crossroads in model design.
Approach • Since the task provides only a scalar performance signal, the model must use reinforcement learning. • Temporal-difference learning with backpropagation (TD-backprop; sketch below). • Start with an extremely simplified version of the task and add the complexity back once a successful model exists. • Analyze the representational and architectural constraints necessary for each model.
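The slide names temporal-difference learning with backpropagation; below is a minimal sketch of that combination, assuming a one-hidden-layer value network trained with the semi-gradient TD(0) rule. The class and function names (ValueNet, td_step) and all hyperparameters are illustrative, not the authors' implementation.

```python
# Sketch: TD(0) with a small backprop-trained value network (names and values are assumptions).
import numpy as np

class ValueNet:
    """One-hidden-layer network mapping a state vector to an expected-reward estimate."""
    def __init__(self, n_in, n_hidden=8, lr=0.05, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x):
        self.x = x
        self.h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ self.h + self.b2

    def backward(self, delta):
        # Backpropagate the scalar TD error: w += lr * delta * dV/dw (semi-gradient update).
        self.W2 += self.lr * delta * self.h
        self.b2 += self.lr * delta
        dh = delta * self.W2 * (1 - self.h ** 2)
        self.W1 += self.lr * np.outer(dh, self.x)
        self.b1 += self.lr * dh

def td_step(net, s, r, s_next, gamma=0.9, terminal=False):
    """One TD(0) update: delta = r + gamma * V(s') - V(s)."""
    v = net.forward(s)
    v_next = 0.0 if terminal else net.forward(s_next)
    delta = r + gamma * v_next - v
    net.forward(s)        # restore the cached activations for state s before backprop
    net.backward(delta)
    return delta
```

The scalar TD error plays the role of the scalar performance signal: it is the only quantity backpropagated through the network.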
First Steps: Dummy World • 5×5 grid world. • 4 possible actions: up, down, left, right. • 1 stationary target. • Starting locations of target and agent are randomly assigned. • Fixed reward upon reaching the target, at which point a new target is generated. • Epoch ends after a fixed number of steps.
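For concreteness, here is a minimal environment sketch matching the rules on this slide; the class name, reward value, and step limit are assumptions.

```python
# Sketch: the simplified "Dummy World" as described on the slide (illustrative names/values).
import numpy as np

class DummyWorld:
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, size=5, max_steps=50, reward=1.0, rng=None):
        self.size, self.max_steps, self.reward = size, max_steps, reward
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        # Starting locations of agent and target are randomly assigned.
        self.agent = tuple(self.rng.integers(0, self.size, 2))
        self._new_target()
        self.steps = 0
        return self.agent, self.target

    def _new_target(self):
        self.target = tuple(self.rng.integers(0, self.size, 2))

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        self.steps += 1
        rew = 0.0
        if self.agent == self.target:        # fixed reward, then a fresh target appears
            rew = self.reward
            self._new_target()
        done = self.steps >= self.max_steps  # epoch ends after a fixed number of steps
        return (self.agent, self.target), rew, done
```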
Dummy World Architectures • Network (figure): 1 expected-reward output unit, an 8-unit hidden layer, 25 input units for the grid, 4 action units, and context units (egocentric version only). • Grid input: the whole grid (allocentric) or agent-centered (egocentric).
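One plausible reading of the two input encodings is sketched below; the exact unit counts and coding scheme in the original figure may differ, so treat this as illustrative only.

```python
# Sketch: allocentric vs. egocentric input encodings (an assumed reading of the slide).
import numpy as np

def allocentric(agent, target, size=5):
    """Whole-grid view: one map over absolute grid cells, marking agent (-1) and target (+1)."""
    grid = np.zeros((size, size))
    grid[agent] = -1.0
    grid[target] = 1.0
    return grid.ravel()                              # 25 inputs for a 5x5 world

def egocentric(agent, target, size=5):
    """Agent-centered view: mark the target at its position relative to the agent.
    Covering every possible offset needs a (2*size - 1)**2 window, which is part of
    the representational cost of going egocentric."""
    window = np.zeros((2 * size - 1, 2 * size - 1))
    window[target[0] - agent[0] + size - 1,
           target[1] - agent[1] + size - 1] = 1.0
    return window.ravel()
```

Either encoding, concatenated with a one-hot code for the candidate action, could feed a network like the ValueNet sketch above (8 hidden units, 1 expected-reward output); the "context (ego only)" units presumably carry memory that the agent-centered view otherwise lacks.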
Building in symmetry • Current architectures learn each action independently. • But 'Up' is like 'Down', only mirrored: each action simply shifts the world in a different direction. • Exploit the symmetry: 1 action, 4 different (rotated) inputs. • "In which rotation of the world would you rather go 'up'?"
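One way to realize "1 action, 4 different inputs" is to rotate the agent-centered map so that a single shared 'up' network scores every candidate action. A hedged sketch, with the rotation-to-action mapping left as a bookkeeping convention:

```python
# Sketch: exploiting rotational symmetry -- evaluate every action as "up" by rotating
# the (square) egocentric input map. Names are illustrative.
import numpy as np

def rotated_inputs(ego_map_2d):
    """Return the four 90-degree rotations of the agent-centered map, one per action.
    Which rotation corresponds to which action is a convention to fix once."""
    return [np.rot90(ego_map_2d, k) for k in range(4)]

def choose_action(net, ego_map_2d, temperature=1.0, rng=None):
    """Score one shared 'up' network on each rotation; softmax over the four scores."""
    rng = rng or np.random.default_rng()
    scores = np.array([net.forward(rot.ravel()) for rot in rotated_inputs(ego_map_2d)])
    probs = np.exp((scores - scores.max()) / temperature)
    probs /= probs.sum()
    return rng.choice(4, p=probs)
```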
World scaling • Scaled the grid up to 10×10. • Not as unrealistic as one might think… (cf. tile coding). • Scaled the number of targets: performance differs from 1 to 2 targets, but not from 2 to many. • Confirmed the 'winning-est' (best-performing) representation. • Added memory.
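The tile-coding aside can be unpacked with a small sketch: coarse-coding a continuous 2D position onto a few overlapping 10×10 grids yields exactly the kind of grid-of-units input used here, so a 10×10 unit layer is not an unrealistic state representation. Tiling count and offsets below are arbitrary choices.

```python
# Sketch: simple 2D tile coding -- a continuous position activates one tile per tiling.
def tile_code(x, y, n_tilings=4, tiles_per_dim=10, lo=0.0, hi=1.0):
    """Return active-tile indices for a continuous (x, y) in [lo, hi)^2 (wrapping offsets)."""
    features = []
    width = (hi - lo) / tiles_per_dim
    for t in range(n_tilings):
        offset = t * width / n_tilings                     # each tiling is slightly shifted
        ix = int((x - lo + offset) / width) % tiles_per_dim
        iy = int((y - lo + offset) / width) % tiles_per_dim
        features.append(t * tiles_per_dim ** 2 + ix * tiles_per_dim + iy)
    return features   # a few active units out of n_tilings * tiles_per_dim**2
```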
No low-hanging fruit: The ripeness problem • Added a 'ripeness' dimension to the target and changed the reward function: if target.ripeness > 0.60, reward = 1; else reward = -2/3. • How the problem occurs: • At a high temperature the agent moves (and pops) randomly. • Random pops net zero reward on average (e.g., with uniform ripeness: 0.4 × 1 + 0.6 × (−2/3) = 0). • The temperature then lowers and the agent learns to ignore the target entirely.
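A quick check of the zero-expectation trap, assuming ripeness is uniform on [0, 1] (that distribution is an assumption; the reward values are from the slide):

```python
# Sketch: the ripeness reward function and why random popping earns ~zero on average.
import numpy as np

def pop_reward(ripeness):
    return 1.0 if ripeness > 0.60 else -2.0 / 3.0

rng = np.random.default_rng(0)
samples = [pop_reward(r) for r in rng.uniform(0, 1, 100_000)]
print(np.mean(samples))   # ~0: 0.4 * 1 + 0.6 * (-2/3) = 0
```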
A psychologically plausible solution • There is no feedback for 'almost ripe' pops, so how could we anneal a ripeness criterion directly? • Instead, anneal how much the agent cares about unripe pops. • That is, differentiate the internal and external reward functions (sketch below).
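A minimal sketch of the proposed separation, assuming a linear annealing schedule; the schedule, epoch count, and function names are illustrative, not the authors' implementation.

```python
# Sketch: anneal the internal reward while the external (task) reward stays fixed.
def external_reward(ripeness):
    """Task reward from the slide: +1 for ripe pops, -2/3 otherwise."""
    return 1.0 if ripeness > 0.60 else -2.0 / 3.0

def internal_reward(ripeness, epoch, anneal_epochs=500):
    """Early in learning, barely penalize unripe pops; gradually care more about them."""
    care = min(1.0, epoch / anneal_epochs)    # 0 -> 1 over training (assumed schedule)
    r = external_reward(ripeness)
    return r if r > 0 else care * r           # scale only the punishment term
```

The model would be trained on internal_reward while task performance is still scored by external_reward; early in learning, unripe pops cost little, so the agent keeps interacting with targets long enough to discover the ripeness contingency.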
Future directions • Investigate how the type of ripeness difficulty impacts computational demands. • Difficulty due to reward schedule vs. perceptual acuity vs. redundancy vs. conjunctiveness vs. ease of prediction. • How to handle the 'Feature Binding Problem' in this context. • Emergent binding through deep learning? • Just keep increasing complexity and see what problems crop up. • If the model reaches human-level performance without a hitch, that would be pretty good too.
Summary & Discussion • Egocentric representations pay off in this domain, even with the added memory cost. • In any domain with a single agent? • Symmetries in the action space can be exploited to greatly expedite learning. • Could there be a general mechanism for detecting such symmetries? • Difficult reward functions might be learned by annealing internal reward signals. • How could such annealing emerge from the model itself?