An Object-oriented Representation for Efficient Reinforcement Learning

Carlos Diuk, Andre Cohen and Michael L. Littman
Rutgers Laboratory for Real-Life Reinforcement Learning (RL³)
Department of Computer Science, Rutgers University (New Jersey, USA)
ICML 2008 – Helsinki, Finland

Motivation
How would YOU play this game?

What’s in a state?
• A flat state is a simple hash code that tells you if you’ve been “there” before: s1 -> a0 -> s5, s5 -> a2 -> s24, s24 -> a1 -> s1.
• What we (the agent) can actually “see”: objects, interactions, spatial relationships.
• If we know that our agents are interacting in a spatial relation with objects, let’s just tell them so.

What we did
• Grab ideas from Relational RL and come up with a representation that:
  • is suitable for a wide-enough range of domains
  • is tractable
  • provides opportunities for generalization
  • enables smart exploration
• Strike a balance between generality and tractability.

OO representation
• Problem defined by a set of objects and their attributes.
• Example: Objects in Pitfall defined by a bounding box on a set of pixels based on color.
  Man.<x,y> Log.<x,y> Hole.<x,y> Ladder.<x,y> Wall.<x,y>
• State is the union of all objects’ attribute values.
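A minimal sketch of the idea on this slide, assuming each object carries only a class name and an <x,y> position. The names `GameObject` and `state_of` and the coordinates are hypothetical, chosen for illustration; they are not from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameObject:
    """One object instance: a class name plus its attribute values."""
    cls: str  # e.g. "Man", "Ladder" (class names taken from the slide)
    x: int    # illustrative coordinates, not real Pitfall values
    y: int


def state_of(objects):
    """State = union of all objects' attribute values, as a hashable tuple.

    Sorting makes the state independent of the order objects are listed in.
    """
    return tuple(sorted((o.cls, o.x, o.y) for o in objects))


man = GameObject("Man", 24, 80)
ladder = GameObject("Ladder", 24, 88)
state = state_of([man, ladder])
```

Because the state is built from attribute values rather than an opaque identifier, two situations that differ only in one object's position share most of their description, which is what makes generalization possible.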

OO representation
• For any given state s, there is a function c(s) that tells us which relations occur under s.
• Dynamics defined by preconditions and effects.
• Preconditions are conjunctions of terms:
  • Relations between objects:
    • touchN/S/E/W(object_i, object_j)
    • on(object_i, object_j)
  • Any (boolean) function on the attributes.
  • Any other function encoding prior knowledge.
• Actions have effects that determine how objects’ attributes get modified.
• Example: condition on(Man, Ladder), action Up, effect Man.y = Man.y + 8.
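The condition function and the Up example from this slide can be sketched as follows. This is an illustration under stated assumptions: objects are plain dicts, `on` is taken to mean "same position" (the talk does not define it precisely), and `apply_up` is a hypothetical helper name.

```python
def on(a, b):
    """Assumed meaning of on(a, b): both objects share the same position."""
    return a["x"] == b["x"] and a["y"] == b["y"]


def c(state):
    """Return which relations hold in `state`, as term name -> bool.

    Only on(Man, *) terms are computed here to keep the sketch short.
    """
    man = state["Man"]
    return {
        f"on(Man,{name})": on(man, obj)
        for name, obj in state.items()
        if name != "Man"
    }


def apply_up(state):
    """Effect of Up from the slide: if on(Man, Ladder), Man.y = Man.y + 8."""
    if not c(state)["on(Man,Ladder)"]:
        return state  # precondition fails, so the action has no effect
    new_man = dict(state["Man"], y=state["Man"]["y"] + 8)
    return {**state, "Man": new_man}


s = {
    "Man": {"x": 24, "y": 80},
    "Ladder": {"x": 24, "y": 80},
    "Hole": {"x": 60, "y": 80},
}
s_next = apply_up(s)
```

The transition is written against the relation, not against a particular state identifier, so the same rule applies wherever the Man happens to be on a Ladder.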

DOORMax
• An algorithm for efficient learning of deterministic OO-MDPs.
• When objects interact, and an effect is observed, DOORMax learns the conjunction of terms that enabled the effect.
• Belongs to the R-Max family of algorithms:
  • Guides exploration to make objects interact
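One common way to learn a conjunction from positive examples is to keep only the term values shared by every observation in which the effect occurred. The sketch below shows that generic technique; it is not necessarily the exact update DOORMax performs, and `learn_condition` and the observation data are hypothetical.

```python
def learn_condition(observations):
    """Intersect the term assignments seen whenever the effect occurred.

    observations: list of dicts mapping term name -> bool, one dict per
    time the effect was observed. Returns the (term, value) pairs common
    to all of them, i.e. the most specific consistent conjunction.
    """
    hypothesis = None
    for terms in observations:
        if hypothesis is None:
            hypothesis = dict(terms)  # first observation: keep every term
        else:
            # Drop any term whose value differs from what was seen before.
            hypothesis = {
                t: v for t, v in hypothesis.items() if terms.get(t) == v
            }
    return hypothesis or {}


# Three hypothetical times the effect "Man.y = Man.y + 8" followed action Up.
observed = [
    {"on(Man,Ladder)": True, "touchE(Man,Wall)": False, "touchW(Man,Wall)": True},
    {"on(Man,Ladder)": True, "touchE(Man,Wall)": False, "touchW(Man,Wall)": False},
    {"on(Man,Ladder)": True, "touchE(Man,Wall)": True, "touchW(Man,Wall)": False},
]
condition = learn_condition(observed)
```

In this example the wall terms take both values across observations, so they are discarded and only on(Man, Ladder) remains in the learned condition.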

Pitfall video

DOORMax Analysis
• Let n be the number of terms.
• Assume that:
  • The number of effects per action is bounded by a (small) constant m.
  • Each effect has a unique conjunctive condition.
• As long as effects are observed (that is, some effect occurs given an action a), DOORMax will learn the condition-effect pairs that determine the dynamics of a in O(nm).
• There is a worst-case bound, when lots of no-effects are observed, of O(nm).

Results
What about this game? Videogame

Representations in Taxi

Bigger Taxi

Conclusions and future work
• OO-MDPs provide a natural way of modeling an interesting set of domains, while enabling generalization and smart exploration.
• DOORMax learns deterministic OO-MDPs, outperforming state-of-the-art algorithms for factored-state representations.
• DOORMax scales very nicely with respect to the size of the state space, as long as transition dynamics between objects do not change.
• We do not have a provably efficient algorithm for stochastic OO-MDPs.
• We do not yet handle inheritance between classes of objects.
