Motivated Reinforcement Learning for Non-Player Characters in Persistent Computer Game Worlds
Kathryn Merrick
University of Sydney and National ICT Australia
Supervisor: Mary Lou Maher
Introduction
• Reinforcement learning uses a reward signal as a learning stimulus
• Common assumptions about reward:
  • Tasks are known at design time and can be modelled as a task specific reward signal, or
  • A teacher is present to provide reward when desirable (or undesirable) tasks are performed
Research Question
• How can we extend reinforcement learning to environments where:
  • Tasks are not known at design time
  • A teacher is not present
• In order to achieve:
  • Efficient learning
  • Competence at multiple tasks of any complexity
Overview
• Related work
• Motivation using interest as an intrinsic reward signal
• Metrics for evaluating motivated reinforcement learning
• Results
• Future work
Existing Technologies for NPCs
• Reflexive Agents
  • Rule based
  • State machines
• Learning Agents
  • Reinforcement learning

A rule-based script:

    IF
        !Range(NearestEnemyOf(Myself),3)
        Range(NearestEnemyOf(Myself),8)
    THEN
        RESPONSE #40
            EquipMostDamagingMelee()
            AttackReevaluate(NearestEnemyOf(Myself),60)
        RESPONSE #80
            EquipRanged()
            AttackReevaluate(NearestEnemyOf(Myself),30)
    END

A state-machine script:

    startup state Startup$
    {
        trigger OnGoHandleMessage$( WE_ENTERED_WORLD )
        {
            SetState Spawn$;
        }
    }
Motivated Reinforcement Learning
• Motivated reinforcement learning introduces an intrinsic reward signal in addition to or instead of extrinsic reward
• Intrinsic reward has been used to:
  • Speed learning of extrinsically rewarded tasks
  • Solve maintenance problems
• Intrinsic reward has been modelled as:
  • Curiosity and boredom
  • Changes in light and sound intensity
  • Predictability, familiarity, stability
  • Novelty
Representing the Environment
• Existing techniques:
  • Attribute based representation
  • Fixed length vectors
    <wall1x, wall1y, wall2x, wall2y, wall3x, wall3y, wall4x, wall4y>
• Problem:
  • How long should the vector be?
Context Free Grammars
• Represent only what is present using a context free grammar:

    S          → <objects>
    <objects>  → <object><objects> | ε
    <object>   → <objectID><objectx><objecty>
    <objectID> → <integer>
    <objectx>  → <integer>
    <objecty>  → <integer>
    <integer>  → 1 | 2 | 3 | …
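For illustration, a minimal Python sketch of a state built from this grammar; the type and function names are my own, not from the thesis. Only the objects actually sensed are encoded, so the representation has no fixed length.

    from typing import List, Tuple

    # One <object> production: <objectID><objectx><objecty>
    SensedObject = Tuple[int, int, int]

    def encode_state(objects: List[SensedObject]) -> List[int]:
        """Flatten the sensed objects into a sentence of the grammar.
        An empty list corresponds to the ε production of <objects>,
        so the state grows and shrinks with what is present."""
        sentence = []
        for object_id, x, y in objects:
            sentence.extend([object_id, x, y])
        return sentence

    # Two walls sensed, then an empty room: no fixed vector length needed.
    print(encode_state([(1, 2, 5), (1, 2, 6)]))   # [1, 2, 5, 1, 2, 6]
    print(encode_state([]))                       # []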
Representing Tasks
• Potential learning tasks are represented as events: changes in the environment
  E(t) = S(t) – S(t-1)
• Example:
  S(1) = (<locationX:2>, <locationY:5>, <pick:1>, <forge:1>)
  A(1) = (move, north)
  S(2) = (<locationX:2>, <locationY:6>, <lathe:1>)
  E(2) = (<locationY:1>, <forge:-1>, <lathe:1>)
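A minimal Python sketch of this event computation, with states as attribute–value maps; the function name is illustrative and the example simplifies the one above (the pick attribute is omitted).

    def compute_event(prev_state, state):
        """Compute the event E(t) = S(t) - S(t-1) as the change in each
        attribute. Attributes absent from a state are treated as 0, so
        objects that appear or disappear give positive or negative changes.
        """
        event = {}
        for key in set(prev_state) | set(state):
            delta = state.get(key, 0) - prev_state.get(key, 0)
            if delta != 0:
                event[key] = delta
        return event

    s1 = {"locationX": 2, "locationY": 5, "forge": 1}
    s2 = {"locationX": 2, "locationY": 6, "lathe": 1}
    print(compute_event(s1, s2))
    # -> locationY: 1, forge: -1, lathe: 1 (key order may vary)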
Motivation as Interesting Events
[Architecture figure: events E(t) from memory are passed to a Habituated Self-Organising Map (HSOM). Novelty is computed in the clustering layer (a SOM): losing neurons emit σ(t) = 0, the winning neighbourhood emits σ(t) = 1. These values feed the habituating layer, which produces interest; the reward N(t) is the habituated value from the winning clustering neuron.]
Novelty and Interest
• Novelty and Habituation
  • Stanley's model of habituation
• Interest
  • The Wundt curve
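The slide gives only the names of these models; below is a hedged Python sketch of how they are commonly formulated (Stanley's habituation equation, discretised, and a Wundt curve built as the difference of two sigmoids). All constants and names are illustrative assumptions, not values from the thesis.

    import math

    def habituate(novelty, stimulus, n0=1.0, alpha=1.05, tau=14.3):
        """One discretised step of Stanley's habituation model:
            tau * dN/dt = alpha * (N0 - N) - S
        N decays while the stimulus S is present and recovers towards N0
        when it is absent. Constants here are illustrative.
        """
        return novelty + (alpha * (n0 - novelty) - stimulus) / tau

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def wundt_interest(novelty, rho_plus=30.0, rho_minus=30.0,
                       n_plus=0.3, n_minus=0.7):
        """Interest as a Wundt curve: a positive sigmoid rewarding moderate
        novelty minus a negative sigmoid penalising high novelty, so
        interest peaks at intermediate novelty. Parameters are illustrative.
        """
        reward = sigmoid(rho_plus * (novelty - n_plus))
        penalty = sigmoid(rho_minus * (novelty - n_minus))
        return reward - penalty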
Motivated Reinforcement Learning
• Sensation: computes events
• Motivation: computes an intrinsic reward signal
• Learning: Q-Learning update
• Activation: ε-greedy action selection
[Architecture figure: the agent senses the world W(t) through its sensors, giving S(t). The Sensation module (S) combines S(t) with S(t-1) from memory to produce the event E(t); the Motivation module (M) maps E(t) (and E(t-1) from memory) to a reward R(t); the Learning module (L) updates the policy π(t) from S(t), R(t) and the remembered π(t-1), A(t-1); the Activation module (A) selects the action A(t), which is executed through the effectors (F(t)).]
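The sensation–motivation–learning–activation loop above can be sketched in Python as follows. This is only an illustrative reading of the slide: states are assumed to be hashable tuples of (attribute, value) pairs, `motivation` stands in for the HSOM interest reward, and all names and constants are mine.

    import random
    from collections import defaultdict

    class MRLAgent:
        """Sensation -> Motivation -> Learning -> Activation cycle."""

        def __init__(self, motivation, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.Q = defaultdict(float)      # tabular action values
            self.motivation = motivation     # event -> intrinsic reward R(t)
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.prev_state, self.prev_action = None, None

        def step(self, state, actions):
            # Sensation: event E(t) = S(t) - S(t-1) as the changed attributes
            prev, curr = dict(self.prev_state or ()), dict(state)
            event = {k: curr.get(k, 0) - prev.get(k, 0)
                     for k in set(prev) | set(curr)
                     if curr.get(k, 0) != prev.get(k, 0)}

            # Motivation: intrinsic reward from the event
            reward = self.motivation(event)

            # Learning: Q-learning update for the previous state/action
            if self.prev_state is not None:
                best_next = max(self.Q[(state, a)] for a in actions)
                td_error = (reward + self.gamma * best_next
                            - self.Q[(self.prev_state, self.prev_action)])
                self.Q[(self.prev_state, self.prev_action)] += self.alpha * td_error

            # Activation: epsilon-greedy action selection
            if random.random() < self.epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: self.Q[(state, a)])

            self.prev_state, self.prev_action = state, action
            return action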
Motivated Hierarchical Reinforcement Learning
• Sensation: computes events
• Motivation: computes an intrinsic reward signal
• Organisation: manages policies
• Learning: hierarchical Q-Learning update
• Activation: recall reflex; ε-greedy action selection
[Architecture figure: as above, with an additional Organisation module (O) that selects the current behaviour B(t) from S(t), R(t), E(t) and the remembered B(t-1); the Learning module (L) updates the policy π(t) for that behaviour from S(t), R(t) and the remembered π(t-1), B(t-1); the Activation module (A) selects A(t) from S(t) and π(t) and sends it to the effectors (F(t)).]
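An illustrative guess at the bookkeeping behind the Organisation step (one policy per interesting event type, recalled when that event becomes motivating again); class and method names are assumptions, not the thesis implementation.

    from collections import defaultdict

    class PolicyOrganiser:
        """Keeps one policy (Q-table) per event type and recalls it when
        that event is selected as the current behaviour B(t)."""

        def __init__(self):
            self.policies = {}               # behaviour B -> Q-table

        def select_behaviour(self, event, reward, current=None, threshold=0.5):
            """Keep pursuing the current behaviour unless a new event is
            sufficiently motivating, in which case switch to (or create)
            the policy for that event."""
            key = tuple(sorted(event.items()))
            if current is None or reward > threshold:
                if key not in self.policies:
                    self.policies[key] = defaultdict(float)  # new option
                return key
            return current

        def policy(self, behaviour):
            return self.policies[behaviour]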
Performance Evaluation
• Related work:
  • Characterise the output of the motivation function
  • Measure learning efficiency
  • Characterise the emergent behaviour
• Our goals:
  • Efficient learning
  • Competence at multiple tasks
  • Tasks of any complexity
Metrics
• Existing metrics for learning efficiency
  • e.g. chart the number of actions against time
• Behavioural variety:
  • Summarises learning efficiency for multiple tasks
• Behavioural complexity:
  • CE = average(āE | σE < r)
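A hedged Python sketch of these metrics follows, under the assumption (not stated on the slide) that āE is the mean and σE the spread of the number of actions the agent needs to repeat event E, and that behavioural variety counts the events whose behaviours have settled. Function and variable names are illustrative.

    from statistics import mean, pstdev

    def behavioural_metrics(actions_per_event, r=2.0):
        """actions_per_event maps each event E to the list of action counts
        the agent needed to repeat E on successive occasions.

        Variety: number of events whose behaviour has settled (sigma_E < r).
        Complexity: C_E = average(a_bar_E | sigma_E < r), the mean behaviour
        length over those settled events. Using pstdev for sigma_E and the
        threshold r are assumptions for illustration.
        """
        settled = {e: counts for e, counts in actions_per_event.items()
                   if len(counts) > 1 and pstdev(counts) < r}
        variety = len(settled)
        complexity = (mean(mean(counts) for counts in settled.values())
                      if settled else 0.0)
        return variety, complexity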
The Agent…
• Sensors:
  • Location sensor
  • Object sensor
  • Inventory sensor
• Effectors:
  • Move to object effector
  • Pick up object effector
  • Use object effector
MRL – Behavioural Variety
[Results chart: behavioural variety of the motivated reinforcement learning agent; events annotated on the chart include E(<inventoryIron:1>), E(<inventoryTimber:-1>) and E(<location:-2>).]
MHRL – Behavioural Variety
[Results chart: behavioural variety of the motivated hierarchical reinforcement learning agent; events annotated on the chart include E(<inventoryIron:-1>), E(<inventoryTimber:-1>), E(<inventoryTimber:1>) and E(<location:-2>).]
Emergent Behaviour – Travelling Vendor
• Sensors:
  • Location sensor
  • Object sensor
• Effectors:
  • Move to object effector
Conclusions
• It is possible for efficient, task-oriented learning to emerge without explicitly representing tasks in the reward signal.
• Agents motivated by interest learn behaviours of greater variety and complexity than agents motivated by a random reward signal.
• Motivated hierarchical reinforcement learning agents are able to recall learned behaviours; however, behaviours are learned more slowly.
Conclusions about MRL for NPCs
• Motivated reinforcement learning offers a single agent model for many characters.
• Motivated characters display progressively emerging behavioural patterns.
• Motivated characters can adapt their behaviour to changes in their environment.
Ongoing and Future Work
• Scalability testing
• Alternative models of motivation
  • Competence based motivation
• Motivation with other classes of machine learning algorithms
• Applications to intelligent environments