
Motivated Reinforcement Learning for Non-Player Characters in Persistent Computer Game Worlds


Presentation Transcript


  1. Motivated Reinforcement Learning for Non-Player Characters in Persistent Computer Game Worlds Kathryn Merrick, University of Sydney and National ICT Australia. Supervisor: Mary Lou Maher

  2. Introduction • Reinforcement learning uses a reward signal as a learning stimulus • Common assumptions about reward: • Tasks are known at design time and can be modelled as a task-specific reward signal, or • A teacher is present to provide reward when desirable (or undesirable) tasks are performed

  3. Sometimes these assumptions cannot be justified…

  4. Sometimes these assumptions cannot be justified…

  5. Research Question • How can we extend reinforcement learning to environments where: • Tasks are not known at design time • A teacher is not present In order to achieve: • Efficient learning • Competence at multiple tasks of any complexity

  6. Overview • Related work • Motivation using interest as an intrinsic reward signal • Metrics for evaluating motivated reinforcement learning • Results • Future work

  7. Existing Technologies for NPCs • Reflexive agents: rule-based systems and state machines • Learning agents: reinforcement learning

Rule-based script example:

    IF
      !Range(NearestEnemyOf(Myself),3)
      Range(NearestEnemyOf(Myself),8)
    THEN
      RESPONSE #40
        EquipMostDamagingMelee()
        AttackReevaluate(NearestEnemyOf(Myself),60)
      RESPONSE #80
        EquipRanged()
        AttackReevaluate(NearestEnemyOf(Myself),30)
    END

State machine script example:

    startup state Startup$
    {
      trigger OnGoHandleMessage$(WE_ENTERED_WORLD)
      {
        SetState Spawn$;
      }
    }

  8. Motivated Reinforcement Learning • Motivated reinforcement learning introduces an intrinsic reward signal in addition to or instead of extrinsic reward • Intrinsic reward has been used to: • Speed learning of extrinsically rewarded tasks • Solve maintenance problems • Intrinsic reward has been modelled as: • Curiosity and boredom • Changes in light and sound intensity • Predictability, familiarity, stability • Novelty

  9. Representing the Environment • Existing technique: attribute-based representation using fixed-length vectors, e.g. <wall1x, wall1y, wall2x, wall2y, wall3x, wall3y, wall4x, wall4y> • Problem: how long should the vector be?

  10. Context-Free Grammars • Represent only what is present using a context-free grammar:

    S → <objects>
    <objects> → <object> <objects> | ε
    <object> → <objectID> <objectx> <objecty>
    <objectID> → <integer>
    <objectx> → <integer>
    <objecty> → <integer>
    <integer> → 1 | 2 | 3 | …
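A minimal sketch (not from the slides) of what this representation buys: under the grammar, a state is simply a variable-length list of (objectID, objectx, objecty) triples, so only the objects actually present need to be represented. The attribute names on the illustrative object below are assumptions.

    # Sketch: a state under the grammar is a variable-length list of
    # (objectID, objectx, objecty) triples rather than a fixed-length vector.
    from typing import List, Tuple

    State = List[Tuple[int, int, int]]  # (objectID, objectx, objecty)

    def sense(visible_objects) -> State:
        # Only objects that are actually present contribute to the state,
        # so the representation grows and shrinks with the environment.
        return [(obj.object_id, obj.x, obj.y) for obj in visible_objects]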

  11. Representing Tasks • Potential learning tasks are represented as events: changes in the environment, E(t) = S(t) – S(t-1) • S(1) = (<locationX:2>, <locationY:5>, <pick:1>, <forge:1>) • A(1) = (move, north) • S(2) = (<locationX:2>, <locationY:6>, <pick:1>, <lathe:1>) • E(2) = (<locationY:1>, <forge:-1>, <lathe:1>)
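As a rough illustration of the E(t) = S(t) – S(t-1) definition above, the sketch below treats a state as a dictionary of attribute:value pairs and returns only the attributes whose values changed; the attribute names used in the example are illustrative.

    # Sketch: an event is the set of attribute changes between two states.
    def compute_event(prev_state, state):
        event = {}
        for attr in set(prev_state) | set(state):
            delta = state.get(attr, 0) - prev_state.get(attr, 0)
            if delta != 0:
                event[attr] = delta
        return event

    # Example: picking up a piece of iron changes only the inventory attribute.
    compute_event({"locationX": 2, "inventoryIron": 0},
                  {"locationX": 2, "inventoryIron": 1})
    # -> {"inventoryIron": 1}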

  12. Motivation as Interesting Events [Diagram: events E(t) from memory are fed to a habituated self-organising map (HSOM). The clustering layer (SOM) categorises each event; losing neurons receive a stimulus σ(t) = 0 and the winning neighbourhood receives σ(t) = 1. The habituating layer then yields novelty N(t), the habituated value of the winning clustering neuron, from which the interest-based reward is computed.]

  13. Novelty and Interest • Novelty and Habituation • Stanley’s model of habituation • Interest • The Wundt curve
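A minimal sketch of the two ingredients named on this slide. The habituation step follows Stanley's differential model τ·dN/dt = α(N₀ − N) − σ(t); the Wundt curve is modelled, as in Saunders-style interest functions, as the difference of two sigmoids so that moderately novel events receive the highest reward. All constants below are illustrative assumptions, not the thesis's values.

    import math

    def habituate(n, stimulus, n0=1.0, alpha=1.05, tau=14.3):
        # One discrete step of Stanley's habituation model:
        # tau * dN/dt = alpha * (N0 - N) - stimulus.
        # Repeated stimulation (stimulus = 1) drives N down; without
        # stimulation (stimulus = 0) N recovers towards N0.
        return n + (alpha * (n0 - n) - stimulus) / tau

    def wundt_interest(novelty, f_max=1.0, rho=30.0, n_low=0.3, n_high=0.7):
        # Wundt curve as the difference of two sigmoids: reward rises once
        # novelty exceeds n_low and is suppressed again once it exceeds
        # n_high, so interest peaks at moderate novelty.
        reward  = f_max / (1.0 + math.exp(-rho * (novelty - n_low)))
        penalty = f_max / (1.0 + math.exp(-rho * (novelty - n_high)))
        return reward - penalty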

  14. Motivated Reinforcement Learning • Sensation: computes events • Motivation: computes an intrinsic reward signal • Learning: Q-learning update • Activation: ε-greedy action selection [Architecture diagram: the world W(t) is observed through sensors; Sensation (S) produces the state S(t), Motivation (M) produces the event E(t) and reward R(t), Learning (L) updates the policy π(t), and Activation (A) selects the action A(t) sent to the effectors as F(t), with a Memory holding S(t-1), E(t-1), π(t-1) and A(t-1).]
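A compact sketch, assuming a tabular Q-function and hashable states, of how these components fit together in one time step: the intrinsic reward R(t) simply takes the place of an extrinsic reward in a standard Q-learning update, and activation is ε-greedy over the learned values. The class name and parameter values are illustrative assumptions.

    import random
    from collections import defaultdict

    class MRLAgent:
        def __init__(self, actions, alpha=0.9, gamma=0.9, epsilon=0.1):
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.q = defaultdict(float)   # (state, action) -> value

        def act(self, state):
            # Activation: epsilon-greedy selection over the current policy.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def learn(self, s, a, intrinsic_reward, s_next):
            # Learning: standard Q-learning update, but the reward is the
            # intrinsic (interest-based) signal computed by motivation.
            best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
            target = intrinsic_reward + self.gamma * best_next
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])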

  15. Motivated Hierarchical Reinforcement Learning • Sensation: computes events • Motivation: computes an intrinsic reward signal • Organisation: manages policies • Learning: hierarchical Q-learning update • Activation: recall reflex and ε-greedy action selection [Architecture diagram: as for the flat agent, with an Organisation (O) component between Motivation and Learning that selects the current behaviour B(t); S(t), E(t), R(t), B(t), π(t) and A(t) are passed between the components, and F(t) is sent to the effectors.]
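A rough sketch, an assumption about the structure rather than the thesis's code, of the organisation and recall-reflex ideas on this slide: one reusable policy is stored per event type, the currently motivating event selects which policy is updated and executed, and a previously learned policy is recalled rather than relearned when its event becomes motivating again.

    class PolicyOrganiser:
        def __init__(self, make_learner):
            self.make_learner = make_learner   # factory, e.g. lambda: MRLAgent(actions)
            self.policies = {}                 # event signature -> learner

        def select(self, motivating_event):
            # Recall reflex: reuse the stored policy for a known event, or
            # create a fresh one the first time an event motivates the agent.
            key = tuple(sorted(motivating_event.items()))
            if key not in self.policies:
                self.policies[key] = self.make_learner()
            return self.policies[key]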

  16. Performance Evaluation • Related work: • Characterise the output of the motivation function • Measure learning efficiency • Characterise the emergent behaviour • Our goals: • Efficient learning • Competence at multiple tasks • Tasks of any complexity

  17. Metrics • Existing metrics for learning efficiency, e.g. charting the number of actions against time • Behavioural variety: summarises learning efficiency for multiple tasks • Behavioural complexity: CE = average(āE | σE < r), the mean behaviour length āE averaged over events E whose length has stabilised (standard deviation σE below a threshold r)
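As a hedged sketch of how these metrics could be computed from a log of behaviours (the thesis's exact windows and thresholds are not reproduced here): behavioural variety counts the distinct events the agent has learned to repeat, and behavioural complexity averages the mean behaviour length āE over events whose length has stabilised, i.e. σE < r.

    from statistics import mean, pstdev

    def behavioural_variety(lengths_by_event):
        # lengths_by_event: event -> list of action counts used to repeat it.
        return len(lengths_by_event)

    def behavioural_complexity(lengths_by_event, r=1.0):
        # Average behaviour length over events whose length has stabilised
        # (standard deviation below the threshold r).
        stable = [mean(v) for v in lengths_by_event.values()
                  if len(v) > 1 and pstdev(v) < r]
        return mean(stable) if stable else 0.0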

  18. A Game Scenario in Second Life…

  19. The Agent… • Sensors: • Location sensor • Object sensor • Inventory sensor • Effectors: • Move to object effector • Pick up object effector • Use object effector

  20. MRL – Behavioural Variety [Chart of behavioural variety over time; labelled events include E(<inventoryIron:1>), E(<inventoryTimber:-1>) and E(<location:-2>)]

  21. MRL – Behavioural Variety

  22. MRL – Behavioural Complexity

  23. MRL – Learning Efficiency: Iron Mining

  24. MRL – Learning Efficiency: Furniture Making

  25. MHRL – Learning Efficiency: Iron Mining

  26. MHRL – Learning Efficiency: Furniture Making

  27. MHRL – Behavioural Variety [Chart of behavioural variety over time; labelled events include E(<inventoryIron:-1>), E(<inventoryTimber:-1>), E(<inventoryTimber:1>) and E(<location:-2>)]

  28. MHRL – Behavioural Complexity

  29. Emergent Behaviour – Travelling Vendor • Sensors: • Location sensor • Object sensor • Effectors: • Move to object effector

  30. Conclusions • It is possible for efficient task-oriented learning to emerge without explicitly representing tasks in the reward signal. • Agents motivated by interest learn behaviours of greater variety and complexity than agents motivated by a random reward signal. • Motivated hierarchical reinforcement learning agents are able to recall learned behaviours; however, behaviours are learned more slowly.

  31. Conclusions about MRL for NPCs • Motivated reinforcement learning offers a single agent model for many characters. • Motivated characters display progressively emerging behavioural patterns. • Motivated characters can adapt their behaviour to changes in their environment.

  32. Ongoing and Future Work • Scalability testing • Alternative models of motivation • Competence based motivation • Motivation with other classes of machine learning algorithms • Applications to intelligent environments

  33. Other Applications of MRL

  34. Curious Information Display
