Curious Characters in Multiuser Games: A Study in Motivated Reinforcement Learning for Creative Behavior Policies*. Mary Lou Maher, University of Sydney. AAAI AI and Fun Workshop, July 2010.
* Based on Merrick, K. and Maher, M.L. (2009) Motivated Reinforcement Learning: Curious Characters for Multiuser Games, Springer.
Outline • Curiosity and Fun • Motivation • Motivated Reinforcement Learning • An Agent Model of a Curious Character • Evaluation of Behavior Policies
Can AI model Fun? Claim: An agent motivated by curiosity to learn patterns is a model of fun.
Games try to achieve flow: a function of the player's skill and performance. J. Chen, Flow in games (and everything else). Communications of the ACM 50(4):31-34, 2007
Why Motivated Reinforcement Learning? • More efficient learning: • Complement external reward with internal reward • External reward not known at design time • Design tasks • Real world scenarios: Robotics • Virtual world scenarios: NPCs in computer games • More autonomy in determining learning tasks • Robotics • NPCs in computer games
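One way to read "complement external reward with internal reward" is as a single learning signal fed to an ordinary reinforcement learner. The sketch below is an illustration, not the method from the talk: it assumes tabular Q-learning, a simple weighted sum of the two rewards, and a hypothetical weight `beta`. All names are made up for the example.

```python
from collections import defaultdict


def q_update(q, state, action, next_state, actions,
             r_ext, r_int, alpha=0.1, gamma=0.9, beta=1.0):
    """One tabular Q-learning step on a combined reward.

    Assumption: the internal (motivation) reward is simply added to the
    external reward with weight beta. The slides do not specify this form.
    """
    reward = r_ext + beta * r_int
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Standard temporal-difference update toward the combined target.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)          # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
# No external reward is available, so learning is driven by internal reward alone.
value = q_update(q, "s0", "right", "s1", actions, r_ext=0.0, r_int=1.0)
```

With `r_ext` fixed at zero the agent still learns, which matches the case where external reward is not known at design time.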
Models of Motivation • Cognitive: • Interest • Competency • Challenge • Biological • Stasis variables: energy, blood pressure, etc • Social • Conformity • Peer pressure
Motivation as Interesting Events An event is a change in observations: O(t) − O(t′) = (Δ(o_1(t), o_1(t′)), Δ(o_2(t), o_2(t′)), …, Δ(o_L(t), o_L(t′)), …) D.E. Berlyne, Exploration and Curiosity, Science 153:24-33, 1966
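The event definition above can be sketched as an element-wise difference between two observations. The slide does not spell out the difference function Δ, so the version here is an assumption: a sensation missing from one observation is treated as appearing or disappearing, and unchanged sensations give zero. Observations are modeled as dictionaries from hypothetical sensation labels to numbers.

```python
def delta(new, old):
    """Assumed difference function for one numeric sensation.

    None means the sensation was not observed at that time step.
    """
    if new is None and old is None:
        return 0.0
    if old is None:      # sensation appeared
        return new
    if new is None:      # sensation disappeared
        return -old
    return new - old     # ordinary change (0.0 if unchanged)


def event(obs_now, obs_prev):
    """Return the event O(t) - O(t') as a dict of per-sensation differences."""
    labels = sorted(set(obs_now) | set(obs_prev))
    return {k: delta(obs_now.get(k), obs_prev.get(k)) for k in labels}


prev = {"x": 1.0, "y": 2.0, "door": 1.0}
now = {"x": 3.0, "y": 2.0, "light": 1.0}
e = event(now, prev)
```

Using labeled dictionaries rather than fixed-length vectors lets the two observations differ in length, which fits a world where objects come and go.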
Sensed States: Context-Free Grammar (CFG) CFG = (V_S, Γ_S, Ψ_S, S) where: • V_S is a set of variables or syntactic categories, • Γ_S is a finite set of terminals such that V_S ∩ Γ_S = {}, • Ψ_S is a set of productions V -> v where V is a variable and v is a string of terminals and variables, • S is the start symbol. Thus, the general form of a sensed state is: • S -> <sensations> • <sensations> -> <PiSensations><sensations> | ε • <PiSensations> -> <sL><PiSensations> | ε • <sL> -> <number> | <string>
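The grammar above describes a sensed state as a variable-length sequence of sensation groups, each holding zero or more numbers or strings. A minimal sketch of a checker for that shape follows. It assumes a nested-list encoding and reads <PiSensations> as one group of sensations, for example from one sensor or object. The slide does not define that grouping, so this reading is an assumption.

```python
def is_sensed_state(state):
    """Check that `state` matches the general form S -> <sensations>.

    Encoding (an assumption): a list of groups, each group a list of
    sensations, each sensation a number or a string.
    """
    if not isinstance(state, list):
        return False
    for group in state:                      # <sensations>: zero or more groups
        if not isinstance(group, list):
            return False
        for s in group:                      # <PiSensations>: zero or more <sL>
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(s, bool) or not isinstance(s, (int, float, str)):
                return False
    return True


empty_ok = is_sensed_state([])                        # ε is a valid state
mixed_ok = is_sensed_state([[1.0, "sword"], [], [3]])
bad_leaf = is_sensed_state([[None]])                  # None is not a terminal
bad_shape = is_sensed_state([1.0])                    # sensation outside a group
```

Because both recursive productions can derive ε, empty states and empty groups are accepted, so the state representation can grow and shrink as the environment changes.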
Behavioral Variety • Behavioral variety measures the number of events for which a near-optimal policy is learned. • We characterize the level of optimality of a policy learned to achieve the event E(t) in terms of its structural stability.
Behavioral Complexity • The complexity of a policy can be measured by averaging the mean number of actions ā_E(t) required to repeat E(t) at any time when the current behavior is stable.
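The two metrics can be sketched as simple aggregates over logged behavior. The slides do not define how structural stability is tested, so the sketch assumes it has already been decided per event and is passed in as a boolean flag. The event names and the log format (number of actions taken each time an event was repeated) are hypothetical.

```python
from statistics import mean


def behavioral_variety(stable):
    """Count events whose learned policy is flagged as stable."""
    return sum(1 for is_stable in stable.values() if is_stable)


def behavioral_complexity(cycle_lengths, stable):
    """Average, over stable events, of the mean number of actions per repeat."""
    per_event_means = [
        mean(lengths)
        for ev, lengths in cycle_lengths.items()
        if stable.get(ev) and lengths
    ]
    return mean(per_event_means) if per_event_means else 0.0


# Hypothetical log: actions needed each time the event was repeated.
cycle_lengths = {"open_door": [2, 2, 2], "forge_sword": [5, 7], "wander": [9]}
# Hypothetical stability flags (the stability test itself is not modeled here).
stable = {"open_door": True, "forge_sword": True, "wander": False}

variety = behavioral_variety(stable)
complexity = behavioral_complexity(cycle_lengths, stable)
```

In this toy log, two events count toward variety, and complexity averages their per-event means of 2 and 6 actions.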
Research Directions • Scalability and dynamics: different RL approaches, such as decision trees and neural network (NN) function approximation • Motivation functions: competence, optimal challenges, social models
Relevance to AI and Fun • Is it more fun to play with a curious NPC? • Can a curious agent play a game to test how fun the game is?