Mary Lou Maher University of Sydney AAAI AI and Fun Workshop July 2010

Curious Characters in Multiuser Games: A Study in Motivated Reinforcement Learning for Creative Behavior Policies*

* Based on Merrick, K. and Maher, M.L. (2009) Motivated Reinforcement Learning: Curious Characters for Multiuser Games, Springer.

Outline • Curiosity and Fun • Motivation • Motivated Reinforcement Learning • An Agent Model of a Curious Character • Evaluation of Behavior Policies

Can AI model Fun? Claim: An agent motivated by curiosity to learn patterns is a model of fun.

Games try to achieve flow: a function of the player's skill and performance. J. Chen, Flow in games (and everything else). Communications of the ACM 50(4):31-34, 2007

Why Motivated Reinforcement Learning? • More efficient learning: • Complement external reward with internal reward • External reward not known at design time • Design tasks • Real world scenarios: Robotics • Virtual world scenarios: NPC in computer games • More autonomy in determining learning tasks • Robotics • NPC in computer games

Models of Motivation • Cognitive: • Interest • Competency • Challenge • Biological • Stasis variables: energy, blood pressure, etc • Social • Conformity • Peer pressure

MRL Agent Model

Motivation as Interesting Events
An event is a change in observations: $O(t) - O(t') = (\Delta(o_1(t), o_1(t')), \Delta(o_2(t), o_2(t')), \ldots, \Delta(o_L(t), o_L(t')), \ldots)$
D.E. Berlyne, Exploration and Curiosity, Science 153:24-33, 1966
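The event definition above can be illustrated with a minimal Python sketch. It assumes that observations are stored as dictionaries keyed by sensation name, that Δ is plain numeric subtraction, and that a sensation missing from one observation counts as 0. None of these choices is stated on the slide, and the names `delta`, `event`, and `is_event` are hypothetical.

```python
from typing import Dict


def delta(current: float, previous: float) -> float:
    """Assumed difference function: plain subtraction (not given on the slide)."""
    return current - previous


def event(obs_now: Dict[str, float], obs_prev: Dict[str, float]) -> Dict[str, float]:
    """Compute O(t) - O(t') component by component.

    A sensation absent from one observation is treated as 0.0 (an assumption).
    """
    keys = sorted(set(obs_now) | set(obs_prev))
    return {k: delta(obs_now.get(k, 0.0), obs_prev.get(k, 0.0)) for k in keys}


def is_event(e: Dict[str, float]) -> bool:
    """True when at least one sensation changed between t' and t."""
    return any(v != 0 for v in e.values())
```

For example, `event({"x": 2.0, "y": 1.0}, {"x": 1.0, "y": 1.0})` yields a change only in the `x` component, so `is_event` reports that something happened.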

Sensed States: Context Free Grammar (CFG)
CFG = (VS, ΓS, ΨS, S) where:
• VS is a set of variables or syntactic categories,
• ΓS is a finite set of terminals such that VS ∩ ΓS = {},
• ΨS is a set of productions V -> v where V is a variable and v is a string of terminals and variables,
• S is the start symbol.
Thus, the general form of a sensed state is:
• S -> <sensations>
• <sensations> -> <PiSensations><sensations> | ε
• <PiSensations> -> <sL><PiSensations> | ε
• <sL> -> <number> | <string>
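One way to read the grammar above is that a sensed state is a variable-length sequence of sensation groups, each of which is a variable-length sequence of numbers or strings. The sketch below checks that shape in Python. Encoding a state as nested lists is an assumption made for illustration, and the function names are hypothetical.

```python
from numbers import Number
from typing import Any


def is_sensation(x: Any) -> bool:
    """<sL> -> <number> | <string>; booleans are deliberately excluded."""
    return isinstance(x, str) or (isinstance(x, Number) and not isinstance(x, bool))


def is_pi_sensations(group: Any) -> bool:
    """<PiSensations> -> <sL><PiSensations> | ε, encoded as a list (may be empty)."""
    return isinstance(group, (list, tuple)) and all(is_sensation(x) for x in group)


def is_sensed_state(state: Any) -> bool:
    """S -> <sensations>, encoded as a list of groups (may be empty)."""
    return isinstance(state, (list, tuple)) and all(is_pi_sensations(g) for g in state)
```

Because both list levels may be empty or of any length, a state such as `[[1, "ore"], []]` is accepted, which mirrors the ε productions in the grammar.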

MRL for Non Player Characters

Habituated Self Organizing Map

Behavioral Variety • Behavioural variety measures the number of events for which a near optimal policy is learned. • We characterise the level of optimality of a policy learned to achieve the event E(t) in terms of its structural stability.

Behavioral Complexity • The complexity of a policy can be measured by averaging the mean number of actions $\bar{a}_{E(t)}$ required to repeat $E(t)$ at any time when the current behaviour is stable.
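A rough sketch of this measure in Python follows. It assumes that, for each event with a stable behaviour, we have logged how many actions each repetition of that event took. The log format and the name `behavioural_complexity` are hypothetical, and the slide does not specify how stability is detected.

```python
from statistics import mean
from typing import Dict, List


def behavioural_complexity(action_counts: Dict[str, List[int]]) -> float:
    """Average, over events, of the mean number of actions needed to repeat each event.

    action_counts maps an event label to the action counts observed for its
    repetitions while the behaviour was stable (assumed log format).
    """
    per_event_means = [mean(counts) for counts in action_counts.values()]
    return float(mean(per_event_means))
```

With a log such as `{"a": [2, 4], "b": [6]}`, the per-event means are 3 and 6, so the function returns 4.5.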

Research Directions • Scalability and dynamics: different RL such as decision trees and NN function approximation • Motivation functions: competence, optimal challenges, social models

Relevance to AI and Fun • Is it more fun to play with curious NPC? • Can a curious agent play a game to test how fun a game is?

