Playing Machines: Machine Learning Applications in Computer Games
Ralf Herbrich, Thore Graepel
Applied Games Group, Microsoft Research Cambridge
Games can be very hard!
• Partially observable stochastic games
• States are only partially observed
• Multiple agents choose actions
• Stochastic pay-offs and state transitions depend on the state and all the other agents' actions
• Goal: optimise the long-term pay-off (reward)
• Just like life: complex, adversarial, uncertain, and we are in it for the long run!
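As a side note not on the slides, the "long-term pay-off" that each agent optimises is usually formalised as the expected discounted return; the discount factor γ here is the same one that reappears on the Q-learning slide below.

```latex
% Expected discounted return of a policy \pi (standard formulation):
% r_t is the pay-off at step t, \gamma the discount factor for future rewards.
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \right],
\qquad 0 \le \gamma < 1 .
```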
[Slide: screenshots of non-player characters in Space Invaders (1977) and of agents in a game from 2001, alongside the human player.]
Creatures (1996, Steve Grand)
• Objective is to nurture creatures called Norns
• Model incorporates artificial-life features
• Norns have neural-network brains
• Their development can be influenced by player feedback
Black & White (2001, Richard Evans)
• Peter Molyneux's famous "God game"
• Player determines the fate of villagers as their "god" (seen as a hand)
• The creature can be taught complex behaviour
• Good and evil: actions have consequences
Colin McRae Rally 2.0 (2001, Jeff Hannan)
• First car-racing game to use neural networks
• Variety of tracks, drivers and road conditions
• The racing line is provided by the author; a neural network keeps the car on the racing line
• Multilayer perceptrons trained with RPROP
• Simple rules for recovery and overtaking
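A rough sketch of the kind of controller this describes: a small multilayer perceptron mapping the car's deviation from the provided racing line to a steering correction. The choice of inputs, the layer sizes and the omission of RPROP training code are all illustrative assumptions, not details of Hannan's implementation.

```python
import numpy as np

# Illustrative MLP steering controller: inputs describe the car's deviation
# from the provided racing line, one hidden tanh layer, and a steering
# correction as output. Inputs and sizes are assumptions; RPROP training
# of the weights is not shown here.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 3)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(1, 8)), np.zeros(1)

def steering_correction(lateral_offset, heading_error, speed):
    """Return a steering value in [-1, 1] aimed at staying on the racing line."""
    x = np.array([lateral_offset, heading_error, speed])
    hidden = np.tanh(W1 @ x + b1)
    return float(np.tanh(W2 @ hidden + b2)[0])
```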
Other Games using Machine Learning Source: http://www.gameai.com/games.html
Reinforcement Learning
[Diagram: the agent observes the game state and chooses an action; the game returns a reward or punishment and the next game state; the learning algorithm uses these to update the agent's parameters.]
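A minimal sketch of the loop in the diagram, assuming a hypothetical `Game` environment and `Agent` interface (the method names `reset`, `step`, `act` and `update` are placeholders, not an actual API from the talk):

```python
# Agent-environment loop matching the diagram: observe the game state,
# choose an action, receive reward/punishment and the next state, and
# let the learning algorithm update the agent's parameters.

def run_episode(game, agent, max_steps=1000):
    state = game.reset()                                  # initial game state
    for _ in range(max_steps):
        action = agent.act(state)                         # agent chooses an action
        next_state, reward, done = game.step(action)      # game responds
        agent.update(state, action, reward, next_state)   # parameter update
        state = next_state
        if done:
            break
```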
Q and SARSA Learning
• Q-learning (off-policy): $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$
• SARSA (on-policy): $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]$
• Q(s,a) is the expected reward for taking action a in state s
• α is the learning rate
• a is the action chosen
• r is the reward resulting from a
• s is the current state
• s' is the state after executing a (and a' an action taken in s')
• γ is the discount factor for future rewards
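A tabular sketch of both update rules (the Q-table view is illustrated on the next slide); the ε-greedy action choice and the `defaultdict` table are illustrative assumptions rather than details from the talk.

```python
import random
from collections import defaultdict

# Q-table mapping (state, action) pairs to estimated long-term reward,
# as in the tabular Q-learning illustration.
Q = defaultdict(float)

def epsilon_greedy(state, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```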
Tabular Q-Learning
[Figure: a table of Q-values with game states as rows (e.g. separation of 3 ft or 5 ft) and actions as columns, holding entries such as +10.0, 13.2, 6.0 and -1.3.]
Results (visual)
Reinforcement Learner (in-game AI code)
• Game state features: separation (5 binned ranges), last action (6 categories), mode (ground, air, knocked), proximity to obstacle
• Available actions: 19 aggressive (kick, punch), 10 defensive (block, lunge), 8 neutral (run)
• Q-function representation: one-layer neural net (tanh) or linear
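As a hedged illustration of such a Q-function: one tanh layer over the binned feature encoding, with one output per available action. The feature and action counts follow the slide, but the one-hot encoding, hidden width and random initialisation are assumptions made for the sketch.

```python
import numpy as np

# Sketch of the Q-function representation described above: a one-layer
# (tanh) network over the binned game-state features, one output per action.
# Feature and action counts follow the slide; everything else is illustrative.
N_FEATURES = 5 + 6 + 3 + 1      # separation bins + last actions + modes + obstacle proximity
N_ACTIONS = 19 + 10 + 8         # aggressive + defensive + neutral
N_HIDDEN = 20                   # assumed hidden width

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEATURES))
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, N_HIDDEN))

def q_values(features):
    """Return a vector of Q-values, one per available action."""
    hidden = np.tanh(W1 @ features)   # "one-layer neural net (tanh)" variant
    return W2 @ hidden                # the purely linear variant would skip the tanh layer
```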
Learning Aggressive Fighting
• Reward for a decrease in Wulong Goth's health
• [Videos: early in the learning process vs. after 15 minutes of learning]
Learning "Aikido"-Style Fighting
• Punishment for a decrease in either player's health
• [Videos: early in the learning process vs. after 15 minutes of learning]
Reinforcement Learning for Car Racing: AMPS (Kochenderfer, 2005)
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise the value function and policy
4. Revise the state-action abstraction
5. Return to 1 and collect more experience
[Figure: the state abstraction is built over features such as distance to the left and speed.]
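A schematic sketch of that loop; the four components are passed in as callables because their internals (model fitting, planning, abstraction refinement) are not spelled out in the talk, so all names here are placeholders rather than Kochenderfer's actual code.

```python
# Schematic AMPS-style loop. The caller supplies the four components;
# none of these function names come from the original implementation.

def amps_loop(env, abstraction, policy, *,
              collect_experience, fit_model, solve_mdp, refine_abstraction,
              iterations=10):
    experience = []
    for _ in range(iterations):
        # 1. Collect experience in the game under the current policy.
        experience += collect_experience(env, policy, abstraction)
        # 2. Learn transition probabilities and rewards over abstract states.
        model = fit_model(experience, abstraction)
        # 3. Revise the value function and policy for the learned model.
        policy, values = solve_mdp(model)
        # 4. Revise the state-action abstraction (split states that are too
        #    coarse, merge states that behave alike), then loop back to 1.
        abstraction = refine_abstraction(abstraction, experience, values)
    return policy, abstraction
```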
Balancing Abstraction Complexity
[Figure: representational complexity ranges from too coarse, through "just right", to too fine.]
Adapting the Representation
[Figure: an abstract state A is split into finer states, and similar states are merged back together.]
Project Gotham Racing 3
• Real-time racing simulation
• Goal: achieve the fastest possible lap times
Input Features and Reward
• Laser range finder measurements as features
• Progress along the track as reward
Actions
• Coast
• Accelerate
• Brake
• Hard-Left
• Hard-Right
• Soft-Left
• Soft-Right
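One illustrative way to encode this task for a learner: normalised rangefinder readings as the feature vector, track progress as the reward, and the seven actions above as the discrete action set. The number of beams and the normalisation constant are assumptions.

```python
import numpy as np

# Illustrative encoding of the racing task: laser-range-finder readings
# as state features, progress along the track as reward, and the seven
# discrete actions listed above. Beam count and max range are assumptions.

ACTIONS = ["coast", "accelerate", "brake",
           "hard-left", "hard-right", "soft-left", "soft-right"]

def make_features(rangefinder_readings, max_range=100.0):
    """Normalise the laser range readings into a feature vector in [0, 1]."""
    return np.clip(np.asarray(rangefinder_readings, dtype=float) / max_range, 0.0, 1.0)

def progress_reward(track_position, previous_track_position):
    """Reward is the progress made along the track since the last step."""
    return track_position - previous_track_position
```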
Learning to Walk: Why?
• Current games have unrealistic physical movement (moonwalking, hovering)
• Only death scenes are realistic: rag-doll physics releases the joint constraints
Reinforcement Learning to Walk (Russell Smith, 1998)
• Compromise between a hard-wired and a learned controller
• Motion sequencer with corrections
• FOX controller: based on a cerebellar model articulation controller (CMAC) neural network trained by reinforcement learning
• Can follow paths and climb up and down slopes
• Trained monopeds ("hopper") and bipeds
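The CMAC mentioned above is essentially tile coding: several overlapping, offset tilings of the input space, each contributing one weight to the output. A minimal sketch under that reading; the number of tilings, resolution and learning rate are illustrative and not taken from the FOX controller.

```python
import numpy as np

# Minimal CMAC-style function approximator (tile coding): several offset
# tilings discretise the input space, and the output is the sum of one
# weight per tiling. All sizes here are illustrative assumptions.

class CMAC:
    def __init__(self, n_tilings=8, bins_per_dim=10, n_dims=2, lr=0.1):
        self.n_tilings = n_tilings
        self.bins = bins_per_dim
        self.lr = lr
        # One weight table per tiling, indexed by tile coordinates.
        self.weights = np.zeros((n_tilings,) + (bins_per_dim,) * n_dims)
        # Each tiling is shifted by a different fraction of a tile width.
        self.offsets = np.linspace(0.0, 1.0, n_tilings, endpoint=False)

    def _tiles(self, x):
        """Active tile index in each tiling for an input x in [0, 1]^d."""
        x = np.asarray(x, dtype=float)
        for t in range(self.n_tilings):
            idx = np.floor((x * self.bins + self.offsets[t]) % self.bins).astype(int)
            yield (t,) + tuple(idx)

    def predict(self, x):
        return sum(self.weights[i] for i in self._tiles(x))

    def update(self, x, target):
        """Move the prediction towards a target (e.g. a reinforcement signal)."""
        error = target - self.predict(x)
        for i in self._tiles(x):
            self.weights[i] += self.lr * error / self.n_tilings
```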
Learning to Walk (Russell Smith, 1998)
[Videos: the hopper during training vs. the hopper once trained]
Learning to Walk (Russell Smith, 1998)
[Videos: the biped during training vs. the biped once trained]
Motion Capture Data
• Fix markers at key body positions
• Record their 3D positions during motion
• Fundamental technology in animation today
• Free download of mo-cap files: www.bvhfiles.com
Gaussian Process Latent Variable Models (Lawrence, 2004)
• Generative model for dimensionality reduction
• Probabilistic equivalent of PCA that defines a probability distribution over data
• Non-linear manifolds based on kernels
• Visualisation of high-dimensional data
• Back-projection from latent space to data space
• Can deal with missing data
Generative Model (SPCA vs. GPLVM)
[Diagram: latent variables x are mapped through a weight matrix W to the data y.]
• SPCA: marginalise over x and optimise W
• GPLVM: marginalise over W and optimise x
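To make the contrast concrete, the usual way to write the shared linear-Gaussian model and the two marginalisation choices (the Gaussian noise term and priors are the standard formulation, not spelled out on the slide):

```latex
% Shared linear-Gaussian generative model:
y = W x + \epsilon, \qquad
x \sim \mathcal{N}(0, I), \qquad \epsilon \sim \mathcal{N}(0, \sigma^{2} I)

% SPCA: marginalise the latent variables x and optimise W
p(y \mid W) = \int p(y \mid x, W)\, p(x)\, dx
            = \mathcal{N}\!\left(y \mid 0,\; W W^{\top} + \sigma^{2} I\right)

% GPLVM: place a prior on W, marginalise it, and optimise the latents X
p(Y \mid X) = \int p(Y \mid X, W)\, p(W)\, dW
```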
Bayes Nets for Bots (R. Le Hy et al., 2004)
• Goal: learn from skilled players how to act in a first-person shooter (FPS) game
• Test environment: the Unreal Tournament FPS game engine with the Gamebots control framework
• Idea: a naive Bayes classifier learns under which circumstances to switch behaviour
Naive Bayes for State Classification
• St: bot's state at time t
• St+1: bot's state at time t+1
• H: health level
• W: weapon
• OW: opponent's weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?
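A sketch of how such a classifier could pick the bot's next state: score each candidate St+1 by its prior times the product of per-variable conditionals learned from recorded play by skilled players. The count tables and add-one smoothing are illustrative implementation choices, not Le Hy et al.'s exact scheme.

```python
import math
from collections import defaultdict

# Naive Bayes behaviour switching: choose the next bot state S_{t+1}
# maximising P(S_{t+1}) * prod_i P(obs_i | S_{t+1}), where the observations
# are (St, H, W, OW, HN, NE, PW, PH). Counts and add-one smoothing are an
# illustrative choice, not taken from the paper.

class NaiveBayesBot:
    def __init__(self, states):
        self.states = states
        self.state_counts = defaultdict(int)                     # counts of S_{t+1}
        self.obs_counts = defaultdict(lambda: defaultdict(int))  # (var, value) -> state counts

    def observe(self, next_state, observations):
        """Update counts from one recorded transition of a skilled player."""
        self.state_counts[next_state] += 1
        for var, value in observations.items():
            self.obs_counts[(var, value)][next_state] += 1

    def choose_state(self, observations):
        """Return the most probable next behaviour given the current observations."""
        total = sum(self.state_counts.values()) or 1
        best, best_score = None, float("-inf")
        for s in self.states:
            # log prior + sum of log likelihoods, with add-one smoothing
            score = math.log((self.state_counts[s] + 1) / (total + len(self.states)))
            for var, value in observations.items():
                score += math.log((self.obs_counts[(var, value)][s] + 1)
                                  / (self.state_counts[s] + 2))
            if score > best_score:
                best, best_score = s, score
        return best
```

For example, after observing many transitions of a skilled player, `choose_state({"St": "attack", "H": "low", "NE": 2})` would return the behaviour that player most plausibly switches to when attacking on low health with two enemies nearby.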
Drivatars Unplugged
[Diagram: recorded player driving feeds the Drivatar learning system, which produces the Drivatar racing line behaviour model; together with the "built-in" AI behaviour development tool, vehicle interaction and racing strategy, and the car behaviour controller, this produces the Drivatar AI driving.]
Drivatars: Main Idea
• Two-phase process:
• Pre-generate possible racing lines prior to the race from a (compressed) racing table.
• Switch between the lines during the race to add variability.
• Compression reduces the memory needed per racing-line segment
• Switching produces smoother racing lines (see the sketch after the racing-table figure below)
Racing Tables
[Figure: the track is divided into segments; each segment stores candidate racing lines a1, a2, a3, a4.]
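A toy sketch of the two-phase idea: a per-segment table of pre-generated racing lines and a rule that switches between them at segment boundaries, blending into the new line so the switch stays smooth. The random choice, the blending and the offset representation are all illustrative assumptions, not the actual Drivatar implementation.

```python
import random

# Toy racing table: for each track segment, a list of pre-generated candidate
# racing lines (each given as lateral offsets from the track centre line).
# Random per-segment switching adds variability; the simple blend keeps the
# resulting racing line smooth at segment boundaries.

racing_table = {
    0: [[0.0, 0.1, 0.2], [0.0, -0.1, -0.2]],   # segment 0: candidate lines a1, a2
    1: [[0.2, 0.3, 0.2], [-0.2, -0.3, -0.2]],  # segment 1: candidate lines a1, a2
}

def drive_lap(table, blend=0.5):
    """Yield lateral offsets for one lap, switching lines at each segment."""
    previous_end = 0.0
    for segment in sorted(table):
        line = random.choice(table[segment])    # switch to a new racing line
        # Blend the first point towards where the previous segment ended so
        # the switch does not cause a sudden jump in the driven line.
        first = previous_end + blend * (line[0] - previous_end)
        offsets = [first] + line[1:]
        previous_end = offsets[-1]
        yield from offsets

if __name__ == "__main__":
    print(list(drive_lap(racing_table)))
```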