Playing Machines: Machine Learning Applications in Computer Games
Ralf Herbrich, Thore Graepel
Applied Games Group, Microsoft Research Cambridge
Games can be very hard!
• Partially observable stochastic games
• States are only partially observed
• Multiple agents choose actions
• Stochastic pay-offs and state transitions depend on the state and all the other agents' actions
• Goal: optimise the long-term pay-off (reward)
• Just like life: complex, adversarial, uncertain, and we are in it for the long run!
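As a side note not on the slides, the "long-term pay-off" that each agent optimises is usually formalised as the expected discounted return; the discount factor γ here is the same one that reappears on the Q-learning slide below.

```latex
% Expected discounted return of a policy \pi (standard formulation):
% r_t is the pay-off at step t, \gamma the discount factor for future rewards.
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \right],
\qquad 0 \le \gamma < 1 .
```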
[Slide: screenshots of non-player characters in Space Invaders (1977) and of agents in a game from 2001, alongside the human player.]
Creatures (1996, Steve Grand)
• Objective is to nurture creatures called Norns
• Model incorporates artificial-life features
• Norns have neural-network brains
• Their development can be influenced by player feedback
Black & White (2001, Richard Evans)
• Peter Molyneux's famous "God game"
• Player determines the fate of villagers as their "god" (seen as a hand)
• The creature can be taught complex behaviour
• Good and evil: actions have consequences
Colin McRae Rally 2.0 (2001, Jeff Hannan)
• First car-racing game to use neural networks
• Variety of tracks, drivers and road conditions
• The racing line is provided by the author; a neural network keeps the car on the racing line
• Multilayer perceptrons trained with RPROP
• Simple rules for recovery and overtaking
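A rough sketch of the kind of controller this describes: a small multilayer perceptron mapping the car's deviation from the provided racing line to a steering correction. The choice of inputs, the layer sizes and the omission of RPROP training code are all illustrative assumptions, not details of Hannan's implementation.

```python
import numpy as np

# Illustrative MLP steering controller: inputs describe the car's deviation
# from the provided racing line, one hidden tanh layer, and a steering
# correction as output. Inputs and sizes are assumptions; RPROP training
# of the weights is not shown here.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 3)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(1, 8)), np.zeros(1)

def steering_correction(lateral_offset, heading_error, speed):
    """Return a steering value in [-1, 1] aimed at staying on the racing line."""
    x = np.array([lateral_offset, heading_error, speed])
    hidden = np.tanh(W1 @ x + b1)
    return float(np.tanh(W2 @ hidden + b2)[0])
```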
Other Games using Machine Learning Source: http://www.gameai.com/games.html
Reinforcement Learning
[Diagram: the agent observes the game state and chooses an action; the game returns a reward or punishment and the next game state; the learning algorithm uses these to update the agent's parameters.]
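A minimal sketch of the loop in the diagram, assuming a hypothetical `Game` environment and `Agent` interface (the method names `reset`, `step`, `act` and `update` are placeholders, not an actual API from the talk):

```python
# Agent-environment loop matching the diagram: observe the game state,
# choose an action, receive reward/punishment and the next state, and
# let the learning algorithm update the agent's parameters.

def run_episode(game, agent, max_steps=1000):
    state = game.reset()                                  # initial game state
    for _ in range(max_steps):
        action = agent.act(state)                         # agent chooses an action
        next_state, reward, done = game.step(action)      # game responds
        agent.update(state, action, reward, next_state)   # parameter update
        state = next_state
        if done:
            break
```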
Q and SARSA Learning
• Q-learning (off-policy): $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$
• SARSA (on-policy): $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]$
• Q(s,a) is the expected reward for taking action a in state s
• α is the learning rate
• a is the action chosen
• r is the reward resulting from a
• s is the current state
• s' is the state after executing a (and a' an action taken in s')
• γ is the discount factor for future rewards
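A tabular sketch of both update rules (the Q-table view is illustrated on the next slide); the ε-greedy action choice and the `defaultdict` table are illustrative assumptions rather than details from the talk.

```python
import random
from collections import defaultdict

# Q-table mapping (state, action) pairs to estimated long-term reward,
# as in the tabular Q-learning illustration.
Q = defaultdict(float)

def epsilon_greedy(state, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```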
Tabular Q-Learning
[Figure: a table of Q-values with game states as rows (e.g. separation of 3 ft or 5 ft) and actions as columns, holding entries such as +10.0, 13.2, 6.0 and -1.3.]
Results (visual)
Reinforcement Learner (in-game AI code)
• Game state features: separation (5 binned ranges), last action (6 categories), mode (ground, air, knocked), proximity to obstacle
• Available actions: 19 aggressive (kick, punch), 10 defensive (block, lunge), 8 neutral (run)
• Q-function representation: one-layer neural net (tanh) or linear
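As a hedged illustration of such a Q-function: one tanh layer over the binned feature encoding, with one output per available action. The feature and action counts follow the slide, but the one-hot encoding, hidden width and random initialisation are assumptions made for the sketch.

```python
import numpy as np

# Sketch of the Q-function representation described above: a one-layer
# (tanh) network over the binned game-state features, one output per action.
# Feature and action counts follow the slide; everything else is illustrative.
N_FEATURES = 5 + 6 + 3 + 1      # separation bins + last actions + modes + obstacle proximity
N_ACTIONS = 19 + 10 + 8         # aggressive + defensive + neutral
N_HIDDEN = 20                   # assumed hidden width

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEATURES))
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, N_HIDDEN))

def q_values(features):
    """Return a vector of Q-values, one per available action."""
    hidden = np.tanh(W1 @ features)   # "one-layer neural net (tanh)" variant
    return W2 @ hidden                # the purely linear variant would skip the tanh layer
```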
Learning Aggressive Fighting
• Reward for a decrease in Wulong Goth's health
• [Videos: early in the learning process vs. after 15 minutes of learning]
Learning "Aikido"-Style Fighting
• Punishment for a decrease in either player's health
• [Videos: early in the learning process vs. after 15 minutes of learning]
Reinforcement Learning for Car Racing: AMPS (Kochenderfer, 2005)
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise the value function and policy
4. Revise the state-action abstraction
5. Return to 1 and collect more experience
[Figure: the state abstraction is built over features such as distance to the left and speed.]
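A schematic sketch of that loop; the four components are passed in as callables because their internals (model fitting, planning, abstraction refinement) are not spelled out in the talk, so all names here are placeholders rather than Kochenderfer's actual code.

```python
# Schematic AMPS-style loop. The caller supplies the four components;
# none of these function names come from the original implementation.

def amps_loop(env, abstraction, policy, *,
              collect_experience, fit_model, solve_mdp, refine_abstraction,
              iterations=10):
    experience = []
    for _ in range(iterations):
        # 1. Collect experience in the game under the current policy.
        experience += collect_experience(env, policy, abstraction)
        # 2. Learn transition probabilities and rewards over abstract states.
        model = fit_model(experience, abstraction)
        # 3. Revise the value function and policy for the learned model.
        policy, values = solve_mdp(model)
        # 4. Revise the state-action abstraction (split states that are too
        #    coarse, merge states that behave alike), then loop back to 1.
        abstraction = refine_abstraction(abstraction, experience, values)
    return policy, abstraction
```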
Balancing Abstraction Complexity
[Figure: representational complexity ranges from too coarse, through "just right", to too fine.]
Adapting the Representation
[Figure: an abstract state A is split into finer states, and similar states are merged back together.]
Project Gotham Racing 3
• Real-time racing simulation
• Goal: achieve the fastest possible lap times
Input Features and Reward
• Laser range finder measurements as features
• Progress along the track as reward
Actions
• Coast
• Accelerate
• Brake
• Hard-Left
• Hard-Right
• Soft-Left
• Soft-Right
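One illustrative way to encode this task for a learner: normalised rangefinder readings as the feature vector, track progress as the reward, and the seven actions above as the discrete action set. The number of beams and the normalisation constant are assumptions.

```python
import numpy as np

# Illustrative encoding of the racing task: laser-range-finder readings
# as state features, progress along the track as reward, and the seven
# discrete actions listed above. Beam count and max range are assumptions.

ACTIONS = ["coast", "accelerate", "brake",
           "hard-left", "hard-right", "soft-left", "soft-right"]

def make_features(rangefinder_readings, max_range=100.0):
    """Normalise the laser range readings into a feature vector in [0, 1]."""
    return np.clip(np.asarray(rangefinder_readings, dtype=float) / max_range, 0.0, 1.0)

def progress_reward(track_position, previous_track_position):
    """Reward is the progress made along the track since the last step."""
    return track_position - previous_track_position
```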
Learning to Walk: Why?
• Current games have unrealistic physical movement (moonwalking, hovering)
• Only death scenes are realistic: rag-doll physics releases the joint constraints
Reinforcement Learning to Walk (Russell Smith, 1998)
• Compromise between a hard-wired and a learned controller
• Motion sequencer with corrections
• FOX controller: based on a cerebellar model articulation controller (CMAC) neural network trained by reinforcement learning
• Can follow paths and climb up and down slopes
• Trained monopeds ("hopper") and bipeds
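The CMAC mentioned above is essentially tile coding: several overlapping, offset tilings of the input space, each contributing one weight to the output. A minimal sketch under that reading; the number of tilings, resolution and learning rate are illustrative and not taken from the FOX controller.

```python
import numpy as np

# Minimal CMAC-style function approximator (tile coding): several offset
# tilings discretise the input space, and the output is the sum of one
# weight per tiling. All sizes here are illustrative assumptions.

class CMAC:
    def __init__(self, n_tilings=8, bins_per_dim=10, n_dims=2, lr=0.1):
        self.n_tilings = n_tilings
        self.bins = bins_per_dim
        self.lr = lr
        # One weight table per tiling, indexed by tile coordinates.
        self.weights = np.zeros((n_tilings,) + (bins_per_dim,) * n_dims)
        # Each tiling is shifted by a different fraction of a tile width.
        self.offsets = np.linspace(0.0, 1.0, n_tilings, endpoint=False)

    def _tiles(self, x):
        """Active tile index in each tiling for an input x in [0, 1]^d."""
        x = np.asarray(x, dtype=float)
        for t in range(self.n_tilings):
            idx = np.floor((x * self.bins + self.offsets[t]) % self.bins).astype(int)
            yield (t,) + tuple(idx)

    def predict(self, x):
        return sum(self.weights[i] for i in self._tiles(x))

    def update(self, x, target):
        """Move the prediction towards a target (e.g. a reinforcement signal)."""
        error = target - self.predict(x)
        for i in self._tiles(x):
            self.weights[i] += self.lr * error / self.n_tilings
```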
Learning to Walk (Russell Smith, 1998)
[Videos: the hopper during training vs. the hopper once trained]
Learning to Walk (Russell Smith, 1998)
[Videos: the biped during training vs. the biped once trained]
Motion Capture Data
• Fix markers at key body positions
• Record their 3D positions during motion
• Fundamental technology in animation today
• Free download of mo-cap files: www.bvhfiles.com
Gaussian Process Latent Variable Models (Lawrence, 2004)
• Generative model for dimensionality reduction
• Probabilistic equivalent of PCA that defines a probability distribution over data
• Non-linear manifolds based on kernels
• Visualisation of high-dimensional data
• Back-projection from latent space to data space
• Can deal with missing data
Generative Model (SPCA vs. GPLVM)
[Diagram: latent variables x are mapped through a weight matrix W to the data y.]
• SPCA: marginalise over x and optimise W
• GPLVM: marginalise over W and optimise x
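To make the contrast concrete, the usual way to write the shared linear-Gaussian model and the two marginalisation choices (the Gaussian noise term and priors are the standard formulation, not spelled out on the slide):

```latex
% Shared linear-Gaussian generative model:
y = W x + \epsilon, \qquad
x \sim \mathcal{N}(0, I), \qquad \epsilon \sim \mathcal{N}(0, \sigma^{2} I)

% SPCA: marginalise the latent variables x and optimise W
p(y \mid W) = \int p(y \mid x, W)\, p(x)\, dx
            = \mathcal{N}\!\left(y \mid 0,\; W W^{\top} + \sigma^{2} I\right)

% GPLVM: place a prior on W, marginalise it, and optimise the latents X
p(Y \mid X) = \int p(Y \mid X, W)\, p(W)\, dW
```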
Bayes Nets for Bots (R. Le Hy et al., 2004)
• Goal: learn from skilled players how to act in a first-person shooter (FPS) game
• Test environment: the Unreal Tournament FPS game engine with the Gamebots control framework
• Idea: a naive Bayes classifier learns under which circumstances to switch behaviour
Naive Bayes for State Classification
• St: bot's state at time t
• St+1: bot's state at time t+1
• H: health level
• W: weapon
• OW: opponent's weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?
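A sketch of how such a classifier could pick the bot's next state: score each candidate St+1 by its prior times the product of per-variable conditionals learned from recorded play by skilled players. The count tables and add-one smoothing are illustrative implementation choices, not Le Hy et al.'s exact scheme.

```python
import math
from collections import defaultdict

# Naive Bayes behaviour switching: choose the next bot state S_{t+1}
# maximising P(S_{t+1}) * prod_i P(obs_i | S_{t+1}), where the observations
# are (St, H, W, OW, HN, NE, PW, PH). Counts and add-one smoothing are an
# illustrative choice, not taken from the paper.

class NaiveBayesBot:
    def __init__(self, states):
        self.states = states
        self.state_counts = defaultdict(int)                     # counts of S_{t+1}
        self.obs_counts = defaultdict(lambda: defaultdict(int))  # (var, value) -> state counts

    def observe(self, next_state, observations):
        """Update counts from one recorded transition of a skilled player."""
        self.state_counts[next_state] += 1
        for var, value in observations.items():
            self.obs_counts[(var, value)][next_state] += 1

    def choose_state(self, observations):
        """Return the most probable next behaviour given the current observations."""
        total = sum(self.state_counts.values()) or 1
        best, best_score = None, float("-inf")
        for s in self.states:
            # log prior + sum of log likelihoods, with add-one smoothing
            score = math.log((self.state_counts[s] + 1) / (total + len(self.states)))
            for var, value in observations.items():
                score += math.log((self.obs_counts[(var, value)][s] + 1)
                                  / (self.state_counts[s] + 2))
            if score > best_score:
                best, best_score = s, score
        return best
```

For example, after observing many transitions of a skilled player, `choose_state({"St": "attack", "H": "low", "NE": 2})` would return the behaviour that player most plausibly switches to when attacking on low health with two enemies nearby.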
Drivatars Unplugged
[Diagram: recorded player driving feeds the Drivatar learning system, which produces the Drivatar racing line behaviour model; together with the "built-in" AI behaviour development tool, vehicle interaction and racing strategy, and the car behaviour controller, this produces the Drivatar AI driving.]
Drivatars: Main Idea
• Two-phase process:
• Pre-generate possible racing lines prior to the race from a (compressed) racing table.
• Switch between the lines during the race to add variability.
• Compression reduces the memory needed per racing-line segment
• Switching produces smoother racing lines (see the sketch after the racing-table figure below)
Racing Tables
[Figure: the track is divided into segments; each segment stores candidate racing lines a1, a2, a3, a4.]
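A toy sketch of the two-phase idea: a per-segment table of pre-generated racing lines and a rule that switches between them at segment boundaries, blending into the new line so the switch stays smooth. The random choice, the blending and the offset representation are all illustrative assumptions, not the actual Drivatar implementation.

```python
import random

# Toy racing table: for each track segment, a list of pre-generated candidate
# racing lines (each given as lateral offsets from the track centre line).
# Random per-segment switching adds variability; the simple blend keeps the
# resulting racing line smooth at segment boundaries.

racing_table = {
    0: [[0.0, 0.1, 0.2], [0.0, -0.1, -0.2]],   # segment 0: candidate lines a1, a2
    1: [[0.2, 0.3, 0.2], [-0.2, -0.3, -0.2]],  # segment 1: candidate lines a1, a2
}

def drive_lap(table, blend=0.5):
    """Yield lateral offsets for one lap, switching lines at each segment."""
    previous_end = 0.0
    for segment in sorted(table):
        line = random.choice(table[segment])    # switch to a new racing line
        # Blend the first point towards where the previous segment ended so
        # the switch does not cause a sudden jump in the driven line.
        first = previous_end + blend * (line[0] - previous_end)
        offsets = [first] + line[1:]
        previous_end = offsets[-1]
        yield from offsets

if __name__ == "__main__":
    print(list(drive_lap(racing_table)))
```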