Machine Learning in Computer Game Players Chikayama & Taura Lab. M1 Ayato Miki
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
1. Introduction • Improvements in Computer Game Players • DEEP BLUE defeated Kasparov in 1997 • GEKISASHI and TANASE SHOGI on WCSC 2008 • Strong Computer Game Players are usually developed by strong human players • Input heuristics manually • Devote a lot of time and energy to tuning
Machine Learning for Games • Machine learning enables automatic tuning using a large amount of data • The developer need not be an expert in the game
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
2. Computer Game Players • Games • Game Trees • Game Tree Search • Evaluation Function
Games • Turn-based games • ex. tic-tac-toe, chess, shogi, poker, mah-jong… • Additional classification • two-player or otherwise • zero-sum or otherwise • deterministic or non-deterministic • perfect or imperfect information • Game Tree Model
Game Trees • Diagram: a game tree whose levels alternate between the player’s turn and the opponent’s turn, branching once per possible move (move 1, move 2, …)
Game Tree Search • ex. Minimax search algorithm • Diagram: a minimax tree in which Max nodes take the maximum and Min nodes the minimum of their children’s values; the leaf evaluations (3 1 5, 4 8 2, 3 0 1, 6 4 2) propagate up to a root value of 5
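As a minimal, self-contained sketch of minimax (the nested-list tree below reproduces the leaf values from the slide’s diagram; grouping the leaves this way is an assumption about the lost figure):

```python
def minimax(node, maximizing=True):
    """Minimax over a game tree given as nested lists; leaves are numbers."""
    if isinstance(node, (int, float)):      # leaf: return its evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Leaf values from the slide's diagram, grouped into a Max-Min-Max tree.
tree = [[[3, 1, 5], [4, 8, 2]], [[3, 0, 1], [6, 4, 2]]]
print(minimax(tree))  # -> 5, the root value shown on the slide
```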
Game Tree Search • Searching all the way to the leaf nodes is infeasible • ~10^220 possible positions in shogi • Stop the search at a practicable depth • And “evaluate” the frontier nodes • Using an evaluation function
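A sketch of depth-limited search with evaluation at the frontier; the toy game interface below (legal_moves, play, evaluate) is a made-up stand-in, not a real game:

```python
# Toy game: a position is an integer; each move adds 1 or 2.
def legal_moves(pos):
    return [1, 2] if pos < 6 else []

def play(pos, move):
    return pos + move

def evaluate(pos):
    return pos % 5        # stand-in evaluation function

def search(pos, depth, maximizing=True):
    """Stop at a practicable depth and evaluate frontier nodes."""
    moves = legal_moves(pos)
    if depth == 0 or not moves:             # cutoff or terminal position
        return evaluate(pos)                # estimate instead of searching on
    values = [search(play(pos, m), depth - 1, not maximizing) for m in moves]
    return max(values) if maximizing else min(values)

print(search(0, depth=3))
```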
Evaluation Function • Estimates the superiority of a position s • V(s) = w · φ(s) • φ(s): feature vector of position s • w: parameter vector
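A minimal sketch of such a function in the linear form V(s) = w · φ(s); the two features here (material, mobility) are illustrative assumptions:

```python
import numpy as np

def features(position):
    """Hypothetical feature vector phi(s) of a position."""
    return np.array([position["material"], position["mobility"]])

def evaluate(position, w):
    """Estimated superiority of the position: V(s) = w . phi(s)."""
    return float(np.dot(w, features(position)))

w = np.array([1.0, 0.1])   # parameter vector -- the target of learning
print(evaluate({"material": 3, "mobility": 20}, w))   # -> 5.0
```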
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
3. Machine Learning in Computer Game Players • Initial work • Samuel’s research [1959] • Learning objectives • What do computer game players learn?
Samuel’s Checker Player [1959] • Many useful techniques • Rote learning • Quiescence search • 3-layer neural network evaluation function • And some machine learning techniques • Learning through self-play • Temporal-difference learning • Comparison training
Learning Objective • Opening Book • Search Control • Evaluation Function
Learning Evaluation Functions • Automatic construction of evaluation function • Construct and select a feature vector automatically • ex. GLEM [Buro, 1998] • Difficult • Tuning evaluation function parameters • Make a feature vector manually and tune its parameters automatically • Easy and effective
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Supervised Learning • Provide the program with example positions and their exact evaluation values • Adjust the parameters so as to minimize the error between the evaluation function outputs and the exact values • Diagram: example positions labeled with values such as 20, 50, 50, 40, …
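A sketch of this tuning for a linear evaluation function, minimizing the squared error by gradient descent; the toy feature vectors, labels, and learning rate are all illustrative:

```python
import numpy as np

def supervised_tuning(Phi, y, lr=0.01, epochs=2000):
    """Fit w so that Phi @ w approximates the exact values y.

    Phi: (N, d) feature vectors of N example positions; y: (N,) labels.
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        err = Phi @ w - y                   # output minus exact value
        w -= lr * (Phi.T @ err) / len(y)    # gradient of mean squared error
    return w

Phi = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]])   # toy positions
y = np.array([20.0, 50.0, 40.0])                       # exact values
print(supervised_tuning(Phi, y))
```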
Difficulty of Hard Supervised Training • Manually labeling positions is costly • Exact quantitative evaluation is difficult even for strong players → Consider a softer approach
Comparison Training • Soft supervised training • Requires only the relative order of the possible moves • Easier and more intuitive
Bonanza [Hoki, 2006] • Comparison training using records of expert games • Simple relative order: the expert move > all other moves
Bonanza Method • Based on optimal control theory • Minimize the cost function J(w) = Σ_{n=1..N} E(p_n, w) • p_n: example positions in the records • N: total number of example positions • E: error function
Bonanza Method Error Function • E(p, w) = Σ_{m=1..M, m≠d} T( ξ(p_m, w) − ξ(p_d, w) ) • p_m: child position reached by move m • M: total number of possible moves • d: the move played in the record • ξ: minimax search value • T: order discriminant function
Order Discriminant Function • Sigmoid function T(x) = 1 / (1 + e^{−kx}) • k is the parameter that controls the gradient • As k → ∞, T(x) becomes the step function • In that case, the error function counts “the number of moves that were considered to be better than the move in the record”
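A sketch of the loss in code: `values` stands for the (already computed) minimax search values ξ of each child position, `d` indexes the recorded move, and sign conventions are simplified so that larger ξ means better for the player to move:

```python
import numpy as np

def T(x, k=5.0):
    """Order discriminant function: sigmoid; the step function as k -> inf."""
    return 1.0 / (1.0 + np.exp(-k * x))

def error(values, d, k=5.0):
    """Soft count of moves rated better than the move in the record."""
    return sum(T(values[m] - values[d], k)
               for m in range(len(values)) if m != d)

def cost(examples, k=5.0):
    """Cost J: total error over all example positions in the records."""
    return sum(error(values, d, k) for values, d in examples)

# One example position with 4 possible moves; the recorded move (index 0)
# has the highest search value, so the error is close to zero.
print(error([0.9, 0.2, -0.3, 0.5], d=0))
```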
Bonanza • 30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used • The weight parameters of about 10,000 feature elements were tuned • Bonanza won the World Computer Shogi Championship 2006
Problems of Supervised Learning • It is costly to accumulate a training data set • Manual labeling takes a lot of time • Using expert records has been successful • But what if enough expert records are not available? • New games • Minor games • Other approaches need no training set • ex. Reinforcement Learning (next)
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Reinforcement Learning • The learner gets “a reward” from the environment • In the domain of games, the reward is the final outcome (win/lose) • Reinforcement learning requires only objective information about the game
Reinforcement Learning • Diagram: the reward arrives only as the final outcome and must be propagated back over the whole sequence of positions → Inefficient in games…
Temporal-Difference Learning • Diagram: each position’s value is updated toward the value of the following position, so learning proceeds without waiting for the final outcome
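A minimal sketch of the TD(0) update for a linear evaluation function, assuming one game’s positions are given as feature vectors; the step size and toy trajectory are illustrative:

```python
import numpy as np

def td0_update(w, trajectory, outcome, alpha=0.05):
    """Move each position's value toward the value of the next position.

    trajectory: feature vectors phi(s_0), ..., phi(s_T) of one game
    outcome:    final result, e.g. +1 for a win, -1 for a loss
    """
    for t in range(len(trajectory) - 1):
        delta = w @ trajectory[t + 1] - w @ trajectory[t]  # temporal difference
        w += alpha * delta * trajectory[t]
    w += alpha * (outcome - w @ trajectory[-1]) * trajectory[-1]  # final step
    return w

w = np.zeros(3)
game = [np.array([1.0, 0.0, 0.0]),
        np.array([0.0, 1.0, 0.0]),
        np.array([0.0, 0.0, 1.0])]
print(td0_update(w, game, outcome=+1.0))
```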
TD-Gammon [Tesauro, 1992] • Backgammon player trained through self-play with temporal-difference learning
Problems of Reinforcement Learning • Falling into a local optimum • Lack of playing variation • Solutions • Add intentional randomness • Play against various players (computer/human) • Credit Assignment Problem (CAP) • Not clear which action was effective
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Evolutionary Algorithm • Initialize population randomly → Vary individuals → Evaluate “fitness” → Apply selection → (repeat)
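A sketch of that loop on a toy problem; the individual encoding, mutation, and fitness function below are stand-ins, not a game player:

```python
import random

def fitness(ind):
    return -sum(x * x for x in ind)          # toy objective: push toward zero

def mutate(ind, sigma=0.1):
    return [x + random.gauss(0.0, sigma) for x in ind]

def evolve(pop_size=10, dims=4, generations=50):
    population = [[random.uniform(-1, 1) for _ in range(dims)]
                  for _ in range(pop_size)]               # initialize randomly
    for _ in range(generations):
        offspring = [mutate(p) for p in population]       # vary individuals
        candidates = population + offspring
        candidates.sort(key=fitness, reverse=True)        # evaluate fitness
        population = candidates[:pop_size]                # apply selection
    return population[0]

print(evolve())
```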
Research of Fogel et al. [2004] • Evolutionary algorithm for a chess player • Based on an open-source chess program • Attempt to tune its parameters
Initialization • Make the initial 10 parents • Initialize their parameters with random values
Variation • Create 10 offspring from each surviving parent by mutating the parental parameters: x′_i = x_i + s_i · N_i(0, 1) • N_i(0, 1): Gaussian random variable • s_i: strategy parameter
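A sketch of this mutation with the usual self-adaptive scheme, where the strategy parameters s_i are themselves mutated before scaling the Gaussian noise; the constant tau is the textbook choice, not necessarily Fogel et al.’s exact setting:

```python
import numpy as np

def mutate(x, s):
    """x: parameter vector; s: per-parameter strategy (step sizes)."""
    n = len(x)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))
    s_new = s * np.exp(tau * np.random.randn(n))   # adapt the step sizes
    x_new = x + s_new * np.random.randn(n)         # x'_i = x_i + s'_i * N_i(0,1)
    return x_new, s_new

x = np.array([100.0, 300.0, 330.0])   # e.g. material values being tuned
s = np.full(3, 5.0)                   # initial strategy parameters
print(mutate(x, s))
```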
Evaluate Fitness and Selection • Each player plays ten games against ten randomly selected opponents • The ten best players become the parents of the next generation
Tuned Parameters • Material value • Positional value • Weights and biases of three neural networks
Three Neural Networks • Each network has 3 layers: 16 inputs, 10 hidden units, 1 output • Input = arrangement of a specific area (front 2 rows, back 2 rows, and center 4×4 square) • Hidden = 10 units • Output = worth of the area arrangement
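A sketch of one area network’s forward pass (16 inputs, 10 hidden units, 1 output); the tanh squashing function and random weights are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(10, 16)), np.zeros(10)   # input -> hidden
W2, b2 = 0.1 * rng.normal(size=(1, 10)), np.zeros(1)     # hidden -> output

def area_worth(area):
    """area: (16,) encoding of the pieces on the area's squares."""
    hidden = np.tanh(W1 @ area + b1)      # 10 hidden units
    return np.tanh(W2 @ hidden + b2)[0]   # scalar worth of the arrangement

print(area_worth(np.zeros(16)))   # empty area -> 0.0
```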
Result • 10 independent trials, each of 50 generations • Initial rating = 2066 (Expert), the rating of the open-source player • Best rating = 2437 (Senior Master) • But the program cannot yet compete with the strongest chess programs (~R2800)
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
Future Work • Automatic position labeling • Using records or computer play • More sophisticated rewards • Consider the opponent’s strength • Move analysis for credit assignment • Experiments in other games