Machine Learning in Computer Game Players Chikayama & Taura Lab. M1 Ayato Miki
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
1. Introduction • Improvements in Computer Game Players • DEEP BLUE defeated Kasparov in 1997 • GEKISASHI and TANASE SHOGI at WCSC 2008 • Strong computer game players are usually developed by strong human players • They input heuristics manually • They devote a lot of time and energy to tuning
Machine Learning for Games • Machine learning enables automatic tuning using a large amount of data • The developer does not need to be an expert at the game
2. Computer Game Players • Games • Game Trees • Game Tree Search • Evaluation Function
Games • Turn-based games • e.g. tic-tac-toe, chess, shogi, poker, mahjong… • Additional classification • two-player or otherwise • zero-sum or otherwise • deterministic or non-deterministic • perfect or imperfect information • Game Tree Model
Game Trees [Figure: a game tree whose levels alternate between the player’s turn and the opponent’s turn; the edges are moves (move 1, move 2).]
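Game Tree Search • e.g. the minimax search algorithm [Figure: a minimax tree with a Max root, two Min nodes, and four Max nodes. The leaves (3 1 5), (4 8 2), (3 0 1), (6 4 2) give Max values 5, 8, 3, 6, then Min values 5 and 3, and a root value of 5.]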
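The minimax example on the slide can be checked with a short sketch. The nested-list tree encoding and the names `minimax` and `SLIDE_TREE` are illustrative assumptions, not part of the original slides; the leaf values are the ones shown in the figure.

```python
from typing import Union

# A tree is either a leaf value (int) or a list of child subtrees.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the minimax value of `node`; Max and Min levels alternate."""
    if isinstance(node, int):
        # Leaf: its value is given directly.
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# The tree from the slide: Max root, two Min nodes, four Max nodes, twelve leaves.
SLIDE_TREE = [
    [[3, 1, 5], [4, 8, 2]],  # left Min node
    [[3, 0, 1], [6, 4, 2]],  # right Min node
]

root_value = minimax(SLIDE_TREE, maximizing=True)
print(root_value)  # 5
```

The left Min node chooses min(5, 8) = 5, the right one min(3, 6) = 3, and the root takes the larger of the two.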
Game Tree Search • Searching all the way to the leaf nodes is infeasible • about 10^220 possible positions in shogi • Stop the search at a practical depth • And “evaluate” the nodes there • Using an evaluation function
Evaluation Function • Estimates how favorable a position is • Elements • feature vector of the position s • parameter vector
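The slide's formula did not survive extraction, so the following is only a sketch of one common choice: a linear evaluation, the dot product of the feature vector and the parameter vector. The function name `evaluate`, the three features, and the weights are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def evaluate(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear evaluation (an assumed form): sum of feature * weight."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical feature vector of a position s, e.g. differences in
# piece counts or mobility between the two sides.
features = [2.0, -1.0, 3.0]
# Hypothetical parameter vector (one weight per feature).
weights = [100.0, 50.0, 10.0]

score = evaluate(features, weights)
print(score)  # 180.0
```

A positive score favors the side the features were computed for. Other forms, such as the neural networks discussed later in the deck, fit the same feature-vector and parameter-vector description.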
3. Machine Learning in Computer Game Players • Initial work • Samuel’s research [1959] • Learning objective • What do computer game players learn?
Samuel’s Checker Player [1959] • Many useful techniques • Rote learning • Quiescence search • 3-layer neural network evaluation function • And some machine learning techniques • Learning through self-play • Temporal-difference learning • Comparison training
Learning Objective • Opening Book • Search Control • Evaluation Function
Learning Evaluation Functions • Automatic construction of the evaluation function • Construct and select the feature vector automatically • e.g. GLEM [Buro, 1998] • Difficult • Tuning evaluation function parameters • Design the feature vector manually and tune its parameters automatically • Easy and effective
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Supervised Learning • Provide the program with example positions and their exact evaluation values • Adjust the parameters to minimize the error between the evaluation function outputs and the exact values [Figure: example positions labeled with evaluation values such as 20, 50, 50, 40]
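A minimal sketch of this idea, assuming a linear evaluation function and a squared-error loss minimized by stochastic gradient descent. Neither choice is stated on the slide. The names `predict` and `train`, the learning rate, and the tiny labeled data set are all hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (feature vector, exact value)


def predict(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed linear evaluation: dot product of features and weights."""
    return sum(f * w for f, w in zip(features, weights))


def train(examples: List[Example], n_features: int,
          lr: float = 0.05, epochs: int = 500) -> List[float]:
    """Tune weights by gradient descent on the squared error."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, target in examples:
            error = predict(features, weights) - target
            # Gradient of 0.5 * error**2 with respect to weight i is error * f_i.
            for i, f in enumerate(features):
                weights[i] -= lr * error * f
    return weights


# Hypothetical labeled positions: the values are consistent with weights (20, 50).
EXAMPLES: List[Example] = [
    ([1.0, 0.0], 20.0),
    ([0.0, 1.0], 50.0),
    ([1.0, 1.0], 70.0),
]

learned = train(EXAMPLES, n_features=2)
print([round(w, 3) for w in learned])
```

Because the toy labels are exactly linear in the features, the learned weights approach 20 and 50. Real position labels are noisy, which is one reason exact labeling is hard.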
Difficulty of Hard Supervised Training • Positions must be labeled manually • Labels must be quantitative evaluation values → Consider a softer approach
Comparison Training • Soft supervised training • Requires only the relative order of the possible moves • Easier and more intuitive [Figure: two candidate moves compared with “>”]
Bonanza [Hoki, 2006] • Comparison training using records of expert games • Simple relative order: the expert move > other moves
Bonanza Method • Based on optimal control theory • Minimize the cost function $J = \sum_{i=1}^{N} l(p_i)$ • $p_i$: example positions in the records • $N$: total number of example positions • $l$: error function
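Bonanza Method • Error function: $l(p) = \sum_{m=2}^{M} T\left[\xi(p_m) - \xi(p_1)\right]$ • $p_m$: child position with move $m$ • $M$: total number of possible moves • $m = 1$: the move played in the record • $\xi$: minimax search value • $T$: order discriminant function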
Order Discriminant Function • Sigmoid function: $T(x) = \frac{1}{1 + e^{-kx}}$ • $k$ is the parameter that controls the gradient • When $k \to \infty$, $T(x)$ is a step function • In this case, the error function means “the number of moves that were considered to be better than the move in the record”
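The error function and the order discriminant function described on these slides can be sketched as follows. This covers only the parts named on the slides, the search values are made-up numbers, and the names `order_discriminant` and `position_error` are illustrative assumptions rather than Bonanza's actual code.

```python
import math
from typing import Sequence


def order_discriminant(x: float, k: float) -> float:
    """Sigmoid T(x) = 1 / (1 + exp(-k * x)); k controls the gradient.

    Note: math.exp overflows for very large -k * x; the toy values below
    stay well inside the safe range.
    """
    return 1.0 / (1.0 + math.exp(-k * x))


def position_error(record_value: float, other_values: Sequence[float],
                   k: float) -> float:
    """Sum T(value of another move - value of the recorded move)."""
    return sum(order_discriminant(v - record_value, k) for v in other_values)


# Hypothetical minimax search values for one example position.
record_value = 50.0                # child reached by the move in the record
other_values = [80.0, 40.0, 10.0]  # children reached by the other moves

soft = position_error(record_value, other_values, k=0.1)
sharp = position_error(record_value, other_values, k=10.0)
print(round(soft, 4), round(sharp, 4))
```

With a large `k` the sigmoid behaves like a step function, so `sharp` is close to 1: exactly one move (value 80) was rated above the recorded move. With a small `k` the error is smooth, which makes it usable as a minimization target.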
Bonanza • 30,000 professional game records and 30,000 high-rated game records from SHOGI CLUB 24 were used • The weight parameters of about 10,000 feature elements were tuned • Bonanza won the World Computer Shogi Championship 2006
Problem of Supervised Learning • It is costly to accumulate a training data set • It takes a lot of time to label manually • Using expert records has been successful • But what if there are not enough expert records? • New games • Minor games • Other approaches work without a training set • e.g. Reinforcement Learning (next)
Reinforcement Learning • The learner gets “a reward” from the environment • In the domain of games, the reward is the final outcome (win/loss) • Reinforcement learning requires only the objective information of the game
Reinforcement Learning [Figure: numeric example of reward-based updates along a sequence of positions] • Inefficient in games…
Temporal-Difference Learning [Figure: numeric example of temporal-difference updates along a sequence of positions]
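A minimal sketch of temporal-difference learning, assuming the simplest tabular TD(0) variant with no discounting: each position's value is moved toward the value of the next position, and the last position is moved toward the final reward. The function name `td0_update`, the state labels, the reward of 100, and the step size are hypothetical; a real player would update evaluation-function parameters rather than a table.

```python
from typing import Dict, List


def td0_update(values: Dict[str, float], episode: List[str],
               reward: float, alpha: float = 0.5) -> Dict[str, float]:
    """Apply one pass of tabular TD(0) updates over a finished game.

    `episode` lists the positions visited in order; `reward` is the final
    outcome. Unseen positions start with value 0.
    """
    for t, state in enumerate(episode):
        if t + 1 < len(episode):
            # Target is the current estimate of the next position.
            target = values.get(episode[t + 1], 0.0)
        else:
            # The last position is updated toward the final reward.
            target = reward
        current = values.get(state, 0.0)
        values[state] = current + alpha * (target - current)
    return values


values: Dict[str, float] = {}
# Replay the same hypothetical winning game many times.
for _ in range(200):
    td0_update(values, ["s0", "s1", "s2"], reward=100.0)

print({s: round(v, 3) for s, v in values.items()})
```

The reward information propagates backward one position at a time, so every position on a winning line ends up with a value near the final reward without any manual labeling.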
TD-Gammon [Tesauro, 1992] • Trained through self-play
Problems of Reinforcement Learning • Falling into a local optimum • Lack of playing variation • Solutions • Add intentional randomness • Play against various players (computer/human) • Credit Assignment Problem (CAP) • Not clear which action was effective
Evolutionary Algorithm • Initialize Population → Randomly Vary Individuals → Evaluate “Fitness” → Apply Selection → (back to Randomly Vary Individuals)
Research of Fogel et al. [2004] • Evolutionary algorithm for a chess player • Using an open-source chess program • Attempt to tune its parameters
Initialization • Make initial 10 parents • Initialize parameters with random values
Variation • Create 10 offspring, one from each surviving parent, by mutating the parental parameters • The mutation uses a Gaussian random variable and a strategy parameter
Evaluate Fitness and Selection • Each player plays ten games against randomly selected opponents • The ten best players become the parents of the next generation
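The initialization, variation, fitness evaluation, and selection steps can be sketched as one loop. This is a toy illustration, not the setup of Fogel et al.: real games are replaced by a hypothetical `play_game` that compares distance to a made-up target vector, and the mutation uses a fixed step size `sigma` instead of an adapted strategy parameter. All names and numbers are assumptions.

```python
import random
from typing import List

Params = List[float]
TARGET: Params = [1.0, -2.0, 0.5]  # hypothetical "ideal" parameters of the toy game


def strength(params: Params) -> float:
    """Toy playing strength: closer to TARGET is stronger."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def play_game(a: Params, b: Params) -> int:
    """Toy stand-in for a real game: 1 if player a wins, else 0."""
    return 1 if strength(a) > strength(b) else 0


def init_parents(rng: random.Random, n_parents: int = 10) -> List[Params]:
    """Initialization: random parameter values for each parent."""
    return [[rng.uniform(-5.0, 5.0) for _ in TARGET] for _ in range(n_parents)]


def evolve(parents: List[Params], rng: random.Random, generations: int = 50,
           n_games: int = 10, sigma: float = 0.3) -> List[Params]:
    n_parents = len(parents)
    for _ in range(generations):
        # Variation: one offspring per parent via Gaussian mutation.
        offspring = [[p + sigma * rng.gauss(0.0, 1.0) for p in parent]
                     for parent in parents]
        population = parents + offspring
        # Fitness: each player plays n_games against random opponents.
        scores = []
        for i, player in enumerate(population):
            others = [j for j in range(len(population)) if j != i]
            opponents = rng.sample(others, n_games)
            scores.append(sum(play_game(player, population[j]) for j in opponents))
        # Selection: the best-scoring half becomes the next parents.
        ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
        parents = [population[i] for i in ranked[:n_parents]]
    return parents


rng = random.Random(0)
initial_parents = init_parents(rng)
final_parents = evolve(initial_parents, rng)
print(max(strength(p) for p in final_parents))
```

In this toy setting the strongest player wins all of its games and therefore always survives selection, so the best strength never decreases from one generation to the next.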
Tuned Parameters • Material value • Positional value • Weights and biases of three neural networks
Three Neural Networks • Each network has 3 layers • Input = arrangement of a specific area (front 2 rows, back 2 rows, and center 4x4 square, one area per network; 16 inputs each) • Hidden = 10 units • Output = worth of the area arrangement (1 output)
Result • 10 independent trials (50 generations each) • Initial rating = 2066 (Expert) • This is the rating of the open-source player • Best rating = 2437 (Senior Master) • But the program cannot yet compete with the strongest chess programs (rated 2800 and above)
Future Work • Automatic position labeling • Using records or computer play • Sophisticated reward • Consider opponent’s strength • Move analysis for credit assignment • Experiment in other games