The Implementation of Machine Learning in the Game of Checkers Billy Melicher, Computer Systems Lab 08-09
Abstract • Machine learning uses past information to predict future states • It can be applied in any situation where the past predicts the future • It adapts to new situations
Introduction • Checkers is used to explore machine learning • Checkers has many tactical aspects that make it well suited for study
Background • Minimax • Heuristics • Learning
Minimax • A method of adversarial search • Every pattern (board) can be given a fitness value (heuristic) • Each player chooses the outcome that is best for them from the choices available
Minimax • [Minimax game-tree diagram; chart from Wikipedia]
Minimax • The game tree grows exponentially with search depth • Can only evaluate a limited number of moves into the future, called the ply
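To make the ply limit concrete, here is a minimal depth-limited minimax sketch in Python. It works over an abstract game tree supplied by the caller; the `children` and `evaluate` callbacks and the toy tree in the usage example are illustrative assumptions, not part of the original project.

```python
def minimax(node, depth, maximizing, children, evaluate):
    """Depth-limited minimax over an abstract game tree.

    `children(node)` returns the successor positions and `evaluate(node)`
    returns the heuristic (fitness) value of a position."""
    succ = children(node)
    if depth == 0 or not succ:
        return evaluate(node)            # ply limit reached or no moves left
    values = (minimax(c, depth - 1, not maximizing, children, evaluate) for c in succ)
    return max(values) if maximizing else min(values)

# Usage on a hand-built toy tree; leaves carry their heuristic values.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
leaf_values = {"D": 3, "E": 5, "F": 2, "G": 9}
best = minimax("A", 2, True,
               children=lambda n: tree.get(n, []),
               evaluate=lambda n: leaf_values.get(n, 0))
print(best)  # 3: MAX picks B because MIN would answer C with the 2-valued leaf
```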
Heuristic • A heuristic predicts the outcome of a board • It assigns a fitness value to the board; the higher the value, the better the expected outcome • Not perfect • Requires expertise in the situation to create
Heuristics • H(s) = c0F0(s) + c1F1(s) + … + cnFn(s) • H(s) = heuristic • Has many different terms • In checkers the terms could include: number of checkers, number of kings, number of checkers on an edge, how far the checkers have advanced on the board
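As an illustration of the weighted-sum form H(s) = c0F0(s) + … + cnFn(s), the sketch below computes a heuristic from a feature vector. The specific feature values and weights are assumed for the example; the actual program learns its own weights.

```python
def heuristic(features, weights):
    """H(s): dot product of feature values F_i(s) with their weights c_i."""
    return sum(c * f for c, f in zip(weights, features))

# Example feature values for one hypothetical board state s:
# [own checkers, own kings, checkers on an edge, total forward advancement]
features = [9, 2, 3, 14]
weights = [1.0, 1.5, 0.25, 0.1]   # assumed starting weights; the learner tunes these
print(heuristic(features, weights))  # 14.15
```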
Learning by Rote • Stores every game played • Records the moves made from each board • Relates the moves made from a particular board to the outcome of the game • Becomes more likely to make moves that resulted in a win and less likely to make moves that resulted in a loss • Good in the endgame, not as good in the midgame
How I Store Data I convert each checkerboard into a 32-digit base-5 number, where each digit corresponds to a playable square and its value corresponds to what occupies that square.
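A sketch of that encoding is shown below. The piece codes and square ordering are assumptions for illustration; only the idea of packing the 32 playable squares into one base-5 number comes from the text above.

```python
def encode_board(squares):
    """Pack the 32 playable squares (each 0-4) into a single base-5 integer."""
    assert len(squares) == 32 and all(0 <= s <= 4 for s in squares)
    key = 0
    for s in squares:
        key = key * 5 + s     # shift left one base-5 digit, append this square
    return key

# Assumed piece codes: 0 = empty, 1 = black man, 2 = black king, 3 = red man, 4 = red king.
# Rote learning can then map each key to the moves tried from that board and their outcomes.
opening = [1] * 12 + [0] * 8 + [3] * 12   # 12 black men, 8 empty squares, 12 red men
print(encode_board(opening))
```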
Learning by Generalization • Uses a heuristic function to guide moves • Adjusts the heuristic function after each game based on the outcome • Good in the midgame but not as good in the early and end games • Requires identifying the features that affect the game
Development • Minimax algorithm with alpha-beta pruning • Both learning by rote and learning by generalization • Temporal difference learning
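The sketch below adds alpha-beta pruning to the minimax idea shown earlier: branches that cannot influence the final choice are cut off, which reduces the exponential growth in practice. It reuses the same abstract `children`/`evaluate` interface as the earlier minimax sketch, which is an assumption about how the search might be structured.

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning."""
    succ = children(node)
    if depth == 0 or not succ:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for c in succ:
            value = max(value, alphabeta(c, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if beta <= alpha:          # MIN already has a better option elsewhere: prune
                break
        return value
    value = float("inf")
    for c in succ:
        value = min(value, alphabeta(c, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:              # MAX already has a better option elsewhere: prune
            break
    return value
```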
Temporal Difference Learning • In temporal difference learning, the heuristic is adjusted based on the difference between its predictions at one time and at a later time • The heuristic converges toward the ideal evaluation function • U(s) ← U(s) + α(R(s) + γU(s') - U(s))
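A minimal table-based sketch of that update rule is shown below, assuming a dictionary of value estimates; the state names, reward, and parameter values are illustrative, not taken from the project.

```python
def td_update(U, s, s_next, reward, alpha=0.1, gamma=1.0):
    """TD update: move U(s) toward R(s) + γU(s'), scaled by the learning rate α."""
    U[s] = U[s] + alpha * (reward + gamma * U[s_next] - U[s])

U = {"board_a": 0.0, "board_b": 0.6}      # current value estimates for two boards
td_update(U, "board_a", "board_b", reward=0.0)
print(U["board_a"])  # 0.06: nudged toward the estimate for the successor board
```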
Temporal Difference Learning • There is no proof that predictions made closer to the end of the game are better, but common sense says they are • Changes the heuristic so that it better predicts the value of all boards • Adjusts the weights of the heuristic
Alpha Value • The alpha value scales down changes to the heuristic as more data accumulates • Diminishing returns • Necessary to ensure that rare occurrences do not change the heuristic too much
Development • Equation for learning, applied to each weight: • w = (previous - current)(previous + current/2) • Equation for the alpha value: • a = 50/(49 + n)
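The sketch below implements the alpha schedule a = 50/(49 + n) and shows one way such a learning rate could scale a per-weight adjustment. The gradient-style update in `adjust_weight` is an assumption for illustration, not a literal transcription of the weight formula above.

```python
def alpha(n):
    """Learning rate a = 50 / (49 + n); shrinks as more games n are seen."""
    return 50.0 / (49.0 + n)

def adjust_weight(weight, feature_value, previous, current, n):
    """Nudge one heuristic weight by the prediction error, scaled by alpha (assumed form)."""
    error = previous - current            # difference between successive predictions
    return weight + alpha(n) * error * feature_value

print(alpha(1), alpha(51), alpha(451))    # 1.0, 0.5, 0.1
print(adjust_weight(1.5, 2, previous=0.8, current=1.0, n=51))  # 1.3
```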
Results • The value of each weight reaches an equilibrium • It changes to reflect the learning of the program • Occasionally requires programmer intervention when it reaches a false equilibrium
Results • Learning by rote requires a large data set • Requires large amounts of memory • The stored data is also needed to determine the alpha value in temporal difference learning
Conclusions • A good way to find equilibrium weights • Sometimes requires intervention • Doesn't require much memory • Substantial learning could be achieved with relatively few runs • Learning did not require the program to know strategies, but it did require the program to play toward a win