The Implementation of Machine Learning in the Game of Checkers Billy Melicher Computer Systems lab 08 2008-2009
Abstract • Machine learning uses past information to predict future states • Can be used in any situation where the past will predict the future • Will adapt to situations
Introduction • Checkers is used to explore machine learning • Checkers has many tactical aspects that make it good for studying
Background • Minimax • Heuristics • Learning
Minimax • Method of adversarial search • Every board position can be given a fitness value (heuristic) • Each player chooses, from the moves available, the outcome that is best for them
Minimax • The search tree grows exponentially with depth • Can only look a limited number of moves (plies) into the future
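The depth-limited search described on the two Minimax slides can be sketched as follows. This is a minimal illustration on a toy game tree, not the author's checkers code: the names `TREE`, `VALUES`, `children`, and `heuristic` are hypothetical stand-ins for a real move generator and board evaluator.

```python
from typing import Callable, Iterable

# Hypothetical toy game tree: each state maps to its successor states.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
# Hypothetical heuristic values, from the maximizing player's point of view.
VALUES = {"root": 0, "a": 4, "b": 6, "a1": 3, "a2": 5, "b1": 2, "b2": 9}


def children(state: str) -> Iterable[str]:
    """Return the states reachable in one move (empty for terminal states)."""
    return TREE.get(state, [])


def heuristic(state: str) -> float:
    """Return the fitness value of a state."""
    return VALUES[state]


def minimax(state: str, depth: int, maximizing: bool,
            children: Callable[[str], Iterable[str]],
            heuristic: Callable[[str], float]) -> float:
    """Search `depth` plies ahead and return the backed-up heuristic value."""
    kids = list(children(state))
    if depth == 0 or not kids:
        # Ply limit reached or no moves left: fall back on the heuristic.
        return heuristic(state)
    values = [minimax(k, depth - 1, not maximizing, children, heuristic)
              for k in kids]
    # Each player picks the outcome that is best for them.
    return max(values) if maximizing else min(values)


shallow = minimax("root", 1, True, children, heuristic)  # 1-ply lookahead
deeper = minimax("root", 2, True, children, heuristic)   # 2-ply lookahead
```

In this toy tree the 1-ply search prefers branch `b`, while the 2-ply search sees that the opponent can answer `b` badly for us and backs up a lower value, which is why the ply limit matters.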
Heuristic • Heuristics predict the outcome of a board • Assign a fitness value to a board: the higher the value, the better the expected outcome • Not perfect • Requires expertise in the situation to create
Heuristics • H(s) = c_0 F_0(s) + c_1 F_1(s) + … + c_n F_n(s) • H(s) = heuristic value of board s • Has many different terms F_i(s), each weighted by a coefficient c_i • In checkers, terms could be: • Number of checkers • Number of kings • Number of checkers on an edge • How far checkers have advanced up the board
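As a rough illustration of the weighted sum on the Heuristics slide, the sketch below evaluates a board with the four example terms. Everything about the representation is an assumption made for this example: the board is a dict from `(row, col)` to a piece letter (`"b"`/`"w"` for men, `"B"`/`"W"` for kings), black advances toward higher rows, each feature is scored as black minus white, and the weights are arbitrary.

```python
from typing import Callable, Dict, List, Tuple

# Assumed board format: (row, col) -> "b", "B", "w", or "W".
Board = Dict[Tuple[int, int], str]


def piece_diff(board: Board) -> int:
    """Number of checkers (men and kings): black minus white."""
    return sum(1 if p.lower() == "b" else -1 for p in board.values())


def king_diff(board: Board) -> int:
    """Number of kings: black minus white."""
    return sum(1 if p == "B" else -1 for p in board.values() if p.isupper())


def edge_diff(board: Board) -> int:
    """Number of checkers on an edge square: black minus white."""
    total = 0
    for (row, col), p in board.items():
        if row in (0, 7) or col in (0, 7):
            total += 1 if p.lower() == "b" else -1
    return total


def advancement_diff(board: Board) -> int:
    """How far men have advanced (assumes black moves toward row 7)."""
    total = 0
    for (row, _col), p in board.items():
        if p == "b":
            total += row
        elif p == "w":
            total -= 7 - row
    return total


FEATURES: List[Callable[[Board], int]] = [
    piece_diff, king_diff, edge_diff, advancement_diff,
]


def heuristic(board: Board, weights: List[float]) -> float:
    """H(s) = c_0 F_0(s) + ... + c_n F_n(s)."""
    return sum(c * f(board) for c, f in zip(weights, FEATURES))


example_board: Board = {
    (0, 1): "b", (2, 3): "b", (3, 0): "B",  # black: two men, one king
    (7, 6): "w", (5, 2): "w",               # white: two men
}
example_weights = [1.0, 2.0, 0.5, 0.1]      # arbitrary illustrative weights
score = heuristic(example_board, example_weights)
```

Keeping the features as separate functions means a learning step only has to change the list of weights, not the evaluation code.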
Learning by Rote • Stores every game played • Connects the moves made for each board • Relates the moves made from a particular board to the outcome of the board • More likely to make moves that result in a win, less likely to make moves resulting in a loss • Good in end game, not as good in mid game
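One possible way to organize the rote-learning bookkeeping above is a table keyed by (board, move) that counts wins and losses. The sketch below is an assumption-laden illustration, not the author's design: the class name `RoteMemory`, the use of hashable board keys, and the smoothed win rate `(wins + 1) / (wins + losses + 2)` are all choices made here for the example.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


class RoteMemory:
    """Remembers how often each (board, move) pair led to a win or a loss."""

    def __init__(self) -> None:
        # (board, move) -> [wins, losses]
        self.stats = defaultdict(lambda: [0, 0])

    def record_game(self, history: Sequence[Tuple[Hashable, Hashable]],
                    won: bool) -> None:
        """Credit every move made in a finished game with its outcome."""
        for board, move in history:
            self.stats[(board, move)][0 if won else 1] += 1

    def weight(self, board: Hashable, move: Hashable) -> float:
        """Smoothed win rate; unseen moves get a neutral 0.5 (an assumption)."""
        wins, losses = self.stats.get((board, move), (0, 0))
        return (wins + 1) / (wins + losses + 2)

    def choose(self, board: Hashable, legal_moves: List[Hashable],
               rng: random.Random) -> Hashable:
        """Pick a move at random, favoring moves that have won before."""
        weights = [self.weight(board, m) for m in legal_moves]
        return rng.choices(legal_moves, weights=weights, k=1)[0]


memory = RoteMemory()
memory.record_game([("board0", "move1")], won=True)
memory.record_game([("board0", "move2")], won=False)
picked = memory.choose("board0", ["move1", "move2", "move3"], random.Random(0))
```

Because every distinct board needs its own entry, a table like this is only practical where positions repeat often, which fits the slide's remark that rote learning works better in the end game than in the mid game.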
Learning by Generalization • Uses a heuristic function to guide moves • Changes the heuristic function after games based on the outcome • Good in mid game but not as good in early and end games • Requires identifying the features that affect the game
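The slide does not say how the heuristic is changed after a game, so the following is only one plausible sketch: a simple error-driven update that nudges each weight of a linear heuristic toward the final outcome. The function name `update_weights`, the learning rate `lr`, and the encoding of the outcome as a number (for example +1 for a win, -1 for a loss) are assumptions made for illustration.

```python
from typing import List, Sequence


def update_weights(weights: Sequence[float],
                   feature_vectors: Sequence[Sequence[float]],
                   outcome: float,
                   lr: float = 0.01) -> List[float]:
    """Nudge linear-heuristic weights toward the game's outcome.

    `feature_vectors` holds the feature values F_i(s) of each board seen
    during the game; `outcome` is the numeric result of that game.
    """
    new_weights = list(weights)
    for features in feature_vectors:
        # Current prediction H(s) for this board.
        predicted = sum(w * x for w, x in zip(new_weights, features))
        error = outcome - predicted
        # Move each weight in proportion to its feature and the error.
        new_weights = [w + lr * error * x
                       for w, x in zip(new_weights, features)]
    return new_weights


# One board with two feature values, taken from a game that was won.
updated = update_weights([0.0, 0.0], [[1.0, 2.0]], outcome=1.0, lr=0.1)
```

Features that were large on boards leading to a win gain weight, which is why the choice of features matters so much for this style of learning.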
Development • Use of the minimax algorithm with alpha-beta pruning • Use of both learning by rote and learning by generalization • Temporal difference learning
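Alpha-beta pruning returns the same value as plain minimax while skipping branches that cannot change the result. The sketch below shows the standard form of the technique on a toy tree; `TREE`, `VALUES`, `children`, and `heuristic` are hypothetical stand-ins for a checkers move generator and evaluator, and the `evaluated` list exists only to show which leaves were visited.

```python
import math
from typing import Callable, Iterable, List

# Hypothetical toy game tree and leaf values.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
VALUES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

evaluated: List[str] = []  # records which states the heuristic was asked about


def children(state: str) -> Iterable[str]:
    return TREE.get(state, [])


def heuristic(state: str) -> float:
    evaluated.append(state)
    return VALUES[state]


def alphabeta(state: str, depth: int, alpha: float, beta: float,
              maximizing: bool,
              children: Callable[[str], Iterable[str]],
              heuristic: Callable[[str], float]) -> float:
    """Depth-limited minimax with alpha-beta cutoffs."""
    kids = list(children(state))
    if depth == 0 or not kids:
        return heuristic(state)
    if maximizing:
        best = -math.inf
        for kid in kids:
            best = max(best, alphabeta(kid, depth - 1, alpha, beta,
                                       False, children, heuristic))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player will never allow this branch
        return best
    best = math.inf
    for kid in kids:
        best = min(best, alphabeta(kid, depth - 1, alpha, beta,
                                   True, children, heuristic))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return best


value = alphabeta("root", 2, -math.inf, math.inf, True, children, heuristic)
```

In this tree, once `b1` shows that branch `b` is worth at most 2, the search stops without evaluating `b2`, because branch `a` already guarantees 3.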
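Temporal Difference Learning • In temporal difference learning, the heuristic is adjusted based on the difference between its value at one time and its value at a later time • Repeated updates move the estimate toward the ideal evaluation function • U(s) ← U(s) + α(R(s) + γU(s') − U(s)), where s' is the state that follows s, α is the learning rate, and γ is the discount factor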
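The update rule on this slide can be written out directly for a table of state values. This is a minimal tabular sketch, assuming states are hashable keys and unseen states start at 0.0; the function name `td_update` and the example numbers are illustrative, and a checkers program would more likely apply the same idea to heuristic weights than to a lookup table.

```python
from typing import Dict, Hashable


def td_update(U: Dict[Hashable, float], s: Hashable, s_next: Hashable,
              reward: float, alpha: float = 0.1, gamma: float = 1.0) -> float:
    """Apply U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s)) in place."""
    u_s = U.get(s, 0.0)           # assumption: unseen states are worth 0.0
    u_next = U.get(s_next, 0.0)
    U[s] = u_s + alpha * (reward + gamma * u_next - u_s)
    return U[s]


# The later state is already known to be good (1.0); the earlier one is not.
U = {"early": 0.0, "late": 1.0}
first = td_update(U, "early", "late", reward=0.0, alpha=0.5, gamma=1.0)
second = td_update(U, "early", "late", reward=0.0, alpha=0.5, gamma=1.0)
```

Each update closes half of the remaining gap between the two estimates here, so the value of the earlier state drifts toward the value of the state that follows it.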