Game Playing • Evolve a strategy for two-person zero-sum games. • Help the user to determine the next move. • Constructing a game tree • Each node represents a state in the game • Each arc represents a legal move • The minimax algorithm • Alpha-beta pruning
Example: Minimax Algorithm • Game Tree: [figure not reproduced; node values shown: 1 1 0 1 0 1 1] • We want to maximize player X's score. • A value of 1 indicates a win for player X and a loss for player O. • A value of 0 indicates a win for player O and a loss for player X.
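The minimax rule described above can be sketched in a few lines. This is a minimal illustration, not the tree from the slide: the nested-list tree and the names `minimax` and `best_move` are assumptions made for the example. Leaves hold the 1/0 win values for player X, X moves at the root, and the players alternate by level.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree given as nested lists with int leaves."""
    if isinstance(node, int):          # leaf: 1 = win for X, 0 = win for O
        return node
    values = [minimax(child, not maximizing) for child in node]
    # X picks the largest value, O picks the smallest.
    return max(values) if maximizing else min(values)


def best_move(node):
    """Index of the child that X should choose at the root."""
    values = [minimax(child, maximizing=False) for child in node]
    return values.index(max(values))


# Hypothetical two-level tree: X moves first, then O replies.
tree = [[1, 0], [1, 1]]
print(minimax(tree), best_move(tree))
```

In this assumed tree the left move lets O force a 0, so X prefers the right move, whose value is 1.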
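Heuristics • It is usually not viable to generate the entire game tree. • Use heuristics to score non-terminal states instead. • Example: Tic-Tac-Toe • Heuristic value: number of possible wins for X minus number of possible wins for O. • Examples (board figures not reproduced): 8 – 5 = 3 and 4 – 5 = -1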
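The Tic-Tac-Toe heuristic can be sketched as a count of "open" lines. The board encoding (a 9-character string, `.` for an empty cell) and the function names are assumptions for this example. The two boards tested below are positions that happen to give the same arithmetic as the slide (8 – 5 = 3 and 4 – 5 = -1); whether they are the boards originally pictured is not known.

```python
# The 8 winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def open_lines(board, player):
    """Count lines that contain no opponent mark, i.e. still winnable by player."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))


def heuristic(board):
    """Possible wins for X minus possible wins for O."""
    return open_lines(board, "X") - open_lines(board, "O")


print(heuristic("X........"))   # X in a corner, no O yet
print(heuristic("X...O...."))   # X in a corner, O in the centre
```

A minimax search that stops at a fixed depth would apply `heuristic` to the states at that depth in place of true win/loss values.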
Example: Minimax Algorithm • [Game tree figure not reproduced; node values shown: 16 16 8 32 16 8 24]
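Alpha-beta pruning, listed in the outline, returns the same value as plain minimax while skipping branches that cannot change the result. The sketch below is a generic version under assumed conventions (nested-list tree, numeric leaves, maximizing player at the root); the small tree is hypothetical and is not a reconstruction of the slide's figure. The optional `visited` list is only there to show which leaves were actually examined.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list tree, pruning with the alpha/beta window."""
    if isinstance(node, (int, float)):      # leaf payoff
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:               # the minimizer above will avoid this node
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:                   # the maximizer above already has better
            break
    return value


tree = [[32, 16], [8, 24]]                  # hypothetical payoffs
seen = []
print(alphabeta(tree, visited=seen), seen)
```

Here the first subtree guarantees 16 for the maximizer, so once the second subtree reveals an 8 its remaining leaf (24) is never examined.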
Operators • Terminals – Legal moves, i.e. left and right • Functions: CXM1, CXM2, COM1, COM2 • XM1: first move made by player X • XM2: second move made by player X • OM1: first move made by player O • OM2: second move made by player O
Fitness Cases • The fitness cases consist of the possible combinations of L and R for the moves that O can make. • Format: XM1, OM1, XM2, OM2 • Examples: LLLL, LRRR, LLLR, LRRL
Evaluation • The raw fitness of an individual is the sum of the payoffs for each fitness case. • The hits ratio is the number of fitness cases for which the individual receives a payoff at least as good as the minimax strategy. • What are the raw fitness and hits ratio of the following individuals? [Individuals not reproduced.]
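The evaluation above can be sketched for a four-move game (XM1, OM1, XM2, OM2). Everything concrete here is an assumption made for illustration: the `PAYOFFS` table is invented, an X strategy is modelled as a plain function of the move history rather than an evolved program tree, and "at least as good as the minimax strategy" is read as "payoff at least the minimax value of the game".

```python
from itertools import product

# Hypothetical payoffs to X for the 16 move sequences, indexed with L=0, R=1.
PAYOFFS = [16, 8, 32, 16, 8, 24, 16, 4, 12, 20, 6, 10, 28, 2, 14, 18]


def payoff(moves):
    """Payoff to X for a complete history such as 'LRRL'."""
    return PAYOFFS[int("".join("0" if m == "L" else "1" for m in moves), 2)]


def minimax_value(history=""):
    """Value of the game: X maximizes on even plies, O minimizes on odd plies."""
    if len(history) == 4:
        return payoff(history)
    values = [minimax_value(history + m) for m in "LR"]
    return max(values) if len(history) % 2 == 0 else min(values)


def play(strategy, o_moves):
    """Play one fitness case: X follows strategy, O follows the fixed o_moves."""
    history = ""
    for o_move in o_moves:
        history += strategy(history)   # X moves, seeing the history so far
        history += o_move              # O replies with its scripted move
    return payoff(history)


def evaluate(strategy):
    """Return (raw fitness, hits) over all combinations of O's two moves."""
    target = minimax_value()
    payoffs = [play(strategy, o) for o in product("LR", repeat=2)]
    return sum(payoffs), sum(1 for p in payoffs if p >= target)


def always_left(history):
    return "L"


def minimax_player(history):
    """Minimax strategy for the hypothetical table above."""
    if history == "":
        return "R"
    return "L" if history[1] == "L" else "R"


print(evaluate(always_left), evaluate(minimax_player))
```

With this invented table the minimax value is 12; the always-left strategy scores well on some cases but falls short of 12 on others, while the minimax player reaches at least 12 on every case.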
GP Parameters • Population size: 500 • Max. no. of generations: 51 • Initial population generation: the ramped half-and-half method with an initial tree depth of six, and a depth limit of seventeen on the size of trees created by the genetic operators. • Method of selection: fitness proportionate selection
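Fitness proportionate (roulette-wheel) selection can be sketched as follows. The function name and the assumption that fitness values are non-negative with larger meaning better are choices made for this example; when raw fitness is a quantity to be minimized, it would first need to be converted to such a score.

```python
import random


def select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:                      # degenerate case: fall back to uniform choice
        return rng.choice(population)
    pick = rng.random() * total         # a point in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]               # guard against floating-point round-off


rng = random.Random(0)
counts = {"a": 0, "b": 0}
for _ in range(10000):
    counts[select(["a", "b"], [1.0, 3.0], rng)] += 1
print(counts)                           # "b" should be chosen roughly three times as often
```

The standard library's `random.choices(population, weights=fitnesses)` performs the same kind of weighted draw; the explicit loop is shown only to make the mechanism visible.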
Game Parameters • The payoff for the pursuer is the time it takes to catch the evader. • The payoff for the evader is the time it remains free. • The information available at each stage of the game is the position of the pursuer and the evader. • A game-playing strategy will specify the angle at which the pursuer must move in order to catch the evader.
Terminals and Functions • T = { X, Y, R } • X – x-coordinate of the position of the evader • Y – y-coordinate of the position of the evader • R – ephemeral constant in the range [-1, 1] • F = { +, -, /, EXP, IFLTZ } • EXP – the exponential function • IFLTZ – evaluates its first argument if its second argument is less than zero, else it evaluates its third argument
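An individual built from this terminal and function set can be evaluated with a small recursive interpreter. The sketch below makes several assumptions that are not in the slides: programs are nested tuples, division by zero returns 1.0 (a common "protected division" convention), and the argument of EXP is clamped to avoid overflow. IFLTZ follows the argument order stated above and evaluates only the branch it selects.

```python
import math


def evaluate(expr, env):
    """Evaluate a program tree; env maps the terminals 'X' and 'Y' to numbers."""
    if isinstance(expr, (int, float)):          # ephemeral constant R
        return float(expr)
    if isinstance(expr, str):                   # terminal X or Y
        return env[expr]
    op, *args = expr
    if op == "IFLTZ":
        first, second, third = args
        # As stated in the slide: test the second argument, then pick first or third.
        if evaluate(second, env) < 0:
            return evaluate(first, env)
        return evaluate(third, env)
    if op == "EXP":
        return math.exp(min(evaluate(args[0], env), 50.0))   # clamp: assumption
    a, b = (evaluate(arg, env) for arg in args)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "/":
        return a / b if b != 0 else 1.0         # protected division: assumption
    raise ValueError(f"unknown function: {op}")


program = ("IFLTZ", ("+", "X", 0.5), "Y", ("/", "X", "Y"))   # hypothetical individual
print(evaluate(program, {"X": 3.0, "Y": -1.0}))
```

The returned number would then be interpreted as the pursuer's movement angle.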
Evaluation • The fitness cases consist of 20 different positions of the evader on the plane, i.e. a set of (X, Y) coordinate values. • The raw fitness of an individual is the average time required to catch the evader over the 20 fitness cases. • An upper limit is set on the maximum time permitted. The hits ratio is the number of fitness cases for which this time limit is not exceeded.
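The evaluation of a pursuer strategy can be sketched as a simple time-stepped simulation. None of the numeric settings come from the slides: the speeds, time step, capture radius, time limit, the range of starting positions, and the rule that the evader always flees directly away from the pursuer are all assumptions chosen for illustration. Positions are taken relative to the pursuer, and a strategy is any function mapping the evader's (x, y) to a movement angle in radians.

```python
import math
import random


def simulate(strategy, x, y, pursuer_speed=1.0, evader_speed=0.67,
             dt=0.05, capture_radius=0.5, max_time=100.0):
    """Return the time to capture, or max_time if the evader is never caught."""
    t = 0.0
    while t < max_time:
        if math.hypot(x, y) <= capture_radius:
            return t
        angle = strategy(x, y)                   # pursuer's chosen heading
        away = math.atan2(y, x)                  # evader flees straight away (assumption)
        x += (evader_speed * math.cos(away) - pursuer_speed * math.cos(angle)) * dt
        y += (evader_speed * math.sin(away) - pursuer_speed * math.sin(angle)) * dt
        t += dt
    return max_time


def evaluate_pursuer(strategy, cases, max_time=100.0):
    """Return (average capture time, hits) over the fitness cases."""
    times = [simulate(strategy, x, y, max_time=max_time) for x, y in cases]
    hits = sum(1 for t in times if t < max_time)
    return sum(times) / len(times), hits


def chase(x, y):
    """Head straight at the evader."""
    return math.atan2(y, x)


rng = random.Random(0)
cases = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(20)]
print(evaluate_pursuer(chase, cases))
```

Because lower raw fitness is better here, a strategy that never catches the evader is charged the full time limit on every case and scores no hits.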