Explore the Minimax algorithm for optimal decision-making in two-player zero-sum games. Learn how to set up game trees, evaluate moves, and determine the best strategy against an infallible opponent.
Adversarial Search • We'll set up a framework for formulating a multi-person game as a search problem. • We will consider games in which the players alternately make moves and try, respectively, to maximize and minimize a scoring function (also called a utility function). • To simplify things a bit, we will only consider games with the following two properties: • Two player - we do not deal with coalitions, etc. • Zero sum - one player's win is the other's loss; there are no cooperative victories
Game Trees • Such a category of games can be represented as a tree, where • the nodes represent the current state of the game, and • the arcs represent the moves. • The game tree consists of all possible moves for the current player starting at the root, all possible moves for the next player as the children of these nodes, and so forth. • Each individual move by one player is called a "ply". • The leaves of the game tree represent terminal positions, i.e. positions where the outcome of the game is clear (a win, a loss, a draw, a payoff). • Each terminal position has a score. • High scores are good for one of the players, called the MAX player. The other player, called the MIN player, tries to minimize the score. • For example, we may associate +1 with a win, 0 with a draw and -1 with a loss for MAX.
Example : Game of Tic-Tac-Toe • Opposite is a section of a game tree. • Each node represents a board position, and the children of each node are the legal moves from that position. • To score each position, give each position that is favorable for player 1 a positive number (the more positive, the more favorable). • Similarly, give each position that is favorable for player 2 a negative number (the more negative, the more favorable). • Player 1 is 'X', player 2 is 'O', and the only three scores we will use are +1 for a win by 'X', -1 for a win by 'O', and 0 for a draw. • Note here that the blue scores are the only ones that can be computed by looking at the current position.
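As a concrete illustration, a terminal-scoring function for tic-tac-toe might look like the sketch below. The board encoding (a flat list of nine cells holding 'X', 'O' or None) is an assumption made for this example, not something fixed by the slides:

    # Assumed encoding: board is a flat list of 9 cells holding 'X', 'O' or None.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def terminal_score(board):
        """Score a finished position from MAX's ('X') point of view."""
        for a, b, c in LINES:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return +1 if board[a] == 'X' else -1   # win for X / win for O
        if all(cell is not None for cell in board):
            return 0                                   # draw
        return None                                    # game not over yet

Only finished positions get a score here; interior nodes get their values by backing scores up the tree, as the minimax procedure below describes.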
How to Take Optimal Decisions • Formulate the game as a search problem: • Initial state: the board position plus an indication of whose turn it is to move. • Successor function: returns a list of (move, state) pairs describing the legal moves and the resulting states. • Terminal test: is the game over? • Utility function (objective function): provides a numeric value for each terminal state (-1, 0, +1).
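A minimal sketch of this formulation in Python; the class and method names are illustrative assumptions rather than a fixed API:

    # Hypothetical skeleton of the search-problem formulation described above.
    class Game:
        def initial_state(self):
            """Return the starting board position plus whose turn it is to move."""
            raise NotImplementedError

        def successors(self, state):
            """Return a list of (move, resulting_state) pairs for all legal moves."""
            raise NotImplementedError

        def is_terminal(self, state):
            """Return True if the game is over in this state."""
            raise NotImplementedError

        def utility(self, state):
            """Return the numeric value of a terminal state (-1, 0 or +1 for MAX)."""
            raise NotImplementedError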
Minimax strategy • Find the optimal strategy for MAX assuming an infallible MIN opponent. • We need to compute this all the way down the tree. • Assumption: both players play optimally! • Given a game tree, the optimal strategy can be determined by using the minimax value of each node.
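Written out (in the standard textbook notation these slides follow), the minimax value of a node n is:

    \mathrm{MINIMAX}(n) =
    \begin{cases}
      \mathrm{UTILITY}(n) & \text{if } n \text{ is a terminal node} \\
      \max_{s \in \mathrm{Succ}(n)} \mathrm{MINIMAX}(s) & \text{if } n \text{ is a MAX node} \\
      \min_{s \in \mathrm{Succ}(n)} \mathrm{MINIMAX}(s) & \text{if } n \text{ is a MIN node}
    \end{cases}

At the root, MAX chooses a move leading to a successor of maximal minimax value; MIN is assumed to reply with a move of minimal minimax value.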
Two-Ply Game Tree • The figure shows a two-ply game tree. Minimax maximizes the utility for the worst-case outcome for MAX; the minimax decision is the root move with the highest backed-up value.
Minimax • Perfect play for deterministic games • Idea: choose move to position with highest minimax value = best achievable payoff against best play
What if MIN does not play optimally? • The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX. • But if MIN does not play optimally, MAX will do at least as well, and possibly better. • This can be proved (Problem 6.2).
Minimax Algorithm For each move by the computer: 1. Perform a depth-first search until a terminal state is reached. 2. Assign utilities to each terminal state. 3. Propagate the minimax choices upwards: if the parent is a minimizer (opponent), propagate up the minimum value of the children; if the parent is a maximizer (computer), propagate up the maximum value of the children. 4. Choose the move (the child of the current node) corresponding to the maximum of the minimax values of the children.
Algorithm – Another Version

    minimax(player, board)
        if game over in current board position
            return the score of the board
        children = all legal moves for player from this board
        if it is MAX's turn
            return the maximal score of calling minimax on all the children
        else (MIN's turn)
            return the minimal score of calling minimax on all the children

• If the game is over in the given position, then there is nothing to compute; minimax simply returns the score of the board. • Otherwise, minimax goes through each possible child and (by recursively calling itself) evaluates each possible move. • Then the best possible move is chosen, where 'best' means the move leading to the board with the most positive score for player 1, and to the board with the most negative score for player 2.
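A runnable Python version of this pseudocode might look as follows. The helpers game_over, score, legal_moves and apply_move are assumed to be supplied by the specific game; their names are placeholders, not part of the slides:

    # game_over, score, legal_moves and apply_move are assumed game-specific helpers.
    def minimax(player, board):
        """Return the minimax score of 'board' with 'player' ('max' or 'min') to move."""
        if game_over(board):
            return score(board)                 # +1, 0 or -1 from MAX's point of view
        children = [apply_move(board, m, player) for m in legal_moves(board, player)]
        if player == 'max':
            return max(minimax('min', child) for child in children)
        return min(minimax('max', child) for child in children)

    def best_move(board):
        """Step 4 above: the computer (MAX) picks the child with the largest minimax value."""
        return max(legal_moves(board, 'max'),
                   key=lambda m: minimax('min', apply_move(board, m, 'max')))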
How long does this algorithm take? • For a simple game like tic-tac-toe, not too long - it is certainly possible to search all possible positions. • For a game like chess, however, the running time is prohibitively expensive. • To completely search such games, we'd need to develop interstellar travel, as by the time we finish analyzing a move the sun will have gone nova and the earth will no longer exist. • Therefore, all real computer games search not to the end of the game, but only a few moves ahead. • The program must determine whether a certain board position is 'good' or 'bad' for a certain player. • This is often done using an evaluation function. • This function is the key to a strong computer game. • The depth-bounded search may stop just as things get interesting... • The search may also tend to postpone bad news until after the depth bound, leading to the horizon effect.
Complexity • Time complexity: O(b^m) • Space complexity: O(bm) • where b is the branching factor and m is the maximum depth of the tree
Practical problem with minimax search • The number of game states is exponential in the number of moves. • Solution: do not examine every node => pruning. • Remove branches that do not influence the final decision. • Revisit the example … (do it…)
Alpha-Beta Pruning • ALPHA-BETA pruning is a method that reduces the number of nodes explored by the Minimax strategy. • It reduces the time required for the search: no time is wasted searching moves that are obviously bad for the current player. • The exact implementation of alpha-beta keeps track of the best move for each side as the search moves through the tree. • Use pruning to eliminate parts of the tree from consideration: • α – the value of the best choice found so far along the path for MAX • β – the value of the best choice found so far along the path for MIN
Alpha-Beta Pruning • We proceed in the same (preorder) way as for the minimax algorithm. • Branches which cannot possibly influence the final decision are pruned away. • For MIN nodes, the score computed starts at +infinity and decreases over time. • For MAX nodes, the score computed starts at -infinity and increases over time. • The efficiency of the alpha-beta procedure depends on the order in which the successors of a node are examined. Rules for Alpha-Beta Pruning • Alpha pruning: search can be stopped below any MIN node having a beta value less than or equal to the alpha value of any of its MAX ancestors. • Beta pruning: search can be stopped below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN ancestors.
Alpha-Beta Example (figure sequence) • Do a depth-first search until the first leaf is reached; initially the range of possible values at every node is [-∞,+∞]. • After the first leaf below the left MIN node is seen, that node's range narrows to [-∞,3]; once all of its children have been examined its value becomes [3,3] and the root's range becomes [3,+∞]. • The first leaf below the middle MIN node gives it the range [-∞,2]; since 2 ≤ 3, this node is already worse for MAX, so its remaining children are pruned. • The right MIN node's range narrows from [-∞,14] to [-∞,5] and finally to [2,2], at which point the root's value settles at [3,3]: MAX chooses the left branch with minimax value 3.
Alpha-Beta algorithm • The algorithm maintains two values, alpha and beta, which represent the minimum score that the maximizing player is assured of and the maximum score that the minimizing player is assured of, respectively. Initially alpha is negative infinity and beta is positive infinity. • When beta becomes less than or equal to alpha, the current position cannot be the result of best play by both players and hence need not be explored further. Pseudocode for the alpha-beta algorithm is given below:
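A sketch of the standard algorithm in Python, reusing the hypothetical game_over, score, legal_moves and apply_move helpers assumed in the minimax example above:

    # Helpers game_over, score, legal_moves, apply_move are assumed game-specific placeholders.
    def alphabeta(player, board, alpha=float('-inf'), beta=float('inf')):
        """Minimax with alpha-beta pruning; returns the backed-up score of 'board'."""
        if game_over(board):
            return score(board)
        if player == 'max':
            value = float('-inf')
            for move in legal_moves(board, player):
                value = max(value, alphabeta('min', apply_move(board, move, player), alpha, beta))
                alpha = max(alpha, value)
                if alpha >= beta:        # beta cut-off: a MIN ancestor already has a better option
                    break
            return value
        value = float('inf')
        for move in legal_moves(board, player):
            value = min(value, alphabeta('max', apply_move(board, move, player), alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:            # alpha cut-off: a MAX ancestor already has a better option
                break
        return value

The top-level call would be alphabeta('max', board); the two cut-off tests implement exactly the alpha-pruning and beta-pruning rules stated earlier.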
Effectiveness of Alpha-Beta Search • Worst case: branches are ordered so that no pruning takes place; in this case alpha-beta gives no improvement over exhaustive search. • Best case: each player's best move is the left-most alternative (i.e., evaluated first). • In practice, performance is closer to the best case than to the worst case. • In practice we often get O(b^(d/2)) rather than O(b^d); this is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2).
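One common way to push the search toward the best case is to order moves with a cheap heuristic before recursing; a sketch (order_key stands for an assumed game-specific static evaluation, not something defined in these slides):

    def ordered_moves(player, board):
        """Sort legal moves so likely-best moves are searched first, making cut-offs happen early."""
        # order_key is an assumed cheap static evaluation of the resulting position.
        return sorted(legal_moves(board, player),
                      key=lambda m: order_key(apply_move(board, m, player)),
                      reverse=(player == 'max'))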
Alpha Beta Pruning - Illustrated Source: Wikipedia.org
Alpha-Beta Pruning -- Example Example taken from: http://www.cs.pitt.edu/~milos/
Evaluation functions • We can terminate the search early by using an evaluation function, taking the best decision available at that time. • It uses features of the position to estimate the expected utility from that point onward (a heuristic). • We also need to decide when to terminate: use depth-limited search, or iterative deepening with a time limit.
Do it! Which nodes can be pruned? • The leaf values in the figure, read from left to right, are: 6 5 3 4 1 2 7 8
Practical Implementation • How do we make these ideas practical in real game trees? • Standard approach: • Cutoff test (where do we stop descending the tree): a depth limit; better, iterative deepening; better still, cut off only when no big changes are expected to occur next (quiescence search). • Evaluation function: when the search is cut off, we evaluate the current state by estimating its utility. This estimate is captured by the evaluation function.
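A sketch of how the cutoff test and evaluation function plug into the search, using a fixed ply limit; evaluate is a stand-in for a game-specific evaluation function and, like the other helpers, an assumption of this sketch:

    # evaluate, game_over, score, legal_moves, apply_move are assumed game-specific placeholders.
    def depth_limited_value(player, board, depth, limit):
        """Minimax value, but stop descending at 'limit' plies and fall back to an evaluation."""
        if game_over(board):
            return score(board)
        if depth >= limit:              # cutoff test: stop descending the tree here
            return evaluate(board)      # heuristic estimate of the utility
        children = [apply_move(board, m, player) for m in legal_moves(board, player)]
        if player == 'max':
            return max(depth_limited_value('min', c, depth + 1, limit) for c in children)
        return min(depth_limited_value('max', c, depth + 1, limit) for c in children)

Iterative deepening simply calls this with limit = 1, 2, 3, … until the time budget runs out, keeping the move from the deepest completed search.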
Tic-Tac-Toe tree at horizon = 2 • The figure shows a two-ply lookahead from the opening position. Each leaf is scored with the evaluation (number of lines still open for X) - (number of lines still open for O), e.g. 6-5 = 1, 5-5 = 0, 4-6 = -2. Backing these scores up gives backed-up values of 1, -1 and -2 at the first level, so the best move has value 1.
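The evaluation used in this figure counts lines still open for X minus lines still open for O. A sketch of that function, reusing the LINES table and board encoding assumed in the earlier tic-tac-toe example:

    def open_lines_eval(board):
        """Eval(s) = (# lines with no 'O') - (# lines with no 'X'), from X's point of view."""
        # LINES and the flat 9-cell board encoding come from the earlier (assumed) sketch.
        open_for_x = sum(1 for a, b, c in LINES
                         if 'O' not in (board[a], board[b], board[c]))
        open_for_o = sum(1 for a, b, c in LINES
                         if 'X' not in (board[a], board[b], board[c]))
        return open_for_x - open_for_o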
Summary • Game playing can be effectively modeled as a search problem. • Game trees represent alternate computer/opponent moves. • Evaluation functions estimate the quality of a given board configuration for the MAX player. • Minimax is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them. • Alpha-beta is a procedure which can prune large parts of the search tree and allow search to go deeper. • For many well-known games, computer algorithms based on heuristic search match or outperform human world experts.