
Adversarial Search Lecture # 10 & 11

Explore the Minimax algorithm for optimal decision-making in two-player zero-sum games. Learn how to set up game trees, evaluate moves, and determine the best strategy against an infallible opponent.



  1. Adversarial Search (Lecture # 10 & 11)

  2. Adversarial Search • We’ll set up a framework for formulating a multi-player game as a search problem. • We will consider games in which the players alternately make moves, trying respectively to maximize and to minimize a scoring function (also called a utility function). • To simplify things a bit, we will only consider games with the following two properties: • Two-player – we do not deal with coalitions, etc. • Zero-sum – one player's win is the other's loss; there are no cooperative victories.

  3. Game Trees • Such games can be represented as a tree, where • the nodes represent the current state of the game, and • the arcs represent the moves. • The game tree consists of all possible moves for the current player starting at the root, then all possible moves for the next player as the children of those nodes, and so forth. • Each individual move by one player is called a "ply". • The leaves of the game tree represent terminal positions: those where the outcome of the game is clear (a win, a loss, a draw, a payoff). • Each terminal position has a score. • High scores are good for one of the players, called the MAX player. The other player, called the MIN player, tries to minimize the score. • For example, we may associate +1 with a win, 0 with a draw, and -1 with a loss for MAX.

  4. Game tree (2-player, deterministic, turns)

  5. Example: Game of Tic-Tac-Toe • Opposite is a section of a game tree. • Each node represents a board position, and the children of each node are the legal moves from that position. • To score each position, give each position that is favorable for player 1 a positive number (the more positive, the more favorable). • Similarly, give each position that is favorable for player 2 a negative number (the more negative, the more favorable). • Player 1 is 'X', player 2 is 'O', and the only three scores we will have are +1 for a win by 'X', -1 for a win by 'O', and 0 for a draw. • Note here that the blue scores are the only ones that can be computed by looking at the current position.

  6. How to Take Optimal Decisions • Formulate as a search problem: – Initial state: the board position, plus identification of whose turn it is to move. – Successor function: returns a list of (move, state) pairs describing the legal moves and the resulting states. – Terminal test: is the game over? – Utility function (objective function): provides a numeric value for each terminal state (-1, 0, +1).
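As an illustration, the four components above might look like this for tic-tac-toe. This is a minimal sketch: the board encoding (a 9-character string of 'X', 'O', and ' ') and all function names are my own, not from the slides.

```python
# Illustrative sketch of the search-problem formulation for tic-tac-toe.
# Board: 9-character string, indices 0..8 left-to-right, top-to-bottom.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(board):
    """The game is over when someone has won or the board is full."""
    return winner(board) is not None or ' ' not in board

def utility(board):
    """+1 for a win by 'X' (MAX), -1 for a win by 'O' (MIN), 0 for a draw."""
    return {'X': 1, 'O': -1, None: 0}[winner(board)]

def successors(board, player):
    """Return (move, state) pairs for each legal move by `player`."""
    return [(i, board[:i] + player + board[i + 1:])
            for i, cell in enumerate(board) if cell == ' ']
```

For instance, `successors(' ' * 9, 'X')` returns nine pairs, one per empty square.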

  7. CONTD…

  8. Minimax Strategy • Find the optimal strategy for MAX, assuming an infallible MIN opponent. • Need to compute this all the way down the tree. • Assumption: both players play optimally! • Given a game tree, the optimal strategy can be determined by using the minimax value of each node.
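The minimax value of a node can be written recursively (this is the standard textbook definition, stated here for reference):

```latex
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } s \text{ is a terminal state} \\
\max_{a} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if it is MAX's turn to move in } s \\
\min_{a} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if it is MIN's turn to move in } s
\end{cases}
```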

  9. Two-Ply Game Tree

  10. Two-Ply Game Tree

  11. Two-Ply Game Tree

  12. Two-Ply Game Tree • Minimax maximizes the utility under the worst-case outcome for MAX. • The minimax decision is the root action leading to the child with the highest minimax value.

  13. Minimax • Perfect play for deterministic games. • Idea: choose the move to the position with the highest minimax value = the best achievable payoff against best play.

  14. What if MIN does not play optimally? • The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX. • But if MIN does not play optimally, MAX will do at least as well, and possibly better. • Can prove this (Problem 6.2).

  15. Minimax Algorithm For each move by the computer: 1. Perform depth-first search until a terminal state is reached. 2. Assign utilities at each terminal state. 3. Propagate the minimax choices upwards: if the parent is a minimizer (opponent), propagate up the minimum value of the children; if the parent is a maximizer (computer), propagate up the maximum value of the children. 4. Choose the move (the child of the current node) corresponding to the maximum of the minimax values of the children.

  16. Algorithm – Another Version

  minimax(player, board):
      if the game is over in the current board position:
          return the score of the board
      children = all legal moves for player from this board
      if it is MAX's turn:
          return the maximal score of calling minimax on all the children
      else (MIN's turn):
          return the minimal score of calling minimax on all the children

  • If the game is over in the given position, there is nothing to compute; minimax simply returns the score of the board. • Otherwise, minimax goes through each possible child and (by recursively calling itself) evaluates each possible move. • Then the best possible move is chosen, where ‘best’ is the move leading to the board with the most positive score for player 1, and the board with the most negative score for player 2.
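The pseudocode above can be made runnable for tic-tac-toe. This is a sketch under the same board encoding as before (a 9-character string with 'X' as MAX); the helper names are illustrative, not from the slides.

```python
# Runnable version of the minimax pseudocode, for tic-tac-toe.
# Board: 9-character string of 'X', 'O', ' '; 'X' is the MAX player.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def score(board):
    """+1 if X has three in a line, -1 if O has, 0 otherwise."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return 1 if board[a] == 'X' else -1
    return 0

def minimax(player, board):
    """Return the minimax value of `board` with `player` to move."""
    if score(board) != 0 or ' ' not in board:   # game over in this position
        return score(board)
    children = [board[:i] + player + board[i + 1:]
                for i, cell in enumerate(board) if cell == ' ']
    other = 'O' if player == 'X' else 'X'
    values = [minimax(other, child) for child in children]
    # MAX takes the maximal child score, MIN the minimal one
    return max(values) if player == 'X' else min(values)
```

Running `minimax('X', ' ' * 9)` on the empty board returns 0, reflecting the well-known fact that tic-tac-toe is a draw under perfect play.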

  17. How long does this algorithm take? • For a simple game like tic-tac-toe, not too long - it is certainly possible to search all possible positions. • For a game like chess, however, the running time is prohibitively expensive. • To completely search such games, we’d need to develop interstellar travel, as by the time we finish analyzing a move the sun will have gone nova and the earth will no longer exist. • Therefore, all real computer games search not to the end of the game, but only a few moves ahead. • The program must then determine whether a given board position is 'good' or 'bad' for a particular player. • This is often done using an evaluation function. • This function is the key to a strong computer game. • The depth-bounded search may stop just as things get interesting... • The search may also tend to postpone bad news until after the depth bound, leading to the horizon effect.

  18. Complexity • Time complexity: O(b^m) • Space complexity: O(bm) • where b is the branching factor and m is the maximum depth of the tree

  19. EXAMPLE

  20. CONTD…

  21. CONTD…

  22. CONTD…

  23. Practical Problem with Minimax Search • The number of game states is exponential in the number of moves. • Solution: do not examine every node => pruning. • Remove branches that do not influence the final decision. • Revisit the example … (do it …)

  24. Alpha-Beta Pruning • Alpha-beta pruning is a method that reduces the number of nodes explored by the minimax strategy. • It reduces the time required for the search: the search is restricted so that no time is wasted examining moves that are obviously bad for the current player. • The exact implementation of alpha-beta keeps track of the best move for each side as it moves through the tree. • Use pruning to eliminate parts of the tree from consideration: • α – the value of the best choice found so far along the path for MAX • β – the value of the best choice found so far along the path for MIN

  25. Alpha-Beta Pruning • We proceed in the same (depth-first, preorder) way as for the minimax algorithm, • but prune away branches that cannot possibly influence the final decision. • For MIN nodes, the score computed starts at +infinity and decreases over time. • For MAX nodes, the score computed starts at -infinity and increases over time. • The efficiency of the alpha-beta procedure depends on the order in which the successors of a node are examined. Rules for Alpha-Beta Pruning • Alpha pruning: search can be stopped below any MIN node having a beta value less than or equal to the alpha value of any of its MAX ancestors. • Beta pruning: search can be stopped below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN ancestors.

  26. Alpha-Beta Example • Do depth-first search until the first leaf. • Range of possible values: [-∞,+∞] [-∞,+∞]

  27. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]

  28. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]

  29. Alpha-Beta Example (continued) [3,+∞] [3,3]

  30. Alpha-Beta Example (continued) [3,+∞] This node is worse for MAX [3,3] [-∞,2]

  31. Alpha-Beta Example (continued) , [3,14] [3,3] [-∞,2] [-∞,14]

  32. Alpha-Beta Example (continued) , [3,5] [3,3] [−∞,2] [-∞,5]

  33. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [−∞,2]

  34. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [-∞,2]

  35. Alpha-Beta Algorithm • The algorithm maintains two values, alpha and beta, which represent the minimum score that the maximizing player is assured of and the maximum score that the minimizing player is assured of, respectively. Initially, alpha is negative infinity and beta is positive infinity. • When beta becomes less than or equal to alpha, the current position cannot be the result of best play by both players and hence need not be explored further.
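The pseudocode the slide refers to did not survive the transcript. The sketch below is one standard formulation, applied to an explicit game tree given as nested lists (numbers are leaf utilities); the function name, the tree encoding, and the node counter are illustrative. The example tree matches the bracketed values in the two-ply example slides above.

```python
import math

def alphabeta(node, alpha, beta, maximizing, counter):
    """Minimax value of `node` with alpha-beta pruning.
    `node` is a number (leaf utility) or a list of child nodes.
    `counter[0]` is incremented per visited node, to show the savings."""
    counter[0] += 1
    if not isinstance(node, list):            # leaf: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if beta <= alpha:                 # cutoff: MIN will avoid this node
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, counter))
            beta = min(beta, value)
            if beta <= alpha:                 # cutoff: MAX will avoid this node
                break
        return value

# Two-ply tree: MAX root, three MIN nodes with three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
count = [0]
value = alphabeta(tree, -math.inf, math.inf, True, count)
print(value, count[0])   # minimax value 3; 11 of the 13 nodes visited
```

Exhaustive minimax would visit all 13 nodes (1 root + 3 MIN nodes + 9 leaves); here the cutoff skips the leaves 4 and 6, since once MIN can force a value of 2 in the second subtree, MAX (already assured of 3) will never enter it.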

  36. Effectiveness of Alpha-Beta Search • Worst case: branches are ordered so that no pruning takes place; alpha-beta gives no improvement over exhaustive search. • Best case: each player’s best move is the left-most alternative (i.e., evaluated first). • In practice, performance is closer to the best case than the worst case. • In practice we often get O(b^(d/2)) rather than O(b^d); this is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2).

  37. Alpha Beta Pruning - Illustrated Source: Wikipedia.org

  38. Alpha-Beta Pruning -- Example

  39. Alpha-Beta Pruning -- Example

  40. Alpha-Beta Pruning -- Example

  41. Alpha-Beta Pruning -- Example

  42. Alpha-Beta Pruning -- Example

  43. Alpha-Beta Pruning -- Example

  44. Alpha-Beta Pruning -- Example Example taken from: http://www.cs.pitt.edu/~milos/

  45. Evaluation Functions • We can terminate the search early using an evaluation function, returning the best decision found at that time. • Use features to estimate the expected utility from a given point (a heuristic). • Need to decide when to terminate: use depth-limited search, or iterative deepening with a time limit.
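Putting the two ideas together, a depth-limited minimax calls the evaluation function whenever the depth bound is reached before a terminal state. This sketch keeps the game-specific parts as callback parameters; all names here are assumptions for illustration, not from the slides.

```python
def depth_limited_minimax(state, depth, maximizing, *,
                          terminal, utility, successors, evaluate):
    """Minimax cut off at `depth` plies below `state`.
    terminal/utility/successors define the game; `evaluate` is the
    heuristic used for non-terminal states at the cutoff."""
    if terminal(state):
        return utility(state)            # exact value: game is over
    if depth == 0:
        return evaluate(state)           # heuristic estimate at the cutoff
    values = (depth_limited_minimax(s, depth - 1, not maximizing,
                                    terminal=terminal, utility=utility,
                                    successors=successors, evaluate=evaluate)
              for s in successors(state, maximizing))
    return max(values) if maximizing else min(values)
```

Iterative deepening then amounts to calling this with depth 1, 2, 3, … until the time limit expires, keeping the decision from the deepest completed search.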

  46. Do it! – Which nodes can be pruned? Leaf values (left to right): 6 5 3 4 1 2 7 8

  47. Practical Implementation • How do we make these ideas practical in real game trees? Standard approach: • Cutoff test (where do we stop descending the tree?): a depth limit; better, iterative deepening; cut off only when no big changes are expected to occur next (quiescence search). • Evaluation function: when the search is cut off, we evaluate the current state by estimating its utility. This estimate is captured by the evaluation function.

  48. Tic-Tac-Toe tree at horizon = 2 • The best move has backed-up value 1 (the alternatives back up to -1 and -2). • Leaf evaluations: 6-5=1, 5-5=0, 6-5=1, 5-5=0, 4-5=-1, 5-4=1, 6-4=2, 5-6=-1, 5-5=0, 5-6=-1, 6-6=0, 4-6=-2
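The differences on this slide (e.g. 6-5=1) are instances of the classic open-lines heuristic for tic-tac-toe: the number of rows, columns, and diagonals still winnable by X minus the number still winnable by O. A sketch, using the same 9-character board encoding as earlier (the function names are my own):

```python
# Open-lines evaluation function for tic-tac-toe.
# Board: 9-character string of 'X', 'O', ' '.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def open_lines(board, player):
    """Lines (rows/columns/diagonals) not yet blocked by the opponent."""
    other = 'O' if player == 'X' else 'X'
    return sum(1 for line in WIN_LINES
               if all(board[i] != other for i in line))

def evaluate(board):
    """Positive favors 'X' (MAX), negative favors 'O' (MIN)."""
    return open_lines(board, 'X') - open_lines(board, 'O')
```

For example, with only an X in the center, X still has all 8 lines open while O has only the 4 lines avoiding the center, so the evaluation is 8 - 4 = 4.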

  49. Summary • Game playing can be effectively modeled as a search problem. • Game trees represent alternating computer/opponent moves. • Evaluation functions estimate the quality of a given board configuration for the MAX player. • Minimax is a procedure that chooses moves by assuming that the opponent will always choose the move that is best for them. • Alpha-beta is a procedure that can prune large parts of the search tree, allowing the search to go deeper. • For many well-known games, computer algorithms based on heuristic search match or outperform human world experts.
