COMP-4640: Intelligent & Interactive Systems Game Playing

A game can be formally defined as a search problem with:
• an initial state
• a set of operators (actions or moves)
• a terminal test
• a utility (payoff) function
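The four components above can be written down directly as functions. The sketch below uses a hypothetical toy game that is not from the slides (players alternately take 1 or 2 sticks from a pile, and whoever takes the last stick wins); the function names are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical toy game (an assumption, not from the slides): players
# alternately take 1 or 2 sticks; whoever takes the last stick wins.
State = Tuple[int, str]  # (sticks remaining, player to move: "MAX" or "MIN")


def initial_state(sticks: int = 5) -> State:
    """Initial state: a full pile, with MAX to move first."""
    return (sticks, "MAX")


def actions(state: State) -> List[int]:
    """Operators: the legal moves in this state (take 1 or 2 sticks)."""
    sticks, _ = state
    return [n for n in (1, 2) if n <= sticks]


def result(state: State, action: int) -> State:
    """Apply a move and hand the turn to the other player."""
    sticks, player = state
    return (sticks - action, "MIN" if player == "MAX" else "MAX")


def is_terminal(state: State) -> bool:
    """Terminal test: the game ends when the pile is empty."""
    return state[0] == 0


def utility(state: State) -> int:
    """Payoff from MAX's point of view at a terminal state.

    The player who is NOT to move took the last stick and therefore won.
    """
    _, to_move = state
    return -1 if to_move == "MAX" else +1
```

Because the payoff is always stated from MAX's point of view, the same function serves both players: MAX tries to raise it and MIN tries to lower it.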

Multi-agent environment
• Multi-player games involve planning and acting in environments populated by other active agents
• Agents use a sense/plan/act architecture that does not plan too far into the unpredictable future
• But with proper information, an agent can construct plans that consider the effects of the actions of other agents
• In AI we will consider the special case of games that are:
  • deterministic
  • turn-taking
  • two-player
  • zero-sum, with perfect information
Zero-sum games
• Either one player wins (and the other loses), or a draw results
• +1 win, -1 loss, 0 draw
• The agents' opposing utility functions make the game adversarial

Multi-agent environment: Robot Soccer

Game tree (2-player, deterministic, turns)

The Minimax Algorithm
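As a minimal sketch of the idea, minimax can be written as a short recursion over an explicit game tree. The nested-list tree encoding and the leaf values here are assumptions chosen for illustration: an integer is a terminal utility for MAX, and a list is an internal node whose children belong to the other player.

```python
from typing import List, Union

# Assumed encoding: an int is a terminal utility (from MAX's view),
# a list is an internal node holding its child subtrees.
Tree = Union[int, List["Tree"]]


def minimax_value(node: Tree, maximizing: bool) -> int:
    """Back up utilities from the leaves: MAX takes the max, MIN the min."""
    if isinstance(node, int):
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(root: List[Tree]) -> int:
    """Return the index of MAX's move with the highest backed-up value."""
    values = [minimax_value(child, maximizing=False) for child in root]
    return values.index(max(values))


# Hypothetical two-ply tree: MAX moves at the root, MIN replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
best_value = minimax_value(tree, maximizing=True)  # MIN values are 3, 2, 2
best_move = minimax_decision(tree)                 # first move is best
```

In this tree the three MIN nodes back up 3, 2 and 2, so MAX's best achievable value is 3, reached by the first move.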

The evaluation function:
• Must agree with the utility function on terminal states (goal states)
• Must be of reasonable complexity so that it can be computed quickly (a trade-off between accuracy and time)
• Should be accurate
The performance of a game-playing system depends on the accuracy ("goodness") of its evaluation function.

• One problem with minimax is that it may not be feasible to search the whole game tree for a minimax decision (move or action)
• Depth-limited search can speed up the minimax decision process, but instead of the utility function one then needs to construct an evaluation function
• This evaluation function provides an estimate of the expected utility of a game position

Properties of minimax (b is the branching factor, m the maximum depth of the tree)
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? $O(b^m)$
• Space complexity? $O(bm)$ (depth-first exploration)
• For chess, $b \approx 35$ and $m \approx 100$ for "reasonable" games, so an exact solution is completely infeasible
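To get a feel for why the chess figures rule out an exact solution, the short calculation below plugs b = 35 and m = 100 into the time and space bounds. The variable names are illustrative, and the node count is only the rough b^m estimate, not an exact count of chess positions.

```python
import math

b, m = 35, 100            # approximate chess figures from the slide

time_nodes = b ** m       # O(b^m): nodes a full minimax search would visit
space_nodes = b * m       # O(bm): nodes held in memory by depth-first search

# Order of magnitude of the time bound (power of ten).
time_exponent = math.floor(math.log10(time_nodes))

print(f"time  ~ 10^{time_exponent} nodes")
print(f"space ~ {space_nodes} nodes")
```

The time bound is on the order of 10^154 nodes, while the space bound is only 3500 nodes, so it is time rather than memory that makes full minimax infeasible.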

Once we have developed a good evaluation function, we must also consider:
• The depth limit
• The horizon problem
  • Difficult to eliminate
  • Arises when the program faces an opponent's move that causes serious damage and is ultimately unavoidable
  • Stalling pushes that move over the horizon, to a depth where it cannot be detected

• Once we have an evaluation function and a depth limit, we can re-apply minimax search.
• However, even depth-limited minimax may still be inefficient.
• Minimax will expand nodes that need not be searched.
• By making our search method more efficient, we will be able to search deeper levels of the game tree.
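Depth-limited minimax with an evaluation function might look like the sketch below. The stick-taking game, the function names, and the deliberately uninformative `evaluate` are all assumptions made for illustration; a real program would supply a far better estimate.

```python
from typing import List, Tuple

# Hypothetical toy game (an assumption, not from the slides): players
# alternately take 1 or 2 sticks; whoever takes the last stick wins.
State = Tuple[int, bool]  # (sticks left, True if MAX is to move)


def successors(state: State) -> List[State]:
    sticks, max_to_move = state
    return [(sticks - n, not max_to_move) for n in (1, 2) if n <= sticks]


def terminal(state: State) -> bool:
    return state[0] == 0


def utility(state: State) -> float:
    # The player who is NOT to move took the last stick and wins.
    return -1.0 if state[1] else 1.0


def evaluate(state: State) -> float:
    # Placeholder estimate: 0.0 means "no idea who is ahead".
    return 0.0


def minimax_cutoff(state: State, depth: int) -> float:
    """Minimax that stops at a depth limit and estimates the position."""
    if terminal(state):
        return utility(state)
    if depth == 0:              # depth limit reached
        return evaluate(state)  # estimate stands in for the true utility
    values = [minimax_cutoff(s, depth - 1) for s in successors(state)]
    return max(values) if state[1] else min(values)


deep = minimax_cutoff((4, True), depth=10)    # limit never reached
shallow = minimax_cutoff((4, True), depth=1)  # cut off after one ply
```

With a generous limit the search reaches the terminal states and finds that MAX wins from four sticks (value 1.0). With a one-ply limit it sees only the placeholder estimate (0.0), which shows how much the result depends on the quality of the evaluation function.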

Alpha-Beta Pruning
1. Search below a MIN node may be alpha-pruned if its beta value is ≤ the alpha value of some MAX ancestor.
2. Search below a MAX node may be beta-pruned if its alpha value is ≥ the beta value of some MIN ancestor.

Alpha-Beta Pruning (αβ prune)
Rules of thumb:
• α is the highest Max value found so far
• β is the lowest Min value found so far
• If Min is on top, alpha prune
• If Max is on top, beta prune
• You will only have alpha prunes at a Min level
• You will only have beta prunes at a Max level
• See the detailed algorithm on p. 167
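The pruning rules can be sketched as a small change to plain minimax: carry α and β down the tree and stop expanding a node as soon as they cross. The nested-list tree encoding, the leaf values, and the `visited` list (used only to show which leaves were examined) are assumptions for illustration and are not taken from the textbook algorithm referenced above.

```python
import math
from typing import List, Optional, Union

# Assumed encoding: an int is a leaf value, a list is an internal node.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value of `node`, skipping branches that cannot matter."""
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)  # record each leaf actually examined
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)  # highest Max value found so far
            if alpha >= beta:
                break  # beta prune below a MAX node
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)  # lowest Min value found so far
        if beta <= alpha:
            break  # alpha prune below a MIN node
    return value


# Hypothetical two-ply tree: MAX moves at the root, MIN replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen: List[int] = []
root_value = alphabeta(tree, maximizing=True, visited=seen)
```

On this tree the result is 3, the same as plain minimax, but the leaves 4 and 6 are never examined: once the second MIN node sees a 2, its β (2) is already ≤ the root's α (3), so the rest of that node is alpha-pruned.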

[Figure: alpha-beta pruning worked example on a game tree, built up over several slides with the α and β cutoffs marked]

[Figure: game tree with node values, shown first unpruned and then with the α and β cutoffs marked]

Game of Chance: Expecti-minimax
• The initial values of the leaves indicate board states
• Weight each value by the percentage chance of the corresponding roll to get the first calculated value
• At a Min node, the evaluation f(n) selects the minimum value
• The second roll uses a different assigned percentage chance
• At a Max node, the evaluation f(n) selects the maximum value

[Figure: expecti-minimax worked example built up over several slides; the chance-node calculations shown are 3*1.0 = 3 and 0*0.67 + 6*0.33 ≈ 2]
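A minimal sketch of expecti-minimax adds a third kind of node, the chance node, whose value is the probability-weighted sum of its children. The tuple-based tree encoding and the small tree below are assumptions for illustration; only the probabilities 0.67, 0.33 and 1.0 and the weighted sums 0*0.67 + 6*0.33 and 3*1.0 come from the slides, and the leaf values under the Min nodes are invented to reproduce them.

```python
from typing import Union

# Assumed encoding:
#   a number                         -> leaf value (board evaluation)
#   ("max", [child, ...])            -> MAX picks the largest child value
#   ("min", [child, ...])            -> MIN picks the smallest child value
#   ("chance", [(prob, child), ...]) -> probability-weighted average
Node = Union[int, float, tuple]


def expectiminimax(node: Node) -> float:
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical tree: the two Min nodes back up 0 and 6, and the chance
# node weights them by the roll probabilities 0.67 and 0.33.
chance_a = ("chance", [(0.67, ("min", [0, 6])), (0.33, ("min", [6, 9]))])
chance_b = ("chance", [(1.0, 3)])
root = ("max", [chance_a, chance_b])

value_a = expectiminimax(chance_a)  # 0*0.67 + 6*0.33, about 1.98
value_root = expectiminimax(root)   # Max compares about 1.98 with 3.0
```

In this made-up tree Max prefers the second branch, because its expected value of 3.0 beats the first branch's 1.98.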

Cutting off search
MinimaxCutoff is identical to MinimaxValue except:
• Terminal? is replaced by Cutoff?
• Utility is replaced by Eval
Does it work in practice?
$b^m = 10^6,\ b = 35 \Rightarrow m = 4$
4-ply lookahead is a hopeless chess player!
• 4-ply ≈ human novice
• 8-ply ≈ typical PC, human master
• 12-ply ≈ Deep Blue, Kasparov
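The depth figure above follows from solving b^m = 10^6 for m. The short calculation below shows the arithmetic; the variable names are illustrative, and the node budget of 10^6 is simply the value that appears in the slide's equation.

```python
import math

node_budget = 10 ** 6  # nodes that can be searched per move (from the slide)
b = 35                 # approximate branching factor of chess

# Solve b^m = node_budget for m: m = log(node_budget) / log(b).
depth = math.log(node_budget) / math.log(b)

# Check against the rounded depth: 35^4 is about 1.5 million nodes.
nodes_at_4_ply = b ** 4

print(f"m = {depth:.2f}, 35^4 = {nodes_at_4_ply}")
```

The exact solution is a little under 4 (about 3.89), so a budget of 10^6 nodes corresponds to roughly a 4-ply lookahead.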

Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and uses undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

http://www.research.ibm.com/deepblue/
