Agents that can play multi-player games. Recall: Single-player, fully-observable, deterministic game agents. An agent that plays Peg Solitaire involves A representation of the initial state; A method to generate new states from existing ones; A test for whether a state is a goal state.
Recall: Single-player, fully-observable, deterministic game agents An agent that plays Peg Solitaire involves • A representation of the initial state; • A method to generate new states from existing ones; • A test for whether a state is a goal state. Initial Board for Triangle Peg Solitaire A jump, with resulting board The goal state:
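The three ingredients above can be sketched in code. To stay short, this is a simplified one-row peg solitaire rather than the triangle board on the slide; the names `State`, `successors`, `is_goal`, and `solve` are illustrative assumptions, not from the slides.

```python
from typing import List, Optional, Tuple

# Assumption: a 1-D board. True = peg, False = empty hole.
State = Tuple[bool, ...]

def successors(state: State) -> List[State]:
    """Generate every state reachable by one jump."""
    result = []
    n = len(state)
    for i in range(n):
        if not state[i]:
            continue
        for step in (-1, 1):
            over, dest = i + step, i + 2 * step
            # A peg jumps over an adjacent peg into an empty hole.
            if 0 <= dest < n and state[over] and not state[dest]:
                new = list(state)
                new[i], new[over], new[dest] = False, False, True
                result.append(tuple(new))
    return result

def is_goal(state: State) -> bool:
    """Goal test: exactly one peg remains."""
    return sum(state) == 1

def solve(state: State) -> Optional[List[State]]:
    """Depth-first search; returns a path of states to a goal, or None."""
    if is_goal(state):
        return [state]
    for nxt in successors(state):
        path = solve(nxt)
        if path is not None:
            return [state] + path
    return None

initial: State = (True, True, False, True)
path = solve(initial)
```

The search terminates because every jump removes one peg, so no state can repeat along a path.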
Recall: Single-player, fully-observable, deterministic game agents Initial state Successor state axioms or STRIPS effects Initial Board for Triangle Peg Solitaire … A jump, with resulting board The goal state: Goal state
Goal state vs. Terminal states and Utilities terminal states Utility: +2 The goal state: Utility: +1 Utility: -1
Quiz: Goal states vs. Terminal states and Utilities What could go wrong when using A* or breadth-first or other strategies with terminal states? Initial state Successor state axioms or STRIPS effects … -1 +2 +1 Terminal states
Answer: Goal states vs. Terminal states and Utilities You’re guaranteed to find the best path to the terminal state that is found. You’re NOT guaranteed to find the best terminal state (the one with highest utility), unless you do an exhaustive search. Initial state Successor state axioms or STRIPS effects … -1 +2 +1 Terminal states
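A minimal sketch of the point above, assuming a tiny hypothetical search tree written as nested lists with terminal utilities -1, +2, +1 at the leaves. Stopping at the first terminal state found can miss the best one; only the exhaustive version is sure to return the highest utility.

```python
from typing import List, Union

# Assumption: a tree is either a terminal utility (int) or a list of subtrees.
Tree = Union[int, List["Tree"]]

def first_terminal(tree: Tree) -> int:
    """Follow the first branch until a terminal state is reached."""
    while isinstance(tree, list):
        tree = tree[0]
    return tree

def best_terminal(tree: Tree) -> int:
    """Exhaustive search: visit every terminal state, keep the highest utility."""
    if isinstance(tree, int):
        return tree
    return max(best_terminal(child) for child in tree)

tree: Tree = [[-1, 2], [1]]  # hypothetical layout of the three terminal states
found_first = first_terminal(tree)
found_best = best_terminal(tree)
```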
Hex: Two-player, zero-sum game (Also, deterministic and fully-observable.) • Hex: • Two players, red and blue. • Board is N x N, with hexagonal spaces. • Two opposite sides are red, and other two sides are blue. • Each player’s objective is to build a path connecting the sides of his or her color. • Players alternate turns, and place a single piece of their color on their turn.
Hex: Two-player, zero-sum game • Some fun facts: • There are no ties in Hex (proved by John Nash). • First player has a distinct advantage (also proved by Nash). • In tournament play, it’s common to use the “pie rule”, for fairness: after the first player makes the first move, the second player can choose whether to switch sides. (We will ignore this rule.)
Hex Question What is red’s best move (red’s turn next)?
Hex Question What is red’s best move (red’s turn next)? This orange one looks pretty good: only one more square, and red will win. Using a simple heuristic, this looks like it’s getting close to the goal.
Hex Question What is red’s best move (red’s turn next)? However, if red moves to the orange square, the blue player can win on the next turn!
Quiz: Hex Question If red moves to the orange square, what is blue’s best move?
Answer: Hex Question Blue has no good moves left!
Answer: Hex Question Blue has no good moves left! This one’s bad – red can still connect the paths.
Answer: Hex Question Blue has no good moves left! And this one’s bad too – red can still connect the paths.
Reasoning about 2-player games To pick a good move, each player has to think about the other player’s possible responses!
Extensive Form Representation of Games Notation: • Two players, Max (Δ) and Min (∇). • Terminal states are drawn with a number inside: the utility for Max (Δ). (Since we’re doing zero-sum games, the utility for Min (∇) is just the opposite of this number.)
Extensive Form Representation of Games Game tree: ∆ Max’s turn Max’s possible actions ∇ ∇ ∇ Min’s turn Resulting worlds/boards Min’s possible actions Resulting worlds/boards ∆ ∆ ∆ ∆ ∆ ∆ ∆ ∆ ∆ Max’s turn … … -1 Terminal states, with utility for Max … +2 +1
Minimax (Backup) Algorithm Basic Idea: Compute ∆’s Value(n) for each node n in the game tree, starting with the leaves and working up (“backup”). We’ll use a depth-first tree traversal. Once this is calculated, Max will choose an action that leads to a child node with the highest possible value. ∆ ∇ ∇ ∇ 20 4 8 15 4 12 2 3 1
Minimax (Backup) Algorithm Value(n) = • If n is a terminal node, Value(n) = ∆’s utility • If n is ∆’s turn: Value(n) = the maximum of Value(c) over the children c of n • If n is ∇’s turn: Value(n) = the minimum of Value(c) over the children c of n ∆ ∇ ∇ ∇ 20 4 8 15 4 12 2 3 1
Minimax (Backup) Algorithm Value(n) = • If n is a terminal node, Value(n) = Max’s utility • If n is ∆’s turn: Value(n) = the maximum of Value(c) over the children c of n • If n is ∇’s turn: Value(n) = the minimum of Value(c) over the children c of n ∆ Value: min {3, 4, 4} = 3 Value: min {2, 30, 15} = 2 ∇ ∇ ∇ 20 4 8 15 4 12 2 3 1
Quiz: Minimax (Backup) Algorithm Value(n) = • If n is a terminal node, Value(n) = Max’s utility • If n is ∆’s turn: Value(n) = the maximum of Value(c) over the children c of n • If n is ∇’s turn: Value(n) = the minimum of Value(c) over the children c of n What is the Value of the middle ∇ node? What is the value of the top ∆ node? ∆ Value: min {3, 4, 4} = 3 Value: min {2, 30, 15} = 2 ∇ ∇ ∇ 20 4 8 15 4 12 2 3 1
Answer: Minimax (Backup) Algorithm • What is the Value of the middle ∇ node? • min {1, 8, 12} = 1 • What is the value of the top ∆ node? • max {3, 1, 2} = 3 Value(n) = • If n is a terminal node, Value(n) = Max’s utility • If n is ∆’s turn: Value(n) = the maximum of Value(c) over the children c of n • If n is ∇’s turn: Value(n) = the minimum of Value(c) over the children c of n ∆ ∇ ∇ ∇ 20 4 8 15 4 12 2 3 1
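The backup rule can be sketched as a short recursive function. This is a minimal sketch, assuming the tree is given as nested lists with Max's utilities at the leaves; the leaf values are taken from the min {…} sets quoted in the answer, and the names `minimax` and `best_action` are illustrative.

```python
from typing import List, Union

# Assumption: a node is either a terminal utility for Max (int)
# or a list of child nodes.
Node = Union[int, List["Node"]]

def minimax(node: Node, max_turn: bool = True) -> int:
    """Depth-first backup: max at Max's nodes, min at Min's nodes."""
    if isinstance(node, int):
        return node  # terminal node: Max's utility
    values = [minimax(child, not max_turn) for child in node]
    return max(values) if max_turn else min(values)

def best_action(root: List[Node]) -> int:
    """Index of the child of a Max root with the highest backed-up value."""
    return max(range(len(root)), key=lambda i: minimax(root[i], False))

# Max at the root, three Min nodes below, terminal utilities at the leaves.
tree: Node = [[3, 4, 4], [1, 8, 12], [2, 30, 15]]
root_value = minimax(tree)
action = best_action(tree)
```

Here the three Min nodes back up to 3, 1, and 2, so the root's value is 3 and Max picks the leftmost action (index 0).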
Quiz: Minimax • Compute the value of each node in the game tree. • Which action should Max take? • What is Min’s optimal response? ∆ a b c ∇ ∇ ∇ ∆ ∆ ∆ 20 4 12 ∆ ∆ 1 6 9 5 7 4 10 2 30 -9 15
Answer: Minimax • Compute the value of each node in the game tree. • Which action should Max take? Action on right (c) • What is Min’s optimal response? Action on right ∆ 15 a b c ∇ ∇ ∇ 4 1 15 ∆ ∆ ∆ 20 4 12 ∆ 6 ∆ 7 1 10 30 15 6 9 5 7 4 10 2 30 -9 15
From Extensive Form to Normal Form Games Every “extensive form” game (even ones where you don’t have zero-sum utilities) can be made into a “normal form” game. ∆ A B ∇ ∇ D C D C ∆ ∆ 4 1 Each sequence of actions for a player becomes a row or a column. The size of the resulting matrix can be exponential in the size of the game tree. A A B B 5 7 4 10
From Normal Form games to Extensive Form games Not every Normal Form game can be represented using the Extensive Form I have shown you so far. ∆ C D ∆ ? ∇ ∇ ∇ D C D C ∇ 2 4 -3 -3 C D ? ∆ ∆ D C D C 2 4 -3 -3
From Normal Form games to Extensive Form games Can introduce new notation – information states – that allows the Extensive Form to represent any Normal Form game. ∆ C D ∆ ∇ ∇ ∇ D C D C ∇ 2 4 -3 -3 C D ∆ ∆ D C D C 2 4 -3 -3
From Normal Form games to Extensive Form games Information states are also useful for handling Partial Observability in turn-based games. E.g., in Poker, they can be used to represent the set of all hands your opponent may have been dealt. ∆ C D ∆ ∇ ∇ ∇ D C D C ∇ 2 4 -3 -3 C D ∆ ∆ D C D C 2 4 -3 -3
Perfect Information Games Definition: A game in extensive form has perfect information if every information state has only one node. (This is the same as our original version of game trees.) Perfect Information is basically just another name for full observability for game trees. We’ll talk more about partial observability later. Theorem (Zermelo, 1913): Every finite, perfect-information game in extensive form has a pure-strategy Nash equilibrium.
Relation between Minimax Algorithm and Minimax Theorem Recall that the Minimax Theorem says every 2-player, zero-sum game has a Value for each player and a Nash Equilibrium. The guy who proved this (von Neumann) used essentially the Minimax algorithm to prove the theorem. The Value of the root node in the Minimax algorithm is the same as the Value of the game for the Max player.
Quiz: Time Complexity of Minimax Let b be the branching factor of the game tree. Let m be the depth of the game tree. What is the time complexity of Minimax? O(b+m)? O(bm)? O(b^m)? O(m^b)? ∆ ∇ ∇ ∇ ∆ ∆ ∆ 20 4 12 ∆ ∆ 1 6 9 5 7 4 10 2 30 -9 15
Answer: Time Complexity of Minimax Let b be the branching factor of the game tree. Let m be the depth of the game tree. What is the time complexity of Minimax? O(b+m)? O(bm)? O(b^m) O(m^b)? ∆ ∇ ∇ ∇ ∆ ∆ ∆ 20 4 12 ∆ ∆ 1 6 9 5 7 4 10 2 30 -9 15
Quiz: Space Complexity of Minimax Let b be the branching factor of the game tree. Let m be the depth of the game tree. What is the space complexity of Minimax? O(b+m)? O(bm)? O(b^m)? O(m^b)? ∆ ∇ ∇ ∇ ∆ ∆ ∆ 20 4 12 ∆ ∆ 1 6 9 5 7 4 10 2 30 -9 15
Answer: Space Complexity of Minimax Let b be the branching factor of the game tree. Let m be the depth of the game tree. What is the space complexity of Minimax? O(b+m)? O(bm) O(b^m)? O(m^b)? ∆ ∇ ∇ ∇ ∆ ∆ ∆ 20 4 12 ∆ ∆ 1 6 9 5 7 4 10 2 30 -9 15
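Both answers can be checked empirically on a uniform tree. The sketch below assumes every non-leaf node has exactly b children and every leaf sits at depth m; `dfs_stats` is a hypothetical helper that runs a depth-first traversal with an explicit stack, counting the nodes visited (time) and the largest stack size reached (space).

```python
from typing import Tuple

def dfs_stats(b: int, m: int) -> Tuple[int, int]:
    """Return (nodes visited, peak stack size) for a uniform tree."""
    stack = [0]          # depths of generated but not yet expanded nodes
    visited = 0
    peak = 1
    while stack:
        depth = stack.pop()
        visited += 1
        if depth < m:
            # Expanding a node generates all b of its children at once.
            stack.extend([depth + 1] * b)
            peak = max(peak, len(stack))
    return visited, peak

visited, peak = dfs_stats(3, 2)
```

The visited count is 1 + b + b^2 + … + b^m, which grows as O(b^m). The stack never holds more than about b nodes per level along the current path, which is the O(bm) space bound.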
Quiz: Complexity of Minimax Chess: has an average branching factor of ~30, and each game takes on average ~40 moves. If it takes ~1 millisecond to compute the value of each board position in the game tree, how long to figure out the value of the game using Minimax? A few milliseconds? A few seconds? A few minutes? A few hours? A few days? A few years? A few decades? A few millennia (thousands of years)? More time than the age of the universe?
Answer: Complexity of Minimax Chess: has an average branching factor of ~30, and each game takes on average ~40 moves. If it takes ~1 millisecond to compute the value of each board position in the game tree, how long to figure out the value of the game using Minimax? A few milliseconds? A few seconds? A few minutes? A few hours? A few days? A few years? A few decades? A few millennia (thousands of years)? Answer: More time than the age of the universe.
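A back-of-the-envelope check of that answer, taking the slide's figures at face value (branching factor 30, depth 40, 1 millisecond per position). The age of the universe is taken as roughly 13.8 billion years, which is an outside figure and not from the slides.

```python
# Rough estimate only: treats the game tree as uniform with b = 30, m = 40.
BRANCHING_FACTOR = 30
DEPTH = 40
MS_PER_POSITION = 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumption: commonly cited estimate

positions = BRANCHING_FACTOR ** DEPTH          # about b^m leaf positions
seconds = positions * MS_PER_POSITION / 1000
years = seconds / SECONDS_PER_YEAR
ratio = years / AGE_OF_UNIVERSE_YEARS

print(f"positions: {positions:.2e}")
print(f"years:     {years:.2e}")
print(f"universe lifetimes: {ratio:.2e}")
```

Even counting only the leaves, the running time exceeds the age of the universe by dozens of orders of magnitude.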
Strategies for coping with complexity • Reduce b • Reduce m • Memoize
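As an illustration of the "Memoize" strategy, here is a minimal sketch on a small take-away game (an assumption chosen for brevity, not a game from the slides): players alternately remove 1, 2, or 3 stones, and whoever takes the last stone wins. Many move orders reach the same (stones, turn) state, so caching each state's value with Python's `functools.lru_cache` avoids recomputing whole subtrees.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones: int, max_turn: bool) -> int:
    """Minimax value for Max: +1 if Max wins with best play, -1 otherwise."""
    if stones == 0:
        # The previous player took the last stone and won.
        return -1 if max_turn else 1
    children = [value(stones - k, not max_turn)
                for k in (1, 2, 3) if k <= stones]
    return max(children) if max_turn else min(children)

result = value(21, True)
distinct_states = value.cache_info().misses
```

Without the cache the number of recursive calls grows exponentially in the number of stones; with it, each of the at most 2 × (stones + 1) distinct states is evaluated once.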