Adversarial Search AIMA Chapter 5.1 – 5.5
AI vs. Human Players: the State of the Art To Be Updated next year!
Deterministic Games in Practice • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used very sophisticated evaluation, and used undisclosed methods for extending some lines of search up to 40 ply. • Othello: human champions refuse to compete against computers, which are too good.
Outline • Adversarial search problems (aka games) • Optimal (i.e., Minimax) decisions • α-β pruning • Imperfect, real-time decisions
Let’s Play! • Two players in a zero-sum game: • Winner gets paid and loser pays. • Easy to think in terms of a max player and min player • Player 1 wants to maximize value (MAX player) • Player 2 wants to minimize value (MIN player)
Game: Problem Formulation A game is defined by 7 components: • Initial state s0 • States • Players: Player(s) defines which player has the move in state s • Actions: Actions(s) returns the set of legal moves in state s • Transition model: Result(s, a) returns the state that results from the move a in state s.
Game: Problem Formulation A game is defined by 7 components: • Terminal test: Terminal-Test(s) returns true if the game is over and false otherwise • Terminal states: states where the game has ended • Utility function: Utility(s, p) gives the final numeric value for a game that ends in terminal state s for a player p • Chess: the utility takes one value if White wins, another if Black wins, and a third for a draw.
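The components listed above can be sketched in code for a toy game. The game itself is an assumption made for illustration (one pile of 5 sticks, a move removes 1 or 2 sticks, whoever takes the last stick loses), and the function names mirror the AIMA-style component names rather than anything fixed by these slides.

```python
from typing import List, Tuple

# Hypothetical toy game: a state is (sticks left, player to move).
State = Tuple[int, str]

INITIAL_STATE: State = (5, "MAX")


def player(s: State) -> str:
    """Which player has the move in state s."""
    return s[1]


def actions(s: State) -> List[int]:
    """Legal moves in s: remove 1 or 2 sticks, if that many remain."""
    return [k for k in (1, 2) if k <= s[0]]


def result(s: State, a: int) -> State:
    """Transition model: remove a sticks and pass the turn."""
    sticks, mover = s
    return (sticks - a, "MIN" if mover == "MAX" else "MAX")


def terminal_test(s: State) -> bool:
    """The game is over once no sticks remain."""
    return s[0] == 0


def utility(s: State, p: str) -> int:
    """+1 if p won, -1 if p lost, in terminal state s.

    The player who took the last stick loses, so the winner is the
    player who would have moved next.
    """
    winner = s[1]
    return 1 if p == winner else -1
```

Because utility(s, "MAX") + utility(s, "MIN") is always 0 here, this toy game is zero-sum.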
Game Tree (2-Player, Deterministic, Turn-Taking) • Key property: zero-sum game • Loosely, it means that there’s a loser for every winner • Total utility over all agents sums to zero (or a constant) • Aka constant-sum game • Makes the game adversarial • Non Zero-Sum Games?
Example: Game of NIM Several piles of sticks are given. We represent the configuration of the piles by a monotone sequence of integers, one entry per pile. A player may remove, in one turn, any number of sticks from one pile, and the sequence is updated accordingly. The player who takes the last stick loses. • Represent the NIM game as a game tree.
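One possible way to build and solve this game tree is sketched below. The representation (a sorted tuple of non-empty pile sizes) and the names `successors` and `mover_wins` are assumptions made for illustration, not part of the slides.

```python
from functools import lru_cache
from typing import Iterator, Tuple

Piles = Tuple[int, ...]  # sorted sizes of the non-empty piles


def successors(piles: Piles) -> Iterator[Piles]:
    """Yield each distinct configuration reachable in one move."""
    seen = set()
    for i, size in enumerate(piles):
        for take in range(1, size + 1):
            rest = piles[:i] + piles[i + 1:]
            left = size - take
            child = tuple(sorted(rest + ((left,) if left else ())))
            if child not in seen:
                seen.add(child)
                yield child


@lru_cache(maxsize=None)
def mover_wins(piles: Piles) -> bool:
    """True if the player to move can force a win (last stick loses)."""
    if not piles:
        # The opponent just took the last stick, so the mover has won.
        return True
    # The mover wins if some move leaves the opponent in a losing position.
    return any(not mover_wins(child) for child in successors(piles))
```

For instance, `mover_wins((1,))` is `False` (the mover must take the last stick), while `mover_wins((1, 1))` is `True`. Caching by configuration turns the tree into a much smaller graph, since many move orders reach the same piles.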
Player Strategies A strategy for player $i$: what will player $i$ do at every node of the tree where it is their turn to move? Need to specify behavior even in states that will never be reached!
Winning Strategy A strategy for player 1 is called winning if, for any strategy of player 2, the game ends with player 1 as the winner. A strategy for player 1 is called non-losing if, for any strategy of player 2, the game ends in either a tie or a win for player 1. Theorem (Von Neumann): in the game of chess, exactly one of the following is true: • White has a winning strategy • Black has a winning strategy • Each player has a non-losing strategy.
Optimal Strategy at Node - Minimax Intuitively, • MAX chooses move to maximize the minimum payoff • MIN chooses move to minimize the maximum payoff
Minimax Play (Subgame-Perfect Nash Equilibrium) Backwards Induction [Figure: game tree with P1 and P2 decision nodes, solved step by step from the leaves up]
Minimax Play (Subgame-Perfect Nash Equilibrium) What are the optimal strategies in this game? [Figure: game tree with P1 and P2 decision nodes]
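Properties of Minimax • Complete? Yes (if game tree is finite) • Optimal? Yes (optimal gameplay) • Time and space complexity: like DFS, $O(b^m)$ time and $O(bm)$ space, for branching factor $b$ and maximum depth $m$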
Minimax Algorithm • Runs in time polynomial in tree size • Returns a subgame-perfect Nash equilibrium: the best action at every choice node. Are we done here?
Backwards Induction • Game trees are huge: the chess game tree has on the order of $10^{120}$ nodes or more (planet Earth has roughly $10^{50}$ atoms) • Impossible to expand the entire tree
α-β Pruning • Basic idea: “If you have an idea that is surely bad, don't take the time to see how truly awful it is.” -- Pat Winston • Maintain a lower bound α and an upper bound β on the values of, respectively, MAX’s and MIN’s nodes seen thus far. We can prune subtrees that will never affect the minimax decision.
[Figure: α-β pruning traced step by step on a game tree with alternating MAX and MIN levels] No point exploring the last two nodes! Choosing that branch results in a loss of at least 7…
α-β Pruning • For each MAX node $n$, $\alpha(n)$ is the highest observed value found on the path from $n$; initially $\alpha(n) = -\infty$ • For each MIN node $n$, $\beta(n)$ is the lowest observed value found on the path from $n$; initially $\beta(n) = +\infty$ • Alpha prune: given a MIN node $n$, stop searching below $n$ if there is some MAX ancestor $i$ of $n$ with $\alpha(i) \ge \beta(n)$ • Beta prune: given a MAX node $n$, stop searching below $n$ if there is some MIN ancestor $i$ of $n$ with $\beta(i) \le \alpha(n)$
Analysis of α-β Pruning • When we prune a branch, it never affects the final outcome. • Good move ordering improves the effectiveness of pruning • “Perfect” ordering: time complexity = $O(b^{m/2})$ • Good pruning strategies allow us to search twice as deep! • Chess: simple ordering (checks, then captures, then forward moves, then backward moves) gets you close to the best-case result. • It makes sense to have good expansion order heuristics. • Random ordering: time complexity = roughly $O(b^{3m/4})$ for moderate $b$
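The "twice as deep" claim follows from the perfect-ordering bound by a short calculation. The symbols $d$ and $d'$ (the depths reachable by plain minimax and by α-β pruning within the same node budget) are introduced here for illustration only.

```latex
\begin{align*}
N_{\text{minimax}}(d) &= O\!\left(b^{d}\right), \\
N_{\alpha\beta}(d') &= O\!\left(b^{d'/2}\right) = O\!\left((\sqrt{b})^{\,d'}\right), \\
b^{d'/2} = b^{d} &\iff d' = 2d .
\end{align*}
```

Put differently, perfect ordering reduces the effective branching factor from $b$ to $\sqrt{b}$, so the same number of node expansions reaches depth $2d$ instead of $d$.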
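Summary: α-β Pruning Algorithm • Initially, $\alpha = -\infty$, $\beta = +\infty$ • $\alpha(n)$ is the max along the search path containing $n$ • $\beta(n)$ is the min along the search path containing $n$ • If a MIN node has value $v \le \alpha(n)$, no need to explore further. • If a MAX node has value $v \ge \beta(n)$, no need to explore further.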
Time Limit • Problem: very large search space in typical games • Solution: α-β pruning removes large parts of the search space • Unresolved problem: maximum depth of tree • Standard solutions: • evaluation function = estimated expected utility of state • cutoff test: e.g., depth limit
Heuristic Minimax Value Run minimax until depth $d$; then use the evaluation function, in place of the true utility, to choose nodes.
Evaluation Functions • An evaluation function is a mapping from game states to real values: $\mathrm{Eval} : S \to \mathbb{R}$ • So far: values were defined only at terminal states (the utility function) • Let’s move beyond that… • Should be cheap to compute • For non-terminal states, must be strongly correlated with actual chances of winning • Tic-Tac-Toe:
Evaluation Functions • Chess: • Alan Turing’s evaluation function: $f(n) = w(n)/b(n)$, where $w(n)$ is the point value of white’s pieces and $b(n)$ is the point value of black’s pieces. • Modern evaluation functions: weighted sum of position features • Example features: piece count, piece placement, controlled squares etc. • Deep Blue used thousands of features. • How do we determine weights? Do they change dynamically?
Evaluation Functions • Suppose that $\mathrm{Eval}(s) = \sum_{i} w_i f_i(s)$ • It is possible that for $s \ne s'$ we have $f_i(s) = f_i(s')$ for all $i$. • The evaluation function does not differentiate between $s$ and $s'$! • … but it can tell us an expected utility for all nodes sharing feature values.
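A small sketch of a weighted-sum evaluation, under loudly stated assumptions: a position is reduced to a string of piece letters (uppercase for White, lowercase for Black), the only features are material differences per piece type, and the weights are the conventional 1/3/3/5/9 point values. None of this is taken from the slides or from any real engine.

```python
from typing import Dict

# Assumed weights w_i: conventional point values per piece type.
WEIGHTS: Dict[str, float] = {"p": 1.0, "n": 3.0, "b": 3.0, "r": 5.0, "q": 9.0}


def features(pieces: str) -> Dict[str, int]:
    """f_i = (# White pieces of type i) - (# Black pieces of type i)."""
    return {kind: pieces.count(kind.upper()) - pieces.count(kind)
            for kind in WEIGHTS}


def evaluate(pieces: str) -> float:
    """Weighted sum of features, from White's point of view."""
    f = features(pieces)
    return sum(WEIGHTS[kind] * f[kind] for kind in WEIGHTS)


# Two different (hypothetical) positions with identical feature values:
print(evaluate("PPb"))  # -1.0
print(evaluate("bPP"))  # -1.0
```

The two strings stand for distinct positions, yet they share every feature value, so this evaluation cannot tell them apart. Richer features (placement, controlled squares) would be needed to separate them.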
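Let $S_{\vec{v}}$ be the set of all states whose feature values are as specified in the vector $\vec{v}$. • “All states where white has two pawns but black has a bishop.” • Suppose we know that in this case, • Black wins a fraction $p_B$ of games • White wins a fraction $p_W$ of games • A fraction $p_D$ of games end in a draw • Then the expected utility for black is $p_B \cdot U_{\text{win}} + p_W \cdot U_{\text{loss}} + p_D \cdot U_{\text{draw}}$, where $U_{\text{win}}$, $U_{\text{loss}}$ and $U_{\text{draw}}$ are black's utilities for the three outcomes • The evaluation function need not return actual expected values, just maintain the relative order of states.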
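The expected-utility calculation can be worked through numerically. All numbers below are hypothetical: outcome frequencies of 72% / 20% / 8% and utilities of 1, 0 and 1/2 for a win, loss and draw are chosen only to illustrate the arithmetic.

```python
from fractions import Fraction
from typing import List, Tuple


def expected_utility(outcomes: List[Tuple[Fraction, Fraction]]) -> Fraction:
    """Sum of probability * utility over (probability, utility) pairs."""
    if sum(p for p, _ in outcomes) != 1:
        raise ValueError("outcome probabilities must sum to 1")
    return sum((p * u for p, u in outcomes), Fraction(0))


# Hypothetical statistics for one feature vector, from Black's viewpoint.
black = [
    (Fraction(72, 100), Fraction(1)),     # Black wins
    (Fraction(20, 100), Fraction(0)),     # White wins
    (Fraction(8, 100), Fraction(1, 2)),   # draw
]
print(expected_utility(black))  # 19/25, i.e. 0.76
```

Any order-preserving transformation of these values would rank states identically, which is why the evaluation function only needs to preserve the relative order.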
Cutting Off Search • Modify the minimax or α-β pruning algorithms as follows: • Terminal-Test(state) is replaced by Cutoff-Test(state, depth) • Utility(state) is replaced by Eval(state) • Can also be combined with iterative deepening
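A sketch of depth-limited minimax with the two replacements described above, on an explicit nested-list tree. The stand-in evaluation (the mean of the leaves below a node) is a made-up heuristic used only so the example is self-contained; a real evaluation function would not look at the leaves.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]


def leaves(node: Tree) -> List[int]:
    """All terminal utilities below node."""
    if isinstance(node, int):
        return [node]
    return [x for child in node for x in leaves(child)]


def evaluate(node: Tree) -> float:
    """Hypothetical Eval: mean of the leaves below node."""
    vals = leaves(node)
    return sum(vals) / len(vals)


def h_minimax(node: Tree, depth: int, limit: int, maximizing: bool) -> float:
    """Minimax that falls back on evaluate() once depth reaches limit."""
    if isinstance(node, int):   # true terminal state: use the utility
        return node
    if depth >= limit:          # cutoff test: use the estimate instead
        return evaluate(node)
    values = [h_minimax(c, depth + 1, limit, not maximizing) for c in node]
    return max(values) if maximizing else min(values)


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, 2, True))  # 3 (deep enough to be exact)
```

With `limit=1` the same call returns about 7.67 rather than 3, a reminder that a cutoff value is only an estimate. Iterative deepening would call `h_minimax` with `limit = 1, 2, 3, …` until time runs out and keep the last completed answer.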
Stochastic Games • Many games have randomization: • Backgammon • Settlers of Catan • Poker • How do we deal with uncertainty? • Can we still use minimax?
Adding Chance Layers Calculate the expected value of a state (MUCH harder than deterministic games)
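One common way to handle chance layers is expectiminimax: MAX and MIN nodes behave as before, and a chance node takes the probability-weighted average of its children. The sketch below assumes a hypothetical tuple encoding of the tree, chosen here for brevity.

```python
from typing import Any


def expectiminimax(node: Any) -> float:
    """Value of a tree with MAX, MIN and chance nodes.

    Assumed encoding: a number is a terminal utility;
    ("max", [children]) and ("min", [children]) are decision nodes;
    ("chance", [(probability, child), ...]) is a chance node.
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# MAX picks a gamble; each gamble is a fair coin flip.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [6, 0]))]),
    ("chance", [(0.5, 3), (0.5, 1)]),
])
print(expectiminimax(tree))  # 2.0
```

Unlike plain minimax, the result depends on the actual magnitudes of the utilities, not just their order, so an evaluation function used with chance nodes has to be more than order-preserving.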
Example: Coin Toss Game (example from Brown and Sandholm “Safe and Nested Subgame Solving for Imperfect-Information Games”, NIPS 2017) • An unbiased coin is tossed; the MAX player sees the result, the MIN player doesn’t. • The MAX player can choose to: • Sell the coin, getting a reward that depends on whether the coin landed on heads or on tails. • Play, in which case it’s player MIN’s turn. • On MIN’s turn, MIN needs to guess either “heads” or “tails”, and gets a reward of 1 for a correct guess.
Example: Coin Toss Game (example from Brown and Sandholm “Safe and Nested Subgame Solving for Imperfect-Information Games”, NIPS 2017) [Figure: game tree for the coin toss game; after Heads or Tails, MAX chooses Play or Sell]
Example: Coin Toss Game (example from Brown and Sandholm “Safe and Nested Subgame Solving for Imperfect-Information Games”, NIPS 2017) • We allow players to choose random strategies • MIN’s optimal strategy depends on the reward MAX would get for selling, a value outside MIN’s subtree! “How desperate was MAX to let me play?” • Computing optimal strategies in perfect information games: no need to know what happens outside your subtree!
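The dependence on the sell rewards can be checked numerically with a rough sketch. Everything numeric here is an assumption for illustration: selling is worth 0.5 on one side of the coin and -0.5 on the other, MAX gets -1 when MIN guesses correctly and +1 otherwise, and MIN's mixed strategy is found by a simple grid search rather than by an exact solver.

```python
def max_value(q: float, sell_heads: float, sell_tails: float) -> float:
    """MAX's expected payoff when MIN guesses heads with probability q
    and MAX best-responds separately after seeing heads or tails."""
    play_heads = q * (-1) + (1 - q) * 1   # coin is heads
    play_tails = q * 1 + (1 - q) * (-1)   # coin is tails
    return (0.5 * max(sell_heads, play_heads)
            + 0.5 * max(sell_tails, play_tails))


def min_best_q(sell_heads: float, sell_tails: float, steps: int = 100) -> float:
    """Grid search for the q that minimizes MAX's best-response payoff."""
    grid = (k / steps for k in range(steps + 1))
    return min(grid, key=lambda q: max_value(q, sell_heads, sell_tails))


print(min_best_q(0.5, -0.5))  # 0.25
print(min_best_q(-0.5, 0.5))  # 0.75
```

MIN's subtree is identical in both calls, and only the sell rewards differ, yet MIN's best guessing probability flips from 0.25 to 0.75. This is the sense in which MIN cannot solve its subgame in isolation.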