Minimax and Alpha-Beta Mike Maxim 17 Apr 03 15-211 Spring 2003
Announcements • Homework 6 has been released! • Due May 1st 11:59PM, so get going! • Today’s lecture is very pertinent to this assignment. • Quiz Postponed to Tuesday (4/22)
Initial Questions • How do we get a program to play games well? • Could I be unstoppable with a computer with massive computation power? • How does Kasparov stay with programs like Deep Junior?
A little more precise… • What sort of “games” do you mean? • Strategy/Board Type Games • Chess • Othello • Go • Tic-Tac-Toe • Let’s look at how we might go about playing Tic-Tac-Toe
Consider this position We are playing X, and it is now our turn.
Let’s write out all possibilities Each number represents a position after each legal move we have.
Now let’s look at their options Here we are looking at all of the opponent responses to the first possible move we could make.
Now let’s look at their options Opponent options after our second possibility. Not good again…
Now let’s look at their options Struggling…
More interesting case Now they don’t have a way to win on their next move. So now we have to consider our responses to their responses.
Our options We have a win for any move they make. So the original position in purple is an X win.
Finishing it up… They win again if we take our fifth move.
Summary of the Analysis So which move should we make? ;-)
Looking closer at the process • Traverse the “game tree”. • Enumerate all possible moves at each node. The children of that node are the positions that result from making each move. A leaf is a position that is won or drawn for some side. • Make the assumption that we pick the best move for us, and the opponent picks the best move for him (the one that causes the most damage to us). • Pick the move that maximizes the minimum amount of success for our side. • This process is known as the Minimax algorithm.
Maximizing Success • In Tic-Tac-Toe there are only three possible outcomes: Win, Tie, Lose. • So the point is: if you have a move that leads to a Win, make it. If you have no such move, then make a move that gives a Tie. If not even that exists, then it doesn’t matter what you do.
When can we use Minimax? • Game Properties required for Minimax • Two players • No chance (coin flipping) • Perfect information • No hidden cards or hidden Chess pieces… • Non-Minimax Games • Poker (or any game that involves bluffing or somehow outwitting your opponent) • Arcade Games…
Example • The Game of Nim. • The players alternate turns. • On each turn a player removes some number of pennies (at least one) from a single stack. • The player who takes the last penny wins.
Example • Let’s start with a simple configuration of Nim and use Minimax to select a move. • Our initial configuration consists of three piles, with 1, 2, and 3 pennies in each pile. • We can represent this configuration compactly by writing it as (1,2,3). Each position in this list represents the number of pennies in that stack. Order does not matter (I can just rearrange the stacks).
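To make the representation concrete, here is a minimal sketch (not from the original slides) of how a Nim position and its legal moves could be coded in Java; the NimPosition class and its method names are invented for illustration.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A minimal sketch of a Nim position; the class and its method names are invented for illustration.
class NimPosition {
    final int[] piles;   // e.g. {1, 2, 3} for the position (1,2,3)

    NimPosition(int... piles) {
        // Drop empty piles and sort, since the order of the stacks does not matter.
        this.piles = Arrays.stream(piles).filter(p -> p > 0).sorted().toArray();
    }

    // Every position reachable in one move: take 1..k pennies from a pile of size k.
    List<NimPosition> successors() {
        List<NimPosition> result = new ArrayList<>();
        for (int i = 0; i < piles.length; i++) {
            for (int take = 1; take <= piles[i]; take++) {
                int[] next = piles.clone();
                next[i] -= take;
                result.add(new NimPosition(next));
            }
        }
        return result;
    }

    // No pennies left: the previous player took the last one.
    boolean isTerminal() { return piles.length == 0; }

    @Override
    public String toString() { return Arrays.toString(piles); }
}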
Drawing the Game Tree • The first thing we need to take care of is drawing the game tree. • One level of the tree: the root (1,2,3) has the children (2,3), (1,1,3), (1,2,2), (1,3), (1,1,2), and (1,2). • Whose move is it now?
Drawing the Game Tree [Figure: the full game tree for (1,2,3), expanded down to terminal positions. The levels alternate Us, Them, Us, …, and the leaf positions (where the last penny is taken) are marked as a win or a loss for Us.]
Some notes on the tree • Each level of the tree is called a ply. Our current tree is 6-ply deep. • To get the outcome at the root of the tree, we start at the bottom and work our way up. • If it is our turn (level is labeled Us) then we pick the maximum outcome from the children. • If it is the opponent’s turn (level is labeled Them) then we pick the minimum outcome from the children.
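Here is a hedged sketch of that bottom-up procedure applied to Nim, scoring a win for Us as +1 and a loss as -1; it reuses the hypothetical NimPosition class from the earlier sketch.

// Exact Minimax on the full Nim game tree (a sketch; +1 = win for Us, -1 = loss for Us).
class NimMinimax {
    static int value(NimPosition p, boolean ourTurn) {
        if (p.isTerminal()) {
            // The previous player took the last penny, so whoever is to move has lost.
            return ourTurn ? -1 : +1;
        }
        int best = ourTurn ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (NimPosition child : p.successors()) {
            int v = value(child, !ourTurn);
            // Us levels take the maximum, Them levels take the minimum.
            best = ourTurn ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(value(new NimPosition(1, 2, 3), true));
    }
}

Running it on (1,2,3) prints -1, matching the analysis on the next slide: every move from (1,2,3) loses against best play.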
Analyzing the Tree [Figure: the same game tree with the Minimax values filled in from the bottom up. Every child of the root works out to a loss for Us, so the root position (1,2,3) is itself a loss.]
What did we just find out? • We lose no matter what we do. • If our opponent plays like he should, there is nothing we can do. • Keep in mind we didn’t really use any “strategy” here. We just enumerated all “lines” the game could progress down. • * It turns out in Nim there is a special trick that can tell you immediately whether you have a win, but that is not true for most strategy games.
What did we just find out? • Minimax gives us a mechanism to play perfectly. If we can build up the entire game tree, all we need to do is follow the procedure we just did to get the optimal move. • Why can’t we do this for Chess or Othello (or any reasonably complex game)?
Othello • There are O(3^64) possible positions in Othello (note this is probably way too high). Each of the 64 squares can hold a black piece, hold a white piece, or be empty. • We cannot build this game tree in full; it is just too big. We need a way to approximate the bottom part of it, so we don’t have to build the whole thing.
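As a rough sanity check on that number (not on the original slide): 3^64 ≈ 3.4 × 10^30, so even though most of those colorings are unreachable in a real game, the count is far beyond anything we could enumerate.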
Heuristics • A heuristic is an approximation that is typically fast and is used to aid in optimization problems. • In this context, heuristics are used to “rate” board positions based on local information. • For example, in Chess I can “rate” a position by examining who has more pieces: the difference between the number of black and white pieces would be the score of the position.
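A sketch of that piece-count idea in Java; the ChessBoard interface and its method names are invented for illustration and are not part of any assignment API.

// Hypothetical board interface, invented for this sketch.
interface ChessBoard {
    int pieceCount(boolean forUs);   // how many pieces Us / the opponent has on the board
}

class MaterialEvaluator {
    // Score = our piece count minus theirs; positive means we are ahead in material.
    static int evaluate(ChessBoard b) {
        return b.pieceCount(true) - b.pieceCount(false);
    }
}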
Heuristics and Minimax • We want a strategy that will let us cut off the game tree at a certain maximum ply. • At the bottom nodes of the tree, we apply the heuristic function to those positions. • Now instead of just Win, Loss, Tie, we have a score. • For a level of the tree that is Us, we want the move that yields the position with the highest score. A Them level entails that we want the child with the lowest score.
Heuristics and Minimax • When dealing with game trees, the heuristic function is generally referred to as the evaluation function, or the static evaluator. • The static evaluator takes in a board position and gives it a score. • The higher the score, the better the position is for you; the lower, the better for the opponent.
Implementing Minimax • The most important thing to note with Minimax is that while we can visualize the process as building a tree, when we implement the algorithm in code, we never actually build an explicit tree. The “tree” in the implementation lives on the call stack as a result of “tree like” recursive calls. This can be difficult to conceptualize at first, but think about it for a little bit and it should make some sense.
Pseudo Code

int Minimax(Board b, boolean myTurn, int depth) {
    if (depth == 0)
        return b.Evaluate();   // Heuristic
    for (each possible move i)
        value[i] = Minimax(b.move(i), !myTurn, depth-1);
    if (myTurn)
        return array_max(value);
    else
        return array_min(value);
}

It is clear from this code that we don’t use an explicit tree structure. However, the pattern of recursive calls forms a tree on the call stack.
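For comparison, here is one way the same pseudocode might be rendered as concrete Java, using an invented GameBoard interface (evaluate, legalMoves, and makeMove are assumptions for this sketch, not the assignment’s actual classes).

import java.util.List;

// Hypothetical interface, invented for this sketch; not the assignment's API.
interface GameBoard {
    int evaluate();                      // static evaluator: higher = better for Us
    List<Integer> legalMoves();
    GameBoard makeMove(int move);        // returns the resulting position
}

class MinimaxSearch {
    // Depth-limited Minimax; returns the backed-up score of position b.
    static int minimax(GameBoard b, boolean myTurn, int depth) {
        List<Integer> moves = b.legalMoves();
        if (depth == 0 || moves.isEmpty()) {
            return b.evaluate();         // cut the tree off with the heuristic
        }
        int best = myTurn ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int move : moves) {
            int v = minimax(b.makeMove(move), !myTurn, depth - 1);
            best = myTurn ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}

As the slide says, no tree object is ever built; the recursion itself traces out the tree.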
Real Minimax Example

Max:             10
Min:        10           -5
Max:     10    16       7    -5
Min:   10  2  12  16   2  7  -5  -80

Evaluation function applied to the leaves!
How fast? • Minimax as described is pretty slow, even for a modest depth. • It is basically a brute-force search. • What is the running time? • Each node of the tree has some average number of moves b (the branching factor), and we search d levels deep, so the running time is O(b^d).
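To put a number on it (an illustrative figure, not from the slide): with roughly b ≈ 30 legal moves per position, a 6-ply search already visits on the order of 30^6 ≈ 7 × 10^8 nodes.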
Can we speed this up? • Let us observe the following game tree. • [Figure: a small tree with a Max root whose two Min children have values 2 and 1; the leaves shown are 1, 2, and 7.] • What do we know about the root? What do we know about the root’s right child?
Pruning • It is clear from this little example that in Minimax we sometimes do extra work. We evaluate nodes whose value has no impact on the rest of the search. • There is a way we can “prune” sections of the game tree off if we know that they are irrelevant to the outcome.
Alpha Beta Pruning • Idea: track a “window” of expectations. • Use two variables: • α – best score so far at a max node: it only increases. • At a child min node: the parent wants the max. To affect the parent’s current α, our β cannot drop below α. If β ever gets less than α: stop searching further subtrees of that child. They do not matter! • β – best score so far at a min node: it only decreases. • At a child max node: the parent wants the min. To affect the parent’s current β, our α cannot get above the parent’s β. If α gets bigger than β: stop searching further subtrees of that child. They do not matter! • Start the process with an infinite window (α = -∞, β = +∞).
Pseudo Code

int AlphaBeta(Board b, boolean myTurn, int depth, int alpha, int beta) {
    if (depth == 0)
        return b.Evaluate();   // Heuristic
    if (myTurn) {
        for (each possible move i && alpha < beta)
            alpha = max(alpha, AlphaBeta(b.move(i), !myTurn, depth-1, alpha, beta));
        return alpha;
    } else {
        for (each possible move i && alpha < beta)
            beta = min(beta, AlphaBeta(b.move(i), !myTurn, depth-1, alpha, beta));
        return beta;
    }
}
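The same routine rendered as concrete Java against the hypothetical GameBoard interface from the earlier Minimax sketch (again an illustration, not the assignment’s API); the root call would be alphaBeta(root, true, depth, Integer.MIN_VALUE, Integer.MAX_VALUE).

class AlphaBetaSearch {
    // Depth-limited alpha-beta; returns the same root value Minimax would.
    static int alphaBeta(GameBoard b, boolean myTurn, int depth, int alpha, int beta) {
        java.util.List<Integer> moves = b.legalMoves();
        if (depth == 0 || moves.isEmpty()) {
            return b.evaluate();
        }
        for (int move : moves) {
            int v = alphaBeta(b.makeMove(move), !myTurn, depth - 1, alpha, beta);
            if (myTurn) {
                alpha = Math.max(alpha, v);
            } else {
                beta = Math.min(beta, v);
            }
            if (alpha >= beta) {
                break;   // cutoff: the rest of this node's children cannot matter
            }
        }
        return myTurn ? alpha : beta;
    }
}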
Alpha Beta Example [Figure: alpha-beta on the left half of the earlier Minimax example. The Min node’s first Max child evaluates to 10, so β = 10. In its second Max child the first leaf is 12, giving α = 12 ≥ β, so the remaining leaves of that child are pruned and the Min node’s value is 10.]
Alpha Beta Example [Figure: the complete run. The root Max node’s first Min child returns 10, so α = 10. In the second Min child, the first Max grandchild evaluates to 7 (leaves 2 and 7), giving β = 7 ≤ α, so the rest of that subtree (including the leaves -5 and -80) is pruned. The root’s value is 10, the same answer Minimax gave.]
Alpha Beta Pruning • Does Alpha Beta ever return a different root value than Minimax? • No! Alpha Beta does the same thing Minimax does, except that it can detect parts of the tree that make no difference to the result, and because it can detect them it never evaluates them. • What is the speedup? • With the best possible move ordering, the Alpha Beta search tree has O(b^(d/2)) nodes, i.e. roughly the square root of the number of nodes in the regular Minimax tree. • The speedup is greatly dependent on the order in which you consider moves at each node. Why?
Transposition Tables • Another way to speed up basic Minimax is to use memoization to build a transposition table. • We construct a hash table that maps a board position to relevant information about that position: • The value for the node • Whether that value is an upper bound, a lower bound, or an exact value • The best move at the position (useful for move ordering!) • If during the search we reach a position that is already in the table, we can take advantage of that stored information at the current node. • Note that we cannot always just return the score from a transposition-table hit: sometimes the table only holds a bound (if that node previously produced a beta cutoff). You must be extremely careful when using these tables with Alpha Beta! • The most useful thing you get from the table is the best move, which aids greatly in ordering.
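A hedged sketch of what a transposition-table entry might look like; the field names, the Bound enum, and keying a HashMap on a 64-bit position hash are illustrative choices, not requirements of the assignment.

import java.util.HashMap;
import java.util.Map;

// Sketch of a transposition table; names and layout are illustrative only.
class TranspositionTable {
    enum Bound { EXACT, LOWER, UPPER }   // exact value, or just an alpha/beta bound

    static class Entry {
        int value;       // score (or bound) recorded for the position
        Bound bound;     // what kind of score this is
        int depth;       // search depth the value came from
        int bestMove;    // best move found: useful for move ordering even when only a bound is stored
    }

    private final Map<Long, Entry> table = new HashMap<>();

    Entry lookup(long positionHash) { return table.get(positionHash); }

    void store(long positionHash, int value, Bound bound, int depth, int bestMove) {
        Entry e = new Entry();
        e.value = value; e.bound = bound; e.depth = depth; e.bestMove = bestMove;
        table.put(positionHash, e);
    }
}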
Other Optimizations • History and Killer Heuristics: track which moves tend to cause beta cutoffs. Order these moves at the front of the move list, since they have shown they are capable of being good. • Null Move: skip your turn at a node (referred to as making a “null move”) and do a reduced-depth search on the resulting position. If you get a good score back, you can prune the node. (This works extremely well in Chess, although you will have to be very careful if you decide to use it in Othello programs.) • Aspiration Window: instead of starting with an infinite window, start with a narrower “aspiration window” to get more cutoffs. • Fast Performance: often the best game-playing programs are the ones that search the deepest (hence all good programs being called “Deep”). In order to search deep, your program has to be as efficient as possible; this includes optimizing your move generation and evaluation routines, among others. • There are many references online (mostly dealing with Chess) to more advanced optimizations, e.g. http://www.seanet.com/~brucemo/topics/topics.htm
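As one illustration of the aspiration-window bullet above (a sketch; the half-width of 50 and the simple full-window re-search are arbitrary choices, not recommendations from the slides), building on the AlphaBetaSearch sketch from earlier:

class AspirationSearch {
    // Search with a narrow window around a guess (e.g. the previous iteration's score);
    // if the true value falls outside the window, re-search with the full window.
    static int search(GameBoard root, int depth, int guess) {
        int alpha = guess - 50, beta = guess + 50;   // arbitrary half-width of 50
        int v = AlphaBetaSearch.alphaBeta(root, true, depth, alpha, beta);
        if (v <= alpha || v >= beta) {
            // Fail-low or fail-high: the narrow window was wrong, so redo with an infinite window.
            v = AlphaBetaSearch.alphaBeta(root, true, depth,
                                          Integer.MIN_VALUE, Integer.MAX_VALUE);
        }
        return v;
    }
}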
Iterative Deepening • In real life, players often have a time limit per move: they need to produce a move within, say, 2 minutes. How do we get Minimax to search as deep as possible within the time limit? • Start with a 1-ply search and get a move. This should go very fast (well within the time limit). Put the move it returns into a “best move area”. Now do a 2-ply search and replace the current best move with the move just calculated. Keep increasing the depth of the search tree until something forcibly shuts the search down (the time limit). When we get killed, we still have a move ready in the best move area from the earlier, shallower searches. Because we just keep increasing the depth, we don’t have to hardcode a maximum depth; the depth adjusts itself to the time limit! • Ordering and Tables: the best moves found by the shallower searches (for example, via the transposition table) also provide excellent move ordering for the deeper searches.
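One possible shape of that loop (a sketch built on the hypothetical GameBoard and AlphaBetaSearch pieces from earlier; a real program would usually check the clock inside the search and abort mid-iteration rather than only between depths):

class IterativeDeepening {
    // Keep searching one ply deeper until time runs out; always have a move ready.
    static int bestMoveWithin(GameBoard root, long millis) {
        long deadline = System.currentTimeMillis() + millis;
        int bestMove = root.legalMoves().get(0);     // fallback so we never return nothing
        for (int depth = 1; System.currentTimeMillis() < deadline; depth++) {
            int bestScore = Integer.MIN_VALUE;
            int bestThisDepth = bestMove;
            for (int move : root.legalMoves()) {
                int v = AlphaBetaSearch.alphaBeta(root.makeMove(move), false, depth - 1,
                                                  Integer.MIN_VALUE, Integer.MAX_VALUE);
                if (v > bestScore) { bestScore = v; bestThisDepth = move; }
            }
            bestMove = bestThisDepth;                // the "best move area" from the completed depth
        }
        return bestMove;
    }
}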
Initial Questions • How do we get a program to play games well? • Could I be unstoppable with a computer with massive computation power? • How does Kasparov stay with programs like Deep Junior?
So how does Kasparov win? • Even the best Chess grandmasters say they only look 4 or 5 moves ahead each turn. Deep Junior looks up to about 18-25 moves ahead. How does it lose!? • Kasparov has an unbelievable evaluation function: he is able to assess strategic advantages much better than programs can (although this is becoming less true). • The moral: the evaluation function plays a large role in how well your program can play.
Summary • The Minimax algorithm provides a way to build and analyze a game tree. • Often it is impossible to build the entire game tree, so we need a heuristic approximation. • Alpha Beta and other optimizations provide techniques for greatly enhancing the performance of game-playing programs. • Good Luck in the Tournament!