120 likes | 140 Views
Understand the Min-Max algorithm for adversarial games like Tic-Tac-Toe, optimize for best moves, and learn about enhancements like Alpha-Beta pruning and ProbCut. Explore iterative deepening, transposition tables, and other strategies to improve decision-making in games.
E N D
ADVERSARIAL GAME SEARCH:Min-Max Search X O O X X O • Between 2 adversaries • Input: A board position • Output: Best move (actually, the score for moving there) • Objective function: • board position evaluated to a number (may be +-1, 0) • higher the number better the move: maximization • Task: To compute optimum value of objective function out of all possible legal moves • i.e. Optimize for the highest board value Computer X Evaluated best move value = 1 (win)
Adversarial Games: Min-Max algorithm Computer move X X X O O O O 9 options X X X X O O Adversary move 8 options X O X O Leaf: return +1 Leaf: return -1 • Search tree is going up to leaves (terminal board) to evaluate board • May not be practically feasible for a given computation time • 9! total worst case • Most algorithms will search for a pre-assigned depth or “ply” & evaluate board • Alternate move by Adversary/human & Algorithm/computer: • Minimizes the objective function for adversary move • Maximizes the objective function for self-move or computer move • the same algorithm alternately calls maximizer and minimizer
Maximizing part of the Min-Max algorithm: Input: A board position; Output: Best next move, with max evaluation value Function findCompMove( ) if( fullBoard( ) ) value = DRAW; else if( ( quickWinInfo = immediateCompWin( ) ) != null ) return quickWinInfo; // if the next move ends the game; recursion termination else value = COMP_LOSS; // initialize with lowest value, max-problem for( i = 1; i ≤ 9; i++ ) // try each square in tic-tac-toe if ( isEmpty( i) ) place( i, COMP ); // temporary placement on board, // as global variable responseValue = findHumanMove( ).value; unplace( i ); // Restore board: alg does not actually move if( responseValue > value ) // Update best move value = responseValue; bestMove = i; return new Movelnfo( bestMove, value );
Minimizing part of the Min-Max algorithm: Input: A board position; Output: Best opponent move, with min evaluation value Function findHumanMove( ) if( fullBoard( ) ) value = DRAW; else if( ( quickWinInfo = immediateHumanWin( ) ) != null ) return quickWinInfo; // if the next move ends the game; recursion termination else value = COMP_WIN; // initialize with lowest value for( i = 1; i ≤ 9; i++ ) // Try each square in tic-tac-toe if ( isEmpty( i) ) place( i, HUMAN ); // temporary placement on board, // as global variable responseValue = findCompMove( ).value; unplace( i ); // Restore board: alg does not actually move if( responseValue< value ) // Update best move value = responseValue; bestMove = i; return new Movelnfo( bestMove, value ); Driver call?
Alpha-beta pruning [- ] Max a=44 Max: get me >44 Min But will not return >40, so Do not call other children Must get >44 44 60>40, so call pruned 60 40 All branches pruned
Min-max algorithm with Alpha-beta pruning Alpha pruning Must be true: Max-node’s value ≥ min-node’s value Otherwise, useless to expand tree: Prun the branch Beta pruning From Weiss’ text
Min-max algorithm Variable Ply on Min-Max: When to stop a search tree path? Quiscent: Stable boards, returned values are close to each other, no higher ply needed Horizon effect: Maybe just below this there is a major event! Avoiding horizon effect strategies exist, e.g. remember from past search or games Delay tactic: stay conservative, don’t take risk, push the “horizon” to see what develops but, may actually just delay an eventual loss Iterative deepening: use (n-1)-th ply’s best move values toward alpha-beta in n-th ply Transposition table: multiple moves get to same board, hash table to avoid such moves From Weiss’ text
Min-max may lead to “wrong” path:very conservative search:presumes opponent is exactly as rational Text Figure 5.14 Max (min(99, 1000, 1000, 1000), min(100, 101, 102, 100) ) = Max (99, 100) = 100, is not the best, 1000’s on the other branch was But, that is the point, opponent will not let ‘you’ go there! However, the risk may be worth taking => utility driven search, not just on board evaluation ProbCut: Probabilistic alpha-beta pruning, - weight heuristic function from past experience – machine (machine-learning) or human (knowledge-based) - alpha-beta are on probability distribution, - pruning of probably useless branches rather than provably useless From Weiss’ text
Forward Pruning:beam search K-best nodes are explored simultaneously/iteratively Not pure depth first, rather k-best first, but the optimum may be beyond k-best at a particular ply From Weiss’ text
Lookup Table Database / Knowledge base of past games Specially on start- or end-game in chess Knowledge-base: compressed information as strategy, if-then-else From Weiss’ text
Dice Games: Backgamon Max-probability-Min-Probability-… Max-Min part “Or” branches, Dice throwing part “And” branches: Must consider all possibilities, but probabilistic weight may be added from past knowledge From Weiss’ text
Partially Observable Games: Card games Each node is really a subset of nodes belief on what “may be” the current nodes are search should filter possibilities: player plays a probing hand Utility function, as in “game therory” in economics – heuristic function typically, embeds probability measures as well From Weiss’ text