Course: Engineering Artificial Intelligence • Dr. Radu Marinescu • Lecture 5
Today’s class • Game-playing
Overview • Computer programs which play 2-player games • game-playing as search • with the complication of an opponent • General principles of game-playing and search • evaluation functions • mini-max principle • alpha-beta pruning • heuristic techniques • Status of game-playing systems • in chess, checkers, backgammon, Othello, etc., computers routinely defeat leading world players • Applications? • think of “nature” as an opponent • economics, war-gaming, medical drug treatment
History • 1949 – Shannon paper • 1951 – Turing paper • 1958 – Bernstein program • 1955-60 – Simon-Newell program (α – β McCarthy?) • 1961 – Soviet program • 1966-67 – MacHack 6 (MIT AI Lab) • 70’s – NW Chess 4.5 • 80’s – Cray Blitz • 90’s – Belle, Hitech, Deep Thought, Deep Blue
Solving 2-player games • Two players, perfect information • Examples: chess, checkers, tic-tac-toe • Configuration of the board = unique arrangement of “pieces” • Games as a Search Problem • States = board configurations • Operators = legal moves • Initial State = current configuration • Goal = winning configuration • Payoff function = gives the numerical value of the outcome of the game
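This search formulation can be captured in a small programming interface. The sketch below is illustrative only: the `Game` class and its method names (`legal_moves`, `apply`, `is_terminal`, `payoff`) are assumptions of these notes, not part of the lecture; the later sketches build on them.

```python
class Game:
    """Hypothetical interface for a 2-player, perfect-information game.

    Method names are assumptions chosen for the sketches in these notes.
    """

    def initial_state(self):
        """Return the current board configuration (the initial search state)."""
        raise NotImplementedError

    def legal_moves(self, state):
        """Operators: the legal moves available in `state`."""
        raise NotImplementedError

    def apply(self, state, move):
        """Return the successor board configuration after playing `move`."""
        raise NotImplementedError

    def is_terminal(self, state):
        """True if `state` is a final (won/lost/drawn) configuration."""
        raise NotImplementedError

    def payoff(self, state):
        """Numerical value of the game outcome, from MAX's point of view."""
        raise NotImplementedError
```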
Game tree search • Game tree: encodes all possible games • We are not looking for a path, only the next move to make (that hopefully leads to a winning position) • Our best move depends on what the other player does
Example: partial game tree for Tic-Tac-Toe (tree levels alternate: MAX (O), MIN (X), MAX (O), MIN (X))
Scoring function We will use the same scoring function for both players, simply negating the values to represent the opponent's scores.
Example: scoring function for Tic-Tac-Toe Leaves are either win (+1), loss (-1) or draw (0)
Min-Max: an optimal procedure • Designed to find the optimal strategy for the MAX player and the best move to make: • 1. Generate the whole game tree, down to the leaves • 2. Apply the scoring (payoff) function to the leaves • 3. Back up values from the leaves toward the root: • a MAX node computes the max of its child values • a MIN node computes the min of its child values • 4. When the values reach the root: choose the move leading to the child with the maximum value.
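A minimal sketch of this procedure, written against the hypothetical `Game` interface introduced earlier (its method names are assumptions); it backs values up from the leaves and returns MAX’s best move at the root:

```python
def minimax_value(game, state, maximizing):
    """Steps 1-3: expand to the leaves, score them, and back values up."""
    if game.is_terminal(state):
        return game.payoff(state)                      # step 2: score the leaf
    child_values = [minimax_value(game, game.apply(state, move), not maximizing)
                    for move in game.legal_moves(state)]
    # step 3: MAX nodes take the max, MIN nodes take the min of child values
    return max(child_values) if maximizing else min(child_values)


def minimax_decision(game, state):
    """Step 4: at the root, MAX chooses the move with the highest backed-up value."""
    return max(game.legal_moves(state),
               key=lambda move: minimax_value(game, game.apply(state, move), False))
```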
Properties of Min-Max • Complete? Yes (if tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^d) • Space complexity? O(bd) (depth-first exploration) • For chess, b ≈ 35 and d ≈ 100 for "reasonable" games, so an exact solution is completely infeasible • Chess: • b ~ 35 (average branching factor) • d ~ 100 (depth of game tree for typical game) • b^d ~ 35^100 ~ 10^154 nodes!! • Tic-Tac-Toe: • ~5 legal moves per position, total of 9 moves • 5^9 = 1,953,125 • 9! = 362,880 (computer goes first) • 8! = 40,320 (computer goes second)
Min-Max • Naïve Min-Max assumes that it is possible to search the full tree, where the scoring function of the leaves is either win (+1), loss (-1) or draw (0) • However: For most games, it is impossible to develop the whole search tree • Instead develop part of the tree (up to a certain depth or a number of plys) and evaluate promise of leaves using a static evaluation function.
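A sketch of this depth-limited variant, assuming a static evaluation function `evaluate(state)` (a hypothetical name) that scores non-terminal frontier nodes from MAX’s point of view:

```python
def depth_limited_minimax(game, state, depth, maximizing, evaluate):
    """Search only `depth` plies; score the frontier with a static evaluation."""
    if game.is_terminal(state):
        return game.payoff(state)
    if depth == 0:
        return evaluate(state)        # estimate the promise of this non-terminal leaf
    child_values = [depth_limited_minimax(game, game.apply(state, move),
                                          depth - 1, not maximizing, evaluate)
                    for move in game.legal_moves(state)]
    return max(child_values) if maximizing else min(child_values)
```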
Static evaluation function • The static evaluation function: • Estimates how good the current board configuration is for a player. • Typically, one estimates how good it is for the player and how good it is for the opponent, and subtracts the opponent’s score from the player’s • Othello: number of white pieces - number of black pieces • Chess: value of all white pieces - value of all black pieces • Typical values range from -infinity (loss) to +infinity (win), or [-1, +1]. • If the board evaluation is X for a player, it’s -X for the opponent • Examples: evaluating board positions in Chess, Checkers, Tic-Tac-Toe
Evaluation functions: chess • For Chess, typically a linear weighted sum of features, e.g., Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s), where each weight wi is the value of a piece type (Q = 9, R = 5, B = 3.5, K = 3, P = 1) and fi(s) is the difference between the number of white and black pieces of that type in position s
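A sketch of such a material-count evaluation using the piece weights from this slide; the board representation (an iterable of (piece letter, is_white) pairs) and the reading of “K” as the knight are illustrative assumptions, not from the lecture:

```python
# Piece weights from the slide (here 'K' is taken to mean the knight).
PIECE_WEIGHTS = {'Q': 9.0, 'R': 5.0, 'B': 3.5, 'K': 3.0, 'P': 1.0}


def material_eval(board):
    """Linear weighted sum: value of all white pieces minus value of all black pieces.

    `board` is assumed to be an iterable of (piece_letter, is_white) pairs;
    this representation is only for illustration.
    """
    score = 0.0
    for piece, is_white in board:
        weight = PIECE_WEIGHTS.get(piece, 0.0)
        score += weight if is_white else -weight
    return score
```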
Evaluation functions: Tic-Tac-Toe • Another example: E(n) = (number of rows, columns and diagonals still open for MAX) - (number of rows, columns and diagonals still open for MIN)
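A sketch of this open-lines count for Tic-Tac-Toe; the flat 9-cell board representation and the choice of which mark plays MAX are assumptions made for illustration:

```python
# All 8 winning lines on a 3x3 board, indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals


def open_lines(board, player):
    """Lines containing no opponent mark, i.e. still winnable for `player`."""
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))


def tictactoe_eval(board, max_player):
    """E(n): open lines for MAX minus open lines for MIN."""
    min_player = 'O' if max_player == 'X' else 'X'
    return open_lines(board, max_player) - open_lines(board, min_player)
```

On an empty board both players have all 8 lines open, so the sketch returns 0, as expected for a symmetric position.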
Backup values Two-ply minimax applied to the opening move of tic-tac-toe
Backup values Two-ply minimax applied to the X’s second move
Backup values Two-ply minimax applied to X’s move near end game
Deep Blue • 32 SP2 processors • Each with 8 dedicated chess processors = 256 CP • 50 – 100 billion moves in 3 min • 13-30 ply search
α – β pruning • In Min-Max there is a separation between node generation and evaluation: the whole tree is generated first, and values are backed up only afterwards.
α – β pruning • Idea: • Do depth-first search to generate a partial game tree • Apply the static evaluation function to the leaves • Compute bounds on the internal nodes • Alpha (α), Beta (β) bounds: • An α value at a MAX node means the node’s true value is at least α • A β value at a MIN node means MIN can guarantee a value of at most β • Computation: • The α of a MAX node is the maximum value seen so far among its children • The β of a MIN node is the minimum value seen so far among its children
When to prune • Pruning occurs: • Below a MIN node whose β value is less than or equal to the α value of any of its MAX-node ancestors. • Below a MAX node whose α value is greater than or equal to the β value of any of its MIN-node ancestors.
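A sketch of depth-first α – β search that combines the bound computation and the pruning rules above, again written against the hypothetical `Game` interface and a static `evaluate` function (both assumptions of these notes):

```python
import math


def alphabeta(game, state, depth, alpha, beta, maximizing, evaluate):
    """Depth-limited Min-Max with α/β bounds and the cut-offs described above."""
    if game.is_terminal(state):
        return game.payoff(state)
    if depth == 0:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for move in game.legal_moves(state):
            value = max(value, alphabeta(game, game.apply(state, move),
                                         depth - 1, alpha, beta, False, evaluate))
            alpha = max(alpha, value)      # α: max of the children seen so far
            if alpha >= beta:              # a MIN ancestor already guarantees ≤ β
                break                      # prune the remaining children
        return value
    else:
        value = math.inf
        for move in game.legal_moves(state):
            value = min(value, alphabeta(game, game.apply(state, move),
                                         depth - 1, alpha, beta, True, evaluate))
            beta = min(beta, value)        # β: min of the children seen so far
            if beta <= alpha:              # a MAX ancestor already guarantees ≥ α
                break                      # prune the remaining children
        return value
```

The root call would be alphabeta(game, state, depth, -math.inf, math.inf, True, evaluate); it returns the same value as plain Min-Max searched to the same depth.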
α – β properties • Guaranteed same value as Min-Max • In a perfectly ordered tree, expected work is O(b^(d/2)), vs. O(b^d) for Min-Max, so can search twice as deep with the same effort • With good move orderings, the actual running time is close to the optimistic estimate
Game program • Move generator (ordered moves) – 50% • Static evaluation – 40% • Search control – 10% • Openings • End-game databases • [all in place by the late 60’s]
Move generator • Generates legal moves • Ordered by: • Most valuable victim, least valuable aggressor (MVV-LVA) • Killer heuristic
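A sketch of “most valuable victim, least valuable aggressor” ordering for captures; the move representation (attributes `victim` and `attacker` holding piece letters, with `victim` set to None for quiet moves) is an assumption for illustration:

```python
# Reuses the piece weights from the evaluation-function slide.
PIECE_WEIGHTS = {'Q': 9.0, 'R': 5.0, 'B': 3.5, 'K': 3.0, 'P': 1.0}


def mvv_lva_key(move):
    """Sort key: most valuable victim first, least valuable aggressor first."""
    victim = PIECE_WEIGHTS.get(move.victim, 0.0) if move.victim else 0.0
    attacker = PIECE_WEIGHTS.get(move.attacker, 0.0)
    return (-victim, attacker)


def order_moves(moves):
    """Return legal moves sorted so the most promising captures are searched first."""
    return sorted(moves, key=mvv_lva_key)
```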
Static evaluation • Initially – very complex • 70’s – very simple (material) • Now: • Deep searchers – moderately complex (in hardware) • PC programs – elaborate, hand-tuned
Other games • Backgammon • Involves randomness – dice rolls • Machine-learning based player was able to draw the world champion human player • Bridge • Involves hidden information – other players’ cards – and communication during bidding • Computer players play well but do not bid well • Go • No new elements but huge branching factor • No good computer players exist
Observations • Computers excel in well-defined activities where rules are clear • Chess • Mathematics • Success comes after a long period of gradual refinement
Summary • Game playing is best modeled as a search problem • Game trees represent alternate computer/opponent moves • Evaluation functions estimate the quality of a given board configuration for the MAX player. • Min-Max is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them • Alpha-Beta is a procedure which can prune large parts of the search tree and allow search to go deeper • For many well-known games, computer algorithms based on heuristic search match or out-perform human world experts • Reading: R&N Chapter 6