Game Playing (notes adapted from lecture notes for CMSC 421 by B.J. Dorr)
Outline • What are games? • Optimal decisions in games • Which strategy leads to success? • α-β pruning • Games of imperfect information • Games that include an element of chance
What are and why study games? • Games are a form of multi-agent environment • What do other agents do and how do they affect our success? • Cooperative vs. competitive multi-agent environments. • Competitive multi-agent environments give rise to adversarial problems a.k.a. games • Why study games? • Fun; historically entertaining • Interesting subject of study because they are hard • Easy to represent and agents restricted to small number of actions
Relation of Games to Search • Search – no adversary • Solution is (heuristic) method for finding goal • Heuristics and CSP techniques can find optimal solution • Evaluation function: estimate of cost from start to goal through given node • Examples: path planning, scheduling activities • Games – adversary • Solution is strategy (strategy specifies move for every possible opponent reply). • Time limits force an approximate solution • Evaluation function: evaluate “goodness” of game position • Examples: chess, checkers, Othello, backgammon
Game setup • Two players: MAX and MIN • MAX moves first and they take turns until the game is over. The winner gets a reward, the loser a penalty. • Games as search: • Initial state: e.g. the board configuration of chess • Successor function: list of (move, state) pairs specifying legal moves • Terminal test: is the game finished? • Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next) • MAX uses the search tree to determine its next move.
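The search formulation above can be made concrete. Below is a minimal Python sketch of the four components (initial state, successor function, terminal test, utility) for tic-tac-toe; the class and method names are illustrative, not from the original notes.

```python
class TicTacToe:
    """States are tuples of 9 cells: 'X' (MAX), 'O' (MIN), or None (empty)."""

    # All complete rows, columns, and diagonals, as index triples.
    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def initial_state(self):
        return (None,) * 9                     # empty board

    def player(self, state):
        # X (MAX) moves first, so X is to move whenever the counts are equal.
        return 'X' if state.count('X') == state.count('O') else 'O'

    def successors(self, state):
        """Yield (move, state) pairs, one for every legal move."""
        p = self.player(state)
        for i, cell in enumerate(state):
            if cell is None:
                yield i, state[:i] + (p,) + state[i + 1:]

    def winner(self, state):
        for a, b, c in self.LINES:
            if state[a] is not None and state[a] == state[b] == state[c]:
                return state[a]
        return None

    def terminal_test(self, state):
        return self.winner(state) is not None or None not in state

    def utility(self, state):
        # +1 for a MAX win, -1 for a MIN win, 0 for a draw.
        return {'X': 1, 'O': -1, None: 0}[self.winner(state)]
```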
Optimal strategies • Find the contingent strategy for MAX assuming an infallible MIN opponent. • Assumption: both players play optimally! • Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX-VALUE(n) =
  UTILITY(n)                                  if n is a terminal node
  max s∈successors(n) MINIMAX-VALUE(s)        if n is a MAX node
  min s∈successors(n) MINIMAX-VALUE(s)        if n is a MIN node
Two-Ply Game Tree • MAX nodes • MIN nodes • Terminal nodes: utility values for MAX • Other nodes: minimax values • MAX's best move at the root – a1 – leads to the successor with the highest minimax value • MIN's best reply – b1 – leads to the successor with the lowest minimax value
What if MIN does not play optimally? • Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX. • But if MIN does not play optimally, MAX will do even better. [Can be proved.]
Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
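The pseudocode runs directly on a small explicit game tree. Here is a hedged Python sketch on a two-ply tree represented as nested dicts mapping actions to children, with numbers as terminal utilities; the leaf values are chosen to match the interval annotations in the alpha-beta example later in these notes (an assumption, since the original figure is not reproduced here).

```python
# Each internal node maps actions to child nodes; plain numbers are
# terminal states whose value is their utility.
TREE = {
    'a1': {'b1': 3, 'b2': 12, 'b3': 8},
    'a2': {'c1': 2, 'c2': 4, 'c3': 6},
    'a3': {'d1': 14, 'd2': 5, 'd3': 2},
}

def max_value(node):
    if isinstance(node, (int, float)):   # terminal test: leaves are utilities
        return node
    return max(min_value(child) for child in node.values())

def min_value(node):
    if isinstance(node, (int, float)):
        return node
    return min(max_value(child) for child in node.values())

def minimax_decision(tree):
    """Return the root action whose successor has the highest minimax value."""
    return max(tree, key=lambda action: min_value(tree[action]))

print(minimax_decision(TREE))  # → a1: min(3,12,8)=3 beats min(2,4,6)=2 and min(14,5,2)=2
```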
Properties of Minimax • Complete: yes, if the game tree is finite • Optimal: yes, against an optimal opponent • Time complexity: O(b^m) • Space complexity: O(bm) (depth-first exploration) • For chess, b ≈ 35 and m ≈ 100, so an exact solution is completely infeasible
Tic-Tac-Toe • Depth bound: 2 • Breadth-first search until all nodes at level 2 are generated • Apply the evaluation function to the positions at these nodes
Tic-Tac-Toe • Evaluation function e(p) of a position p • If p is not a winning position for either player: e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) – (number of complete rows, columns, or diagonals that are still open for MIN) • If p is a win for MAX: e(p) = +∞ • If p is a win for MIN: e(p) = -∞
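This evaluation function is easy to sketch in Python. The board representation below (a tuple of 9 cells, 'X' for MAX, 'O' for MIN, None for empty) is an assumption for illustration:

```python
import math

# All complete rows, columns, and diagonals of the 3x3 board, indexed 0..8.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def e(p):
    """e(p) = open lines for MAX minus open lines for MIN; ±inf for wins."""
    # A win dominates every other consideration.
    for a, b, c in LINES:
        if p[a] is not None and p[a] == p[b] == p[c]:
            return math.inf if p[a] == 'X' else -math.inf
    # A line is "open" for a player if the opponent occupies none of its cells.
    def open_for(player):
        return sum(all(p[i] in (player, None) for i in line) for line in LINES)
    return open_for('X') - open_for('O')

empty = (None,) * 9
x_center = (None,) * 4 + ('X',) + (None,) * 4
print(e(empty))     # 8 - 8 = 0: all lines open for both players
print(e(x_center))  # 8 - 4 = 4: X in the center blocks 4 of MIN's lines
```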
Tic-Tac-Toe – first stage [figure: the two-ply search tree for MAX's opening move; each leaf position is labeled with its evaluation, e.g. 6-5=1, 5-4=1, 4-6=-2, and backed-up values 1, -1, -2 appear at the first-level nodes]
Tic-Tac-Toe – second stage [figure: the search tree grown two more plies from MAX's chosen move; leaf evaluations range from 3-3=0 to 5-2=3, and backed-up values 0 and 1 appear at the intermediate nodes]
Multiplayer games • Many games allow more than two players • Single minimax values become vectors of values, one component per player
Problem of minimax search • The number of game states is exponential in the number of moves • Solution: do not examine every node → alpha-beta pruning • Alpha = value of the best choice found so far at any choice point along the path for MAX • Beta = value of the best choice found so far at any choice point along the path for MIN • Revisit the example …
Tic-Tac-Toe – first stage with pruning [figure: node A has been fully evaluated with backed-up (alpha) value -1; node B's first successor C has static value -1, giving B a beta value of -1, so B's remaining successors need not be generated]
Alpha-beta pruning • Search progresses in a depth-first manner • Whenever a tip node is generated, its static evaluation is computed • Whenever a position can be given a backed-up value, this value is computed • Node A and all its successors have been generated • Backed-up value: -1 • Node B and its successors have not yet been generated • Now we know that the backed-up value of the start node is bounded from below by -1
Alpha-beta pruning • Depending on the backed-up values of the other successors of the start node, its final backed-up value may be greater than -1, but it cannot be less • This lower bound is the alpha value of the start node
Alpha-beta pruning • Depth-first search proceeds – node B and its first successor node C are generated • Node C is given a static value of -1 • The backed-up value of node B is therefore bounded from above by -1 • This upper bound on node B is its beta value • Therefore we can discontinue search below node B • Node B will not turn out to be preferable to node A
Alpha-beta pruning • The reduction in search effort is achieved by keeping track of bounds on backed-up values • As successors of a node are given backed-up values, these bounds can be revised • Alpha values of MAX nodes can never decrease • Beta values of MIN nodes can never increase
Alpha-beta pruning • Therefore search can be discontinued: • Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX-node ancestors; the final backed-up value of this MIN node can then be set to its beta value • Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN-node ancestors; the final backed-up value of this MAX node can then be set to its alpha value
Alpha-beta pruning • During search, alpha and beta values are computed as follows: • The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors • The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors
Alpha-Beta Example • Do depth-first search until the first leaf • Range of possible values at the root and at the first MIN node: [-∞,+∞] [-∞,+∞]
Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]
Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]
Alpha-Beta Example (continued) [3,+∞] [3,3]
Alpha-Beta Example (continued) [3,+∞] This node is worse for MAX [3,3] [-∞,2]
Alpha-Beta Example (continued) [3,14] [3,3] [-∞,2] [-∞,14]
Alpha-Beta Example (continued) [3,5] [3,3] [-∞,2] [-∞,5]
Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [−∞,2]
Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [-∞,2]
Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
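A runnable Python sketch of this algorithm, again on an explicit dict-based game tree (the representation and the leaf values are illustrative assumptions, chosen to match the example intervals above):

```python
import math

# Internal nodes map actions to children; plain numbers are terminal utilities.
TREE = {
    'a1': {'b1': 3, 'b2': 12, 'b3': 8},
    'a2': {'c1': 2, 'c2': 4, 'c3': 6},
    'a3': {'d1': 14, 'd2': 5, 'd3': 2},
}

def max_value(node, alpha, beta):
    if isinstance(node, (int, float)):       # terminal: return utility
        return node
    v = -math.inf
    for child in node.values():
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                        # beta cutoff: a MIN ancestor avoids this node
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = math.inf
    for child in node.values():
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                       # alpha cutoff: a MAX ancestor avoids this node
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(tree):
    """Pick the best root action, propagating alpha across the root's children."""
    best, alpha = None, -math.inf
    for action, child in tree.items():
        v = min_value(child, alpha, math.inf)
        if v > alpha:
            alpha, best = v, action
    return best

print(alpha_beta_search(TREE))  # → a1, the same move plain minimax chooses
```

With alpha = 3 after examining a1, the search under a2 is cut off after its first leaf (2 ≤ 3), exactly as in the [-∞,2] interval of the worked example.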
General alpha-beta pruning • Consider a node n somewhere in the tree • If the player has a better choice either at the parent node of n or at any choice point further up, then n will never be reached in actual play • Hence, once enough is known about n, it can be pruned
Final Comments about Alpha-Beta Pruning • Pruning does not affect the final result • Entire subtrees can be pruned • Good move ordering improves the effectiveness of pruning • With good ordering, alpha-beta can look ahead roughly twice as far as minimax in the same amount of time • Repeated states are again possible • Store them in memory → a transposition table
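The transposition-table idea can be sketched by memoizing backed-up values keyed by position. Here `functools.lru_cache` plays the role of the table; the tiny subtraction game (players alternately take 1 or 2 counters, and whoever takes the last counter wins) is purely illustrative:

```python
import functools

@functools.lru_cache(maxsize=None)   # the cache acts as a transposition table
def value(n, is_max):
    """Minimax value with n counters left and the given player to move:
    +1 if MAX wins under optimal play, -1 if MIN wins."""
    if n == 0:
        # The previous player took the last counter and won.
        return -1 if is_max else 1
    children = [value(n - d, not is_max) for d in (1, 2) if n - d >= 0]
    return max(children) if is_max else min(children)

print(value(10, True))   # → 1: MAX to move at 10 can force a win
# Positions are reached by many move orders, so the table gets cache hits:
print(value.cache_info().hits > 0)
```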
Games that include chance • Chance nodes represent the dice rolls • Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16) • Rolls [1,1] and [6,6] each occur with probability 1/36; all other rolls occur with probability 1/18 • We cannot calculate a definite minimax value, only an expected value
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                 if n is a terminal node
  max s∈successors(n) EXPECTED-MINIMAX-VALUE(s)              if n is a MAX node
  min s∈successors(n) EXPECTED-MINIMAX-VALUE(s)              if n is a MIN node
  Σ s∈successors(n) P(s) · EXPECTED-MINIMAX-VALUE(s)         if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
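The recursion above can be sketched directly in Python. Nodes here are tagged tuples ('max' | 'min' | 'chance', children), with numbers as terminal utilities and chance children carrying probabilities; this representation is an illustration, not from the original notes.

```python
def expectiminimax(node):
    if isinstance(node, (int, float)):               # terminal: utility
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average of successor values.
    return sum(p * expectiminimax(c) for p, c in children)

# A tiny tree: MAX chooses between a fair coin toss worth 0 or 4,
# and a certain payoff of 1.5.
tree = ('max', [
    ('chance', [(0.5, 0), (0.5, 4)]),   # expected value 0.5*0 + 0.5*4 = 2.0
    1.5,
])
print(expectiminimax(tree))  # → 2.0: the gamble beats the sure 1.5
```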
Position evaluation with chance nodes • In the left tree, move A1 wins; in the right tree, with rescaled leaf values, A2 wins • With chance nodes, the move chosen by the evaluation function can change when values are rescaled • Behavior is preserved only by a positive linear transformation of EVAL
Discussion • Examine the section on state-of-the-art game programs yourself • Minimax assumes the right-hand tree is better than the left, yet … • An alternative is to return a probability distribution over possible values • But this is an expensive calculation
Discussion • Utility of node expansion: only expand those nodes that can lead to significantly better moves • Both suggestions require meta-reasoning
Summary • Games are fun (and dangerous) • They illustrate several important points about AI • Perfection is unattainable → approximate • It is a good idea to think about what to think about • Uncertainty constrains the assignment of values to states • Games are to AI as Grand Prix racing is to automobile design