Games with Chance

Games with Chance 2012/04/25

Nondeterministic Games: Backgammon
• White moves clockwise toward 25; Black moves counterclockwise toward 0.
• A piece can move to any position unless multiple opponent pieces occupy it; a lone opponent piece there is captured and must start over.
• White has rolled 6-5 and must choose among four legal moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), (5-11, 11-16).

Nondeterministic Games: Backgammon (cont.-1)
• White knows his own legal moves, but cannot determine Black's, because those depend on what Black rolls
• The game tree must therefore include chance nodes
• How do we pick the best move?
• Minimax cannot be applied directly

Nondeterministic Games: Backgammon (cont.-2)
• Chance nodes are included in the game tree

Nondeterministic Games in General
• In nondeterministic games, chance is introduced by dice, card shuffling, or face-down shuffling

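Algorithm for Nondeterministic Games
• Expectiminimax gives perfect play
• Definition:

$$
\text{Expectiminimax}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \text{Successors}(n)} \text{Expectiminimax}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Successors}(n)} \text{Expectiminimax}(s) & \text{if } n \text{ is a MIN node} \\
\sum_{s \in \text{Successors}(n)} P(s) \cdot \text{Expectiminimax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
$$

• The successor function for a chance node n augments the state of n with each possible dice roll to produce each successor s and its probability P(s)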
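The Expectiminimax recursion can be sketched in a few lines of Python. This is only an illustration: the `Node` class, its field names, and the small coin-flip style tree below are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type for this sketch.
    kind: str                                   # "max", "min", "chance" or "terminal"
    utility: float = 0.0                        # used only by terminal nodes
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # P(s) per child, chance nodes only


def expectiminimax(n: Node) -> float:
    """Return the Expectiminimax value of node n."""
    if n.kind == "terminal":
        return n.utility
    values = [expectiminimax(c) for c in n.children]
    if n.kind == "max":
        return max(values)
    if n.kind == "min":
        return min(values)
    if n.kind == "chance":
        # Probability-weighted average over the chance outcomes.
        return sum(p * v for p, v in zip(n.probs, values))
    raise ValueError(f"unknown node kind: {n.kind}")


def leaf(u: float) -> Node:
    return Node("terminal", utility=u)


def min_node(*utilities: float) -> Node:
    return Node("min", children=[leaf(u) for u in utilities])


# Assumed example tree: MAX chooses between two fair coin flips,
# each followed by a MIN choice between two leaves.
root = Node("max", children=[
    Node("chance", children=[min_node(2, 4), min_node(7, 4)], probs=[0.5, 0.5]),
    Node("chance", children=[min_node(6, 0), min_node(5, -2)], probs=[0.5, 0.5]),
])

print(expectiminimax(root))
```

The first chance node averages the MIN values 2 and 4, the second averages 0 and -2, so MAX prefers the first move.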

Pruning in Nondeterministic Game Trees
• A version of α-β pruning is possible
[Figure: step-by-step pruning of a game tree with chance nodes; each node carries an interval [lower, upper] on its value, starting at [-∞, +∞] and narrowing as leaves are evaluated]

Pruning Contd.
• More pruning occurs if we can bound the leaf values
[Figure: the same tree with leaf values bounded to [-2, 2]; node intervals start at [-2, 2] and narrow as leaves are evaluated]
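To see why bounded leaf values help, here is a minimal sketch of how an interval on a chance node can be computed before all of its children have been evaluated. The function name `chance_bounds`, the two equally likely outcomes, and the leaf range [-2, 2] are assumptions chosen for illustration.

```python
from typing import List, Tuple


def chance_bounds(probs: List[float], known: List[float],
                  lo: float, hi: float) -> Tuple[float, float]:
    """Bound a chance node's value when only the first len(known)
    children are evaluated and every value lies in [lo, hi]."""
    # Contribution of the children whose values are already known.
    seen = sum(p * v for p, v in zip(probs, known))
    # Total probability mass of the children not yet evaluated.
    rest = sum(probs[len(known):])
    return seen + rest * lo, seen + rest * hi


def can_prune(upper: float, alpha: float) -> bool:
    """A chance node below a MAX node can be abandoned once its upper
    bound cannot beat the best value alpha already found for MAX."""
    return upper <= alpha


# Assumed example: two equally likely outcomes, leaves bounded in [-2, 2].
print(chance_bounds([0.5, 0.5], [], -2, 2))      # nothing evaluated yet
print(chance_bounds([0.5, 0.5], [2], -2, 2))     # first child is worth 2
print(chance_bounds([0.5, 0.5], [2, 1], -2, 2))  # both children known
```

Without a bound on the leaves, the unevaluated children could contribute arbitrarily large or small values, so the interval would stay at [-∞, +∞] until every child had been examined.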

Digression: Exact Values DO Matter
[Figure: two game trees; in one the move to A1 is best, in the other the move to A2 is best]
• Behavior is preserved only by a positive linear transformation of Eval
• Hence Eval should be proportional to the expected payoff
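A small numeric sketch makes the point concrete. The probabilities (0.9 and 0.1), the leaf values, and the two transformations below are assumed for illustration only; they are not taken from the slide's figure.

```python
from typing import Callable, Dict, List, Tuple

Outcomes = List[Tuple[float, float]]  # (probability, leaf value) pairs


def best_move(moves: Dict[str, Outcomes],
              transform: Callable[[float], float] = lambda x: x) -> str:
    """Pick the move with the highest expected (transformed) value."""
    def expected(name: str) -> float:
        return sum(p * transform(v) for p, v in moves[name])
    return max(moves, key=expected)


# Assumed example: each move leads to a chance node with two outcomes.
moves = {
    "A1": [(0.9, 2), (0.1, 3)],
    "A2": [(0.9, 1), (0.1, 4)],
}

# An order-preserving but nonlinear rescaling of the leaf values.
nonlinear = {1: 1, 2: 20, 3: 30, 4: 400}

print(best_move(moves))                         # original values
print(best_move(moves, lambda x: 3 * x + 5))    # positive linear transformation
print(best_move(moves, lambda x: nonlinear[x])) # monotone, nonlinear transformation
```

The linear rescaling leaves the choice unchanged, while the nonlinear one flips it even though the ordering of the leaves is the same. This is the contrast with deterministic minimax, where only the ordering of leaf values matters.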

Nondeterministic games in practice

Games of Imperfect Information
• E.g., card games, where the opponent's initial cards are unknown
• Typically we can calculate a probability for each possible deal
• Seems just like having one big dice roll at the beginning of the game
• Idea: averaging over clairvoyance
  • compute the minimax value of each action for each possible deal of the cards
  • choose the action with the highest expected value over all deals
• Special case: if an action is optimal for all deals, it is optimal
• GIB (Ginsberg, 1999), current best bridge program, approximates this idea by
  • generating 100 deals consistent with the bidding information
  • picking the action that wins the most tricks on average
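The two steps of the averaging idea can be sketched as follows. This is not GIB's implementation: the function `choose_action`, the `minimax_value` callback, and the toy payoff table are hypothetical names and data introduced for the example. A sampling approach like the one described for GIB would correspond to passing sampled deals with equal weights.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def choose_action(actions: Sequence[str],
                  deals: List[Tuple[Hashable, float]],
                  minimax_value: Callable[[str, Hashable], float]) -> str:
    """Return the action with the highest expected minimax value.

    deals is a list of (deal, probability) pairs; minimax_value(action, deal)
    is assumed to solve the fully observable game for that deal.
    """
    def expected(action: str) -> float:
        return sum(p * minimax_value(action, deal) for deal, p in deals)
    return max(actions, key=expected)


# Toy payoff table (tricks won), standing in for a real minimax search.
payoff = {
    ("safe", "d1"): 1, ("safe", "d2"): 1,
    ("risky", "d1"): 2, ("risky", "d2"): 0,
}


def lookup(action: str, deal: Hashable) -> float:
    return payoff[(action, deal)]


print(choose_action(["safe", "risky"], [("d1", 0.3), ("d2", 0.7)], lookup))
print(choose_action(["safe", "risky"], [("d1", 0.7), ("d2", 0.3)], lookup))
```

In the first call the risky play pays off too rarely to be worth it; in the second the deal probabilities are reversed and it becomes the better choice.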

Example • Four-card bridge/whist/hearts hand, Max to play first

Proper analysis

α-β Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v    // fail-high
        α ← MAX(α, v)
    return v

α-β Algorithm (cont.)

function MIN-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v    // fail-low
        β ← MIN(β, v)
    return v
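For readers who want to run the α-β procedure, here is a minimal Python sketch. It assumes a simplified game representation that is not in the slides: a tree is a nested list, a number is a terminal utility, and an action is the index of a child of the root.

```python
import math
from typing import List, Tuple, Union

Tree = Union[float, List["Tree"]]  # assumed representation: leaf value or list of subtrees


def max_value(node: Tree, alpha: float, beta: float) -> float:
    if not isinstance(node, list):      # terminal test
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                   # fail-high: MIN will never allow this line
            return v
        alpha = max(alpha, v)
    return v


def min_value(node: Tree, alpha: float, beta: float) -> float:
    if not isinstance(node, list):      # terminal test
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                  # fail-low: MAX already has something better
            return v
        beta = min(beta, v)
    return v


def alpha_beta_search(tree: List[Tree]) -> Tuple[int, float]:
    """Return (index of the best root move, its value) for MAX at the root."""
    best_index, best_value = -1, -math.inf
    alpha = -math.inf
    for i, child in enumerate(tree):
        v = min_value(child, alpha, math.inf)
        if v > best_value:
            best_index, best_value = i, v
        alpha = max(alpha, best_value)
    return best_index, best_value


# Assumed example: MAX at the root, three MIN nodes, three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta_search(tree))
```

In this example the second MIN node is cut off after its first leaf (2), since MAX already has a move worth 3.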

Negamax (B. Chen, 2010)
