Adversarial Search

Adversarial Search CMPT 420 / CMPG 720

Outline • Game playing • Game trees • Minimax • Alpha-beta pruning

Games vs. search problems • competitive environments: agents’ goals are in conflict • adversarial search problems (games)

Types of Games • Two dimensions: deterministic vs. chance, and perfect information vs. imperfect information

Games • We consider deterministic, fully observable, turn-taking, two-player, zero-sum games. • Utility values at the end of the game are equal and opposite. • Example: tic-tac-toe

Game Search Formulation • Two players, MAX and MIN, take turns, with MAX moving first • S0: the initial state • Player(s): which player has the move in state s • Actions(s): the set of legal moves in state s • Result(s,a): the transition model, giving the state that results from move a in state s • Terminal-test(s): true if the game is over in state s, false otherwise (states where the game has ended are terminal states) • Utility(s,p): the utility function, giving the final value for player p of a game that ends in terminal state s • Zero-sum games: the total payoff to all players is the same for every outcome of the game
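The six components of the formulation can be sketched for tic-tac-toe. This is a minimal illustration, not a reference implementation: the board encoding (a tuple of nine cells), the helper `winner`, and the payoffs +1 / 0 / -1 are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

State = Tuple[str, ...]  # assumed encoding: 9 cells, each "X", "O", or " "

# The eight winning lines on a 3x3 board (rows, columns, diagonals).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

S0: State = (" ",) * 9  # S0: the empty board


def player(s: State) -> str:
    """Player(s): X plays the role of MAX and moves first."""
    return "X" if s.count("X") == s.count("O") else "O"


def actions(s: State) -> List[int]:
    """Actions(s): the indices of the empty cells."""
    return [i for i, cell in enumerate(s) if cell == " "]


def result(s: State, a: int) -> State:
    """Result(s, a): the board after the player to move marks cell a."""
    return s[:a] + (player(s),) + s[a + 1:]


def winner(s: State) -> Optional[str]:
    """Hypothetical helper: the mark that completed a line, if any."""
    for i, j, k in LINES:
        if s[i] != " " and s[i] == s[j] == s[k]:
            return s[i]
    return None


def terminal_test(s: State) -> bool:
    """Terminal-test(s): someone has won or the board is full."""
    return winner(s) is not None or " " not in s


def utility(s: State, p: str) -> int:
    """Utility(s, p): assumed payoffs +1 for a win, -1 for a loss, 0 for a draw."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

With these payoffs the two players' utilities always sum to zero, which is the "equal and opposite" property of a zero-sum game.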

Game tree (1-player)

Partial Game Tree for Tic-Tac-Toe

Optimal strategies • MAX uses the search tree to determine its next move. • Assumption: both players play optimally. • Given a game tree, the optimal strategy can be determined from the minimax value of each node.

Minimax • The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game.
Minimax(s) =
    Utility(s, MAX)                                     if Terminal-test(s)
    max over a in Actions(s) of Minimax(Result(s,a))    if Player(s) = MAX
    min over a in Actions(s) of Minimax(Result(s,a))    if Player(s) = MIN
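The recursive definition translates almost line for line into code. The sketch below assumes a hypothetical encoding in which a game tree is a nested list, an integer is a terminal state holding MAX's utility, and the players alternate at every level.

```python
from typing import List, Union

# Assumed encoding: an int is a terminal utility, a list holds the children.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, max_to_move: bool = True) -> int:
    """Return the minimax value of node from MAX's point of view."""
    if isinstance(node, int):          # Terminal-test(s)
        return node                    # Utility(s, MAX)
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)


# MAX at the root, two MIN nodes below it, four leaves.
tree = [[2, 7], [1, 8]]
print(minimax(tree))  # 2, since max(min(2, 7), min(1, 8)) = max(2, 1)
```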

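Optimal Play • [Figure: game tree with a MAX root, two MIN nodes, and leaf values 2, 7, 1, 8. The MIN nodes back up the values 2 and 1, and the root backs up 2; the highlighted path is the optimal play.]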

Two-Ply Game Tree • The minimax decision is the move at the root that leads to the successor with the highest minimax value. • Minimax maximizes the worst-case outcome for MAX.

What if MIN does not play optimally? • The definition of optimal play for MAX assumes that MIN also plays optimally: the minimax strategy maximizes the worst-case outcome for MAX. • If MIN does not play optimally, MAX does at least as well as the minimax value, and possibly better.
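A small numeric check of this claim, using a hypothetical two-ply tree in which each inner list holds the leaves under one MIN node:

```python
# Assumed tree: MAX picks a sub-list, then MIN picks a leaf inside it.
tree = [[2, 7], [1, 8]]

# Value of the game if both sides play optimally.
minimax_value = max(min(leaves) for leaves in tree)

# MAX's minimax move: the sub-list whose worst case is best.
best_move = max(range(len(tree)), key=lambda i: min(tree[i]))

# Every reply MIN could make after that move, optimal or not.
outcomes = tree[best_move]

print(minimax_value)  # 2
print(outcomes)       # [2, 7]: a mistake by MIN gives MAX 7 instead of 2
assert all(outcome >= minimax_value for outcome in outcomes)
```

The assertion holds for any tree: the minimax value is a guaranteed lower bound on what MAX obtains by following the minimax move.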

Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s))
    return v
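One way the pseudocode might look in Python is sketched below. The game is a hypothetical three-move, two-ply tree stored in two dictionaries; the state names, the action names, and the leaf utilities are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

# Assumed game: state -> list of (action, successor) pairs.
SUCCESSORS: Dict[str, List[Tuple[str, str]]] = {
    "A": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", "B1"), ("b2", "B2"), ("b3", "B3")],
    "C": [("c1", "C1"), ("c2", "C2"), ("c3", "C3")],
    "D": [("d1", "D1"), ("d2", "D2"), ("d3", "D3")],
}
# Assumed utilities (for MAX) of the terminal states.
UTILITY: Dict[str, int] = {
    "B1": 3, "B2": 12, "B3": 8,
    "C1": 2, "C2": 4, "C3": 6,
    "D1": 14, "D2": 5, "D3": 2,
}


def terminal_test(state: str) -> bool:
    return state in UTILITY


def max_value(state: str) -> float:
    if terminal_test(state):
        return UTILITY[state]
    v = -math.inf
    for _action, successor in SUCCESSORS[state]:
        v = max(v, min_value(successor))
    return v


def min_value(state: str) -> float:
    if terminal_test(state):
        return UTILITY[state]
    v = math.inf
    for _action, successor in SUCCESSORS[state]:
        v = min(v, max_value(successor))
    return v


def minimax_decision(state: str) -> str:
    """Return the action whose successor has the highest MIN-VALUE."""
    action, _successor = max(SUCCESSORS[state],
                             key=lambda pair: min_value(pair[1]))
    return action


print(minimax_decision("A"))  # a1, because B is worth 3 while C and D are worth 2
```

Choosing the action with `max(..., key=...)` avoids computing the root value first and then searching for a successor that matches it, which is how the pseudocode phrases the last step.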

Properties of minimax • Complete? Yes (if the tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^m), where b is the branching factor and m is the maximum depth of the tree • Space complexity? O(bm) (depth-first exploration) • For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is infeasible
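The size of these numbers is easy to check. The sketch below assumes a uniform tree in which every node has exactly b children down to depth m; the function name is hypothetical.

```python
def minimax_leaf_visits(b: int, m: int) -> int:
    """Count the leaves a depth-first minimax visits in a uniform tree."""
    if m == 0:
        return 1
    return sum(minimax_leaf_visits(b, m - 1) for _ in range(b))


# The count matches b^m on a tree small enough to enumerate.
assert minimax_leaf_visits(3, 4) == 3 ** 4

# Chess with b = 35 and m = 100: far too many leaves to enumerate.
chess_leaves = 35 ** 100
print(len(str(chess_leaves)))  # 155 digits, i.e. on the order of 10^154
```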

Alpha-Beta Pruning • Problem with minimax search: the number of states examined is exponential in the depth of the tree • Can we cut the exponent in half? • It is possible to compute the correct minimax decision without looking at every node. • Pruning: eliminate parts of the tree from consideration

Alpha-beta pruning • We can improve on the performance of the minimax algorithm through alpha-beta pruning. • [Figure: game tree with levels MAX, MIN, MAX and leaf values 2, 7, 1, ?] • We don’t need to compute the value at the node marked "?". • No matter what it is, it can’t affect the value of the root node.

Alpha-Beta Example
• Each node is annotated with its range of possible values [lower bound, upper bound]. The root is a MAX node with three MIN children; the leaves are 3, 12, 8 (left), 2, x, y (middle), and 14, 5, 2 (right).
• Do DFS until the first leaf. Initial ranges: root [-∞, +∞], left MIN node [-∞, +∞].
• First leaf (3): left MIN node [-∞, 3].
• Remaining leaves (12, 8): left MIN node [3, 3].
• Back at the root: root [3, +∞].
• Descend to the middle MIN node: [-∞, +∞].
• Its first leaf (2): middle MIN node [-∞, 2].
• This node is worse for MAX than the left one (at most 2 versus exactly 3), so its remaining leaves are pruned.
• Descend to the right MIN node: [-∞, +∞]. Its first leaf (14): right MIN node [-∞, 14], root [3, 14].
• Second leaf (5): right MIN node [-∞, 5], root [3, 5].
• Third leaf (2): right MIN node [2, 2].
• Final ranges: root [3, 3], left MIN node [3, 3], middle MIN node [-∞, 2], right MIN node [2, 2].

α-β pruning example
Minimax(root) = max(min(3,12,8), min(2,x,y), min(14,5,2))
              = max(3, z, 2), where z = min(2,x,y) ≤ 2
              = 3

α-β pruning • We made the same minimax decision without ever evaluating two of the leaf nodes (x and y)! • The value of the root is independent of the values of those leaves. • It is possible to prune entire subtrees.

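Why is it called α-β? • α = the value of the best (highest-value) choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it, so that branch can be pruned • β is defined similarly for MIN: the value of the best (lowest-value) choice found so far at any choice point along the path for MIN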

Alpha-Beta Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

Alpha-Beta Algorithm
function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
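The two functions can be sketched in Python on a nested-list tree. The encoding (an integer is a leaf utility, a list holds the children) and the `visited` log are assumptions added so that the pruning is visible; the middle leaves 4 and 6 are arbitrary stand-ins for the unevaluated values x and y.

```python
import math
from typing import List, Union

# Assumed encoding: an int is a leaf utility, a list holds the children.
Tree = Union[int, List["Tree"]]

visited: List[int] = []  # leaves actually evaluated, in order


def max_value(node: Tree, alpha: float, beta: float) -> float:
    if isinstance(node, int):
        visited.append(node)
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:        # MIN above would never allow this branch
            return v
        alpha = max(alpha, v)
    return v


def min_value(node: Tree, alpha: float, beta: float) -> float:
    if isinstance(node, int):
        visited.append(node)
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:       # MAX above already has something at least as good
            return v
        beta = min(beta, v)
    return v


# MAX root with three MIN children; 4 and 6 stand in for x and y.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = max_value(tree, -math.inf, math.inf)

print(root_value)  # 3
print(visited)     # [3, 12, 8, 2, 14, 5, 2]: the leaves 4 and 6 are never evaluated
```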

Comments: Alpha-Beta Pruning • Pruning does not affect the final result. • Entire subtrees can be pruned. • Good move ordering improves the effectiveness of pruning. • With “perfect ordering,” time complexity is O(b^(m/2)). • As a result, alpha-beta pruning can look roughly twice as far ahead as minimax in the same amount of time.
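The effect of move ordering can be observed by counting leaf evaluations on the same game with its children permuted. This is a small sketch under assumed conditions: a hypothetical nested-list tree (an integer is a leaf utility), a single combined `alphabeta` function, and two hand-picked orderings.

```python
import math


def alphabeta(node, alpha, beta, maximizing, counter):
    """Alpha-beta value of node; counter[0] counts the leaves evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v


def search(tree):
    """Return (root value, number of leaves evaluated) for a MAX root."""
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


# Same game, children permuted. Good ordering: MAX's best move first and,
# under each MIN node, MIN's best reply first. Bad ordering: the reverse.
well_ordered = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]
badly_ordered = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]

print(search(well_ordered))   # (3, 5): only 5 of the 9 leaves are evaluated
print(search(badly_ordered))  # (3, 9): nothing is pruned
```

Both orderings return the same root value, which illustrates that pruning changes the work done but not the result.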
