Adversarial Search (Games) Chapter 6
Outline • Summary of last lectures • Characterizing a Game • Optimal decisions • Why is full exploration of the search space not feasible? • The minimax algorithm • α-β pruning • Imperfect, real-time decisions • Extensions: multi-player, chance
Summary of the past lectures • System engineering process • Analysis, design, implementation • Agent's performance measures (non-functional requirements) • Agent types • Simple reflex, model-based, goal-based, utility-based, learning agents • Environment types • Static/dynamic, deterministic/stochastic, fully/partially observable
Summary of the past lectures • Search (goal-based agents) • Basic search algorithms and their variants • Uninformed search strategies • Limited information about the environment model • Iterative deepening search, bidirectional search, avoiding repeated states • Informed search • Improve time and space complexity by using additional information about the environment • Heuristic functions • Greedy best-first search • A* search, triangle inequality
Games vs. search problems • "Unpredictable" opponent ⇒ solution is a strategy specifying a move for every possible opponent reply • Time limits ⇒ unlikely to find goal, must approximate
2-player zero-sum discrete finite deterministic games of perfect information What does this mean? • Two-player: :-) • Zero-sum: in any outcome of any game, player A's gains equal player B's losses. • Discrete: all game states and decisions are discrete values. • Finite: only a finite number of states and decisions. • Deterministic: no chance (no die rolls). • Perfect information: both players can see the state, and each decision is made sequentially (no simultaneous moves). • Games: see next slide
2-player zero-sum discrete finite deterministic games of perfect information [Figure: example games, plus counter-examples grouped by the property they violate: hidden information, stochastic, not finite, one player, involves animal behaviour, multiplayer]
2-player zero-sum discrete finite deterministic games of perfect information A two-player zero-sum discrete finite deterministic game of perfect information is a quintuplet ( S , I , N , T , V ) where: • S: a finite set of game states • I ∈ S: the initial state • N: the successor ("next states") function, mapping a state to the states reachable in one move • T ⊆ S: the terminal states • V: a value (utility) function assigning each terminal state its payoff for one player (the other player receives the negation, since the game is zero-sum)
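As a concrete rendering, here is a minimal Python sketch of this quintuplet as a data structure; the slides contain no code, so the field names and types are illustrative assumptions.

```python
# Hypothetical rendering of the quintuplet (S, I, N, T, V).
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, List

@dataclass
class Game:
    states: FrozenSet[Any]                    # S: finite set of game states
    initial: Any                              # I: the initial state, I in S
    next_states: Callable[[Any], List[Any]]   # N: successor function on states
    terminal: FrozenSet[Any]                  # T: terminal states, subset of S
    value: Callable[[Any], float]             # V: payoff of a terminal state for MAX
```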
Minimax Algorithm • Optimal play for deterministic games • Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play • E.g., a simple 2-ply game: [figure: a two-ply game tree]
Utility of a situation in a game: • In most two-player games the terminal situations have a fixed value, typically +1 if MAX wins, -1 if MAX loses (= MIN wins), 0 for a draw. • Other value ranges are also possible: e.g., Backgammon (-192 to +192), etc. • We can compute the minimax value of any situation as follows: MINIMAX-VALUE(s) = Utility(s) if s is terminal; max of MINIMAX-VALUE(s') over all successors s' if MAX is to move in s; min of MINIMAX-VALUE(s') over all successors s' if MIN is to move in s
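As an illustration, a minimal Python sketch of this recursion; the game is supplied through three callbacks (is_terminal, utility, successors), which are assumptions for illustration rather than anything prescribed by the slides.

```python
# Minimax value of a state: the utility at terminal states, otherwise the
# max (MAX to move) or min (MIN to move) over all successor values.
def minimax_value(state, is_terminal, utility, successors, max_to_move=True):
    if is_terminal(state):
        return utility(state)
    values = [minimax_value(s, is_terminal, utility, successors, not max_to_move)
              for s in successors(state)]
    return max(values) if max_to_move else min(values)

# A small 2-ply example encoded as nested lists: MAX chooses among three
# MIN nodes whose children are the leaf utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree,
                    is_terminal=lambda s: not isinstance(s, list),
                    utility=lambda s: s,
                    successors=lambda s: s))  # -> 3
```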
Properties of minimax • Complete? Yes (if tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^m) • Space complexity? O(bm) (depth-first exploration) • Problem: explores the whole search space. For chess, b ≈ 35, m ≈ 100 ⇒ for "reasonable" games an exact solution is completely infeasible • So, how to proceed? b … branching factor, m … maximum number of moves
Motivation for α-β pruning • The problem with minimax search is that the number of game states it has to examine is exponential in the number of moves: O(b^m). • α-β pruning computes the correct minimax decision without looking at every node in the game tree. PRUNING!
α-β pruning example [Figure: the same game tree evaluated twice, with successors (minimax values 5 and 2) processed in different orders; one order allows pruning, the other does not] We see: the possibility to prune depends on the order of processing the successors!
Properties of α-β • Pruning does not affect the final result • Good move ordering improves the effectiveness of pruning • With "perfect ordering", time complexity = O(b^(m/2)) ⇒ doubles the depth of search doable in the same time • A simple example of the value of reasoning about which computations are relevant (a form of meta-reasoning)
Why is it called α-β? • α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max • If v is worse than α, max will avoid it ⇒ prune that branch • Define β similarly for min
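Putting the two bounds together, a minimal α-β sketch, reusing the nested-list tree encoding from the minimax sketch above (leaves are utilities, list levels alternate between MAX and MIN); the encoding is an illustrative assumption.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, max_to_move=True):
    if not isinstance(node, list):       # leaf: its value is the utility
        return node
    if max_to_move:
        v = -math.inf
        for s in node:
            v = max(v, alphabeta(s, alpha, beta, False))
            if v >= beta:                # MIN above will never allow this
                return v                 # prune the remaining successors
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for s in node:
            v = min(v, alphabeta(s, alpha, beta, True))
            if v <= alpha:               # MAX above will never choose this
                return v                 # prune the remaining successors
            beta = min(beta, v)
        return v

print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # -> 3
```

On this example tree the second MIN node is abandoned after its first leaf (2 ≤ α = 3): the order-dependent pruning shown earlier.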
Resource limits Suppose we have 100 secs and explore 10^4 nodes/sec ⇒ 10^6 nodes per move; even with pruning it is not possible to explore the whole search space, e.g. for chess! Standard approach: • cutoff test: e.g., depth limit (perhaps add quiescence search) • evaluation function = estimated desirability of position
Evaluation functions • For chess, typically a linear weighted sum of features Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) • e.g., weights of the pieces on the board: w1 = 9 with f1(s) = (number of white queens) – (number of black queens), etc. • Other features which could be taken into account: number of threats, good pawn structure, a measure of the king's safety.
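A short sketch of such a linear weighted evaluation; the toy state representation and feature extractors are hypothetical.

```python
# Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
def eval_position(state, weights, features):
    return sum(w * f(state) for w, f in zip(weights, features))

# Toy example: queen material balance with weight w1 = 9, as above.
weights  = [9]
features = [lambda s: s["white_queens"] - s["black_queens"]]
print(eval_position({"white_queens": 1, "black_queens": 0},
                    weights, features))  # -> 9
```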
Cutting off search MinimaxCutoff is identical to MinimaxValue except • Terminal? is replaced by Cutoff? • Utility is replaced by Eval Does it work in practice? With 10^6 nodes per move and b = 35, b^m = 10^6 gives m ≈ 4 ⇒ 4-ply lookahead is a hopeless chess player! • 4-ply ≈ human novice • 8-ply ≈ typical PC, human master • 12-ply ≈ Deep Blue, Kasparov
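In code, MinimaxCutoff is the minimax sketch from above with exactly these two substitutions; again a hedged sketch with hypothetical callbacks.

```python
# Depth-limited minimax: Terminal? becomes Cutoff? (terminal or depth 0),
# and Utility is replaced by a heuristic Eval at the cutoff.
def minimax_cutoff(state, depth, is_terminal, eval_fn, successors,
                   max_to_move=True):
    if is_terminal(state) or depth == 0:   # Cutoff? test
        return eval_fn(state)              # Eval instead of Utility
    values = [minimax_cutoff(s, depth - 1, is_terminal, eval_fn, successors,
                             not max_to_move)
              for s in successors(state)]
    return max(values) if max_to_move else min(values)
```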
Deterministic games in practice • Checkers: Chinook ended 40-year-reign of human world champion Marion Tinsley in 1994. Used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply. • Othello: human champions refuse to compete against computers, who are too good. • Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
Some extensions • What if more than two players are in the game? 2-player algorithms (minimax, α-β, cutoff-eval) can be extended to multi-player in a straightforward way (see the sketch below): • Instead of 1 value, use a vector of values, where each player tries to maximize its own index-value in the vector • 2-player zero-sum games are a special case of this, where the vector can be combined into one value since the values for both players are exactly opposite • What if an element of chance (i.e. non-determinism) is added? E.g. rolling dice in Backgammon? Expectiminimax ⇒ next slide
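A sketch of the vector-valued extension (known in the literature as the max^n algorithm); the nested-list encoding with payoff-vector leaves is an illustrative assumption.

```python
# Each player maximizes their own component of the payoff vector.
def maxn(node, player, num_players):
    if not isinstance(node, list):                     # leaf: payoff vector
        return node
    child_values = [maxn(s, (player + 1) % num_players, num_players)
                    for s in node]
    return max(child_values, key=lambda v: v[player])  # own index-value

# Three players; each leaf is a (p0, p1, p2) payoff tuple.
tree = [[(1, 2, 6), (4, 3, 3)], [(6, 1, 2), (7, 4, 1)]]
print(maxn(tree, player=0, num_players=3))  # -> (7, 4, 1)
```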
Minimax with Chance Nodes: Chance nodes have certain probabilities.
EXPECTIMINIMAX… • Slight variation of MINIMAX: MAX and MIN nodes are evaluated as before, and a chance node takes the probability-weighted sum Σ P(s) · EXPECTIMINIMAX(s) over its successors s, where P(s) is the probability of reaching s (e.g. the probability of rolling a certain number with the dice)
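A minimal expectiminimax sketch; chance nodes are encoded here as ("chance", [(probability, subtree), ...]) on top of the nested-list encoding used earlier, which is an illustrative assumption.

```python
def expectiminimax(node, max_to_move=True):
    if isinstance(node, tuple) and node[0] == "chance":
        # chance node: probability-weighted sum of successor values
        return sum(p * expectiminimax(s, max_to_move) for p, s in node[1])
    if isinstance(node, list):
        # MAX/MIN nodes alternate exactly as in plain minimax
        values = [expectiminimax(s, not max_to_move) for s in node]
        return max(values) if max_to_move else min(values)
    return node  # leaf: utility value

# MAX chooses between a sure 3 and a fair coin flip between 10 and -6
# (expected value 2.0), so the sure 3 wins.
tree = [3, ("chance", [(0.5, 10), (0.5, -6)])]
print(expectiminimax(tree))  # -> 3
```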
Summary: • Games are fun to work on! • They illustrate several important points about AI • perfection is unattainable ⇒ must approximate • good idea to think about what to think about: ideas and expertise of masters deployed in evaluation functions (i.e. heuristics) What makes game theory interesting in practice? • Exogenous events, i.e. non-determinism in planning, can be modelled as an opponent. • Multi-agent planning: cooperative vs. competitive ⇒ can be modelled as multi-player games