810 likes | 934 Views
Game Playing 2. This Lecture. Alpha-beta pruning Games with chance Partially observable games. Nondeterminism. Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX). MAX’s play. MIN’s play. MAX cannot tell what move will be played.
E N D
This Lecture • Alpha-beta pruning • Games with chance • Partially observable games
Nondeterminism • Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX) MAX’s play MIN’s play MAX cannot tell what move will be played
Instead of a single path, the agent must construct an entire plan Nondeterminism • Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX) MAX’s play MIN’s play MAX must decide what to play for BOTH these outcomes
Minimax Backup MAX’s turn +1 0 -1 MIN’s turn +1 MAX’s turn 0 +1 0 -1 0 +1 0
Depth-First Minimax Algorithm • MAX-Value(S) • If Terminal?(S) return Result(S) • Return maxS’SUCC(S) MIN-Value(S’) • MIN-Value(S) • If Terminal?(S) return Result(S) • Return minS’SUCC(S) MAX-Value(S’) • MINIMAX-Decision(S) • Return action leading to state S’SUCC(S) that maximizes MIN-Value(S’)
Real-Time Game Playing with Evaluation Function • e(s): function indicating estimated favorability of a state to MAX • Keep track of depth, and add line: • If(depth(s) = cutoff) return e(s) • After terminal test
3 -1 Pruning This part of the tree can’t have any effect on the value that will be backed up to the root Can we do better? • Yes ! Much better ! 3 -1
2 Example The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase b = 2
b = 1 2 1 Example The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase
b = 1 2 1 Example a = 1 The alpha value of a MAX node is a lower bound on the final backed-up value. It can never decrease
b = -1 b = 1 2 1 -1 Example a = 1
b = -1 b = 1 Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors 2 1 -1 Example a = 1
Alpha-Beta Pruning • Explore the game tree to depth h in depth-first manner • Back up alpha and beta values whenever possible • Prune branches that can’t lead to changing the final decision
Alpha-Beta Algorithm • Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued • Discontinue the search below a MAX node N if its alpha value is the beta value of a MIN ancestor of N • Discontinue the search below a MIN node N if its beta value is the alpha value of a MAX ancestor of N
Example MAX MIN MAX MIN MAX MIN 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN MAX MIN 0 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN MAX 0 MIN 0 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN MAX 0 MIN 0 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN MAX 0 MIN 0 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN 0 MAX 0 MIN 0 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN 0 MAX 0 3 MIN 0 -3 3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN MAX MIN 0 MAX 0 3 MIN 0 -3 3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN 0 MAX 0 MIN 0 MAX 0 3 MIN 0 -3 3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN 0 MAX 0 MIN 0 MAX 0 3 MIN 0 -3 3 5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example 0 0 0 0 3 2 0 -3 3 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN 0 MAX 0 MIN 0 MAX 0 3 2 MIN 0 -3 3 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 MIN 0 -3 3 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 MIN 0 -3 3 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 MIN 0 -3 3 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 MIN 0 -3 3 2 5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 1 MIN 0 -3 3 2 1 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 1 MIN 0 -3 3 2 1 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 MIN 0 2 MAX 0 3 2 1 MIN 0 -3 3 2 1 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 1 MIN 0 2 1 MAX 0 3 2 1 MIN 0 -3 3 2 1 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 1 MIN 0 2 1 MAX 0 3 2 1 MIN 0 -3 3 2 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 1 MIN 0 2 1 MAX 0 3 2 1 MIN 0 -3 3 2 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 MAX 0 2 1 MIN 0 2 1 -5 MAX 0 3 2 1 -5 MIN 0 -3 3 2 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example 0 0 0 2 1 0 2 1 -5 0 3 2 1 -5 0 -3 3 2 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 0 MIN 0 1 MAX 0 2 1 MIN 0 2 1 -5 MAX 0 3 2 1 -5 MIN 0 -3 3 2 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 1 MIN 0 1 MAX 0 2 1 2 MIN 0 2 1 -5 2 MAX 0 3 2 1 -5 2 MIN 0 -3 3 2 1 -3 -5 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
Example MAX 1 MIN 0 1 MAX 0 2 1 2 MIN 0 2 1 -5 2 MAX 0 3 2 1 -5 2 MIN 0 -3 3 2 1 -3 -5 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2
a = 3 a = 3 3 b=-1 3 b=4 (4) -1 -1 4 How much do we gain? • Consider these two cases:
How much do we gain? • Assume a game tree of uniform branching factor b • Minimax examines O(bh) nodes, so does alpha-beta in the worst-case • The gain for alpha-beta is maximum when: • The children of a MAX node are ordered in decreasing backed up values • The children of a MIN node are ordered in increasing backed up values • Then alpha-beta examines O(bh/2) nodes [Knuth and Moore, 1975] • But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree) • If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b3h/4)
Alpha-Beta Implementation • MAX-Value(S,,) • If Terminal?(S) return Result(S) • For all S’SUCC(S) • max(,MIN-Value(S’,,)) • If , then return • Return • MIN-Value(S,,) • If Terminal?(S) return Result(S) • For all S’SUCC(S) • min(,MAX-Value(S’,,)) • If , then return • Return • Alpha-Beta-Decision(S) • Return action leading to state S’SUCC(S) that maximizes MIN-Value(S’,-,+)
Heuristic Ordering of Nodes • Order the nodes below the root according to the values backed-up at the previous iteration
Other Improvements • Adaptive horizon + iterative deepening • Extended search: Retain k>1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help dealing with the “horizon effect”) • Singular extension: If a move is obviously better than the others in a node at horizon h, then expand this node along this move • Use transposition tables to deal with repeated states • Null-move search
Games of Chance • Dice games: backgammon, Yahtzee, craps, … • Card games: poker, blackjack, … • Is there a fundamental difference between the nondeterminism in chess-playing vs. the nondeterminism in a dice roll?