Game Playing (notes adapted from lecture notes for CMSC 421 by B.J. Dorr)
Outline • What are games? • Optimal decisions in games • Which strategy leads to success? • α-β pruning • Games of imperfect information • Games that include an element of chance
What are and why study games? • Games are a form of multi-agent environment • What do other agents do and how do they affect our success? • Cooperative vs. competitive multi-agent environments. • Competitive multi-agent environments give rise to adversarial problems a.k.a. games • Why study games? • Fun; historically entertaining • Interesting subject of study because they are hard • Easy to represent and agents restricted to small number of actions
Relation of Games to Search • Search – no adversary • Solution is (heuristic) method for finding goal • Heuristics and CSP techniques can find optimal solution • Evaluation function: estimate of cost from start to goal through given node • Examples: path planning, scheduling activities • Games – adversary • Solution is strategy (strategy specifies move for every possible opponent reply). • Time limits force an approximate solution • Evaluation function: evaluate “goodness” of game position • Examples: chess, checkers, Othello, backgammon
Game setup • Two players: MAX and MIN • MAX moves first and they take turns until the game is over. The winner gets a reward, the loser a penalty. • Games as search: • Initial state: e.g. the board configuration of chess • Successor function: list of (move, state) pairs specifying legal moves • Terminal test: is the game finished? • Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next) • MAX uses the search tree to determine its next move.
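The search formulation above can be made concrete. Below is a minimal Python sketch of the four components (initial state, successor function, terminal test, utility) for tic-tac-toe; the class and method names are illustrative, not from the original notes.

```python
class TicTacToe:
    """States are tuples of 9 cells: 'X' (MAX), 'O' (MIN), or None (empty)."""

    # All complete rows, columns, and diagonals, as index triples.
    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def initial_state(self):
        return (None,) * 9                     # empty board

    def player(self, state):
        # X (MAX) moves first, so X is to move whenever the counts are equal.
        return 'X' if state.count('X') == state.count('O') else 'O'

    def successors(self, state):
        """Yield (move, state) pairs, one for every legal move."""
        p = self.player(state)
        for i, cell in enumerate(state):
            if cell is None:
                yield i, state[:i] + (p,) + state[i + 1:]

    def winner(self, state):
        for a, b, c in self.LINES:
            if state[a] is not None and state[a] == state[b] == state[c]:
                return state[a]
        return None

    def terminal_test(self, state):
        return self.winner(state) is not None or None not in state

    def utility(self, state):
        # +1 for a MAX win, -1 for a MIN win, 0 for a draw.
        return {'X': 1, 'O': -1, None: 0}[self.winner(state)]
```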
Optimal strategies • Find the contingent strategy for MAX assuming an infallible MIN opponent. • Assumption: both players play optimally! • Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX-VALUE(n) =
  UTILITY(n)                                  if n is a terminal node
  max s∈successors(n) MINIMAX-VALUE(s)        if n is a MAX node
  min s∈successors(n) MINIMAX-VALUE(s)        if n is a MIN node
Two-Ply Game Tree • MAX nodes • MIN nodes • Terminal nodes: utility values for MAX • Other nodes: minimax values • MAX's best move at the root – a1 – leads to the successor with the highest minimax value • MIN's best reply – b1 – leads to the successor with the lowest minimax value
What if MIN does not play optimally? • Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX. • But if MIN does not play optimally, MAX will do even better. [Can be proved.]
Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
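The pseudocode runs directly on a small explicit game tree. Here is a hedged Python sketch on a two-ply tree represented as nested dicts mapping actions to children, with numbers as terminal utilities; the leaf values are chosen to match the interval annotations in the alpha-beta example later in these notes (an assumption, since the original figure is not reproduced here).

```python
# Each internal node maps actions to child nodes; plain numbers are
# terminal states whose value is their utility.
TREE = {
    'a1': {'b1': 3, 'b2': 12, 'b3': 8},
    'a2': {'c1': 2, 'c2': 4, 'c3': 6},
    'a3': {'d1': 14, 'd2': 5, 'd3': 2},
}

def max_value(node):
    if isinstance(node, (int, float)):   # terminal test: leaves are utilities
        return node
    return max(min_value(child) for child in node.values())

def min_value(node):
    if isinstance(node, (int, float)):
        return node
    return min(max_value(child) for child in node.values())

def minimax_decision(tree):
    """Return the root action whose successor has the highest minimax value."""
    return max(tree, key=lambda action: min_value(tree[action]))

print(minimax_decision(TREE))  # → a1: min(3,12,8)=3 beats min(2,4,6)=2 and min(14,5,2)=2
```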
Properties of Minimax • Complete: yes, if the game tree is finite • Optimal: yes, against an optimal opponent • Time complexity: O(b^m) • Space complexity: O(bm) (depth-first exploration) • For chess, b ≈ 35 and m ≈ 100, so an exact solution is completely infeasible
Tic-Tac-Toe • Depth bound: 2 • Breadth-first search until all nodes at level 2 are generated • Apply the evaluation function to the positions at these nodes
Tic-Tac-Toe • Evaluation function e(p) of a position p • If p is not a winning position for either player: e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) – (number of complete rows, columns, or diagonals that are still open for MIN) • If p is a win for MAX: e(p) = +∞ • If p is a win for MIN: e(p) = -∞
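This evaluation function is easy to sketch in Python. The board representation below (a tuple of 9 cells, 'X' for MAX, 'O' for MIN, None for empty) is an assumption for illustration:

```python
import math

# All complete rows, columns, and diagonals of the 3x3 board, indexed 0..8.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def e(p):
    """e(p) = open lines for MAX minus open lines for MIN; ±inf for wins."""
    # A win dominates every other consideration.
    for a, b, c in LINES:
        if p[a] is not None and p[a] == p[b] == p[c]:
            return math.inf if p[a] == 'X' else -math.inf
    # A line is "open" for a player if the opponent occupies none of its cells.
    def open_for(player):
        return sum(all(p[i] in (player, None) for i in line) for line in LINES)
    return open_for('X') - open_for('O')

empty = (None,) * 9
x_center = (None,) * 4 + ('X',) + (None,) * 4
print(e(empty))     # 8 - 8 = 0: all lines open for both players
print(e(x_center))  # 8 - 4 = 4: X in the center blocks 4 of MIN's lines
```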
Tic-Tac-Toe – first stage [figure: the two-ply search tree for MAX's opening move; each leaf position is labeled with its evaluation, e.g. 6-5=1, 5-4=1, 4-6=-2, and backed-up values 1, -1, -2 appear at the first-level nodes]
Tic-Tac-Toe – second stage [figure: the search tree grown two more plies from MAX's chosen move; leaf evaluations range from 3-3=0 to 5-2=3, and backed-up values 0 and 1 appear at the intermediate nodes]
Multiplayer games • Many games allow more than two players • Single minimax values become vectors of values, one component per player
Problem of minimax search • The number of game states is exponential in the number of moves • Solution: do not examine every node → alpha-beta pruning • Alpha = value of the best choice found so far at any choice point along the path for MAX • Beta = value of the best choice found so far at any choice point along the path for MIN • Revisit the example …
Tic-Tac-Toe – first stage with pruning [figure: node A has been fully evaluated with backed-up (alpha) value -1; node B's first successor C has static value -1, giving B a beta value of -1, so B's remaining successors need not be generated]
Alpha-beta pruning • Search progresses in a depth-first manner • Whenever a tip node is generated, its static evaluation is computed • Whenever a position can be given a backed-up value, this value is computed • Node A and all its successors have been generated • Backed-up value: -1 • Node B and its successors have not yet been generated • Now we know that the backed-up value of the start node is bounded from below by -1
Alpha-beta pruning • Depending on the backed-up values of the other successors of the start node, its final backed-up value may be greater than -1, but it cannot be less • This lower bound is the alpha value of the start node
Alpha-beta pruning • Depth-first search proceeds – node B and its first successor node C are generated • Node C is given a static value of -1 • The backed-up value of node B is therefore bounded from above by -1 • This upper bound on node B is its beta value • Therefore we can discontinue search below node B • Node B will not turn out to be preferable to node A
Alpha-beta pruning • The reduction in search effort is achieved by keeping track of bounds on backed-up values • As successors of a node are given backed-up values, these bounds can be revised • Alpha values of MAX nodes can never decrease • Beta values of MIN nodes can never increase
Alpha-beta pruning • Therefore search can be discontinued: • Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX-node ancestors; the final backed-up value of this MIN node can then be set to its beta value • Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN-node ancestors; the final backed-up value of this MAX node can then be set to its alpha value
Alpha-beta pruning • During search, alpha and beta values are computed as follows: • The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors • The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors
Alpha-Beta Example • Do depth-first search until the first leaf • Range of possible values at the root and at the first MIN node: [-∞,+∞] [-∞,+∞]
Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]
Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]
Alpha-Beta Example (continued) [3,+∞] [3,3]
Alpha-Beta Example (continued) [3,+∞] This node is worse for MAX [3,3] [-∞,2]
Alpha-Beta Example (continued) [3,14] [3,3] [-∞,2] [-∞,14]
Alpha-Beta Example (continued) [3,5] [3,3] [-∞,2] [-∞,5]
Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [−∞,2]
Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [-∞,2]
Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
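A runnable Python sketch of this algorithm, again on an explicit dict-based game tree (the representation and the leaf values are illustrative assumptions, chosen to match the example intervals above):

```python
import math

# Internal nodes map actions to children; plain numbers are terminal utilities.
TREE = {
    'a1': {'b1': 3, 'b2': 12, 'b3': 8},
    'a2': {'c1': 2, 'c2': 4, 'c3': 6},
    'a3': {'d1': 14, 'd2': 5, 'd3': 2},
}

def max_value(node, alpha, beta):
    if isinstance(node, (int, float)):       # terminal: return utility
        return node
    v = -math.inf
    for child in node.values():
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                        # beta cutoff: a MIN ancestor avoids this node
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = math.inf
    for child in node.values():
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                       # alpha cutoff: a MAX ancestor avoids this node
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(tree):
    """Pick the best root action, propagating alpha across the root's children."""
    best, alpha = None, -math.inf
    for action, child in tree.items():
        v = min_value(child, alpha, math.inf)
        if v > alpha:
            alpha, best = v, action
    return best

print(alpha_beta_search(TREE))  # → a1, the same move plain minimax chooses
```

With alpha = 3 after examining a1, the search under a2 is cut off after its first leaf (2 ≤ 3), exactly as in the [-∞,2] interval of the worked example.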
General alpha-beta pruning • Consider a node n somewhere in the tree • If the player has a better choice either at the parent node of n or at any choice point further up, then n will never be reached in actual play • Hence, once enough is known about n, it can be pruned
Final Comments about Alpha-Beta Pruning • Pruning does not affect the final result • Entire subtrees can be pruned • Good move ordering improves the effectiveness of pruning • With good ordering, alpha-beta can look ahead roughly twice as far as minimax in the same amount of time • Repeated states are again possible • Store them in memory → a transposition table
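The transposition-table idea can be sketched by memoizing backed-up values keyed by position. Here `functools.lru_cache` plays the role of the table; the tiny subtraction game (players alternately take 1 or 2 counters, and whoever takes the last counter wins) is purely illustrative:

```python
import functools

@functools.lru_cache(maxsize=None)   # the cache acts as a transposition table
def value(n, is_max):
    """Minimax value with n counters left and the given player to move:
    +1 if MAX wins under optimal play, -1 if MIN wins."""
    if n == 0:
        # The previous player took the last counter and won.
        return -1 if is_max else 1
    children = [value(n - d, not is_max) for d in (1, 2) if n - d >= 0]
    return max(children) if is_max else min(children)

print(value(10, True))   # → 1: MAX to move at 10 can force a win
# Positions are reached by many move orders, so the table gets cache hits:
print(value.cache_info().hits > 0)
```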
Games that include chance • Chance nodes represent the dice rolls • Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16) • Rolls [1,1] and [6,6] each occur with probability 1/36; all other rolls occur with probability 1/18 • We cannot calculate a definite minimax value, only an expected value
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                 if n is a terminal node
  max s∈successors(n) EXPECTED-MINIMAX-VALUE(s)              if n is a MAX node
  min s∈successors(n) EXPECTED-MINIMAX-VALUE(s)              if n is a MIN node
  Σ s∈successors(n) P(s) · EXPECTED-MINIMAX-VALUE(s)         if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
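The recursion above can be sketched directly in Python. Nodes here are tagged tuples ('max' | 'min' | 'chance', children), with numbers as terminal utilities and chance children carrying probabilities; this representation is an illustration, not from the original notes.

```python
def expectiminimax(node):
    if isinstance(node, (int, float)):               # terminal: utility
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average of successor values.
    return sum(p * expectiminimax(c) for p, c in children)

# A tiny tree: MAX chooses between a fair coin toss worth 0 or 4,
# and a certain payoff of 1.5.
tree = ('max', [
    ('chance', [(0.5, 0), (0.5, 4)]),   # expected value 0.5*0 + 0.5*4 = 2.0
    1.5,
])
print(expectiminimax(tree))  # → 2.0: the gamble beats the sure 1.5
```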
Position evaluation with chance nodes • In the left tree, move A1 wins; in the right tree, with rescaled leaf values, A2 wins • With chance nodes, the move chosen by the evaluation function can change when values are rescaled • Behavior is preserved only by a positive linear transformation of EVAL
Discussion • Examine the section on state-of-the-art game programs yourself • Minimax assumes the right-hand tree is better than the left, yet … • An alternative is to return a probability distribution over possible values • But this is an expensive calculation
Discussion • Utility of node expansion: only expand those nodes that can lead to significantly better moves • Both suggestions require meta-reasoning
Summary • Games are fun (and dangerous) • They illustrate several important points about AI • Perfection is unattainable → approximate • It is a good idea to think about what to think about • Uncertainty constrains the assignment of values to states • Games are to AI as Grand Prix racing is to automobile design