Intelligent Systems: Advanced Adversarial Search

Stefan Schlobach. Notes adapted from lecture notes for CMSC 421 by B.J. Dorr, with slides from Tom Lenaerts and others.

Planet wars IS: games

Players. Information (imperfect). Game states (perfect).

Part 1: Recap, Minmax, Heuristics

Important: No online search yet. While we apply MinMax, the environment does NOT change!

Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
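The pseudocode above translates almost line by line into Python. The following is a minimal sketch, assuming a toy encoding that is not part of the slides: a game tree is a nested list, a bare number is a terminal state carrying its utility, and a move is the index of a child. The example tree is hypothetical.

```python
import math

def is_terminal(node):
    # In this toy encoding a terminal state is a bare number (its utility).
    return not isinstance(node, list)

def max_value(node):
    if is_terminal(node):
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child))
    return v

def min_value(node):
    if is_terminal(node):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child))
    return v

def minimax_decision(node):
    """Return the index of the root move whose MIN-VALUE is largest."""
    return max(range(len(node)), key=lambda i: min_value(node[i]))

# Hypothetical two-ply tree: MAX chooses a sublist, MIN then chooses a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree), max_value(tree))  # prints: 0 3
```

MIN would answer the three root moves with 3, 2 and 2, so MAX picks the first move and the game value is 3.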

Utility versus heuristics
• Utility: value based on the quality of the state
  • Wins by 1 innings and three wickets
  • Player X gets 33 points, player I 69
  • Player X wins with 3 points, by 1 point
• Heuristics: value based on an estimate of the quality of the state
  • 2 pawns and a bishop are stronger than a rook.
  • Playing the trump ace is better than a random jack (disputable)

Restrict search depth (and estimate quality of nodes)
[Figure: game tree. MAX root: 3. MIN level: 3, 0, 2. MAX level: 3, 9, 0, 7, 2, 6. Bottom (MIN) level: 2, 3, 5, 9, 0, 7, 4, 2, 1, 5, 6.]
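Cutting the search off at a fixed depth can be sketched as follows. This is an illustration under assumptions, not the course implementation: the `Node` class and `depth_limited_value` are hypothetical names, every node carries a static estimate that is returned when the depth budget runs out, and the depth-1 estimates 4, 5 and 1 are invented. The depth-2 estimates reuse the values 3, 9, 0, 7, 2, 6 from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    estimate: float                      # heuristic value used if search stops here
    children: list = field(default_factory=list)

def depth_limited_value(node, depth, maximizing=True):
    """Minimax value of `node`, looking at most `depth` plies ahead."""
    if depth == 0 or not node.children:
        return node.estimate             # cutoff: trust the estimate
    values = [depth_limited_value(c, depth - 1, not maximizing)
              for c in node.children]
    return max(values) if maximizing else min(values)

root = Node(0, [
    Node(4, [Node(3), Node(9)]),
    Node(5, [Node(0), Node(7)]),
    Node(1, [Node(2), Node(6)]),
])

print(depth_limited_value(root, 1))  # prints: 5
print(depth_limited_value(root, 2))  # prints: 3
```

With one ply of lookahead the second move looks best (estimate 5). With two plies MIN's replies are taken into account and the root value drops to 3, which shows how strongly the result depends on where the search is cut.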

From perfect to imperfect information
• Minimax requires too many leaf-node evaluations.
• May be impractical within a reasonable amount of time.
• SHANNON (1950):
  • Sacrifice perfect information for performance
Interestingly enough, this is the opposite of what we will do with Phase 1 later this week: turn imperfect information into perfect information, and sample over all belief states.

Heuristic EVAL
• Idea: produce an estimate of the expected utility of the game from a given position.
• Performance depends on the quality of EVAL.
• Requirements:
  • EVAL should order terminal nodes in the same way as UTILITY.
  • Computation may not take too long.
  • For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.
• Only useful for quiescent states (no wild swings in value in the near future)

Heuristic EVAL example
$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
Addition assumes the features are independent.
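A weighted linear evaluation function of this form is short to write down. The sketch below is only an illustration: the state layout (a dict of piece counts), the helper `diff`, the chosen features and the weights 1, 3, 5 are all assumptions, not taken from the slides.

```python
def linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

def diff(piece):
    # Hypothetical feature: own count of `piece` minus the opponent's count.
    return lambda s: s["mine"].get(piece, 0) - s["theirs"].get(piece, 0)

features = [diff("pawn"), diff("bishop"), diff("rook")]
weights = [1, 3, 5]   # assumed material weights, for illustration only

state = {
    "mine":   {"pawn": 3, "bishop": 1},
    "theirs": {"pawn": 2, "rook": 1},
}
print(linear_eval(state, features, weights))  # 1*1 + 3*1 + 5*(-1) = -1
```

Because the terms are simply added, the function cannot express interactions between features (for example, a bishop being worth more when paired with another bishop), which is the independence assumption named on the slide.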

Heuristic difficulties: The immortal game (21 June 1851)

Horizon effect: fixed-depth search thinks it can avoid the queening move

Week 3: Learning Heuristics

The good news (Schnapsen phase 2)
[Figure: game tree for players I and X with alternating Max and Min levels]

The bad news 1 (Schnapsen phase 2)
[Figure: game tree for players I and X with alternating Max and Min levels]
5! * 5! = 14,400
6! * 6! = 518,400
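The two counts on this slide can be checked directly. A minimal sketch, assuming the figures count the orders in which two players can each play out a hand of n cards, ignoring any rule that restricts which card may be played; the function name is hypothetical.

```python
import math

def play_sequences(n):
    # Each player can order their n cards in n! ways, independently of the other.
    return math.factorial(n) * math.factorial(n)

print(play_sequences(5))  # prints: 14400
print(play_sequences(6))  # prints: 518400
```

One extra card per hand multiplies the count by 36, which is why exhaustive search becomes expensive so quickly.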

The bad news 2 (Schnapsen phase 1)
[Figure: game tree for players I and X with alternating Max and Min levels; every entry is an unknown "?"]

What’s next?
• Trees are too big to search systematically (alpha-beta pruning)
• Imperfect Information Games by Perfect Information Monte-Carlo Sampling

But before that… (25 minutes refreshment)

Part 2: alpha-beta pruning, efficient Minmax

The taming of the beast (Part 2)

The bad news 1 (Schnapsen phase 2)
[Figure: game tree for players I and X with alternating Max and Min levels]
5! * 5! = 14,400
6! * 6! = 518,400

Problem of minimax search
• The number of game states is exponential in the number of moves.
• Solution: do not examine every node
  • ==> Alpha-beta pruning
  • Alpha = value of the best choice found so far at any choice point along the path for MAX
  • Beta = value of the best choice found so far at any choice point along the path for MIN
• Revisit example …

Alpha-Beta Example
Do depth-first search until the first leaf. Each node shows its range of possible values.
Root (MAX): [−∞,+∞]; first MIN node: [−∞,+∞]

Alpha-Beta Example (continued)
Root: [−∞,+∞]; first MIN node: [−∞,3]

Alpha-Beta Example (continued)
Root: [3,+∞]; first MIN node: [3,3]

Alpha-Beta Example (continued)
Root: [3,+∞]; first MIN node: [3,3]; second MIN node: [−∞,2]. This node is worse for MAX.

Alpha-Beta Example (continued)
Root: [3,14]; first MIN node: [3,3]; second MIN node: [−∞,2]; third MIN node: [−∞,14]

Alpha-Beta Example (continued)
Root: [3,5]; first MIN node: [3,3]; second MIN node: [−∞,2]; third MIN node: [−∞,5]

Alpha-Beta Example (continued)
Root: [3,3]; first MIN node: [3,3]; second MIN node: [−∞,2]; third MIN node: [2,2]

Break?

Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
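The two alpha-beta slides can be combined into one runnable sketch. As an assumption for illustration, the game tree is a nested list whose bare numbers are terminal utilities, and a move is the index of a child. The leaf values are hypothetical, chosen so that the bounds evolve as in the worked example (3, then 2, then 14, 5, 2).

```python
import math

def max_value(node, alpha, beta):
    if not isinstance(node, list):       # terminal: the number is the utility
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                    # MIN above will never allow this
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                   # MAX above already has something better
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(tree):
    """Return (best_move_index, value) for MAX at the root."""
    best_move, best_value = None, -math.inf
    alpha, beta = -math.inf, math.inf
    for i, child in enumerate(tree):
        v = min_value(child, alpha, beta)
        if v > best_value:
            best_move, best_value = i, v
        alpha = max(alpha, best_value)
    return best_move, best_value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical leaf values
print(alpha_beta_search(tree))  # prints: (0, 3)
```

The second MIN node is abandoned after its first leaf (2), because MAX already has a move worth 3. The root loop keeps track of the move itself, since a pruned child only returns a bound, not an exact value.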

Comments about Alpha-Beta Pruning
• Pruning does not affect final results
• Entire subtrees can be pruned.
• Good move ordering improves effectiveness of pruning
• With “perfect ordering,” time complexity is $O(b^{m/2})$
• Alpha-beta pruning can look twice as far as minimax in the same amount of time
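The effect of move ordering can be made visible by counting how many leaves are actually evaluated. The sketch below is an illustration under assumptions: trees are nested lists with numeric leaves, the function names are hypothetical, and the two example trees contain the same hypothetical leaf values in a favourable and an unfavourable order.

```python
import math

def alpha_beta(node, alpha, beta, maximizing, visited):
    if not isinstance(node, list):
        visited.append(node)             # record every leaf that is evaluated
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_v = alpha_beta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            v = max(v, child_v)
            if v >= beta:
                break                    # prune remaining children
            alpha = max(alpha, v)
        else:
            v = min(v, child_v)
            if v <= alpha:
                break                    # prune remaining children
            beta = min(beta, v)
    return v

def leaves_evaluated(tree):
    """Return (root value, number of leaves evaluated) for a MAX root."""
    visited = []
    value = alpha_beta(tree, -math.inf, math.inf, True, visited)
    return value, len(visited)

good_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # best move first
bad_order = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]    # best move last

print(leaves_evaluated(good_order))  # prints: (3, 5)
print(leaves_evaluated(bad_order))   # prints: (3, 9)
```

Both orderings give the same root value, which illustrates that pruning does not change the result. With the best move first only 5 of the 9 leaves are evaluated, whereas the unfavourable order prunes nothing.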

More on milestone 1
• We need to implement Phase 2 extremely efficiently (you will see later why).
• So, on top of standard MinMax you should also implement alpha-beta pruning.
• (and maybe we will not use either)

Part 3: Search with no or partial information

Search with no or partial information
• Partial knowledge of states and actions:
  • sensorless or conformant problem
    • Agent may have no idea where it is; solution (if any) is a sequence.
  • contingency problem
    • Percepts provide new information about current state; often interleave search and execution.
    • If uncertainty is caused by actions of another agent: adversarial problem
  • exploration problem
    • When states and actions of the environment are unknown.

Sensorless problems
• Start in {1,2,3,4,5,6,7,8}; e.g. Right goes to {2,4,6,8}. Solution?
• [Right, Suck, Left, Suck] -> 7
• When the world is not fully observable: reason about the set of states that might be reached = belief state

Sensorless problems
• Search space of belief states
• Solution = belief state with all members goal states.
• If S states then $2^S$ belief states.
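Searching in the space of belief states can be sketched for the two-square vacuum world. The state numbering below is an assumption (odd numbers have the robot on the left; 7 and 8 are the two all-clean states), chosen so that Right maps {1,...,8} to {2,4,6,8} as on the slide. All function names are hypothetical.

```python
from collections import deque

# state number -> (robot location, dirt in left square, dirt in right square)
STATES = {
    1: ("L", True, True),   2: ("R", True, True),
    3: ("L", True, False),  4: ("R", True, False),
    5: ("L", False, True),  6: ("R", False, True),
    7: ("L", False, False), 8: ("R", False, False),
}
NUMBER = {v: k for k, v in STATES.items()}
GOALS = {7, 8}
ACTIONS = ("Left", "Right", "Suck")

def result(state, action):
    loc, dirt_l, dirt_r = STATES[state]
    if action == "Left":
        loc = "L"
    elif action == "Right":
        loc = "R"
    elif loc == "L":          # Suck while in the left square
        dirt_l = False
    else:                     # Suck while in the right square
        dirt_r = False
    return NUMBER[(loc, dirt_l, dirt_r)]

def update(belief, action):
    """Belief state after applying `action` in every state we might be in."""
    return frozenset(result(s, action) for s in belief)

def sensorless_plan(initial):
    """Breadth-first search over belief states; returns a shortest action list."""
    start = frozenset(initial)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        belief, plan = frontier.popleft()
        if belief <= GOALS:   # every possible state is a goal state
            return plan
        for a in ACTIONS:
            nxt = update(belief, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return None

belief = frozenset(STATES)
for action in ["Right", "Suck", "Left", "Suck"]:
    belief = update(belief, action)
print(sorted(belief))  # prints: [7]
```

Calling `sensorless_plan(STATES)` returns some four-action plan; which one depends on the order of `ACTIONS`. With 8 physical states there are at most 2^8 = 256 belief states, so breadth-first search is cheap here.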

Belief state of vacuum-world

Part 3: Games with partial information (Schnapsen Phase 1)

The bad news (Schnapsen phase 1)
[Figure: game tree for players I and X with alternating Max and Min levels; every entry is an unknown "?"]

Uncertainty in Schnapsen
• There is no chance (once the cards are distributed), just uncertainty
• Uncertainty implies Imperfect Information Game.

Players. Information (imperfect). Game states (perfect).

Will simple MinMax work?

Belief states (Many of them)

The full search tree for Schnapsen? $\binom{14}{5} \cdot 4$
Schnapsen: a simple game? A simple problem?

Perfect Information Monte-Carlo Sampling
[Figure: Phase 1 -> all possible belief spaces -> MinMax run on each one (Phase 2)]
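The idea of Perfect Information Monte-Carlo sampling can be sketched on a toy problem. This is not Schnapsen and not the course framework. As an assumption for illustration, each possible "world" (one way the hidden information could turn out) is given explicitly as a two-ply tree: one list of MIN replies per root move. The function names and the numbers are hypothetical.

```python
import random

def min_value(replies):
    """Value of one MAX move once MIN picks the reply that is worst for MAX."""
    return min(replies)

def pimc_decision(worlds, n_samples=None, rng=None):
    """Pick the root move with the best average minimax value over worlds.

    With n_samples=None every world is used once (uniform belief);
    otherwise n_samples worlds are drawn at random with replacement.
    """
    if n_samples is None:
        sample = list(worlds)
    else:
        rng = rng or random.Random(0)
        sample = [rng.choice(worlds) for _ in range(n_samples)]
    n_moves = len(sample[0])
    totals = [0.0] * n_moves
    for world in sample:                  # solve each world as a perfect-information game
        for move in range(n_moves):
            totals[move] += min_value(world[move])
    averages = [t / len(sample) for t in totals]
    best = max(range(n_moves), key=lambda m: averages[m])
    return best, averages

worlds = [
    [[3, 5], [0, 9]],   # world A: move 0 is worth 3, move 1 is worth 0
    [[1, 2], [4, 6]],   # world B: move 0 is worth 1, move 1 is worth 4
    [[2, 8], [7, 7]],   # world C: move 0 is worth 2, move 1 is worth 7
]
best, averages = pimc_decision(worlds)
print(best)  # prints: 1
```

Move 1 is the worst choice in world A but has the higher average over all three worlds (11/3 against 2), so it is selected. In a real card game the worlds would be sampled deals consistent with what the player has seen, and each would be solved with MinMax or alpha-beta.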
