Notes adapted from lecture notes for CMSC 421 by B.J. Dorr. Intelligent Systems: Advanced Adversarial Search. Stefan Schlobach. With slides from Tom Lenaerts and others.
Planet wars IS: games
Players. Information (imperfect). Game states (perfect).
Part 1: Recap, Minimax, Heuristics
Important: No online search yet. While we apply Minimax, the environment does NOT change! IS: Advanced Search
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v
function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v
function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
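The pseudocode above can be tried out on a small hand-made tree. This is a minimal sketch, not the course implementation: the game representation is an assumption made for illustration (a state is either a number, meaning a terminal utility for MAX, or a list of successor states, and an action is simply the index of the chosen successor).

```python
import math

# Hypothetical toy representation (not from the slides):
# a terminal state is a number, a non-terminal state is a list of successors.
def terminal_test(state):
    return not isinstance(state, list)

def utility(state):
    return state

def successors(state):
    # Returns (action, next_state) pairs; the action is the child's index.
    return list(enumerate(state))

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _, s in successors(state):
        v = max(v, min_value(s))
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, max_value(s))
    return v

def minimax_decision(state):
    # Choose the action whose resulting state has the highest MIN-VALUE.
    action, _ = max(successors(state), key=lambda pair: min_value(pair[1]))
    return action

# MAX to move at the root, MIN at the next level, then terminal utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(tree), minimax_decision(tree))  # 3 0
```

The three MIN nodes evaluate to 3, 2 and 2, so MAX picks the first action with value 3.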
Utility versus heuristics
• Utility: value based on the quality of the state
  • Wins with 1 innings and three wickets
  • Player X gets 33 points, player I 69
  • Player X wins with 3 points, by 1 point
• Heuristics: value based on an estimate of the quality of the state
  • 2 pawns and a bishop are stronger than a rook.
  • Playing the trump ace is better than a random jack (disputable)
Restrict search depth (and estimate quality of nodes)
[Figure: depth-limited game tree. Values per level, top to bottom: MAX: 3; MIN: 3, 0, 2; MAX: 3, 9, 0, 7, 2, 6; MIN: 2, 3, 5, 9, 0, 7, 4, 2, 1, 5, 6]
From perfect to imperfect information
• Minimax requires too many leaf-node evaluations.
• May be impractical within a reasonable amount of time.
• SHANNON (1950):
  • Sacrifice perfect information for performance
Interestingly enough, this is the opposite of what we will do with Phase 1 later this week: turn imperfect information into perfect information, and sample over all belief states.
Heuristic EVAL
• Idea: produce an estimate of the expected utility of the game from a given position.
• Performance depends on the quality of EVAL.
• Requirements:
  • EVAL should order terminal nodes in the same way as UTILITY.
  • Computation must not take too long.
  • For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.
• Only useful for quiescent states (no wild swings in value in the near future)
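To make the role of EVAL concrete, here is a minimal sketch of minimax with a depth cutoff. The tree representation (numbers are terminal utilities, lists are successors) and the stand-in heuristic `average_leaf_eval` are assumptions for illustration only; a real EVAL would be computed from features of the position and could not afford to look at the leaves.

```python
def depth_limited_value(state, depth, maximizing, eval_fn):
    """Minimax value of `state`, searching at most `depth` plies."""
    if not isinstance(state, list):
        return state            # terminal: use the true utility
    if depth == 0:
        return eval_fn(state)   # cutoff: fall back on the heuristic estimate
    values = [depth_limited_value(s, depth - 1, not maximizing, eval_fn)
              for s in state]
    return max(values) if maximizing else min(values)

def leaves(state):
    """All terminal utilities below `state` (used only by the toy heuristic)."""
    if not isinstance(state, list):
        return [state]
    return [leaf for child in state for leaf in leaves(child)]

def average_leaf_eval(state):
    # Hypothetical stand-in heuristic: the mean utility of the leaves below.
    values = leaves(state)
    return sum(values) / len(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
exact = depth_limited_value(tree, 2, True, average_leaf_eval)     # full search
estimate = depth_limited_value(tree, 1, True, average_leaf_eval)  # cut off after 1 ply
print(exact, estimate)
```

With the full depth the result is the exact minimax value 3; with a cutoff after one ply the root value is only as good as the estimates of the three MIN nodes, which is why the quality of EVAL matters.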
Heuristic EVAL example
$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
Addition assumes independence of the features.
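A weighted linear evaluation of this form can be sketched in a few lines. Everything concrete here is an assumption for illustration: the state layout (piece counts per player), the choice of material differences as features, and the commonly quoted chess piece values used as weights.

```python
# Hypothetical features: material difference per piece type (MAX minus MIN).
PIECES = ["pawn", "knight", "bishop", "rook", "queen"]
WEIGHTS = [1, 3, 3, 5, 9]  # commonly quoted piece values, used here as w_i

def material_difference(piece):
    """Build the feature f_i(s) for one piece type."""
    def feature(state):
        return state["max"].get(piece, 0) - state["min"].get(piece, 0)
    return feature

FEATURES = [material_difference(p) for p in PIECES]

def linear_eval(state, weights=WEIGHTS, features=FEATURES):
    # Eval(s) = w_1 f_1(s) + ... + w_n f_n(s)
    return sum(w * f(state) for w, f in zip(weights, features))

# Illustrative position: piece counts only, no board layout.
state = {
    "max": {"pawn": 6, "bishop": 1, "queen": 1},
    "min": {"pawn": 5, "rook": 1, "queen": 1},
}
print(linear_eval(state))  # 1*1 + 3*1 + 5*(-1) = -1
```

Because the terms are simply added, the sketch cannot express interactions between features (for example, that two bishops together are worth more than twice one bishop), which is the independence assumption the slide points out.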
Heuristic difficulties: The immortal game (21 June 1851)
Horizon effect: fixed-depth search thinks it can avoid the queening move
Week 3: Learning Heuristics
The good news (Schnapsen phase 2) [Figure: game tree for players I and X with alternating Max and Min levels] IS: Problem Solving
The bad news 1 (Schnapsen phase 2) [Figure: game tree for players I and X with alternating Max and Min levels] 5! * 5! = 14,400; 6! * 6! = 518,400
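The two numbers on this slide can be checked directly. The interpretation used in this sketch is an assumption: each player can play their remaining cards in any order, so with n cards per player there are n! * n! combinations of orderings (ignoring any restrictions the game rules place on which card may be played).

```python
import math

def phase2_orderings(cards_per_player):
    # n! orderings for one player times n! orderings for the other.
    return math.factorial(cards_per_player) ** 2

for n in (5, 6):
    print(n, phase2_orderings(n))
```

Going from 5 to 6 cards per player multiplies the count by 36, which shows how quickly the tree grows even in the perfect-information phase.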
The bad news 2 (Schnapsen phase 1) [Figure: game tree for players I and X with alternating Max and Min levels; the unknown cards at every node are shown as "?"]
What’s next?
• Trees are too big to search systematically (alpha-beta pruning)
• Imperfect Information Games by Perfect Information Monte-Carlo Sampling
Part 2: Alpha-beta pruning: efficient Minimax
The taming of the beast (Part 2)
Problem of minimax search
• The number of game states is exponential in the number of moves.
• Solution: do not examine every node
  • ==> Alpha-beta pruning
  • Alpha = value of the best choice found so far at any choice point along the path for MAX
  • Beta = value of the best choice found so far at any choice point along the path for MIN
• Revisit example …
Alpha-Beta Example: do depth-first search until the first leaf. Range of possible values: root [-∞,+∞]; first MIN node [-∞,+∞]
Alpha-Beta Example (continued): root [-∞,+∞]; first MIN node [-∞,3]
Alpha-Beta Example (continued): root [3,+∞]; first MIN node [3,3]
Alpha-Beta Example (continued): root [3,+∞]; MIN nodes [3,3], [-∞,2]. The second MIN node is worse for MAX.
Alpha-Beta Example (continued): root [3,14]; MIN nodes [3,3], [-∞,2], [-∞,14]
Alpha-Beta Example (continued): root [3,5]; MIN nodes [3,3], [-∞,2], [-∞,5]
Alpha-Beta Example (continued): root [3,3]; MIN nodes [3,3], [-∞,2], [2,2]
Break?
Alpha-Beta Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v
function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
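The algorithm can be sketched in runnable form as follows. The tree representation (numbers are terminal utilities, lists are successors, actions are child indices) and the `visited` list that records which leaves are actually evaluated are assumptions added for illustration; they are not part of the pseudocode.

```python
import math

def max_value(state, alpha, beta, visited):
    if not isinstance(state, list):
        visited.append(state)   # record every leaf that is really evaluated
        return state
    v = -math.inf
    for s in state:
        v = max(v, min_value(s, alpha, beta, visited))
        if v >= beta:
            return v            # MIN above will never allow this branch
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, visited):
    if not isinstance(state, list):
        visited.append(state)
        return state
    v = math.inf
    for s in state:
        v = min(v, max_value(s, alpha, beta, visited))
        if v <= alpha:
            return v            # MAX above already has something better
        beta = min(beta, v)
    return v

def alpha_beta_search(state):
    """Return (best action, its value, leaves evaluated) for MAX at the root."""
    visited = []
    best_action, alpha = None, -math.inf
    for action, s in enumerate(state):
        v = min_value(s, alpha, math.inf, visited)
        if v > alpha:
            best_action, alpha = action, v
    return best_action, alpha, visited

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
action, value, visited = alpha_beta_search(tree)
print(action, value, visited)  # 0 3 [3, 12, 8, 2, 14, 5, 2]
```

On this tree the result is the same as plain minimax (action 0, value 3), but only 7 of the 9 leaves are evaluated: after seeing the leaf 2 in the second MIN node, the leaves 4 and 6 are skipped.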
Comments about Alpha-Beta Pruning
• Pruning does not affect the final result
• Entire subtrees can be pruned.
• Good move ordering improves the effectiveness of pruning
• With “perfect ordering,” time complexity is $O(b^{m/2})$
• With perfect ordering, alpha-beta pruning can look twice as far as minimax in the same amount of time
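The "twice as far" claim follows from the complexity bound in one step. In this sketch, $b$ is taken to be the branching factor and $m$ the search depth (the usual reading of the symbols, assumed here), and $N$ is a hypothetical fixed budget of nodes that can be examined.

```latex
\[
  O\!\left(b^{m/2}\right) = O\!\left(\left(\sqrt{b}\right)^{m}\right)
  \quad\text{(effective branching factor } \sqrt{b} \text{ instead of } b\text{)}
\]
\[
  b^{\,m_{\text{minimax}}} = N = b^{\,m_{\alpha\beta}/2}
  \;\Longrightarrow\;
  m_{\alpha\beta} = 2\, m_{\text{minimax}}
\]
```

So with the same node budget and perfect move ordering, alpha-beta reaches twice the depth of plain minimax; with poor ordering the gain is smaller.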
More on milestone 1
• We need to implement Phase 2 extremely efficiently (you will see later why).
• So, on top of standard Minimax you should also implement alpha-beta pruning.
• (and maybe we will not use either)
Part 3: Search with no or partial information
Search with no or partial information
• Partial knowledge of states and actions:
  • contingency problem
    • Percepts provide new information about the current state; often interleave search and execution.
    • If uncertainty is caused by the actions of another agent: adversarial problem
  • exploration problem
    • When states and actions of the environment are unknown.
  • sensorless or conformant problem
    • Agent may have no idea where it is; solution (if any) is a sequence.
Sensorless problems
• Start in {1,2,3,4,5,6,7,8}; e.g. Right goes to {2,4,6,8}. Solution?
• [Right, Suck, Left, Suck] -> {7}
• When the world is not fully observable: reason about the set of states that might be reached = belief state
Sensorless problems
• Search space of belief states
• Solution = belief state in which all members are goal states.
• If there are S states, then there are $2^S$ belief states.
Belief state of vacuum-world
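Searching in the space of belief states can be sketched for the two-square vacuum world. The state encoding used here, `(location, dirt in A, dirt in B)` with A the left square, is an assumption chosen to avoid relying on the numbering 1 to 8 from the figure; actions are assumed to be deterministic.

```python
from collections import deque

ACTIONS = ("Left", "Right", "Suck")

# All 8 physical states: (agent location, A is dirty, B is dirty).
ALL_STATES = frozenset((loc, da, db)
                       for loc in "AB"
                       for da in (True, False)
                       for db in (True, False))

def result(state, action):
    loc, dirty_a, dirty_b = state
    if action == "Left":
        return ("A", dirty_a, dirty_b)
    if action == "Right":
        return ("B", dirty_a, dirty_b)
    # Suck cleans the square the agent is standing on.
    return ("A", False, dirty_b) if loc == "A" else ("B", dirty_a, False)

def predict(belief, action):
    """Belief state after executing `action` without any percept."""
    return frozenset(result(s, action) for s in belief)

def is_goal(belief):
    # Solution = belief state in which all members are goal states.
    return all(not da and not db for _, da, db in belief)

def conformant_plan(initial_belief):
    """Breadth-first search over belief states; returns an action sequence."""
    start = frozenset(initial_belief)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        belief, plan = frontier.popleft()
        if is_goal(belief):
            return plan
        for action in ACTIONS:
            nxt = predict(belief, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

plan = conformant_plan(ALL_STATES)
print(plan, len(plan))
```

Starting from complete ignorance (all 8 states), the search finds a 4-step plan. Executing the slide's sequence [Right, Suck, Left, Suck] shrinks the belief state from 8 states to 4, 2, 2 and finally to the single state "agent on the left, everything clean".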
Part 3: Games with partial information: Schnapsen Phase 1
The bad news (Schnapsen phase 1) [Figure: game tree for players I and X with alternating Max and Min levels; the unknown cards at every node are shown as "?"]
Uncertainty in Schnapsen
• There is no chance (once the cards are distributed), just uncertainty
• Uncertainty implies an imperfect-information game.
Players. Information (imperfect). Game states (perfect).
Will simple Minimax work?
Belief states (many of them)
The full search tree for Schnapsen? (14 choose 5) * 4. Schnapsen: a simple game? A simple problem?
Perfect Information Monte-Carlo Sampling [Figure: Phase 1 → all possible belief states → one Minimax search per belief state → Phase 2]
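The idea can be sketched on a deliberately tiny game that is not Schnapsen. Everything in the example is an assumption for illustration: cards are plain numbers, there is a single trick, we lead one card, and the opponent (who is assumed to see everything) answers with their best card. For each possible opponent hand ("world") the game is solved as a perfect-information problem, and the results are averaged per move. Because the toy is so small, all worlds are enumerated; with a large number of worlds one would draw a random sample instead.

```python
from itertools import combinations

def trick_value(my_card, opponent_hand):
    """Perfect-information value of leading `my_card` in one world.

    MIN picks the reply that is worst for us: we score 1 only if
    none of the opponent's cards beats ours.
    """
    return min(1 if my_card > reply else 0 for reply in opponent_hand)

def pimc_move(my_hand, unseen_cards, opponent_hand_size):
    """Average the perfect-information value of each move over all worlds."""
    worlds = list(combinations(unseen_cards, opponent_hand_size))
    scores = {
        card: sum(trick_value(card, world) for world in worlds) / len(worlds)
        for card in my_hand
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical situation: we hold 4 and 9; the opponent holds 2 of the 4 unseen cards.
best, scores = pimc_move([4, 9], [2, 5, 7, 10], 2)
print(best, scores)  # 9 {4: 0.0, 9: 0.5}
```

Leading the 9 wins in 3 of the 6 possible worlds, leading the 4 wins in none, so the averaged perfect-information values recommend the 9.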