780 likes | 908 Views
Adversarial Search. Adversarial Search. Game playing Perfect play The minimax algorithm alpha-beta pruning Resource limitations Elements of chance Imperfect information. Game Playing State-of-the-Art.
E N D
Adversarial Search Game playing • Perfect play • The minimaxalgorithm • alpha-beta pruning • Resource limitations • Elements of chance • Imperfect information
Game Playing State-of-the-Art • Checkers: Chinook ended 40-year-reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved! • Chess: Deep Blue defeated human world champion GaryKasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic. • Othello: Human champions refuse to compete against computers, which are too good. • Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning. • Pacman: unknown
What kind of games? • Abstraction: To describe a game we must capture every relevant aspect of the game. Such as: • Chess • Tic-tac-toe • … • Accessible environments: Such games are characterized by perfect information • Search: game-playing then consists of a search through possible game positions • Unpredictable opponent: introduces uncertainty thus game-playing must deal with contingency problems
Game Playing • Many different kinds of games! • Axes: • Deterministic or stochastic? • One, two, or more players? • Perfect information (can you see the state)? • Turn taking or simultaneous action? • Want algorithms for calculating a strategy (policy) which recommends a move in each state
Deterministic Games • Deterministic, single player, perfect information: • Know the rules • Know what actions do • Know when you win • E.g. Freecell, 8-Puzzle, Rubik’s cube • … it’s just search! • Slight reinterpretation: • Each node stores a value: the best outcome it can reach • This is the maximal outcome of its children (the max value) • Note that we don’t have path sums as before (utilities at end) • After search, can pick move that leads to best node
Deterministic Two-Player • E.g. tic-tac-toe, chess, checkers • Zero-sum games • One player maximizes result • The other minimizes result • Minimaxsearch • A state-space search tree • Players alternate • Each layer, or ply, consists of around of moves* • Choose move to position with highest minimax value = best achievable utility against best play
Games vs. search problems • “Unpredictable"opponentsolutionisa strategyspecifyinga movefor every possibleopponent reply • Timelimitsunlikelytofindgoal, must approximate • Plan of attack: • Computer considers possiblelinesof play (Babbage,1846) • Algorithm for perfectplay (Zermelo,1912;Von Neumann,1944) • Finitehorizon, approximate evaluation (Zuse, 1945;Wiener, 1948;Shannon,1950) • Firstchess program(Turing, 1951) • Machine learningtoimprove evaluationaccuracy (Samuel,1952- 57) • Pruning to allowdeepersearch (McCarthy,1956)
Searching for the next move • Complexity:many games have ahuge searchspace • Chess: b=35,m=100 nodes=35100 • ifeachnodetakesabout1 nstoexplore theneachmove willtakeabout10 50 millennia tocalculate. • Resource(e.g., time, memory)limit:optimal solutionnotfeasible/possible,thusmustapproximate • Pruning:makesthesearch moreefficientby discardingportionsof thesearch treethatcannot improvequalityresult. • Evaluationfunctions:heuristicstoevaluateutilityof a state withoutexhaustivesearch.
Two-player games • A game formulatedas a searchproblem board positionand turn definitionof legalmoves conditions for when game isover • Initial state: • Operators: • Terminalstate: • Utilityfunction: ofthe anumericvalue that describesthe outcome game. E.g.,-1, 0,1for loss, draw,win. (AKApayofffunction)
Example:Tic-Tac-Toe CS561-Lecture7-8-Macskassy-Spring2011
The minimax algorithm • Perfectplay for deterministicenvironmentswithperfect information • Basicidea: choose move withhighestminimaxvalue • =best achievable payoff against best play • Algorithm: • Generategametree completely • Determine utility ofeachterminalstate • Propagatetheutility valuesupwardinthethree byapplying • MINandMAXoperators onthenodesinthecurrentlevel • Attherootnodeuseminimaxdecisiontoselect themove • withthemax(ofthemin)utility value • Steps2and3 inthealgorithmassume thatthe opponentwillplay perfectly.
Generate Game Tree 1ply 1move
A sub-tree win lose draw CS561-Lecture7-8-Macskassy-Spring2011 20
What is a good move? win lose draw CS561-Lecture7-8-Macskassy-Spring2011
MiniMax Example MAX MIN CS561-Lecture7-8-Macskassy-Spring2011
MiniMax: Recursive Implementation CS561-Lecture7-8-Macskassy-Spring2011
Minimax Properties • Optimalagainsta perfectplayer. Otherwise? • Time complexity? • O(bm) • Space complexity? • O(bm) • For chess,b=35, m=100 • Exact solutioniscompletelyinfeasible • But, doweneed toexplorethe whole tree? max min 10 10 9 100
Resource Limits max • Cannot search to leaves • Depth-limited search • Instead, search a limited depth of tree • Replace terminal utilities with an eval function for non-terminal positions • Guarantee of optimal play is gone • More plies makes a BIG difference • Example: • Suppose we have 100 seconds, can explore 10K nodes / sec • So can check 1M nodes per move • α – β reaches about depth 8 – decent chess program 4 -2 4 min min -1 -2 4 9 ? ? ? ?
Evaluation Functions • Functionwhichscores non-terminals • Idealfunction:returnsthe utilityof theposition • Inpractice: typicallyweightedlinearsumoffeatures: • e.g.f1(s)=(num whitequeens–numblackqueens),etc.
Why Pacman starves • Heknowshisscorewill go upbyeatingthedot now • Heknowshisscore will go up justas muchby eatingthe dotlater on • Therearenopoint- scoring opportunities after eatingthedot • Therefore,waiting seems justas goodas eating
- pruning:generalprinciple Player m Opponent If > v thenMAXwillchosemso prunetreeunder n Similarfor forMIN Player n v Opponent CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞] MAX MIN CS561-Lecture7-8-Macskassy-Spring2011 30
- pruning:example1 [-∞,+∞] MAX MIN 3 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞] MAX MIN [-∞,3] 3 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞] MAX MIN [-∞,3] 3 12 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞][3,+∞] MAX MIN [-∞,3] 3 12 8 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞][3,+∞] MAX MIN [3,2] [-∞,3] 3 12 8 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞][3,+∞] MAX MIN [3,2] [3,14] [-∞,3] 3 12 8 2 14 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞][3,+∞] MAX MIN [3,2] [3,5] [-∞,3] [3,14] 3 12 8 2 14 5 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞][3,+∞] MAX MIN [3,2] [3,5] [3,2] [-∞,3] [3,14] 3 12 8 2 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example1 [-∞,+∞][3,+∞] MAX Selectedmove MIN [3,2] [3,5] [3,2] [-∞,3] [3,14] 3 12 8 2 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞] MAX MIN CS561-Lecture7-8-Macskassy-Spring2011 40
- pruning:example2 [-∞,+∞] MAX [-∞,2] MIN 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞] MAX [-∞,2] MIN 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞] MAX [-∞,2] MIN 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞] MAX [-∞,2] MIN [2,5] 5 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞] MAX [-∞,2] MIN [2,5][2,1] 1 5 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞] MAX [-∞,2] MIN [2,8] [2,5][2,1] 8 1 5 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞] MAX [-∞,2] MIN [2,8] [2,5][2,1] 12 8 1 5 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞][3, +∞] MAX [-∞,2] MIN [2,8] [2,3] [2,5][2,1] 3 12 8 1 5 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example2 [-∞,+∞][2,+∞][3, +∞] MAX Selectedmove [-∞,2] MIN [2,8] [2,3] [2,5][2,1] 3 12 8 1 5 14 5 2 CS561-Lecture7-8-Macskassy-Spring2011
- pruning:example3 MIN MAX [6,∞] MIN CS561-Lecture7-8-Macskassy-Spring2011 50
- pruning:example3 [-∞,6] MIN [-∞,6] MAX [6,∞] MIN CS561-Lecture7-8-Macskassy-Spring2011