Multi-Player Games: Overview and Recent Research. Spencer Polk COMP 4106 February 24, 2014. Overview. Game Playing history and introduction Two-Player Games (brief review) Mini-Max style Multi-Player algorithms Monte Carlo style Multi-Player algorithms
Multi-Player Games: Overview and Recent Research Spencer Polk COMP 4106 February 24, 2014
Overview • Game Playing history and introduction • Two-Player Games (brief review) • Mini-Max style Multi-Player algorithms • Monte Carlo style Multi-Player algorithms • Current research at Carleton in Multi-Player games
Game Playing: Introduction • Can machines outplay man? • Legends and Greek Mythology • “The Turk” (left) was presented to nobility as a chess-playing automaton • Dream has just come true! The Turk (1770)
Game Playing: Introduction • The Turk: Played Napoleon, Benjamin Franklin, and Edgar Allan Poe • The Turk was, of course, a fraud • King and Rook vs King endgame: played by an automaton – 1914 • True AI game playing – Claude Shannon: 1950 • Before AI was even named as a field – 1956 Dartmouth Conference • Shannon did not first propose the Mini-Max theorem, but did first propose the Mini-Max algorithm
Game Playing: Introduction • Focus was on chess – and largely still is today • Shannon studied chess in his 1950 paper • Shannon saw it as an academic exercise only • Saw no practical purpose; no available hardware • 1970s: First commercially available chess playing programs • 1980s: Chess programs playing at expert levels • Still far to go to Grandmaster level
Game Playing: Introduction • 1997: Deep Blue defeats Kasparov • First match defeat of a reigning world chess champion by a computer • Field branched out • Poker • Go Kasparov vs Deep Blue
Mini-Max Algorithm Sample Mini-Max Tree
Mini-Max Algorithm
function integer minimax(node, depth):
    if node is terminal or depth <= 0 then
        return heuristic value of node
    else
        if node is max then
            val = −∞
            for all child of node do
                val = max(val, minimax(child, depth − 1))
            end for
        else
            val = ∞
            for all child of node do
                val = min(val, minimax(child, depth − 1))
            end for
        end if
        return val
    end if
Alpha-Beta Pruning Alpha-cutoff at blue 4 (MIN node)
Alpha-Beta Pruning • Creates bounds on maximum and minimum values • Alpha cutoff – MAX is already guaranteed more elsewhere • Beta cutoff – the opponent can already guarantee less elsewhere • Tighter bounds = More pruning! • How to improve bounds? – Move ordering! • Expert knowledge (game histories) • Ordering heuristics • Iterative Deepening
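The effect of move ordering on pruning can be illustrated with a minimal sketch. It assumes a toy tree encoding that is not from the slides: a leaf is a number (its heuristic value) and an internal node is a list of children. The `counter` argument is a hypothetical helper used only to count visited nodes.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, counter=None):
    """Mini-Max with Alpha-Beta pruning on a nested-list tree.

    Assumed encoding: a leaf is a number, an internal node is a list of children.
    counter, if given, is a one-element list used to count visited nodes.
    """
    if counter is not None:
        counter[0] += 1
    if not isinstance(node, list):
        return node  # leaf: heuristic value
    if maximizing:
        val = -math.inf
        for child in node:
            val = max(val, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, val)
            if alpha >= beta:  # the remaining children cannot change the result
                break
        return val
    val = math.inf
    for child in node:
        val = min(val, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, val)
        if alpha >= beta:
            break
    return val

def visited(tree):
    """Return (value, number of nodes visited) with MAX to move at the root."""
    counter = [0]
    return alphabeta(tree, True, counter=counter), counter[0]

# Same subtrees, different move order at the root.
GOOD_ORDER = [[3, 5], [2, 9], [0, 1]]  # best move for MAX searched first
BAD_ORDER = [[0, 1], [2, 9], [3, 5]]   # best move for MAX searched last
```

On this toy tree both orderings return the same value, but searching the best move first visits fewer of the ten nodes, which is the point of the move-ordering bullets above.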
Extending to Multi-Player Games • Mini-Max developed for Chess • Exclusively two-player, zero-sum game • Now, want to play multi-player games • Chinese Checkers • Multi-player Othello • Need to extend Mini-Max to multi-player games • Many ways to do this…
Extending to Multi-Player Games • Problem: Mini-Max holds a single value for score • For two players, this is fine • Game is zero sum, so second player’s score is the negation • Single value is very valuable – Pruning • Multi-player needs a way to do this • Simple solution: treat ALL opponents as minimizing the perspective player’s score • MAX-MIN-MIN, etc • Called Paranoid Algorithm
Paranoid Algorithm Sample Paranoid Tree
Paranoid Algorithm
function integer paranoid(node, depth):
    if node is terminal or depth <= 0 then
        return heuristic value of node
    else
        if node is max then
            val = −∞
            for all child of node do
                val = max(val, paranoid(child, depth − 1))
            end for
        else
            val = ∞
            for all child of node do
                val = min(val, paranoid(child, depth − 1))
            end for
        end if
        return val
    end if
Paranoid Algorithm • Algorithm is exactly the same as Mini-Max in many cases • Pros: • Very simple to implement • Subject to Alpha-Beta Pruning (on MAX/MIN border) • Cons: • Sees all opponents as a coalition – can lead to bad play • Limited look-ahead for perspective player
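A minimal sketch of the MAX-MIN-MIN pattern follows. It assumes the same hypothetical nested-list tree encoding (leaf = heuristic value for the perspective player, internal node = list of children) and that players move in a fixed rotation, so the player to move is `ply % n_players`.

```python
def paranoid(node, n_players, ply=0):
    """Paranoid search: player 0 maximizes, every other player minimizes.

    Assumed encoding: a leaf is player 0's heuristic value,
    an internal node is a list of children.
    """
    if not isinstance(node, list):
        return node
    values = [paranoid(child, n_players, ply + 1) for child in node]
    # Only the perspective player (ply 0, n, 2n, ...) is a MAX node.
    return max(values) if ply % n_players == 0 else min(values)

# Three players: root is MAX, the next two levels are both MIN.
TREE = [
    [[4, 7], [6, 2]],
    [[5, 8], [9, 5]],
]
print(paranoid(TREE, n_players=3))  # prints 5
```

Pruning is left out to keep the sketch short. Because the two MIN levels are consecutive, only one of every three plies belongs to the perspective player, which is the limited look-ahead listed under the cons.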
Max-N Algorithm • 1986: Luckhardt and Irani • Attempt to address coalition problem • Keeps a tuple of scores, not a single value • Assumption: Player maximizes their own score • No consideration for other scores • Heuristic returns value for all N players: e.g. [5, 2, 11] • Player i maximizes the i-th score
Max-N Algorithm Sample Max-N Tree
Max-N Algorithm
function integer[] max-n(node, depth):
    if node is terminal or depth <= 0 then
        return heuristic value of node
    else
        val = −∞
        tuple = []
        for all child of node do
            childTuple = max-n(child, depth − 1)
            if childTuple[node.player] > val then
                val = childTuple[node.player]
                tuple = childTuple
            end if
        end for
        return tuple
    end if
Max-N Algorithm • In terms of raw Mini-Max: Very simple extension • Pros: • Players “look out for number one” • More realistic play • Perspective player can see more opportunities • Cons: • Pruning is much more complicated and less effective • Can wind up worse than Paranoid again
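The tuple-based backup can be sketched as follows. The encoding is an assumption made for illustration: a leaf is a tuple with one score per player, an internal node is a list of children, and the player to move is `ply % n_players`. Ties are broken in favour of the first child found, which is one of several possible tie-breaking rules.

```python
def max_n(node, n_players, ply=0):
    """Max-N: the player to move keeps the child tuple that is best for them.

    Assumed encoding: a leaf is a tuple of scores (one per player),
    an internal node is a list of children.
    """
    if isinstance(node, tuple):
        return node
    player = ply % n_players
    best = None
    for child in node:
        scores = max_n(child, n_players, ply + 1)
        # Compare only the current player's own entry; other scores are ignored.
        if best is None or scores[player] > best[player]:
            best = scores
    return best

# Three players, three plies; the leaf (5, 2, 11) reuses the example tuple above.
TREE = [
    [[(5, 2, 11), (3, 6, 4)], [(1, 9, 2), (7, 1, 8)]],
    [[(6, 3, 3), (2, 2, 9)], [(4, 8, 1), (8, 0, 5)]],
]
print(max_n(TREE, n_players=3))  # prints (5, 2, 11)
```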
Best-Reply Search • Relatively new: February 2011 • All opponents considered to be one player • They only get one turn • Only opponent with best move gets to act • Return to MAX-MIN-MAX-MIN… • Essentially the same algorithm as Mini-Max… Again
Best-Reply Search BRS (One level)
Best-Reply Search
function integer best-reply(node, depth):
    if node is terminal or depth <= 0 then
        return heuristic value of node
    else
        if node is max then
            val = −∞
            for all child of node do
                val = max(val, best-reply(child, depth − 1))
            end for
        else
            val = ∞
            for all opponents do
                for all opponent’s child at node do
                    val = min(val, best-reply(child, depth − 1))
                end for
            end for
        end if
        return val
    end if
Best-Reply Search • Attempt to get “best of both worlds” • Pros: • Balance between coalition and free-for-all • Allows Alpha-Beta pruning • Significant look-ahead for perspective player • Cons: • Opponents’ turns are skipped, so illegal game states are analyzed • Not valid for some games
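A minimal sketch of the single combined opponent turn is shown below. The tree encoding is hypothetical: a MAX node is a list of children, a MIN node is a dict mapping each opponent id to the list of positions that opponent could move to, and a leaf is the perspective player's heuristic value.

```python
def best_reply(node):
    """Best-Reply Search on a toy tree.

    Assumed encoding: list = MAX node (perspective player to move),
    dict = MIN node mapping opponent id -> that opponent's child positions,
    number = heuristic value for the perspective player.
    """
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, list):
        return max(best_reply(child) for child in node)
    # One MIN layer for all opponents: only the single most damaging move
    # among every opponent's options is played; the others are skipped.
    return min(best_reply(child) for moves in node.values() for child in moves)

TREE = [
    {1: [6, 4], 2: [7, 3]},  # opponent 2's move to the 3-valued position is the best reply
    {1: [5, 8], 2: [9, 6]},  # opponent 1's move to the 5-valued position is the best reply
]
print(best_reply(TREE))  # prints 5
```

Since the structure is plain MAX-MIN alternation, standard Alpha-Beta pruning could be added without modification; it is omitted here for brevity.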
Monte-Carlo Methods • Entirely different way of looking at game playing • No heuristics or searching • Driven by random game playing • Good when there is no natural heuristic • Example: Go • Very simple example: For each candidate move, play 50 random games after it, then pick the move with the most wins
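The "very simple example" can be sketched on a toy game chosen only for illustration, since the slides do not name one: a single pile of stones, each player removes 1 to 3 stones per turn, and whoever takes the last stone wins. All function names here are hypothetical.

```python
import random

def legal_moves(stones):
    """A player may take 1, 2, or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]

def random_playout(stones, player_to_move, rng):
    """Play uniformly random moves until the pile is empty; return the winner."""
    player = player_to_move
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player  # this player took the last stone
        player = 1 - player

def flat_monte_carlo(stones, n_playouts=50, rng=None):
    """For each legal move of player 0, count wins over random games; pick the best."""
    rng = rng or random.Random()
    wins = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            wins[move] = n_playouts  # taking the last stone wins immediately
            continue
        # After our move the opponent (player 1) is to move.
        wins[move] = sum(
            random_playout(remaining, 1, rng) == 0 for _ in range(n_playouts)
        )
    return max(wins, key=wins.get)

print(flat_monte_carlo(10, rng=random.Random(0)))
```

No heuristic function appears anywhere: the only game knowledge used is the rules and the final result of each random game.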
UCT • Stands for Upper-Confidence bounds applied to Trees • Monte Carlo method used with trees • Navigate from root to leaf • Navigation method is key – leads tree expansion • Play random game(s) at leaf from that position • Propagate win/loss rate back up the tree • After time elapsed – pick move with best win rate
UCT • From Root: Pick explored child that maximizes UCTValue • UCTValue = winrate + sqrt(ln(parent.visits) / visits) • ALWAYS explore an unexplored child first • Continue until an unexplored leaf is reached • Propagate win or loss back up – usually single value in UCT • Multi-Player and Two-Player are exactly the same
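The selection rule can be sketched as below. The formula follows the slide, with no extra exploration constant; many implementations scale the square-root term by a tunable constant, which is not shown in the slides and is left out here. Storing `wins` and `visits` in a dict per child is an assumption made for illustration.

```python
import math

def uct_value(wins, visits, parent_visits):
    """UCTValue = winrate + sqrt(ln(parent.visits) / visits)."""
    return wins / visits + math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Pick the next child to descend into.

    children: list of dicts with 'wins' and 'visits' (hypothetical layout).
    Unexplored children are always taken first.
    """
    for child in children:
        if child["visits"] == 0:
            return child
    return max(
        children,
        key=lambda c: uct_value(c["wins"], c["visits"], parent_visits),
    )

children = [
    {"name": "a", "wins": 6, "visits": 10},
    {"name": "b", "wins": 2, "visits": 3},
]
# "b" has a similar win rate but far fewer visits, so its exploration bonus is larger.
print(select_child(children, parent_visits=13)["name"])  # prints b
```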
UCT
function integer uct(node, depth):
    for time-steps do
        position = root
        while position is explored do
            if position has an unexplored child then
                position = that child
            else
                position = child of position with the largest UCTValue
            end if
        end while
        play random game(s) at position
        while position is not root do
            update visits and win-rate for player at position
            position = position.parent
        end while
        update visits at root
    end for
    return move at root with best win rate
Adaptive Data Structures • Other, completely unrelated field • Concerned with record access frequency • Problem: Elements in data structure accessed with different frequency • Solution: Change the structure of the data structure as elements are accessed • Can use list, tree or other • We will use it here to order players
ADS – Move to Front Order of access: 3, 1, 1, 4, …
ADS – Transposition Rule Order of access: R3, R1, R1, …
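The two update rules named above can be sketched on a plain Python list. The starting order `[5, 4, 3, 2, 1]` is an assumption chosen for illustration; the access sequences are the ones given in the two captions.

```python
def move_to_front(items, accessed):
    """Move-to-Front: the accessed record jumps to the head of the list."""
    items.insert(0, items.pop(items.index(accessed)))

def transpose(items, accessed):
    """Transposition: the accessed record swaps with its predecessor."""
    i = items.index(accessed)
    if i > 0:
        items[i - 1], items[i] = items[i], items[i - 1]

mtf = [5, 4, 3, 2, 1]
for record in (3, 1, 1, 4):
    move_to_front(mtf, record)
print(mtf)  # prints [4, 1, 3, 5, 2]

tr = [5, 4, 3, 2, 1]
for record in (3, 1, 1):
    transpose(tr, record)
print(tr)  # prints [5, 3, 1, 4, 2]
```

Move-to-Front reacts immediately to a single access, while Transposition moves a record forward one step at a time, so it adapts more slowly but is less disturbed by one-off accesses.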
Threat-ADS Heuristic BRS with Threat-ADS (one level)
Threat-ADS Heuristic • Our contribution • ADS operations take constant time, and the ADS is small • ADS updated to move player with most threatening move forward • Achieves move ordering for Alpha-Beta Pruning
BRS with Threat-ADS
function integer brs_threat_ads(node, depth):
    if node is terminal or depth <= 0 then
        return heuristic value of node
    else
        if node is max then
            val = −∞
            for all child of node do
                val = max(val, brs_threat_ads(child, depth − 1))
            end for
        else
            val = ∞
            for all opponents in ADS do
                for all opponent’s child at node do
                    val = min(val, brs_threat_ads(child, depth − 1))
                end for
            end for
            ADS.update(val.opponent)
        end if
        return val
    end if
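How the ADS ordering feeds Alpha-Beta pruning can be sketched on a toy tree. Several things here are assumptions rather than the author's implementation: the tree encoding (list = MAX node, dict of opponent id to children = MIN node, number = leaf), the use of Move-to-Front as the update rule, and the `FixedOrder` baseline class. The node counts illustrate the mechanism only and are unrelated to the experimental results reported later.

```python
import math

class MoveToFrontList:
    """ADS over opponents: the most recent 'threat' is searched first."""
    def __init__(self, opponents):
        self.items = list(opponents)

    def update(self, opponent):
        self.items.remove(opponent)
        self.items.insert(0, opponent)

class FixedOrder(MoveToFrontList):
    """Baseline (plain BRS): the opponent order never changes."""
    def update(self, opponent):
        pass

def brs_threat_ads(node, ads, alpha=-math.inf, beta=math.inf, stats=None):
    if stats is not None:
        stats["nodes"] += 1
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, list):  # MAX node
        val = -math.inf
        for child in node:
            val = max(val, brs_threat_ads(child, ads, alpha, beta, stats))
            alpha = max(alpha, val)
            if alpha >= beta:
                break
        return val
    # MIN node: visit opponents in ADS order (snapshot taken before recursing).
    val, threat = math.inf, None
    moves = [(opp, child) for opp in ads.items for child in node[opp]]
    for opp, child in moves:
        score = brs_threat_ads(child, ads, alpha, beta, stats)
        if score < val:
            val, threat = score, opp  # remember who made the minimizing move
        beta = min(beta, val)
        if alpha >= beta:
            break
    if threat is not None:
        ads.update(threat)
    return val

# Opponent 2 always holds the most damaging move in this toy tree.
TREE = [
    {1: [6, 7], 2: [3, 8]},
    {1: [5, 9], 2: [1, 4]},
    {1: [8, 6], 2: [2, 7]},
]

def count_nodes(ads):
    stats = {"nodes": 0}
    return brs_threat_ads(TREE, ads, stats=stats), stats["nodes"]

print(count_nodes(FixedOrder([1, 2])))       # prints (3, 14)
print(count_nodes(MoveToFrontList([1, 2])))  # prints (3, 10)
```

Both runs return the same value; only the number of nodes expanded differs, because after the first MIN node the threatening opponent is searched first and cutoffs occur sooner.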
Experimental Framework • Game needed to test Threat-ADS heuristic • Needs: • BRS must be applicable • Game should be simple to implement • Use established games Focus and Chinese Checkers • Also develop the Virus Game
Virus Game • Turn based game with N players • Played on 2D board • Goal is to eliminate all other players • Turn: Player “infects” a square they are adjacent to • All nearby squares, according to a configured pattern, are given to that player
Experimental Setup • One player: BRS with Threat-ADS • Others: Random (Interested in tree pruning) • Take Node Count over first few turns of the game • Count each node expanded, but not those pruned • Average over 200 games • Run for each of three games mentioned
Discussion • Node Count (NC) improved in all games • Reduction in tree size of 6% to 10% • Represents hundreds of thousands of nodes • All results statistically significant at the 95% confidence level • Focus showed the strongest results • Shows that even a simple application of ADS benefits game playing
Research Conclusions and Future Work • Threat-ADS cannot worsen the BRS • ADS operations: O(1) • Threat-ADS only relies on basic BRS structure • Opens up new connections between ADS and MPG • Many possibilities for future work being explored now
Project Ideas • Paranoid, Max-N, BRS game playing • UCT in a two-player game • UCT in a simple multi-player game • Other Monte Carlo game playing algorithms you find • Creative application of ADS to an algorithm discussed in class