Monte Carlo Tree Search: Insights and Applications. BCS Real AI Event. Simon Lucas, Game Intelligence Group, University of Essex
Outline • General machine intelligence: the ingredients • Monte Carlo Tree Search • A quick overview and tutorial • Example application: Mapello • Note: Game AI is Real AI !!! • Example test problem: Physical TSP • Results of open competitions • Challenges and future directions
General Machine Intelligence: the ingredients • Evolution • Reinforcement Learning • Function approximation • Neural nets, N-Tuples, etc. • Selective search / Sample-based planning / Monte Carlo Tree Search
Conventional Game Tree Search • Minimax with alpha-beta pruning, transposition tables • Works well when: • A good heuristic value function is known • The branching factor is modest • E.g. Chess: Deep Blue, Rybka • Super-human on a smartphone! • Tree grows exponentially with search depth
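For concreteness, a textbook sketch of this conventional approach (negamax form of minimax with alpha-beta pruning). The game-state interface (legal_actions(), apply(), is_terminal()) is a hypothetical stand-in, and heuristic_value is a placeholder for the hand-crafted evaluation the slide says such methods require:

```python
def heuristic_value(state):
    """Placeholder for the hand-crafted evaluation function (assumed given)."""
    raise NotImplementedError

def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning, negamax form."""
    if depth == 0 or state.is_terminal():
        return heuristic_value(state)
    best = float("-inf")
    for action in state.legal_actions():
        # Negate: the child's value is from the opponent's point of view
        best = max(best, -alphabeta(state.apply(action), depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:        # the opponent would avoid this line: prune
            break
    return best
```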
Go • Much tougher for computers • High branching factor • No good heuristic value function • MCTS to the rescue! “Although progress has been steady, it will take many decades of research and development before world-championship–calibre go programs exist”. Jonathan Schaeffer, 2001
Monte Carlo Tree Search (MCTS) • Upper Confidence Bounds for Trees (UCT) • Further reading: Browne et al., "A Survey of Monte Carlo Tree Search Methods", TCIAIG 2012
Attractive Features • Anytime • Scalable • Tackle complex games and planning problems better than before • May be logarithmically better with increased CPU • No need for heuristic function • Though usually better with one • Next we’ll look at: • General MCTS • UCT in particular
MCTS: the main idea • Tree policy: choose which node to expand (not necessarily leaf of tree) • Default (simulation) policy: random playout until end of game
MCTS Algorithm • Decompose into 6 parts: • MCTS main algorithm • Tree policy • Expand • Best Child (UCT Formula) • Default Policy • Back-propagate • We'll run through each of these, then show demos
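The parts below are each accompanied by a minimal Python sketch. The sketches assume a hypothetical game-state interface (legal_actions(), apply(action) returning a new state, is_terminal(), reward()) and the small node structure here; none of these names come from the slides, this is just one plausible rendering:

```python
import math
import random

class Node:
    """One node of the MCTS tree (hypothetical structure for these sketches)."""
    def __init__(self, state, parent=None, action=None):
        self.state = state                          # game state at this node
        self.parent = parent                        # None for the root
        self.action = action                        # move that led here from the parent
        self.children = []                          # expanded child nodes
        self.untried = list(state.legal_actions())  # actions not yet expanded
        self.visits = 0                             # number of simulations through v
        self.value = 0.0                            # total reward backed up through v
```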
MCTS Main Algorithm • BestChild simply picks the best child node of the root according to some criterion: e.g. best mean value • In our pseudo-code BestChild is called from both TreePolicy and MctsSearch, but different versions can be used • E.g. the final selection can be the max-value child or the most frequently visited one
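A sketch of the main loop under the assumptions above (here the final selection is the most-visited child; picking the max mean value instead is the other option the slide mentions):

```python
def mcts_search(root_state, iterations):
    """Select/expand, roll out, back up; repeat until the budget is spent."""
    root = Node(root_state)
    for _ in range(iterations):
        v = tree_policy(root)               # descend the tree, add one new node
        reward = default_policy(v.state)    # random playout from the new node
        backup(v, reward)                   # propagate the result to the root
    # Final selection: most-visited child of the root
    return max(root.children, key=lambda c: c.visits).action
```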
TreePolicy • Note that the node selected for expansion does not need to be a leaf of the tree • But it must have at least one untried action
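A matching sketch; picking the untried action uniformly at random is an assumption, not something the slide specifies:

```python
def tree_policy(node):
    """Descend via best_child until a node with an untried action is reached."""
    while not node.state.is_terminal():
        if node.untried:                    # expandable: need not be a leaf
            return expand(node)
        node = best_child(node, c=math.sqrt(2))
    return node

def expand(node):
    """Take one untried action at random and add the resulting child."""
    action = node.untried.pop(random.randrange(len(node.untried)))
    child = Node(node.state.apply(action), parent=node, action=action)
    node.children.append(child)
    return child
```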
Best Child (UCT) • This is the standard UCT equation (shown below) • Used in the tree • Higher values of c lead to more exploration • Other terms can be added, and usually are • More on this later
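The equation the slide refers to is the standard UCT rule of Kocsis and Szepesvári; for child j of a node visited n times:

```latex
UCT(j) \;=\; \bar{X}_j \;+\; c\,\sqrt{\frac{\ln n}{n_j}}
```

where \bar{X}_j is the mean reward of child j, n_j its visit count, and c the exploration constant (c = \sqrt{2} recovers UCB1). Continuing the sketch:

```python
def best_child(node, c):
    """Exploit high mean value, explore rarely visited children; higher c explores more."""
    return max(
        node.children,
        key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits),
    )
```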
DefaultPolicy • Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached • The standard is to do this uniformly randomly • But better performance may be obtained by biasing with knowledge
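A sketch of the uniformly random version; it assumes reward() scores the finished game from the point of view of the player who just moved, which pairs with the sign flip in the backup sketch below:

```python
def default_policy(state):
    """Uniform random playout from `state` to the end of the game."""
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_actions()))
    return state.reward()    # e.g. +1 win / 0 draw / -1 loss (assumed convention)
```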
Backup • Note that v is the new node added to the tree by the tree policy • Back the values up the tree, from the added node to the root
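A matching sketch; the sign flip assumes a two-player zero-sum game (negamax convention), which goes beyond what the slide states:

```python
def backup(node, reward):
    """Back the rollout reward up from the new node v to the root."""
    while node is not None:
        node.visits += 1
        node.value += reward
        reward = -reward    # zero-sum: flip perspective at each ply (negamax)
        node = node.parent
```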
All Moves As First (AMAF), Rapid Action Value Estimates (RAVE) • Additional term in the UCT equation (one common form is shown below) • Treat actions/moves the same regardless of where they occur in the move sequence
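The slide's exact term is not shown; one common published form is the RAVE blend of Gelly and Silver, which mixes the UCT value with the AMAF value using a visit-dependent weight (the schedule below, with equivalence constant k, is their hand-tuned choice and not necessarily the one on the slide):

```latex
Q^{*}(v) \;=\; (1-\beta)\,Q_{\mathrm{UCT}}(v) \;+\; \beta\,Q_{\mathrm{AMAF}}(v),
\qquad
\beta \;=\; \sqrt{\frac{k}{3\,n(v) + k}}
```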
Othello • On each move you must pincer one or more opponent counters between the counter you place and an existing counter of your colour • Pincered counters are flipped to your own colour • The winner is the player with the most pieces at the end
Basics of Good Game Design • Simple rules • Balance • Sense of drama • Outcome should not be obvious
Othello Example – white leads: -58 (from http://radagast.se/othello/Help/strategy.html)
Mapello • Take the counter-flipping drama of Othello • Apply it to novel situations • Obstacles • Power-ups (e.g. triple square score) • Large maps with power-plays e.g. line fill • Novel games • Allow users to design maps that they are expert in • The map design is part of the game • Research bonus: large set of games to experiment with
Need Rapidly Smart AI • Give players a challenging game • Even when the game map can be new each time • Obvious, easy-to-apply approaches: • TD Learning • Monte Carlo Tree Search (MCTS) • Combinations of these … • E.g. Silver et al., ICML 2008 • Robles et al., CIG 2011
MCTS (see Browne et al., TCIAIG 2012) • Simple algorithm • Anytime • No need for a heuristic value function • Exploration-exploitation balance • Works well across a range of problems
Demo • TDL learns reasonable weights rapidly • How well will this play at 1 ply versus limited roll-out MCTS?
For Strong Play … • Combine MCTS, TDL, N-Tuples
Where to play / buy • Coming to Android (November 2012) • Nestorgames (http://www.nestorgames.com)
MCTS in Real-Time Games: PTSP (Physical Travelling Salesman Problem) • Hard to get long-term planning without good heuristics
MCTS: Challenges and Future Directions • Better handling of problems with continuous action spaces • Some work already done on this • Better understanding of handling real-time problems • Use of approximations and macro-actions • Stochastic and partially observable problems / games of incomplete and imperfect information • Hybridisation: • with evolution • with other tree search algorithms
Conclusions • MCTS: a major new approach to AI • Works well across a range of problems • Good performance even with vanilla UCT • Best performance requires tuning and heuristics • Sometimes the UCT formula is modified or discarded • Can be used in conjunction with RL • Self tuning • And with evolution • E.g. evolving macro-actions
Further reading and links • http://ptsp-game.net/ • http://www.pacman-vs-ghosts.net/