Monte Carlo Tree Search: Insights and Applications. BCS Real AI Event. Simon Lucas, Game Intelligence Group, University of Essex
Outline • General machine intelligence: the ingredients • Monte Carlo Tree Search • A quick overview and tutorial • Example application: Mapello • Note: Game AI is Real AI !!! • Example test problem: Physical TSP • Results of open competitions • Challenges and future directions
General Machine Intelligence: the ingredients • Evolution • Reinforcement Learning • Function approximation • Neural nets, N-Tuples, etc. • Selective search / Sample-based planning / Monte Carlo Tree Search
Conventional Game Tree Search • Minimax with alpha-beta pruning, transposition tables • Works well when: • A good heuristic value function is known • The branching factor is modest • E.g. Chess: Deep Blue, Rybka • Super-human on a smartphone! • Tree grows exponentially with search depth
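For concreteness, a textbook sketch of this conventional approach (negamax form of minimax with alpha-beta pruning). The game-state interface (legal_actions(), apply(), is_terminal()) is a hypothetical stand-in, and heuristic_value is a placeholder for the hand-crafted evaluation the slide says such methods require:

```python
def heuristic_value(state):
    """Placeholder for the hand-crafted evaluation function (assumed given)."""
    raise NotImplementedError

def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning, negamax form."""
    if depth == 0 or state.is_terminal():
        return heuristic_value(state)
    best = float("-inf")
    for action in state.legal_actions():
        # Negate: the child's value is from the opponent's point of view
        best = max(best, -alphabeta(state.apply(action), depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:        # the opponent would avoid this line: prune
            break
    return best
```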
Go • Much tougher for computers • High branching factor • No good heuristic value function • MCTS to the rescue! “Although progress has been steady, it will take many decades of research and development before world-championship–calibre go programs exist”. Jonathan Schaeffer, 2001
Monte Carlo Tree Search (MCTS) • Upper Confidence Bounds for Trees (UCT) • Further reading: Browne et al., "A Survey of Monte Carlo Tree Search Methods", TCIAIG 2012
Attractive Features • Anytime • Scalable • Tackle complex games and planning problems better than before • May be logarithmically better with increased CPU • No need for heuristic function • Though usually better with one • Next we’ll look at: • General MCTS • UCT in particular
MCTS: the main idea • Tree policy: choose which node to expand (not necessarily leaf of tree) • Default (simulation) policy: random playout until end of game
MCTS Algorithm • Decompose into 6 parts: • MCTS main algorithm • Tree policy • Expand • Best Child (UCT Formula) • Default Policy • Back-propagate • We'll run through each of these, then show demos
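The parts below are each accompanied by a minimal Python sketch. The sketches assume a hypothetical game-state interface (legal_actions(), apply(action) returning a new state, is_terminal(), reward()) and the small node structure here; none of these names come from the slides, this is just one plausible rendering:

```python
import math
import random

class Node:
    """One node of the MCTS tree (hypothetical structure for these sketches)."""
    def __init__(self, state, parent=None, action=None):
        self.state = state                          # game state at this node
        self.parent = parent                        # None for the root
        self.action = action                        # move that led here from the parent
        self.children = []                          # expanded child nodes
        self.untried = list(state.legal_actions())  # actions not yet expanded
        self.visits = 0                             # number of simulations through v
        self.value = 0.0                            # total reward backed up through v
```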
MCTS Main Algorithm • BestChild simply picks the best child node of the root according to some criterion: e.g. best mean value • In our pseudo-code BestChild is called from both TreePolicy and MctsSearch, but different versions can be used • E.g. the final selection can be the max-value child or the most frequently visited one
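A sketch of the main loop under the assumptions above (here the final selection is the most-visited child; picking the max mean value instead is the other option the slide mentions):

```python
def mcts_search(root_state, iterations):
    """Select/expand, roll out, back up; repeat until the budget is spent."""
    root = Node(root_state)
    for _ in range(iterations):
        v = tree_policy(root)               # descend the tree, add one new node
        reward = default_policy(v.state)    # random playout from the new node
        backup(v, reward)                   # propagate the result to the root
    # Final selection: most-visited child of the root
    return max(root.children, key=lambda c: c.visits).action
```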
TreePolicy • Note that the node selected for expansion does not need to be a leaf of the tree • But it must have at least one untried action
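A matching sketch; picking the untried action uniformly at random is an assumption, not something the slide specifies:

```python
def tree_policy(node):
    """Descend via best_child until a node with an untried action is reached."""
    while not node.state.is_terminal():
        if node.untried:                    # expandable: need not be a leaf
            return expand(node)
        node = best_child(node, c=math.sqrt(2))
    return node

def expand(node):
    """Take one untried action at random and add the resulting child."""
    action = node.untried.pop(random.randrange(len(node.untried)))
    child = Node(node.state.apply(action), parent=node, action=action)
    node.children.append(child)
    return child
```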
Best Child (UCT) • This is the standard UCT equation (shown below) • Used in the tree • Higher values of c lead to more exploration • Other terms can be added, and usually are • More on this later
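The equation the slide refers to is the standard UCT rule of Kocsis and Szepesvári; for child j of a node visited n times:

```latex
UCT(j) \;=\; \bar{X}_j \;+\; c\,\sqrt{\frac{\ln n}{n_j}}
```

where \bar{X}_j is the mean reward of child j, n_j its visit count, and c the exploration constant (c = \sqrt{2} recovers UCB1). Continuing the sketch:

```python
def best_child(node, c):
    """Exploit high mean value, explore rarely visited children; higher c explores more."""
    return max(
        node.children,
        key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits),
    )
```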
DefaultPolicy • Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached • The standard is to do this uniformly randomly • But better performance may be obtained by biasing with knowledge
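A sketch of the uniformly random version; it assumes reward() scores the finished game from the point of view of the player who just moved, which pairs with the sign flip in the backup sketch below:

```python
def default_policy(state):
    """Uniform random playout from `state` to the end of the game."""
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_actions()))
    return state.reward()    # e.g. +1 win / 0 draw / -1 loss (assumed convention)
```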
Backup • Note that v is the new node added to the tree by the tree policy • Back the values up the tree, from the added node to the root
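A matching sketch; the sign flip assumes a two-player zero-sum game (negamax convention), which goes beyond what the slide states:

```python
def backup(node, reward):
    """Back the rollout reward up from the new node v to the root."""
    while node is not None:
        node.visits += 1
        node.value += reward
        reward = -reward    # zero-sum: flip perspective at each ply (negamax)
        node = node.parent
```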
All Moves As First (AMAF), Rapid Action Value Estimates (RAVE) • Additional term in the UCT equation (one common form is shown below) • Treat actions/moves the same regardless of where they occur in the move sequence
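The slide's exact term is not shown; one common published form is the RAVE blend of Gelly and Silver, which mixes the UCT value with the AMAF value using a visit-dependent weight (the schedule below, with equivalence constant k, is their hand-tuned choice and not necessarily the one on the slide):

```latex
Q^{*}(v) \;=\; (1-\beta)\,Q_{\mathrm{UCT}}(v) \;+\; \beta\,Q_{\mathrm{AMAF}}(v),
\qquad
\beta \;=\; \sqrt{\frac{k}{3\,n(v) + k}}
```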
Othello • On each move you must pincer one or more opponent counters between the counter you place and an existing counter of your colour • Pincered counters are flipped to your own colour • The winner is the player with the most pieces at the end
Basics of Good Game Design • Simple rules • Balance • Sense of drama • Outcome should not be obvious
Othello Example – white leads: -58 (from http://radagast.se/othello/Help/strategy.html)
Mapello • Take the counter-flipping drama of Othello • Apply it to novel situations • Obstacles • Power-ups (e.g. triple square score) • Large maps with power-plays e.g. line fill • Novel games • Allow users to design maps that they are expert in • The map design is part of the game • Research bonus: large set of games to experiment with
Need Rapidly Smart AI • Give players a challenging game • Even when the game map can be new each time • Obvious, easy-to-apply approaches: • TD Learning • Monte Carlo Tree Search (MCTS) • Combinations of these … • E.g. Silver et al., ICML 2008 • Robles et al., CIG 2011
MCTS (see Browne et al., TCIAIG 2012) • Simple algorithm • Anytime • No need for a heuristic value function • Exploration-exploitation balance • Works well across a range of problems
Demo • TDL learns reasonable weights rapidly • How well will this play at 1 ply versus limited roll-out MCTS?
For Strong Play … • Combine MCTS, TDL, N-Tuples
Where to play / buy • Coming to Android (November 2012) • Nestorgames (http://www.nestorgames.com)
MCTS in Real-Time Games: PTSP (Physical Travelling Salesman Problem) • Hard to get long-term planning without good heuristics
MCTS: Challenges and Future Directions • Better handling of problems with continuous action spaces • Some work already done on this • Better understanding of handling real-time problems • Use of approximations and macro-actions • Stochastic and partially observable problems / games of incomplete and imperfect information • Hybridisation: • with evolution • with other tree search algorithms
Conclusions • MCTS: a major new approach to AI • Works well across a range of problems • Good performance even with vanilla UCT • Best performance requires tuning and heuristics • Sometimes the UCT formula is modified or discarded • Can be used in conjunction with RL • Self tuning • And with evolution • E.g. evolving macro-actions
Further reading and links • http://ptsp-game.net/ • http://www.pacman-vs-ghosts.net/