Learning Strategic Features for General Games
Dennis Soemers
June 5, 2019
@DennisSoemers
Digital Ludeme Project
• Computational study of traditional games throughout history
• Model games in a general game system (Ludii)
• Generate plausible reconstructions of rulesets
  • Data-driven
  • AI self-play to "play-test" generated rulesets
AI Requirements
• Play approx. 1000 strategy games
  • and many more variants…
• Need General Game Players!
• Ideally strong, human-level AI
  • Do not need super-human AI
• Automated strategy learning
  • Learning from self-play
  • Interpretable strategies
General Game Playing (GGP)
• Monte Carlo tree search (MCTS)
• Prevailing GGP approach
• Can be improved with learned policies
[Figure: the four MCTS phases, repeated each iteration: Selection, Expansion, Play-out, Backpropagation; a minimal sketch of this loop follows below]
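A minimal, generic sketch of the four-phase MCTS loop in Python. The `game` interface used here (`legal_actions`, `apply`, `is_terminal`, `utility`) is a hypothetical stand-in for a general game system API such as Ludii's, not its actual API.

```python
import math
import random


class Node:
    """One node in the MCTS tree."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.total_value = 0.0


def ucb1(child, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 value: exploitation term plus exploration bonus."""
    return (child.total_value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))


def mcts(game, root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while (node.children
               and len(node.children) == len(game.legal_actions(node.state))):
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child (unless terminal).
        if not game.is_terminal(node.state):
            untried = [a for a in game.legal_actions(node.state)
                       if a not in node.children]
            action = random.choice(untried)
            child = Node(game.apply(node.state, action), parent=node)
            node.children[action] = child
            node = child
        # 3. Play-out: random moves until a terminal state is reached.
        state = node.state
        while not game.is_terminal(state):
            state = game.apply(state, random.choice(game.legal_actions(state)))
        value = game.utility(state)
        # 4. Backpropagation: update statistics along the selected path.
        # (For two-player games the value should be negated per ply;
        # omitted here for brevity.)
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
    # Act greedily with respect to visit counts at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Learned policies can improve this loop by biasing which children are selected and which play-out moves are sampled, which is the role of the features introduced later in the talk.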
Policy Learning from Self-play
• State of the art: Deep Learning
  • AlphaGo, AlphaGo Zero, AlphaZero, …
Downsides of Deep Learning
• Start learning from scratch per game
  • Difficult for 1000 games
• Requires some domain knowledge
  • One policy output node per action
  • How many actions possible in <unknown game>?
  • Difficult for General Game Playing
• Expensive
General Game Features
• Binary features for state-action pairs (sketched below)
• Local patterns
  • Use underlying graph representation
• Widely applicable
  • Single format, many games
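A hedged sketch of what such a binary state-action feature might look like: a conjunction of local tests around a move's destination site. The walk-based pattern format, site values, and graph encoding here are illustrative simplifications of the feature language in the cited work, not its exact definition.

```python
# Illustrative site values; real games distinguish many more.
EMPTY, FRIEND, ENEMY = 0, 1, 2


class Feature:
    def __init__(self, tests):
        # tests: list of (walk, required_value) pairs, where a walk is a
        # sequence of neighbour indices followed from the move's
        # destination site through the board graph.
        self.tests = tests

    def matches(self, graph, state, to_site):
        # graph: site id -> ordered list of neighbouring site ids
        # state: site id -> value (EMPTY / FRIEND / ENEMY)
        for walk, required in self.tests:
            site = to_site
            for step in walk:
                neighbours = graph[site]
                if step >= len(neighbours):
                    return False            # walk falls off the board
                site = neighbours[step]
            if state[site] != required:
                return False
        return True                         # all local tests hold


# Example: "destination site is empty, and its first neighbour holds an enemy"
feature = Feature([((), EMPTY), ((0,), ENEMY)])
```

Because the tests operate on the board graph rather than on fixed coordinates, one feature format transfers across differently shaped boards and games.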
Which features to use?
• Learn features and weights simultaneously
• Start with atomic features
  • Simple patterns with a single "test"
• Combine pairs of features (a scoring sketch follows below)
  • Maximise correlation with policy's objective
  • Minimise correlation with constituents

D. J. N. J. Soemers, É. Piette, C. Browne (2019). "Biasing MCTS with Features for General Games". In 2019 IEEE Congress on Evolutionary Computation.
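One way to read the pair-scoring idea, as a hedged sketch: score a candidate conjunction by how strongly it correlates with the policy's remaining error signal, penalised by redundancy with its constituents. The exact correlation measures and error signal in the cited paper may differ.

```python
import numpy as np


def score_pair(act_i, act_j, errors):
    """act_i, act_j: boolean activation vectors over training samples.
    errors: per-sample error signal of the current policy (assumed
    non-constant). Higher score = more promising new feature."""
    pair = act_i & act_j
    if pair.all() or not pair.any():
        return -np.inf                      # constant features are useless
    corr_err = abs(np.corrcoef(pair, errors)[0, 1])
    corr_i = abs(np.corrcoef(pair, act_i)[0, 1])
    corr_j = abs(np.corrcoef(pair, act_j)[0, 1])
    # High correlation with the error, low with the constituents.
    return corr_err - max(corr_i, corr_j)
```

The best-scoring pair becomes a new feature, and its weight is then learned alongside the existing ones, so features and weights grow together.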
Self-play Policy Learning Objectives
• Minimise cross-entropy between learned policy and MCTS visit counts (written out below)
  • AlphaGo Zero, AlphaZero, etc.
• MCTS is exploratory by design
  • Trained policy also exploratory!
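Written out, the AlphaZero-style objective referenced above: root visit counts N(s, a) define the training target, and the policy is trained to match it.

```latex
% Visit counts at the MCTS root define the training target \pi;
% the policy p_\theta is trained to minimise the cross-entropy.
\pi(a \mid s) = \frac{N(s, a)}{\sum_{b} N(s, b)},
\qquad
\mathcal{L}(\theta) = -\sum_{a} \pi(a \mid s) \log p_\theta(a \mid s)
```

Because MCTS deliberately spreads visits over plausible alternatives, a policy trained against these targets inherits that exploratory behaviour, which motivates the question on the next slide.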
Self-play Policy Learning Objectives
• Do we want our trained policy to be exploratory?
  • Bias MCTS Selection → Yes (a common selection formula is sketched below)
  • Bias MCTS Play-out → Maybe
  • Interpret learned strategies → No
  • Use strategy for "game distance function" → No

D. J. N. J. Soemers, É. Piette, M. Stephenson, C. Browne (2019). "Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates". In 2019 IEEE Conference on Games.
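For the "bias MCTS Selection" case, one common formulation (the PUCT rule used by AlphaGo Zero and AlphaZero, both cited earlier) lets a learned policy P act as a prior over children; whether the cited work uses exactly this rule is not stated on the slide.

```latex
% PUCT selection: the learned policy P(a|s) biases which children
% MCTS explores; c trades off exploitation against this prior.
a^{*} = \operatorname*{arg\,max}_{a}
        \left[ Q(s, a) + c \, P(a \mid s) \, \frac{\sqrt{N(s)}}{1 + N(s, a)} \right]
```

Under this rule an exploratory prior is harmless or even helpful, whereas for interpretation or game-distance purposes a sharper, less exploratory policy is preferable, matching the Yes/Maybe/No answers above.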
Conclusion
• Promising results on 10 games so far
  • All two-player, full-information, deterministic
• Work in progress:
  • Scaling up to more games
    • Multi-player, hidden info, nondeterministic, …
  • Speeding up features
  • Interpreting learned strategies