
Learning Strategic Features for General Games

This presentation covers the computational study of traditional games in the Digital Ludeme Project: modelling games in a general game system (Ludii), generating rulesets, and play-testing them with AI self-play. It outlines the requirements for AI in general game playing, the downsides of deep learning in that setting, general game features, and self-play policy learning objectives.


Presentation Transcript


  1. Learning Strategic Features for General Games • Dennis Soemers • June 5, 2019 • @DennisSoemers

  2. Digital Ludeme Project • Computational study of traditional games throughout history • Model games in general game system (Ludii) • Generate plausible reconstructions of rulesets • Data-driven • AI self-play to "play-test" generated rulesets

  3. AI Requirements • Play approx. 1000 strategy games • and many more variants… • Need General Game Players! • Ideally strong, human-level AI • Do not need super-human AI • Automated strategy learning • Learning from self-play • Interpretable strategies

  4. General Game Playing (GGP) • Monte Carlo tree search (MCTS) • Prevailing GGP approach • Can be improved with learned policies • [Diagram: one MCTS iteration, repeated X times: Selection, Expansion, Play-out, Backpropagation]
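
The Selection step of MCTS can be illustrated with the UCB1 rule used by UCT. The following is a minimal sketch, not the implementation used in Ludii: the function name `ucb1_select`, its list-based arguments, and the exploration constant of sqrt(2) are assumptions made for illustration.

```python
import math


def ucb1_select(child_visits, child_total_rewards, exploration=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    child_visits[i] is the visit count of child i and
    child_total_rewards[i] is the sum of rewards backpropagated through it.
    Both lists are hypothetical stand-ins for real tree nodes.
    """
    parent_visits = sum(child_visits)
    best_index, best_score = -1, -math.inf
    for i, (n, w) in enumerate(zip(child_visits, child_total_rewards)):
        if n == 0:
            # Visit every child once before comparing scores.
            return i
        exploit = w / n  # average reward so far
        explore = exploration * math.sqrt(math.log(parent_visits) / n)
        score = exploit + explore
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

A learned policy could bias this step by adding a prior-dependent term to the score, which is one way to read "can be improved with learned policies".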

  5. Policy Learning from Self-play • State of the art: Deep Learning • AlphaGo, AlphaGo Zero, AlphaZero…

  6. Downsides of Deep Learning • Start learning from scratch per game • Difficult for 1000 games • Requires some domain knowledge • One policy output node per action • How many actions possible in <unknown game>? • Difficult for General Game Playing • Expensive

  7. General Game Features • Binary features for state-action pairs • Local patterns • Use underlying graph-representation • Widely applicable • Single format, many games
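
One way binary state-action features can drive a policy is through a linear function followed by a softmax. The sketch below assumes each feature has a real-valued weight and that the caller already knows which features are active for each legal action; the function name and data layout are hypothetical and chosen only for illustration.

```python
import math


def softmax_policy(active_features_per_action, weights):
    """Turn binary state-action features into action probabilities.

    active_features_per_action[a] lists the indices of the features that
    are active (value 1) for legal action a; weights[f] is the weight of
    feature f. Inactive features contribute nothing to the logit.
    """
    logits = [sum(weights[f] for f in feats)
              for feats in active_features_per_action]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the policy is indexed by features rather than by a fixed list of output nodes, the same code applies to any number of legal actions.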

  8. General Game Features

  9. Which features to use? • Learn features and weights simultaneously • Start with atomic features • Simple patterns with a single "test" • Combine pairs of features • Maximise correlation with policy's objective • Minimise correlation with constituents • D. J. N. J. Soemers, É. Piette, C. Browne (2019). "Biasing MCTS with Features for General Games". In 2019 IEEE Congress on Evolutionary Computation.
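
The pair-combination idea can be sketched as a scoring loop over candidate conjunctions. This is an illustrative guess at the shape of such a procedure, not the method from the cited paper: the score `relevance * (1 - redundancy)`, the use of Pearson correlation, and all names below are assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_pair(activations, target):
    """Pick the feature pair whose conjunction looks most useful.

    activations maps a feature name to its 0/1 activation per sample;
    target is a per-sample signal standing in for the policy's objective.
    """
    best, best_score = None, -1.0
    for a, b in combinations(sorted(activations), 2):
        # The combined feature is active only when both constituents are.
        combined = [x & y for x, y in zip(activations[a], activations[b])]
        relevance = abs(pearson(combined, target))
        redundancy = max(abs(pearson(combined, activations[a])),
                         abs(pearson(combined, activations[b])))
        score = relevance * (1.0 - redundancy)
        if score > best_score:
            best, best_score = (a, b), score
    return best, best_score
```

The first factor rewards candidates that track the objective; the second penalises candidates that add little beyond one of their constituents.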

  10. Self-play Policy Learning Objectives • Minimise cross-entropy between learned policy and MCTS visit counts • AlphaGo Zero, AlphaZero, etc. • MCTS is exploratory by design • Trained policy also exploratory!
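
The cross-entropy objective can be written out for a single root state as follows. This is a minimal sketch under the assumption that visit counts are normalised into a target distribution; the function name and the small `eps` guard against log(0) are illustrative choices.

```python
import math


def cross_entropy_from_visits(visit_counts, policy_probs, eps=1e-12):
    """Cross-entropy between normalised MCTS visit counts and a policy.

    visit_counts[a] is how often MCTS visited action a at the root;
    policy_probs[a] is the learned policy's probability for action a.
    """
    total = sum(visit_counts)
    targets = [n / total for n in visit_counts]
    return -sum(t * math.log(p + eps)
                for t, p in zip(targets, policy_probs) if t > 0)
```

The loss is smallest when the policy reproduces the visit distribution exactly, so any exploration present in the visit counts is copied into the trained policy.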

  11. Self-play Policy Learning Objectives • Do we want our trained policy to be exploratory? • Bias MCTS Selection → Yes • Bias MCTS Play-out → Maybe • Interpret learned strategies → No • Use strategy for "game distance function" → No • D. J. N. J. Soemers, É. Piette, M. Stephenson, C. Browne (2019). "Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates". In 2019 IEEE Conference on Games.

  12. Conclusion • Promising results on 10 games so far • All two-player, full information, deterministic • Work in progress: • Scaling up to more games • Multi-player, hidden info, nondeterministic, … • Speeding up features • Interpreting learned strategies

  13. Thank you!

  14. AI (UCT) without features in Gomoku

  15. AI (UCT) without features in Gomoku

  16. AI (Biased MCTS) with features in Gomoku

  17. AI (Biased MCTS) with features in Gomoku

  18. Learning Curves (CEC 2019)

  19. Learning Curves - Pruned (CEC 2019)

  20. Learning Curves (COG 2019)

  21. Policy Entropy (COG 2019)
