Learning Strategic Features for General Games
Dennis Soemers
June 5, 2019
@DennisSoemers
Digital Ludeme Project
• Computational study of traditional games throughout history
• Model games in a general game system (Ludii)
• Generate plausible reconstructions of rulesets
  • Data-driven
  • AI self-play to "play-test" generated rulesets
AI Requirements
• Play approx. 1000 strategy games
  • and many more variants…
• Need General Game Players!
• Ideally strong, human-level AI
  • Do not need super-human AI
• Automated strategy learning
  • Learning from self-play
  • Interpretable strategies
General Game Playing (GGP)
• Monte Carlo tree search (MCTS)
• Prevailing GGP approach
• Can be improved with learned policies
[Figure: the four MCTS phases, repeated each iteration: Selection, Expansion, Play-out, Backpropagation; a minimal sketch of this loop follows below]
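A minimal, generic sketch of the four-phase MCTS loop in Python. The `game` interface used here (`legal_actions`, `apply`, `is_terminal`, `utility`) is a hypothetical stand-in for a general game system API such as Ludii's, not its actual API.

```python
import math
import random


class Node:
    """One node in the MCTS tree."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.total_value = 0.0


def ucb1(child, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 value: exploitation term plus exploration bonus."""
    return (child.total_value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))


def mcts(game, root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while (node.children
               and len(node.children) == len(game.legal_actions(node.state))):
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child (unless terminal).
        if not game.is_terminal(node.state):
            untried = [a for a in game.legal_actions(node.state)
                       if a not in node.children]
            action = random.choice(untried)
            child = Node(game.apply(node.state, action), parent=node)
            node.children[action] = child
            node = child
        # 3. Play-out: random moves until a terminal state is reached.
        state = node.state
        while not game.is_terminal(state):
            state = game.apply(state, random.choice(game.legal_actions(state)))
        value = game.utility(state)
        # 4. Backpropagation: update statistics along the selected path.
        # (For two-player games the value should be negated per ply;
        # omitted here for brevity.)
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
    # Act greedily with respect to visit counts at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Learned policies can improve this loop by biasing which children are selected and which play-out moves are sampled, which is the role of the features introduced later in the talk.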
Policy Learning from Self-play
• State of the art: Deep Learning
  • AlphaGo, AlphaGo Zero, AlphaZero, …
Downsides of Deep Learning
• Start learning from scratch per game
  • Difficult for 1000 games
• Requires some domain knowledge
  • One policy output node per action
  • How many actions possible in <unknown game>?
  • Difficult for General Game Playing
• Expensive
General Game Features
• Binary features for state-action pairs (sketched below)
• Local patterns
  • Use underlying graph representation
• Widely applicable
  • Single format, many games
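A hedged sketch of what such a binary state-action feature might look like: a conjunction of local tests around a move's destination site. The walk-based pattern format, site values, and graph encoding here are illustrative simplifications of the feature language in the cited work, not its exact definition.

```python
# Illustrative site values; real games distinguish many more.
EMPTY, FRIEND, ENEMY = 0, 1, 2


class Feature:
    def __init__(self, tests):
        # tests: list of (walk, required_value) pairs, where a walk is a
        # sequence of neighbour indices followed from the move's
        # destination site through the board graph.
        self.tests = tests

    def matches(self, graph, state, to_site):
        # graph: site id -> ordered list of neighbouring site ids
        # state: site id -> value (EMPTY / FRIEND / ENEMY)
        for walk, required in self.tests:
            site = to_site
            for step in walk:
                neighbours = graph[site]
                if step >= len(neighbours):
                    return False            # walk falls off the board
                site = neighbours[step]
            if state[site] != required:
                return False
        return True                         # all local tests hold


# Example: "destination site is empty, and its first neighbour holds an enemy"
feature = Feature([((), EMPTY), ((0,), ENEMY)])
```

Because the tests operate on the board graph rather than on fixed coordinates, one feature format transfers across differently shaped boards and games.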
Which features to use?
• Learn features and weights simultaneously
• Start with atomic features
  • Simple patterns with a single "test"
• Combine pairs of features (a scoring sketch follows below)
  • Maximise correlation with policy's objective
  • Minimise correlation with constituents

D. J. N. J. Soemers, É. Piette, C. Browne (2019). "Biasing MCTS with Features for General Games". In 2019 IEEE Congress on Evolutionary Computation.
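One way to read the pair-scoring idea, as a hedged sketch: score a candidate conjunction by how strongly it correlates with the policy's remaining error signal, penalised by redundancy with its constituents. The exact correlation measures and error signal in the cited paper may differ.

```python
import numpy as np


def score_pair(act_i, act_j, errors):
    """act_i, act_j: boolean activation vectors over training samples.
    errors: per-sample error signal of the current policy (assumed
    non-constant). Higher score = more promising new feature."""
    pair = act_i & act_j
    if pair.all() or not pair.any():
        return -np.inf                      # constant features are useless
    corr_err = abs(np.corrcoef(pair, errors)[0, 1])
    corr_i = abs(np.corrcoef(pair, act_i)[0, 1])
    corr_j = abs(np.corrcoef(pair, act_j)[0, 1])
    # High correlation with the error, low with the constituents.
    return corr_err - max(corr_i, corr_j)
```

The best-scoring pair becomes a new feature, and its weight is then learned alongside the existing ones, so features and weights grow together.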
Self-play Policy Learning Objectives
• Minimise cross-entropy between learned policy and MCTS visit counts (written out below)
  • AlphaGo Zero, AlphaZero, etc.
• MCTS is exploratory by design
  • Trained policy also exploratory!
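Written out, the AlphaZero-style objective referenced above: root visit counts N(s, a) define the training target, and the policy is trained to match it.

```latex
% Visit counts at the MCTS root define the training target \pi;
% the policy p_\theta is trained to minimise the cross-entropy.
\pi(a \mid s) = \frac{N(s, a)}{\sum_{b} N(s, b)},
\qquad
\mathcal{L}(\theta) = -\sum_{a} \pi(a \mid s) \log p_\theta(a \mid s)
```

Because MCTS deliberately spreads visits over plausible alternatives, a policy trained against these targets inherits that exploratory behaviour, which motivates the question on the next slide.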
Self-play Policy Learning Objectives
• Do we want our trained policy to be exploratory?
  • Bias MCTS Selection → Yes (a common selection formula is sketched below)
  • Bias MCTS Play-out → Maybe
  • Interpret learned strategies → No
  • Use strategy for "game distance function" → No

D. J. N. J. Soemers, É. Piette, M. Stephenson, C. Browne (2019). "Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates". In 2019 IEEE Conference on Games.
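For the "bias MCTS Selection" case, one common formulation (the PUCT rule used by AlphaGo Zero and AlphaZero, both cited earlier) lets a learned policy P act as a prior over children; whether the cited work uses exactly this rule is not stated on the slide.

```latex
% PUCT selection: the learned policy P(a|s) biases which children
% MCTS explores; c trades off exploitation against this prior.
a^{*} = \operatorname*{arg\,max}_{a}
        \left[ Q(s, a) + c \, P(a \mid s) \, \frac{\sqrt{N(s)}}{1 + N(s, a)} \right]
```

Under this rule an exploratory prior is harmless or even helpful, whereas for interpretation or game-distance purposes a sharper, less exploratory policy is preferable, matching the Yes/Maybe/No answers above.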
Conclusion
• Promising results on 10 games so far
  • All two-player, full-information, deterministic
• Work in progress:
  • Scaling up to more games
    • Multi-player, hidden info, nondeterministic, …
  • Speeding up features
  • Interpreting learned strategies