Learning in Computer Go
David Silver
The Problem • Large state space • Approximately 10^172 states • Game tree of about 10^360 nodes • Branching factor of about 200 • Evaluating a position is hard • No good heuristics known • Volatile • Highly non-linear
Four ways to evaluate a position • Don’t even try • Hand-crafted heuristic • Monte Carlo simulation • Learned heuristic
Four choices about learning • What to learn • How to learn • State representation • Knowledge representation
What to learn • Global evaluation function • Shape • Life and death • Connectivity • Eyes
Global evaluation function • Several related concepts • Evaluation function • Heuristic • Value function • What to evaluate • Probability of winning • Expected score • How to evaluate • Sum of point territory estimates • Other approaches?
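To make the "sum of point territory estimates" idea concrete, here is a minimal sketch assuming a hypothetical per-point territory predictor; the squashing of the expected score into a win probability is purely illustrative, not any particular program's formula.

```python
import math

def evaluate_position(territory, komi=6.5):
    """Minimal sketch of a global evaluation function.

    `territory` is assumed to map each board point to an estimate in
    [-1, 1]: +1 = certainly Black's point, -1 = certainly White's point
    (a hypothetical per-point predictor would supply these values).
    """
    expected_score = sum(territory.values()) - komi      # expected margin for Black
    # Squash the expected margin into a probability of Black winning.
    win_probability = 1.0 / (1.0 + math.exp(-expected_score / 10.0))
    return expected_score, win_probability
```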
Shape • Local pattern information • Move recommendations • Learning shape from expert games • Stoutamire, Enderton, Van der Werf, Dahl • Learning shape by RL • NeuroGo v3
Life and Death • Two problems: • Will a group live or die? • Can a group live or die? • Solving the ‘can’ question • Alpha-beta search with learned heuristic [Wolf] • Solving the ‘will’ question • Supervised learning using rich feature set [Werf] • Reinforcement learning, averaged over group [Dahl]
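A minimal sketch of the "can live?" style of search: plain alpha-beta over a local life-and-death tree with a learned heuristic at the leaves. The `heuristic` and `children` callbacks are hypothetical stand-ins; real solvers such as Wolf's add many Go-specific refinements.

```python
def alpha_beta(state, depth, alpha, beta, heuristic, children, maximizing=True):
    """Alpha-beta search over a local life-and-death tree (sketch).

    `heuristic(state)` is an assumed learned evaluation of the group's
    prospects; `children(state)` yields successor positions.
    """
    succ = list(children(state))
    if depth == 0 or not succ:
        return heuristic(state)
    if maximizing:                         # defender tries to live
        value = float("-inf")
        for child in succ:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          heuristic, children, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                      # prune
        return value
    else:                                  # attacker tries to kill
        value = float("inf")
        for child in succ:
            value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                          heuristic, children, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value
```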
Connectivity • Correlation between two points • Estimate potential groups of stones • Estimate potential regions of empty points • ‘Will connect’ (NeuroGo v3) • Reinforcement learning of local connectivity. • Pathfinding module for global connectivity. • Connectivity map used for learning global evaluation function
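As an illustration of combining learned local connectivity with a pathfinding module, the sketch below finds the strongest chain of local connections between two units. The `local_prob` table is a hypothetical learned estimate, and treating links as independent is a simplifying assumption, not NeuroGo's actual method.

```python
import heapq
import math

def global_connectivity(local_prob, source, target):
    """Best-path connection probability between two units (sketch).

    `local_prob[(a, b)]` is a hypothetical learned probability that adjacent
    units a and b end up connected.  The strongest chain maximises the
    product of its local probabilities (Dijkstra on -log p).
    """
    best = {source: 1.0}
    queue = [(0.0, source)]                      # priority = -log(probability)
    while queue:
        neg_log, unit = heapq.heappop(queue)
        prob = math.exp(-neg_log)
        if unit == target:
            return prob
        if prob < best.get(unit, 0.0):
            continue                             # stale queue entry
        for (a, b), p in local_prob.items():
            if a == unit and p > 0.0:
                candidate = prob * p
                if candidate > best.get(b, 0.0):
                    best[b] = candidate
                    heapq.heappush(queue, (-math.log(candidate), b))
    return best.get(target, 0.0)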
What else can we learn? • Eyes • Heuristics for endgame • Many other features…
How to learn • Reinforcement Learning • Supervised Learning • Combined Approaches • Evolutionary Methods
Reinforcement Learning • Temporal Difference Learning • Schraudolph, Dayan, Sejnowski • Enzenberger (NeuroGo) • Dahl (Honte) • Variants of TD(λ) • TD(0) • TD(λ) • TD-leaf(λ) • Training methodology • Self-play • Expert games (Q-learning)
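The following is a minimal TD(0) sketch for a linear value function trained by self-play; the cited programs actually use neural networks and TD(λ)/TD-leaf(λ) variants, so this only illustrates the basic update.

```python
import numpy as np

def td0_update(weights, features, next_features, reward, terminal,
               alpha=0.01, gamma=1.0):
    """One TD(0) step for a linear value function (sketch).

    `features` / `next_features` are hypothetical feature vectors for the
    current and successor positions; `reward` is nonzero only at the end of
    a self-play game (e.g. +1 for a win, 0 for a loss).
    """
    v = weights @ features
    v_next = 0.0 if terminal else weights @ next_features
    td_error = reward + gamma * v_next - v
    weights += alpha * td_error * features     # gradient of a linear value
    return weights, td_error
```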
Supervised Learning • Learn to mimic expert play • Expert move as +ve training example • Random move as -ve training example • Need a ranking metric and error function • e.g. Stoutamire, Enderton, Van der Werf, Dahl • Learn from labelled final game positions • e.g. final score, life and death • Data is either noisy or sparse
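One simple way to realise the "+ve expert move, -ve random move" idea is a perceptron-style ranking update, sketched below; the ranking metrics and error functions used in the cited systems differ in detail.

```python
def ranking_update(weights, expert_features, random_features, lr=0.01):
    """Move-ranking sketch: score the expert's move above a random legal move.

    A margin-based perceptron update is assumed here purely for illustration.
    """
    expert_score = weights @ expert_features
    random_score = weights @ random_features
    margin = 1.0
    if expert_score < random_score + margin:          # ranking violated
        weights += lr * (expert_features - random_features)
    return weights
```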
Combined approaches • Can combine elements of both reinforcement and supervised learning. • e.g. Dahl’s Honte • Search • Local searches for eyes, connections, life and death • Global search using learned territory evaluation • Supervised learning • Local move prediction (shape) • Reinforcement learning • Life and death • Territory
Evolutionary Methods • Evolve a neural network to evaluate game positions • Donnelly, Lubberts, Richards, Rutquist • Evolve rules to match positions [Kojima] • ‘Feed’ rules according to matches • Split successful rules • Weight rules according to success in predicting response • Different kinds of rule • Flexible (production rules) • Fixed (within radius from move) • Semi-fixed (within radius of move, empty points only)
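A toy evolutionary loop for an evaluator's weight vector, to illustrate the general recipe (mutate, evaluate by playing games, keep the fittest); the `fitness` callback is hypothetical, and none of the cited systems is reproduced here.

```python
import random

def evolve_evaluator(fitness, dim=64, population=20, generations=50, sigma=0.1):
    """Evolve a weight vector for a position evaluator (toy sketch).

    `fitness(weights)` is an assumed callback that plays games with the
    candidate evaluator and returns a score.
    """
    pop = [[random.gauss(0.0, 1.0) for _ in range(dim)]
           for _ in range(population)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:population // 2]               # keep the best half
        children = [[w + random.gauss(0.0, sigma) for w in p]
                    for p in parents]                     # mutate survivors
        pop = parents + children
    return max(pop, key=fitness)
```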
State Representation • Invariances • Graph representations • Feature selection • Dimensionality reduction
Invariances • Go board has many symmetries • Rotational • Reflectional • Colour inversion • Invariant under translation • Edges must be dealt with • Schraudolph, Dayan, Sejnowski
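Exploiting these symmetries usually means sharing weights across symmetric positions or augmenting the training data. The sketch below enumerates the eight rotation/reflection variants plus colour inversion, assuming a +1/-1/0 numpy board encoding (a common but not universal convention).

```python
import numpy as np

def board_symmetries(board):
    """All rotation/reflection variants of a board, plus colour inversion.

    `board` is assumed to be a square numpy array with +1 for Black stones,
    -1 for White and 0 for empty points.
    """
    variants = []
    for k in range(4):                        # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(board, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # each rotation plus a reflection
    # Colour inversion doubles the set again: swap Black and White.
    return variants + [-v for v in variants]
```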
Graph Representations • Connected blocks are also (approximately) invariant. • Graepel’s ‘Common Fate Property’ • Used previously by Baum, Stoutamire, Enzenberger. • Generate a graph between units • Turn connected blocks and empty intersections into nodes • Turn adjacencies between units into edges • Learn on graph representation • Learn relationships between units (NeuroGo v2)
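A minimal sketch of building such a graph: flood-fill the board into connected units (blocks of stones and regions of empty points) and record adjacencies between units as edges. The +1/-1/0 board encoding is an assumption.

```python
import numpy as np

def units_and_edges(board):
    """Build a 'common fate'-style graph from a board (sketch).

    Connected blocks and empty regions become nodes; adjacency between
    distinct units becomes an edge.  Encoding: +1 Black, -1 White, 0 empty.
    """
    size = board.shape[0]
    unit_id = -np.ones_like(board, dtype=int)
    units = []                                    # unit index -> colour

    def flood(r, c, idx):
        colour = board[r, c]
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < size and 0 <= x < size and unit_id[y, x] < 0 \
                    and board[y, x] == colour:
                unit_id[y, x] = idx
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])

    for r in range(size):
        for c in range(size):
            if unit_id[r, c] < 0:
                flood(r, c, len(units))
                units.append(int(board[r, c]))

    edges = set()
    for r in range(size):
        for c in range(size):
            for dr, dc in ((1, 0), (0, 1)):
                y, x = r + dr, c + dc
                if y < size and x < size and unit_id[r, c] != unit_id[y, x]:
                    edges.add(tuple(sorted((int(unit_id[r, c]),
                                            int(unit_id[y, x])))))
    return units, sorted(edges)
```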
Feature selection • Raw board representation can be enhanced by any number of features • Comparison of important features (Werf) • Most significant: Stones, Liberties, Last Move • Also significant: Edge, Captures, Nearby stones • Trade-off between feature complexity and training time
Dimensionality Reduction • Can use feature extraction techniques • Werf compares a variety of algorithms • PCA performs well all round • Modified Eigenspace Separation Transform does even better • A combination may be best overall
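For reference, a minimal PCA-by-SVD sketch of the feature-extraction step; the Modified Eigenspace Separation Transform mentioned above is not reproduced here.

```python
import numpy as np

def pca_reduce(features, n_components=20):
    """Project raw board features onto their top principal components (sketch).

    `features` is assumed to be an (n_examples, n_raw_features) matrix.
    """
    centered = features - features.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components
```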
Knowledge Representation • Pattern Databases • Neural Networks • Rules • Decision Trees • Others
Pattern Databases • Successful in commercial games • Can be learned in a similar format • Go++ combines a handcrafted pattern database with a professional shape database (trade secret!)
Neural Networks • Can learn and represent pattern information • Successfully used in practice • Multilayer perceptrons + backpropagation • e.g. Schraudolph, Enzenberger, Werf, Dahl • Variants • Resilient backpropagation (Werf) • Linear architecture (e.g. Werf)
Rules • Horn clauses • Deductive inferencing (Kojima) • Production rules • Evolutionary approach (Kojima)
Decision Trees • Encodes patterns in concise, flexible form • Tilde (Ramon, Blockeel) • Relational representation language • Inductive logic programming • Successfully learns nakade shapes • Learned heuristic compares favourably to GoTools at life and death.
Other representations • Support Vector Machines (Graepel) • Boltzmann Machines (Stern, MacKay)
Conclusions • Common successful ideas • General approach • My approach
Common successful ideas • Global evaluation function • Reinforcement learning • Exploiting invariances • Carefully selected features • Neural network • Local move prediction • Supervised learning • +ve expert move, -ve random move • Neural network • But hasn’t led to a strong Go program
General Approach • There are many different approaches to learning in Go. • Focus on what to learn, and why it will help to play stronger Go. • What do we want to evaluate? • What knowledge do we need? • Which features will help? • Then select appropriate learning algorithms. • How should we train? • How should knowledge be represented?
My Approach • What to learn • Win/lose value function • How to learn • Reinforcement learning • Options • State representation • Predictive state representation • Can/will features • Knowledge representation • Kanerva code (high dimensional patterns) • Linear architecture
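A rough sketch of what Kanerva coding with a linear architecture could look like: random binary prototypes, a feature that is active when the position falls within a prototype's Hamming radius, and a value that is the weighted sum of active features. Prototype count, radius and the binary board encoding are all illustrative assumptions.

```python
import random

class KanervaCoder:
    """Kanerva coding over binary board features with a linear value (sketch)."""

    def __init__(self, n_bits, n_prototypes=200, radius=None, seed=0):
        rng = random.Random(seed)
        self.prototypes = [[rng.randint(0, 1) for _ in range(n_bits)]
                           for _ in range(n_prototypes)]
        self.radius = radius if radius is not None else n_bits // 4
        self.weights = [0.0] * n_prototypes

    def active_features(self, bits):
        """A prototype is active if the position lies within its Hamming radius."""
        return [i for i, proto in enumerate(self.prototypes)
                if sum(b != p for b, p in zip(bits, proto)) <= self.radius]

    def value(self, bits):
        """Linear architecture: sum the weights of the active prototypes."""
        return sum(self.weights[i] for i in self.active_features(bits))
```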