Design of Evaluation Functions using Neural Networks in the Game of Go
Presentation and translation: Hashimoto Tsuyoshi
Authors: Hiroyuki Nagayoshi, Masaru Todoroki
Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo
Background
• Go is the hardest game for computers, for the following two reasons:
1. The search space is vast.
2. Accurate evaluation functions are difficult to design.
We focus on problem 2.
Importance of evaluation functions
If accurate evaluation functions can be built:
• Strong programs become possible even with shallow search.
• Combined with best-first search, the search space can be reduced.
Difficulty of static evaluation functions
• Chess: gains and losses of pieces correlate so strongly with positional judgment that accurate evaluation is possible.
• Shogi: by considering gains and losses of pieces, mobility, and the soundness of castles, accurate evaluation is possible.
• Go: it is hard to evaluate moyo (territorial frameworks) or influence accurately, and the life and death of stones is difficult to evaluate without search.
Current Go evaluation functions
• Life-and-death analysis plus influence functions: evaluation of uncertain territory such as moyo or influence is poor.
• Learning with neural networks: accurate evaluations cannot be learned, because there are too many parameters and the symmetries of the board are not exploited.
The goal
As an evaluation function for Go, we propose a multi-layer neural network that
• shares parameters, and
• connects its units only locally.
We show its validity by training on game records.
Characteristics of this network
• Connections only within a 3 x 3 neighborhood
• Equalization of outputs within the same group
• A bypass between each inner layer and the input layer
Structure of the neural network
[Diagram: input layer (presence or absence of black and white stones) → inner layers → output layer (probability of being black territory, probability of being white territory)]
Connection of units
• Each unit connects to 36 units: the 3 x 3 neighborhoods in the layer directly below and in the input layer.
• This models the influence of stones spreading gradually across the board.
[Diagram: a unit in an inner layer, its 3 x 3 neighborhood in the layer directly below, and the corresponding patch of the input layer]
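The connection pattern above can be sketched as a forward pass. This is an illustrative reconstruction, not the authors' code: we assume two planes per layer (black/white), so each unit receives 2 x 9 weights from the layer below plus 2 x 9 from the input-layer bypass, the 36 connections named on the slide; the function and parameter names are ours.

```python
import numpy as np

def inner_layer(below, inp, w_below, w_inp, bias):
    """One inner-layer plane. below, inp: (2, n, n) activation planes;
    w_below, w_inp: (2, 3, 3) local weights; returns an (n, n) plane."""
    n = below.shape[1]
    # Zero-pad the border so edge units simply see fewer live inputs.
    pad_b = np.pad(below, ((0, 0), (1, 1), (1, 1)))
    pad_i = np.pad(inp, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, n))
    for y in range(n):
        for x in range(n):
            # 3x3 patch of the layer directly below + bypass patch of the input.
            s = (np.sum(w_below * pad_b[:, y:y + 3, x:x + 3])
                 + np.sum(w_inp * pad_i[:, y:y + 3, x:x + 3])
                 + bias)
            out[y, x] = np.tanh(s)  # squash into (-1, 1)
    return out
```

Because the same small weight set is applied at every board point, the layer behaves like a locally connected (convolution-style) map, which is what lets the influence of a stone propagate one neighborhood per layer.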
Sharing parameters
• Parameters are shared according to the positional relation between units, yielding a symmetric neural network.
• 3 categories of relative position: directly below, vertical/horizontal, and diagonal.
[Diagram: the under layer and the input layer, with connections grouped by relative position]
Sharing parameters
• 3 kinds of parameter sets (corner, edge, center).
• The number of parameters is independent of the board size.
[Diagram: corner, edge, and center regions of the board]
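The three-category sharing can be made concrete with a small sketch (our naming, not the paper's): a 3 x 3 connection pattern built from only three free parameters, keyed by relative position, is automatically invariant under the board's rotations and reflections.

```python
import numpy as np

def shared_kernel(w_below, w_ortho, w_diag):
    """Build a 3x3 weight kernel from three shared parameters:
    w_below for the unit directly below, w_ortho for the four
    vertical/horizontal neighbors, w_diag for the four diagonals."""
    return np.array([
        [w_diag,  w_ortho, w_diag],
        [w_ortho, w_below, w_ortho],
        [w_diag,  w_ortho, w_diag],
    ])
```

Any kernel of this form equals its own transpose and 90-degree rotation, so a symmetric input position is guaranteed to produce a symmetric evaluation, and the parameter count stays constant as the board grows.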
Equalizing outputs within the same group
• Stones belonging to the same group share the same life-and-death status, so the same outputs are desirable for all of them.
• Equalization enforces these identical outputs.
[Diagram: an input position and the structure of its groups]
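One simple way to realize this equalization, sketched here under our own assumptions (the paper does not give the exact mechanism), is to replace each stone's output with the mean over its orthogonally connected group:

```python
import numpy as np

def equalize_over_groups(values, stones):
    """values: (n, n) network outputs; stones: (n, n) ints, 1 where a
    stone of the colour in question sits, 0 elsewhere. Returns outputs
    in which every stone of a group carries the group's mean value."""
    n = stones.shape[0]
    seen = np.zeros_like(stones, dtype=bool)
    out = values.copy()
    for y in range(n):
        for x in range(n):
            if stones[y, x] == 1 and not seen[y, x]:
                # Flood-fill one group of orthogonally connected stones.
                stack, group = [(y, x)], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    group.append((cy, cx))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < n and 0 <= nx < n
                                and stones[ny, nx] == 1 and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                mean = np.mean([values[p] for p in group])
                for p in group:
                    out[p] = mean
    return out
```

Tying the outputs this way injects the domain fact that a group lives or dies as a unit, which is what reduces the learning error on the next slide.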
Effect of equalizing outputs within the same group
• Equalization decreases the learning error.
[Plot: learning error (0 to 3) versus number of inner layers (0 to 5), with and without equalization; the curve with equalization is lower]
Training of the network
• Training was first attempted by self-play, as in TD-learning, with no good results. The likely reason: the programs are too weak.
• Here we instead use game records of professional players.
Describing positions at the input layer
[Diagram: a board shape is encoded as two binary input planes, one per colour; a point holding a black (resp. white) stone is 1 in the black (resp. white) plane and 0 otherwise]
Training data
[Diagram: an input position paired with its game-end position; the teacher data mark each point as black territory (1) or not (0), and likewise for white]
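The encoding on the two slides above can be sketched as follows. This is a hedged reconstruction with our own names; in particular we assume the territory map (stones plus surrounded points, +1 black, -1 white, 0 neutral) has already been derived from the game-end position by ordinary scoring, which is a separate step.

```python
import numpy as np

def encode_position(board):
    """board: (n, n) ints, 1 = black stone, -1 = white stone, 0 = empty.
    Returns the two binary input planes (black presence, white presence)."""
    return np.stack([(board == 1).astype(float),
                     (board == -1).astype(float)])

def teacher_from_territory(territory):
    """territory: (n, n) ints from the game-end position (+1/-1/0).
    Teacher planes give probability-1 targets for black and white
    territory respectively; neutral points are 0 in both planes."""
    return np.stack([(territory == 1).astype(float),
                     (territory == -1).astype(float)])
```

Each training pair is then (input position planes, teacher planes from the end of the same game), so the network learns to predict the eventual ownership of every point from a mid-game position.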
Speeding up learning
• For a multi-layer neural network, simple backpropagation (plain gradient descent) makes learning considerably slow.
• We therefore train with a quasi-Newton method, a standard technique for non-linear optimization.
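The contrast can be shown on a toy problem. This sketch is not the paper's setup: we fit a tiny tanh model by squared error, once with fixed-step steepest descent and once with a quasi-Newton method (BFGS via SciPy, assuming SciPy is available), and observe that the quasi-Newton run drives the error down much further for the same budget.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = np.tanh(X @ w_true)  # targets from a known weight vector

def loss(w):
    return 0.5 * np.sum((np.tanh(X @ w) - y) ** 2)

def grad(w):
    t = np.tanh(X @ w)
    return X.T @ ((t - y) * (1.0 - t ** 2))  # exact gradient

# Steepest descent with a fixed step size (stands in for plain backprop).
w = np.zeros(3)
for _ in range(500):
    w -= 0.01 * grad(w)
gd_loss = loss(w)

# Quasi-Newton: BFGS builds a curvature estimate from gradient history.
res = minimize(loss, np.zeros(3), jac=grad, method="BFGS")
print(f"steepest descent: {gd_loss:.3e}, quasi-Newton (BFGS): {res.fun:.3e}")
```

The quasi-Newton method pays for each iteration with an approximate-Hessian update but needs far fewer iterations, which is the trade the slide's error-versus-iteration plot illustrates.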
Effect of the quasi-Newton method
• The quasi-Newton method decreases the learning error faster than the steepest-descent method.
[Plot: learning error (log scale, 0.1 to 100) versus iteration number (0 to 1000), for steepest descent and the quasi-Newton method]
Learning on end positions
• 100 end positions were extracted from game records: 80 for training and 20 for verification.
• The number of inner layers ranges from 1 to 6; learning runs for 10000 iterations.
Results
• No over-fitting is observed.
[Plot: error per position (0 to 7) versus number of inner layers (1 to 6), for the learning error and the prediction error]
Results
[Board diagrams: evaluation outputs on a +1 to -1 scale for networks with 2 inner layers and with 6 inner layers]
Learning the probability of being territory
• 50 game records: 30 for learning and 20 for verification.
• Estimated probabilities are compared with the posterior (statistically observed) probabilities.
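The comparison on the next slide is a calibration check: predictions are binned, and each bin's mean predicted probability is set against the fraction of points in that bin that actually became territory. A minimal sketch, with our own function name and a default of 10 bins (an assumption, not stated in the slides):

```python
import numpy as np

def calibration(pred, outcome, bins=10):
    """pred: predicted territory probabilities in [0, 1);
    outcome: 1.0 if the point ended up as territory, else 0.0.
    Returns (mean predicted, observed frequency) per non-empty bin."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    stats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (pred >= lo) & (pred < hi)
        if mask.any():
            stats.append((pred[mask].mean(), outcome[mask].mean()))
    return stats
```

A well-calibrated predictor puts every (predicted, observed) pair near the diagonal, which is the pattern the scatter plots on the results slide display.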
Results
[Scatter plots: statistical probability (%) versus predicted probability (%), for the learning data and for the verification data]
Current problems
• The assessment of life and death is not yet adequate; one reason is that too few game records were used for learning.
• The number of liberties or eyes may be needed as additional inputs to the network.
Summary
We proposed a multi-layer neural network evaluation function. Its features are the local connection of its units and parameter sharing that reflects the invariances of Go positions. Training on game records, we obtained good learning results both for end positions and for predicting the probability that a point becomes territory.