Design of Evaluation Functions using Neural Networks in the Game of Go
Presentation and translation: Hashimoto Tsuyoshi
Authors: Hiroyuki Nagayoshi, Masaru Todoroki
Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo
Background
• Go is the hardest game for computers, for the following two reasons:
1. The search space is vast.
2. Accurate evaluation functions are difficult to design.
We focus on problem 2.
Importance of evaluation functions
If accurate evaluation functions can be built:
• Strong programs become possible even with shallow search.
• Combined with best-first search, the search space can be reduced.
Difficulty of static evaluation functions
• Chess: gains and losses of pieces correlate so strongly with positional judgment that accurate evaluation is possible.
• Shogi: by considering gains and losses of pieces, mobility, and the soundness of castles, accurate evaluation is possible.
• Go: it is hard to evaluate moyo (territorial frameworks) or influence accurately, and the life and death of stones is difficult to evaluate without search.
Current Go evaluation functions
• Life-and-death analysis plus influence functions: evaluation of uncertain territory such as moyo or influence is poor.
• Learning with neural networks: accurate evaluations cannot be learned, because there are too many parameters and the symmetries of the board are not exploited.
The goal
As an evaluation function for Go, we propose a multi-layer neural network that
• shares parameters, and
• connects its units only locally.
We show its validity by training on game records.
Characteristics of this network
• Connections only within a 3 x 3 neighborhood
• Equalization of outputs within the same group
• A bypass between each inner layer and the input layer
Structure of the neural network
[Diagram: input layer (presence or absence of black and white stones) → inner layers → output layer (probability of being black territory, probability of being white territory)]
Connection of units
• Each unit connects to 36 units: the 3 x 3 neighborhoods in the layer directly below and in the input layer.
• This models the influence of stones spreading gradually across the board.
[Diagram: a unit in an inner layer, its 3 x 3 neighborhood in the layer directly below, and the corresponding patch of the input layer]
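The connection pattern above can be sketched as a forward pass. This is an illustrative reconstruction, not the authors' code: we assume two planes per layer (black/white), so each unit receives 2 x 9 weights from the layer below plus 2 x 9 from the input-layer bypass, the 36 connections named on the slide; the function and parameter names are ours.

```python
import numpy as np

def inner_layer(below, inp, w_below, w_inp, bias):
    """One inner-layer plane. below, inp: (2, n, n) activation planes;
    w_below, w_inp: (2, 3, 3) local weights; returns an (n, n) plane."""
    n = below.shape[1]
    # Zero-pad the border so edge units simply see fewer live inputs.
    pad_b = np.pad(below, ((0, 0), (1, 1), (1, 1)))
    pad_i = np.pad(inp, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, n))
    for y in range(n):
        for x in range(n):
            # 3x3 patch of the layer directly below + bypass patch of the input.
            s = (np.sum(w_below * pad_b[:, y:y + 3, x:x + 3])
                 + np.sum(w_inp * pad_i[:, y:y + 3, x:x + 3])
                 + bias)
            out[y, x] = np.tanh(s)  # squash into (-1, 1)
    return out
```

Because the same small weight set is applied at every board point, the layer behaves like a locally connected (convolution-style) map, which is what lets the influence of a stone propagate one neighborhood per layer.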
Sharing parameters
• Parameters are shared according to the positional relation between units, yielding a symmetric neural network.
• 3 categories of relative position: directly below, vertical/horizontal, and diagonal.
[Diagram: the under layer and the input layer, with connections grouped by relative position]
Sharing parameters
• 3 kinds of parameter sets (corner, edge, center).
• The number of parameters is independent of the board size.
[Diagram: corner, edge, and center regions of the board]
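The three-category sharing can be made concrete with a small sketch (our naming, not the paper's): a 3 x 3 connection pattern built from only three free parameters, keyed by relative position, is automatically invariant under the board's rotations and reflections.

```python
import numpy as np

def shared_kernel(w_below, w_ortho, w_diag):
    """Build a 3x3 weight kernel from three shared parameters:
    w_below for the unit directly below, w_ortho for the four
    vertical/horizontal neighbors, w_diag for the four diagonals."""
    return np.array([
        [w_diag,  w_ortho, w_diag],
        [w_ortho, w_below, w_ortho],
        [w_diag,  w_ortho, w_diag],
    ])
```

Any kernel of this form equals its own transpose and 90-degree rotation, so a symmetric input position is guaranteed to produce a symmetric evaluation, and the parameter count stays constant as the board grows.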
Equalizing outputs within the same group
• Stones belonging to the same group share the same life-and-death status, so the same outputs are desirable for all of them.
• Equalization enforces these identical outputs.
[Diagram: an input position and the structure of its groups]
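One simple way to realize this equalization, sketched here under our own assumptions (the paper does not give the exact mechanism), is to replace each stone's output with the mean over its orthogonally connected group:

```python
import numpy as np

def equalize_over_groups(values, stones):
    """values: (n, n) network outputs; stones: (n, n) ints, 1 where a
    stone of the colour in question sits, 0 elsewhere. Returns outputs
    in which every stone of a group carries the group's mean value."""
    n = stones.shape[0]
    seen = np.zeros_like(stones, dtype=bool)
    out = values.copy()
    for y in range(n):
        for x in range(n):
            if stones[y, x] == 1 and not seen[y, x]:
                # Flood-fill one group of orthogonally connected stones.
                stack, group = [(y, x)], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    group.append((cy, cx))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < n and 0 <= nx < n
                                and stones[ny, nx] == 1 and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                mean = np.mean([values[p] for p in group])
                for p in group:
                    out[p] = mean
    return out
```

Tying the outputs this way injects the domain fact that a group lives or dies as a unit, which is what reduces the learning error on the next slide.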
Effect of equalizing outputs within the same group
• Equalization decreases the learning error.
[Plot: learning error (0 to 3) versus number of inner layers (0 to 5), with and without equalization; the curve with equalization is lower]
Training of the network
• Training was first attempted by self-play, as in TD-learning, with no good results. The likely reason: the programs are too weak.
• Here we instead use game records of professional players.
Describing positions at the input layer
[Diagram: a board shape is encoded as two binary input planes, one per colour; a point holding a black (resp. white) stone is 1 in the black (resp. white) plane and 0 otherwise]
Training data
[Diagram: an input position paired with its game-end position; the teacher data mark each point as black territory (1) or not (0), and likewise for white]
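The encoding on the two slides above can be sketched as follows. This is a hedged reconstruction with our own names; in particular we assume the territory map (stones plus surrounded points, +1 black, -1 white, 0 neutral) has already been derived from the game-end position by ordinary scoring, which is a separate step.

```python
import numpy as np

def encode_position(board):
    """board: (n, n) ints, 1 = black stone, -1 = white stone, 0 = empty.
    Returns the two binary input planes (black presence, white presence)."""
    return np.stack([(board == 1).astype(float),
                     (board == -1).astype(float)])

def teacher_from_territory(territory):
    """territory: (n, n) ints from the game-end position (+1/-1/0).
    Teacher planes give probability-1 targets for black and white
    territory respectively; neutral points are 0 in both planes."""
    return np.stack([(territory == 1).astype(float),
                     (territory == -1).astype(float)])
```

Each training pair is then (input position planes, teacher planes from the end of the same game), so the network learns to predict the eventual ownership of every point from a mid-game position.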
Speeding up learning
• For a multi-layer neural network, simple backpropagation (plain gradient descent) makes learning considerably slow.
• We therefore train with a quasi-Newton method, a standard technique for non-linear optimization.
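The contrast can be shown on a toy problem. This sketch is not the paper's setup: we fit a tiny tanh model by squared error, once with fixed-step steepest descent and once with a quasi-Newton method (BFGS via SciPy, assuming SciPy is available), and observe that the quasi-Newton run drives the error down much further for the same budget.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = np.tanh(X @ w_true)  # targets from a known weight vector

def loss(w):
    return 0.5 * np.sum((np.tanh(X @ w) - y) ** 2)

def grad(w):
    t = np.tanh(X @ w)
    return X.T @ ((t - y) * (1.0 - t ** 2))  # exact gradient

# Steepest descent with a fixed step size (stands in for plain backprop).
w = np.zeros(3)
for _ in range(500):
    w -= 0.01 * grad(w)
gd_loss = loss(w)

# Quasi-Newton: BFGS builds a curvature estimate from gradient history.
res = minimize(loss, np.zeros(3), jac=grad, method="BFGS")
print(f"steepest descent: {gd_loss:.3e}, quasi-Newton (BFGS): {res.fun:.3e}")
```

The quasi-Newton method pays for each iteration with an approximate-Hessian update but needs far fewer iterations, which is the trade the slide's error-versus-iteration plot illustrates.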
Effect of the quasi-Newton method
• The quasi-Newton method decreases the learning error faster than the steepest-descent method.
[Plot: learning error (log scale, 0.1 to 100) versus iteration number (0 to 1000), for steepest descent and the quasi-Newton method]
Learning on end positions
• 100 end positions were extracted from game records: 80 for training and 20 for verification.
• The number of inner layers ranges from 1 to 6; learning runs for 10000 iterations.
Results
• No over-fitting is observed.
[Plot: error per position (0 to 7) versus number of inner layers (1 to 6), for the learning error and the prediction error]
Results
[Board diagrams: evaluation outputs on a +1 to -1 scale for networks with 2 inner layers and with 6 inner layers]
Learning the probability of being territory
• 50 game records: 30 for learning and 20 for verification.
• Estimated probabilities are compared with the posterior (statistically observed) probabilities.
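The comparison on the next slide is a calibration check: predictions are binned, and each bin's mean predicted probability is set against the fraction of points in that bin that actually became territory. A minimal sketch, with our own function name and a default of 10 bins (an assumption, not stated in the slides):

```python
import numpy as np

def calibration(pred, outcome, bins=10):
    """pred: predicted territory probabilities in [0, 1);
    outcome: 1.0 if the point ended up as territory, else 0.0.
    Returns (mean predicted, observed frequency) per non-empty bin."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    stats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (pred >= lo) & (pred < hi)
        if mask.any():
            stats.append((pred[mask].mean(), outcome[mask].mean()))
    return stats
```

A well-calibrated predictor puts every (predicted, observed) pair near the diagonal, which is the pattern the scatter plots on the results slide display.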
Results
[Scatter plots: statistical probability (%) versus predicted probability (%), for the learning data and for the verification data]
Current problems
• The assessment of life and death is not yet adequate; one reason is that too few game records were used for learning.
• The number of liberties or eyes may be needed as additional inputs to the network.
Summary
We proposed a multi-layer neural network evaluation function. Its features are the local connection of its units and parameter sharing that reflects the invariances of Go positions. Training on game records, we obtained good learning results both for end positions and for predicting the probability that a point becomes territory.