
Design of Evaluation Functions using Neural Networks in the Game of Go


Presentation Transcript


  1. Design of Evaluation Functions using Neural Networks in the Game of Go. Presentation and translation: Hashimoto Tsuyoshi. Authors: Hiroyuki Nagayoshi, Masaru Todoroki, Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo.

  2. Background • Go is the hardest game for computers, for the following two reasons: 1. The search space is vast. 2. Evaluation functions are difficult to construct. We focus on problem 2.

  3. Importance of evaluation functions. If accurate evaluation functions can be made: • it becomes possible to build strong programs even with shallow search; • combined with best-first search, the search space can be made smaller.

  4. Difficulty of static evaluation functions • Chess: losses and gains of pieces correlate so strongly with positional judgment that accurate evaluation is possible. • Shogi: by considering material gains and losses, mobility, and the soundness of castles, accurate evaluation is possible. • Go: it is hard to evaluate moyo (territorial frameworks) or influence accurately, and the life and death of stones is also difficult to evaluate without search.

  5. Current Go evaluation functions • Life-and-death analysis plus influence: the evaluation of uncertain territory such as moyo or influence is poor. • Learning with a neural network: accurate evaluations cannot be learned, because there are too many parameters and symmetry is not taken into account.

  6. The goal. As an evaluation function for Go, we • share parameters and • use a multi-layer neural network whose units are locally connected. We show its validity by learning from game records.

  7. Characteristics of this network • Connections only within a 3 x 3 neighborhood • Equalization of outputs within the same group • A bypass between each inner layer and the input layer

  8. Structure of the neural network. [Figure: input layer (presence or absence of a black stone and of a white stone at each point), inner layers, and output layer (probability of being black territory and probability of being white territory).]

  9. Connection of units • Each unit is connected to 36 units: the 3 x 3 neighborhoods in the layer directly below and in the input layer. • This describes the influence of stones spreading gradually. [Figure: a unit in an inner layer and its connections to the layer directly below and to the input layer.]
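
  The network of slides 8 and 9 can be sketched roughly as follows in Python with NumPy. This is a minimal illustration, not the authors' implementation: the number of planes per inner layer, the sigmoid activation, and the zero padding at the board edge are assumptions; if each inner layer has two planes, the 3 x 3 neighborhoods of the layer below and of the input layer give exactly the 36 incoming connections of slide 9.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def local_layer(below, inputs, w_below, w_input, bias):
        # below:   (P, N, N) planes of the layer directly below
        # inputs:  (2, N, N) black-stone / white-stone planes (the bypass)
        # w_below: (P, 3, 3), w_input: (2, 3, 3), bias: scalar -- shared at every point
        n = below.shape[-1]
        below_p = np.pad(below, ((0, 0), (1, 1), (1, 1)))    # zeros off the board
        inputs_p = np.pad(inputs, ((0, 0), (1, 1), (1, 1)))
        out = np.zeros((n, n))
        for y in range(n):
            for x in range(n):
                out[y, x] = sigmoid(
                    np.sum(w_below * below_p[:, y:y + 3, x:x + 3])
                    + np.sum(w_input * inputs_p[:, y:y + 3, x:x + 3])
                    + bias)
        return out

  Stacking several such layers, with the final layer producing two planes, yields the output of slide 8: the probability of each point being black or white territory.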

  10. Sharing parameters • Parameters are shared according to the positional relation between units, giving a symmetric neural network. • Three categories of relation: directly below, vertical/horizontal, and diagonal. [Figure: the three connection categories between a unit, the layer below, and the input layer.]

  11. Sharing parameters • Three kinds of parameters (corner, edge, center). • The parameters are independent of the board size. [Figure: corner, edge, and center regions of the board.]
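
  One way to realize the three parameter categories of slide 11 is to classify each board point and pick the corresponding shared weight set. The exact region boundaries are not given on the slide, so the definition below is a guess:

    def region(y, x, n):
        # Classify a point of an n x n board as 'corner', 'edge' or 'center'
        # so that the matching shared parameter set can be selected.
        on_row_border = y in (0, n - 1)
        on_col_border = x in (0, n - 1)
        if on_row_border and on_col_border:
            return "corner"
        if on_row_border or on_col_border:
            return "edge"
        return "center"

    # weights = {"corner": ..., "edge": ..., "center": ...}
    # w_below, w_input, bias = weights[region(y, x, board_size)]
    # Only these three sets exist, so the parameter count does not grow with board size.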

  12. Equalization of outputs within the same group • Stones belonging to the same group share the same life and death, so the same outputs are desirable. • Equalization realizes the same outputs. [Figure: an input position and the structure of its groups.]
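
  The slides do not spell out how the equalization is implemented; one straightforward reading, sketched below, is to find the connected groups of same-colored stones and give every stone in a group the group's mean output:

    def groups(board):
        # Connected groups of same-colored stones (4-neighborhood flood fill).
        # board[y][x] is 0 (empty), 1 (black) or 2 (white).
        n = len(board)
        seen = set()
        for sy in range(n):
            for sx in range(n):
                if board[sy][sx] == 0 or (sy, sx) in seen:
                    continue
                color, stack, group = board[sy][sx], [(sy, sx)], []
                seen.add((sy, sx))
                while stack:
                    y, x = stack.pop()
                    group.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < n and 0 <= nx < n and \
                                board[ny][nx] == color and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                yield group

    def equalize(output, board):
        # Replace the output at every stone of a group with the group's mean,
        # so that stones sharing the same life and death share the same value.
        for group in groups(board):
            mean = sum(output[y][x] for y, x in group) / len(group)
            for y, x in group:
                output[y][x] = mean
        return output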

  13. Effect of output equalization within the same group • Equalization decreases the learning error. [Figure: learning error versus number of inner layers, with and without equalization.]

  14. Training of the network • Training the network by self-play, as in TD-learning, gave no good results; the reason may be that the programs are too weak. • Here we use game records of professional players instead.

  15. Description of positions at the input layer • A position is encoded as two input-layer planes, one for black and one for white: a point holding a stone of that color is 1, otherwise 0. [Figure: a board shape and the corresponding black and white input planes.]
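
  A minimal sketch of this encoding (the 0/1/2 board convention and the plane order are assumptions):

    import numpy as np

    def encode(board):
        # Two binary input planes: plane 0 marks black stones, plane 1 white stones;
        # a point is 1 where a stone of that color is present, 0 where it is absent.
        board = np.asarray(board)          # 0 = empty, 1 = black, 2 = white
        return np.stack([(board == 1).astype(float),
                         (board == 2).astype(float)])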

  16. Training data • An input position taken from a game record is paired with teacher data derived from the game-end position: points that ended as black territory are 1 on the black-territory plane, points that ended as white territory are 1 on the white-territory plane, and all other points are 0. [Figure: input position, game-end position, and the resulting teacher data.]
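
  The teacher data of slide 16 can be built in the same spirit from the game-end position (the territory encoding below is an assumed convention):

    import numpy as np

    def teacher_data(final_territory):
        # Teacher planes derived from the game-end position: 1 where a point
        # ended up as black (resp. white) territory, 0 elsewhere.
        t = np.asarray(final_territory)    # 1 = black territory, 2 = white territory
        return np.stack([(t == 1).astype(float),
                         (t == 2).astype(float)])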

  17. Speeding up learning • In a multi-layer neural network, simple back-propagation makes learning considerably slow. • Here we implement learning with a quasi-Newton method, a method for non-linear optimization.
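
  The slides only say "quasi-Newton method"; L-BFGS is one common quasi-Newton variant, so the sketch below uses SciPy's L-BFGS-B as a stand-in, with the gradient still supplied by back-propagation:

    from scipy.optimize import minimize

    def train(params0, loss_and_grad):
        # loss_and_grad(params) returns (loss, gradient); the gradient is
        # computed by ordinary back-propagation through the network.
        result = minimize(loss_and_grad, params0, jac=True,
                          method="L-BFGS-B", options={"maxiter": 10000})
        return result.x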

  18. Effect of the quasi-Newton method • The quasi-Newton method decreases the learning error faster than the steepest descent method. [Figure: learning error (log scale) versus iteration number, for the steepest descent method and the quasi-Newton method.]

  19. Learning at end positions • 100 end positions were extracted from game records; 80 positions are used for training and 20 for verification. • The number of inner layers ranges from 1 to 6, and learning runs for 10,000 iterations.
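
  The data split of slide 19, sketched under the assumption that the 80/20 division is random (the slides only give the counts):

    import random

    def split_positions(positions, n_train=80, seed=0):
        # 80 end positions for training, the remaining 20 for verification.
        positions = list(positions)
        random.Random(seed).shuffle(positions)
        return positions[:n_train], positions[n_train:]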

  20. Results • No over-fitting is observed. [Figure: learning error and prediction error per position versus number of inner layers.]

  21. Results [Figure: evaluation output on a scale from +1 through 0 to -1, for networks with 2 inner layers and with 6 inner layers.]

  22. Learning the probability of becoming territory • 50 game records: 30 are used for learning and 20 for verification. • The estimated probabilities are compared with the posterior (statistical) probability.
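
  The comparison of slide 22 amounts to a calibration check: bin the predicted territory probabilities and, within each bin, measure how often the point really became territory. A sketch, assuming 5% bins as suggested by the axis of slide 23:

    import numpy as np

    def calibration(predicted, actual, bin_width=0.05):
        # predicted: network outputs in [0, 1]; actual: 1 if the point really
        # became territory, 0 otherwise.  Returns (bin upper edge, empirical
        # probability) pairs -- the "statistical probability" of slide 23.
        predicted, actual = np.asarray(predicted), np.asarray(actual)
        edges = np.linspace(0.0, 1.0, int(round(1.0 / bin_width)) + 1)
        stats = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (predicted >= lo) & (predicted < hi)
            if hi == edges[-1]:
                mask |= predicted == hi        # include predictions of exactly 1.0
            if mask.any():
                stats.append((hi, actual[mask].mean()))
        return stats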

  23. Results [Figure: statistical probability (%) versus predicted probability (%), shown separately for the learning data and the verification data.]

  24. Current problems • The assessment of life and death is not adequate; one reason is that too few game records were used for learning. • The number of liberties or eyes may be needed as input to the network.
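
  If liberty counts were added to the input, they could be computed along the following lines (a hypothetical extension, using a group as produced by the flood fill sketched after slide 12):

    def liberties(board, group):
        # Number of distinct empty points adjacent to a group of stones.
        # board[y][x]: 0 = empty, 1 = black, 2 = white.
        n = len(board)
        libs = set()
        for y, x in group:
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < n and 0 <= nx < n and board[ny][nx] == 0:
                    libs.add((ny, nx))
        return len(libs)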

  25. Summary. We proposed a multi-layer neural network evaluation function. The features of our neural network are the local connection of its units and parameter sharing that takes the invariance of Go positions into account. Using game records, we obtained good learning results for end positions and for predicting the probability of territory.
