Honte, a Go-Playing Program Using Neural Nets
Frederik Dahl
Combined approach
• Supervised learning
  • Shape evaluation
• Reinforcement learning
  • Group safety
  • Territory
• Heuristic evaluation
  • Influence
• Search
  • Capture
  • Connectivity
  • Life and death
Shape evaluation: Multilayer perceptron
• 190 inputs
  • Receptive field of radius 3
  • Distance to edge
  • Liberties
  • Captured stones
• 50 hidden nodes
• Single output
  • Will an expert play here?
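
The layer sizes on this slide can be sketched as a plain forward pass. This is only an illustration: the slide gives the sizes (190 inputs, 50 hidden nodes, one output) but not the activation function, the use of bias terms, or the weight initialisation, so the sigmoid units, the biases, and the names `init_network` and `evaluate_move` are all assumptions.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # sizes taken from the slide


def init_network(rng):
    """Small random weights; the last entry of each row is a bias (assumed)."""
    hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def evaluate_move(network, features):
    """Forward pass: 190 features -> 50 hidden units -> one score in (0, 1)."""
    hidden, output = network
    if len(features) != N_INPUTS:
        raise ValueError("expected %d input features" % N_INPUTS)
    # zip stops after the 190 weights, so the bias is added separately.
    h = [sigmoid(sum(w * x for w, x in zip(row, features)) + row[-1])
         for row in hidden]
    return sigmoid(sum(w * a for w, a in zip(output, h)) + output[-1])


network = init_network(random.Random(0))
score = evaluate_move(network, [0.0] * N_INPUTS)  # untrained, so not meaningful
```

A score near 1 would be read as "an expert would probably play here"; how the 190 features are laid out over the receptive field is not specified on the slide and is left out.
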
Shape evaluation: Training and performance
• Trained on 400 expert games
• Expert move used as positive example (+1)
• Random legal move as negative example (0)
• Error backpropagation
  • error = target - eval
• Performance measured by treating prediction as evaluation function
  • What percentage of legal moves are ranked below the expert move?
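
The training targets and the ranking measure described above might look roughly like the sketch below. The function names are hypothetical. Two details are assumptions rather than facts from the slide: the negative example is drawn from legal moves other than the expert's, and the percentage is taken over the other legal moves only.

```python
import random


def training_pairs(expert_move, legal_moves, rng):
    """One positive (+1) and one random negative (0) example per position."""
    # Assumption: the negative is never the expert move itself.
    others = [m for m in legal_moves if m != expert_move]
    pairs = [(expert_move, 1.0)]
    if others:
        pairs.append((rng.choice(others), 0.0))
    return pairs


def output_error(target, evaluation):
    """Error signal passed to backpropagation, as written on the slide."""
    return target - evaluation


def percent_ranked_below(scores, expert_move):
    """Percentage of the other legal moves scored strictly below the expert move."""
    others = [s for m, s in scores.items() if m != expert_move]
    if not others:
        return 100.0
    below = sum(1 for s in others if s < scores[expert_move])
    return 100.0 * below / len(others)


# Made-up network outputs for five legal moves; "a" is the expert move.
scores = {"a": 0.9, "b": 0.2, "c": 0.95, "d": 0.1, "e": 0.5}
result = percent_ranked_below(scores, "a")  # 3 of the 4 other moves rank lower
```
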
Local search
• Selective search for local goals
  • Capture
  • Connectivity
  • Life and death
• Only considers moves suggested by shape evaluating network
• Deep and narrow search
• Captures common-sense knowledge
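
A deep, narrow, goal-directed search of the kind listed above can be sketched as a small AND/OR search that keeps only the few highest-scoring candidate moves at each node. This is not Honte's search. The `game` interface, the `width` parameter, and the `LibertyRace` toy problem are all hypothetical, and `shape_score` merely stands in for the shape-evaluating network.

```python
def selective_search(state, attacker_to_move, depth, game, width=2):
    """Return True if the attacker can force the local goal within `depth` plies."""
    if game.goal_reached(state):
        return True
    if game.goal_failed(state) or depth == 0:
        return False
    # Narrow: keep only the few moves the move predictor ranks highest.
    moves = sorted(game.legal_moves(state, attacker_to_move),
                   key=lambda m: game.shape_score(state, m),
                   reverse=True)[:width]
    if not moves:
        return False  # nothing to try: treat the goal as unproven
    results = (selective_search(game.play(state, m, attacker_to_move),
                                not attacker_to_move, depth - 1, game, width)
               for m in moves)
    # The attacker needs one working move; the defender must fail with every reply.
    return any(results) if attacker_to_move else all(results)


class LibertyRace:
    """Toy stand-in for a capture problem: the state is a liberty count."""

    def goal_reached(self, libs):
        return libs == 0  # target block captured

    def goal_failed(self, libs):
        return libs >= 4  # target block has escaped

    def legal_moves(self, libs, attacker_to_move):
        return ["fill"] if attacker_to_move else ["extend", "pass"]

    def shape_score(self, libs, move):
        return {"fill": 1.0, "extend": 0.8, "pass": 0.1}[move]

    def play(self, libs, move, attacker_to_move):
        return libs + {"fill": -1, "extend": 1, "pass": 0}[move]


in_atari = selective_search(1, True, 4, LibertyRace())       # one liberty left
two_liberties = selective_search(2, True, 6, LibertyRace())  # defender can extend
```

Because the branching factor is capped by `width`, the depth budget can be large without the search tree exploding, which is the trade-off the slide describes.
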
Group safety evaluation: Multilayer perceptron
• Groups defined by connectable blocks
• 13 inputs
  • Number of stones in group
  • Number of liberties in group
  • Number of proven eyes
  • Average opponent influence over liberties
• 20 hidden nodes
• 1 output
  • Probability of group survival
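
One way to picture "groups defined by connectable blocks" is a union-find pass over pairs of blocks that a connectivity search has judged connectable. The sketch below assumes that connectability is applied transitively and that a liberty shared by several blocks is counted once. Neither is stated on the slide, and `build_groups` and `group_summary` are hypothetical names.

```python
def build_groups(block_ids, connectable_pairs):
    """Merge blocks into groups, given pairs judged connectable."""
    parent = {b: b for b in block_ids}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    for a, b in connectable_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for b in block_ids:
        groups.setdefault(find(b), set()).add(b)
    return sorted(groups.values(), key=min)


def group_summary(group, stones, liberties):
    """First two inputs named on the slide: stone count and liberty count."""
    n_stones = sum(len(stones[b]) for b in group)
    libs = set().union(*(liberties[b] for b in group))  # shared liberties once
    return n_stones, len(libs)


# Made-up example: blocks A and B are connectable, C stands alone.
stones = {"A": {(0, 0), (0, 1)}, "B": {(2, 1)}, "C": {(5, 5)}}
liberties = {"A": {(1, 0), (1, 1), (0, 2)},
             "B": {(1, 1), (3, 1), (2, 0), (2, 2)},
             "C": {(4, 5), (6, 5), (5, 4), (5, 6)}}
groups = build_groups(["A", "B", "C"], [("A", "B")])
summary = group_summary(groups[0], stones, liberties)
```
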
Group safety evaluation: Temporal difference learning
• Trained by self-play
• Reward signal for the group is the average final safety of stones
  • 0 = captured
  • 1 = survived
• TD(0) is used, replaying games backwards
• Very simple idea:
  • error = eval(next) - eval(now)
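
The update rule above can be illustrated with a table of estimates in place of the network. This is a minimal sketch under stated assumptions: the table lookup, the learning rate, the default estimate of 0.5, and the choice to use the freshly updated value of the later position as the target are all illustrative, and `td0_backward_replay` is a hypothetical name.

```python
def td0_backward_replay(values, states, reward, learning_rate=0.1, default=0.5):
    """Update a table of safety estimates from one finished game."""
    target = reward  # the final position is scored by the actual outcome
    for state in reversed(states):
        now = values.get(state, default)
        error = target - now  # error = eval(next) - eval(now)
        values[state] = now + learning_rate * error
        target = values[state]  # updated estimate becomes "next" for the earlier position
    return values


# Three positions of one group that survived (reward 1).
values = td0_backward_replay({}, ["s0", "s1", "s2"], reward=1.0, learning_rate=0.5)
```

Replaying the game backwards lets the final outcome influence every earlier position in a single pass, since each position is compared against an estimate that has already been corrected.
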
Influence evaluation
• Consider random walks from an intersection
• How likely to end up at a black or white stone?
• Can also take account of group safety estimates
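
The random-walk idea can be approximated by sampling. The slide does not say whether Honte samples walks or computes the probabilities another way, so the Monte Carlo estimate below is only an illustration. The board encoding (`"B"`, `"W"`, `"."`), the step limit, and the name `random_walk_influence` are assumptions.

```python
import random


def random_walk_influence(board, start, rng, walks=200, max_steps=50):
    """Estimate how often random walks from `start` first reach each colour."""
    size = len(board)
    hits = {"B": 0, "W": 0}
    for _ in range(walks):
        r, c = start
        for _ in range(max_steps + 1):
            if board[r][c] in hits:  # walk ends at the first stone it meets
                hits[board[r][c]] += 1
                break
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < size and 0 <= c + dc < size]
            r, c = rng.choice(neighbours)
    # Walks that meet no stone within the step limit count for neither side.
    return hits["B"] / walks, hits["W"] / walks


board = [".....",
         ".....",
         "B...W",
         ".....",
         "....."]
black, white = random_walk_influence(board, (2, 1), random.Random(1))
```

A variant that weights each hit by the safety estimate of the stone's group would be one possible reading of the last bullet, though the slide does not say how the two are combined.
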
Territory evaluation
• Another multilayer perceptron
• 4 inputs
  • Revised influence (for both sides)
  • Distance from edge
• 10 hidden nodes
• 1 output
  • Predicted territory value
• Trained by TD(0) using eventual territory value as reward signal
Playing strength
• Playing 19x19 Go
  • Approximately even against Handtalk 97-06e
  • Wins more than 50% against Ego 1.0
• Weaknesses
  • Confuses group safety with group strength
  • Has no concept of the aji of a group
Recent work
• New version of WinHonte 1.03
• Neural net to evaluate sente/gote
• Trial version available online!
Conclusions
• Go knowledge can be learned
• Combining different forms of knowledge can be a good idea
• Multilayer perceptrons provide a flexible representation
• Local search can be used effectively as input features for learning