
IGB GO —— A self-learning GO program



  1. IGB GO—— A self-learning GO program Lin WU Information & Computer Science University of California, Irvine

  2. Outline • Background: • What is GO? • Existing GO programs • IGB GO • Past work: • Three past scenarios • Present scenario • Discussion • Conclusion • Demo. Lin WU, lwu@ics.uci.edu

  3. What is GO • Black and white players play alternately. • Black plays first. • Basic concepts: • Liberty • Eye • Territory • Unconditional life • Position
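The liberty concept above is mechanical enough to sketch in code. This is not part of the talk's implementation; it is a minimal illustration, assuming a board stored as a 2D list with 0 = empty, 1 = black, 2 = white:

```python
def liberties(board, row, col):
    """Count the liberties of the group containing the stone at (row, col).

    board: square 2D list; 0 = empty, 1 = black, 2 = white.
    """
    n = len(board)
    color = board[row][col]
    assert color != 0, "no stone at this intersection"
    group, libs, stack = {(row, col)}, set(), [(row, col)]
    # Flood fill over same-colored neighbors, collecting adjacent empty points.
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                if board[nr][nc] == 0:
                    libs.add((nr, nc))
                elif board[nr][nc] == color and (nr, nc) not in group:
                    group.add((nr, nc))
                    stack.append((nr, nc))
    return len(libs)
```

A group's liberties are the empty points adjacent to any stone of the group, which is why capture (next slide) is triggered exactly when this count reaches 0.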

  4. What is GO (cont.) • Rules • A stone or group is captured when its liberty count becomes 0. • Captured stones are removed from the board. • The winner is determined by counting territory.

  5. Existing GO programs • There are many existing GO programs • KCC Igo • HARUKA • Go++ • Goemate • HandTalk • The Many Faces of Go: www.smart-games.com • GNU GO: www.gnu.org/software/gnugo/gnugo.html • NeuroGo: www.markus-enzenberger.de/neurogo.html • etc. • None of them can beat average amateur players.

  6. Conceptual Architecture • Pattern libraries: • Library for the opening • Library for corners • Library for the interior of the board • Libraries for attack, defense, connection, etc. • Engine: matches the board position against the libraries • Evaluation: determines the best move if there are multiple hits

  7. Architecture I • The Many Faces of Go (1981 - now) • Knowledge Representation in The Many Faces of Go, David Fotland, February 27, 1993 • Joseki database of standard corner patterns (36,000 moves) • a pattern database of 8x8 patterns (4,000 moves) • a rule-based expert system with about 200 rules that suggests plausible moves for full-board evaluation

  8. Architecture II • GNU GO (1989.3 - now) • GNU GO documentation • Pattern libraries • General: patterns.db, patterns2.db • Fuseki (opening): fuseki.db • Eyes: eyes.db • Connection: conn.db • Influence: influence.db, barriers.db • etc. • GNU Go engine: calculates state at different levels, pattern matching, move reasoning.

  9. Why pattern-based systems? • Simple rules don't mean a simple game • Simple rules mean an extremely large search space • Board evaluation is hard, especially in the middle of the game • The representation space is extremely large • The evaluation function is sensitive to small differences in the input • Result: to get reliable evaluation results, the search depth has to be very high • Pattern-based system • Avoids search through pattern matching
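As a toy illustration of "avoid search by pattern matching" (entirely hypothetical; real libraries such as GNU Go's patterns.db use a much richer format), a pattern library can map a local shape directly to a suggested-move score, so no game-tree search is needed:

```python
# Hypothetical 3x3 pattern library: each key is the neighborhood around a
# candidate move ('.' empty, 'X' black, 'O' white); the value is a made-up
# suggested-move score. Real systems store thousands of such entries.
PATTERNS = {
    ("OX.",
     "OX.",
     "O.."): 0.9,
}

def suggest(board, row, col):
    """Return the library score for the 3x3 window centered at (row, col),
    or None if the local shape is not in the library.

    board: list of equal-length strings of '.', 'X', 'O'.
    """
    window = tuple(
        "".join(board[r][c] for c in range(col - 1, col + 2))
        for r in range(row - 1, row + 2)
    )
    return PATTERNS.get(window)
```

A single dictionary lookup replaces search, which is the appeal; the slides that follow describe why maintaining such libraries by hand becomes the bottleneck.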

  10. Complexity —— Search time

  11. Problems of pattern-based systems • Everything is manual work • As the system becomes larger, it's harder to improve the pattern database. • As the database becomes larger, inconsistencies become more likely. • Result: • Performance improves more slowly as performance gets better.

  12. Outline • Background: • What is GO? • Existing GO programs • IGB GO • Past work: • Three past scenarios • Present scenario • Discussion • Conclusion • Demo.

  13. IGB GO • http://contact.ics.uci.edu/go.html • A GO program that can improve its performance automatically • How? • Use artificial neural networks to learn the evaluation function. • Improve the quality of the neural networks by improving the quality of the training data.

  14. Architecture of the neural networks • 6 planes • 1 input plane • 1 output plane • 4 transmission planes • Use a recurrent neural network to learn two functions

  15. How to improve the training data • Initialize a group of neural networks • Let the neural networks play against each other • Identify the set of good moves • Train the neural networks on those good moves • Repeat from step 2.

  16. Two key issues of this system • Given the neural networks, how to identify “the good moves” • Given the good moves, how to improve the neural networks’ performance efficiently

  17. Outline • Background: • What is GO? • Existing GO programs • IGB GO • Past work: • Three past scenarios • Present scenario • Discussion • Conclusion • Demo.

  18. Play against itself • Randomly initialize a neural network • The neural network plays against itself over a set of initial setups. • If black (or white) wins, learn the black (or white) moves. • Update the weights; repeat from step 2.
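The self-play loop above can be sketched as follows. This is a schematic, not the talk's code: `play_game` and `train_on` are hypothetical stand-ins for the actual recurrent network's game player and weight update.

```python
def self_play_training(net, initial_setups, iterations, play_game, train_on):
    """Sketch of the self-play scenario.

    play_game(net, setup) -> (winner, black_moves, white_moves), where
    winner is "black", "white", or None for a draw (hypothetical helper).
    train_on(net, moves) updates the network's weights on the given moves.
    """
    for _ in range(iterations):
        for setup in initial_setups:
            winner, black_moves, white_moves = play_game(net, setup)
            # Learn only the winning side's moves, as in the slide.
            if winner == "black":
                train_on(net, black_moves)
            elif winner == "white":
                train_on(net, white_moves)
    return net
```

Because the same network supplies both the games and the training targets, nothing anchors it to objectively good play, which is consistent with the degradation reported two slides later.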

  19. Play against itself — Good move identification • Win: the color that gets the larger territory • Good moves: all the moves played by the winning color

  20. Play against itself — Results • Results • First, it improves • Then it begins to get worse • Finally, it learns a very deterministic and bad pattern • Improvement: no guarantee.

  21. Group playing • Initialize a group of neural networks (18) • Randomly assign each neural network to another as a pair. • The members of a pair play against each other • Identify the set of good moves • Train the losing neural networks on those good moves • Repeat from step 2.

  22. Group playing — Good move identification • Each pair has two players (A and B) • Game 1: A plays black, B plays white; get a result R1 • Game 2: B plays black, A plays white; get a result R2 • If R1 > R2, then A is the better player and B is the loser, so B learns all the moves played by A.
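A minimal sketch of this pairing rule, assuming a hypothetical `play(black_net, white_net)` helper that returns black's territory result and a map from each player to the moves it made in that game:

```python
def identify_loser(net_a, net_b, play):
    """Sketch of group-playing good-move identification.

    play(black, white) -> (result, moves): result is black's final
    territory (the R1/R2 of the slide); moves maps each player to the
    moves it played. Returns (loser, winner_moves): the loser then
    trains on every move the winner played across both games.
    """
    r1, moves1 = play(net_a, net_b)   # game 1: A is black, B is white
    r2, moves2 = play(net_b, net_a)   # game 2: B is black, A is white
    if r1 > r2:                       # A did better as black than B did
        winner, loser = net_a, net_b
    else:
        winner, loser = net_b, net_a
    good_moves = moves1[winner] + moves2[winner]
    return loser, good_moves
```

Swapping colors between the two games removes black's first-move advantage from the comparison, which is why R1 and R2 are directly comparable.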

  23. Group playing — Results • Results • Improves at the beginning. • If one player dominates, the whole system degrades as in “play against itself”. • No indication of convergence so far. (9 machines, 1 month on a 9x9 board) • Improvement: no guarantee.

  24. ABC scenario • Initialize a group of neural networks • Randomly assign three different neural networks (A, B, C) to a group • Let A and B play against each other • Identify the set of good moves • Train the neural networks on those good moves • Repeat from step 2.

  25. ABC scenario — Good move identification • For a given pair with player A and player B, suppose B is the loser. • Randomly assign a teacher C. • For every one of B's turns, C tells B what move C would make: • C's suggested move is the same as B's • C's suggested move is different from B's • Based on C's suggested move, A plays B again, with three possible outcomes: • Better: an understandable good move • The same • Worse • The set of good moves is all the understandable good moves
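The "understandable good move" test can be sketched roughly like this. It is a simplification of the procedure above; `replay_result` is a hypothetical helper standing in for replaying the game against A with C's suggestion substituted at one turn:

```python
def understandable_good_moves(b_moves, c_suggestions, replay_result):
    """Sketch of ABC-scenario good-move identification.

    b_moves, c_suggestions: parallel lists of the loser B's moves and
    teacher C's suggestions, one per B turn.
    replay_result(turn, move) -> change in B's final result when `move`
    replaces B's move at `turn` (positive = better for B; hypothetical).
    Returns the (turn, move) pairs worth training on.
    """
    good = []
    for turn, (b_move, c_move) in enumerate(zip(b_moves, c_suggestions)):
        # Only moves that differ AND demonstrably improve the outcome
        # count as "understandable good moves".
        if c_move != b_move and replay_result(turn, c_move) > 0:
            good.append((turn, c_move))
    return good
```

The cost of this check is the scenario's weakness: each differing suggestion requires a full replay, which matches the "unacceptably slow" verdict on the next slide.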

  26. ABC scenario — Results • Results • It took 1 week to get a best player from 3 randomly initialized players • The best player was then beaten by another randomly initialized player. • The speed of improvement slowed as performance increased. • Improvement: guaranteed. • Training speed: unacceptably slow

  27. Present scenario • Output representation: • Two papers: • Temporal Difference Learning of Position Evaluation in the Game of Go, Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski, Advances in Neural Information Processing 6, 1994 • Learning to evaluate GO positions via temporal difference methods, Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski, Soft Computing Techniques in Game Playing, 2000 • Each intersection has an output: a real number in [0, 1] • The likelihood of making a move => the likelihood of securing that intersection as black territory at the end of the game • Reinforcement learning • Good move identification: reinforcement learning identifies good moves automatically
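The temporal-difference flavor of this can be sketched as a plain TD(0) step on the per-intersection territory predictions. This is a simplification, not the talk's actual update: the real system (per the Schraudolph et al. papers) backpropagates this error through the network weights rather than editing the predictions directly.

```python
import numpy as np

def td0_update(values_t, values_t1, alpha=0.1, gamma=1.0):
    """One TD(0) step on per-intersection territory predictions.

    values_t, values_t1: arrays of the predicted probability, for each
    intersection, that it ends as black territory, at two successive
    positions. At the final position values_t1 is the true 0/1 outcome,
    which is what anchors the learning.
    """
    # Move the earlier prediction toward the bootstrapped later one.
    td_error = gamma * values_t1 - values_t
    return values_t + alpha * td_error
```

Because the end-of-game target is exact, errors are corrected backward through the game without any hand-labeled "good moves", which is what makes the identification automatic.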

  28. Present scenario — Results • Improvement: guaranteed. • Training speed: better than the ABC scenario, but still slow • Results • 5x5: • 3 - 4 hours of training: beats a random player 100% of the time • 1 - 2 weeks (168-336 h): comparable to GNU GO • Prediction accuracy is >90% once the board is >50% occupied • 7x7: after 1 month of training, GNU GO beats it without any difficulty

  29. Outline • Background: • What is GO? • Existing GO programs • IGB GO • Past work: • Three past scenarios • Present scenario • Discussion • Conclusion • Demo.

  30. Why better results • Old architecture: • Target is inconsistent • Target is harder to learn: spatial complexity 3^25 / 8 (105911076180.375) for 5x5 • Quality of training data is bad • New architecture: • Target is consistent, and at the end of the game it is the true target • Target correlates mainly with local information, so the complexity should be much less than 3^25 / 8 • Quality of training data is determined by the neural network itself

  31. Is the present arch. enough — search-time complexity

  32. Known Problems • Intrinsic hard problems: • No complexity bounds on the number of iterations needed to get a better player • The representation space is extremely large

  33. Known Problems — Technical • Temporary technical problems: • Lacks a position-level evaluation method • Unable to respond to some unusual cases correctly • Unable to AUTOMATICALLY identify the unusual cases, which will cause problems • Time complexity per iteration: • Play a match: O(n^6 W) • Learn a match: O(n^4 W) for TD(0), O(n^6 W) for Q-learning • (19/5)^6 ≈ 3011

  34. Bounds on iterations • Maybe exponential • Observation: • Human players: the complexity increases as the player's level increases. • Present implementation: same as above • Important to know • How fast does the complexity increase as the player's level increases?

  35. The complexity could be exponential • Suppose one player, or a small group of players, dominates the whole system • How much time is needed to obtain a better new player or a better group? • Repeating the experiment with the same amount of time gives a 50% chance of getting a better one, due to symmetry • At least exponential with base 2.

  36. Position-level performance evaluation • With it • The iteration bounds can be studied empirically • The evaluation results can be used to find a good tradeoff between performance and search space • Without it • Every method is trial and error, but there are infinitely many potential methods to try.

  37. Time complexity per iteration • Separate “play” and “learn” • A database of training data • Training data: • The best players play against each other • Online server • Manually find ways to beat the best player. • All players learn from the generated training data

  38. Unusual move identification • Difficulty • The search space is huge, so such moves are hard to identify automatically • Possible solution • Use a database to record all such moves once they appear • Can be implemented the same way as the training database

  39. Why it’s so hard • No method tackles the tough problems explicitly. • Key problems: • an extremely large search space • positions are hard to evaluate • The present strategy is to reduce the search space by improving the evaluation function.

  40. Why it’s so hard (cont.) • Reinforcement learning may not be enough • Nicol N. Schraudolph: 6 years without any observable progress • Arthur Samuel: “no progress has been made in overcoming [this defect]” (11 years, 1956-1967) (Blondie24, pp. 146-147) • The neural network may not learn • Why? The representation space is huge even for the last move • 90% occupied, 9x9 board, equal numbers of black and white stones • Solution • Generalization ability • Automatically identify features

  41. Lesson I • Ability to improve == the best? No. • The speed of improvement: • 5x5: 3 - 4 hours of training to beat a random player; 1 - 2 weeks (168-336 h) to be comparable to GNU GO • 7x7: after 1 month of training, GNU GO is still able to win.

  42. Lesson II • A deterministic function between input and output == the neural network can learn it without any difficulty? No. • It depends on the intrinsic complexity of the function • A neural network can only learn the correlation between the input and the output, as a result of hill climbing

  43. Conclusion • A self-learning GO program is possible, but several technically difficult problems remain: • Automatic feature discovery • Automatic learning from failure • Position-level performance evaluation

  44. Demo: http://contact.ics.uci.edu/go.html

  45. Thanks for coming
