
Evolution and Coevolution of ANNs playing Go

This article explores the evolution and coevolution of artificial neural networks (ANNs) playing the strategic board game of Go. It discusses the experimental setup, training of Go-playing ANNs, and the evolution of these systems. The game of Go, its rules, and its complexity are also covered.


Presentation Transcript


  1. Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004

  2. Outline • Computers and Games • The game of Go • Experimental Setup • Training of Go playing ANNs • Evolution of Go Playing ANNs • Summary and Outlook

  3. Games • Algorithms designed since AI's onset • Clearly defined rules • Still complex • Chess received the most attention • More researched than Go • Two main approaches • Rely on expertise – directly programmed weighted features; extensive knowledge • Use evolution – less knowledge; more versatility

  4. The game of Go • Oldest (unaltered) strategic board game in the world • 10,000,000 players in Japan “alone” • Fairly simple rules • BUT difficult to master • Immense game tree (~200 options per move) • Complex structures • Many concurrent goals

  5. Go Rules • 19x19 board • Empty in the beginning • Black & White “stones” • Black starts • Each turn • Place 1 stone • At an intersection • Never move stones • OR pass

  6. Go Rules [2] • Objective – get the most points! • Points are acquired by: • Securing territories • Capturing the opponent’s stones

  7. Go Rules [3] • Stones of the same color at vertically or horizontally adjacent intersections form a group • An empty intersection adjacent to a stone or group is called a "liberty" of that group • 1 liberty = group in “atari” • No liberties -> CAPTURE! The group is removed • Example – Black places a stone at X, resulting in the right figure
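
The liberty rule on this slide can be sketched in a few lines of Python. This is only an illustration: the board encoding (a list of strings with `B`, `W` and `.` for empty) and the function name `group_and_liberties` are assumptions, not part of the presentation.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]

def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    size = len(board)
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must point at a stone")
    stones: Set[Point] = set()
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in stones:
            continue
        stones.add((r, c))
        # Only vertical and horizontal neighbours count, never diagonals.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))
                elif board[nr][nc] == color:
                    stack.append((nr, nc))
    return stones, liberties

board = [
    ".....",
    ".WBW.",
    ".WBW.",
    "..W..",
    ".....",
]
stones, libs = group_and_liberties(board, (1, 2))
in_atari = len(libs) == 1      # exactly one liberty left
captured = len(libs) == 0      # no liberties: the group would be removed
```

In this position the two Black stones share a single liberty at (0, 2), so the group is in atari; a White stone placed there would capture it.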

  8. Go Rules [4] • Stones can be placed anywhere, but cannot commit suicide (except Chinese) • Legal if stone simultaneously captures opponent’s group (2 right figures) • Figure captions: Suicide – White cannot place at X; White CAN place at X; Result: capture

  9. Go Rules [5] • Same position cannot occur more than once • Endless repetitions: • Black can capture at upper figure by placing at X • White - same by placing at Y • Black – repeat… • Ko rule • White may not place at Y before playing somewhere else first • Avoid any repetitions

  10. Go Rules – Live and Dead groups • A group is “dead” if its capture cannot be prevented • The opponent does not need to actually capture it • Group remains on board • At end of game, removed and added to captured stones • “Living” groups are impossible to capture • Group with 2 “eyes” – even if White surrounds it, playing at X or Y is suicide • Opponent must play elsewhere

  11. Go Basics – End game • Play continues until both players pass • Players then alternately play stones at “neutral” points – adjacent to both White and Black • Also known as “dame” (DAH-MAY) • Dead stones are removed from the board and counted with other prisoners (1 point per prisoner) • Also - 1 point for each intersection surrounded by player’s stones (“territory”)

  12. Go Basics – End game example • Prisoners were removed already • All 4 points marked X are dame – worthless • Black has • 7 points in UR (territory); 2 points in LL • 1 removed prisoner • TOTAL = 10 points • White has • 5 in UL; 2 in LR • 2 prisoners • TOTAL = 9 points • Black wins unless komi (5.5 pts compensation) is due

  13. Ranking and Handicaps • Determine Go players’ strength • Resemblance to martial arts • Both amateur and professional ranking system • Amateur • 35 kyu to 1 kyu • THEN 1 dan to 7 dan • Pro • 1 dan to 9 dan • Awarded only by Go institutions • Pro dans are much stronger than amateur dans

  14. Ranking and Handicaps (2) • Handicaps • Weaker player starts with several stones on the board • Placed at specific places • Helps make games more even • Difference in ranks ~ number of handicap stones needed to win • 2 stones to even 2 dan against 4 dan • 4 to even 3 kyu and 2 dan • The most powerful Go programs reach only … • … 10 kyu!

  15. Outline • Computers and Games • The game of Go • Experimental Setup • Training of Go playing ANNs • Evolution of Go Playing ANNs • Summary and Outlook

  16. Experimental Setup • Opponent Go players • ANN player • Go board (input) representations • Move (output) representations • Coevolution • Hall of Fame coevolution • Cultural coevolution • General evolution setup

  17. Go Players - Random • No strategy • May also pass • “Knows” only the rules of Go • legality of moves • Usually weakest opponent

  18. Go Players – Naïve Player • Roughly human-beginner level • Able to save and capture stones • Knows about • Lost stones • Saving - connecting stones to living groups • Weak stones (not savable)

  19. Go Players – Naïve Strategy • A subset of JaGo’s (main opponent) strategy • Outline (arranged by priority): • Attempt to save • Try to put opponent into atari • Connect weak stones • Capture opponent groups in atari • Check intersections for placing stones • In random order • Make sure no (own) liberties decrease below 2 as a result • Perform Random move

  20. Go Players – JaGo Player • Java based program • Best computer player used • Not a strong player ~16 kyu • Knows standard techniques • Mainly save & capture • Uses pattern matching • Looks at entire board • 32 patterns, with rotations and mirrors

  21. Go Players – JaGo Strategy (1) • Save stones in atari • Try to decrease liberties of large groups • Find own savable larger groups • Attack opponent’s groups (decreasing order:) • With 2 or more liberties and attackable • With 2 or more stones & less than 3 liberties • With 2 or less liberties

  22. Go Players – JaGo Strategy (2) • Save own groups with few liberties if savable • Start pattern matching – Response; Center • Random move order • Seek opponent’s groups to capture in 2 moves • Perform random move which isn’t of a bad pattern • Capture opponent’s single liberties • Connect own weak stones • PASS

  23. Go Players – JaGo Patterns (1)

  24. Go Players – JaGo Patterns (2)

  25. Go Players – GNU Go • Advantages • 5x5 to 19x19 boards • Handles handicaps well • Rated 10 kyu • Problems • 5x5 is solved – opening at C3 gives 18.5 points (komi = 5.5) – Black always wins • GNU Go passes on B3, C2-4, D3 (only correct at C3) • Premature convergence of evolution

  26. ANN Player • Inform ANN about actual position • Evaluate ANN output to receive next move • Representation is important! • Intention maps • For each Go move (including PASS) – value in [0,1] • High value – high intention to make move (and vice versa) • Select legal move with highest value • To avoid predictability – consider suboptimal moves also (“creativity factor”)
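
Move selection from an intention map might look like the sketch below. The slide does not say how the “creativity factor” works, so the rule used here (with probability `creativity`, play a random legal move other than the best one) is an assumption, as are the names `select_move` and `PASS`.

```python
import random

PASS = "pass"  # hypothetical marker for the pass move

def select_move(intentions, legal_moves, creativity=0.0, rng=random):
    """Pick the legal move with the highest intention value.

    With probability `creativity` (an assumed mechanism, not taken from the
    slides) a random legal move other than the best one is returned instead.
    """
    ranked = sorted(legal_moves, key=lambda m: intentions[m], reverse=True)
    if not ranked:
        return PASS
    if len(ranked) > 1 and rng.random() < creativity:
        return rng.choice(ranked[1:])
    return ranked[0]

# Intention values in [0, 1], one per move including PASS.
intentions = {(0, 0): 0.9, (0, 1): 0.7, (1, 1): 0.4, PASS: 0.2}
legal = [(0, 1), (1, 1), PASS]   # (0, 0) is assumed occupied, hence illegal
best = select_move(intentions, legal)
creative = select_move(intentions, legal, creativity=1.0, rng=random.Random(1))
```

The highest-valued move (0, 0) is skipped because it is not legal, so the network's second choice is played.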

  27. Player Strength • To receive a rating, unrated Go players commonly play against rated players (as in chess) • The strength s of a player is determined by • The score of 1000 double games • Against each of 3 opponents: R, N, JaGo • Divided by the number of games (6,000) • 1 is perfect strength • 3 opponents help resist over-fitting
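
The arithmetic behind the strength value can be checked with a short sketch. It assumes that the “score” is simply the number of games won (one point per win) and that a double game means one game with each color; the win counts below are made up for illustration.

```python
def strength(wins_per_opponent, double_games=1000):
    """Fraction of games won over all opponents (1.0 = perfect strength)."""
    # Each double game is two games: one as Black, one as White.
    total_games = 2 * double_games * len(wins_per_opponent)
    return sum(wins_per_opponent.values()) / total_games

# Hypothetical win counts out of 2,000 games against each opponent.
example = {"Random": 1800, "Naive": 1200, "JaGo": 600}
s = strength(example)            # 3600 wins out of 6000 games
perfect = strength({"Random": 2000, "Naive": 2000, "JaGo": 2000})
```

With three opponents and 1000 double games each, the denominator is the 6,000 games named on the slide.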

  28. Player Competence • Strength is not understanding of rules (legality) • E.g. 2 players receive the same score but only one always tried legal moves first • The competence C of a player is defined as follows: • bi = games; i = moves; tij = #tried illegal moves; kij = #possible illegal moves • C is averaged over all games

  29. Board Representations • 19x19 boards • far too large • Even for evolved agents • Use only 5x5 boards

  30. Board Representations • Should preprocess position to make the ANN's life easier • Tested in training experiments • Standard Input Representation (SIR) • 2 neurons at each intersection: • 1 per player’s piece; 1 per opponent’s • No distinction between B and W stones • Optional – 1 neuron to tell if B or W • (2*b^2) neurons (where b is board size) = 50

  31. Representations - NIR • Naïve Input Representation • More compact • 1 neuron per intersection • Set to -1 (player’s stone) or 1 (opponent’s) • 0 if empty • Uses half of SIR's neurons = 25
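
The two whole-board encodings can be sketched as follows. The board format (list of strings), the function names, and the 0/1 activation values for SIR are assumptions; the slides only give the neuron counts and, for NIR, the values -1, 0 and 1.

```python
def encode_sir(board, player):
    """Standard Input Representation: 2 neurons per intersection.

    First neuron: stone of the player to move; second: opponent's stone.
    """
    opponent = "W" if player == "B" else "B"
    out = []
    for row in board:
        for cell in row:
            out.append(1.0 if cell == player else 0.0)
            out.append(1.0 if cell == opponent else 0.0)
    return out

def encode_nir(board, player):
    """Naive Input Representation: -1 own stone, 1 opponent's stone, 0 empty."""
    opponent = "W" if player == "B" else "B"
    values = {player: -1.0, opponent: 1.0, ".": 0.0}
    return [values[cell] for row in board for cell in row]

board = [
    "B....",
    ".W...",
    ".....",
    ".....",
    ".....",
]
sir = encode_sir(board, "B")   # 2 * 5^2 = 50 inputs
nir = encode_nir(board, "B")   # 5^2 = 25 inputs
```

Because both encodings are relative to the player to move, the same network can play either color without an extra input.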

  32. Representations - LVIR • Limited View Input Representation • Splits the Go board into several square areas of size 3x3 • Idea – simplest way of capturing stones works within this area • E.g. capture of 1 stone by surrounding it • Areas overlap at middle row and middle column • Coding – similar to SIR • w is number of areas (=4) • 72 Neurons • Could also use naïve coding
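
A rough sketch of how the 5x5 board splits into four overlapping 3x3 areas, and why SIR-style coding gives 72 neurons. The function name `lvir_areas` and the string-based board are assumptions made for this example.

```python
def lvir_areas(board, area=3):
    """Cut the board into square areas that overlap by one row/column."""
    size = len(board)
    step = area - 1                      # neighbouring areas share one line
    corners = range(0, size - area + 1, step)
    return [
        [row[c:c + area] for row in board[r:r + area]]
        for r in corners
        for c in corners
    ]

board = [
    "B....",
    ".W...",
    "..B..",
    "...W.",
    "....B",
]
areas = lvir_areas(board)
# SIR-style coding: 2 neurons per intersection in every area.
neurons = len(areas) * 3 * 3 * 2
```

On a 5x5 board the centre intersection belongs to all four areas, which is the overlap at the middle row and column mentioned on the slide.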

  33. Clever Representations • Based on image processing and circuits • We want fewer raw inputs to allow the ANN to concentrate more on features • Manhattan distance • Used in integrated circuits where wires run parallel to X or Y axis • Got its name from Manhattan NY, where streets are aligned in a grid • P1 = (x1, x2) • P2 = (y1, y2) • d(P1, P2) = |x1 − y1| + |x2 − y2|

  34. Clever Representations • Manhattan distance is related to distance of Go stones (no diagonals) • distance = [#(separating stones) + 1] • 1 if next to each other • 2 if separated by one stone • 3 for knight’s move or two separating stones
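
The three example distances on this slide can be checked directly. The helper below is a minimal sketch using (row, column) tuples, which is an assumed coordinate convention.

```python
def manhattan(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

adjacent = manhattan((2, 2), (2, 3))       # stones next to each other
one_between = manhattan((2, 2), (2, 4))    # one intersection in between
knights_move = manhattan((2, 2), (3, 4))   # one step one way, two the other
two_between = manhattan((2, 2), (2, 5))    # two intersections in between
```

Diagonal neighbours get distance 2, which matches the Go notion that diagonal stones are not directly connected.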

  35. Representations: c-o-Matrix • Co-occurrence matrices • Used in image processing • Many parameters are derived from them • Mean, SD, energy, contrast, homogeneity, … • Square matrix • Based on a relation p between image positions (symmetric if p is)

  36. Representations: c-o-Matrix • Elements C[i][j] = • Number of times a pixel of value (color) i occurs in the image • In the relation specified by p • To a pixel of value (color) j • Size is number of different colors

  37. Representations: c-o-Matrix • A Go board is an “image” with 3 different colors (including empty) • Example • p1: Manhattan distance of 1 between 2 points • First matrix row: • B near B 16 times • B near W 3 times • B near empty 11 times
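
A co-occurrence matrix for a Go board under the relation “Manhattan distance of 1” might be computed as in this sketch. Counting ordered pairs (so that B-B neighbours are counted once from each side) is an assumption about the convention; the slides do not spell it out. The small 3x3 board is made up for illustration.

```python
COLORS = "BW."   # black, white, empty: the 3 "colors" of the board image

def cooccurrence(board, distance=1):
    """C[i][j] = number of ordered point pairs (a, b) with color(a) = i,
    color(b) = j and Manhattan distance `distance` between a and b."""
    size = len(board)
    index = {color: i for i, color in enumerate(COLORS)}
    matrix = [[0] * len(COLORS) for _ in COLORS]
    points = [(r, c) for r in range(size) for c in range(size)]
    for r1, c1 in points:
        for r2, c2 in points:
            if abs(r1 - r2) + abs(c1 - c2) == distance:
                matrix[index[board[r1][c1]]][index[board[r2][c2]]] += 1
    return matrix

board = [
    "BB.",
    "W..",
    "...",
]
C = cooccurrence(board)
first_row = C[0]   # B near B, B near W, B near empty
```

Since the distance relation is symmetric, the resulting matrix is symmetric as well, as noted on the earlier slide.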

  38. Representations: c-o-Matrix • Does not say much about absolute positions – must combine • SIR and C for whole board • NIR and C for whole board • NIR and Cs for 3x3 areas • sLVIR and Cs for 3x3 areas • NLVIR and Cs for 3x3 areas

  39. Output Representations • Only 2 • Standard Output Representation (SOR) • Each intersection is represented by 1 neuron • 1 for PASS • (b^2 + 1) neurons

  40. Output Representations • Row Column Output Representation (RCOR) • Used to decrease ANN size • 5 neurons for columns; 5 for rows • 1 for PASS • (2b + 1) neurons • Intention more complicated: • PASS intention is square of relevant neuron • RCOR Limits intention map: • v1>v2  y1>y2  v4>v3 • All values positive, non-zero
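
One way to turn RCOR outputs into an intention map is sketched below. The slide does not state how row and column neurons are combined; multiplying them is an assumption, chosen because it fits the remark that the PASS intention is the square of its neuron. All output values here are invented.

```python
def rcor_intentions(rows, cols, pass_value):
    """Build a b x b intention map from b row and b column outputs.

    Assumption: intention(r, c) = rows[r] * cols[c]; PASS = pass_value ** 2.
    """
    board = [[r * c for c in cols] for r in rows]
    return board, pass_value ** 2

rows = [0.9, 0.5, 0.2, 0.6, 0.1]   # 5 row neurons (all positive)
cols = [0.3, 0.8, 0.4, 0.7, 0.5]   # 5 column neurons (all positive)
board, pass_intention = rcor_intentions(rows, cols, 0.5)
neuron_count = len(rows) + len(cols) + 1   # 2b + 1 = 11 for b = 5
```

Under this assumed product rule the map is limited: if one row neuron is larger than another, every intersection in the first row outranks the intersection in the same column of the second row, so not every ordering of moves can be expressed.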

  41. Coevolution • Derives non-static fitness, as in nature • 1 or more populations; interacting • Competitive [battle] vs. Cooperative [subtasks] • Advantages • “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists • Variety in fitness – adaptive opponents • No upper bound for improvement

  42. Coevolution Methods Applied • Based on work by Lubberts & Miikkulainen [2001] • Hall of Fame • Host population and Master population • Maintaining the ability of host population to beat opponents of previous generations • Each generation, the best individual is added to HoF • The whole population competes against a sample of the HoF

  43. Coevolution - HoF • Applied in this research • HoF initially filled without competition • Individuals get their fitness by competing against the masters • When full - host with highest win rates (against masters) joins HoF • Replace first Master to lose all games • Coevolutionary progress cannot be directly seen • Both populations constantly changing

  44. Cultural Coevolution • A new approach! • Maintains “culture” of masters resembling HoF • To enter culture, host must defeat all masters • Masters never replaced – unlimited culture size • Every individual receives a fitness score by competing against all masters • Culture growth rate decreases rapidly • Every new master is the strongest found (yet)
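
The admission rule for the culture can be sketched in a few lines. Players are stand-in numbers and `beats` is a toy comparison; both are assumptions, since a real run would play Go games between networks. How the first master is chosen is not stated on the slide, so this sketch simply lets the first host enter an empty culture.

```python
def try_enter_culture(culture, host, beats):
    """Add `host` to the culture only if it defeats every current master."""
    if all(beats(host, master) for master in culture):
        culture.append(host)     # masters are never replaced or removed
        return True
    return False

def cultural_fitness(individual, culture, beats):
    """Fraction of masters that `individual` defeats."""
    if not culture:
        return 0.0
    return sum(1 for m in culture if beats(individual, m)) / len(culture)

# Toy stand-in: a player is a number and strictly higher numbers win.
beats = lambda a, b: a > b

culture = []
entered = [try_enter_culture(culture, host, beats) for host in (3, 2, 3, 5)]
fitness_of_4 = cultural_fitness(4, culture, beats)
```

The second attempt by player 3 fails because it would have to defeat itself, and each new master is at least as strong as any found before.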

  45. Cultural Coevolution [2] • Numerous advantages • Maintains ability to defeat weak players • Keeps good solutions found • Same player cannot enter twice • Needs to defeat itself • Culture’s performance never decreases • Avoid focusing on a specific player’s weakness • As soon as any master is immune, the hosts have to find another way • More masters → less likely to remember all weaknesses

  46. General Evolution Setup • Opponents – Random; Naïve; JaGo • Fitness = strength • Rate of wins against all 3 opponents • 6,000 games of both colors • Not using scores, only win rates • Defeating more opponents is better • Generalized Multi-Layer Perceptrons (GMLPs) • All non-loop connections are permitted • Evolving • Hidden neurons; connections; weights; bias (for non-input)

  47. General Evolution Setup [2] • 2 binary Chromosomes used • 1 for connections : 0-no 1-yes • 1 for hidden neurons (if 0, no connections also) • Number of possible connections: • ni, nh, no – number of input, hidden and output neurons • Determines size of chromosome • Real-Chromosome • Weights & Bias values (seen as weights) • Size is number of connections + number of bias vals (for non-input)

  48. General Evolution Setup [3] • Tournament selection (size 2) • 2 point crossover • Binary mutation • Flip bits with 1/L probability • Real-Chromosome Mutation • multiple-σSA • Each object maintains altering “strategy” params which alter distribution of “object” params • Normal distributions used for both
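
The operators for the binary chromosomes can be sketched as below. This is a generic illustration of tournament selection of size 2, 2-point crossover and bit-flip mutation with probability 1/L, not the author's code; function names, the fitness function and the seed are assumptions, and the self-adaptive mutation of the real-valued chromosome is left out.

```python
import random

def tournament_select(population, fitness, rng, size=2):
    """Draw `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(population, size)
    return max(contenders, key=fitness)

def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def bit_flip_mutation(bits, rng):
    """Flip each bit independently with probability 1/L (L = length)."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]

rng = random.Random(0)                     # fixed seed, for repeatability
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
parent_a = tournament_select(population, sum, rng)   # toy fitness: count of 1s
parent_b = tournament_select(population, sum, rng)
child_a, child_b = two_point_crossover([0] * 8, [1] * 8, rng)
mutant = bit_flip_mutation(child_a, rng)
```

With a mutation probability of 1/L, one bit per chromosome is flipped on average, independent of the chromosome length.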

  49. Setup – Recurrent Nets • Difficult to learn Go without structured input • Experiments with recurrent nets included • Allow loops for input neurons • Naturally represent adjacent board intersections • No hidden neurons • Played against JaGo • Typically output changes without input change due to feedback loops • Computed output only once! • Only 2 directly connected neurons influence each other • Evolution should connect only nearby neurons

  50. Outline • Computers and Games • The game of Go • Experimental Setup • Training of Go playing ANNs • Evolution of Go Playing ANNs • Summary and Outlook
