This article explores the evolution and coevolution of artificial neural networks (ANNs) playing the strategic board game of Go. It discusses the experimental setup, training of Go-playing ANNs, and the evolution of these systems. The game of Go, its rules, and its complexity are also covered.
Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004
Outline • Computers and Games • The game of Go • Experimental Setup • Training of Go playing ANNs • Evolution of Go Playing ANNs • Summary and Outlook
Games • Algorithms designed since AI's onset • Clearly defined rules • Still complex • Chess received the most attention • More researched than Go • Two main approaches • Rely on expertise – directly programmed weighted features; extensive knowledge • Use evolution – less knowledge; more versatility
The game of Go • Oldest (unaltered) strategic board game in the world • 10,000,000 players in Japan “alone” • Fairly simple rules • BUT difficult to master • Immense game tree (~200 options per move) • Complex structures • Many concurrent goals
Go Rules • 19x19 board • Empty in the beginning • Black & White “stones” • Black starts • Each turn • Place 1 stone • At an intersection • Never move stones • OR pass
Go Rules [2] • Objective - Get the most points! • Points are acquired by: • Securing territories • Capturing the opponent’s stones
Go Rules [3] • Stones of the same color at vertically or horizontally adjacent intersections form a group • An empty intersection adjacent to a stone or group is called a "liberty" of that group • 1 liberty = group in “atari” • No liberties -> CAPTURE! Group is removed • Example – Black places stone at X, resulting in right figure
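The group and liberty definitions above can be sketched as a flood fill. This is a minimal illustration, not the implementation used in the experiments; the board encoding (a list of strings with `B`, `W`, `.`) and the function names are assumptions made for this example.

```python
from typing import Iterator, List, Set, Tuple

Point = Tuple[int, int]
EMPTY = "."  # assumed encoding: "B" black, "W" white, "." empty


def neighbors(p: Point, size: int) -> Iterator[Point]:
    """Yield the vertically and horizontally adjacent intersections of p."""
    r, c = p
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Return the group containing the stone at `start` and that group's liberties."""
    size = len(board)
    color = board[start[0]][start[1]]
    if color == EMPTY:
        raise ValueError("start must be an occupied intersection")
    group, liberties = {start}, set()
    stack = [start]
    while stack:
        p = stack.pop()
        for n in neighbors(p, size):
            value = board[n[0]][n[1]]
            if value == EMPTY:
                liberties.add(n)          # empty neighbor: a liberty of the group
            elif value == color and n not in group:
                group.add(n)              # same color: part of the same group
                stack.append(n)
    return group, liberties


board = [
    "..W..",
    ".WBW.",
    ".WB..",
    "..W..",
    ".....",
]
group, liberties = group_and_liberties(board, (1, 2))
in_atari = len(liberties) == 1   # exactly one liberty left
captured = len(liberties) == 0   # no liberties: the group would be removed
```

In this position the two black stones share a single liberty, so the group is in atari; a white stone on that point would capture it.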
Go Rules [4] • Stones can be placed anywhere, but cannot commit suicide (except Chinese) • Legal if stone simultaneously captures opponent’s group (2 right figures) • Figure captions: Suicide – White cannot place at X; White CAN place at X; Result: capture
Go Rules [5] • Same position cannot occur more than once • Endless repetitions: • Black can capture at upper figure by placing at X • White - same by placing at Y • Black – repeat… • Ko rule • White may not place at Y before playing somewhere else first • Avoid any repetitions
Go Rules – Live and Dead groups • A group is “dead” if its capture cannot be prevented • The opponent does not need to actually capture it • Group remains on board • At end of game, removed and added to captured stones • “Living” groups are impossible to capture • Group with 2 “eyes” – even if White surrounds it, playing at X or Y is suicide • Opponent must play elsewhere
Go Basics – End game • Play continues until both players pass • Players then alternately play stones at “neutral” points – adjacent to both White and Black • Also known as “dame” (DAH-MAY) • Dead stones are removed from the board and counted with other prisoners (1 point per prisoner) • Also - 1 point for each intersection surrounded by player’s stones (“territory”)
Go Basics – End game example • Prisoners were removed already • All 4 points marked X are dame – worthless • Black has • 7 points in UR (territory); 2 points in LL • 1 removed prisoner • TOTAL = 10 points • White has • 5 in UL; 2 in LR • 2 prisoners • TOTAL = 9 points • Black wins unless komi (5.5 pts compensation) is due
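The arithmetic of this example can be checked with a few lines. The helper `final_score` is a hypothetical name introduced here; it only adds territory, prisoners and an optional komi, as described on the slide.

```python
from typing import Sequence


def final_score(territories: Sequence[int], prisoners: int, komi: float = 0.0) -> float:
    """Territory points plus prisoners plus komi (komi is given to White only)."""
    return sum(territories) + prisoners + komi


black = final_score([7, 2], prisoners=1)                      # upper right, lower left
white = final_score([5, 2], prisoners=2)                      # upper left, lower right
white_with_komi = final_score([5, 2], prisoners=2, komi=5.5)  # if komi is due

black_wins_without_komi = black > white
black_wins_with_komi = black > white_with_komi
```

Without komi Black wins by one point; with the 5.5-point compensation the result flips in White's favor.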
Ranking and Handicaps • Determine Go players’ strength • Resemblance to martial arts • Both amateur and professional ranking system • Amateur • 35 kyu to 1 kyu • THEN 1 dan to 7 dan • Pro • 1 dan to 9 dan • Awarded only by Go institutions • Pro dans are much stronger than amateur dans
Ranking and Handicaps (2) • Handicaps • Weaker player starts with several stones on the board • Placed at specific places • Helps make games more even • Difference in ranks ~ number of handicap stones needed for an even game • 2 stones to even 2 dan against 4 dan • 4 to even 3 kyu against 2 dan • The most powerful Go programs reach only … • … 10 kyu!
Experimental Setup • Opponent Go players • ANN player • Go board (input) representations • Move (output) representations • Coevolution • Hall of Fame coevolution • Cultural coevolution • General evolution setup
Go Players - Random • No strategy • May also play the pass move • “Knows” only the rules of Go • Legality of moves • Usually the weakest opponent
Go Players – Naïve Player • Roughly human-beginner level • Able to save and capture stones • Knows about • Lost stones • Saving - connecting stones to living groups • Weak stones (not savable)
Go Players – Naïve Strategy • A subset of JaGo’s (main opponent) strategy • Outline (arranged by priority): • Attempt to save • Try to put opponent into atari • Connect weak stones • Capture opponent groups in atari • Check intersections for placing stones • In random order • Make sure no (own) liberties decrease below 2 as a result • Perform Random move
Go Players – JaGo Player • Java based program • Best computer player used • Not a strong player ~16 kyu • Knows standard techniques • Mainly save & capture • Uses pattern matching • Looks at entire board • 32 patterns, with rotations and mirrors
Go Players – JaGo Strategy (1) • Save stones in atari • Try to decrease liberties of large groups • Find own savable larger groups • Attack opponent’s groups (decreasing order:) • With 2 or more liberties and attackable • With 2 or more stones & fewer than 3 liberties • With 2 or fewer liberties
Go Players – JaGo Strategy (2) • Save own groups with few liberties if savable • Start pattern matching – Response; Center • Random move order • Seek opponent’s groups to capture in 2 moves • Perform random move which isn’t of a bad pattern • Capture opponent’s single liberties • Connect own weak stones • PASS
Go Players – GNU Go • Advantages • 5x5 to 19x19 boards • Handles handicaps well • Rated 10 kyu • Problems • 5x5 is solved – open at C3 for 18.5 points (komi=5.5) – Black always wins • GNU Go passes on B3, C2-4, D3 (only correct at C3) • Premature convergence of evolution
ANN Player • Inform ANN about actual position • Evaluate ANN output to receive next move • Representation is important! • Intention maps • For each Go move (including PASS) – value in [0,1] • High value – high intention to make move (and vice versa) • Select legal move with highest value • To avoid predictability – consider suboptimal moves also (“creativity factor”)
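Move selection from an intention map can be sketched as follows. The slide does not say how the “creativity factor” is applied, so the rule used here (with probability `creativity`, play a random legal move other than the best one) is purely an assumption, as are the function and parameter names.

```python
import random
from typing import Dict, Hashable, Iterable, Optional


def select_move(
    intentions: Dict[Hashable, float],
    legal_moves: Iterable[Hashable],
    creativity: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Pick the legal move with the highest intention value.

    Assumption: with probability `creativity` a random suboptimal legal
    move is returned instead, to make the player less predictable.
    """
    rng = rng or random.Random()
    ranked = sorted(legal_moves, key=lambda m: intentions[m], reverse=True)
    if not ranked:
        raise ValueError("there is always at least one legal move (PASS)")
    if len(ranked) > 1 and rng.random() < creativity:
        return rng.choice(ranked[1:])   # any legal move except the best one
    return ranked[0]


# Intention values in [0, 1] for two intersections and PASS.
intentions = {(0, 0): 0.9, (0, 1): 0.7, "PASS": 0.1}
legal = [(0, 1), "PASS"]            # (0, 0) is assumed illegal in this position
best = select_move(intentions, legal)
creative = select_move(intentions, legal, creativity=1.0, rng=random.Random(0))
```

Illegal moves are filtered out before ranking, so a high intention on an illegal intersection never produces a move; it only matters for the competence measure.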
Player Strength • Unrated Go players commonly receive a rating by playing against rated players (same in Chess) • The strength s of a player is determined by • The score of 1,000 double games (one game with each color) • Against each of 3 opponents: Random (R), Naïve (N), JaGo • Divided by the number of games (6,000) • 1 is perfect strength • 3 opponents help resist over-fitting
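A small sketch of the strength computation, assuming that “score” means the number of games won (the evolution setup later states that only win rates are used). The function name and the dictionary layout are illustrative.

```python
from typing import Dict


def strength(wins: Dict[str, int], double_games: int = 1000) -> float:
    """Fraction of games won over all opponents; 1.0 is perfect strength.

    Assumption: a double game is one game with each color, so every
    opponent contributes 2 * double_games games.
    """
    games_per_opponent = 2 * double_games
    for opponent, won in wins.items():
        if not 0 <= won <= games_per_opponent:
            raise ValueError(f"invalid win count for {opponent}: {won}")
    total_games = games_per_opponent * len(wins)   # 6,000 for three opponents
    return sum(wins.values()) / total_games


s = strength({"Random": 2000, "Naive": 1500, "JaGo": 500})
perfect = strength({"Random": 2000, "Naive": 2000, "JaGo": 2000})
```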
Player Competence • Strength does not measure understanding of the rules (legality) • E.g. 2 players receive the same score but only one always tried legal moves first • The competence C of a player is defined as follows: • bi = games; i = moves; tij = #tried illegal moves; kij = #possible illegal moves • C is averaged over all games
Board Representations • 19x19 boards • far too large • Even for evolved agents • Use only 5x5 boards
Board Representations • Should preprocess position to make the ANN's life easier • Tested in training experiments • Standard Input Representation (SIR) • 2 neurons at each intersection: • 1 for the player’s stone; 1 for the opponent’s • No distinction between B and W stones • Optional – 1 neuron to tell if B or W • (2*b^2) neurons (where b is board size) = 50
Representations - NIR • Naïve Input Representation • More compact • 1 neuron per intersection • Set to -1 (player’s stone) or 1 (opponent’s) • 0 if empty • Uses half of SIR's neurons = 25
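The two encodings can be compared side by side. This is a sketch under assumed conventions: the board is a list of strings with `B`, `W`, `.`, and `me` is the color of the player to move; the optional color neuron of SIR is left out.

```python
from typing import List

EMPTY = "."  # assumed encoding: "B" black, "W" white, "." empty


def encode_sir(board: List[str], me: str) -> List[float]:
    """Standard Input Representation: [own stone, opponent stone] per intersection."""
    inputs = []
    for row in board:
        for cell in row:
            inputs.append(1.0 if cell == me else 0.0)
            inputs.append(1.0 if cell not in (EMPTY, me) else 0.0)
    return inputs


def encode_nir(board: List[str], me: str) -> List[float]:
    """Naive Input Representation: -1 own stone, 1 opponent stone, 0 empty."""
    inputs = []
    for row in board:
        for cell in row:
            if cell == EMPTY:
                inputs.append(0.0)
            else:
                inputs.append(-1.0 if cell == me else 1.0)
    return inputs


board = [
    "BW...",
    ".....",
    ".....",
    ".....",
    ".....",
]
sir = encode_sir(board, me="B")   # 2 * 5^2 = 50 inputs
nir = encode_nir(board, me="B")   # 5^2 = 25 inputs
```

Because both encodings are relative to the player to move, the same network can play either color without a separate color input.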
Representations - LVIR • Limited View Input Representation • Splits the Go board into several square areas of size 3x3 • Idea – the simplest way of capturing stones works within this area • E.g. capture of 1 stone by surrounding it • Areas overlap at middle row and middle column • Coding – similar to SIR • w is number of areas (=4) • 72 neurons • Could also be Naïve
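A sketch of how the 72 inputs arise on a 5x5 board: four 3x3 windows anchored in the corners, each coded with two neurons per intersection as in SIR. The window offsets and the ordering of the inputs are assumptions for this example.

```python
from typing import List, Tuple

EMPTY = "."  # assumed encoding: "B" black, "W" white, "." empty


def lvir_windows(board_size: int = 5, area: int = 3) -> List[Tuple[int, int]]:
    """Top-left corners of the areas; on 5x5 they overlap at row and column 2."""
    offsets = [0, board_size - area]
    return [(r, c) for r in offsets for c in offsets]


def encode_lvir(board: List[str], me: str, area: int = 3) -> List[float]:
    """SIR-style coding (own, opponent) applied to each 3x3 area in turn."""
    inputs = []
    for r0, c0 in lvir_windows(len(board), area):
        for r in range(r0, r0 + area):
            for c in range(c0, c0 + area):
                cell = board[r][c]
                inputs.append(1.0 if cell == me else 0.0)
                inputs.append(1.0 if cell not in (EMPTY, me) else 0.0)
    return inputs


board = [
    ".....",
    ".....",
    "..B..",
    ".....",
    ".....",
]
vec = encode_lvir(board, me="B")   # 4 areas * 9 intersections * 2 neurons = 72
```

The center stone lies in the overlap of all four areas, so it activates four input neurons, whereas a corner stone would activate only one.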
Clever Representations • Based on image processing and circuits • We want fewer raw inputs to allow the ANN to concentrate more on features • Manhattan distance • Used in integrated circuits where wires run parallel to X or Y axis • Got its name from Manhattan NY, where streets are aligned in a grid • P1 = (x1, x2) • P2 = (y1, y2) • d(P1, P2) = |x1 - y1| + |x2 - y2|
Clever Representations • Manhattan distance is related to distance of Go stones (no diagonals) • distance = [#(separating stones) + 1] • 1 if next to each other • 2 if separated by one stone • 3 for knight’s move or two separating stones
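The distances listed above follow directly from the Manhattan metric on board coordinates. A minimal check, with (row, column) tuples as an assumed coordinate convention:

```python
from typing import Tuple

Point = Tuple[int, int]


def manhattan(p1: Point, p2: Point) -> int:
    """Sum of the absolute coordinate differences (no diagonal steps)."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])


adjacent = manhattan((2, 2), (2, 3))         # stones next to each other
one_between = manhattan((2, 2), (2, 4))      # one separating intersection
knights_move = manhattan((2, 2), (3, 4))     # one step one way, two the other
two_between = manhattan((2, 2), (2, 5))      # two separating intersections
```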
Representations: c-o-Matrix • Co-occurrence matrices • Used in image processing • Many parameters are derived from it • Mean, standard deviation, energy, contrast, homogeneity, … • Square matrix • Based on a relation p between image positions (symmetric if p is)
Representations: c-o-Matrix • Element C[i][j] = number of times a pixel of value (color) i occurs in the image in the relation specified by p to a pixel of value j • Size is the number of different colors
Representations: c-o-Matrix • An actual go board is an “image” with 3 different colors (including empty) • Example • p1: Manhattan distance of 1 between 2 points • First matrix row: • B near B 16 times • B near W 3 times • B near empty 11 times
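A co-occurrence matrix for the relation “Manhattan distance of 1” can be computed as sketched below. The board here is a made-up 5x5 position (not the one from the slide figure), ordered pairs are counted so that the matrix is symmetric, and the color order B, W, empty is an assumption.

```python
from typing import List

COLORS = "BW."  # assumed order of rows and columns: black, white, empty


def cooccurrence(board: List[str], distance: int = 1) -> List[List[int]]:
    """C[i][j] = number of ordered point pairs (p, q) at the given Manhattan
    distance where p has color i and q has color j."""
    index = {color: i for i, color in enumerate(COLORS)}
    matrix = [[0] * len(COLORS) for _ in COLORS]
    points = [(r, c) for r in range(len(board)) for c in range(len(board[0]))]
    for pr, pc in points:
        for qr, qc in points:
            if abs(pr - qr) + abs(pc - qc) == distance:
                matrix[index[board[pr][pc]]][index[board[qr][qc]]] += 1
    return matrix


board = [
    "BB...",
    "BW...",
    ".....",
    ".....",
    ".....",
]
C = cooccurrence(board)
black_row = C[0]   # [B near B, B near W, B near empty]
```

The matrix summarizes how stones of each color neighbor each other but discards where on the board this happens, which is why it is combined with a positional representation.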
Representations: c-o-Matrix • Does not say much about absolute positions – must combine • SIR and C for whole board • NIR and C for whole board • NIR and Cs for 3x3 areas • sLVIR and Cs for 3x3 areas • NLVIR and Cs for 3x3 areas
Output Representations • Only 2 • Standard Output Representation (SOR) • Each intersection is represented by 1 neuron • 1 for PASS • (b^2 + 1) neurons
Output Representations • Row Column Output Representation (RCOR) • Used to decrease ANN size • 5 neurons for columns; 5 for rows • 1 for PASS • (2b + 1) neurons • Intention more complicated: • PASS intention is square of relevant neuron • RCOR Limits intention map: • v1>v2 y1>y2 v4>v3 • All values positive, non-zero
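One way to turn the 2b + 1 RCOR outputs into an intention map is sketched here. The slide only states that the PASS intention is the square of its neuron; taking the intention of an intersection as the product of its row and column neurons is an assumption made for this example, chosen because it puts both kinds of value on the same scale.

```python
from typing import Dict, Hashable, List


def rcor_intentions(rows: List[float], cols: List[float], pass_neuron: float) -> Dict[Hashable, float]:
    """Build an intention map from row neurons, column neurons and a PASS neuron.

    Assumption: intention(r, c) = rows[r] * cols[c]; PASS = pass_neuron ** 2.
    """
    intentions: Dict[Hashable, float] = {
        (r, c): rows[r] * cols[c]
        for r in range(len(rows))
        for c in range(len(cols))
    }
    intentions["PASS"] = pass_neuron ** 2
    return intentions


rows = [0.2, 0.4, 0.6, 0.8, 1.0]
cols = [0.9, 0.1, 0.5, 0.3, 0.7]
intentions = rcor_intentions(rows, cols, pass_neuron=0.5)
```

Under this assumption, with all values positive, the ordering of two columns is the same in every row (and likewise for rows), so RCOR cannot express arbitrary intention maps the way SOR can.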
Coevolution • Derives non-static fitness, as in nature • 1 or more populations; interacting • Competitive [battle] vs. Cooperative [subtasks] • Advantages • “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists • Variety in fitness – adaptive opponents • No upper bound for improvement
Coevolution Methods Applied • Based on work by Lubberts & Miikkulainen [2001] • Hall of Fame • Host population and Master population • Maintaining the ability of host population to beat opponents of previous generations • Each generation, the best individual is added to HoF • The whole population competes against a sample of the HoF
Coevolution - HoF • Applied in this research • HoF initially filled without competition • Individuals get their fitness by competing against the masters • When full - host with highest win rate (against masters) joins HoF • Replace first Master to lose all games • Coevolutionary progress cannot be directly seen • Both populations constantly changing
Cultural Coevolution • A new approach! • Maintains “culture” of masters resembling HoF • To enter culture, host must defeat all masters • Masters never replaced – unlimited culture size • Every individual receives a fitness score by competing against all masters • Culture growth rate decreases rapidly • Every new master is the strongest found (yet)
Cultural Coevolution [2] • Numerous advantages • Maintains ability to defeat weak players • Keeps good solutions found • Same player cannot enter twice • Needs to defeat itself • Culture’s performance never decreases • Avoid focusing on a specific player’s weakness • As soon as any master is immune, the hosts have to find another way • More masters less likely to remember all weaknesses
General Evolution Setup • Opponents – Random; Naïve; JaGo • Fitness = strength • Rate of wins against all 3 opponents • 6,000 games of both colors • Not using scores, only win rates • Defeating more opponents is better • Generalized Multi-Layer Perceptrons (GMLPs) • All non-loop connections are permitted • Evolving • Hidden neurons; connections; weights; bias (for non-input)
General Evolution Setup [2] • 2 binary Chromosomes used • 1 for connections : 0-no 1-yes • 1 for hidden neurons (if 0, no connections also) • Number of possible connections: • ni, nh, no – number of input, hidden and output neurons • Determines size of chromosome • Real-Chromosome • Weights & Bias values (seen as weights) • Size is number of connections + number of bias vals (for non-input)
General Evolution Setup [3] • Tournament selection (size 2) • 2-point crossover • Binary mutation • Flip bits with 1/L probability (L = chromosome length) • Real-Chromosome Mutation • multiple-σ SA (self-adaptation) • Each object maintains altering “strategy” params which alter distribution of “object” params • Normal distributions used for both
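The operators for the binary chromosomes can be sketched as below. This is a generic textbook version under assumed names; the self-adaptive mutation of the real-valued chromosome is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple


def tournament_select(population: Sequence, fitness: Callable, rng: random.Random, size: int = 2):
    """Draw `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(population, size)
    return max(contenders, key=fitness)


def two_point_crossover(a: List[int], b: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the segment between two random cut points of equal-length parents."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip_mutation(chromosome: List[int], rng: random.Random) -> List[int]:
    """Flip each bit independently with probability 1/L."""
    p = 1.0 / len(chromosome)
    return [1 - bit if rng.random() < p else bit for bit in chromosome]


rng = random.Random(42)
zeros, ones = [0] * 10, [1] * 10
child1, child2 = two_point_crossover(zeros, ones, rng)
mutant = bit_flip_mutation(child1, rng)
winner = tournament_select([1, 2, 3, 4], fitness=lambda x: x, rng=rng)
```

With a flip probability of 1/L, one bit is changed per chromosome on average, regardless of its length.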
Setup – Recurrent Nets • Difficult to learn Go without structured input • Experiments with recurrent nets included • Allow loops for input neurons • Naturally represent adjacent board intersections • No hidden neurons • Played against JaGo • Typically output changes without input change due to feedback loops • Computed output only once! • Only 2 directly connected neurons influence each other • Evolution should connect only close neurons