Evolution and Coevolution of ANNs playing Go

Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004

Outline • Computers and Games • The game of Go • Experimental Setup • Training of Go playing ANNs • Evolution of Go Playing ANNs • Summary and Outlook

Games • Algorithms designed since AI’s onset • Clearly defined rules • Still complex • Chess received the most attention • More researched than Go • Two main approaches • Rely on expertise – directly programmed weighted features; extensive knowledge • Use evolution – less knowledge; more versatility

The game of Go • Oldest (unaltered) strategic board game in the world • 10,000,000 players in Japan “alone” • Fairly simple rules • BUT difficult to master • Immense game tree (~200 options per move) • Complex structures • Many concurrent goals

Go Rules • 19x19 board • Empty in the beginning • Black & White “stones” • Black starts • Each turn • Place 1 stone • At an intersection • Never move stones • OR pass

Go Rules [2] • Objective – get the most points! • Points are acquired by: • Securing territories • Capturing opponent’s stones

Go Rules [3] • Same-colored stones at vertically or horizontally adjacent intersections form a group • An empty intersection adjacent to a stone or group is called a "liberty" of that group • 1 liberty = group in “atari” • No liberties -> CAPTURE! Group is removed • Example – Black places stone at X, resulting in right figure
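
The group and liberty definitions above can be sketched as a flood fill. This is a minimal illustration, not code from the presentation: the board encoding (a list of strings with `B`, `W` and `.`) and the function name `group_and_liberties` are assumptions made for the example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    size = len(board)
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must point at a stone")
    group, liberties, stack = {start}, set(), [start]
    while stack:
        r, c = stack.pop()
        # Only vertical and horizontal neighbours count (no diagonals).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group, liberties


# Hypothetical 5x5 position: the white group has a single liberty (atari).
board = [
    ".....",
    ".BB..",
    "BWWB.",
    ".BW..",
    "..B..",
]
stones, libs = group_and_liberties(board, (2, 1))
in_atari = len(libs) == 1
```

If Black then plays on the last liberty, the liberty set becomes empty and the whole group would be removed.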

Go Rules [4] • Stones can be placed anywhere, but cannot commit suicide (except under some rule sets) • Legal if stone simultaneously captures opponent’s group (2 right figures) • Figure captions: Suicide – White cannot place at X; White CAN place at X; Result: capture

Go Rules [5] • Same position cannot occur more than once • Endless repetitions: • Black can capture at upper figure by placing at X • White - same by placing at Y • Black – repeat… • Ko rule • White may not place at Y before playing somewhere else first • Avoid any repetitions

Go Rules – Live and Dead groups • Groups are “dead” if their capture cannot be prevented • The opponent need not actually capture them • Group remains on board • At end of game, removed and added to captured stones • “Living” groups are impossible to capture • Group with 2 “eyes” – even if White surrounds it, playing at X or Y is suicide • Opponent must play elsewhere

Go Basics – End game • Play continues until both players pass • Players then alternately play stones at “neutral” points – adjacent to both White and Black • Also known as “dame” (DAH-MAY) • Dead stones are removed from the board and counted with other prisoners (1 point per prisoner) • Also – 1 point for each intersection surrounded by player’s stones (“territory”)

Go Basics – End game example • Prisoners were removed already • All 4 points marked X are dame – worthless • Black has • 7 points in UR (territory); 2 points in LL • 1 removed prisoner • TOTAL = 10 points • White has • 5 in UL; 2 in LR • 2 prisoners • TOTAL = 9 points • Black wins unless komi (5.5 pts compensation) is due
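
The arithmetic of the example can be checked with a few lines. This is only a sketch of the counting described on the slide; the function name `final_score` and its parameters are made up for illustration.

```python
def final_score(territory, prisoners, komi=0.0):
    """Territory points plus prisoners, plus komi if it is due."""
    return sum(territory) + prisoners + komi


black = final_score([7, 2], prisoners=1)                   # UR + LL + 1 prisoner
white = final_score([5, 2], prisoners=2)                   # UL + LR + 2 prisoners
white_with_komi = final_score([5, 2], prisoners=2, komi=5.5)
```

Without komi Black leads 10 to 9; with a komi of 5.5 White's total becomes 14.5 and the result flips.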

Ranking and Handicaps • Determine Go players’ strength • Resemblance to martial arts • Both amateur and professional ranking system • Amateur • 35 kyu to 1 kyu • THEN 1 dan to 7 dan • Pro • 1 dan to 9 dan • Awarded only by Go institutions • Pro dans are much stronger than amateur dans

Ranking and Handicaps (2) • Handicaps • Weaker player starts with several stones on the board • Placed at specific places • Helps make games more even • Difference in ranks ~ number of handicap stones needed for an even game • 2 stones to even 2 dan against 4 dan • 4 to even 3 kyu against 2 dan • The most powerful Go programs reach only … • … 10 kyu!
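
The rank-difference rule can be sketched by placing amateur ranks on one scale (…, 2 kyu, 1 kyu, 1 dan, 2 dan, …). The helper names and the `"3 kyu"` string format are assumptions for this example, and it covers amateur ranks only.

```python
def rank_value(rank: str) -> int:
    """Map an amateur rank such as '3 kyu' or '2 dan' onto one integer scale."""
    number, grade = rank.split()
    n = int(number)
    if grade == "kyu":
        return 1 - n      # 1 kyu -> 0, 2 kyu -> -1, ...
    if grade == "dan":
        return n          # 1 dan -> 1, 2 dan -> 2, ...
    raise ValueError(f"unknown grade: {grade}")


def handicap_stones(weaker: str, stronger: str) -> int:
    """Approximate handicap: one stone per rank of difference."""
    return max(0, rank_value(stronger) - rank_value(weaker))


dan_vs_dan = handicap_stones("2 dan", "4 dan")
kyu_vs_dan = handicap_stones("3 kyu", "2 dan")
```

This reproduces the two examples on the slide: 2 stones and 4 stones respectively.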

Experimental Setup • Opponent Go players • ANN player • Go board (input) representations • Move (output) representations • Coevolution • Hall of Fame coevolution • Cultural coevolution • General evolution setup

Go Players – Random • No strategy • May also pass • “Knows” only the rules of Go • Legality of moves • Usually weakest opponent

Go Players – Naïve Player • Roughly human-beginner level • Able to save and capture stones • Knows about • Lost stones • Saving - connecting stones to living groups • Weak stones (not savable)

Go Players – Naïve Strategy • A subset of JaGo’s (main opponent) strategy • Outline (arranged by priority): • Attempt to save • Try to put opponent into atari • Connect weak stones • Capture opponent groups in atari • Check intersections for placing stones • In random order • Make sure no (own) liberties decrease below 2 as a result • Perform Random move

Go Players – JaGo Player • Java based program • Best computer player used • Not a strong player ~16 kyu • Knows standard techniques • Mainly save & capture • Uses pattern matching • Looks at entire board • 32 patterns, with rotations and mirrors

Go Players – JaGo Strategy (1) • Save stones in atari • Try to decrease liberties of large groups • Find own savable larger groups • Attack opponent’s groups (in decreasing order of priority): • With 2 or more liberties and attackable • With 2 or more stones & fewer than 3 liberties • With 2 or fewer liberties

Go Players – JaGo Strategy (2) • Save own groups with few liberties if savable • Start pattern matching – Response; Center • Random move order • Seek opponent’s groups to capture in 2 moves • Perform random move which isn’t of a bad pattern • Capture opponent’s single liberties • Connect own weak stones • PASS

Go Players – JaGo Patterns (1)

Go Players – JaGo Patterns (2)

Go Players – GNU Go • Advantages • 5x5 to 19x19 boards • Handles handicaps well • Rated 10 kyu • Problems • 5x5 is solved – open at C3 for 18.5 points (komi = 5.5) – Black always wins • GNU Go passes on B3, C2-4, D3 (only correct at C3) • Premature convergence of evolution

ANN Player • Inform ANN about actual position • Evaluate ANN output to receive next move • Representation is important! • Intention maps • For each Go move (including PASS) – value in [0,1] • High value – high intention to make move (and vice versa) • Select legal move with highest value • To avoid predictability – consider sub-optimal moves also (“creativity factor”)
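
Move selection from an intention map could look roughly like this. The slides do not say how the “creativity factor” is applied, so the rule below (with probability `creativity`, pick one of the next-best legal moves) is purely an assumption, as are the names `select_move` and `top_k`.

```python
import random
from typing import Optional, Sequence


def select_move(intentions: Sequence[float],
                legal_moves: Sequence[int],
                creativity: float = 0.0,
                top_k: int = 3,
                rng: Optional[random.Random] = None) -> int:
    """Return the legal move with the highest intention value.

    Assumed creativity rule: with probability `creativity`, return a random
    move among the runners-up within the top `top_k` legal moves instead.
    """
    rng = rng or random.Random()
    ranked = sorted(legal_moves, key=lambda m: intentions[m], reverse=True)
    if not ranked:
        raise ValueError("no legal move (PASS should always be legal)")
    if len(ranked) > 1 and rng.random() < creativity:
        return rng.choice(ranked[1:max(top_k, 2)])
    return ranked[0]


# Toy intention map over 4 moves; move 1 has the highest value but is illegal.
intentions = [0.1, 0.9, 0.5, 0.7]
best = select_move(intentions, legal_moves=[0, 2, 3])
```

With `creativity=0.0` the choice is deterministic; any positive value trades some playing strength for less predictable play.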

Player Strength • Unrated Go players commonly obtain a rating by playing against rated players (same in Chess) • The strength s of a player is determined by • The score of 1,000 double games • Against each of 3 opponents: Random, Naïve, JaGo • Divided by the number of games (6,000) • 1 is perfect strength • 3 opponents help resist over-fitting
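
As a worked example of the strength measure: 1,000 double games against each of 3 opponents gives 2 × 1,000 × 3 = 6,000 games. The sketch below assumes the “score” is the number of wins (win rates are what the evolution setup uses); the win counts are invented for illustration.

```python
def strength(wins_by_opponent: dict, double_games: int = 1000) -> float:
    """Total wins divided by total games (each double game = 2 games)."""
    total_games = 2 * double_games * len(wins_by_opponent)
    return sum(wins_by_opponent.values()) / total_games


# Hypothetical win counts out of 2,000 games per opponent.
s = strength({"Random": 2000, "Naive": 1500, "JaGo": 500})
perfect = strength({"Random": 2000, "Naive": 2000, "JaGo": 2000})
```

Here `s` is 4,000 / 6,000, about 0.67, and only a player that wins every game reaches 1.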

Player Competence • Strength is not understanding of rules (legality) • E.g. 2 players receive same score but only one always tried legal moves first • The competence C of a player is defined as follows: • bi = games; i = moves; tij = # tried illegal moves; kij = # possible illegal moves • C is averaged over all games

Board Representations • 19x19 boards • far too large • Even for evolved agents • Use only 5x5 boards

Board Representations • Should preprocess position to make the ANN’s life easier • Tested in training experiments • Standard Input Representation (SIR) • 2 neurons at each intersection: • 1 for player’s stone; 1 for opponent’s • No distinction between B and W stones • Optional – 1 neuron to tell if B or W • (2*b^2) neurons (where b is board size) = 50

Representations - NIR • Naïve Input Representation • More compact • 1 neuron per intersection • Set to -1 (player’s stone) or 1 (opponent’s) • 0 if empty • Uses half of SIR’s neurons = 25
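
The two whole-board encodings (SIR with 2 neurons per intersection, NIR with 1) can be sketched as follows. The board format and the choice of 1.0/0.0 as SIR activation values are assumptions for this example; the -1/0/1 coding of NIR follows the slide.

```python
from typing import List


def sir(board: List[str], player: str) -> List[float]:
    """Standard Input Representation: (own stone, opponent stone) per intersection."""
    out = []
    for row in board:
        for cell in row:
            out.append(1.0 if cell == player else 0.0)
            out.append(1.0 if cell not in (player, ".") else 0.0)
    return out


def nir(board: List[str], player: str) -> List[float]:
    """Naive Input Representation: -1 own stone, 1 opponent stone, 0 empty."""
    return [0.0 if cell == "." else (-1.0 if cell == player else 1.0)
            for row in board for cell in row]


board = [
    ".....",
    ".B...",
    "..W..",
    ".....",
    ".....",
]
sir_input = sir(board, player="B")   # 2 * 5^2 = 50 values
nir_input = nir(board, player="B")   # 5^2 = 25 values
```

Both encodings are relative to the player to move, so the same network can play either color.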

Representations - LVIR • Limited View Input Representation • Splits the Go board into several square areas of size 3x3 • Idea – simplest way of capturing stones works within this area • E.g. capture of 1 stone by surrounding it • Areas overlap at middle row and middle column • Coding – similar to SIR • (2*9*w) neurons, where w is number of areas (=4), = 72 neurons • Could also be Naïve
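
A sketch of the limited-view encoding on a 5x5 board: four 3x3 windows that share the middle row and column, each coded SIR-style. The window placement (offsets 0 and 2) is inferred from the overlap described on the slide, and the function name and 1.0/0.0 activations are assumptions.

```python
from typing import List


def lvir(board: List[str], player: str, window: int = 3) -> List[float]:
    """Encode overlapping window x window areas, 2 neurons per intersection."""
    size = len(board)
    step = window - 1                        # neighbouring areas share one row/column
    offsets = range(0, size - window + 1, step)
    out = []
    for r0 in offsets:
        for c0 in offsets:
            for r in range(r0, r0 + window):
                for c in range(c0, c0 + window):
                    cell = board[r][c]
                    out.append(1.0 if cell == player else 0.0)
                    out.append(1.0 if cell not in (player, ".") else 0.0)
    return out


board = [
    ".....",
    ".....",
    "..B..",
    ".....",
    ".....",
]
lvir_input = lvir(board, player="B")   # 2 * 9 * 4 = 72 values
```

Because of the overlap, the center stone shows up in all four areas, so some board information is presented to the network several times.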

Clever Representations • Based on image processing and circuits • We want fewer raw inputs to allow ANN to concentrate more on features • Manhattan distance • Used in integrated circuits where wires run parallel to X or Y axis • Got its name from Manhattan NY, where streets are aligned in grid • P1 = (x1, x2) • P2 = (y1, y2) • d(P1, P2) = |x1 – y1| + |x2 – y2|

Clever Representations • Manhattan distance is related to distance of Go stones (no diagonals) • distance = [#(separating stones) + 1] • 1 if next to each other • 2 if separated by one stone • 3 for knight’s move or two separating stones
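
The distances listed above follow directly from the Manhattan metric on board coordinates. A minimal sketch (the function name and the (row, column) tuples are assumptions for this example):

```python
from typing import Tuple


def manhattan(p1: Tuple[int, int], p2: Tuple[int, int]) -> int:
    """Sum of absolute coordinate differences (no diagonal steps)."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])


adjacent = manhattan((2, 2), (2, 3))        # next to each other
one_between = manhattan((2, 2), (2, 4))     # separated by one intersection
knights_move = manhattan((2, 2), (3, 4))    # knight's move
two_between = manhattan((0, 0), (0, 3))     # two separating intersections
```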

Representations: c-o-Matrix • Co-occurrence matrices • Used in image processing • Many parameters are derived from them • Mean, SD, energy, contrast, homogeneity, … • Square • Based on a relation p between image positions (symmetric if p is)

Representations: c-o-Matrix • Element C[i][j] = • Number of times a pixel of value (color) i occurs in the image • In the relation specified by p • To a pixel of value (color) j • Size is number of different colors

Representations: c-o-Matrix • An actual Go board is an “image” with 3 different colors (including empty) • Example • p1: Manhattan distance of 1 between 2 points • First matrix row: • B near B 16 times • B near W 3 times • B near empty 11 times
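
A co-occurrence matrix for the relation “Manhattan distance of 1” can be computed as below. This sketch counts ordered pairs, so each adjacent pair contributes in both directions and the matrix is symmetric; whether the presentation counted pairs this way is an assumption, as are the board format and the color order B, W, empty.

```python
from typing import List

COLORS = "BW."   # row/column order: Black, White, empty


def cooccurrence(board: List[str]) -> List[List[int]]:
    """C[i][j] = number of times color i has color j at Manhattan distance 1."""
    size = len(board)
    index = {color: i for i, color in enumerate(COLORS)}
    matrix = [[0] * len(COLORS) for _ in COLORS]
    for r in range(size):
        for c in range(size):
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < size and 0 <= nc < size:
                    matrix[index[board[r][c]]][index[board[nr][nc]]] += 1
    return matrix


# Small hypothetical 3x3 position.
board = [
    "BB.",
    "W..",
    "...",
]
C = cooccurrence(board)
first_row = C[0]   # [B near B, B near W, B near empty]
```

The matrix has a fixed 3x3 size regardless of board size, which is what makes it attractive as a compact network input.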

Representations: c-o-Matrix • Does not say much about absolute positions – must combine • SIR and C for whole board • NIR and C for whole board • NIR and Cs for 3x3 areas • sLVIR and Cs for 3x3 areas • NLVIR and Cs for 3x3 areas

Output Representations • Only 2 are used • Standard Output Representation (SOR) • Each intersection is represented by 1 neuron • 1 for PASS • (b^2 + 1) neurons

Output Representations • Row Column Output Representation (RCOR) • Used to decrease ANN size • 5 neurons for columns; 5 for rows • 1 for PASS • (2b + 1) neurons • Intention more complicated: • PASS intention is square of relevant neuron • RCOR Limits intention map: • v1>v2  y1>y2  v4>v3 • All values positive, non-zero
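
One way to turn RCOR outputs into an intention map is sketched below. The slide does not give the combination rule; multiplying the row neuron by the column neuron is an assumption, suggested by the PASS intention being the square of its neuron (which keeps it on the same scale as a product of two values). All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rcor_intentions(rows: Sequence[float],
                    cols: Sequence[float],
                    pass_value: float) -> Tuple[List[List[float]], float]:
    """Assumed rule: intention(r, c) = rows[r] * cols[c]; PASS = pass_value ** 2."""
    board = [[r * c for c in cols] for r in rows]
    return board, pass_value ** 2


# 2b + 1 = 11 positive, non-zero outputs for a 5x5 board.
rows = [0.2, 0.9, 0.5, 0.4, 0.1]
cols = [0.3, 0.8, 0.6, 0.2, 0.7]
intention_map, pass_intention = rcor_intentions(rows, cols, pass_value=0.5)
```

Under this assumed rule the limitation is easy to see: with all values positive, the ranking of the columns is identical in every row, so the network cannot prefer column 1 in one row and column 2 in another.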

Coevolution • Derives non-static fitness, as in nature • 1 or more populations; interacting • Competitive [battle] vs. Cooperative [subtasks] • Advantages • “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists • Variety in fitness – adaptive opponents • No upper bound for improvement

Coevolution Methods Applied • Based on work by Lubberts & Miikkulainen [2001] • Hall of Fame • Host population and Master population • Maintaining the ability of host population to beat opponents of previous generations • Each generation, the best individual is added to HoF • The whole population competes against a sample of the HoF

Coevolution - HoF • Applied in this research • HoF initially filled without competition • Individuals get their fitness by competing against the masters • When full – host with highest win rate (against masters) joins HoF • Replaces first Master to lose all games • Coevolutionary progress cannot be directly seen • Both populations constantly changing

Cultural Coevolution • A new approach! • Maintains “culture” of masters resembling HoF • To enter culture, host must defeat all masters • Masters never replaced – unlimited culture size • Every individual receives a fitness score by competing against all masters • Culture growth rate decreases rapidly • Every new master is the strongest found (so far)

Cultural Coevolution [2] • Numerous advantages • Maintains ability to defeat weak players • Keeps good solutions found • Same player cannot enter twice • Needs to defeat itself • Culture’s performance never decreases • Avoid focusing on a specific player’s weakness • As soon as any master is immune, the hosts have to find another way • More masters -> less likely to remember all weaknesses
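
The admission rule of cultural coevolution (a host enters only if it defeats every master, and masters are never removed) can be sketched as follows. The `beats` callback stands in for actually playing games, the fraction-of-masters-beaten fitness is an assumption, and the integer “players” are toy data.

```python
from typing import Callable, List, TypeVar

Player = TypeVar("Player")


def culture_fitness(host: Player, culture: List[Player],
                    beats: Callable[[Player, Player], bool]) -> float:
    """Assumed fitness: fraction of masters the host defeats."""
    if not culture:
        return 0.0
    return sum(beats(host, master) for master in culture) / len(culture)


def try_enter_culture(host: Player, culture: List[Player],
                      beats: Callable[[Player, Player], bool]) -> bool:
    """Add the host to the culture only if it defeats all current masters."""
    if all(beats(host, master) for master in culture):
        culture.append(host)   # masters are never replaced or removed
        return True
    return False


# Toy model: a player is a number and the larger number wins.
beats = lambda a, b: a > b
culture: List[int] = []
results = [try_enter_culture(p, culture, beats) for p in (3, 2, 3, 5)]
```

In the toy run, the second `3` is rejected because it would have to defeat itself, which mirrors the “same player cannot enter twice” property.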

General Evolution Setup • Opponents – Random; Naïve; JaGo • Fitness = strength • Rate of wins against all 3 opponents • 6,000 games of both colors • Not using scores, only win rates • Defeating more opponents is better • Generalized Multi-Layer Perceptrons (GMLPs) • All non-loop connections are permitted • Evolving • Hidden neurons; connections; weights; bias (for non-input)

General Evolution Setup [2] • 2 binary Chromosomes used • 1 for connections : 0-no 1-yes • 1 for hidden neurons (if 0, no connections also) • Number of possible connections: • ni, nh, no – number of input, hidden and output neurons • Determines size of chromosome • Real-Chromosome • Weights & Bias values (seen as weights) • Size is number of connections + number of bias vals (for non-input)

General Evolution Setup [3] • Tournament selection (size 2) • 2-point crossover • Binary mutation • Flip bits with 1/L probability (L = chromosome length) • Real-Chromosome Mutation • Multiple-σ self-adaptation (SA) • Each object maintains altering “strategy” params which alter distribution of “object” params • Normal distributions used for both
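
The operators for the binary chromosomes can be sketched with the standard library. This is a generic illustration of tournament selection, 2-point crossover and 1/L bit-flip mutation, not the presentation's implementation; the self-adaptive mutation of the real-valued chromosome is left out, and all function names are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bits = List[int]


def tournament_select(population: Sequence[Bits],
                      fitness: Callable[[Bits], float],
                      rng: random.Random, size: int = 2) -> Bits:
    """Draw `size` distinct individuals at random and return the fittest."""
    return max(rng.sample(population, size), key=fitness)


def two_point_crossover(a: Bits, b: Bits, rng: random.Random) -> Tuple[Bits, Bits]:
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip_mutation(bits: Bits, rng: random.Random) -> Bits:
    """Flip each bit independently with probability 1/L."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
parent_a = tournament_select(population, fitness=sum, rng=rng)
parent_b = tournament_select(population, fitness=sum, rng=rng)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
child_a = bit_flip_mutation(child_a, rng)
```

Here `sum` is only a placeholder fitness; in the setup described above, fitness would be the win rate against the three opponents.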

Setup – Recurrent Nets • Difficult to learn Go without structured input • Experiments with recurrent nets included • Allow loops for input neurons • Naturally represent adjacent board intersections • No hidden neurons • Played against JaGo • Typically output changes without input change due to feedback loops • Computed output only once! • Only 2 directly connected neurons influence each other • Evolution should connect only close neurons
