CEC 2006 Othello Competition Simon M. Lucas Computer Science Dept, University of Essex Thomas P. Runarsson Science Institute, University of Iceland
Motivation • Othello is an interesting unsolved game • Good test-bed for CI-Games research • Objective: find the best position evaluation function • Challenges for the competition: • Which architecture works best? • Which is the best way to train such an architecture? • E.g. temporal difference learning, co-evolution, or …
TDL versus CEL • Which method works best for learning a game strategy: temporal difference learning (TDL) or co-evolutionary learning (CEL)? • Which method learns fastest? • Which method ultimately achieves stronger play? • Standard CEL uses only the game results • TDL also exploits information available during the game
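To make the last point concrete, here is a minimal, illustrative TD(0) update over the states visited in one game; it uses each intermediate position's successor value as a learning target, whereas CEL would only see the final outcome. The tabular representation and learning rate are assumptions for the sketch, not the contest's actual setup.

```python
def td0_update(values, states, outcome, alpha=0.1):
    """Tabular TD(0) update over the board states visited in one game.

    values:  dict mapping state -> estimated value
    states:  list of states visited during the game, in order
    outcome: final game result (e.g. +1 win, 0 draw, -1 loss)
    alpha:   learning rate
    """
    for t in range(len(states)):
        s = states[t]
        # Target is the next state's value, or the game outcome
        # at the terminal position.
        if t == len(states) - 1:
            target = outcome
        else:
            target = values.get(states[t + 1], 0.0)
        v = values.get(s, 0.0)
        values[s] = v + alpha * (target - v)
    return values
```

Note how every visited position gets an update, not just the final one: this is the extra in-game information TDL exploits.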
The setup • Each game is played as follows: • All legal next board positions are generated • The position evaluation function of the player to move is used to evaluate each board position • The move leading to the most favourable position for that player is chosen • i.e. 1-ply lookahead
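The 1-ply procedure above can be sketched as follows; `legal_moves`, `apply_move`, and `evaluate` are assumed helpers standing in for the game rules and the entrant's evaluation function.

```python
def choose_move(board, player, legal_moves, apply_move, evaluate):
    """1-ply lookahead: score every legal successor position with the
    mover's evaluation function and return the best-scoring move.

    legal_moves: list of legal moves for player on board
    apply_move:  (board, move, player) -> successor board
    evaluate:    (board, player) -> score from player's perspective
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves:
        successor = apply_move(board, move, player)
        score = evaluate(successor, player)
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```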
Motivation • Focus on machine learning rather than game-tree search • Random moves are forced (with probability 0.1, 0.01, or 0.0) • This gives a more robust evaluation of playing ability
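The forced-random-move protocol can be sketched as a single override step applied to the agent's greedy choice; the function name and signature are illustrative, not the competition harness's actual API.

```python
import random

def select_move(legal_moves, greedy_move, eps, rng=random):
    """With probability eps, override the agent's chosen move with a
    uniformly random legal move (eps is 0.1, 0.01, or 0.0 here).
    This knocks games off the single deterministic line of play, so
    an entrant's strength is averaged over many distinct games."""
    if eps > 0 and rng.random() < eps:
        return rng.choice(legal_moves)
    return greedy_move
```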
[Figure: volatile piece difference over the course of a game, plotted move by move]
Standard “Heuristic” Weights (lighter = more advantageous)
[Figure: 8×8 board shaded by positional weight]
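The weights pictured on the slide are not reproduced here; the table below is a commonly cited illustrative Othello positional weight matrix (corners strongly positive, adjacent X- and C-squares negative), shown only to make the weighted-piece-difference evaluation concrete.

```python
# Illustrative positional weights (NOT the exact values from the slide).
WEIGHTS = [
    [100, -20, 10,  5,  5, 10, -20, 100],
    [-20, -50, -2, -2, -2, -2, -50, -20],
    [ 10,  -2, -1, -1, -1, -1,  -2,  10],
    [  5,  -2, -1, -1, -1, -1,  -2,   5],
    [  5,  -2, -1, -1, -1, -1,  -2,   5],
    [ 10,  -2, -1, -1, -1, -1,  -2,  10],
    [-20, -50, -2, -2, -2, -2, -50, -20],
    [100, -20, 10,  5,  5, 10, -20, 100],
]

def evaluate(board, player):
    """Weighted piece difference for player.
    board[r][c] holds a player symbol or None for an empty square."""
    score = 0
    for r in range(8):
        for c in range(8):
            piece = board[r][c]
            if piece == player:
                score += WEIGHTS[r][c]
            elif piece is not None:
                score -= WEIGHTS[r][c]
    return score
```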
Random move prob = 0.0

Rank  Games  Wins  Draws  Losses  Name
 0     22     20     0      2     kjkim-mlp-3
 1     22     17     1      4     AleZ V
 2     22     17     0      5     NButtBradford1b
 3     22     14     2      6     mlp-again2
 4     22     13     2      7     delete-me-cel-1-10
 5     22     13     1      8     brookdale4
 6     22      8     2     12     tomy0
 7     22      7     1     14     fedevadeculo
 8     22      5     0     17     last weebl
 9     22      5     1     16     jesz3
10     22      5     2     15     Jorge
11     22      1     2     19     tpr-tdl-01-500000
Random move prob = 0.01

Rank  Games  Wins  Draws  Losses  Name
 0    220    181     4     35     kjkim-mlp-3
 1    220    170     7     43     AleZ V
 2    220    161    12     47     mlp-again2
 3    220    157     5     58     NButtBradford1b
 4    220    138    10     72     brookdale4
 5    220    137    17     66     delete-me-cel-1-10
 6    220     73     7    140     fedevadeculo
 7    220     70    14    136     tomy0
 8    220     58    17    145     Jorge
 9    220     55     7    158     jesz3
10    220     46     2    172     last weebl
11    220     17    12    191     tpr-tdl-01-500000
Random move prob = 0.1

Rank  Games  Wins  Draws  Losses  Name
 0    220    163     1     56     kjkim-mlp-3
 1    220    161     4     55     mlp-again2
 2    220    158     3     59     AleZ V
 3    220    153     9     58     brookdale4
 4    220    150     6     64     delete-me-cel-1-10
 5    220    147     5     68     NButtBradford1b
 6    220     73     7    140     fedevadeculo
 7    220     71     5    144     Jorge
 8    220     68     4    148     jesz3
 9    220     67     3    150     tomy0
10    220     58     6    156     last weebl
11    220     21     7    192     tpr-tdl-01-500000
Winner: Kyung-Joong Kim, Yonsei University, Seoul • Approach: • Used a genetic algorithm (GA) • Initialised the population with the 64 : 32 : 1 MLP supplied as a sample by the organisers • Then used the GA to evolve it (100 generations of 100 MLPs) • Interesting: not significantly better against the standard heuristic player (with eps = 0.1) • But better on average against a wider range of players
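The winning approach, as described on the slide, can be sketched as evolving the supplied MLP's weight vector with a simple GA. The fitness function, truncation selection, and Gaussian mutation scale below are illustrative assumptions, not the winner's exact settings.

```python
import random

def evolve(seed_weights, fitness, generations=100, pop_size=100,
           sigma=0.05, rng=random):
    """Evolve a weight vector with a simple GA, seeded with copies of
    the supplied network's weights.

    fitness: weight vector -> score (e.g. results of games played
             against a set of opponents)
    """
    population = [list(seed_weights) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]          # truncation selection
        population = list(parents)                # parents survive (elitism)
        while len(population) < pop_size:
            parent = rng.choice(parents)
            child = [w + rng.gauss(0, sigma) for w in parent]
            population.append(child)              # Gaussian mutation
    return max(population, key=fitness)
```

Because parents carry over each generation, the best fitness in the population never decreases from the seeded starting point.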
Future Competitions • Implement additional standard architectures: • Blondie-style MLP • Scanning n-tuple grid features • Encourage people to supply their own architectures (this was allowed for this contest, but not well publicised)