AI techniques for the game of Go Erik van der Werf Universiteit Maastricht / ReSound Algorithm R&D
Contents • Introduction • Searching techniques • The Capture Game • Solving Go on Small Boards • Learning techniques • Move Prediction • Learning to Score • Predicting Life & Death • Estimating Potential Territory • Summary of results • Conclusions
The game of Go • Deceptively simple rules • Black and White move in turns • A move places a stone on the board • Surrounded stones are captured • Direct repetition is forbidden (Ko-rule) • The game is over when both players pass • The player controlling most intersections wins
Computer Go • Even the best Go programs have no chance against strong amateurs • Human players are superior in areas such as • pattern recognition • spatial reasoning • learning
Playing strength • [Figure: 29 stones handicap]
Problem statement How can Artificial Intelligence techniques be used to improve the strength of Go programs? We focused on two classes of techniques: searching techniques and learning techniques
Searching techniques • Very successful for other board games • Evaluate positions by ‘thinking ahead’ • Research • Recognizing positions ‘that are irrelevant’ • Fast heuristic evaluations • Provably correct knowledge • Move ordering (the best moves first) • Re-use of partial results from the search process
The Capture Game • Simplified version of Go • First to capture a stone wins the game • Passing not allowed • Detecting final positions is trivial (unlike normal Go) • Search method • Iterative Deepening Principal Variation Search • Enhanced transposition table • Move ordering using killer and history heuristics, with tables shared by both colours
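The search setup above can be sketched in Python. This is a minimal sketch, not the author's implementation. It runs iterative-deepening PVS with a plain transposition table (the "enhanced" table from the slide is not reproduced), killer moves, and a history table. The killer and history tables are keyed by move only, so both colours share them. To keep it runnable end to end, the game is a hypothetical stand-in (Nim: remove 1 to 3 stones, whoever takes the last stone wins). `moves`, `Searcher` and `solve` are illustrative names.

```python
# Iterative-deepening PVS with a transposition table, killer moves and a history
# table. The game is a hypothetical stand-in (Nim: remove 1-3 stones, whoever
# takes the last stone wins) so that the search runs end to end.

EXACT, LOWER, UPPER = 0, 1, 2
INF = 2  # values for the side to move: -1 loss, 0 unknown at this depth, +1 win


def moves(pile):
    return [k for k in (1, 2, 3) if k <= pile]


class Searcher:
    def __init__(self):
        self.tt = {}       # position -> (depth, value, bound flag, best move)
        self.killers = {}  # ply -> last move that caused a cutoff at that ply
        self.history = {}  # move -> accumulated cutoff bonus
        # Killer and history tables are keyed by move only, not by colour,
        # so both players share them.

    def ordered_moves(self, pos, ply, tt_move):
        # Transposition-table move first, then the killer, then by history score.
        def key(m):
            return (m != tt_move, m != self.killers.get(ply), -self.history.get(m, 0))
        return sorted(moves(pos), key=key)

    def pvs(self, pos, depth, alpha, beta, ply):
        if pos == 0:
            return -1  # the opponent took the last stone: side to move has lost
        if depth == 0:
            return 0   # hypothetical heuristic evaluation: "unknown"
        alpha_orig = alpha
        tt_move = None
        entry = self.tt.get(pos)
        if entry is not None:
            stored_depth, value, flag, tt_move = entry
            if stored_depth >= depth:
                if flag == EXACT:
                    return value
                if flag == LOWER:
                    alpha = max(alpha, value)
                else:
                    beta = min(beta, value)
                if alpha >= beta:
                    return value
        best, best_move = -INF, None
        for i, m in enumerate(self.ordered_moves(pos, ply, tt_move)):
            child = pos - m
            if i == 0:
                score = -self.pvs(child, depth - 1, -beta, -alpha, ply + 1)
            else:
                # Null-window probe; re-search with a wider window if it fails high.
                score = -self.pvs(child, depth - 1, -alpha - 1, -alpha, ply + 1)
                if alpha < score < beta:
                    score = -self.pvs(child, depth - 1, -beta, -score, ply + 1)
            if score > best:
                best, best_move = score, m
            alpha = max(alpha, score)
            if alpha >= beta:
                self.killers[ply] = m
                # depth * depth bonus is a common choice, assumed here.
                self.history[m] = self.history.get(m, 0) + depth * depth
                break
        flag = UPPER if best <= alpha_orig else LOWER if best >= beta else EXACT
        self.tt[pos] = (depth, best, flag, best_move)
        return best

    def solve(self, pos, max_depth=50):
        value = 0
        for depth in range(1, max_depth + 1):  # iterative deepening
            value = self.pvs(pos, depth, -INF, INF, 0)
            if value != 0:                     # proven win or loss
                break
        return value
```

`Searcher().solve(4)` returns -1 (a loss for the player to move) and `Searcher().solve(5)` returns 1. In Go the moves would be intersections, so sharing the tables between colours amounts to indexing them by intersection alone.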
Heuristic evaluation for the capture game • Based on four principles: • Maximize liberties • Maximize territory • Connect stones • Make eyes • Low-order liberties (max. distance 3) • Euler number (objects – holes) • Fast computation using a bit-board representation
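The sketch below illustrates two of these terms on a bit-board. It assumes a square n x n board stored as a Python integer, with bit r*n + c for point (r, c). It also assumes that n-th order liberties are the empty points first reached after exactly n steps through empty points. The Euler number (objects minus holes, with stones 4-connected) is computed as V - E + F: the number of stones, minus adjacent stone pairs, plus fully occupied 2x2 squares. The helper names are hypothetical, and how the slides weight these terms in the evaluation is not specified.

```python
def masks(n):
    """Return (full-board mask, mask of the last column) for an n x n bit-board."""
    full = (1 << (n * n)) - 1
    last_col = 0
    for r in range(n):
        last_col |= 1 << (r * n + n - 1)
    return full, last_col


def bit(r, c, n):
    return 1 << (r * n + c)


def popcount(b):
    return bin(b).count("1")


def adjacent(b, n):
    """All points 4-adjacent to some point of b, without wrapping around edges."""
    full, last_col = masks(n)
    first_col = last_col >> (n - 1)
    up = b >> n
    down = (b << n) & full
    right = (b << 1) & ~first_col & full  # c -> c+1; drop bits wrapped into column 0
    left = (b >> 1) & ~last_col           # c -> c-1; drop bits wrapped into column n-1
    return up | down | right | left


def liberty_orders(stones, empty, n, max_order=3):
    """Count empty points first reached after exactly 1..max_order steps through
    empty points (assumed definition of low-order liberties)."""
    reached = frontier = stones
    counts = []
    for _ in range(max_order):
        frontier = adjacent(frontier, n) & empty & ~reached
        reached |= frontier
        counts.append(popcount(frontier))
    return counts


def euler_number(b, n):
    """Objects minus holes for 4-connected stones, computed as V - E + F."""
    _, last_col = masks(n)
    pairs_h = b & (b >> 1) & ~last_col  # (r, c) and (r, c+1) both occupied
    pairs_v = b & (b >> n)              # (r, c) and (r+1, c) both occupied
    squares = pairs_h & (pairs_h >> n)  # fully occupied 2x2 blocks
    return popcount(b) - popcount(pairs_h) - popcount(pairs_v) + popcount(squares)
```

For a lone stone in the centre of a 5x5 board this gives liberty counts [4, 8, 8] and an Euler number of 1. A ring of eight stones around one empty point has Euler number 0 (one object, one hole).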
Solutions for the Capture Game • All boards up to 5x5 were solved • Winner decided by board-size parity • Will initiative take over at 6x6? • Solution for 4x4 (White wins) • Solution for 5x5 (Black wins)
Solutions for the Capture Game on 6x6 • Initiative takes over at 6x6
Solving Go on Small Boards • Iterative Deepening Principal Variation Search • Enhanced transposition table • Exploit board symmetry • Internal unconditional bounds • Effective move ordering • Evaluation function • Heuristic component • Similar to the capture game • Provably correct component • Benson’s algorithm for recognizing unconditional life extended with detection of unconditional territory
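Benson's algorithm itself fits in a few dozen lines. This is a minimal sketch, assuming the board is a list of strings containing "B", "W" and ".", with orthogonal adjacency. It follows the usual formulation. A region enclosed by one colour is vital to a block if all of the region's empty points are liberties of that block. Blocks with fewer than two vital regions are then discarded, together with regions bordered by a discarded block, until nothing changes. The unconditional-territory extension mentioned on the slide is not included, and the function names are illustrative.

```python
from typing import Callable, List, Set, Tuple

Point = Tuple[int, int]


def neighbours(p: Point, rows: int, cols: int):
    r, c = p
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def components(board: List[str], member: Callable[[str], bool]) -> List[Set[Point]]:
    """Maximal 4-connected sets of points whose contents satisfy `member`."""
    rows, cols = len(board), len(board[0])
    seen, comps = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not member(board[r][c]):
                continue
            comp, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:
                p = stack.pop()
                comp.add(p)
                for q in neighbours(p, rows, cols):
                    if q not in seen and member(board[q[0]][q[1]]):
                        seen.add(q)
                        stack.append(q)
            comps.append(comp)
    return comps


def benson(board: List[str], colour: str) -> Set[Point]:
    """Return the points of all unconditionally alive blocks of `colour`."""
    rows, cols = len(board), len(board[0])
    blocks = components(board, lambda s: s == colour)
    regions = components(board, lambda s: s != colour)  # regions enclosed by `colour`
    block_of = {p: j for j, blk in enumerate(blocks) for p in blk}
    libs = [{q for p in blk for q in neighbours(p, rows, cols) if board[q[0]][q[1]] == "."}
            for blk in blocks]
    border = [{block_of[q] for p in reg for q in neighbours(p, rows, cols) if q in block_of}
              for reg in regions]
    empties = [{p for p in reg if board[p[0]][p[1]] == "."} for reg in regions]
    # Region i is vital to block j if every empty point of i is a liberty of j.
    vital = [{j for j in border[i] if empties[i] <= libs[j]} for i in range(len(regions))]

    alive = set(range(len(blocks)))
    live_regions = set(range(len(regions)))
    changed = True
    while changed:
        changed = False
        for j in list(alive):  # drop blocks with fewer than two vital regions
            if sum(1 for i in live_regions if j in vital[i]) < 2:
                alive.discard(j)
                changed = True
        for i in list(live_regions):  # drop regions bordered by a dropped block
            if not border[i] <= alive:
                live_regions.discard(i)
                changed = True
    return {p for j in alive for p in blocks[j]}
```

Consider a 5x5 board whose top two rows are `.B.B.` and `BBBBB`, with the rest empty. Here all seven black stones are returned as unconditionally alive. With a single eye (`.B...` over `BB...`), the result is empty.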
Recognizing Unconditional Territory • Find regions surrounded by unconditionally alive stones of one colour • Find the interior of the regions (eyespace) • Remove false eyes • Contract eyespace around defender stones • Count maximum sure liberties (MSL) • MSL < 2: unconditional territory; otherwise, play it out
Solutions for Small Boards • [Figure: value of opening moves on 5x5 at (2,2), (3,3) and (3,2)]
Learning techniques • Successful in several related domains • Heuristic knowledge can be ‘learned’ from analysis of human games • Research • Representation & Generalization • Learn maximally from limited number of examples • Pros and cons of different architectures • Clever use of available domain knowledge
Move prediction • Many moves in Go conform to local patterns which can be played almost reflexively • Train an MLP network to rank moves • Use move pairs {expert, random} extracted from human game records • Training attempts to rank the expert move first
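The pairwise training idea can be illustrated with a deliberately simplified model. Instead of an MLP, it uses a linear scorer, trained with a pairwise logistic loss so that the expert move in each {expert, random} pair scores higher than the random one. The loss, learning rate, epoch count and all function names are assumptions for illustration, not the setup used in the slides.

```python
import math
import random


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_ranker(pairs, dim, epochs=100, lr=0.1, seed=0):
    """pairs: list of (expert_features, random_features).
    Minimises log(1 + exp(-(score(expert) - score(random)))) by SGD."""
    rng = random.Random(seed)
    pairs = list(pairs)  # copy so shuffling does not touch the caller's list
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xe, xr in pairs:
            margin = score(w, xe) - score(w, xr)
            g = sigmoid(-margin)  # minus the derivative of the loss w.r.t. the margin
            for k in range(dim):
                w[k] += lr * g * (xe[k] - xr[k])
    return w


def pair_accuracy(w, pairs):
    """Fraction of pairs in which the expert move is ranked first."""
    return sum(score(w, xe) > score(w, xr) for xe, xr in pairs) / len(pairs)
```

Because the loss only depends on score differences, the same update works for any differentiable scorer. Replacing `score` with an MLP and backpropagating the factor `g` is the natural extension.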
Move Prediction - Representation • Selection of raw features: • Stones • Ko • Liberties after the move • Nearest stones • Edge • Liberties • Captures • Last move • Remove symmetry by canonical ordering & colour reversal • High-dimensional representations suffer from the curse of dimensionality => apply linear feature extraction to reduce dimensionality
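One way to remove symmetry is sketched below, assuming a square window with +1 black, -1 white and 0 empty. First the colours are reversed so that the player to move is always +1. Then the pattern is replaced by the lexicographically smallest of its 8 rotations and reflections. The slides do not say which ordering rule is used, so the "smallest variant" choice and the function names are assumptions.

```python
from typing import List, Tuple

Pattern = Tuple[Tuple[int, ...], ...]  # square window: +1 black, -1 white, 0 empty


def rotate(p: Pattern) -> Pattern:
    """Rotate a square pattern 90 degrees clockwise."""
    n = len(p)
    return tuple(tuple(p[n - 1 - c][r] for c in range(n)) for r in range(n))


def mirror(p: Pattern) -> Pattern:
    """Reflect a pattern left to right."""
    return tuple(row[::-1] for row in p)


def symmetries(p: Pattern) -> List[Pattern]:
    """The 8 rotations and reflections of a square pattern."""
    out = []
    for _ in range(4):
        out.append(p)
        out.append(mirror(p))
        p = rotate(p)
    return out


def canonical(p: Pattern, to_move: int) -> Pattern:
    """Colour-reverse so the player to move is +1, then pick the smallest variant."""
    if to_move == -1:
        p = tuple(tuple(-v for v in row) for row in p)
    return min(symmetries(p))
```

All 8 symmetric variants of a pattern, and its colour-reversed twin with the other player to move, map to the same canonical key. This reduces the number of distinct inputs the network has to learn.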
Move Prediction - Feature Extraction • Principal Component Analysis (PCA) • Linear Discriminant Analysis (LDA) • (Standard techniques, sub-optimal for ranking) • Move-Pair Analysis (MPA) • Linear projection maximizing the expected quadratic distance between pairs • Weakness: ignores global features • Modified Eigenspace Separation Transform (MEST) • Linear projection on eigenvectors with the largest absolute eigenvalues of the correlation difference matrix • Good results using a combination of MEST & MPA
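Both projections reduce to a symmetric eigenproblem. For orthonormal directions W, the expected squared distance between projected pairs is tr(WᵀSW), where S = E[d dᵀ] and d = x_expert − x_random. MPA therefore keeps the top eigenvectors of S. The MEST sketch assumes "correlation matrix" means the uncentred second-moment matrix E[x xᵀ] of each class and keeps the eigenvectors of their difference with the largest absolute eigenvalues. How the two sets are combined is not specified on the slide; simple stacking is an assumption, as are the function names.

```python
import numpy as np


def mpa(expert, random_moves, k):
    """Move-Pair Analysis sketch: top-k eigenvectors of E[d d^T], d = x_expert - x_random."""
    d = np.asarray(expert, dtype=float) - np.asarray(random_moves, dtype=float)
    scatter = d.T @ d / len(d)
    vals, vecs = np.linalg.eigh(scatter)  # symmetric matrix, ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]


def mest(expert, random_moves, k):
    """MEST sketch: eigenvectors of the correlation difference matrix with the
    largest absolute eigenvalues (correlation taken as uncentred E[x x^T])."""
    a = np.asarray(expert, dtype=float)
    b = np.asarray(random_moves, dtype=float)
    diff = a.T @ a / len(a) - b.T @ b / len(b)
    vals, vecs = np.linalg.eigh(diff)
    order = np.argsort(np.abs(vals))[::-1][:k]
    return vecs[:, order]


def combined_projection(expert, random_moves, k_mest, k_mpa):
    """Stack both sets of directions (combination rule assumed)."""
    return np.hstack([mest(expert, random_moves, k_mest), mpa(expert, random_moves, k_mpa)])
```

Reduced features are then `features @ W`, where `W` has one column per retained direction.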
Human & Computer Performance Compared • [Figure: Black must choose between two intersections marked in red]
Performance on professional 19×19 games • [Plot: cumulative performance (%) versus moves]
Learning to Score • Using archives of (online) Go servers, such as NNGS, for machine learning is non-trivial because of: • Missing information: only a single numeric result is given; the status of individual board points is not available • Unfinished games: humans resign early or do not finish the game at all • Bad moves • To overcome the first two problems, we need reliable final scores • Large dataset created: 18k labeled final 9x9 positions • Several tricks were used to identify dubious scores • A few thousand positions scored/verified manually
The scoring method • 1. Classify life & death for all blocks • 2. Remove dead blocks • 3. Mark empty intersections using flood-fills or distance to the nearest remaining colour • 4. (Optional) Recursively update the representation to take the status of adjacent blocks into account; return to step 1
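The removal and flood-fill marking steps can be sketched as follows. The life-and-death classifier is not shown: the set `dead` stands in for its output and is a hypothetical input. Area scoring is assumed, meaning stones plus surrounded empty points. Empty regions bordered by both colours count for neither side, and komi is ignored.

```python
from collections import deque


def area_score(board, dead):
    """board: list of strings with 'B', 'W', '.'; dead: set of (row, col) points of
    dead blocks (hypothetical classifier output). Returns Black area minus White area."""
    rows, cols = len(board), len(board[0])
    grid = [list(row) for row in board]
    for r, c in dead:  # remove dead blocks
        grid[r][c] = "."
    count = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] in count:  # remaining stones count for their owner
                count[grid[r][c]] += 1
                continue
            if (r, c) in seen:
                continue
            # Flood-fill this empty region and record which colours border it.
            size, borders = 0, set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                pr, pc = queue.popleft()
                size += 1
                for qr, qc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if not (0 <= qr < rows and 0 <= qc < cols):
                        continue
                    if grid[qr][qc] == ".":
                        if (qr, qc) not in seen:
                            seen.add((qr, qc))
                            queue.append((qr, qc))
                    else:
                        borders.add(grid[qr][qc])
            if len(borders) == 1:  # reached by one colour only: its territory
                count[borders.pop()] += size
    return count["B"] - count["W"]  # komi not included
```

With the board `["BW.", "BW.", "BW."]`, the score is -3 when nothing is dead. It becomes 9 once the white column is marked dead, because the freed points then border Black only.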
Blocks to Classify • For final positions there are 3 types of blocks: • Alive (O): at the border of own territory • Dead (X): inside the opponent's territory • Irrelevant (?): removal does not change the area score • We train only on alive (type 1) and dead (type 2) blocks!
Representation of the blocks • Direct features of the block • Size • Perimeter • Adjacent opponent stones • 1st, 2nd, 3rd - order liberties • Protected liberties • Auto-atari liberties • Adjacent opponent blocks • Local majority (MD < 3) • Centre of mass • Bounding box size • Adjacent fully accessible CERs • Number of regions • Size • Perimeter • Split points • Adjacent partially accessible CERs • Number of partially accessible regions • Accessible size • Accessible perimeter • Inaccessible size • Inaccessible perimeter • Inaccessible split points • Disputed territory • Direct liberties of the block in disputed territory • Liberties of all friendly blocks in disputed territory • Liberties of all enemy blocks in disputed territory • Directly adjacent eyespace • Size • Perimeter • Optimistic chain • Number of blocks • Size • Perimeter • Split points • Adjacent CERs • Adjacent CERs with eyespace • Adjacent CERs, fully accessible from at least 1 block • Size of adjacent eyespace • Perimeter of adjacent eyespace • External opponent liberties • Opponent blocks (3x) • (1) Weakest directly adjacent opponent block (weakest = block with the fewest liberties) • (2) 2nd weakest directly adjacent opponent block • (3) Weakest opponent block adjacent or sharing liberties with the block’s optimistic chain • Perimeter • Liberties • Shared liberties • Split points • Perimeter of adjacent eyespace • Recursive features • Predicted value of strongest adjacent friendly block • Predicted value of weakest adjacent opponent block • Predicted value of second weakest adjacent opponent block • Average predicted value of weakest opponent block’s optimistic chain • Adjacent eyespace size of the weakest opponent block’s optimistic chain • Adjacent eyespace perimeter of the weakest opponent block’s optimistic chain
Scoring Performance • Blocks (direct/recursive classification) • Full board (4-step recursive classification) • Incorrect score: 1.1% = better than the average rated NNGS player (~7 kyu) • Incorrect winner: 0.5% = comparable to the average NNGS player • Average absolute score difference: 0.15 points
Life & Death during the game • Predict whether blocks of stones can be captured • Perfect predictions are not possible in non-final positions! • Approximate the a posteriori probability that a block will be alive at the end of the game • 4 block types • First 3 types identified from the final position (as before) • 4th type: blocks captured during the game -> dead • Irrelevant blocks not used during training! • Representation extended with 5 additional features: player to move, ko, distance to ko, number of black stones and number of white stones on the board • [Figure: black blocks, 50% alive]
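A short derivation shows why a network trained on alive/dead labels approximates this posterior probability. It assumes a target of y = 1 for alive and y = 0 for dead, with the network output f(x) trained under a squared-error criterion; the same conclusion holds for cross-entropy. Writing p = P(alive | x), the expected error at a given input is minimised by f(x) = p, because the second derivative (2) is positive:

$$
\mathbb{E}\big[(y - f)^2 \mid \mathbf{x}\big] = p\,(1 - f)^2 + (1 - p)\,f^2, \qquad p = P(\text{alive} \mid \mathbf{x})
$$

$$
\frac{\partial}{\partial f}\Big[p\,(1 - f)^2 + (1 - p)\,f^2\Big] = -2p\,(1 - f) + 2(1 - p)\,f = 2(f - p) = 0 \;\Longrightarrow\; f = p
$$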
Performance over the game • MLP, 25 hidden units, 175,000 training examples • Average prediction error: 11.7%
Estimating Potential Territory • Why estimate territory? • To predict the score (potential territory): the main purpose is to build an evaluation function; it may also be used to adjust strategy (e.g., play safe when ahead) • To detect safe regions (secure territory): the main purpose is forward pruning (risky unless provably correct) • Our main focus is on potential territory • We investigate: • Direct methods, known or derived from the literature • ML methods, trained on game records • Enhancements with (heuristic) knowledge of L&D
Direct methods • Explicit control • Direct control • Distance-based control • Influence-based control (~ numerical dilations) • Bouzy’s method (numerical dilations + erosions) • Combinations of Bouzy’s method with distance-based or influence-based control (5+3 or 5+4) • Enhancements use knowledge of life & death to remove dead stones (or reverse their colour)
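Bouzy's method can be sketched with the dilation and erosion operators as they are commonly described. Stones start at +128 (black) and -128 (white), and there are 5 dilations followed by 21 erosions. A dilation adds, to a non-negative point with no negative neighbours, the number of its positive neighbours (and symmetrically for white). An erosion moves a point toward zero by the number of neighbours of opposite or zero sign, without crossing zero. The constants, the ignoring of off-board neighbours and the function names are assumptions, not the exact settings from the slides. Dead stones, if known, would be removed from `board` before the call.

```python
def neighbour_values(v, r, c):
    n = len(v)
    return [v[q][s] for q, s in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= q < n and 0 <= s < n]  # off-board points are ignored


def dilate(v):
    out = [row[:] for row in v]
    for r in range(len(v)):
        for c in range(len(v)):
            nb = neighbour_values(v, r, c)
            if v[r][c] >= 0 and not any(x < 0 for x in nb):
                out[r][c] += sum(1 for x in nb if x > 0)
            if v[r][c] <= 0 and not any(x > 0 for x in nb):
                out[r][c] -= sum(1 for x in nb if x < 0)
    return out


def erode(v):
    out = [row[:] for row in v]
    for r in range(len(v)):
        for c in range(len(v)):
            nb = neighbour_values(v, r, c)
            if v[r][c] > 0:
                out[r][c] = max(0, v[r][c] - sum(1 for x in nb if x <= 0))
            elif v[r][c] < 0:
                out[r][c] = min(0, v[r][c] + sum(1 for x in nb if x >= 0))
    return out


def bouzy(board, dilations=5, erosions=21):
    """board: square list of strings with 'B', 'W', '.'.
    Returns a grid where positive means black control and negative means white control."""
    v = [[128 if s == "B" else -128 if s == "W" else 0 for s in row] for row in board]
    for _ in range(dilations):
        v = dilate(v)
    for _ in range(erosions):
        v = erode(v)
    return v
```

The two operators are exact mirror images under colour reversal, so swapping all stones negates the result. A potential-territory estimate then counts the positive and negative points.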
ML methods • Simple representation • Intersections in the region of interest (ROI): colour {+1 black, -1 white, 0 empty} • Enhanced representation • Intersections in ROI: colour × Prob.(Alive) • Edge • Colour of nearest stone • Colour of nearest living stone • Prob.(Alive) obtained from a pre-trained MLP • Output: predicted colour (+1 sure black, 0 neutral, -1 sure white)
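The enhanced input vector for one intersection might be assembled as below. This is a sketch under several labelled assumptions:

- The ROI is a square of the given radius around the point, and off-board entries are set to 0.
- "Edge" is encoded as the distance to the nearest board edge.
- "Nearest" uses Manhattan distance, with ties between colours giving 0.
- A stone counts as "living" when its P(alive) exceeds 0.5.

`p_alive` stands in for the output of the pre-trained MLP. All names here are hypothetical.

```python
def nearest_colour(colour, r0, c0, keep):
    """Colour (+1/-1) of the nearest stone satisfying keep(r, c), by Manhattan
    distance; 0 if there is none or if both colours are equally near."""
    best_d, found = None, set()
    for r, row in enumerate(colour):
        for c, s in enumerate(row):
            if keep(r, c):
                d = abs(r - r0) + abs(c - c0)
                if best_d is None or d < best_d:
                    best_d, found = d, {s}
                elif d == best_d:
                    found.add(s)
    return found.pop() if len(found) == 1 else 0


def enhanced_features(colour, p_alive, r0, c0, radius):
    """colour[r][c] in {+1, -1, 0}; p_alive[r][c] in [0, 1] (hypothetical MLP output).
    Returns the feature vector for intersection (r0, c0)."""
    rows, cols = len(colour), len(colour[0])
    feats = []
    for r in range(r0 - radius, r0 + radius + 1):  # ROI: colour x Prob.(Alive)
        for c in range(c0 - radius, c0 + radius + 1):
            if 0 <= r < rows and 0 <= c < cols:
                feats.append(colour[r][c] * p_alive[r][c])
            else:
                feats.append(0.0)  # off-board, assumed encoding
    feats.append(min(r0, c0, rows - 1 - r0, cols - 1 - c0))  # distance to edge
    feats.append(nearest_colour(colour, r0, c0, lambda r, c: colour[r][c] != 0))
    feats.append(nearest_colour(
        colour, r0, c0, lambda r, c: colour[r][c] != 0 and p_alive[r][c] > 0.5))
    return feats
```

In the test position, a probably-dead white stone still ties with a black stone for "nearest stone". It is ignored for "nearest living stone", which shows how the life-and-death estimate changes the input.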
Summary: Searching Techniques • The capture game • Simplified Go rules (whoever captures the first stone wins) • Boards up to 6x6 solved • Go on small boards • Normal Go rules • First program in the world to have solved 5x5 Go • Perfect solutions up to ~30 intersections • Heuristic knowledge required for larger boards
Summary: Learning Techniques 1 • Move prediction • Very good results (strong kyu level) • Strong play is possible with limited selection of moves • Scoring final positions • Excellent classification • Reliable training data
Summary: Learning Techniques 2 • Predicting life and death • Good results • Most important ingredient for accurate evaluation of positions during the game • Estimating potential territory • Comparison of non-learning and learning methods • Best results with learning methods
Conclusions • Knowledge is the most important ingredient to improve Go programs • Searching techniques • Provably correct knowledge sufficient for solving small problems up to ~30 intersections • Heuristic knowledge essential for larger problems • Learning techniques • Heuristic knowledge learned quite well from games • Learned heuristic knowledge at least at the level of reasonably strong kyu players
Questions? More information at: http://erikvanderwerf.tengen.nl/ Email: