Monte Carlo Go Has a Way to Go
Adapted from the slides presented at AAAI 2006
Haruhiro Yoshimoto (*1), Kazuki Yoshizoe (*1), Tomoyuki Kaneko (*1), Akihiro Kishimoto (*2), Kenjiro Taura (*1)
(*1) University of Tokyo  (*2) Future University Hakodate
Games in AI
• Ideal test bed for AI research: clear results, clear motivation, a good challenge
• Success with the search-based approach: chess (1997, Deep Blue) and others
• Not yet successful in the game of Go
• "Go is to Chess as Poetry is to double-entry accounting"
• Go goes to the core of artificial intelligence: the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition
The game of Go
• A 4,000-year-old board game from China
• Standard board size is 19×19
• Two players, Black and White, place stones in turn
• Stones cannot be moved, but can be captured and taken off the board
• The player with the larger territory wins
Playing Strength
• A $1.2M prize was offered for beating a professional player with no handicap (now expired!)
• In 1997, Handtalk claimed $7,700 for winning an 11-stone handicap match against an 8-9 year old master
Difficulties in Computer Go
• Large search space: the game becomes progressively more complex, at least for the first 100 ply
Difficulties in Computer Go
• Lack of a good evaluation function
• A material advantage does not mean an easy path to victory; it may just mean that short-term gain has been given priority
• Around 150–250 legal moves per position; usually fewer than 50 are acceptable (often fewer than 10), but computers have a hard time distinguishing them
• A very high degree of pattern recognition is involved in the human capacity to play well
Why Monte Carlo Go?
• Replace the evaluation function by random sampling [Brugmann:1993, Bouzy:2003]
• Success in other domains: Bridge [Ginsberg:1999], Poker [Billings et al.:2002]
• Reasonable position evaluation based on sampling: the cost drops from O(b^d) for full search to O(Nbd) for N sample games
• Easy to parallelize
• Can win against the search-based approach
• Crazy Stone won the 11th Computer Olympiad in 9x9 Go
• MoGo: 19th and 20th KGS 9x9 tournament winner, rated highest on CGOS
Basic idea of Monte Carlo Go
• Generate the next moves by a 1-ply search
• Play a number of random games from each move and compute the expected score
• Choose the move with the maximal expected score
• The only domain-dependent knowledge used is the eye (random moves never fill a player's own eyes)
Terminal Position of Go
• The larger territory wins: territory = surrounded area + stones
• Example position: Black's territory is 36 points, White's territory is 45 points, so White wins by 9 points
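As a rough illustration of this scoring rule, here is a minimal Python sketch (not the authors' code): stones count toward territory, and each empty region surrounded by only one colour is added to that colour's total. The dict-based board representation and helper names are assumptions for illustration.

# Minimal sketch of terminal scoring: territory = stones + empty regions
# surrounded by only one colour. Assumed board representation: a dict
# mapping (row, col) -> 'B', 'W', or '.' for every point on the board.
from collections import deque

def neighbors(p, board):
    r, c = p
    return [q for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)) if q in board]

def score(board):
    """Return Black's territory minus White's territory (positive = Black ahead)."""
    counts = {'B': 0, 'W': 0}
    for colour in board.values():
        if colour in counts:
            counts[colour] += 1                  # stones count toward territory
    seen = set()
    for p, colour in board.items():
        if colour != '.' or p in seen:
            continue
        region, borders, queue = set(), set(), deque([p])
        while queue:                             # flood-fill one empty region
            q = queue.popleft()
            if q in region:
                continue
            region.add(q)
            for n in neighbors(q, board):
                if board[n] == '.':
                    queue.append(n)
                else:
                    borders.add(board[n])
        seen |= region
        if len(borders) == 1:                    # surrounded by a single colour
            counts[borders.pop()] += len(region)
    return counts['B'] - counts['W']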
• Play many sample games in which each player plays randomly
• Compute the average score for each candidate move
• Select the move with the highest average
• Example: play the rest of the game randomly from move A; one sample game is a 5-point win for Black, another a 9-point win for Black, so move A scores (5 + 9) / 2 = 7 points
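The move-selection loop just described fits in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the position representation and the move/playout helpers are passed in as assumed parameters.

# Illustrative sketch of basic Monte Carlo move selection (not the authors' code).
# Assumed helpers supplied by the caller:
#   legal_moves(position, color)   -> list of candidate moves (the 1-ply search)
#   play(position, move, color)    -> resulting position
#   playout_score(position, color) -> final score for `color` after a random game
#                                     in which neither side fills its own eyes
def monte_carlo_move(position, color, legal_moves, play, playout_score, n_samples=100):
    best_move, best_avg = None, float('-inf')
    for move in legal_moves(position, color):
        total = 0.0
        for _ in range(n_samples):                 # N random games per candidate move
            total += playout_score(play(position, move, color), color)
        avg = total / n_samples                    # expected score of this move
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move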
Monte Carlo Go and Sample Size
• Monte Carlo with 1000 sample games is stronger than Monte Carlo with 100 sample games
• Additional samples reduce the statistical error
• Sampling error ~ 1/√N, where N is the number of random games
• The relationship between sample size and strength has not yet been investigated
• Diminishing returns must appear eventually
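A quick numerical check of the 1/√N behaviour: halving the standard error of the average playout score requires roughly four times as many sample games, which is where the diminishing returns come from. The playout scores below are simulated purely for illustration.

# Standard error of the mean of N playout scores shrinks like sigma / sqrt(N):
# quadrupling N roughly halves the error.
import math
import random

def standard_error(samples):
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(variance / n)

scores = [random.gauss(7.0, 20.0) for _ in range(1600)]   # fake playout scores
print(standard_error(scores[:100]))    # error with 100 sample games
print(standard_error(scores[:400]))    # roughly half the error with 4x the games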
Our Monte Carlo Go Implementation
• Basic Monte Carlo Go
• Atari-50 enhancement: utilization of simple Go knowledge in move selection
• Progressive pruning [Bouzy 2003]: statistical move pruning in simulations
Atari-50 Enhancement
• Basic Monte Carlo: assign a uniform probability to each move in a sample game (no eye filling)
• Atari-50: assign a higher probability to capture moves
• Capturing is "mostly" a good move
• Example: move A, which captures Black stones, is chosen with 50% probability
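One plausible way to code this biased move selection inside a playout (a sketch under our own assumptions, not the paper's implementation): with 50% probability play a randomly chosen capture move if one exists, otherwise fall back to the uniform rule.

# Sketch of atari-50-style move selection in a random playout.
# `moves` are the legal, non-eye-filling moves; `is_capture` is an assumed
# predicate telling whether a move captures opponent stones.
import random

def pick_playout_move(moves, is_capture, capture_prob=0.5):
    captures = [m for m in moves if is_capture(m)]
    if captures and random.random() < capture_prob:
        return random.choice(captures)     # capture moves get the extra weight
    return random.choice(moves)            # otherwise: uniform, as in basic Monte Carlo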
Progressive Pruning [Bouzy 2003]
• Start sampling with a smaller sample size
• Prune moves that are statistically inferior, based on the score distribution observed for each move
• More sample games can then be assigned to the promising moves
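A minimal sketch of the pruning test, assuming the usual mean-plus-confidence-margin comparison (the constant k and the example numbers are illustrative choices, not values from Bouzy 2003): a move is dropped once its upper bound falls below the best move's lower bound, and only surviving moves receive further sample games.

# Sketch of progressive pruning: keep only moves whose upper confidence bound
# is still at least the best move's lower confidence bound.
import math

def surviving_moves(stats, k=2.0):
    """stats: {move: (mean_score, stddev, n_samples)} gathered so far."""
    def bounds(mean, std, n):
        margin = k * std / math.sqrt(n)
        return mean - margin, mean + margin
    best_lower = max(bounds(*s)[0] for s in stats.values())
    return {m for m, s in stats.items() if bounds(*s)[1] >= best_lower}

# Example: after a first small batch of playouts per candidate move
stats = {'A': (7.0, 12.0, 50), 'B': (2.5, 10.0, 50), 'C': (-6.0, 9.0, 50)}
print(surviving_moves(stats))   # C is pruned; A and B keep receiving samples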
Experimental Design
• Machine: 64 PCs (128 processors) connected by a 1 GB/s network; each PC has dual Intel Xeon CPUs at 2.40 GHz and 2 GB of memory
• Three versions of the program:
• BASIC: basic Monte Carlo Go
• ATARI: BASIC + atari-50 enhancement
• ATARIPP: ATARI + progressive pruning
• Experiments: 200 self-play games, and analysis of decision quality on 58 professional games
Decision Quality of Each Move
• Evaluation scores of the "Oracle" (64 million sample games) for each candidate move:
       a    b    c
  1   20   17   10
  2   25   30   15
  3   12   21    7
• The 100-sample-game Monte Carlo Go program selected move 2b 9 times and move 2c 1 time (out of 10 runs)
• Average error per move = ((30 - 30) * 9 + (30 - 15) * 1) / 10 = 1.5 points
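The error measure can be stated as a tiny function; the example below reuses the oracle scores and the 9-to-1 selection counts from the table above, with made-up move labels of the form "2b".

# Average decision error: mean difference between the oracle score of the
# oracle's best move and the oracle score of the move the program picked.
def average_error(oracle_scores, picked_moves):
    best = max(oracle_scores.values())
    return sum(best - oracle_scores[m] for m in picked_moves) / len(picked_moves)

oracle = {'1a': 20, '1b': 17, '1c': 10,
          '2a': 25, '2b': 30, '2c': 15,
          '3a': 12, '3b': 21, '3c': 7}
picks = ['2b'] * 9 + ['2c']                 # choices of the 100-sample program
print(average_error(oracle, picks))         # -> 1.5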
Summary of Experimental Results
• The enhancements improve the strength of Monte Carlo Go
• Returns from additional samples diminish eventually
• The enhanced versions reach diminishing returns more quickly
• More samples need to be collected in the early stage of a 9x9 game
Conclusions and Future Work
• Conclusions
• Additional samples achieve only small improvements, unlike search algorithms (e.g., in chess)
• Monte Carlo Go is good at strategy, not tactics; it blunders due to lack of domain knowledge
• Easy to evaluate, easy to parallelize
• The way for Monte Carlo Go to go: a small number of sample games combined with many enhancements looks promising
• Future Work
• Adjust the playout probabilities with pattern matching
• Learning
• Search + Monte Carlo Go, e.g., MoGo (exploration-exploitation in the search tree using UCT)
• Scale to 19×19
Questions?
References:
• Go wiki: http://en.wikipedia.org/wiki/Go_(board_game)
• GNU Go: http://www.gnu.org/software/gnugo/
• KGS Go Server: http://www.gokgs.com
• CGOS 9x9 Computer Go Server: http://cgos.boardspace.net