Monte Carlo Go Has a Way to Go Adapted from the slides presented at AAAI 2006 Haruhiro Yoshimoto (*1) Kazuki Yoshizoe (*1) Tomoyuki Kaneko (*1) Akihiro Kishimoto (*2) Kenjiro Taura (*1) (*1) University of Tokyo (*2) Future University Hakodate
Games in AI • Ideal test bed for AI research • Clear results • Clear motivation • Good challenge • Success with the search-based approach • chess (1997, Deep Blue) • and others • Not yet successful in the game of Go • "Go is to Chess as Poetry is to Double-entry accounting" • It goes to the core of artificial intelligence, which involves the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition
The game of Go • A 4,000-year-old board game from China • Standard board size is 19×19 • Two players, Black and White, place stones in turn • Stones cannot be moved, but can be captured and taken off the board • The player with the larger territory wins
Playing Strength • A $1.2M prize was offered for beating a professional with no handicap (now expired!) • Handtalk claimed $7,700 in 1997 for winning an 11-stone handicap match against an 8-9-year-old master
Difficulties in Computer Go • Large search space • the game becomes progressively more complex, at least for the first 100 ply
Difficulties in Computer Go • Lack of a good evaluation function • a material advantage does not imply an easy path to victory; it may just mean that short-term gain has been given priority • around 150–250 legal moves per position, of which usually fewer than 50 (often fewer than 10) are acceptable, but computers have a hard time distinguishing them • A very high degree of pattern recognition is involved in the human capacity to play well
Why Monte Carlo Go? • Replace the evaluation function by random sampling [Brügmann 1993, Bouzy 2003] • Success in other domains: Bridge [Ginsberg 1999], Poker [Billings et al. 2002] • Reasonable position evaluation based on sampling: the cost of evaluating a position drops from O(b^d) to O(N·b·d), where b is the branching factor, d the game length, and N the number of sample games • Easy to parallelize • Can win against the search-based approach • Crazy Stone won the 11th Computer Olympiad in 9x9 Go • MoGo won the 19th and 20th KGS 9x9 tournaments and is rated highest on CGOS
Basic idea of Monte Carlo Go • Generate the next moves by a 1-ply search • For each move, play a number of random games and compute the expected score • Choose the move with the maximal expected score • The only domain-dependent knowledge is the eye (random play never fills a player's own eyes)
Terminal Position of Go • The player with the larger territory wins • Territory = surrounded area + stones • Example position (figure): Black's territory is 36 points, White's territory is 45 points, so White wins by 9 points
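To make "territory = surrounded area + stones" concrete, here is a minimal Python sketch of area scoring for a terminal position. It assumes the board is a dict mapping every on-board (row, col) to 'B', 'W', or None; the representation and helper names are illustrative, not the authors' code.

def area_score(board):
    """Area score (Black minus White) of a terminal position."""
    counts = {'B': 0, 'W': 0}
    seen = set()
    # Stones on the board count as points.
    for point, stone in board.items():
        if stone is not None:
            counts[stone] += 1
    # Each empty region surrounded by a single color counts for that color.
    for point, stone in board.items():
        if stone is not None or point in seen:
            continue
        region, borders, stack = [], set(), [point]
        while stack:                                  # flood-fill the region
            q = stack.pop()
            if q in seen:
                continue
            seen.add(q)
            region.append(q)
            r, c = q
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb not in board:
                    continue                          # off the board
                if board[nb] is None:
                    stack.append(nb)
                else:
                    borders.add(board[nb])
        if len(borders) == 1:
            counts[borders.pop()] += len(region)      # surrounded area
    return counts['B'] - counts['W']                  # positive: Black leads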
Play many sample games • Each player plays randomly • Compute the average points for each move • Select the move with the highest average • Example: from move A, play the rest of the game randomly; one sample game is a 5-point win for Black, another a 9-point win for Black, so move A scores (5 + 9) / 2 = 7 points
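A minimal Python sketch of this basic Monte Carlo move selection, assuming hypothetical helpers legal_moves(position, color), play(position, move, color), and random_playout(position, color); the last plays out the rest of the game with uniformly random, non-eye-filling moves and returns the final score from color's point of view.

def monte_carlo_move(position, color, n_samples=100):
    """Pick the move whose random playouts give the highest average score."""
    best_move, best_mean = None, float('-inf')
    for move in legal_moves(position, color):            # 1-ply search
        total = 0.0
        for _ in range(n_samples):                        # sample games
            total += random_playout(play(position, move, color), color)
        mean = total / n_samples                          # expected score
        if mean > best_mean:
            best_move, best_mean = move, mean
    return best_move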
Monte Carlo Go and Sample Size • Additional samples reduce the statistical error • The relationship between sample size and strength has not yet been investigated • Sampling error ~ 1/√N, where N is the number of random games, so diminishing returns must eventually appear • Example (figure): Monte Carlo with 1000 sample games is stronger than Monte Carlo with 100 sample games
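As an illustration of the 1/√N behavior (not from the paper), the toy simulation below estimates the mean of a ±1 "game result" with different sample sizes; quadrupling N only halves the typical error, which is where the diminishing returns come from.

import random
import statistics

def estimate(n):
    # Stand-in for "average result of n random games" (here a fair ±1 game).
    return sum(random.choice((-1, 1)) for _ in range(n)) / n

for n in (100, 400, 1600, 6400):
    typical_error = statistics.mean(abs(estimate(n)) for _ in range(500))
    print(n, round(typical_error, 4))   # error shrinks roughly as 1/sqrt(n)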
Our Monte Carlo Go Implementation • Basic Monte Carlo Go • Atari-50 enhancement: utilization of simple Go knowledge in move selection • Progressive pruning [Bouzy 2003]: statistical move pruning during sampling
Atari-50 Enhancement • Basic Monte Carlo: assign uniform probability to each move in a sample game (no eye filling) • Atari-50: assign a higher probability (50%) to capture moves • Capture is "mostly" a good move • Figure example: move A captures the black stones
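A sketch of Atari-50 move selection inside a sample game, with hypothetical helpers capture_moves and legal_moves (the names are illustrative; the 50% figure is the one on the slide).

import random

ATARI_PROB = 0.5   # probability of preferring a capture when one exists

def atari50_move(position, color):
    """Biased random move: prefer captures half the time, else uniform."""
    captures = capture_moves(position, color)             # hypothetical helper
    if captures and random.random() < ATARI_PROB:
        return random.choice(captures)
    return random.choice(legal_moves(position, color))    # no eye filling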
Progressive Pruning [Bouzy 2003] • First sample each candidate move with a small sample size • Prune statistically inferior moves (figure: estimated score per candidate move with statistical bounds) • The remaining sample games can be assigned to promising moves
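A rough sketch of the pruning test in the spirit of [Bouzy 2003]: after a small batch of sample games per move, a move is kept only while its upper statistical bound still reaches the best move's lower bound. The bound width z and the per-move statistics format are illustrative, not the paper's exact settings.

import math

def surviving_moves(stats, z=2.0):
    """stats maps move -> (mean, stddev, n_samples); returns moves to keep."""
    lower = {m: mu - z * sd / math.sqrt(n) for m, (mu, sd, n) in stats.items()}
    upper = {m: mu + z * sd / math.sqrt(n) for m, (mu, sd, n) in stats.items()}
    best_lower = max(lower.values())
    # Prune moves that are statistically inferior to the current best move.
    return [m for m in stats if upper[m] >= best_lower]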
Experimental Design • Machine • Intel dual-CPU Xeon at 2.40 GHz with 2 GB memory • 64 PCs (128 processors) connected by a 1 GB/s network • Three versions of the program • BASIC: basic Monte Carlo Go • ATARI: BASIC + Atari-50 enhancement • ATARIPP: ATARI + progressive pruning • Experiments • 200 self-play games • Analysis of decision quality on 58 professional games
Decision Quality of Each Move
Evaluation scores of the "Oracle" (64 million sample games):
      a    b    c
  1  20   17   10
  2  25   30   15
  3  12   21    7
Moves selected by the 100-sample-game Monte Carlo Go over 10 runs: 2b selected 9 times, 2c selected 1 time.
Average error of one move is ((30 - 30) * 9 + (30 - 15) * 1) / 10 = 1.5 points
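The same 1.5-point figure can be reproduced with a few lines of Python: the error of a run is the oracle score of the oracle's best move minus the oracle score of the move actually selected, averaged over the 10 runs. The dictionary below only lists the two moves that were selected; the best oracle score (30) is the one used in the slide's formula.

def average_error(oracle_scores, selections):
    """Average point loss of the selected moves against the oracle's best."""
    best = max(oracle_scores.values())
    return sum(best - oracle_scores[m] for m in selections) / len(selections)

oracle = {'2b': 30, '2c': 15}                        # scores from the table
print(average_error(oracle, ['2b'] * 9 + ['2c']))    # -> 1.5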
Summary of Experimental Results • The additional enhancements improve the strength of Monte Carlo Go • Returns eventually diminish • The additional enhancements reach diminishing returns sooner • More samples need to be collected in the early stage of a 9x9 game
Conclusions and Future Work • Conclusions • Additional samples achieve only small improvements • Unlike search algorithms, e.g. in chess • Good at strategy, not tactics • blunders due to lack of domain knowledge • Easy to evaluate • Easy to parallelize • The way for Monte Carlo Go to go: a small number of sample games combined with many enhancements looks promising • Future Work • Adjust move probabilities with pattern matching • Learning • Search + Monte Carlo Go • MoGo (exploration-exploitation in the search tree using UCT) • Scale to 19×19
Questions? References: • Go wiki: http://en.wikipedia.org/wiki/Go_(board_game) • GNU Go: http://www.gnu.org/software/gnugo/ • KGS Go Server: http://www.gokgs.com • CGOS 9x9 Computer Go Server: http://cgos.boardspace.net