A Scalable Machine Learning Approach to Go

Pierre Baldi and Lin Wu, UC Irvine

Contents • Introduction to Go • Existing approaches • Our approach • Results • Conclusion & Future work

What is Go?

What is Go? • Black and white play alternately • Stones with no liberties are removed • The player with more territory wins
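The capture rule above can be sketched with a flood fill: collect the group of same-colored stones connected to a point, gather the empty points adjacent to it (its liberties), and remove the group if that set is empty. The board encoding (rows of "B", "W", ".") and the function names are assumptions for illustration; a full engine would also remove the opponent's captured groups before checking the mover's own.

```python
def neighbors(p, size):
    """Orthogonally adjacent points that lie on the board."""
    i, j = p
    for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= ni < size and 0 <= nj < size:
            yield ni, nj


def group_and_liberties(board, start):
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    size = len(board)
    color = board[start[0]][start[1]]
    stones, liberties, stack = {start}, set(), [start]
    while stack:
        p = stack.pop()
        for q in neighbors(p, size):
            v = board[q[0]][q[1]]
            if v == ".":
                liberties.add(q)
            elif v == color and q not in stones:
                stones.add(q)
                stack.append(q)
    return stones, liberties


def remove_captured(board, color):
    """Remove every `color` group with zero liberties; return stones removed."""
    size = len(board)
    seen, removed = set(), 0
    for i in range(size):
        for j in range(size):
            if board[i][j] == color and (i, j) not in seen:
                stones, libs = group_and_liberties(board, (i, j))
                seen |= stones
                if not libs:
                    for si, sj in stones:
                        board[si][sj] = "."
                    removed += len(stones)
    return removed
```

For example, a lone white stone surrounded on all four sides by black stones has no liberties, so `remove_captured(board, "W")` takes it off the board.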

Why is Go interesting? • Go is a hard game for computers • The best Go programs are easily defeated by an average human amateur • Other board games have expert-level programs • Chess: Deep Blue (1997) & Fritz (2002) • Checkers: Chinook (1994) • Othello (Reversi): Logistello (2002) • Backgammon: TD-Gammon (1992)

Why is Go interesting for AI? • Poses unique opportunities and challenges for AI and machine learning • A high-quality evaluation function is hard to build • Large branching factor: 200-300, compared with 35-40 for chess
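To make the branching-factor gap concrete: a uniform game tree with branching factor b has b^d positions at depth d. The sketch below assumes b = 250 (the middle of the quoted 200-300 range) for Go and b = 35 for chess; real game trees are not uniform, so the numbers are only rough orders of magnitude.

```python
def leaf_count(branching: int, depth: int) -> int:
    """Number of positions at depth `depth` of a uniform tree: b ** d."""
    return branching ** depth


# Assumed branching factors: 250 for Go, 35 for chess.
for depth in (2, 4, 6):
    go, chess = leaf_count(250, depth), leaf_count(35, depth)
    print(f"depth {depth}: Go ~{go:.2e}, chess ~{chess:.2e}, ratio ~{go / chess:.0f}")
```

The ratio itself grows as (250/35)^d, so every extra ply of lookahead widens the gap between Go and chess by a factor of about 7.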

Existing approaches • Hard-coded programs • Random games: evaluate the next move by playing a large number of random games • Machine learning: learn the evaluation function automatically

Existing approaches── hard-coded programs • Hand-tailored pattern libraries • Hard-coded rules to choose among multiple pattern matches • Tactical search (or reading) • E.g., "Many Faces of Go", "GNU Go"

Existing approaches── hard-coded programs • Pros: • Good performance • Cons: • Intensive manual work • Pattern library is never complete • Hard to maintain and improve

Existing approaches── Random games • Play a huge number of random games from a given position • Use the game results to evaluate every legal move • Choose the legal move with the best evaluation • E.g., Gobble, Go81
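The random-games procedure above can be sketched generically: for each legal move, play many random games from the resulting position and score the move by its win rate. Because a Go engine does not fit here, the sketch plugs in a hypothetical toy game (Nim: remove 1-3 stones, taking the last stone wins); the helper names are illustrative, and this plain averaging is not the exact scheme used by Gobble or Go81.

```python
import random


def monte_carlo_move_values(state, legal_moves, apply_move, playout, n_games=200, seed=0):
    """Score each legal move by the fraction of random games won after playing it."""
    rng = random.Random(seed)
    values = {}
    for move in legal_moves(state):
        child = apply_move(state, move)
        wins = sum(playout(child, rng) for _ in range(n_games))
        values[move] = wins / n_games
    return values


def best_move(values):
    """Choose the legal move with the best Monte Carlo evaluation."""
    return max(values, key=values.get)


# Hypothetical toy game standing in for Go: Nim with moves of 1-3 stones.
def nim_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def nim_apply(stones, move):
    return stones - move


def nim_playout(stones, rng):
    """Random game from `stones`: 1.0 if the player who just moved wins, else 0.0."""
    if stones == 0:
        return 1.0  # the last move took the final stone
    mover_is_us = False  # the opponent moves next
    while True:
        stones -= rng.choice(nim_moves(stones))
        if stones == 0:
            return 1.0 if mover_is_us else 0.0
        mover_is_us = not mover_is_us


values = monte_carlo_move_values(3, nim_moves, nim_apply, nim_playout)
```

With 3 stones left, taking all three wins every playout, so it receives the highest score; the cost is one full random game per sample, which is why this approach struggles to scale to the full board.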

Existing approaches── Random games • Pros • Easy to implement • Reasonable performance • Cons • Works on small boards only; does not scale to the full 19x19 board

Existing approaches── Machine learning • Schraudolph et al., 1994 • TD(0) • Neural network • Graepel et al., 2001 • Common fate graph (board condensed by the common fate property) • SVM • Stern, Graepel, and MacKay, 2005 • Conditional Markov random field
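The TD(0) rule behind the first approach nudges the evaluation of a position toward the reward plus the evaluation of the next position: V(s) ← V(s) + α(r + γV(s') − V(s)). Schraudolph et al. trained a neural network; the sketch below swaps in a linear evaluation V(s) = w · φ(s) for brevity, and the feature vectors and parameter values are hypothetical.

```python
def td0_update(w, features, next_features, reward, alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step for a linear evaluation V(s) = w . phi(s)."""
    v = sum(wi * fi for wi, fi in zip(w, features))
    # A terminal position has no successor, so its bootstrapped value is zero.
    v_next = 0.0 if terminal else sum(wi * fi for wi, fi in zip(w, next_features))
    delta = reward + gamma * v_next - v  # TD error
    # For a linear V, the gradient with respect to w is just phi(s).
    return [wi + alpha * delta * fi for wi, fi in zip(w, features)]
```

For example, with zero weights, a terminal step with reward 1 and α = 0.5 moves the active feature's weight from 0 to 0.5.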

Existing approaches── Machine learning • Pros: • Learn automatically • Cons: • Poor performance

Our approach • Use scalable algorithms to learn high-quality evaluation functions automatically • Imitate the human evaluation process

Our approach── Human evaluation process • Three key components • Understanding patterns • Combining patterns • Relating strategic rewards to tactical ones

Our approach── System components • 3x3 pattern library • Learns tactical patterns automatically • A structure-rich Recursive Neural Network (RNN) • Propagates interactions between patterns • Learns the correlation between strategic rewards (targets) and tactical rewards (inputs)
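As a rough illustration of harvesting a 3x3 pattern library from game positions, the sketch below slides a 3x3 window over every placement that fits on the board and counts how often each (window center, pattern) pair occurs. Keying by absolute center, the string encoding, and the helper names are assumptions; folding placements by board symmetry would be a natural next step.

```python
from collections import Counter


def patches(board, patch=3):
    """Yield ((row, col), pattern) for every patch x patch window fully on the board.

    The pattern is the window's cells read row by row as one string.
    """
    n, r = len(board), patch // 2
    for i in range(r, n - r):
        for j in range(r, n - r):
            cells = "".join(
                board[a][b] for a in range(i - r, i + r + 1) for b in range(j - r, j + r + 1)
            )
            yield (i, j), cells


def build_library(positions):
    """Count occurrences of each (center, pattern) pair across many positions."""
    library = Counter()
    for board in positions:
        library.update(patches(board))
    return library
```

On a 9x9 board there are 7 x 7 = 49 window placements, so each position contributes 49 counts to the library.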

Our approach── RNN architecture • Six planes • One input plane • One output plane • Four hidden planes

Our approach── Update sequence
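A six-plane architecture like this can be sketched as follows, assuming the four hidden planes are directional planes in the style of Baldi and Pollastri's 2D DAG-RNNs: each plane sweeps the board from one corner, so a hidden vector depends on the input at its intersection and on its two predecessors along the sweep, and the output at each intersection combines the input with all four hidden vectors there. Layer sizes, tanh/sigmoid activations, and random weights are illustrative assumptions, not the trained system.

```python
import numpy as np

# Sweep direction (row step, column step) of each hidden plane; (1, 1) starts at
# the top-left corner and moves toward the bottom-right, and so on.
SWEEPS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def init_params(n_in, n_hid, rng):
    """Small random weights for four hidden planes plus one output plane."""
    planes = [
        {
            "W_in": rng.normal(0.0, 0.1, (n_hid, n_in)),
            "W_row": rng.normal(0.0, 0.1, (n_hid, n_hid)),
            "W_col": rng.normal(0.0, 0.1, (n_hid, n_hid)),
            "b": np.zeros(n_hid),
        }
        for _ in SWEEPS
    ]
    out = {
        "V_in": rng.normal(0.0, 0.1, n_in),
        "V_hid": rng.normal(0.0, 0.1, (len(SWEEPS), n_hid)),
        "c": 0.0,
    }
    return planes, out


def hidden_plane(x, p, sweep):
    """Fill one hidden plane in sweep order so both predecessors are ready."""
    n, m, _ = x.shape
    n_hid = p["b"].shape[0]
    h = np.zeros((n, m, n_hid))
    di, dj = sweep
    rows = range(n) if di == 1 else range(n - 1, -1, -1)
    cols = range(m) if dj == 1 else range(m - 1, -1, -1)
    zero = np.zeros(n_hid)  # hidden state used for off-board predecessors
    for i in rows:
        for j in cols:
            prev_row = h[i - di, j] if 0 <= i - di < n else zero
            prev_col = h[i, j - dj] if 0 <= j - dj < m else zero
            h[i, j] = np.tanh(
                p["W_in"] @ x[i, j] + p["W_row"] @ prev_row + p["W_col"] @ prev_col + p["b"]
            )
    return h


def forward(x, planes, out):
    """Output plane: per-intersection sigmoid of the input and all four hidden vectors."""
    hs = np.stack([hidden_plane(x, p, s) for p, s in zip(planes, SWEEPS)])  # (4, n, m, n_hid)
    z = x @ out["V_in"] + np.einsum("knmh,kh->nm", hs, out["V_hid"]) + out["c"]
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
x = rng.normal(size=(9, 9, 5))  # 9x9 board, 5 hypothetical input features per intersection
planes, out = init_params(n_in=5, n_hid=8, rng=rng)
y = forward(x, planes, out)
```

Because one plane starts from each corner, every output intersection receives information from the whole board, while the weights are shared across positions, so the same network applies to any board size.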

Our approach── Provide relevant inputs • For intersections • Intersection type: black, white, or empty • Influence: influence from stones of the same and the opposite color • Pattern stability: a statistical value computed from 3x3 patterns • For groups • Number of eyes • Number of 1st-, 2nd-, 3rd-, and 4th-order liberties • Number of liberties of the weakest and second-weakest opponent groups
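The higher-order liberty counts can be sketched as a breadth-first expansion over empty points, under an assumed definition: 1st-order liberties are empty points adjacent to the group, and k-th-order liberties are empty points adjacent to (k-1)-th-order ones that were not counted at a lower order. The board encoding and function names are hypothetical.

```python
def neighbors(p, n):
    """Orthogonally adjacent points on an n x n board."""
    i, j = p
    for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= q[0] < n and 0 <= q[1] < n:
            yield q


def order_liberties(board, group, max_order=4):
    """Return [#1st-order, #2nd-order, ...] liberty counts for a set of stones."""
    n = len(board)

    def is_empty(q):
        return board[q[0]][q[1]] == "."

    # 1st-order liberties: empty points touching the group.
    frontier = {q for p in group for q in neighbors(p, n) if is_empty(q)}
    seen = set(frontier)
    counts = []
    for _ in range(max_order):
        counts.append(len(frontier))
        # Next order: empty points touching the current frontier, not counted before.
        frontier = {q for p in frontier for q in neighbors(p, n) if is_empty(q) and q not in seen}
        seen |= frontier
    return counts
```

For a single stone in the middle of an otherwise empty 5x5 board, this gives 4, 8, 8, and 4 liberties of orders 1 to 4.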

Our approach── Pattern stability (I) • On a 9x9 board, 3x3 patterns fall into 10 unique locations once mirror and rotation symmetries are taken into account • Stability is measured for each intersection of each pattern within each unique location

Our approach── Pattern stability (II) • Ten unique pattern locations
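The count of ten locations can be checked directly: a 3x3 window fits on a 9x9 board at 7 x 7 = 49 center positions, and under the board's eight rotations and reflections a center is characterized by its unordered pair of distances to the nearest horizontal and vertical edges. The function names below are illustrative.

```python
from collections import Counter

BOARD = 9
PATCH = 3


def window_centers(board=BOARD, patch=PATCH):
    """All centers (row, col) where a patch x patch window fits on the board."""
    r = patch // 2
    return [(i, j) for i in range(r, board - r) for j in range(r, board - r)]


def location_class(center, board=BOARD):
    """Canonical label of a window position under the board's 8 symmetries."""
    i, j = center
    di = min(i, board - 1 - i)  # distance to the nearest horizontal edge
    dj = min(j, board - 1 - j)  # distance to the nearest vertical edge
    return tuple(sorted((di, dj)))  # the diagonal mirror swaps the two distances


classes = Counter(location_class(c) for c in window_centers())
```

The ten classes have unequal sizes: the center window (4, 4) is unique, while for example the class (1, 2), which touches the first line, contains eight placements.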

Our approach── Pattern stability (III)

Our approach── Pattern stability results (I)

Our approach── Pattern stability results (II)

Results── Validation error

Results── Results on move predictions

Results── Matched move (I)

Results── Matched move (II)

Conclusion & Future work
