A Scalable Machine Learning Approach to Go Pierre Baldi and Lin Wu UC Irvine
Contents • Introduction on Go • Existing approaches • Our approach • Results • Conclusion & Future work
What is Go? • Black and White play alternately • Stones with zero liberties are removed • The player with more territory wins
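The capture rule above can be illustrated with a minimal sketch. The board representation (a dict from points to "B" or "W") and the names `group_and_liberties` and `play` are assumptions made for this example, not part of the original slides; suicide and ko rules are deliberately left out.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]  # occupied points map to "B" or "W"; empty points are absent


def neighbors(p: Point, size: int) -> Iterator[Point]:
    """Yield the on-board points orthogonally adjacent to p."""
    r, c = p
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def group_and_liberties(board: Board, start: Point, size: int) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the connected group containing start and collect its liberties."""
    color = board[start]
    group = {start}
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        p = stack.pop()
        for q in neighbors(p, size):
            if q not in board:
                liberties.add(q)          # adjacent empty point
            elif board[q] == color and q not in group:
                group.add(q)              # same-colored stone joins the group
                stack.append(q)
    return group, liberties


def play(board: Board, move: Point, color: str, size: int = 9) -> Board:
    """Place a stone and remove opposing groups left with zero liberties.

    Simplified: suicide and ko are not checked.
    """
    new_board = dict(board)
    new_board[move] = color
    for q in neighbors(move, size):
        if q in new_board and new_board[q] != color:
            group, libs = group_and_liberties(new_board, q, size)
            if not libs:
                for stone in group:
                    del new_board[stone]
    return new_board
```

For example, a white stone in the corner at (0, 0) with a black stone at (0, 1) has a single liberty at (1, 0); a black move there removes it.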
Why is Go interesting? • Go is a hard game for computers • The best Go programs are easily defeated by an average human amateur • Other board games have expert-level programs • Chess: Deep Blue (1997) & FRITZ (2002) • Checkers: Chinook (1994) • Othello (Reversi): Logistello (2002) • Backgammon: TD-GAMMON (1992)
Why is Go interesting for AI? • Poses unique opportunities and challenges for AI and machine learning • Hard to build a high-quality evaluation function • Large branching factor, 200-300, compared with 35-40 for chess
Existing approaches • Hard-coded programs • Evaluate the next move by playing a large number of random games • Use machine learning to learn the evaluation function
Existing approaches── hard-coded programs • Hand-tailored pattern libraries • Hard-coded rules to choose among multiple hits • Tactical search (or reading) • E.g. “Many Faces of Go”, “GnuGo”
Existing approaches── hard-coded programs • Pros: • Good performance • Cons: • Intensive manual work • Pattern library is not complete • Hard to manage and improve
Existing approaches── Random games • Play a huge number of random games from a given position • Use the results of the games to evaluate all the legal moves • Choose the legal move with the best evaluation • E.g. Gobble, Go81
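The random-games idea can be sketched generically. The code below is not taken from Gobble or Go81; the function names and the callback-based game interface are assumptions for illustration, and a tiny Nim-like game stands in for Go so that the example runs on its own.

```python
import random


def random_playout(state, legal_moves, apply_move, winner, rng):
    """Play uniformly random moves until the game ends; return the winner."""
    while winner(state) is None:
        state = apply_move(state, rng.choice(legal_moves(state)))
    return winner(state)


def evaluate_moves(state, player, legal_moves, apply_move, winner,
                   n_games=200, seed=0):
    """Score each legal move by the fraction of random games `player` wins after it."""
    rng = random.Random(seed)
    scores = {}
    for move in legal_moves(state):
        after = apply_move(state, move)
        wins = sum(
            random_playout(after, legal_moves, apply_move, winner, rng) == player
            for _ in range(n_games)
        )
        scores[move] = wins / n_games
    return scores


# Toy stand-in game (an assumption for this sketch): one heap of stones,
# each turn removes 1-3 stones, and whoever takes the last stone wins.
# A state is (stones_left, player_to_move).
def nim_moves(state):
    stones, _ = state
    return [k for k in (1, 2, 3) if k <= stones]


def nim_apply(state, k):
    stones, to_move = state
    return (stones - k, 1 - to_move)


def nim_winner(state):
    stones, to_move = state
    return 1 - to_move if stones == 0 else None  # the previous mover took the last stone


scores = evaluate_moves((3, 0), 0, nim_moves, nim_apply, nim_winner)
best_move = max(scores, key=scores.get)  # the move with the best evaluation
```

With three stones left, taking all three wins every playout, so that move gets the top score. For Go, the same loop would run over legal board moves, which is where the cost of many playouts per move becomes significant.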
Existing approaches── Random games • Pros • Easy to implement • Reasonable performance • Cons • Small boards only, cannot scale to normal board
Existing approaches── Machine learning • Schraudolph et al., 1994 • TD(0) • Neural network • Graepel et al., 2001 • Condensed graph by common fate property • SVM • Stern, Graepel, and MacKay, 2005 • Conditional Markov random field
Existing approaches── Machine learning • Pros: • Learn automatically • Cons: • Poor performance
Our approach • Use scalable algorithms to learn high-quality evaluation functions automatically • Imitate the human evaluation process
Our approach── Human evaluating process • Three key components • The understanding of patterns • The ability to combine patterns • The ability to relate strategic rewards to tactical ones
Our approach── System components • 3x3 pattern library • Learn tactical patterns automatically • A structure-rich Recursive Neural Network • Propagate interactions between patterns • Learn the correlation between strategic rewards (targets) and tactical rewards (inputs)
Our approach── RNN architecture • Six planes • One input plane • One output plane • Four hidden planes
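The slide gives only the plane counts, so the following is a rough sketch under stated assumptions rather than the authors' network. It assumes that each of the four hidden planes propagates information across the board starting from a different corner, that every unit holds a single scalar, and that the hypothetical weights `w_in`, `w_h`, and `w_out` are shared across all board positions.

```python
import math


def sweep(inputs, dr, dc, w_in=0.5, w_h=0.25):
    """Compute one hidden plane (assumed design, not from the slides).

    Each hidden unit combines the input at its own position with the two
    hidden units that lie upstream in the sweep direction (dr, dc).
    """
    n = len(inputs)
    hidden = [[0.0] * n for _ in range(n)]
    rows = range(n) if dr > 0 else range(n - 1, -1, -1)
    cols = range(n) if dc > 0 else range(n - 1, -1, -1)
    for r in rows:
        for c in cols:
            up = hidden[r - dr][c] if 0 <= r - dr < n else 0.0
            side = hidden[r][c - dc] if 0 <= c - dc < n else 0.0
            hidden[r][c] = math.tanh(w_in * inputs[r][c] + w_h * (up + side))
    return hidden


def forward(inputs, w_out=0.25):
    """Input plane -> four hidden planes (one per corner) -> output plane."""
    n = len(inputs)
    planes = [sweep(inputs, dr, dc) for dr in (1, -1) for dc in (1, -1)]
    return [
        [1.0 / (1.0 + math.exp(-w_out * sum(p[r][c] for p in planes)))
         for c in range(n)]
        for r in range(n)
    ]


# A single active input at the center of a 3x3 board influences every output,
# because the four sweeps together carry information in all directions.
board = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
output = forward(board)
```

Because the weights in this sketch do not depend on position, the same functions run unchanged on a 9x9 or 19x19 input plane.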
Our approach── Provide relevant inputs • For intersections • Intersection type: black, white, or empty • Influence: influence from the same & opposite color • Pattern stability: a statistical value calculated from 3x3 patterns • For groups • Number of eyes • Number of 1st, 2nd, 3rd, and 4th order liberties • Number of liberties of the 1st and 2nd weakest opponents
Our approach── Pattern stability (I) • The 9x9 board is split into 10 unique locations for 3x3 patterns once mirror and rotation symmetries are taken into account • Stability is measured for each intersection of each pattern within each unique location
Our approach── Pattern stability (II) • Ten unique pattern locations
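The count of ten can be checked with a short sketch. It assumes that a pattern location is a 3x3 window lying entirely on the board, which gives 7x7 window positions on a 9x9 board; the function names are made up for this example.

```python
def canonical(r, c, n):
    """Return the smallest image of window position (r, c) under the
    eight rotations and mirrors of an n x n grid of positions."""
    images = []
    for _ in range(4):
        r, c = c, n - 1 - r              # rotate by 90 degrees
        images.append((r, c))
        images.append((r, n - 1 - c))    # mirror image of the rotated position
    return min(images)


def unique_locations(board_size=9, pattern_size=3):
    """List one representative window position per symmetry class."""
    n = board_size - pattern_size + 1    # window positions along each axis
    return sorted({canonical(r, c, n) for r in range(n) for c in range(n)})


locations = unique_locations()
print(len(locations))  # 10
```

Under this assumption the 49 window positions fall into 10 symmetry classes, matching the number on the slide.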