Reinforcement Learning of Local Shape in the Game of Atari-Go

Presentation Transcript


  1. Reinforcement Learning of Local Shape in the Game of Atari-Go David Silver

  2. Local shape • Local shape describes a pattern of stones • Corresponds to expert Go knowledge • Joseki (corner patterns) • Tesuji (tactical patterns) • Used extensively in current strongest programs • Pattern databases • Difficult to extract expert Go knowledge, and input into pattern databases • Focus of this work • Explicitly learn local shapes through experience • Learn a value for the goodness of each shape

  3. Prior work • Supervised learning of local shapes • Local move prediction [Stoutamire, Werf] • Mimics strong play rather than learning to evaluate and understand positions • Reinforcement learning of neural networks • TD(0) [Schraudolph, Enzenberger] • Shape represented implicitly, difficult to interpret • Limited in scope by network architecture

  4. Local shape features • Specifies configuration of stones within a rectangular window • Takes account of rotational, reflectional and colour inversion symmetries • Location dependent (LD) features • Specifies canonical position on board • Location invariant (LI) features • Can appear anywhere on board
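The symmetry handling on this slide can be illustrated with a small sketch. It assumes a window is a tuple of rows with cells encoded as 0 (empty), 1 (black) and 2 (white), and it picks the lexicographically smallest variant as the canonical form. The encoding and all function names are illustrative assumptions, not taken from the presentation. How the learned value should change sign under colour inversion is not stated on the slide, so the sketch only groups equivalent windows.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # assumed cell encoding, not from the slides


def rotate(window):
    """Rotate a window (tuple of row tuples) by 90 degrees clockwise."""
    return tuple(zip(*window[::-1]))


def reflect(window):
    """Mirror a window left to right."""
    return tuple(row[::-1] for row in window)


def invert(window):
    """Swap black and white stones, leaving empty points unchanged."""
    swap = {EMPTY: EMPTY, BLACK: WHITE, WHITE: BLACK}
    return tuple(tuple(swap[cell] for cell in row) for row in window)


def symmetric_variants(window):
    """All 16 variants: 4 rotations x 2 reflections x 2 colourings."""
    variants = []
    current = tuple(tuple(row) for row in window)
    for _ in range(4):
        current = rotate(current)
        for shape in (current, reflect(current)):
            variants.append(shape)
            variants.append(invert(shape))
    return variants


def canonical(window):
    """Pick one representative per symmetry class (smallest variant)."""
    return min(symmetric_variants(window))
```

Two windows share a feature exactly when `canonical` returns the same value for both, so a weight table keyed by `canonical(window)` would share one weight across the whole symmetry class.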

  5. Local shape features [Figure: a board position with an example 2x2 LI feature and an example 2x2 LD feature]

  6. Local shape features • All possible configurations enumerated • For each window size from 1x1 up to 3x3 • Both LI and LD shapes • [Table: numbers of features (active and total)]
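As a rough illustration of what enumerating all configurations involves, the sketch below counts raw window configurations, assuming three states per point (empty, black, white), and the number of window placements on a square board. The counts are upper bounds before any symmetry reduction and are not the figures from the presentation. The function names are hypothetical.

```python
from itertools import product


def raw_configurations(width, height):
    """Number of stone configurations in a window, before symmetry reduction."""
    return 3 ** (width * height)


def window_positions(board_size, width, height):
    """Number of placements of a width x height window on a square board."""
    return (board_size - width + 1) * (board_size - height + 1)


def enumerate_windows(width, height):
    """Yield every configuration as a tuple of rows (0 empty, 1 black, 2 white)."""
    for cells in product((0, 1, 2), repeat=width * height):
        yield tuple(
            tuple(cells[r * width:(r + 1) * width]) for r in range(height)
        )


if __name__ == "__main__":
    for size in (1, 2, 3):
        li = raw_configurations(size, size)
        ld = li * window_positions(5, size, size)
        print(f"{size}x{size}: raw LI <= {li}, raw LD on 5x5 <= {ld}")
```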

  7. Partial ordering of local feature sets • There is a partial ordering > over the generality of local feature sets • Small windows > large windows • LI > LD
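One way to read this ordering is sketched below. It assumes a feature set is identified by a window width, a window height and an LI/LD flag, and that one set is at least as general as another when its window is no larger in either dimension and it is LI whenever the other is. The tuple representation and the names are assumptions made for illustration.

```python
from itertools import product


def at_least_as_general(a, b):
    """True if feature set a is as general as, or more general than, b.

    Each set is a (width, height, is_location_invariant) tuple.
    """
    a_w, a_h, a_li = a
    b_w, b_h, b_li = b
    return a_w <= b_w and a_h <= b_h and (a_li or not b_li)


# All feature sets for window sizes 1x1 up to 3x3, both LI and LD.
ALL_SETS = [
    (w, h, li) for w, h, li in product((1, 2, 3), (1, 2, 3), (True, False))
]


def sets_at_least_as_general_as(f):
    """Every feature set that sits at or above f in the partial ordering."""
    return [s for s in ALL_SETS if at_least_as_general(s, f)]
```

The ordering is only partial: under these assumptions a 1x1 LD set and a 2x2 LI set are incomparable, because the first is smaller but location dependent.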

  8. Value function • Reward of +1 for winning, 0 for losing • Value function estimates total expected reward from any position, i.e. the probability of winning • Move selection is done by 1-ply greedy search over value function • Value function is approximated by a single linear threshold unit • Local shape features are inputs to the LTU
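A minimal sketch of this value function and move selection follows. It assumes the linear threshold unit is a logistic sigmoid over the summed weights of the active features, which would make the output a probability of winning. It also assumes that positions are scored from the agent's point of view. The names `value` and `greedy_move`, and the dictionary of candidate moves, are hypothetical.

```python
import math


def value(active_features, weights):
    """Estimated probability of winning: sigmoid of the summed feature weights.

    active_features may list a feature more than once (for example an LI
    shape that occurs at several places); unseen features have weight 0.
    """
    total = sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-total))


def greedy_move(candidates, weights):
    """1-ply greedy search: pick the move whose resulting position scores best.

    candidates maps each legal move to the active features of the position
    reached by playing it.
    """
    return max(candidates, key=lambda move: value(candidates[move], weights))
```

For example, with `weights = {"eye": 1.0, "cut": -2.0}` and `candidates = {"A1": ["eye"], "B2": ["cut"]}`, `greedy_move` selects `"A1"`.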

  9. Value function

  10. Temporal difference learning • Update value of each position towards the new value after making a move • For tabular value function • $\Delta V(s_t) = \alpha\,[r + V(s_{t+1}) - V(s_t)]$ • For linear threshold unit, assign credit to each active feature • $\delta = r + V(s_{t+1}) - V(s_t)$ • $\Delta\theta_i = \alpha\,\delta\,V(s_t)\,(1 - V(s_t))$
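The update for the linear threshold unit can be sketched as follows. The sketch assumes a logistic unit, a step size called `alpha`, and a value of zero after the game has ended. Every active feature's weight is moved by the step size times the TD error times V(s_t)(1 - V(s_t)). The function names and the default step size are illustrative assumptions.

```python
import math


def value(active_features, weights):
    """Sigmoid of the summed weights of the active features."""
    total = sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-total))


def td0_update(weights, feats_t, feats_next, reward, alpha=0.1, terminal=False):
    """Apply one TD(0) step in place and return the TD error.

    feats_t / feats_next: active features of s_t and s_{t+1}.
    terminal: if True, V(s_{t+1}) is taken to be 0 (assumption).
    """
    v_t = value(feats_t, weights)
    v_next = 0.0 if terminal else value(feats_next, weights)
    delta = reward + v_next - v_t           # TD error
    step = alpha * delta * v_t * (1.0 - v_t)  # sigmoid gradient term
    for f in feats_t:                        # credit each active feature
        weights[f] = weights.get(f, 0.0) + step
    return delta
```

With empty weights, a winning terminal transition from a position with features `["a", "b"]` raises both weights equally, so the same position is valued above 0.5 on the next visit.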

  11. Minimum liberty opponent • To evaluate a position s: • Find block of either colour with fewest liberties • Set col_min to colour of minimum liberty block • Set lib_min to number of liberties • If both players have a block with l liberties, col_min is set to minimum liberty player • Evaluate position according to: • Select move with 1-ply greedy search
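The first steps of this evaluation, finding the block with the fewest liberties, can be sketched with a flood fill. The board encoding (a list of strings with `.` for empty, `X` for black and `O` for white) and the function names are assumptions. Ties are broken by scan order here, which is an arbitrary choice and not the tie-break rule from the slide. The evaluation formula itself is not reproduced in this transcript, so the sketch stops at the colour and liberty count.

```python
def blocks_with_liberties(board):
    """Return a list of (colour, stones, liberties) for every block."""
    size = len(board)
    seen = set()
    blocks = []
    for r in range(size):
        for c in range(len(board[r])):
            colour = board[r][c]
            if colour == "." or (r, c) in seen:
                continue
            stones, liberties, stack = set(), set(), [(r, c)]
            while stack:  # flood fill over same-coloured neighbours
                y, x = stack.pop()
                if (y, x) in stones:
                    continue
                stones.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < size and 0 <= nx < len(board[ny]):
                        if board[ny][nx] == ".":
                            liberties.add((ny, nx))
                        elif board[ny][nx] == colour:
                            stack.append((ny, nx))
            seen |= stones
            blocks.append((colour, stones, liberties))
    return blocks


def minimum_liberty_block(board):
    """Return (colour, liberty count) of the weakest block, or None if empty."""
    blocks = blocks_with_liberties(board)
    if not blocks:
        return None
    colour, _, liberties = min(blocks, key=lambda b: len(b[2]))
    return colour, len(liberties)
```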

  12. Training procedure • Random policy rarely beats minimum liberty player • So train against an improving opponent • Opponent plays some random moves, enough to win 50% of games • Random moves are reduced as agent improves • Eventually there are no random moves • Testing is always performed against full opponent (no random moves)
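The slide does not say how the amount of randomness is adjusted, so the following is only one possible scheme. It assumes the opponent plays a random legal move with probability `epsilon`, and that `epsilon` is nudged after each batch of games so the agent's win rate stays near 50%. The step size, the target and all names are hypothetical.

```python
import random


def opponent_move(legal_moves, best_move, epsilon, rng=random):
    """Play a random legal move with probability epsilon, else the best move."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return best_move


def adjust_randomness(epsilon, agent_win_rate, step=0.05, target=0.5):
    """Weaken or strengthen the opponent to keep the agent near the target.

    If the agent wins more than the target share of games, the opponent
    plays fewer random moves; if it wins less, the opponent plays more.
    The result is clamped to [0, 1].
    """
    if agent_win_rate > target:
        epsilon -= step
    elif agent_win_rate < target:
        epsilon += step
    return min(1.0, max(0.0, epsilon))
```

Once `epsilon` reaches 0 the training opponent coincides with the full opponent used for testing.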

  13. Results on 5x5 board • Different combinations of feature sets tried • Just one feature set F • All feature sets at least as general as F • All feature sets at most as general as F • Percentage wins during testing after 25,000 training games

  14. Results on 5x5 board Single specified feature set, location invariant

  15. Results on 5x5 board All feature sets at least as general as the specified set

  16. Board growing results • Board grown from 5x5 to 9x9 • Board size increased when winning 90% • Weights transferred from previous size • Percentage wins shown during training

  17. Board growing • Local shape features have a direct interpretation • The same interpretation applies to different board sizes • So transfer knowledge from one board size to the next • Learn key concepts rapidly and extend to more difficult contexts

  18. Shapes learned

  19. Example game • 7x7 board • Agent plays black • Minimum liberty opponent plays white • Agent has learned strategic concepts: • Keeping stones connected • Building territory • Controlling corners

  20. Conclusions • Local shape knowledge can be explicitly learnt directly from experience • Multi-scale representation helps to learn quickly and provide fine differentiation • Knowledge is easily interpretable and can be transferred to different board sizes • The combined knowledge of local shape is sufficient to express global strategic concepts

  21. Future work • Stronger opponents, real Go not Atari-Go • Learn shapes selectively rather than enumerating all possible shapes • Learn shapes to answer specific questions • Can black B4 be captured? • Can white connect A2 to D5? • Learn non-local shape: • Use connectivity relationships • Build hierarchies of shapes
