330 likes | 472 Views
Monte-Carlo Search Algorithms. Daniel Bjorge and John Schaeffer. Problem. Where it needs to be recognized. Important stuff. Solution. Best early prediction?. Which is faster?. Search Algorithms. Best β parameter?. Comparing Search Algorithms. Option 1: Computer Go.
E N D
Monte-Carlo Search Algorithms Daniel Bjorge and John Schaeffer
Problem Where it needs to be recognized Important stuff
Solution Best early prediction? Which is faster? • Search Algorithms Best β parameter?
Option 2: Artificial Trees 010 110 010 110 010 110 010 110 010 110 010 110 010 110 010 110 010 110
Eager Game Tree Generation (Small trees) • Tree Generator • Memory • Search Algorithm
Eager Game Tree Generation (Big trees) • Memory • Tree Generator
Lazy Game Tree Generation (Any trees) • Tree Generator • Search Algorithm • Memory
Minimax Values Minimizing Maximizing 7 0 1 -4 9 6 3 4 -9
Minimax Values Minimizing Maximizing 4 4 7 9 7 0 1 -4 9 6 3 4 -9
Minimax Values Minimizing Maximizing 4 4 7 9 7 0 1 -4 9 6 3 4 -9 Go:
Downward Minimax Evaluation 4 4 7 9 0 7 0 1 4 9 6 3 4
Example Minimizing Maximizing 0 0 5 5 1
Difficulty Minimax Optimal Easier White to play
Quantified Difficulty Minimizing Maximizing 0 0 5 5 1
Many-Armed Bandit Solutions 10 8 3 7 ... 0 1 0 0 ... 14 19 -27 -13 ... 1 2 2 2 ... -12 -4 -2 -8 …
(KocsisandSzepesvári, 2006) UCTUpper Confidence Bound for Trees Optimal arm choice rate
(Hartland et al., 2006) UCT-FPUUCT with First Play Urgency Optimal arm choice rate
Infinitely-Armed Bandit Solutions … 10 8 3 7 ... -12 -4 -2 -8 …
Infinite-Armed Bandit SolutionsAdapting to Trees • K-Failure • …for trees • UCB-V () • UCT-V () • UCB-V () AIR • UCT-V () AIR (Wang, Audibert, and Munos, 2006)
K-Failure Optimal arm choices (out of 2000)
UCT-V () … … … … … … … … …
UCT-V () with AIR … … … … … … … … …
UCT-V () with AIR: Gomba Optimal arm choices (out of 1000)