230 likes | 255 Views
Solving Probabilistic Combinatorial Games. Ling Zhao & Martin Mueller University of Alberta September 7, 2005. Paper link: http://www.cs.ualberta.ca/~zhao/PCG.pdf. Motivations. Ken Chen’s previous work to maximize winning chance in Go. Maximize points Vs. maximize winning probability
E N D
Solving Probabilistic Combinatorial Games Ling Zhao & Martin Mueller University of Alberta September 7, 2005 Paper link: http://www.cs.ualberta.ca/~zhao/PCG.pdf
Motivations • Ken Chen’s previous work to maximize winning chance in Go. • Maximize points Vs. maximize winning probability • How to solve the abstract game efficiently or play the abstract game well?
Motivations Results (black plays first): +15 (80%), -7 (20%)
Probabilistic Combinatorial Games • A Terminal node which is expressed as a probability distribution (d={[p1, v1], [p2, v2], …, [pn, vn] }) is a PCG. • If A1, A2, … An, B1, B2, …, Bn are PCGs, then {A1, A2, … An | B1, B2, …, Bn} is a PCG. • A sum of PCGs is a PCG. Left options Right options A move in a sum game consists of a move to an option in exactly one subgame and leaves all other subgames unchanged.
Simple PCG (SPCG) • Each PCG has exactly one left option and one right option. • Each option leads immediately to a terminal node. • Each distribution has exactly 2 values with associate probabilities.
Problems to Address • How to solve PCGs efficiently? • How to play PCGs well if resources are limited or fast play is required?
Game Tree Analysis • Very regular game tree: a node at depth k has exactly n-k children, so n!/(n-k)! nodes in total at depth k. • Very large number of transpositions: C(n, k) * C(k, k/2) distinct nodes at depth k.
Terminal Node Evaluation • Terminal node is a sum of probability distributions. • Winning probability
Monte-Carlo Terminal Evaluation (MCTE) • Use Monte-Carlo methods to randomly collect k samples from 2n data points in the sum of n distributions. • Use the average winning percentage of samples (P’w)to approximate the overall winning probability (Pw). • Theory from statistics: Pw - P’w is a normal distribution, with mean=0 and std dev = <= • Experimental results:
Monte-Carlo Interior Evaluation (MCIE) • Evaluation of anode is approximated by averaging the values of terminal nodesreached from it through random play. • Proposed by Abramsonin 1990. - Using 4x4 tic-tac-toe and 6x6 Othello for experiments. • Applied to Monte-Carlo Go by Bouzy and Helmstetter and several other researchers.
SPCG Solver and Player • Solver: alpha-beta search, transposition tables, move ordering (MCIE & MCTE). • Player: alpha-beta search to a certain depth and use Monte-Carlo interior evaluation for frontier nodes.
Experimental Results • 100 randomly generated games, and each game has 14 subgames. • Value distribution: probability from 0 to 1, value from –1000 to 1000. • AMD 2400MHz CPUs • 220 cache entries (terminal nodes have higher priority) • About 8 seconds to solve a game.
Solver Performance • Monte-Carlo move ordering: dm – depth limit for Monte-Carlo move ordering being used, otherwise history heuristic is used. • Monte-Carlo interior evaluation: nt – percentage of all the current node’s descendant terminal nodes sampled. • Monte-Carlo terminal evaluation: nc – number of data points sampled. • Accurate terminal evaluation occupies 90% of the overall running time.
Monte-Carlo Player • Test against the perfect player. • Each game has two rounds: each side plays first once. • Winning probability is the average of the two rounds. • Parameters: search depth and nc.
Error in Move Ordering • Average probability error: Move: A B Actual win prob: 0.18 < 0.19 Estimate: 0.32 > 0.30 Win prob lost: 0.01 • Average probability error is the average of the winning probability lost in all move pairs of a node. • Worst probability error: the probability lost due to the wrong best move chosen.
Conclusions • Efficient exact and heuristic solvers for SPCG. • Successfully incorporate Monte-Carlo move ordering into alpha-beta search to SPCG. • A heuristic evaluation techniques based on Monte-Carlo with performance close to the prefect player. • Extensive experiments for the two solvers.
Future Work • Better algorithms to accurately evaluate terminal nodes? • Progressive pruning. • Why does the simple Abramson’s Expected Outcome model perform so well in move ordering?
New Directions to Apply Monte-Carlo to Computer Go Monte-Carlo Go: + New Direction:
Future Work: Transformation ? ? PCG solver or player