Learning Opponent-type Probabilities for PrOM search
Jeroen Donkers, IKAT, Universiteit Maastricht
6th Computer Olympiad
Contents
• OM search and PrOM search
• Learning for PrOM search
• Off-line learning
• On-line learning
• Conclusions & future research
OM search
• The MAX player uses evaluation function V0
• The opponent is assumed to use a different evaluation function, Vop
• At MIN nodes: predict which move the opponent will select (using standard search and Vop)
• At MAX nodes: pick the move that maximizes the search value (based on V0)
• At leaf nodes: evaluate with V0
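The steps above can be sketched as follows. This is an illustrative sketch, not the talk's implementation: the `Node` class and the functions `v0` and `v_op` are hypothetical stand-ins for a real game-tree interface.

```python
# Minimal sketch of OM search (illustrative, not the author's code).
# Node, v0, and v_op are hypothetical stand-ins for a real game interface.

class Node:
    def __init__(self, s0=0, sop=0, kids=()):
        self.s0, self.sop, self.kids = s0, sop, kids  # leaf scores + children
    def is_leaf(self):
        return not self.kids
    def children(self):
        return self.kids

def minimax(node, max_to_move, v, depth):
    """Plain minimax with a single evaluation function v."""
    if depth == 0 or node.is_leaf():
        return v(node)
    vals = [minimax(c, not max_to_move, v, depth - 1) for c in node.children()]
    return max(vals) if max_to_move else min(vals)

def om_search(node, max_to_move, v0, v_op, depth):
    """v0: MAX's own evaluation; v_op: the opponent's predicted evaluation."""
    if depth == 0 or node.is_leaf():
        return v0(node)                      # leaves are valued with V0
    if max_to_move:
        # MAX node: maximize MAX's own search value
        return max(om_search(c, False, v0, v_op, depth - 1)
                   for c in node.children())
    # MIN node: predict the opponent's move by a standard search on v_op,
    # then return MAX's value of the predicted move
    predicted = min(node.children(),
                    key=lambda c: minimax(c, True, v_op, depth - 1))
    return om_search(predicted, True, v0, v_op, depth - 1)
```

The key asymmetry: MIN nodes are resolved with the opponent's function, but the value propagated upward is always MAX's own V0.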
PrOM search
• Extended opponent model:
• a set of opponent types (e.g., evaluation functions)
• a probability distribution over this set
• Interpretation: at every move, the opponent uses a random device to pick one of the opponent types, and plays according to the selected type.
PrOM search algorithm
• At MIN nodes: determine, for every opponent type, which move that type would select
• Compute the MAX player's value of these moves
• Use the opponent-type probabilities to compute the expected value of the MIN node
• At MAX nodes: select the maximum child
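The MIN-node computation above can be sketched as below, under the same kind of hypothetical game-tree interface (the `Node` class and function names are illustrative, not the talk's code):

```python
# Sketch of PrOM search's expected MIN-node value (illustrative, not the
# author's code). Each opponent type is an evaluation function with a
# probability; a MIN node's value is the probability-weighted mixture of
# MAX's values of the moves the types would pick.

class Node:
    def __init__(self, evals=None, kids=()):
        self.evals, self.kids = evals or {}, kids
    def is_leaf(self):
        return not self.kids
    def children(self):
        return self.kids

def minimax(node, max_to_move, v, depth):
    if depth == 0 or node.is_leaf():
        return v(node)
    vals = [minimax(c, not max_to_move, v, depth - 1) for c in node.children()]
    return max(vals) if max_to_move else min(vals)

def prom_min_value(node, types, probs, v0, depth):
    """Expected MAX value of a MIN node under the opponent-type mixture."""
    expected = 0.0
    for v_op, p in zip(types, probs):
        # which move would this opponent type select? (standard search on v_op)
        move = min(node.children(),
                   key=lambda c: minimax(c, True, v_op, depth - 1))
        # MAX's value of that move, weighted by the type's probability
        expected += p * prom_max_value(move, types, probs, v0, depth - 1)
    return expected

def prom_max_value(node, types, probs, v0, depth):
    """MAX nodes simply select the maximum child, as in regular search."""
    if depth == 0 or node.is_leaf():
        return v0(node)
    return max(prom_min_value(c, types, probs, v0, depth - 1)
               for c in node.children())
```

With a single opponent type of probability 1, this reduces to OM search.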
Learning in PrOM search
• How do we assess the probabilities of the opponent types?
• Off-line: use games previously played by the opponent to estimate the probabilities (much time and, possibly, much data available)
• On-line: use the moves observed during a game to adjust the probabilities (little time and few observations; prior probabilities are needed)
Off-Line Learning
• Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.
• Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the opponent's moves.
Off-Line Learning
• How to obtain P*(opp)?
• Input: a set of positions, together with the moves that the given opponent and all given opponent types would select
• "Algorithm": P*(oppi) = Ni / N, counting how often each type alone agrees with the opponent
• But: leave out all ambiguous positions (those in which more than one opponent type agrees with the opponent)
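The counting rule can be sketched as follows (function and variable names are illustrative, not from the talk):

```python
# Sketch of the off-line frequency estimate P*(opp_i) = N_i / N
# (illustrative names). Ambiguous positions, where more than one opponent
# type predicts the observed move, are left out of the counts.

def estimate_probs(observed_moves, type_predictions):
    """observed_moves[k]: the opponent's move in position k.
    type_predictions[k]: the move each opponent type selects in position k."""
    n_types = len(type_predictions[0])
    counts = [0] * n_types
    for obs, preds in zip(observed_moves, type_predictions):
        matches = [i for i, move in enumerate(preds) if move == obs]
        if len(matches) == 1:              # leave out ambiguous positions
            counts[matches[0]] += 1
    total = sum(counts)
    if total == 0:                         # nothing unambiguous observed
        return [1.0 / n_types] * n_types
    return [c / total for c in counts]
```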
Off-Line Learning
• Case 1: the opponent is using a mixed strategy P#(opp) of the given opponent types
• Effective learning is possible: P*(opp) converges to P#(opp)
• Learning is more difficult if the opponent types are not independent
[Figure: learning without leaving out ambiguous events; 5 opponent types, P = (a, b, b, b, b), 20 moves, 100–100,000 runs, 100 samples]
[Figure: learning with ambiguous events left out; 5 opponent types, P = (a, b, b, b, b), 20 moves, 10–100,000 runs, 100 samples]
[Figure: varying the number of opponent types; 2–20 opponent types, P = (a, b, b, b, b), 20 moves, 100,000 runs, 100 samples]
Off-Line Learning
• Case 2: the opponent is using a different strategy
• The opponent types behave randomly but dependently (the distribution of type i depends on type i-1)
• The real opponent selects a fixed move
[Figures: learning error and learned probabilities]
Fast On-Line Learning
• At the principal MIN node, only the best move for every opponent type is needed
• Slightly increase the probability of an opponent type if the observed move matches the move selected by this opponent type only; then normalize all probabilities
• Drift towards one opponent type is possible
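The fast update can be sketched as one step per observed move; `delta`, the size of the increment, is a hypothetical tuning parameter not specified in the talk:

```python
# Sketch of the fast on-line update (illustrative, not the author's code).
# Bump one type's probability when it alone predicted the observed move,
# then renormalize; delta is a hypothetical tuning parameter.

def fast_update(probs, predicted_moves, observed_move, delta=0.05):
    matches = [i for i, m in enumerate(predicted_moves) if m == observed_move]
    if len(matches) != 1:        # ambiguous or unpredicted: leave unchanged
        return list(probs)
    updated = list(probs)
    updated[matches[0]] += delta           # slight increase for the one match
    total = sum(updated)
    return [p / total for p in updated]    # normalize
```

Because only the winning type is ever increased, repeated matches can drive the distribution toward a single type, which is the drift the slide warns about.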
Slower On-Line Learning: Naive Bayesian (Duda & Hart '73)
• Compute the value of every move at the principal MIN node for every opponent type
• Transform these values into conditional probabilities P(move | opp)
• Compute P(opp | moveobs) from P*(opp) using Bayes' rule
• Update: P*(opp) = a·P*(opp) + (1 − a)·P(opp | moveobs)
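The naive-Bayesian update can be sketched as below. The talk does not specify how move values become conditional probabilities P(move | opp), so a softmax is used here purely as a placeholder; all names are illustrative.

```python
import math

# Sketch of the naive-Bayesian on-line update (Duda & Hart '73), with a
# softmax as a stand-in for the unspecified value-to-probability transform.

def softmax(values):
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def bayes_update(probs, move_values, observed_index, alpha=0.9):
    """move_values[i][j]: value of move j under opponent type i.
    alpha: the smoothing parameter a from the slide."""
    # P(observed move | opp_i), via the placeholder transform
    likelihoods = [softmax(vals)[observed_index] for vals in move_values]
    # Bayes' rule: P(opp_i | move_obs) is proportional to
    # P(move_obs | opp_i) * P*(opp_i)
    post = [lk * p for lk, p in zip(likelihoods, probs)]
    z = sum(post)
    post = [p / z for p in post]
    # Smoothed update: P* <- a*P* + (1 - a)*P(opp | move_obs)
    return [alpha * p + (1 - alpha) * q for p, q in zip(probs, post)]
```

With alpha close to 1 the probabilities move slowly and drift late; with alpha close to 0 each observation dominates, which is why the slide stresses tuning a.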
Naïve Bayesian Learning
• In the end, drift to 1–0 probabilities will almost always occur
• Parameter a strongly influences the actual performance:
• the amount of change in the probabilities
• convergence
• drifting speed
• It should be tuned in a real setting
Conclusions
• Effective off-line learning of the probabilities is possible when ambiguous events are disregarded
• Off-line learning also works if the opponent does not use a mixed strategy of the known opponent types
• On-line learning must be tuned precisely to the given situation
Future Research
• PrOM search and learning in real game playing:
• Zanzibar Bao (8×4 mancala)
• LOA (some experiments with OM search done)
• Chess endgames