
Learning Opponent-type Probabilities for PrOM search



Presentation Transcript


  1. Learning Opponent-type Probabilities for PrOM search. Jeroen Donkers, IKAT, Universiteit Maastricht. 6th Computer Olympiad

  2. Contents • OM search and PrOM search • Learning for PrOM search • Off-line Learning • On-line Learning • Conclusions & Future research

  3. OM search • MAX player uses evaluation function V0 • Opponent uses a different evaluation function (Vop) • At MIN nodes: predict which move the opponent will select (using standard search and Vop) • At MAX nodes: pick the move that maximizes the search value (based on V0) • At leaf nodes: use V0
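The OM-search rule on this slide can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function names and the toy tree interface (a `children` callable returning a node's successors) are assumptions for the example.

```python
def minimax(node, depth, v, children, is_max):
    """Standard fixed-depth minimax with leaf evaluator v."""
    kids = children(node)
    if depth == 0 or not kids:
        return v(node)
    vals = [minimax(c, depth - 1, v, children, not is_max) for c in kids]
    return max(vals) if is_max else min(vals)

def om_search(node, depth, v0, v_op, children, is_max):
    """MAX's value of `node` under the one-type opponent model."""
    kids = children(node)
    if depth == 0 or not kids:
        return v0(node)  # leaf nodes: always MAX's own evaluation V0
    if is_max:
        # MAX node: pick the child maximizing MAX's own search value
        return max(om_search(c, depth - 1, v0, v_op, children, False)
                   for c in kids)
    # MIN node: predict the opponent's choice with a standard search
    # using Vop, then continue with MAX's value of that predicted move
    predicted = min(kids,
                    key=lambda c: minimax(c, depth - 1, v_op, children, True))
    return om_search(predicted, depth - 1, v0, v_op, children, True)
```

Note that the opponent's move is predicted with `v_op`, but the value propagated upward is always computed with `v0`, exactly as the slide states.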

  4. PrOM search • Extended Opponent Model: • a set of opponent types (e.g. evaluation functions) • a probability distribution over this set • Interpretation: At every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type.

  5. PrOM search algorithm • At MIN nodes: determine for every opponent type which move it would select • Compute the MAX player’s value for these moves • Use the opponent-type probabilities to compute the expected value of the MIN node • At MAX nodes: select the maximum child
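The expected-value step at a MIN node can be written down directly. In this sketch (all names are illustrative), `predicted_move[w]` is the move opponent type `w` would select and `max_value[m]` is the MAX player's search value of move `m`:

```python
def prom_min_value(predicted_move, max_value, probs):
    """Expected MAX value of a MIN node: sum over opponent types w of
    P(w) times MAX's value of the move that type w would select."""
    return sum(probs[w] * max_value[predicted_move[w]] for w in probs)
```

With a single opponent type of probability 1, this reduces to the OM-search rule of the previous slides.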

  6. Learning in PrOM search • How do we assess the probabilities of the opponent types? • Off-line: use games previously played by the opponent to estimate the probabilities (a lot of time and, possibly, data available) • On-line: use the observed moves during a game to adjust the probabilities (only little time and few observations available); prior probabilities are needed.

  7. Off-Line Learning • Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent. • Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that predicts the opponent’s moves best.

  8. Off-Line Learning • How to obtain P*(opp)? • Input: a set of positions and the moves that the given opponent and all the given opponent types would select • “Algorithm”: P*(opp_i) = N_i / N • But: leave out all ambiguous positions! (e.g. when more than one opponent type agrees with the opponent)
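The frequency estimator P*(opp_i) = N_i / N with ambiguous positions left out can be sketched as below; the data layout (a list of pairs of the observed move and each type's predicted move) is an assumption for the example:

```python
def estimate_probs(observations, types):
    """Off-line estimator: observations is a list of
    (observed_move, {type: predicted_move}) pairs."""
    counts = {t: 0 for t in types}
    n = 0
    for observed, predictions in observations:
        matching = [t for t in types if predictions[t] == observed]
        if len(matching) != 1:
            continue  # ambiguous (or unexplained) position: leave it out
        counts[matching[0]] += 1  # exactly one type predicted the move
        n += 1
    return {t: counts[t] / n for t in types} if n else None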

  9. Off-Line Learning • Case 1: The opponent is using a mixed strategy P#(opp) of the given opponent types • Effective learning is possible (P*(opp) → P#(opp)) • More difficult if the opponent types are not independent

  10. Not leaving out ambiguous events [Figure: 5 opponent types, P = (a,b,b,b,b), 20 moves, 100 to 100,000 runs, 100 samples]

  11. Leaving out ambiguous events [Figure: 5 opponent types, P = (a,b,b,b,b), 20 moves, 10 to 100,000 runs, 100 samples]

  12. Varying number of opponent types [Figure: 2 to 20 opponent types, P = (a,b,b,b,b), 20 moves, 100,000 runs, 100 samples]

  13. Off-Line Learning • Case 2: The opponent is using a different strategy • Opponent types behave randomly but dependently (the distribution of type i depends on type i-1) • The real opponent selects a fixed move

  14. [Figures: learning error and learned probabilities]

  15. Fast On-Line Learning • At the principal MIN node, only the best move for every opponent type is needed • Increase the probability of an opponent type slightly if the observed move matches the move selected by this opponent type only; then normalize all probabilities. • Drift to one opponent type is possible.
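The fast update rule above can be sketched as follows; the step size `delta` is an assumed tuning parameter, not a value from the talk:

```python
def fast_update(probs, predicted_move, observed, delta=0.05):
    """Bump the probability of the single opponent type that uniquely
    predicted the observed move, then renormalize."""
    matching = [w for w in probs if predicted_move[w] == observed]
    if len(matching) != 1:
        return probs  # ambiguous or unexplained move: no update
    probs = dict(probs)
    probs[matching[0]] += delta
    total = sum(probs.values())
    return {w: p / total for w, p in probs.items()}
```

Because only matching types ever gain mass, repeated agreement with one type makes its probability approach 1, which is the drift the slide warns about.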

  16. Slower On-Line Learning: Naive Bayesian (Duda & Hart ’73) • Compute the value of every move at the principal MIN node for every opponent type • Transform these values into conditional probabilities P(move | opp) • Compute P(opp | move_obs) using P*(opp) as prior (Bayes’ rule) • Take P*(opp) ← α·P*(opp) + (1 − α)·P(opp | move_obs)
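The four steps above can be sketched in one update function. The slide only says the move values are "transformed" into P(move | opp); the softmax over negated values used here (lower value means more likely, since the opponent minimizes at MIN nodes) is an assumption of this sketch, as are `alpha` and `temp`:

```python
import math

def bayes_update(probs, move_values, observed, alpha=0.9, temp=1.0):
    """One naive-Bayesian step: probs is the prior P(opp),
    move_values[w][m] is the value of move m under opponent type w.
    Returns alpha * prior + (1 - alpha) * posterior."""
    likelihood = {}
    for w, values in move_values.items():
        # Softmax over negated values: lower value -> higher probability
        z = sum(math.exp(-v / temp) for v in values.values())
        likelihood[w] = math.exp(-values[observed] / temp) / z  # P(move|w)
    evidence = sum(probs[w] * likelihood[w] for w in probs)
    posterior = {w: probs[w] * likelihood[w] / evidence for w in probs}
    return {w: alpha * probs[w] + (1 - alpha) * posterior[w] for w in probs}
```

The mixing factor `alpha` plays the role of the parameter a on the next slide: closer to 1 means slower, more stable adjustment; closer to 0 means faster drift toward a single type.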

  17. Naïve Bayesian Learning • In the end, drifting to 1-0 probabilities will almost always occur • Parameter α is very important for the actual performance: • amount of change in the probabilities • convergence • drifting speed • It should be tuned in a real setting

  18. Conclusions • Effective off-line learning of probabilities is possible when ambiguous events are disregarded. • Off-line learning also works if the opponent does not use a mixed strategy of known opponent types. • On-line learning must be tuned precisely to a given situation.

  19. Future Research • PrOM search and learning in real game playing • Zanzibar Bao (8×4 mancala) • LOA (some experiments with OM search done) • Chess endgames
