Game Theory-Based Opponent Modeling in Large Imperfect-Information Games Tuomas Sandholm Carnegie Mellon University Computer Science Department Joint work with Sam Ganzfried
Traditionally two approaches
• Game theory approach (abstraction + equilibrium finding)
• Safe in 2-person 0-sum games
• Doesn't maximally exploit weaknesses in opponent(s)
• Opponent modeling
• Get-taught-and-exploited problem [Sandholm AIJ-07]
• Needs prohibitively many repetitions to learn in large games (loses too much during learning)
• Crushed by the game theory approach in Texas Hold'em, even with just 2 players and limit betting
• The same tends to be true of no-regret learning algorithms
Let's hybridize the two approaches
• Start playing based on the game theory approach
• As we learn how the opponent(s) deviate from equilibrium, start adjusting our strategy to exploit their weaknesses
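The hybrid idea above can be sketched as a simple play loop: follow the precomputed equilibrium for a warm-up period, then shift toward exploiting the opponent's observed frequencies. This is a minimal toy sketch, not the paper's algorithm; the two-action game, the `warmup` threshold, and the exploit rule are illustrative assumptions.

```python
# Hypothetical toy setup: a two-action game where our precomputed
# "equilibrium" strategy is a probability distribution over actions.
EQUILIBRIUM = {"bet": 0.5, "check": 0.5}

def hybrid_strategy(observed_counts, num_hands, warmup=100):
    """Play equilibrium until enough hands have been observed, then
    shift toward exploiting the opponent's empirical frequencies.
    (Sketch only; the real system adjusts within a large game tree.)"""
    if num_hands < warmup:
        return EQUILIBRIUM                      # game-theory phase
    total = sum(observed_counts.values())
    # Empirical opponent action frequencies, with add-one smoothing.
    freq = {a: (c + 1) / (total + len(observed_counts))
            for a, c in observed_counts.items()}
    # Toy exploit rule: bet more when the opponent checks too often.
    bet_prob = min(1.0, EQUILIBRIUM["bet"] + freq.get("check", 0.0) - 0.5)
    bet_prob = max(0.0, bet_prob)
    return {"bet": bet_prob, "check": 1.0 - bet_prob}
```

During the warm-up phase the returned strategy is exactly the equilibrium, so early play gives the opponent nothing to exploit; after warm-up, the deviation grows with the opponent's observed imbalance.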
The dream of safe exploitation
• Wish: avoid the get-taught-and-exploited problem by exploiting only to an extent that risks no more than what we have won so far
• Proposition. It is impossible to exploit to any extent (beyond what the best equilibrium strategy would exploit) while preserving the safety guarantee of equilibrium play
• So we give up some worst-case safety…
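The "risk only what we have won" wish can be made concrete as a budget rule: the exploitative deviation may put at stake at most our accumulated winnings. This is a hedged illustration of the wish, not of the slide's (impossible-to-fully-achieve) guarantee; the function name and the per-hand cap are assumptions.

```python
def exploitation_budget(total_winnings, per_hand_risk_cap=1.0):
    """Budget rule for the 'risk only what we've won' idea: allow the
    exploitative deviation to risk at most our accumulated winnings,
    capped per hand.  With zero or negative winnings, we would fall
    back to pure equilibrium play (budget 0).  Illustrative sketch."""
    if total_winnings <= 0:
        return 0.0
    return min(total_winnings, per_hand_risk_cap)
```

The proposition on this slide says no nonzero budget can preserve the full equilibrium safety guarantee, which is exactly why the approach accepts some worst-case loss in exchange for exploitation.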
Deviation-Based Best Response (DBBR) algorithm (can be generalized to multi-player non-zero-sum)
• Many ways to determine the opponent's "best" strategy consistent with observations:
• L1 or L2 distance to equilibrium strategy
• Custom weight-shifting algorithm
• Dirichlet prior
• ...
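One of the listed options, the Dirichlet prior, can be sketched in a toy one-shot game: center the prior on the equilibrium strategy, update it with observed action counts, and best-respond to the posterior mean. This mirrors the DBBR idea of starting at equilibrium and shifting toward observations, but the function names, the `prior_strength` parameter, and the toy game are illustrative assumptions, not the paper's implementation.

```python
def dbbr_opponent_model(equilibrium, counts, prior_strength=10.0):
    """Dirichlet-prior opponent model: the prior is centered on the
    equilibrium strategy with pseudo-count mass `prior_strength`.
    The posterior mean moves from equilibrium toward the opponent's
    empirical frequencies as observations accumulate (sketch)."""
    total = sum(counts.values())
    return {a: (prior_strength * equilibrium[a] + counts.get(a, 0))
               / (prior_strength + total)
            for a in equilibrium}

def best_response(our_actions, opponent_model, payoff):
    """Best respond in a toy one-shot game: choose our action that
    maximizes expected payoff against the modeled opponent strategy."""
    def ev(a):
        return sum(p * payoff[(a, b)] for b, p in opponent_model.items())
    return max(our_actions, key=ev)
```

With no observations the model equals the equilibrium strategy (so the best response is safe equilibrium-style play); after, say, 30 observed "heads" in a matching-pennies toy game, the posterior leans heavily toward "heads" and the best response exploits that.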
Experiments
• In 2-player Limit Texas Hold'em, performs significantly better than the game-theory-based base strategy against trivial opponents and against weak opponents from the AAAI computer poker competitions
• Can be turned on only against weak opponents
• Examples of win-rate evolution: