A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation • Andrew Gilpin and Tuomas Sandholm • Carnegie Mellon University, Computer Science Department
Motivation: Poker • Poker games are wildly popular card games • 2006 World Series of Poker • $82M at World Championship event • Portions broadcast on ESPN • Presents several challenges for AI • Imperfect information • Risk assessment and management • Deception (bluffing, slow-playing) • Counter-deception (calling a bluff, addressing slow play)
Prior poker research • Simulation/Learning [e.g. Findler 77, Billings et al 99, 02] • Do not take multi-agent aspect directly into account • Game-theoretic • Small games [e.g. vN-M 44, Nash & Shapley 50, Kuhn 50] • Tournament games [Miltersen & Sørensen 06] • Manual abstraction for large games • “Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03 • Ours: Automated abstraction for large games • As computing speed increases, we can automatically take advantage of it by simply rerunning the abstraction algorithm with a different parameter to produce a finer-grained abstraction • We apply our techniques to Texas Hold’em poker, the most popular poker variant
Computing equilibrium • In two-person zero-sum games, • Nash equilibria are minimax equilibria, so there is no equilibrium selection problem • Equilibrium can be found using LP • Any extensive form game (satisfying perfect recall) can be converted into a matrix game • Create one pure strategy in the matrix game for every possible pure contingency plan in the sequential game (cross product of actions at information sets) • Leads to exponential blowup in number of strategies, even in the reduced normal form • Sequence form: More compact representation based on sequences of moves rather than pure strategies [von Stengel 96, Koller & Megiddo 92, Romanovskii 62] • Two-person zero-sum games with perfect recall can be solved in time polynomial in size of game tree • Not enough to solve Rhode Island Hold’em (3.1 billion nodes) or Texas Hold’em (10^18 nodes)
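The blowup described above can be illustrated with a small count. This is a minimal sketch, not the authors' code: it assumes a hypothetical player described only by the number of actions at each information set, counts pure strategies of the (unreduced) normal form as the cross product of those actions, and counts sequences as the empty sequence plus one per (information set, action) pair, which holds under perfect recall. The function names are made up for illustration.

```python
from math import prod

def normal_form_size(actions_per_info_set):
    """Pure strategies: one action chosen at every information set."""
    return prod(actions_per_info_set)

def sequence_form_size(actions_per_info_set):
    """Sequences: the empty sequence plus one per (information set, action)."""
    return 1 + sum(actions_per_info_set)

# Hypothetical player with 20 information sets and 3 actions at each one.
info_sets = [3] * 20
print(normal_form_size(info_sets))    # 3486784401
print(sequence_form_size(info_sets))  # 61
```

The first count grows exponentially in the number of information sets, the second only linearly, which is why the sequence-form LP stays polynomial in the size of the game tree.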
Our prior work on automated abstraction [EC-06] • Automatic method for performing abstractions in a broad class of sequential games of imperfect information • Equilibrium-preserving game transformation, where certain information sets are merged and certain nodes within an information set are collapsed • GameShrink, algorithm for identifying and applying all the game transformations • Õ(n^2) time • n = #nodes in the signal tree. In poker, these are possible card deals in the game • Run-time tends to be highly sublinear in the size of the game tree • Used these techniques to solve Rhode Island Hold’em • Largest poker game solved to date by over four orders of magnitude • Also developed approximate (lossy) version of GameShrink • Uses a similarity metric on nodes in the signal tree (e.g., |#wins_1 - #wins_2| + |#losses_1 - #losses_2|) and a similarity threshold
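The similarity metric and threshold from the last bullet can be sketched as follows. This is only an illustration under stated assumptions: each hand is summarized by a hypothetical (wins, losses) pair, and the greedy grouping loop is a stand-in chosen for brevity, not the GameShrink algorithm itself.

```python
def distance(a, b):
    """|#wins_1 - #wins_2| + |#losses_1 - #losses_2| for (wins, losses) pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_buckets(hands, threshold):
    """Put each hand in the first bucket whose representative is close enough.

    The representative of a bucket is its first member. This greedy pass is an
    assumption made for illustration only.
    """
    buckets = []
    for name, counts in hands.items():
        for bucket in buckets:
            if distance(hands[bucket[0]], counts) <= threshold:
                bucket.append(name)
                break
        else:
            buckets.append([name])
    return buckets

# Hypothetical win/loss counts for four hands.
hands = {"A": (900, 100), "B": (895, 103), "C": (500, 480), "D": (510, 470)}
print(greedy_buckets(hands, threshold=20))  # [['A', 'B'], ['C', 'D']]
```

Raising the threshold merges more hands into fewer buckets, giving a coarser (lossier) abstraction; a threshold of 0 only merges hands with identical counts.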
Example: Applying the ordered game isomorphic abstraction transformation
Optimized approximate abstractions • Original version of GameShrink yielded lopsided abstractions when used as an approximation algorithm • Now we instead find an abstraction via clustering: • For each level of the tree (starting from root): • For each group of hands: • use k-means clustering to split group i into k_i abstract “states” • win probability as the similarity metric (ties count as half a win) • for each value of k_i, compute expected error (considering hand probabilities) • We find, using integer programming, an abstraction (split of K into k_i’s) that minimizes this expected error, subject to a constraint on the total number of states, K, at that level • (=size of the resulting LP in the zero-sum case) • Solving this class of integer programs is quite easy in practice
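The two steps above (cluster each group for every candidate k_i, then pick the k_i's under a budget K) can be sketched in a few lines. This is a rough illustration, not the authors' implementation: the one-dimensional weighted k-means, its initialization, and all function names are assumptions, and a small dynamic program is swapped in for the integer program because it solves the same budgeted-choice problem on toy inputs.

```python
def win_prob(wins, ties, losses):
    """Win probability with ties counted as half a win."""
    return (wins + 0.5 * ties) / (wins + ties + losses)

def kmeans_1d(values, weights, k, iters=50):
    """Weighted Lloyd's algorithm on scalars; needs k <= len(values).

    Returns (centers, weighted squared error).
    """
    pts = sorted(zip(values, weights))
    n = len(pts)
    # Start the centers at evenly spaced positions of the sorted data.
    centers = [pts[(2 * j + 1) * n // (2 * k)][0] for j in range(k)]
    for _ in range(iters):
        sums, mass = [0.0] * k, [0.0] * k
        for v, w in pts:
            j = min(range(k), key=lambda c: (v - centers[c]) ** 2)
            sums[j] += w * v
            mass[j] += w
        centers = [sums[j] / mass[j] if mass[j] > 0 else centers[j]
                   for j in range(k)]
    err = sum(w * min((v - c) ** 2 for c in centers) for v, w in pts)
    return centers, err

def allocate(errors, K):
    """Choose k_i >= 1 per group with sum(k_i) <= K, minimizing total error.

    errors[i][k - 1] is the expected error of group i when split into k states.
    Raises ValueError if K is smaller than the number of groups.
    """
    best = {0: (0.0, [])}  # states used so far -> (total error, allocation)
    for group in errors:
        nxt = {}
        for used, (total, alloc) in best.items():
            for k, e in enumerate(group, start=1):
                if used + k <= K:
                    cand = (total + e, alloc + [k])
                    if used + k not in nxt or cand[0] < nxt[used + k][0]:
                        nxt[used + k] = cand
        best = nxt
    return min(best.values(), key=lambda t: t[0])

# Hypothetical error tables for two groups and k = 1, 2, 3 states each.
print(allocate([[4.0, 1.0, 0.5], [9.0, 2.0, 0.1]], K=4))  # (3.0, [2, 2])
```

In this toy input, giving the second group three states would leave the first with one state and a total error of 4.1, so the even split wins.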
Application to Texas Hold’em • Two-person game tree has ~10^18 leaves • Too large to run lossless GameShrink • Even after that, LP would be too large • Already too large when we applied this to first two rounds • We split the 4 betting rounds into two phases • Phase I (first 3 rounds) solved offline using new approximate version of GameShrink followed by LP • Phase II (last 2 rounds): • abstractions computed offline • real-time equilibrium computation using updated hand probabilities and anytime LP
Phase I (first three rounds) • Payoffs at leaves computed assuming rollout for rest of the game • Automated abstraction using approximate version of GameShrink • Round 1 • There are 1,326 hands, of which 169 are strategically different • We consider 15 strategically different hands • Round 2 • There are 25,989,600 distinct possible hands • GameShrink (in lossless mode for Phase I) determines that there are about a million strategically different hands • This is still too large to solve • We used GameShrink to compute an abstraction that considers 225 strategically different hands • Round 3 • There are 1,221,511,200 distinct possible hands • We consider 900 strategically different hands • This process took about 3 days running on 4 CPUs • LP solve took 7 days and 80 gigabytes using CPLEX’s barrier method (interior-point method for linear programming)
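The hand counts quoted for each round follow from simple combinatorics, which the short check below reproduces. It is a sketch for verification only; the variable names are chosen here and do not come from the slides.

```python
from math import comb

# Round 1: two hole cards out of 52.
hole_hands = comb(52, 2)                      # 1,326

# Up to suit relabeling: 13 pairs, 78 suited and 78 offsuit rank combinations.
distinct_hole_hands = 13 + 2 * comb(13, 2)    # 169

# Round 2: hole cards plus three community cards from the remaining 50.
round2_hands = hole_hands * comb(50, 3)       # 25,989,600

# Round 3: one more community card from the remaining 47.
round3_hands = round2_hands * 47              # 1,221,511,200

print(hole_hands, distinct_hole_hands, round2_hands, round3_hands)
```

Against these totals, the abstraction sizes of 15, 225 and 900 hands show how aggressively the approximate abstraction compresses each round.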
Mitigating effect of round-based abstraction (i.e., having 2 phases) • For leaves in the first phase, we could assume no betting in the later rounds • Ignores implied odds • Can do better by estimating the amount of betting that occurs in later rounds • Incorporate this information into the LP for the first phase • For each possible hand strength and in each possible betting situation, we store the probability of each possible action • Mine the betting history in the later rounds from hundreds of thousands of played hands
Example of betting in fourth round • Player 1 has bet • Player 2 to fold, call, or raise
Phase II (last two rounds) • Abstractions computed offline • Betting history doesn’t matter => (52 choose 4) = 270,725 situations • Simple suit isomorphisms at the root of Phase II halves this • For each such setting, we use GameShrink to generate an abstraction with 10 and 100 strategically different hands in the last two rounds, respectively • Real-time equilibrium computation (using LP) • So that our strategies are specific to the particular hand (too many to precompute) • Updated hand probabilities from Phase I equilibrium using betting histories and community card history (s_i is player i’s strategy, h is an information set) • Conditional choice of primal vs. dual simplex • Achieve anytime capability for the player that is us • Dealing with running off the equilibrium path
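The hand-probability update can be sketched with Bayes' rule. The slide's own formula did not survive in this transcript, so the code below is an assumption about its general shape rather than a reproduction: the posterior over a player's hands is taken proportional to the prior times the probability that the player's strategy s_i produces the observed history h with that hand. The fallback for a zero-probability history is a hypothetical placeholder, not the authors' method for leaving the equilibrium path.

```python
from math import comb

# Number of community-card settings at the start of Phase II.
print(comb(52, 4))  # 270725

def update_hand_probs(prior, likelihood):
    """Bayes update of hand probabilities.

    prior[hand]      = Pr[hand] before observing the betting history
    likelihood[hand] = Pr[history | hand, strategy], read off the Phase I strategy
    """
    joint = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    total = sum(joint.values())
    if total == 0:
        # The history has probability zero under the strategy (off the
        # equilibrium path). Keeping the prior is a hypothetical fallback.
        return dict(prior)
    return {h: p / total for h, p in joint.items()}

# Hypothetical three-bucket example: a raise was observed.
prior = {"strong": 0.25, "medium": 0.5, "weak": 0.25}
raise_prob = {"strong": 0.8, "medium": 0.4, "weak": 0.0}
posterior = update_hand_probs(prior, raise_prob)
print(posterior)
```

In the example, the weak bucket never raises under the assumed strategy, so all posterior mass moves to the strong and medium buckets.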
Precompute several databases • db5: possible wins and losses (for a single player) for every combination of two hole cards and three community cards (25,989,600 entries) • Used by GameShrink for quickly comparing the similarity of two hands • db223: possible wins and losses (for both players) for every combination of pairs of two hole cards and three community cards based on a roll-out of the remaining cards (14,047,378,800 entries) • Used for computing payoffs of the Phase I game to speed up the LP creation • handval: concise encoding of a 7-card hand rank used for fast comparisons of hands (133,784,560 entries) • Used in several places, including in the construction of db5 and db223 • Colexicographical ordering used to compute indices into the databases allowing for very fast lookups
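Colexicographical indexing can be sketched as below. This is a minimal illustration, assuming cards are numbered 0 to 51; how the actual databases combine hole-card and community-card indices is not stated in the slides, so only the ranking of a single card set is shown. The halving in the db223 count is an assumption (swapping the two players' hands) that happens to reproduce the stated number of entries.

```python
from itertools import combinations
from math import comb

def colex_rank(cards):
    """Index of a set of distinct cards (numbered from 0) in colex order.

    For c_1 < c_2 < ... < c_k the rank is C(c_1, 1) + C(c_2, 2) + ... + C(c_k, k).
    """
    return sum(comb(c, i) for i, c in enumerate(sorted(cards), start=1))

# Every two-card hand maps to a unique slot in 0 .. C(52, 2) - 1.
ranks = sorted(colex_rank(h) for h in combinations(range(52), 2))
print(ranks == list(range(comb(52, 2))))  # True

# Database sizes quoted above.
db5_entries = comb(52, 2) * comb(50, 3)                       # 25,989,600
db223_entries = comb(52, 2) * comb(50, 2) * comb(48, 3) // 2  # 14,047,378,800
handval_entries = comb(52, 7)                                 # 133,784,560
```

Because the rank is a dense index with no gaps, a database can be stored as a flat array and looked up without hashing.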
Experimental results • GS1: Game theory-based player, old version of automated abstraction, no strategy simulation in later rounds [GS 2006] • Sparbot: Game theory-based player, manual abstraction [Billings et al 2003] • Vexbot: Opponent modeling, miximax search with statistical sampling [Billings et al 2004]
Summary • Competitive Texas Hold’em player automatically generated • First phase (rounds 1, 2 & 3): automated abstraction & LP solved offline, using statistical data to compute payoffs at end of round 3 • Second phase (rounds 3 & 4): abstraction precomputed automatically; LP solved in real time using updated hand probabilities and anytime LP • Techniques are applicable to many sequential games of imperfect information
Where to from here? • The top poker-playing programs are fairly equal • Recent experimental results show our player is competitive with (but not better than) expert human players • Provable approximation, e.g., ex post • Other types of abstraction • More scalable equilibrium-finding algorithms • Tournament poker [e.g. Miltersen & Sørensen 06] • More than two players [e.g. Nash & Shapley 50] Thank you