Finding equilibria in large sequential games of imperfect information Andrew Gilpin and Tuomas Sandholm Carnegie Mellon University Computer Science Department
Motivation: Poker • Poker is a wildly popular card game • This year’s World Series of Poker prize pool surpassed $103 million, including $56 million for the World Championship event • ESPN is broadcasting parts of the tournament • Poker presents several challenges for AI: imperfect information; risk assessment and management; deception (bluffing, slow-playing); counter-deception (calling a bluff)
Rhode Island Hold’em poker: The Deal
Rhode Island Hold’em poker: Round 1
Rhode Island Hold’em poker: Round 2
Rhode Island Hold’em poker: Round 3
Rhode Island Hold’em poker: Showdown
Sneak preview of results: Solving Rhode Island Hold’em poker • Rhode Island Hold’em poker was invented as a testbed for AI research [Shi & Littman 2001] • Game tree has more than 3.1 billion nodes • Previously, the best techniques did not scale to games this large • Using our algorithm we have computed optimal strategies for this game • This is the largest poker game solved to date by over four orders of magnitude
Outline of this talk • Game-theoretic foundations: Equilibrium • Model: Ordered games • Abstraction mechanism: Information filters • Strategic equivalence: Game isomorphisms • Algorithm: GameShrink • Solving Rhode Island Hold’em
Game Theory • In multi-agent systems, an agent’s outcome depends on the actions of the other agents • Consequently, an agent’s optimal action depends on the actions of the other agents • Game theory provides guidance as to how an agent should act • A game-theoretic equilibrium specifies a strategy for each agent such that no agent wishes to deviate • Such an equilibrium always exists [Nash 1950]
A simple example: Rock Paper Scissors • In equilibrium, each player plays Rock, Paper, and Scissors with probability 1/3 each
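The uniform 1/3 mix on this slide can be checked directly. The following is a minimal sketch, assuming the usual win = +1, loss = -1, tie = 0 payoffs (the slide does not state payoff values) and a hypothetical helper name `pure_payoffs_against`. If every pure strategy earns the same expected payoff against the opponent's mix, no deviation is profitable.

```python
from fractions import Fraction

# Row player's payoffs (assumed: win = +1, loss = -1, tie = 0).
# Rows and columns are ordered Rock, Paper, Scissors.
PAYOFF = [
    [0, -1, 1],   # Rock
    [1, 0, -1],   # Paper
    [-1, 1, 0],   # Scissors
]

def pure_payoffs_against(matrix, opponent_mix):
    """Expected payoff of each pure row strategy against a mixed column strategy."""
    return [sum(Fraction(a) * q for a, q in zip(row, opponent_mix))
            for row in matrix]

uniform = [Fraction(1, 3)] * 3
payoffs = pure_payoffs_against(PAYOFF, uniform)

# All pure strategies do equally well, so the row player cannot gain by deviating.
print(payoffs)
```

Because the game is symmetric, the same check applies to the column player.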
Complexity of computing equilibria • Finding a Nash equilibrium is “A most fundamental computational problem whose complexity is wide open [and] together with factoring … the most important concrete open question on the boundary of P today” [Papadimitriou 2001] • This holds even for games with only two players • There are algorithms (requiring exponential time in the worst case) for computing Nash equilibria • Good news: Two-person zero-sum matrix games can be solved in poly-time using linear programming
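For reference, one standard way to write the linear program for a two-person zero-sum matrix game is sketched below. The symbols are assumptions introduced here, not taken from the slides: A is the m × n matrix of payoffs to the row player, x is the row player's mixed strategy, and v is the value that the row player guarantees.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{i=1}^{m} x_i A_{ij} \ge v, \qquad j = 1,\dots,n, \\
& \sum_{i=1}^{m} x_i = 1, \qquad x_i \ge 0, \quad i = 1,\dots,m.
\end{aligned}
```

Each constraint says that x earns at least v against one pure strategy of the column player, so maximizing v yields a maximin strategy.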
What about sequential games? • Sequential games involve turn-taking, moves of chance, and imperfect information • Every sequential game can be converted into a simultaneous-move game • Basic idea: Make one strategy in the simultaneous-move game for every possible action in every possible situation in the sequential game • This approach leads to an exponential blowup in the number of strategies
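The blowup can be illustrated with a small count. This is a sketch under the assumption that a strategy in the converted game fixes one action in every decision situation of the sequential game; the function name and the example sizes are hypothetical.

```python
from math import prod

def count_pure_strategies(actions_per_situation):
    """One action is chosen in every situation, so the number of
    pure strategies is the product of the per-situation action counts."""
    return prod(actions_per_situation)

# Hypothetical player with k decision situations and 2 actions in each:
# the strategy count doubles with every added situation.
for k in (5, 10, 20):
    print(k, count_pure_strategies([2] * k))
```

The number of situations grows linearly here, while the number of strategies grows as 2^k.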
Sequence form representation • The sequence form is an alternative representation that is more compact [Koller, Megiddo, von Stengel, Romanovskii] • Using the sequence form, two-player zero-sum games with perfect recall can be solved in time polynomial in the size of the game tree • But Texas Hold’em has 10^18 nodes
Our approach • Rather than developing an equilibrium-finding algorithm per se, we introduce an automated abstraction technique that results in a smaller, equivalent game • We prove that a Nash equilibrium in the smaller game corresponds to a Nash equilibrium in the original game • Our technique applies to n-player sequential games with observed actions and ordered signals
Illustration of our approach • [Diagram: Original game → (Abstraction) → Abstracted game → (Compute Nash) → Nash equilibrium of the abstracted game → Nash equilibrium of the original game]
Game with ordered signals (a.k.a. ordered game) • Players I = {1,…,n} • Stage games G = G1,…,Gr • Player label L • Game-ending nodes ω • Signal alphabet Θ • Signal quantities κ = κ1,…,κr and γ = γ1,…,γr • Signal probability distribution p • Partial ordering ≥ of subsets of Θ • Utility function u (increasing in private signals) • Example values annotated on the slide: I = {1,2}, κ = (0,1,1), γ = (1,0,0), Θ = {2♠,…,A♦}, p uniform, ordering by hand rank
Information filters • Observation: We can make games smaller by filtering the information a player receives • Instead of observing a specific signal exactly, a player observes a filtered set of signals • E.g. receiving the signal {A♠,A♣,A♥,A♦} instead of A♠ • Combining an ordered game and a valid information filter yields a filtered ordered game • Prop. A filtered ordered game is a finite sequential game with perfect recall • Corollary: If the filtered ordered game is two-person zero-sum, we can solve it in poly-time using linear programming
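The aces example can be sketched as a function that maps an exact signal to the set of signals the player is told about. This is an illustrative sketch only: the card encoding and the name `rank_filter` are assumptions, and the code does not check whether the filter is valid in the sense the slide requires.

```python
RANKS = "23456789TJQKA"
SUITS = "♠♣♥♦"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

def rank_filter(card):
    """Hypothetical filter: reveal only the rank of a card, hiding its suit.
    Returns the set of all cards that share the rank of `card`."""
    return frozenset(c for c in DECK if c[0] == card[0])

# The player who was dealt A♠ only learns "one of the four aces".
observed = rank_filter("A♠")
print(sorted(observed))

# The filter partitions the 52 cards into 13 indistinguishable groups.
groups = {rank_filter(card) for card in DECK}
print(len(groups))
```

A coarser filter gives fewer distinguishable signals and therefore a smaller game.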
Filtered signal trees • Every filtered ordered game has a corresponding filtered signal tree • Each edge corresponds to the revelation of some signal • Each path corresponds to the revelation of a set of signals • Our algorithms operate directly on the filtered signal tree • We never load the full game representation into memory
Ordered game isomorphic relation • The ordered game isomorphic relation captures the notion of strategic symmetry between nodes • We define the relationship recursively: • Two leaves are ordered game isomorphic if the payoffs to all players are the same at each leaf, for all action histories • Two internal nodes are ordered game isomorphic if they are siblings and there is a bijection between their children such that only ordered game isomorphic nodes are matched • We can compute this relationship efficiently using dynamic programming and perfect matching computations in a bipartite graph
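The recursive definition can be sketched on toy trees. This is a simplified illustration, not the authors' implementation. It assumes that a leaf is a tuple of payoffs, that an internal node is a Python list of children, and that the two nodes passed in are siblings. It also omits the memoization that a dynamic program would add. The bipartite perfect matching uses a plain augmenting-path search.

```python
def has_perfect_matching(left, right, compatible):
    """Augmenting-path bipartite matching: True if every item in `left`
    can be paired with a distinct compatible item in `right`."""
    if len(left) != len(right):
        return False
    match_of_right = [None] * len(right)  # index of the left item matched to each right item

    def try_assign(i, seen):
        for j in range(len(right)):
            if j not in seen and compatible(left[i], right[j]):
                seen.add(j)
                # Take j if it is free, or if its current partner can move elsewhere.
                if match_of_right[j] is None or try_assign(match_of_right[j], seen):
                    match_of_right[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(left)))

def isomorphic(a, b):
    """Leaves match on equal payoffs; internal nodes match if their
    children can be paired up so that every pair is itself isomorphic."""
    a_is_leaf = not isinstance(a, list)
    b_is_leaf = not isinstance(b, list)
    if a_is_leaf or b_is_leaf:
        return a_is_leaf and b_is_leaf and a == b
    return has_perfect_matching(a, b, isomorphic)

# Toy trees: y is x with its children reordered, z differs in one payoff.
x = [(1, -1), [(0, 0), (2, -2)]]
y = [[(2, -2), (0, 0)], (1, -1)]
z = [(1, -1), [(0, 0), (3, -3)]]
print(isomorphic(x, y), isomorphic(x, z))
```

Caching the result for each pair of nodes would avoid repeated work on large trees.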
Ordered game isomorphic abstraction transformation • This operation transforms an existing information filter into a new filter that merges two ordered game isomorphic nodes • The new filter yields a smaller, abstracted game • Thm. If a strategy profile is a Nash equilibrium in the smaller, abstracted game, then it is a Nash equilibrium in the original game
GameShrink: Efficiently computing ordered game isomorphic abstraction transformations • Recall: we have a dynamic program for determining if two nodes of the filtered signal tree are ordered game isomorphic • Algorithm: Starting from the top of the filtered signal tree, perform the transformation where applicable • Approximation algorithm: instead of requiring a perfect matching, require a matching with a penalty below some threshold
GameShrink: Efficiently computing ordered game isomorphic abstraction transformations • The Union-Find data structure provides an efficient representation of the information filter • Linear memory and almost linear time • Can eliminate certain perfect matching computations by using easy-to-check necessary conditions • Compact histogram databases for storing win/loss frequencies to speed up the checks
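A generic Union-Find structure with path compression and union by size can be sketched as follows. This is a textbook version for illustration, not the authors' code. Using card strings as the merged items is an assumption made for the example.

```python
class UnionFind:
    """Disjoint sets with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        """Return the representative of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return the new representative."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller set under the larger one
        self.size[ra] += self.size[rb]
        return ra

# Merging signals that were found to be equivalent (hypothetical example).
uf = UnionFind(["A♠", "A♣", "A♥", "K♠"])
uf.union("A♠", "A♣")
uf.union("A♣", "A♥")
print(uf.find("A♠") == uf.find("A♥"), uf.find("A♠") == uf.find("K♠"))
```

Memory is linear in the number of items, and a sequence of operations runs in almost linear time, which matches the bounds quoted on the slide.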
Solving Rhode Island Hold’em poker • GameShrink computes all ordered game isomorphic abstraction transformations in under one second • Without abstraction, the linear program has 91,224,226 rows and columns • After applying GameShrink, the linear program has only 1,237,238 rows and columns • By solving the resulting linear program, we are able to compute optimal min-max strategies for this game • CPLEX Barrier method takes 7 days, 17 hours and 25 GB RAM to solve • This is the largest poker game solved to date by over four orders of magnitude
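The size reduction implied by the two row counts can be worked out directly. The figures come from the slide; the variable names are only illustrative.

```python
rows_before = 91_224_226  # rows and columns without abstraction
rows_after = 1_237_238    # rows and columns after applying GameShrink

reduction_factor = rows_before / rows_after
print(f"{reduction_factor:.1f}x fewer rows and columns")
```

The linear program shrinks by a factor of between 73 and 74 in each dimension.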
Comparison to previous research • Rule-based: limited success in even small poker games • Simulation/Learning: do not take multi-agent aspect into account • Game-theoretic, manual abstraction: “Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03 (Distinguished Paper Award) • Game-theoretic, automated abstraction
Directions for future work • Computing strategies for larger games • Requires approximation of solutions • Tournament poker • More than two players • Other types of abstraction
Summary • Introduced an automatic method for performing abstractions in a broad class of games • Introduced information filters as a technique for working with games with imperfect information • Developed an equilibrium-preserving abstraction transformation, along with an efficient algorithm • Described a simple extension that yields an approximation algorithm for tackling even larger games • Solved the largest poker game to date • Playable on-line at http://www.cs.cmu.edu/~gilpin/gsi.html • Thank you very much for your interest