Nesterov’s excessive gap technique and poker Andrew Gilpin CMU Theory Lunch Feb 28, 2007 Joint work with: Samid Hoda, Javier Peña, Troels Sørensen, Tuomas Sandholm
Outline • Two-person zero-sum sequential games • First-order methods for convex optimization • Nesterov’s excessive gap technique (EGT) • EGT for sequential games • Heuristics for EGT • Application to Texas Hold’em poker
We want to solve: min_{x∈Q1} max_{y∈Q2} xᵀAy If Q1 and Q2 are simplices, this is the Nash equilibrium problem for two-person zero-sum matrix games If Q1 and Q2 are complexes, this is the Nash equilibrium problem for two-person zero-sum sequential games
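In the simplex case this is a small LP. A minimal sketch (Python with numpy/scipy; the rock-paper-scissors payoff matrix is an illustrative choice, not from the talk) solves min_{x∈Δ} max_{y∈Δ} xᵀAy by minimizing v subject to Aᵀx ≤ v𝟙:

    import numpy as np
    from scipy.optimize import linprog

    # Rock-paper-scissors payoffs (the row player pays x^T A y)
    A = np.array([[ 0.0,  1.0, -1.0],
                  [-1.0,  0.0,  1.0],
                  [ 1.0, -1.0,  0.0]])
    m, n = A.shape

    # Variables (x, v): minimize v s.t. A^T x <= v*1, sum(x) = 1, x >= 0
    c = np.r_[np.zeros(m), 1.0]
    A_ub = np.c_[A.T, -np.ones(n)]
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    x, v = res.x[:m], res.x[m]
    print(x, v)  # uniform mixed strategy, game value 0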
What’s a complex? It’s just like a simplex, but more complex. Each player’s complex encodes her set of realization plans in the game In particular, player 1’s complex is Q1 = { x ≥ 0 : Ex = e }, where E and e depend on the game…
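To make the linear description Ex = e concrete, here is a minimal sketch for a hypothetical toy game (one decision point with actions a/b, and a second decision point, reached after a, with actions c/d); the matrix is illustrative, not the talk's example:

    import numpy as np

    # Sequences: empty, a, b, c, d (where c and d follow action a)
    E = np.array([[ 1,  0,  0,  0,  0],   # x_empty = 1
                  [-1,  1,  1,  0,  0],   # x_a + x_b = x_empty
                  [ 0, -1,  0,  1,  1]],  # x_c + x_d = x_a
                 dtype=float)
    e = np.array([1.0, 0.0, 0.0])

    # Realization plan: play a with prob 0.6, then c with prob 0.5
    x = np.array([1.0, 0.6, 0.4, 0.3, 0.3])
    assert np.allclose(E @ x, e) and (x >= 0).all()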
Recall our problem: min_{x∈Q1} max_{y∈Q2} xᵀAy, where Q1 and Q2 are complexes Since Q1 and Q2 have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale
(Un)scalability of LP solvers • Rhode Island Hold’em poker [Shi & Littman 01] • LP has 91 million rows and columns • Applying the GameShrink automated abstraction algorithm yields an LP with only 1.2 million rows and columns, and 50 million non-zeros [G. & Sandholm 06a] • Solving it requires 25 GB RAM and over a week of CPU time • Texas Hold’em poker • ~10^18 nodes in the game tree • Lossy abstractions must be performed • Current solver technology is the primary limitation to achieving expert-level strategies [G. & Sandholm 06b, 07a] • Instead of standard LP solvers, what about a first-order method?
Convex optimization Suppose we want to solve min_x f(x), where f is convex. Note that this formulation captures ALL convex optimization problems (a feasible set can be folded in via an indicator function) For general f, convergence requires O(1/ε²) iterations (e.g., for subgradient methods) For smooth f with a Lipschitz-continuous gradient, it can be done in O(1/√ε) iterations Analysis based on a black-box oracle access model. Can we do better by looking inside the box?
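As a baseline for the O(1/ε²) rate, a minimal subgradient-method sketch (Python; the ℓ1 objective and the 1/√k step size are illustrative assumptions, not from the talk):

    import numpy as np

    def subgradient_method(A, b, iters=10000):
        """Minimize the non-smooth convex f(x) = ||Ax - b||_1."""
        x = np.zeros(A.shape[1])
        best_val, best_x = np.inf, x.copy()
        for k in range(1, iters + 1):
            g = A.T @ np.sign(A @ x - b)  # a subgradient of f at x
            x -= g / np.sqrt(k)           # diminishing step size
            val = np.abs(A @ x - b).sum()
            if val < best_val:
                best_val, best_x = val, x.copy()
        return best_x, best_val           # error shrinks like O(1/sqrt(iters))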
Strong convexity A function d is strongly convex if there exists σ > 0 such that for all x, y ∈ Q and all α ∈ [0,1]: d(αx + (1−α)y) ≤ α d(x) + (1−α) d(y) − (σ/2) α(1−α) ‖x − y‖² σ is the strong convexity parameter of d
Recall our problem: min_{x∈Q1} max_{y∈Q2} xᵀAy, where Q1 and Q2 are complexes Equivalently: min_{x∈Q1} Φ(x) = max_{y∈Q2} f(y), where Φ(x) = max_{y∈Q2} xᵀAy and f(y) = min_{x∈Q1} xᵀAy
Unfortunately, Φ and f are non-smooth Fortunately, they have a special structure Let d1, d2 be smooth and strongly convex on Q1, Q2 These are called prox-functions Now let μ > 0 and consider: Φμ(x) = max_{y∈Q2} { xᵀAy − μ d2(y) } and fμ(y) = min_{x∈Q1} { xᵀAy + μ d1(x) } These are well-defined smooth functions
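For intuition, if d2 is the entropy prox function on the simplex (an assumption for illustration; the talk may use a different d2), Φμ has a closed "soft max" form, μ·(logsumexp(Aᵀx/μ) − ln n), which approaches Φ as μ → 0:

    import numpy as np
    from scipy.special import logsumexp

    def phi(x, A):
        """Non-smooth Phi(x) = max over the simplex of x^T A y."""
        return (A.T @ x).max()

    def phi_mu(x, A, mu):
        """Entropy-smoothed Phi_mu(x) = mu * (logsumexp(A^T x / mu) - ln n)."""
        u = A.T @ x
        return mu * (logsumexp(u / mu) - np.log(len(u)))

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 5))
    x = np.full(4, 0.25)
    for mu in (1.0, 0.1, 0.001):
        print(mu, phi_mu(x, A, mu), phi(x, A))  # phi_mu -> phi as mu -> 0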
Excessive gap condition From weak duality, we have that f(y) ≤ Φ(x) The excessive gap condition requires the reverse inequality for the smoothed functions: Φμ(x) ≤ fμ(y) (EGC) The algorithm maintains (EGC), and gradually decreases μ As μ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions
Nesterov’s main theorem Theorem [Nesterov 05] There exists an algorithm such that after at most N iterations, the iterates have duality gap at most O(‖A‖/N) (i.e., an ε-solution in O(1/ε) iterations). Furthermore, each iteration only requires solving three subproblems of the form argmax_{x∈Q} { gᵀx − d(x) } and performing three matrix-vector product operations on A.
Nice prox functions A prox function d for Q is nice if it is: • Strongly convex, continuous everywhere in Q, and differentiable in the relative interior of Q • The min of d over Q is 0 • The following maps are easily computable: the gradient ∇d and sargmax(d, g) := argmax_{x∈Q} { gᵀx − d(x) }
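For the entropy prox function d(x) = ln n + Σ x_i ln x_i (plausibly the talk's "nice simplex prox function 1"; that pairing is my assumption), the sargmax map has a closed form, the softmax:

    import numpy as np

    def sargmax_entropy(g):
        """argmax over the simplex of g^T x - d(x) for the entropy prox;
        first-order conditions give x_i proportional to exp(g_i)."""
        z = np.exp(g - g.max())  # shift by max for numerical stability
        return z / z.sum()

    x = sargmax_entropy(np.array([1.0, 2.0, 0.5]))
    assert np.isclose(x.sum(), 1.0) and (x > 0).all()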
Nice simplex prox function 2: Euclidean • The sargmax reduces to a Euclidean projection onto the simplex and can be computed in O(n log n) time
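A minimal sketch of that computation (Python): with a Euclidean prox d(x) = ½‖x − c‖², sargmax(g) is the projection of g + c onto the simplex, computable with one sort; the exact centering c used in the talk is my assumption:

    import numpy as np

    def project_simplex(v):
        """Euclidean projection of v onto {x >= 0, sum(x) = 1}; O(n log n)."""
        n = len(v)
        u = np.sort(v)[::-1]                 # sort descending
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, n + 1) > css - 1)[0][-1]
        theta = (css[rho] - 1) / (rho + 1)
        return np.maximum(v - theta, 0.0)

    def sargmax_euclidean(g, c):
        """argmax over the simplex of g^T x - 0.5*||x - c||^2."""
        return project_simplex(g + c)

    x = sargmax_euclidean(np.array([0.3, 1.2, -0.5]), np.full(3, 1 / 3))
    assert np.isclose(x.sum(), 1.0)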
From the simplex to the complex Theorem [Hoda, G., Peña 06] A nice prox function can be constructed for the complex via a recursive application of any nice prox function for the simplex
Prox function example Let d be any nice simplex prox function. The prox function for an example complex is built by applying d recursively: each decision point contributes a copy of d on its local simplex, combined following the structure encoded by E
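The talk's concrete matrix isn't recoverable here, but the flavor of the recursion can be sketched on the toy complex above: each decision point contributes a local simplex prox, weighted by the probability of reaching it (a "dilated" construction; the exact weighting in the talk may differ):

    import numpy as np

    def d_simplex(y):
        """Entropy prox on the simplex, shifted so its minimum is 0."""
        y = np.clip(y, 1e-12, None)
        return np.log(len(y)) + np.sum(y * np.log(y))

    def d_complex(x):
        """Recursive prox for the toy complex with sequences
        (empty, a, b, c, d), where c and d follow action a."""
        x_empty, x_a, x_b, x_c, x_d = x
        val = x_empty * d_simplex(np.array([x_a, x_b]) / x_empty)
        if x_a > 0:
            val += x_a * d_simplex(np.array([x_c, x_d]) / x_a)
        return val

    print(d_complex(np.array([1.0, 0.6, 0.4, 0.3, 0.3])))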
Heuristics [G., Hoda, Peña, Sandholm 07] • Heuristic 1: Aggressive μ reduction • The μ given in the previous algorithm is a conservative choice guaranteeing convergence • In practice, we can do much better by aggressively pushing μ, while checking that the excessive gap condition is satisfied • Heuristic 2: Balanced μ reduction • To prevent one μ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another
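A minimal sketch of Heuristic 1's control flow (Python; the reduction factors and the callbacks egt_iteration / egc_holds are hypothetical stand-ins for the EGT machinery, not the talk's exact procedure):

    def aggressive_mu_step(state, mu, egt_iteration, egc_holds,
                           shrink=0.5, fallback=0.9):
        """Try an aggressive mu reduction; back off to a conservative,
        provably safe reduction if the excessive gap condition fails."""
        candidate = mu * shrink
        new_state = egt_iteration(state, candidate)
        if egc_holds(new_state, candidate):
            return new_state, candidate
        safe = mu * fallback                  # conservative fallback
        return egt_iteration(state, safe), safe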
Matrix-vector multiplication in poker [G., Hoda, Peña, Sandholm 07] • The main time and space bottleneck of the algorithm is the matrix-vector product with A • Instead of storing the entire matrix, we can represent it as a composition of Kronecker products • We can also exploit parallelization in the matrix-vector product to achieve near-linear speedup
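The key identity is (B ⊗ C)·vec(X) = vec(C X Bᵀ): the product can be computed without ever forming the Kronecker matrix. A minimal sketch (Python; the talk's A is a composition of such blocks, the factors below are illustrative):

    import numpy as np

    def kron_matvec(B, C, x):
        """Compute (B kron C) @ x without materializing the product,
        via (B kron C) vec(X) = vec(C X B^T) with column-major vec."""
        q, n = C.shape[1], B.shape[1]
        X = x.reshape(q, n, order="F")
        return (C @ X @ B.T).ravel(order="F")

    rng = np.random.default_rng(1)
    B, C = rng.standard_normal((3, 4)), rng.standard_normal((5, 6))
    x = rng.standard_normal(4 * 6)
    assert np.allclose(kron_matvec(B, C, x), np.kron(B, C) @ x)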
Poker • Poker is a recognized challenge problem in AI because (among other reasons): • the other players’ cards are hidden; • bluffing and other deceptive strategies are needed in a good player; • there is uncertainty about future events. • Texas Hold’em: the most popular variant of poker • The two-player game tree has ~10^18 nodes
Potential-aware automated abstraction [G., Sandholm, Sørensen 07] • Most prior automated abstraction algorithms employ a myopic expected-value computation as a similarity metric • This ignores hands like flush draws, where although the probability of winning is small, the payoff could be high • Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game • This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential
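To illustrate the metric (an illustrative sketch with toy numbers, not the algorithm's real data): a flush draw and a mediocre made hand can have the same myopic expected value, yet very different histograms over next-round classes:

    import numpy as np

    # Histograms over 4 abstracted classes of next-round states
    flush_draw  = np.array([0.50, 0.00, 0.00, 0.50])  # all-or-nothing
    medium_hand = np.array([0.20, 0.30, 0.30, 0.20])  # clustered in middle
    strength    = np.array([0.10, 0.40, 0.60, 0.90])  # class win-probabilities

    # Identical myopic expected values...
    print(flush_draw @ strength, medium_hand @ strength)  # 0.5  0.5

    # ...but far apart in histogram (L2) distance, so a
    # potential-aware abstraction can keep them in separate buckets
    print(np.linalg.norm(flush_draw - medium_hand))       # 0.6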
Solving the four-round model • Computed an abstraction with: • 20 first-round buckets • 800 second-round buckets • 4800 third-round buckets • 28800 fourth-round buckets • Our algorithm runs in 30 GB RAM • Simply representing the problem as an LP would require 32 TB • Outputs a new, improved solution every 2.5 days
Future research • Customizing second-order methods (e.g., interior-point methods) for the equilibrium problem • Additional heuristics for improving the practical performance of the EGT algorithm • Techniques for finding an optimal solution from an ε-solution