Recent progress in computing approximate Nash equilibria Paul W. Goldberg Dept. of Computer Science University of Liverpool
Nash equilibrium
2 players, each with a set of n pure strategies.
• For each pair (i, j), a payoff is specified for each player: R(i, j) for player 1 and C(i, j) for player 2.
• These payoffs can be placed into two n×n matrices R and C.
• We want probability distributions x and y over the players' strategies such that their expected payoffs cannot be increased by either player changing his distribution:
  xᵀRy ≥ x′ᵀRy for all distributions x′ over player 1's strategies
  xᵀCy ≥ xᵀCy′ for all distributions y′ over player 2's strategies
Nash equilibrium
Example: rock-paper-scissors, a zero-sum game. Player 1's payoff matrix R (player 2's is C = −R):

        R    P    S
   R    0   -1    1
   P    1    0   -1
   S   -1    1    0

The unique Nash equilibrium: both players mix uniformly, with probabilities (1/3, 1/3, 1/3).
Nash equilibrium
Example: a modified rock-paper-scissors, in which one payoff is raised from 1 to 2 (player 2's payoff when his paper meets player 1's rock). The game is no longer zero-sum, and the equilibrium is no longer uniform: the mixture probabilities become 1/3, 5/12 and 1/4 rather than 1/3 each. (Thanks to Rahul Savani's on-line NE program.)
Computing Nash equilibria
• Some pre-history: Nash equilibria are "hard" (PPAD-complete) to compute exactly
• But there are notions of approximate NE… (ε-Nash equilibrium)
• So, for what values of ε can we compute approximate NE?
• (obvious analogy with approximation algorithms for NP-complete problems)
ε-Nash equilibrium
• exact NE: "no incentive to deviate"
• ε-NE: gain of at most ε when you deviate
• let x and y denote the row and column players' mixed strategies; let eᵢ be the vector with 1 in component i, zero elsewhere
• For all i, xᵀRy ≥ eᵢᵀRy − ε.
• For all j, xᵀCy ≥ xᵀCeⱼ − ε.
• Assume payoffs are re-scaled into [0,1].
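To make the definition concrete, here is a minimal sketch (Python/NumPy; the talk itself contains no code, and the function name is mine) of testing whether a given pair (x, y) is an ε-NE:

```python
import numpy as np

def is_eps_nash(R, C, x, y, eps):
    """Test whether (x, y) is an eps-Nash equilibrium of the bimatrix game
    (R, C), payoffs assumed re-scaled into [0, 1]."""
    row_regret = np.max(R @ y) - x @ R @ y   # max_i e_i'Ry  minus  x'Ry
    col_regret = np.max(x @ C) - x @ C @ y   # max_j x'Ce_j  minus  x'Cy
    return row_regret <= eps and col_regret <= eps

# Rock-paper-scissors, re-scaled into [0, 1]: uniform play is an exact NE.
R = (np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]]) + 1) / 2
C = 1 - R                              # zero-sum after re-scaling
u = np.ones(3) / 3
print(is_eps_nash(R, C, u, u, 1e-9))   # True
```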
A simple algorithm [Daskalakis, Mehta and Papadimitriou, WINE 2006]
(illustrated in the talk on a 3×3 example bimatrix)
● 1. Player 1 chooses an arbitrary strategy i and gives it probability ½.
● 2. Player 2 chooses a best response j to i and gives it probability 1.
● 3. Player 1 chooses a best response k to j and gives it the remaining probability ½.
Since payoffs lie in [0,1], each player's regret is at most ½: k is a best response to all of player 2's probability mass, and j is a best response to half of player 1's. So the result is a ½-NE.
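A minimal sketch of this procedure, continuing the Python above (the function name `dmp_half_approx` is mine, and the "arbitrary" initial strategy is fixed to 0):

```python
def dmp_half_approx(R, C, i=0):
    """The simple algorithm: returns a pair (x, y) that is a 1/2-NE."""
    n, m = R.shape
    j = int(np.argmax(C[i, :]))    # player 2's pure best response to i
    k = int(np.argmax(R[:, j]))    # player 1's pure best response to j
    x = np.zeros(n)
    x[i] = 0.5
    x[k] += 0.5                    # handles the case k == i
    y = np.zeros(m)
    y[j] = 1.0
    return x, y

x, y = dmp_half_approx(R, C)
print(is_eps_nash(R, C, x, y, 0.5))   # True: a 1/2-NE by construction
```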
Can we improve this algorithm? (i.e., is there an "incremental" improvement?) E.g., player 1 did not choose a "good" strategy to begin with…
No! [Feder, Nazerzadeh and Saberi, EC 2007]: To get a better approximation than ½, strategies need support of size Θ(log n), where n is the number of strategies.
Proof sketch:
● 1. Consider a zero-sum win-lose n×n game chosen u.a.r. (each payoff entry is 0 or 1, independently and uniformly).
● 2. If player 1 uses the uniform distribution, he gets payoff about ½, whatever player 2 does…
● 3. If player 1 uses only one strategy, player 2's best response leaves him with nothing!
● 4. Indeed, if player 1 mixes just 2 strategies, w.h.p. player 2 has a response that leaves player 1 with nothing…
Similarly for any constant-sized support, and indeed for any support smaller than (say) log(n)/2, in general.
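A small numerical illustration of the phenomenon (not from the talk; the parameters are arbitrary). In a random win-lose game, w.h.p. some column is all-zero on any fixed small support, while the uniform distribution earns about ½ against every column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.integers(0, 2, size=(n, n)).astype(float)   # player 1's payoffs, u.a.r. 0/1

# Uniform mixing: even player 2's most harmful column leaves payoff near 1/2.
print(np.min(np.ones(n) / n @ A))

# Small supports (the first k rows stand in for an arbitrary k-subset):
# w.h.p. some column has only zeros on the support, so the best response pays 0.
for k in (1, 2, 5):
    print(k, np.min(np.ones(k) / k @ A[:k, :]))
```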
How big a support do you need?
• O(log n) is also an upper bound (for any constant ε) [Althöfer 1994; Lipton, Markakis and Mehta, EC 2003 (extended the result from 2-player to multi-player games)]
• Define an "empirical NE" as follows: draw N samples from x and y; replace x, y with the resulting empirical distributions x̂ and ŷ.
Example
If N = 100, an empirical NE for rock-paper-scissors might look like this: one player plays (R, P, S) with probabilities (0.36, 0.29, 0.35), the other with probabilities (0.27, 0.30, 0.43), rather than exactly (1/3, 1/3, 1/3) each.
From player 1's perspective, suppose player 2 replaces y with an empirical distribution ŷ based on N = O(log(n)/ε²) samples. With high probability, every pure strategy i gets about the same payoff as before: eᵢᵀRŷ = eᵢᵀRy + O(ε). Moreover ŷ has support of size at most N = O(log(n)/ε²), so if we do the same thing with x we get the desired result.
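A quick sketch of the sampling step (Python as before; the instance and parameters are made up). The payoffs of all n pure strategies against the empirical distribution concentrate around their payoffs against the original one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 500, 0.1
R = rng.random((n, n))                 # an arbitrary payoff matrix in [0, 1]
y = rng.dirichlet(np.ones(n))          # some mixed strategy for player 2

N = int(np.log(n) / eps**2)            # sample size of the order of the LMM bound
samples = rng.choice(n, size=N, p=y)
y_hat = np.bincount(samples, minlength=n) / N   # empirical dist., support <= N

# Largest change in any pure strategy's payoff: O(eps) with high probability.
print(np.max(np.abs(R @ y_hat - R @ y)))
```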
Support enumeration
Note that it follows that for any ε we can find an ε-NE in time n^O(log(n)/ε²), quasi-polynomial for constant ε, by exhaustively checking all small supports. (This was pointed out in the Lipton et al. paper; another context where support enumeration "works" is on randomly-generated games [Bárány, Vempala and Vetta, FOCS '05].)
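One way to read that as an algorithm (a sketch under my reading of the argument, with illustrative constants, not code from the cited papers): a k-uniform ε-NE, i.e. one whose probabilities are all multiples of 1/k, exists for k = O(log(n)/ε²), so enumerating all k-uniform pairs and testing each with `is_eps_nash` from above must succeed:

```python
from itertools import combinations_with_replacement
import numpy as np

def support_enumeration_eps_nash(R, C, eps):
    """Exhaustive search over k-uniform strategy pairs; n^O(log n / eps^2) time.
    Only feasible for tiny n -- an illustration, not a practical method."""
    n = R.shape[0]
    k = max(1, int(np.ceil(np.log(max(n, 2)) / eps**2)))  # constants omitted
    multisets = list(combinations_with_replacement(range(n), k))
    for rows in multisets:
        x = np.bincount(rows, minlength=n) / k
        for cols in multisets:
            y = np.bincount(cols, minlength=n) / k
            if is_eps_nash(R, C, x, y, eps):
                return x, y
```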
Breaking the ε=½ Barrier [Bosse, Byrka and Markakis, WINE 07]
Recall that player 1's initial strategy i may be poor; and we now know that switching to an alternative pure strategy won't necessarily help.
The original game is (R, C); solve the zero-sum game (R−C, C−R); let x₀ and y₀ be player 1's and player 2's strategies in the solution.
Let α be a parameter of the algorithm; if x₀ and y₀ form an α-NE, use them, else continue…
Let j be player 2's best response to x₀; player 2 uses pure strategy j. (BTW, assume player 2's regret in (x₀, y₀) is at least player 1's.)
Let k be player 1's pure best response to j; player 1 uses a mixture of x₀ and k. The mixture coefficient of k is (1−r)/(2−r), where r is player 2's (i.e. the larger) regret in the solution to the zero-sum game.
The optimal choice of α is (3−√5)/2 = 0.382… .
Comment: why does this work? When player 2 changes his mind (from using y₀) he is to some extent helping player 1; y₀ arose from a game in which player 2 tries to hurt player 1 as well as help himself.
In the paper they tweak the algorithm to reduce the ε-value down to 0.364. In fact, a previous paper had already obtained 0.384+ζ…
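The first step, solving the zero-sum game (R−C, C−R), is a standard linear program; here is a minimal sketch with SciPy (`solve_zero_sum` is my name; by the minimax theorem, y₀ comes from the other player's maximin problem):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Maximin strategy and game value for the row player of the zero-sum
    game with row-player payoff matrix A."""
    n, m = A.shape
    # Variables (x_1..x_n, v): maximize v s.t. (A^T x)_j >= v for every column j.
    c = np.zeros(n + 1)
    c[-1] = -1.0                               # linprog minimizes, so use -v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])  # v - (A^T x)_j <= 0
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)  # sum(x) = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

# For the game (R, C) at hand (R, C as in the earlier sketches):
x0, _ = solve_zero_sum(R - C)          # player 1's strategy in (R-C, C-R)
y0, _ = solve_zero_sum((C - R).T)      # player 2's strategy, by symmetry
```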
0.384+ζ approximation [Daskalakis, Mehta and Papadimitriou, EC 2007]
General idea: construct an LP that is satisfied by approximate solutions (x, y) to the game (R, C).
Suppose (x*, y*) is a NE with payoffs v₁, v₂ to players 1 and 2 respectively, and suppose (x, y) is an empirical NE for N = 4/ζ². We can assume we have been given v₁, v₂ and (x, y).
● 1. Check that xᵀRy ≈ v₁ (and similarly for the column player).
● 2. Find (x′, y′) that satisfy
  xᵀRy′ ≥ v₁ − 3ζ/2
  for all i, eᵢᵀRy′ ≤ v₁ + ζ/2
  x′ᵀRy ≥ v₁ − 3ζ/2
  plus a similar set of constraints for the C matrix.
● 3. If max(v₁, v₂) ≥ ⅓ then return a certain mixture of x with x′ and of y with y′; else return (x′, y′).
If v₁ and v₂ are both < ⅓, the constraints ensure that there is not too much to gain by defecting to any pure strategy. If v₁ (say) is at least ⅓, they ensure that the mixture distribution has a good performance.
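Step 2 is a linear feasibility problem in (x′, y′) once v₁, v₂ and (x, y) are fixed. A rough sketch of the constraint structure (my reading of the constraints above, reusing `linprog` from the earlier sketch, not the paper's exact program); variables are stacked as z = (x′, y′):

```python
def step2_feasibility(R, C, x, y, v1, v2, zeta):
    """Search for (x', y') satisfying the step-2 constraints; None if infeasible."""
    n = R.shape[0]
    Z = np.zeros(n)
    A_ub, b_ub = [], []
    for M, v in ((R, v1), (C, v2)):
        A_ub.append(np.concatenate([Z, -(x @ M)]))   # x M y' >= v - 3*zeta/2
        b_ub.append(-(v - 1.5 * zeta))
        A_ub.append(np.concatenate([-(M @ y), Z]))   # x' M y >= v - 3*zeta/2
        b_ub.append(-(v - 1.5 * zeta))
    for i in range(n):
        A_ub.append(np.concatenate([Z, R[i, :]]))    # e_i R y' <= v1 + zeta/2
        b_ub.append(v1 + 0.5 * zeta)
        A_ub.append(np.concatenate([C[:, i], Z]))    # x' C e_i <= v2 + zeta/2
        b_ub.append(v2 + 0.5 * zeta)
    A_eq = np.vstack([np.concatenate([np.ones(n), Z]),    # x' is a distribution
                      np.concatenate([Z, np.ones(n)])])   # y' is a distribution
    res = linprog(np.zeros(2 * n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=np.ones(2), bounds=[(0, None)] * (2 * n))
    return (res.x[:n], res.x[n:]) if res.success else None
```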
Conclusions • The algorithms are not randomized, but the analysis often uses randomness • plenty of open problems…