
On the Approximation Performance of Fictitious Play in Finite Games

Paul W. Goldberg (U. Liverpool), Rahul Savani (U. Liverpool), Troels Bjerre Sørensen (U. Warwick), Carmine Ventre (U. Liverpool)


Presentation Transcript


  1. On the Approximation Performance of Fictitious Play in Finite Games. Paul W. Goldberg (U. Liverpool), Rahul Savani (U. Liverpool), Troels Bjerre Sørensen (U. Warwick), Carmine Ventre (U. Liverpool)

  2. Penalty-kick practice. Scenario: every day two friends meet to practice penalty kicks. Players: a kicker and a goalie. Actions: the kicker shoots to the right, left, or center of the goal; the goalie dives to the right, left, or center of the goal.

  3. Penalty-kick game. [Figure: 3 × 3 payoff matrix with rows and columns labeled R, C, L.] Q: How would a goalie “learn” to play this game?

  4. Fictitious play [Brown, 51]. FP rule: best respond to the empirical distribution of the opponent's play.
     Day:             1  2  3  4  5  6  7  8  9  10
     Kicker's action: R  C  L  L  L  L  C  L  L  L
     Empirical distribution after 10 days: R 1/10, C 2/10, L 7/10, so the goalie dives to L.
     Is it a “good” choice? I.e., is FP a good algorithm strategically?
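The FP rule on this slide can be sketched in a few lines of Python. The goalie's payoff used here (1 for diving to the side of the kick, 0 otherwise) is an assumption made for illustration, since the slide's payoff matrix is not reproduced in the transcript; the function names are likewise hypothetical.

```python
from collections import Counter
from fractions import Fraction

ACTIONS = ["R", "C", "L"]

# Kicker's actions on days 1..10, as listed on the slide.
kicks = ["R", "C", "L", "L", "L", "L", "C", "L", "L", "L"]


def empirical_distribution(history):
    """Relative frequency of each action in the observed history."""
    counts = Counter(history)
    return {a: Fraction(counts[a], len(history)) for a in ACTIONS}


def goalie_payoff(dive, kick):
    # ASSUMPTION: the goalie scores 1 for a save (same side), 0 otherwise.
    return 1 if dive == kick else 0


def best_response(dist):
    """Dive that maximizes expected payoff against the empirical distribution."""
    def expected(dive):
        return sum(p * goalie_payoff(dive, kick) for kick, p in dist.items())
    return max(ACTIONS, key=expected)


dist = empirical_distribution(kicks)
dive = best_response(dist)
print(dist, dive)
```

Under the assumed payoff, the best response is simply the kicker's most frequent action, which is why the goalie dives to L.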

  5. Where does the name come from?
     • FP can also be seen as an algorithm for playing the game just once:
       • Simulate what would happen in the repeated version of the game up to some predetermined round r
       • Output the empirical distribution
       • In the example above, for r = 10, the kicker's empirical distribution is: R w.p. 1/10, C w.p. 2/10, L w.p. 7/10
     • FP is a very simple iterative algorithm
     • It is sometimes advocated as a model of bounded rationality
     • Is FP strategically “good”?
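The "simulate, then output the empirical distribution" view of FP might look as follows for a two-player game given by payoff matrices. This is a minimal sketch under stated assumptions: the function name `fictitious_play`, the choice of first action, the lowest-index tie-breaking rule, and the matching-pennies example are all illustrative and not taken from the slides.

```python
def fictitious_play(A, B, r, first=(0, 0)):
    """Simulate r rounds of FP and return both empirical distributions.

    A[i][j] is the row player's payoff and B[i][j] the column player's
    payoff when row i meets column j. Ties are broken by lowest index
    (an assumption; the slides do not specify a tie-breaking rule).
    """
    m, n = len(A), len(A[0])
    row_counts = [0] * m
    col_counts = [0] * n
    i, j = first
    for _ in range(r):
        row_counts[i] += 1
        col_counts[j] += 1
        # Payoff of each pure action against the opponent's action counts;
        # scaling by the number of rounds does not change the argmax.
        row_vals = [sum(A[a][b] * col_counts[b] for b in range(n)) for a in range(m)]
        col_vals = [sum(B[a][b] * row_counts[a] for a in range(m)) for b in range(n)]
        i = max(range(m), key=lambda a: row_vals[a])
        j = max(range(n), key=lambda b: col_vals[b])
    x = [c / r for c in row_counts]
    y = [c / r for c in col_counts]
    return x, y


# Illustrative constant-sum game (matching pennies with payoffs in [0, 1]).
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
x, y = fictitious_play(A, B, 1000)
print(x, y)
```

The returned pair (x, y) is the mixed-strategy profile that FP would output after r rounds.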

  6. Fictitious play and Nash equilibria
     • The empirical distribution of play defined by FP converges to Nash equilibria for:
       • constant-sum games [Robinson, 51]
       • non-degenerate 2 × 2 games [Miyasawa, 61]
       • 2 × n games [Berger, 05]
     • ... but it does not converge in general [Shapley, 64]
     [Figure: example game with actions R, C, L.]

  7. Fictitious play and approximate NEs
     • The strategic performance of FP is analyzed by means of approximate NEs
       • NE = no incentive to deviate
       • Ɛ-NE = little (at most Ɛ) incentive to deviate
       • The concept gained relevance given the PPAD-hardness of computing exact NEs [Daskalakis, Goldberg & Papadimitriou, 06] + [Chen & Deng, 06]
       • Payoffs are normalized to [0,1] and the approximation is additive
     • [Conitzer, 09] proves:
       • For any game, Ɛ ≤ (r+1)/(2r) at round r
       • There exists an infinite game for which Ɛ = (r+1)/(2r)
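To make the additive notion of Ɛ concrete, the sketch below computes the Ɛ of a mixed-strategy profile as the largest gain either player could obtain by switching to a best response, and evaluates the (r+1)/(2r) bound quoted above. The function names and the matching-pennies example are assumptions made for illustration.

```python
def epsilon_of_profile(A, B, x, y):
    """Smallest eps such that (x, y) is an eps-NE (additive approximation).

    A[i][j] and B[i][j] are the row and column players' payoffs,
    assumed to be normalized to [0, 1].
    """
    m, n = len(A), len(A[0])
    # Expected payoff of each pure action against the opponent's mixture.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    col_payoffs = [sum(B[i][j] * x[i] for i in range(m)) for j in range(n)]
    # Expected payoff of the mixtures actually played.
    row_current = sum(x[i] * row_payoffs[i] for i in range(m))
    col_current = sum(y[j] * col_payoffs[j] for j in range(n))
    # Incentive to deviate: best-response payoff minus current payoff.
    return max(max(row_payoffs) - row_current, max(col_payoffs) - col_current)


def conitzer_bound(r):
    """The guarantee (r+1)/(2r) for FP at round r, as quoted on the slide."""
    return (r + 1) / (2 * r)


# Illustrative game: matching pennies with payoffs in [0, 1].
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
print(epsilon_of_profile(A, B, [0.5, 0.5], [0.5, 0.5]))  # exact NE
print(epsilon_of_profile(A, B, [1, 0], [1, 0]))          # column player can gain
print(conitzer_bound(10))
```

The bound decreases toward 1/2 as r grows, so it never certifies better than a ½-NE.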

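  8. Approximation guarantee of FP
     [Figure: the rounds and the players' actions, with the labels Ɛ = 1 and Ɛ = 0.]
     • By the FP rule, s_i is a best response to the mixture of the first i-1 actions
     • Ɛ for playing s_i is (r-i+1)/r^2
     • Ɛ of FP is the sum of these terms over i = 1, ..., r, which equals (r+1)/(2r)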
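The per-action term can be read as follows (a reconstruction of the argument, not the slide's own wording): s_i is a best response to the first i-1 opponent actions, so against the final mixture it can lose at most the weight (r-i+1)/r of the remaining actions, and s_i itself carries weight 1/r in the player's own mixture. Summing these contributions gives the overall guarantee:

```latex
\[
  \varepsilon
  \;\le\; \sum_{i=1}^{r} \frac{1}{r}\cdot\frac{r-i+1}{r}
  \;=\; \frac{1}{r^{2}} \sum_{k=1}^{r} k
  \;=\; \frac{r(r+1)}{2r^{2}}
  \;=\; \frac{r+1}{2r}.
\]
```

The substitution k = r-i+1 turns the sum into 1 + 2 + ... + r, and the result agrees with the bound attributed to [Conitzer, 09].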

  9. Approximation for finite games?
     [Figure: the rounds and the players' actions, with the labels Ɛ = 0 and Ɛ = 1.]
     • Re-using strategies may guarantee a significantly better approximation for FP
     • Experimentally, Shapley’s game (for which FP does not converge) has Ɛ ≈ 1/4
     • Ɛ for playing s_i at round i is less than (r-i+1)/r^2

  10. Our contribution
      • We define a class of 4n × 4n symmetric games, n being a parameter, for which we show that FP fails to obtain any constant Ɛ < ½
      • Specifically, we prove a lower bound of ½ - O(1/n^(1-δ)) for any δ > 0
      • We also give a “matching” upper bound of ½ - O(1/n)

  11. The game: row player’s payoff matrix
      • n = 5
      • α > 1, β < 1
      • Blank entries stand for 0
      • The column player’s payoff matrix is the transpose of the row player’s
      • Players share the same sequence of actions (simpler analysis)

  12. The role of α and β
      • α > 1, β < 1
      • Blank entries stand for 0
      • The column player’s payoff matrix is the transpose of the row player’s
      • Players share the same sequence of actions (simpler analysis)

  13. The last block and the induction

  14. Next ideas of the analysis
      • α and β govern the ratio between the probabilities of two consecutive actions
      • Ratios are such that:
        • probabilities increase in geometric progression, so the last n different actions played occupy all but an exponentially small fraction of the probability mass, and the best response has payoff around β(1 - 1/2^n) ≈ 1 - O(1/n^(1-δ))
        • the probability distribution does not allocate much probability to any individual strategy, so the payoff from the FP distribution is around α/2 ≈ 1/2

  15. Upper bound
      [Figure: the rounds and the players' actions.]
      • Ɛ for an action is determined by its last occurrence in the sequence
      • The maximum Ɛ is given by the sequence m_1, ..., m_1, ..., m_n, ..., m_n, in which each action is played r/n times in a row

  16. Conclusions
      • FP is not good from a strategic point of view, in terms of its approximation guarantee to NEs for finite games
      • There is a class of finite games for which a cyclic behavior persists, leading to a poor guarantee (independently of the number of iterations)
      • I.e., the fully rational player always beats his boundedly rational friend

  17. Open problems
      • Is ½ a limit to the approximation performance obtainable by simple or decentralized algorithms?
        • Cf. the algorithm of [Daskalakis, Mehta & Papadimitriou, 09] vs. more complex centralized algorithms achieving a ratio better than a half
      • Consider a more general class of algorithms
        • E.g., uncoupled dynamics defined by [Hart & Mas-Colell, 03] + [Hart & Mas-Colell, 06]
