Communication Networks: A Second Course. Jean Walrand, Department of EECS, University of California at Berkeley.
Concave, Learning, Cooperative • Concave Games • Learning in Games • Cooperative Games
Concave Games Motivation • In many applications, the possible actions belong to a continuous set. For instance, one chooses prices, transmission rates, or power levels. • In such situations, one specifies reward functions instead of a matrix of rewards. • We explain results on Nash equilibria for such games.
Concave Games: Preliminaries Many situations are possible: a game may have 3 NE, 1 NE, or no NE [three example plots on the slide]. J.B. Rosen, “Existence and Uniqueness of Equilibrium Points for Concave N-Person Games,” Econometrica, 33, 520-534, July 1965.
Concave Game Definition: Concave Game Definition: Nash Equilibrium
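The formulas behind these two definitions are not reproduced on the slide; a standard statement, following the Rosen (1965) paper cited above, is roughly as follows.
• Concave game: each player i = 1, …, n chooses an action x_i in a compact convex set S_i; the reward u_i(x_1, …, x_n) is continuous in the joint action x and concave in x_i for every fixed choice of the other players.
• Nash equilibrium: a joint action x* such that, for every player i, u_i(x*) = max over x_i in S_i of u_i(x_i, x*_{-i}); no player can gain by deviating unilaterally.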
Concave Game Proof Theorem: Existence
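The statement itself is not shown here; Rosen’s existence result (Theorem 1 of the cited paper) can be phrased as: every concave n-person game, with compact convex action sets and rewards continuous in the joint action and concave in each player’s own action, has at least one Nash equilibrium. The proof is a fixed-point argument based on Kakutani’s theorem (see the notes at the end of this deck).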
Concave Game Specialized Case:
Concave Game Definition: Diagonally Strictly Concave
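The defining inequality is not reproduced on the slide; in Rosen’s notation it reads roughly as follows. For a weight vector r with r_i > 0, let σ(x, r) = Σ_i r_i u_i(x) and let g(x, r) be the pseudogradient of σ, the vector whose i-th block is r_i ∇_{x_i} u_i(x). Then σ is diagonally strictly concave if, for every pair of distinct points x0 and x1,
(x1 − x0)’ g(x0, r) + (x0 − x1)’ g(x1, r) > 0.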
Concave Game Theorem: Uniqueness
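Again the formal statement is not on the slide; Rosen’s uniqueness result (Theorem 2 of the cited paper), in the case where each player has her own constraint set, says: if σ(x, r) is diagonally strictly concave for some weight vector r > 0, then the concave game has a unique Nash equilibrium.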
Concave Game Theorem: Uniqueness - Bilinear Case:
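A sufficient condition that appears to be the one intended here (Rosen, Theorem 6): σ(x, r) is diagonally strictly concave if the symmetrized Jacobian G(x, r) + G(x, r)’ of the pseudogradient g(x, r) is negative definite. For bilinear (quadratic) rewards this Jacobian is a constant matrix, so uniqueness reduces to checking the negative definiteness of a single matrix.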
Concave Game Local Improvements
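The figure on this slide is not reproduced; as an illustration of a local-improvement (gradient-play) scheme, here is a minimal Python sketch. The quadratic rewards are an assumption chosen for the example, not the game from the slide; each player repeatedly nudges its own action in the direction of its own payoff gradient, and for this diagonally strictly concave game the iterates approach the unique NE.

```python
# Minimal sketch of gradient-play ("local improvement") dynamics for an
# illustrative two-player concave game (the payoffs are assumptions):
#   u1(x1, x2) = -(x1 - 1)^2 - 0.5 * x1 * x2
#   u2(x1, x2) = -(x2 - 1)^2 - 0.5 * x1 * x2
# Each reward is concave in the player's own action.

def grad_u1(x1, x2):
    return -2.0 * (x1 - 1.0) - 0.5 * x2   # d u1 / d x1

def grad_u2(x1, x2):
    return -2.0 * (x2 - 1.0) - 0.5 * x1   # d u2 / d x2

x1, x2, step = 0.0, 0.0, 0.05
for _ in range(2000):
    d1, d2 = grad_u1(x1, x2), grad_u2(x1, x2)   # evaluate both gradients first
    x1, x2 = x1 + step * d1, x2 + step * d2     # simultaneous small improvements

print(f"gradient play reaches ({x1:.3f}, {x2:.3f})")
# First-order conditions give x1 = x2 = 0.8, which the iteration approaches.
```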
Learning in Games • Motivation • Examples • Models • Fictitious Play • Stochastic Fictitious Play
Reference: Fudenberg, D. and D.K. Levine (1998), The Theory of Learning in Games, MIT Press, Cambridge, Massachusetts. Chapters 1, 2, 4.
Motivation Explain equilibrium as result of players “learning” over time (instead of as the outcome of fully rational players with complete information)
Examples: 1 Fixed Player Model
• If P1 is patient and knows that P2 chooses her play based on her forecast of P1’s plays, then P1 should always play U to lead P2 to play R.
• A sophisticated and patient player who faces a naïve opponent can develop a reputation for playing a fixed strategy and obtain the rewards of a Stackelberg leader.
Large Population Models
• Most of the theory avoids the possibility above by assuming random pairings in a large population of anonymous users.
• In such a situation, P1 cannot really teach much to the rest of the population, so myopic play (D, L) is optimal.
• Naïve play: ignore that you affect the other players’ strategies.
Examples: 2 Cournot Adjustment Model
• Each player selects the best response to the other player’s strategy in the previous period.
• Converges to the unique NE in this case (see the sketch below).
• This adjustment is a bit naïve …
[Figure: best-response curves BR1 and BR2 in the plane of the two players’ quantities, intersecting at the NE.]
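To make the adjustment process concrete, here is a minimal Python sketch for a linear Cournot duopoly; the demand intercept a, slope b, and marginal cost c are illustrative assumptions, not values from the course.

```python
# Cournot adjustment: each firm best-responds to the rival's previous-period
# quantity. Linear inverse demand p = a - b*(q1 + q2), constant marginal cost c.
a, b, c = 10.0, 1.0, 1.0

def best_response(q_other):
    # Maximize (a - b*(q + q_other) - c) * q over q >= 0.
    return max(0.0, (a - c - b * q_other) / (2.0 * b))

q1, q2 = 0.0, 4.0                                   # arbitrary starting quantities
for _ in range(50):
    q1, q2 = best_response(q2), best_response(q1)   # simultaneous adjustment

print(f"adjustment reaches q1 = {q1:.3f}, q2 = {q2:.3f}")
print(f"Cournot-Nash quantity (a - c)/(3b) = {(a - c) / (3 * b):.3f}")
```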
Models
• Learning Model: Specifies the rules of the individual players and examines their interactions in a repeated game.
• Usually the same game is repeated (there is some work on learning from similar games).
• Fictitious Play: Players observe the result of their own match and play a best response to the historical frequency of play.
• Partial Best-Response Dynamics: In each period, a fixed fraction of the population switches to a best response to the aggregate statistics from the previous period.
• Replicator Dynamics: The share of the population using each strategy grows at a rate proportional to that strategy’s current payoff (a sketch follows this list).
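A minimal Python sketch of the replicator dynamics described in the last item; the 2x2 payoff matrix is an illustrative assumption (an anti-coordination game with an interior rest point), not an example from the slides.

```python
import numpy as np

# Replicator dynamics for a symmetric 2x2 game: the share of each strategy
# grows at a rate proportional to its payoff advantage over the average.
A = np.array([[0.0, 3.0],      # payoff of strategy 1 against (1, 2)
              [1.0, 2.0]])     # payoff of strategy 2 against (1, 2)

x = np.array([0.1, 0.9])       # initial population shares
dt = 0.01
for _ in range(20000):
    fitness = A @ x                        # expected payoff of each strategy
    average = x @ fitness                  # population-average payoff
    x = x + dt * x * (fitness - average)   # Euler step of the replicator ODE

print(f"population shares converge to {x.round(3)}")   # here: [0.5, 0.5]
```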
Fictitious Play
• Each player computes the frequency of the actions of the other players (with initial weights).
• Each player selects a best response to the empirical distribution (which need not be a product distribution).
Theorem: Strict NE are absorbing for FP; moreover, if s is a pure strategy and a steady state for FP, then s is a NE.
Proof: Assume s(t) = s = strict NE. Then, with a := a(t) …, p(t+1) = (1 – a)p(t) + ad(s), so that u(p(t+1), r) = (1 – a)u(p(t), r) + au(d(s), r), which is maximized by r = s if u(p(t), r) is maximized by r = s.
Converse: If play converges, the players do not want to deviate, so the limit must be a NE…
Fictitious Play
• Assume initial weights (1.5, 2) and (2, 1.5). Then the first play is (T, T); the weights become (1.5, 3) and (2, 2.5); play continues (T, H), (T, H), (H, H), (H, H), (H, H), (H, T)…
Theorem: If under FP the empirical frequencies converge, then their product converges to a NE.
Proof: If the strategies converge, the players do not want to deviate, so the limit must be a NE…
Theorem: Under FP, the empirical frequencies converge if one of the following holds:
• 2x2 with generic payoffs
• zero-sum
• solvable by iterated strict dominance
• …
Note: The empirical distributions need not converge.
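A minimal Python sketch of the fictitious-play rule above, assuming the example is the standard matching-pennies illustration (player 1 wants to match, player 2 wants to mismatch) with the initial weights (1.5, 2) and (2, 1.5); the plays it prints follow the sequence on the slide, (T, T), (T, H), (T, H), (H, H), …

```python
import numpy as np

# Fictitious play in matching pennies: each player tracks weights on the
# opponent's actions (H, T), best-responds to the empirical distribution,
# and then adds 1 to the weight of the action the opponent just played.
ACTIONS = ["H", "T"]
u1 = np.array([[1.0, -1.0],    # row = P1's action, column = P2's action
               [-1.0, 1.0]])   # P1 (the matcher) gets +1 on a match
u2 = -u1                       # zero-sum: P2 (the mismatcher) gets the opposite

w1 = np.array([1.5, 2.0])      # P1's initial weights on P2 playing (H, T)
w2 = np.array([2.0, 1.5])      # P2's initial weights on P1 playing (H, T)

for t in range(10):
    belief1 = w1 / w1.sum()                 # P1's empirical forecast of P2
    belief2 = w2 / w2.sum()                 # P2's empirical forecast of P1
    a1 = int(np.argmax(u1 @ belief1))       # P1's best response
    a2 = int(np.argmax(u2.T @ belief2))     # P2's best response
    print(f"t = {t}: play ({ACTIONS[a1]}, {ACTIONS[a2]})")
    w1[a2] += 1.0                           # P1 records P2's action
    w2[a1] += 1.0                           # P2 records P1's action
```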
Fictitious Play
• Assume initial weights (1, √2) for P1 and P2. Then play goes (A, A); the weights become (2, √2); then (B, B), (A, A), (B, B), (A, A), etc.
• The empirical frequencies converge to the NE.
• However, the players get 0: the realized plays are correlated, not independent. (Fix: randomize …)
Stochastic Fictitious Play
• Motivation: Avoid the discontinuity in FP.
• Hope for a stronger form of convergence: not only of the marginals, but also of the intended plays.
Stochastic Fictitious Play Definitions:
• Perturbed reward of i: u(i, s) + n(i, si), where n has positive support on an interval.
• Smoothed best response: BR(i, s)(si) = P[n(i, si) is such that si = BR to s].
• Nash Distribution: s such that si = BR(i, s) for all i.
• Harsanyi’s Purification Theorem: For generic payoffs, ND → NE as the support of the perturbation → 0.
• Key feature: the smoothed BR is continuous and close to the original BR.
[Figure: smoothed best responses for Matching Pennies.]
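The smoothing above is specified through the random perturbations n(i, si); a common concrete choice (an assumption here, not necessarily the one used in the course) is the logit rule with parameter λ > 0: the smoothed best response assigns to action si the probability exp(u(i, si, s−i)/λ) / Σ over s'i of exp(u(i, s'i, s−i)/λ). As λ → 0 this approaches the exact best response, while for λ > 0 it is continuous in s, which is the key feature mentioned above.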
Stochastic Fictitious Play
• Theorem (Fudenberg and Kreps, 93): Assume the 2x2 game has a unique mixed NE. If the smoothing is small enough, then the NE is globally stable for SFP.
• Theorem (K&Y 95, B&H 96): Assume the 2x2 game has a unique strict NE; then the unique intersection of the smoothed BRs is a global attractor for SFP. Assume the 2x2 game has 2 strict NE and one mixed NE; then SFP converges to one of the strict NE, w.p. 1.
• Note: Cycling is possible for SFP in multi-player games.
Stochastic Fictitious Play
• Other justification for randomization: protection against the opponent’s mistakes.
• Learning rules should be:
• Safe: average utility ≥ the minmax value.
• Universally consistent: utility ≥ the utility achievable if the frequency of plays were known, but not their order.
• Randomization can achieve universal consistency (e.g., SFP).
Stochastic Fictitious Play
• Stimulus-Response (reinforcement learning): Increase the probability of plays that give good results.
• General observation: It is difficult to discriminate between learning models on the basis of experimental data; SFP, SR, etc. all seem roughly comparable.
Cooperative Games • Motivation • Notions of Equilibrium • Nash Bargaining Equilibrium • Shapley Value
Cooperative Games: Motivation • The Nash equilibria may not be the most desirable outcome for the players. • Typically, players benefit by cooperating. • We explore some notions of equilibrium that players achieve under cooperation.
Cooperative Games: Nash B.E. Definition: Nash Bargaining Equilibrium Interpretation: Fact
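The formula behind the definition is not reproduced on the slide; the standard two-player statement is: given a convex, compact set U of feasible utility pairs and a disagreement point d = (d1, d2), the Nash Bargaining Equilibrium is the pair u* in U with u* ≥ d that maximizes the Nash product (u1 − d1)(u2 − d2).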
Cooperative Games: Nash B.E. Example: NBE: Social:
Cooperative Games: Nash B.E. Axiomatic Justification At the NBE, the sum of relative increases is zero.
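For reference (standard material, not read off the slide): the Nash bargaining solution is the unique rule satisfying Pareto efficiency, symmetry, independence of irrelevant alternatives, and invariance to affine rescaling of the utilities. The “sum of relative increases” statement is the first-order condition for maximizing the Nash product along the feasible frontier: du1/(u1* − d1) + du2/(u2* − d2) = 0 for feasible directions (du1, du2).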
Shapley Value Example: Shapley Value:
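The formula and the example themselves are not reproduced on the slide; the standard definition is φ_i(v) = Σ over S ⊆ N \ {i} of [|S|! (n − |S| − 1)! / n!] (v(S ∪ {i}) − v(S)), i.e., player i’s marginal contribution averaged over all orders in which the players can arrive. Below is a minimal Python sketch of that averaging, using an illustrative three-player “glove” game that is an assumption, not the slide’s example: player 1 owns a left glove, players 2 and 3 each own a right glove, and a coalition is worth 1 if it can form a pair.

```python
from itertools import permutations

def v(coalition):
    # Characteristic function of the illustrative glove game.
    s = set(coalition)
    return 1.0 if 1 in s and (2 in s or 3 in s) else 0.0

players = (1, 2, 3)
shapley = {i: 0.0 for i in players}
orderings = list(permutations(players))
for order in orderings:
    so_far = []
    for i in order:
        shapley[i] += v(so_far + [i]) - v(so_far)   # marginal contribution of i
        so_far.append(i)
shapley = {i: total / len(orderings) for i, total in shapley.items()}

print(shapley)   # {1: 0.666..., 2: 0.166..., 3: 0.166...}
```

Player 1, who is needed for every productive pair, receives 2/3, and the two interchangeable right-glove owners receive 1/6 each.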
Fixed Point Theorems Theorem (Brouwer):
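The statement is not shown on the slide; the standard form is: if S is a nonempty, compact, convex subset of R^n and f : S → S is continuous, then f has a fixed point, i.e., a point z with f(z) = z. The triangle argument sketched on the next two slides proves this for a simplex.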
Brouwer [Figure: labeled triangulation; the boundary segments carry labels (1, 3) and (2, 3).] One path through doors (1, 2) must end up in a triangle labeled (1, 2, 3). [Indeed: there is an odd number of boundary doors.]
Brouwer Take the small (1, 2, 3) triangle; divide it into triangles as before; it contains another (1, 2, 3) triangle; continue in this way. Pick z(n) in triangle n and let z = lim z(n). Claim: f(z) = z. Proof: If f(z) is not z, then z(n) and f(z(n)) are in different small triangles at stage n; but then z(n) cannot be in a (1, 2, 3) triangle …
Notes on Kakutani Theorem:
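The notes themselves are not reproduced; Kakutani’s theorem, the set-valued generalization of Brouwer used to prove the existence of Nash equilibria, states: if S is a nonempty, compact, convex subset of R^n and F maps each point of S to a nonempty, closed, convex subset of S, with a closed graph (upper hemicontinuity), then there is a point z with z ∈ F(z).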