Repeated Games APEC 8205: Applied Game Theory Fall 2007
Objectives
• Understand the Class of Repeated Games
• Understand Conditions Under Which Non-Nash Play Can be Sustained as a Subgame Perfect Nash Equilibrium when a Game is Repeated
  • Multiple Nash Equilibria
  • Infinite Repetition
Why study repeated games?
• Many interactions in life are repeated.
  • Large retailers compete on a daily basis for customers.
  • Dana and I compete on a daily basis to decide who will make dinner and who will pick up around the house.
  • Mason and Spencer compete on a daily basis to see who gets to watch TV and who gets to play X-Box.
• What is of interest in these types of repeated interactions?
  • Can players achieve better results than might occur in a single-shot game?
  • Can players use the history of play to their advantage?
Some Terminology
• $G$: Stage game (usually thought of in normal form).
• Players: $i = 1, \dots, N$
• $a_i \in A_i$: Player i's strategy, an element of the strategy space $A_i$.
• $a = (a_1, \dots, a_N) \in A = \times_{i=1}^{N} A_i$: Strategy profile for all players.
• $u_i(a)$: Player i's payoff for strategy profile $a$.
• $u(a) = (u_1(a), \dots, u_N(a))$: Vector of player payoffs for strategy profile $a$.
• $T$: Number of times the stage game is repeated (could be infinite).
• $a_i^t \in A_i$: Player i's strategy choice at time $t$.
• $a^t = (a_1^t, \dots, a_N^t) \in A = \times_{i=1}^{N} A_i$: Strategy profile for all players at time $t$.
• $h^t = (a^1, \dots, a^{t-1}) \in A^t = \times_{t'=1}^{t-1} A$: History of play at time $t$.
• $s_i^t(h^t) \in A_i$: History dependent strategy.
• $s^t(h^t) = (s_1^t(h^t), \dots, s_N^t(h^t)) \in A$: History dependent strategy profile.
• $U_i(s^1(h^1), \dots, s^T(h^T)) = \sum_{t=1}^{T} w_{it} u_i(s^t(h^t))$: Player i's payoff from the game, where $w_{it}$ is the weight Player i places on period $t$.
• $U(s^1(h^1), \dots, s^T(h^T)) = (U_1(s^1(h^1), \dots, s^T(h^T)), \dots, U_N(s^1(h^1), \dots, s^T(h^T)))$: Payoffs for all players.
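The notation above can be made concrete with a small sketch. It assumes a two-player Prisoner's Dilemma stage game with payoffs 2 (both cooperate), 0 and 3 (one cooperates, one defects), and 1 (both defect), which are the values the later slides work with; the names `play`, `total_payoff`, `always_cooperate`, and `always_defect` are illustrative only.

```python
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[str, ...]            # a^t: one action per player
History = Tuple[Profile, ...]        # h^t = (a^1, ..., a^{t-1})
Strategy = Callable[[History], str]  # s_i^t(h^t); t is implied by len(history) + 1

# Assumed stage-game payoffs u(a), written as (Player 1, Player 2).
STAGE_PAYOFFS: Dict[Profile, Tuple[int, int]] = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}


def play(strategies: Sequence[Strategy], T: int) -> History:
    """Generate the path of play a^1, ..., a^T induced by the strategies."""
    history: History = ()
    for _ in range(T):
        profile = tuple(s(history) for s in strategies)
        history += (profile,)
    return history


def total_payoff(i: int, path: History, weights: Sequence[float]) -> float:
    """U_i = sum over t of w_it * u_i(a^t) along the realized path."""
    return sum(w * STAGE_PAYOFFS[a][i] for w, a in zip(weights, path))


def always_cooperate(history: History) -> str:
    return "C"


def always_defect(history: History) -> str:
    return "D"


path = play([always_cooperate, always_defect], T=2)
payoffs = tuple(total_payoff(i, path, weights=[1, 1]) for i in range(2))
print(path, payoffs)
```

Representing a strategy as a function of the history keeps the code close to the definition of $s_i^t(h^t)$; a history-independent strategy simply ignores its argument.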
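Consider an Example
Suppose this Prisoner's Dilemma game is played twice and that $w_{it} = 1$ for $i = 1, 2$ and $t = 1, 2$.
Stage-game payoffs (Player 1, Player 2): (C,C) = (2,2), (C,D) = (0,3), (D,C) = (3,0), (D,D) = (1,1).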
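Two Period Prisoner's Dilemma Example In Extensive Form
[Game tree: in period 1, Player 1 chooses C or D and Player 2 chooses C or D; after each of the four period-1 outcomes, Player 1 and Player 2 again choose C or D.]
Total payoffs (Player 1's Payoff, Player 2's Payoff), listed for period-2 outcomes (C,C), (C,D), (D,C), (D,D):
• After (C,C) in period 1: (4,4), (2,5), (5,2), (3,3)
• After (C,D) in period 1: (2,5), (0,6), (3,3), (1,4)
• After (D,C) in period 1: (5,2), (3,3), (6,0), (4,1)
• After (D,D) in period 1: (3,3), (1,4), (4,1), (2,2)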
Two Period Prisoner's Dilemma Example After Solving Stage 2 Subgames
[Game tree: in period 1, Player 1 chooses C or D and Player 2 chooses C or D; in every period-2 subgame, both players choose D.]
Total payoffs (Player 1's Payoff, Player 2's Payoff) by period-1 outcome:
• (C,C): (3,3)
• (C,D): (1,4)
• (D,C): (4,1)
• (D,D): (2,2)
Two Period Prisoner's Dilemma Example After Solving Game As Whole
[Game tree: Player 1 chooses D and Player 2 chooses D, with total payoffs (Player 1's Payoff, Player 2's Payoff) = (2, 2).]
Therefore, the subgame perfect strategies are (strategy choice in stage 1, strategy choice in stage 2 given (D,D) in stage 1, strategy choice in stage 2 given (D,C) in stage 1, strategy choice in stage 2 given (C,D) in stage 1, strategy choice in stage 2 given (C,C) in stage 1) = (D,D,D,D,D) for both players.
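The backward induction for the two-period example can be sketched in a few lines. This is only an illustration: the stage payoffs are inferred from the terminal payoffs of the tree, and `pure_nash` is a hypothetical helper that checks pure-strategy equilibria by brute force.

```python
from itertools import product

ACTIONS = ("C", "D")
# Stage payoffs (Player 1, Player 2), inferred from the two-period totals.
STAGE = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
         ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a 2-player game."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(u1 >= payoffs[(d, a2)][0] for d in ACTIONS)
        best2 = all(u2 >= payoffs[(a1, d)][1] for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


# Stage 2: period-1 payoffs are sunk, so every subgame is the stage game.
stage2_eq = pure_nash(STAGE)
continuation = STAGE[stage2_eq[0]]

# Stage 1: add the (unique) continuation payoff to every period-1 outcome.
reduced = {a: (u[0] + continuation[0], u[1] + continuation[1])
           for a, u in STAGE.items()}
stage1_eq = pure_nash(reduced)

print(stage2_eq, reduced, stage1_eq)
```

Because the continuation payoff is the same after every history, it shifts all period-1 payoffs equally and cannot change period-1 incentives, which is why defection survives in both stages.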
So, what is the point?
• If the stage game of a finitely repeated game has a unique Nash equilibrium, then there is a unique subgame perfect equilibrium where that Nash equilibrium is played in every stage of the game!
• But what can happen if there is not a unique equilibrium?
• Or what if the stage game can be infinitely repeated?
What about multiple equilibria?
Consider this modified version of the Prisoner's Dilemma and assume $T = 2$ and $w_{it} = 1$ for $i = 1, 2$ and $t = 1, 2$.
Starting with Period 2
• There are 9 possible histories for the 2nd period of this game:
  • (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), and (D,R).
• For any subgame starting from one of these histories, there are two potential Nash equilibria: (M,C) or (D,R).
• Therefore, for an equilibrium strategy to be subgame perfect, it must have (M,C) or (D,R) in response to the history (x, y) for x = U, M, D and y = L, C, R in the first period.
Now Period 1
Consider the strategies $s_1^2(h^2) = M$ if $h^2 = (U,L)$ and $D$ otherwise & $s_2^2(h^2) = C$ if $h^2 = (U,L)$ and $R$ otherwise, where $h^2$ is the period-1 outcome.
With these strategies, the players' payoffs for the game starting in period 1 are the period-1 stage payoffs plus the period-2 payoffs these strategies prescribe, which yields a subgame perfect equilibrium with cooperative, non-Nash stage game play, (U,L), in period 1!
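The construction can be checked mechanically on a concrete matrix. The payoffs below are HYPOTHETICAL, chosen only so that the stage game has the two pure Nash equilibria (M,C) and (D,R) named above and a cooperative, non-Nash outcome (U,L); they are not the payoffs from the original slide.

```python
from itertools import product

ROWS, COLS = ("U", "M", "D"), ("L", "C", "R")
# HYPOTHETICAL stage payoffs (Player 1, Player 2), for illustration only.
G = {
    ("U", "L"): (4, 4), ("U", "C"): (0, 5), ("U", "R"): (0, 0),
    ("M", "L"): (5, 0), ("M", "C"): (3, 3), ("M", "R"): (0, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 0), ("D", "R"): (1, 1),
}


def pure_nash(game):
    """Return all pure-strategy Nash equilibria of a 2-player game."""
    equilibria = []
    for r, c in product(ROWS, COLS):
        u1, u2 = game[(r, c)]
        best1 = all(u1 >= game[(d, c)][0] for d in ROWS)
        best2 = all(u2 >= game[(r, d)][1] for d in COLS)
        if best1 and best2:
            equilibria.append((r, c))
    return equilibria


def continuation(period1_outcome):
    """Period-2 payoffs: reward with (M,C) after (U,L), else punish with (D,R)."""
    reward, punish = G[("M", "C")], G[("D", "R")]
    return reward if period1_outcome == ("U", "L") else punish


# Period-1 game with the continuation payoffs folded in (weights equal to 1).
reduced = {a: tuple(x + y for x, y in zip(G[a], continuation(a))) for a in G}

print(pure_nash(G))        # stage-game equilibria
print(pure_nash(reduced))  # period-1 outcomes supportable in the 2-period game
```

With these numbers, deviating from (U,L) gains 1 in period 1 but costs 2 in period 2, so (U,L) becomes an equilibrium of the reduced game even though it is not one in the stage game.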
What about infinite repetition?
• First, two definitions:
  • Feasible Payoff: Any convex combination of the pure strategy profile payoffs.
    • $\pi_i$ is feasible if $\pi_i = \sum_{s \in A} \alpha_s u_i(s)$ where $\alpha_s \ge 0$ for $s \in A$ and $\sum_{s \in A} \alpha_s = 1$.
  • Average Payoff: $(1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(s^t(h^t))$ where $1 \ge \delta \ge 0$ is the discount factor.
• Theorem (Friedman 1971): Let $G$ be a finite, static game of complete information. Let $(e_1, \dots, e_N)$ denote the payoffs from a Nash equilibrium of $G$, and let $(x_1, \dots, x_N)$ denote any other feasible payoffs from $G$. If $x_i > e_i$ for every $i$ and if $\delta$ is sufficiently close to one, then there exists a subgame perfect Nash equilibrium of the infinitely repeated game $G$ that achieves $(x_1, \dots, x_N)$ as the average payoff.
• Often referred to as the Folk Theorem, but there are now lots of different versions of this Folk Theorem.
What does this result mean?
• In infinitely repeated games, we can get lots of subgame perfect equilibria.
• These equilibria can include actions in a stage game that are not Nash equilibrium actions for that stage game.
• You can get cooperative behavior in a Prisoner's Dilemma! Let's see what I mean.
Consider the Prisoner's Dilemma
Stage-game payoffs (Player 1, Player 2): (C,C) = (2,2), (C,D) = (0,3), (D,C) = (3,0), (D,D) = (1,1).
Consider the strategy: Play C in period 1. Play C in period $t > 1$ if $a^{t'} = (C, C)$ for all $t' < t$. Otherwise play D.
Can we find a discount factor $\delta$ such that this strategy is subgame perfect for this Prisoner's Dilemma if it is repeated infinitely?
The answer to this question is yes!
• Suppose Player j is playing this type of strategy. At any point in time, Player j has either chosen D in the past in response to i's choice of D or he has always chosen C because i has always chosen C. So, we must consider whether the strategy above is a best response for player i under both of these circumstances.
If D has been chosen in the past, player j will always choose D in the future. What is optimal for i now will be optimal for i in the future due to infinite repetition.
• Let $V_C$ & $V_D$ be the current value of playing strategy C & D.
• If C is optimal, i's payoff from here on out will be $V_C = 0 + \delta V_C$ such that $V_C = 0$.
• If D is optimal, i's payoff from here on out will be $V_D = 1 + \delta V_D$ such that $V_D = 1/(1 - \delta)$.
• $V_D > V_C$, so D is optimal.
If D has not been chosen in the past, player j will choose C in the immediate future and will continue to do so as long as i does. But if i chooses D, j will follow suit from here on out. Again, what is optimal for i now will be optimal for i in the future due to infinite repetition.
• If C is optimal, i's payoff from here on out will be $V_C = 2 + \delta V_C$ such that $V_C = 2/(1 - \delta)$.
• If D is optimal, i's payoff from here on out will be $V_D = 3 + \delta/(1 - \delta)$.
• $V_C \ge V_D$ when $\delta \ge 1/2$, and $V_C < V_D$ when $\delta < 1/2$.
To summarize
• As long as $\delta > 1/2$, this strategy will constitute a subgame perfect Nash strategy for the infinitely repeated Prisoner's Dilemma.
• This type of strategy is often referred to as a trigger strategy.
  • Bad behavior on the part of one player triggers bad behavior on the part of his opponent from then on.
Are there other trigger strategies that can work? YES!
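The two comparisons behind this result can be checked with exact arithmetic. The sketch below assumes the stage payoffs used in the value calculations (2 for mutual cooperation, 3 for defecting on a cooperator, 1 for mutual defection, 0 for cooperating against a defector); the function names are illustrative.

```python
from fractions import Fraction


def discounted_value(head, tail, delta):
    """Value of receiving the payoffs in `head`, then `tail` in every later period."""
    value = sum(delta**k * x for k, x in enumerate(head))
    return value + delta**len(head) * tail / (1 - delta)


def cooperation_is_best(delta):
    """No D so far: cooperate forever versus defect once, then (D,D) forever."""
    v_c = discounted_value([], 2, delta)   # 2 / (1 - delta)
    v_d = discounted_value([3], 1, delta)  # 3 + delta / (1 - delta)
    return v_c >= v_d


def punishment_is_credible(delta):
    """After a D: defect forever versus cooperate once against D, then defect."""
    v_d = discounted_value([], 1, delta)   # 1 / (1 - delta)
    v_c = discounted_value([0], 1, delta)  # 0 + delta / (1 - delta)
    return v_d >= v_c


for delta in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)):
    print(delta, cooperation_is_best(delta), punishment_is_credible(delta))
```

Using `Fraction` rather than floats avoids rounding at the boundary $\delta = 1/2$, where player i is exactly indifferent.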
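General Trigger Strategy
• Define
  • $\pi_i^*$: equilibrium payoffs (per stage)
  • $\pi_i^D$: defection payoff
  • $\pi_i^P$: punishment payoffs (Nash equilibrium payoff per stage)
• Assume $\pi_i^D > \pi_i^* > \pi_i^P$
• Cheating doesn't pay when: $\pi_i^*/(1 - \delta) \ge \pi_i^D + \delta \pi_i^P/(1 - \delta)$ or $\delta \ge (\pi_i^D - \pi_i^*)/(\pi_i^D - \pi_i^P)$
Are there other types of strategies that can work? YES! LOTS MORE!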
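Comparing the value of sticking to the equilibrium, $\pi_i^*/(1-\delta)$, with a one-period defection followed by punishment forever, $\pi_i^D + \delta\pi_i^P/(1-\delta)$, gives a threshold discount factor. A minimal sketch of that threshold follows; `critical_delta` is a hypothetical helper name, not something from the slides.

```python
from fractions import Fraction


def critical_delta(pi_star, pi_dev, pi_pun):
    """Smallest discount factor at which a one-shot defection does not pay.

    Solves pi_star / (1 - d) >= pi_dev + d * pi_pun / (1 - d) for d.
    """
    if not (pi_dev > pi_star > pi_pun):
        raise ValueError("expected pi_dev > pi_star > pi_pun")
    return Fraction(pi_dev - pi_star) / Fraction(pi_dev - pi_pun)


# Prisoner's Dilemma values used earlier: cooperate 2, defect 3, punish 1.
print(critical_delta(2, 3, 1))
```

The threshold rises with the temptation gain $\pi_i^D - \pi_i^*$ and falls as the punishment becomes harsher relative to defection, so milder punishments need more patient players.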
So what are we to make of all this?
• It does provide an explanation for cooperation in games where cooperation seems unlikely.
• However, the explanation tells us that almost anything is possible.
• So, what type of behavior can we expect?
• The theory provides few answers.
• There has been a lot of research on repeated Prisoner's Dilemma games to understand the best way to play as well as how people actually play. Of particular interest is Axelrod (1984). Axelrod had researchers submit various strategies and had computers play them against each other to see which ones performed the best. Tit-for-Tat strategies tended to perform the best.
Application: Cournot Duopoly with Repeated Play
• Who are the players?
  • Two Firms
• Who can do what when?
  • Firms Choose Output Each Period ($q_{it}$ for $i = 1, 2$) to Infinity & Beyond
• Who knows what when?
  • Firms Know Output Choices for all Previous Periods
• How are players rewarded based on what they do?
Stage Game Output & Profit
• Cournot Nash Equilibrium
  • Output: $q_1^C = q_2^C = q^C = a/3$
  • Profit: $\pi_1^C = \pi_2^C = \pi^C = a^2/9$
• Collusive Monopoly Outcome
  • Output: $q_1^M = q_2^M = q^M = a/4$
  • Profit: $\pi_1^M = \pi_2^M = \pi^M = a^2/8$
Is it possible to sustain the collusive Monopoly outcome as a subgame perfect Nash equilibrium with infinite repetition?
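These stage-game values are consistent with a linear inverse demand $P = a - q_1 - q_2$ and zero production cost. That specification is an assumption here, inferred from the outputs and profits above rather than stated in the text. Under it, the values can be verified as follows.

```python
from fractions import Fraction

a = Fraction(12)  # any positive demand intercept works; 12 keeps numbers tidy


def profit(q_i, q_j):
    """Stage profit under the ASSUMED demand P = a - q_i - q_j and zero cost."""
    return (a - q_i - q_j) * q_i


def best_response(q_j):
    """Maximizer of profit(q_i, q_j) over q_i, from the first-order condition."""
    return (a - q_j) / 2


q_c, q_m = a / 3, a / 4

# Cournot output is a mutual best response; the collusive output is not.
print(best_response(q_c) == q_c, best_response(q_m) == q_m)
print(profit(q_c, q_c), a**2 / 9)  # Cournot profit
print(profit(q_m, q_m), a**2 / 8)  # collusive profit per firm
```

Since each firm's best response to $q^M$ is larger than $q^M$, collusion cannot be a stage-game equilibrium, which is why repetition is needed to sustain it.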
Consider the Strategy
• Period 1: $q_{i1} = q^M$
• Period $t > 1$:
  • $q_{it} = q^M$ if $q_{it'} = q_{jt'} = q^M$ for all $t' < t$
  • $q_{it} = q^C$ otherwise
Let's check to see if this proposed strategy is a subgame perfect Nash equilibrium.
• To accomplish this, we need to show that the strategy is a Nash equilibrium in all possible subgames.
• Our task is simplified here by the fact that there are only two distinct types of subgames:
  • $q_{it'} \ne q^M$ or $q_{jt'} \ne q^M$ for some $t' < t$
  • $q_{it'} = q_{jt'} = q^M$ for all $t' < t$
First consider $q_{it'} \ne q^M$ or $q_{jt'} \ne q^M$ for some $t' < t$
• With this history, the proposed strategy says both players should choose $q^C$.
• So, let's see what the optimal output in period $t$ is for Firm i given Firm j will always choose $q^C$.
Firm i’s optimal strategy is to choose the Cournot output just like the proposed strategy says!
Now consider $q_{it'} = q_{jt'} = q^M$ for all $t' < t$
• With this history, the proposed strategy says both players should choose $q^M$.
• So, let's see what the optimal output in period $t$ is for Firm i given Firm j will always choose $q^M$ as long as Firm i chooses $q^M$.
First, suppose that Firm i chooses $q^M$ in period $t$ and forever after. Firm i's payoff from period $t$ onward is then $\pi^M/(1 - \delta)$.
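Now, suppose Firm i chooses something other than $q^M$ in period $t$. Firm j then chooses $q^C$ in every later period, and we have already solved Firm i's optimization problem for that case, which implies $q_{is} = q^C$ for all $s > t$. Firm i's payoff from period $t$ onward is therefore its period-$t$ deviation profit plus $\delta \pi^C/(1 - \delta)$.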
Finally, Firm i will prefer to choose the Monopoly output forever after if $\pi^M/(1 - \delta)$ is at least as large as its best period-$t$ deviation profit plus $\delta \pi^C/(1 - \delta)$. Therefore, if the discount factor is high enough, the proposed strategy will constitute a subgame perfect Nash equilibrium in this infinitely repeated game!
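To put a number on "high enough", the sketch below again assumes linear inverse demand $P = a - q_1 - q_2$ with zero cost, a specification inferred from $q^C = a/3$ and $q^M = a/4$ rather than given in the text. Under that assumption the best one-period deviation and the resulting threshold can be computed exactly.

```python
from fractions import Fraction

a = Fraction(1)  # normalization; profits scale with a**2, so the threshold does not depend on a


def profit(q_i, q_j):
    """Stage profit under the ASSUMED demand P = a - q_i - q_j and zero cost."""
    return (a - q_i - q_j) * q_i


def best_response(q_j):
    """Profit-maximizing output against q_j under the assumed demand."""
    return (a - q_j) / 2


q_c, q_m = a / 3, a / 4
pi_c = profit(q_c, q_c)                 # punishment (Cournot) profit
pi_m = profit(q_m, q_m)                 # collusive profit per firm
pi_d = profit(best_response(q_m), q_m)  # best one-period deviation profit

# Collude if pi_m / (1 - d) >= pi_d + d * pi_c / (1 - d), i.e. d >= delta_star.
delta_star = (pi_d - pi_m) / (pi_d - pi_c)
print(pi_c, pi_m, pi_d, delta_star)
```

Under these assumptions the trigger strategy supports the collusive outcome for any discount factor of at least 9/17, slightly above the 1/2 found for the Prisoner's Dilemma example.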
Is this the only subgame perfect Nash equilibrium?
• Hardly!
• One criticism of trigger strategies like our proposed strategy is that they do not permit cooperation to be reestablished.
• It is possible to find subgame perfect Nash equilibrium strategies that allow cooperation to be reestablished:
  • Period 1: $q_{i1} = q^M$.
  • Period $t > 1$:
    • $q_{it} = q^M$ if $q_{j,t-1} = q^M$ or $q_{j,t-1} = x$
    • $q_{it} = x$ otherwise
• Though defining such strategies and proving they are subgame perfect can be an arduous task!