Evolution and Repeated Games

D. Fudenberg (Harvard) and E. Maskin (IAS, Princeton)

Theory of repeated games important • central model for explaining how self-interested agents can cooperate • used in economics, biology, political science and other fields

But theory has a serious flaw: • although cooperative behavior possible, so is uncooperative behavior (and everything in between) • theory doesn’t favor one behavior over another • theory doesn’t make sharp predictions

Evolution (biological or cultural) can promote efficiency • might hope that uncooperative behavior will be “weeded out” • this view expressed in Axelrod (1984)

Basic idea: • Start with population playing repeated game strategy Always D • Consider small group of mutants using Conditional C (play C until someone plays D, thereafter play D) • Conditional C does essentially the same against Always D as Always D does against itself • Conditional C does much better against itself than Always D does against Conditional C • Thus Conditional C will invade Always D • Uncooperative behavior driven out
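
The invasion argument above can be checked numerically. The sketch below is illustrative only: the Prisoner's Dilemma payoffs (2 for mutual C, 0 for mutual D, 3 and −1 when actions differ) are an assumption inferred from the loss arithmetic that appears later in the talk, and the names `PAYOFF`, `always_d`, `conditional_c` and `average_payoff` are hypothetical helpers, not part of the original.

```python
# Payoff to the first-listed player. These values are ASSUMED, inferred from
# the "2 - (1/2 * 3 + 1/2 * (-1))" arithmetic later in the talk.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}


def always_d(own_past, other_past):
    """Always D: defect in every period."""
    return "D"


def conditional_c(own_past, other_past):
    """Conditional C: play C until someone plays D, thereafter play D."""
    return "D" if "D" in own_past or "D" in other_past else "C"


def average_payoff(strategy_1, strategy_2, periods=1000):
    """Average per-period payoff to player 1 when nobody makes mistakes."""
    past_1, past_2, total = [], [], 0
    for _ in range(periods):
        a1 = strategy_1(past_1, past_2)
        a2 = strategy_2(past_2, past_1)
        total += PAYOFF[(a1, a2)]
        past_1.append(a1)
        past_2.append(a2)
    return total / periods


if __name__ == "__main__":
    for name_1, s1 in [("Always D", always_d), ("Conditional C", conditional_c)]:
        for name_2, s2 in [("Always D", always_d), ("Conditional C", conditional_c)]:
            print(f"{name_1} against {name_2}: {average_payoff(s1, s2):.3f}")
```

Under these assumed payoffs, Conditional C loses only the first period against Always D, while against itself it earns the cooperative payoff in every period.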

But consider ALT: alternate between C and D until pattern broken, thereafter play D • ALT can’t be invaded by some other strategy • other strategy would have to alternate, or else would do much worse against ALT than ALT does against itself • Thus ALT is “evolutionarily stable” • But ALT is quite inefficient (average payoff 1)
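
A small sketch of the ALT argument, under stated assumptions: the stage-game payoffs (2, −1, 3, 0) are inferred from arithmetic later in the talk, ALT is assumed to start with C, and the function names are hypothetical. It computes ALT's average payoff against itself and shows how a non-alternating strategy fares against ALT.

```python
# ASSUMED Prisoner's Dilemma payoffs to the first-listed player.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}


def alt(own_past, other_past):
    """ALT: alternate C, D, C, D, ... until the pattern is broken, then D.

    Assumption: the alternation starts with C in the first period.
    """
    for t, (a, b) in enumerate(zip(own_past, other_past)):
        expected = "C" if t % 2 == 0 else "D"
        if a != expected or b != expected:
            return "D"  # pattern broken at some earlier period
    return "C" if len(own_past) % 2 == 0 else "D"


def conditional_c(own_past, other_past):
    """Conditional C: play C until someone plays D, thereafter play D."""
    return "D" if "D" in own_past or "D" in other_past else "C"


def average_payoff(strategy_1, strategy_2, periods=200):
    """Average per-period payoff to player 1 when nobody makes mistakes."""
    past_1, past_2, total = [], [], 0
    for _ in range(periods):
        a1 = strategy_1(past_1, past_2)
        a2 = strategy_2(past_2, past_1)
        total += PAYOFF[(a1, a2)]
        past_1.append(a1)
        past_2.append(a2)
    return total / periods


if __name__ == "__main__":
    print("ALT against ALT:          ", average_payoff(alt, alt))
    print("Conditional C against ALT:", average_payoff(conditional_c, alt))
```

With these assumptions ALT earns an average of 1 against itself, while Conditional C, which breaks the alternation in the second period, ends up close to 0 against ALT.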

Still, ALT highly inflexible • relies on perfect alternation • if pattern broken, get D forever • What if there is a (small) probability of mistake in execution?

Consider mutant strategy identical to ALT except if (by mistake) alternating pattern broken • signals “intention” to cooperate by playing C in following period • if other strategy plays C too, plays C thereafter • if other strategy plays D, plays D thereafter
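
To make the effect of a mistake concrete, the sketch below replaces the random mistake by a single scripted one, so the outcome is deterministic. Everything here is an illustration under assumptions: the payoffs (2, −1, 3, 0) are inferred from later arithmetic, ALT is assumed to start with C, and the mutant's continuation (C thereafter if the other player answers the signal with C, D thereafter otherwise) follows the general signalling construction described later in the talk. The names `forgiving_alt` and `play` are hypothetical.

```python
# ASSUMED Prisoner's Dilemma payoffs to the first-listed player.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}


def first_break(own_past, other_past):
    """First period in which a realized action departs from C, D, C, D, ...

    Returns None while the alternating pattern is intact.
    """
    for t, (a, b) in enumerate(zip(own_past, other_past)):
        expected = "C" if t % 2 == 0 else "D"
        if a != expected or b != expected:
            return t
    return None


def alt(own_past, other_past):
    """ALT: alternate until the pattern is broken, thereafter play D."""
    if first_break(own_past, other_past) is not None:
        return "D"
    return "C" if len(own_past) % 2 == 0 else "D"


def forgiving_alt(own_past, other_past):
    """Mutant: like ALT, but signals with C right after the pattern breaks."""
    t0 = first_break(own_past, other_past)
    if t0 is None:
        return "C" if len(own_past) % 2 == 0 else "D"
    if len(own_past) == t0 + 1:
        return "C"  # signal willingness to cooperate
    # Assumed continuation: cooperate iff the other player also played C
    # in the signalling period.
    return "C" if other_past[t0 + 1] == "C" else "D"


def play(strategy_1, strategy_2, periods, forced=None):
    """Average payoffs (player 1, player 2); `forced` scripts mistakes as
    {(period, player): action}."""
    forced = forced or {}
    past_1, past_2 = [], []
    total_1 = total_2 = 0
    for t in range(periods):
        a1 = forced.get((t, 1), strategy_1(past_1, past_2))
        a2 = forced.get((t, 2), strategy_2(past_2, past_1))
        total_1 += PAYOFF[(a1, a2)]
        total_2 += PAYOFF[(a2, a1)]
        past_1.append(a1)
        past_2.append(a2)
    return total_1 / periods, total_2 / periods


if __name__ == "__main__":
    mistake = {(4, 1): "D"}  # player 1 plays D by mistake in period 4
    print("ALT vs ALT:      ", play(alt, alt, 100, mistake))
    print("mutant vs ALT:   ", play(forgiving_alt, alt, 100, mistake))
    print("mutant vs mutant:", play(forgiving_alt, forgiving_alt, 100, mistake))
```

In this scripted example the mutant loses one period of payoff against ALT by signalling, but after the same mistake two mutants return to cooperation while two ALT players stay at D.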

Main results in paper (for 2-player symmetric repeated games) (1) If s evolutionarily stable and • discount rate r small (future important) • mistake probability p small (but p > 0) then s (almost) “efficient” (2) If payoffs (v, v) “efficient”, then there exists ES strategy s (almost) attaining (v, v) provided • r small • p small relative to r • generalizes Fudenberg-Maskin (1990), in which r = p = 0

Finite symmetric 2–player game • if • normalize payoffs so that

strongly efficient if

Repeated game: g repeated infinitely many times • period t history • H = set of all histories • repeated game strategy • assume finitely complex (playable by finite computer) • in each period, probability p that i makes mistake • chooses (equal probabilities for all actions) • mistakes independent across players

informally, s evolutionarily stable (ES), if no mutant can invade population with big proportion s and small proportion • formally, s is ES w.r.t. if for all and all • evolutionary stability • expressed statically here • but can be given precise dynamic meaning

population of • suppose time measured in “epochs” T = 1, 2, . . . • strategy state in epoch T • most players in population use • group of mutants (of size a) plays s' a drawn randomly from s' drawn randomly from finitely complex strategies • M random drawings of pairs of players • each pair plays repeated game • = strategy with highest average score

Theorem 1: For any exists such that, for all there exists such that, for all (i) if s not ES, (ii) if

Let Theorem 2: Given such that, for all if s is ES w.r.t. then

Proof: Suppose • will construct mutant s' that can invade • let • if s = ALT, = any history for which alternating pattern broken

Construct s' so that • if h not a continuation of • after , strategy s' • “signals” willingness to cooperate by playing differently from s for 1 period (assume s is pure strategy) • if other player responds positively, plays strongly efficiently thereafter • if not, plays according to s thereafter • after • responds positively if other strategy has signaled, and thereafter plays strongly efficiently • plays according to s otherwise

because is already worst history, s' loses for only 1 period by signaling (small loss if r small) • if p small, probability that s' “misreads” other player’s intention is small • hence, s' does nearly as well against s as s does against itself (even after ) • s' does very well against itself (strong efficiency), after

remains to check how well s does against s' • by definition of • Ignoring effect of p, Also, after deviation by s', punishment started again, and so Hence • so s does appreciably worse against s' than s' does against s'

Summing up, we have: • s is not ES

Theorem 2 implies for Prisoner’s Dilemma that, for any • doesn’t rule out punishments of arbitrary (finite) length

Consider strategy s with “cooperative” and “punishment” phases • in cooperative phase, play C • stay in cooperative phase until one player plays D, in which case go to punishment phase • in punishment phase, play D • stay in punishment phase for m periods (and then go back to cooperative phase) unless at some point some player chooses C, in which case restart punishment • For any m,
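
The two-phase strategy above can be written as a small finite automaton. This is a minimal sketch, assuming both players observe realized actions and therefore track the same phase; `next_phase`, `intended_action` and `realized_path` are hypothetical names, and mistakes are scripted rather than random so the path is deterministic.

```python
def intended_action(remaining):
    """Play C in the cooperative phase (remaining == 0), D while punishing."""
    return "C" if remaining == 0 else "D"


def next_phase(remaining, actions, m):
    """Update the number of punishment periods still to be served.

    remaining == 0 is the cooperative phase; remaining > 0 is punishment.
    """
    if remaining == 0:
        # Stay cooperative until some player plays D.
        return 0 if actions == ("C", "C") else m
    # In punishment: any C restarts the punishment, otherwise count down.
    return m if "C" in actions else remaining - 1


def realized_path(m, periods, forced=None):
    """Realized action pairs when both players use the strategy.

    `forced` scripts mistakes as {(period, player): action}.
    """
    forced = forced or {}
    remaining, path = 0, []
    for t in range(periods):
        a1 = forced.get((t, 1), intended_action(remaining))
        a2 = forced.get((t, 2), intended_action(remaining))
        path.append((a1, a2))
        remaining = next_phase(remaining, (a1, a2), m)
    return path


if __name__ == "__main__":
    # Player 1 plays D by mistake in period 2; punishment lasts m = 3 periods.
    print(" ".join(a + b for a, b in realized_path(3, 8, {(2, 1): "D"})))
```

A single mistake triggers exactly m periods of mutual D before play returns to C; a further C played by mistake during punishment resets the countdown.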

Can sharpen Theorem 2 for Prisoner’s Dilemma: Given , there exist such that, for all if s is ES w.r.t. then it cannot entail a punishment lasting more than periods Proof: very similar to that of Theorem 2

For r and p too big, ES strategy s may not be “efficient” • if • if fully cooperative strategies in Prisoner’s Dilemma generate payoffs

Theorem 3: Let For all for all for all

Proof: Construct s so that • along equilibrium path of (s, s), payoffs are (approximately) (v, v) • punishments are nearly strongly efficient • deviating player (say 1) minimaxed long enough to wipe out gain • thereafter go to strongly efficient point • overall payoffs after deviation: • if r and p small, (s, s) is a subgame perfect equilibrium

In Prisoner’s Dilemma, consider s that • plays C the first period • thereafter, plays C if and only if either both players played C previous period or neither did • strategy s • is efficient • entails punishments that are as short as possible • is modification of Tit-for-Tat (C the first period; thereafter, do what other player did previous period) • Tit-for-Tat not ES • if mistake (D, C) occurs then get wave of alternating punishments: (C, D), (D, C), (C, D), ... until another mistake made
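
The contrast between Tit-for-Tat and its modification after a single mistake can be traced directly. The sketch below is illustrative: the helper names are hypothetical, and the mistake is scripted (player 1 plays D in period 1) rather than drawn at random.

```python
def tit_for_tat(own_past, other_past):
    """C in the first period; thereafter copy the other player's last action."""
    return "C" if not other_past else other_past[-1]


def modified_tit_for_tat(own_past, other_past):
    """C in the first period; thereafter C iff both players took the same
    action in the previous period (both C or both D)."""
    if not own_past:
        return "C"
    return "C" if own_past[-1] == other_past[-1] else "D"


def realized_path(strategy, periods, forced=None):
    """Realized action pairs when both players use `strategy`.

    `forced` scripts mistakes as {(period, player): action}.
    """
    forced = forced or {}
    past_1, past_2 = [], []
    for t in range(periods):
        a1 = forced.get((t, 1), strategy(past_1, past_2))
        a2 = forced.get((t, 2), strategy(past_2, past_1))
        past_1.append(a1)
        past_2.append(a2)
    return list(zip(past_1, past_2))


if __name__ == "__main__":
    mistake = {(1, 1): "D"}  # player 1 plays D by mistake in period 1
    for name, s in [("Tit-for-Tat", tit_for_tat),
                    ("modified Tit-for-Tat", modified_tit_for_tat)]:
        path = realized_path(s, 6, mistake)
        print(f"{name}: " + " ".join(a + b for a, b in path))
```

Tit-for-Tat never recovers from the mistake on its own, whereas the modified strategy spends one period at (D, D) and then returns to (C, C).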

Let s = play d as long as, in all past periods, either • both players played d, or • neither played d; if single player deviates from d • henceforth, that player plays b • other player plays a • s is ES even though inefficient • any attempt to improve on efficiency is punished forever • can’t invade during punishment, because punishment efficient

Consider potential invader s' For any h, s' cannot do better against s than s does against itself, since (s, s) equilibrium hence, for all h, and so For s' to invade, need Claim: implies h' involves deviation from equilibrium path of (s, s) only other possibility: • s' different from s on equilibrium path • then s' punished by • violates we thus have Hence, from rhs of

For Theorem 3 to hold, p must be small relative to r • consider modified Tit-for-Tat against itself (play C if and only if both players took same action last period) • with every mistake, there is an expected loss of 2 – (½ · 3 + ½ · (−1)) = 1 in the first period and 2 – 0 = 2 in the second period • so overall the expected loss from mistakes is approximately • By contrast, a mutant strategy that signals, etc. and doesn’t punish at all against itself loses only about • so if r is small enough relative to p, mutant can invade
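
The per-mistake arithmetic for modified Tit-for-Tat can be reproduced mechanically. This is a sketch under assumptions: the stage payoffs (2, −1, 3, 0) are read off the slide's own arithmetic, each player is taken to be equally likely to be the one who errs, no discounting is applied, and the function names are hypothetical. It covers only the loss from a single mistake, not the aggregate expression.

```python
from fractions import Fraction

# Payoffs to the first-listed player, as implied by the slide's arithmetic.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}


def modified_tft_next(a1, a2):
    """Both players play C iff they took the same action last period."""
    action = "C" if a1 == a2 else "D"
    return (action, action)


def expected_losses_after_mistake(periods=4):
    """Expected per-period loss to a player, relative to the payoff 2 from
    mutual cooperation, following one mistake in period 0."""
    half = Fraction(1, 2)
    # Either the player errs (D, C) or the opponent errs (C, D).
    scenarios = [("D", "C"), ("C", "D")]
    losses = []
    for _ in range(periods):
        expected = sum(half * PAYOFF[s] for s in scenarios)
        losses.append(PAYOFF[("C", "C")] - expected)
        scenarios = [modified_tft_next(*s) for s in scenarios]
    return losses


if __name__ == "__main__":
    losses = expected_losses_after_mistake()
    print("per-period losses:", [str(x) for x in losses])
    print("total loss per mistake:", sum(losses))
```

The losses are 1 in the period of the mistake and 2 in the following period, after which play is back at (C, C), so each mistake costs 3 in undiscounted terms.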
