Explore the role of learning rules in equilibrium systems and the impact on complex systems. Analyze best responses, stability, empirical support, and basins of attraction. Study different learning rules and their stability results.
Learning in Games
Scott E. Page, Russell Golman & Jenna Bednar
University of Michigan-Ann Arbor
Overview
• Analytic and Computational Models
• Nash Equilibrium: Stability, Basins of Attraction
• Cultural Learning
• Agent Models
Domains of Interest Computable Analytic
Learning/Adaptation Diversity Interactions/Epistasis Networks/Geography
Equilibrium Science We can start by looking at the role that learning rules play in equilibrium systems. This will give us some insight into whether they’ll matter in complex systems.
Actions Cooperate: C Defect: D
Payoffs [2×2 payoff matrix over actions C and D]
Best Responses [the same matrix with each player's best responses marked]
Nash Equilibrium [the matrix with the equilibrium cell highlighted; the visible payoff entry is 2,2]
“Equilibrium” Based Science
Step 1: Set up game
Step 2: Solve for equilibrium
Step 3: Show how equilibrium depends on parameters of model
Step 4: Provide empirical support
Is Equilibrium Enough?
Existence: Equilibrium exists
Stability: Equilibrium is stable
Attainability: Equilibrium is attained by a learning rule.
Examples
• Best respond to current state
• Better respond
• Mimic best
• Mimic better
• Include portions of best or better
• Random with death of the unfit
Stability Stability can only be defined relative to a learning dynamic. In dynamical systems, we often take that dynamic to be a best response function, but with human actors we need not assume people best respond.
Battle of Sexes Game [2×2 payoff matrix over actions EF and CG]
Three Equilibria [two pure equilibria plus the mixed equilibrium (1/4, 3/4) for one player and (3/4, 1/4) for the other]
Unstable Mixed? Suppose Player 2 trembles from the mixed equilibrium to (1/4 + ε, 3/4 − ε). Player 1's payoff to EF becomes 3/4 + 3ε and to CG becomes 3/4 − 3ε, so EF is now Player 1's strict best response.
Note the Implicit Assumption Our stability analysis assumed that Player 1 would best respond to Player 2's tremble. However, the learning rule could be to go to the mixed strategy equilibrium. If so, Player 1 would sit tight and Player 2 would return to the mixed strategy equilibrium.
Empirical Foundations We need to have some understanding of how people learn and adapt to say anything about stability.
Classes of Learning Rules
Belief-Based Learning Rules: People best respond given their beliefs about how other people play.
Replicator Learning Rules: People replicate successful actions of others.
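To make the two classes concrete, here is a minimal Python sketch (not from the slides): a belief-based rule that best responds to the opponent's empirical frequencies, and a replicator-style rule that imitates a player chosen in proportion to realized payoff. The 2x2 payoff matrix and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 payoff matrix (rows: own action, cols: opponent's action).
A = np.array([[2.0, 0.0],
              [3.0, 1.0]])

def belief_based_action(opponent_counts):
    """Belief-based rule: best respond to the opponent's empirical mix."""
    beliefs = opponent_counts / opponent_counts.sum()
    return int(np.argmax(A @ beliefs))

def replicator_style_action(actions, payoffs):
    """Replicator-style rule: imitate a player drawn with probability
    proportional to that player's realized payoff."""
    probs = payoffs / payoffs.sum()
    return int(actions[rng.choice(len(actions), p=probs)])

print(belief_based_action(np.array([7.0, 3.0])))                      # best response to 70/30 beliefs
print(replicator_style_action(np.array([0, 1, 1]), np.array([2.0, 1.0, 3.0])))
```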
Stability Results An extensive literature provides conditions (fairly weak) under which the two classes of learning rules have identical stability properties. Synopsis: Learning rules do not matter
Basins Question Do games exist in which best response dynamics and replicator dynamics produce different basins of attraction? Question: Does learning matter?
Best Response Dynamics
x = mixed strategy of Player 1
y = mixed strategy of Player 2
dx/dt = BR(y) − x
dy/dt = BR(x) − y
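A minimal sketch of these equations, assuming simple Euler integration and a placeholder 2x2 game; the payoff numbers, step size, and starting points are not from the slides.

```python
import numpy as np

def best_response(payoff, opp_mix):
    """Pure best response to the opponent's mixed strategy, as a probability vector."""
    expected = payoff @ opp_mix
    br = np.zeros_like(expected)
    br[np.argmax(expected)] = 1.0
    return br

def br_dynamics(A, B, x, y, dt=0.01, steps=5000):
    """Euler integration of dx/dt = BR(y) - x, dy/dt = BR(x) - y.
    A: Player 1's payoffs (rows = own actions, cols = opponent actions).
    B: Player 2's payoffs (rows = own actions, cols = opponent actions)."""
    for _ in range(steps):
        x = x + dt * (best_response(A, y) - x)
        y = y + dt * (best_response(B, x) - y)
    return x, y

# Illustrative 2x2 game; the payoff numbers are placeholders, not the slides'.
A = np.array([[2.0, 0.0], [3.0, 1.0]])
B = np.array([[2.0, 0.0], [3.0, 1.0]])
print(br_dynamics(A, B, np.array([0.6, 0.4]), np.array([0.5, 0.5])))
```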
Replicator Dynamics
dx_i/dt = x_i(π_i − π_ave)
dy_i/dt = y_i(π_i − π_ave)
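A corresponding sketch of single-population replicator dynamics for a symmetric game (the two-population version on the slide follows the same pattern); the example payoffs and step size are placeholders.

```python
import numpy as np

def replicator_dynamics(A, x, dt=0.01, steps=5000):
    """Single-population replicator dynamics for a symmetric game:
    dx_i/dt = x_i * (pi_i - pi_ave), with pi = A @ x and pi_ave = x . pi."""
    for _ in range(steps):
        pi = A @ x
        pi_ave = x @ pi
        x = x + dt * x * (pi - pi_ave)
        x = np.clip(x, 0.0, None)
        x = x / x.sum()          # keep x on the simplex despite numerical drift
    return x

# Illustrative symmetric 2x2 game; the numbers are placeholders.
A = np.array([[2.0, 0.0], [3.0, 1.0]])
print(replicator_dynamics(A, np.array([0.5, 0.5])))
```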
Symmetric Matrix Game (payoffs to the row player)
      A    B    C
A    60   60   30
B    30   70   20
C    50   25   25
Best Response Basins
A > B iff 60p_A + 60p_B + 30p_C > 30p_A + 70p_B + 20p_C
A > C iff 60p_A + 60p_B + 30p_C > 50p_A + 25p_B + 25p_C
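A small check of these regions, assuming the row player's payoff matrix implied by the inequalities above; the sample points are arbitrary.

```python
import numpy as np

# Payoff matrix to the row player implied by the inequalities above
# (rows and columns ordered A, B, C).
M = np.array([[60.0, 60.0, 30.0],
              [30.0, 70.0, 20.0],
              [50.0, 25.0, 25.0]])

def best_action(p):
    """Index of the best response to the population mix p = (pA, pB, pC)."""
    return int(np.argmax(M @ p))

# Sample a few points of the simplex and report the best response at each.
for p in [(1/3, 1/3, 1/3), (0.0, 0.3, 0.7), (0.5, 0.5, 0.0), (0.1, 0.8, 0.1)]:
    print(p, "->", "ABC"[best_action(np.array(p))])
```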
Barycentric Coordinates [simplex with corners A, B, C; sample points: .5A + .5B, .7C + .3B, .4A + .2B + .4C]
Best Responses [simplex diagram showing the best response regions for A, B, and C]
Stable Equilibria [simplex diagram marking the stable equilibria]
Best Response Basins [simplex diagram of the basins of attraction under best response dynamics]
Replicator Dynamics Basins [simplex diagram with the basin boundary under replicator dynamics marked "?"]
Replicator Dynamics [simplex diagram of the basins of attraction under replicator dynamics]
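The basin comparison sketched in the diagrams above can be approximated numerically. The sketch below integrates both dynamics from a coarse grid of starting points in the simplex and counts disagreements; the grid resolution, step size, and vertex-based labeling of the limit point are simplifying assumptions.

```python
import numpy as np

M = np.array([[60.0, 60.0, 30.0],     # payoff matrix from the example above
              [30.0, 70.0, 20.0],
              [50.0, 25.0, 25.0]])

def br_flow(x):
    br = np.zeros(3)
    br[np.argmax(M @ x)] = 1.0
    return br - x                      # continuous-time best response dynamics

def rep_flow(x):
    pi = M @ x
    return x * (pi - x @ pi)           # replicator dynamics

def run(flow, x, dt=0.01, steps=5000):
    for _ in range(steps):
        x = np.clip(x + dt * flow(x), 0.0, None)
        x = x / x.sum()
    return x

# Integrate both dynamics from a coarse grid of interior starting points
# and count how often they settle near different pure equilibria.
differ, total = 0, 0
for i in range(1, 10):
    for j in range(1, 10 - i):
        x0 = np.array([i, j, 10 - i - j], dtype=float) / 10.0
        differ += np.argmax(run(br_flow, x0)) != np.argmax(run(rep_flow, x0))
        total += 1
print(f"{differ}/{total} grid points end at different equilibria")
```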
Recall: Basins Question Do games exist in which best response dynamics and replicator dynamics produce very different basins of attraction? Question: Does learning matter?
Conjecture
For any ε > 0, there exists a symmetric matrix game such that the basins of attraction for distinct equilibria under continuous-time best response dynamics and replicator dynamics overlap by less than ε.
Collective Action Game [4×4 payoff matrix over actions SI, Coop, Pred, Naive]
Naïve Goes Away [simplex over Pred, Coop, SI once Naive is eliminated]
Basins on Face [basins of attraction on the Pred-Coop-SI face]
Lots of Predatory [the same face with a large Pred region]
Best Response [basins on the face under best response dynamics]
Replicator [basins on the face under replicator dynamics]
The Math
dx_i/dt = x_i(π_i − π_ave)
π_ave = 2x_S + x_C(1 + Nx_C)
dx_C/dt = x_C[(1 + Nx_C) − 2x_S − x_C(1 + Nx_C)]
dx_C/dt = x_C[(1 + Nx_C)(1 − x_C) − 2x_S]
Choose N > 1/ε and assume x_C > ε. Then Nx_C > 1, so 1 + Nx_C > 2, and x_S ≤ 1 − x_C.
dx_C/dt = x_C[(1 + Nx_C)(1 − x_C) − 2x_S]
dx_C/dt > x_C[2(1 − x_C) − 2(1 − x_C)] = 0
Therefore, x_C always increases.
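A quick numerical check of this bound: with N chosen slightly above 1/ε, the rate dx_C/dt stays nonnegative on a grid of feasible (x_C, x_S) pairs with x_C > ε, vanishing only at x_C = 1. The value of ε and the grid are illustrative.

```python
import numpy as np

eps = 0.1
N = 1.0 / eps + 1.0                    # any N > 1/eps works

worst = np.inf
for xC in np.linspace(eps + 1e-6, 1.0, 200):
    for xS in np.linspace(0.0, 1.0 - xC, 200):   # feasible shares: xS <= 1 - xC
        rate = xC * ((1.0 + N * xC) * (1.0 - xC) - 2.0 * xS)
        worst = min(worst, rate)
print("minimum dxC/dt on the grid:", worst)       # 0 only at xC = 1; positive elsewhere
```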
Collective Action Game [4×4 payoff matrix over SI, Coop, Pred, Naive, shown again]
Theorem: In any symmetric matrix game for which best response and replicator dynamics attain different equilibria with probability one, there exists an action A that is both an initial best response with probability one and not an equilibrium.
Complex Systems Primitives
• Learning/Adaptation
• Diversity
• Interactions/Epistasis
• Networks/Geography
Diversity EWA (Wilcox) and Quantal Response (Golman) learning models are misspecified for heterogeneous agents. EWA (Experience-Weighted Attraction) is a hybrid of reinforcement and belief-based learning.
Convex Combinations Suppose the population learns using the following rule: a·(Best Response) + (1 − a)·(Replicator)
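A minimal sketch of this convex-combination dynamic, reusing the best response and replicator flows from earlier; the 3x3 matrix, the weight a = 0.5, and the starting point are illustrative assumptions.

```python
import numpy as np

def hybrid_flow(M, x, a):
    """a * (best response dynamics) + (1 - a) * (replicator dynamics)."""
    pi = M @ x
    br = np.zeros_like(x)
    br[np.argmax(pi)] = 1.0
    return a * (br - x) + (1.0 - a) * x * (pi - x @ pi)

# Illustrative run on the 3x3 game used earlier, with equal weight a = 0.5.
M = np.array([[60.0, 60.0, 30.0],
              [30.0, 70.0, 20.0],
              [50.0, 25.0, 25.0]])
x = np.array([1/3, 1/3, 1/3])
for _ in range(5000):
    x = np.clip(x + 0.01 * hybrid_flow(M, x, a=0.5), 0.0, None)
    x = x / x.sum()
print("long-run mix:", np.round(x, 3))
```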