Competition between adaptive agents: learning and collective efficiency
Damien Challet, Oxford University
Matteo Marsili, ICTP-Trieste (Italy)
• My definition of the Minority Game
• Simple worlds (M = 0): Markovian behavior, neural networks, reinforcement learning
• Multistate worlds (M > 0): cause of large inefficiencies, remedies
• From El Farol to MG and back
challet@thphys.ox.ac.uk
'Truth is always in the minority' Kierkegaard
Zig-Zag-Zoug • Game played by Swiss children • 3 players, 3 feet, 3 magic words • “Ziiig” ... “Zaaag” .... “ZOUG!”
Minority Game
• Zig-Zag-Zoug with N players
• Aim: to be in the minority
• Outcome = #UP − #DOWN = #A − #B
• Model of competition between adaptive players
• Challet and Zhang (1997), from the El Farol bar problem (Arthur 1994)
Initial goals of the MG
• El Farol (1994): impossible to understand analytically
• Drastic simplification, keeping the key ingredients: bounded rationality, reinforcement learning
• Symmetrize the problem: 60/100 -> 50/50
• Understand the symmetric problem
• Generalize the results to the asymmetric problem
Repeated games
Why play again? Frustration: the losers are in the majority.
How to play?
• Deduction: rationality, best answer, all lose!
• Induction: limited capabilities; beliefs, strategies, personality; trial and error; learning
Minority Game
• N agents, i = 1, ..., N
• Choice a_i(t) ∈ {+1, −1}
• Outcome A(t) = Σ_i a_i(t)
• Payoff of player i: −a_i(t) A(t)
• Total losses = A²
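As a concrete illustration of these definitions, here is a minimal one-round sketch in Python; the variable names and the value of N are my own choices, not from the talk.

```python
# One round of the Minority Game: N agents pick +1/-1, the minority side wins.
import numpy as np

rng = np.random.default_rng(0)
N = 101                              # odd, so a strict minority always exists

a = rng.choice([-1, +1], size=N)     # choices a_i(t)
A = a.sum()                          # outcome A(t) = sum_i a_i(t)
payoffs = -a * A                     # payoff of player i: -a_i(t) * A(t)
total_loss = A ** 2                  # total losses: -sum_i payoffs_i = A^2
print(A, total_loss)
```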
Markovian learning
'If it ain't broken, don't fix it' (Reents et al., Physica A 2000):
• If I won, I stick to my previous choice
• If I lost, I switch to the other choice with probability p
Results (σ² = ⟨A²⟩):
• pN = x = const (small p): σ² = 1 + 2x(1 + x/6)
• p ~ N^(-1/2): σ² ~ N
• p ~ 1: σ² ~ N²
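A sketch of this win-stay, lose-shift rule follows; the function name, horizon T and seed are illustrative assumptions, only the update rule is taken from the slide.

```python
# Markovian 'if it ain't broken, don't fix it' Minority Game dynamics.
import numpy as np

def markovian_mg(N=101, p=0.01, T=10_000, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.choice([-1, +1], size=N)
    A_hist = np.empty(T)
    for t in range(T):
        A = a.sum()
        A_hist[t] = A
        losers = (a * A > 0)                  # being on the majority side means losing
        flip = losers & (rng.random(N) < p)   # losers switch with probability p, winners stick
        a[flip] = -a[flip]
    return A_hist

A_hist = markovian_mg()
sigma2 = (A_hist ** 2).mean()                 # fluctuations sigma^2 = <A^2>
print(sigma2)
```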
Markovian learning II
• Problem: if N is unknown, how to choose p?
• Try p = f(t), e.g. p = t^(-k)
• Convergence for any N
• Freezing: when to stop?
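The annealed variant can be sketched by swapping the constant p for a decaying schedule; the exponent k = 0.5 below is an assumed choice, not a value from the talk.

```python
# Annealed switching probability p(t) = t^(-k): no knowledge of N is needed,
# but the dynamics eventually freezes as p(t) -> 0.
import numpy as np

def markovian_mg_annealed(N=101, k=0.5, T=10_000, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.choice([-1, +1], size=N)
    A_hist = np.empty(T)
    for t in range(1, T + 1):
        A = a.sum()
        A_hist[t - 1] = A
        p_t = t ** (-k)                       # decreasing switching probability
        losers = (a * A > 0)
        flip = losers & (rng.random(N) < p_t)
        a[flip] = -a[flip]
    return A_hist
```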
Neural networks
• Simple perceptrons, learning rate R (Metzler et al. 1999)
• σ² = N + N(N−1) F(N, R)
• σ²_min = N (1 − 2/π) ≈ 0.363 N
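A hedged sketch of the perceptron setup: each agent is a simple perceptron over a common random input and nudges its weights toward the minority side after each round; the exact learning rule and normalization used by Metzler et al. may differ from this guess.

```python
# Perceptron agents in a Minority Game (illustrative sketch only).
import numpy as np

def perceptron_mg(N=51, M=20, R=1.0, T=10_000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(N, M))              # one weight vector per agent
    A_hist = np.empty(T)
    for t in range(T):
        x = rng.choice([-1, +1], size=M)     # common input pattern
        a = np.sign(w @ x)
        a[a == 0] = 1                        # break the measure-zero tie
        A = a.sum()
        A_hist[t] = A
        w -= (R / M) * np.sign(A) * x        # nudge every output toward the minority side
    return A_hist
```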
Reinforcement learning
• Each player has a register D_i
• D_i > 0: + is better; D_i < 0: − is better
• D_i(t+1) = D_i(t) − A(t)
• Choice: prob(+ | D_i) = f(D_i), with f'(x) > 0 (RL)
Reinforcement learning II
• Central result: agents minimize ⟨A⟩² (predictability) for all f
• Stationary state: ⟨A⟩ = 0
• Fluctuations = ?
• Example: f(x) = (1 + tanh(K x))/2, exponential learning with learning rate K
• K < K_c: σ² ~ N
• K > K_c: σ² ~ N²
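A minimal sketch of this register dynamics with the exponential learning choice f(x) = (1 + tanh(Kx))/2; sizes, K and the horizon are illustrative assumptions.

```python
# Register-based reinforcement learning in the Minority Game.
import numpy as np

def reinforcement_mg(N=101, K=0.1, T=20_000, seed=0):
    rng = np.random.default_rng(seed)
    D = np.zeros(N)                                 # registers D_i
    A_hist = np.empty(T)
    for t in range(T):
        prob_plus = 0.5 * (1.0 + np.tanh(K * D))    # prob(+ | D_i) = f(D_i)
        a = np.where(rng.random(N) < prob_plus, 1, -1)
        A = a.sum()
        A_hist[t] = A
        D = D - A                                   # D_i(t+1) = D_i(t) - A(t)
    return A_hist
```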
Reinforcement learning III
Market impact: each agent has an influence on the outcome
• Naive agents: payoff −A = −A_{−i} − a_i
• Non-naive agents: payoff −A + c a_i
• Smart agents: payoff −A_{−i} (cf. WLU, AU)
• Central result 2: non-naive agents minimize ⟨A²⟩ (fluctuations) for all f -> Nash equilibrium, σ² ~ 1
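Turning naive agents into non-naive ones only changes the register update, as in this variant of the previous sketch; the correction factor c = 1 is an assumed value.

```python
# Non-naive agents: discount one's own impact, payoff -A + c*a_i,
# i.e. register update D_i <- D_i - A + c*a_i.
import numpy as np

def reinforcement_mg_nonnaive(N=101, K=0.1, c=1.0, T=20_000, seed=0):
    rng = np.random.default_rng(seed)
    D = np.zeros(N)
    A_hist = np.empty(T)
    for t in range(T):
        prob_plus = 0.5 * (1.0 + np.tanh(K * D))
        a = np.where(rng.random(N) < prob_plus, 1, -1)
        A = a.sum()
        A_hist[t] = A
        D = D - A + c * a                     # account for one's own contribution
    return A_hist
```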
Minority Games with memory
If an agent believes that the outcome depends on past results, the outcome will depend on past results.
• Sunspot effect
• Self-fulfilling prophecies
• Fallacies of causal inference
Consequence: the other agents will change their behavior accordingly.
Minority Games with memory: naïve agents
• Fixed, randomly drawn strategies = quenched disorder
• Tools of statistical physics give the exact solution, in principle
• Agents minimize the predictability; predictability = Hamiltonian: optimization problem
• Numerics: Savit et al., PRL 99; analytics: Challet et al., PRL 99; Coolen et al., J. Phys. A 2002
(figure: σ²/N as a function of α = P/N)
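For concreteness, a sketch of the standard MG with memory: P = 2^M histories and S quenched strategies per agent. The parameters, the deterministic tie-breaking, and the convention for which bit is appended to the history are my own simplifications.

```python
# Standard Minority Game with memory M and quenched, randomly drawn strategies.
import numpy as np

def mg_with_memory(N=101, M=5, S=2, T=50_000, seed=0):
    rng = np.random.default_rng(seed)
    P = 2 ** M
    strategies = rng.choice([-1, +1], size=(N, S, P))   # fixed lookup tables (quenched disorder)
    scores = np.zeros((N, S))                           # virtual scores U_is
    mu = rng.integers(P)                                # current history index
    A_hist = np.empty(T)
    for t in range(T):
        best = scores.argmax(axis=1)                    # each agent plays its best strategy
        a = strategies[np.arange(N), best, mu]
        A = a.sum()
        A_hist[t] = A
        scores -= strategies[:, :, mu] * A              # U_is <- U_is - a_is^mu * A
        mu = ((mu << 1) | (A < 0)) % P                  # shift in one bit of the outcome's sign
    return A_hist
```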
Minority Games with memory: low efficiency
α = P/N is not the right scaling variable for the large fluctuations.
Minority Games with memory: origin of low efficiency
• Stochastic dynamical equation for the strategy score U_i: slowly varying part (I) + correlated noise (II)
• II = K P^(-1/2); I is size-independent
• When I << II: large fluctuations
• Transition at I/K = G/P^(1/2): critical signal-to-noise ratio G/P^(1/2)
Minority Games with memory: origin of low efficiency
• Determine G
• Predict the critical points G/P^(1/2)
• Check against I/K
Minority Games with memory: origin of low efficiency
(figure panels: BEFORE and AFTER)
Minority Games with memory: sophisticated agents
• Agents minimize the fluctuations
• Optimization problem again
Reverse problem
Many variations, different global utility functions:
• Grand canonical game (play or not play)
• Time window of scores (exponential moving average)
• Any payoff
Hence, given a task (a global utility function), one knows how to design agents (local utilities).
Example: optimal defect combinations (cf. Neil's talk).
From El Farol to MG and back
• El Farol: N agents, attendance threshold L between 0 and N
• MG: the same with L = N/2
• Differences, similarities?
• Which results from the MG are valid for El Farol?
From El Farol to MG and back
• Theorem: all results from the MG apply to El Farol (Σ: spread, N⟨a⟩: mean attendance)
• Everything scales like g = (L/N − ⟨a⟩) P^(1/2) / Σ
• The El Farol problem with P states of the world is solved.
From El Farol to MG and back: new results
If g = (L/N − ⟨a⟩) P^(1/2) / Σ ≠ 0, then for P > P_c = 2Σ² / [π (L/N − ⟨a⟩)²] there is no more phase transition.
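A small numerical illustration of this criterion; the values of L/N, ⟨a⟩, Σ and P are invented for the example and do not come from the talk.

```python
# Scaling variable g and critical P for the El Farol problem, per the slide's formulas.
import math

L_over_N = 0.6   # e.g. the classic 60/100 El Farol comfort level
a_mean = 0.5     # assumed mean attendance propensity <a>
Sigma = 0.5      # assumed spread
P = 32           # assumed number of states of the world

g = (L_over_N - a_mean) * math.sqrt(P) / Sigma
P_c = 2 * Sigma ** 2 / (math.pi * (L_over_N - a_mean) ** 2)
print(f"g = {g:.3f}, P_c = {P_c:.1f}, phase transition survives: {P < P_c}")
```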
Summary
• AU/WLU suppresses large fluctuations -> Nash equilibrium
• Design: agents must know that they have an impact; knowledge of the exact impact is not crucial
• Reverse problem also possible
• MG: simple, rich, fun, and useful
www.unifr.ch/econophysics/minority (102 commented references)