Mean Field Equilibria of Multi-Armed Bandit Games Ramki Gummadi (Stanford) Joint work with: Ramesh Johari (Stanford) Jia Yuan Yu (IBM Research, Dublin)
Motivation • Classical MAB models have a single agent. • What happens when other agents influence arm rewards? • Do standard learning algorithms lead to any equilibrium?
Examples • Wireless transmitters learning unknown channels with interference • Sellers learning about product categories, e.g. eBay • Positive externalities: social gaming.
Example: Wireless Transmitters [Figure: a transmitter learning between two unknown channels, Channel A (success probability 0.8) and Channel B (0.6); in the second panel the probabilities shift (0.8; 0.9 and 0.6; 0.1) as other transmitters' choices change the interference.]
Modeling the Bandit Game • Perfect Bayesian equilibrium: implausible agent behavior. • Mean field model: agents behave under an assumption of stationarity.
Outline • Model • The equilibrium concept • Existence • Dynamics • Uniqueness and convergence • From finite system to limit model • Conclusion
Mean Field Model of MAB Games • Discrete time; a finite set of arms; Bernoulli rewards. • An agent at any time has a state (its vector of success/failure counts on each arm) and a type θ. • Agents `regenerate' independently at random (geometric) times: • θ is re-sampled i.i.d. from a fixed prior distribution. • The state is reset to the zero vector.
Mean Field Model of MAB Games • Policy π: maps an agent's state to a (randomized) arm choice. E.g. UCB, the Gittins index. • Population profile f: the distribution of arm choices across the agent population. • Reward distribution: Bernoulli, with a mean that depends on the chosen arm, the agent's type, and the population profile f.
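The slides name UCB as one such policy; a minimal sketch of how a UCB1-style rule maps a state (per-arm success/failure counts) to an arm is below. The function name and the exploration constant `c` are our own illustrative choices, not from the talk.

```python
import math

def ucb_policy(successes, failures, c=2.0):
    """Map a state (per-arm success/failure counts) to an arm index via a UCB1-style rule."""
    n_arms = len(successes)
    pulls = [s + f for s, f in zip(successes, failures)]
    total = sum(pulls)
    # Try every arm once before trusting the index.
    for a in range(n_arms):
        if pulls[a] == 0:
            return a
    # UCB1 index: empirical success rate plus an exploration bonus.
    def index(a):
        mean = successes[a] / pulls[a]
        bonus = math.sqrt(c * math.log(total) / pulls[a])
        return mean + bonus
    return max(range(n_arms), key=index)
```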
A Single Agent’s Evolution • Current state: the vector of success/failure counts on each arm. • Current type: θ. • The agent picks an arm a = π(state). • Given the population profile f, the agent transitions to a new state where: • the success count of arm a increases by one with probability μ(a; θ, f), • the failure count of arm a increases by one with probability 1 − μ(a; θ, f). (A code sketch of one such step follows.)
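A sketch of one time step of a single agent's evolution under the model above. The reward-mean function `mu`, the policy, the regeneration probability, and the type sampler are passed in as hypothetical placeholders; their names are ours.

```python
import random

def step_agent(successes, failures, theta, policy, mu, profile, regen_prob, sample_type):
    """One time step of a single agent: choose an arm, draw a Bernoulli reward,
    update the success/failure counts, and possibly regenerate."""
    n_arms = len(successes)
    arm = policy(successes, failures)
    # The reward mean depends on the chosen arm, the agent's type theta,
    # and the current population profile.
    if random.random() < mu(arm, theta, profile):
        successes[arm] += 1
    else:
        failures[arm] += 1
    # Regeneration: the state resets to zero and a fresh type is drawn.
    if random.random() < regen_prob:
        successes, failures = [0] * n_arms, [0] * n_arms
        theta = sample_type()
    return successes, failures, theta
```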
Examples of Reward Functions • Negative externality: the reward mean on an arm decreases as more agents use it, e.g. wireless interference. • Positive externality: the reward mean increases with the arm's popularity, e.g. social gaming. • Non-separable rewards: the reward mean on an arm depends on the whole population profile, not just that arm's share.
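Illustrative (not from the slides) reward-mean functions of each kind, where `theta` is a per-arm vector of base means in [0, 1] and `profile[a]` is the fraction of agents on arm `a`:

```python
def mu_negative_externality(arm, theta, profile):
    # Congestion: the reward mean shrinks as the fraction of agents on the arm grows.
    return theta[arm] * (1.0 - profile[arm])

def mu_positive_externality(arm, theta, profile):
    # E.g. social gaming: the reward mean grows with the arm's popularity.
    return theta[arm] * profile[arm]

def mu_non_separable(arm, theta, profile):
    # The reward mean on one arm depends on the load across all arms.
    return theta[arm] / (1.0 + sum(p * p for p in profile))
```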
The Equilibrium Concept • What constitutes an MFE? • A joint distribution over (state, type). • A population profile f. • A policy π that maps state to arm choice. • Equilibrium conditions: • The joint distribution must be the unique invariant distribution of the single-agent dynamics under π, with the population profile f held fixed. • f must be the arm distribution that arises from this invariant distribution when agents adopt policy π.
Optimality in Equilibrium • In an MFE, the population profile f doesn't change over time, so each agent faces a stationary reward environment. • π can be any “optimal” policy for learning in an i.i.d. reward environment.
Existence of MFE Theorem: At least one MFE exists if the reward mean μ is continuous in the population profile f for every type θ. • Proved using Brouwer’s fixed point theorem.
Beyond Existence • An MFE exists, but when is it unique? • Even when it is unique, can agent dynamics actually find such an equilibrium? • How well does the mean field model approximate a system with finitely many agents?
Dynamics [Figure: the population of agents distributed over arms 1, …, n; each agent applies the policy π, and the induced transition kernel maps the population's (state, type) distribution at time t to the distribution at time t+1.]
Dynamics Theorem: Let Φ denote the map taking the population's (state, type) distribution at time t to the distribution at time t+1. Assume μ is Lipschitz in the population profile for every θ. Then Φ is a contraction map (in total variation) if the Lipschitz constant is sufficiently small. • Proof uses a coupling argument on the bandit process.
Uniqueness and Convergence • Fixed points of Φ are exactly the MFE. • For an arbitrary initial distribution, the mean field evolution iterates Φ step by step (see the sketch below). When Φ is a contraction (w.r.t. total variation): • There exists a unique MFE. • The mean field trajectory of measures converges to it.
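A Monte Carlo sketch of the mean field evolution, reusing the hypothetical helpers from the earlier sketches (`policy`, `mu`, `sample_type`, etc. are ours): the population profile is read off from a large ensemble of independent particles, and every particle is advanced against that profile. Iterating this step from any initial ensemble traces the mean field trajectory; under the contraction condition the returned profile settles at the unique MFE profile.

```python
import random

def mean_field_step(agents, policy, mu, regen_prob, sample_type, n_arms):
    """One step of a Monte Carlo approximation of the mean field map:
    read off the population profile, then advance every particle against it."""
    # Arm each particle would play, and the induced population profile.
    choices = [policy(succ, fail) for (succ, fail, theta) in agents]
    profile = [choices.count(a) / len(agents) for a in range(n_arms)]
    next_agents = []
    for (succ, fail, theta), arm in zip(agents, choices):
        succ, fail = succ[:], fail[:]
        # Bernoulli reward with mean depending on arm, type, and profile.
        if random.random() < mu(arm, theta, profile):
            succ[arm] += 1
        else:
            fail[arm] += 1
        # Regeneration: reset the state and resample the type.
        if random.random() < regen_prob:
            succ, fail, theta = [0] * n_arms, [0] * n_arms, sample_type()
        next_agents.append((succ, fail, theta))
    return next_agents, profile
```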
Finite Systems to Limit Model • With finitely many agents, rewards depend on the empirical population profile of the N agents. • This empirical profile is a random probability measure on the (state, type) space. • (In what sense) does the finite system converge to the mean field limit as N → ∞? i.e. could the trajectories diverge after a long time even for large N?
Approximation Property Theorem: As N → ∞, the empirical population profile converges to the mean field profile, uniformly over time, when Φ is a contraction. • Proof uses an artificial “auxiliary” system with rewards based on the mean field profile. • Coupling of transitions enables a bridge from the finite system to the mean field limit via this auxiliary system.
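One way to probe the approximation property numerically (our illustration, not the proof technique on the slide) is to run the same dynamics with a small population and with a very large one, treating the latter as a stand-in for the mean field limit, and track the gap between their arm profiles over time. This reuses `mean_field_step` and the other hypothetical helpers from the sketches above.

```python
import random

def run_population(n_agents, horizon, policy, mu, regen_prob, sample_type, n_arms, seed=0):
    """Simulate a population of n_agents for `horizon` steps and record its arm profile.
    Relies on mean_field_step from the earlier sketch."""
    random.seed(seed)
    agents = [([0] * n_arms, [0] * n_arms, sample_type()) for _ in range(n_agents)]
    profiles = []
    for _ in range(horizon):
        agents, profile = mean_field_step(agents, policy, mu, regen_prob, sample_type, n_arms)
        profiles.append(profile)
    return profiles

def profile_gap(small_run, large_run):
    """L1 distance between the two arm-profile trajectories at each time step."""
    return [sum(abs(a - b) for a, b in zip(p, q)) for p, q in zip(small_run, large_run)]
```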
Conclusion • Agent populations converge to a mean field equilibrium using classical bandit algorithms. • A large agent population effectively mitigates non-stationarity in MAB games. • Interesting theoretical results beyond existence: uniqueness, convergence, and approximation. • The insights are more general than the theorem conditions strictly imply.