Pre-Bayesian Games Moshe Tennenholtz Technion—Israel Institute of Technology
Acknowledgements • Based on joint work with Itai Ashlagi, Ronen Brafman and Dov Monderer.
GT with CS flavor • Program equilibrium / strong mediated equilibrium • Ranking systems • Non-cooperative computing • Pre-Bayesian games • Distributed games • Recommender systems for GT • …
Modeling Uncertainty • In game theory and economics the Bayesian approach is mainly used. • Work in computer science frequently uses non-probabilistic models. • Work on Pre-Bayesian games incorporates game-theoretic reasoning into non-Bayesian decision-making settings.
Pre-Bayesian Games • Modeling and solution concepts in Pre-Bayesian games. • Applications: congestion games with incomplete information. • Pre-Bayesian repeated/stochastic games as a framework for multi-agent learning.
Solution Concepts in Pre-Bayesian Games
Safety-Level Equilibrium For every type, play a strategy that maximizes the worst-case payoff given the other players’ strategies. The worst case is taken over the set of possible states!
Safety-Level Equilibrium (example slide: a two-player payoff-matrix game in which the players mix with probabilities w/1-w, z/1-z, p/1-p, q/1-q)
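To make the concept concrete, here is a minimal Python sketch of a safety-level best response: the player evaluates each of its mixed strategies by its worst-case expected payoff over the possible states, holding the opponent's strategy fixed. The two payoff matrices, the opponent's mix and the grid search are invented for illustration, not taken from the talk.

```python
import numpy as np

# Illustrative sketch of a safety-level best response in a pre-Bayesian game.
# Player 1 knows its own type but not the state (here: which payoff matrix
# applies); it evaluates each mixed strategy by its worst-case expected payoff
# over the possible states, given the opponent's mixed strategy.

# Two possible states, each with a 2x2 payoff matrix for player 1
# (rows = own actions, columns = opponent actions).  Numbers are made up.
payoffs = {
    "state_a": np.array([[3.0, 0.0], [1.0, 2.0]]),
    "state_b": np.array([[0.0, 3.0], [2.0, 1.0]]),
}
opponent_mix = np.array([0.5, 0.5])   # opponent plays each column with prob 1/2

def worst_case_value(own_mix):
    """Worst-case expected payoff over all states for a given mixed strategy."""
    return min(own_mix @ M @ opponent_mix for M in payoffs.values())

# Grid search over mixed strategies (w, 1-w) -- enough for a 2-action sketch.
grid = np.linspace(0.0, 1.0, 101)
best_w = max(grid, key=lambda w: worst_case_value(np.array([w, 1.0 - w])))
print(f"safety-level strategy: ({best_w:.2f}, {1 - best_w:.2f}), "
      f"guaranteed payoff {worst_case_value(np.array([best_w, 1 - best_w])):.3f}")
```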
Other Non-Bayesian Solution Concepts • Minimax-Regret equilibrium (Hyafil and Boutilier 2004) • Competitive-Ratio equilibrium
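For comparison, a companion sketch of a minimax-regret best response in the same invented two-state example: the player minimizes, over the possible states, the gap between the best payoff it could have obtained in that state and the payoff its strategy actually obtains there. All numbers are again illustrative assumptions.

```python
import numpy as np

# Minimax-regret best response for the same two-state example as above.
payoffs = {
    "state_a": np.array([[3.0, 0.0], [1.0, 2.0]]),
    "state_b": np.array([[0.0, 3.0], [2.0, 1.0]]),
}
opponent_mix = np.array([0.5, 0.5])
grid = np.linspace(0.0, 1.0, 101)

def value(own_mix, M):
    return own_mix @ M @ opponent_mix

def max_regret(own_mix):
    """Largest regret over states: best attainable value in the state minus
    the value this strategy actually obtains there."""
    return max(max(value(np.array([w, 1 - w]), M) for w in grid) - value(own_mix, M)
               for M in payoffs.values())

best_w = min(grid, key=lambda w: max_regret(np.array([w, 1.0 - w])))
print(f"minimax-regret strategy: ({best_w:.2f}, {1 - best_w:.2f})")
```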
Existence in Mixed Strategies Theorem: Safety-level, minimax-regret and competitive-ratio equilibria exist in every concave pre-Bayesian game. A concave pre-Bayesian game: for every player and every type, the set of possible actions is compact and convex, and ui is a concave function of player i’s own action. The proof follows by applying Kakutani’s fixed point theorem.
Related Work On Non-Bayesian Solutions Safety-level equilibria • Aghassi and Bertsimas (2004) • Levin and Ozdenoren (2004) Pure safety-level equilibria • Shoham and Tennenholtz (1992), Moses and Tennenholtz (1992), Tennenholtz (1991) Axiomatic foundations • Brafman and Tennenholtz (1996)
Beyond Existence The main goal – analysis!!
Modeling Congestion Settings Examples: • Transportation engineering (Wardrop 1952, Beckmann et al. 1956) • Congestion games (Rosenthal 1973) • Potential games (Monderer and Shapley 1996) • Price of anarchy (Koutsoupias and Papadimitriou 1999, Roughgarden and Tardos 2001) • Resource selection games with player-specific cost functions (Milchtaich 1996) • Local effect games (Leyton-Brown and Tennenholtz 2003) …
Where are we heading to? Our Goal: Incorporate incomplete information into congestion settings. Types of uncertainty: • number of players • job sizes • network structure • players’ cost functions • …
Symmetric Equilibrium Theorem: Every resource selection game with increasing resource cost functions has a unique symmetric (mixed-action) equilibrium.
Uniqueness of Symmetric Safety-Level Equilibrium Consider a resource selection system with increasing resource cost functions, its game with complete information (the number of players is commonly known) and its game with incomplete information (the number of players is unknown). Theorem: • The game with incomplete information has a unique symmetric safety-level equilibrium. • The symmetric safety-level equilibrium profile coincides with the unique symmetric equilibrium of the complete-information game in which all n potential players participate.
Is Ignorance Bad? K – the real state (the set of active players), |K| = k, k < n. Compare the cost of every player when the number of players is commonly known with its cost when the number of players is unknown. Which is smaller?
Is Ignorance Bad? Linear resource cost functions: wj(k) = wj(1) + (k-1)dj. Main Theorem: Let a linear resource selection system with increasing resource cost functions be given. There exists an integer k0 such that for all k ≥ k0: 1. every player’s expected cost when the number of players is unknown is no greater than its expected cost when the number of players is commonly known; 2. consequently, the expected social cost is no greater as well. All inequalities above are strict under a non-degeneracy condition on the cost functions.
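The following rough numeric sketch illustrates the comparison behind the theorem for a made-up two-resource linear system. Following the uniqueness slide above, the "unknown number of players" case is modeled by everyone playing the symmetric equilibrium computed for the maximal number n of potential players, while the "known" case uses the symmetric equilibrium for the actual number k; the cost functions and the values of n and k are arbitrary choices, not parameters from the talk.

```python
import numpy as np
from scipy.stats import binom
from scipy.optimize import brentq

# Rough illustration of "known" vs "unknown" number of players in a linear
# resource selection game with two resources.  All parameters are invented.
# Cost of a resource with m users: w_j(m) = w_j(1) + (m-1)*d_j.
w = [lambda m: 4.0 + 0.5 * m,   # resource 1: w_1(1) = 4.5, slope d_1 = 0.5
     lambda m: 1.0 + 2.0 * m]   # resource 2: w_2(1) = 3.0, slope d_2 = 2.0

def expected_cost(j, p, players):
    """Expected cost of picking resource j when each of the other (players-1)
    agents independently picks resource 1 with probability p."""
    q = p if j == 0 else 1.0 - p
    others = np.arange(players)          # possible numbers of other users of j
    return float(np.sum(binom.pmf(others, players - 1, q)
                        * np.array([w[j](m + 1) for m in others])))

def symmetric_eq(players):
    """Probability of picking resource 1 at the unique symmetric equilibrium."""
    return brentq(lambda p: expected_cost(0, p, players)
                            - expected_cost(1, p, players), 1e-9, 1 - 1e-9)

n, k = 40, 25                            # n potential players, k actually active
p_known = symmetric_eq(k)                # equilibrium when k is commonly known
p_ignorant = symmetric_eq(n)             # safety-level play: as if all n are present
cost_known = expected_cost(0, p_known, k)
cost_ignorant = (p_ignorant * expected_cost(0, p_ignorant, k)
                 + (1 - p_ignorant) * expected_cost(1, p_ignorant, k))
print(f"known number of players:   expected cost {cost_known:.3f}")
print(f"unknown number of players: expected cost {cost_ignorant:.3f}")
```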
Where is this Useful? Example: mechanism design. The organizer knows the exact number of active players and wishes to maximize social surplus, so it will not reveal this information.
More Detailed Analysis Theorem: Let a linear resource selection system with increasing resource cost functions be given. There exists an integer L such that for every k > L, the minimal social cost attainable with symmetric mixed-action profiles (with k active players) is attained at the symmetric equilibrium profile of the game with 2k-1 players. Consequently, the social cost under common ignorance is minimized at n = 2k-1.
Further Research • Routing games with unknown number of players: extension to general networks; a unique symmetric equilibrium exists in a model where an agent’s job can be split; ignorance helps as long as n < k². • Routing games with unknown job sizes: extension to variable job sizes; uncertainty about job sizes does not change the surplus in several general settings; ignorance helps when there is uncertainty about both the number of participants and the job sizes. • Minimax-regret equilibria in the different congestion settings. • Non-Bayesian equilibria in social choice settings. • …
Conclusions so far • Non-Bayesian Equilibria exist in pre-Bayesian Games. • Players are better off with common lack of knowledge about the number of participants. • More generally, we show illuminating results using non-Bayesian solution concepts in pre-Bayesian games.
Non-Bayesian solutions for repeated (and stochastic) games with incomplete information: efficient learning equilibrium.
Learning in multi-agent systems • Multi-Agent Learning lies in the intersection of Machine Learning/Artificial Intelligence and Game Theory • Basic settings: A repeated game where the game (payoff functions) is initially unknown, but may be learned based on observed history. A stochastic game where both the stage games and the transition probabilities are initially unknown. • What can be observed following an action is part of the problem specification. • No Bayesian assumptions!
The objective of learning in multi-agent systems • Descriptive objective: how do people behave/adapt their behavior in (e.g. repeated) games? • Normative objective: can we provide agents with advice about how they should behave that “rational” agents would follow and that also leads to a good social outcome?
Learning in games: an existing perspective • Most work on learning in games (in machine learning/AI, extending upon work in game theory) deals with the search for learning algorithms that, if adopted by all agents, will lead to equilibrium. (Another approach, regret minimization, will be discussed and compared later.)
Re-Considering Learning in Games • But why should the agents adopt these learning algorithms? This seems to contradict the whole idea of self-motivated agents (which led to considering equilibrium concepts in the first place).
Re-Considering Learning in Games • (New) Normative answer: The learning algorithms themselves should be in equilibrium! • We call this form of equilibrium: Learning Equilibrium, and in particular we consider Efficient Learning Equilibrium (ELE). • Remark: In this talk we refer to optimal ELE (extending upon the basic ELE we introduced) but use the term ELE.
Efficient Learning Equilibrium:“Informal Definition” • The learning algorithms themselves are in equilibrium. It is irrational for an agent to deviate from its algorithm assuming that the others stick to their algorithms, regardless of the nature of the (actual) game that is being played. • If the agents follow the provided learning algorithms then they will obtain a value that is close to the value obtained in an optimal (or Pareto-optimal) Nash equilibrium (of the actual game) after polynomially many iterations. • It is irrational to deviate from the learning algorithm. Moreover, the irrationality of deviation is manifested within a polynomial number of iterations.
Efficient Learning Equilibrium is a form of ex-post equilibrium in Pre-Bayesian repeated games
Basic Definitions Game: G = <N={1,…,n}, {S1,…,Sn}, {U1,…,Un}> Ui : S1 × … × Sn → R – utility function of player i. Δ(Si) – mixed strategies of player i. A tuple of (mixed) strategies t=(t1,…,tn) is a Nash equilibrium if for every i ∈ N, Ui(t) ≥ Ui(t1,…,ti-1,t',ti+1,…,tn) for every t' ∈ Si. Optimal Nash equilibrium – maximizes the social surplus (sum of the agents’ payoffs). val(t,i,g) – the minimal expected payoff that player i may obtain when employing t in the game g. A strategy t' ∈ Δ(Si) for which val(·,i,g) is maximized is a safety-level strategy (or probabilistic maximin strategy), and its value is the safety-level value.
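A safety-level (probabilistic maximin) strategy and its value val(·,i,g) can be computed by a standard linear program; the sketch below does this for a two-player example with an arbitrary payoff matrix, using SciPy's linprog. The matrix is an illustrative assumption, not an example from the talk.

```python
import numpy as np
from scipy.optimize import linprog

# Compute a safety-level (probabilistic maximin) strategy and its value for
# player i in a two-player one-shot game via a standard linear program.
U = np.array([[2.0, -1.0],
              [0.0,  1.0]])   # U[a, b] = payoff of player i for own action a, opponent action b

n_own, n_opp = U.shape
# Variables: (x_1, ..., x_n_own, v); maximize v, i.e. minimize -v.
c = np.concatenate([np.zeros(n_own), [-1.0]])
# Constraints: for every opponent action b,  v - sum_a x_a * U[a, b] <= 0.
A_ub = np.hstack([-U.T, np.ones((n_opp, 1))])
b_ub = np.zeros(n_opp)
A_eq = np.concatenate([np.ones(n_own), [0.0]]).reshape(1, -1)   # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0.0, 1.0)] * n_own + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:n_own], res.x[-1]
print(f"safety-level strategy {np.round(x, 3)}, safety-level value {v:.3f}")
```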
Basic Definitions • R(G) – the repeated game with respect to a (one-shot) game G. • History of player i after t iterations of R(G): Perfect monitoring – Hti = ((a1j, …, anj), (p1j, …, pnj)), j = 1,…,t — a player can observe all previously chosen actions and payoffs. Imperfect monitoring – Hti = ((a1j, …, anj), pij), j = 1,…,t — a player can observe the previously chosen actions (of all players) and the payoffs of i. Strictly imperfect monitoring – Hti = (aij, pij), j = 1,…,t — a player can observe only its own actions and payoffs. • Possible histories of agent i: Hi = ∪t≥1 Hti. • Policy of agent i: πi : Hi → Δ(Si). Remark: in the game-theory literature the term perfect monitoring is used to refer to the concept called imperfect monitoring above.
Basic Definitions Let G be a (one-shot) game, let M=R(G) be the corresponding repeated game, and let n(G) be an optimal Nash equilibrium of G. Denote the expected payoff of agent i in that equilibrium by NVi(n(G)). Given M=R(G) and a natural number T, we denote the expected T-step undiscounted average reward of player i when the players follow the policy profile (π1,…,πi,…,πn) by Ui(M,π1,…,πi,…,πn,T). Ui(M,π1,…,πi,…,πn) = liminfT→∞ Ui(M,π1,…,πi,…,πn,T).
Definition: (Optimal) ELE (in a 2-person repeated game) A pair of policies (π, σ) is an efficient learning equilibrium with respect to a class of games (where each one-shot game has k actions) if for every ε > 0 and 0 < δ < 1 there exists some T > 0, polynomial in 1/ε, 1/δ, and k, such that with probability of at least 1-δ: (1) If player 1 (resp. 2) deviates from π to π' (resp. from σ to σ') in iteration l, then U1(M,(π',σ),l+t) ≤ U1(M,(π,σ),l+t) + ε (resp. U2(M,(π,σ'),l+t) ≤ U2(M,(π,σ),l+t) + ε) for every t ≥ T and for every repeated game M=R(G) in the class. (2) For every t ≥ T and for every repeated game M=R(G) in the class, U1(M,(π,σ),t) + U2(M,(π,σ),t) ≥ NV1(n(G)) + NV2(n(G)) - ε for an optimal (surplus-maximizing) Nash equilibrium n(G).
The Existence of ELE Theorem: Let M be a class of repeated games. Then, there exists an ELE w.r.t. M given perfect monitoring. The proof is constructive and uses ideas of our R-max algorithm (the first near-optimal polynomial-time algorithm for reinforcement learning in stochastic games) together with the folk theorem from economics.
The ELE algorithm • For ease of presentation assume that the payoff functions are non-negative and bounded by Rmax. • Player 1 performs each action ai k times in a row, for all i=1,2,...,k. • In parallel, player 2 performs the sequence of actions (a1,…,ak) k times. • If both players behaved according to the above, then an optimal Nash equilibrium of the corresponding (revealed) game is computed, and the players behave according to the corresponding strategies from that point on. If several such Nash equilibria exist, one is selected based on a pre-determined arrangement. • If one of the players deviated from the above, we call this player the adversary and the other player the agent, and do the following: Let G be the Rmax-sum game in which the adversary's payoff is identical to its payoff in the original game and the agent's payoff is Rmax minus the adversary's payoff, and let M denote the corresponding repeated game. Thus, G is a constant-sum game in which the agent's goal is to minimize the adversary's payoff. Notice that some of these payoffs will be unknown (because the adversary did not cooperate in the exploration phase). The agent now plays according to the following:
The ELE algorithm (cont.) • Initialize: Construct the following model M' of the repeated game M, where the game G is replaced by a game G' in which all the entries of the game matrix are assigned the rewards (Rmax, 0) (we assume w.l.o.g. positive payoffs, and also assume the maximal possible reward Rmax is known). We associate a boolean-valued variable with each joint action: {assumed, known}. This variable is initialized to the value assumed. • Repeat: Compute and Act: Compute the optimal probabilistic maximin of G' and execute it. Observe and update: Following each joint action, let a be the action the agent performed and let a' be the adversary's action. If (a,a') is performed for the first time, update the reward associated with (a,a') in G' as observed, and mark it known.
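A rough, runnable Python sketch of the two phases just described, for a 2-player game with k actions per player: the cooperative exploration schedule that reveals the game, and the construction of the punishment model G' (with unobserved entries assigned (Rmax, 0)) together with a probabilistic-maximin punishment strategy. The payoff matrices and the point of deviation are invented for illustration, and the optimal-Nash step on the revealed game is only marked, not implemented.

```python
import numpy as np

# Sketch of the two phases of the ELE construction for a 2-player game with
# k actions per player.  The true payoff matrices are invented examples.
R_MAX = 10.0
k = 2
U1 = np.array([[6.0, 1.0], [8.0, 2.0]])   # true payoffs of player 1 (unknown a priori)
U2 = np.array([[6.0, 8.0], [1.0, 2.0]])   # true payoffs of player 2 (unknown a priori)

# --- Exploration phase (both players cooperate) ------------------------------
# Player 1 repeats each of its actions k times in a row while player 2 cycles
# through its actions, so after k*k rounds every joint action has been observed.
revealed1, revealed2 = np.full((k, k), np.nan), np.full((k, k), np.nan)
for t in range(k * k):
    a1, a2 = t // k, t % k
    revealed1[a1, a2], revealed2[a1, a2] = U1[a1, a2], U2[a1, a2]
# ...here an optimal (surplus-maximizing) Nash equilibrium of the revealed game
# would be computed and played from this point on.

# --- Punishment phase: suppose player 2 deviated after two exploration rounds -
known = np.zeros((k, k), dtype=bool)
known[0, 0] = known[0, 1] = True          # only these joint actions were observed
adv_model = np.where(known, U2, 0.0)      # unknown entries assumed to be (R_MAX, 0)
agent_model = R_MAX - adv_model           # agent's payoff in the constant-sum game G'

def worst_case(own_mix):
    """Agent's (player 1's) worst-case payoff in G' over the adversary's actions."""
    return float((own_mix @ agent_model).min())

grid = np.linspace(0.0, 1.0, 101)         # probabilistic maximin by grid search
p = max(grid, key=lambda x: worst_case(np.array([x, 1.0 - x])))
print(f"punishment strategy for the agent: play action 0 with probability {p:.2f}")
```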
Imperfect Monitoring Theorem: There exist classes of games for which an ELE does not exist given imperfect monitoring. The proof is based on showing that you cannot obtain the values of the Nash equilibria of a pair of example games (shown on the slide) when you do not know initially which game you are playing and cannot observe the other agent's payoffs.
The Existence of ELE for Imperfect Monitoring Settings Theorem: Let M be a class of repeated symmetric games. Then, there exists an ELE w.r.t. M given imperfect monitoring.