Equilibrium Selection in Stochastic Games

Equilibrium Selection in Stochastic Games • By Marcin Kadluczka, Dec 2nd 2002 • CS 594 – Piotr Gmytrasiewicz

Agenda • Definition of finite discounted stochastic games • Stationary equilibrium • Linear tracing procedure • Stochastic tracing procedures • Examples of different equilibria depending on the type of stochastic tracing

Finite discounted stochastic games • N is the finite set of players, N = {1, 2, …, n} • The state space has a finite number of states

Rules of the game • [Diagram: players 1 to n at time t and at time t+1; the current state yields rewards, and a transition with the given probability leads to the next state]

Other assumptions • Perfect recall: at each stage, each player remembers all past actions chosen by all players and all past states that occurred • Difference from normal-form games: the game does not consist of a single play, but jumps to the next state according to the transition probability measure and continues dynamically • Rewards take future states into account, not only immediate payoffs

Pure & mixed strategies • Pure strategy • Mixed strategy: if a mixed strategy is played, the instantaneous payoff of player i and the transition probability are both taken in expectation over the players' mixed actions
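As a rough illustration of the expected payoff and expected transition on this slide, the sketch below averages a payoff table and a transition table over independently mixed actions in a single state. The two-player, two-action tables, the names `joint_probs`, `expected_payoff` and `expected_transition`, and all numbers are hypothetical and are not taken from the slides.

```python
from fractions import Fraction as F
from itertools import product

def joint_probs(mixed):
    """Yield (action_profile, probability) for independently mixing players."""
    for profile in product(*(range(len(m)) for m in mixed)):
        p = F(1)
        for player, action in enumerate(profile):
            p *= mixed[player][action]
        yield profile, p

def expected_payoff(payoff, mixed):
    """Instantaneous expected payoff of one player in one state."""
    return sum(p * payoff[profile] for profile, p in joint_probs(mixed))

def expected_transition(transition, mixed, n_states):
    """Expected probability of moving to each next state."""
    out = [F(0)] * n_states
    for profile, p in joint_probs(mixed):
        for nxt in range(n_states):
            out[nxt] += p * transition[profile][nxt]
    return out

# Hypothetical data for one state: payoffs of player 1 and transitions to 2 states.
payoff_1 = {(0, 0): F(2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1)}
transition = {
    (0, 0): [F(1), F(0)],
    (0, 1): [F(1, 2), F(1, 2)],
    (1, 0): [F(1, 2), F(1, 2)],
    (1, 1): [F(0), F(1)],
}
mixed = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]  # one mixed action per player

u1 = expected_payoff(payoff_1, mixed)
pi = expected_transition(transition, mixed, 2)
print(u1, pi)
```

Exact fractions are used so that the expected transition probabilities can be checked to sum to one without rounding noise.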

Stationary strategy payoffs • History: the set of possible histories up to stage k consists of all sequences of past states and actions • Behavior strategy • Stationary strategy • Payoffs

Equilibrium • General equilibrium: a strategy tuple is an equilibrium if and only if each player i's strategy is a best response to the strategies of the other players (−i) • Stationary equilibrium (Nash equilibrium) • Payoff for stationary equilibrium
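A minimal sketch of how the discounted payoff of a fixed stationary strategy tuple could be computed, assuming the payoff is the unnormalized discounted sum of expected stage rewards (the slides may use a different normalization). Once the strategies are fixed, each state has one expected reward and one transition row, and the value solves v = u + δ·P·v, which the code approximates by repeated substitution. The function name `stationary_value` and the two-state data are hypothetical.

```python
def stationary_value(u, P, delta, iters=2000):
    """Approximate v = u + delta * P v by fixed-point iteration.

    u[s]    : expected stage reward in state s under the fixed strategies
    P[s][n] : probability of moving from state s to state n
    delta   : discount factor, 0 <= delta < 1
    """
    n = len(u)
    v = [0.0] * n
    for _ in range(iters):
        v = [u[s] + delta * sum(P[s][k] * v[k] for k in range(n)) for s in range(n)]
    return v

# Hypothetical two-state example: state 1 is absorbing with zero reward.
u = [1.0, 0.0]
P = [[0.5, 0.5],
     [0.0, 1.0]]
delta = 0.9

v = stationary_value(u, P, delta)
print(v)
```

In this example the value of state 0 satisfies v0 = 1 + 0.45·v0, so the iteration converges to 1/0.55, while the absorbing state keeps value 0.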

Comparison with other games • Comparison to normal-form games • Comparison to MDPs: more than one agent; if the strategies are stationary, they are the same • Comparison to Bayesian games: no discounting in Bayesian games; types correspond to states; beliefs are captured by the prior

Linear tracing procedure • Corresponding normal-form game: we fix the state • Prior probability distributions (the prior): the expectation of each player about the other players' strategy choices over the pure strategies • Important assumption: every player holds the same expectation about the others

Linear tracing procedure cont'd • Family of one-parameter games • Payoff function
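For orientation, here is a sketch of the payoff in the one-parameter family as it is usually defined in the standard (Harsanyi and Selten) linear tracing procedure: with weight t the player faces the opponents' actual strategies, and with weight 1 − t the prior. This is the textbook form and may differ in notation from the slide's lost formula. The names `expected` and `tracing_payoff` and the numbers are hypothetical, and the example is restricted to a single opponent.

```python
from fractions import Fraction as F

def expected(payoff_row, opponent_mix):
    """Expected payoff of one pure action against a mixed opponent."""
    return sum(q * x for q, x in zip(opponent_mix, payoff_row))

def tracing_payoff(payoff_row, sigma_opp, prior_opp, t):
    """Payoff in the t-game: t * (vs. actual strategy) + (1 - t) * (vs. prior)."""
    return t * expected(payoff_row, sigma_opp) + (1 - t) * expected(payoff_row, prior_opp)

# Hypothetical data: the action pays 2 if the opponent plays its first action.
row = [F(2), F(0)]
sigma = [F(1), F(0)]       # opponent's actual strategy
prior = [F(1, 2), F(1, 2)]  # common prior about the opponent

at_0 = tracing_payoff(row, sigma, prior, F(0))       # pure best-response-to-prior game
at_half = tracing_payoff(row, sigma, prior, F(1, 2))
at_1 = tracing_payoff(row, sigma, prior, F(1))       # original game
print(at_0, at_half, at_1)
```

At t = 0 only the prior matters, and at t = 1 the payoff coincides with the original game, which is why the procedure follows equilibria as t moves from 0 to 1.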

Linear tracing procedure cont'd • Set of equilibrium points of the one-parameter family: it can be a collection of pieces of one-dimensional curves, though in degenerate cases it may contain isolated points and/or higher-dimensional curves • Feasible path • Linear tracing procedure • Well-defined l.t.p. (t → 1)

Stochastic tracing procedure • Assumption: a stochastic game and a prior p are given • Stochastic game • Total expected discounted payoffs • Stochastic tracing procedure T, applied to the game and the prior p

Alternative ways of extending the payoff function to stochastic games • There are four ways to define a player's beliefs, combining two choices: • Correlation within states – C(S): all opponents play the same strategy • Absence of correlation within states – I(S): each opponent can play a different strategy • Correlation across time – C(T): each player plays the same strategy across time • Absence of correlation across time – I(T): each player can change its strategy over time

Alternatives cont'd • Alternative 1: C(S),I(T) • Alternative 2: C(S),C(T)

Alternatives cont'd • Alternative 3: I(S),I(T) • Alternative 4: I(S),C(T)

Example 1 – C(S) versus I(S) • Prior (as used in the calculations): player 1 (1/6, 5/6), player 2 (1/2, 1/2), player 3 (2/3, 1/3) • Equilibria: • Starting point:

Ex1: C(S) solution

Ex1: C(S) calculations
• (s1,s2,s3;1): Player 1 expects player 2 to play (1/2(1-t)+t, 1/2(1-t)) and player 3 to play (2/3(1-t)+t, 1/3(1-t)). Expected payoff: (1/2(1-t)+t)(2/3(1-t)+t)*2 = 1/3(1+t)(2+t)
• (s1,s2,s3;2): Player 2 expects player 1 to play (1/6(1-t)+t, 5/6(1-t)) and player 3 to play (2/3(1-t)+t, 1/3(1-t)). Expected payoff: (1/6(1-t)+t)(2/3(1-t)+t)*2 = 1/9(1+5t)(2+t)
• (s1,s2’,s3;1): Player 1 expects player 2 to play (1/2(1-t), 1/2(1-t)+t) and player 3 to play (2/3(1-t)+t, 1/3(1-t)). Expected payoff: (1/2(1-t))(2/3(1-t)+t)*2 = 1/3(1-t)(2+t)
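The arithmetic on this slide can be checked with exact fractions. The sketch below assumes, as the products on the slide suggest, that the payoff of 2 is earned only when both opponents play their first action, and that each opponent's belief is the prior mixed with its actual choice. The prior values are read off the slide; the names `belief` and `payoff_product` are hypothetical. Each product is compared with its simplified closed form, using 1/3(1-t)(2+t) for the third case.

```python
from fractions import Fraction as F

# Prior probability that players 1, 2, 3 play s1, s2, s3 (read off the slide).
PRIOR = {1: F(1, 6), 2: F(1, 2), 3: F(2, 3)}
PAYOFF = 2  # assumed payoff when both opponents play their first action

def belief(player, plays_first, t):
    """Belief that `player` plays its first action: (1-t)*prior + t*actual choice."""
    return (1 - t) * PRIOR[player] + (t if plays_first else 0)

def payoff_product(opponents, t):
    """Expected payoff when beliefs about the two opponents are multiplied."""
    value = F(PAYOFF)
    for player, plays_first in opponents:
        value *= belief(player, plays_first, t)
    return value

ts = [F(k, 10) for k in range(11)]

# (s1,s2,s3;1): player 1 looks at players 2 and 3, both playing their first action.
ok1 = all(payoff_product([(2, True), (3, True)], t) == F(1, 3) * (1 + t) * (2 + t) for t in ts)
# (s1,s2,s3;2): player 2 looks at players 1 and 3.
ok2 = all(payoff_product([(1, True), (3, True)], t) == F(1, 9) * (1 + 5 * t) * (2 + t) for t in ts)
# (s1,s2',s3;1): player 2 plays s2', so only the prior weight on s2 remains.
ok3 = all(payoff_product([(2, False), (3, True)], t) == F(1, 3) * (1 - t) * (2 + t) for t in ts)
print(ok1, ok2, ok3)
```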

Ex1: C(S) trajectory

Ex1: I(S) solution

Ex1: I(S) calculations
• (s1,s2,s3;1): Player 1 expects players 2 & 3 to play s2 & s3 with weight t, and to play according to the prior (s2 & s3) with weight (1-t). Expected payoff: ((1-t)(1/2)(2/3)+t)*2 = 2/3(1-t)+2t
• (s1,s2,s3;2): Player 2 expects players 1 & 3 to play s1 & s3 with weight t, and to play according to the prior (s1 & s3) with weight (1-t). Expected payoff: ((1-t)(1/6)(2/3)+t)*2 = 2/9(1-t)+2t
• (s1,s2’,s3;1): Player 1 expects players 2 & 3 to play s2’ & s3 with weight t (but the payoff is 0), and to play according to the prior (s2 & s3) with weight (1-t). Expected payoff: ((1-t)(1/2)(2/3))*2 = 2/3(1-t)
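The same kind of check works for this slide, where the weight t is placed on the opponents' joint play and the weight 1 − t on the joint prior. The sketch assumes, following the expressions on the slide, that the payoff of 2 is earned only when both opponents play their first action; the prior values are read off the calculations, and the name `payoff_joint` is hypothetical.

```python
from fractions import Fraction as F

# Prior probability that players 1, 2, 3 play s1, s2, s3 (read off the slide).
PRIOR = {1: F(1, 6), 2: F(1, 2), 3: F(2, 3)}
PAYOFF = 2  # assumed payoff when both opponents play their first action

def payoff_joint(opponents, t):
    """Expected payoff with weight t on the joint actual play, 1-t on the joint prior."""
    prior_joint = F(1)
    all_first = True
    for player, plays_first in opponents:
        prior_joint *= PRIOR[player]
        all_first = all_first and plays_first
    return PAYOFF * ((1 - t) * prior_joint + (t if all_first else 0))

ts = [F(k, 10) for k in range(11)]

# (s1,s2,s3;1): player 1 looks at players 2 and 3.
ok1 = all(payoff_joint([(2, True), (3, True)], t) == F(2, 3) * (1 - t) + 2 * t for t in ts)
# (s1,s2,s3;2): player 2 looks at players 1 and 3.
ok2 = all(payoff_joint([(1, True), (3, True)], t) == F(2, 9) * (1 - t) + 2 * t for t in ts)
# (s1,s2',s3;1): the actual joint play pays 0, so only the prior term survives.
ok3 = all(payoff_joint([(2, False), (3, True)], t) == F(2, 3) * (1 - t) for t in ts)
print(ok1, ok2, ok3)
```

Compared with multiplying the two beliefs, this form is linear in t, which is what makes the two trajectories differ.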

Ex1: I(S) trajectory

Example 2 – C(T) versus I(T) • Equilibria: • Prior: • Starting point: • [Tables: payoffs and transition probabilities]

Ex2: C(T) solution • [Figure: transition probabilities for player 1 and for player 2]

Ex2: C(T) trajectory

Ex2: I(T) trajectory

Summary • Stochastic games were defined • The linear tracing procedure was presented • Some extensions were shown with examples • C(S),I(T) is probably the best extension for computing strategies

Reference • “Equilibrium Selection in Stochastic Games” by P. Jean-Jacques Herings and Ronald J.A.P. Peeters

Questions?
