Issues on the border of economics and computation נושאים בגבול כלכלה וחישוב. Speaker: Dr. Michael Schapira Topic: Dynamics in Games (Part II) (Some slides from Prof. Avrim Blum’s course at CMU and Prof. Yishay Mansour’s course at TAU). Recap: Regret Minimization.
Example 1: Weather Forecast • Each day, predict whether it will be sunny or rainy. • No meteorological understanding! • Rely only on the forecasts of other web sites. Goal: Nearly the most accurate forecast
Example 2: Route Selection Goal: Fastest route Challenge: Partial Information
Financial Markets • Model: Select a portfolio each day. • Gain: The changes in stock values. • Performance goal: Compare well with the best “static” policy.
Reminder: Minimizing Regret
• At each round $t = 1, 2, \dots, T$:
• There are $n$ strategies (experts) $1, 2, \dots, n$.
• The algorithm selects a strategy in $\{1, \dots, n\}$
• and then observes the loss $l_{i,t} \in [0,1]$ of each strategy $i \in \{1, \dots, n\}$.
• Let $l_i = \sum_t l_{i,t}$. Let $l_{\min} = \min_i l_i$.
• Goal: Do “nearly as well” as $l_{\min}$ in hindsight.
• Have no regret!
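Randomized Weighted Majority
• Initially: $w_{i,1} = 1$ and $p_{i,1} = 1/n$ for each strategy $i$.
• At time $t = 1, \dots, T$:
• Select strategy $i$ with probability $p_{i,t}$.
• Observe the loss vector $l_t$.
• Update the weights: $w_{i,t+1} := w_{i,t}(1 - \epsilon l_{i,t})$, where $\epsilon \in (0, 1/2]$ is a parameter fixed in advance.
• Set $p_{i,t+1} := w_{i,t+1} / W_{t+1}$, where $W_{t+1} = \sum_i w_{i,t+1}$.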
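The steps of RWM can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the multiplicative update $w_{i,t+1} = w_{i,t}(1 - \epsilon l_{i,t})$ with the parameter $\epsilon$ that appears in the analysis slides, and the names `rwm` and `tuned_epsilon` are hypothetical.

```python
import math
import random

def tuned_epsilon(T, n):
    """Parameter choice from the analysis: sqrt(ln(n) / T) (natural log assumed)."""
    return math.sqrt(math.log(n) / T)

def rwm(loss_rounds, epsilon, rng=None):
    """Run Randomized Weighted Majority on a list of loss vectors.

    loss_rounds: list of T lists, each holding n losses in [0, 1].
    epsilon: update parameter, assumed to lie in (0, 1/2].
    Returns (chosen, expected_loss): the sampled strategy for each round and
    the expected total loss sum_t sum_i p_{i,t} * l_{i,t}.
    """
    rng = rng or random.Random(0)
    n = len(loss_rounds[0])
    weights = [1.0] * n  # w_{i,1} = 1, so p_{i,1} = 1/n
    chosen = []
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample a strategy according to the current distribution.
        chosen.append(rng.choices(range(n), weights=probs)[0])
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: strategies with larger loss shrink faster.
        weights = [w * (1.0 - epsilon * l) for w, l in zip(weights, losses)]
    return chosen, expected_loss
```

The losses are revealed only after the strategy is sampled, which matches the order of the steps on the slide. For very long runs the weights can become tiny, so a practical version might renormalize them each round.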
Formal Guarantee for RWM
• Theorem: Let $L$ be the expected total loss of the algorithm by time $T+1$ and let OPT be the accumulated loss of the best strategy by time $T+1$. Then $L < \mathrm{OPT} + 2(T \log n)^{1/2}$ (an “additive regret” bound).
• An algorithm is called a “no-regret algorithm” if $L < \mathrm{OPT} + f(T)$ and $f(T)/T$ goes to 0 as $T$ goes to infinity.
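Analysis
• Let $F_t$ be the expected loss of RWM at time $t$: $F_t = \sum_i p_{i,t} l_{i,t}$.
• Observe that $W_{\text{final}} = n(1 - \epsilon F_1)(1 - \epsilon F_2)\cdots$, because each round multiplies the total weight by $(1 - \epsilon F_t)$.
• $\ln(W_{\text{final}}) = \ln(n) + \sum_t \ln(1 - \epsilon F_t) \le \ln(n) - \epsilon \sum_t F_t$ (using $\ln(1-x) \le -x$)
• $= \ln(n) - \epsilon L$ (using $\sum_t F_t = L = E[\text{total loss of RWM by time } T+1]$).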
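Analysis (Cont.)
• Let $i$ be the best strategy in hindsight.
• Observe that $w_{i,\text{final}} = 1 \cdot (1 - \epsilon l_{i,1})(1 - \epsilon l_{i,2}) \cdots (1 - \epsilon l_{i,T})$.
• $\ln(w_{i,\text{final}}) = \ln(1 - \epsilon l_{i,1}) + \ln(1 - \epsilon l_{i,2}) + \dots + \ln(1 - \epsilon l_{i,T}) \ge -\sum_t \epsilon l_{i,t} - \sum_t \epsilon^2 l_{i,t} = -(\sum_t l_{i,t})\epsilon(1+\epsilon) = -l_i \epsilon(1+\epsilon) = -l_{\min}\epsilon(1+\epsilon)$
• (using $-z - z^2 \le \ln(1-z)$ for $0 \le z \le 1/2$, and $l_{i,t} \in [0,1]$, so that $l_{i,t}^2 \le l_{i,t}$).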
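Analysis (Cont.)
• $w_{i,\text{final}} < W_{\text{final}}$, so $\ln(w_{i,\text{final}}) < \ln(W_{\text{final}})$.
• Combining the two bounds: $-l_{\min}\epsilon(1+\epsilon) < \ln(n) - \epsilon L$, hence $L < (1+\epsilon) l_{\min} + (1/\epsilon)\ln(n)$.
• Set $\epsilon = (\log(n)/T)^{1/2}$ and use $l_{\min} \le T$ to get $L < l_{\min} + 2(T \log n)^{1/2}$.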
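The effect of the choice of $\epsilon$ can be checked numerically. The sketch below assumes the natural logarithm throughout (the slides write both "log" and "ln"), and the function names are hypothetical. It evaluates the intermediate bound $(1+\epsilon) l_{\min} + \ln(n)/\epsilon$ and the per-round regret bound $2(\ln(n)/T)^{1/2}$, which shrinks as $T$ grows; that is what makes RWM a no-regret algorithm.

```python
import math

def tuned_epsilon(T, n):
    """The choice epsilon = sqrt(ln(n) / T) from the analysis."""
    return math.sqrt(math.log(n) / T)

def rwm_loss_bound(l_min, n, epsilon):
    """Intermediate bound (1 + epsilon) * l_min + ln(n) / epsilon."""
    return (1 + epsilon) * l_min + math.log(n) / epsilon

def average_regret_bound(T, n):
    """Per-round regret bound 2 * sqrt(T * ln(n)) / T after tuning epsilon."""
    return 2 * math.sqrt(T * math.log(n)) / T

# The per-round bound decreases as the horizon T grows.
for T in (100, 10_000, 1_000_000):
    print(T, average_regret_bound(T, n=10))
```

In the worst case $l_{\min} = T$, the two terms $\epsilon T$ and $\ln(n)/\epsilon$ are equal at this $\epsilon$, which is where the factor 2 comes from.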
Equivalently
• At each round $t = 1, 2, \dots, T$:
• There are $n$ strategies (experts) $1, 2, \dots, n$.
• The algorithm selects a strategy in $\{1, \dots, n\}$
• and then observes the gain $g_{i,t} \in [0,1]$ of each strategy $i \in \{1, \dots, n\}$.
• Let $g_i = \sum_t g_{i,t}$. Let $g_{\max} = \max_i g_i$.
• Goal: Do “nearly as well” as $g_{\max}$ in hindsight.
• Let $G$ be the algorithm’s expected total gain by time $T+1$. RWM (setting $l_{i,t} = 1 - g_{i,t}$) guarantees that $G > g_{\max} - 2(T \log n)^{1/2}$.
Rock-Paper-Scissors • Say that you are the row player… and you play multiple times • How should you play the game?! • what does “playing well” mean? • highly opponent dependent • One (weak?) option: Do “nearly as well” as best pure strategy in hindsight!
Rock-Paper-Scissors • Use a no-regret algorithm to select a strategy at each time step • Why does this make sense?
Reminder: Mixed strategies • Definition: a “mixed strategy” is a probability distribution over actions. • If $\{a_1, a_2, \dots, a_m\}$ are the pure strategies of A, then $\{p_1, \dots, p_m\}$ is a mixed strategy for A if (1) $\sum_i p_i = 1$ and (2) $p_i \ge 0$ for all $i$.
Reminder: Mixed Nash Eq. • Main idea: given a fixed behavior of the others, I will not change my strategy. • Definition: $(S_A, S_B)$ are in Nash Equilibrium if each strategy is a best response to the other.
Zero-Sum Games • A zero-sum game is a 2-player strategic game such that for each $s \in S$, we have $u_1(s) + u_2(s) = 0$. • What is good for me is bad for my opponent, and vice versa. • Note: Any game where the sum is a constant $c$ can be transformed into a zero-sum game with the same set of equilibria: • $u'_1(a) = u_1(a)$ • $u'_2(a) = u_2(a) - c$
2-Player Zero-Sum Games
          Left      Right
Left      (-1, 1)   (1, -1)
Right     (1, -1)   (-1, 1)
How to Play Zero-Sum Games? • Assume that only pure strategies are allowed • Be paranoid: Try to minimize your loss by assuming the worst! • Player 1 takes minimum over row values: • T: -6, M: -1, B: -6 • then maximizes: • M: -1
Minimax-Optimal Strategies • A (mixed) strategy $s_1^*$ is minimax optimal for player 1 if $\min_{s_2 \in S_2} u_1(s_1^*, s_2) \ge \min_{s_2 \in S_2} u_1(s_1, s_2)$ for all $s_1 \in S_1$. • Similar for player 2. • Can be found via linear programming.
Minimax-Optimal Strategies
• E.g., penalty shot:
          Left      Right
Left      (-1, 1)   (1, -1)
Right     (1, -1)   (-1, 1)
• Minimax optimal for both players is 50/50. Gives expected gain of 0. Any other is worse.
Minimax-Optimal Strategies
• E.g., penalty shot with a goalie who’s weaker on the left:
          Left      Right
Left      (0, 0)    (1, -1)
Right     (1, -1)   (-1, 1)
• Minimax optimal for both players is (2/3, 1/3). Gives expected gain 1/3. Any other is worse.
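Both penalty-shot examples can be verified with a small calculation. The sketch below is a shortcut that only works for 2x2 games, not the linear-programming approach mentioned on the earlier slide. It assumes the first number in each cell is the row player's payoff, and `row_maximin_2x2` and `column_view` are hypothetical helper names. The guaranteed payoff is a minimum of two linear functions of $p$, so its maximum lies at $p = 0$, $p = 1$, or where the two lines cross.

```python
from fractions import Fraction

def row_maximin_2x2(A):
    """Return (p, value) for the row player of a 2x2 game.

    A holds the row player's integer payoffs; p is the probability of
    playing the first row, and value is the payoff that p guarantees.
    """
    (a, b), (c, d) = A

    def guaranteed(p):
        # Worst case over the column player's two pure replies.
        return min(p * a + (1 - p) * c, p * b + (1 - p) * d)

    candidates = [Fraction(0), Fraction(1)]
    denom = a - b - c + d
    if denom != 0:
        crossing = Fraction(d - c, denom)  # where both replies pay the same
        if 0 <= crossing <= 1:
            candidates.append(crossing)
    best = max(candidates, key=guaranteed)
    return best, guaranteed(best)

def column_view(A):
    """The column player's payoffs (negated and transposed), in a zero-sum game."""
    return [[-A[i][j] for i in range(2)] for j in range(2)]

weak_left = [[0, 1], [1, -1]]
print(row_maximin_2x2(weak_left))               # row player: p and guaranteed gain
print(row_maximin_2x2(column_view(weak_left)))  # column player, from its own view
```

For the weaker-goalie game this yields probability 2/3 on Left for both players. The row player guarantees 1/3 and the column player guarantees -1/3, which means the row player gets at most 1/3.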
Minimax Theorem (von Neumann 1928) • Every 2-player zero-sum game has a unique value V. • Minimax optimal strategy for R guarantees R’s expected gain at least V. • Minimax optimal strategy for C guarantees R’s expected gain at most V.
Proof Via Regret (sketch) • Suppose for contradiction that the theorem is false. • This means some game G has $V_C > V_R$: • If the column player commits first, there exists a row that gets at least $V_C$. • But if the row player has to commit first, the column player can make him get only $V_R$. • Scale the matrix so that payoffs are in [0,1] and say that $V_R = (1-\epsilon)V_C$.
Proof (Cont.) • Consider the scenario in which both players repeatedly play the game G and each uses a no-regret algorithm. • Let $s_{i,t}$ be the (pure) strategy of player $i$ at time $t$. • Let $q_i = (1/T)\sum_t s_{i,t}$. • $q_i$ is called $i$’s empirical distribution. • Observe that player 1’s average gain when playing a pure strategy $s_1$ against the sequence $\{s_{2,t}\}$ is exactly $u_1(s_1, q_2)$.
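The last observation follows from linearity of expected payoff, and a tiny example makes it concrete. The sketch below uses the row payoffs of the weaker-goalie penalty-shot game from the earlier slide. The sequence of column plays is a made-up illustration, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Row player's payoffs u1 in the weaker-goalie penalty-shot game.
U1 = [[0, 1], [1, -1]]

def empirical_distribution(plays, n):
    """Fraction of rounds in which each of the n pure strategies was played."""
    counts = Counter(plays)
    return [Fraction(counts[j], len(plays)) for j in range(n)]

def average_gain(u1, s1, plays):
    """Average gain of the fixed pure row strategy s1 against a sequence of column plays."""
    return Fraction(sum(u1[s1][j] for j in plays), len(plays))

def gain_vs_mixed(u1, s1, q):
    """Expected gain u1(s1, q) of pure strategy s1 against mixed strategy q."""
    return sum(u1[s1][j] * q[j] for j in range(len(q)))

plays = [0, 1, 1, 0, 1]  # hypothetical column plays: 0 = Left, 1 = Right
q2 = empirical_distribution(plays, 2)
for s1 in range(2):
    # The two quantities coincide for every pure strategy s1.
    print(s1, average_gain(U1, s1, plays), gain_vs_mixed(U1, s1, q2))
```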
Proof (Cont.) • Player 1 is using a no-regret algorithm, and so his average gain is at least $V_C$ (as T goes to infinity). • Similarly, player 2 is using a no-regret algorithm, and so player 1’s average gain is at most $V_R$ (as T goes to infinity). • A contradiction!
Convergence to Nash • Suppose that each of the players in a 2-player zero-sum game is using a no-regret algorithm to select strategies. • Let $q_i$ be the empirical distribution of each player $i$ in {1,2}. • $(q_1, q_2)$ converges to a Nash equilibrium as T goes to infinity!
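A small self-play experiment illustrates this convergence. The sketch below is a simplified variant rather than the exact process on the slides. To keep the run deterministic, it assumes each player observes the opponent's current mixed strategy instead of a sampled pure strategy, and it averages those mixed strategies over time. It uses the weaker-goalie penalty-shot game, whose equilibrium is (2/3, 1/3) for both players with value 1/3. All function names are hypothetical.

```python
import math

# Row player's payoffs u1; the column player receives -u1 (zero-sum).
U1 = [[0.0, 1.0],
      [1.0, -1.0]]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def self_play(u1, T):
    """Both players run multiplicative weights; return time-averaged strategies."""
    n_rows, n_cols = len(u1), len(u1[0])
    eps_row = math.sqrt(math.log(n_rows) / T)
    eps_col = math.sqrt(math.log(n_cols) / T)
    w_row, w_col = [1.0] * n_rows, [1.0] * n_cols
    avg_row, avg_col = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(T):
        p, q = normalize(w_row), normalize(w_col)
        avg_row = [a + x / T for a, x in zip(avg_row, p)]
        avg_col = [a + x / T for a, x in zip(avg_col, q)]
        # Expected payoff of each pure strategy against the opponent's mix.
        row_pay = [sum(u1[i][j] * q[j] for j in range(n_cols)) for i in range(n_rows)]
        col_pay = [-sum(u1[i][j] * p[i] for i in range(n_rows)) for j in range(n_cols)]
        # Payoffs in [-1, 1] become losses in [0, 1] via (1 - payoff) / 2.
        w_row = [w * (1 - eps_row * (1 - g) / 2) for w, g in zip(w_row, row_pay)]
        w_col = [w * (1 - eps_col * (1 - g) / 2) for w, g in zip(w_col, col_pay)]
    return avg_row, avg_col

def exploitability_gap(u1, p, q):
    """Best-response payoff against q minus the payoff that p guarantees."""
    rows, cols = range(len(p)), range(len(q))
    best_vs_q = max(sum(u1[i][j] * q[j] for j in cols) for i in rows)
    worst_for_p = min(sum(u1[i][j] * p[i] for i in rows) for j in cols)
    return best_vs_q - worst_for_p

p_avg, q_avg = self_play(U1, 10_000)
print(p_avg, q_avg, exploitability_gap(U1, p_avg, q_avg))
```

The gap is zero exactly at a Nash equilibrium. Under these assumptions, the two regret bounds together imply that the gap is at most $8(\ln(n)/T)^{1/2}$, so it vanishes as T grows.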
Rock-Paper-Scissors • Use a no-regret algorithm • Adjust to the opponent’s play • No need to know the entire game in advance • Payoff can be more than the game’s value V • If both players do this, the empirical play converges to a Nash equilibrium
Summary • No-regret algorithms help deal with uncertainty in repeated decision making. • Implications for game theory: When both players use no-regret algorithms in a 2-player zero-sum game, convergence to a Nash equilibrium is guaranteed.