
Issues on the border of economics and computation נושאים בגבול כלכלה וחישוב

Speaker: Dr. Michael Schapira. Topic: Dynamics in Games (Part II). (Some slides from Prof. Avrim Blum’s course at CMU and Prof. Yishay Mansour’s course at TAU.)


Presentation Transcript


  1. Issues on the border of economics and computation (נושאים בגבול כלכלה וחישוב) Speaker: Dr. Michael Schapira Topic: Dynamics in Games (Part II) (Some slides from Prof. Avrim Blum’s course at CMU and Prof. Yishay Mansour’s course at TAU)

  2. Recap: Regret Minimization

  3. Example 1: Weather Forecast • Predict each day: sunny or rainy? • No meteorological understanding! • Instead, use other websites’ forecasts • Goal: nearly the most accurate forecast

  4. Example 2: Route Selection Goal: Fastest route Challenge: Partial Information

  5. Financial Markets • Model: Select a portfolio each day. • Gain: The changes in stock values. • Performance goal: Compare well with the best “static” policy.

  6. Reminder: Minimizing Regret • At each round t = 1, 2, …, T • There are n strategies (experts) 1, 2, …, n • The algorithm selects a strategy in {1, …, n} • and then observes the loss l_{i,t} ∈ [0,1] of each strategy i ∈ {1, …, n} • Let l_i = Σ_t l_{i,t} and l_min = min_i l_i • Goal: do “nearly as well” as l_min in hindsight. • Have no regret!
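These definitions are easy to check with a short computation. A minimal sketch (the loss table and the algorithm's picks are made-up illustration data):

```python
# losses[t][i] = loss of strategy i at round t, each in [0, 1] (made-up data)
losses = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.5],
]
picks = [1, 0, 1, 0]  # the algorithm's (maximally unlucky) choices

L = sum(losses[t][picks[t]] for t in range(len(picks)))           # algorithm's total loss
per_strategy = [sum(row[i] for row in losses) for i in range(3)]  # l_i = sum_t l_{i,t}
l_min = min(per_strategy)                                         # best strategy in hindsight
regret = L - l_min
```

Here every fixed strategy would have accumulated loss 2.0, while the adversarial picks accumulate 4.0, so the regret is 2.0; the goal of the algorithms below is to keep this gap sublinear in T.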

  7. Randomized Weighted Majority Initially: w_{i,1} = 1 and p_{i,1} = 1/n for each strategy i. At each time t = 1, …, T: • Select strategy i with probability p_{i,t} • Observe the loss vector l_t • Update the weights: w_{i,t+1} := w_{i,t}·(1 − ε·l_{i,t}) • p_{i,t+1} := w_{i,t+1}/W_{t+1}, where W_{t+1} = Σ_i w_{i,t+1}
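The update rule above can be sketched in a few lines of Python (function name and the toy adversary are mine; ε is set as in slide 11):

```python
import math
import random

def rwm(n, T, loss_at, eps=None):
    """Randomized Weighted Majority. loss_at(t) returns the round-t loss
    vector, one entry in [0, 1] per strategy."""
    if eps is None:
        eps = math.sqrt(math.log(n) / T)
    w = [1.0] * n                        # w_{i,1} = 1
    total = 0.0
    for t in range(T):
        W = sum(w)
        p = [wi / W for wi in w]         # p_{i,t} = w_{i,t} / W_t
        i = random.choices(range(n), weights=p)[0]  # sample a strategy
        l = loss_at(t)
        total += l[i]
        w = [w[j] * (1 - eps * l[j]) for j in range(n)]  # multiplicative update
    return total

random.seed(0)
# toy adversary: strategy 0 always loses 0, strategy 1 always loses 1
L = rwm(2, 1000, lambda t: [0.0, 1.0])
```

Against this adversary OPT = 0, so the bound of slide 8 says the expected total loss stays below 2·(T·log n)^{1/2} ≈ 52.6; in practice the weight on the bad strategy decays geometrically and the realized loss is far smaller.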

  8. Formal Guarantee for RWM Theorem: If L = the expected total loss of the algorithm by time T+1 and OPT = the accumulated loss of the best strategy by time T+1, then L < OPT + 2(T·log(n))^{1/2} (an “additive regret” bound). An algorithm is called a “no-regret algorithm” if L < OPT + f(T) and f(T)/T goes to 0 as T goes to infinity.

  9. Analysis • Let F_t be the expected loss of RWM at time t: F_t = Σ_i p_{i,t}·l_{i,t} • Observe that W_final = n·(1 − ε·F_1)(1 − ε·F_2)… • ln(W_final) = ln(n) + Σ_t ln(1 − ε·F_t) < ln(n) − ε·Σ_t F_t (using ln(1 − x) < −x) • = ln(n) − ε·L (using Σ_t F_t = L = E[total loss of RWM by time T+1])

  10. Analysis (Cont.) • Let i be the best strategy in hindsight • Observe that w_{i,final} = 1·(1 − l_{i,1}·ε)(1 − l_{i,2}·ε)…(1 − l_{i,T}·ε) • ln(w_{i,final}) = ln(1 − l_{i,1}·ε) + ln(1 − l_{i,2}·ε) + … + ln(1 − l_{i,T}·ε) ≥ −Σ_t l_{i,t}·ε − Σ_t l_{i,t}·ε² = −(Σ_t l_{i,t})·ε(1 + ε) = −l_i·ε(1 + ε) = −l_min·ε(1 + ε) (using −z − z² ≤ ln(1 − z) for 0 ≤ z ≤ ½, and l_{i,t} ∈ [0,1])

  11. Analysis (Cont.) • w_{i,final} ≤ W_final, so ln(w_{i,final}) ≤ ln(W_final) • Combining the two bounds: −l_min·ε(1 + ε) < ln(n) − ε·L, so L < (1 + ε)·l_min + (1/ε)·ln(n) • Set ε = (log(n)/T)^{1/2} and use l_min ≤ T to get L < l_min + 2(T·log(n))^{1/2}

  12. Equivalently • At each round t = 1, 2, …, T • There are n strategies (experts) 1, 2, …, n • The algorithm selects a strategy in {1, …, n} • and then observes the gain g_{i,t} ∈ [0,1] of each strategy i ∈ {1, …, n} • Let g_i = Σ_t g_{i,t} and g_max = max_i g_i • Goal: do “nearly as well” as g_max in hindsight • Let G be the algorithm’s expected total gain by time T+1. RWM (setting l_{i,t} = 1 − g_{i,t}) guarantees that G > g_max − 2(T·log(n))^{1/2}
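The reduction is just the substitution l_{i,t} = 1 − g_{i,t}. A quick sanity check that maximizing total gain and minimizing the transformed loss select the same strategy (the gain table is made up):

```python
# gains[t][i] in [0, 1] (made-up data)
gains = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2]]
losses = [[1 - g for g in row] for row in gains]   # l_{i,t} = 1 - g_{i,t}

g_tot = [sum(row[i] for row in gains) for i in range(2)]    # g_i
l_tot = [sum(row[i] for row in losses) for i in range(2)]   # l_i = T - g_i

best_by_gain = max(range(2), key=lambda i: g_tot[i])
best_by_loss = min(range(2), key=lambda i: l_tot[i])
```

Since l_i = T − g_i for every strategy, the argmax of gain and the argmin of loss always coincide, which is why the loss bound of slide 8 translates directly into the gain bound above.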

  13. So, why is this in an algorithmic game theory course?

  14. Rock-Paper-Scissors

  15. Rock-Paper-Scissors • Say that you are the row player… and you play multiple times • How should you play the game?! • what does “playing well” mean? • highly opponent dependent • One (weak?) option: Do “nearly as well” as best pure strategy in hindsight!
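The "best pure strategy in hindsight" benchmark is easy to compute once the opponent's sequence is known. A small sketch (the opponent sequence is hypothetical):

```python
# row player's payoff in Rock-Paper-Scissors
PAYOFF = {('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
          ('P', 'R'): 1, ('P', 'P'): 0, ('P', 'S'): -1,
          ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0}

opponent = ['R', 'R', 'P', 'R', 'S', 'R']  # hypothetical observed plays

# total payoff each fixed pure strategy would have earned
totals = {a: sum(PAYOFF[(a, o)] for o in opponent) for a in 'RPS'}
best_in_hindsight = max(totals, key=totals.get)
```

Against this Rock-heavy opponent the best fixed strategy in hindsight is Paper; a no-regret algorithm is guaranteed to approach that benchmark without knowing the opponent's sequence in advance.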

  16. Rock-Paper-Scissors • Use no-regret algorithm to select strategy ateach time step • Why does this make sense?

  17. Zero-Sum Games

  18. Reminder: Mixed strategies • Definition: a “mixed strategy” is a probability distribution over actions. • If {a1, a2, …, am} are the pure strategies of A, then {p1, …, pm} is a mixed strategy for A if (1) Σ_i p_i = 1 and (2) p_i ≥ 0 for all i
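Under mixed strategies, payoffs are evaluated in expectation over the pure outcomes. A minimal sketch with a made-up 2×2 payoff matrix and made-up mixes:

```python
U1 = [[3.0, 0.0],    # row player's payoffs (made-up numbers)
      [1.0, 2.0]]

p = [0.5, 0.5]    # row player's mixed strategy
q = [0.25, 0.75]  # column player's mixed strategy

# conditions (1) and (2) from the definition
assert abs(sum(p) - 1) < 1e-9 and all(pi >= 0 for pi in p)
assert abs(sum(q) - 1) < 1e-9 and all(qj >= 0 for qj in q)

# u1(p, q) = sum_{i,j} p_i * q_j * U1[i][j]
expected = sum(p[i] * q[j] * U1[i][j]
               for i in range(2) for j in range(2))
```

This bilinear form u1(p, q) is exactly the quantity that reappears on slide 28, where one argument is a pure strategy and the other is an empirical distribution.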

  19. Reminder: Mixed Nash Eq. • Main idea: given the fixed behavior of the others, I will not change my strategy. • Definition: (S_A, S_B) are in Nash equilibrium if each strategy is a best response to the other.

  20. Zero-Sum Games • A zero-sum game is a 2-player strategic game such that for each s ∈ S, we have u1(s) + u2(s) = 0. • What is good for me is bad for my opponent, and vice versa • Note: any game where the sum is a constant c can be transformed into a zero-sum game with the same set of equilibria: • u'1(a) = u1(a) • u'2(a) = u2(a) − c
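The constant-sum transformation can be checked mechanically. A sketch with a made-up constant-sum game (c = 10):

```python
# a constant-sum game: u1 + u2 = c = 10 in every cell (made-up numbers)
u1 = [[4, 7], [6, 2]]
u2 = [[6, 3], [4, 8]]
c = 10

# the transformation from the slide: u1 unchanged, u2'(a) = u2(a) - c
u2p = [[u2[i][j] - c for j in range(2)] for i in range(2)]

zero_sum = all(u1[i][j] + u2p[i][j] == 0
               for i in range(2) for j in range(2))
```

Shifting one player's payoffs by a constant changes no payoff comparison that player ever makes, so best responses, and hence equilibria, are preserved.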

  21. (-1,1) (1,-1) (1,-1) (-1,1) Left Right Left Right 2-Player Zero-Sum Games

  22. How to Play Zero-Sum Games? • Assume that only pure strategies are allowed • Be paranoid: Try to minimize your loss by assuming the worst! • Player 1 takes minimum over row values: • T: -6, M: -1, B: -6 • then maximizes: • M: -1
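The slide's 3×2 payoff matrix is not reproduced in the transcript; a sketch of the paranoid (maximin) reasoning with a hypothetical matrix whose row minima match the values quoted (T: −6, M: −1, B: −6):

```python
# row player's payoffs; rows T, M, B (hypothetical entries)
U = {'T': [-6, 2],
     'M': [-1, 0],
     'B': [3, -6]}

row_min = {r: min(vals) for r, vals in U.items()}  # worst case per row
maximin_row = max(row_min, key=row_min.get)        # best of the worst cases
```

Row M guarantees at least −1 no matter what the column player does, which is the "minimize your loss assuming the worst" choice described above.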

  23. Minimax-Optimal Strategies • A (mixed) strategy s1* is minimax optimal for player 1 if min_{s2 ∈ S2} u1(s1*, s2) ≥ min_{s2 ∈ S2} u1(s1, s2) for all s1 ∈ S1 • Similarly for player 2 • Can be found via linear programming.

  24. (-1,1) (1,-1) (1,-1) (-1,1) Left Right Left Right Minimax-Optimal Strategies • E.g., penalty shot Minimax optimal for both players is 50/50. Gives expected gain of 0. Any other is worse.

  25. (0,0) (1,-1) (1,-1) (-1,1) Left Right Left Right Minimax-Optimal Strategies • E.g., penalty shot with goalie who’s weaker on the left. Minimax optimal for both players is (2/3,1/3). Gives expected gain 1/3. Any other is worse.

  26. Minimax Theorem (von Neumann 1928) • Every 2-player zero-sum game has a unique value V. • The minimax optimal strategy for R guarantees R’s expected gain at least V. • The minimax optimal strategy for C guarantees R’s expected gain at most V.

  27. Proof Via Regret (sketch) • Suppose for contradiction this is false. • This means some game G has V_C > V_R: • If the column player commits first, there exists a row that gets at least V_C. • But if the row player has to commit first, the column player can hold him to only V_R. • Scale the matrix so that payoffs are in [0,1] and say that V_R = (1 − ε)·V_C.

  28. Proof (Cont.) • Consider the scenario in which both players repeatedly play the game G and each uses a no-regret algorithm. • Let s_{i,t} be the (pure) strategy of player i at time t • Let q_i = (1/T)·Σ_t s_{i,t} (viewing each pure strategy as a probability vector) • q_i is called i’s empirical distribution • Observe that player 1’s average gain when playing a pure strategy s1 against the sequence {s_{2,t}} is exactly u1(s1, q2)

  29. Proof (Cont.) • Player 1 is using a no-regret algorithm, so his average gain is at least V_C (as T goes to infinity) • Similarly, player 2 is using a no-regret algorithm, so player 1’s average gain is at most V_R (as T goes to infinity). • A contradiction!

  30. Convergence to Nash • Suppose that each of the players in a 2-player zero-sum game is using a no-regret algorithm to select strategies. • Let qi be the empirical distribution of each player i in {1,2}. • (q1,q2) converges to a Nash equilibrium as T goes to infinity!
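A sketch of this convergence, using a deterministic variant in which each player runs the multiplicative-weights update against the opponent's current mixed strategy (expected losses rather than sampled plays, and averaging mixed strategies rather than pure ones), on the weak-left-goalie game from slide 25. Both empirical distributions should approach the (2/3, 1/3) equilibrium:

```python
import math

# row player's payoffs; the column player gets the negation (zero-sum)
U = [[0.0, 1.0],
     [1.0, -1.0]]

def mw_dynamics(U, T):
    n, m = len(U), len(U[0])
    eps = math.sqrt(math.log(max(n, m)) / T)
    w1, w2 = [1.0] * n, [1.0] * m
    avg1, avg2 = [0.0] * n, [0.0] * m   # empirical (time-averaged) distributions
    for _ in range(T):
        p1 = [w / sum(w1) for w in w1]
        p2 = [w / sum(w2) for w in w2]
        for i in range(n): avg1[i] += p1[i] / T
        for j in range(m): avg2[j] += p2[j] / T
        # expected losses, rescaled from payoffs in [-1, 1] down to [0, 1]
        l1 = [sum(p2[j] * (1 - U[i][j]) / 2 for j in range(m)) for i in range(n)]
        l2 = [sum(p1[i] * (1 + U[i][j]) / 2 for i in range(n)) for j in range(m)]
        w1 = [w1[i] * (1 - eps * l1[i]) for i in range(n)]
        w2 = [w2[j] * (1 - eps * l2[j]) for j in range(m)]
    return avg1, avg2

q1, q2 = mw_dynamics(U, 50000)
```

The no-regret guarantee bounds how far the empirical pair (q1, q2) can be from an equilibrium: each player's exploitability is at most the sum of the average regrets, which shrinks like (log n / T)^{1/2}.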

  31. Rock-Paper-Scissors • Use a no-regret algorithm • It adjusts to the opponent’s play • No need to know the entire game in advance • The payoff can be more than the game’s value V • If both players do this, the (empirical) outcome is a Nash equilibrium

  32. Summary • No-regret algorithms help deal with uncertainty in repeated decision making. • Implications for game theory: When both players use no-regret algorithms in a 2-player zero-sum game convergence to a Nash equilibrium is guaranteed.
