Issues on the border of economics and computation (נושאים בגבול כלכלה וחישוב)
Speaker: Dr. Michael Schapira
Topic: Dynamics in Games (slides on weighted majority algorithms from Prof. Avrim Blum’s course at CMU)
Reminder: n-Player Games • Consider a game: • Si is the set of (pure) strategies for player i • S = S1 x S2 x … x Sn • s = (s1, s2, …, sn) ∈ S is a vector of strategies • ui: S → R is the payoff function for player i • Notation: given a strategy vector s, let s-i = (s1, …, si-1, si+1, …, sn) • the vector s with the i’th entry omitted • s is a (pure) Nash equilibrium if for every i, ui(si, s-i) ≥ ui(si’, s-i) for every si’ ∈ Si
Best-Response Dynamics • The (arguably) most natural way of reaching a pure Nash equilibrium (PNE) in a game • Best-response dynamics: • Start at an arbitrary strategy vector • Let players take turns best-responding to the other players’ actions (in any order) • … until a pure Nash equilibrium is reached (a small code sketch follows below)
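To make the procedure concrete, here is a minimal Python sketch of best-response dynamics (my addition, not from the slides; the dictionary-based game representation and helper names are illustrative):

```python
def best_response(payoffs, strategies, s, i):
    """Return a best response of player i to the other players' strategies in s."""
    def u(profile):
        return payoffs[profile][i]
    candidates = [s[:i] + (a,) + s[i+1:] for a in strategies[i]]
    return max(candidates, key=u)[i]

def best_response_dynamics(payoffs, strategies, s, max_rounds=100):
    """Let players take turns best-responding until no one wants to deviate."""
    s = tuple(s)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(strategies)):
            b = best_response(payoffs, strategies, s, i)
            if payoffs[s[:i] + (b,) + s[i+1:]][i] > payoffs[s][i]:
                s = s[:i] + (b,) + s[i+1:]
                changed = True
        if not changed:
            return s          # s is a pure Nash equilibrium
    return None               # did not converge (a PNE may not even exist)

# Example: a 2x2 coordination game with two pure Nash equilibria.
strategies = [("x", "y"), ("x", "y")]
payoffs = {("x", "x"): (1, 1), ("y", "y"): (1, 1),
           ("x", "y"): (0, 0), ("y", "x"): (0, 0)}
print(best_response_dynamics(payoffs, strategies, ("x", "y")))
```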
Best-Response Dynamics: Illustration • a 2x2 game between a Row player and a Column player (first payoff is Row’s, second is Column’s):
              Column: x    Column: y
Row: x           0,0          1,1
Row: y           0,0          1,1
• the game has two pure Nash equilibria • arrows on the slide mark the players’ best responses
Best-Response Dynamics: Illustration (same 2x2 game as above) • start at some strategy vector • let players take turns best-responding (Row, Column, …) until a PNE is reached
Do Best-Response Dynamics Always Converge? • consider a matching-pennies-style game (first payoff is Row’s, second is Column’s):
              Column: x    Column: y
Row: x           0,1          1,0
Row: y           1,0          0,1
• a PNE might not even exist • even if a PNE exists, convergence is not guaranteed!
Better Responses • When player B plays b and player A plays a, A’s strategy a* is a better response to b if UA(a*, b) > UA(a, b) • (illustrated on the slide with a 3x2 game: Row strategies U, M, D; Column strategies L, R)
Better-Response Dynamics • Start at an arbitrary strategy vector • Let players take turns better-responding to other players’ actions (in any order) • … until a pure Nash equilibrium is reached
Do Better-Response Dynamics Always Converge? • no: best-response dynamics is a special case of better-response dynamics, so the matching-pennies-style game above is again a counterexample (no PNE exists)
Reminder: Potential Games • Definition: (exact) potential game — a game is an exact potential game if there is a function Φ: S → R such that for every player i, every s ∈ S, and every si’ ∈ Si, Φ(si’, s-i) – Φ(si, s-i) = ui(si’, s-i) – ui(si, s-i) • Definition: (ordinal) potential game — as above, but only the signs must agree: Φ(si’, s-i) – Φ(si, s-i) > 0 if and only if ui(si’, s-i) – ui(si, s-i) > 0
Reminder: Equilibria in Potential Games • Theorem: every (finite) potential game has a pure Nash equilibrium • Theorem: in every (finite) potential game, better-response dynamics (and hence also best-response dynamics) converge to a PNE (a small worked example of a potential function follows below)
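As a small worked illustration (my addition, not from the slides): in the 2x2 coordination game where both players get payoff 1 if they match and 0 otherwise, the function below is an exact potential, so every better-response step strictly increases it and the dynamics must stop at a PNE.

```latex
% Exact potential for the 2x2 coordination game (u_i = 1 if the players match, 0 otherwise):
\Phi(x,x) = \Phi(y,y) = 1, \qquad \Phi(x,y) = \Phi(y,x) = 0.
% Any unilateral deviation changes the deviator's payoff by exactly the change in \Phi.
```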
Example: Internet Routing • establish routes between the smaller networks that make up the Internet (slide figure: Level3, AT&T, Comcast, Qwest) • currently handled by the Border Gateway Protocol (BGP)
Why is Internet Routing Hard? • it is not shortest-paths routing! networks have conflicting policies, e.g.: • “Always choose shortest paths.” • “Load-balance my outgoing traffic.” • “Avoid routes through AT&T if at all possible.” • “My link to UUNET is for backup purposes only.”
BGP Dynamics • under BGP, each router repeatedly selects its best available route until a stable state is reached • example (slide figure): two source nodes 1 and 2 and a destination d; node 1 prefers the route 12d over 1d (“prefer routes through 2”), and node 2 prefers 21d over 2d (“prefer routes through 1”)
BGP Might Not Converge! • example (slide figure): three source nodes 1, 2, 3 and a destination d; node 1 prefers 12d over 1d, node 2 prefers 23d over 2d, and node 3 prefers 31d over 3d • in fact, sometimes a stable state does not even exist
Implications of BGP Instability • almost 50% of problems with VoIP result from bad BGP convergence…
Internet Routing as a Game • BGP can be modeled as best-response dynamics! • the (source) nodes are the players • player i’s strategy set is Si = N(i), where N(i) is the set of i’s neighbors • player i’s utility from strategy vector s is ui(s) = i’s rank for the route from i to d in the directed graph induced by s (the more preferred a route, the higher its rank) • a PNE in this game corresponds to a stable routing state (the two-node example above is reproduced in the short sketch below)
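A minimal Python sketch (my addition, not from the slides) of BGP as best-response dynamics on the two-node example: node 1 prefers the route 1-2-d over 1-d, node 2 prefers 2-1-d over 2-d, and each node repeatedly adopts its highest-ranked route that is consistent with its neighbor’s current choice.

```python
rank = {
    1: {(1, 2, 'd'): 2, (1, 'd'): 1},   # node 1: prefer routes through 2
    2: {(2, 1, 'd'): 2, (2, 'd'): 1},   # node 2: prefer routes through 1
}

def available_routes(i, choice):
    """Routes node i can use given the other node's currently chosen route."""
    other = 2 if i == 1 else 1
    routes = [(i, 'd')]                              # the direct route is always available
    if choice[other] is not None and i not in choice[other]:
        routes.append((i,) + choice[other])          # extend the neighbor's route
    return routes

choice = {1: None, 2: None}
for _ in range(10):                                  # activate nodes in turn
    for i in (1, 2):
        choice[i] = max(available_routes(i, choice), key=lambda r: rank[i].get(r, 0))
print(choice)
```

With this activation order the dynamics stabilize at node 1 routing directly via (1, d) and node 2 routing through 1 via (2, 1, d), which is a stable state (a PNE of the routing game on this instance).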
Next-Hop Preferences • a node i has next-hop preferences if all paths that go through the same neighbor have the same rank • i.e., i’s route preferences depend only on its “next-hop node” (slide figure: two routes R1 and R2 from i to d that share the same next hop k receive the same rank)
Positive Result • Theorem: when all nodes have next-hop preferences, the Internet routing game is a potential game • a PNE (= stable state) always exists • better-response (and best-response) dynamics converge to a PNE • Proof (sketch): we define the (exact) potential function Φ: S → R as Φ(s) = Σi ui(s)
Positive Result (Proof Sketch) • need to prove that Φ(ti, s-i) – Φ(si, s-i) = ui(ti, s-i) – ui(si, s-i) for every player i and strategies si, ti • observe that, under next-hop preferences, a change in i’s strategy does not affect the utility of any player other than (possibly) i: any node j routing through i still uses the same next hop, so its rank is unchanged • hence Φ(ti, s-i) – Φ(si, s-i) = Σj uj(ti, s-i) – Σj uj(si, s-i) = ui(ti, s-i) – ui(si, s-i)
Other Game Dynamics • We will next learn about other dynamics that converge to equilibria in games. • But first…
Motivation Many situations involve online repeated decision making in an uncertain environment. • Deciding how to invest your money (buy or sell stocks) • What route to drive to work each day • …
Online learning, minimizing regret, and combining expert advice (slide figure: a learner consulting Expert 1, Expert 2, Expert 3)
Using “Expert” Advice • Assume we want to predict the stock market. • We solicit n “experts” for their advice: will the market go up or down? • We then want to use their advice somehow to make our prediction. • E.g., can we do nearly as well as the best expert in hindsight? • Note: an “expert” is just someone with an opinion (not necessarily someone who knows anything).
Formal Model • There are n experts. • At each round t = 1, 2, …, T: • each expert makes a prediction in {0,1} • the learner (using the experts’ predictions) makes a prediction in {0,1} • the learner then observes the actual outcome; a mistake occurs if the predicted outcome differs from the actual outcome • the learner gets to update its hypothesis • Can we do nearly as well as the best expert in hindsight?
Formal Model • There are n experts. • At each round t = 1, 2, …, T: • each expert makes a prediction in {0,1} • the learner (using the experts’ predictions) makes a prediction in {0,1} • the learner then observes the actual outcome; a mistake occurs if the predicted outcome differs from the actual outcome • Can we do nearly as well as the best expert in hindsight? • We are not given any other information besides the experts’ yes/no answers. • We make no assumptions about the quality or independence of the experts. • So we cannot hope to achieve an absolute level of quality in our predictions.
Simpler Question • We have n “experts”. • One of these experts is perfect (never makes a mistake); we don’t know which one. • Is there a strategy that makes no more than lg(n) mistakes?
Halving Algorithm • Take a majority vote over all experts that have been correct so far: if # surviving experts predicting 1 > # surviving experts predicting 0, then predict 1; else predict 0. • Claim: if one of the experts is perfect, then we make at most lg(n) mistakes. • Proof: each mistake cuts the # of surviving experts by a factor of 2, so we make ≤ lg(n) mistakes. • Note: this means it is fine for n to be very large. (A short code sketch follows.)
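A minimal Python sketch of the halving algorithm (my addition, not from the slides; the (predictions, outcome) round format is illustrative):

```python
def halving(rounds):
    """`rounds` is an iterable of (predictions, outcome) pairs with 0/1 entries."""
    alive = None                    # indices of experts that have never erred
    mistakes = 0
    for predictions, outcome in rounds:
        if alive is None:
            alive = set(range(len(predictions)))
        votes_for_1 = sum(predictions[i] for i in alive)
        guess = 1 if votes_for_1 > len(alive) - votes_for_1 else 0
        if guess != outcome:
            mistakes += 1           # each mistake at least halves |alive|
        alive = {i for i in alive if predictions[i] == outcome}
    return mistakes                 # at most lg(n) if some expert is perfect
```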
Using “Expert” Advice • If one expert is perfect, we get ≤ lg(n) mistakes with the halving algorithm. • But what if no expert is perfect? Can we do nearly as well as the best one in hindsight?
Using “Expert” Advice • Strategy #1: iterated halving algorithm. • Same as before, but once we’ve crossed off all the experts, restart from the beginning. • Makes at most log(n)·[OPT+1] mistakes, where OPT is the # of mistakes of the best expert in hindsight. • Why: divide the whole history into epochs; an epoch begins when we restart halving and ends when we have crossed off all the available experts. By the end of an epoch every single expert has made a mistake, so in particular the best expert made a mistake during the epoch; hence there are at most OPT+1 epochs, and we make at most log(n) mistakes per epoch. • If OPT = 0 we recover the previous guarantee.
Using “Expert” Advice • Strategy #1: iterated halving algorithm. • Same as before, but once we’ve crossed off all the experts, restart from the beginning. • Makes at most log(n)·[OPT+1] mistakes, where OPT is the # of mistakes of the best expert in hindsight. • This is wasteful: we are constantly forgetting what we’ve “learned”. Can we do better?
Weighted Majority Algorithm • Key point: a mistake doesn’t completely disqualify an expert; instead of crossing it off, just lower its weight. • Weighted Majority Algorithm: • Start with all experts having weight 1. • Predict based on a weighted majority vote: if the total weight of experts predicting 1 is at least the total weight of experts predicting 0, then predict 1; else predict 0.
Weighted Majority Algorithm • Key point: a mistake doesn’t completely disqualify an expert; instead of crossing it off, just lower its weight. • Weighted Majority Algorithm: • Start with all experts having weight 1. • Predict based on a weighted majority vote. • Penalize mistakes by cutting the weight in half. (A short code sketch follows.)
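A minimal Python sketch of the deterministic weighted majority algorithm with halving penalties (my addition, not from the slides; the round format is the same illustrative one used above):

```python
def weighted_majority(rounds, n):
    """`rounds` yields (predictions, outcome) pairs; n is the number of experts."""
    w = [1.0] * n                          # start with all experts at weight 1
    mistakes = 0
    for predictions, outcome in rounds:
        weight_for_1 = sum(w[i] for i in range(n) if predictions[i] == 1)
        weight_for_0 = sum(w[i] for i in range(n) if predictions[i] == 0)
        guess = 1 if weight_for_1 >= weight_for_0 else 0
        if guess != outcome:
            mistakes += 1
        for i in range(n):
            if predictions[i] != outcome:  # penalize a mistake by halving the weight
                w[i] *= 0.5
    return mistakes
```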
Analysis: Does Nearly as Well as the Best Expert • Theorem: if M = # of mistakes we’ve made so far and OPT = # of mistakes the best expert has made so far, then M ≤ 2.4·(OPT + lg n).
Analysis: Does Nearly as Well as the Best Expert • Theorem: if M = # of mistakes we’ve made so far and OPT = # of mistakes the best expert has made so far, then M ≤ 2.4·(OPT + lg n). • Proof: analyze W = total weight (starts at n). • After each of our mistakes, W drops by at least 25% (at least half the weight was on wrong experts, and that weight is halved). • So, after M mistakes, W is at most n·(3/4)^M. • The weight of the best expert after it makes OPT mistakes is (1/2)^OPT, so (1/2)^OPT ≤ n·(3/4)^M. • Taking logs and rearranging: M ≤ (OPT + lg n) / lg(4/3) ≈ 2.4·(OPT + lg n), i.e., our mistakes stay within a constant ratio of OPT (plus lg n).
Randomized Weighted Majority • M ≤ 2.4·(OPT + lg n) is not so good if the best expert makes a mistake 30% of the time. Can we do better? • Yes. Instead of taking a majority vote, use the weights as probabilities and predict each outcome with probability proportional to its weight (e.g., if 70% of the weight is on “up” and 30% on “down”, then predict up:down with probability 70:30). • Key point: smooth out the worst case.
Randomized Weighted Majority • M ≤ 2.4·(OPT + lg n) is not so good if the best expert makes a mistake 20% of the time. Can we do better? • Yes. Instead of taking a majority vote, use the weights as probabilities (e.g., if 70% of the weight is on “up” and 30% on “down”, then predict up:down with probability 70:30). • Also, generalize the penalty ½ to 1 − ε. • This is equivalent to selecting an expert with probability proportional to its weight. (A short code sketch follows.)
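A minimal Python sketch of randomized weighted majority (my addition, not from the slides), selecting an expert with probability proportional to its weight and penalizing every wrong expert by a factor of (1 − ε):

```python
import random

def randomized_weighted_majority(rounds, n, eps=0.1):
    """`rounds` yields (predictions, outcome) pairs; n is the number of experts."""
    w = [1.0] * n
    mistakes = 0
    for predictions, outcome in rounds:
        # pick an expert with probability proportional to its weight and follow it
        i = random.choices(range(n), weights=w)[0]
        if predictions[i] != outcome:
            mistakes += 1
        for j in range(n):
            if predictions[j] != outcome:
                w[j] *= (1 - eps)   # penalize every wrong expert by a factor (1 - eps)
    return mistakes                  # E[mistakes] <= (1 + eps) * OPT + (1/eps) * ln(n)
```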
Formal Guarantee for RWM • Theorem: if M = expected # of mistakes we’ve made so far and OPT = # of mistakes the best expert has made so far, then M ≤ (1 + ε)·OPT + (1/ε)·log(n).
Analysis • Key idea: if the algorithm has significant expected loss, then the total weight must drop substantially. • Say at time t we have a fraction Ft of the weight on experts that make a mistake at time t, i.e., Ft = (weight of experts that err at time t) / Wt, where Wt is the total weight at time t. • For all t, Wt+1 = Wt·(1 − εFt), since an εFt fraction of the total weight is removed. • Ft is our expected loss at time t: it is exactly the probability that we make a mistake at time t.
Analysis • Say at time t we have a fraction Ft of the weight on experts that made a mistake. • So, we have probability Ft of making a mistake, and we remove an εFt fraction of the total weight: • Wfinal = n·(1 − εF1)·(1 − εF2)·… • ln(Wfinal) = ln(n) + Σt ln(1 − εFt) < ln(n) − ε·Σt Ft (using ln(1 − x) < −x) = ln(n) − εM (since Σt Ft = E[# mistakes] = M) • If the best expert makes OPT mistakes, then Wfinal ≥ (1 − ε)^OPT, so ln(Wfinal) ≥ OPT·ln(1 − ε). • Now solve: ln(n) − εM > OPT·ln(1 − ε).
Randomized Weighted Majority • Solving ln(n) − εM > OPT·ln(1 − ε) for M gives M < (1/ε)·ln(n) − (1/ε)·OPT·ln(1 − ε) ≤ (1 + ε)·OPT + (1/ε)·ln(n) (using −ln(1 − ε) ≤ ε + ε² for ε ≤ ½).
Additive Regret Bounds • So we have M < OPT + ε·OPT + (1/ε)·log(n). • Say we know we will play for T time steps. Then we can set ε = (log(n)/T)^{1/2} and get M < OPT + 2·(T·log(n))^{1/2}. • If we don’t know T in advance, we can guess and double. • These are called “additive regret” bounds. (The balancing step is spelled out below.)
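To see where the 2·(T·log(n))^{1/2} term comes from (a standard balancing step, filled in here rather than taken from the slides): bound OPT by T in the ε·OPT term and choose ε to balance the two regret terms.

```latex
M \le \mathrm{OPT} + \varepsilon\,\mathrm{OPT} + \tfrac{1}{\varepsilon}\log n
  \le \mathrm{OPT} + \varepsilon T + \tfrac{1}{\varepsilon}\log n,
\qquad\text{and with } \varepsilon = \sqrt{\tfrac{\log n}{T}}:\qquad
M \le \mathrm{OPT} + 2\sqrt{T\log n}.
```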
Extensions: Experts as Actions • What if the experts are actions? • different ways to drive to work each day • different ways to invest our money • rows in a matrix game… • At each time t, each action has a loss (cost) in {0,1}. • We can still run the algorithm: rather than viewing it as “pick a prediction with probability proportional to its weight”, view it as “pick an expert with probability proportional to its weight”. • The same analysis applies.
Extensions: Experts as Actions • Note: we did not need to see the experts’ predictions in order to select an expert; we only needed to see their losses in order to update our weights.
Extensions: Losses in [0,1] • What if the experts’ losses are not in {0,1} but in the continuous interval [0,1]? • If expert i has loss li, do: wi := wi·(1 − ε·li). (Before, if an expert had a loss of 1 we multiplied its weight by (1 − ε), and if it had a loss of 0 we left it alone; now we interpolate linearly in between.) • The same analysis applies. (A short code sketch of this update follows.)
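A minimal Python sketch of the update for losses in [0,1] (my addition, not from the slides): an expert with loss li has its weight multiplied by (1 − ε·li), which interpolates linearly between no penalty (loss 0) and the full (1 − ε) penalty (loss 1).

```python
def update_weights(w, losses, eps=0.1):
    """w: current expert weights; losses: per-expert losses in [0, 1]."""
    # weight update for the continuous-loss extension: w_i <- w_i * (1 - eps * l_i)
    return [wi * (1 - eps * li) for wi, li in zip(w, losses)]
```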