Smoothness and Learning Equilibria Price of Anarchy Seminar Presenting: Aviv Kuvent Professor: Michal Feldman
Reminder • PoA bound results for PNE, MNE. • However: • PNE need not always exist • MNE is hard to compute • This motivates 2 additional types of equilibrium: • Correlated Equilibrium (CE) • Coarse Correlated Equilibrium (CCE)
Reminder - CE • Definition (CE): • A distribution σ on the set of outcomes of a cost-minimization game is a correlated equilibrium if for every player $i$, strategy $s_i \in A_i$, and every deviation $s_i' \in A_i$: $\mathbb{E}_{s \sim \sigma}[C_i(s) \mid s_i] \le \mathbb{E}_{s \sim \sigma}[C_i(s_i', s_{-i}) \mid s_i]$
Reminder - CCE • Definition (CCE): • A distribution σ on the set of outcomes of a cost-minimization game is a coarse correlated equilibrium if for every player $i$ and every deviation $s_i' \in A_i$: $\mathbb{E}_{s \sim \sigma}[C_i(s)] \le \mathbb{E}_{s \sim \sigma}[C_i(s_i', s_{-i})]$
Reminder • We saw that in smooth games, the PoA bounds proven for PNE/MNE also hold for CE/CCE. • CE/CCE are easy to compute: • Linear Programming • No Regret Dynamics
The Game Model • Single player • Set of actions $A = \{1, \ldots, n\}$, where $|A| = n$ • Adversary • $T$ time steps
The Game Model • At time $t = 1, \ldots, T$: • Player picks a distribution $p^t$ over his action set $A$. • Adversary picks a cost vector $c^t : A \to [0, 1]$ • An action $a^t$ is chosen according to the distribution $p^t$, and the player incurs the cost $c^t(a^t)$. • Player is aware of the values of the entire cost vector $c^t$ (Full Information Model) • Partial Information Model: Player only knows the value $c^t(a^t)$ of the incurred cost.
Regret • The difference between the cost of our algorithm and the cost of a chosen benchmark • $B$ – the benchmark we choose. • Time-averaged Regret: $\frac{1}{T}\left(\sum_{t=1}^{T} c^t(a^t) - B\right)$
No Regret Algorithm • An algorithm has no regret if for every sequence of cost vectors, the expected time-averaged regret $\to 0$ as $T \to \infty$. • Equivalently, we would like to show that: $\mathbb{E}\left[\sum_{t=1}^{T} c^t(a^t)\right] \le B + o(T)$ • Goal: develop such an algorithm • Which benchmark to use?
Optimal Sequence • Or "Best Action in Hindsight": the benchmark $\sum_{t=1}^{T} \min_{a \in A} c^t(a)$ • Too strong to be a useful benchmark. • Theorem: • For any algorithm $A$ there exists a sequence of cost vectors such that $\mathbb{E}[L^T_A] \ge T \cdot \frac{n-1}{n}$ while the optimal sequence has cost 0, where $n$ is the size of the action set of the player.
Optimal Sequence • Proof: • Build the following sequence: at time $t$, assign cost 0 to the action with the smallest probability under $p^t$ and cost 1 to all other actions • ⇒ Expected cost of $A$ at time $t$ is at least $1 - \frac{1}{n}$ • The cost of $A$ for $T$ time steps is at least $T \cdot \frac{n-1}{n}$ • An optimal sequence of cost 0 exists (play the zero-cost action at each step). ⇒ the regret with respect to the optimal sequence is at least $T \cdot \frac{n-1}{n}$
External Regret • Benchmark – the best fixed action: $OPT^T = \min_{a \in A} \sum_{t=1}^{T} c^t(a)$ • Playing the same action in all time steps • External Regret: $\sum_{t=1}^{T} c^t(a^t) - OPT^T$
The Greedy Algorithm • First attempt to develop a no-external-regret algorithm • Notations: • The cumulative cost of action $a$ up to time $t$: $L^t_a = \sum_{\tau=1}^{t} c^\tau(a)$ • The cost of the best fixed action: $OPT^t = \min_{a \in A} L^t_a$ • The cost of the greedy algorithm: $L^t_{greedy} = \sum_{\tau=1}^{t} c^\tau(a^\tau)$
The Greedy Algorithm • For simplicity, assume that $c^t(a) \in \{0, 1\}$. • The idea – choose the action which incurred minimal cost so far. • Greedy Algorithm: • Initially: select $a^1$ to be some arbitrary action in the action set $A$. • At time $t$: • Let $S^{t-1} = \{a \in A : L^{t-1}_a = OPT^{t-1}\}$, the set of actions with minimal cumulative cost so far • Choose $a^t$ to be the action with the smallest index in $S^{t-1}$.
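A minimal sketch of this rule, assuming binary costs and smallest-index tie-breaking as above (the function and variable names are illustrative, not from the slides):

```python
def greedy_action(cum_cost):
    """Return the lowest-index action among those with minimal cumulative cost so far."""
    best = min(cum_cost)
    return cum_cost.index(best)    # index() returns the smallest index attaining the minimum

def run_greedy(cost_vectors):
    """cost_vectors: a list of length-n lists, each giving the {0,1} cost of every action."""
    n = len(cost_vectors[0])
    cum_cost = [0] * n             # L^t_a for every action a
    total = 0                      # cost incurred by the greedy algorithm
    for c in cost_vectors:
        a = greedy_action(cum_cost)
        total += c[a]
        cum_cost = [s + x for s, x in zip(cum_cost, c)]
    return total

# Example: 2 actions over 3 time steps.
print(run_greedy([[1, 0], [1, 0], [0, 1]]))
```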
The Greedy Algorithm • Theorem: • For any sequence of losses, the greedy algorithm has $L^T_{greedy} \le n \cdot OPT^T + (n - 1)$ • Proof: • At each time where the greedy algorithm incurs a cost of 1 and $OPT^t$ does not increase, at least one action is removed from the set $S^t$. • This can occur at most $n$ times before $OPT^t$ increases by 1. • Therefore, the greedy algorithm incurs cost at most $n$ between successive increments of $OPT^t$ ⇒ $L^T_{greedy} \le n \cdot OPT^T + (n - 1)$ • The additional $(n - 1)$ is for the cost incurred in the steps since the last time $OPT^t$ increased.
Deterministic Algorithms • The greedy algorithm has a cost at most a factor of $n$ larger than that of the best fixed action – not a "no-regret algorithm" • Theorem: For any deterministic algorithm $A$, there exists a cost sequence for which $L^T_A = T$ and $OPT^T \le \frac{T}{n}$. • Corollary: There does not exist a deterministic no-regret algorithm.
Deterministic Algorithms • Proof: • Fix a deterministic algorithm $A$ • $a^t$ – the action it selects at time $t$ (known to the adversary, since $A$ is deterministic) • Generate the following cost sequence: $c^t(a^t) = 1$ and $c^t(a) = 0$ for every $a \ne a^t$ • $A$ incurs a cost of 1 in each time step ⇒ $L^T_A = T$ • $A$ selects some action at most $\frac{T}{n}$ times; that action is the best fixed action ⇒ $OPT^T \le \frac{T}{n}$ ⇒ the time-averaged regret is at least $1 - \frac{1}{n}$, not dependent on $T$ ⇒ $A$ is not a no-regret algorithm.
Lower Bound on Regret • Theorem: • Even for $n = 2$ actions, no (randomized) algorithm has expected time-averaged regret which vanishes faster than $\Theta\!\left(\frac{1}{\sqrt{T}}\right)$. • Proof: • Let $A$ be a randomized algorithm. • Adversary: on each time step $t$, chooses uniformly at random between two cost vectors – $(1, 0)$ and $(0, 1)$ • ⇒ $\mathbb{E}[L^T_A] = \frac{T}{2}$. • Think of the adversary as flipping $T$ fair coins: the expected number of heads is $\frac{T}{2}$, with standard deviation $\frac{\sqrt{T}}{2}$.
Lower Bound on Regret • Proof (cont): • ⇒ In hindsight, the cheaper of the two actions has an expected cost of $\frac{T}{2} - \Theta(\sqrt{T})$, the other an expected cost of $\frac{T}{2} + \Theta(\sqrt{T})$. • The cheaper action is the best fixed action ⇒ $\mathbb{E}[OPT^T] = \frac{T}{2} - \Theta(\sqrt{T})$ • ⇒ the expected regret is $\frac{T}{2} - \mathbb{E}[OPT^T] = \Theta(\sqrt{T})$, i.e. $\Theta\!\left(\frac{1}{\sqrt{T}}\right)$ time-averaged. • General case of $n$ actions, similar argument ⇒ lower bound of $\Omega(\sqrt{T \ln n})$, i.e. $\Omega\!\left(\sqrt{\frac{\ln n}{T}}\right)$ time-averaged.
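Making the $\sqrt{T}$ term explicit for the two-action case above (a worked step, writing $H$ for the number of steps in which the cost vector $(1, 0)$ was drawn):

\[
H \sim \mathrm{Bin}\!\left(T, \tfrac{1}{2}\right), \qquad
OPT^T = \min(H,\, T - H) = \frac{T}{2} - \left|H - \tfrac{T}{2}\right|,
\]
\[
\mathbb{E}\!\left[\left|H - \tfrac{T}{2}\right|\right] = \Theta(\sqrt{T})
\;\Rightarrow\;
\mathbb{E}\!\left[L^T_A - OPT^T\right] = \frac{T}{2} - \left(\frac{T}{2} - \Theta(\sqrt{T})\right) = \Theta(\sqrt{T}).
\]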
Multiplicative Weights (MW) • The idea: • Keep a weight for each action • Probability of playing an action depends on its weight • The weights are updated over time; used to "punish" actions which incurred cost. • Notations: • $w^t(a)$ – weight of action $a$ at time $t$ • $W^t = \sum_{a \in A} w^t(a)$ – sum of the weights of the actions at time $t$ • $c^t(a)$ – cost of action $a$ at time $t$ • $C^t_{MW} = \sum_{a \in A} p^t(a)\, c^t(a)$ – expected cost of the MW algorithm at time $t$ • $C_{MW} = \sum_{t=1}^{T} C^t_{MW}$ – expected cost of the MW algorithm • $OPT = \min_{a \in A} L^T_a$ – cost of the best fixed action
Multiplicative Weights (MW) • MW Algorithm: • Initially: $w^1(a) = 1$ for each $a \in A$ • At time step $t$: • Play an action according to the distribution $p^t$, where $p^t(a) = \frac{w^t(a)}{W^t}$ • Given the cost vector $c^t$, decrease the weights: $w^{t+1}(a) = w^t(a) \cdot (1 - \varepsilon)^{c^t(a)}$
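A minimal sketch of one round of this algorithm (the function names and the example `eps` value are illustrative; the choice of $\varepsilon$ is discussed on the next slides):

```python
import random

def mw_distribution(weights):
    """The playing distribution p^t: weights normalized by their sum W^t."""
    total = sum(weights)
    return [w / total for w in weights]

def mw_step(weights, cost_vector, eps):
    """Sample an action from p^t, then punish every action by the factor (1 - eps)^cost."""
    p = mw_distribution(weights)
    action = random.choices(range(len(weights)), weights=p)[0]
    new_weights = [w * (1 - eps) ** c for w, c in zip(weights, cost_vector)]
    return action, new_weights

# Example: 3 actions, all weights initialized to 1; costs are assumed to lie in [0, 1].
weights = [1.0, 1.0, 1.0]
action, weights = mw_step(weights, cost_vector=[0.2, 1.0, 0.0], eps=0.1)
```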
Multiplicative Weights (MW) • What is $\varepsilon$? • As $\varepsilon$ tends towards 0, the distribution tends towards a uniform distribution (exploration) • As $\varepsilon$ tends towards 1, the distribution tends towards a distribution which places most of the mass on the action which incurred the minimal cost so far (exploitation). • We will set a specific $\varepsilon$ later; for now assume $\varepsilon \le \frac{1}{2}$
Multiplicative Weights (MW) • Theorem: • MW is a no-regret algorithm, with expected (time-averaged) external regret of $O\!\left(\sqrt{\frac{\ln n}{T}}\right)$. • Proof: • The weight of every action (and therefore their sum, $W^t$) can only decrease with $t$. • The idea of the proof – relate $OPT$ and $C_{MW}$ to the final total weight $W^{T+1}$, and from there get the bound.
Multiplicative Weights (MW) • Proof (cont): • Relate $OPT$ to $W^{T+1}$: • Let $a^*$ be a best fixed action, i.e. one with cost $OPT$ • (if no fixed action has low cost, we can always get low regret with respect to the fixed-action benchmark) • $w^{T+1}(a^*)$: the weight of action $a^*$ at time $T+1$ • ⇒ $W^{T+1} \ge w^{T+1}(a^*) = \prod_{t=1}^{T} (1 - \varepsilon)^{c^t(a^*)} = (1 - \varepsilon)^{OPT}$
Multiplicative Weights (MW) • Proof (cont): • Relate $C_{MW}$ to $W^{T+1}$: • At time $t$: $W^{t+1} = \sum_{a \in A} w^{t+1}(a) = \sum_{a \in A} w^t(a)(1 - \varepsilon)^{c^t(a)}$ • $(1 - \varepsilon)^x \le 1 - \varepsilon x$ for $x \in [0, 1]$
Multiplicative Weights (MW) • Proof (cont): • ⇒ $W^{t+1} \le \sum_{a \in A} w^t(a)\left(1 - \varepsilon c^t(a)\right) = W^t\left(1 - \varepsilon \sum_{a \in A} p^t(a)\, c^t(a)\right) = W^t\left(1 - \varepsilon C^t_{MW}\right)$ • So: $W^{T+1} \le W^1 \prod_{t=1}^{T}\left(1 - \varepsilon C^t_{MW}\right) = n \prod_{t=1}^{T}\left(1 - \varepsilon C^t_{MW}\right)$
Multiplicative Weights (MW) • Proof (cont): • We got: $(1 - \varepsilon)^{OPT} \le W^{T+1}$, $W^{T+1} \le n \prod_{t=1}^{T}\left(1 - \varepsilon C^t_{MW}\right)$ • ⇒ $(1 - \varepsilon)^{OPT} \le n \prod_{t=1}^{T}\left(1 - \varepsilon C^t_{MW}\right)$; taking $\ln$ of both sides: $OPT \cdot \ln(1 - \varepsilon) \le \ln n + \sum_{t=1}^{T} \ln\left(1 - \varepsilon C^t_{MW}\right)$ • Taylor expansion of $\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots$ • So $\ln(1 - x) \le -x$, and because $\varepsilon \le \frac{1}{2}$, $\ln(1 - \varepsilon) \ge -\varepsilon - \varepsilon^2$
Multiplicative Weights (MW) • Proof (cont): • ⇒ $OPT \cdot \left(-\varepsilon - \varepsilon^2\right) \le \ln n - \varepsilon \sum_{t=1}^{T} C^t_{MW} = \ln n - \varepsilon C_{MW}$ • ⇒ $C_{MW} \le OPT(1 + \varepsilon) + \frac{\ln n}{\varepsilon}$
Multiplicative Weights (MW) • Proof (cont): • We take $\varepsilon = \sqrt{\frac{\ln n}{T}}$ ⇒ (using $OPT \le T$): $C_{MW} \le OPT + 2\sqrt{T \ln n}$, i.e. expected time-averaged regret of at most $2\sqrt{\frac{\ln n}{T}}$ • This assumes that $T$ is known in advance. If it is not known, at time step $t$ take $\varepsilon_t = \sqrt{\frac{\ln n}{2^k}}$, where $2^k$ is the largest power of 2 smaller than $t$ (doubling trick).
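One possible reading of the doubling trick in code (a sketch; the slides do not spell out the exact formula, so taking the largest power of 2 not exceeding $t$ as the current horizon guess is an assumption here):

```python
import math

def doubling_epsilon(t, n):
    """epsilon_t = sqrt(ln(n) / 2^k), where 2^k is the largest power of 2 <= t (t >= 1)."""
    horizon = 2 ** (t.bit_length() - 1)   # largest power of 2 not exceeding t
    return math.sqrt(math.log(n) / horizon)

# Example: n = 4 actions; epsilon shrinks by a factor of sqrt(2) each time the horizon guess doubles.
print([round(doubling_epsilon(t, 4), 3) for t in (1, 2, 4, 8, 16, 32)])
```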
Multiplicative Weights (MW) • Up to the factor of 2, we have reached the lower bound for the external regret rate ($\Omega\!\left(\sqrt{\frac{\ln n}{T}}\right)$). • To achieve a time-averaged regret of at most $\delta$, we only need $T = O\!\left(\frac{\ln n}{\delta^2}\right)$ iterations of the MW algorithm.
No Regret Dynamics • Moving from a single-player setting to a multi-player setting • In each time step $t = 1, \ldots, T$: • Each player $i$ independently chooses a mixed strategy $p^t_i$ using some no-regret algorithm. • Each player receives a cost vector $c^t_i$, which is the expected cost of each of the possible actions of player $i$, given the mixed strategies played by the other players: $c^t_i(a_i) = \mathbb{E}_{a_{-i} \sim p^t_{-i}}\left[C_i(a_i, a_{-i})\right]$, where $p^t_{-i} = \prod_{j \ne i} p^t_j$
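A minimal sketch of these dynamics for a two-player cost-minimization game, with both players running the MW update from above (the cost-matrix representation and function names are illustrative; costs are assumed to lie in [0, 1]):

```python
def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def expected_cost_vector(my_costs, opponent_dist):
    """c_i^t(a): expected cost of each of my actions against the opponent's mixed strategy."""
    return [sum(p * row[b] for b, p in enumerate(opponent_dist)) for row in my_costs]

def no_regret_dynamics(cost_1, cost_2, T, eps=0.1):
    """cost_1[a][b] / cost_2[b][a]: players' costs when player 1 plays a and player 2 plays b."""
    w1, w2 = [1.0] * len(cost_1), [1.0] * len(cost_2)
    outcomes = []
    for _ in range(T):
        p1, p2 = normalize(w1), normalize(w2)
        outcomes.append((p1, p2))                           # the product distribution sigma^t
        c1 = expected_cost_vector(cost_1, p2)               # cost vector handed to player 1
        c2 = expected_cost_vector(cost_2, p1)               # cost vector handed to player 2
        w1 = [w * (1 - eps) ** c for w, c in zip(w1, c1)]   # MW update for player 1
        w2 = [w * (1 - eps) ** c for w, c in zip(w2, c2)]   # MW update for player 2
    return outcomes   # the uniform mixture over these outcomes approximates a CCE
```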
No Regret Dynamics and CCE • Theorem: • Let $\sigma^1, \ldots, \sigma^T$ (where $\sigma^t = \prod_i p^t_i$) be the outcome sequence generated by the no-regret dynamics after $T$ time steps. • Let $\sigma$ be the uniform distribution over the multi-set of the outcomes $\sigma^1, \ldots, \sigma^T$. • $\sigma$ is an approximate Coarse Correlated Equilibrium (CCE). • Corollary: smooth-game PoA bounds apply (approximately) to no-regret dynamics.
No Regret Dynamics and CCE • Proof: • Player $i$ has external regret $R_i$ • From the definition of $\sigma$: • Expected cost of player $i$ when playing according to his no-regret algorithm: $\mathbb{E}_{s \sim \sigma}[C_i(s)] = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{s \sim \sigma^t}[C_i(s)]$ • Expected cost of player $i$ when always playing the fixed strategy $s_i'$: $\mathbb{E}_{s \sim \sigma}[C_i(s_i', s_{-i})] = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{s \sim \sigma^t}[C_i(s_i', s_{-i})]$
No Regret Dynamics and CCE • Proof (cont): • For any fixed action $s_i'$, the no-regret guarantee gives: $\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{s \sim \sigma^t}[C_i(s)] \le \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{s \sim \sigma^t}[C_i(s_i', s_{-i})] + \frac{R_i}{T}$ • ⇒ $\mathbb{E}_{s \sim \sigma}[C_i(s)] \le \mathbb{E}_{s \sim \sigma}[C_i(s_i', s_{-i})] + \frac{R_i}{T}$ • All players play no-regret algorithms: $\frac{R_i}{T} \to 0$ as $T \to \infty$, and we get the definition of CCE.
Swap Regret • Another benchmark • Define a switching function $\delta : A \to A$ on the actions • Swap Regret (with respect to $\delta$): $\sum_{t=1}^{T} c^t(a^t) - \sum_{t=1}^{T} c^t(\delta(a^t))$ • External regret is the special case of swap regret where we only look at the constant switching functions $\delta(a) \equiv a'$.
No Swap Regret Algorithm • Definition: • An algorithm has no swap regret if for all cost vectors and all switching functions $\delta$, the expected (time-averaged) swap regret $\frac{1}{T}\left(\mathbb{E}\left[\sum_{t=1}^{T} c^t(a^t)\right] - \mathbb{E}\left[\sum_{t=1}^{T} c^t(\delta(a^t))\right]\right) \to 0$ as $T \to \infty$
Swap-External Reduction • Black-box reduction from no swap regret to no external regret • (Corollary: there exist poly-time no-swap-regret algorithms) • Notations: • $n$ – number of actions • $M_1, \ldots, M_n$ – instantiations of no-external-regret algorithms. • At each time step, algorithm $M_j$ receives a cost vector and returns a probability distribution $q^t_j$ over the actions.
Swap-External Reduction • Reduction: • The main algorithm $M$, at time $t$: • Receive distributions $q^t_1, \ldots, q^t_n$ over actions from the algorithms $M_1, \ldots, M_n$ • Compute and output a consensus distribution $p^t$ • Receive a cost vector $c^t$ from the adversary • Give algorithm $M_j$ the scaled cost vector $p^t(j) \cdot c^t$ • Goal: use the no-external-regret guarantee of the algorithms $M_j$, and get a no-swap-regret guarantee for $M$. • Need to show how to compute $p^t$ (a code sketch of a full round appears after the Markov-chain construction below)
Swap-External Reduction • [Diagram: the master algorithm $M$ combines the distributions $q^t_1, \ldots, q^t_n$ returned by $M_1, \ldots, M_n$ into the consensus distribution $p^t$, and hands each $M_j$ the scaled cost vector $p^t(j) \cdot c^t$]
Swap-External Reduction • Theorem: • This reduction provides a no-swap-regret algorithm • Proof: • Time-averaged expected cost of $M$: $\frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A} p^t(a)\, c^t(a)$ • Time-averaged expected cost under some switching function $\delta$: $\frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A} p^t(a)\, c^t(\delta(a))$
Swap-External Reduction • Proof (cont): • Want to show: for all switching functions $\delta$, $\frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A} p^t(a)\, c^t(a) \le \frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A} p^t(a)\, c^t(\delta(a)) + R$, where $R \to 0$ when $T \to \infty$ • Time-averaged expected cost of the no-external-regret algorithm $M_j$ (which sees the scaled cost vectors $p^t(j)\, c^t$ and plays $q^t_j$): $\frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A} q^t_j(a)\, p^t(j)\, c^t(a)$ • $M_j$ is a no-regret algorithm ⇒ for every fixed action $a'$: $\frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A} q^t_j(a)\, p^t(j)\, c^t(a) \le \frac{1}{T}\sum_{t=1}^{T} p^t(j)\, c^t(a') + R_j$, where $R_j \to 0$ as $T \to \infty$
Swap-External Reduction • Proof (cont): • Fix $\delta$. • Summing the inequality over $j = 1, \ldots, n$, with $a' = \delta(j)$ for each $j$: $\frac{1}{T}\sum_{t=1}^{T}\sum_{a \in A}\left(\sum_{j=1}^{n} p^t(j)\, q^t_j(a)\right) c^t(a) \le \frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{n} p^t(j)\, c^t(\delta(j)) + \sum_{j=1}^{n} R_j$
Swap-External Reduction • Proof (cont): • For each $j$, $R_j \to 0$ as $T \to \infty$, and $n$ is fixed ⇒ $R = \sum_{j=1}^{n} R_j \to 0$ as $T \to \infty$ • If $p^t(a) = \sum_{j=1}^{n} p^t(j)\, q^t_j(a)$ for every $a$, then the left-hand side is exactly the cost of $M$ and the right-hand side is the cost under $\delta$ plus $R$, so we've shown the desired inequality, with $R \to 0$ when $T \to \infty$, as needed. • Need to choose $p^t$ so that for every action $a$: $p^t(a) = \sum_{j=1}^{n} p^t(j)\, q^t_j(a)$
Swap-External Reduction • Proof (cont): • Such a $p^t$ is a stationary distribution of a Markov chain. • Build the following Markov chain: • The states – the action set $A = \{1, \ldots, n\}$ • For every $j, a \in A$, the transition probability from $j$ to $a$ is $q^t_j(a)$
Swap-External Reduction • Proof (cont): • Every stationary distribution $p^t$ of this Markov chain satisfies $p^t(a) = \sum_{j=1}^{n} p^t(j)\, q^t_j(a)$ • This is an ergodic Markov chain • For such a chain there is at least one stationary distribution, and it can be computed in polynomial time (eigenvector computation).
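A minimal sketch of one round of the reduction, with the consensus distribution computed as a stationary distribution via a left-eigenvector computation in numpy (the function names are illustrative; the rows of `Q` would come from the $n$ no-external-regret algorithms $M_1, \ldots, M_n$):

```python
import numpy as np

def consensus_distribution(Q):
    """Q[j][a] = q_j^t(a). Return p^t satisfying p^t(a) = sum_j p^t(j) * q_j^t(a)."""
    P = np.array(Q, dtype=float)            # row-stochastic transition matrix of the Markov chain
    vals, vecs = np.linalg.eig(P.T)         # left eigenvectors of P are right eigenvectors of P.T
    idx = np.argmin(np.abs(vals - 1.0))     # pick the eigenvalue closest to 1
    p = np.abs(np.real(vecs[:, idx]))
    return p / p.sum()                      # normalize into a probability distribution

def reduction_round(Q, cost_vector):
    """One round of the master algorithm M: output p^t and the scaled cost vector for each M_j."""
    p = consensus_distribution(Q)
    scaled_costs = [p[j] * np.array(cost_vector, dtype=float) for j in range(len(Q))]
    return p, scaled_costs                  # feed scaled_costs[j] back to algorithm M_j

# Example: 2 actions, where M_1 and M_2 currently output (0.5, 0.5) and (0.25, 0.75).
p, scaled = reduction_round([[0.5, 0.5], [0.25, 0.75]], cost_vector=[1.0, 0.0])
```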
Swap Regret Bound • We have shown a no-swap-regret reduction, with (time-averaged) swap regret $\sum_{j=1}^{n} R_j$ • If we take MW as the algorithms $M_j$, we get a swap regret bound of $O\!\left(n\sqrt{\frac{\ln n}{T}}\right)$ (each $R_j$ is bounded by $2\sqrt{\frac{\ln n}{T}}$) • We can improve this: • In MW, the actual bound we found was $C_{MW} \le OPT(1 + \varepsilon) + \frac{\ln n}{\varepsilon}$, and we just bounded $OPT$ by $T$. • If we take the original bound, we get: the total regret of $M_j$ is $O\!\left(\sqrt{OPT_j \ln n}\right)$, where $OPT_j$ is the cost of the best fixed action on the scaled cost sequence seen by $M_j$.
Swap Regret Bound • The cost vector we gave to $M_j$ at time $t$ was scaled by $p^t(j)$ ⇒ $OPT_j \le \sum_{t=1}^{T} p^t(j)$, and therefore $\sum_{j=1}^{n} OPT_j \le \sum_{t=1}^{T}\sum_{j=1}^{n} p^t(j) = T$ • The square root function is concave ⇒ Jensen's inequality: $\frac{1}{n}\sum_{j=1}^{n}\sqrt{OPT_j} \le \sqrt{\frac{1}{n}\sum_{j=1}^{n} OPT_j}$ • ⇒ $\sum_{j=1}^{n}\sqrt{OPT_j \ln n} \le \sqrt{n \ln n \sum_{j=1}^{n} OPT_j} \le \sqrt{T n \ln n}$
Swap Regret Bound • ⇒ the total swap regret of $M$ is $O\!\left(\sqrt{T n \ln n}\right)$ • We get the time-averaged swap regret bound of $O\!\left(\sqrt{\frac{n \ln n}{T}}\right)$
No Swap Regret Dynamics and CE • Equivalent definition of CE using switching functions: • A distribution σ on the set of outcomes of a cost-minimization game is a correlated equilibrium if for every player $i$ and every switching function $\delta : A_i \to A_i$, $\mathbb{E}_{s \sim \sigma}[C_i(s)] \le \mathbb{E}_{s \sim \sigma}[C_i(\delta(s_i), s_{-i})]$