Regret Minimization and the Price of Total Anarchy • Paper by A. Blum, M. Hajiaghayi, K. Ligett, A. Roth • Presented by Michael Wunder
Nash Anarchy vs. Total Anarchy • In a multiagent setting, we want the ratio between the socially optimal value and the value of the outcome reached by “selfish” agents • Traditionally, the selfish outcome is assumed to be a Nash equilibrium, where no agent has an incentive to change strategy (the price of anarchy) • We can also find the price of total anarchy, the same ratio when selfish agents act repeatedly to minimize regret over their previous actions
Why Regret Minimization? • Finding Nash equilibria can be computationally difficult • Not clear that agents would converge to it, or remain in one if there are several • Regret minimization is realistic because there are efficient algorithms that minimize regret, it is locally computed, and players improve by lowering regret
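To make the "efficient algorithms" bullet concrete, here is a minimal sketch of one well-known regret-minimizing procedure, Hedge (exponential weights). The slides do not say which algorithm the paper has in mind, so the choice of Hedge, the function name `hedge`, the payoff table, and the learning rate `eta` are all assumptions made for illustration. Payoffs are assumed to lie in [0, 1].

```python
import math
import random

def hedge(payoffs, eta):
    """Run Hedge on a T x m payoff table (entries assumed in [0, 1]).

    Returns (expected total payoff of the learner,
             total payoff of the best fixed action in hindsight).
    """
    m = len(payoffs[0])
    probs = [1.0 / m] * m                     # start from the uniform distribution
    earned = 0.0
    for row in payoffs:
        # Expected payoff this round under the current mixed strategy.
        earned += sum(p * r for p, r in zip(probs, row))
        # Multiplicative update: actions that paid well gain probability.
        scaled = [p * math.exp(eta * r) for p, r in zip(probs, row)]
        total = sum(scaled)
        probs = [s / total for s in scaled]
    best_fixed = max(sum(row[a] for row in payoffs) for a in range(m))
    return earned, best_fixed

# Hypothetical usage: 3 actions, 2000 rounds of arbitrary payoffs.
random.seed(0)
T, m = 2000, 3
table = [[random.random() for _ in range(m)] for _ in range(T)]
eta = math.sqrt(8 * math.log(m) / T)          # standard tuning for a known horizon T
earned, best_fixed = hedge(table, eta)
print("average regret per round:", (best_fixed - earned) / T)
```

The update uses only the player's own payoffs, which matches the point that regret can be computed locally. With this choice of `eta`, the total regret is at most sqrt(T ln(m) / 2), so the average regret per round vanishes as T grows.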
Results comparing prices • The paper shows how the price of total anarchy (PoTA) compares with the price of anarchy (PoA) • Four classes of games are considered: • Hotelling Games • Valid Games • Atomic Linear Congestion Games • Parallel Link Congestion Games
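Preliminaries (maximization) • $A_i$: set of pure strategies for player $i$ • $S_i$: set of mixed strategies for player $i$ (distributions over $A_i$) • Social utility function: $\gamma : A \to \mathbb{R}$, where $A = A_1 \times \dots \times A_k$ • Individual utility function: $\alpha_i : A \to \mathbb{R}$ • Strategy set if player $i$ changes from $s_i$ to $s'_i$: $S \oplus s'_i$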
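Preliminaries (cont.) • Socially optimal value: $\mathrm{OPT} = \max_{S} \gamma(S)$, where $\gamma$ is the social utility • Regret of player $i$ given action sets $A^1, \dots, A^T$: the difference between the average payoff of the best fixed action in hindsight and the average payoff of the actions actually taken, over all timesteps: $\max_{a_i \in A_i} \frac{1}{T}\sum_{t=1}^{T} \alpha_i(A^t \oplus a_i) - \frac{1}{T}\sum_{t=1}^{T} \alpha_i(A^t)$, where $\alpha_i$ is player $i$'s utility and $A^t \oplus a_i$ is $A^t$ with player $i$'s action replaced by $a_i$ • Price of total anarchy: worst-case ratio of the socially optimal value to the average social value achieved by “regret minimizers”, $\mathrm{OPT} \big/ \frac{1}{T}\sum_{t=1}^{T} \gamma(A^t)$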
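The two quantities on this slide can be computed directly for a recorded sequence of plays. The sketch below assumes a maximization game; the function names and the two-player anti-coordination game used in the example are hypothetical and are not taken from the paper.

```python
from itertools import product

def average_regret(alpha_i, plays, i, actions_i):
    """Time-average regret of player i in a maximization game:
    best fixed action in hindsight minus the payoff actually realized."""
    T = len(plays)
    realized = sum(alpha_i(p) for p in plays) / T

    def swapped(profile, a):
        # The profile with player i's action replaced by a.
        q = list(profile)
        q[i] = a
        return tuple(q)

    best_fixed = max(
        sum(alpha_i(swapped(p, a)) for p in plays) / T for a in actions_i
    )
    return best_fixed - realized

def total_anarchy_ratio(gamma, plays, profiles):
    """OPT divided by the time-average social value of the plays."""
    opt = max(gamma(s) for s in profiles)
    average_value = sum(gamma(p) for p in plays) / len(plays)
    return opt / average_value

# Hypothetical game: each player earns 1 when the two actions differ.
actions = [0, 1]
alpha = lambda profile: 1.0 if profile[0] != profile[1] else 0.0
gamma = lambda profile: 2.0 * alpha(profile)      # social value = sum of payoffs
plays = [(0, 1), (1, 0)] * 5                      # players alternate roles

print(average_regret(alpha, plays, 0, actions))
print(total_anarchy_ratio(gamma, plays, list(product(actions, repeat=2))))
```

In this example neither player could have done better with a fixed action, so the regret is negative, and the alternating play achieves the optimal social value.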
Hotelling Games • Problem: k sellers must each set up a vendor stand at a vertex of a graph to sell to n tourists, who buy from the first seller they reach along a path • Strategy set: $A_i = V$ • [Figure: sellers S1 and S2 and tourist T1 on a graph]
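Hotelling Games cont. • Social welfare at time t is the payoff of the worst-off player: $\gamma(A^t) = \min_i \alpha_i(A^t)$ • To maximize fairness (that is, to maximize the payoff of the lowest-earning player), split all vertices equally: OPT = n/k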
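A small sketch of the payoff and fairness computation may help here. It assumes a path graph with one tourist per vertex, that each tourist buys from the nearest seller, and that ties are split evenly. These modelling details and the function names are assumptions made for illustration, since the slide only describes the game informally.

```python
def hotelling_payoffs(n, positions):
    """Payoffs on a path with vertices 0..n-1 and one tourist per vertex.

    positions[i] is the vertex chosen by seller i. Each tourist goes to the
    nearest seller; equidistant sellers split that tourist evenly.
    """
    payoffs = [0.0] * len(positions)
    for v in range(n):
        dists = [abs(v - p) for p in positions]
        nearest = min(dists)
        winners = [i for i, d in enumerate(dists) if d == nearest]
        for i in winners:
            payoffs[i] += 1.0 / len(winners)
    return payoffs

def fairness(payoffs):
    """Social welfare used on the slide: the payoff of the worst-off seller."""
    return min(payoffs)

# Hypothetical instance: n = 9 tourists, k = 3 sellers.
even_split = hotelling_payoffs(9, [1, 4, 7])
crowded = hotelling_payoffs(9, [0, 4, 4])
print(even_split, fairness(even_split))
print(crowded, fairness(crowded))
```

With sellers at vertices 1, 4 and 7 every seller serves three tourists, which meets OPT = n/k = 3. Moving one seller to the end of the path lowers the minimum payoff, and with it the fairness value.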
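Hotelling Games cont. • Claim: Price of anarchy = (2k-2)/k • Proof: Consider the alternate strategy set $S_{-i}$ in which player $i$ is removed and the other $k-1$ players share all $n$ tourists • Some player $h$ achieves: $\alpha_h(S_{-i}) \ge n/(k-1)$ • If player $i$ plays the same strategy as $h$, the two split $h$'s tourists, so the expected payoff is: $\alpha_i \ge n/(2(k-1)) = n/(2k-2)$ • Therefore every player earns at least $n/(2k-2)$ at a Nash equilibrium, and the price of anarchy is at most $(n/k) \big/ (n/(2k-2)) = (2k-2)/k$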
Hotelling with Total Anarchy • The price of total anarchy is also (2k-2)/k • Proof from symmetry: Let $O_i^t$ be the set of plays at time $t$ by players other than $i$ • Let $\Delta_i^{t \to u}$ be the difference between player $i$'s expected payoff from playing, at time step $u$, an action chosen uniformly at random from $O_i^t$, and $n/(2k-2)$ • For all $i$ and all $1 \le t, u \le T$: $\Delta_i^{t \to u} + \Delta_i^{u \to t} \ge 0$ • To see this, imagine a $(2k-2)$-player game in which there is a time-$t$ player and a time-$u$ player for each original player except $i$ • If player $i$ replaces a random player in that game, then by symmetry the expected payoff is $\alpha_i = n/(2k-2)$
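Hotelling Total Anarchy Proof • If player i replaces a time t player, and all other time t players are removed, player i’s payoff only improves • The expected payoff of player $i$ from picking an action $o_i^t$ uniformly at random from $O_i^t$ and playing it over all $T$ rounds: $\frac{1}{T}\sum_{u=1}^{T}\bigl(\frac{n}{2k-2} + \Delta_i^{t \to u}\bigr)$ • Averaging over $t$, the $\Delta$ terms pair up as $\Delta_i^{t \to u} + \Delta_i^{u \to t} \ge 0$, so some fixed action earns at least $n/(2k-2)$ on average, and a regret-minimizing player does essentially as well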
Generalized Hotelling Games • The above proof does not use specifics of the game as described • In general, PoTA is (2k-2)/k even in the presence of arbitrarily many Byzantine players making arbitrary decisions • Regret-minimizing players may not converge to a Nash equilibrium, and play can cycle forever
Valid Games, Price of Anarchy • Valid games are a broad class of games that includes the market sharing game, the facility location problem, and others. Example: cable television market sharing • The game is played on a bipartite graph G = ((V,U),E). Each v in V is a player, each u in U is a market • Each market has a value and a cost • Each player has a budget • Players may enter adjacent markets, and each receives the value of a market divided by the number of players in that market
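The payoff rule in the last bullet can be sketched in a few lines. This is a simplified illustration under stated assumptions: it ignores market costs, player budgets, and the adjacency constraint, and the function name and example values are hypothetical.

```python
from collections import Counter

def market_sharing_payoffs(market_value, choices):
    """choices[i] is the set of markets player i enters.

    Each market's value is split evenly among the players who entered it.
    Costs, budgets, and adjacency are deliberately left out of this sketch.
    """
    entrants = Counter(u for chosen in choices for u in chosen)
    return [
        sum(market_value[u] / entrants[u] for u in chosen) for chosen in choices
    ]

# Hypothetical instance: two markets, two players.
values = {"a": 6.0, "b": 4.0}
payoffs = market_sharing_payoffs(values, [{"a"}, {"a", "b"}])
print(payoffs)   # player 0 shares market a; player 1 shares a and owns b
```

The payoffs always sum to the total value of the markets that at least one player entered, which is the social value in this example.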
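Valid Games Definition • For a set function $f$, define the derivative of $f$ at $X \subseteq V$ in direction $D \subseteq V - X$ to be $f'_D(X) = f(X \cup D) - f(X)$ • A game is valid if: • For $X \subseteq A$, $\gamma'_i(X) \ge \gamma'_i(A)$ for all $i \in V - A$ (submodularity) • Each player's payoff is at least its marginal contribution to the social utility, $\alpha_i(S) \ge \gamma'_{s_i}(S \oplus \emptyset_i)$, where $\emptyset_i$ is player $i$'s null strategy (Vickrey)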
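The discrete derivative and the submodularity condition can be checked by brute force on small ground sets. The sketch below is an illustration only: the function names and the coverage function used as an example are assumptions, and the exhaustive check is exponential in the size of V.

```python
from itertools import combinations

def derivative(f, X, D):
    """Discrete derivative of f at X in direction D: f(X ∪ D) - f(X)."""
    return f(X | D) - f(X)

def is_submodular(f, V):
    """Check f'_i(X) >= f'_i(A) for every X ⊆ A ⊆ V and every i in V - A."""
    subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    for A in subsets:
        for X in subsets:
            if not X <= A:
                continue
            for i in V - A:
                step = frozenset([i])
                if derivative(f, X, step) < derivative(f, A, step):
                    return False
    return True

# Hypothetical coverage function: the value of a set is the number of markets it covers.
covers = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}}
coverage = lambda X: len(set().union(*(covers[i] for i in X)))
V = frozenset(covers)

print(is_submodular(coverage, V))                 # coverage functions have diminishing returns
print(is_submodular(lambda X: len(X) ** 2, V))    # increasing returns violate the condition
```

Coverage functions are a standard example of submodularity: adding an element to a larger set never helps more than adding it to a smaller one.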
Valid Games Price of Anarchy • Vetta shows that for any Nash equilibrium strategy S, if γ is non-decreasing, γ(S) >= OPT/2, so PoA is at most 2 • PoTA matches PoA • While the PoA bound does not hold with the addition of Byzantine players, the PoTA bound does
Total Anarchy w/Byzantines • So there is a regret-minimizing player i who violates the regret-minimizing condition, a contradiction.
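Atomic Congestion Games • An atomic congestion game is a minimization game consisting of k players and a set of facilities V, where each player $i$ picks an action $a_i$, a set of facilities drawn from $V_i \subseteq V$ • Each facility $e$ has a latency function $f_e(l_e)$ • Each player $i$ has weight $w_i$ (unweighted: $w_i = 1$) • Player $i$ experiences cost: $\alpha_i(A) = \sum_{e \in a_i} f_e(l_e)$ • Load on facility $e$: $l_e = \sum_{j : e \in a_j} w_j$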
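Atomic Congestion Games • Consider two types of social utility function: sum with linear edge costs, and makespan in parallel link networks • Linear edge costs: $f_e(l_e) = a_e l_e + b_e$ for nonnegative constants $a_e, b_e$ • Social utility (sum): $\gamma(A) = \sum_i \alpha_i(A)$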
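The cost definitions for congestion games with linear edge costs can be sketched as follows. The latency form a_e * l_e + b_e, the function names, and the example instance are assumptions made for illustration.

```python
def facility_loads(actions, weights):
    """Load on each facility: total weight of the players using it."""
    load = {}
    for action, w in zip(actions, weights):
        for e in action:
            load[e] = load.get(e, 0) + w
    return load

def player_costs(actions, weights, coeffs):
    """Cost of each player under linear latencies.

    coeffs[e] = (a_e, b_e) gives the assumed latency a_e * load + b_e on facility e.
    """
    load = facility_loads(actions, weights)
    return [
        sum(coeffs[e][0] * load[e] + coeffs[e][1] for e in action)
        for action in actions
    ]

def sum_social_cost(actions, weights, coeffs):
    """Sum social cost: total cost over all players."""
    return sum(player_costs(actions, weights, coeffs))

# Hypothetical instance: three unweighted players, three facilities.
coeffs = {"x": (1, 0), "y": (1, 0), "z": (2, 1)}
actions = [{"x"}, {"x", "y"}, {"z"}]
weights = [1, 1, 1]
print(player_costs(actions, weights, coeffs))
print(sum_social_cost(actions, weights, coeffs))
```

Players 0 and 1 share facility x, so each pays for a load of 2 there, which is the congestion effect the price of anarchy measures.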
Congestion Games PoA • Price of Anarchy with unweighted players, sum social utility function, and linear cost functions is 2.5 (Christodoulou et al. 2005) • Claim: Price of Total Anarchy is the same: by assuming regret minimization, each player’s time-average cost is no worse than the cost of the best fixed action in hindsight. In particular, it is no worse than the cost of playing that player's action from the optimal strategy.
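Congestion Games: PoTA • Proof: for all $i$, with $a_i^*$ player $i$'s action in the optimal solution, $A^t \oplus a_i^*$ the play at time $t$ with player $i$'s action replaced by $a_i^*$, and $\epsilon$ the vanishing regret: $\frac{1}{T}\sum_{t=1}^{T} \alpha_i(A^t) \le \frac{1}{T}\sum_{t=1}^{T} \alpha_i(A^t \oplus a_i^*) + \epsilon$ • Summing over all players: $\frac{1}{T}\sum_{t=1}^{T} \gamma(A^t) \le \frac{1}{T}\sum_{t=1}^{T} \sum_i \alpha_i(A^t \oplus a_i^*) + k\epsilon$ • After math: the time-average social cost is at most $2.5 \cdot \mathrm{OPT}$, up to a term that vanishes with $\epsilon$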
Congestion Games: PoTA • For atomic congestion games with unweighted players, sum social function, and polynomial latency functions of degree $d$, PoTA $\le d^{d(1-o(1))}$
Parallel Link Congestion Game • n identical links, k weighted players • Each player pays the sum of the weights of the jobs on the link it chooses • Social cost is the total weight on the most heavily loaded link (makespan): $\gamma(A) = \max_e l_e$
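The makespan objective is easy to compute from an assignment of players to links. The function names and the small instance below are hypothetical and serve only to illustrate the definitions on this slide.

```python
def link_loads(n_links, choice, weights):
    """Total weight on each link; choice[i] is the link picked by player i."""
    load = [0] * n_links
    for link, w in zip(choice, weights):
        load[link] += w
    return load

def makespan(n_links, choice, weights):
    """Social cost: the load of the most heavily loaded link."""
    return max(link_loads(n_links, choice, weights))

def player_cost(n_links, choice, weights, i):
    """Cost of player i: the total weight on the link it chose."""
    return link_loads(n_links, choice, weights)[choice[i]]

# Hypothetical instance: 2 links, players with weights 2, 1, 1.
weights = [2, 1, 1]
print(makespan(2, [0, 1, 1], weights))   # balanced assignment
print(makespan(2, [0, 0, 1], weights))   # unbalanced assignment
```

The balanced assignment puts weight 2 on each link, while the unbalanced one puts weight 3 on the first link.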
2 Parallel Links: PoTA • For 2 links, Price of Total Anarchy matches Price of Anarchy = 3/2, but only in expectation
n Parallel Links: PoTA • With n parallel links, PoTA is not the same as PoA • PoTA with makespan utility and n links is $\Omega(n^{1/2})$, versus $O(\log n / \log\log n)$ for PoA • Proof: with n links and n players, OPT = 1 • We can construct a situation with negative regret but with maximum latency = $\Omega(n^{1/2})$
n Parallel Links: PoTA • Divide the players into groups of size $n^{1/2}/2$ and rotate the groups so that each takes a turn on link 1 • The rest distribute evenly on the remaining links • Each player has average latency $5/4 - \tfrac{1}{2} n^{-1/2}$ • If a player plays a fixed link, the average latency is $2 - \tfrac{1}{2} n^{-1/2}$ • Therefore, players have negative regret but maximum latency = $\Omega(n^{1/2})$
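The average latency on this slide can be checked with a short calculation. The sketch assumes unit-weight players, that n is a perfect square with an even square root so the groups have integer size, and that every player not on link 1 sits alone on a link with latency 1. The fixed-link figure is taken from the slide's formula rather than simulated, and the function name is hypothetical.

```python
import math

def rotation_schedule_stats(n):
    """Average latencies for the rotating-group construction with n links and n players."""
    root = math.isqrt(n)
    if root * root != n or root % 2 != 0:
        raise ValueError("n must be a perfect square with an even square root")
    group_size = root // 2                 # players crowded onto link 1 each round
    num_groups = n // group_size           # equals 2 * sqrt(n)
    on_link_1 = 1 / num_groups             # fraction of rounds a player spends on link 1
    # Latency is group_size on link 1 and, by assumption, 1 everywhere else.
    average = on_link_1 * group_size + (1 - on_link_1) * 1
    fixed_link = 2 - 0.5 / root            # slide's figure for the best fixed link
    return {
        "average_latency": average,
        "fixed_link_latency": fixed_link,
        "regret": average - fixed_link,    # cost-minimization regret; negative here
        "max_latency": group_size,         # makespan, compared with OPT = 1
    }

print(rotation_schedule_stats(16))
print(rotation_schedule_stats(10000))
```

The computed average matches 5/4 - (1/2) n^(-1/2), the regret stays negative, and the makespan grows as sqrt(n)/2 while OPT remains 1.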
Conclusion • Thank you!