Best Reply Mechanisms • Justin Thaler and Victor Shnayder
What are best-reply dynamics? • Start with an arbitrary strategy profile • In each step let some player switch his strategy to be a best reply to the current strategies of the others.
What are best-reply dynamics? • Definition: • A repeated-reply mechanism for a private-information game G: • Extensive-form game with perfect recall (same players) • At most M steps. In each step: • A single player i announces an element of A_i • Players play in round-robin order • Stop when all players “pass” in n consecutive steps. • Enforce the action profile of the most recently announced actions • If M steps go by without stopping, penalize the players.
What are best-reply dynamics? • Need a penalty to ensure non-convergence is not in best interest of any player. • Realistic modeling assumption for BGP, TCP, etc. • Best-reply dynamics is the strategy profile of a repeated-reply mechanism in which each player i updates to i’s best-reply to the other players’ strategies each time it is i’s turn.
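The round-robin process described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' formal mechanism: the names `best_reply` and `repeated_reply`, the payoff-dictionary encoding, and the Prisoner's Dilemma test game are all assumptions made for the example, and the penalty for non-convergence is reduced to a `False` flag.

```python
def best_reply(payoffs, actions, profile, i):
    """Player i's best reply to the others' current actions.

    Keeps the current action (a "pass") unless some action is strictly better.
    """
    def value(a):
        return payoffs[profile[:i] + (a,) + profile[i + 1:]][i]

    current = profile[i]
    best = max(actions[i], key=value)
    return best if value(best) > value(current) else current


def repeated_reply(payoffs, actions, start, max_steps):
    """Round-robin best-reply dynamics with a cap of max_steps (the M above).

    Returns (profile, converged). converged is False when the cap is hit,
    which is where a real mechanism would penalize the players.
    """
    n = len(actions)
    profile, passes = tuple(start), 0
    for step in range(max_steps):
        i = step % n  # players move in round-robin order
        reply = best_reply(payoffs, actions, profile, i)
        if reply == profile[i]:
            passes += 1
            if passes == n:  # all players passed in n consecutive steps
                return profile, True
        else:
            passes = 0
            profile = profile[:i] + (reply,) + profile[i + 1:]
    return profile, False


# Hypothetical example game: Prisoner's Dilemma, payoffs as (row, column).
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
result = repeated_reply(pd, [["C", "D"], ["C", "D"]], ("C", "C"), 10)
print(result)
```

Starting from mutual cooperation, each player switches to D on their first turn and then both pass, so the run stops at the unique equilibrium well before the step cap.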
Why best reply dynamics? • If convergence occurs, we have a highly justifiable Nash Equilibrium • Computationally simple • Players only need private information • Feasible in distributed, asynchronous settings • Prescribed by existing protocols (Ex: BGP)
Why best reply dynamics? • In light of Theorems 1 and 2 (which we’ll see soon): • Often gives a non-VCG way of creating incentive compatible mechanisms (?). And sometimes without $$$. • Often get collusion-proofness, Pareto-efficiency
Outline • When do best reply dynamics work? • Universal max-solvability (UMS) • Thm: UMS implies convergence to unique NE, collusion-proofness • Example applications (correlated markets, BGP, etc) • Connections to strategy-proofness • Discussion
Universal max-dominance • A subset T of S is universally max-dominated if: [condition not preserved in this transcript] • Very strong condition! • Existence of a max-dominated set is strictly stronger than existence of a dominated strategy, i.e. of s_i, s_i' such that u_i(s_i, s_{-i}) < u_i(s_i', s_{-i}) for all s_{-i}.
Universal max-solvability (UMS) • A game G is universally max-solvable if we can iteratively remove universally max-dominated strategy sets and get to a single strategy for each player. • A stronger condition than solvability by iterated removal of strictly dominated strategies (IRSDS).
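For comparison, the weaker notion mentioned here, IRSDS, is easy to sketch. The code below does not implement universal max-dominance (its condition is not given in the slide text). It only applies the strict-dominance test u_i(s_i, s_-i) < u_i(s_i', s_-i) for all remaining s_-i, repeatedly. The function names and the 2x2 example game are assumptions made for illustration.

```python
from itertools import product


def strictly_dominated(payoff, remaining, i, s, t):
    """True if player i's strategy s is strictly worse than t against
    every profile of the other players' remaining strategies."""
    others = [remaining[j] for j in range(len(remaining)) if j != i]
    for rest in product(*others):
        with_s = rest[:i] + (s,) + rest[i:]
        with_t = rest[:i] + (t,) + rest[i:]
        if payoff[with_s][i] >= payoff[with_t][i]:
            return False
    return True


def irsds(payoff, strategies):
    """Iteratively remove strictly dominated strategies until none are left."""
    remaining = [list(s) for s in strategies]
    changed = True
    while changed:
        changed = False
        for i in range(len(remaining)):
            for s in list(remaining[i]):
                if any(strictly_dominated(payoff, remaining, i, s, t)
                       for t in remaining[i] if t != s):
                    remaining[i].remove(s)
                    changed = True
    return remaining


# Hypothetical game: U strictly dominates D; once D is gone, L dominates R.
game = {("U", "L"): (1, 1), ("U", "R"): (1, 0),
        ("D", "L"): (0, 0), ("D", "R"): (0, 1)}
survivors = irsds(game, [["U", "D"], ["L", "R"]])
print(survivors)
```

The game is solvable by IRSDS when every list in the result has exactly one element. Here R only becomes dominated after D has been removed, which is why the procedure has to iterate.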
Example 1 Solvable by IRSDS, but not UMS. Neither player has a universally max-dominated set. Note unique NE is not PE, and best-reply dynamics are not incentive compatible for the row player.
Example 2 UMS
Example 3 (UMS) • [Payoff table: columns L, M, R; rows A, B, C; entries not preserved]
Theorems • Theorem 1: G is UMS ⇒ G has a unique, pure NE, and it is collusion-proof. • Corollary: Collusion-proof NE ⇒ NE is Pareto optimal. • Note that solvability by IRSDS suffices for a unique, pure NE. UMS is needed for collusion-proofness and PE.
Proof of Theorem 1 • By contradiction: G is UMS, so fix an elimination sequence of universally max-dominated strategy sets. • Let s* be the final strategy profile. • If s* is not a collusion-proof NE, some set of players T can deviate and be better off. • Let s be the new profile, in which the players in T change strategy from s*. • Let s_i be the first strategy of s to be eliminated. Then it was max-dominated, so s_i* is strictly better, so i can’t be better off.
Example 1 Solvable by IRSDS, but not UMS. Neither player has a universally max-dominated set. Note unique NE is not PE, and best-reply dynamics are not incentive compatible for the row player.
Theorems • Theorem 2: If G is UMS with private information, then best-reply dynamics are incentive-compatible in ex-post NE, and converge to the unique NE of the induced full-information game. • Proof: Similar to Theorem 1. The main idea is that a strategy eliminated in the t-th step of the UMS elimination process can never be used after the (nt)-th step of the best-reply mechanism.
Correlated two-sided markets • Agents: buyers and sellers • Game: weighted bipartite graph -- buyers on one side, sellers on the other • Buyers have preference order over sellers (higher edge weight = higher preference) • Sellers prefer buyers connected by heavier edges
Correlated two-sided markets are UMS • Let e be the maximum-weight edge. Choosing it universally max-dominates all other strategies of both endpoints. • Remove the two endpoints of e and all incident edges, then repeat. • Therefore, best-reply dynamics converge to an ex-post NE.
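The elimination order above (take the heaviest edge, remove its endpoints, repeat) can be sketched as a greedy pass over the edges sorted by weight. This is an illustrative sketch only. The function name, the dictionary encoding of the graph, and the sample weights are assumptions, and edge weights are assumed to be distinct so that "the maximum-weight edge" is well defined.

```python
def greedy_max_edge_matching(weights):
    """Match buyers to sellers by repeatedly taking the heaviest remaining edge.

    weights maps (buyer, seller) pairs to edge weights, assumed distinct.
    Scanning edges in decreasing weight and skipping any edge with an
    already-matched endpoint is equivalent to removing the endpoints of the
    heaviest edge and repeating on the remaining graph.
    """
    matched_buyers, matched_sellers = set(), set()
    matching = {}
    for (buyer, seller), _ in sorted(weights.items(),
                                     key=lambda kv: kv[1], reverse=True):
        if buyer not in matched_buyers and seller not in matched_sellers:
            matching[buyer] = seller
            matched_buyers.add(buyer)
            matched_sellers.add(seller)
    return matching


# Hypothetical market with two buyers and two sellers.
market = {("b1", "s1"): 5, ("b1", "s2"): 4,
          ("b2", "s1"): 3, ("b2", "s2"): 1}
print(greedy_max_edge_matching(market))
```

In this sample market b1 and s1 are each other's top choice, so they pair off first and b2 is left with s2.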
Internet routing: BGP • Receive update messages from neighbors announcing routes to d. • Choose a single neighbor, whose route you prefer most, to send traffic through. • Announce your new route to all your neighbors. • [Figure: nodes 1 and 2 and destination d; route labels 12d, 1d at node 1 and 21d, 2d at node 2]
Internet routing: BGP • BGP is asynchronous, distributed • Prescribes best-reply dynamics • But does BGP converge? • And is BGP “incentive compatible”? Do ASes have an incentive to deviate from the protocol?
Does BGP Converge? • We can break this into two questions: • Does a stable solution even exist in the static game? • If so, will BGP find such a solution? • But we only need one answer.
Does a Stable Solution Exist? • No stable solution exists! • It is actually NP-complete to determine existence in general networks. • [Figure: nodes 1, 2, 3 and destination d; route labels 13d, 1d at node 1; 21d, 2d at node 2; 32d, 3d at node 3]
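For a network this small, the claim can be checked by brute force. The sketch below assumes that each node's two route labels in the figure are listed most preferred first (so node 1 prefers 13d to 1d, and so on), and that a node can use a route through a neighbor only if the neighbor is currently using the matching suffix. The helper names `best_route` and `stable_states` are hypothetical.

```python
from itertools import product


def best_route(node, prefs, state):
    """Most preferred route of `node` that is available given the others' routes.

    A route is available if it goes straight to d, or if its next hop is
    currently using the rest of the route.
    """
    for route in prefs[node]:
        next_hop = route[1]
        if next_hop == "d" or state[next_hop] == route[1:]:
            return route
    return None


def stable_states(prefs):
    """Enumerate all route assignments in which every node is best-replying."""
    nodes = sorted(prefs)
    found = []
    for choice in product(*[prefs[v] + [None] for v in nodes]):
        state = dict(zip(nodes, choice))
        if all(best_route(v, prefs, state) == state[v] for v in nodes):
            found.append(state)
    return found


# Assumed preferences for the three-node example (most preferred first).
three_nodes = {1: [(1, 3, "d"), (1, "d")],
               2: [(2, 1, "d"), (2, "d")],
               3: [(3, 2, "d"), (3, "d")]}
print(stable_states(three_nodes))
```

Under these assumed preferences the search finds no stable assignment. Exhaustive search of this kind is only practical for toy instances, in line with the NP-completeness remark.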
Does BGP Converge When A Stable Solution Exists? • Notice that multiple NE exist. • And asynchronous best-reply dynamics do not necessarily converge. • So the game must not be UMS. • [Figure: nodes 1 and 2 and destination d; route labels 12d, 1d at node 1 and 21d, 2d at node 2]
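A small simulation shows both behaviors on the two-node example. It assumes node 1 prefers 12d to 1d and node 2 prefers 21d to 2d (reading each pair of labels as most preferred first), and it models one particular kind of asynchrony: nodes activated in the same step all react to the same stale state. The names `best_route` and `run` and the schedule encoding are hypothetical.

```python
def best_route(node, prefs, state):
    """Most preferred route of `node` that is available given the others' routes."""
    for route in prefs[node]:
        next_hop = route[1]
        if next_hop == "d" or state[next_hop] == route[1:]:
            return route
    return None


def run(prefs, state, schedule):
    """Apply best replies step by step.

    schedule is a list of sets of nodes. All nodes in one set update
    simultaneously, each seeing the state from before the step.
    """
    history = [dict(state)]
    for active in schedule:
        updates = {v: best_route(v, prefs, state) for v in active}
        state = {**state, **updates}
        history.append(state)
    return history


# Assumed preferences for the two-node example (most preferred first).
two_nodes = {1: [(1, 2, "d"), (1, "d")], 2: [(2, 1, "d"), (2, "d")]}
start = {1: (1, "d"), 2: (2, "d")}

simultaneous = run(two_nodes, start, [{1, 2}] * 4)   # oscillates
one_at_a_time = run(two_nodes, start, [{1}, {2}, {1}, {2}])  # settles
print(simultaneous[-1], one_at_a_time[-1])
```

With simultaneous updates the two nodes flip between "both direct" and "both via the other" forever. With one node moving at a time, the run settles on one of the two stable states, and which one depends on who moves first.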
So What Do We Do? • Approach #1: Use mechanism design to achieve IC convergence, but solution must be distributed. • Approach #2: Identify conditions (on network topology and/or AS preferences) under which BGP converges and is IC. • Both approaches are canonical problems in Distributed Algorithmic Mechanism Design.
Approach #2 for Convergence • Griffin et al. (1999): If BGP fails to converge, then there exists a Dispute Wheel. • Each u_i would rather route clockwise through u_{i+1} than use Q_i. • [Figure: dispute wheel. Image source: Levin et al., “Internet Routing and Games,” 2008.]
Approach #2 for Convergence • Gao and Rexford (2001): Identified reasonable conditions based on economic structure of the Internet that guarantee No Dispute Wheel and hence convergence. (No bounds on convergence rate given). • But limited progress made until recently on conditions for guaranteeing that BGP is IC.
Approach #2 for Incentive Compatibility • Theorem 3: Assuming non-convergence after n^3 rounds is penalized, and No Dispute Wheel holds, routing games are UMS. • Corollary: Under the above conditions, best-reply strategies are IC in collusion-proof ex-post NE. • Corollary: Under the Gao-Rexford conditions, BGP converges in O(n^3) time and is IC.
Theorem 3 • Proof sketch: We show how to find the first universally max-dominated action set; the same argument applies to later sets. • Find a node a_1 with at least 2 actions. Let R be a_1’s most preferred existing route. One of two cases must occur:
Theorem 3 • 1. Every node a_2 on R prefers the suffix of R leading from a_2 to d. In this case, if u is the closest node to d on R with at least two actions, then (u, d) universally max-dominates all other actions of u, and we’re done. • 2. Some node a_2 on R prefers some other path over the suffix of R leading from a_2 to d. In this case, we repeat the analysis at a_2. Eventually we either form a dispute wheel or find ourselves in Case 1.
What’s left in Routing? • Complete characterization of BGP convergence (No Dispute Wheel sufficient, not necessary). • Conditions for convergence to globally optimal solution. Can it even be efficiently found? • Do mechanism design and/or $$$ have a role to play? • Changes in network topology?
Other applications • Congestion control • Criticism: Best-reply dynamics are only somewhat descriptive of how TCP works in practice. • Cost sharing games • Matching games (stable-roommate, intern assignment) • Auctions (unit demand bidders, GSP) • Relies a lot on VCG results • Main contribution is proof of convergence! (opposite of BGP)
Relationship to DSIC • Given a UMS game, best-replying is a strategy that gives an ex-post NE. • Get a direct-revelation, dominant-strategy IC mechanism. • Good: New way to create DSIC mechanisms. • Bad: Impossibility results limit the class of problems amenable to this approach (at least without money or limits on preferences). • [Figure labels: θ, Play s(θ), Ex-post NE, Outcome]
Discussion • What is the main contribution? • 1. Sufficient conditions for IC convergence of best-reply dynamics. General enough to encompass many applications, esp. BGP. • 2. Bounds on time to convergence. • 3. New framework for developing IC mechanisms?
Next Steps • Necessary conditions for best-reply dynamics to converge? To be IC (under what definition?)? • Better-reply dynamics? Other types of dynamics aka algorithms? What types of dynamics are reasonable or “natural”?
Economists and Complexity • See recent blog post by Noam Nisan: Does complexity of equilibria matter? • Kamal Jain: “If your laptop can’t find it then neither can the market.” • Jeff Ely: “Solving the n-body problem is beyond the capabilities of the world’s smartest mathematicians. How do those rocks-for-brains planets manage to pull it off?”