Concepts of Game Theory II
The prisoner’s reasoning…
• Put yourself in the place of prisoner i (or j)…
• Reason as follows:
• Suppose I cooperate…
  • If j cooperates, we both get a payoff of 3.
  • If j defects, then I will get a payoff of 0.
  • The best payoff I can be guaranteed to get if I cooperate is 0.
• Suppose I defect…
  • If j cooperates, I get a payoff of 5.
  • If j defects, then I will get a payoff of 2.
  • The best payoff I can be guaranteed to get if I defect is 2.
• In summary:
  • If I cooperate, the worst case is that I will get a payoff of 0.
  • If I defect, the worst case is that I will get a payoff of 2.
• I’d prefer a guaranteed payoff of 2 to a payoff of 0!
[Figure: payoff matrix for players i and j]
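The worst-case reasoning above can be sketched in a few lines of Python. The payoffs for prisoner i are the ones quoted in the bullets; the dictionary layout and the name `guaranteed_payoff` are illustrative choices, not something taken from the slides.

```python
# Payoff to prisoner i, indexed by (i's move, j's move).
# "C" = cooperate, "D" = defect; the four values are the ones quoted on the slide.
PAYOFF_I = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 2,
}


def guaranteed_payoff(my_move):
    """Worst payoff i can end up with after committing to my_move."""
    return min(PAYOFF_I[(my_move, their_move)] for their_move in "CD")


# Worst case for each of i's moves, and the move with the best worst case.
guarantees = {move: guaranteed_payoff(move) for move in "CD"}
best_move = max(guarantees, key=guarantees.get)

print(guarantees)  # worst-case payoff per move
print(best_move)   # the move a worst-case reasoner picks
```

Picking the move with the best worst case is usually called the maximin rule; here it selects defection because 2 is better than 0.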
Features of Prisoner’s Dilemma (1)
• The individually rational action is to defect.
  • This guarantees a payoff of no worse than 2,
  • whereas cooperating guarantees only a payoff of 0 in the worst case.
• So, defection is the best response to all possible strategies:
  • Both agents defect and get a payoff of 2.
• But naïve intuition says this is not the best outcome:
  • They could both cooperate and each get a payoff of 3!
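A minimal sketch of the "best response" argument, assuming the game is symmetric (the slides invite you to reason as either i or j, so j's payoffs are taken to mirror i's). The helper names are hypothetical.

```python
from itertools import product

MOVES = ("C", "D")

# Payoff to a player for (own move, other player's move), using the slide values.
# ASSUMPTION: both players face the same table (symmetric game).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}


def best_responses(other_move):
    """Moves that maximise my payoff against a fixed move by the other player."""
    best = max(PAYOFF[(m, other_move)] for m in MOVES)
    return {m for m in MOVES if PAYOFF[(m, other_move)] == best}


# Defection is dominant if it is the unique best response to every opposing move.
defect_dominates = all(best_responses(other) == {"D"} for other in MOVES)

# A pair (a, b) is stable if each move is a best response to the other.
stable_pairs = [
    (a, b)
    for a, b in product(MOVES, repeat=2)
    if a in best_responses(b) and b in best_responses(a)
]

print(defect_dominates)  # whether D beats C against every opposing move
print(stable_pairs)      # mutually-best-response outcomes
print(PAYOFF[("C", "C")], PAYOFF[("D", "D")])  # what each outcome pays per player
```

The only stable pair is mutual defection, yet its per-player payoff is lower than that of mutual cooperation, which is the dilemma the slide describes.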
Features of Prisoner’s Dilemma (2)
• This apparent paradox is the fundamental problem of multi-agent interactions.
• It seems to imply that cooperation will not occur in societies of self-interested agents.
• A real-world example: nuclear arms reduction
• The prisoner’s dilemma is ubiquitous (very common!)
• Can we recover cooperation?
Arguments for Recovering Cooperation
• Some conclusions that have been drawn from this analysis:
  • The game theory notion of rational action is wrong!
  • Somehow the dilemma is being formulated incorrectly.
• Arguments to recover cooperation:
  • We are not all Machiavellian!
  • The other prisoner is my twin!
  • People are not (always) rational!
  • The shadow of the future…
The Iterated Prisoner’s Dilemma
• One answer: play the game more than once
• Let’s use an applet [applet not reproduced]
• If you know you will be meeting your opponent again,
  • then the incentive to defect appears to evaporate.
• Cooperation can be the rational choice in the infinitely repeated prisoner’s dilemma, provided the players value future payoffs highly enough
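One standard way to make the "shadow of the future" precise is sketched below. It rests on two assumptions that are not in the slides: future payoffs are weighted by a discount factor `delta` between 0 and 1, and the opponent answers a single defection by defecting forever afterwards (often called a grim trigger). The payoff values are the ones used earlier in the deck.

```python
from fractions import Fraction

# Slide payoffs: reward (C,C), sucker (C,D), temptation (D,C), punishment (D,D).
R, S, T, P = 3, 0, 5, 2


def cooperate_forever(delta):
    """Discounted payoff of mutual cooperation in every round: R / (1 - delta)."""
    return R / (1 - delta)


def defect_then_punished(delta):
    """Defect once (payoff T), then mutual defection forever: T + delta * P / (1 - delta).

    ASSUMPTION: the opponent retaliates with permanent defection.
    """
    return T + delta * P / (1 - delta)


# cooperate_forever(delta) >= defect_then_punished(delta) rearranges to
# delta >= (T - R) / (T - P).
threshold = Fraction(T - R, T - P)

for delta in (Fraction(1, 2), threshold, Fraction(9, 10)):
    print(delta, cooperate_forever(delta), defect_then_punished(delta))
```

With these numbers the threshold is 2/3: a player who weights the next round at less than two thirds of the current one still prefers to defect, while a more patient player does better by cooperating.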
Backwards Induction
• Suppose you both know that you will play the game exactly n times
• On round n, you have an incentive to defect to gain that extra bit of payoff.
• This makes round n-1 the last “real” game, and so you have an incentive to defect there too
• And so on…
• When playing the prisoner’s dilemma with a fixed, finite, pre-determined and commonly known number of rounds, defection is the best strategy.
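The backward-induction argument can be traced mechanically. The sketch below is an illustration under assumptions, not a general game solver: it reuses the payoffs from earlier in the deck, assumes both players reason identically, and treats the payoff from later rounds as a fixed amount that the current move cannot change. The function names are hypothetical.

```python
MOVES = ("C", "D")

# Payoff to a player for (own move, other player's move), using the slide values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}


def dominant_move(continuation):
    """Move that is strictly best against every opposing move.

    `continuation` is the payoff already fixed for all later rounds; because it is
    added to every option alike, it cannot change which move is best.
    """
    for move in MOVES:
        alternatives = [m for m in MOVES if m != move]
        if all(
            PAYOFF[(move, opp)] + continuation > PAYOFF[(alt, opp)] + continuation
            for opp in MOVES
            for alt in alternatives
        ):
            return move
    return None


def backward_induction(n_rounds):
    """Solve round n first, then n-1, and so on; return (moves per round, total payoff)."""
    plan, continuation = [], 0
    for _ in range(n_rounds):  # each pass handles one round, starting from the last
        move = dominant_move(continuation)
        plan.append(move)
        continuation += PAYOFF[(move, move)]  # both players choose the same move
    return plan[::-1], continuation


print(backward_induction(5))
```

For five rounds the plan is to defect every time, for a total of 10 per player, compared with the 15 each would collect from cooperating throughout.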
Axelrod’s Tournament
• Suppose you play the prisoner’s dilemma game against a range of opponents.
• What single strategy should you use to play against all these opponents so that you maximise your overall payoff?
• Axelrod (1984) investigated this problem with a tournament for computer programs playing the prisoner’s dilemma.
Robert Axelrod: http://www-personal.umich.edu/~axe/
Strategies
• ALL-D
  • Always defect (the “hawk” strategy).
• TIT-FOR-TAT
  • On round u=0, cooperate.
  • On round u>0, copy the opponent’s round u-1 move.
• TESTER
  • On round u=0, defect.
  • If the opponent retaliated, then play TIT-FOR-TAT.
  • Otherwise intersperse cooperation and defection.
• JOSS
  • As for TIT-FOR-TAT, except periodically defect.
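A minimal sketch of how such strategies might be coded and matched against each other. This is not Axelrod's tournament software: the function signatures, the match length, and the JOSS defection period are all assumptions made for illustration, and TESTER is left out because the slide does not pin down how it intersperses its moves. Payoffs are the ones used earlier in the deck.

```python
# Payoffs (player a, player b) for each pair of moves, using the slide values.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (2, 2)}


def all_d(my_history, their_history):
    """ALL-D: always defect."""
    return "D"


def tit_for_tat(my_history, their_history):
    """TIT-FOR-TAT: cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def make_joss(period=10):
    """JOSS: TIT-FOR-TAT that also defects on every `period`-th round.

    ASSUMPTION: the slide only says "periodically"; the period is hypothetical.
    """
    def joss(my_history, their_history):
        if (len(my_history) + 1) % period == 0:
            return "D"
        return tit_for_tat(my_history, their_history)
    return joss


def play_match(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play_match(tit_for_tat, tit_for_tat))
print(play_match(tit_for_tat, all_d))
print(play_match(make_joss(10), tit_for_tat))
```

Each strategy is a function of the two move histories, so adding another entrant only requires writing one more function with the same two parameters.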
How to succeed in Axelrod’s Tournament
Axelrod suggests the following:
• Don’t be envious
  • Don’t play as if it were a zero-sum game
  • You don’t have to beat your opponent for you to do well
• Be nice (don’t be the first to defect)
  • Start by cooperating, and reciprocate cooperation
• Retaliate appropriately
  • Always punish defection immediately,
  • but use “measured” force; don’t overdo it
• Don’t hold grudges
  • Always reciprocate cooperation immediately
Who wins?
• In the 1980s tournament, TIT-FOR-TAT won.
• But, when paired with a mindless strategy like RANDOM, TIT-FOR-TAT sinks to its opponent’s level.
• So, it can’t be seen as a “best” strategy.
• The tournament was run again in 2004, and TIT-FOR-TAT did not win.
• What strategy won, and why?
Game of Chicken
[Figure: payoff matrix for players i and j]
• Difference from the prisoner’s dilemma:
  • Mutual defection is the most feared outcome.
• The strategy pairs (C,D) and (D,C) are in Nash equilibrium.
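The Nash equilibrium claim can be checked by brute force. The payoff matrix for this slide is not reproduced in the text, so the numbers below are assumed, illustrative values chosen only so that mutual defection is the worst outcome for both players; the function name is hypothetical.

```python
from itertools import product

MOVES = ("C", "D")

# ASSUMED payoffs (player i, player j) for each (i's move, j's move).
# Not taken from the slides; picked so that (D, D) is the worst outcome for both.
CHICKEN = {
    ("C", "C"): (2, 2),
    ("C", "D"): (1, 3),
    ("D", "C"): (3, 1),
    ("D", "D"): (0, 0),
}


def pure_nash_equilibria(game):
    """Move pairs from which neither player gains by changing only their own move."""
    equilibria = []
    for a, b in product(MOVES, repeat=2):
        i_cannot_improve = all(game[(a, b)][0] >= game[(alt, b)][0] for alt in MOVES)
        j_cannot_improve = all(game[(a, b)][1] >= game[(a, alt)][1] for alt in MOVES)
        if i_cannot_improve and j_cannot_improve:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(CHICKEN))
```

With these assumed payoffs the two asymmetric outcomes are the only pure-strategy equilibria: whoever expects the other to defect is better off backing down.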
The Stag Hunt (1)
• You can hunt deer (cooperate) or hare (defect)
• Only if both cooperate will they succeed in catching the deer and receive the maximum payoff.
[Figure: payoff matrix for players i and j]
The Stag Hunt (2)
• A pessimist would always hunt hare.
• A cautious player who is uncertain about what the other player will choose to do would also hunt hare.
• For agents to cooperate in the Stag Hunt, there must be a measure of trust between them.
• This measure of trust is a kind of social contract between the players; a contract that requires prior agreement.
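A rough way to put a number on the "measure of trust" is sketched below. Both the payoffs and the modelling device are assumptions: the slide's payoff matrix is not reproduced in the text, and representing trust as a probability `p` that the other player hunts deer is an illustrative choice, not something the slides state.

```python
from fractions import Fraction

# ASSUMED payoff to me for (my move, other's move); "C" = hunt deer, "D" = hunt hare.
# Illustrative values only: deer pays most, but only if both players hunt it.
STAG_HUNT = {("C", "C"): 4, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 2}


def worst_case(move):
    """Payoff a pessimist expects from a move."""
    return min(STAG_HUNT[(move, other)] for other in "CD")


def expected(move, p):
    """Expected payoff of a move if the other player hunts deer with probability p."""
    return p * STAG_HUNT[(move, "C")] + (1 - p) * STAG_HUNT[(move, "D")]


def hunts_deer(p):
    """Whether hunting deer is at least as good as hunting hare at trust level p."""
    return expected("C", p) >= expected("D", p)


print(worst_case("C"), worst_case("D"))  # the pessimist's comparison
for p in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    print(p, hunts_deer(p))
```

With these assumed numbers, hunting deer only pays once the other player is believed to cooperate with probability of at least 2/3; below that level of trust, hare is the better choice.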
A Variation of the Prisoner’s Dilemma
• A spatial variant of the iterated prisoner’s dilemma
• A model for cooperation vs. conflict in groups
• It shows the spread of altruism and of exploitation for personal gain in an interacting population of agents learning from each other
• Initially the population consists of cooperators and a certain number of defectors
• The advantage of defection is determined by the value of b in the payoff matrix
• A player determines its new strategy by selecting the most favourable strategy from among itself and its direct neighbours
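The update rule in the last bullet can be sketched as a small grid simulation. The slides only name the parameter b and the "direct neighbours" rule, so the rest is assumed for illustration: a cooperator earns 1 per cooperating neighbour, a defector earns b per cooperating neighbour, every other pairing earns 0, each cell has 8 neighbours on a wrap-around grid, and all cells update at once.

```python
def neighbours(r, c, size):
    """Coordinates of the 8 surrounding cells on a wrap-around grid (assumed layout)."""
    return [
        ((r + dr) % size, (c + dc) % size)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]


def score(grid, r, c, b):
    """Total payoff of cell (r, c) from playing each of its neighbours once.

    ASSUMED payoffs: C vs C earns 1, D vs C earns b, anything against D earns 0.
    """
    cooperating = sum(grid[nr][nc] == "C" for nr, nc in neighbours(r, c, len(grid)))
    return cooperating * (1.0 if grid[r][c] == "C" else b)


def step(grid, b):
    """One synchronous update: each cell copies the best-scoring strategy nearby."""
    size = len(grid)
    scores = [[score(grid, r, c, b) for c in range(size)] for r in range(size)]
    new_grid = []
    for r in range(size):
        row = []
        for c in range(size):
            # The cell itself is listed first, so it keeps its strategy on ties.
            candidates = [(r, c)] + neighbours(r, c, size)
            best_r, best_c = max(candidates, key=lambda rc: scores[rc[0]][rc[1]])
            row.append(grid[best_r][best_c])
        new_grid.append(row)
    return new_grid


# A 5x5 population of cooperators with a single defector in the middle.
grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"
after = step(grid, b=1.9)
print("\n".join("".join(row) for row in after))
```

Under these assumptions, the lone defector outscores every cooperator around it when b = 1.9, so its eight neighbours copy it after a single step.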
Variation of the Prisoner’s Dilemma
• Applet: [interactive applet not reproduced]
Recommended Reading
• An Introduction to Multi-Agent Systems, M. Wooldridge, John Wiley & Sons, 2002. Chapter 6.
Also check:
• Various applets for the prisoner’s dilemma: http://www.gametheory.net/applets/prisoners.html
• Spatial variant of the iterated prisoner’s dilemma: http://prisonersdilemma.groenefee.nl/
• Software for Axelrod’s Tournament: http://www.econ.iastate.edu/tesfatsi/demos/axelrod/axelrodt.htm