Design of Multi-Agent Systems

Design of Multi-Agent Systems • Teacher: Bart Verheij • Student assistants: Albert Hankel, Elske van der Vaart • Web site: http://www.ai.rug.nl/~verheij/teaching/dmas/ (Nestor contains a link)

Student presentations

Some practical matters • Please submit exercises to designofmas@gmail.com. • Please use naming conventions for file names and message subjects. • Please read your student mail.

Overview • Introduction • Evaluation criteria & equilibria • Social welfare • Pareto efficiency • Nash equilibria • The Prisoner’s Dilemma • Loose end: dominant strategies (not in the book, or treated differently there)

Typical structure of a multi-agent system

Interactions • Communication • Influence on environment (‘spheres of influence’) • Organizations, communities, coalitions • Hierarchical relations • Cooperation, competition

Utilities & preferences • How do we measure the results of a multi-agent system? In terms of preferences and utilities. • Some notation: • Ω = {ω1, ω2, …}: ‘outcomes’, future environmental states • group preferences (assumes cooperation) • individual preferences

Preferences • Strict preferences • Properties: • Reflexive • Transitive • Comparable
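The formulas for the three properties did not survive on this slide. A standard way to state them is sketched below, assuming that the weak preference relation of agent i over outcomes in Ω is written ⪰_i and that u_i is agent i's utility function (this notation is an assumption, not recovered from the slide):

```latex
\begin{align*}
\text{Reflexive:}  &\quad \forall \omega \in \Omega:\ \omega \succeq_i \omega \\
\text{Transitive:} &\quad \omega \succeq_i \omega' \text{ and } \omega' \succeq_i \omega''
                     \;\Rightarrow\; \omega \succeq_i \omega'' \\
\text{Comparable:} &\quad \forall \omega, \omega' \in \Omega:\
                     \omega \succeq_i \omega' \text{ or } \omega' \succeq_i \omega \\
\text{Strict preference:} &\quad \omega \succ_i \omega' \;\Leftrightarrow\;
                     \omega \succeq_i \omega' \text{ and not } \omega' \succeq_i \omega \\
\text{Link to utilities:} &\quad \omega \succeq_i \omega' \;\Leftrightarrow\;
                     u_i(\omega) \geq u_i(\omega')
\end{align*}
```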

Utilities • According to utility theory, preferences can be measured in terms of real numbers. • Example: money • But money isn’t always the right measure: think of the subjective value of a million dollars when you have nothing or when you are Bill Gates.

Utility & money

Zero-sum & constant-sum games • Simplification: two agents • Constant-sum games: the sum of all players' payoffs is the same for any outcome: u_i(ω) + u_j(ω) = C for all ω ∈ Ω • Zero-sum games: all outcomes involve a sum of the players’ payoffs of 0: u_i(ω) + u_j(ω) = 0 for all ω ∈ Ω • Chess: scored as 0, ½, 1 it is constant-sum; scored as -½, 0, ½ it is zero-sum
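The chess example can be checked with a few lines of arithmetic. The sketch below assumes the outcome names and the helper `constant_sum`, which are illustrative and not from the slides; it only shows that the two scorings differ by a shift of ½ per player.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Chess outcomes as (payoff for white, payoff for black), scored 0 / 1/2 / 1.
chess = {
    "white wins": (Fraction(1), Fraction(0)),
    "draw": (HALF, HALF),
    "black wins": (Fraction(0), Fraction(1)),
}

def constant_sum(game):
    """Return the constant C if every outcome sums to the same C, else None."""
    sums = {sum(payoffs) for payoffs in game.values()}
    return sums.pop() if len(sums) == 1 else None

# Subtracting 1/2 from every payoff gives the scoring -1/2 / 0 / 1/2.
shifted = {o: tuple(p - HALF for p in payoffs) for o, payoffs in chess.items()}

print(constant_sum(chess))    # the constant for the 0 / 1/2 / 1 scoring
print(constant_sum(shifted))  # the constant for the -1/2 / 0 / 1/2 scoring
```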

Zero-sum & constant-sum games • One agent’s gain is another agent’s loss. • Zero-sum games are necessarily competitive. • But there are many non-zero-sum situations.

Overview • Introduction • Evaluation criteria & equilibria • Social welfare • Pareto efficiency • Nash equilibria • The Prisoner’s Dilemma • Loose end: dominant strategies

Kinds of evaluation criteria & equilibria • Social welfare • Pareto efficiency • Nash equilibrium

Social welfare • The social welfare of an outcome is the sum of all agents’ utilities for that outcome. • Optimal social welfare may not be achievable when individuals are self-interested. • Individual agents follow their own (different) utility functions.
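As a minimal sketch of the definition, the code below sums the utilities per outcome and picks the outcome with the highest sum. The game itself (outcomes w1 to w4 and their utilities) is hypothetical and is not Example 1 from the slides.

```python
# Hypothetical two-agent game: outcome -> (utility of agent i, utility of agent j).
utilities = {
    "w1": (4, 4),
    "w2": (1, 5),
    "w3": (5, 1),
    "w4": (2, 2),
}

def social_welfare(outcome):
    """Sum of all agents' utilities for the given outcome."""
    return sum(utilities[outcome])

best = max(utilities, key=social_welfare)
print(best, social_welfare(best))
```

Note that agent i alone would prefer w3 and agent j alone would prefer w2, which is why the welfare-maximizing outcome need not be reached by self-interested agents.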

Example 1 • Highest social welfare

Pareto efficiency or optimality • An outcome is Pareto optimal if no other outcome makes some agent better off without making another agent worse off. • When all agents pursue social welfare, the outcome with the highest social welfare is Pareto optimal. However, a Pareto optimal outcome need not be desirable, e.g., a dictatorship. • Pareto improvement: a change that is an improvement for someone without hurting anyone.
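The definitions above can be sketched as a direct check over all outcomes. The game is again hypothetical (it is not Example 1); the outcome w2 is chosen to mimic the dictatorship case, where one agent gets everything.

```python
# Hypothetical two-agent game: outcome -> (utility of agent i, utility of agent j).
utilities = {
    "w1": (4, 4),
    "w2": (9, 0),  # "dictatorship": everything goes to agent i
    "w3": (2, 2),
    "w4": (3, 1),
}

def is_pareto_improvement(new, old):
    """True if payoff vector `new` helps someone and hurts no one compared to `old`."""
    no_one_hurt = all(n >= o for n, o in zip(new, old))
    someone_gains = any(n > o for n, o in zip(new, old))
    return no_one_hurt and someone_gains

def pareto_optimal(game):
    """Outcomes for which no Pareto improvement exists within the game."""
    return [
        outcome
        for outcome, payoff in game.items()
        if not any(is_pareto_improvement(other, payoff) for other in game.values())
    ]

print(pareto_optimal(utilities))
print(is_pareto_improvement(utilities["w1"], utilities["w3"]))
```

In this sketch w2 is Pareto optimal even though agent j gets nothing, because any move away from w2 makes agent i worse off.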

Example 1 • Pareto efficient outcomes • Pareto improvements

Nash equilibrium • Two strategies s1 and s2 are in Nash equilibrium if: • under the assumption that agent i plays s1, agent j can do no better than play s2; and • under the assumption that agent j plays s2, agent i can do no better than play s1. • No individual has an incentive to unilaterally change strategy. • Example: driving on the right side of the road. • Nash equilibria in pure strategies do not always exist, and they are not always unique.
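The two conditions can be checked mechanically for every strategy pair. The sketch below assumes illustrative payoffs for the driving example and adds matching pennies (not on the slides) as a standard game without a pure-strategy equilibrium; the function name `pure_nash` is hypothetical.

```python
from itertools import product

def pure_nash(game):
    """game maps (strategy of i, strategy of j) -> (payoff of i, payoff of j)."""
    rows = sorted({s for s, _ in game})
    cols = sorted({t for _, t in game})
    equilibria = []
    for s, t in product(rows, cols):
        ui, uj = game[(s, t)]
        # Agent i cannot do better by deviating while j keeps playing t.
        i_ok = all(game[(s2, t)][0] <= ui for s2 in rows)
        # Agent j cannot do better by deviating while i keeps playing s.
        j_ok = all(game[(s, t2)][1] <= uj for t2 in cols)
        if i_ok and j_ok:
            equilibria.append((s, t))
    return equilibria

# Assumed payoffs: 1 if both drive on the same side, 0 otherwise.
driving = {
    ("left", "left"): (1, 1), ("left", "right"): (0, 0),
    ("right", "left"): (0, 0), ("right", "right"): (1, 1),
}
# Matching pennies: i wins when the coins match, j wins when they differ.
pennies = {
    ("heads", "heads"): (1, -1), ("heads", "tails"): (-1, 1),
    ("tails", "heads"): (-1, 1), ("tails", "tails"): (1, -1),
}

print(pure_nash(driving))  # two equilibria: not unique
print(pure_nash(pennies))  # empty list: no pure-strategy equilibrium
```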

Example 1 • Nash equilibria • ‘Nash incentives’

Example 1 • Outcomes corresponding to strategies in Nash equilibrium

Example 2 • No Nash equilibrium

Example 3 • Unique Nash equilibrium

Example 3 • Unique Nash equilibrium • Highest social welfare & Pareto efficient

The Prisoner’s Dilemma • Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that: • if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years • if both confess, then each will be jailed for two years • Both prisoners know that if neither confesses, then they will each be jailed for one year

The Prisoner’s Dilemma • The prisoners can either defect (confess) or cooperate (stay silent). • The rational action for each individual prisoner is to defect. • Example 3 is a prisoner’s dilemma (but note that its table lists utilities, not prison years: fewer years in prison means a higher utility). • Real life: nuclear arms reduction, free riders
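Why defection is the rational action can be read off from the jail terms in the story. The sketch below assumes that utility is simply minus the number of years (the slides do not fix a utility scale) and uses C for cooperate (stay silent) and D for defect (confess).

```python
# Years in jail from the story: (years for prisoner 1, years for prisoner 2).
years = {
    ("C", "C"): (1, 1),  # neither confesses
    ("C", "D"): (3, 0),  # only prisoner 2 confesses and is freed
    ("D", "C"): (0, 3),  # only prisoner 1 confesses and is freed
    ("D", "D"): (2, 2),  # both confess
}

# Assumption: utility = minus the number of years in jail.
utility = {moves: (-a, -b) for moves, (a, b) in years.items()}

def best_reply_of_prisoner_1(other_move):
    """Prisoner 1's best move, given what prisoner 2 does."""
    return max("CD", key=lambda mine: utility[(mine, other_move)][0])

for other_move in "CD":
    print("if the other plays", other_move, "-> best reply:",
          best_reply_of_prisoner_1(other_move))

# Social welfare per outcome: mutual defection is the worst for the pair.
welfare = {moves: sum(u) for moves, u in utility.items()}
print(welfare)
```

Defecting is the best reply to either move, so both prisoners defect, although mutual cooperation would leave both better off.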

The Prisoner’s Dilemma • The Prisoner’s Dilemma is the fundamental problem of multi-agent interactions. • It appears to imply that cooperation will not occur in societies of self-interested agents.

Recovering cooperation ... • Conclusions that some have drawn from this analysis: • the game theory notion of rational action is wrong! • somehow the dilemma is being formulated wrongly • Arguments to recover cooperation: • We are not all Machiavelli! • The other prisoner is my twin! • The shadow of the future…

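The Iterated Prisoner’s Dilemma • One answer: play the game more than once. • If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate. • But when you know how many times you’ll meet your opponent, defection is again rational: in the last round there is no future to protect, so both defect, and the same reasoning then applies to each earlier round.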

Axelrod’s tournament • Suppose you play the iterated prisoner’s dilemma against a range of opponents. What strategy should you choose, so as to maximize your overall payoff? • Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner’s dilemma.

Strategies in Axelrod’s tournament • ALL-D: Always defect • TIT-FOR-TAT: At the first meeting of an opponent: cooperate. Then do what your opponent did on the previous meeting • TESTER: First: defect. If the opponent retaliates, play TIT-FOR-TAT. Otherwise intersperse cooperation and defection. • JOSS: As TIT-FOR-TAT, except periodically defect
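A minimal sketch of how such strategies can be played against each other. Several things here are assumptions rather than details from the slides: the per-round payoffs (5, 3, 1, 0), the match length of 200 rounds, and the reading of JOSS's "periodically defect" as "defect every 10th round". TESTER is left out to keep the sketch short.

```python
# Per-round payoffs as (payoff for A, payoff for B); the values are an assumption.
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def all_d(my_history, their_history):
    """ALL-D: always defect."""
    return "D"

def tit_for_tat(my_history, their_history):
    """TIT-FOR-TAT: cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def joss(my_history, their_history, period=10):
    """JOSS: as TIT-FOR-TAT, but defect every `period`-th round (assumed period)."""
    if (len(my_history) + 1) % period == 0:
        return "D"
    return tit_for_tat(my_history, their_history)

def play(strategy_a, strategy_b, rounds=200):
    """Play an iterated match and return the total scores (A, B)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))
print(play(all_d, tit_for_tat))
print(play(joss, tit_for_tat))
```

In a single match TIT-FOR-TAT never scores more than its opponent; in this sketch it loses to ALL-D by exactly the one round in which it was exploited.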

Reasons for TIT-FOR-TAT’s success • Don’t be envious: Don’t play as if it were zero sum! • Be nice: Start by cooperating, and reciprocate cooperation • Retaliate appropriately: Always punish defection immediately, but use “measured” force; don’t overdo it • Don’t hold grudges: Always reciprocate cooperation immediately

Dominant strategy • A strategy is dominant for an agent if it is the best choice whatever the other agents do. • Dominant strategy equilibrium: each agent uses a dominant strategy. • A dominant strategy equilibrium is always a Nash equilibrium, but not every Nash equilibrium is a dominant strategy equilibrium.

Example 4
                a2: s2,1    a2: s2,2
a1: s1,1        (2,3)       (4,5)
a1: s1,2        (1,2)       (2,3)
Dominant for a1: s1,1 • Dominant for a2: s2,2
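The dominance claims in Example 4 can be verified by comparing each strategy with every alternative against every opponent strategy. The sketch assumes that the first number in each pair is a1's payoff (the result is the same under the opposite reading) and uses weak dominance, i.e. "at least as good"; the function name is hypothetical.

```python
# Example 4: (strategy of a1, strategy of a2) -> (payoff of a1, payoff of a2).
# Assumption: the first number in each pair belongs to a1.
game = {
    ("s1,1", "s2,1"): (2, 3), ("s1,1", "s2,2"): (4, 5),
    ("s1,2", "s2,1"): (1, 2), ("s1,2", "s2,2"): (2, 3),
}

def dominant_strategies(game, agent):
    """Strategies of `agent` (0 for a1, 1 for a2) that are a best reply
    to every strategy of the other agent."""
    own = sorted({key[agent] for key in game})
    other = sorted({key[1 - agent] for key in game})

    def payoff(mine, theirs):
        key = (mine, theirs) if agent == 0 else (theirs, mine)
        return game[key][agent]

    return [
        s for s in own
        if all(payoff(s, t) >= payoff(alt, t) for t in other for alt in own)
    ]

print("a1:", dominant_strategies(game, 0))
print("a2:", dominant_strategies(game, 1))
```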

Just to play with: new roads • (Figure: road network with nodes A, B, C and D) • There are 6 cars going from A to D each day. • (A,B) and (C,D) are highways: time(c) = 5 + 2c, where c is the number of cars. • (B,D) and (A,C) are local roads: time(c) = 20 + c. • What will happen when a new highway is made between B and C?
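One way to explore the question is to enumerate every split of the 6 cars over the available routes and keep the splits in which no single driver can shorten their trip by switching, i.e. the Nash equilibria of the routing game. The sketch below rests on two assumptions that the slide does not state: the new highway B-C has the same time function as the other highways, and it is only driven from B to C. All names in the code are illustrative.

```python
from itertools import product

CARS = 6

def highway(c):
    return 5 + 2 * c

def local(c):
    return 20 + c

# Each route is a list of (road, travel-time function).
OLD_ROUTES = {
    "A-B-D": [("AB", highway), ("BD", local)],
    "A-C-D": [("AC", local), ("CD", highway)],
}
# Assumption: the new road B-C is a highway with time(c) = 5 + 2c, driven B -> C.
NEW_ROUTES = {
    **OLD_ROUTES,
    "A-B-C-D": [("AB", highway), ("BC", highway), ("CD", highway)],
}

def route_time(routes, counts, route):
    """Travel time on `route` when counts[r] cars take each route r."""
    total = 0
    for road, time in routes[route]:
        # Every car whose route includes this road adds to its load.
        load = sum(n for r, n in counts.items()
                   if any(rd == road for rd, _ in routes[r]))
        total += time(load)
    return total

def equilibria(routes):
    """All splits of the cars in which no single driver gains by switching route."""
    names = list(routes)
    result = []
    for split in product(range(CARS + 1), repeat=len(names)):
        if sum(split) != CARS:
            continue
        counts = dict(zip(names, split))
        stable = True
        for r in names:
            if counts[r] == 0:
                continue
            now = route_time(routes, counts, r)
            for other in names:
                if other == r:
                    continue
                moved = dict(counts)
                moved[r] -= 1
                moved[other] += 1
                if route_time(routes, moved, other) < now:
                    stable = False
        if stable:
            times = {r: route_time(routes, counts, r) for r in names if counts[r]}
            result.append((counts, times))
    return result

print("before:", equilibria(OLD_ROUTES))
print("after: ", equilibria(NEW_ROUTES))
```

Under these assumptions the only stable split before the new road is 3 cars per route at 34 time units each; afterwards it is 2 cars per route at 35 each, so the extra road makes every driver slower. This effect is known as Braess's paradox.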
