Cooperative Agent Systems: Artificial Agents Play the Ultimatum Game Steven O. Kimbrough Presented at FMEC 2001, Oslo Joint work with Fang Zhong and D.J. Wu
Research Motivation • How to design and control cooperative agent systems in strategic situations. • How well do different types of agents perform against each other? • How well do various adaptive mechanisms perform? • Value of intelligence: what does intelligence buy you?
Methodology • Adaptive artificial agents play the iterated ultimatum game. • The ultimatum game is the most fundamental building block for negotiation (e.g., Croson, 1996). • Reinforcement learning (a simple version). • Regimes of play: • Two agents play against each other. • Populations of different types of agents.
One-shot Ultimatum Game • Two players, A and B. • Player A has an endowment of N. • Player A offers x ∈ [0, N] (N = 100 in this study). • Player B can either accept or reject the offer. • [Game tree: A offers x; if B accepts, payoffs are (N − x, x); if B rejects, payoffs are (0, 0).]
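The payoff structure on this slide can be sketched as a small function (a minimal illustration; the function name is ours, not from the slides):

```python
def payoff(N, x, accept):
    """Split of endowment N when A offers x and B accepts or rejects."""
    if accept:
        return (N - x, x)   # A keeps N - x, B receives the offer x
    return (0, 0)           # rejection: both players get nothing
```

With N = 100 as in this study, an accepted offer of 40 yields (60, 40), while any rejected offer yields (0, 0).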
One-shot Ultimatum Game (Cont.) • Classical Game Theory • Player A offers a tiny amount ε, and player B will always accept this offer. • Infinite number of Nash equilibria along the line x + y = N. • Behavioral Game Theory • Human beings in the lab do not behave as classical game theory predicts (e.g., people tend to be fair, and reject offers that fall below their threshold share).
Repeated Ultimatum Game • A “supergame” consists of iterations of the ultimatum game. • Indefinite episodes • Agents do not know how many iterations are yet to come. • No single best strategy for the repeated ultimatum game.
Reinforcement Learning • Favoring actions that produce better results. • Estimating the values of state-action pairs. • Sample-average method for estimation/evaluation. • ε-greedy method for action selection.
Reinforcement Learning (Cont.) • Algorithm: Initialize Q(s, a) = 0. Repeat for each episode: Choose action a from the current state (ε-greedy). Receive immediate payoff r and arrive at the next state. Q(s, a) ← Q(s, a) · (k − 1)/k + r/k. Until n episodes have been played.
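The sample-average update and ε-greedy selection on this slide can be sketched as follows (class and attribute names are ours; the slide gives only the update rule):

```python
import random

class SampleAverageLearner:
    """Sketch of the slide's update: Q <- Q*(k-1)/k + r/k (an incremental mean)."""

    def __init__(self, actions, epsilon=0.1):
        self.q = {a: 0.0 for a in actions}   # Q values initialized to 0
        self.k = {a: 0 for a in actions}     # visit count per action
        self.epsilon = epsilon

    def choose(self):
        # epsilon-greedy: explore with probability epsilon, else exploit
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, a, r):
        # incremental sample average of all payoffs received for action a
        self.k[a] += 1
        k = self.k[a]
        self.q[a] = self.q[a] * (k - 1) / k + r / k
```

After payoffs 10 and 20 for the same action, Q for that action equals their average, 15 — the update is just a running mean written incrementally.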
Experiment 0: Repeated One-Shot Game • Agents have no memory of past actions. • Agents find the game-theoretic result. • No cooperation among agents.
Experiment 1: Learning Agent Against Fixed Rules • Fixing player B’s strategy IF (currentOffer >= p * Endowment) Accept currentOffer. ELSE Reject currentOffer. 0 < p < 1
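Player B's fixed threshold rule above is a one-liner in code (a sketch; the default p = 0.4 is an illustrative choice, not specified on this slide):

```python
def fixed_responder(current_offer, endowment=100, p=0.4):
    """Experiment 1: B accepts iff the offer meets the fixed threshold p * Endowment."""
    return current_offer >= p * endowment   # True = accept, False = reject
```

With p = 0.4 and an endowment of 100, offers of 40 or more are accepted and anything lower is rejected, which is exactly the boundary the learning proposer converges to.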
Experiment 1 (Cont.) • Player A will propose an offer no greater than his last offer if player B accepted his last offer. • Player A eventually learns the value of p, and proposes only the amount of pN.
Experiment 2: Learning Agent Against Dynamic Rules • The value of p changes over the course of play: Episode: 1 / 2000 / 5000 / 7000 / 10000; p: 0.40 / 0.35 / 0.45 / 0.60 / 0.40. • Agent A tracks the changes very well, given enough time periods.
Experiment 3: Learning Agent Against Rotating Rules • The value of p changes in a rotating pattern, e.g., p_{t−1} = 0.40, p_t = 0.50, p_{t+1} = 0.60. • Player A converges to a proposal of 60, which is the highest value of p × 100. • Memory of at least one previous move might enable player A to track the rotating rules.
Experiment 4: Learning Simultaneously • Both agents have memory of one previous move. • Player B chooses the value of p for each episode according to: IF b_{t−1} is "accept" THEN p_t = d_{t−1} / N ELSE p_t ∈ [0, N] / N (drawn at random).
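Player B's update rule on this slide can be sketched directly (function and parameter names are ours; the random re-draw on rejection is read from the rule p_t ∈ [0, N] / N):

```python
import random

def next_p(prev_accepted, prev_demand, N=100):
    """Experiment 4, player B: keep the accepted demand as the new threshold,
    otherwise re-draw the threshold at random from [0, N] / N."""
    if prev_accepted:
        return prev_demand / N        # p_t = d_{t-1} / N
    return random.randint(0, N) / N   # p_t drawn uniformly from {0, ..., N} / N
```

So B ratchets its threshold to whatever demand was just accepted, and resets to a random threshold after a rejection.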
Experiment 4 (Cont.) • Decision-making process using finite automata. Agent A: [Automaton diagram: states with demands d, d·a, and d·b, with transitions on Accept and Reject.]
Experiment 4 (Cont.) Agent B: [Automaton diagram: thresholds p = d and p*, with transitions on C and D.]
Experiment 4 - Result • Cooperation emerges through co-evolution within 2000 episodes. Player A converges to proposing 55 or 56, and correspondingly, player B converges to setting his lower limit at 55 or 56.
Value of Intelligence • Will smart agents be able to do better than dumb ones through learning? • Experiment: • 5a: A population of smart agents play against a population of various dumb agents • 5b: A population of smart agents play against each other and against a population of various dumb agents.
Experiment 5a: One Smart Agent vs. Multiple Dumb Agents • Three types of dumb agents using fixed rules: • db1: demand/accept 70 or higher; • db2: demand/accept 50 or higher; • db3: demand/accept 30 or higher. • The smart agent learns via reinforcement learning. • There is a 25 percent probability that the smart agent is chosen to play a game. • Track the changing population of dumb agents at each generation.
Experiment 5a: Process • Draw the smart agent with 25 percent probability; otherwise draw one dumb agent at random, in proportion to their frequencies. • Draw another dumb agent at random, in proportion to their frequencies. • Decide the role of each agent (proposer or responder). • Agents play the one-shot game against each other. • Repeat from the first step until a certain number of games, e.g., 1000 episodes, has been completed. • Update the frequencies of the dumb agents.
Experiment 5a – Results. • Fair dumb (db2: demand/accept 50 or higher) agents take over the dumb agent population. • Smart agents learn to be fair.
Experiment 5b: Multiple Smart Agents vs. Dumb Agents • Smart agents can play against each other.
Impact of Memory • Repeat experiments 5a and 5b, but introduce a different memory size in each experiment.
Conclusions • Artificial agents using reinforcement learning are able to play the ultimatum game efficiently and effectively. • Agent intelligence and memory affect performance. • The agent-based approach replicates and explains real human behavior better than the classical prediction.
Future Research • Toward cooperative agent systems in strategic situations in virtual communities, especially in electronic commerce, such as supply chains. • Currently investigating two versions of the trust game: "The classical economic trust game" vs. "The Mad Mex Game". • Comments?