Iterated Prisoner’s Dilemma Game in Evolutionary Computation 2003. 10. 2 Seung-Ryong Yang
Agenda • Motivation • Iterated Prisoner’s Dilemma Game • Related Works • Strategic Coalition • Improving Generalization Ability • Experimental Results • Conclusion
Motivation • Evolutionary approach • Understanding complex behaviors by investigating the simulation results of an evolutionary process • Giving a way to find optimal strategies in a dynamic environment • IPD game • Models complex phenomena such as social and economic behaviors • Provides a testbed for modeling a dynamic environment • Objectives • Obtaining multiple good strategies • Forming coalitions to improve generalization ability
Iterated Prisoner’s Dilemma Game (1/2) • Overview • Prisoner’s possible choices • Defection • Cooperation • Characteristics • Non-cooperative • Non-zero-sum • Types of Game • 2IPD (2-player Iterated Prisoner’s Dilemma) game • NIPD (N-player Iterated Prisoner’s Dilemma) game • [Table: Payoff Matrix of 2IPD Game by Axelrod, R. (1984)]
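The payoff table itself did not survive in this transcript. As a minimal sketch, the block below assumes the standard values from Axelrod (1984): 3 for mutual cooperation, 5 for defecting against a cooperator, 0 for being defected against, and 1 for mutual defection. The encoding 0 = cooperation, 1 = defection and the names `PAYOFF` and `score` are illustrative assumptions, not taken from the slides.

```python
# Sketch of a 2IPD payoff table. The values (3, 5, 0, 1) are the standard
# ones from Axelrod (1984); the slide's own table is assumed to match them.
C, D = 0, 1  # assumed encoding: 0 = cooperation, 1 = defection

# PAYOFF[(my_move, opponent_move)] -> (my_payoff, opponent_payoff)
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

def score(moves_a, moves_b):
    """Return the total payoffs of two equal-length move sequences."""
    total_a = total_b = 0
    for a, b in zip(moves_a, moves_b):
        pay_a, pay_b = PAYOFF[(a, b)]
        total_a += pay_a
        total_b += pay_b
    return total_a, total_b

# Five rounds: mutual cooperation, one exploitation, then mutual defection.
print(score([C, D, D, D, D], [C, C, D, D, D]))  # (11, 6)
```

The five-round sequence in the last line yields per-round payoffs of 3 5 1 1 1 for the first player and 3 0 1 1 1 for the second.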
Iterated Prisoner’s Dilemma Game (2/2) • Representation of Strategy • [Figure: a strategy is a bit string (0 1 0 ∙∙∙ 1) used as a history table with 2^N history entries; each entry is addressed by the player’s own history and the opponent’s history, each listed from recent action to last action] • Example : l = 2, history 11 01
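A history-table strategy can be read as a lookup: the recent moves of both players form an index, and the bit stored at that index is the next move. The sketch below assumes history length l = 2, the encoding 0 = cooperation and 1 = defection, and an index built from the player's own two moves followed by the opponent's two moves, most recent first. The slides do not fix this bit order, and `history_index`, `next_action` and the sample table are hypothetical names and data.

```python
# Sketch of a history-table strategy for history length l = 2.
# Assumptions: 0 = cooperation, 1 = defection; the index packs the player's
# own last l moves, then the opponent's last l moves, most recent first.
HISTORY_LENGTH = 2

def history_index(own_history, opp_history):
    """Pack 2*l history bits into an integer table index."""
    index = 0
    for bit in list(own_history) + list(opp_history):
        index = (index << 1) | bit
    return index

def next_action(table, own_history, opp_history):
    """Look up the next move in a 2**(2*l)-entry strategy table."""
    return table[history_index(own_history, opp_history)]

# A hypothetical table that copies the opponent's most recent move.
table = [(i >> 1) & 1 for i in range(2 ** (2 * HISTORY_LENGTH))]

print(history_index([1, 1], [0, 1]))       # 13
print(next_action(table, [1, 1], [0, 1]))  # 0
```

With the slide's example history 11 01, the four bits 1101 address entry 13 of a 16-entry table.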
Related Works • Previous Study • Paul J. Darwen and Xin Yao (1997) : Speciation as Automatic Categorical Modularization • Onn M. Shehory, et al. (1998) : Multi-agent Coordination through Coalition Formation • Y. G. Seo and S. B. Cho (1999) : Exploiting Coalition in Co-Evolutionary Learning • Issues • Work on coalition formation in multi-agent environments covers a broad range of topics • Darwen and Yao have studied coalition in the IPD game, but in a different way • Focused on cooperation, the number of players, payoff variances, etc.
What is Different? • Co-evolutionary Learning • Selection Method • Rank Based • Roulette wheel • Tournament • Coalition Formation • Coalition keeps surviving to the next generation • Condition to form a coalition is flexible • Decision Making in Coalition • Applying several decision-making methods to coalition • Borda Function, Condorcet Function • Average Payoff, Highest Payoff • Weighted Voting
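The three selection methods named above can be sketched in a few lines each. This is an illustrative version under stated assumptions, not the implementation used in the experiments: rank-based selection is taken to be a roulette wheel over linear ranks, tournament size defaults to 2, and all function names are hypothetical.

```python
import random

def roulette_wheel(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def rank_based(population, fitness, rng):
    """Roulette wheel on ranks (worst = 1, best = n) instead of raw fitness."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    ranks = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return roulette_wheel(population, ranks, rng)

def tournament(population, fitness, rng, size=2):
    """Pick the fittest of `size` randomly drawn contestants."""
    contestants = rng.sample(range(len(population)), size)
    best = max(contestants, key=lambda i: fitness[i])
    return population[best]

rng = random.Random(0)
agents = ["a", "b", "c"]
payoffs = [1, 9, 4]
print(tournament(agents, payoffs, rng, size=3))  # b
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing selection methods on the same population.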
Evolving Strategy • To evolve strategies, we use : • Genetic algorithm • Co-evolutionary learning • Strategic coalition • Evolutionary Process
Evolution of Agents (1/2) • Evolution of Agents • Agents can develop their strategies using co-evolutionary learning • Weak agents are removed from the population • Evolution of Coalition • A formed coalition survives to the next generation • Agents can join a coalition generation by generation • [Figure: coalitions (C1, Ci, Cj, Ck, Cl) in the previous, current and next populations; a coalition survives or grows]
Evolution of Agents (2/2) • Problem : The population may evolve from weak agents only • Caused by removing better agents that belong to a coalition from the population • Making new agents by mixing better agents within the coalition • Repeat as many times as the number of agents that belong to the coalition • [Figure: random extraction of agents A1 and A2 from a coalition, mutation, and a new agent Ai added to the population (coalitions Ci, Cj, Ck)]
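The replenishing step described above can be sketched as follows. The slide only says that new agents are made by mixing better agents within a coalition, with random extraction and mutation; one-point crossover, bit-flip mutation, and the function name `make_agents_from_coalition` are assumptions made for illustration. The coalition must hold at least two members.

```python
import random

def make_agents_from_coalition(coalition, rng, mutation_rate=0.001):
    """Create len(coalition) new strategies by mixing coalition members.

    coalition: list of equal-length bit lists (strategy tables).
    One-point crossover and bit-flip mutation are assumed operators.
    """
    children = []
    for _ in range(len(coalition)):
        parent_a, parent_b = rng.sample(coalition, 2)  # random extraction
        cut = rng.randrange(1, len(parent_a))          # crossover point
        child = parent_a[:cut] + parent_b[cut:]
        # Flip each bit independently with probability mutation_rate.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit
                 for bit in child]
        children.append(child)
    return children

coalition = [[0] * 16, [1] * 16, [0, 1] * 8]
new_agents = make_agents_from_coalition(coalition, random.Random(1))
print(len(new_agents), len(new_agents[0]))  # 3 16
```

The loop runs once per coalition member, matching the instruction to repeat as many times as there are agents in the coalition.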
Strategic Coalition (1/2) • What is Coalition? • A cooperative game as a set A of agents in which each subset of A is called a coalition (Matthias Klusch and Andreas Gerber, 2002) • A group of agents that work jointly in order to accomplish their tasks (Onn M. Shehory, 1995) • Coalition in the IPD game • Forming coalitions through round-robin games • Pursuing more payoff using generalization ability • Coalitions form autonomously without supervision
Strategic Coalition (2/2) • Definitions • Definition 1 : Coalition Value • Definition 2 : Payoff Function • Definition 3 : Coalition Identification • Definition 4 : Decision Making • Definition 5 : Payoff Distribution (1) (2) (3)
Coalition Formation (1/2) • [Figure: agents A1 ... An in the initial population play the 2IPD game and form coalitions C1, C2, ..., Ci; the resulting population includes both coalitions and individual agents]
Coalition Formation (2/2) • Algorithm • Forming coalition • Round-robin 2IPD game • Obtain rank • Determine confidence of agent according to the rank • Joining coalition • Round-robin 2IPD game • Obtain rank • If number of agents > max. number of agents within a coalition, remove the weakest agent • Determine confidence of each agent • [Flowchart nodes: 2IPD Game; Game type? (Agent vs. Agent, Agent vs. Coalition, Coalition vs. Coalition); Satisfy condition for forming coalition? (Y/N); Forming Coalition; Joining Coalition; Exceeds iteration per generation? (Y/N); Genetic Operation; Satisfy condition? (Y/N); Stop]
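The ranking and trimming steps of the algorithm can be sketched as below. The round-robin payoffs are taken as given, and the confidence formula (a linear function of rank within the coalition) is purely an assumption, since the slides do not state how confidence is derived from rank. `rank_agents` and `join_coalition` are hypothetical names.

```python
def rank_agents(scores):
    """Return agent ids ordered from highest to lowest total payoff."""
    return sorted(scores, key=scores.get, reverse=True)

def join_coalition(coalition, scores, max_size):
    """Trim a coalition to max_size members and assign confidences.

    coalition: list of agent ids; scores: dict id -> round-robin payoff.
    Confidence is a hypothetical linear function of rank: the best member
    gets 1.0 and the weakest kept member gets 1/n.
    """
    ranked = [agent for agent in rank_agents(scores) if agent in coalition]
    ranked = ranked[:max_size]  # remove the weakest agents
    n = len(ranked)
    confidence = {agent: (n - i) / n for i, agent in enumerate(ranked)}
    return ranked, confidence

scores = {"A1": 30, "A2": 50, "A3": 10, "A4": 40}
members, confidence = join_coalition(["A1", "A2", "A3", "A4"], scores, 3)
print(members)  # ['A2', 'A4', 'A1']
```

Here A3 has the lowest round-robin payoff, so it is the agent dropped when the coalition exceeds the maximum size of three.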
Coalition Decision Making • Decision making • To decide the coalition’s opinion • Use weighted voting method • Sharing profits • Distribute payoff according to each agent’s confidence • Rank influences each weight • Determining next action of coalition • Weight for cooperation of coalition Ci • Weight for defection of coalition Ci • [Figure: previous actions of Ci, Cj, Ck, Cl are summed (∑) into C or D to give the next action]
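Weighted voting and payoff sharing can be illustrated with a short sketch. The exact weight formulas are missing from this transcript, so the block simply sums member weights per action and splits payoff in proportion to confidence. Breaking ties in favour of cooperation, the encoding 0 = cooperation and 1 = defection, and the names `coalition_action` and `distribute_payoff` are assumptions.

```python
C, D = 0, 1  # assumed encoding: 0 = cooperation, 1 = defection

def coalition_action(votes, weights):
    """Choose the coalition's next move by weighted voting.

    votes: dict agent id -> proposed move; weights: dict agent id -> weight.
    """
    weight_c = sum(weights[a] for a, move in votes.items() if move == C)
    weight_d = sum(weights[a] for a, move in votes.items() if move == D)
    return C if weight_c >= weight_d else D  # tie-break is an assumption

def distribute_payoff(payoff, confidence):
    """Split a coalition payoff in proportion to each agent's confidence."""
    total = sum(confidence.values())
    return {a: payoff * c / total for a, c in confidence.items()}

votes = {"A1": C, "A2": D, "A3": D}
weights = {"A1": 0.6, "A2": 0.2, "A3": 0.2}
print(coalition_action(votes, weights))             # 0
print(distribute_payoff(6, {"A1": 2, "A2": 1}))     # {'A1': 4.0, 'A2': 2.0}
```

In the example a single high-weight agent outvotes two low-weight agents, which is the effect that rank-dependent weights are meant to have.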
Weight of Agents • Adjusting weight • Give incentive to agents in coalition • It reflects decision making of coalition • [Figure: the voting diagram (previous actions of Ci, Cj, Ck, Cl → ∑ → next action C or D) with the step “Adjusting weight”]
Improving Generalization Ability (1/2) • Problem of one good strategy • Not adaptive to a dynamic environment • Obtain multiple good strategies for specific environments • Ex) Biological immune system • Method • Fitness sharing • Co-evolution • Coalition formation • Adjust confidences of multiple strategies by evolution
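Fitness sharing keeps several good strategies alive by lowering the fitness of individuals that crowd the same niche. The sketch below uses the textbook form with a triangular sharing function over Hamming distance. The slides do not say which distance measure or sharing radius was used, so both are assumptions, as are the function names.

```python
def hamming(a, b):
    """Number of positions where two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, raw_fitness, radius):
    """Divide each raw fitness by its niche count (standard fitness sharing).

    The niche count sums a triangular sharing function over all individuals
    closer than `radius`; it is at least 1 because each individual is at
    distance 0 from itself.
    """
    shared = []
    for i, individual in enumerate(population):
        niche = 0.0
        for other in population:
            distance = hamming(individual, other)
            if distance < radius:
                niche += 1.0 - distance / radius
        shared.append(raw_fitness[i] / niche)
    return shared

population = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
print(shared_fitness(population, [10, 10, 10], radius=2))  # [5.0, 5.0, 10.0]
```

The two identical strategies split their fitness, while the distinct third strategy keeps its full value, so selection is pushed toward diverse strategies.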
Improving Generalization Ability (2/2) • How well a player performs against unknown players • Evaluation • [Figure: top strategies extracted from the population of genetically evolved strategies (e.g. 1 0001110..., 2 0000100..., 3 0100100..., 4 0001100..., 5 0010010..., 10 0000010...) play the 2IPD game against 100 randomly generated strategies]
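The evaluation described above, playing a strategy against 100 randomly generated opponents, can be sketched as follows. The block assumes 16-entry history tables (history length 2), the standard Axelrod payoffs, an initial history of mutual cooperation, and 50 rounds per game. None of these details are stated in the slides, and `play` and `generalization_score` are hypothetical names.

```python
import random

# My payoff for (my_move, opponent_move); 0 = cooperation, 1 = defection.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

def play(table_a, table_b, rounds=50):
    """Play two 16-entry table strategies; return A's average payoff."""
    own, opp = [0, 0], [0, 0]  # assumed initial history: mutual cooperation
    total = 0
    for _ in range(rounds):
        # Index = own recent, own older, opponent recent, opponent older.
        a = table_a[own[0] * 8 + own[1] * 4 + opp[0] * 2 + opp[1]]
        b = table_b[opp[0] * 8 + opp[1] * 4 + own[0] * 2 + own[1]]
        total += PAYOFF[(a, b)]
        own, opp = [a, own[0]], [b, opp[0]]
    return total / rounds

def generalization_score(table, n_opponents=100, seed=0):
    """Average payoff against randomly generated table strategies."""
    rng = random.Random(seed)
    opponents = [[rng.randint(0, 1) for _ in range(16)]
                 for _ in range(n_opponents)]
    return sum(play(table, o) for o in opponents) / n_opponents

all_d, all_c = [1] * 16, [0] * 16
print(play(all_d, all_c))  # 5.0
print(play(all_c, all_c))  # 3.0
```

A strategy that scores well only against the opponents it was trained with would show a low `generalization_score`, which is the weakness this evaluation is meant to expose.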
Test Strategy • Test Strategies • Example Strategy (0 = cooperation, 1 = defection, as the CDCD and AllD rows show)
Tit-for-Tat : 0 0 1 0 1 1 0 0
CDCD : 0 1 0 1 0 1 0 1
Trigger : 0 0 0 1 1 1 1 1
CCD : 0 0 1 0 0 1 0 0
AllD : 1 1 1 1 1 1 1 1
Random : 1 1 0 1 0 0 1 1
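The six test strategies can be written as small functions of the game history. This is a sketch based on the usual definitions of these strategies, with 0 = cooperation and 1 = defection. Histories are lists ordered from oldest to newest move, and the function names and the `play` helper are illustrative assumptions.

```python
import random

C, D = 0, 1  # 0 = cooperation, 1 = defection

def tit_for_tat(own, opp):
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opp else opp[-1]

def trigger(own, opp):
    """Cooperate until the opponent defects once, then defect forever."""
    return D if D in opp else C

def all_d(own, opp):
    """Always defect."""
    return D

def cdcd(own, opp):
    """Alternate cooperation and defection."""
    return C if len(own) % 2 == 0 else D

def ccd(own, opp):
    """Repeat cooperate, cooperate, defect."""
    return D if len(own) % 3 == 2 else C

def make_random(seed=None):
    """Return a strategy that picks each move uniformly at random."""
    rng = random.Random(seed)
    return lambda own, opp: rng.choice((C, D))

def play(strategy_a, strategy_b, rounds):
    """Play two strategies and return both move histories."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return hist_a, hist_b

print(play(ccd, trigger, 8))
# ([0, 0, 1, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1])
```

Playing CCD against Trigger reproduces the CCD and Trigger example sequences listed on the slide.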
Example of Game • Evolved Strategy vs. Tit-for-Tat • [Figure: two 16-entry history tables (history index 0 to 15), one for the evolved strategy and one for Tit-for-Tat, with the moves played in rounds 1 to 5] • Payoff per round (rounds 1 to 5) • Evolved strategy : 3 5 1 1 1 • Tit-for-Tat : 3 0 1 1 1
Experimental Result : Test Environment • Population size : 100 • Crossover rate : 0.3 • Mutation rate : 0.001 • Number of generations : 200 • Number of iterations : a third of the population • Training set : 6 well-known strategies
Experimental Result : Evolved Strategy vs. Random • Random is one of the weakest strategies in the 2IPD game. In this game, the evolved strategies perform well: all of them win against the Random test strategies with high payoffs.
Experimental Result : Evolved Strategy vs. Tit-for-Tat • Tit-for-Tat is a mimic strategy that cooperates on the first move of the 2IPD game. The evolved strategies respond appropriately so as not to lose the game, which demonstrates their generalization ability.
Experimental Result : Evolved Strategy vs. Trigger • Trigger is a strategy that never forgives an opponent’s defection. The way to win a game against Trigger is likewise to choose “defection” repeatedly.
Experimental Result : Evolved Strategy vs. AllD • The only way not to lose against AllD is to choose “defection” on every move. There is no room for cooperation in this game.
Experimental Result : Number of Coalitions • [Figure: number of coalitions (axis: Coalition) by generation (axis: Generation)] • Coalitions survive to the next generation. Most coalitions are formed early in the evolutionary process. This keeps genetic diversity high and gives better choices against opponents. A coalition can grow if agents satisfy the conditions for joining.
Experimental Result : Comparing the Results • The evolved strategies get more payoff against Random, CCD and CDCD than against Tit-for-Tat, Trigger and AllD. This indicates that the evolved strategies exploit opponents’ actions well.
Experimental Result : Bias of the Strategy • [Figure: Bias (axis) by Generation (axis)] • Bias shows how the strategies select their next choice against opponents. A higher bias rate means that a strategy chooses “cooperation” more often than “defection”, and vice versa.
Conclusions • Conclusion • Strategic coalition might be a robust method that can adapt to a dynamic environment • Decision-making methods influence the results, but not seriously • The strategies evolved by coalition generalize well against various opponents • Discussion • Can strategic coalition be applied to the N-player IPD (NIPD) game? • Which parameters in the IPD game influence generalization ability? • How can opponent strategies for testing be constructed? • How can this approach be applied to real-world problems?
Examples (1) • Market Observer
Examples (2) • Forest Prediction