Explore the evolution of cooperation through a computational model of evolutionarily stable strategies, with CUDA (GPU) and Blue Gene (MPI) implementations, results, and future directions.
PARALLEL MODEL OF EVOLUTIONARY GAME DYNAMICS Amanda Peters MIT 18.337 5/13/2009
Outline • Motivation • Model • GPU Implementation • Blue Gene Implementation • Hardware • Results • Future Work
Motivation • Why does cooperation evolve? • Examples: • Total War vs. Limited War • Quorum Sensing Bacteria • Pathogens • Goal of the project: • Create computational model to test role of behavioral strategies and related variables
Model • Focus on finding evolutionarily stable strategies • Five strategies: • Mouse • Hawk • Bully • Retaliator • Prober-Retaliator • Payoffs • Win +60 • Seriously Injured -100 • Small Injuries Each -2 • Emerge from Short Game uninjured +20
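As a point of reference, the strategies and payoff values above can be written down directly as constants. The sketch below is illustrative only; the names (Strategy, PAYOFF_WIN, and so on) are assumptions and are not taken from the original code.

/* Illustrative sketch (names are assumptions, not from the original code):
   the five strategies and the slide's payoff values as C constants. */
enum Strategy { MOUSE, HAWK, BULLY, RETALIATOR, PROBER_RETALIATOR };

#define PAYOFF_WIN                 60.0f   /* win the contest                    */
#define PAYOFF_SERIOUS_INJURY    -100.0f   /* seriously injured                  */
#define PAYOFF_SMALL_INJURY        -2.0f   /* each small injury                  */
#define PAYOFF_SHORT_GAME_UNHURT   20.0f   /* emerge from a short game uninjured */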
Why parallelize it? • Reduce computational time • Enable trials of more strategies • Enable analysis of the roles of different variables • Introduce more actions to the action space
CUDA Implementation • Embarrassingly parallel code • Distribute rounds of the game to different threads • Only the payoff arrays are kept in global memory • Copy them back to the host for post-processing
Sample Code

__global__ void gameGPU(int player1, int player2, float* d_payoff1,
                        float* d_payoff2, float* rand_si, int max_rounds) {
    // Thread index
    const int tid = blockDim.x * blockIdx.x + threadIdx.x;
    // Total number of threads in grid
    const int THREAD_N = blockDim.x * gridDim.x;
    int max_moves = 500;
    for (int round = tid; round < max_rounds; round += THREAD_N) {
        play_round(player1, player2, d_payoff1[round], d_payoff2[round],
                   rand_si[round], max_moves);
    }
}
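A hedged host-side sketch of how this kernel could be driven, following the bullets on the previous slide (payoff arrays in device global memory, copied back for post-processing). The launch configuration, array sizes, and the run_game wrapper are assumptions; the deck does not show the actual host code.

#include <cuda_runtime.h>

// Sketch only: drives the gameGPU kernel from the sample code above.
// Allocates the payoff and random-number arrays in device global memory,
// launches one grid over all rounds, and copies the payoffs back to the host.
void run_game(int player1, int player2, int max_rounds, const float* h_rand,
              float* h_payoff1, float* h_payoff2) {
    size_t bytes = max_rounds * sizeof(float);
    float *d_payoff1, *d_payoff2, *d_rand;

    cudaMalloc((void**)&d_payoff1, bytes);
    cudaMalloc((void**)&d_payoff2, bytes);
    cudaMalloc((void**)&d_rand, bytes);
    cudaMemcpy(d_rand, h_rand, bytes, cudaMemcpyHostToDevice);  // pre-generated randoms

    gameGPU<<<64, 256>>>(player1, player2, d_payoff1, d_payoff2, d_rand, max_rounds);

    // Copy the payoff arrays back for post-processing on the host
    cudaMemcpy(h_payoff1, d_payoff1, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_payoff2, d_payoff2, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_payoff1);
    cudaFree(d_payoff2);
    cudaFree(d_rand);
}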
Blue Gene Design Fundamentals • Low Power PPC440 Processing Core • System-on-a-Chip ASIC Technology • Dense Packaging • Ducted, Air-Cooled, 25 kW Racks • Standard proven components for reliability and cost
Blue Gene/L System (packaging hierarchy, for the original 64-rack system)
• Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
• Compute card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB
• Node card: 32 chips (4x4x2), 16 compute + 0-2 I/O cards, 90/180 GF/s, 16 GB
• Rack: 32 node cards, 2.8/5.6 TF/s, 512 GB
• System: 64 racks, 180/360 TF/s, 32 TB
Blue Gene/P System (packaging hierarchy)
• Chip: 4 processors, 13.6 GF/s, 8 MB EDRAM
• Compute card: 1 chip, 20 DRAMs, 13.6 GF/s, 2.0 (or 4.0) GB DDR
• Node card: 32 chips (4x4x2), 32 compute + 0-1 I/O cards, 435 GF/s, 64 GB
• Rack: 32 node cards, 14 TF/s, 2 TB
• System: 72 racks, cabled 8x8x16, 1 PF/s, 144 TB
Key differences from BG/L: • 4 cores per chip • Speed bump • 72 racks (+8)
BG System Overview: Integrated system • Lightweight kernel on compute nodes • Linux on I/O nodes handling syscalls • Optimized MPI library for high speed messaging • Control system on Service Node with private control network • Compilers and job launch on Front End Nodes
Blue Gene/L interconnection networks 3 Dimensional Torus • Interconnects all compute nodes (65,536) • Virtual cut-through hardware routing • 1.4Gb/s on all 12 node links (2.1 GB/s per node) • Communications backbone for computations • 0.7/1.4 TB/s bisection bandwidth, 67TB/s total bandwidth Global Collective Network • One-to-all broadcast functionality • Reduction operations functionality • 2.8 Gb/s of bandwidth per link; Latency of tree traversal 2.5 µs • ~23TB/s total binary tree bandwidth (64k machine) • Interconnects all compute and I/O nodes (1024) Low Latency Global Barrier and Interrupt • Round trip latency 1.3 µs Control Network • Boot, monitoring and diagnostics Ethernet • Incorporated into every node ASIC • Active in the I/O nodes (1:64) • All external comm. (file I/O, control, user interaction, etc.)
C/MPI Implementation of Code • Static partitioning of work units • work_unit = number_rounds/partition_size • Each node gets a chunk of the data • Loops that iterate over the full length of the game in serial are split up so each node handles specific rounds • 'Bookkeeping' node (rank 0) collects results • MPI collectives to coalesce data
Pseudo Code
• Foreach species:
  • Foreach species:
    • gamePlay(var1…);
    • MPI_Reduce(var1…);
    • If (rank==0) Calculate_averages();
• If (rank==0) Print_game_results;
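A hedged C/MPI sketch of the static partitioning and reduction described on the previous two slides. The gamePlay stub, variable names, and round counts are assumptions; only the overall structure (per-node work units, MPI_Reduce to rank 0, averages printed by the bookkeeping node) follows the deck.

#include <mpi.h>
#include <stdio.h>

#define NUM_SPECIES 5

/* Stand-in for the real per-pair game routine: plays work_unit rounds of
   species i vs. species j and returns the locally accumulated payoff. */
static double gamePlay(int i, int j, int work_unit) {
    return (double)(i + j) * work_unit;  /* placeholder payoff */
}

int main(int argc, char** argv) {
    int rank, partition_size;
    int number_rounds = 100000;  /* assumed total rounds per species pair */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &partition_size);

    /* Static partitioning: each node gets an equal chunk of the rounds */
    int work_unit = number_rounds / partition_size;

    for (int i = 0; i < NUM_SPECIES; i++) {
        for (int j = 0; j < NUM_SPECIES; j++) {
            double local_payoff = gamePlay(i, j, work_unit);
            double total_payoff = 0.0;

            /* Coalesce results on the bookkeeping node (rank 0) */
            MPI_Reduce(&local_payoff, &total_payoff, 1, MPI_DOUBLE,
                       MPI_SUM, 0, MPI_COMM_WORLD);

            if (rank == 0)
                printf("species %d vs %d: average payoff %f\n",
                       i, j, total_payoff / number_rounds);
        }
    }

    MPI_Finalize();
    return 0;
}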
Game Dynamics • Evolutionarily Stable Strategies: • Retaliator • ~Prober-Retaliator • Result: • ‘Limited War’ is a stable and dominant strategy given individual selection
CUDA Implementation 97% time reduction
Blue Gene Implementation 99% time reduction
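(For reference, a 97% reduction in run time corresponds to roughly a 1/(1 - 0.97) ≈ 33x speedup over the serial code, and a 99% reduction to roughly 100x.)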
Future Directions • Investigate more behavioral strategies • Increase action space • CUDA implementation: data management • Blue Gene implementation: • Examine superlinearity • Test larger problem sizes • Optimize single node performance