Using Genetic Programming to Evolve Sumobots Shai Sharabi Dept. of Computer Science Ben-Gurion University, Israel
Overview • Introduction • Sumobot system description • GP system description • Preparatory steps • Evolutionary process • Results • Conclusions
Sumo history • Sumo has its roots in the Shinto religion (800 A.D.). • A Sumo bout is decided in one of two principal ways: • The first wrestler to touch the ground outside the circle loses • The first wrestler to touch the ground with any part of his body other than the soles of his feet loses • Figure: a Sumo match (Ozeki Kaio vs. Tamanoshima, May 2005).
Sumobot rules • Sumobot contests are hosted in Seattle (www.robothon.org) • Two robots try to push each other outside the arena boundaries
Sumobots • The complexity of robot behavior • Studies in Evolutionary Robotics (Nolfi & Floreano) • Floreano and Mondada utilized a Khepera robot to validate a neural network system evolved using a genetic algorithm – for navigation and obstacle avoidance • Liu & Zhang, Multi-Phase Genetic Programming: A Case Study in Sumo Maneuver Evolution
Sumobot System Description Two overhead web cameras are used, each connected to its own computer. Each computer sends maneuver commands through its remote controller to its own sumobot, which then acts on them.
GP System Description: Preparatory Steps 1. Determining the set of terminals 2. Determining the set of functions 3. Determining the fitness measure 4. Determining the parameters for the run (population size, number of generations, minor parameters) 5. Determining the method for designating a result and the criterion for terminating a run
Fitness Function (step 3)
$$\mathrm{fitness} = w_1\,\mathrm{radius}(\mathrm{count}) + w_2\,\mathrm{sticky}(\mathrm{count}) + w_3\,\mathrm{closer}(\mathrm{count}) + w_4\,\mathrm{speed}(\mathrm{count}) + w_5\,\mathrm{push}(\mathrm{count}) + w_6\,\mathrm{bonuspush}(\mathrm{count}) + w_7\,\mathrm{exploring}(\mathrm{count}) + \mathrm{bonustay}$$
• w1 … w7 – empirically derived weights
• count – number of iterations in one fight
• radius – distance between the robot’s starting location and its farthest location
• sticky – rewards spending time close to the target
• closer – rewards approaching the target
• speed – rewards higher speed
• push – rewards pushing the opponent
• bonuspush – rewards programs that win faster
• exploring – rewards exploring the grounds
• bonustay – a bonus added for staying in the arena
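The weighted sum on this slide can be sketched in C as follows. This is only an illustration: the struct layout, the names `FightScores` and `fitness`, and the use of `double` are assumptions, and the slides do not say how each component score is measured.

```c
#include <stddef.h>

/* Number of weighted components in the slide's fitness function. */
#define NUM_COMPONENTS 7

/* Component scores measured over one fight, in the slide's order:
 * radius, sticky, closer, speed, push, bonuspush, exploring.
 * (Hypothetical container; the original data layout is not given.) */
typedef struct {
    double component[NUM_COMPONENTS];
    double bonustay; /* unweighted bonus for staying in the arena */
} FightScores;

/* Weighted sum: w1*radius + ... + w7*exploring + bonustay. */
double fitness(const double w[NUM_COMPONENTS], const FightScores *s)
{
    double total = s->bonustay;
    for (size_t i = 0; i < NUM_COMPONENTS; ++i) {
        total += w[i] * s->component[i];
    }
    return total;
}
```

Keeping the weights in an array separate from the scores makes it easy to swap in a different weight set without touching the scoring code.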
Linear Ranking Selection • Based on sorting the individuals by decreasing fitness • The probability of selecting the i-th individual in the ranking is a linear function of its rank, where b can be interpreted as the expected sampling rate of the best individual
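A common formulation of linear ranking selection that matches this description is $p_i = \frac{1}{N}\left(b - 2(b-1)\frac{i-1}{N-1}\right)$ for rank $i = 1,\dots,N$ with $1 \le b \le 2$. That this is the exact expression used in the talk is an assumption. With it, the best individual ($i = 1$) has probability $b/N$, so it is sampled $b$ times on average over $N$ draws. A minimal C sketch, with hypothetical function names:

```c
/* Probability of selecting the individual at 1-based rank i (1 = best)
 * among n individuals, using the assumed formula
 *   p_i = (b - 2*(b - 1)*(i - 1)/(n - 1)) / n,  1 <= b <= 2.
 * Returns -1.0 for invalid arguments. */
double linear_rank_probability(int i, int n, double b)
{
    if (n < 1 || i < 1 || i > n || b < 1.0 || b > 2.0) {
        return -1.0;
    }
    if (n == 1) {
        return 1.0;
    }
    return (b - 2.0 * (b - 1.0) * (double)(i - 1) / (double)(n - 1)) / (double)n;
}

/* Maps a uniform random number u in [0, 1) to a 1-based rank by walking
 * the cumulative distribution. Assumes n and b are valid. */
int select_rank(double u, int n, double b)
{
    double cumulative = 0.0;
    for (int i = 1; i <= n; ++i) {
        cumulative += linear_rank_probability(i, n, b);
        if (u < cumulative) {
            return i;
        }
    }
    return n; /* guards against rounding when u is very close to 1 */
}
```

With `b = 2` the worst-ranked individual gets probability 0, while `b = 1` reduces to uniform selection.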
GP System Description: Evolutionary Process 1. Create a random initial population using the functions and terminals 2. Run and evaluate all the programs using the fitness measure 3. If the termination criterion for the run is satisfied, stop the evolutionary process 4. Select programs and apply genetic operations to them 5. Go to step 2
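The control flow of these steps can be sketched in C as a loop over caller-supplied hooks. Everything named here (`EvolutionHooks`, `run_evolution`, the toy hooks) is hypothetical and only illustrates the ordering of the steps. Step 1, creating the initial population, is left to the caller.

```c
/* Caller-supplied operations on an opaque population (all hypothetical). */
typedef struct {
    void (*evaluate)(void *population);                          /* step 2 */
    int  (*should_stop)(const void *population, int generation); /* step 3 */
    void (*breed)(void *population);                             /* step 4 */
} EvolutionHooks;

/* Runs steps 2-5 and returns the generation at which the run stopped. */
int run_evolution(void *population, const EvolutionHooks *hooks,
                  int max_generations)
{
    int generation = 0;
    for (;;) {
        hooks->evaluate(population);
        if (generation >= max_generations ||
            hooks->should_stop(population, generation)) {
            break;
        }
        hooks->breed(population);
        ++generation; /* step 5: go back to step 2 */
    }
    return generation;
}

/* Toy hooks that only exercise the loop: the "population" is one int
 * that grows by 1 per generation, and the run stops once it reaches 3. */
static void toy_evaluate(void *p) { (void)p; }
static int toy_should_stop(const void *p, int generation)
{
    (void)generation;
    return *(const int *)p >= 3;
}
static void toy_breed(void *p) { ++*(int *)p; }
```

Evaluating before testing the termination criterion mirrors the slide's ordering, so the final population always has up-to-date fitness values when the loop exits.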
Initial Population • Ramped half-and-half • Using md = 8 (maximum depth) • Divide the population evenly into md-1 bins • Each bin is assigned its own depth limit md’, starting with md’=2 and going up to md’=md • In each bin, half the individuals are created using the “Grow” method and the other half using the “Full” method
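The binning described on this slide can be sketched in C as below. The sketch makes several simplifying assumptions that are not in the slides: depth is counted in levels of nodes, every function takes two arguments, and symbols are plain indices into hypothetical function and terminal tables.

```c
#include <stdlib.h>

typedef enum { METHOD_GROW, METHOD_FULL } InitMethod;

typedef struct {
    int max_depth;     /* md' for this individual, in [2, md] */
    InitMethod method;
} InitPlan;

/* Plan for individual `index` (0-based) in a population of `pop_size`
 * with overall depth limit `md` (>= 2). Indices are split into md-1
 * roughly equal bins; methods alternate so each bin is about half
 * Grow and half Full. */
InitPlan ramped_half_and_half_plan(int index, int pop_size, int md)
{
    int bins = md - 1;
    int bin = (int)(((long)index * bins) / pop_size);
    InitPlan plan;
    plan.max_depth = 2 + bin;
    plan.method = (index % 2 == 0) ? METHOD_GROW : METHOD_FULL;
    return plan;
}

typedef struct Node {
    int is_function; /* 1 = binary function node, 0 = terminal */
    int symbol;      /* index into a function or terminal table */
    struct Node *left, *right;
} Node;

void free_tree(Node *n)
{
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

int tree_depth(const Node *n)
{
    if (n == NULL) return 0;
    int l = tree_depth(n->left), r = tree_depth(n->right);
    return 1 + (l > r ? l : r);
}

/* Full: only functions until the last level, so every leaf sits at the
 * depth limit. Grow: functions and terminals are both allowed above the
 * last level, so branches may end early. */
Node *make_tree(int levels, InitMethod method, int n_funcs, int n_terms)
{
    Node *node = malloc(sizeof *node);
    if (node == NULL) return NULL;
    int pick_function;
    if (levels <= 1) {
        pick_function = 0;
    } else if (method == METHOD_FULL) {
        pick_function = 1;
    } else {
        pick_function = (rand() % (n_funcs + n_terms)) < n_funcs;
    }
    node->is_function = pick_function;
    node->left = node->right = NULL;
    if (pick_function) {
        node->symbol = rand() % n_funcs;
        node->left = make_tree(levels - 1, method, n_funcs, n_terms);
        node->right = make_tree(levels - 1, method, n_funcs, n_terms);
        if (node->left == NULL || node->right == NULL) {
            free_tree(node);
            return NULL;
        }
    } else {
        node->symbol = rand() % n_terms;
    }
    return node;
}
```

A caller would loop over all indices, ask for the plan, and pass `plan.max_depth` and `plan.method` to `make_tree`. With md = 8 this yields seven bins with depth limits 2 through 8.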
Typical Results of Batch A a) Evolved fighter rotates toward the target and approaches it if starting from a position where … < -11 b) Evolved fighter circles widely, unconditionally c) Simplified program of robot (a) d) Simplified program of robot (b)
Typical Result of a Fight in Batch B a) Demonstration of a fight between two individuals, one from generation 44 of Sumo1 and one from generation 39 of Sumo2 b) Sumo1's simplified code (bottom robot) c) Sumo2's simplified code (top robot)
Batch C Details • Start with a fitness function that “easily” evolves simple sumo strategies (such as exploring the arena) • At some point (criteria dependent) continue the evolution with a different, more demanding fitness function (such as a higher score for pushing and for winning fast)
Changes of Dynamic Fitness in Batch C Dynamic fitness computation: after 10 generations, the fitness weights were adjusted to assign each fitness component a new range.
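A staged weight schedule of this kind can be sketched in C as a lookup by generation. The weight values below are invented placeholders, not the values used in the experiments. Numbering generations from 1 and switching at generation 11 is also an assumption.

```c
#define NUM_WEIGHTS 7
#define SWITCH_AFTER_GENERATION 10

/* Placeholder weights in the order radius, sticky, closer, speed, push,
 * bonuspush, exploring. The early stage favors exploring the arena. */
static const double EARLY_WEIGHTS[NUM_WEIGHTS] = {1, 1, 1, 1, 1, 1, 4};

/* The later stage favors pushing and fast wins (again placeholders). */
static const double LATE_WEIGHTS[NUM_WEIGHTS] = {1, 1, 1, 1, 4, 4, 1};

/* Returns the weight set for a 1-based generation number. */
const double *weights_for_generation(int generation)
{
    return (generation <= SWITCH_AFTER_GENERATION) ? EARLY_WEIGHTS
                                                   : LATE_WEIGHTS;
}
```

Because fitness values computed before and after the switch are on different scales, a fitness curve across the switch is not directly comparable from one side to the other.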
Typical Results of Batch C a) A fight at generation 3 b) A fight at generation 7 c) Left robot scored Yuko at generation 13. This is the highest possible score, given when one contender manages to push its opponent out of the arena
Typical Fitness Progress of One Experiment in Batch C After 10 generations the fitness computation changed
Typical Evolved Code in Batch C
PPMMOTORSPEED manuver(int x1, int y1, int x2, int y2, int angle)
{
    if (ifl(angle, 7)) {
        if (ifl(negative_1(sdiv(negative_1(plus(x2, angle)),
                                minus(mul(0, y1), mul(y2, y1)))),
                minus(negative_1(abs_1(7)), x2))) {
            if (ifl(abs_1(plus(minus(0, angle), negative_1(y2))),
                    abs_1(mul(mul(0, 0), x2)))) {
                return spin(negative_1(mul(abs_1(x1), plus(x2, x2))));
            } else {
                if (ifl(x1, abs_1(negative_1(plus(y2, 0))))) {
                    if (ifl(abs_1(y1), sdiv(x2, abs_1(angle)))) {
                        if (ifl(plus(angle, x2), x1)) {
                            if (ifl(y2, x1)) {
                                return moveLW(y1);
                            } else {
                                if (ifl(0, angle)) {
                                    return spin(x2);
                                } else {
                                    return moveFree_2(0, y1);
                                }
                            }
                        } else {
                            return spin(0);
                        }
                    } else {
                        return moveRW(abs_1(y2));
                    }
                } else {
                    if (ifl(sdiv(sdiv(x2, x1), minus(y1, 0)),
                            negative_1(negative_1(y2)))) {
                        return spin(minus(0, x1));
                    } else {
                        return spin(minus(y2, angle));
                    }
                }
            }
        } else {
            return spin(negative_1(mul(abs_1(x1), plus(x2, angle))));
        }
    } else {
        return moveLW(x1);
    }
}
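The slides do not define the arithmetic and comparison primitives used in this program. The C sketch below shows one plausible reading, and every definition in it is an assumption: `sdiv` is taken to be a protected division that returns 1 when the divisor is zero (a convention common in genetic programming), and `ifl` is taken to mean "is less than". The motor primitives (`spin`, `moveLW`, `moveRW`, `moveFree_2`) are left out because nothing in the slides indicates their semantics.

```c
#include <stdlib.h>

/* Assumed semantics only; integer overflow is not handled. */
int plus(int a, int b)  { return a + b; }
int minus(int a, int b) { return a - b; }
int mul(int a, int b)   { return a * b; }
int abs_1(int a)        { return abs(a); }
int negative_1(int a)   { return -a; }

/* Protected division: avoids a crash on a zero divisor by returning 1. */
int sdiv(int a, int b)  { return (b == 0) ? 1 : a / b; }

/* "If less": 1 when a < b, otherwise 0. */
int ifl(int a, int b)   { return a < b; }
```

Under these assumed semantics, `abs_1(mul(mul(0, 0), x2))` is always 0, so the third nested condition compares an absolute value against 0 with a strict less-than and can never be true. The `spin` call guarded by it would then be unreachable, which is the kind of redundant code that evolved programs often accumulate.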
Typical Results of Batch D • Evolved bot fights an opponent that avoids contact • Evolved bot fights a pushing opponent and scores a Yuko • Evolved bot fights a spinning opponent • Fitness graph of a run that produced an evolved sumobot. The drop in fitness at generation 12 was due to a mechanical failure in the sumobot's wheels. Nevertheless, evolution overcame this problem and slowly yielded an adapted sumo program.
Movie Presentation from Batch C (co-evolution) • Dancing Approach • Riding the Enemy • Winning Yuko
Conclusion • GP can be utilized to evolve sumo fighter strategies for a simple robot with only 5 input terminals • All 4 Batches yielded positive results • Fitness oscillation • Convergence • Comparing results to others
Future Work • Using platforms that are more difficult to operate • Adding more domain-specific terminals (e.g., past location) • Using ADFs (automatically defined functions)