Learning-Based Automatic Generation of Collision Avoidance Algorithms for Multiple Autonomous Mobile Robots
Yukiyoshi Fujita, Ichiro Suzuki, Satoshi Fujita, Hajime Asama, Masafumi Yamashita
236805 - Seminar in CS (Robotics)
Abstract
• This is a discussion of the automatic generation of collision avoidance algorithms:
• An effective algorithm for two robots that simulates human trial and error
• Use of a reward function that is itself learned by the robots
• Use of only the sensors' output
Abstract (cont.)
• How a robot can use its gained "experience" in a more complex environment
• Use of a reduced state space
• Use of omni-directional robots
• Comparison of the simulated results to the results with physical robots
Introduction
• An autonomous multi-robot system is one in which:
• There is no fixed "leader" - each robot is driven only by its own design and data
• Each robot adjusts itself independently
• This is an advantage with respect to failures, scalability, communication overhead, etc.
• On the other hand, designing algorithms for such a system is more difficult
Introduction (cont.)
• The discussed robots have eight sensors
• Each sensor can detect:
• A nearby object (robot or wall)
• The direction of the object's motion (one of eight)
• The speed of the object (one of three)
• The above yields a state space of sensor outputs consisting of (8*3+2)^8 states
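As a quick sanity check on that count (a sketch; reading the two extra values per sensor as "nothing detected" and "wall detected" is my interpretation of the slide), each sensor has 8*3 + 2 = 26 possible readings and there are 8 sensors:

```python
# Size of the raw sensor state space, per the slide's formula (8*3+2)^8.
# Assumed breakdown: each sensor reports one of 8 directions x 3 speeds for a
# moving object, plus two extra readings (nothing detected / wall detected).
readings_per_sensor = 8 * 3 + 2          # 26 possible outputs per sensor
num_sensors = 8
print(readings_per_sensor ** num_sensors)  # 208827064576, roughly 2.1e11 states
```

This order of magnitude is what makes learning on the raw state space impractical and motivates the state reduction discussed next.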
Introduction (cont.)
• This motivates combined research on:
• A collision avoidance algorithm
• Automatic reduction of the number of states (by automatic state merging)
Introduction (cont.)
• A robot in an unknown environment repeatedly evaluates its performance
• Actions that were more successful in the past are more likely to be chosen
• We will investigate how such robots autonomously organize the state space and generate a collision avoidance algorithm based on a reduced state space
Introduction (cont.)
• We will examine a simulated naive human trial-and-error learning algorithm and see that it gives relatively good results
• All algorithm parameters are adjusted without any external intervention
• We will also discuss how the robots can use their experience in a more complicated environment (three robots)
Introduction (cont.)
• In addition to the theoretical discussion and simulations, we also run experiments with physical robots
• The results show a very high probability of collision avoidance, especially for two robots
• The algorithm works reasonably well for the case of three robots
The Model of the Robots
• The discussed omni-directional robots have 8 infra-red sensors (each a transmitter & receiver); when two robots are near each other, sensor i of one robot faces sensor j of the other, which is how relative position and heading are sensed
• For convenience we discard the other detection possibilities (such as detecting a wall)
The Model of the Robots (cont.)
• Let Σ denote the set of distinct sensor outputs, each σ ∈ Σ being a vector of the eight sensors' outputs
• A state space is a partition Q of Σ
• For each state q ∈ Q we prepare an action table S_q whose kth element S_q(k) is the probability that a robot in state q will move in direction k (0 ≤ k ≤ 7)
The Model of the Robots (cont.)
• Each robot decides how to move according to its sensor output: in state q ∈ Q it moves in direction k with probability S_q(k)
• The task of the robots is to autonomously build a partition Q and an action table S_q for each q ∈ Q
• In what follows, a_k denotes the action of moving in direction k
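A minimal sketch of this representation (the names uniform_action_table and sample_action are illustrative, not from the paper): a state q maps to a length-8 probability vector from which a direction is sampled.

```python
import random

# Illustrative sketch: one action table per state q, holding the probabilities
# S_q(0..7) of moving in each of the eight directions.
NUM_DIRECTIONS = 8

def uniform_action_table():
    """Unbiased initial table: every direction equally likely."""
    return [1.0 / NUM_DIRECTIONS] * NUM_DIRECTIONS

def sample_action(table, rng=random):
    """Pick a direction k with probability table[k], i.e. S_q(k)."""
    return rng.choices(range(NUM_DIRECTIONS), weights=table, k=1)[0]

# In the two-robot case discussed next, a state is a pair (i, j):
action_tables = {(i, j): uniform_action_table() for i in range(8) for j in range(8)}
k = sample_action(action_tables[(1, 6)])   # direction chosen in state (1, 6)
```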
The Model of the Robots (cont.)
• [Figure: an exemplary view of two robots and how their positions are labeled - Robot A in state S(1,6) and Robot B in state S(6,1), each with its sensors numbered 0-7 and its direction of motion marked]
Collision Avoidance for Two Robots: Construction of Action Tables by Learning
Action Tables
• We start with the case of two robots
• The state σ = (i, j) denotes that sensor i is facing sensor j of the other robot; |Σ_2| = 64
• Q_2 = {{σ} | σ ∈ Σ_2} is a partition of Σ_2
• p_ijk is the kth element of the action table S_(i,j): the probability that a robot will take action a_k when the sensor output is (i, j)
Action Tables (cont.)
• To create an unbiased system we initialize ∀k: p_ijk = 1/8
• The influence of action a_k is evaluated by the reward: reward = β(α(f_t - f_{t+1}) + (1 - α)(d_{t+1} - d_t))
• f_t: distance between the robot and its target at time t
• d_t: distance between the two robots at time t
Action Tables (cont.)
• 0 ≤ α ≤ 1 and β > 0 are determined by the robots themselves
• The reward expresses the need to get as close as possible to the target without getting too close to the other robot
• A robot that takes action a_k in state (i, j) updates the action table S_(i,j) by p_ijk = max{p_ijk + reward, 0}, renormalizing so that Σ_k p_ijk = 1 still holds
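A minimal sketch of this reward and update rule (the function names and the proportional renormalization are my reading of the slide, not code from the paper):

```python
def reward(f_t, f_next, d_t, d_next, alpha=0.5, beta=0.05):
    """reward = beta * (alpha*(f_t - f_{t+1}) + (1 - alpha)*(d_{t+1} - d_t)).

    f_*: distance to the target, d_*: distance to the other robot.
    Positive when the robot gets closer to its target and/or farther
    away from the other robot.
    """
    return beta * (alpha * (f_t - f_next) + (1.0 - alpha) * (d_next - d_t))

def update_table(table, k, r):
    """Apply p_ijk = max(p_ijk + reward, 0), then renormalize so the
    probabilities of the eight directions sum to 1 again."""
    table[k] = max(table[k] + r, 0.0)
    total = sum(table)
    if total > 0:
        for i in range(len(table)):
            table[i] /= total
    return table
```

For example, with the slide's values α = 0.5 and β = 0.05, a step that brings the robot 0.5 closer to its target while leaving its distance to the other robot unchanged yields a reward of 0.05 * (0.5 * 0.5) = 0.0125.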
Action Tables (cont.)
• Simulation: α = 0.5, β = 0.05, d_0 = 1.0; pick a state (i, j)
• Move the robots one step and update the action table
• Repeat this 64k times, so that each of the 64 states is updated about 1k times
• The vector S_(i,j) converges to p_ijk = 1.0 for a single k for most states (i, j)
Action Tables (cont.)
• The following table shows the converged action k for every i and j
• The actions in parentheses are the highest-probability actions in states where convergence to a single action did not occur
Action Tables (cont.)
• Let us test the algorithm's performance (in simulation):
• Each robot is a disc of radius 1.0
• A sensor can sense up to a distance of 2.0
• A robot moves in steps of 0.5
• The initial distance between the robots is 2.0
• The target of each robot is at a distance of 10.0 in direction 0
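Under these parameters, one simulation step only needs a little geometry. The sketch below assumes eight directions at 45-degree spacing with direction 0 along the positive x axis, and the helper names move and collided are mine, not the authors' simulator:

```python
import math

# Geometry from the slide: disc radius 1.0, step length 0.5,
# targets 10.0 away in direction 0. Positions are (x, y) tuples.
ROBOT_RADIUS = 1.0
STEP = 0.5
DIRECTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]

def move(pos, k):
    """Take one step of length 0.5 in direction k (0..7, 45-degree spacing assumed)."""
    dx, dy = DIRECTIONS[k]
    return (pos[0] + STEP * dx, pos[1] + STEP * dy)

def collided(pos_a, pos_b):
    """Two radius-1.0 discs collide when their centers are closer than 2.0."""
    return math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]) < 2 * ROBOT_RADIUS
```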
Action Tables (cont.)
• CASE (i, j) denotes an experiment in which the initial state of one of the robots is (i, j)
• Each robot moves according to the action selection table unless it can move directly towards its target
Action Tables (cont.)
• The results of the simulation show success in all 64 cases
• Below are the more difficult cases:
• CASE(0,0), CASE(1,6), CASE(1,7), CASE(2,7)
Action Tables (cont.)
• For comparison we simulate a heuristic algorithm in which the robot chooses the first free direction (0, 1, ..., 7)
• There is no difference in performance between the two algorithms
Tuning α and β
• α, which is used in the reward that updates the probability tables, expresses the robots' collision avoidance policy:
• A greater α - move forward in direction 0 (less avoidance)
• A smaller α - stronger avoidance
Tuning α and β (cont.)
• β expresses the "strength" of the last experience:
• A larger β - stronger weight on the last experience
• A smaller β - a slower learning process
• An ideal learning process tunes both without human assistance
Tuning α and β (cont.)
• We use the following tuning process (requiring that the robots reach their targets within 30 steps without a collision), starting with α = 1.0 and a fixed value of β:
• With the current α, build the 64 action tables S_(i,j) from the previous chapter using 30k updates on random states (i, j)
• Evaluate the algorithm on CASE(0,0) through CASE(7,7), changing α until the robots reach their targets in 30 steps or less
Tuning α and β (cont.)
• The rules for changing α:
• If a collision occurs in any of the 64 cases, decrease α by δ
• If no collision occurs in any of the 64 cases but the robots cannot reach their targets within 30 steps, increase α by δ
• δ starts at 0.1 and is halved every time α returns to a value it has already used
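A minimal sketch of this α-tuning loop; the evaluate_alpha callback, which would rebuild the 64 tables and run CASE(0,0) through CASE(7,7), is left abstract, and its name and return convention are mine, not the paper's:

```python
def tune_alpha(evaluate_alpha, alpha=1.0, delta=0.1, max_rounds=100):
    """Adjust alpha by +/- delta based on the 64 test cases.

    evaluate_alpha(alpha) is assumed to rebuild the action tables with this
    alpha and return (collision_occurred, all_reached_within_30_steps).
    delta is halved whenever alpha revisits a value it has already used.
    """
    seen = {round(alpha, 6)}
    for _ in range(max_rounds):
        collision, reached = evaluate_alpha(alpha)
        if not collision and reached:
            return alpha                     # success: keep this alpha
        alpha = alpha - delta if collision else alpha + delta
        key = round(alpha, 6)
        if key in seen:
            delta /= 2                       # revisiting a value: refine the step
        seen.add(key)
    return alpha
```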
Tuning α and β (cont.)
• Figure 3 shows the results of this experiment
• α eventually stabilizes at 0.4
Tuning α and β (cont.)
• Assumption: the robots "want" to build the set of 64 action tables S_(i,j) within 20k to 30k updates
• We start with β = 1.0 (and a fixed value of α)
• If more than 30k updates are needed, β is halved (β = β/2); if fewer than 20k updates are needed, β is doubled (β = 2β)
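A sketch of that β adjustment, following the rule exactly as stated on the slide; count_updates_to_converge is an assumed callback reporting how many updates building the 64 tables took with a given β:

```python
def tune_beta(count_updates_to_converge, beta=1.0, low=20_000, high=30_000,
              max_rounds=50):
    """Halve beta when convergence needs more than 30k updates,
    double it when fewer than 20k suffice (the rule as stated on the slide)."""
    for _ in range(max_rounds):
        updates = count_updates_to_converge(beta)
        if updates > high:
            beta /= 2
        elif updates < low:
            beta *= 2
        else:
            return beta          # converged within the 20k-30k target window
    return beta
```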
Automatic state space creation
• Reminder: Q_2 = {{σ} | σ ∈ Σ_2} is a state space, and the action tables S_(i,j) (slide 19) have been built
• A reduced state space Q'_2 can be created by merging adjacent states that have the same action; |Q'_2| = 24
• The algorithm based on Q'_2 has the same performance as the original
• Q'_2 can be built automatically at the end of the learning process of the action tables
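A sketch of this merging step; treating two (i, j) states as adjacent when they differ by 1 (mod 8) in exactly one coordinate is my reading of "adjacent", not a definition taken from the paper:

```python
def merge_states(converged_action):
    """Merge neighbouring (i, j) states that converged to the same action.

    converged_action: dict mapping each state (i, j), with i, j in 0..7, to the
    action index k its table converged to. Returns a dict mapping each state
    to the id of the merged state it belongs to (union-find over neighbours).
    """
    parent = {s: s for s in converged_action}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for (i, j), k in converged_action.items():
        # Only look "right" and "down" (mod 8); symmetry covers the rest.
        for neighbour in ((i, (j + 1) % 8), ((i + 1) % 8, j)):
            if converged_action.get(neighbour) == k:
                union((i, j), neighbour)

    return {s: find(s) for s in converged_action}
```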
Collision avoidance for three robots
• A similar approach can be used in a more complex environment, building on the results obtained in the simpler one
• We will compare the method built on the previous chapters to a simpler, direct learning method
Direct learning algorithm
• For three robots, a robot's sensor output is σ = ((i_1, j_1), (i_2, j_2))
• (i_k, j_k), k = 1, 2, means that sensor i_k is facing sensor j_k of the kth other robot; (i_2, j_2) is undefined if only one robot is visible
• Assume Q_3 is the partition of Σ_3 given by Q_3 = {{σ} | σ ∈ Σ_3}, and build action tables S_(i_1,j_1,i_2,j_2) for the states ((i_1, j_1), (i_2, j_2))
• We concentrate on the cases with two robots in sight, since with only one in sight we can reuse the previous action tables
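A small sketch of how such a state key might be represented; the flat tuple layout and the None placeholder for an unseen second robot are illustrative choices, not the paper's:

```python
from typing import Optional, Tuple

Pair = Tuple[int, int]                     # (i_k, j_k): my sensor vs. their sensor

def three_robot_state(first: Pair, second: Optional[Pair]) -> tuple:
    """Build the lookup key for the table S_(i1,j1,i2,j2).

    `second` is None when only one other robot is visible; in that case the
    two-robot table S_(i1,j1) would be reused instead.
    """
    if second is None:
        return first                       # fall back to the two-robot state (i1, j1)
    return (*first, *second)               # flat key (i1, j1, i2, j2)

# Example: a robot sees two others, in states (0, 1) and (1, 6)
key = three_robot_state((0, 1), (1, 6))    # -> (0, 1, 1, 6)
```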
Direct learning algorithm (cont.)
• Choose a state ((i_1, j_1), (i_2, j_2)) and update S_(i_1,j_1,i_2,j_2) after a single step, using the previously described reward
• Repeat the process 1,792k times (so that each table is updated about 1k times)
• α = 0.5, β = 0.05
• From the results we can deduce an action selection table similar to the one we saw before
Direct learning algorithm (cont.)
• CASE (0,0,1,0), CASE (0,1,1,6), CASE (1,0,7,0)
• The second figure compares the earlier heuristic algorithm with the learning-based one we have just seen
• The latter clearly outperforms the heuristic, which cannot handle some of the cases well
Reduced state learning for 3 robots
• We adopt the reduced state space Q'_2 from our previous discussion and turn it into a reduced state space Q'_3 for three robots
• We get 300 states instead of 1792 (including the single-robot-in-sight states)
• We repeat the learning process of the last three slides on the reduced state space
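One plausible way to realize this reduction, as a sketch only: map each visible robot's (i, j) pair through the merged two-robot partition (e.g. the output of the merge_states sketch above) and ignore the order of the two observations. The exact construction of Q'_3 is not spelled out on the slide, so this is an assumption:

```python
def reduced_three_robot_state(first, second, merged_id):
    """Map a raw three-robot observation onto a reduced state.

    merged_id: dict from a two-robot state (i, j) to its merged-state id.
    Each visible robot's (i, j) pair is replaced by the id of the merged
    state it belongs to; treating the pair as unordered is an assumption.
    """
    if second is None:
        return (merged_id[first],)                       # only one robot in sight
    return tuple(sorted((merged_id[first], merged_id[second])))
```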
Reduced state learning for 3 robots (cont.)
• The table shows the actions whose probability converged to a single value
• Parentheses show the highest-probability action in states where no convergence took place
Reduced state learning for 3 robots (cont.)
• As we can see, the reduced state space has an advantage
• We believe the advantage comes from the merging: a single update of a merged state is equivalent to many updates in the full state space
Experiment with Physical Robots
• We installed the obtained algorithms on the omni-directional robots
• Two robots:
• 10 experiments for each of CASE(0,0), CASE(1,7) and CASE(1,6)
• 8, 6 and 7 successful avoidances, respectively
• Three robots:
• Without showing the detailed results: the robots avoided collisions in 5 out of 10 experiments, and in some cases the direct algorithm did not perform well (time-wise) compared to the reduced-state one
Experiment with Physical Robots (cont.)
• We attribute the differences to the following:
• The non-discrete movement of the physical robots
• The non-synchronized movement of the robots
Conclusions
• The robots built (without any intervention) a collision avoidance algorithm
• We demonstrated how algorithms learned in a simple setting can be reused in a more complex environment containing more than two robots
• Most of the time the resulting algorithm gives good results
• We did not discuss the memory demands
Future Study
• A more complex state space
• Methods for copying experience from a simple to a complex environment
• Improving the simulation model