
  1. Learning-Based Automatic Generation of Collision Avoidance Algorithms for Multiple Autonomous Mobile Robots • Yukiyoshi Fujita, Ichiro Suzuki, Satoshi Fujita, Hajime Asama, Masafumi Yamashita

  2. Abstract • This presentation discusses the automatic generation of a collision avoidance algorithm: • An effective algorithm for two robots that simulates human trial and error • Use of a reward function that is itself learned by the robots • Reliance on the sensors' output alone

  3. Abstract (cont.) • How a robot can use its gained "experience" in a more complex environment • Use of a reduced state space • Use of omni-directional robots • Comparison of theoretical results with the actual (physical) results

  4. Introduction • An autonomous multi-robot system is one in which: • There is no fixed "leader": each robot is driven only by its own design and data • Each robot adjusts itself independently • This is an advantage with respect to failures, scalability, communication overhead, etc. • On the other hand, designing algorithms for such a system is more difficult

  5. Introduction (cont.) • The discussed robots have eight sensors • Each sensor can detect: • A nearby object (robot or wall) • The direction of the object's motion (one of eight) • The speed of the object (one of three) • This yields a state space of sensor outputs consisting of (8·3+2)^8 states
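A quick arithmetic check of the state-space size quoted above (a sketch; the exact breakdown of the "+2" term is my reading of the slide, e.g. "wall detected" and "nothing detected"):

```python
# Each sensor reading: 8 motion directions x 3 speeds for a nearby robot,
# plus 2 further readings (assumed here: "wall" and "nothing detected").
per_sensor = 8 * 3 + 2            # 26 possible outputs per sensor
num_sensors = 8
print(per_sensor ** num_sensors)  # 26**8 = 208_827_064_576, i.e. roughly 2.1e11 states
```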

  6. Introduction (cont.) • This motivates combined research on: • A collision avoidance algorithm • Automatic reduction of states (by automatic state merging)

  7. Introduction (cont.) • A robot in an unknown environment repeatedly evaluates its performance • Actions that were more successful in the past are more likely to be chosen • We will investigate how such robots autonomously organize the state space and generate a collision avoidance algorithm based on a reduced state space

  8. Introduction (cont.) • We will examine a simulated, naive human trial-and-error learning algorithm and see that it gives relatively good results • All algorithm parameters are adjusted without any external intervention • We will also discuss how robots can use their experience in a more complicated environment (three robots)

  9. Introduction (cont.) • In addition to the theoretical discussion and simulations, physical experiments are carried out as well • The results show a very high probability of collision avoidance, especially for two robots • The algorithm works reasonably well for the case of three robots

  10. The Model of the Robots • The discussed omni-directional robots have 8 infra-red sensors (transmitter & receiver) and can detect another robot's position i and relative motion angle j • For convenience we will discard the other detection possibilities (such as detecting a wall)

  11. The Model of the Robots (cont.) • Let Σ be the set of distinct sensor outputs, each σ ∈ Σ being a vector of the sensors' readings • A state space is a partition Q of Σ • For each state q ∈ Q we prepare an action table Sq whose kth element Sq(k) is the probability that a robot in state q will move in direction k (0 ≤ k ≤ 7)
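A minimal sketch of these definitions in Python (the identifiers are mine, not from the paper): each state q keeps an action table Sq, a probability distribution over the eight movement directions.

```python
import numpy as np

NUM_DIRECTIONS = 8   # actions a0 .. a7

def make_action_table():
    """Initial action table Sq: uniform over the 8 directions (sums to 1)."""
    return np.full(NUM_DIRECTIONS, 1.0 / NUM_DIRECTIONS)

# With the finest partition Q (one singleton state per sensor output), the
# tables can simply be indexed by the sensor output itself.
action_tables = {}   # maps state q -> action table Sq
```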

  12. The Model of the Robots (cont.) • Each robot decides how to move according to its sensor output σ, i.e. it moves in direction k with probability Sq(k), where q ∈ Q is the state containing σ • The task of the robots is to autonomously build a partition Q and an action table Sq for each q ∈ Q • In what follows, ak denotes the action of moving in direction k
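Action selection itself is then a single random draw from Sq; a sketch under the same assumptions as above:

```python
import numpy as np

rng = np.random.default_rng()

def choose_direction(S_q):
    """Sample a direction k in 0..7 (action ak) with probability S_q[k]."""
    return int(rng.choice(len(S_q), p=S_q))

S_q = np.full(8, 1.0 / 8)     # the unbiased initial table
k = choose_direction(S_q)     # the robot moves in direction k this step
```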

  13. The Model of the Robots (cont.) • [Figure: an example configuration of two robots; Robot A observes state S(1,6) and Robot B observes state S(6,1), with the eight sensor directions 0-7 and each robot's direction of motion marked]

  14. Collision Avoidance for Two Robots: Construction of Action Tables by Learning

  15. Action Tables • We start with the case of two robots • The state σ=(i,j) denotes that sensor i is facing sensor j of the other robot, so |Σ2|=64 • Q2={{σ} | σ∈Σ2} is a partition of Σ2 • pijk is the value of the kth element of the action table S(i,j): the probability that a robot takes action ak when the sensor output is (i,j)

  16. Action Tables (cont.) • To create an unbiased system we assign pijk=1/8 for all k • The influence of ak is evaluated by: reward = ε(α(ft − ft+1) + (1−α)(dt+1 − dt)) • ft: distance between the robot and its target at time t • dt: distance between the two robots at time t
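A sketch of this reward as reconstructed above (the weighting of the two terms by α and the overall scale ε follow the formula as I read it from the slide):

```python
def reward(f_t, f_t1, d_t, d_t1, alpha, eps):
    """Reward for the last action.

    f_t, f_t1: distance to the target before and after the step
    d_t, d_t1: distance to the other robot before and after the step
    Positive when the robot moved closer to its target and/or away
    from the other robot; negative otherwise.
    """
    return eps * (alpha * (f_t - f_t1) + (1.0 - alpha) * (d_t1 - d_t))
```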

  17. Action Tables (cont.) • 0 ≤ α ≤ 1 and ε > 0 will be determined by the robots themselves • The reward reflects the need to get as close as possible to the target without getting near the other robot • A robot that takes action ak from state (i,j) updates the action table S(i,j) by: pijk = max{pijk + reward, 0}, while Σk pijk = 1 is maintained
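The table update can be sketched as below; the renormalization step is my reading of the condition that Σk pijk = 1 must continue to hold after the max operation:

```python
import numpy as np

def update_table(S, k, r):
    """Apply pijk = max(pijk + reward, 0) to entry k of S(i,j), then
    renormalize so the entries still sum to 1."""
    S = S.copy()
    S[k] = max(S[k] + r, 0.0)
    total = S.sum()
    return S / total if total > 0 else np.full(len(S), 1.0 / len(S))
```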

  18. Action Tables (cont.) • Simulation: α=0.5, ε=0.05, d0=1.0; place the robots in a state (i,j) • Move them one step and update the action table • Repeat this 64k times, so that each of the 64 states is updated about 1k times • The vectors S(i,j) converge to pijk=1.0 for a single k in most states (i,j)
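Putting the pieces together, the learning phase described above might look like the following sketch; the simulator call `simulate_step`, which moves both robots one step and reports the distances before and after, is a hypothetical helper and not from the paper.

```python
import numpy as np

rng = np.random.default_rng()
ALPHA, EPS = 0.5, 0.05
tables = {(i, j): np.full(8, 1.0 / 8) for i in range(8) for j in range(8)}

def train(simulate_step, num_updates=64_000):
    """simulate_step(state, k) is a hypothetical simulator returning
    (f_t, d_t, f_t1, d_t1): target/robot distances before and after the move."""
    for _ in range(num_updates):
        state = (int(rng.integers(8)), int(rng.integers(8)))  # pick a state (i, j)
        S = tables[state].copy()
        k = int(rng.choice(8, p=S))                           # sample action ak from S(i,j)
        f_t, d_t, f_t1, d_t1 = simulate_step(state, k)
        r = EPS * (ALPHA * (f_t - f_t1) + (1 - ALPHA) * (d_t1 - d_t))
        S[k] = max(S[k] + r, 0.0)
        total = S.sum()
        tables[state] = S / total if total > 0 else np.full(8, 1.0 / 8)
```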

  19. Action Tables (cont.) • The following table shows the learned action k for every i and j • The actions in parentheses indicate the highest-probability action in states where convergence to a single action did not occur

  20. Action Tables (cont.) • Let us test the algorithm's performance (in simulation): • Each robot is a disc of radius 1.0 • A sensor can sense up to a distance of 2.0 • A robot moves in steps of 0.5 • The initial distance between the robots is 2.0 • The target of each robot is at distance 10.0 in direction 0

  21. Action Tables (cont.) • CASE(i,j) denotes an experiment in which the initial state of one of the robots is (i,j) • Each robot moves according to the action selection table unless it can move directly towards its target
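At run time each robot follows the simple policy just described; a sketch, where `direction_to_target` and `is_free` are hypothetical helpers standing in for the robot's geometry and sensing:

```python
import numpy as np

rng = np.random.default_rng()

def next_move(state, tables, direction_to_target, is_free):
    """Move straight at the target if that direction is free; otherwise fall
    back on the learned action table S(i,j) for the current state."""
    k_target = direction_to_target()   # hypothetical: direction 0..7 toward the target
    if is_free(k_target):              # hypothetical: nothing blocks that direction
        return k_target
    return int(rng.choice(8, p=tables[state]))
```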

  22. Action Tables (cont.) • The simulation results show success in all 64 cases • Below are the more difficult cases: CASE(0,0), CASE(1,6), CASE(1,7), CASE(2,7)

  23. Action Tables (cont.) • For comparison we simulate a heuristic algorithm in which the robot chooses the first free direction (checking 0, 1, ..., 7 in order) • There is no difference in performance between the two algorithms
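The baseline heuristic is simpler; a sketch, assuming the same hypothetical `is_free` predicate as above:

```python
def heuristic_move(is_free):
    """Baseline: scan directions 0, 1, ..., 7 and take the first free one."""
    for k in range(8):
        if is_free(k):
            return k
    return None   # completely blocked: no free direction this step
```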

  24. Tuning α and ε • α, which is used to update the probability table, determines the robot's collision avoidance policy: • A larger α: move forward in direction 0 (less avoidance) • A smaller α: stronger avoidance

  25. Tuning α and ε (cont.) • ε determines the "strength" of the last experience: • A larger ε: stronger weight on the last experience • A smaller ε: a slower learning process • An ideal learning process should require no human assistance

  26. Tuning α and ε (cont.) • We use the following α tuning process (the goal being that the robots reach their targets within 30 steps without a collision), starting with α=1.0 and a fixed value of ε: • With the current α, build the 64 action tables S(i,j) from the previous chapter using 30k updates on random states (i,j) • Evaluate the algorithm for CASE(0,0) to CASE(7,7), changing α until the robots reach their targets in 30 steps or less

  27. Tuning α and ε (cont.) • The rules for changing α: • If a collision occurs in one of the 64 cases, decrease α by Δ • If no collision occurs in any of the 64 cases but the robots cannot reach their targets within 30 steps, increase α by Δ • Δ starts at 0.1 and is halved every time α returns to a previously used value
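A sketch of this α-tuning loop under the rules above; `learn_tables` and `evaluate_all_cases` are hypothetical stand-ins for the 30k-update learning phase and the CASE(0,0)..CASE(7,7) evaluation, and the interpretation of the Δ-halving rule (halve when α revisits an earlier value) is mine:

```python
def tune_alpha(learn_tables, evaluate_all_cases, eps=0.05):
    """Adjust alpha until all 64 CASEs finish within 30 steps, collision-free."""
    alpha, delta = 1.0, 0.1
    visited = set()
    while True:
        tables = learn_tables(alpha, eps)                 # hypothetical: 30k random updates
        collided, too_slow = evaluate_all_cases(tables)   # hypothetical: run all 64 CASEs
        if not collided and not too_slow:
            return alpha                                  # reported to stabilize around 0.4
        alpha = alpha - delta if collided else alpha + delta
        alpha = round(min(max(alpha, 0.0), 1.0), 6)
        if alpha in visited:    # alpha has come back to an earlier value:
            delta /= 2          # halve the step and keep refining
        visited.add(alpha)
```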

  28. Tuning α and ε (cont.) • Figure 3 shows the results of this experiment • α eventually stabilizes at 0.4

  29. Tuning α and ε (cont.) • Assumption: the robots "want" to build the set of 64 action tables S(i,j) within 20k to 30k updates • We start with ε=1.0 (and a fixed value of α) • If more than 30k updates are needed, ε is halved (ε ← ε/2); if fewer than 20k updates are needed, ε is doubled (ε ← 2ε)
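The ε adjustment can be sketched the same way, following the rule exactly as quoted above; `updates_until_converged`, which measures how many updates the 64 tables need, is a hypothetical helper:

```python
def tune_eps(updates_until_converged, alpha=0.5):
    """Halve or double eps until the 64 tables are built within 20k-30k updates."""
    eps = 1.0
    while True:
        n = updates_until_converged(alpha, eps)   # hypothetical measurement
        if n > 30_000:
            eps /= 2
        elif n < 20_000:
            eps *= 2
        else:
            return eps
```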

  30. Automatic state space creation • Reminder: Q2={{σ} | σ∈Σ2} is a state space, and the action tables S(i,j) (slide 19) have been built • A reduced state space can be created by merging adjacent states that share the same action, yielding 24 states instead of 64 • The algorithm based on the reduced state space has the same performance as the original • The reduced state space can be built automatically at the end of the learning process of the action tables
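Once the tables have converged, the merging itself is mechanical; a sketch that groups adjacent (i,j) states sharing the same dominant action (the notion of adjacency used here, neighboring sensor indices modulo 8, is my assumption):

```python
import numpy as np

def reduce_state_space(tables):
    """Merge adjacent states whose converged action is the same.
    tables: dict mapping (i, j) -> probability vector of length 8.
    Returns a dict mapping each (i, j) to its merged-state representative."""
    best = {s: int(np.argmax(p)) for s, p in tables.items()}   # dominant action per state
    parent = {s: s for s in tables}                            # union-find parents

    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s

    for (i, j) in tables:
        for (di, dj) in ((1, 0), (0, 1)):                      # neighbors modulo 8
            n = ((i + di) % 8, (j + dj) % 8)
            if best[n] == best[(i, j)]:
                parent[find(n)] = find((i, j))                 # same action: merge
    return {s: find(s) for s in tables}
```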

  31. Collision avoidance for three robots • A similar approach can be used for a more complex environment, building on the results obtained in the simpler one • We will compare the method of the previous chapters with a simpler learning method

  32. Direct learning algorithm • For three robots, a robot's sensor output σ is ((i1,j1),(i2,j2)) • (ik,jk), k=1,2, means that sensor ik is facing sensor jk of the kth visible robot; (i2,j2) is undefined if only one robot is visible • Let Q3={{σ} | σ∈Σ3} be the partition of Σ3 and build action tables S(i1,j1,i2,j2) for the states ((i1,j1),(i2,j2)) • We will concentrate on the cases with two robots in sight, since with only one in sight we can adopt the previous action tables
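A small sketch of how the three-robot state key and the fallback to the two-robot tables could be organized (the table names are mine, not from the paper):

```python
def lookup_table(sigma, tables_two, tables_three):
    """sigma is ((i1, j1), (i2, j2)); the second pair is None when only one
    robot is visible, in which case the two-robot table is reused."""
    (i1, j1), second = sigma
    if second is None:
        return tables_two[(i1, j1)]
    i2, j2 = second
    return tables_three[(i1, j1, i2, j2)]
```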

  33. Direct learning algorithm (cont.) • We choose a state ((i1,j1),(i2,j2)) and update S(i1,j1,i2,j2) after a single step, using the previously described reward • The process is repeated 1,792k times (each table is updated about 1k times) • α=0.5, ε=0.05 • From the results we can derive an action selection table similar to the one we saw before

  34. Direct learning algorithm (cont.) • CASE(0,0,1,0), CASE(0,1,1,6), CASE(1,0,7,0) • The second figure compares the first (heuristic) algorithm with the learning-based one we just saw • The latter clearly outperforms the former, which cannot handle some of the cases well

  35. Reduced state learning for 3 robots • We adopt the reduced state space from our previous discussion and turn it into a state space for three robots • We get 300 states instead of 1792 (including the single-robot-in-sight states) • We repeat the learning process of the last three slides with this reduced state space

  36. Reduced state learning for 3 robots (cont.) • The table shows the actions whose probabilities converged • Parentheses indicate the highest-probability action in states where no convergence took place

  37. Reduced state learning for 3 robots (cont.) • As we can see, the reduced state space has an advantage • We believe the advantage comes from the fact that a single update of a merged state is equivalent to many updates in the original state space

  38. Experiment with Physical Robots • We installed the obtained algorithms on the omni-directional robots • Two robots: • 10 experiments for each of CASE(0,0), CASE(1,7), CASE(1,6) • 8, 6 and 7 successful avoidances, respectively • Three robots: • Without showing the full results: the robots avoided collisions in 5 out of 10 experiments, but in some cases the algorithm did not perform well (time-wise) compared with the reduced-state version

  39. Experiment with Physical Robots (cont.) • We attribute the differences to the following: • The non-discrete movement of the physical robots • The non-synchronized movement of the robots

  40. Conclusions • The robots built a collision avoidance algorithm without any intervention • We demonstrated how algorithms learned in a simple environment can be used in a more complex environment containing more robots • Most of the time the resulting algorithm gives good results • We did not discuss the memory demands

  41. Future Study • A more complex state space • Copying methods from a simple to a complex environment • Improving the simulation model
