Learning Momentum: Integration and Experimentation Brian Lee and Ronald C. Arkin Mobile Robot Laboratory Georgia Tech Atlanta, GA
Motivation • It’s hard to manually derive controller parameters. • The parameter space increases exponentially with the number of parameters. • You don’t always have a priori knowledge of the environment. • Without prior knowledge, a user can’t confidently derive appropriate parameter values, so it becomes necessary for the robot to adapt on its own to what it finds. • Obstacle densities and layout in the environment may be heterogeneous. • Parameters that work well for one type of environment may not work well with another type.
Adaptation and Learning Methods – DARPA MARS • Investigate robot shaping at five distinct levels in a hybrid robot software architecture • Implement algorithms within MissionLab mission specification system • Conduct experiments to evaluate performance of each technique • Combine techniques where possible • Integrate on a platform more suitable for realistic missions and continue development
Overview of Techniques
THE LEARNING CONTINUUM: Deliberative (premission) . . . Behavioral switching . . . Reactive (online adaptation)
• CBR Wizardry – Guide the operator
• Probabilistic Planning – Manage complexity for the operator
• RL for Behavioral Assemblage Selection – Learn what works for the robot
• CBR for Behavior Transitions – Adapt to situations the robot can recognize
• Learning Momentum – Vary robot parameters in real time
Basic Concepts of LM • Provides adaptability to behavior-based systems • A crude form of reinforcement learning. • If the robot is doing well, keep doing what it’s doing, otherwise try something different. • Behavior parameters are changed in response to progress and obstacles. • The system is still fully reactive. • Although the robot changes its behavior, there is no deliberation.
Currently Used Behaviors • Move to Goal • Always returns a vector pointing toward the goal position. • Avoid Obstacles • Returns a sum of weighted vectors pointing away from obstacles. • Wander • Returns vectors pointing in random directions.
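The slide above describes the three behaviors only in words. Below is a minimal Python sketch of how such vector-producing behaviors could look; the function names, the linear weighting inside the sphere of influence, and the way wander persistence is kept are illustrative assumptions, not the MissionLab implementation.

```python
import math
import random

def move_to_goal(robot_pos, goal_pos, gain):
    """Unit vector toward the goal, scaled by the goal gain."""
    dx, dy = goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1]
    dist = math.hypot(dx, dy) or 1e-9          # avoid division by zero at the goal
    return (gain * dx / dist, gain * dy / dist)

def avoid_obstacles(robot_pos, obstacles, gain, sphere_of_influence):
    """Sum of weighted vectors pointing away from obstacles within the sphere of influence."""
    vx = vy = 0.0
    for ox, oy in obstacles:
        dx, dy = robot_pos[0] - ox, robot_pos[1] - oy
        dist = math.hypot(dx, dy) or 1e-9
        if dist < sphere_of_influence:
            # Closer obstacles repel more strongly (assumed linear falloff).
            weight = (sphere_of_influence - dist) / sphere_of_influence
            vx += gain * weight * dx / dist
            vy += gain * weight * dy / dist
    return (vx, vy)

def wander(gain, persistence, step, _state={}):
    """Random-direction vector, held for `persistence` consecutive steps."""
    if "heading" not in _state or step % max(persistence, 1) == 0:
        _state["heading"] = random.uniform(0.0, 2.0 * math.pi)  # pick a new direction
    return (gain * math.cos(_state["heading"]), gain * math.sin(_state["heading"]))
```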
Adjustable Parameters • Move to goal vector gain • Avoid obstacle vector gain • Avoid obstacle sphere of influence • Radius around the robot inside of which obstacles are perceived • Wander vector gain • Wander persistence • The number of consecutive steps the wander vector points in the same direction
Four Predefined Situations
• No movement: M < T_movement
• Progress toward the goal: M > T_movement, P > T_progress
• No progress with obstacles: M > T_movement, P < T_progress, O_count > T_obstacles
• No progress without obstacles: M > T_movement, P < T_progress, O_count < T_obstacles
where M = average movement, M_goal = average movement toward the goal, P = M_goal / M, O_count = number of obstacles encountered, T_movement = movement threshold, T_progress = progress threshold, T_obstacles = obstacle threshold. A minimal classification sketch follows below.
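A small sketch of how these four situations could be detected from the averaged quantities and thresholds defined above; the function name and argument order are assumptions for illustration.

```python
def classify_situation(avg_movement, avg_movement_to_goal, obstacle_count,
                       t_movement, t_progress, t_obstacles):
    """Map the running averages onto one of the four predefined situations."""
    if avg_movement < t_movement:
        return "no_movement"
    progress = avg_movement_to_goal / avg_movement   # P = M_goal / M
    if progress > t_progress:
        return "progress_toward_goal"
    if obstacle_count > t_obstacles:
        return "no_progress_with_obstacles"
    return "no_progress_without_obstacles"
```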
Parameter Adjustments
[Table: sample adjustment parameters for the ballooning strategy]
Two Possible Strategies • Ballooning - Sphere of influence is increased when obstacles impede progress. The robot moves around large objects. • Squeezing - Sphere of influence is decreased when obstacles impede progress. The robot moves between closely spaced objects.
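To make the difference between the two strategies concrete, here is a hedged sketch of one learning-momentum adjustment step. The delta values are invented placeholders (the published adjustment tables are not reproduced here); the essential point is the sign of the sphere-of-influence change when obstacles impede progress.

```python
def adjust_parameters(params, situation, strategy):
    """Apply one learning-momentum step to the behavior parameters.

    `params` holds goal_gain, obstacle_gain, sphere, wander_gain, persistence.
    The delta values below are illustrative placeholders, not the published table.
    """
    deltas = {
        "progress_toward_goal": {"goal_gain": +0.1, "obstacle_gain": -0.1,
                                 "wander_gain": -0.1, "persistence": -1},
        "no_movement": {"wander_gain": +0.1, "persistence": +1},
        "no_progress_without_obstacles": {"goal_gain": +0.1, "wander_gain": +0.1},
        "no_progress_with_obstacles": {"obstacle_gain": +0.1, "wander_gain": +0.1,
                                       "persistence": +1,
                                       # The strategies differ chiefly here: ballooning
                                       # grows the sphere of influence, squeezing shrinks it.
                                       "sphere": +0.5 if strategy == "ballooning" else -0.5},
    }
    for key, delta in deltas.get(situation, {}).items():
        params[key] = max(0.0, params[key] + delta)   # keep parameters non-negative
    return params
```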
Integration: Base System
[Diagram: sensors provide position/goal and obstacle information to the controller; the Move To Goal(Gm), Avoid Obstacles(Go, S), and Wander(Gw, P) behavior vectors are summed (∑) to produce the output direction]
• Gm = goal gain • Go = obstacle gain • S = obstacle sphere of influence • Gw = wander gain • P = wander persistence
Integration: Integrated System
[Diagram: the same base system with an LM Module added; the LM module feeds new Gm, Go, S, Gw, and P parameters back to the behaviors. A sketch of one such control cycle follows below.]
• Gm = goal gain • Go = obstacle gain • S = obstacle sphere of influence • Gw = wander gain • P = wander persistence
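The sketch below shows how the LM module could close the loop around the base controller in a single control cycle. It assumes the helper functions from the earlier sketches (move_to_goal, avoid_obstacles, wander, classify_situation, adjust_parameters) are in scope, and the threshold values are illustrative, not taken from the paper.

```python
def lm_step(robot_pos, goal_pos, obstacles, params, strategy, step,
            avg_movement, avg_movement_to_goal, obstacle_count,
            t_movement=0.1, t_progress=0.5, t_obstacles=5):   # illustrative thresholds
    """One control cycle of the integrated system."""
    # Base controller: weighted vector sum of the three behaviors (the ∑ block).
    vectors = [
        move_to_goal(robot_pos, goal_pos, params["goal_gain"]),
        avoid_obstacles(robot_pos, obstacles, params["obstacle_gain"], params["sphere"]),
        wander(params["wander_gain"], int(params["persistence"]), step),
    ]
    out_x = sum(v[0] for v in vectors)
    out_y = sum(v[1] for v in vectors)

    # LM module: classify recent progress and nudge the parameters for the next cycle.
    situation = classify_situation(avg_movement, avg_movement_to_goal, obstacle_count,
                                   t_movement, t_progress, t_obstacles)
    adjust_parameters(params, situation, strategy)

    return (out_x, out_y), params
```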
Experiments in Simulation • 150m x 150m area • robot moves from (10m, 10m) to (140m, 90m) • Obstacle densities of 15% and 20% were used. • Obstacle radii varied between 0.38m and 1.43m.
Observations on Ballooning • Covers a lot of area • Not as easily trapped in box canyon situations • May settle in locally clear areas • May require a high wander gain to carry the robot through closely spaced obstacles
Observations on Squeezing • Results in a straighter path • Moves easily through closely spaced obstacles • May get trapped in small box canyon situations for large amounts of time
Simulations of the Real World
[Figure: simulated setup of the real-world environment, a 24m x 10m area with marked start and end places]
Completion Rates for Simulation
[Charts: completion rates for uniform obstacle size (1m radii) and for varying obstacle sizes (0.38m - 1.43m radii)]
Average Steps to Completion
[Charts: average steps to completion for uniform obstacle size (1m radii) and for varying obstacle sizes (0.38m - 1.43m radii)]
Results From Simulated Real Environment
[Charts: percent complete and steps to completion]
• As before, there is an increase in completion rates with an accompanying increase in steps to completion.
Simulation Results • Completion rates can be drastically improved. • Completion rate improvements come at a cost of time. • Ballooning and squeezing strategies are geared toward different situations.
Physical Robot Experiments • Nomad 150 robot • Sonar ring for obstacle avoidance • Traverses the length of a 24m x 10m room while negotiating obstacles
Physical Experiment Results • Non-learning robots became stuck. • Learning robots successfully negotiated the obstacles. • Squeezing was faster than ballooning in this case.
[Chart: average steps to goal]
Conclusions • Improved success comes at a cost of time. • Performance of one strategy is very poor in situations better suited for another strategy. • The ballooning strategy is generally faster. • Ballooning robots can move through closely spaced objects faster than squeezing robots can move out of box canyon situations.
Conclusions (cont’d) • If some general knowledge of the terrain is known a priori, an appropriate strategy can be chosen. • If the terrain is totally unknown, ballooning is probably the better choice. • A way to dynamically switch strategies should improve performance.