Learning to Coordinate Behaviors

Pattie Maes & Rodney A. Brooks
Presented by: Javier Martinez

Introduction
• Behavior-based system
• Learning from positive and negative feedback
• Behaviors decide when it is time to activate
• Distributed algorithm
• The concept is tested on a robot

Motivation
• Behavior control is a weak point of early behavior-based systems
• Behavior control has to be prewired
• This approach does not scale well

New Ideas
• Behavior control is learned through experience
• The learning algorithm is completely distributed
• Each behavior learns when to become active
• The solution maximizes positive feedback and minimizes negative feedback

The Learning Task
What is needed:
• Vector of binary perceptual conditions
• Set of behaviors
• Positive feedback generator
• Negative feedback generator
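The four ingredients above could be represented roughly as follows. This is only an illustrative sketch: the names `Behavior`, `LearningTask`, and `is_applicable`, and the choice to model the feedback generators as functions of the percept vector, are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Vector of binary perceptual conditions, indexed by condition number.
Percepts = List[bool]


@dataclass
class Behavior:
    """A behavior with a learned precondition list (hypothetical layout)."""
    name: str
    # Maps a condition index to the value it must have for the behavior
    # to become active. Conditions not listed are "don't care".
    preconditions: Dict[int, bool] = field(default_factory=dict)

    def is_applicable(self, percepts: Percepts) -> bool:
        # Active only if every listed precondition matches the percepts.
        return all(percepts[i] == v for i, v in self.preconditions.items())


@dataclass
class LearningTask:
    """Bundles the inputs the learning task needs (assumed structure)."""
    behaviors: List[Behavior]
    positive_feedback: Callable[[Percepts], bool]
    negative_feedback: Callable[[Percepts], bool]
```

A behavior with an empty precondition list is always applicable, which is a natural starting point before anything has been learned.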

The Learning Task
The task:
• Adjust the precondition list of each behavior to maximize relevance and reliability

The Learning Task
Constraints:
• Relevance: the behavior is correlated with positive feedback and not correlated with negative feedback
• Reliability: the behavior receives consistent feedback

The Learning Task
More constraints:
• The algorithm should deal with noise
• It should run in real time
• It should support readaptation

The Learning Task
Assumptions:
• For each behavior, at least one combination of preconditions gives feedback probabilities within a bound of 0 or 1
• Feedback is immediate (no delayed feedback)
• Only conjunctions of conditions can be learned

Algorithm
Measure:
• For each behavior, count how often positive (and, separately, negative) feedback did or did not occur while the behavior was or was not active
• Calculate the correlation between each kind of feedback and the activity status of the behavior
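The slides do not give the correlation formula. One plausible reading is the phi coefficient over the four counts, sketched below. The class name `FeedbackCounts` and its field names are hypothetical; one such table would be kept per behavior and per feedback type.

```python
import math
from dataclasses import dataclass


@dataclass
class FeedbackCounts:
    """2x2 contingency table for one behavior and one feedback type."""
    active_fb: int = 0        # behavior active, feedback occurred
    inactive_fb: int = 0      # behavior inactive, feedback occurred
    active_no_fb: int = 0     # behavior active, no feedback
    inactive_no_fb: int = 0   # behavior inactive, no feedback

    def update(self, active: bool, feedback: bool) -> None:
        # Increment the single cell that matches this time step.
        if active and feedback:
            self.active_fb += 1
        elif active:
            self.active_no_fb += 1
        elif feedback:
            self.inactive_fb += 1
        else:
            self.inactive_no_fb += 1

    def correlation(self) -> float:
        # Phi coefficient (assumed measure); ranges from -1 to 1.
        j, k = self.active_fb, self.inactive_fb
        l, m = self.active_no_fb, self.inactive_no_fb
        denom = math.sqrt((j + k) * (l + m) * (j + l) * (k + m))
        if denom == 0:
            return 0.0  # not enough variation observed yet
        return (j * m - k * l) / denom
```

A value near 1 means feedback tends to occur exactly when the behavior is active; a value near -1 means it tends to occur when the behavior is inactive.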

Algorithm
Measure:
• Express relevance and reliability in terms of this correlation
• Relevance controls whether a behavior should be active or not
• Reliability decides whether the behavior should try to improve itself
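The slides do not state how relevance and reliability are computed. The sketch below assumes relevance is the positive-feedback correlation minus the negative-feedback correlation, and that reliability measures how consistently feedback does or does not occur while the behavior is active. All function names and the `threshold` default are hypothetical.

```python
def relevance(corr_pos: float, corr_neg: float) -> float:
    # Assumed form: high when the behavior correlates with positive
    # feedback and does not correlate with negative feedback.
    return corr_pos - corr_neg


def consistency(fb_when_active: int, no_fb_when_active: int) -> float:
    # Fraction of active steps that had the more common outcome.
    # 1.0 means feedback was perfectly consistent; 0.5 means a coin flip.
    total = fb_when_active + no_fb_when_active
    if total == 0:
        return 0.0  # the behavior has not been active yet
    return max(fb_when_active, no_fb_when_active) / total


def reliability(pos_fb: int, pos_no_fb: int,
                neg_fb: int, neg_no_fb: int) -> float:
    # Assumed form: a behavior is only as reliable as its least
    # consistent feedback signal.
    return min(consistency(pos_fb, pos_no_fb),
               consistency(neg_fb, neg_no_fb))


def should_keep_learning(rel: float, threshold: float = 0.9) -> bool:
    # An unreliable behavior keeps trying to improve its preconditions.
    return rel < threshold
```

Under these assumptions, a behavior that saw positive feedback on 9 of 10 active steps and never saw negative feedback has reliability 0.9.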

Algorithm
Measure:
• A behavior improves itself by monitoring one perceptual condition at a time
• If taking the condition into account increases reliability, the condition is added to the behavior's list of preconditions
• The behavior keeps cycling through the conditions until its reliability reaches the threshold
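The monitoring cycle can be sketched as a simple loop. This is a simplified illustration, not the algorithm as implemented on the robot: `reliability_with` is a hypothetical callback standing in for "run the behavior with this precondition list for a while and measure its reliability", and `threshold` and `max_passes` are assumed parameters.

```python
from typing import Callable, Dict


def refine_preconditions(
    preconditions: Dict[int, bool],
    num_conditions: int,
    reliability_with: Callable[[Dict[int, bool]], float],
    threshold: float = 0.9,
    max_passes: int = 10,
) -> Dict[int, bool]:
    """Cycle through conditions, adopting those that raise reliability."""
    current = dict(preconditions)
    best = reliability_with(current)
    for _ in range(max_passes):
        improved = False
        for cond in range(num_conditions):
            if best >= threshold:
                return current  # reliable enough, stop monitoring
            if cond in current:
                continue  # already a precondition
            # Monitor this condition: try requiring it on, then off.
            for value in (True, False):
                candidate = {**current, cond: value}
                score = reliability_with(candidate)
                if score > best:
                    current, best = candidate, score
                    improved = True
                    break
        if not improved:
            break  # a full cycle brought no gain
    return current
```

The loop visits conditions in a fixed order, which corresponds to the non-intelligent search; a smarter ordering of which condition to monitor next would change only the inner iteration.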

Genghis
• Six-legged robot that walks forward
• 12 behaviors, 6 conditions, 8742 nodes
• 4 eight-bit microprocessors, 32 KB memory
• The challenge is to learn how to coordinate the legs to produce forward movement

Results
Convergence time:
• Non-intelligent search during the monitoring stage: 10 minutes
• Intelligent search: 1 minute 45 seconds
• A “tripod” gait emerged, which is common among six-legged insects

Conclusions
• A learning algorithm was developed that allows a behavior-based robot to learn, from positive and negative feedback, when its behaviors should become active

Comments
• Impressive results
• Global behavior (walking) emerges from coordinated behaviors
• Simple idea, powerful consequences: the robot learned how to walk; it was not taught

Comments
• Dead behaviors do not revive, although they might be useful in other situations
• How should concurrent actions be handled (e.g., walking while following a target)?
