This presentation summarizes the work of Pattie Maes and Rodney A. Brooks on a behavior-based system that learns to coordinate its behaviors using positive and negative feedback. The distributed algorithm lets each behavior learn when to become active, so as to maximize positive feedback and minimize negative feedback.
Learning to Coordinate Behaviors Pattie Maes & Rodney A. Brooks Presented by: Javier Martinez
Introduction • Behavior-based system • Learning using positive and negative feedback • Behaviors decide when it is time to activate • Distributed algorithm • The concept is tested on a robot
Motivation • Behavior control is a weak point of early behavior-based systems • Behavior control has to be prewired • This approach does not scale well
New Ideas • Behavior control is learned through experience • Learning algorithm completely distributed • Each behavior learns when to become active • The solution maximizes positive feedback and minimizes negative feedback
The Learning Task What is needed: • Vector of binary perceptual conditions • Set of behaviors • Positive feedback generator • Negative feedback generator
The Learning Task The task: • Change the precondition list of each behavior to maximize relevance and reliability
The Learning Task Constraints: • Relevance: the behavior is correlated with positive feedback and not correlated with negative feedback • Reliability: the behavior receives consistent feedback
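The ingredients listed above (a binary condition vector, a set of behaviors, and a per-behavior precondition list) can be pictured with a minimal Python sketch. The class name `Behavior`, the dictionary representation of preconditions, and the condition indices are illustrative assumptions, not taken from the original system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Behavior:
    """Hypothetical container for one behavior and its learned preconditions."""
    name: str
    # Maps a condition index to the value it must have (True = on, False = off).
    # Conditions that do not appear here are treated as "don't care".
    preconditions: Dict[int, bool] = field(default_factory=dict)

    def is_applicable(self, conditions: List[bool]) -> bool:
        """True when every listed precondition matches the perceptual vector."""
        return all(conditions[i] == want for i, want in self.preconditions.items())


# Assumed example: the behavior requires condition 0 on and condition 3 off.
swing = Behavior("swing-leg-forward", {0: True, 3: False})
print(swing.is_applicable([True, False, False, False]))  # True
print(swing.is_applicable([True, False, False, True]))   # False
```

Under this representation, learning amounts to adding entries to `preconditions` until the behavior fires only in situations where it earns good feedback.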
The Learning Task More constraints: • Algorithm should deal with noise, • Perform in real time, • Support readaptation
The Learning Task Assumptions: • For each behavior, at least one combination of preconditions yields feedback probabilities within a bound of 0 or 1 • Feedback is immediate • Only conjunctions of conditions can be learned
Algorithm Measure: • Number of times a positive/negative feedback did/didn’t happen when a behavior was/wasn’t active • Calculate the correlation between positive/negative feedback and the status of the behavior
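The four counts described above form a 2x2 table, and a standard way to turn such a table into a correlation is the phi coefficient. The sketch below assumes that measure; the slide does not state the exact formula, and the names `j`, `k`, `l`, `m` are chosen here for illustration.

```python
import math


def correlation(j: float, k: float, l: float, m: float) -> float:
    """Phi coefficient between 'behavior active' and 'feedback received'.

    j: behavior active,   feedback present
    k: behavior active,   feedback absent
    l: behavior inactive, feedback present
    m: behavior inactive, feedback absent
    """
    denom = math.sqrt((j + k) * (l + m) * (j + l) * (k + m))
    if denom == 0:
        return 0.0  # not enough variation to measure any correlation
    return (j * m - l * k) / denom


print(correlation(5, 0, 0, 5))   # 1.0  (feedback only when active)
print(correlation(0, 5, 5, 0))   # -1.0 (feedback only when inactive)
```

The same function would be evaluated twice per behavior, once with the positive-feedback counts and once with the negative-feedback counts.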
Algorithm Measure: • Express relevance and reliability in terms of this correlation • Relevance controls whether a behavior should be active or not • Reliability decides whether the behavior should try to improve itself
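One plausible way to express the two measures, given here as an assumption because the slide does not spell out the formulas: relevance as the correlation with positive feedback minus the correlation with negative feedback, and reliability as how one-sided the feedback is while the behavior is active. The function names and count tuples are hypothetical.

```python
import math
from typing import Tuple

# (active & feedback, active & no feedback, inactive & feedback, inactive & no feedback)
Counts = Tuple[float, float, float, float]


def phi(j: float, k: float, l: float, m: float) -> float:
    """Phi coefficient of a 2x2 contingency table (0.0 if undefined)."""
    denom = math.sqrt((j + k) * (l + m) * (j + l) * (k + m))
    return (j * m - l * k) / denom if denom else 0.0


def relevance(pos: Counts, neg: Counts) -> float:
    """Assumed form: high when activity tracks positive but not negative feedback."""
    return phi(*pos) - phi(*neg)


def consistency(j: float, k: float) -> float:
    """Share of active steps that agree with the majority feedback outcome."""
    total = j + k
    return max(j, k) / total if total else 0.0


def reliability(pos: Counts, neg: Counts) -> float:
    """Assumed form: the weaker of the two consistencies while active."""
    return min(consistency(pos[0], pos[1]), consistency(neg[0], neg[1]))


pos_counts = (9, 1, 2, 8)    # illustrative numbers only
neg_counts = (0, 10, 3, 7)
print(round(relevance(pos_counts, neg_counts), 3))
print(reliability(pos_counts, neg_counts))  # 0.9
```

With these definitions a behavior with high relevance would be allowed to become active, while a reliability below some threshold would trigger further precondition learning.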
Algorithm Measure: • Improvement is done by monitoring one perceptual condition at a time • If reliability increases, the condition is added to the behavior's list of preconditions • Keep cycling through the conditions until reliability reaches the threshold
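The monitoring cycle can be sketched as a greedy loop over conditions. This is a simplified batch version working on recorded observations, whereas the original algorithm runs online; the success rate used as a stand-in for reliability, the 0.95 threshold, and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One observation made while the behavior was active:
# (binary perceptual conditions, whether positive feedback followed).
Sample = Tuple[List[bool], bool]


def success_rate(samples: List[Sample]) -> float:
    """Fraction of activations followed by positive feedback (reliability stand-in)."""
    if not samples:
        return 0.0
    return sum(1 for _, fed in samples if fed) / len(samples)


def matching(samples: List[Sample], preconditions: Dict[int, bool]) -> List[Sample]:
    """Keep only the samples in which every precondition held."""
    return [s for s in samples
            if all(s[0][i] == want for i, want in preconditions.items())]


def learn_preconditions(samples: List[Sample], n_conditions: int,
                        threshold: float = 0.95) -> Tuple[Dict[int, bool], float]:
    """Cycle through conditions, adopting any that raises the success rate."""
    preconditions: Dict[int, bool] = {}
    reliability = success_rate(samples)
    changed = True
    while changed and reliability < threshold:
        changed = False
        for idx in range(n_conditions):
            if idx in preconditions:
                continue
            for value in (True, False):  # try the condition, then its negation
                trial = {**preconditions, idx: value}
                trial_rate = success_rate(matching(samples, trial))
                if trial_rate > reliability:
                    preconditions, reliability = trial, trial_rate
                    changed = True
                    break
            if reliability >= threshold:
                break
    return preconditions, reliability


# Toy data: feedback is positive only when condition 0 is on and condition 1 is off.
samples = [([a, b], a and not b) for a in (True, False) for b in (True, False)] * 5
print(learn_preconditions(samples, n_conditions=2))  # ({0: True, 1: False}, 1.0)
```

On the toy data the loop first adopts condition 0 (raising the rate from 0.25 to 0.5) and then the negation of condition 1, at which point the threshold is met and monitoring stops.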
Genghis • Six-legged robot that walks forward • 12 behaviors, 6 conditions, 8742 nodes • 4 eight-bit microprocessors, 32 KB memory • The challenge is to learn how to coordinate the legs to produce a forward movement
Results Convergence time • Non-intelligent search during the monitoring stage: 10 min • Intelligent search: 1 min 45 s • A “tripod” gait emerged, which is common among six-legged insects
Conclusions • A learning algorithm was developed which allows a behavior-based robot to learn when its behaviors should become active using positive and negative feedback
Comments • Impressive results • Global behavior (walking) emerges from coordinated behaviors • Simple idea, powerful consequences: the robot learned how to walk; it was not taught
Comments • Dead behaviors are never revived, although they might be useful in other situations • How should concurrent activities be handled (e.g., walking while following a target)?