Affective Facial Expressions Facilitate Robot Learning Joost Broekens Pascal Haazebroek LIACS, Leiden University, The Netherlands
Why?
• Interactive robot learning
• Facilitate human-robot interaction
• Study learner-teacher relations
• Study learning and adaptation
• Future:
  • enable robots to cooperate interactively and efficiently with humans, in ways that are natural to humans
• This talk:
  • human affective facial expressions as a training signal to the robot
Outline
• Emotion influences thought and behavior
• EARL:
  • Studying the relation between emotion and adaptation in reinforcement learning
  • This talk: human affect as reinforcement to the robot
• Reinforcement-based robot learning
• Experiments:
  • Affect as additional reinforcement
  • Affect as input to a social reward function
• Results and conclusions
  • Learning is positively influenced, especially in the learned social reward function case
Emotion, Thought and Behavior
• Emotion
  • Bodily expression (face, posture)
  • Action tendencies (Frijda)
  • Feelings
  • Cognitive appraisal (Arnold, Lazarus, Scherer)
• Affect
  • "Everything to do with emotion", as in affective computing, or
  • an abstraction over emotion (e.g., Russell) composed of
    • Arousal (alertness)
    • Valence (pleasure)
• We use the latter definition of affect in the experiment
  • Short timescale
  • Arousal is ignored
Emotion, Thought and Behavior
• Emotion and affect influence thought and behavior:
  • The kind of thoughts we have
    • Mood congruency
  • The way we process information
    • Narrow vs. broad focus (Goschke & Dreisbach)
    • High vs. low processing effort (Scherer; Forgas)
  • What we think about things
    • Emotion/mood as information (Clore & Gasper)
    • Emotion as belief anchor (Frijda & Mesquita)
  • How we learn and adapt
    • Emotion/affect as social reinforcement
    • Emotion/affect as intrinsic reinforcement
    • Emotion as "metaparameter" to control the learning process
    • Empathy
EARL
• Goal: to study the relations between emotion and adaptation in the context of reinforcement learning
  • Simulated robot (but see later comments)
  • Maze navigation tasks
  • Webcam and emotion recognition to interpret human emotions
  • Reinforcement learning (RL) approach to robot learning
  • Robot has its own model of emotion
  • Robot head to express emotion
• Potential influences experimented with
  • Evaluate models of emotion in an RL setting
  • Evaluate models of emotional expression
  • Test the influence of emotion/affect on RL learning parameters
  • Experiment with communicated and robot emotion as reward
EARL
• Short movie
Human Affect as Reinforcement to Robot
• Interactive robot learning
  • Learning by Example
    • E.g., imitation learning (see Breazeal & Scassellati)
  • Learning by Guidance (Thomaz & Breazeal)
    • Future-directed learning cues
    • Anticipatory reward
  • Learning by Feedback
    • Additional reinforcement signal (Breazeal & Velasquez; Isbell et al.; Mitsunaga et al.; Papudesi & Huber)
• In our experiment: the affective signal as additional reinforcement
Human Affect as Reinforcement to Robot
• Affective signal as additional reinforcement
  • Webcam
  • Emotional expression analysis
  • Positive emotion (happy) = reward
  • Negative emotion (sad) = punishment
• So: the emotional expression is used in learning as r_human, a social reward coming from the human observer (see the sketch below)
• Note:
  • We interpret happy as positively valenced and sad as negatively valenced
  • This is a deliberately simplified setup that enables us to test our hypothesis
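A minimal sketch of this affect-to-reward mapping, assuming a webcam-based recognizer that emits a discrete expression label per frame; the label names and the unit reward magnitude are illustrative assumptions, not the exact values used in the experiment:

```python
# Hypothetical affect-to-reward mapping; "happy"/"sad" labels and the
# unit magnitude are illustrative assumptions.

def affect_to_reward(expression: str, magnitude: float = 1.0) -> float:
    """Map a recognized facial expression to a scalar social reward r_human."""
    if expression == "happy":   # positively valenced -> reward
        return +magnitude
    if expression == "sad":     # negatively valenced -> punishment
        return -magnitude
    return 0.0                  # neutral / unrecognized -> no social signal

# Example: r_human = affect_to_reward("happy")  ->  1.0
```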
Reinforcement-based robot learning
• Continuous Gridworld
  • World features placed on a grid
  • Agent has real-valued coordinates and speed; positions are not restricted to grid cells
  • Local perception from an agent-based perspective = current state s
• Task
  • Find food (as usual)
• Training: multilayer perceptron (MLP) networks
  • Input is the agent's perceived state s
  • Each action (fwd, left, right) has two networks
    • The first is trained on the action-value Q_a(s)
    • The second is trained on the inverse action-value (the value of NOT doing the action)
  • The value function has a network trained to predict Q(s)
• Action selection uses the action values as predicted by the MLPs (see the sketch below)
• In terms of representing the world, the perceived state, and the actions, this setup is close to real-world robotics
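A minimal sketch of this network layout, assuming scikit-learn's MLPRegressor as a stand-in for the paper's MLPs; the state dimensionality, network size, and epsilon-greedy selection are assumptions based on the slide text, not the authors' exact configuration:

```python
# Sketch of the learner's networks: per action, one MLP for Q_a(s) and one
# for the inverse action-value, plus one MLP for the state value Q(s).
# MLPRegressor and all sizes here are assumptions, not the paper's setup.
import numpy as np
from sklearn.neural_network import MLPRegressor

STATE_DIM = 8                      # assumed size of the local perception s
ACTIONS = ["fwd", "left", "right"]

def make_net() -> MLPRegressor:
    net = MLPRegressor(hidden_layer_sizes=(16,), solver="sgd",
                       learning_rate_init=0.01)
    net.partial_fit(np.zeros((1, STATE_DIM)), np.zeros(1))  # prime for predict()
    return net

q_nets   = {a: make_net() for a in ACTIONS}   # Q_a(s)
inv_nets = {a: make_net() for a in ACTIONS}   # value of NOT taking a
v_net    = make_net()                         # Q(s), the state value

def select_action(state: np.ndarray, epsilon: float = 0.1) -> str:
    """Greedy over the MLP-predicted action values, with some exploration."""
    if np.random.rand() < epsilon:
        return np.random.choice(ACTIONS)
    values = {a: q_nets[a].predict(state.reshape(1, -1))[0] for a in ACTIONS}
    return max(values, key=values.get)
```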
Reinforcement-based robot learning
• [Figure: the continuous gridworld task, showing the path around a wall from start to the food location]
Experiments: Affect as additional reinforcement
• Test the difference between the standard agent and social agents
  • 200 trials to learn the path to food
  • The standard agent uses R(s) from the environment to update Q_a(s) and Q(s)
  • The social agent uses r_human in addition to R(s), as sketched below
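A sketch of how the two agents' updates might differ; the one-step TD target and the name gamma are standard RL assumptions, not equations taken from the slides:

```python
# Sketch: the standard agent trains on R(s) alone; the social agent adds
# the human signal r_human. The TD form and gamma are assumed, not given.

def combined_reward(env_reward: float, r_human: float, social: bool) -> float:
    """R(s) for the standard agent; R(s) + r_human for the social agent."""
    return env_reward + (r_human if social else 0.0)

def td_target(reward: float, next_value: float, gamma: float = 0.9) -> float:
    """One-step TD target used to train the action-value MLPs (assumed)."""
    return reward + gamma * next_value

# Example training step for the social agent, reusing the earlier sketch:
#   r = combined_reward(R_s, r_human, social=True)
#   target = td_target(r, v_net.predict(s_next.reshape(1, -1))[0])
#   q_nets[a].partial_fit(s.reshape(1, -1), np.array([target]))
```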
Experiments: Affect as additional reinforcement
• Three social settings
  • Moderate social reinforcement (setting a)
    • r_human is small
    • Long period of training with r_human (trials 20-30)
  • Strong social reinforcement (setting b)
    • r_human is large
    • Short period of training with r_human (trials 20-25)
  • Learned social reinforcement (setting c)
    • r_human is used as above and also to train R_social(s) (an MLP)
    • The period using r_human is trials 29-45
    • After that, R_social(s) is used (see the sketch below)
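Setting (c) sketched, under the same assumptions as the earlier sketches: during trials 29-45 the observed (s, r_human) pairs also train an MLP approximating R_social(s); after trial 45 the learned function replaces the live human signal. Only the trial boundaries come from the slide:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

STATE_DIM = 8  # assumed local-perception size, as in the earlier sketch

r_social_net = MLPRegressor(hidden_layer_sizes=(16,), solver="sgd")
r_social_net.partial_fit(np.zeros((1, STATE_DIM)), np.zeros(1))  # prime

def social_signal(state: np.ndarray, r_human: float, trial: int) -> float:
    """Social reward under setting (c); trial boundaries from the slide."""
    if 29 <= trial <= 45:          # live human feedback also trains R_social(s)
        r_social_net.partial_fit(state.reshape(1, -1), np.array([r_human]))
        return r_human
    if trial > 45:                 # afterwards, use the learned R_social(s)
        return float(r_social_net.predict(state.reshape(1, -1))[0])
    return 0.0                     # before the social window: no social signal
```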
Results
• Moderate social reinforcement

Results
• Strong, short social reinforcement

Results
• Learned social reinforcement
Conclusion
• A critical learning period can be used to influence robot learning using affective signals, in real time, in a non-trivial learning environment
• This benefits learning
  • Most notably when the robot learns to predict the social feedback by training a reward function R_social(s)
Further work
• Use affect/emotion as a metaparameter to control (a speculative sketch follows below)
  • Learning rate
  • Exploration/exploitation balance
• Differentiate between the meanings of negative and positive emotions
  • Anger: negative feedback due to an action of the agent
  • Fear: negative anticipatory feedback
  • Surprise: strong positive feedback due to an action of the agent
  • Frustration: connect to the exploration/exploitation rate?
• Affective robot-robot interaction?
• Use robot-to-human signals such as hesitation
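A purely speculative sketch of the metaparameter idea: valence modulating exploration and learning rate. The linear mapping and its scale factor are illustrative assumptions, not a proposal from the slides:

```python
# Illustrative only: negative valence (e.g. a frustrated teacher) increases
# exploration and learning rate; positive valence consolidates. The linear
# mapping and the 0.5 factor are arbitrary assumptions.

def affect_to_metaparams(valence: float, base_epsilon: float = 0.1,
                         base_alpha: float = 0.05) -> tuple[float, float]:
    """valence in [-1, 1] -> (epsilon, alpha) for the learner."""
    scale = 1.0 - 0.5 * valence    # valence -1 -> 1.5x, valence +1 -> 0.5x
    return base_epsilon * scale, base_alpha * scale
```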