Emotion-Driven Reinforcement Learning
Bob Marinier & John Laird
University of Michigan, Computer Science and Engineering
CogSci'08
Introduction
• Interested in the functional benefits of emotion for a cognitive agent
• Appraisal theories of emotion
• PEACTIDM theory of cognitive control
• Use emotion as a reward signal to a reinforcement learning agent
• Demonstrates a functional benefit of emotion
• Provides a theory of the origin of intrinsic reward
Outline
• Background
• Integration of emotion and cognition
• Integration of emotion and reinforcement learning
• Implementation in Soar
• Learning task
• Results
Appraisal Theories of Emotion
• A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
• Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
• Appraisals influence emotion
• Emotion can then be coped with (via internal or external actions)
[Diagram: Situation → Appraisals → Emotion → Coping, with Goals feeding into the appraisals]
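The appraisal dimensions listed above can be pictured as a small record attached to each evaluated situation. The sketch below is illustrative only: the field names follow the slide, but the choice of a flat dataclass and the assumed value range of [-1, 1] per dimension are my assumptions, not the paper's representation.

```python
from dataclasses import dataclass, fields

@dataclass
class AppraisalFrame:
    """One appraisal of the current situation.

    Illustrative subset of the dimensions named on the slide; values are
    assumed to lie in [-1, 1] (the actual model uses per-dimension ranges).
    """
    novelty: float = 0.0
    goal_relevance: float = 0.0
    conduciveness: float = 0.0
    expectedness: float = 0.0

    def as_vector(self):
        # Fixed declaration order, so frames can be compared element-wise.
        return [getattr(self, f.name) for f in fields(self)]

# A stimulus that is highly goal-relevant and mildly conducive:
frame = AppraisalFrame(goal_relevance=0.9, conduciveness=0.5)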
Unification of PEACTIDM and Appraisal Theories
[Diagram: the PEACTIDM cycle (Perceive → Encode → Attend → Comprehend → Intend → Decode → Motor), with appraisals attached to its steps. Perceive takes in raw perceptual information from an environmental change; Encode generates Suddenness, Unpredictability, Goal Relevance, and Intrinsic Pleasantness; Attend chooses a stimulus for processing based on Relevance; Comprehend produces the current situation assessment along with Causal Agent/Motive, Discrepancy, Conduciveness, and Control/Power; Intend forms a prediction with an Outcome Probability; Decode issues motor commands to the Motor step.]
Distinction between Emotion, Mood, and Feeling (Marinier & Laird 2007)
• Emotion: result of appraisals
• Is about the current situation
• Mood: "average" over recent emotions
• Provides historical context
• Feeling: emotion "+" mood
• What the agent actually perceives
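The emotion/mood/feeling distinction can be sketched in a few lines. Treating mood as an exponential moving average of recent emotions and the "+" combination as clamped element-wise addition are my assumptions for illustration; the paper's actual combination function is more involved.

```python
def update_mood(mood, emotion, decay=0.9):
    """Mood drifts toward recent emotions (assumed: exponential moving average)."""
    return [decay * m + (1 - decay) * e for m, e in zip(mood, emotion)]

def feeling(emotion, mood):
    """Feeling = emotion '+' mood (assumed: element-wise sum clamped to [-1, 1])."""
    return [max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood)]

# Mood provides historical context: a single strong emotion only
# nudges it, while the feeling reflects both at once.
m = update_mood([0.0, 0.0], [1.0, -1.0])
f = feeling([0.5], [0.7])
```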
Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
• Reward = Intensity * Valence
[Diagram: the standard agent–environment RL loop is split into an external environment and an internal environment inside the "organism". The critic moves inside the agent: an appraisal process turns sensations into a feeling, and the feeling's intensity with its valence (+/-) supplies the reward signal for the agent's decisions.]
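The reward rule on this slide can be written directly. How the scalar intensity and valence are summarized from a feeling vector is an assumption here (mean absolute value and sign of the sum); the paper defines its own intensity and valence functions.

```python
def feeling_reward(feeling_vector):
    """Intrinsic reward = intensity * valence (sketch).

    Assumed summaries: intensity is the mean absolute value of the
    feeling vector, valence the sign of its sum. The actual functions
    in the model differ.
    """
    intensity = sum(abs(v) for v in feeling_vector) / len(feeling_vector)
    valence = 1.0 if sum(feeling_vector) >= 0 else -1.0
    return intensity * valence

# A mildly positive feeling yields a small positive reward;
# flipping its sign yields the matching negative reward.
r_pos = feeling_reward([0.5, -0.1])
r_neg = feeling_reward([-0.5, 0.1])
```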
Extending Soar with Emotion (Marinier & Laird 2007)
[Architecture diagram: Soar's symbolic long-term memories (procedural, semantic, episodic) and their learning mechanisms (chunking, reinforcement learning, semantic learning, episodic learning) sit above a short-term memory holding the situation, goals, and feelings; perception and action connect the short-term memory to the body, alongside visual imagery. An architectural appraisal detector reads appraisals from short-term memory and combines an emotion vector (e.g., .5, .7, 0, -.4, .3, …) with a mood vector (e.g., .7, -.2, .8, .3, .6, …) into a feeling vector (e.g., .9, .6, .5, -.1, .8, …), whose intensity and valence (+/-) feed the decision procedure. Appraisals are knowledge; the detector and feeling generation are part of the architecture.]
Learning task
[Diagram: the task environment, with Start and Goal locations marked]
Learning task: Encoding
• North — Passable: false, On path: false, Progress: true
• East — Passable: false, On path: true, Progress: true
• West — Passable: false, On path: false, Progress: true
• South — Passable: true, On path: true, Progress: true
Learning task: Encoding & Appraisal
• North — Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High
• East — Intrinsic Pleasantness: Low, Goal Relevance: High, Unpredictability: High
• West — Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High
• South — Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low
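The mapping from the encoded stimulus features (previous slide) to these appraisal values can be sketched as a small rule. The rule below reproduces the pattern shown on the two slides (goal-relevant when on the path; unpleasant and unpredictable when impassable), but treating it as a simple lookup is my simplification.

```python
def appraise(passable, on_path):
    """Map encoded stimulus features to coarse appraisal labels (sketch).

    Consistent with the slide: a direction on the path is goal-relevant;
    an impassable direction is unpleasant and unpredictable.
    """
    return {
        "goal_relevance": "High" if on_path else "Low",
        "intrinsic_pleasantness": "Neutral" if passable else "Low",
        "unpredictability": "Low" if passable else "High",
    }

# South (passable, on path) vs. North (blocked, off path):
south = appraise(passable=True, on_path=True)
north = appraise(passable=False, on_path=False)
```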
Learning task: Attending, Comprehending & Appraisal
• South — Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low, Conduciveness: High, Control: High, …
Learning task: Tasking
[Diagram: the optimal subtasks overlaid on the task environment]
What is being learned?
• When to Attend vs. Task
• If Attending, what to Attend to
• If Tasking, which subtask to create
• When to Intend vs. Ignore
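A key point of the list above is that a single value function ranges over both internal operators (attending, creating subtasks) and external ones (intending a movement), all rewarded by the feeling signal. The sketch below shows that setup as plain tabular Q-learning; the operator names, the epsilon-greedy policy, and the update constants are my assumptions, not the Soar RL mechanism's details.

```python
import random
from collections import defaultdict

# Candidate operators mix internal choices (attend, create-subtask,
# ignore) with external ones (intend a movement). Names are illustrative.
ACTIONS = ["attend-north", "attend-south", "create-subtask",
           "intend-move", "ignore"]

def choose(Q, state, epsilon=0.1):
    """Epsilon-greedy selection over internal and external operators alike."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update, driven here by the feeling-based reward."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Q = defaultdict(float)
# One step: attending south produced a positive feeling-based reward.
update(Q, "s0", "attend-south", reward=0.3, next_state="s1")
```

Because the feeling signal arrives on internal steps too, every entry in this table (not just movement actions) gets frequent updates, which is the basis of the speed-up discussed next.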
Discussion
• Agent learns both internal (tasking) and external (movement) actions
• Emotion provides more frequent rewards, so the agent learns faster than with standard RL
• Mood "fills in the gaps," allowing even faster learning and less variability
Conclusion & Future Work
• Demonstrated a computational model that integrates emotion and cognitive control
• Confirmed that emotion can drive reinforcement learning
• Have already demonstrated similar learning in a more complex domain
• Would like to explore multi-agent scenarios