A Computational Unification of Cognitive Control, Emotion, and Learning Bob Marinier Oral Defense University of Michigan, CSE June 17, 2008
Introduction • The link between core cognitive functions and emotion has not been fully explored • Existing computational models are largely pragmatic • We integrate the PEACTIDM theory of cognitive control and appraisal theories of emotion • PEACTIDM supplies process, appraisal theories supply data • We use emotion-driven reinforcement learning to demonstrate improved functionality • Automatically generate rewards, set parameters
PEACTIDM Cycle • Perceive: raw perceptual information arrives (environmental change) • Encode: what is this information? (stimulus relevance) • Attend: stimulus chosen for processing • Comprehend: current situation assessment; prediction • Intend: action • Decode: motor commands • Motor: acts on the environment, producing the next environmental change
Appraisal Theories of Emotion • A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals • Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc. • Result of appraisals influences emotion • Emotion can then be coped with (via internal or external actions) • (Flow: Situation + Goals → Appraisals → Emotion → Coping)
Appraisals to Emotions (Scherer 2001) • Why these dimensions? • What is the functional purpose of emotion?
Unification of PEACTIDM and Appraisal Theories • Perceive: raw perceptual information (environmental change) • Encode: stimulus relevance — Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness • Attend: stimulus chosen for processing • Comprehend: current situation assessment — Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power • Intend: action — prediction, Outcome Probability • Decode: motor commands • Motor: environmental change
PEACTIDM in the Button Task • Appraisal Frame: Suddenness 1, Goal Relevance 1, Conduciveness 1, Discrepancy 0, Outcome Probability 1 • “Surprise Factor”
PEACTIDM in the Button Task • Appraisal Frame: Suddenness 1, Goal Relevance 1, Outcome Probability 1 • Conduciveness: 1 → -1 • Discrepancy: 0 → 1
Summary of Evaluation • Cognitively generated emotions • Emotions arise from appraisals • Fast primary emotions • Some appraisals generated and activated early • Emotional experience • Cognitive access to emotional state, but no physiology • Body-mind interactions • Emotions can influence behavior • Emotional behavior • Model works and produces useful, purposeful behavior • Different environments lead to: • Different time courses • Different feeling profiles • Choices impact emotions and success
Primary Contributions • Appraisals are functionally required by cognition • They specify the data used by certain steps in PEACTIDM • Appraisals provide a task-independent language for control knowledge • They influence choices such as Attend and Intend • PEACTIDM implies a partial ordering of appraisal generation • Data dependencies imply that some appraisals can’t be generated until after others • Circumplex models can be synthesized from appraisal models • Emotion intensity and valence can be derived from appraisals • Emotion intensity is largely determined by expectations • “Surprise Factor” is determined by Outcome Probability and Discrepancy from Expectation • Some appraisals may require an arbitrary amount of inference • Comprehend can theoretically require arbitrary processing • Internal and external stimuli are treated identically • Tasking options can be Attended and Intended just like external stimuli
Additional Exploration • Functionality: What is emotion good for? • Emotion-driven reinforcement learning • Scale: Does it work in non-trivial domains? • Continuous time/space environment • More complex appraisal generation • Understanding: How do appraisals influence performance? • Try subsets of appraisals
Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004) • In the standard view, a critic in the external environment sends rewards and states to the agent, which sends back actions • In the intrinsically motivated view, the critic moves inside the “organism”: the internal environment’s appraisal process generates emotion, whose intensity and valence (+/-) yield the reward • Reward = Intensity * Valence
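The reward rule on this slide (Reward = Intensity * Valence) can be sketched directly; the function name is mine, for illustration:

```python
def intrinsic_reward(intensity: float, valence: float) -> float:
    """Intrinsic reward from the agent's current feeling.

    intensity is in [0, 1]; valence is in [-1, 1]. The product gives a
    signed reward in [-1, 1]: strongly felt bad outcomes punish,
    strongly felt good outcomes reward, weak feelings contribute little.
    """
    return intensity * valence
```

For example, a strongly felt bad outcome such as `intrinsic_reward(0.8, -1.0)` yields a negative reward of -0.8.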
Clean House Domain • (Map figure: rooms connected by gateways, blocks to collect, a storage room, and the agent)
Stimuli in the Environment • Stimuli: current room, Block 1, gateway to 73, gateway to 78, gateway to 93 • Tasking options: create subtask clean current room; create subtask go to room 73; create subtask go to room 78; create subtask go to room 93
Learning • In this domain, the agent is only learning what to Attend to (including Tasking) • Not learning what action to take • Goal: What is the impact of various appraisals? • Disabled most and developed a few • Conduciveness • Discrepancy from Expectation and Outcome Probability • Goal Relevance • Intrinsic Pleasantness • Method: SARSA, epsilon-greedy, fixed exploration rate (ER) and learning rate (LR) • 50 trials, 15 episodes per trial
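The learning setup above is standard on-policy SARSA with epsilon-greedy selection over Attend targets. A minimal sketch, assuming states and Attend candidates are hashable identifiers and Q-values live in a dictionary (these representation choices are mine, not the thesis's):

```python
import random

def epsilon_greedy(q, state, candidates, epsilon):
    """Pick an Attend target: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: q.get((state, c), 0.0))

def sarsa_update(q, s, a, reward, s2, a2, alpha, gamma):
    """On-policy SARSA: move Q(s,a) toward reward + gamma * Q(s',a')."""
    target = reward + gamma * q.get((s2, a2), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
```

With the fixed-rate configuration of this experiment, `alpha` and `epsilon` are constants; the dynamic-rate experiments later replace them with emotion-driven values.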
Conduciveness • Measures how good or bad a stimulus is • Influences emotion intensity and valence • Sufficient to generate a reward • Value based on “progress” and “path” • Progress: Is agent getting closer to goal over time? • Path: Will acting on stimulus get agent closer to goal?
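The two cues above (progress and path) can be combined into a signed conduciveness value. A sketch; the specific weights are assumptions for illustration, not the thesis's actual numbers:

```python
def conduciveness(making_progress: bool, on_path: bool) -> float:
    """Illustrative conduciveness from the two cues on the slide.

    progress: is the agent getting closer to the goal over time?
    path: will acting on this stimulus bring the agent closer?
    Returns a value in [-1, 1]; the 0.5 weights are assumed.
    """
    score = 0.0
    score += 0.5 if making_progress else -0.5
    score += 0.5 if on_path else -0.5
    return score
```

Because conduciveness is valenced, it is sufficient on its own to produce a signed reward via Intensity * Valence.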
Outcome Probability and Discrepancy from Expectation • Measures how likely a prediction is and how accurate the prediction is • Influences emotion intensity via “surprise factor” (unvalenced) • Predictions and Outcome Probability generated via learned task model • Results in non-stationary reward • Discrepancy generated via comparison to prediction • Added these appraisals on top of Conduciveness
Goal Relevance • Measures how important a stimulus is for the goal • Influences emotion intensity (unvalenced) • Value based on “path” knowledge • Agent actually had too much path knowledge, so removed some • The value of Goal Relevance for some stimulus is used to “boost” the Q-value of the Attend operator for that stimulus • Added this appraisal on top of Conduciveness, Outcome Probability, and Discrepancy
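The "boost" described above can be read as biasing Attend selection toward goal-relevant stimuli. The additive form and the weight below are assumptions; the slide only says Goal Relevance boosts the Attend operator's Q-value:

```python
def boosted_q(q_value: float, goal_relevance: float, weight: float = 1.0) -> float:
    """Bias Attend selection: stimuli judged goal-relevant look more
    valuable to the epsilon-greedy chooser, before any learning occurs."""
    return q_value + weight * goal_relevance
```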
Intrinsic Pleasantness • Measures how attracted the agent is to a stimulus independent of the current goal • Influences emotion intensity and valence • Made blocks intrinsically pleasant • This is good because blocks need to be Attended to get cleaned up • This is bad because agent may be distracted by blocks that have already been cleaned up • Replaced Goal Relevance with this appraisal
Dynamic Exploration Rate • Dynamically adjust exploration rate based on current emotion • If Valence < 0, then things could probably be better: ER = |Intensity * Valence| • If Valence > 0, then things are OK: ER = 0 • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
Dynamic Learning Rate • Dynamically adjust learning rate based on current emotion • If reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence| • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, Dynamic Exploration Rate enabled
Dynamic Exploration and Learning Rates • Dynamically adjust exploration and learning rates based on current emotion • If Valence < 0, then things could probably be better: ER = |Intensity * Valence| • If Valence > 0, then things are OK: ER = 0 • If reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence| • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only • Results: tighter convergence, better prediction accuracy, a small number of failures
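The two adjustment rules above can be sketched together; the function name and the tuple return are mine, the formulas are the slides':

```python
def dynamic_rates(intensity: float, valence: float):
    """Emotion-driven exploration and learning rates.

    Negative valence -> explore in proportion to how bad things feel;
    positive valence -> stop exploring. The learning rate tracks reward
    magnitude, so strong feelings (good or bad) speed up learning.
    """
    magnitude = abs(intensity * valence)
    exploration_rate = magnitude if valence < 0 else 0.0
    learning_rate = magnitude
    return exploration_rate, learning_rate
```

These replace the fixed ER and LR used in the earlier experiments.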
Secondary Contributions • Reinforcement learning can be driven by intrinsically generated rewards based on the agent’s feeling • Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance • Each appraisal contributes to the agent’s performance • The system scales to continuous time and space environments • Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them
Future Work • Integration with other architectural mechanisms • Learning (appraisal values, Intend, etc.) • Non-verbal communication • Sociocultural interactions • More appraisals (social, perceptual, etc.) • Basic drives • Human data • Functionality • Decision making • Action tendencies • Behavior • Believability • Physiological measures
Benefits of Soar • Parallel rule firing allows for: • Parallel Encoding • Parallel appraisal generation • Parallel Decoding (theoretically) • Impasses provide: • Architectural support for PEACTIDM-related subgoals • Intend • Comprehend (theoretically) • Support for fast and extended inference, and transitioning from extended to fast (chunking) • Intend in button task starts out extended and becomes fast • Reinforcement learning allows fast learning from emotion feedback • Future benefits: • New modules may assist in appraisal generation • Episodic/semantic memories, visual imagery, etc.
PEACTIDM and GOMS • In general, these are complementary techniques • GOMS • Focused on HCI • Focused on motor actions (e.g. keypresses) • Less focus on cognitive aspects (more abstract) • PEACTIDM • Focused on required cognitive functions • Allows for a mapping with appraisals • Could implement PEACTIDM with GOMS, but would lack the proper labels that allow for the mapping
Relating Emotion to Intrinsically Motivated RL • Emotion intensity and valence used to: • Generate intrinsic rewards • Various appraisals contribute to the reward signal with varying success • Frequent reward signals allow agent to learn faster, but can also introduce infinite reward cycles • Task modeling helps address cycles • Automatically adjust parameters • Learning and exploration rates • Helps reduce unnecessary exploration, bad learning
Learning the Task Model • (Figure: Perception/Encoding yields stimuli; Task Memory stores per-stimulus predictions with learned Outcome Probabilities; comparing a prediction to the actual next stimulus yields Discrepancy; Outcome Probability and Discrepancy combine into a surprise factor, which sets the Intensity of the reward)
Extending Soar with Emotion (Marinier & Laird 2007) • Soar is a cognitive architecture • A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior • Cognitive architectures are general agent frameworks • (Architecture diagram: symbolic long-term memories — episodic, semantic, procedural — with episodic learning, semantic learning, chunking, and reinforcement learning; a short-term memory holding the situation and goals; a decision procedure; visual imagery; perception and action connecting to the body; plus a new feeling-generation module)
Extending Soar with Emotion (Marinier & Laird 2007) • Appraisals (knowledge) feed the architectural feeling-generation module • Emotion (e.g. .5, .7, 0, -.4, .3, …) combines with Mood (e.g. .7, -.2, .8, .3, .6, …) to produce the Feeling (e.g. .9, .6, .5, -.1, .8, …) placed in short-term memory • Feeling intensity (+/-) drives reinforcement learning
Computing Feeling from Emotion and Mood • Assumption: Appraisal dimensions are independent • Limited Range: Inputs and outputs are in [0,1] or [-1,1] • Distinguishability: Very different inputs should lead to very different outputs • Non-linear: Linearity would violate limited range and distinguishability
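One combiner that satisfies the constraints above is sketched below. The specific formula is an assumption for illustration, not the thesis's actual function: for e, m in [-1, 1], (e + m) / (1 + |e * m|) stays in [-1, 1] (limited range) and is non-linear, and the independence assumption lets it apply dimension by dimension:

```python
def combine(e: float, m: float) -> float:
    """Combine one emotion dimension with one mood dimension.
    Bounded in [-1, 1] for inputs in [-1, 1]; non-linear."""
    return (e + m) / (1.0 + abs(e * m))

def feeling(emotion, mood):
    """Apply the combiner per dimension (independence assumption)."""
    return [combine(e, m) for e, m in zip(emotion, mood)]
```

Note how opposing emotion and mood cancel (`combine(0.5, -0.5)` is 0), while agreeing values reinforce without leaving the limited range.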
Maze Tasks • Conditions: no distractions, distractions, single subgoal, multiple subgoals, impossible • (Maze figures omitted)
Feeling Dynamics Results • (Results figure for the very easy condition omitted)
Computing Feeling Intensity • Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is • Limited range: Should map onto [0,1] • No dominant appraisal: No single value should drown out all the others • Can’t just multiply values, because if any are 0, then intensity is 0 • Realization principle: Expected events should be less intense than unexpected events
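One illustrative way to satisfy these constraints, building on the earlier note that the "surprise factor" comes from Outcome Probability and Discrepancy. The specific formulas below are assumptions for the sketch, not the thesis's actual equations:

```python
def surprise_factor(outcome_probability: float, discrepancy: float) -> float:
    """High when a confident prediction is wrong, low when it is confirmed
    (realization principle: expected events are less intense)."""
    return (outcome_probability * discrepancy
            + (1.0 - outcome_probability) * (1.0 - discrepancy))

def intensity(appraisals, outcome_probability, discrepancy):
    """Average the surprise factor with the mean |appraisal| value.

    Averaging (rather than multiplying) keeps intensity in [0, 1] while
    ensuring no single zero value drowns out all the others.
    """
    base = sum(abs(a) for a in appraisals) / len(appraisals)
    return 0.5 * (surprise_factor(outcome_probability, discrepancy) + base)
```

A fully expected, confirmed outcome (Outcome Probability 1, Discrepancy 0) contributes no surprise, so intensity falls back to the appraisal average alone, as the realization principle requires.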
Learning task • (Map figure: Start, Goal, optimal path, and subtasks)