A Computational Unification of Cognitive Control, Emotion, and Learning Bob Marinier Oral Defense University of Michigan, CSE June 17, 2008
Introduction • The link between core cognitive functions and emotion has not been fully explored • Existing computational models are largely pragmatic • We integrate the PEACTIDM theory of cognitive control and appraisal theories of emotion • PEACTIDM supplies process, appraisal theories supply data • We use emotion-driven reinforcement learning to demonstrate improved functionality • Automatically generate rewards, set parameters
PEACTIDM Cycle • Perceive: raw perceptual information arrives (environmental change) • Encode: what is this information? (stimulus relevance) • Attend: stimulus chosen for processing • Comprehend: current situation assessment; prediction • Intend: action • Decode: motor commands • Motor: acts on the environment, producing the next environmental change
Appraisal Theories of Emotion • A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals • Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc. • Result of appraisals influences emotion • Emotion can then be coped with (via internal or external actions) • (Flow: Situation + Goals → Appraisals → Emotion → Coping)
Appraisals to Emotions (Scherer 2001) • Why these dimensions? • What is the functional purpose of emotion?
Unification of PEACTIDM and Appraisal Theories • Perceive: raw perceptual information (environmental change) • Encode: stimulus relevance — Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness • Attend: stimulus chosen for processing • Comprehend: current situation assessment — Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power • Intend: action — prediction, Outcome Probability • Decode: motor commands • Motor: environmental change
PEACTIDM in the Button Task • Appraisal Frame: Suddenness 1, Goal Relevance 1, Conduciveness 1, Discrepancy 0, Outcome Probability 1 • “Surprise Factor”
PEACTIDM in the Button Task • Appraisal Frame: Suddenness 1, Goal Relevance 1, Outcome Probability 1 • Conduciveness: 1 → -1 • Discrepancy: 0 → 1
Summary of Evaluation • Cognitively generated emotions • Emotions arise from appraisals • Fast primary emotions • Some appraisals generated and activated early • Emotional experience • Cognitive access to emotional state, but no physiology • Body-mind interactions • Emotions can influence behavior • Emotional behavior • Model works and produces useful, purposeful behavior • Different environments lead to: • Different time courses • Different feeling profiles • Choices impact emotions and success
Primary Contributions • Appraisals are functionally required by cognition • They specify the data used by certain steps in PEACTIDM • Appraisals provide a task-independent language for control knowledge • They influence choices such as Attend and Intend • PEACTIDM implies a partial ordering of appraisal generation • Data dependencies imply that some appraisals can’t be generated until after others • Circumplex models can be synthesized from appraisal models • Emotion intensity and valence can be derived from appraisals • Emotion intensity is largely determined by expectations • “Surprise Factor” is determined by Outcome Probability and Discrepancy from Expectation • Some appraisals may require an arbitrary amount of inference • Comprehend can theoretically require arbitrary processing • Internal and external stimuli are treated identically • Tasking options can be Attended and Intended just like external stimuli
Additional Exploration • Functionality: What is emotion good for? • Emotion-driven reinforcement learning • Scale: Does it work in non-trivial domains? • Continuous time/space environment • More complex appraisal generation • Understanding: How do appraisals influence performance? • Try subsets of appraisals
Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004) • In the standard view, a critic in the external environment sends rewards and states to the agent, which sends back actions • In the intrinsically motivated view, the critic moves inside the “organism”: the internal environment’s appraisal process generates emotion, whose intensity and valence (+/-) yield the reward • Reward = Intensity * Valence
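The reward rule on this slide (Reward = Intensity * Valence) can be sketched directly; the function name is mine, for illustration:

```python
def intrinsic_reward(intensity: float, valence: float) -> float:
    """Intrinsic reward from the agent's current feeling.

    intensity is in [0, 1]; valence is in [-1, 1]. The product gives a
    signed reward in [-1, 1]: strongly felt bad outcomes punish,
    strongly felt good outcomes reward, weak feelings contribute little.
    """
    return intensity * valence
```

For example, a strongly felt bad outcome such as `intrinsic_reward(0.8, -1.0)` yields a negative reward of -0.8.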
Clean House Domain • (Map figure: rooms connected by gateways, blocks to collect, a storage room, and the agent)
Stimuli in the Environment • Stimuli: current room, Block 1, gateway to 73, gateway to 78, gateway to 93 • Tasking options: create subtask clean current room; create subtask go to room 73; create subtask go to room 78; create subtask go to room 93
Learning • In this domain, the agent is only learning what to Attend to (including Tasking) • Not learning what action to take • Goal: What is the impact of various appraisals? • Disabled most and developed a few • Conduciveness • Discrepancy from Expectation and Outcome Probability • Goal Relevance • Intrinsic Pleasantness • Method: SARSA, epsilon-greedy, fixed exploration rate (ER) and learning rate (LR) • 50 trials, 15 episodes per trial
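The learning setup above is standard on-policy SARSA with epsilon-greedy selection over Attend targets. A minimal sketch, assuming states and Attend candidates are hashable identifiers and Q-values live in a dictionary (these representation choices are mine, not the thesis's):

```python
import random

def epsilon_greedy(q, state, candidates, epsilon):
    """Pick an Attend target: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: q.get((state, c), 0.0))

def sarsa_update(q, s, a, reward, s2, a2, alpha, gamma):
    """On-policy SARSA: move Q(s,a) toward reward + gamma * Q(s',a')."""
    target = reward + gamma * q.get((s2, a2), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
```

With the fixed-rate configuration of this experiment, `alpha` and `epsilon` are constants; the dynamic-rate experiments later replace them with emotion-driven values.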
Conduciveness • Measures how good or bad a stimulus is • Influences emotion intensity and valence • Sufficient to generate a reward • Value based on “progress” and “path” • Progress: Is agent getting closer to goal over time? • Path: Will acting on stimulus get agent closer to goal?
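The two cues above (progress and path) can be combined into a signed conduciveness value. A sketch; the specific weights are assumptions for illustration, not the thesis's actual numbers:

```python
def conduciveness(making_progress: bool, on_path: bool) -> float:
    """Illustrative conduciveness from the two cues on the slide.

    progress: is the agent getting closer to the goal over time?
    path: will acting on this stimulus bring the agent closer?
    Returns a value in [-1, 1]; the 0.5 weights are assumed.
    """
    score = 0.0
    score += 0.5 if making_progress else -0.5
    score += 0.5 if on_path else -0.5
    return score
```

Because conduciveness is valenced, it is sufficient on its own to produce a signed reward via Intensity * Valence.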
Outcome Probability and Discrepancy from Expectation • Measures how likely a prediction is and how accurate the prediction is • Influences emotion intensity via “surprise factor” (unvalenced) • Predictions and Outcome Probability generated via learned task model • Results in non-stationary reward • Discrepancy generated via comparison to prediction • Added these appraisals on top of Conduciveness
Goal Relevance • Measures how important a stimulus is for the goal • Influences emotion intensity (unvalenced) • Value based on “path” knowledge • Agent actually had too much path knowledge, so removed some • The value of Goal Relevance for some stimulus is used to “boost” the Q-value of the Attend operator for that stimulus • Added this appraisal on top of Conduciveness, Outcome Probability, and Discrepancy
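The "boost" described above can be read as biasing Attend selection toward goal-relevant stimuli. The additive form and the weight below are assumptions; the slide only says Goal Relevance boosts the Attend operator's Q-value:

```python
def boosted_q(q_value: float, goal_relevance: float, weight: float = 1.0) -> float:
    """Bias Attend selection: stimuli judged goal-relevant look more
    valuable to the epsilon-greedy chooser, before any learning occurs."""
    return q_value + weight * goal_relevance
```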
Intrinsic Pleasantness • Measures how attracted the agent is to a stimulus independent of the current goal • Influences emotion intensity and valence • Made blocks intrinsically pleasant • This is good because blocks need to be Attended to get cleaned up • This is bad because agent may be distracted by blocks that have already been cleaned up • Replaced Goal Relevance with this appraisal
Dynamic Exploration Rate • Dynamically adjust exploration rate based on current emotion • If Valence < 0, then things could probably be better: ER = |Intensity * Valence| • If Valence > 0, then things are OK: ER = 0 • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
Dynamic Learning Rate • Dynamically adjust learning rate based on current emotion • If reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence| • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, Dynamic Exploration Rate enabled
Dynamic Exploration and Learning Rates • Dynamically adjust exploration and learning rates based on current emotion • If Valence < 0, then things could probably be better: ER = |Intensity * Valence| • If Valence > 0, then things are OK: ER = 0 • If reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence| • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only • Results: tighter convergence, better prediction accuracy, a small number of failures
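The two adjustment rules above can be sketched together; the function name and the tuple return are mine, the formulas are the slides':

```python
def dynamic_rates(intensity: float, valence: float):
    """Emotion-driven exploration and learning rates.

    Negative valence -> explore in proportion to how bad things feel;
    positive valence -> stop exploring. The learning rate tracks reward
    magnitude, so strong feelings (good or bad) speed up learning.
    """
    magnitude = abs(intensity * valence)
    exploration_rate = magnitude if valence < 0 else 0.0
    learning_rate = magnitude
    return exploration_rate, learning_rate
```

These replace the fixed ER and LR used in the earlier experiments.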
Secondary Contributions • Reinforcement learning can be driven by intrinsically generated rewards based on the agent’s feeling • Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance • Each appraisal contributes to the agent’s performance • The system scales to continuous time and space environments • Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them
Future Work • Integration with other architectural mechanisms • Learning (appraisal values, Intend, etc.) • Non-verbal communication • Sociocultural interactions • More appraisals (social, perceptual, etc.) • Basic drives • Human data • Functionality • Decision making • Action tendencies • Behavior • Believability • Physiological measures
Benefits of Soar • Parallel rule firing allows for: • Parallel Encoding • Parallel appraisal generation • Parallel Decoding (theoretically) • Impasses provide: • Architectural support for PEACTIDM-related subgoals • Intend • Comprehend (theoretically) • Support for fast and extended inference, and transitioning from extended to fast (chunking) • Intend in button task starts out extended and becomes fast • Reinforcement learning allows fast learning from emotion feedback • Future benefits: • New modules may assist in appraisal generation • Episodic/semantic memories, visual imagery, etc.
PEACTIDM and GOMS • In general, these are complementary techniques • GOMS • Focused on HCI • Focused on motor actions (e.g. keypresses) • Less focus on cognitive aspects (more abstract) • PEACTIDM • Focused on required cognitive functions • Allows for a mapping with appraisals • Could implement PEACTIDM with GOMS, but would lack the proper labels that allow for the mapping
Relating Emotion to Intrinsically Motivated RL • Emotion intensity and valence used to: • Generate intrinsic rewards • Various appraisals contribute to the reward signal with varying success • Frequent reward signals allow agent to learn faster, but can also introduce infinite reward cycles • Task modeling helps address cycles • Automatically adjust parameters • Learning and exploration rates • Helps reduce unnecessary exploration, bad learning
Learning the Task Model • (Figure: Perception/Encoding yields stimuli; Task Memory stores per-stimulus predictions with learned Outcome Probabilities; comparing a prediction to the actual next stimulus yields Discrepancy; Outcome Probability and Discrepancy combine into a surprise factor, which sets the Intensity of the reward)
Extending Soar with Emotion (Marinier & Laird 2007) • Soar is a cognitive architecture • A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior • Cognitive architectures are general agent frameworks • (Architecture diagram: symbolic long-term memories — episodic, semantic, procedural — with episodic learning, semantic learning, chunking, and reinforcement learning; a short-term memory holding the situation and goals; a decision procedure; visual imagery; perception and action connecting to the body; plus a new feeling-generation module)
Extending Soar with Emotion (Marinier & Laird 2007) • Appraisals (knowledge) feed the architectural feeling-generation module • Emotion (e.g. .5, .7, 0, -.4, .3, …) combines with Mood (e.g. .7, -.2, .8, .3, .6, …) to produce the Feeling (e.g. .9, .6, .5, -.1, .8, …) placed in short-term memory • Feeling intensity (+/-) drives reinforcement learning
Computing Feeling from Emotion and Mood • Assumption: Appraisal dimensions are independent • Limited Range: Inputs and outputs are in [0,1] or [-1,1] • Distinguishability: Very different inputs should lead to very different outputs • Non-linear: Linearity would violate limited range and distinguishability
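One combiner that satisfies the constraints above is sketched below. The specific formula is an assumption for illustration, not the thesis's actual function: for e, m in [-1, 1], (e + m) / (1 + |e * m|) stays in [-1, 1] (limited range) and is non-linear, and the independence assumption lets it apply dimension by dimension:

```python
def combine(e: float, m: float) -> float:
    """Combine one emotion dimension with one mood dimension.
    Bounded in [-1, 1] for inputs in [-1, 1]; non-linear."""
    return (e + m) / (1.0 + abs(e * m))

def feeling(emotion, mood):
    """Apply the combiner per dimension (independence assumption)."""
    return [combine(e, m) for e, m in zip(emotion, mood)]
```

Note how opposing emotion and mood cancel (`combine(0.5, -0.5)` is 0), while agreeing values reinforce without leaving the limited range.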
Maze Tasks • Conditions: no distractions, distractions, single subgoal, multiple subgoals, impossible • (Maze figures omitted)
Feeling Dynamics Results • (Results figure for the very easy condition omitted)
Computing Feeling Intensity • Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is • Limited range: Should map onto [0,1] • No dominant appraisal: No single value should drown out all the others • Can’t just multiply values, because if any are 0, then intensity is 0 • Realization principle: Expected events should be less intense than unexpected events
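One illustrative way to satisfy these constraints, building on the earlier note that the "surprise factor" comes from Outcome Probability and Discrepancy. The specific formulas below are assumptions for the sketch, not the thesis's actual equations:

```python
def surprise_factor(outcome_probability: float, discrepancy: float) -> float:
    """High when a confident prediction is wrong, low when it is confirmed
    (realization principle: expected events are less intense)."""
    return (outcome_probability * discrepancy
            + (1.0 - outcome_probability) * (1.0 - discrepancy))

def intensity(appraisals, outcome_probability, discrepancy):
    """Average the surprise factor with the mean |appraisal| value.

    Averaging (rather than multiplying) keeps intensity in [0, 1] while
    ensuring no single zero value drowns out all the others.
    """
    base = sum(abs(a) for a in appraisals) / len(appraisals)
    return 0.5 * (surprise_factor(outcome_probability, discrepancy) + base)
```

A fully expected, confirmed outcome (Outcome Probability 1, Discrepancy 0) contributes no surprise, so intensity falls back to the appraisal average alone, as the realization principle requires.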
Learning task • (Map figure: Start, Goal, optimal path, and subtasks)