multimodal+emotion+recognition a.k.a. ‘better than the sum of its parts’ Kostas Karpouzis, Assoc. researcher, ICCS/NTUA, http://www.image.ntua.gr
multimodal+emotion+recognition • Three very different (and interesting!) problems • What is ‘multimodal’, why do we need it, and what do we gain from it? • What is ‘emotion’ in HCI applications? • What can we recognize and, better yet, what should we recognize?
multimodal+emotion+recognition • In terms of R&D, emotion/affect-aware human-computer interaction is a hot topic • Novel, interesting application for existing algorithms • Demanding test bed for feature extraction and recognition tasks • …and just wait until we bring humans into the picture!
multimodal+emotion+recognition • In terms of R&D, emotion/affect-aware human-computer interaction is a hot topic • Dedicated conferences (e.g. ACII, IVA, etc.) and planned journals • Humaine Network of Excellence, now the Humaine Association • http://emotion-research.net • Integrated Projects (CALLAS, Companions, LIREC, Feelix Growing, etc.)
yours truly • Associate researcher at ICCS/NTUA, Athens • Completed post-doc within Humaine • Signals to signs of emotion • Co-editor of the Humaine Handbook • Member of the Executive Committee of the Humaine Association • Emotion modelling and development in the CALLAS and Feelix Growing FP6 projects
what next • first we define ‘emotion’ • terminology • semantics and representations • computational models • emotion in interaction • emotion in natural interaction
what next • then ‘multimodal’ • modalities related to emotion and interaction • fusing modalities (how?, why?) • handling uncertainty, noise, etc. • which features from each modality? • semantics of fusion
what next • and ‘recognition’ • from individual modalities (uni-modal) • across modalities (multi-modal) • static vs. dynamic recognition • what can we recognize? • can we extend/enrich that? • context awareness
what next • affect and emotion aware applications • can we benefit from knowing a user’s emotional state? • missing links • open research questions for the following years
terminology • Emotions, mood, personality • Can be distinguished by • time (short-term vs. long-term) • influence (unnoticed vs. dominant) • cause (specific vs. diffuse) • Affect classified by time • short-term: emotions (dominant, specific) • medium-term: moods (unnoticed, diffuse) • and long-term: personality (dominant)
terminology • what we perceive is the expressed emotion at a given time • on top of a person’s current mood, which may change over time, but not drastically • and on top of their personality • usually considered a baseline level • which may differ from what a person feels • e.g. we despise someone, but are forced to be polite
terminology • Affect is an innately structured, non-cognitive evaluative sensation that may or may not register in consciousness • Feeling is defined as affect made conscious, possessing an evaluative capacity that is not only physiologically based, but that is often also psychologically oriented. • Emotion is psychosocially constructed, dramatized feeling
how it all started • Charles Darwin, 1872 • Ekman et al. since the 60s • Mayer and Salovey, papers on emotional intelligence, 90s • Goleman’s book: Emotional Intelligence: Why It Can Matter More Than IQ • Picard’s book: Affective Computing, 1997
why emotions? • “Shallow” improvement of subjective experience • Reason about emotions of others • To improve usability • Get a handle on another aspect of the "human world" • Affective user modeling • Basis for adaptation of software to users
name that emotion • so, we know what we’re after • but we have to assign it a name • on which we all agree • and which means the same thing for all (most?) of us • different emotion representations • different context • different applications • different conditions/environments
emotion representations • most obvious: labels • people use them in everyday life • ‘happy’, ‘sad’, ‘ironic’, etc. • may be extended to include user states, e.g. ‘tired’, which are not emotions • CS people like them • good match for classification algorithms
labels • but… • we have to agree on a finite set • if we don’t, we’ll have to change the structure of our neural nets with each new label • labels don’t work well with measurements • is ‘joy’ << ‘exhilaration’, and on what scale? • do scales mean the same to the expresser and all perceivers?
labels • Ekman’s set is the most popular • ‘anger’, ‘disgust’, ‘fear’, ‘joy’, ‘sadness’, and ‘surprise’ • added ‘contempt’ in the process • Main difference to other sets of labels: • universally recognizable across cultures • when confronted with a smile, all people will recognize ‘joy’
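Since labels map naturally onto classification algorithms, a minimal sketch of that idea follows. The random placeholder data, the 20-dimensional feature vector, and the choice of scikit-learn's SVC are assumptions for illustration, not a reference implementation.

```python
# Minimal sketch: emotion recognition as label classification over the
# Ekman set. Feature extraction (e.g. facial measurements) is assumed to
# happen elsewhere; the arrays below are random placeholders.
import numpy as np
from sklearn.svm import SVC

EKMAN_LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

rng = np.random.default_rng(0)
X_train = rng.random((60, 20))          # 60 samples, 20 features each
y_train = rng.choice(EKMAN_LABELS, 60)  # one label per sample

clf = SVC().fit(X_train, y_train)

x_new = rng.random((1, 20))             # a new, unseen feature vector
print(clf.predict(x_new)[0])            # one of the six labels
```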
from labels to machine learning • when reading the claim that ‘there are six facial expressions recognized universally across cultures’… • …CS people misunderstood, causing a whole lot of issues that still dominate the field
strike #1 • ‘we can only recognize these six expressions’ • as a result, all video databases used to contain images of sad, angry, happy or fearful people • a while later, the same authors discussed ‘contempt’ as a possible universal, but CS people weren’t listening
strike #2 • ‘only these six expressions exist in human expressivity’ • as a result, more sad, angry, happy or fearful people, even when data involved HCI • can you really be afraid when using your computer?
strike #3 • ‘we can only recognize extreme emotions’ • now, happy people grin, sad people cry or are scared to death when afraid • however, extreme emotions are scarce in everyday life • so, subtle emotions and additional labels were out of the picture
labels are good, but… • don’t cover subtle emotions and natural expressivity • more emotions are available in everyday life and usually masked • hence the need for alternative emotion representations • can’t approach dynamics • can’t approach magnitude • extreme joy is not defined
other sets of labels • Plutchik • Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise • Relation to adaptive biological processes • Frijda • Desire, happiness, interest, surprise, wonder, sorrow • Forms of action readiness • Izard • Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise
other sets of labels • James • Fear, grief, love, rage • Bodily involvement • McDougall • Anger, disgust, elation, fear, subjection, tender-emotion, wonder • Relation to instincts • Oatley and Johnson-Laird • Anger, disgust, anxiety, happiness, sadness • Do not require propositional content
going 2D • vertical: activation (active/passive) • horiz.: evaluation (negative/positive)
going 2D • emotions correspond to points in 2D space • evidence that some vector operations are valid, e.g. ‘fear’ + ‘sadness’ = ‘despair’
going 2D • quadrants useful in some applications • e.g. need to detect extreme expressivity in a call-centre application
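A minimal sketch of the dimensional view: named emotions become points in the activation-evaluation plane, simple vector operations can be tried (the ‘fear’ + ‘sadness’ = ‘despair’ example above), and a quadrant check supports the call-centre scenario. The coordinates are illustrative assumptions, not values from any published wheel.

```python
import numpy as np

# Illustrative (made-up) coordinates: (evaluation, activation), both in [-1, 1].
POINTS = {
    "joy":     np.array([ 0.8,  0.5]),
    "anger":   np.array([-0.7,  0.8]),
    "fear":    np.array([-0.6,  0.7]),
    "sadness": np.array([-0.7, -0.5]),
}

# The vector-operation example from the slide: 'fear' + 'sadness' lands in
# the negative half-plane, roughly where 'despair' would sit.
despair = POINTS["fear"] + POINTS["sadness"]

def quadrant(p):
    """Name the activation-evaluation quadrant a point falls in."""
    act = "active" if p[1] >= 0 else "passive"
    ev = "positive" if p[0] >= 0 else "negative"
    return f"{act}/{ev}"

# Call-centre style check: flag callers in the active/negative quadrant.
print(despair, quadrant(POINTS["anger"]))   # -> [-1.3  0.2] active/negative
```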
going 3D • Plutchik adds another dimension • vertical intensity, circle degrees ofsimilarity • four pairs of opposites
going 3D • Mehrabian considers pleasure, arousal and dominance • Again, emotions are points in space
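The same idea extends to three dimensions; the sketch below uses illustrative pleasure-arousal-dominance coordinates and maps a measured point back to the nearest named emotion. The values are assumptions, not Mehrabian's published ratings.

```python
import numpy as np

# Illustrative PAD coordinates: (pleasure, arousal, dominance) in [-1, 1].
PAD = {
    "anger": np.array([-0.5, 0.8,  0.6]),   # unpleasant, aroused, dominant
    "fear":  np.array([-0.6, 0.7, -0.6]),   # unpleasant, aroused, submissive
    "joy":   np.array([ 0.8, 0.5,  0.4]),
}

def nearest_label(point):
    """Map a measured PAD point to the closest named emotion."""
    p = np.asarray(point)
    return min(PAD, key=lambda name: np.linalg.norm(PAD[name] - p))

print(nearest_label([-0.55, 0.75, 0.3]))    # -> anger (closest by distance)
```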
what about interaction? • these models describe the emotional state of the user • no insight as to what happened, why the user reacted and how the user will react • action selection • OCC (Ortony, Clore, Collins) • Scherer’s appraisal checks
OCC (Ortony, Clore, Collins) • each event, agent and object has properties • used to predict the final outcome/expressed emotion/action
OCC (Ortony, Clore, Collins) • Appraisals • Assessments of events, actions, objects • Valence • Whether emotion is positive or negative • Arousal • Degree of physiological response • Generating appraisals • Domain-specific rules • Probability of impact on agent’s goals
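A hedged sketch of the rule-based appraisal step just listed: an event's probability-weighted impact on the agent's goals determines valence and a rough arousal level. The goals, weights, and example numbers are illustrative assumptions rather than the OCC model itself.

```python
# Sketch of an OCC-style, domain-specific appraisal rule: score an event
# against the agent's goals; the sign gives valence, the magnitude a rough
# arousal level. Goal weights and impacts are illustrative assumptions.
GOALS = {"finish_task": 0.8, "avoid_errors": 0.6}

def appraise(event_impact):
    """event_impact: goal -> probability-weighted impact in [-1, 1]."""
    score = sum(GOALS[g] * impact for g, impact in event_impact.items())
    valence = "positive" if score >= 0 else "negative"
    arousal = min(abs(score), 1.0)
    return valence, round(arousal, 2)

# Example: an application crash blocks both goals -> negative, fairly aroused.
print(appraise({"finish_task": -0.9, "avoid_errors": -0.4}))
```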
Scherer’s appraisal checks 2 theoretical approaches: • “Discrete emotions” (Ekman, 1992; Ekman & Friesen, 1975: EMFACS) • “Appraisal theory” of emotion (Scherer, 1984, 1992)
Scherer’s appraisal checks • Componential Approach • Emotions are elicited by a cognitive evaluation of antecedent events. • The patterning of reactions is shaped by this appraisal process; appraisal dimensions are used to evaluate the stimulus and to adapt to changes in it. • Appraisal Dimensions: evaluation of the significance of the event, coping potential, and compatibility with social norms
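A minimal sketch of running the appraisal dimensions above as sequential checks and mapping the pattern of outcomes to a coarse label. The thresholds and the outcome mapping are assumptions for illustration, not Scherer's actual predictions.

```python
# Sketch of sequential appraisal checks in the componential spirit:
# significance of the event, coping potential, compatibility with norms.
# Thresholds and the label mapping are illustrative assumptions.
def appraisal_checks(significance, coping_potential, norm_compatible):
    if significance < 0.3:
        return "indifference"      # event not significant enough to react to
    if coping_potential > 0.6:
        # enough power to act: norm-violating events tend towards anger
        return "anger" if not norm_compatible else "satisfaction"
    return "fear"                  # significant event, little coping power

print(appraisal_checks(significance=0.9, coping_potential=0.2,
                       norm_compatible=True))   # -> fear
```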
Autonomic responses contribute to the intensity of the emotional experience. [Diagram: a stimulus (e.g. a loud bang) feeds both general autonomic arousal (heart races) and perception/interpretation of its context (danger); together these yield the particular emotion experienced (fear), which in turn affects future interpretations of stimuli and continuing autonomic arousal.]
Scherer’s appraisal checks • 2 theories, 2 sets of predictions: the example of Anger
summary on emotion • perceived emotions are usually short-lasting events across modalities • labels and dimensions are used to annotate perceived emotions • pros and cons for each • additional requirements for interactive applications
a definition • Raisamo, 1999 • “Multimodal interfaces combine many simultaneous input modalities and may present the information using synergistic representation of many different output modalities”
Twofold view • A Human-Centered View • common in psychology • often considers human input channels, i.e., computer output modalities, and most often vision and hearing • applications: a talking head, audio-visual speech recognition, ... • A System-Centered View • common in computer science • a way to make computer systems more adaptable
going multimodal • ‘multimodal’ is this decade’s ‘affective’! • plethora of modalities available to capture and process • visual, aural, haptic… • ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc. • ‘aural’ into ‘prosody’, ‘linguistic content’, etc.
multimodal design • Adapted from [Maybury and Wahlster, 1998]
paradigms for multimodal user interfaces • Computer as a tool • multiple input modalities are used to enhance direct manipulation behavior of the system • the machine is a passive tool and tries to understand the user through all different input modalities that the system recognizes • the user is always responsible for initiating the operations • follows the principles of direct manipulation [Shneiderman, 1982; 1983]
paradigms for multimodal user interfaces • Computer as a dialogue partner • the multiple modalities are used to increase the anthropomorphism in the user interface • multimodal output is important: talking heads and other human-like modalities • speech recognition is a common input modality in these systems • can often be described as an agent-based conversational user interface
why multimodal? • well, why not? • recognition from traditional unimodal databases had reached its ceiling • new kinds of data available • what’s in it for me? • have recognition rates improved? • or have we just introduced more uncertain features?
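One concrete answer to "what's in it for me" is that per-modality features can be combined before classification. A minimal sketch of feature-level (early) fusion follows; the placeholder features, the quadrant-style label set, and the scikit-learn classifier are assumptions for illustration.

```python
# Sketch of feature-level (early) fusion: concatenate per-modality feature
# vectors and train a single classifier on the joint vector. Feature
# extraction is assumed elsewhere; the arrays below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 80
face_feats  = rng.random((n, 12))    # e.g. facial animation parameters
voice_feats = rng.random((n, 8))     # e.g. prosodic features
labels = rng.choice(["positive/active", "negative/active",
                     "positive/passive", "negative/passive"], n)

X = np.concatenate([face_feats, voice_feats], axis=1)   # early fusion
clf = LogisticRegression(max_iter=1000).fit(X, labels)

x_new = rng.random((1, 20))          # a fused vector from a new interaction
print(clf.predict(x_new)[0])
```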
essential reading • Communications of the ACM,Nov. 1999, Vol. 42, No. 11, pp. 74-81