CINEMO – A French Spoken Language Resource for Complex Emotions: Facts and Baselines. Björn Schuller, Riccardo Zaccarelli, Nicolas Rollet, Laurence Devillers. CNRS-LIMSI Spoken Language Processing Group, Orsay, France. Thursday 20th May 2010, 12.25-12.45 PM, O21 - Emotion, Sentiment.
Outline • Introduction • CINEMO Corpus Statistics • Recognition of Complex Emotions • Conclusions
Models of Emotion • Dimensional Model • Orthogonal system: • Arousal, valence, dominance/potency, ... • Ideally non-correlated • Categorical Model • Discrete affective states • e.g. “Big 6” (Ekman/MPEG-4) • Assignable in emotion sphere • “Intensity” turns category into dimension • Complex Emotions • “Soft” hit for several categories • “Major / minor” emotion
Databases – Nine Popular Examples
Corpus Stats and Protocol • Size • 3 992 instances after segmentation • 2:13:59 h net playtime • Subjects • 51 speakers: • 21 female (1 656 instances), 30 male (2 336 instances) • 4 age groups • No professional actors • Protocol • Dubbing selected scenes from 12 French movies • Broad coverage of emotions • Situations close to everyday emotions (Rottenberg et al., 2007) • Well suited to inducing mood (Gerrards-Hesse et al., 1994)
A Dozen Movies • Good Blend to Cover Emotions • Extrapolation of interpersonal behavior patterns • Affective Computing • Areas of Application • Interpretation of the user intention • Accommodation in the communication • Objective measurement • Transmission of emotion • Emotional adaptation • Multimedia Retrieval • Video gaming and entertainment • Surveillance • Encoding
Movies • “Karaoke” • Participants superimpose their voice on the actor’s • Actor’s voice audible or muted • Dialog/pauses shown karaoke-style • Current word highlighted • Spoken interactions, natural contexts • Example Scene: “Chaos” • Affective state: sadness, disappointment • Description: speaker reports humiliating behavior of boyfriend • Degree of involvement: highly involved • Type of action: storytelling • Implied temporality: recent past
Scenes and Roles • Numbers • 29 scenes, 1 or 2 players at a time: • 14 male, 7 female, 6 mixed gender, 2 female–female scenes • 31 roles: 14 female and 17 male • Scene Repetition • Each scene could be repeated • Number of instances per attempt: • 1 945 (first), 1 518 (second), 433 (third), 84 (fourth), 12 (fifth) • Mean number of scene repetitions: 1.67
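The mean repetition figure can be checked against the per-attempt counts on this slide. A minimal sketch, assuming the mean is the attempt number averaged over all instances (the variable names are illustrative and not part of any corpus tooling):

```python
# Instances per attempt number, as listed on the slide.
instances_per_attempt = {1: 1945, 2: 1518, 3: 433, 4: 84, 5: 12}

total_instances = sum(instances_per_attempt.values())

# Attempt-weighted mean: every instance contributes its attempt number.
mean_attempt = (
    sum(attempt * count for attempt, count in instances_per_attempt.items())
    / total_instances
)

print(total_instances)         # 3992
print(round(mean_attempt, 2))  # 1.67
```

Under this reading the counts add up to the 3 992 instances reported for the whole corpus.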
A Linguistic Perspective • N-Gram Frequencies • 119 turns with 1 609 words • Vocabulary size of 562 • 4.4 graphemes on average • Uni-grams “c’ ” (this), “est” (is), and “j’ ” (I) > 50 times each • Bi-gram “c’est” > 10 times
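Counts of this kind can be reproduced with a few lines of code. The sketch below is only illustrative: the two turns are made up rather than taken from the CINEMO scripts, and the tokenizer (splitting after an ASCII apostrophe so that "c'est" yields "c'" and "est") is an assumption about how the uni-grams on the slide were obtained.

```python
import re
from collections import Counter


def tokenize(turn):
    """Lower-case a turn and split it into word tokens.

    A token is a run of letters, optionally followed by an apostrophe,
    so "c'est" becomes ["c'", "est"] (assumed tokenization).
    """
    return re.findall(r"[^\W\d_]+'?", turn.lower())


# Made-up example turns, not from the corpus.
turns = ["C'est fini", "J'ai dit que c'est fini"]

unigrams = Counter()
bigrams = Counter()
for turn in turns:
    tokens = tokenize(turn)
    unigrams.update(tokens)
    # Bi-grams are counted within a turn only, never across turn boundaries.
    bigrams.update(zip(tokens, tokens[1:]))

vocabulary_size = len(unigrams)
print(unigrams["c'"], bigrams[("c'", "est")], vocabulary_size)
```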
Segmentation and Annotation • Sequential Processing • At present, complete annotation by 2 experienced labelers: • 𝐿1: male, 31 years; 𝐿2: female, 26 years • 2 strategies intentionally followed: • 𝐿1 provided with sequential order, manually segmented audio • 𝐿2 provided with single instances in random order for verification • Balanced Segmentation Interests • Syntax, pragmatics, stationarity of major emotion • Shorter segments preferred • Predominant non-linguistic vocalizations as boundaries • After segmentation: • min. 24, max. 189, median 74, std. dev. 41 instances per speaker
Segmentation and Annotation • Labeling per Instance • Speaker ID/gender, movie ID, attempt, running ID, begin/end time • Major and minor emotion attribute (16 options) • Mood (7 options: amusement, irritation, neutrality, embarrassment, positivity, stress, timidity; 𝜅 = 0.41) • 6 Dimensions: 3 states
Annotation • Major and Minor • Frequencies per labeler
Annotation • Major and Minor • Heat map of pairs • Potentially 256 combinations • 118 found in the set • Strong presence of blended emotions • Full agreement on major/minor: • 105 combinations • 2 091 instances • i.e. about half of the corpus • Blended emotions well identifiable
Annotation • Distribution of Dimensions • Typical imbalance in favor of negative valence
Annotation • Agreement Dimensions • Monotonic increase from unweighted to quadratic kappa: • label confusions preferably in neighboring classes • Apart from suddenness, good concurrence at 𝜅 ≥ 0.4
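Why kappa rises from the unweighted to the quadratic variant when confusions fall into neighboring classes can be seen in a small worked example. This is a sketch under stated assumptions: the ratings are toy values rather than CINEMO annotations, the three states are coded as ordinal levels 0, 1, 2, and `weighted_kappa` is an illustrative helper implementing the standard Cohen's kappa with disagreement weights.

```python
from collections import Counter


def weighted_kappa(a, b, n_levels, power=0):
    """Cohen's kappa for two raters on ordinal levels 0..n_levels-1.

    power=0: unweighted, power=1: linear weights, power=2: quadratic weights.
    """
    n = len(a)
    observed = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)

    def disagreement(i, j):
        if power == 0:
            return 0.0 if i == j else 1.0
        return (abs(i - j) / (n_levels - 1)) ** power

    # Observed and chance-expected weighted disagreement.
    obs = sum(disagreement(i, j) * c for (i, j), c in observed.items()) / n
    exp = sum(
        disagreement(i, j) * count_a[i] * count_b[j]
        for i in range(n_levels)
        for j in range(n_levels)
    ) / (n * n)
    return 1.0 - obs / exp


# Toy ratings: the two raters only ever disagree by one level.
rater_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 1]
rater_2 = [0, 0, 1, 1, 1, 2, 2, 2, 1, 1]

kappa_unweighted = weighted_kappa(rater_1, rater_2, n_levels=3, power=0)
kappa_linear = weighted_kappa(rater_1, rater_2, n_levels=3, power=1)
kappa_quadratic = weighted_kappa(rater_1, rater_2, n_levels=3, power=2)
print(kappa_unweighted, kappa_linear, kappa_quadratic)
```

Because every disagreement in the toy data is between adjacent levels, the weighted variants penalize it less than the chance-expected disagreement, so the three values increase monotonically.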
Data Partitioning • Train, Development, Test • Foster easy reproducibility of results • Proper definition of a development set • Straightforward three-fold partitioning by speaker index: • Train (≈40% / 21 speakers: ID 1–21) • Development (≈30% / 15 speakers: ID 22–36) • Test (≈30% / 15 speakers: ID 37–51) • Strict speaker independence • ‘Genuine’ results without previous fine-tuning on the test partition
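The partitioning rule is simple enough to state as code. A minimal sketch, using the speaker ID ranges from the slide; the function name and the dictionary layout are illustrative choices, not part of a released corpus toolkit:

```python
from collections import Counter

# Speaker ID ranges per partition, as given on the slide (upper bounds inclusive).
PARTITIONS = {
    "train": range(1, 22),         # IDs 1-21
    "development": range(22, 37),  # IDs 22-36
    "test": range(37, 52),         # IDs 37-51
}


def partition_of(speaker_id):
    """Return the partition a speaker belongs to, based on the speaker ID only."""
    for name, ids in PARTITIONS.items():
        if speaker_id in ids:
            return name
    raise ValueError(f"unknown speaker ID: {speaker_id}")


speakers_per_partition = Counter(partition_of(s) for s in range(1, 52))
print(speakers_per_partition)
```

Since the assignment depends on the speaker ID alone, all instances of a speaker land in the same partition, which is what makes the split speaker independent.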
Acoustic Features • openEAR • openSMILE’s “base” set • 988 features • Slight extension over the INTERSPEECH 2009 Emotion Challenge set • Systematic brute-forcing • 19 functionals of 26 low-level descriptors • SMA low-pass filtered • Plus regression coefficients
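The feature count follows from the numbers on this slide if one assumes that a regression (delta) coefficient contour is added for each of the 26 low-level descriptors, doubling the number of contours the functionals are applied to. That doubling is an assumption made here to reconcile the figures; the slide itself only says "plus regression coefficients".

```python
n_low_level_descriptors = 26
n_functionals = 19

# Assumption: one regression (delta) contour per low-level descriptor.
n_contours = n_low_level_descriptors * 2

n_features = n_contours * n_functionals
print(n_features)  # 988
```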
Problem Complexity • Upper Bounds • First major and minor emotions separately • Max. 16 classes • Then complex compound • Max. 256 classes (quadratic number as order matters) • Not all permutations occur • Dependencies among labels have to be assumed: • Due to the scripted recording protocol, and in general
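The upper bound of 256 compound classes comes from treating a (major, minor) label as an ordered pair over the 16 emotion options. A short sketch; the label names `E00` to `E15` are placeholders, since the full list of 16 options is not given on these slides:

```python
from itertools import product

# Placeholder names for the 16 emotion options (hypothetical, for counting only).
labels = [f"E{i:02d}" for i in range(16)]

# Ordered (major, minor) pairs: swapping the roles gives a different class.
compound_classes = list(product(labels, repeat=2))

print(len(labels))            # 16
print(len(compound_classes))  # 256
print(("E00", "E01") != ("E01", "E00"))  # True
```

Of these 256 possible pairs, 118 were reported as actually occurring in the annotated set.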
Classification Strategy • Alternatives • Best fuzzy architecture for multiple labels: • e.g. multi-task neural networks? • Different weighting of major/minor emotion • comparison with the N-best result list? • Chosen Way • ‘Traditional’ Support Vector Machines • Polynomial Kernel • Pair-wise multi-class discrimination • Sequential Minimal Optimization learning • Training up-sampled in case of high class imbalance
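The slide does not say how the training data were up-sampled. One common scheme, shown here purely as an assumption, duplicates randomly chosen minority-class instances until every class matches the largest one. The function name and toy data are illustrative; in practice this would be applied to the training partition only, before fitting the SVM.

```python
import random
from collections import Counter, defaultdict


def upsample(instances, labels, seed=0):
    """Randomly duplicate minority-class instances up to the majority-class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep every original instance, then add random duplicates as needed.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy training set: 7 instances of one class, 3 of another.
train_x = list(range(10))
train_y = ["ENE"] * 7 + ["AMU"] * 3

balanced_x, balanced_y = upsample(train_x, train_y)
print(Counter(balanced_y))
```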
Three Examples • ‘Fixed Minor’ • ‘Conventional’ case • Minor emotion fixed as neutral • Major emotion varied • Full labeler agreement • 950 instances, 5 classes providing sufficient instances • (major–minor, # instances): • AMU–NEU (79) • DEC–NEU (204) • ENE–NEU (359) • INQ–NEU (202) • SAT–NEU (106)
Three Examples • ‘Fixed Major’ • Different blends of irritation • Major emotion fixed as irritation • Minor emotion varied • Full labeler agreement • 607 instances, again 5 classes providing sufficient instances • ENE–COL (186) • ENE–DEC (110) • ENE–INQ (66) • ENE–IRO (51) • ENE–NEU (184)
Three Examples • ‘Fully Mixed’ • Full labeler agreement • 533 instances, again 5 classes providing sufficient instances • INQ–NEU (114) • STR–INQ (63) • ENE–COL (186) • ENE–DEC (110) • JOI–SUR (60) • Classes in no strict relation to one another • But: demonstrates that recognition is feasible even in a full major/minor mix
Three Examples • Results • Weighted Average Recall (WAR, i.e. recognition rate) • Unweighted Average Recall (UAR, reflects imbalance among classes) • Area under the receiver operating characteristic curve (AUC)
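The difference between WAR and UAR shows up most clearly on imbalanced data. The sketch below uses toy labels (not CINEMO results) and a degenerate classifier that always predicts the majority class; `war_and_uar` is an illustrative helper name.

```python
def war_and_uar(y_true, y_pred):
    """Return (weighted average recall, unweighted average recall)."""
    classes = sorted(set(y_true))

    # Recall per class: correctly recognized instances / instances of that class.
    per_class_recall = {}
    for c in classes:
        indices = [i for i, t in enumerate(y_true) if t == c]
        per_class_recall[c] = sum(y_pred[i] == c for i in indices) / len(indices)

    # WAR weights each class by its frequency, which equals plain accuracy.
    war = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # UAR gives every class the same weight regardless of its size.
    uar = sum(per_class_recall.values()) / len(classes)
    return war, uar


# Toy data: 8 instances of one class, 2 of another; everything predicted as "ENE".
y_true = ["ENE"] * 8 + ["AMU"] * 2
y_pred = ["ENE"] * 10

war, uar = war_and_uar(y_true, y_pred)
print(war, uar)  # 0.8 0.5
```

The majority-class guesser reaches a WAR of 0.8 but only chance-level UAR of 0.5 for two classes, which is why UAR is reported alongside WAR when class sizes differ.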
Regression Baseline • Results for Selected Dimensions • Ground truth by mean of labelers • All instances used • Cross correlation (CC), mean linear error (MLE) • Support Vector Regression • Prediction can be used as features for complex emotions • Highly imbalanced distribution
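The two regression measures can be sketched as follows. Two assumptions are made here that the slide does not spell out: CC is taken to be Pearson's correlation coefficient between prediction and ground truth, and MLE is taken to be the mean absolute difference between them. All ratings and predictions below are toy values.

```python
from math import sqrt


def cross_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_linear_error(x, y):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy ratings of one dimension by two labelers (three states coded -1, 0, 1).
labeler_1 = [-1, 0, 1, 1, 0]
labeler_2 = [-1, 0, 0, 1, 1]

# Ground truth as the mean of the two labelers, as stated on the slide.
ground_truth = [(a + b) / 2 for a, b in zip(labeler_1, labeler_2)]

# Hypothetical regressor output for the same five instances.
prediction = [-0.8, 0.1, 0.4, 0.7, 0.6]

cc = cross_correlation(ground_truth, prediction)
mle = mean_linear_error(ground_truth, prediction)
print(cc, mle)
```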
Conclusions • Corpus for Complex Emotions • Comparatively large CINEMO corpus • Baselines • First impressions of the challenge • Future Directions • … Future large resources with recordings ‘in the wild’ • Tailored classification architectures: • Exploit the mutual information among major and minor emotions • Complex ‘language models’ to reflect transition probabilities
This work was partly funded by the ANR project Affective Avatar. Thank you.