CINEMO – A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Björn Schuller, Riccardo Zaccarelli, Nicolas Rollet, Laurence Devillers. CNRS-LIMSI Spoken Language Processing Group, Orsay, France. Thursday 20th May 2010, 12.25-12.45 PM, O21 - Emotion, Sentiment.

Presentation Transcript


  1. CINEMO – A French Spoken Language Resource for Complex Emotions: Facts and Baselines Björn Schuller, Riccardo Zaccarelli, Nicolas Rollet, Laurence Devillers CNRS-LIMSI Spoken Language Processing Group Orsay, France Thursday 20th May 2010, 12.25-12.45 PM, O21 - Emotion, Sentiment

  2. Outline • Introduction • CINEMO Corpus Statistics • Recognition of Complex Emotions • Conclusions

  3. Models of Emotion • Dimensional Model • Orthogonal system: • Arousal, valence, dominance/potency, ... • Ideally uncorrelated • Categorical Model • Discrete affective states • e.g. “Big 6” (Ekman/MPEG-4) • Can be located in the emotion sphere • “Intensity” turns a category into a dimension • Complex Emotions • “Soft” membership in several categories • “Major / minor” emotion

  4. Databases – Nine Popular Examples

  5. CINEMO Corpus Statistics

  6. Corpus Stats and Protocol • Size • 3 992 instances after segmentation • 2:13:59 h net playtime • Subjects • 51 speakers: • 21 female (1 656 instances), 30 male (2 336 instances) • 4 age groups • No professional actors • Protocol • Dubbing of selected scenes from 12 French movies • Broad coverage of emotions • Situations close to everyday emotions (Rottenberg et al., 2007) • Well suited to inducing mood (Gerrards-Hesse et al., 1994)

  7. A Dozen Movies • Good Blend for Covering Emotions • Extrapolation of interpersonal behavior patterns • Affective Computing • Areas of Application • Interpretation of user intention • Accommodation in communication • Objective measurement • Transmission of emotion • Emotional adaptation • Multimedia Retrieval • Video gaming and entertainment • Surveillance • Encoding

  8. Movies • “Karaoke” Setup • Participants superimpose their voice on the actor’s • Actor’s voice audible or muted • Dialog and pauses shown karaoke-style • Current word highlighted • Spoken interactions in natural contexts • Example Scene: “Chaos” • Affective state: sadness, disappointment • Description: speaker reports a boyfriend’s humiliating behavior • Degree of involvement: highly implicated • Type of action: storytelling • Implied temporality: recent past

  9. Scenes and Roles • Numbers • 29 scenes, 1 or 2 players at a time: • 14 male, 7 female, 6 mixed-gender, 2 female–female scenes • 31 roles: 14 female and 17 male • Scene Repetition • Each scene could be repeated • Number of instances per attempt: • 1 945 (first), 1 518 (second), 433 (third), 84 (fourth), 12 (fifth) • Mean number of scene repetitions: 1.67
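
The slide does not say how the 1.67 is obtained. One reading that is consistent with the listed counts is an instance-weighted mean of the attempt index. The sketch below is an assumption, not the authors' script; it reproduces both the 3 992 instances of slide 6 and the 1.67:

```python
# Instances per attempt (first to fifth take), as listed on slide 9.
INSTANCES_PER_ATTEMPT = {1: 1945, 2: 1518, 3: 433, 4: 84, 5: 12}


def attempt_summary(counts):
    """Return (total instances, instance-weighted mean attempt index)."""
    total = sum(counts.values())
    weighted_sum = sum(attempt * n for attempt, n in counts.items())
    return total, weighted_sum / total


total, mean_attempt = attempt_summary(INSTANCES_PER_ATTEMPT)
print(total, round(mean_attempt, 2))  # 3992 1.67
```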

  10. A Linguistic Perspective • N-Gram Frequencies • 119 turns with 1 609 words • Vocabulary size of 562 • 4.4 graphemes per word on average • Unigrams “c’” (this), “est” (is), and “j’” (I) > 50 times • Bigram “c’est” > 10 times
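
The slide does not describe the tokenizer. A minimal sketch, assuming elided clitics such as c’ and j’ are split off as their own tokens (which is what makes “c’est” a bigram) and that graphemes are counted as characters per token. The helper names tokenize and ngram_stats are hypothetical, and the two example turns are made up:

```python
import re
from collections import Counter

# Assumption: an elided clitic (c', j', l', qu', ...) is its own token, so
# "c'est" -> ["c'", "est"]; both the ASCII and the typographic apostrophe count.
TOKEN = re.compile(r"\w+['’]|\w+")


def tokenize(turn):
    return TOKEN.findall(turn.lower())


def ngram_stats(turns):
    """Unigram/bigram counts, word count, vocabulary size, mean characters per word."""
    unigrams, bigrams = Counter(), Counter()
    n_words = n_chars = 0
    for turn in turns:
        tokens = tokenize(turn)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # bigrams never cross turn boundaries
        n_words += len(tokens)
        n_chars += sum(len(t) for t in tokens)
    return unigrams, bigrams, n_words, len(unigrams), n_chars / max(n_words, 1)


uni, bi, n_words, vocab, mean_len = ngram_stats(["C'est pas grave.", "J'ai dit que c'est fini."])
print(n_words, vocab, bi[("c'", "est")])  # 11 9 2
```

Splitting on the apostrophe also breaks words such as “aujourd’hui” into two tokens; a production tokenizer would need an exception list.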

  11. Segmentation and Annotation • Sequential Processing • Complete annotation at present by 2 experienced labelers: • L1: male, 31 years; L2: female, 26 years • 2 strategies intentionally followed: • L1 provided with manually segmented audio in sequential order • L2 provided with single instances in random order for verification • Segmentation Balances Several Interests • Syntax, pragmatics, stationarity of the major emotion • Shorter segments preferred • Predominant non-linguistic vocalizations as boundaries • After segmentation: • min. 24, max. 189, median 74, std. dev. 41 instances per speaker

  12. Segmentation and Annotation • Labelling per Instance • Speaker ID/gender, movie ID, attempt, running ID, begin/end time • Major and minor emotion attribute (16 options each) • Mood (7 options: amusement, irritation, neutrality, embarrassment, positivity, stress, timidity; κ = 0.41) • 6 dimensions with 3 states each

  13. Annotation • Major and Minor • Frequencies per labeller

  14. Annotation • Major and Minor • Heat map of pairs • Potentially 256 combinations • 118 found in the set • Strong presence of blended emotions • Full agreement on major/minor: • 105 combinations • 2 091 instances • i.e. about half of the corpus • Blended emotions well identifiable
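
A sketch of how such pair statistics can be derived, assuming each instance stores one (major, minor) pair per labeller. The toy labels are made up, the function name pair_statistics is hypothetical, and the slide does not state whether its 118 combinations pool both labellers:

```python
from collections import Counter


def pair_statistics(annotations):
    """annotations: one ((major_L1, minor_L1), (major_L2, minor_L2)) tuple per instance.

    Returns the counts of (major, minor) pairs over both labellers and the
    counts restricted to instances on which both labellers fully agree.
    """
    all_pairs, agreed = Counter(), Counter()
    for pair_l1, pair_l2 in annotations:
        all_pairs.update([pair_l1, pair_l2])
        if pair_l1 == pair_l2:  # full agreement on major AND minor
            agreed[pair_l1] += 1
    return all_pairs, agreed


toy = [
    (("ENE", "COL"), ("ENE", "COL")),
    (("INQ", "NEU"), ("INQ", "STR")),
    (("ENE", "COL"), ("ENE", "COL")),
]
all_pairs, agreed = pair_statistics(toy)
print(len(all_pairs), len(agreed), sum(agreed.values()))  # 3 1 2
print(round(2091 / 3992, 3))  # share of fully agreed instances on the slide: 0.524
```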

  15. Annotation • Distribution of Dimensions • Typical imbalance in favor of negative valence

  16. Annotation • Agreement on Dimensions • Monotonic increase from unweighted to quadratically weighted kappa: • label confusions occur preferably between neighboring classes • Apart from suddenness, good agreement at κ ≥ 0.4
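
Why does a rising kappa under stronger weighting point to confusions between neighbors? Weighted kappa penalizes a disagreement by the distance between the two labels, so near-misses cost little under quadratic weights. The slides do not give the exact formulation; this sketch assumes the standard Cohen's weighted kappa with linear and quadratic disagreement weights and codes the 3 states of a dimension as 0, 1, 2 (both assumptions). The ratings are made up:

```python
def weighted_kappa(rater_a, rater_b, n_levels, weighting="none"):
    """Cohen's kappa on ordinal labels 0..n_levels-1 with disagreement weights.

    weighting: "none" (plain kappa), "linear" or "quadratic".
    Undefined (division by zero) if both raters use a single label throughout.
    """
    n = len(rater_a)
    observed = [[0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    def weight(i, j):
        d = abs(i - j) / (n_levels - 1)
        if weighting == "none":
            return 0.0 if i == j else 1.0
        if weighting == "linear":
            return d
        if weighting == "quadratic":
            return d * d
        raise ValueError(f"unknown weighting: {weighting}")

    cells = [(i, j) for i in range(n_levels) for j in range(n_levels)]
    disagreement_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagreement_exp = sum(weight(i, j) * row[i] * col[j] / n for i, j in cells)
    return 1.0 - disagreement_obs / disagreement_exp


# Toy ratings on a 3-state dimension; both disagreements are between neighbours.
l1 = [0, 0, 1, 1, 2, 2]
l2 = [0, 1, 1, 2, 2, 2]
for w in ("none", "linear", "quadratic"):
    print(w, round(weighted_kappa(l1, l2, 3, w), 3))  # 0.5, then 0.625, then 0.75
```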

  17. Recognition of Complex Emotions

  18. Data Partitioning • Train, Development, Test • Foster easy reproducibility of results • Proper definition of a development set • Straightforward three-fold partitioning by speaker index: • Train (≈40% / 21 speakers: ID 1–21) • Development (≈30% / 15 speakers: ID 22–36) • Test (≈30% / 15 speakers: ID 37–51) • Strict speaker independence • ‘Genuine’ results without prior fine-tuning on the test partition
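
Because the split is defined purely by speaker ID ranges, it can be reproduced without any random seed. A minimal sketch; the partition names and the (speaker_id, features, label) tuple layout are hypothetical:

```python
# Speaker-ID ranges from the slide (IDs 1-51); range() excludes its upper bound.
PARTITIONS = {"train": range(1, 22), "devel": range(22, 37), "test": range(37, 52)}


def partition_of(speaker_id):
    for name, ids in PARTITIONS.items():
        if speaker_id in ids:
            return name
    raise ValueError(f"speaker ID {speaker_id} outside 1-51")


def split_instances(instances):
    """instances: iterable of (speaker_id, features, label) tuples."""
    splits = {name: [] for name in PARTITIONS}
    for inst in instances:
        splits[partition_of(inst[0])].append(inst)
    return splits


sizes = {name: len(ids) for name, ids in PARTITIONS.items()}
print(sizes)  # {'train': 21, 'devel': 15, 'test': 15}
```

Since every instance of a speaker lands in the same partition, no speaker appears in more than one of train, development and test, which is the speaker independence the slide asks for.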

  19. Acoustic Features • openEAR • openSMILE’s “base” set • 988 features • Slight extension of the INTERSPEECH 2009 Emotion Challenge set • Systematic brute-forcing • 19 functionals of • 26 low-level descriptors (LLDs) • smoothed by a simple moving average (SMA, low-pass filter) • plus their regression coefficients
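
The count follows from the brute-force scheme: 26 LLD contours plus their 26 regression (delta) contours, each summarized by 19 functionals, gives 26 × 2 × 19 = 988. The pure-Python sketch below illustrates the scheme only. The SMA window of 3, the delta window W = 2, computing deltas on the smoothed contour, the five functionals and the feature names are all assumptions; the actual LLDs and functionals are defined by the openSMILE configuration:

```python
import statistics


def sma(contour, window=3):
    """Simple moving average (centred window, shrinking at the edges): a low-pass filter."""
    half = window // 2
    smoothed = []
    for t in range(len(contour)):
        segment = contour[max(0, t - half): t + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def delta(contour, W=2):
    """Regression (delta) coefficients over +/-W frames; edges repeat the end values."""
    n = len(contour)
    norm = 2 * sum(i * i for i in range(1, W + 1))

    def at(t):
        return contour[min(max(t, 0), n - 1)]

    return [sum(i * (at(t + i) - at(t - i)) for i in range(1, W + 1)) / norm for t in range(n)]


# A small, illustrative subset of functionals (the real set has 19).
FUNCTIONALS = {
    "max": max,
    "min": min,
    "range": lambda c: max(c) - min(c),
    "mean": statistics.fmean,
    "stddev": statistics.pstdev,
}


def brute_force(llds):
    """llds: {descriptor name: per-frame contour}. Returns {feature name: value}."""
    features = {}
    for name, contour in llds.items():
        smoothed = sma(contour)
        for suffix, c in (("sma", smoothed), ("sma_de", delta(smoothed))):
            for fname, func in FUNCTIONALS.items():
                features[f"{name}_{suffix}_{fname}"] = func(c)
    return features


print(26 * 2 * 19)  # 988: 26 LLDs, each plus its delta, times 19 functionals
feats = brute_force({"pitch": [100.0, 110.0, 130.0, 120.0, 105.0]})
print(len(feats))  # 10 = 1 LLD x 2 contours x 5 functionals
```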

  20. Problem Complexity • Upper Bounds • First, major and minor emotions separately • Max. 16 classes • Then the complex compound • Max. 256 classes (16 × 16, as order matters) • Not all permutations occur • Dependencies among labels have to be assumed: • due to the scripted recording protocol, and in general
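
One way to turn an ordered (major, minor) pair into a single compound class, and to keep only compounds that occur often enough, is sketched below. The category indices 0 to 15 are placeholders (the slides do not list the 16 names in this order), and min_count is a hypothetical stand-in for the “sufficient instances” criterion of the following slides:

```python
from collections import Counter

N_CATEGORIES = 16  # options for the major and for the minor emotion


def compound_class(major, minor, n=N_CATEGORIES):
    """Map an ordered (major, minor) pair of category indices to one of n*n class IDs."""
    return major * n + minor


def usable_classes(pairs, min_count):
    """Compound classes that occur at least min_count times (threshold is hypothetical)."""
    counts = Counter(compound_class(ma, mi) for ma, mi in pairs)
    return sorted(c for c, k in counts.items() if k >= min_count)


print(N_CATEGORIES ** 2)                            # 256 possible compound classes
print(compound_class(1, 2), compound_class(2, 1))   # 18 33: order matters
print(usable_classes([(1, 2), (1, 2), (2, 1)], 2))  # [18]
```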

  21. Classification Strategy • Alternatives • Best fuzzy architecture for multiple labels: • e.g. multi-task neural networks? • Different weighting of major/minor emotion • Comparison with the N-best result list? • Chosen Way • ‘Traditional’ Support Vector Machines • Polynomial kernel • Pairwise multi-class discrimination • Sequential Minimal Optimization learning • Training set up-sampled in case of high class imbalance
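
The slides list the ingredients but no toolkit. This pure-Python sketch covers two of them: random up-sampling of the training data and pairwise (one-vs-one) vote aggregation. It does not implement the polynomial-kernel SVM or SMO training; binary_decision is a hypothetical placeholder for a trained two-class model. Libraries such as scikit-learn's SVC (kernel="poly", one-vs-one multi-class) provide that part. The imbalance threshold that triggers up-sampling is not given, so here every class is simply raised to the size of the largest one:

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def upsample(X, y, seed=0):
    """Randomly duplicate instances of smaller classes until all classes match the largest.

    Apply to the training partition only; development and test data stay untouched.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        X_out.extend(items + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def one_vs_one_predict(x, classes, binary_decision):
    """Pairwise voting: one binary decision per class pair, majority of votes wins.

    binary_decision(a, b, x) stands in for a trained two-class SVM and returns a or b.
    Ties go to the class listed first in `classes`.
    """
    votes = Counter(binary_decision(a, b, x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])


def nearest(a, b, x):
    """Toy pairwise rule: pick the class whose index is closer to x."""
    return a if abs(a - x) <= abs(b - x) else b


X_up, y_up = upsample([1, 2, 3, 4], ["a", "a", "a", "b"])
print(Counter(y_up))  # Counter({'a': 3, 'b': 3})
print(one_vs_one_predict(2.2, [0, 1, 2, 3, 4], nearest))  # 2
```

With the five classes of each example that follows, pairwise discrimination needs 5 × 4 / 2 = 10 binary classifiers.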

  22. Three Examples • ‘Fixed Minor’ • ‘Conventional’ case • Minor emotion fixed as neutral • Major emotion varied • Full labeler agreement • 950 instances, 5 classes with sufficient instances • (major–minor, # instances): • AMU–NEU (79) • DEC–NEU (204) • ENE–NEU (359) • INQ–NEU (202) • SAT–NEU (106)

  23. Three Examples • ‘Fixed Major’ • Different blends of irritation • Major emotion fixed as irritation (ENE) • Minor emotion varied • Full labeler agreement • 607 instances, again 5 classes with sufficient instances • ENE–COL (186) • ENE–DEC (110) • ENE–INQ (66) • ENE–IRO (51) • ENE–NEU (184)

  24. Three Examples • ‘Fully Mixed’ • Full labeler agreement • 533 instances, again 5 classes with sufficient instances • INQ–NEU (114) • STR–INQ (63) • ENE–COL (186) • ENE–DEC (110) • JOI–SUR (60) • Classes bear no particular relation to each other • But: shows that recognition is feasible even with a full major/minor mix

  25. Three Examples • Results • Weighted Average Recall (WAR, i.e. recognition rate) • Unweighted Average Recall (UAR, accounts for class imbalance) • Area under the receiver operating characteristic curve (AUC)
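
The three measures are easy to confuse. WAR is plain accuracy, so it is dominated by large classes. UAR averages the per-class recalls with equal weight, so a neglected small class pulls it down. AUC is computed per class from scores rather than hard decisions. How the slides average AUC over the five classes is not stated; this sketch only shows a binary (one-class-versus-rest) AUC via its rank interpretation, and the toy data are made up:

```python
def war(y_true, y_pred):
    """Weighted average recall = overall accuracy (recognition rate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def uar(y_true, y_pred):
    """Unweighted average recall: per-class recall, averaged with equal class weights."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def binary_auc(scores, is_positive):
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Imbalanced toy case: 8 "A", 2 "B"; one "B" is misclassified.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
print(war(y_true, y_pred), uar(y_true, y_pred))  # 0.9 0.75
print(binary_auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False]))  # 1.0
```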

  26. Regression Baseline • Results for Selected Dimensions • Ground truth by mean of labellers • All instances used • Cross correlation (CC), mean linear error (MLE) • Support Vector Regression • Predictions can be used as features for complex emotion recognition • Highly imbalanced distribution
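
The slide does not define its measures. Assuming CC is the Pearson correlation coefficient between prediction and ground truth and MLE is the mean absolute (linear) deviation, the evaluation reduces to the sketch below. Ground truth is the per-instance mean over labellers, as on the slide. The SVR itself is not shown, and the ratings and predictions are made up:

```python
import math


def mean_of_labellers(ratings):
    """ratings: one list of per-instance values per labeller -> per-instance mean."""
    return [sum(values) / len(values) for values in zip(*ratings)]


def correlation_coefficient(pred, truth):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    mp, mt = sum(pred) / len(pred), sum(truth) / len(truth)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    return cov / (sp * st)


def mean_linear_error(pred, truth):
    """Mean absolute deviation between prediction and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


truth = mean_of_labellers([[0, 1, 2, 2], [0, 1, 1, 2]])  # two labellers, 4 instances
print(truth)  # [0.0, 1.0, 1.5, 2.0]
pred = [0.5, 1.0, 1.5, 1.5]
print(round(correlation_coefficient(pred, truth), 3), mean_linear_error(pred, truth))
```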

  27. Conclusions

  28. Conclusions • Corpus for Complex Emotions • Comparatively large CINEMO corpus • Baselines • First impressions of the challenge • Future Directions • … Future large resources with recordings ‘in the wild’ • Tailored classification architectures: • Exploit the mutual information between major and minor emotions • Complex ‘language models’ to reflect transition probabilities

  29. This work was partly funded by the ANR project Affective Avatar. Merci.
