
Multimodal Emotion Recognition



  1. Multimodal Emotion Recognition
     Colin Grubb
     Advisor: Nick Webb

  2. Motivation

  3. Previous Research
     - Multimodal fusion
     - Research looking at audio, visual, and gesture information
     - Feature level vs. decision level

  4. Research Question
     To what extent can we improve emotion recognition by using classification methods on audio and visual data?

  5. Decision-Level Analysis
     - A set of rules vs. training a classifier
     - A rule set alone is too basic
     - Instead, a classifier will learn from the outputs of the unimodal systems (see the sketch below)
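To make the decision-level idea concrete, here is a minimal sketch (in Java, since Weka is used later) of what one fused training instance could look like. It assumes the features are EmoVoice's five confidences plus the visual system's predicted label, with the true emotion as the class; the class and variable names are illustrative, not from the original code.

```java
// Minimal sketch of decision-level fusion: the outputs of the two unimodal
// systems become the features of one training instance for a meta-classifier.
public class FusedRow {
    public static void main(String[] args) {
        double[] emv = {0.40, 0.20, 0.10, 0.15, 0.15}; // EmoVoice: NA, NP, NE, PA, PP
        String visualPrediction = "Angry";              // visual system's output
        String trueEmotion = "Angry";                   // gold label for training

        // Build one comma-separated data row (e.g., for an ARFF export).
        StringBuilder row = new StringBuilder();
        for (double c : emv) row.append(c).append(',');
        row.append(visualPrediction).append(',').append(trueEmotion);
        System.out.println(row); // 0.4,0.2,0.1,0.15,0.15,Angry,Angry
    }
}
```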

  6. Audio System: EmoVoice (EMV)
     - Real-time audio analysis
     - Five emotional states with probabilities
     - Published accuracy: 47.67%
     - https://www.informatik.uni-augsburg.de/en/chairs/hcm/projects/emovoice/

  7. EmoVoice Confidence Levels
     - Negative Active (NA) → Angry
     - Negative Passive (NP) → Sad
     - Neutral (NE) → Neutral
     - Positive Active (PA) → Happy
     - Positive Passive (PP) → Content
     Example output: negativeActive <0.40, 0.20, 0.10, 0.15, 0.15>
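As a minimal sketch of how such a confidence vector resolves to an emotion, assuming the ordering <NA, NP, NE, PA, PP> from the mapping above (class and method names are illustrative):

```java
public class EmoVoiceLabel {

    // Emotions in EmoVoice category order <NA, NP, NE, PA, PP>.
    static final String[] EMOTIONS = {"Angry", "Sad", "Neutral", "Happy", "Content"};

    /** Returns the emotion whose confidence is highest. */
    static String label(double[] confidences) {
        int best = 0;
        for (int i = 1; i < confidences.length; i++) {
            if (confidences[i] > confidences[best]) best = i;
        }
        return EMOTIONS[best];
    }

    public static void main(String[] args) {
        // The slide's example vector: negativeActive <0.40, 0.20, 0.10, 0.15, 0.15>
        double[] v = {0.40, 0.20, 0.10, 0.15, 0.15};
        System.out.println(label(v)); // Angry
    }
}
```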

  8. Visual System
     - Software created by Prof. Shane Cotter
     - Uses still images
     - Published accuracy: 93.4%

  9. System Layout
     [Diagram: a subject saying "I'm in a good mood!" is recorded on video; still images go to the visual software and audio goes to EmoVoice, each yielding an emotion estimate (Happy); the classifier fuses both into the final output: Happy]

  10. Data Gathering
      - 8 subjects: five male, three female
      - Audio data: subjects read sample sentences
      - Visual data: facial expressions gathered at regular and long distance (6 ft.)

  11. Experiments
      - Weka data mining software: http://www.cs.waikato.ac.nz/ml/weka/
      - Used the J48 classifier, an implementation of the C4.5 decision-tree algorithm (see the sketch below)
      - Each branch represents the decision made at that node
      [Diagram: example decision tree with internal nodes 1-3 and leaf nodes Output 1-4]
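A minimal sketch of training and evaluating Weka's J48 on an ARFF export of the fused dataset follows. The filename fused.arff and the 10-fold cross-validation setup are assumptions, not details from the slides; the Weka API calls themselves are standard.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Experiment {
    public static void main(String[] args) throws Exception {
        // Load the fused dataset; the emotion class is the last attribute.
        Instances data = new DataSource("fused.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48(); // C4.5 decision tree
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1)); // 10-fold CV
        System.out.println(eval.toSummaryString());
        System.out.println(eval.pctCorrect() + "% correct");
    }
}
```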

  12. Emotion Classes
      - Final dataset classifies between Happy, Angry, Neutral, and Sad
      - Audio performance: 38.43%
      - Visual performance: 77.43%

  13. Initial Performance
      - Ran the combined dataset against the J48 classifier
      - Multimodal data was initially ineffective
      - Needed a way to improve the dataset

  14. Improving Accuracy
      How can we use the two individual systems to complement each other? Two pieces of information:
      - What does the visual system do poorly on?
      - What kind of biases does EmoVoice have?

  15. Manual Bias
      Visual system:
      - Performs poorly on Neutral
      - Some inaccuracy for all emotions tested
      EmoVoice:
      - Bias towards negative voice
      - Very strong bias towards active voice

  16. EmoVoice Modification Rules
      - Happy: for all happy training instances, if PP + PA > NA, NE, and NP, change the EMV class to Happy
      - Sad: if NP is second to NA and within 0.05 of it, change the EMV class to Sad
      - Neutral: if NE is tied with another confidence level, change the EMV class to Neutral
      - Neutral: if all probabilities are within 0.05 of each other, change the EMV class to Neutral
      (These rules are sketched in code below.)
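A sketch of the four rules as code, assuming the confidence ordering <NA, NP, NE, PA, PP> from slide 7 and using the 0.05 closeness threshold given in the rules; the helper names are illustrative, and the rules are applied in the order the slide lists them.

```java
public class ModificationRules {

    // Indices into the EmoVoice confidence vector <NA, NP, NE, PA, PP>.
    static final int NA = 0, NP = 1, NE = 2, PA = 3, PP = 4;

    /** Applies the rules to one training instance; returns the (possibly rewritten) EMV class. */
    static String apply(double[] c, String emvClass, String trueClass) {
        // Happy: for happy training instances, PP + PA must beat NA, NE, and NP.
        double positive = c[PP] + c[PA];
        if (trueClass.equals("Happy")
                && positive > c[NA] && positive > c[NE] && positive > c[NP]) {
            return "Happy";
        }
        // Sad: NP is second only to NA and within 0.05 of it.
        if (argMax(c) == NA && argMaxExcluding(c, NA) == NP && c[NA] - c[NP] <= 0.05) {
            return "Sad";
        }
        // Neutral: NE ties another confidence level ...
        for (int i = 0; i < c.length; i++) {
            if (i != NE && c[i] == c[NE]) return "Neutral";
        }
        // ... or all five probabilities lie within 0.05 of each other.
        if (range(c) <= 0.05) return "Neutral";
        return emvClass; // otherwise keep EmoVoice's original class
    }

    static int argMax(double[] c) {
        int best = 0;
        for (int i = 1; i < c.length; i++) if (c[i] > c[best]) best = i;
        return best;
    }

    static int argMaxExcluding(double[] c, int skip) {
        int best = (skip == 0) ? 1 : 0;
        for (int i = 0; i < c.length; i++)
            if (i != skip && c[i] > c[best]) best = i;
        return best;
    }

    static double range(double[] c) {
        double lo = c[0], hi = c[0];
        for (double v : c) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
        return hi - lo;
    }

    public static void main(String[] args) {
        // NA narrowly beats NP (within 0.05), so the Sad rule fires.
        double[] v = {0.40, 0.35, 0.10, 0.05, 0.10};
        System.out.println(apply(v, "Angry", "Sad")); // Sad
    }
}
```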

  17. Results After Manual Bias

  18. Future Work
      Spring practicum:
      - Refine rules
      - Automation
      - Online classifier
      - Mount on robot; cause apocalypse

  19. Questions? Comments? Thank you for listening.
