Robust Recognition of Emotion from Speech in e-Learning Environment

Mohammed E. Hoque 1,2, Mohammed Yeasin 1, Max Louwerse 2

1. Computer Vision, Pattern and Image Analysis Laboratory, Department of Electrical and Computer Engineering, University of Memphis, TN 38152
2. Multimodal Aspects of Discourse (MAD) Laboratory, Department of Psychology / Institute for Intelligent Systems, University of Memphis, TN 38152

Introduction

• Emotion in speech in a learning environment is a strong indicator of how effective the learning process is [1,2].
• This assertion has significantly influenced the study of emotion.
• Our aim is to identify salient words and observe their prosodic features.
• The same word can be uttered with different intonational patterns and convey entirely different meanings, as shown in Figure 1.
• Therefore, we argue that extracting lexical and prosodic features from salient words only will yield robust recognition of emotion from speech in a learning environment.

Figure 1: Pitch of the word "OK" in various emotional states: (a) uttered under confusion, (b) uttered under flow, (c) uttered under delight, (d) uttered normally.

Categories of Emotion

Emotions pertinent to e-Learning fall into two categories (Figure 2): positive (flow, delight) and negative (confusion, frustration).

Figure 2: Categories of emotion pertinent to e-Learning.

Databases

Three movies were selected from which emotional utterances were clipped: (a) Fahrenheit 9/11, (b) Bowling for Columbine, (c) Before Sunset.

Novel Features of Speech

• Pitch: minimum, maximum, mean, standard deviation, absolute value, quantile, ratio between voiced and unvoiced frames.
• Formant: first, second, third, fourth, and fifth formants; second formant / first formant; third formant / first formant.
• Intensity: minimum, maximum, mean, standard deviation, quantile.

(Illustrative sketches of the feature extraction, projection, and evaluation steps follow the Conclusion.)

Results

21 different classifiers were used to validate the robustness of our algorithm in distinguishing between positive and negative emotions, as shown in Table 1. Table 1 also compares performance with and without data projection/reduction techniques applied to the features. Table 2 compares how the classifiers performed in distinguishing delight from flow within the positive emotions and confusion from frustration within the negative emotions. The results show that negative emotions are classified better than positive emotions.

Table 1: Classification results for positive versus negative emotion.
Table 2: Classification results within the positive (delight vs. flow) and negative (confusion vs. frustration) emotions.

Figure 3: Clustered speech features, after reducing their dimensions using both Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).

Conclusion

• The hypothesis of extracting prosodic features from salient words has been successfully demonstrated.
• The results were validated using 21 different classifiers, each cross-validated with 10 folds.
• Applying data projection and dimension-reduction techniques, such as Principal Component Analysis and Linear Discriminant Analysis, yields better results.
• Classifiers performed at nearly 100% in distinguishing between frustration and confusion.
• Classifiers performed comparatively worse in distinguishing between positive patterns such as delight and flow.
• The next phase of the project will involve testing the algorithm on map-task data collected from the i-MAP project of the Institute for Intelligent Systems (IIS).
• Future efforts will involve fusing multimodal channels such as facial expressions, speech, and gestures at both the decision and feature levels.
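The transcript does not say which toolchain produced the pitch statistics listed under Novel Features of Speech; formant and intensity statistics would typically come from a phonetics tool such as Praat. As an illustration only, here is a minimal sketch of extracting comparable pitch features with librosa. The file name, frequency bounds, and quantile choices are assumptions, not the authors' method.

```python
# Sketch: pitch-based features for one utterance, assuming librosa is available.
# The file name, fmin/fmax bounds, and quantile levels are illustrative only.
import librosa
import numpy as np

y, sr = librosa.load("ok_utterance.wav", sr=None)  # hypothetical clip

# Probabilistic YIN pitch tracking: f0 is NaN on unvoiced frames.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

voiced = f0[voiced_flag]  # keep pitch values from voiced frames only
features = {
    "pitch_min": np.min(voiced),
    "pitch_max": np.max(voiced),
    "pitch_mean": np.mean(voiced),
    "pitch_std": np.std(voiced),
    "pitch_quantile_25": np.quantile(voiced, 0.25),
    "pitch_quantile_75": np.quantile(voiced, 0.75),
    # Ratio between voiced and unvoiced frames, as listed in the feature set.
    "voiced_unvoiced_ratio": voiced_flag.sum() / max((~voiced_flag).sum(), 1),
}
```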
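The PCA and LDA projections behind Figure 3 correspond to standard transformers. This sketch uses scikit-learn, with randomly generated stand-ins for the extracted feature matrix and the positive/negative labels; in practice the rows would be the per-utterance feature vectors built above.

```python
# Sketch: projecting the prosodic feature matrix with PCA and LDA, assuming
# scikit-learn; X (n_utterances x n_features) and y here are random stand-ins.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((100, 19))             # stand-in for extracted features
y = rng.integers(0, 2, 100)           # stand-in for positive/negative labels

X_scaled = StandardScaler().fit_transform(X)

# Unsupervised projection: keep components explaining 95% of the variance.
X_pca = PCA(n_components=0.95).fit_transform(X_scaled)

# Supervised projection: at most (n_classes - 1) discriminant axes,
# so a single axis for the binary positive/negative case.
X_lda = LinearDiscriminantAnalysis().fit_transform(X_scaled, y)
```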
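Finally, the 21 classifiers behind Tables 1 and 2 are not named in this transcript. The following scaled-down sketch shows the same evaluation protocol, 10-fold cross-validation over a set of classifiers, using a few common scikit-learn models as placeholders.

```python
# Sketch: comparing classifiers with 10-fold cross-validation, in the spirit
# of Tables 1 and 2. The classifier list and data are illustrative only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 19))             # stand-in prosodic feature matrix
y = rng.integers(0, 2, 100)           # stand-in positive/negative labels

classifiers = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```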
Acknowledgements

This research was partially supported by grant NSF-IIS-0416128 awarded to the third author. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.

References

[1] Craig, S. D., & Gholson, B. (2002, July). Does an agent matter? The effects of animated pedagogical agents on multimedia environments. In P. Barker & S. Rebelsky (Eds.), Proceedings of ED-MEDIA 2002: World Conference on Educational Multimedia, Hypermedia and Telecommunications (pp. 357-362). Norfolk, VA: Association for the Advancement of Computing in Education.
[2] Craig, S. D., Gholson, B., & Driscoll, D. (2002). Animated pedagogical agents in multimedia educational environments: Effects of agent properties, picture features, and redundancy. Journal of Educational Psychology, 94, 428-434.
