ANGRY EMOTION DETECTION FROM REAL-LIFE CONVERSATIONAL SPEECH BY LEVERAGING CONTENT STRUCTURE
Chun-Yu Chen
Wooil Kim and John H. L. Hansen

Outline
- Real conversational speech corpus
- TEO-CB-Auto-Env
- Emotional language model score
- Experimental results

Real conversational speech corpus
- Neutral speech: digits, letters of the alphabet, and other words that carry specific information (First, July, August)
- Angry speech: negative words (not, no, can’t, even, how), complaints, and others (that, this, here)

TEO-CB-Auto-Env
- One of the acoustic features used for angry speech detection
- Designed to represent the nonlinear characteristics of voiced sound production (e.g., vowels)
- The resulting vector of area coefficients has been shown to be large for neutral speech

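The idea behind the area coefficients can be sketched in a few lines of Python. TEO-CB-Auto-Env is usually described as a Teager Energy Operator (TEO) profile computed per critical band, followed by the area under the normalized autocorrelation envelope of that profile. The sketch below is only an illustration under simplifying assumptions: it skips the critical-band filterbank and the framing, works on a single band, and the function names `teager_energy`, `normalized_autocorrelation`, and `envelope_area` are hypothetical, not taken from the slides.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def normalized_autocorrelation(seq, max_lag):
    """Autocorrelation for lags 0..max_lag, normalized by the lag-0 value."""
    r0 = sum(v * v for v in seq)
    if r0 == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(seq[i] * seq[i + k] for i in range(len(seq) - k)) / r0
        for k in range(max_lag + 1)
    ]


def envelope_area(seq, max_lag):
    """Trapezoidal area under the normalized autocorrelation curve."""
    r = normalized_autocorrelation(seq, max_lag)
    return sum((r[k] + r[k + 1]) / 2 for k in range(max_lag))


if __name__ == "__main__":
    # A pure tone has a constant TEO profile, so its autocorrelation decays
    # linearly and the area is as large as it can be for this frame length.
    tone = [math.cos(0.3 * n) for n in range(102)]
    print(envelope_area(teager_energy(tone), max_lag=50))
```

A steady, single-harmonic band gives a nearly constant TEO profile and therefore a large area, which is consistent with the slide's remark that the coefficients are large for neutral speech.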
Emotional language model score
Two types of combination methods:
- Feature combination: the MFCC feature vector is appended to the TEO-CB-Auto-Env feature vector
- Classifier combination: the likelihood scores from both classifiers are combined with a scale factor

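The two combination methods can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the slide does not say how the scale factor is applied, so the weighted sum of log-likelihoods used here is an assumption, and the names `combine_features`, `combine_scores`, and `detect_emotion` are hypothetical.

```python
def combine_features(mfcc_vec, teo_vec):
    """Feature combination: append the MFCC vector to the TEO-CB-Auto-Env vector."""
    return list(teo_vec) + list(mfcc_vec)


def combine_scores(score_teo, score_mfcc, scale):
    """Classifier combination (assumed form): TEO score plus scaled MFCC score."""
    return score_teo + scale * score_mfcc


def detect_emotion(loglik_teo, loglik_mfcc, scale):
    """Pick the class with the highest combined score.

    Each loglik_* argument maps a class name ("neutral", "angry") to the
    log-likelihood produced by one classifier.
    """
    combined = {
        label: combine_scores(loglik_teo[label], loglik_mfcc[label], scale)
        for label in ("neutral", "angry")
    }
    return max(combined, key=combined.get)
```

With feature combination a single classifier sees the longer vector, whereas classifier combination keeps two separate classifiers and needs the scale factor to be tuned, for example on held-out data.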
Emotional language model score
- “Emotional” language models are based on an initial language model with a large vocabulary (HUB4)
- The initial language model is adapted with the transcripts of neutral and angry speech, using HTK and the CMU-Cambridge SLM toolkit
- A 2-dimensional feature vector is formulated as the “lexical” feature

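One way to picture the 2-dimensional lexical feature is to score a transcript against a neutral model and an angry model and keep both scores. The sketch below assumes that reading and replaces the adapted large-vocabulary language models with toy add-one-smoothed unigram models, so it shows only the shape of the computation. All names (`train_unigram`, `avg_log_prob`, `lexical_feature`) are hypothetical, and no toolkit API is used.

```python
import math
from collections import Counter


def train_unigram(transcripts):
    """Count words over a list of transcript strings (toy stand-in for an adapted LM)."""
    counts = Counter(w for t in transcripts for w in t.lower().split())
    return counts, sum(counts.values())


def avg_log_prob(text, model, vocab_size):
    """Average per-word log-probability with add-one smoothing."""
    counts, total = model
    words = text.lower().split()
    if not words:
        return 0.0
    log_sum = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
    return log_sum / len(words)


def lexical_feature(text, neutral_model, angry_model, vocab_size):
    """Return the pair (neutral score, angry score) for one transcript."""
    return (
        avg_log_prob(text, neutral_model, vocab_size),
        avg_log_prob(text, angry_model, vocab_size),
    )
```

A transcript full of words that are frequent in the angry training transcripts gets a higher second component than first, which a downstream classifier can use alongside the acoustic features.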
Experimental results
Data collection:
- 15 female and 13 male speakers
- 136 segments of neutral speech and 124 segments of angry speech
- Each segment is 3-6 seconds long

Experimental results
Two types of model for testing:
- Open-speaker: the model is trained on all data except the test speaker's
- Closed-speaker: the data are split into two parts; the test speaker's utterances are taken only from part A, and the model is trained on part B
- Performance improves when more data are included

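The two test conditions amount to two ways of splitting the segments. The following sketch assumes each segment is a dictionary with hypothetical `speaker` and `part` keys; the slides do not describe the actual data layout, so this is one possible reading rather than the authors' procedure.

```python
def open_speaker_split(segments, test_speaker):
    """Open-speaker: train on every speaker except the test speaker."""
    train = [s for s in segments if s["speaker"] != test_speaker]
    test = [s for s in segments if s["speaker"] == test_speaker]
    return train, test


def closed_speaker_split(segments, test_speaker):
    """Closed-speaker: test on the speaker's part-A segments, train on all of part B."""
    train = [s for s in segments if s["part"] == "B"]
    test = [
        s for s in segments
        if s["part"] == "A" and s["speaker"] == test_speaker
    ]
    return train, test
```

In the open-speaker case the model never sees the test speaker, while in the closed-speaker case part B may contain other segments from the same speaker, which is what separates the two conditions.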
Experimental results
Without EMLS: MFCC-EDZ is the best single feature

Experimental results
With EMLS